Google's New LiteRT (Lite Runtime) Accelerator Boosts AI Workloads on Snapdragon Android Devices

Google & Qualcomm Launch QNN Accelerator for LiteRT

Google has introduced a new LiteRT accelerator, Qualcomm AI Engine Direct (QNN), built to significantly boost on-device AI performance on Android devices powered by Snapdragon 8 series SoCs.

The results are impressive:

  • Up to 100× speedup vs. CPU execution
  • Up to 10× speedup vs. GPU processing

---

Why NPUs Matter

While modern Android devices feature powerful GPUs, they often hit performance bottlenecks during heavy AI workloads.

For example, running a compute-intensive text-to-image generation model alongside ML-based camera segmentation can overwhelm even premium mobile GPUs—causing jitter and dropped frames.

To address this, many smartphones now integrate Neural Processing Units (NPUs), which:

  • Execute AI tasks faster than GPUs
  • Consume less power
  • Handle multiple concurrent AI workloads

---

QNN: A Unified AI Acceleration Workflow

The QNN accelerator was co-developed by Google and Qualcomm and replaces the earlier TensorFlow Lite QNN delegate.

Key highlights:

  • Simplified workflow integrating multiple SoC compilers and runtimes under one API (see the sketch after this list)
  • Supports 90 LiteRT operations, enabling full model delegation
  • Includes specialized kernels and optimizations for LLMs (e.g., Gemma, FastVLM)
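
To illustrate what "one API" means in practice, here is a minimal Kotlin sketch. The package, `CompiledModel.Options`, and the `Accelerator` enum follow the LiteRT Next Kotlin API as assumptions and may differ from the released surface:

```kotlin
// Hedged sketch: names below are assumptions based on the LiteRT Next Kotlin
// API (com.google.ai.edge.litert); check the official docs for exact signatures.
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// The point of the unified workflow: switching SoC back ends is one enum
// value in the options, not a different vendor delegate to wire up.
fun optionsFor(target: String): CompiledModel.Options =
    CompiledModel.Options(
        when (target) {
            "npu" -> Accelerator.NPU   // Qualcomm AI Engine Direct path
            "gpu" -> Accelerator.GPU
            else -> Accelerator.CPU
        }
    )
```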

---

Benchmark Results

Testing on 72 machine learning models:

  • 64 models achieved full NPU delegation.
  • Performance gains:
      • Up to 100× faster than CPU
      • Up to 10× faster than GPU

> On Qualcomm’s flagship Snapdragon 8 Elite Gen 5:
>
> - 56+ models run in under 5 ms on the NPU.
> - Only 13 models reach that threshold on the CPU.

This directly enables real-time AI experiences previously unattainable on mobile.

---

Concept App: Instant Scene Interpretation

Google engineers created a concept app using an optimized version of Apple's FastVLM‑0.5B vision-language model.

Performance on Snapdragon 8 Elite Gen 5 NPU:

  • Time to first token (TTFT): 0.12 s on 1024×1024 images
  • 11,000+ tokens/s prefill speed
  • 100+ tokens/s decoding speed
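
As a quick arithmetic check, those figures are self-consistent: at 11,000 tokens/s, a 0.12 s TTFT corresponds to roughly 0.12 s × 11,000 tokens/s ≈ 1,300 tokens processed during prefill before the first output token appears.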

Optimization techniques:

  • int8 weight quantization
  • int16 activation quantization

These leverage the NPU’s high‑performance int16 kernels.
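
To see why int16 activations help, here is a small Kotlin sketch of symmetric quantization; the tensor ranges used are illustrative assumptions, not numbers from the article:

```kotlin
// Illustrative sketch of symmetric quantization: int8 for weights, int16 for
// activations. Ranges below are made-up examples, not values from the article.

fun scaleFor(maxAbs: Float, bits: Int): Float {
    val qmax = (1 shl (bits - 1)) - 1        // 127 for int8, 32767 for int16
    return maxAbs / qmax                     // real-value step per integer code
}

fun quantize(x: Float, scale: Float, bits: Int): Int {
    val qmax = (1 shl (bits - 1)) - 1
    return Math.round(x / scale).coerceIn(-qmax, qmax)
}

fun main() {
    val wScale = scaleFor(maxAbs = 2.5f, bits = 8)   // weight tensor spans ±2.5
    val aScale = scaleFor(maxAbs = 6.0f, bits = 16)  // activations span ±6.0
    println("weight step = $wScale")     // ≈ 0.0197 per int8 code
    println("act step    = $aScale")     // ≈ 0.00018 per int16 code (~100× finer)
    println(quantize(1.3f, wScale, 8))   // -> 66
    println(quantize(1.3f, aScale, 16))  // -> 7100
}
```

The finer int16 grid protects accuracy-sensitive activations, and because the NPU executes int16 kernels natively, that extra precision costs little throughput.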

---

Hardware Compatibility

QNN currently supports a limited subset of Android devices, primarily:

  • Snapdragon 8 series
  • Snapdragon 8+ series

---

Getting Started
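
A typical flow would be: bundle a LiteRT model in your app, compile it with the NPU option, then exchange tensors through input/output buffers. The sketch below assumes the LiteRT Next Kotlin API (`CompiledModel`, `Accelerator`, and the `writeFloat`/`readFloat` buffer helpers); treat every name as an assumption and confirm it against the official LiteRT documentation.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Assumed end-to-end flow: compile the model for the NPU, write the input
// tensor, run inference, and read the output tensor back.
fun runOnNpu(context: android.content.Context, input: FloatArray): FloatArray {
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",                         // LiteRT model bundled in assets
        CompiledModel.Options(Accelerator.NPU), // request full NPU delegation
    )
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    inputBuffers[0].writeFloat(input)    // assumed buffer helper
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()  // assumed buffer helper
}
```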

---

Monetizing AI Outputs with AiToEarn

For creators, generating high‑performance on-device AI content is only part of the journey. AiToEarn bridges the gap by offering:

  • AI content generation tools
  • Cross‑platform publishing (Douyin, Kwai, WeChat, Bilibili, Rednote/Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X/Twitter)
  • Integrated analytics
  • AI model ranking

This makes it easy to publish and monetize AI‑powered creativity at scale.

---

In summary: QNN delivers breakthrough NPU acceleration for Android, making demanding AI apps faster and more efficient. With platforms like AiToEarn, developers can turn these technical gains into real-world creative impact across multiple channels.
