Google's New LiteRT (Lite Runtime) Accelerator Boosts AI Workloads on Snapdragon Android Devices
Google & Qualcomm Launch QNN Accelerator for LiteRT
Google has introduced a new LiteRT accelerator, Qualcomm AI Engine Direct (QNN), built to significantly boost on-device AI performance on Android devices powered by Snapdragon 8 series SoCs.
The results are impressive:
- Up to 100× speedup vs. CPU execution
- Up to 10× speedup vs. GPU processing
---
Why NPUs Matter
While modern Android devices feature powerful GPUs, they often hit performance bottlenecks during heavy AI workloads.
For example, running a compute-intensive text-to-image generation model alongside ML-based camera segmentation can overwhelm even premium mobile GPUs—causing jitter and dropped frames.
To address this, many smartphones now integrate Neural Processing Units (NPUs), which:
- Execute AI tasks faster than GPUs
- Consume less power
- Handle multiple concurrent AI workloads
---
QNN: A Unified AI Acceleration Workflow
QNN was co-developed by Google and Qualcomm to replace the previous TensorFlow Lite QNN delegate.
Key highlights:
- Simplified workflow integrating multiple SoC compilers and runtimes under one API (sketched below)
- Supports 90 LiteRT operations for full model delegation
- Includes specialized kernels and optimizations for LLMs and vision-language models (e.g., Gemma, FastVLM)
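As a rough illustration of what "one API" looks like in practice, here is a minimal Kotlin sketch of compiling a model for the NPU and running a single inference. The class and method names (`CompiledModel`, `CompiledModel.Options`, `Accelerator.NPU`, the buffer helpers) follow Google's published LiteRT samples but may differ by release, and `model.tflite` is a placeholder asset name, so treat this as a sketch rather than copy-paste code.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Sketch only: the API surface (CompiledModel, Accelerator.NPU, buffer helpers)
// is taken from published LiteRT samples and may differ in your LiteRT version;
// "model.tflite" is a placeholder model bundled under assets/.
fun runOnNpu(context: Context, input: FloatArray): FloatArray {
    // Compile the model for the NPU. On devices without a supported NPU,
    // you would fall back to Accelerator.GPU or Accelerator.CPU instead.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.NPU)
    )

    // Allocate I/O buffers, copy the input in, run once, read the output.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

Because the same call works with `Accelerator.GPU` or `Accelerator.CPU`, switching backends is a one-line change rather than a separate delegate setup per vendor.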
---
Benchmark Results
Testing on 72 machine learning models:
- 64 models achieved full NPU delegation.
- Performance gains:
  - Up to 100× faster than CPU
  - Up to 10× faster than GPU
> On Qualcomm’s flagship Snapdragon 8 Elite Gen 5:
> - 56+ models run in under 5 ms on the NPU.
> - Only 13 models meet that threshold on the CPU.
This directly enables real-time AI experiences previously unattainable on mobile.
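As a rough sketch of how such per-inference latencies can be measured on-device, the helper below averages wall-clock time over repeated runs using Kotlin's standard `measureNanoTime`. The `runInference` lambda wraps whatever invocation you use (for example, the hypothetical `model.run(inputBuffers, outputBuffers)` from the earlier sketch); rigorous benchmarks would also control for thermal throttling and report percentiles rather than a mean.

```kotlin
import kotlin.system.measureNanoTime

// Rough micro-benchmark sketch: average wall-clock latency of one inference
// over `runs` iterations, after a short warm-up. `runInference` wraps the
// actual model invocation (assumed, e.g. model.run(inputs, outputs)).
fun averageLatencyMs(runs: Int = 50, runInference: () -> Unit): Double {
    repeat(5) { runInference() }                              // warm-up runs
    var totalNanos = 0L
    repeat(runs) { totalNanos += measureNanoTime { runInference() } }
    return totalNanos.toDouble() / runs / 1e6                 // ns -> ms
}
```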

---
Concept App: Instant Scene Interpretation
Google engineers created a concept app using an optimized Apple FastVLM‑0.5B vision encoder model.
Performance on Snapdragon 8 Elite Gen 5 NPU:
- Time to first token (TTFT): 0.12 s on 1024×1024 images
- 11,000+ tokens/s prefill speed
- 100+ tokens/s decoding speed
Optimization techniques:
- int8 weight quantization
- int16 activation quantization
These leverage the NPU’s high‑performance int16 kernels.
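To make the weight-quantization step concrete, here is a small, self-contained Kotlin sketch of the affine (scale/zero-point) int8 mapping that such schemes rely on. It is a generic illustration of the technique, not the actual tooling used for the FastVLM demo; int16 activation quantization follows the same idea with a wider integer range.

```kotlin
// Generic illustration of affine int8 quantization (not the exact pipeline
// used for the FastVLM concept app). Each float weight w is mapped to
// q = round(w / scale) + zeroPoint, clamped to the int8 range [-128, 127];
// dequantization reverses it as w' = (q - zeroPoint) * scale.
fun quantizeInt8(weights: FloatArray): Triple<ByteArray, Float, Int> {
    val minW = weights.min()
    val maxW = weights.max()
    val scale = ((maxW - minW) / 255f).coerceAtLeast(1e-8f)   // 256 int8 levels
    val zeroPoint = Math.round(-128 - minW / scale)           // maps minW -> -128
    val q = ByteArray(weights.size) { i ->
        (Math.round(weights[i] / scale) + zeroPoint)
            .coerceIn(-128, 127)
            .toByte()
    }
    return Triple(q, scale, zeroPoint)
}

fun dequantizeInt8(q: ByteArray, scale: Float, zeroPoint: Int): FloatArray =
    FloatArray(q.size) { i -> (q[i] - zeroPoint) * scale }
```

Shrinking weights to int8 while keeping activations in int16 preserves more dynamic range on the activation path, which is what lets the NPU's int16 kernels hold accuracy at these speeds.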
---
Hardware Compatibility
QNN currently supports a limited subset of Android devices, primarily:
- Snapdragon 8 series
- Snapdragon 8+ series
---
Getting Started
- Read the NPU acceleration guide.
- Download LiteRT from GitHub (a starter Gradle setup is sketched below).
- Experiment with full model delegation for maximum speed gains.
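If you prefer pulling LiteRT in as a prebuilt dependency rather than building from the GitHub sources, a module-level Gradle declaration along the following lines is a reasonable starting point. The artifact coordinates and version shown are assumptions; confirm the current names in the LiteRT README, and note that NPU acceleration may require an additional accelerator artifact described in the NPU acceleration guide.

```kotlin
// build.gradle.kts (app module) — coordinates and version are illustrative
// assumptions; confirm the current artifact names in the LiteRT README.
dependencies {
    implementation("com.google.ai.edge.litert:litert:1.2.0")
    // NPU acceleration may need an extra accelerator artifact; see the
    // NPU acceleration guide for the exact dependency for your SoC.
}
```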
---
Monetizing AI Outputs with AiToEarn
For creators, generating high‑performance on-device AI content is only part of the journey. AiToEarn bridges the gap by offering:
- AI content generation tools
- Cross‑platform publishing (Douyin, Kwai, WeChat, Bilibili, Rednote/Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X/Twitter)
- Integrated analytics
- AI model ranking
This makes it easy to publish and monetize AI‑powered creativity at scale.
---
In summary: QNN delivers breakthrough NPU acceleration for Android, making demanding AI apps faster and more efficient. With platforms like AiToEarn, developers can turn these technical gains into real-world creative impact across multiple channels.