Google's New LiteRT (Lite Runtime) Accelerator Boosts AI Workloads on Snapdragon Android Devices
Google & Qualcomm Launch QNN Accelerator for LiteRT
Google has introduced a new LiteRT accelerator, Qualcomm AI Engine Direct (QNN), built to significantly boost on-device AI performance on Android devices powered by Snapdragon 8 series SoCs.
The results are impressive:
- Up to 100× speedup vs. CPU execution
- Up to 10× speedup vs. GPU processing
---
Why NPUs Matter
While modern Android devices feature powerful GPUs, they often hit performance bottlenecks during heavy AI workloads.
For example, running a compute-intensive text-to-image generation model alongside ML-based camera segmentation can overwhelm even premium mobile GPUs—causing jitter and dropped frames.
To address this, many smartphones now integrate Neural Processing Units (NPUs), which:
- Execute AI tasks faster than GPUs
- Consume less power
- Handle multiple concurrent AI workloads
---
QNN: A Unified AI Acceleration Workflow
QNN was co-developed by Google and Qualcomm to replace the previous TensorFlow Lite QNN delegate.
Key highlights:
- Simplified workflow integrating multiple SoC compilers and runtimes under one API (sketched below)
- Supports 90 LiteRT operations for full model delegation
- Includes specialized kernels and optimizations for LLMs and vision-language models (e.g., Gemma, FastVLM)
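As a rough illustration of what "one API" looks like in practice, here is a minimal Kotlin sketch of compiling a model for the NPU and running a single inference. The class and method names (`CompiledModel`, `CompiledModel.Options`, `Accelerator.NPU`, the buffer helpers) follow Google's published LiteRT samples but may differ by release, and `model.tflite` is a placeholder asset name, so treat this as a sketch rather than copy-paste code.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Sketch only: the API surface (CompiledModel, Accelerator.NPU, buffer helpers)
// is taken from published LiteRT samples and may differ in your LiteRT version;
// "model.tflite" is a placeholder model bundled under assets/.
fun runOnNpu(context: Context, input: FloatArray): FloatArray {
    // Compile the model for the NPU. On devices without a supported NPU,
    // you would fall back to Accelerator.GPU or Accelerator.CPU instead.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.NPU)
    )

    // Allocate I/O buffers, copy the input in, run once, read the output.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

Because the same call works with `Accelerator.GPU` or `Accelerator.CPU`, switching backends is a one-line change rather than a separate delegate setup per vendor.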
---
Benchmark Results
Testing on 72 machine learning models:
- 64 models achieved full NPU delegation.
- Performance gains:
  - Up to 100× faster than CPU
  - Up to 10× faster than GPU
> On Qualcomm’s flagship Snapdragon 8 Elite Gen 5:
> - 56+ models run in under 5 ms on the NPU.
> - Only 13 models meet that threshold on the CPU.
This directly enables real-time AI experiences previously unattainable on mobile.
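As a rough sketch of how such per-inference latencies can be measured on-device, the helper below averages wall-clock time over repeated runs using Kotlin's standard `measureNanoTime`. The `runInference` lambda wraps whatever invocation you use (for example, the hypothetical `model.run(inputBuffers, outputBuffers)` from the earlier sketch); rigorous benchmarks would also control for thermal throttling and report percentiles rather than a mean.

```kotlin
import kotlin.system.measureNanoTime

// Rough micro-benchmark sketch: average wall-clock latency of one inference
// over `runs` iterations, after a short warm-up. `runInference` wraps the
// actual model invocation (assumed, e.g. model.run(inputs, outputs)).
fun averageLatencyMs(runs: Int = 50, runInference: () -> Unit): Double {
    repeat(5) { runInference() }                              // warm-up runs
    var totalNanos = 0L
    repeat(runs) { totalNanos += measureNanoTime { runInference() } }
    return totalNanos.toDouble() / runs / 1e6                 // ns -> ms
}
```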

---
Concept App: Instant Scene Interpretation
Google engineers created a concept app using an optimized Apple FastVLM‑0.5B vision encoder model.
Performance on Snapdragon 8 Elite Gen 5 NPU:
- Time to first token (TTFT): 0.12 s on 1024×1024 images
- 11,000+ tokens/s prefill speed
- 100+ tokens/s decoding speed
Optimization techniques:
- int8 weight quantization
- int16 activation quantization
These leverage the NPU’s high‑performance int16 kernels.
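To make the weight-quantization step concrete, here is a small, self-contained Kotlin sketch of the affine (scale/zero-point) int8 mapping that such schemes rely on. It is a generic illustration of the technique, not the actual tooling used for the FastVLM demo; int16 activation quantization follows the same idea with a wider integer range.

```kotlin
// Generic illustration of affine int8 quantization (not the exact pipeline
// used for the FastVLM concept app). Each float weight w is mapped to
// q = round(w / scale) + zeroPoint, clamped to the int8 range [-128, 127];
// dequantization reverses it as w' = (q - zeroPoint) * scale.
fun quantizeInt8(weights: FloatArray): Triple<ByteArray, Float, Int> {
    val minW = weights.min()
    val maxW = weights.max()
    val scale = ((maxW - minW) / 255f).coerceAtLeast(1e-8f)   // 256 int8 levels
    val zeroPoint = Math.round(-128 - minW / scale)           // maps minW -> -128
    val q = ByteArray(weights.size) { i ->
        (Math.round(weights[i] / scale) + zeroPoint)
            .coerceIn(-128, 127)
            .toByte()
    }
    return Triple(q, scale, zeroPoint)
}

fun dequantizeInt8(q: ByteArray, scale: Float, zeroPoint: Int): FloatArray =
    FloatArray(q.size) { i -> (q[i] - zeroPoint) * scale }
```

Shrinking weights to int8 while keeping activations in int16 preserves more dynamic range on the activation path, which is what lets the NPU's int16 kernels hold accuracy at these speeds.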
---
Hardware Compatibility
QNN currently supports a limited subset of Android devices, primarily:
- Snapdragon 8 series
- Snapdragon 8+ series
---
Getting Started
- Read the NPU acceleration guide.
- Download LiteRT from GitHub (a starter Gradle setup is sketched below).
- Experiment with full model delegation for maximum speed gains.
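If you prefer pulling LiteRT in as a prebuilt dependency rather than building from the GitHub sources, a module-level Gradle declaration along the following lines is a reasonable starting point. The artifact coordinates and version shown are assumptions; confirm the current names in the LiteRT README, and note that NPU acceleration may require an additional accelerator artifact described in the NPU acceleration guide.

```kotlin
// build.gradle.kts (app module) — coordinates and version are illustrative
// assumptions; confirm the current artifact names in the LiteRT README.
dependencies {
    implementation("com.google.ai.edge.litert:litert:1.2.0")
    // NPU acceleration may need an extra accelerator artifact; see the
    // NPU acceleration guide for the exact dependency for your SoC.
}
```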
---
Monetizing AI Outputs with AiToEarn
For creators, generating high‑performance on-device AI content is only part of the journey. AiToEarn bridges the gap by offering:
- AI content generation tools
- Cross‑platform publishing (Douyin, Kwai, WeChat, Bilibili, Rednote/Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X/Twitter)
- Integrated analytics
- AI model ranking
This makes it easy to publish and monetize AI‑powered creativity at scale.
---
In summary: QNN delivers breakthrough NPU acceleration for Android, making demanding AI apps faster and more efficient. With platforms like AiToEarn, developers can turn these technical gains into real-world creative impact across multiple channels.