5 Impressive GitHub Projects for AI-Controlled Smartphones
AI-Powered Mobile Automation Overview

Traditionally, automating mobile phone operations required tools like Appium or Airtest, along with detailed knowledge of an app’s element IDs (`resource-id`, `xpath`, etc.).
However, any app update that changed these IDs would break the automation scripts.
Thanks to large AI models, particularly vision models, controlling smartphones through AI is now practical: the model reads the screen the way a user does instead of depending on element IDs. Below are some popular open-source projects that enable AI-driven mobile control.
---
1. MobiAgent — Mobile Intelligent Agent Framework
Developer: IPADS Lab
Purpose: AI agents autonomously operate mobile devices.

Example Tasks
- "Find the top-selling men's jeans on Xiaohongshu, search the same product on Taobao, collect brand/name/price, and send them via WeChat."
- "Open Ele.me and order a lemon water from Mixue Bingcheng."
How It Works
MobiAgent decomposes tasks into three specialized modules:
- Planner — Creates the overall plan.
- Decider — Determines where to click next.
- Grounder — Locates precise positions on the screen.
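
The division of labor between the three modules can be sketched as follows. This is a minimal illustration, not MobiAgent's actual code: the `Action` class, the function names `plan`, `decide`, and `ground`, and the dictionary used as a stand-in for the screen are all hypothetical, and each stage is a trivial stub where the real framework would call a model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a parsed screen: element label -> (x, y) center.
Screen = Dict[str, Tuple[int, int]]

@dataclass
class Action:
    kind: str        # e.g. "tap" or "wait"
    target: str      # description of the UI element to act on
    x: int = -1      # filled in by the grounder
    y: int = -1

def plan(task: str) -> List[str]:
    """Planner: split a task into ordered sub-goals (stub: naive split on ' and ')."""
    return [step.strip() for step in task.split(" and ") if step.strip()]

def decide(subgoal: str, screen: Screen) -> Action:
    """Decider: choose which element to act on next (stub: first label named in the sub-goal)."""
    for label in screen:
        if label.lower() in subgoal.lower():
            return Action("tap", label)
    return Action("wait", "")

def ground(action: Action, screen: Screen) -> Action:
    """Grounder: resolve the chosen element to pixel coordinates."""
    if action.target in screen:
        x, y = screen[action.target]
        return Action(action.kind, action.target, x, y)
    return action

screen = {"Search": (540, 120), "Cart": (980, 2200)}
steps = plan("open search and open cart")
actions = [ground(decide(step, screen), screen) for step in steps]
for action in actions:
    print(action)
```

Keeping the stages separate means each one can use a model sized for its job: deciding what to do next and locating pixels are different problems.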

Core Components
- MobiMind Model Family — Intelligence core with models of varying scales.
- AgentRR Acceleration Framework — Optimizes repeated tasks for faster execution.
- MobiFlow Benchmark — Standardized scenarios across 10+ mainstream apps for evaluating performance.
Repo:
---
2. Mobile-Agent — Alibaba Open Source
Purpose: AI performs cross-app operations by visually understanding the screen.

Example Task
- "Search for Jinan travel guides on Xiaohongshu, sort by favorites, and save the first note."
Key Features
- Recognizes text, icons, and buttons visually — no backend API required.
- Uses ADB (Android Debug Bridge) for command execution.
- Captures screens after each step to self-correct actions.
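
For readers unfamiliar with ADB, the sketch below shows the kind of commands such an agent issues. The `adb shell input tap`, `adb shell input swipe`, and `adb exec-out screencap -p` invocations are standard ADB commands; the Python helper names (`tap_cmd`, `swipe_cmd`, `screencap_cmd`, `run`) are hypothetical wrappers written for this example and are not taken from Mobile-Agent.

```python
import subprocess
from typing import List, Optional

def _adb(serial: Optional[str] = None) -> List[str]:
    """Base adb argv, optionally targeting one device by serial number."""
    return ["adb"] + (["-s", serial] if serial else [])

def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Command that taps the screen at pixel (x, y)."""
    return _adb(serial) + ["shell", "input", "tap", str(x), str(y)]

def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Command that swipes from (x1, y1) to (x2, y2) over duration_ms milliseconds."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return _adb(serial) + ["shell", "input", "swipe"] + coords

def screencap_cmd(serial: Optional[str] = None) -> List[str]:
    """Command that writes a PNG screenshot to stdout."""
    return _adb(serial) + ["exec-out", "screencap", "-p"]

def run(cmd: List[str]) -> bytes:
    """Execute a command against a connected device and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout

# Building a command needs no device; run() does.
print(" ".join(tap_cmd(540, 120)))  # prints: adb shell input tap 540 120
```

With a device attached and USB debugging enabled, `run(screencap_cmd())` would return the PNG bytes that get passed to the vision model.
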
Repo:
---
3. Droidrun — Mobile Automation Agent Framework
Platform: Android & iOS
Stars: 6.2K on GitHub (at the time of writing)

Concept
AI handles "thinking," while the framework performs actions — no reliance on hard-coded UI elements.
Example Task
- "Find next week’s available 2-person apartments in San Francisco and return the cheapest option."
Repo:
---
4. AppAgent — Tencent Open Source
Full Name: Multimodal Agents as Smartphone Users
Goal: Give AI agents human-like perception and interaction skills.

Key Features
- Captures screenshots via ADB and sends them to a multimodal AI model.
- Decides actions (tap/swipe) based on UI element analysis.
- Learns new apps through:
  - Autonomous exploration
  - Observation of human demonstrations
- Builds a Knowledge Base for future operations, so it does not have to relearn an app.
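
One way to picture the knowledge base is as a per-app store of notes about what each UI element does, written during exploration and read back during later tasks. The sketch below is an assumption-laden illustration: the `KnowledgeBase` class, its method names, and the JSON layout are invented for this example and do not reflect AppAgent's actual storage format.

```python
import json
from typing import Dict, Optional

class KnowledgeBase:
    """Hypothetical store: app name -> UI element id -> learned description."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, str]] = {}

    def record(self, app: str, element: str, note: str) -> None:
        """Save what was learned about an element during exploration or a demo."""
        self._docs.setdefault(app, {})[element] = note

    def lookup(self, app: str, element: str) -> Optional[str]:
        """Return the stored note, or None if the element has not been seen."""
        return self._docs.get(app, {}).get(element)

    def to_json(self) -> str:
        """Serialize so the knowledge survives between sessions."""
        return json.dumps(self._docs, ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "KnowledgeBase":
        kb = cls()
        kb._docs = json.loads(text)
        return kb

kb = KnowledgeBase()
kb.record("notes_app", "btn_plus", "Tapping this opens a new blank note.")
restored = KnowledgeBase.from_json(kb.to_json())
print(restored.lookup("notes_app", "btn_plus"))
```

At task time, stored notes for the elements on screen could be added to the model prompt, which is how earlier exploration saves work later.
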
Repo: https://github.com/TencentQQGYLab/AppAgent
---
5. mobile-use — Voice-Controlled Mobile Automation
Stars: 1.8K on GitHub (at the time of writing)
Platform: Android & iOS
Developer: Minitap AI Team

How It Works
- Captures current mobile screen.
- Sends screenshot + spoken/user instruction to a multimodal AI model.
- Model outputs coordinates or actions (tap/swipe/input).
- Executes via ADB.
- Takes new screenshot to verify progress, repeating until task completion.
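
The steps above form an observe-decide-act loop, sketched below. This is a generic illustration rather than mobile-use's implementation: `run_task` and its `capture`, `decide`, and `execute` parameters are hypothetical names, the action dictionaries are an assumed format, and the model and device are replaced by fakes so the loop can run without a phone or an API key.

```python
from typing import Callable, Dict

Action = Dict[str, object]  # assumed shape, e.g. {"type": "tap", "x": 100, "y": 200}

def run_task(instruction: str,
             capture: Callable[[], bytes],
             decide: Callable[[str, bytes], Action],
             execute: Callable[[Action], None],
             max_steps: int = 10) -> int:
    """Repeat screenshot -> model -> action until the model reports completion.

    Returns the number of actions executed.
    """
    for step in range(max_steps):
        screenshot = capture()                    # observe the current screen
        action = decide(instruction, screenshot)  # ask the model what to do next
        if action["type"] == "done":
            return step
        execute(action)                           # act on the device
    raise RuntimeError(f"task not finished after {max_steps} steps")

# Fakes standing in for the device and the multimodal model.
state = {"taps": 0}

def fake_capture() -> bytes:
    return f"screen-{state['taps']}".encode()

def fake_decide(instruction: str, screenshot: bytes) -> Action:
    if state["taps"] >= 2:
        return {"type": "done"}
    return {"type": "tap", "x": 100, "y": 200}

def fake_execute(action: Action) -> None:
    state["taps"] += 1

executed = run_task("open settings", fake_capture, fake_decide, fake_execute)
print(executed)  # prints: 2
```

The `max_steps` cap is a design choice worth keeping in any real version: a model that never reports completion would otherwise keep tapping indefinitely.
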
Technical Notes
- Integrates the Maestro mobile testing framework for reliable device interaction.
- Supports multiple large-model backends: OpenAI API, local models, or other services.
Repo: https://github.com/minitap-ai/mobile-use
---
Why This Matters
The growing ecosystem of AI-driven mobile automation — from AppAgent to mobile-use — is enabling:
- Human-like UI workflow learning
- Cross-app task execution
- Novel productivity tools
- Accessibility enhancements
---
Bonus: AiToEarn — AI Content Monetization Platform
For creators and developers looking to combine AI mobile automation with publishing:
- AiToEarn official website:
- Features:
  - AI content generation
  - Cross-platform publishing (Douyin, WeChat, Bilibili, Facebook, Instagram, YouTube, etc.)
  - Analytics
- AI model ranking:
By integrating AI agents for task automation with AiToEarn for multi-platform distribution and monetization, creators can streamline digital productivity and maximize reach.
---
💡 Tip: Bookmark the repos above — these projects are rapidly evolving and could reshape how we interact with mobile devices.