5 Impressive GitHub Projects for AI-Controlled Smartphones

AI-Powered Mobile Automation Overview

Traditionally, automating mobile phone operations required tools like Appium or Airtest, along with detailed knowledge of an app’s element IDs (`resource-id`, `xpath`, etc.).

However, any app update that changed these IDs would break the automation scripts.
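
To make the brittleness concrete, here is a minimal sketch in Python. It does not use Appium or Airtest; the UI dictionaries, the `com.shop` package name, and both helper functions are hypothetical and exist only to illustrate what happens when a release renames a `resource-id`.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]

# Hypothetical UI snapshots: the same "Buy" button before and after an app update.
UI_V1: Dict[str, Point] = {"com.shop:id/btn_buy": (540, 1800)}
UI_V2: Dict[str, Point] = {"com.shop:id/purchase_button": (540, 1800)}


def find_by_id(ui: Dict[str, Point], resource_id: str) -> Point:
    """Return the element's coordinates, or fail if the ID no longer exists."""
    if resource_id not in ui:
        raise LookupError(f"no element with resource-id {resource_id!r}")
    return ui[resource_id]


def script_survives(ui: Dict[str, Point]) -> bool:
    """Simulate a script that hard-codes the old resource-id."""
    try:
        find_by_id(ui, "com.shop:id/btn_buy")
        return True
    except LookupError:
        return False
```

The button sits at the same coordinates in both snapshots, yet the script only works against `UI_V1`, because it depends on the identifier rather than on what is visible on screen.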

Thanks to large AI models, particularly vision models, controlling smartphones through AI is now practical. Below are some popular open-source projects that enable AI-driven mobile control.

---

1. MobiAgent — Mobile Intelligent Agent Framework

Developer: IPADS Lab

Purpose: AI agents autonomously operate mobile devices.

Example Tasks

  • "Find the top-selling men's jeans on Xiaohongshu, search the same product on Taobao, collect brand/name/price, and send them via WeChat."
  • "Open Ele.me and order a lemon water from Mixue Bingcheng."

How It Works

MobiAgent decomposes tasks into three specialized modules:

  • Planner — Creates the overall plan.
  • Decider — Determines where to click next.
  • Grounder — Locates precise positions on the screen.
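
The three-module split can be sketched roughly as follows. This is not MobiAgent's actual code or API: the function names, the `Action` type, and the toy stand-ins for each module are assumptions made for illustration. In the real system each role would presumably be filled by a model rather than by string matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]
Screen = Dict[str, Point]  # hypothetical screen model: element label -> coordinates


@dataclass
class Action:
    kind: str      # e.g. "tap"
    target: str    # description of the UI element to act on
    point: Point   # pixel position resolved by the grounder


def run_task(
    task: str,
    planner: Callable[[str], List[str]],
    decider: Callable[[str, Screen], Tuple[str, str]],
    grounder: Callable[[str, Screen], Point],
    screen: Screen,
) -> List[Action]:
    """Planner makes the steps, decider picks what to act on, grounder finds where."""
    actions = []
    for step in planner(task):
        kind, target = decider(step, screen)
        actions.append(Action(kind, target, grounder(target, screen)))
    return actions


# Toy stand-ins (assumptions, not real models).
def toy_planner(task: str) -> List[str]:
    return [part.strip() for part in task.split(", then ")]


def toy_decider(step: str, screen: Screen) -> Tuple[str, str]:
    for label in screen:
        if label.lower() in step.lower():
            return ("tap", label)
    raise LookupError(f"nothing on screen matches step {step!r}")


def toy_grounder(target: str, screen: Screen) -> Point:
    return screen[target]


SCREEN: Screen = {"Search": (540, 120), "Order": (540, 1800)}
actions = run_task("open Search, then press Order",
                   toy_planner, toy_decider, toy_grounder, SCREEN)
```

Keeping the three roles behind separate callables means each one can be swapped for a model of a different size without touching the other two.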

Core Components

  • MobiMind Model Family — Intelligence core with models of varying scales.
  • AgentRR Acceleration Framework — Optimizes repeated tasks for faster execution.
  • MobiFlow Benchmark — Standardized scenarios across 10+ mainstream apps for evaluating performance.

Repo:

---

2. Mobile-Agent — Alibaba Open Source

Purpose: AI performs cross-app operations by visually understanding the screen.

Example Task

  • "Search for Jinan travel guides on Xiaohongshu, sort by favorites, and save the first note."

Key Features

  • Recognizes text, icons, and buttons visually — no backend API required.
  • Uses ADB (Android Debug Bridge) for command execution.
  • Captures screens after each step to self-correct actions.
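
As a rough idea of what driving a phone over ADB looks like, the sketch below builds standard `adb` commands for tapping, swiping, and taking a screenshot. The commands themselves (`input tap`, `input swipe`, `exec-out screencap -p`) are ordinary ADB usage. The Python helper names are hypothetical, and whether Mobile-Agent issues exactly these commands is an assumption.

```python
import subprocess
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Target a specific device with -s when a serial number is given."""
    return ["adb", "-s", serial] if serial else ["adb"]


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def screencap_cmd(serial: Optional[str] = None) -> List[str]:
    # exec-out streams the raw PNG bytes to stdout without line-ending mangling.
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def capture_png(serial: Optional[str] = None) -> bytes:
    """Run the screenshot command; needs adb installed and a connected device."""
    result = subprocess.run(screencap_cmd(serial), check=True, capture_output=True)
    return result.stdout
```

An agent would call `capture_png()` after each action and pass the image back to the vision model, which is the self-correction step described above.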

Repo:

---

3. Droidrun — Mobile Automation Agent Framework

Platform: Android & iOS

Stars: 6.2K on GitHub

Concept

AI handles "thinking," while the framework performs actions — no reliance on hard-coded UI elements.

Example Task

  • "Find next week’s available 2-person apartments in San Francisco and return the cheapest option."

Repo:

---

4. AppAgent — Tencent Open Source

Full Name: Multimodal Agents as Smartphone Users

Goal: Give AI agents human-like perception and interaction skills.

Key Features

  • Captures screenshots via ADB and sends them to a multimodal AI model.
  • Decides actions (tap/swipe) based on UI element analysis.
  • Learns new apps through:
    • Autonomous exploration
    • Observation of human demonstrations
  • Builds a Knowledge Base for future operations without relearning.
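
One way to picture the knowledge base is as a persistent store of notes about what each UI element does, written during exploration and read back later. The sketch below is an assumption about the general idea, not AppAgent's actual storage format. The `KnowledgeBase` class, its methods, and the sample app and element names are all hypothetical.

```python
import json
from typing import Dict, Optional


class KnowledgeBase:
    """Per-app notes about UI elements, serializable to JSON for reuse."""

    def __init__(self) -> None:
        self._notes: Dict[str, Dict[str, str]] = {}

    def record(self, app: str, element: str, note: str) -> None:
        """Store what was learned about an element during exploration."""
        self._notes.setdefault(app, {})[element] = note

    def lookup(self, app: str, element: str) -> Optional[str]:
        """Return the stored note, or None if the element is still unknown."""
        return self._notes.get(app, {}).get(element)

    def dumps(self) -> str:
        return json.dumps(self._notes, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "KnowledgeBase":
        kb = cls()
        kb._notes = json.loads(text)
        return kb


kb = KnowledgeBase()
kb.record("notes_app", "plus_icon", "Tapping opens a blank note.")
restored = KnowledgeBase.loads(kb.dumps())  # simulates a later session
```

Because the notes survive a save and reload, a later session can look up `plus_icon` directly instead of exploring the app again.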

Repo: https://github.com/TencentQQGYLab/AppAgent

---

5. mobile-use — Voice-Controlled Mobile Automation

Stars: 1.8K

Platform: Android & iOS

Developer: Minitap AI Team

How It Works

  • Captures current mobile screen.
  • Sends screenshot + spoken/user instruction to a multimodal AI model.
  • Model outputs coordinates or actions (tap/swipe/input).
  • Executes via ADB.
  • Takes new screenshot to verify progress, repeating until task completion.
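
The capture, decide, execute, and verify cycle above can be sketched as a simple loop. This is not mobile-use's real code. `run_loop`, `FakePhone`, and `fake_model` are hypothetical stand-ins so the example runs without a device or a model API, and the action dictionary format is an assumption.

```python
from typing import Callable, Dict, List

Action = Dict[str, object]


def run_loop(
    instruction: str,
    capture: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> model decision -> action until the model says done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                       # observe the current screen
        action = decide(instruction, screenshot, history)
        if action["type"] == "done":                 # model judges the task complete
            return history
        execute(action)                              # e.g. tap/swipe/input via ADB
        history.append(action)
    raise TimeoutError(f"task not finished after {max_steps} steps")


class FakePhone:
    """Stand-in device: every executed action advances to the next screen."""

    def __init__(self) -> None:
        self.screen = 0

    def capture(self) -> str:
        return f"screen-{self.screen}"

    def execute(self, action: Action) -> None:
        self.screen += 1


def fake_model(instruction: str, screenshot: str, history: List[Action]) -> Action:
    # Pretend the goal is reached once the third screen is visible.
    if screenshot == "screen-2":
        return {"type": "done"}
    return {"type": "tap", "x": 100, "y": 200}


phone = FakePhone()
steps = run_loop("open settings", phone.capture, fake_model, phone.execute)
```

The `max_steps` cap is a design choice worth keeping in any real version: a model that never reports completion would otherwise keep tapping indefinitely.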

Technical Notes

  • Integrates Maestro mobile testing framework for reliable device interaction.
  • Supports multiple large-model backends: OpenAI API, local models, or other services.

Repo: https://github.com/minitap-ai/mobile-use

---

Why This Matters

The growing ecosystem of AI-driven mobile automation — from AppAgent to mobile-use — is enabling:

  • Human-like UI workflow learning
  • Cross-app task execution
  • Novel productivity tools
  • Accessibility enhancements

---

Bonus: AiToEarn — AI Content Monetization Platform

For creators and developers looking to combine AI mobile automation with publishing:

  • AiToEarn official website:
  • Features:
    • AI content generation
    • Cross-platform publishing (Douyin, WeChat, Bilibili, Facebook, Instagram, YouTube, etc.)
    • Analytics
  • AI model ranking:

By integrating AI agents for task automation with AiToEarn for multi-platform distribution and monetization, creators can streamline digital productivity and maximize reach.

---

💡 Tip: Bookmark the repos above — these projects are rapidly evolving and could reshape how we interact with mobile devices.
