Farewell to GUI Agent Infrastructure Nightmares: StepFun Releases 4B Open-Source Model for Easy Local Android App Deployment
StepFun Open-Sources GELab-Zero — A Breakthrough in GUI Agent Development

StepFun has officially open-sourced GELab-Zero, which it describes as the first release to ship a GUI Agent model together with its complete supporting infrastructure. It supports one-click deployment for professionals and hobbyists alike.
The 4B GUI Agent model achieves state-of-the-art (SOTA) results among models of similar size on multiple GUI benchmarks covering both mobile and desktop.
StepFun also introduces AndroidDaily, an evaluation benchmark built from real business scenarios, which aims to push GUI model assessment toward consumer-grade, large-scale applications.
---
Open Source Links
- GitHub: https://github.com/stepfun-ai/gelab-zero
- Model: https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview
- Demo Experience: https://www.modelscope.cn/studios/stepfun-ai/GELab-Zero-4b
---
01 – Research Background
As AI adoption accelerates in consumer devices (especially smartphones), Mobile Agents are shifting focus from "is it possible" to "how to scale."
GUI Agents stand out thanks to:
- Operating apps through visual understanding, without requiring app vendors to adapt their software
- Low integration cost
- Ability to bridge fragmented mobile ecosystems
Challenges:
- Running seamlessly across brands and system versions is difficult
- Developers must handle complex engineering chores such as multi-device ADB connections, dependency setup, permission configuration, inference service deployment, task orchestration, and trajectory replay
- These engineering demands divert focus from innovation and UX design
Goal: Reduce development barriers so creators can focus on value creation.
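To give a sense of the multi-device ADB chore listed above, here is a minimal Python sketch that parses the text printed by `adb devices` and keeps only the devices that are ready to use. It is not taken from GELab-Zero; the function names and the sample serial numbers are illustrative assumptions.

```python
def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_serials(output: str) -> list[str]:
    """Keep only devices in the `device` state (online and authorized)."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


# Sample text in the format printed by `adb devices`; the serials are made up.
SAMPLE = """List of devices attached
emulator-5554\tdevice
R58M123ABC\tunauthorized
0A1B2C3D\toffline
"""

if __name__ == "__main__":
    print(ready_serials(SAMPLE))
```

On a real machine the input would come from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`, and each ready serial could then be targeted with `adb -s <serial>`. Handling this by hand for every brand and system version is the kind of work the project aims to automate.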
Solution — GELab-Zero Components:
- GELab-Zero-4B-preview: Locally runnable GUI Agent model
- End-to-End inference infrastructure: Plug-and-play, heavy lifting fully automated
- AndroidDaily: Scenario-based evaluation benchmark
---
02 – Research Highlights
Highlight 1: Record-Setting Performance in a Lightweight, Fast Package
The GELab-Zero-4B-preview model was evaluated across benchmarks, including:
- ScreenSpot
- OSWorld
- MMBench
- Android World
Results: Outperforms other mainstream models of similar size; even beats larger models like GUI-Owl-32B.
Example Scenarios & Capabilities:
Complex Tasks
- Multi-step, multi-entity shopping on Ele.me
- Enterprise benefits voucher claim through multi-step navigation
Ambiguous Instructions
- Searching for and playing a "classic" Jackie Chan action movie in Tencent Video
- Recommending a family-friendly weekend activity using contextual judgment


---
Highlight 2: GUI + Infrastructure = One-Click MCP

Key Features:
- Lightweight local inference: Runs on consumer hardware ⇒ low latency + privacy
- One-click multi-terminal deployment: Auto environment setup & device management
- Distributed task orchestration: Multi-device execution tracking & reproducibility
- Multiple agent paradigms: ReAct loops, multi-agent collaboration, scheduled tasks
Impact:
- Handles diverse, complex real-world workflows
- Enables rapid prototype testing for developers
- Enterprise-ready MCP embedding
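The ReAct loop mentioned among the key features can be sketched as a simple observe, reason, act cycle. The following Python example is only an illustration under stated assumptions: `Action`, `FakeDevice`, `run_episode`, and `scripted_policy` are hypothetical names, the device is a stub that records actions instead of touching a phone, and the policy is a scripted stand-in for the GUI model. None of this is the GELab-Zero API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str          # hypothetical action names: "click", "type", "done"
    value: str = ""    # target or text; meaning depends on kind
    thought: str = ""  # reasoning emitted by the policy before acting


@dataclass
class FakeDevice:
    """Stand-in for a phone: records actions instead of executing them."""
    screen: str = "home"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        return self.screen

    def execute(self, action: Action) -> None:
        self.log.append((action.kind, action.value))
        if action.kind == "click":
            self.screen = action.value  # pretend the tap opened that screen


def run_episode(task: str, device: FakeDevice,
                policy: Callable[[str, str, list], Action],
                max_steps: int = 10) -> bool:
    """Observe, reason, and act until the policy reports done or steps run out."""
    history: list = []
    for _ in range(max_steps):
        observation = device.screenshot()
        action = policy(task, observation, history)
        history.append(action)
        if action.kind == "done":
            return True
        device.execute(action)
    return False


def scripted_policy(task: str, observation: str, history: list) -> Action:
    """Toy policy; a real agent would query the GUI model here."""
    if observation == "home":
        return Action("click", "video_app", thought="Open the video app first.")
    if observation == "video_app" and not any(a.kind == "type" for a in history):
        return Action("type", "Jackie Chan", thought="Search for the actor.")
    return Action("done", thought="Search submitted.")


if __name__ == "__main__":
    device = FakeDevice()
    print(run_episode("Find a Jackie Chan movie", device, scripted_policy))
    print(device.log)
```

The `max_steps` cap is a common safeguard in loops of this kind: if the policy never reports completion, the episode ends as a failure instead of running forever.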
---
Highlight 3: AndroidDaily, a Real-Life Benchmark

Based on industry collaborations (mobile, IoT, automotive), AndroidDaily simulates real-world usage across six lifestyle areas:
- Food
- Transportation
- Shopping
- Housing
- Information Consumption
- Entertainment
Attributes:
- Focus on mainstream & high-frequency apps
- Faithful reproduction of realistic task flows (including prompting the user before risky operations)
Performance: GELab-Zero-4B-preview achieved 73.4% accuracy on AndroidDaily’s demanding test set.
---
Dual Evaluation Framework
Static Testing
- 3,146 actions
- Agent receives task + screenshot sequence
- Predicts each action's type and value (e.g., click, text input)
- Scored on the accuracy of those predictions
- No heavy engineering required ⇒ fast, low-cost iteration
End-to-End Testing
- 235 tasks
- Domains: transportation, e-commerce, payments, social, content, food delivery, etc.
- Real devices/emulators
- Success measured on full task completion
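The two modes boil down to two different metrics: per-action accuracy for static testing and per-task success rate for end-to-end testing. The Python sketch below shows one plausible way to compute them. The exact-match rule on action type and value is an assumption made for illustration (the source does not describe AndroidDaily's matching criteria), and the function names and dictionary keys are hypothetical.

```python
def action_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of steps where both action type and value match the reference.

    Exact matching is an assumed rule, used here only for illustration.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align step by step")
    if not references:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred["type"] == ref["type"] and pred["value"] == ref["value"]
    )
    return correct / len(references)


def task_success_rate(outcomes: list[bool]) -> float:
    """Fraction of end-to-end tasks that were completed in full."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    # Made-up data, only to exercise the two functions.
    refs = [
        {"type": "click", "value": "search_box"},
        {"type": "type", "value": "noodles"},
        {"type": "click", "value": "first_result"},
        {"type": "click", "value": "add_to_cart"},
    ]
    preds = [
        {"type": "click", "value": "search_box"},
        {"type": "type", "value": "noodles"},
        {"type": "click", "value": "second_result"},
        {"type": "click", "value": "add_to_cart"},
    ]
    print(action_accuracy(preds, refs))
    print(task_success_rate([True, False, True, True]))
```

The contrast explains why static testing is cheap to iterate on: it needs only recorded screenshots and reference actions, whereas the end-to-end metric requires running every task to completion on a real device or emulator.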

---
Conclusion:
The StepFun team envisions GUI Agents as a bridge that carries large models from the digital world into the physical one. By open-sourcing GELab-Zero, they aim to democratize mobile AI Agent development and enable rapid experimentation and deployment.