Farewell to GUI Agent Infrastructure Nightmares: StepFun Releases 4B Open-Source Model for Easy Local Android App Deployment

Honghao Wang

03 Dec 2025 — 3 min read

StepFun Open-Sources GELab-Zero — A Breakthrough in GUI Agent Development

StepFun has officially open-sourced GELab-Zero, marking the first release of a GUI Agent model together with its complete infrastructure — supporting one-click deployment for professionals and hobbyists alike.

The 4B GUI Agent model achieves state-of-the-art (SOTA) results among models of similar size on multiple mobile and desktop GUI benchmarks.

Additionally, StepFun introduces AndroidDaily, an evaluation benchmark built on real business scenarios, aiming to push GUI model assessment toward consumer-grade, large-scale applications.

---

Open Source Links

GitHub: https://github.com/stepfun-ai/gelab-zero
Model: https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview
Demo Experience: https://www.modelscope.cn/studios/stepfun-ai/GELab-Zero-4b

---

01 – Research Background

As AI adoption accelerates in consumer devices (especially smartphones), Mobile Agents are shifting focus from "is it possible" to "how to scale."

GUI Agents stand out thanks to:

Operation through visual understanding without vendor modification
Low integration cost
Ability to bridge fragmented mobile ecosystems

Challenges:

Running seamlessly across brands and system versions is difficult
Developers must handle complex tasks like multi-device ADB, dependency setup, permissions, inference service deployment, orchestration, and playback
These engineering demands divert focus from innovation and UX design
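
To give a sense of the multi-device ADB plumbing listed above, here is a minimal sketch in Python. It relies only on the standard `adb devices` and `adb -s <serial> exec-out screencap -p` commands. The helper names (`parse_adb_devices`, `list_ready_devices`, `capture_screenshot`) are illustrative assumptions and are not taken from the GELab-Zero codebase.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return the serials of devices that `adb devices` reports as ready.

    Each device line has the form '<serial> <state>'. Only the state
    'device' means the phone is connected and authorized.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # The header, daemon messages, blank lines, and devices in the
        # 'offline' or 'unauthorized' state all fail this check.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_ready_devices() -> list[str]:
    """Run `adb devices` and return the serials that are ready to use."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


def capture_screenshot(serial: str) -> bytes:
    """Grab a PNG screenshot from one device, addressed by its serial."""
    result = subprocess.run(
        ["adb", "-s", serial, "exec-out", "screencap", "-p"],
        capture_output=True,
        check=True,
    )
    return result.stdout
```

With `adb` on the PATH, `[capture_screenshot(s) for s in list_ready_devices()]` would collect one screenshot per attached device. Permissions, dependency setup, and orchestration still sit on top of this, which is the burden the article describes.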

Goal: Reduce development barriers so creators can focus on value creation.

Solution — GELab-Zero Components:

GELab-Zero-4B-preview: Locally runnable GUI Agent model
End-to-End inference infrastructure: Plug-and-play, heavy lifting fully automated
AndroidDaily: Scenario-based evaluation benchmark

---

02 – Research Highlights

Highlight 1: Record-Setting Performance in a Lightweight, Fast Package

The GELab-Zero-4B-preview model was evaluated across benchmarks, including:

ScreenSpot
OSWorld
MMBench
Android World

Results: It outperforms other mainstream models of similar size and even beats larger models such as GUI-Owl-32B.

Example Scenarios & Capabilities:

Complex Tasks

Multi-step, multi-entity shopping on Ele.me
Enterprise benefits voucher claim through multi-step navigation

Ambiguous Instructions

Finding and playing a "classic" Jackie Chan action movie in Tencent Video
Recommending a family-friendly weekend activity using contextual judgment

---

Highlight 2: GUI + Infrastructure = One-Click MCP

Key Features:

Lightweight local inference: Runs on consumer hardware ⇒ low latency + privacy
One-click multi-terminal deployment: Auto environment setup & device management
Distributed task orchestration: Multi-device execution tracking & reproducibility
Multiple agent paradigms: ReAct loops, multi-agent collaboration, scheduled tasks
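
The ReAct loop named in the last feature can be sketched as an observe, reason, act cycle. The version below is a generic illustration, not GELab-Zero's implementation: the `Action` fields, the `Device` interface, the `"done"` action name, and the `policy` callable (standing in for the model) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Action:
    kind: str          # illustrative action names: "click", "type", "done"
    value: str = ""    # coordinates or text, depending on the kind
    thought: str = ""  # the reasoning the model gave for this step


class Device(Protocol):
    """Hypothetical device interface used only for this sketch."""

    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# A policy maps (task, latest screenshot, history so far) to the next action.
Policy = Callable[[str, bytes, list[Action]], Action]


def run_react_loop(
    task: str, device: Device, policy: Policy, max_steps: int = 20
) -> list[Action]:
    """Alternate observation, reasoning, and action until done or capped."""
    history: list[Action] = []
    for _ in range(max_steps):
        observation = device.screenshot()            # observe the screen
        action = policy(task, observation, history)  # reason, pick an action
        history.append(action)
        if action.kind == "done":                    # the model says it is finished
            break
        device.execute(action)                       # act on the device
    return history
```

The `max_steps` cap keeps a confused policy from looping forever, and the returned history gives the step-by-step trace that execution tracking and replay would need.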

Impact:

Handles diverse, complex real-world workflows
Enables rapid prototype testing for developers
Enterprise-ready MCP embedding

---

Highlight 3: AndroidDaily – Real Life Benchmark

Based on industry collaborations (mobile, IoT, automotive), AndroidDaily simulates real-world usage across six lifestyle areas:

Food
Transportation
Shopping
Housing
Information Consumption
Entertainment

Attributes:

Focus on mainstream & high-frequency apps
Faithful reproduction of realistic task flows (including user prompts for risky operations)

Performance: GELab-Zero-4B-preview scored 73.4% accuracy on AndroidDaily’s demanding test set.

---

Dual Evaluation Framework

Static Testing

3,146 actions
Agent receives task + screenshot sequence
Agent predicts each action's type and value (e.g. click, text input)
Scored on the accuracy of the predicted actions
No heavy engineering required ⇒ fast, low-cost iteration
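
As a rough illustration of how static scoring of this kind can work, the sketch below compares predicted actions against reference actions one by one. The article does not describe AndroidDaily's actual matching rules (for example, how close a click must be to count), so the exact-match criterion and the names `Step` and `static_accuracy` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action_type: str  # e.g. "click" or "type"
    value: str        # e.g. a target label, or the text to enter


def static_accuracy(predicted: list[Step], gold: list[Step]) -> dict[str, float]:
    """Score predictions against references, one action at a time.

    Returns the fraction of steps with the right action type, and the
    fraction where both the type and the value match exactly.
    """
    if len(predicted) != len(gold):
        raise ValueError("predictions and references must align one-to-one")
    if not gold:
        raise ValueError("empty evaluation set")
    pairs = list(zip(predicted, gold))
    type_hits = sum(p.action_type == g.action_type for p, g in pairs)
    exact_hits = sum(p == g for p, g in pairs)
    return {
        "type_accuracy": type_hits / len(gold),
        "exact_accuracy": exact_hits / len(gold),
    }
```

Because this needs only recorded screenshots and reference actions, no device or emulator is involved, which is what makes static testing cheap to iterate on.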

End-to-End Testing

235 tasks
Domains: transportation, e-commerce, payments, social, content, food delivery, etc.
Real devices/emulators
Success measured on full task completion
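
End-to-end results reduce to a success rate: a task counts only if it was completed in full. A minimal sketch of that aggregation, broken down by domain, follows. The function name `success_rates` and the `(domain, completed)` record layout are assumptions for illustration, not the benchmark's actual data format.

```python
from collections import defaultdict


def success_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute per-domain and overall task success rates.

    `results` holds one (domain, completed) pair per task, where
    `completed` is True only if the whole task finished successfully.
    """
    if not results:
        raise ValueError("no task results to score")
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for domain, completed in results:
        totals[domain] += 1
        wins[domain] += int(completed)
    rates = {domain: wins[domain] / totals[domain] for domain in totals}
    rates["overall"] = sum(wins.values()) / len(results)
    return rates
```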

---

Conclusion:

The StepFun team envisions GUI Agents as a bridge that carries large models from the digital world into the physical one. By open-sourcing GELab-Zero, they aim to democratize mobile AI Agent development, enabling rapid experimentation and deployment.