Farewell to GUI Agent Infrastructure Nightmares: StepFun Releases 4B Open-Source Model for Easy Local Android App Deployment

StepFun Open-Sources GELab-Zero — A Breakthrough in GUI Agent Development

StepFun has officially open-sourced GELab-Zero, marking the first open-source release of a GUI Agent model together with its complete infrastructure, with one-click deployment for professionals and hobbyists alike.

The 4B GUI Agent model achieves state-of-the-art (SOTA) results among models of its size on multiple mobile and desktop GUI benchmarks.

Additionally, StepFun introduces AndroidDaily, an evaluation benchmark built on real business scenarios, aiming to push GUI model assessment toward large-scale, consumer-grade applications.

---

01 – Research Background

As AI adoption accelerates in consumer devices (especially smartphones), Mobile Agents are shifting focus from "is it possible" to "how to scale."

GUI Agents stand out thanks to:

  • Operation through visual understanding, with no vendor-side modifications required
  • Low integration cost
  • Ability to bridge fragmented mobile ecosystems

Challenges:

  • Running seamlessly across brands and system versions is difficult
  • Developers must handle complex tasks like multi-device ADB, dependency setup, permissions, inference service deployment, orchestration, and playback
  • These engineering demands divert focus from innovation and UX design
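Much of this busywork starts with something as basic as knowing which devices are attached and ready. As an illustration only (not GELab-Zero's code), the Python sketch below parses the output of the standard `adb devices` command, which prints one tab-separated serial and state per device, and keeps only devices in the `device` (ready) state. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials of devices whose state is "device" (ready for use).

    Lines look like "<serial>\t<state>". The header line
    ("List of devices attached") and daemon messages ("* daemon ...")
    never have "device" as their second token, so they are skipped naturally.
    States such as "offline" or "unauthorized" are also skipped.
    """
    serials = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_ready_devices() -> List[str]:
    """Run `adb devices` (requires adb on PATH) and parse its output."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


if __name__ == "__main__":
    sample = (
        "List of devices attached\n"
        "emulator-5554\tdevice\n"
        "R58M12ABCDE\tunauthorized\n"
        "\n"
    )
    print(parse_adb_devices(sample))  # ['emulator-5554']
```

An unauthorized phone (for example, one that has not yet accepted the USB debugging prompt) is filtered out, which is exactly the kind of permission detail the infrastructure is meant to absorb.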

Goal: Reduce development barriers so creators can focus on value creation.

Solution — GELab-Zero Components:

  • GELab-Zero-4B-preview: Locally runnable GUI Agent model
  • End-to-end inference infrastructure: Plug-and-play, with the heavy lifting fully automated
  • AndroidDaily: Scenario-based evaluation benchmark

---

02 – Research Highlights

Highlight 1: Record-Setting Performance in a Lightweight, Fast Package

The GELab-Zero-4B-preview model was evaluated across benchmarks, including:

  • ScreenSpot
  • OSWorld
  • MMBench
  • Android World

Results: It outperforms other mainstream models of similar size and even beats larger models such as GUI-Owl-32B.

Example Scenarios & Capabilities:

Complex Tasks

  • Multi-step, multi-entity shopping on Ele.me
  • Claiming an enterprise benefits voucher through multi-step navigation

Ambiguous Instructions

  • Searching for and playing a "classic" Jackie Chan action movie on Tencent Video
  • Recommending a family-friendly weekend activity using contextual judgment

---

Highlight 2: GUI + Infrastructure = One-Click MCP

Key Features:

  • Lightweight local inference: Runs on consumer hardware ⇒ low latency + privacy
  • One-click multi-terminal deployment: Auto environment setup & device management
  • Distributed task orchestration: Multi-device execution tracking & reproducibility
  • Multiple agent paradigms: ReAct loops, multi-agent collaboration, scheduled tasks
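To make the ReAct-style loop concrete, here is a minimal Python sketch under stated assumptions: the `Action` schema, the `policy` callable standing in for the GUI model, and the scripted demo are all hypothetical, and GELab-Zero's actual action space and interfaces may differ. Each step observes the screen, lets the model reason and choose an action, then executes it. Clicks and text input are mapped to the standard `adb shell input tap` and `adb shell input text` commands, which are only built here, not run.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type" or "done" (hypothetical action schema)
    x: int = 0
    y: int = 0
    text: str = ""
    thought: str = ""  # reasoning text the model emits before acting


def to_adb_args(serial: str, action: Action) -> List[str]:
    """Build (but do not run) the `adb shell input` command for an action."""
    base = ["adb", "-s", serial, "shell", "input"]
    if action.kind == "click":
        return base + ["tap", str(action.x), str(action.y)]
    if action.kind == "type":
        # `input text` interprets %s as a space, so spaces are escaped that way.
        return base + ["text", action.text.replace(" ", "%s")]
    raise ValueError(f"no adb command for action kind {action.kind!r}")


def react_loop(
    task: str,
    observe: Callable[[], bytes],                          # e.g. grab a screenshot
    policy: Callable[[str, bytes, List[Action]], Action],  # the GUI model (stand-in)
    execute: Callable[[Action], None],                     # e.g. send adb input
    max_steps: int = 10,
) -> List[Action]:
    """Observe -> reason and pick an action -> execute, until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(task, observe(), history)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


if __name__ == "__main__":
    # Scripted stand-in for a real model, for demonstration only.
    script = [
        Action("click", 540, 200, thought="Open the search box"),
        Action("type", text="Jackie Chan", thought="Enter the query"),
        Action("done", thought="Results are shown"),
    ]
    sent: List[List[str]] = []
    react_loop(
        "Search for a classic Jackie Chan movie",
        observe=lambda: b"",  # placeholder screenshot bytes
        policy=lambda task, shot, hist: script[len(hist)],
        execute=lambda a: sent.append(to_adb_args("emulator-5554", a)),
    )
    for cmd in sent:
        print(" ".join(cmd))
    # adb -s emulator-5554 shell input tap 540 200
    # adb -s emulator-5554 shell input text Jackie%sChan
```

Keeping `observe`, `policy`, and `execute` as injected callables is one way to let the same loop drive a real device, an emulator, or a replayed recording for reproducible runs.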

Impact:

  • Handles diverse, complex real-world workflows
  • Enables rapid prototype testing for developers
  • Enterprise-ready MCP embedding
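For MCP embedding, the Model Context Protocol describes each tool by a `name`, a `description`, and a JSON Schema `inputSchema`, and clients invoke a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below shows what exposing a GUI task through MCP could look like; the tool name `run_gui_task` and its arguments are hypothetical, not GELab-Zero's published interface.

```python
import json
from typing import Optional

# Hypothetical tool definition; GELab-Zero's real tool names and fields may differ.
GUI_TASK_TOOL = {
    "name": "run_gui_task",
    "description": "Run a natural-language task on a connected Android device.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "task": {"type": "string", "description": "Instruction for the agent"},
            "device_serial": {
                "type": "string",
                "description": "adb serial of the target device",
            },
        },
        "required": ["task"],
    },
}


def tools_call_request(
    request_id: int, task: str, device_serial: Optional[str] = None
) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for the tool above."""
    arguments = {"task": task}
    if device_serial is not None:  # optional argument, omitted when not given
        arguments["device_serial"] = device_serial
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": GUI_TASK_TOOL["name"], "arguments": arguments},
        }
    )


if __name__ == "__main__":
    print(tools_call_request(1, "Claim this week's benefits voucher", "emulator-5554"))
```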

---

Highlight 3: AndroidDaily – Real-Life Benchmark

Based on industry collaborations (mobile, IoT, automotive), AndroidDaily simulates real-world usage across six lifestyle areas:

  • Food
  • Transportation
  • Shopping
  • Housing
  • Information Consumption
  • Entertainment

Attributes:

  • Focus on mainstream & high-frequency apps
  • Faithful reproduction of realistic task flows (including user prompts for risky operations)

Performance: GELab-Zero-4B-preview scored 73.4% accuracy on AndroidDaily's demanding test set.

---

Dual Evaluation Framework

Static Testing

  • 3,146 actions
  • Agent receives task + screenshot sequence
  • Predicts action type/value (click, text input)
  • Focus on numerical accuracy
  • No heavy engineering required ⇒ fast, low-cost iteration
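The static setting reduces to comparing each predicted action with a labeled one. The sketch below assumes a hypothetical matching rule, since the article does not state AndroidDaily's exact criterion: a click counts if it lands inside the labeled element's bounding box, and a text input counts if it matches the expected string after trimming whitespace. Accuracy is the fraction of correct steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StepLabel:
    kind: str                                        # "click" or "type"
    box: Optional[Tuple[int, int, int, int]] = None  # (left, top, right, bottom) for clicks
    text: str = ""                                   # expected text for inputs


@dataclass
class Prediction:
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def step_correct(pred: Prediction, label: StepLabel) -> bool:
    """Hypothetical rule: action type must match, then check the value."""
    if pred.kind != label.kind:
        return False
    if label.kind == "click":
        left, top, right, bottom = label.box
        return left <= pred.x <= right and top <= pred.y <= bottom
    return pred.text.strip() == label.text.strip()


def static_accuracy(preds: List[Prediction], labels: List[StepLabel]) -> float:
    """Fraction of steps whose predicted action matches the label."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("need one prediction per label, and at least one label")
    correct = sum(step_correct(p, l) for p, l in zip(preds, labels))
    return correct / len(labels)


if __name__ == "__main__":
    labels = [
        StepLabel("click", box=(0, 0, 100, 100)),
        StepLabel("type", text="coffee"),
        StepLabel("click", box=(200, 200, 300, 300)),
        StepLabel("type", text="2"),
    ]
    preds = [
        Prediction("click", 50, 50),        # inside the box: correct
        Prediction("type", text="coffee "), # matches after trimming: correct
        Prediction("click", 10, 10),        # outside the box: wrong
        Prediction("type", text="2"),       # correct
    ]
    print(static_accuracy(preds, labels))  # 0.75
```

Because each step is scored against a fixed screenshot, no device or emulator is needed, which is what makes this mode cheap to iterate on.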

End-to-End Testing

  • 235 tasks
  • Domains: transportation, e-commerce, payments, social, content, food delivery, etc.
  • Real devices/emulators
  • Success measured on full task completion
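End-to-end scoring is coarser: a task counts only if the whole task is completed. A minimal sketch of aggregating overall and per-domain success rates from hypothetical `(domain, completed)` records, one per task:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates(
    results: Iterable[Tuple[str, bool]]
) -> Tuple[float, Dict[str, float]]:
    """Return (overall success rate, per-domain success rates)."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for domain, completed in results:
        totals[domain] += 1
        wins[domain] += int(completed)  # only fully completed tasks count
    if not totals:
        raise ValueError("no results")
    per_domain = {d: wins[d] / totals[d] for d in totals}
    overall = sum(wins.values()) / sum(totals.values())
    return overall, per_domain


if __name__ == "__main__":
    overall, per_domain = success_rates([
        ("transportation", True),
        ("transportation", False),
        ("e-commerce", True),
        ("e-commerce", True),
        ("payments", False),
    ])
    print(overall)     # 0.6
    print(per_domain)
```

Reporting per-domain rates alongside the overall figure shows where an agent fails, which a single number over 235 tasks would hide.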

---

Connecting AI Deployment to Monetization

Beyond technical excellence, open frameworks like GELab-Zero foster scalable creative ecosystems.

Example: AiToEarn — an open-source AI content monetization platform that offers:

  • AI content generation
  • Cross-platform publishing
  • Analytics & model ranking
  • Simultaneous posting to Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, and X (Twitter)

---

Conclusion:

The StepFun team envisions GUI Agents as a bridge that carries large models from the digital world into the physical one. By open-sourcing GELab-Zero, they aim to democratize mobile AI Agent development, enabling rapid experimentation and deployment.
