Google DeepMind Releases Gemini 2.5 Computer Use Model to Power UI-Controlled AI Agents

Google DeepMind has introduced the Gemini 2.5 Computer Use model — a specialized variant of its Gemini 2.5 Pro system. This model enables AI agents to directly interact with graphical user interfaces by performing actions such as clicking, typing, scrolling, and manipulating interactive web elements.

---

Key Capabilities

Multimodal Interaction

The Computer Use model combines multimodal reasoning and visual understanding within environments like browsers and mobile apps, allowing the AI to:

  • Interpret on-screen context
  • Take appropriate actions in response

Benchmark Performance

Early testing demonstrates strong performance:

  • Benchmarks: Online-Mind2Web, WebVoyager, AndroidWorld
  • Accuracy: ~70% on Online-Mind2Web (DeepMind & Browserbase results)
  • Latency: lower than other publicly evaluated systems

---

How It Works

The workflow is powered by the new `computer_use` tool within the Gemini API:

  • Input to Model
      • Screenshot of the environment
      • Task description
      • Record of previous actions
  • Model Output
      • Structured function calls (e.g., `click`, `type`, `scroll`)
  • Execution Loop
      • Client executes actions
      • Updated screenshot sent back
      • Process repeats until task completion
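The loop above can be sketched in a few lines of Python. This is a minimal simulation, not the real Gemini API: `take_screenshot`, `call_computer_use_model`, and `execute_action` are hypothetical stubs standing in for the client-side capture, the model call, and the action executor.

```python
def take_screenshot(env):
    # Stub: a real client would capture the browser viewport as an image.
    return f"screenshot of {env['page']}"

def call_computer_use_model(screenshot, task, history):
    # Stub standing in for the `computer_use` model call. It returns one
    # structured function call per step, then None to signal completion.
    script = [
        {"name": "click", "args": {"x": 120, "y": 40}},
        {"name": "type", "args": {"text": task}},
    ]
    return script[len(history)] if len(history) < len(script) else None

def execute_action(env, action):
    # Stub: a real client would drive the browser (e.g., via an automation
    # library) and only then capture the next screenshot.
    env["log"].append(action["name"])

def run_agent(task, env, max_steps=10):
    """Screenshot -> model -> action loop, repeated until completion."""
    history = []
    for _ in range(max_steps):
        screenshot = take_screenshot(env)
        action = call_computer_use_model(screenshot, task, history)
        if action is None:  # model reports the task is finished
            break
        execute_action(env, action)
        history.append(action)
    return history

env = {"page": "example.com", "log": []}
actions = run_agent("Gemini 2.5 Computer Use", env)
print(env["log"])  # → ['click', 'type']
```

The `max_steps` cap is a common safeguard in agent loops so a confused model cannot cycle indefinitely.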

> While currently optimized for browsers, the model shows potential for mobile UI control and future desktop OS integration.

---

Industry Context: Beyond Interface Control

Open-source platforms such as AiToEarn demonstrate other ways interactive AI can be leveraged.

AiToEarn allows creators to:

  • Generate AI-driven content
  • Publish across multiple platforms simultaneously: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
  • Access analytics and model rankings
  • Monetize creativity efficiently

This parallels Gemini’s goal of streamlining automated, multimodal interaction for real-world use.

---

Expert Perspectives

Senior Data Science Consultant Wissam Benhaddad commented:

> This solution is promising, but I do not think it’s production-ready yet. Current implementations are extremely slow and can often be replaced by standard API calls or direct app integrations. Reasoning should occur in a latent space for efficiency — capitalizing on Deep Learning strengths. I hope this product evolves in that direction.

---

Safety & Oversight

DeepMind has emphasized built-in guardrails:

  • Protection against malicious prompts, unsafe actions, and scams
  • Per-step safety service evaluates each action before execution
  • Option to require user confirmation for sensitive tasks (e.g., purchases, system-level changes)

The system card details these safety protocols, advising thorough pre-deployment testing.
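A per-step safety gate of this kind can be sketched simply. This is an illustrative assumption, not DeepMind's published implementation: the `SENSITIVE_ACTIONS` set and the `confirm` callback are hypothetical names showing how a client might block sensitive actions pending user confirmation.

```python
# Hypothetical per-step gate: safe actions pass through, sensitive ones
# (purchases, system-level changes) require explicit user confirmation.
SENSITIVE_ACTIONS = {"purchase", "system_change"}

def gate_action(action, confirm):
    """Return True if the client may execute this action."""
    if action["name"] in SENSITIVE_ACTIONS:
        return confirm(action)  # ask the user before proceeding
    return True

# A routine click runs without interruption; a purchase is blocked
# unless the user approves it.
auto_deny = lambda action: False
print(gate_action({"name": "click"}, auto_deny))     # → True
print(gate_action({"name": "purchase"}, auto_deny))  # → False
```

Running the check before every execution step, rather than once per task, matches the per-step evaluation DeepMind describes.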

---

Availability

The Gemini 2.5 Computer Use model is available in preview via:

  • Gemini API in Google AI Studio
  • Vertex AI

---

Summary

For developers exploring:

  • Intelligent agents
  • Automation across multiple platforms
  • Interactive AI in production environments

Tools like Gemini 2.5 Computer Use combined with open-source ecosystems like AiToEarn can help bridge the gap between concept and scaled real-world deployment — merging powerful interface control with monetizable creative workflows.

Read more

Express | OpenAI’s In-House Chip: Partnering with Arm and Broadcom to Build 10-GW Compute Power, SoftBank May Be the Biggest Beneficiary

OpenAI Partners with Arm, Broadcom, and TSMC on Custom AI Chips — Beijing, October 14, 2025 (The Information). OpenAI is working with Arm to incorporate Arm-designed CPUs into its self-developed AI server chips, and co-designing a dedicated, inference-focused AI chip with Broadcom. These chips will be manufactured by TSMC, with production…

By Honghao Wang