Google DeepMind Releases Gemini 2.5 Computer Use Model to Power UI-Controlled AI Agents

Honghao Wang

10 Oct 2025 — 2 min read

Google DeepMind Launches Gemini 2.5 Computer Use Model

Google DeepMind has introduced the Gemini 2.5 Computer Use model — a specialized variant of its Gemini 2.5 Pro system. This model enables AI agents to directly interact with graphical user interfaces by performing actions such as clicking, typing, scrolling, and manipulating interactive web elements.

---

Key Capabilities

Multimodal Interaction

The Computer Use model combines multimodal reasoning and visual understanding within environments like browsers and mobile apps, allowing the AI to:

Interpret on-screen context
Take appropriate actions in response

Benchmark Performance

Early testing demonstrates strong performance:

Benchmarks: Online-Mind2Web, WebVoyager, AndroidWorld
Accuracy: ~70% on Online-Mind2Web (DeepMind & Browserbase results)
Latency: Reported as lower than other publicly evaluated systems

---

How It Works

The workflow is powered by the new `computer_use` tool within the Gemini API:

Input to Model: Screenshot of the environment, task description, and record of previous actions
Model Output: Structured function calls (e.g., `click`, `type`, `scroll`)
Execution Loop: Client executes the actions, sends back an updated screenshot, and repeats until the task is complete
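
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the Gemini API itself: the `Action` type, the `run_agent` function, and the `model`, `take_screenshot`, and `execute` callbacks are all hypothetical names introduced here, and a real client would replace the scripted model with a call to the `computer_use` tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One structured function call, e.g. name="click" (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def run_agent(
    task: str,
    model: Callable[[bytes, str, List[Action]], Optional[Action]],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Run the screenshot -> model -> execute loop until the model stops."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()             # current state of the UI
        action = model(screenshot, task, history)  # model proposes the next step
        if action is None:                         # None means "task complete" here
            break
        execute(action)                            # client performs the action
        history.append(action)                     # record is sent back next turn
    return history


def scripted_model(screenshot: bytes, task: str, history: List[Action]) -> Optional[Action]:
    """Stand-in for the real model: replays a fixed script, then stops."""
    script = [
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": "hello"}),
    ]
    return script[len(history)] if len(history) < len(script) else None
```

Passing the callbacks in as parameters keeps the loop independent of any particular browser driver or SDK, and the `max_steps` cap is an assumed safeguard against a model that never signals completion.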

> While currently optimized for browsers, the model shows potential for mobile UI control and future desktop OS integration.

---

Industry Context: Beyond Interface Control

Open-source platforms such as AiToEarn demonstrate other ways interactive AI can be leveraged.

AiToEarn allows creators to:

Generate AI-driven content
Publish across multiple platforms simultaneously: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
Access analytics and model rankings
Monetize creativity efficiently

This parallels Gemini’s intent to make automated, multimodal interaction more streamlined for real-world use.

---

Expert Perspectives

Senior Data Science Consultant Wissam Benhaddad commented:

> This solution is promising, but I do not think it’s production-ready yet. Current implementations are extremely slow and can often be replaced by standard API calls or direct app integrations. Reasoning should occur in a latent space for efficiency — capitalizing on Deep Learning strengths. I hope this product evolves in that direction.

---

Safety & Oversight

DeepMind has emphasized built-in guardrails:

Protection against malicious prompts, unsafe actions, and scams
Per-step safety service evaluates each action before execution
Option to require user confirmation for sensitive tasks (e.g., purchases, system-level changes)

The system card details these safety protocols, advising thorough pre-deployment testing.
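
On the client side, a per-step check with optional user confirmation might look like the sketch below. It is only an illustration of the pattern: the `Verdict` values, the `SENSITIVE` and `BLOCKED` sets, and the `simple_assess` and `gated_execute` functions are hypothetical and do not reflect how DeepMind's safety service is implemented or exposed.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"      # run the action without asking
    CONFIRM = "confirm"  # ask the user before running
    BLOCK = "block"      # never run


# Hypothetical action names, chosen only for illustration.
SENSITIVE = {"purchase", "system_change"}
BLOCKED = {"disable_firewall"}


def simple_assess(action: dict) -> Verdict:
    """Toy stand-in for a per-step safety check keyed on the action name."""
    name = action.get("name", "")
    if name in BLOCKED:
        return Verdict.BLOCK
    if name in SENSITIVE:
        return Verdict.CONFIRM
    return Verdict.ALLOW


def gated_execute(
    action: dict,
    assess: Callable[[dict], Verdict],
    confirm: Callable[[dict], bool],
    execute: Callable[[dict], None],
) -> bool:
    """Evaluate one action before execution; return True if it was run."""
    verdict = assess(action)
    if verdict is Verdict.BLOCK:
        return False
    if verdict is Verdict.CONFIRM and not confirm(action):
        return False  # user declined the sensitive action
    execute(action)
    return True
```

Here `confirm` would typically prompt the end user, so a purchase proceeds only after explicit approval, while blocked actions are dropped regardless of what the user answers.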

---

Availability

The Gemini 2.5 Computer Use model is available in preview via:

Gemini API in Google AI Studio
Vertex AI

---

Summary

For developers exploring:

Intelligent agents
Automation across multiple platforms
Interactive AI in production environments

Tools like Gemini 2.5 Computer Use, combined with open-source ecosystems like AiToEarn, can help bridge the gap between concept and scaled real-world deployment by pairing interface control with monetizable creative workflows.
