Google DeepMind Releases Gemini 2.5 Computer Use Model to Power UI-Controlled AI Agents
Google DeepMind Launches Gemini 2.5 Computer Use Model
Google DeepMind has introduced the Gemini 2.5 Computer Use model, a specialized variant of its Gemini 2.5 Pro system. The model lets AI agents interact directly with graphical user interfaces by clicking, typing, scrolling, and manipulating interactive web elements.
---
Key Capabilities
Multimodal Interaction
The Computer Use model combines multimodal reasoning and visual understanding within environments like browsers and mobile apps, allowing the AI to:
- Interpret on-screen context
- Take appropriate actions in response
Benchmark Performance
Early testing demonstrates strong performance:
- Benchmarks: Online-Mind2Web, WebVoyager, AndroidWorld
- Accuracy: ~70% on Online-Mind2Web (DeepMind & Browserbase results)
- Latency: Reported as lower than other publicly evaluated systems
---
How It Works
The workflow is powered by the new `computer_use` tool within the Gemini API:
- Input to Model
  - Screenshot of the environment
  - Task description
  - Record of previous actions
- Model Output
  - Structured function calls (e.g., `click`, `type`, `scroll`)
- Execution Loop
  - Client executes actions
  - Updated screenshot sent back
  - Process repeats until task completion
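The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Gemini API itself: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are hypothetical names introduced here, and the model call is replaced by a stub that replays a fixed plan so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One structured function call proposed by the model (hypothetical shape)."""
    name: str                                  # e.g. "click", "type", "scroll", or "done"
    args: dict = field(default_factory=dict)


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it only records what was executed."""
    log: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real client would capture pixels; here we just encode the step count.
        return f"screen-after-{len(self.log)}-actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append((action.name, action.args))


def scripted_model(task: str, screenshot: bytes, history: list) -> Action:
    """Stub for the model call: replays a fixed plan, then signals completion."""
    plan = [
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": task}),
        Action("scroll", {"dy": 400}),
    ]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


def run_agent(task, env, model, max_steps=10) -> list:
    """Screenshot -> model -> execute -> repeat, until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        # Inputs: task description, current screenshot, record of previous actions.
        action = model(task, env.screenshot(), history)
        if action.name == "done":
            break
        env.execute(action)        # the client, not the model, performs the action
        history.append(action)
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent("search for flights", browser, scripted_model)
    print([a.name for a in steps])
```

The `max_steps` cap is a design choice of this sketch: an agent that never reports completion would otherwise loop indefinitely. In a real integration, `scripted_model` would be replaced by a request to the model with the `computer_use` tool enabled.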
> While currently optimized for browsers, the model shows potential for mobile UI control and future desktop OS integration.
---
Industry Context: Beyond Interface Control
Open-source platforms such as AiToEarn demonstrate other ways interactive AI can be leveraged.
AiToEarn allows creators to:
- Generate AI-driven content
- Publish across multiple platforms simultaneously: Douyin, Kwai, WeChat, Bilibili, Rednote (Xiaohongshu), Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
- Access analytics and model rankings
- Monetize creativity efficiently
This parallels Gemini’s intent to make automated, multimodal interaction more streamlined for real-world use.
---
Expert Perspectives
Senior Data Science Consultant Wissam Benhaddad commented:
> This solution is promising, but I do not think it’s production-ready yet. Current implementations are extremely slow and can often be replaced by standard API calls or direct app integrations. Reasoning should occur in a latent space for efficiency — capitalizing on Deep Learning strengths. I hope this product evolves in that direction.
---
Safety & Oversight
DeepMind has emphasized built-in guardrails:
- Protection against malicious prompts, unsafe actions, and scams
- Per-step safety service evaluates each action before execution
- Option to require user confirmation for sensitive tasks (e.g., purchases, system-level changes)
The system card details these safety protocols and advises developers to test thoroughly before deployment.
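A client-side confirmation gate for sensitive actions might look like the sketch below. It is an assumption-laden illustration, not DeepMind's per-step safety service: the `SENSITIVE` set, the action names in it, and the `gate` and `execute_with_gate` helpers are all hypothetical, and the `confirm` callback stands in for whatever prompt a real application would show the user.

```python
from typing import Callable, Iterable

# Hypothetical action names that should require explicit user confirmation.
SENSITIVE = {"purchase", "change_system_setting"}


def gate(action_name: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the action may run; sensitive actions need user approval."""
    if action_name not in SENSITIVE:
        return True
    return confirm(action_name)


def execute_with_gate(actions: Iterable[str], confirm: Callable[[str], bool]):
    """Check every proposed action before execution and split allowed from blocked."""
    executed, blocked = [], []
    for name in actions:
        (executed if gate(name, confirm) else blocked).append(name)
    return executed, blocked


if __name__ == "__main__":
    # Deny every confirmation request to show that only the purchase is held back.
    done, held = execute_with_gate(["click", "purchase", "scroll"], lambda name: False)
    print(done, held)
```

Passing the confirmation step in as a callback keeps the gate testable without a user interface. This sketch simply skips a blocked action; an application could just as reasonably stop the whole task at that point.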
---
Availability
The Gemini 2.5 Computer Use model is available in preview via:
- Gemini API in Google AI Studio
- Vertex AI
---
Summary
For developers exploring:
- Intelligent agents
- Automation across multiple platforms
- Interactive AI in production environments
Tools like Gemini 2.5 Computer Use, combined with open-source ecosystems like AiToEarn, can help bridge the gap between concept and scaled real-world deployment by pairing interface control with monetizable creative workflows.