Fara‑7B: An Efficient Small Language Model for Computer‑Use Agents
Pushing the Frontiers of Computer‑Use Agents
Introducing Fara‑7B — An Open‑Weight, Ultra‑Compact Model Optimized for Real‑World Web Tasks

---
Background: Microsoft’s Small Language Models (SLMs) Journey
In 2024, Microsoft began delivering small language models to customers:
- Phi models via Microsoft Foundry
- Phi Silica, deployed on Copilot+ PCs with Windows 11
Now, the next milestone: Fara‑7B, our first agentic SLM for direct computer use.
---
What Makes Fara‑7B Different?
Key Highlights
- Direct computer interaction — operates through visual perception, mouse, and keyboard actions.
- Compact size — 7B parameters, yet competitive with agentic systems built on much larger models.
- On‑device capability — enables low latency and improved privacy (data stays local).
- Open‑weight release — encourages experimentation and community feedback.
Practical Applications
Fara‑7B is designed for automating everyday web tasks:
- Filling out forms
- Searching information
- Booking travel
- Managing accounts
---
Fara‑7B in Action: Demo Videos
Video 1 — Shopping Scenario
- Task: Purchase an Xbox SpongeBob controller via Magentic‑UI.
- Behavior: Pauses at each Critical Point for user approval.
Video 2 — Information Retrieval
- Task: Find and summarize the latest three issues posted on GitHub for `Microsoft/Magentic-UI`.
Video 3 — Multi‑Tool Task
- Task: Determine driving time between two locations and suggest a nearby cheese shop.
- Tools used: Bing Maps + Bing Search.
---
Performance and Limitations
Strengths
- Competitive across common benchmarks and novel evaluation sets (e.g., job postings, price comparisons).
Limitations
- Accuracy challenges with complex queries
- Occasional instruction‑following errors
- Susceptibility to hallucinations
Note: These remain active areas of research.
---
Availability
- Microsoft Foundry
- Hugging Face (MIT license)
- Quantized & silicon‑optimized version for Copilot+ PCs with Windows 11.
Integration with Magentic‑UI enables quick experimentation.
---
Technical Development
Synthetic Data Pipeline
Bottleneck: Lack of large‑scale, annotated computer interaction datasets.
Solution: Generate synthetic multi‑step tasks at scale from public websites using the Magentic‑One framework.
Stages (sketched in code below):
- Task Proposal — seeded from themed or random URLs and refined into actionable multi‑step tasks; task categories mirror WebTailBench, a new benchmark spanning 11 segments (e.g., booking tickets, applying for jobs).
- Task Solving — a multi‑agent system executes each task via:
  - Orchestrator
  - WebSurfer
  - UserSimulator
- Trajectory Verification — alignment, rubric, and multimodal checks confirm each recorded trajectory actually accomplishes its task.
Dataset:
- 145,000 trajectories
- 1 million steps across diverse sites and tasks
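The three stages can be pictured as a generate, solve, filter loop. Below is a minimal sketch in Python: the agent names follow the post, but every function body and signature is an illustrative placeholder, not the actual Magentic‑One API.

```python
# Minimal sketch of the synthetic-data pipeline: propose a task from a seed
# URL, let the multi-agent system attempt it, and keep only verified
# trajectories. All functions are hypothetical stand-ins, not Magentic-One APIs.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)  # (screenshot, thought, action) records

def propose_task(seed_url: str) -> str:
    """Stage 1: refine a themed or random URL into an actionable multi-step task."""
    return f"Complete a realistic user goal starting from {seed_url}"

def solve_task(task: str) -> Trajectory:
    """Stage 2: Orchestrator + WebSurfer (+ UserSimulator) attempt the task."""
    return Trajectory(task=task, steps=[("step_01.png", "open site", "visit_url(...)")])

def verify(traj: Trajectory) -> bool:
    """Stage 3: alignment, rubric, and multimodal checks on the recorded steps."""
    return len(traj.steps) > 0  # placeholder for the real checks

def build_dataset(seed_urls: list[str]) -> list[Trajectory]:
    dataset = []
    for url in seed_urls:
        traj = solve_task(propose_task(url))
        if verify(traj):  # only verified trajectories become training data
            dataset.append(traj)
    return dataset
```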
---
Training Fara‑7B
- Base model: Qwen2.5‑VL‑7B
- Input: Browser screenshots + action history + recent user messages
- Output: Reasoning text + tool invocation (`click(x,y)`, `type()`, `visit_url()`…)
- Training method: Supervised fine‑tuning on observe–think–act sequences (example format sketched below)
- No reinforcement learning used.
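To make the observe–think–act format concrete, here is a sketch of the shape one training example might take. The field names and action strings are hypothetical; the released data format may differ.

```python
# Hypothetical shape of one supervised fine-tuning example; field names are
# illustrative, not the released schema.
example = {
    "inputs": {
        "screenshot": "step_07.png",  # current browser screenshot
        "action_history": [
            "visit_url('https://example.com/flights')",
            "click(412, 188)",
        ],
        "user_messages": ["Find the cheapest direct flight to Lisbon."],
    },
    "target": {
        # The model is trained to emit reasoning text followed by one tool call.
        "thought": "Results are sorted by price; the top row is a direct flight.",
        "action": "click(236, 342)",  # coordinates in screenshot space
    },
}
```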
---
Evaluation
Benchmarks
- WebVoyager
- Online‑Mind2Web
- DeepShop
- WebTailBench (Microsoft‑created, real‑world tasks)
Testing uses BrowserBase for standardized sessions.
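As a rough illustration of such a standardized session: remote browsers of this kind can be driven over the Chrome DevTools Protocol from Playwright. The endpoint URL below is a placeholder, and the harness shown is an assumption, not the team's actual evaluation setup.

```python
# Hedged sketch: one benchmark episode in a remote browser session driven
# over CDP with Playwright. CDP_URL is a placeholder, not a real endpoint.
from playwright.sync_api import sync_playwright

CDP_URL = "wss://<remote-browser-endpoint>"

def run_episode(task: str, start_url: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(CDP_URL)
        page = browser.new_context().new_page()
        page.goto(start_url)
        observation = page.screenshot()  # fed to the agent each turn
        # ... the observe -> think -> act loop would run here ...
        browser.close()
    return False  # a real harness would judge task success before returning
```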
Result: Fara‑7B achieves the best scores on WebVoyager and WebTailBench and remains competitive with larger systems on Online‑Mind2Web and DeepShop.
Performance Table (task success rate, %)
| Model | WebVoyager | Online‑Mind2Web | DeepShop | WebTailBench |
|---------------|------------|-----------------|----------|--------------|
| SoM Agent (GPT‑4o) | 65.1 | 34.6 | 16.0 | 30.0 |
| GLM‑4.1V‑9B‑Thinking | 66.8 | 33.9 | 32.0 | 22.4 |
| OpenAI computer‑use | 70.9 | 42.9 | 24.7 | 25.7 |
| UI‑TARS‑1.5‑7B | 66.4 | 31.3 | 11.6 | 19.5 |
| Fara‑7B | 73.5 | 34.1 | 26.2 | 38.4 |
---
Safety Measures
Principles
- Transparency
- User control
- Sandboxed execution
Built‑in Protections:
- Stops at Critical Points for explicit user consent (pattern sketched below)
- High refusal rates for harmful tasks (82% in WebTailBench‑Refusals)
- Microsoft red‑teaming covering jailbreaks, prompt injections, and unsafe outputs
Logging & Auditability: All actions logged for user review.
---
How to Use
Access:
- Microsoft Foundry
- Hugging Face (quick‑start sketch below)
- Try via Magentic‑UI (inference code provided)
- Download for Copilot+ PCs via the AI Toolkit for VS Code (NPU acceleration supported)
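As a quick‑start sketch: since Fara‑7B builds on Qwen2.5‑VL‑7B, the checkpoint should load through the standard transformers interface for that architecture. The repo id and prompt below are assumptions to verify against the Hugging Face model card.

```python
# Hedged quick-start via transformers; assumes the checkpoint exposes the
# Qwen2.5-VL architecture. Verify the repo id and usage on the model card.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "microsoft/Fara-7B"  # assumption: check the model card

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

screenshot = Image.open("screenshot.png")  # current browser screenshot
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Task: add the item to the cart. What is the next action?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[screenshot], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```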
---
Looking Ahead
- Goal: Build stronger on‑device computer‑use agents (CUAs) via improved multimodal base models and reinforcement learning.
- Early release focuses on community feedback and real‑world experimentation.
- For contribution opportunities: Open roles at AI Frontiers.
---
Acknowledgements
Thanks to all contributors across engineering, research, and deployment teams who made Fara‑7B possible and brought it to Copilot+ PCs.