Fara-7B: An Efficient Small Language Model for Intelligent Agents in Computer Applications

Pushing the Frontiers of Computer‑Use Agents

Introducing Fara‑7B — An Open‑Weight, Ultra‑Compact Model Optimized for Real‑World Web Tasks

---

Background: Microsoft’s Small Language Models (SLMs) Journey

In 2024, Microsoft began delivering small language models (SLMs) to customers, starting with the Phi family of models.

Now, the next milestone: Fara‑7B, our first agentic SLM for direct computer use.

---

What Makes Fara‑7B Different?

Key Highlights

  • Direct computer interaction — operates through visual perception, mouse, and keyboard actions.
  • Compact size — 7B parameters, yet competitive with much larger multi‑model agentic systems.
  • On‑device capability — enables low latency and improved privacy (data stays local).
  • Open‑weight release — encourages experimentation and community feedback.

Practical Applications

Fara‑7B is designed for automating everyday web tasks:

  • Filling out forms
  • Searching information
  • Booking travel
  • Managing accounts

---


Fara‑7B in Action: Demo Videos

Video 1 — Shopping Scenario

  • Task: Purchase an Xbox SpongeBob controller via Magentic‑UI.
  • Behavior: Pauses at each Critical Point for user approval.

Video 2 — Information Retrieval

  • Task: Find and summarize the latest three issues posted on GitHub for `Microsoft/Magentic-UI`.

Video 3 — Multi‑Tool Task

  • Task: Determine driving time between two locations and suggest a nearby cheese shop.
  • Tools used: Bing Maps + Bing Search.

---

Performance and Limitations

Strengths

  • Competitive across common benchmarks and novel evaluation sets (e.g., job postings, price comparisons).

Limitations

  • Accuracy challenges with complex queries
  • Occasional instruction‑following errors
  • Susceptibility to hallucinations

Note: These remain active areas of research.

---

Availability

Fara‑7B is released as an open‑weight model, and integration with Magentic‑UI enables quick experimentation.

---

Technical Development

Synthetic Data Pipeline

Bottleneck: Lack of large‑scale, annotated computer interaction datasets.

Solution: Generate scalable synthetic multi‑step tasks from public websites using Magentic‑One.

Stages:

  • Task Proposal — seeded from themed or random URLs and refined into actionable, multi‑step tasks spanning 11 categories (e.g., booking tickets, applying for jobs) that mirror the WebTailBench benchmark.
  • Task Solving — a multi‑agent system (Orchestrator, WebSurfer, UserSimulator) executes each task step by step.
  • Trajectory Verification — alignment, rubric, and multimodal checks ensure fidelity (see the sketch after this list).
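
To make the loop concrete, here is a minimal sketch of the three stages; every function name and body below is an illustrative placeholder, not Magentic‑One's actual implementation:

```python
# A minimal, illustrative sketch of the three-stage pipeline described above.
# All names and function bodies are placeholders, not Magentic-One internals.

def propose_task(seed_url: str) -> str:
    # Stage 1 (Task Proposal): turn a seed URL into an actionable multi-step task.
    return f"Starting from {seed_url}, find the store's return policy."

def solve_task(task: str) -> list[str]:
    # Stage 2 (Task Solving): in the real system, Orchestrator, WebSurfer, and
    # UserSimulator agents execute browser actions; here we return a stub trace.
    return ["visit_url('https://example.com')", "click(120, 480)"]

def verify_trajectory(task: str, actions: list[str]) -> bool:
    # Stage 3 (Trajectory Verification): alignment, rubric, and multimodal
    # checks in the real pipeline; a trivial non-empty filter here.
    return len(actions) > 0

def generate_example(seed_url: str):
    task = propose_task(seed_url)
    actions = solve_task(task)
    return (task, actions) if verify_trajectory(task, actions) else None

print(generate_example("https://example.com"))
```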

Dataset:

  • 145,000 trajectories
  • 1 million steps across diverse sites and tasks

---

Training Fara‑7B

  • Base model: Qwen2.5‑VL‑7B
  • Input: Browser screenshots + action history + recent user messages
  • Output: Reasoning text plus a tool invocation (`click(x,y)`, `type()`, `visit_url()`, …)
  • Training method: Supervised fine‑tuning on observe–think–act sequences (a hypothetical example follows this list)
  • No reinforcement learning was used.
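
For illustration, a single observe–think–act training example might be shaped like the following; the field names and tool‑call strings are assumptions, not the published data schema:

```python
# Hypothetical shape of one supervised fine-tuning example.
# Field names and tool-call strings are illustrative assumptions.
example = {
    "screenshot": "step_0042.png",                       # current browser screenshot
    "action_history": [
        "visit_url('https://www.example-store.com')",
        "click(512, 310)",                               # clicked the search box
    ],
    "user_message": "Buy an Xbox SpongeBob controller.",
    "target_output": {
        "reasoning": "The search box is focused, so type the product name next.",
        "tool_call": "type('xbox spongebob controller')",
    },
}
print(example["target_output"]["tool_call"])
```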

---

Evaluation

Benchmarks

  • WebVoyager
  • Online‑Mind2Web
  • DeepShop
  • WebTailBench (Microsoft‑created, real‑world tasks)

Testing uses Browserbase for standardized browser sessions.

Result: Fara‑7B leads on WebVoyager and WebTailBench, outperforms comparably sized models overall, and remains competitive with larger systems on Online‑Mind2Web and DeepShop.

Performance Table (task success rate, %)

| Model | WebVoyager | Online‑Mind2Web | DeepShop | WebTailBench |
|----------------------|------------|-----------------|----------|--------------|
| SoM Agent (GPT‑4o) | 65.1 | 34.6 | 16.0 | 30.0 |
| GLM‑4.1V‑9B‑Thinking | 66.8 | 33.9 | 32.0 | 22.4 |
| OpenAI computer‑use | 70.9 | 42.9 | 24.7 | 25.7 |
| UI‑TARS‑1.5‑7B | 66.4 | 31.3 | 11.6 | 19.5 |
| Fara‑7B | 73.5 | 34.1 | 26.2 | 38.4 |

---

Safety Measures

Principles

  • Transparency
  • User control
  • Sandboxed execution

Built‑in Protections:

  • Stops at Critical Points for explicit consent
  • High refusal rates for harmful tasks (82% in WebTailBench‑Refusals)
  • Microsoft red‑teaming on jailbreaks, prompt injections, unsafe outputs

Logging & Auditability: All actions are logged for user review (a sketch of such a consent‑and‑logging gate appears below).
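
As a rough sketch of how a Critical Point gate and action logging could fit together (the action names and gating policy here are assumptions, not Fara‑7B's actual mechanism):

```python
# Illustrative Critical Point gate: irreversible actions require explicit
# user consent, and every decision is logged for later review.
# The action names and policy are assumptions for illustration only.
IRREVERSIBLE = {"submit_payment", "send_message", "delete_account"}
audit_log: list[tuple[str, str]] = []

def execute(action: str) -> None:
    name = action.split("(")[0]
    if name in IRREVERSIBLE:
        consent = input(f"Critical Point: allow {action}? [y/N] ")
        if consent.strip().lower() != "y":
            audit_log.append(("refused", action))
            return
    audit_log.append(("executed", action))

execute("click(512, 310)")                  # routine action, runs immediately
execute("submit_payment(order_id='123')")   # pauses for user approval
print(audit_log)
```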

---

How to Use

Access:

  • Microsoft Foundry
  • Hugging Face (a minimal loading sketch appears after this list)
  • Try it via Magentic‑UI (inference code provided)
  • Download for Copilot+ PCs via the AI Toolkit for VS Code (NPU acceleration supported)
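
For quick local experimentation, loading the weights with Hugging Face transformers might look like the sketch below; the repo id and auto classes are assumptions based on the stated Qwen2.5‑VL base, so consult the official model card for exact usage:

```python
# Hedged sketch: loading the open weights with Hugging Face transformers.
# "microsoft/Fara-7B" and the auto classes are assumptions based on the
# stated Qwen2.5-VL base model; check the model card for the exact recipe.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "microsoft/Fara-7B"  # assumed Hugging Face repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")
```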

---

Looking Ahead

  • Goal: Build stronger on‑device computer‑use agents (CUAs) via improved multimodal base models and reinforcement learning.
  • Early release focuses on community feedback and real‑world experimentation.
  • For contribution opportunities: Open roles at AI Frontiers.

---

Acknowledgements

Thanks to all contributors across engineering, research, and deployment teams who made Fara‑7B possible and brought it to Copilot+ PCs.
