How to Make Agents Meet Expectations: Top 10 Practical Lessons from Building Cloud Assistant Aivis with Context Engineering and Multi-Agent Systems

Building & Optimizing AI Agents: Lessons from YunXiaoer Aivis

This is the 123rd article of 2025

(Estimated reading time: 15 minutes)

---

01. Background

This year, our team has been focusing on the YunXiaoer Aivis project — a digital employee in the Alibaba Cloud service domain. It represents our transition from traditional intelligent customer support to a new phase of Multi-Agent capable digital employees.

At its core, Aivis leverages LLM-based reasoning. Under a Multi-Agent architecture, it integrates MCP Tools, Browser Use, and Computer Use, enabling more human-like problem-solving.

It is a challenging mission spanning algorithms, engineering, and data.

For an overview, see TL Hong Lin’s Yunqi Conference talk: “YunXiaoer Aivis: Advancing Towards Autonomous Agents in Alibaba Cloud Intelligent Service” [1].

---

02. Why Agents Fail to Meet Expectations

When asking “Why doesn’t my Agent output as expected?”, break this into two questions:

  • What is your expectation? (Be precise, measurable)
  • How is the Agent designed to meet it?

Two perspectives to analyze failure:

  • Expectation Perspective: Vague goals (“be smarter”, “answer correctly”) don’t guide optimization.
  • Technical Perspective: Performance is achieved by either
    • optimizing the Prompt / Context Engineering, or
    • optimizing the Model (SFT, DPO, RLHF, ...)

Model retraining is costly and often unnecessary; this article focuses on context engineering and Multi-Agent strategies.

---

03. Ten Practical Agent Optimization Experiences

Below are distilled lessons from building Aivis — each includes a core principle, pitfalls, and solutions.

3.1 Make Expectations Clear

Core Principle: Avoid vague tasks; define explicit goals with zero ambiguity.

Define expectations with:

  • Task – detailed logic & rules for judgment
  • Format – specify output type/schema
  • Style/Tone – professional, friendly, etc.

Pitfall Example:

"ECS instance lock" has two meanings (business lock vs. OS account lock). Vague prompts lead to misinterpretation.

Solution: Clearly distinguish cases, add diagnostic steps, prompt for clarification when uncertain.
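
As a concrete illustration, the ambiguous lock request can be pinned down with an explicit Task/Format/Style block. A minimal sketch, where the wording, diagnostic steps, and JSON fields are illustrative rather than Aivis's actual prompt:

```python
# A hypothetical "explicit expectation" prompt for the ECS lock example.
# The diagnostic steps and output schema are made up for illustration.
EXPECTATION_PROMPT = """\
Task:
  The user reports an "ECS instance lock". This has two distinct meanings:
    1. Business lock: the instance is locked on the cloud side (billing/security).
    2. OS account lock: a login account inside the guest OS is locked.
  Decide which case applies before proposing a fix. If the evidence is
  insufficient, ask the user one clarifying question instead of guessing.

Format:
  Return JSON only:
    {"lock_type": "business" | "os_account" | "unclear",
     "evidence": "<one sentence>",
     "next_step": "<one sentence>"}

Style:
  Professional and concise; no speculation beyond the evidence.
"""
```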

---

3.2 Refine Context Precision

Core Principle: Give what’s needed, remove what disturbs.

Pitfall Example:

Feeding entire finance API results (balances, frozen funds, credits) confuses the model when determining overdue status.

Solution:

Filter for the essential field only (e.g., available balance < 0).
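
Concretely, a pre-processing step can reduce the full billing response to the single decisive field before it reaches the model. A minimal sketch, with made-up field names standing in for the real finance API response:

```python
# Hypothetical raw result from a finance/billing API; only the available balance
# matters for the "is the account overdue?" judgment, so everything else is
# dropped before the snippet is placed in the model's context.
raw_finance_result = {
    "AvailableAmount": "-37.52",
    "FrozenAmount": "0.00",
    "CreditAmount": "500.00",
    "Currency": "CNY",
}

def to_context_snippet(result: dict) -> str:
    """Keep only the field the overdue decision depends on."""
    available = float(result["AvailableAmount"])
    status = "overdue" if available < 0 else "not overdue"
    return f"Account available balance: {available} ({status})"

print(to_context_snippet(raw_finance_result))
# -> Account available balance: -37.52 (overdue)
```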

---

3.3 Clarify Identities & Preserve Execution History

Core Principle: The model must know who is acting/talking, and remember past steps.

Pitfall:

Mixing dialogue from the customer-facing (CSR) stage into `History`, without including the model’s own execution logs, caused hallucinations.

Solution:

  • Keep main flow as user ↔ model.
  • Inject CSR dialogues as labeled Dialogue Memory.
  • Preserve full Action History (even failed steps).
  • Following “Mask, Don’t Remove” and “Keep the Wrong Stuff In” [2] improves grounding (see the sketch after this list).
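
The sketch below (not Aivis's actual data model) shows one way to keep the three kinds of context apart: the main user-to-model dialogue, CSR conversations injected as labeled memory, and the agent's full action history with failed steps left in place:

```python
# Illustrative context assembly; field names and message formats are assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    messages: list = field(default_factory=list)        # main flow: user <-> model
    dialogue_memory: list = field(default_factory=list)  # labeled third-party dialogues
    action_history: list = field(default_factory=list)   # every tool call, even failures

    def render(self) -> list:
        context = list(self.messages)
        if self.dialogue_memory:
            context.append({
                "role": "system",
                "content": "Dialogue Memory (customer <-> CSR, for reference only):\n"
                           + "\n".join(self.dialogue_memory),
            })
        # "Keep the Wrong Stuff In": failed actions stay visible to the model.
        for step in self.action_history:
            context.append({"role": "assistant", "content": f"[action] {step}"})
        return context

ctx = AgentContext()
ctx.messages.append({"role": "user", "content": "My instance is locked."})
ctx.dialogue_memory.append("Customer: I already paid the bill yesterday.")
ctx.action_history.append("DescribeInstances -> ERROR: instance not found in this region")
```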

---

3.4 Express Logic in Structured Form

Core Principle: Complex logic → use structured formats (JSON, YAML, pseudocode), not pure natural language.

Structure reduces ambiguity and boosts compliance (e.g., sequential workflow steps).
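
As an illustration, the same diagnosis logic can be handed to the model as YAML embedded in the prompt instead of a paragraph of prose; the workflow and step names below are invented for the sketch:

```python
# Illustrative structured workflow; not the actual Aivis prompt.
WORKFLOW_YAML = """\
workflow: diagnose_instance_lock
steps:
  - id: 1
    action: check_cloud_side_locks        # business lock?
    on_found: goto 3
    on_not_found: goto 2
  - id: 2
    action: check_guest_os_login_logs     # OS account lock?
    on_found: goto 3
    on_not_found: ask_user_for_clarification
  - id: 3
    action: report_lock_type_and_next_step
"""

PROMPT = "Follow this workflow strictly, one step at a time:\n" + WORKFLOW_YAML
```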

---

3.5 Customize Tool Protocols

Core Principle: Domain-specific protocols can outperform generic standards.

Our custom schema outperformed generic Function Call/MCP formats in stability.

Reason: Pre-trained generic patterns can override domain-specific needs.
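
A hedged sketch of what such a domain-specific protocol might look like; the tag format, tool names, and parser below are assumptions for illustration, not the actual Aivis schema:

```python
import json
import re

# A custom, domain-specific tool-call convention described to the model in the prompt.
TOOL_PROTOCOL = """\
When you need a tool, output exactly one block and nothing else in that turn:

<tool>
name: <one of: query_balance | describe_instance | check_os_logs>
reason: <why this tool, one sentence>
args: {<JSON object matching the tool's parameters>}
</tool>
"""

def parse_tool_call(model_output: str):
    """Parse the custom <tool> block; return (name, reason, args) or None."""
    match = re.search(
        r"<tool>\s*name:\s*(\S+)\s*reason:\s*(.+?)\s*args:\s*(\{.*?\})\s*</tool>",
        model_output, re.S)
    if not match:
        return None
    name, reason, args = match.groups()
    return name, reason.strip(), json.loads(args)
```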

---

3.6 Use Few-Shot Examples Wisely

  • Single-task scenarios: Provide diverse examples, including “no result” cases (a minimal block is sketched after this list).
  • Flexible tasks: Use few-shot sparingly to avoid overfitting/reduced adaptability.
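
A minimal sketch of such a few-shot block for a single, well-defined judgment task; the questions and answers are invented, and the point is the explicit "no result" example:

```python
# Few-shot examples for one fixed task, including a case where the answer is "unknown".
FEW_SHOT = """\
Q: The account balance is -12.30 CNY. Is the account overdue?
A: {"overdue": true, "reason": "available balance is below zero"}

Q: The account balance is 88.00 CNY. Is the account overdue?
A: {"overdue": false, "reason": "available balance is non-negative"}

Q: The billing API returned no data for this account. Is the account overdue?
A: {"overdue": null, "reason": "no billing data available; cannot determine"}
"""
```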

---

3.7 Keep Context Slim

Core Principle: Shorter contexts improve performance & reduce cost.

Techniques:

  • Use RAG for dynamic info selection.
  • Remove low-impact instructions.
  • Test sensitivity before trimming (a rough check is sketched after this list).
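
The sensitivity check can be as simple as an A/B run over a small labeled set. A rough sketch, where `run_agent` stands in for whatever calls the agent once and returns its answer:

```python
# A/B sensitivity check before trimming a candidate instruction block from the prompt.
def accuracy(run_agent, prompt: str, eval_set: list[tuple[str, str]]) -> float:
    """eval_set is a list of (question, expected answer substring) pairs."""
    correct = sum(expected in run_agent(prompt, question) for question, expected in eval_set)
    return correct / len(eval_set)

def is_safe_to_trim(run_agent, base_prompt: str, candidate_block: str,
                    eval_set: list[tuple[str, str]], tolerance: float = 0.01) -> bool:
    with_block = accuracy(run_agent, base_prompt + "\n" + candidate_block, eval_set)
    without_block = accuracy(run_agent, base_prompt, eval_set)
    # If removing the block barely changes accuracy, it is a candidate for trimming.
    return (with_block - without_block) <= tolerance
```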

---

3.8 Manage Memory to Prevent Forgetting

Approaches:

  • Repeat key facts across turns.
  • Context compression (summarize older chat).
  • External memory storage with read/write capability (see the sketch after this list).
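
A sketch combining the last two approaches: recent turns stay verbatim, older turns are compressed into a summary, and key facts live in an explicit store the agent can read and write. Class and method names are illustrative:

```python
# Illustrative external memory with summarization of older turns.
class ExternalMemory:
    def __init__(self, keep_recent: int = 6):
        self.facts: dict[str, str] = {}   # durable key facts, e.g. an instance ID
        self.turns: list[str] = []        # recent dialogue turns, kept verbatim
        self.summary: str = ""            # compressed form of everything older
        self.keep_recent = keep_recent

    def write_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_turn(self, turn: str, summarize) -> None:
        """`summarize` is a stand-in for an LLM call that folds old turns into the summary."""
        self.turns.append(turn)
        if len(self.turns) > self.keep_recent:
            old, self.turns = self.turns[:-self.keep_recent], self.turns[-self.keep_recent:]
            self.summary = summarize(self.summary, old)

    def render(self) -> str:
        facts = "\n".join(f"- {k}: {v}" for k, v in self.facts.items())
        return (f"Key facts:\n{facts}\n\n"
                f"Earlier conversation (summary): {self.summary}\n\n"
                + "\n".join(self.turns))
```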

---

3.9 Use Multi-Agent for Control + Flexibility

Balance structured workflows with LLM autonomy:

  • Main Agent: routing, decision-making
  • Sub-Agents/Tools: execute fixed complex processes

This split is effective at balancing predictability with adaptability.
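
A minimal routing sketch of that split: the main agent makes a flexible decision, while each sub-agent runs a fixed, predictable process. `llm_route` and the sub-agent bodies below are placeholders rather than Aivis's real components:

```python
# Main agent routes; sub-agents encapsulate fixed complex processes.
SUB_AGENTS = {
    "billing_check": lambda task: f"[billing_check] ran fixed billing workflow for: {task}",
    "instance_diagnosis": lambda task: f"[instance_diagnosis] ran fixed diagnosis for: {task}",
}

def llm_route(task: str) -> str:
    """Placeholder for the main agent's LLM-based routing decision."""
    return "billing_check" if "overdue" in task or "balance" in task else "instance_diagnosis"

def main_agent(task: str) -> str:
    route = llm_route(task)           # flexible: the model decides where to go
    return SUB_AGENTS[route](task)    # controlled: the sub-agent is a fixed process

print(main_agent("Customer says the instance stopped; balance may be overdue"))
```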

---

3.10 Embed Human-in-the-Loop (HITL)

Core Principle: Understand how humans do the task before digitizing it.

Requires:

  • Observing real workflows
  • Continuous feedback loops
  • Iterative expectation refinement

Agents that lack human operational insight risk poor alignment.
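
One small, concrete piece of such a feedback loop is recording how a human actually resolved each case next to what the agent proposed, so expectations can be refined from the gap between the two. A sketch with an invented record format:

```python
import datetime
import json

def log_human_feedback(path: str, case_id: str, agent_output: str,
                       human_resolution: str, note: str = "") -> None:
    """Append one feedback record (JSON Lines) for later expectation refinement."""
    record = {
        "case_id": case_id,
        "timestamp": datetime.datetime.now().isoformat(),
        "agent_output": agent_output,
        "human_resolution": human_resolution,   # what the CSR actually did
        "note": note,                           # e.g. which expectation turned out wrong
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```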

---

04. Conclusion

These lessons from Cloud Assistant Aivis bridge engineering discipline (context optimization, architecture design) with practical execution (tooling, workflows, HITL).

Share your own challenges; collective learning is what makes Agents better.

---

References:

[1] Yunqi Conference Agenda ID: 6008

[2] Manus Blog: Context Engineering Lessons

---

