How to Make Agents Meet Expectations: Top 10 Practical Lessons from Building Cloud Assistant Aivis with Context Engineering and Multi-Agent Systems
This is the 123rd article of 2025
(Estimated reading time: 15 minutes)
---
01. Background
This year, our team has been focused on the YunXiaoer Aivis project, a digital employee for the Alibaba Cloud service domain. It marks our transition from traditional intelligent customer support to a new phase of Multi-Agent digital employees.

At its core, Aivis leverages LLM-based reasoning. Under a Multi-Agent architecture, it integrates MCP Tools, Browser Use, and Computer Use, enabling more human-like problem-solving.
It is a challenging mission spanning algorithms, engineering, and data.
For an overview, see TL Hong Lin’s Yunqi Conference talk: “YunXiaoer Aivis: Advancing Towards Autonomous Agents in Alibaba Cloud Intelligent Service” [1].
---
02. Why Agents Fail to Meet Expectations
When asking “Why doesn’t my Agent output as expected?”, break it into two questions:
- What is your expectation? (Be precise and measurable.)
- How is the Agent designed to meet it?
Two perspectives to analyze failure:
- Expectation Perspective: Vague goals (“be smarter”, “answer correctly”) don’t guide optimization.
- Technical Perspective: performance is achieved by
  - optimizing the Prompt / Context Engineering, or
  - optimizing the Model (SFT, DPO, RLHF, ...)
Model retraining is costly and often unnecessary; this article focuses on context engineering and Multi-Agent strategies.
---
03. Ten Practical Agent Optimization Experiences
Below are distilled lessons from building Aivis — each includes a core principle, pitfalls, and solutions.
3.1 Make Expectations Clear
Core Principle: Avoid vague tasks; define explicit goals with zero ambiguity.
Define expectations with:
- Task – detailed logic & rules for judgment
- Format – specify output type/schema
- Style/Tone – professional, friendly, etc.
Pitfall Example:
"ECS instance lock" has two meanings (business lock vs. OS account lock). Vague prompts lead to misinterpretation.
Solution: Clearly distinguish cases, add diagnostic steps, prompt for clarification when uncertain.
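The ECS-lock case above can be sketched as an explicit task spec. This is a minimal illustration of the Task/Format/Style structure, not the production Aivis prompt; the rule wording and JSON keys are invented for the example.

```python
# Sketch: turn a vague request into a zero-ambiguity task spec.
def build_prompt(task: str, fmt: str, tone: str) -> str:
    """Assemble a prompt whose expectations are stated explicitly."""
    return (
        f"## Task\n{task}\n\n"
        f"## Output Format\n{fmt}\n\n"
        f"## Style\n{tone}\n"
    )

prompt = build_prompt(
    task=(
        "Diagnose an 'ECS instance lock'. Distinguish two cases:\n"
        "1. Business lock (overdue payment / security freeze): check the "
        "instance status via the management API.\n"
        "2. OS account lock (too many failed SSH/RDP logins): check login "
        "logs inside the guest OS.\n"
        "If the description fits neither case, ask a clarifying question "
        "instead of guessing."
    ),
    fmt="JSON with keys: lock_type ('business'|'os'|'unclear'), next_step.",
    tone="Professional and concise; no speculation.",
)
print(prompt)
```

Spelling out both meanings plus an explicit "ask when unclear" rule removes the ambiguity that caused the misinterpretation.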
---
3.2 Refine Context Precision
Core Principle: Give what’s needed, remove what disturbs.
Pitfall Example:
Feeding entire finance API results (balances, frozen funds, credits) confuses the model when determining overdue status.
Solution:
Filter for the essential field only (e.g., available balance < 0).
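A minimal sketch of that filtering step: pass the model only the field that decides the question, not the whole finance payload. The field names below are hypothetical, not the real finance API schema.

```python
# Sketch: reduce a full finance API response to the decisive signal.
def overdue_context(finance_api_result: dict) -> dict:
    """Keep only what the overdue-status judgment needs."""
    balance = finance_api_result["available_balance"]
    return {"available_balance": balance, "is_overdue": balance < 0}

raw = {  # full API response: most of this would only distract the model
    "available_balance": -12.5,
    "frozen_funds": 300.0,
    "credit_line": 1000.0,
    "coupons": [{"id": "c1", "amount": 5.0}],
}
slim = overdue_context(raw)
print(slim)  # only this slim dict reaches the prompt
```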
---
3.3 Clarify Identities & Preserve Execution History
Core Principle: The model must know who is acting/talking, and remember past steps.
Pitfall:
Mixing customer-stage dialogue into `History` without including the model’s own execution logs caused hallucinations.
Solution:
- Keep main flow as user ↔ model.
- Inject CSR dialogues as labeled Dialogue Memory.
- Preserve full Action History (even failed steps).
- Following “Mask, Don’t Remove” and “Keep the Wrong Stuff In” [2] improves grounding.
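The layout described above can be sketched as a context object. The structure is illustrative, not the Aivis wire format: the main thread stays strictly user ↔ model, CSR dialogue is injected as labeled memory, and every action, including failures, stays in the history.

```python
# Sketch of a context layout that keeps identities unambiguous.
context = {
    "dialogue_memory": [  # labeled, so the model knows who said what
        {"speaker": "customer", "text": "My instance is locked."},
        {"speaker": "csr", "text": "Escalating to the diagnosis agent."},
    ],
    "action_history": [   # "keep the wrong stuff in": failed steps included
        {"tool": "describe_instance", "status": "error", "error": "timeout"},
        {"tool": "describe_instance", "status": "ok",
         "result": {"lock_reason": "overdue"}},
    ],
    "messages": [         # main flow stays user <-> model only
        {"role": "user", "content": "Why is my ECS instance locked?"},
    ],
}
print(context["action_history"][0]["status"])
```

Because the failed `describe_instance` call is still visible, the model can reason about the retry instead of hallucinating a step it never took.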
---
3.4 Express Logic in Structured Form
Core Principle: Complex logic → use structured formats (JSON, YAML, pseudocode), not pure natural language.
Structure reduces ambiguity and boosts compliance (e.g., sequential workflow steps).
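As a sketch, the same idea applied to a made-up refund workflow: the logic lives in structured steps and is rendered into an ordered instruction block, rather than being written as a prose paragraph.

```python
# Sketch: express sequential logic as data, then render it unambiguously.
workflow = [
    {"step": 1, "do": "fetch_order", "on_fail": "stop: order not found"},
    {"step": 2, "do": "check_window", "rule": "days_since_purchase <= 7"},
    {"step": 3, "do": "issue_refund", "requires": "steps 1 and 2 passed"},
]

def render(steps: list[dict]) -> str:
    """Render structured steps into an ordered instruction block."""
    lines = []
    for s in steps:
        extras = ", ".join(f"{k}={v}" for k, v in s.items()
                           if k not in ("step", "do"))
        lines.append(f"{s['step']}. {s['do']}" + (f" ({extras})" if extras else ""))
    return "\n".join(lines)

block = render(workflow)
print(block)
```

Editing the logic now means editing data, and the rendered ordering leaves no room for the model to reshuffle steps.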
---
3.5 Customize Tool Protocols
Core Principle: Domain-specific protocols can outperform generic standards.
Our custom schema outperformed generic Function Call/MCP formats in stability.
Reason: Pre-trained generic patterns can override domain-specific needs.
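To make "custom protocol" concrete, here is a sketch of a compact domain-specific call format with its parser. The `<<call ...>>` syntax is invented for illustration; the point is that a narrow, rigid protocol of your own can be easier for the model to emit reliably than a generic function-call schema whose pre-trained patterns pull it off course.

```python
import re

# Sketch: a tiny domain-specific tool-call protocol and its parser.
CALL_RE = re.compile(r"<<call\s+(\w+)\s*\|\s*(.*?)>>")

def parse_call(model_output: str):
    """Extract (tool_name, {arg: value}) from a <<call name | k=v; k=v>> span."""
    m = CALL_RE.search(model_output)
    if not m:
        return None
    name, arg_str = m.group(1), m.group(2)
    pairs = (kv.split("=", 1) for kv in arg_str.split(";") if "=" in kv)
    return name, {k.strip(): v.strip() for k, v in pairs}

out = "Let me check. <<call describe_instance | id=i-123; region=cn-hangzhou>>"
print(parse_call(out))
```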
---
3.6 Use Few-Shot Examples Wisely
- Single-task scenarios: Provide diverse examples, including “no result” cases.
- Flexible tasks: Use few-shot sparingly to avoid overfitting/reduced adaptability.
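For the single-task case, a sketch of a few-shot block that deliberately includes an abstention example, so the model learns to answer "none" instead of forcing a label. The tickets and labels are invented.

```python
# Sketch: diverse few-shot examples for a fixed classification task,
# including a "no result" case to teach abstention.
examples = [
    {"q": "Instance won't start after payment", "label": "billing_lock"},
    {"q": "SSH says my account is locked", "label": "os_lock"},
    {"q": "How do I bake bread?", "label": "none"},  # out-of-domain -> none
]

few_shot = "\n".join(f"Q: {e['q']}\nA: {e['label']}" for e in examples)
prompt = ("Classify the ticket. Answer 'none' if no label fits.\n\n"
          f"{few_shot}\nQ: {{ticket}}\nA:")
print(prompt)
```

For flexible, open-ended tasks the same block would do harm: the model tends to imitate the examples' surface form instead of adapting.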
---
3.7 Keep Context Slim
Core Principle: Shorter contexts improve performance & reduce cost.
Techniques:
- Use RAG for dynamic info selection.
- Remove low-impact instructions.
- Test sensitivity before trimming.
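The "test sensitivity before trimming" step can be sketched as an ablation loop: drop one instruction at a time, re-run an eval, and keep only instructions whose removal hurts the score. `run_eval` below is a hypothetical stand-in for a real evaluation harness over labeled cases.

```python
# Sketch: keep only instructions whose removal lowers the eval score.
def run_eval(instructions: list[str]) -> float:
    """Hypothetical scorer; replace with a real eval over labeled cases."""
    needed = {"Always answer in JSON.", "Ask before mutating resources."}
    return sum(1 for i in instructions if i in needed) / len(needed)

instructions = [
    "Always answer in JSON.",
    "Be friendly.",                      # candidate for trimming
    "Ask before mutating resources.",
]

baseline = run_eval(instructions)
kept = [i for i in instructions
        if run_eval([j for j in instructions if j != i]) < baseline]
print(kept)  # "Be friendly." is trimmed: removing it didn't hurt the score
```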
---
3.8 Manage Memory to Prevent Forgetting
Approaches:
- Repeat key facts across turns.
- Context compression (summarize older chat).
- External memory storage with read/write capability.
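The compression approach can be sketched as: keep the most recent turns verbatim and replace older ones with a summary slot. `summarize` below is a placeholder for an LLM summarization call, not a real one.

```python
# Sketch: compress old turns into a summary, keep recent turns verbatim.
def summarize(turns: list[str]) -> str:
    # Placeholder: a real system would call the model here.
    return f"[summary of {len(turns)} earlier turns]"

def compress(history: list[str], keep_recent: int = 3) -> list[str]:
    """Replace everything but the last `keep_recent` turns with a summary."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}" for i in range(1, 8)]
compressed = compress(history)
print(compressed)
```

Key facts that must never be lost (IDs, user constraints) should still be repeated explicitly or moved to external memory, since a summary can drop them.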
---
3.9 Use Multi-Agent for Control + Flexibility
Balance structured workflows with LLM autonomy:
- Main Agent: routing, decision-making
- Sub-Agents/Tools: execute fixed complex processes
This split balances predictability with adaptability.
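A minimal sketch of that split: the main agent routes, sub-agents encapsulate fixed workflows. `route` here is a keyword stand-in for what would really be an LLM routing decision, and the sub-agent bodies are placeholders.

```python
# Sketch: main agent decides, sub-agents execute fixed processes.
def route(ticket: str) -> str:
    """Stand-in router; a real system would let the LLM pick the sub-agent."""
    return "billing" if "overdue" in ticket else "diagnosis"

SUB_AGENTS = {
    # each sub-agent wraps a fixed, well-tested workflow
    "billing": lambda t: "unlock after payment is settled",
    "diagnosis": lambda t: "run instance health checks",
}

def main_agent(ticket: str) -> str:
    return SUB_AGENTS[route(ticket)](ticket)

print(main_agent("instance locked, bill overdue"))
```

The LLM keeps its flexibility at the routing layer, while the error-prone multi-step procedures stay deterministic inside sub-agents.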
---
3.10 Embed Human-in-the-Loop (HITL)
Core Principle: Understand how humans do the task before digitizing it.
Requires:
- Observing real workflows
- Continuous feedback loops
- Iterative expectation refinement
Agents that lack human operational insight risk poor alignment.
---
04. Conclusion
These lessons from Cloud Assistant Aivis bridge engineering discipline (context optimization, architecture design) with practical execution (tooling, workflows, HITL).
Share your own challenges; collective learning makes for better Agents.
---
References:
[1] Yunqi Conference Agenda ID: 6008
[2] Manus Blog: Context Engineering Lessons