Well-known open-source developer speaks out: agent design remains difficult and messy; only manual cache management achieves optimal performance; commenters debate the rapid obsolescence of current techniques
How to Design and Build a Reliable, Efficient AI Agent
At first glance, designing an AI Agent might seem as simple as plugging in a few SDKs and writing a few lines of code. In reality, building a robust Agent is far more complex.
Recently, Armin Ronacher (creator of Flask, founding engineer at Sentry) shared his hands-on experience, including common pitfalls, in building AI Agents. His post was later amplified by Simon Willison (co-creator of Django, author of Datasette), who stressed the value of Armin's insights for Agent developers.


Armin described Agent development as messy:
- SDK abstractions often break under real-world tool calls.
- Manual caching beats auto-caching — but model behavior can vary drastically.
- Reinforcement requires heavy lifting; failure isolation is crucial.
- A shared state (like a file system) is critical infrastructure.
- Output tool design is surprisingly complex.
- Model choice still greatly depends on the specific task.
---
Choosing the Right Agent SDK
Two approaches:
- Low-level SDKs (OpenAI SDK, Anthropic SDK) — full control.
- Higher-level abstractions (Vercel AI SDK, Pydantic AI) — provider abstraction.
We initially used the Vercel AI SDK for its provider abstraction but implemented the Agent loop ourselves. In hindsight, we would choose a native SDK today.
Key Lessons Learned
1. Model differences demand custom abstractions
Agent loops become tricky once tools are bound to models — impacting caching, reinforcement, prompt management, and more.
➡ Native SDKs give fine-grained control; high-level SDKs risk misaligned abstractions.
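A thin custom abstraction can normalize tool calls across providers without hiding the loop. A minimal sketch of the idea below; the `ToolCall` dataclass and converter names are hypothetical, but the wire shapes (Anthropic `tool_use` content blocks, OpenAI `function` tool calls with JSON-string arguments) follow the respective APIs:

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCall:
    """Provider-agnostic view of a single tool invocation."""
    id: str
    name: str
    arguments: dict[str, Any]

def from_anthropic(block: dict[str, Any]) -> ToolCall:
    # Anthropic returns tool calls as content blocks of type "tool_use",
    # with arguments already parsed into the "input" dict.
    assert block["type"] == "tool_use"
    return ToolCall(id=block["id"], name=block["name"], arguments=block["input"])

def from_openai(call: dict[str, Any]) -> ToolCall:
    # OpenAI returns tool calls with a JSON-encoded "arguments" string.
    return ToolCall(id=call["id"], name=call["function"]["name"],
                    arguments=json.loads(call["function"]["arguments"]))
```

The agent loop then operates only on `ToolCall`, so caching, reinforcement, and prompt management stay provider-independent.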
2. Provider-side tool integration is difficult
Example: Anthropic’s web search tool sometimes breaks message history under Vercel SDK. Using Anthropic’s native SDK avoids these issues and simplifies caching and error handling.
---
Explicit Caching for Predictable Performance
Providers like Anthropic charge for cache writes and require explicit cache breakpoints, which:
- Give you cost predictability
- Allow controlled cache hit rates
Manual caching benefits:
- Fork runs from specific dialogue points.
- Edit context without impacting unrelated cache segments.
Our caching best practice:
- 1 cache point right after the system prompt.
- 2 cache points within the conversation: one at its start, and a last one that moves up with the end of the conversation.
- Dynamic info (e.g., current time) placed separately to avoid breaking cache.
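A simplified sketch of this placement, assuming Anthropic-style `cache_control: {"type": "ephemeral"}` breakpoints; the `place_cache_points` helper is hypothetical and shows only two of the breakpoints (static system prompt, moving conversation tail), with dynamic data kept outside the cached prefix:

```python
from typing import Any

def place_cache_points(system_prompt: str, history: list[dict[str, Any]],
                       dynamic_note: str) -> dict[str, Any]:
    """Mark explicit cache breakpoints in an Anthropic-style request body.

    One breakpoint after the static system prompt, one on the tail of the
    conversation so the growing prefix stays reusable. Dynamic info (the
    current time, etc.) rides in a separate block after the breakpoint,
    so changing it never invalidates the cached prefix.
    """
    system = [
        {"type": "text", "text": system_prompt,
         "cache_control": {"type": "ephemeral"}},  # breakpoint: static prompt
        {"type": "text", "text": dynamic_note},    # uncached: changes each turn
    ]
    messages = [dict(m) for m in history]
    if messages:
        # Moving breakpoint: mark the last content block of the last message.
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        content = list(content)
        content[-1] = {**content[-1], "cache_control": {"type": "ephemeral"}}
        last["content"] = content
    return {"system": system, "messages": messages}
```

On the next turn, the breakpoint simply moves to the new last message; everything before it is a cache hit.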
---
Reinforcement in the Agent Loop
Reinforcement signals maintain loop momentum:
- Remind Agent of goals.
- Update task progress.
- Inject recovery hints on tool failure.
- Synchronize state changes for parallel processes.
Example:
Claude Code’s todo write tool echoes the Agent’s task list — a simple reinforcement that improves continuity.
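The mechanism can be sketched as a small context editor; the `reinforce` helper and the todo-dict shape are hypothetical illustrations of the pattern, not Claude Code's actual implementation:

```python
def reinforce(messages: list[dict], todos: list[dict]) -> list[dict]:
    """Echo the open task list back into context after the latest step.

    Re-reading its own todo list keeps the agent anchored on the original
    goal across long loops; when nothing is left, the reminder pivots to
    producing final output instead.
    """
    open_tasks = [t["task"] for t in todos if not t["done"]]
    if open_tasks:
        reminder = "Reminder - open tasks:\n" + "\n".join(f"- {t}" for t in open_tasks)
    else:
        reminder = "All tasks are complete. Produce the final output now."
    return messages + [{"role": "user", "content": reminder}]
```

The same hook is a natural place for recovery hints after tool failures or state updates from parallel processes.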
---
Isolating Failures — Keep Loops Stable
Two strategies:
- Offload frequent failures to a sub-Agent, feed success/failure summaries back to main loop.
- Edit context to remove confusing failure outputs (at the cost of cache invalidation).
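The second strategy might look like the sketch below; the `prune_failures` helper and the `is_failure` flag are hypothetical, standing in for however your transcript marks failed tool results:

```python
def prune_failures(messages: list[dict], max_failures: int = 1) -> list[dict]:
    """Collapse older failed tool results into a single summary note.

    Keeps at most `max_failures` recent failure transcripts so the model
    is not distracted by stale error output. Note that editing context
    this way invalidates any cached prefix from the edit point onward.
    """
    failures = [i for i, m in enumerate(messages) if m.get("is_failure")]
    drop = set(failures[:-max_failures]) if len(failures) > max_failures else set()
    if not drop:
        return messages
    pruned = [m for i, m in enumerate(messages) if i not in drop]
    note = {"role": "user",
            "content": f"[{len(drop)} earlier failed tool attempts removed; "
                       "they all hit the same error]"}
    # Insert the note where the first removed failure used to be.
    pruned.insert(min(drop), note)
    return pruned
```

The sub-Agent variant is the same idea one level up: the sub-Agent burns through the failures privately and only the summary note enters the main loop.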
---
Sub-Agent and Shared Data Systems
We use a Virtual File System (VFS) to share data between agents and tools — avoiding “dead ends” where tools can’t interoperate.
All tools read/write to the same file system so `ExecuteCode` and `RunInference` share paths.
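A minimal sketch of such a shared VFS, with two toy tools standing in for code execution and inference (the class and tool names here are illustrative, not the article's actual implementation):

```python
class VirtualFileSystem:
    """Minimal in-memory VFS that all tools share.

    Tools read and write by path, so the output of one tool (e.g. code
    execution) is addressable by the next (e.g. an inference step) with
    no special-cased plumbing between the two.
    """
    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> str:
        self._files[path] = data
        return path  # tools hand back the path, not the payload

    def read(self, path: str) -> bytes:
        return self._files[path]

    def exists(self, path: str) -> bool:
        return path in self._files

# Hypothetical tools sharing one VFS instance:
def execute_code(vfs: VirtualFileSystem, code: str) -> str:
    result = f"ran: {code}".encode()  # stand-in for a real sandbox run
    return vfs.write("/tmp/exec-output.txt", result)

def run_inference(vfs: VirtualFileSystem, path: str) -> str:
    return vfs.read(path).decode().upper()  # stand-in for a model call
```

Because both tools speak paths, the agent can chain them freely instead of hitting a dead end where one tool's output format is opaque to the next.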
---
Output Tool Challenges
Our Agent isn't a chatbot — final user-facing output goes through a dedicated output tool (e.g., sending email).
Challenges:
- Controlling output phrasing is harder than general text generation.
- Using small LLMs for tone refinement failed — worse quality, higher latency, context leakage.
- Agent sometimes skips output — fixed by logging calls and reinforcing when missing.
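That fix can be sketched as a transcript check at the end of the loop; the `ensure_output` helper, the `tool_call` role, and the `send_email` tool name are hypothetical placeholders for your own logging shape:

```python
def ensure_output(messages: list[dict], output_tool: str = "send_email") -> list[dict]:
    """Verify the agent actually called its output tool; nudge it if not.

    The transcript is scanned for a call to the output tool; when none is
    found before the loop would end, a reinforcement message is appended
    instead of silently finishing with nothing sent to the user.
    """
    called = any(m.get("tool_name") == output_tool
                 for m in messages if m.get("role") == "tool_call")
    if called:
        return messages
    return messages + [{"role": "user",
                        "content": f"You have not called `{output_tool}` yet. "
                                   "The user will see nothing until you do."}]
```

Run this before terminating the loop; if a message was appended, give the model one more turn.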
---
Model Selection Insights
Best tool-calling models:
- Haiku / Sonnet — effective agent loops; reinforcement behaves predictably.
- Gemini 2.5 — stronger for summarization, parsing, image extraction.
Note:
> Token price alone doesn’t determine cost — smarter tool callers need fewer tokens.
---
Testing and Evaluation — The Hardest Part
Agents depend on dynamic inputs, tool calls, and state changes — making observability and instrumentation essential. Current evaluation methods remain unsatisfactory.
---
Trends in Coding Agents
- Claude Code and Amp remain benchmarks.
- Amp’s elegant sub-Agent cooperation inspires new design approaches.
- Key takeaway: build tools you also use — it shapes product quality.
---
Recommended Reading
- Minimalist Agents (No MCP) (article)
- Fate of Small Open-Source Projects (article)
- Tmux Skills for Agents (GitHub)
- LLM APIs and Synchronization (article)
---
Community Insight: Rapid Obsolescence in Agent Techniques
Many “Agent tricks” are workarounds for current LLM limitations — quickly outdated as technology advances.
Example: manual caching may become obsolete once models integrate better memory and context features.
Some warn against over-engineering, citing wasted effort building tools later replaced by platform improvements.
Others argue building your own Agent deepens understanding, even if frameworks exist.
---
Future-Proofing Agent Development
Technological shifts (larger context windows, multimodal LLMs, less reliance on vector search) mean today’s complexity may disappear tomorrow.
Practical advice:
- Build core-value features now.
- Skip non-essential components that platforms may soon integrate.
---
Your Turn — What’s Your Agent Philosophy?
Do you envision fully autonomous Agents — or supervised, assistive ones?
Share your funniest or most frustrating Agent experiences in the comments.
---
Related Ecosystem: AiToEarn
AiToEarn (official site) — an open-source AI content monetization platform — integrates:
- AI content generation
- Cross-platform publishing (Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X)
- Analytics & model ranking
For Agent developers handling AI outputs across multiple media, AiToEarn offers a ready pipeline for distribution and monetization.
---