Well-known open-source developer speaks out: agent design remains difficult and messy; only manual cache management achieves optimal performance; commenters debate the rapid obsolescence of current techniques
How to Design and Build a Reliable, Efficient AI Agent
At first glance, designing an AI Agent might seem as simple as plugging in a few SDKs and writing a few lines of code. In reality, building a robust Agent is far more complex.
Recently, Armin Ronacher (creator of Flask, founding engineer at Sentry) shared his hands-on experience, including common pitfalls, in building AI Agents. His post was later amplified by Simon Willison (co-creator of Django, author of Datasette), who stressed the value of Armin's insights for Agent developers.


Armin described Agent development as messy:
- SDK abstractions often break under real-world tool calls.
- Manual caching beats auto-caching — but model behavior can vary drastically.
- Reinforcement requires heavy lifting; failure isolation is crucial.
- A shared state (like a file system) is critical infrastructure.
- Output tool design is surprisingly complex.
- Model choice still greatly depends on the specific task.
---
Choosing the Right Agent SDK
Two approaches:
- Low-level SDKs (OpenAI SDK, Anthropic SDK) — full control.
- Higher-level abstractions (Vercel AI SDK, Pydantic AI) — provider abstraction.
We initially used the Vercel AI SDK for its provider abstraction but implemented the Agent loop ourselves. In hindsight, we would choose a native SDK today.
Key Lessons Learned
1. Model differences demand custom abstractions
Agent loops become tricky once tools are bound to models — impacting caching, reinforcement, prompt management, and more.
➡ Native SDKs give fine-grained control; high-level SDKs risk misaligned abstractions.
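A thin custom abstraction can normalize tool calls across providers without hiding the loop. A minimal sketch of the idea below; the `ToolCall` dataclass and converter names are hypothetical, but the wire shapes (Anthropic `tool_use` content blocks, OpenAI `function` tool calls with JSON-string arguments) follow the respective APIs:

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCall:
    """Provider-agnostic view of a single tool invocation."""
    id: str
    name: str
    arguments: dict[str, Any]

def from_anthropic(block: dict[str, Any]) -> ToolCall:
    # Anthropic returns tool calls as content blocks of type "tool_use",
    # with arguments already parsed into the "input" dict.
    assert block["type"] == "tool_use"
    return ToolCall(id=block["id"], name=block["name"], arguments=block["input"])

def from_openai(call: dict[str, Any]) -> ToolCall:
    # OpenAI returns tool calls with a JSON-encoded "arguments" string.
    return ToolCall(id=call["id"], name=call["function"]["name"],
                    arguments=json.loads(call["function"]["arguments"]))
```

The agent loop then operates only on `ToolCall`, so caching, reinforcement, and prompt management stay provider-independent.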
2. Provider-side tool integration is difficult
Example: Anthropic’s web search tool sometimes breaks message history under Vercel SDK. Using Anthropic’s native SDK avoids these issues and simplifies caching and error handling.
---
Explicit Caching for Predictable Performance
Providers like Anthropic charge for cache writes and require explicit cache breakpoints, which:
- Give you cost predictability
- Allow controlled cache hit rates
Manual caching benefits:
- Fork runs from specific dialogue points.
- Edit context without impacting unrelated cache segments.
Our caching best practice:
- 1 cache point right after the system prompt.
- 2 cache points within the conversation: one at its start, and a last one that moves up with the end of the conversation.
- Dynamic info (e.g., current time) placed separately to avoid breaking cache.
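A simplified sketch of this placement, assuming Anthropic-style `cache_control: {"type": "ephemeral"}` breakpoints; the `place_cache_points` helper is hypothetical and shows only two of the breakpoints (static system prompt, moving conversation tail), with dynamic data kept outside the cached prefix:

```python
from typing import Any

def place_cache_points(system_prompt: str, history: list[dict[str, Any]],
                       dynamic_note: str) -> dict[str, Any]:
    """Mark explicit cache breakpoints in an Anthropic-style request body.

    One breakpoint after the static system prompt, one on the tail of the
    conversation so the growing prefix stays reusable. Dynamic info (the
    current time, etc.) rides in a separate block after the breakpoint,
    so changing it never invalidates the cached prefix.
    """
    system = [
        {"type": "text", "text": system_prompt,
         "cache_control": {"type": "ephemeral"}},  # breakpoint: static prompt
        {"type": "text", "text": dynamic_note},    # uncached: changes each turn
    ]
    messages = [dict(m) for m in history]
    if messages:
        # Moving breakpoint: mark the last content block of the last message.
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        content = list(content)
        content[-1] = {**content[-1], "cache_control": {"type": "ephemeral"}}
        last["content"] = content
    return {"system": system, "messages": messages}
```

On the next turn, the breakpoint simply moves to the new last message; everything before it is a cache hit.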
---
Reinforcement in the Agent Loop
Reinforcement signals maintain loop momentum:
- Remind Agent of goals.
- Update task progress.
- Inject recovery hints on tool failure.
- Synchronize state changes for parallel processes.
Example:
Claude Code’s todo write tool echoes the Agent’s task list — a simple reinforcement that improves continuity.
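The mechanism can be sketched as a small context editor; the `reinforce` helper and the todo-dict shape are hypothetical illustrations of the pattern, not Claude Code's actual implementation:

```python
def reinforce(messages: list[dict], todos: list[dict]) -> list[dict]:
    """Echo the open task list back into context after the latest step.

    Re-reading its own todo list keeps the agent anchored on the original
    goal across long loops; when nothing is left, the reminder pivots to
    producing final output instead.
    """
    open_tasks = [t["task"] for t in todos if not t["done"]]
    if open_tasks:
        reminder = "Reminder - open tasks:\n" + "\n".join(f"- {t}" for t in open_tasks)
    else:
        reminder = "All tasks are complete. Produce the final output now."
    return messages + [{"role": "user", "content": reminder}]
```

The same hook is a natural place for recovery hints after tool failures or state updates from parallel processes.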
---
Isolating Failures — Keep Loops Stable
Two strategies:
- Offload frequent failures to a sub-Agent, feed success/failure summaries back to main loop.
- Edit context to remove confusing failure outputs (at the cost of cache invalidation).
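The second strategy might look like the sketch below; the `prune_failures` helper and the `is_failure` flag are hypothetical, standing in for however your transcript marks failed tool results:

```python
def prune_failures(messages: list[dict], max_failures: int = 1) -> list[dict]:
    """Collapse older failed tool results into a single summary note.

    Keeps at most `max_failures` recent failure transcripts so the model
    is not distracted by stale error output. Note that editing context
    this way invalidates any cached prefix from the edit point onward.
    """
    failures = [i for i, m in enumerate(messages) if m.get("is_failure")]
    drop = set(failures[:-max_failures]) if len(failures) > max_failures else set()
    if not drop:
        return messages
    pruned = [m for i, m in enumerate(messages) if i not in drop]
    note = {"role": "user",
            "content": f"[{len(drop)} earlier failed tool attempts removed; "
                       "they all hit the same error]"}
    # Insert the note where the first removed failure used to be.
    pruned.insert(min(drop), note)
    return pruned
```

The sub-Agent variant is the same idea one level up: the sub-Agent burns through the failures privately and only the summary note enters the main loop.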
---
Sub-Agent and Shared Data Systems
We use a Virtual File System (VFS) to share data between agents and tools — avoiding “dead ends” where tools can’t interoperate.
All tools read/write to the same file system so `ExecuteCode` and `RunInference` share paths.
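A minimal sketch of such a shared VFS, with two toy tools standing in for code execution and inference (the class and tool names here are illustrative, not the article's actual implementation):

```python
class VirtualFileSystem:
    """Minimal in-memory VFS that all tools share.

    Tools read and write by path, so the output of one tool (e.g. code
    execution) is addressable by the next (e.g. an inference step) with
    no special-cased plumbing between the two.
    """
    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> str:
        self._files[path] = data
        return path  # tools hand back the path, not the payload

    def read(self, path: str) -> bytes:
        return self._files[path]

    def exists(self, path: str) -> bool:
        return path in self._files

# Hypothetical tools sharing one VFS instance:
def execute_code(vfs: VirtualFileSystem, code: str) -> str:
    result = f"ran: {code}".encode()  # stand-in for a real sandbox run
    return vfs.write("/tmp/exec-output.txt", result)

def run_inference(vfs: VirtualFileSystem, path: str) -> str:
    return vfs.read(path).decode().upper()  # stand-in for a model call
```

Because both tools speak paths, the agent can chain them freely instead of hitting a dead end where one tool's output format is opaque to the next.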
---
Output Tool Challenges
Our Agent isn't a chatbot — final user-facing output goes through a dedicated output tool (e.g., sending email).
Challenges:
- Controlling output phrasing is harder than general text generation.
- Using small LLMs for tone refinement failed — worse quality, higher latency, context leakage.
- Agent sometimes skips output — fixed by logging calls and reinforcing when missing.
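That fix can be sketched as a transcript check at the end of the loop; the `ensure_output` helper, the `tool_call` role, and the `send_email` tool name are hypothetical placeholders for your own logging shape:

```python
def ensure_output(messages: list[dict], output_tool: str = "send_email") -> list[dict]:
    """Verify the agent actually called its output tool; nudge it if not.

    The transcript is scanned for a call to the output tool; when none is
    found before the loop would end, a reinforcement message is appended
    instead of silently finishing with nothing sent to the user.
    """
    called = any(m.get("tool_name") == output_tool
                 for m in messages if m.get("role") == "tool_call")
    if called:
        return messages
    return messages + [{"role": "user",
                        "content": f"You have not called `{output_tool}` yet. "
                                   "The user will see nothing until you do."}]
```

Run this before terminating the loop; if a message was appended, give the model one more turn.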
---
Model Selection Insights
Best tool-calling models:
- Haiku / Sonnet — effective agent loops; reinforcement behaves predictably.
- Gemini 2.5 — stronger for summarization, parsing, image extraction.
Note:
> Token price alone doesn’t determine cost — smarter tool callers need fewer tokens.
---
Testing and Evaluation — The Hardest Part
Agents depend on dynamic inputs, tool calls, and state changes — making observability and instrumentation essential. Current evaluation methods remain unsatisfactory.
---
Trends in Coding Agents
- Claude Code and Amp remain benchmarks.
- Amp’s elegant sub-Agent cooperation inspires new design approaches.
- Key takeaway: build tools you also use — it shapes product quality.
---
Recommended Reading
- Minimalist Agents (No MCP) (article)
- Fate of Small Open-Source Projects (article)
- Tmux Skills for Agents (GitHub)
- LLM APIs and Synchronization (article)
---
Community Insight: Rapid Obsolescence in Agent Techniques
Many “Agent tricks” are workarounds for current LLM limitations — quickly outdated as technology advances.
Example: manual caching may become obsolete once models integrate better memory and context features.
Some warn against over-engineering, citing wasted effort building tools later replaced by platform improvements.
Others argue building your own Agent deepens understanding, even if frameworks exist.
---
Future-Proofing Agent Development
Technological shifts (larger context windows, multimodal LLMs, less reliance on vector search) mean today’s complexity may disappear tomorrow.
Practical advice:
- Build core-value features now.
- Skip non-essential components that platforms may soon integrate.
---
Your Turn — What’s Your Agent Philosophy?
Do you envision fully autonomous Agents — or supervised, assistive ones?
Share your funniest or most frustrating Agent experiences in the comments.
---
Related Ecosystem: AiToEarn
AiToEarn (official site) — an open-source AI content monetization platform — integrates:
- AI content generation
- Cross-platform publishing (Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X)
- Analytics & model ranking
For Agent developers handling AI outputs across multiple media, AiToEarn offers a ready pipeline for distribution and monetization.
---