
# **Xinzhi Yuan Report**
## **[Xinzhi Yuan Guide]**
**How Can AI Without Long-Term Memory Complete Complex Tasks Lasting for Hours?**
Anthropic has designed a **more efficient framework** for long-running agents, enabling AI to progress **incrementally** through multi-hour tasks, much like human engineers.

---
## The Long-Term Memory Challenge
Imagine hiring an **engineering team that works in shifts around the clock** to build a complex application.
But there’s one odd rule: each engineer **completely forgets** what the previous one did.
> No matter how skilled they are, the project would likely never get done.

This is exactly the **real-world dilemma** for “long-running agents”:
> **Once the context window closes, the AI loses its memory.**

- Models rely only on the **text currently visible** in the context window.
- When the context window fills or closes, it is like **wiping a whiteboard**.

This *memory defect* prevents agents from handling long projects.
Multi-hour tasks spanning **multiple chat sessions** are especially challenging.

---
## Anthropic’s Inspiration from Human Engineers
Recently, **Anthropic** developed a **practical framework** for long-running agents by **studying the workflow of human engineers**.

🔗 [Read more on Anthropic’s blog](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)

---
## Dual-Agent Architecture
### **Mimicking Skilled Developers’ Daily Routines**
The Claude Agent SDK is a **powerful and versatile agent framework**, capable of coding, searching, using tools, planning, and executing tasks.
It supports **context compaction**, enabling agents to carry work forward without exhausting the context window.
But **compaction alone isn’t enough**.

**Common Failure Patterns:**
1. **Trying to do too much at once**
   - Attempts to write the full app in one go, often hitting context limits mid-task and leaving gaps.
2. **Prematurely deciding “project complete”**
   - Later agents may misinterpret partially complete states as finished work.
---
### Anthropic’s Two-Step Solution
1. **Initial Setup**
   - Create a **fully functional foundation** for step-by-step progress.
2. **Incremental Advancement**
   - Work in **small, clean steps** that are:
     - Bug-free
     - Well-documented
     - Ready for main branch merge

**Two Agents in the Framework:**
- **Initializer Agent**
  - Generates `init.sh`, the `claude-progress.txt` work log, and an initial Git commit.
- **Coding Agent**
  - *(Details in later sections)*
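
To make the initializer's role concrete, here is a minimal sketch of the scaffolding step. Only the file names `init.sh` and `claude-progress.txt` come from the article; the `scaffold` function, the script contents, and the `start_command` parameter are hypothetical placeholders, and the initial Git commit is left out.

```python
import os
from pathlib import Path


def scaffold(project_dir, start_command):
    """Create the handoff files that later sessions expect to find.

    `start_command` is a placeholder for whatever starts the project's
    dev server; the article does not show the real script contents.
    """
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)

    # A tiny startup script: stop on the first error, then run the command.
    init_script = root / "init.sh"
    init_script.write_text(f"#!/bin/sh\nset -e\n{start_command}\n", encoding="utf-8")
    os.chmod(init_script, 0o755)  # make the script executable

    # An initial work log that later sessions read and extend.
    progress = root / "claude-progress.txt"
    progress.write_text(
        "Project initialized. No features implemented yet.\n", encoding="utf-8"
    )
    return [init_script.name, progress.name]
```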
---
## Environment Management: The "Three Essential Tools"
To help each agent that **takes over the work** get up to speed quickly, Anthropic uses three core environment tools:
### 1. **Feature List**
- The initializer agent expands the user prompt into a **complete requirements document**.
- Example: a Claude.ai clone had **200+ features**, all marked *failing* initially.
- Agents may update **only** the `passes` field; the **tests themselves must stay intact**.
- The list is stored in **JSON format** to reduce the risk of accidental deletions.
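
The rule that agents touch only the `passes` field can be sketched as follows. This is an illustration under stated assumptions, not Anthropic's actual file layout: the `id` and `description` fields and both helper functions are hypothetical, and only the `passes` field name comes from the article.

```python
import json

# Hypothetical feature-list layout; every feature starts out failing.
FEATURES_JSON = """
[
  {"id": 1, "description": "User can open a new conversation", "passes": false},
  {"id": 2, "description": "User can send a message", "passes": false}
]
"""


def mark_passing(features, feature_id):
    """Return a copy of the list with `passes` set to True for one feature."""
    updated = []
    found = False
    for feature in features:
        feature = dict(feature)  # copy so the caller's list is untouched
        if feature["id"] == feature_id:
            feature["passes"] = True
            found = True
        updated.append(feature)
    if not found:
        raise KeyError(f"no feature with id {feature_id}")
    return updated


def without_passes(feature):
    """Drop the one field an agent is allowed to change."""
    return {key: value for key, value in feature.items() if key != "passes"}


def only_passes_changed(before, after):
    """Check that no feature was added, removed, or reworded."""
    if len(before) != len(after):
        return False
    return all(without_passes(b) == without_passes(a) for b, a in zip(before, after))


features = json.loads(FEATURES_JSON)
updated = mark_passing(features, 1)
```

A harness could run a check like `only_passes_changed` before accepting an agent's edit, rejecting any change that deletes or rewrites a requirement.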


---
### 2. **Incremental Progress**
- Coding agents make **small functional changes** only.
- Leave the environment “clean” after each commit.
- Use Git commits with **descriptive messages**, and update the progress file alongside them.
- This makes rollback easy and spares the next agent from guessing what happened.
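
The end-of-step bookkeeping might look like the sketch below. The entry format, the `append_progress` and `commit_commands` helpers, and the choice to stage everything are assumptions for illustration; the article only specifies the `claude-progress.txt` file and descriptive Git commits. The commands are built but not executed here.

```python
from datetime import datetime, timezone
from pathlib import Path

PROGRESS_FILE = "claude-progress.txt"  # file name taken from the article


def append_progress(repo_dir, summary, next_step):
    """Append one dated entry to the progress log and return the entry text."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"[{stamp}] DONE: {summary}\n  NEXT: {next_step}\n"
    with open(Path(repo_dir) / PROGRESS_FILE, "a", encoding="utf-8") as log:
        log.write(entry)
    return entry


def commit_commands(message):
    """Build the Git commands that stage all changes and commit them."""
    if not message.strip():
        raise ValueError("commit message must be descriptive, not empty")
    return [
        ["git", "add", "--all"],
        ["git", "commit", "--message", message],
    ]
```

Each inner list can be passed to `subprocess.run(..., cwd=repo_dir, check=True)` once the progress entry has been written, so the log update lands in the same commit as the code change.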

---
### 3. **Testing**
- Prevent agents from prematurely marking features complete.
- Require **full user journey testing** via browser automation (e.g., Puppeteer MCP).
- Catch issues that **code inspection alone** misses.
- Limitation: Puppeteer MCP can’t detect browser-native `alert` popups.
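
One way to picture the testing gate is a runner that flips `passes` only when every step of a user journey succeeds. This is a generic sketch: the step callables stand in for browser-automation actions and do not use the Puppeteer MCP interface, and `run_journey` and `verify_feature` are hypothetical names.

```python
from typing import Callable, List, Tuple

# Each step is a (label, check) pair; the checks are placeholders for
# real browser actions such as clicking a button and reading the page.
Step = Tuple[str, Callable[[], bool]]


def run_journey(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order and stop at the first failure."""
    if not steps:
        return False, ["FAIL: no steps defined"]
    log = []
    for label, check in steps:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing step counts as a failure
            ok = False
            label = f"{label} (raised {type(exc).__name__})"
        log.append(f"{'PASS' if ok else 'FAIL'}: {label}")
        if not ok:
            return False, log
    return True, log


def verify_feature(feature: dict, steps: List[Step]) -> dict:
    """Set `passes` from the journey result, never from code inspection."""
    passed, _ = run_journey(steps)
    return {**feature, "passes": passed}
```

Treating an empty journey as a failure keeps a feature from being marked complete when nobody has defined how to test it.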


---
## Quick Start Workflow
Every coding agent session begins by:
1. **Checking the working directory** (`pwd`); the agent may edit only files in this directory.
2. **Reviewing the Git log and progress file** for recent changes.
3. **Selecting the highest-priority incomplete feature** from the feature list.
4. **Running `init.sh`** to start the dev server and run a basic end-to-end test before adding new features.

**In the Claude.ai clone example:**
- Start dev server.
- Use Puppeteer MCP for:
  - Opening a new conversation,
  - Sending a message,
  - Receiving a reply.
- Fix abnormal states **before** adding new features.
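
The decision at the start of a session can be sketched as a small function. The `priority` and `description` fields, the convention that a lower number is more urgent, and the function names are assumptions made for this example; the article only states that the agent picks the highest-priority incomplete feature and repairs a broken baseline first.

```python
def next_feature(features):
    """Return the highest-priority feature that does not pass yet, or None."""
    pending = [f for f in features if not f["passes"]]
    # Assumption: a lower `priority` number means more urgent.
    return min(pending, key=lambda f: f["priority"], default=None)


def plan_session(features, smoke_test_ok):
    """Decide what a session should do first.

    `smoke_test_ok` is the result of the basic end-to-end test that runs
    after the dev server starts.
    """
    if not smoke_test_ok:
        return "fix: restore the baseline before adding features"
    feature = next_feature(features)
    if feature is None:
        return "done: every feature passes"
    return f"implement: {feature['description']}"
```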
---
## Benefits & Remaining Questions
The dual-agent design improves **full-stack app stability**, but open questions remain:
- Is a **single general-purpose agent** sufficient?
- Or should we use **multiple specialized agents**, for example:
  - Testing Agent
  - QA Agent
  - Code Cleanup Agent
---
## References
- 🔗 [Anthropic: Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
---
