# QCon 2025 Beijing Highlights  
**Date:** 2025‑10‑15 · **Location:** Beijing  
**Theme:** *“Deeply cultivate existing skills, embrace new knowledge.”*
  
  
---
## Introduction  
In the era of **distributed** and **cloud‑native computing**, middleware shields underlying complexity and provides standardized interfaces, greatly improving development efficiency.  
Today, **AI middleware** plays a similar role. But:  
- How can we transition from *cloud‑native* to *intelligent‑native* smoothly?  
- How can we address key pain points in current AI application development?

Recently, InfoQ’s *Geek Interview* × AICon livestream invited:  
- **Song Shun** – Senior Technical Expert, Ant Group  
- **Zhang Geng** – Head of AI Middleware, Ant Group  
- **Dr. Li Zhiyu** – CTO, Memory Tensor  

Ahead of **QCon Global Software Development Conference 2025 Shanghai**, they explored **the infrastructure battle behind AI middleware**.
### Key Insights  
- **Lower barriers & improve reliability:** AI middleware reduces entry difficulty, improves system reliability, and keeps applications secure & controllable.  
- **In-house development is necessary:** The goal is not to reinvent the wheel but to build a **“super race car”** tailored to your industry, rules, safety requirements, and cost constraints.  
- **Strategic use of open source:** Use open source/cloud for exploratory projects; build or customize in-house for core/large-scale apps.  
- **Reviving traditional assets:** Older technology is not obsolete; it needs to be reborn within new AI frameworks.
---
## QCon 2025 Shanghai Preview  
**Dates:** October 23–25  
**Location:** Shanghai  
Dedicated Track: **[AI Middleware: Accelerating Intelligent Application Development]**  
Event Details: [https://qcon.infoq.cn/2025/shanghai/schedule](https://qcon.infoq.cn/2025/shanghai/schedule)  
Focus:  
- Core AI middleware technologies  
- Industry practices & trends  
- Topics: Agent architecture, multimodal collaboration, production deployments  
---
## Intelligent Infrastructure
### Cloud-native vs Intelligent-native  
**Song Shun:** Cloud-native infrastructure schedules and manages services. In the intelligent-native era, will the core objects shift to **Agents, models, and memories**?  
**Zhang Geng:**  
- Cloud-native apps: “born in the cloud, grow in the cloud” → microservices, containerization, CI/CD, DevOps → maximize cloud benefits.  
- Intelligent-native: maximize AI benefits → rapid LLM invocation, memory services, stable Agents → carrier shifts from microservices to intelligent Agents.  
**Key Differences:**  
- **Scheduling expansion:** Beyond CPU/memory/network into GPUs, TPUs, heterogeneous computing.  
- **New foundational services:** Inference, RAG, memory — delivered as cloud services, spawning new paradigms.  
- **State management:** Agents require persistent, contextual memory.  
- **Uncertainty challenge:** LLM responses are probabilistic, which is a critical problem for scenarios that demand financial-grade predictability.
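
One common way to contain probabilistic output is to validate every model response against a strict schema, retry a bounded number of times, and fall back to a deterministic path. The sketch below is illustrative only and is not something the panel described: `call_model`, `ALLOWED_ACTIONS`, and the `"escalate"` fallback are hypothetical names chosen for the example.

```python
import json
from typing import Callable, Optional

# Hypothetical business rule: the only actions the application accepts.
ALLOWED_ACTIONS = {"approve", "reject", "escalate"}


def validate(raw: str) -> Optional[dict]:
    """Return the parsed decision if it satisfies the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    return data


def guarded_call(call_model: Callable[[str], str], prompt: str,
                 max_attempts: int = 3) -> dict:
    """Retry until the output validates; otherwise fall back deterministically."""
    for _ in range(max_attempts):
        result = validate(call_model(prompt))
        if result is not None:
            return result
    # Deterministic fallback: hand the case to a human instead of guessing.
    return {"action": "escalate", "reason": "model output failed validation"}
```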
---
## Memory as the “Hippocampus” of AI
**Song Shun:** Is memory service fundamental or optional?  
**Li Zhiyu:**  
- **Human hippocampus:** Translates short-term into long-term memories. Without it → no lasting knowledge.  
- **AI analogy:** Without persistent memory, a model can only handle short-term exchanges (e.g., ChatGPT’s improved experience is attributed to conversation memory).  
- **MemOS goal:** Simulate the hippocampus → store, recall, learn from the past, adapt continuously.  
- **Position:** Optional today; essential for future AI-native apps.
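
The short-term versus long-term split can be pictured with a small sketch. This is a toy illustration under stated assumptions, not the MemOS design: the class `ConversationMemory` and its methods are hypothetical, short-term memory is a bounded buffer of recent turns, and "consolidation" is simply copying a fact into a store that outlives the buffer.

```python
from collections import deque


class ConversationMemory:
    """Toy two-tier memory: a bounded recent-turn buffer plus a persistent store."""

    def __init__(self, short_term_size=3):
        # Short-term: only the most recent turns survive.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: facts that persist after turns fall out of the buffer.
        self.long_term = {}

    def observe(self, turn):
        """Record a conversation turn; the oldest turn is dropped when full."""
        self.short_term.append(turn)

    def consolidate(self, key, fact):
        """Promote a fact to long-term memory (the 'hippocampus' step)."""
        self.long_term[key] = fact

    def build_context(self):
        """Assemble the prompt context: durable facts first, then recent turns."""
        facts = [f"{k}: {v}" for k, v in sorted(self.long_term.items())]
        return "\n".join(facts + list(self.short_term))
```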
---
## AI Middleware as Connector & Accelerator
### Why Build In-House Middleware?  
**Zhang Geng:**  
- **Technical:** Unified infrastructure, easy access to core services, integration with thousands of internal RPC interfaces and message queues.  
- **Security:** Meet strict data privacy and compliance demands.  
- **Cost:** Efficiency gains (e.g., halving development time for apps with tens of millions of users) and token cost optimization.
---
## Technical Bottlenecks in AI Memory
**Li Zhiyu:** The bottlenecks go beyond retrieval accuracy and speed:  
- **System engineering:** Reading, organizing, storing, retrieving, and sharing memory all require deliberate design.  
- **Vector DBs:** Widely used and good at semantic retrieval, but not a complete solution.  
- **Future directions:**
  1. Hierarchical memory management.  
  2. Structured/event/context-based organization & extraction.  
  3. **Human brain inspiration:** Forgetting as optimization → OS-like memory lifecycle management.
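
"Forgetting as optimization" can be sketched as a retention score that decays over time, with the lowest-scoring items evicted when capacity is reached. The scoring rule here (importance multiplied by an exponential half-life decay since last access) is an assumption made for illustration, as are the names `MemoryItem`, `retention`, and `forget`; the panel did not specify a formula.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float   # hypothetical score in [0, 1]
    last_access: float  # timestamp in seconds


def retention(item, now, half_life=3600.0):
    """Importance weighted by exponential decay since the last access."""
    age = max(0.0, now - item.last_access)
    return item.importance * 0.5 ** (age / half_life)


def forget(items, now, capacity):
    """Keep the `capacity` items with the highest retention score."""
    ranked = sorted(items, key=lambda it: retention(it, now), reverse=True)
    return ranked[:capacity]
```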
---
## Engineering Practices in AI Middleware
**Song Shun:** Key challenges and how they are addressed:  
- **Context length** → layered memory.  
- **Tool invocation security** → sandboxing and permission approval.  
- **Agent unpredictability** → simulation-based testing for observability and control.  
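
The permission-approval half of tool security can be sketched as a gate that sits between the Agent and its tools. This is a minimal sketch under assumptions: `ToolGate`, its methods, and the approver callback are hypothetical, and it does not implement sandboxing, which would need process or container isolation on top.

```python
class ToolGate:
    """Allow-list of tools; sensitive ones require an approval callback."""

    def __init__(self, approver):
        self._tools = {}
        self._needs_approval = set()
        # approver(name, kwargs) -> bool; could be a human or a policy engine.
        self._approver = approver

    def register(self, name, fn, needs_approval=False):
        """Add a tool to the allow-list, optionally marking it as sensitive."""
        self._tools[name] = fn
        if needs_approval:
            self._needs_approval.add(name)

    def invoke(self, name, **kwargs):
        """Run a tool only if it is allow-listed and, if sensitive, approved."""
        if name not in self._tools:
            raise PermissionError(f"tool not allow-listed: {name}")
        if name in self._needs_approval and not self._approver(name, kwargs):
            raise PermissionError(f"approval denied for tool: {name}")
        return self._tools[name](**kwargs)
```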
---
## Future Capabilities & Middleware Roles
**Li Zhiyu & Zhang Geng:**  
- If models become cheap and powerful: basic orchestration gets replaced; industry-specific knowledge and personalization remain valuable.  
- Middleware role: Connect **business** and **models**; ensure safety, compliance, and process orchestration.
---
## Building Enterprise‑Grade AI Middleware
### Balancing Open Source vs In-House
- **Non-core:** Cloud/open source → fast iteration.  
- **Core:** In-house/deep custom → keep control over business lifeline.  
- **Protecting moat:**  
  1. Match business.  
  2. Optimize performance/cost.  
  3. Integrate deeply with tech stack.  
  4. Stay open.
---
## Costs & ROI  
- From scratch → huge investment.  
- Focused scope → controllable cost.  
- **Starts as cost center**, quickly becomes capability/value center.  
- ROI:  
  - Build the platform first → 6–12 months to see a return.  
  - Co-build with business teams → visible return within 3 months.
---
## GPU Optimization Strategies
**Li Zhiyu:**  
- Fine-grained scheduling to maximize utilization.  
- Unified scheduling across memory types (parametric, activation, plaintext).  
- Predict user intent & preload → reduce latency.  
- KV cache optimizations.
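
The idea behind reusing cached state for shared prompt prefixes can be shown with a toy in-process cache. This is only a conceptual sketch: real KV caches hold attention tensors on the GPU inside the inference engine, while the hypothetical `PrefixCache` below stores an opaque `state` per token prefix with LRU eviction.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy LRU cache mapping a token prefix to a precomputed state."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._store = OrderedDict()

    def put(self, tokens, state):
        """Cache the state for a prefix, evicting the least recently used entry."""
        key = tuple(tokens)
        self._store[key] = state
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)

    def longest_prefix(self, tokens):
        """Return (matched_length, state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                self._store.move_to_end(key)  # mark as recently used
                return end, self._store[key]
        return 0, None
```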
---
## Potential Standards in AI Middleware
**Li Zhiyu:**  
- A unified standard is possible if a framework covers scheduling, interfaces, governance, and security.  
- The architecture may mimic the human brain: memory, reasoning, perception, action.  
**Zhang Geng:**  
- Likely “sub-domain standards” (e.g., MCP for tool invocation).  
- Multimodality & robotics → real-time orchestration, safety redundancy.
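
As a concrete picture of a sub-domain standard, MCP tool invocation is carried over JSON-RPC 2.0, with a `tools/call` method whose parameters name the tool and its arguments. The helper below only builds such a request; `build_tool_call` and the example tool name `get_weather` are hypothetical, and transport, session setup, and response handling are out of scope.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "get_weather", {"city": "Shanghai"})
```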
---
## Skills for AI Middleware Engineers  
### Old Skills (Distributed Systems)
- CAP theorem  
- Paxos/Raft  
- Service governance  
- Monitoring/alerting  
- Performance optimization  
### New Skills (LLM/Agents)
- Context engineering  
- RAG optimization  
- Agent orchestration  
- Multimodal processing  
**Balanced learning path:**
1. Theory.  
2. Hands-on projects.  
3. Solve real problems.  
4. Build reusable components.
---
## Timeless Engineering Principles
- **Engineering methodology** outlasts specific tools.  
- Blend old and new skills to stay competitive.  
- Start small (RAG/tool Agents), progress to complex memory/context/multi-Agent systems.
---
## Event Recommendation
**QCon Shanghai** · **Oct 23–25**  
Three days · 100+ engineering cases · Topics include Agentic AI, Embodied Intelligence, RL, edge‑LLM practice, multi‑Agent collaboration.  
  
---
**Original Article:** [Read Original](2651258789)  
**WeChat Link:** [Open in WeChat](https://wechat2rss.bestblogs.dev/link-proxy/?k=3155a2c6&r=1&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMjM5MDE0Mjc4MA%3D%3D%26mid%3D2651258789%26idx%3D2%26sn%3D99c9581e0a36f918de867d9840f95211)