# From “Cloud Native” to “Intelligence Native”: How Far Has AI Middleware Come?

## QCon 2025 Beijing Highlights  
**Date:** 2025‑10‑15 · **Location:** Beijing  
**Theme:** *“Deeply cultivate existing skills, embrace new knowledge.”*

![image](https://blog.aitoearn.ai/content/images/2025/10/img_001-178.jpg)  
![image](https://blog.aitoearn.ai/content/images/2025/10/img_002-169.jpg)  

---

## Introduction  

In the era of **distributed** and **cloud‑native computing**, middleware shields underlying complexity and provides standardized interfaces, greatly improving development efficiency.  

Today, **AI middleware** plays a similar role. But:  
- How can we transition smoothly from *cloud-native* to *intelligence-native*?  
- How can we address key pain points in current AI application development?

Recently, InfoQ’s *Geek Interview* × AICon livestream invited:  
- **Song Shun** – Senior Technical Expert, Ant Group  
- **Zhang Geng** – Head of AI Middleware, Ant Group  
- **Dr. Li Zhiyu** – CTO, Memory Tensor  

Ahead of **QCon Global Software Development Conference 2025 Shanghai**, they explored **the infrastructure battle behind AI middleware**.

### Key Insights  
- **Lower barriers & improve reliability:** AI middleware reduces entry difficulty, improves system reliability, and keeps applications secure & controllable.  
- **In-house development necessity:** Not “reinventing the wheel,” but building a **“super race car”** tailored to your industry, rules, safety, and costs.  
- **Strategic use of open source:** Use open source/cloud for exploratory projects; build or customize in-house for core/large-scale apps.  
- **Reviving traditional assets:** Older tech isn’t obsolete — it must be reborn in new AI frameworks.

---

## QCon 2025 Shanghai Preview  

**Dates:** October 23–25  
**Location:** Shanghai  

Dedicated Track: **AI Middleware: Accelerating Intelligent Application Development**  
Focus:  
- Core AI middleware technologies  
- Industry practices & trends  
- Topics: Agent architecture, multimodal collaboration, production deployments  

Event Details: [https://qcon.infoq.cn/2025/shanghai/schedule](https://qcon.infoq.cn/2025/shanghai/schedule)  

---

## Intelligent Infrastructure

### Cloud-Native vs. Intelligence-Native  
**Song Shun:** Cloud-native platforms schedule and manage services. In the intelligence-native era, will the core managed objects shift to **Agents, models, and memories**?  

**Zhang Geng:**  
- Cloud-native apps: “born in the cloud, grow in the cloud” → microservices, containerization, CI/CD, DevOps → maximize cloud benefits.  
- Intelligence-native: maximize AI benefits → rapid LLM invocation, memory services, stable Agents → the application carrier shifts from microservices to intelligent Agents.  

**Key Differences:**  
- **Scheduling expansion:** Beyond CPU/memory/network into GPUs, TPUs, heterogeneous computing.  
- **New foundational services:** Inference, RAG, memory — delivered as cloud services, spawning new paradigms.  
- **State management:** Agents require persistent, contextual memory.  
- **Uncertainty challenge:** LLM responses are probabilistic, a hard problem where financial-grade predictability is required.

---

## Memory as the “Hippocampus” of AI

**Song Shun:** Is memory service fundamental or optional?  

**Li Zhiyu:**  
- **Human hippocampus:** Translates short-term into long-term memories. Without it → no lasting knowledge.  
- AI analogy: Without persistent memory → only short-term exchanges (e.g., ChatGPT’s improvement came from conversation memory).  
- **MemOS goal:** Simulate hippocampus → store, recall, learn from past, adapt continuously.  
- Position: Today optional; future essential for AI-native apps.

---

## AI Middleware as Connector & Accelerator

### Why Build In-House Middleware?  
**Zhang Geng:**  
- **Technical:** Unified infrastructure, easy access to core services, integration with thousands of internal RPC queues/interfaces.  
- **Security:** Meet strict data privacy/compliance demands.  
- **Cost:** Efficiency gains (e.g., halve dev time for apps with tens of millions of users), token cost optimization.

---

## Technical Bottlenecks in AI Memory

**Li Zhiyu:** Beyond retrieval accuracy/speed:  
- **System engineering:** Reading, organizing, storing, retrieving, sharing — all require design.  
- **Vector DBs:** Common, good for semantics; not complete solution.  
- **Future directions:**
  1. Hierarchical memory management.  
  2. Structured/event/context-based organization & extraction.  
  3. **Human brain inspiration:** Forgetting as optimization → OS-like memory lifecycle management.
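As a loose illustration of "forgetting as optimization," here is a toy two-tier store in which short-term items are evicted unless recalled often enough to be consolidated into long-term memory. This is a hypothetical sketch of the hippocampus analogy, not MemOS's actual design.

```python
class MemoryStore:
    """Toy two-tier memory: short-term items are evicted (forgotten) unless
    recalled often enough to be consolidated into long-term storage."""

    def __init__(self, promote_after=2, capacity=3):
        self.short_term = {}     # key -> [value, recall_count]
        self.long_term = {}
        self.promote_after = promote_after
        self.capacity = capacity

    def store(self, key, value):
        self.short_term[key] = [value, 0]
        if len(self.short_term) > self.capacity:
            # Forgetting as optimization: drop the least-recalled item.
            victim = min(self.short_term, key=lambda k: self.short_term[k][1])
            del self.short_term[victim]

    def recall(self, key):
        if key in self.long_term:
            return self.long_term[key]
        if key in self.short_term:
            entry = self.short_term[key]
            entry[1] += 1
            if entry[1] >= self.promote_after:
                self.long_term[key] = entry[0]   # consolidation
                del self.short_term[key]
            return entry[0]
        return None

store = MemoryStore(promote_after=2, capacity=3)
store.store("user_prefers_dark_mode", True)
store.recall("user_prefers_dark_mode")   # 1st recall
store.recall("user_prefers_dark_mode")   # 2nd recall -> promoted
```

An OS-like lifecycle would add decay over time, tiered storage, and sharing policies on top of this skeleton.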

---

## Engineering Practices in AI Middleware

**Song Shun:**  
- **Context length** → addressed with layered memory.  
- **Tool-invocation security** → sandboxing & permission approval.  
- **Agent unpredictability** → simulation-based testing for observability & control.  
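The sandbox-and-approval idea can be sketched as an allow-list gate in front of tool calls. All names here (`ToolGate`, `read_file`, `send_money`) are invented for illustration; this is a pattern sketch, not Ant Group's implementation.

```python
class ToolGate:
    """Allow-list gate for Agent tool invocation: unknown tools are rejected,
    and sensitive tools additionally require an approval callback."""

    def __init__(self, approve):
        self.tools = {}          # name -> (fn, needs_approval)
        self.approve = approve   # callback: (name, args) -> bool

    def register(self, name, fn, needs_approval=False):
        self.tools[name] = (fn, needs_approval)

    def invoke(self, name, *args):
        if name not in self.tools:
            raise PermissionError(f"tool {name!r} is not allow-listed")
        fn, needs_approval = self.tools[name]
        if needs_approval and not self.approve(name, args):
            raise PermissionError(f"approval denied for {name!r}")
        return fn(*args)

# Approver that only signs off on read-only operations.
gate = ToolGate(approve=lambda name, args: name == "read_file")
gate.register("read_file", lambda path: f"contents of {path}", needs_approval=True)
gate.register("send_money", lambda amount: amount, needs_approval=True)
```

In production the approval callback would route to a human reviewer or a policy engine, and the tool itself would run inside an actual sandbox process.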

---

## Future Capabilities & Middleware Roles

**Li Zhiyu & Zhang Geng:**  
- If models become cheap and powerful enough, basic orchestration will be absorbed by the models themselves; industry-specific logic and personalization will remain.  
- Middleware's role: connect **business** and **models**, while ensuring safety, compliance, and process orchestration.

---

## Building Enterprise‑Grade AI Middleware

### Balancing Open Source vs In-House
- **Non-core:** Cloud/open source → fast iteration.  
- **Core:** In-house/deep custom → keep control over business lifeline.  
- **Protecting moat:**  
  1. Match business.  
  2. Optimize performance/cost.  
  3. Integrate deeply with tech stack.  
  4. Stay open.

---

## Costs & ROI  
- Building from scratch → huge investment.  
- Focused scope → controllable cost.  
- Starts as a **cost center**, quickly becomes a capability/value center.  
- ROI timelines:  
  - Build first → 6–12 months.  
  - Co-build with business teams → visible returns within ~3 months.

---

## GPU Optimization Strategies

**Li Zhiyu:**  
- Fine-grain scheduling to maximize utilization.  
- Unified memory scheduling across parameter, activation, and plaintext memory.  
- Predict user intent & preload → reduce latency.  
- KV cache optimizations.
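The intuition behind KV-cache reuse can be shown with a toy prefix cache: if a new prompt shares a prefix with a previously served one, only the suffix needs fresh computation. Real serving stacks (paged KV caches, attention-state management) are far more involved; this is a simplified sketch under that caveat.

```python
class PrefixKVCache:
    """Toy prefix cache: reuse precomputed state for the longest previously
    seen prompt prefix, so only the suffix needs fresh computation."""

    def __init__(self):
        self.cache = {}          # prefix (tuple of tokens) -> state

    def insert(self, tokens, state):
        self.cache[tuple(tokens)] = state

    def lookup(self, tokens):
        """Return (length of longest cached prefix, its state)."""
        for i in range(len(tokens), 0, -1):
            state = self.cache.get(tuple(tokens[:i]))
            if state is not None:
                return i, state
        return 0, None

cache = PrefixKVCache()
cache.insert(["you", "are", "a", "helpful"], "kv-state-4")
hit_len, state = cache.lookup(["you", "are", "a", "helpful", "assistant"])
# hit_len == 4: only the final token requires new computation
```

Intent prediction and preloading fit the same shape: warm the cache with the prefixes a user is likely to send before they send them.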

---

## Potential Standards in AI Middleware

**Li Zhiyu:**  
- Possible unified standard if framework covers scheduling, interfaces, governance, security.  
- Architecture may mimic human brain: memory, reasoning, perception, action.  

**Zhang Geng:**  
- Likely “sub-domain standards” (e.g., MCP for tool invocation).  
- Multimodality & robotics → real-time orchestration, safety redundancy.

---

## Skills for AI Middleware Engineers  

### Old Skills (Distributed Systems)
- CAP theorem  
- Paxos/Raft  
- Service governance  
- Monitoring/alerting  
- Performance optimization  

### New Skills (LLM/Agents)
- Context engineering  
- RAG optimization  
- Agent orchestration  
- Multimodal processing  
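To make the RAG item above concrete, here is a deliberately naive retrieve-then-prompt sketch that ranks documents by word overlap; production systems use dense embeddings and vector databases, so treat this only as an illustration of the flow.

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a stand-in for
    embedding similarity) and return the top k."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Retrieve-then-generate: put the retrieved context into the prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "AI middleware connects business systems and models.",
    "The hippocampus converts short-term memories into long-term ones.",
    "Containers package applications for cloud-native deployment.",
]
prompt = build_prompt("what does AI middleware connect", docs)
```

RAG "optimization" in practice means improving every stage of this pipeline: chunking, embedding quality, ranking, and how much retrieved context the prompt can afford.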

**Balanced learning path:**
1. Theory.  
2. Hands-on projects.  
3. Solve real problems.  
4. Build reusable components.

---

## Timeless Engineering Principles

- **Engineering methodology** outlasts specific tools.  
- Blend old and new skills to stay competitive.  
- Start small (RAG/tool Agents), progress to complex memory/context/multi-Agent systems.

---

## Event Recommendation

**QCon Shanghai** · **Oct 23–25**  
Three days · 100+ engineering cases · Topics include Agentic AI, Embodied Intelligence, RL, edge‑LLM practice, multi‑Agent collab.  

![image](https://blog.aitoearn.ai/content/images/2025/10/img_003-156.jpg)  

---

**Original Article:** [Read Original](2651258789)  
**WeChat Link:** [Open in WeChat](https://wechat2rss.bestblogs.dev/link-proxy/?k=3155a2c6&r=1&u=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMjM5MDE0Mjc4MA%3D%3D%26mid%3D2651258789%26idx%3D2%26sn%3D99c9581e0a36f918de867d9840f95211)  

By Honghao Wang