Data-Driven Intelligent Diagnostic Systems: Multi-Agent Technology Implementation in Production Environments | AICon Beijing

AICon Beijing 2025 — Intelligent Multi-Agent Diagnostics

Date: 2025-11-24 13:41 (Beijing)

Explore the core of Alibaba Cloud’s Intelligent Diagnostics system and decode the practical path for multi-agent collaborative troubleshooting.

---

Event Overview

Dates: December 19–20, 2025

Location: Beijing

Theme: Exploring the Boundaries of AI Applications

Key Topics

  • Enterprise-level agent deployment
  • Context engineering
  • AI product innovation
  • Large-model applications enhancing R&D and business operations

The conference will feature experts from Alibaba, ByteDance, Huawei, JD.com, Kuaishou, Meituan, and leading startups, sharing first-hand experience deploying large models. The goal is to uncover more possibilities for enterprise AI and open new paths to business growth.

---

Zhao Qingjie — Alibaba Cloud Serverless Infrastructure Lead & AgentRun Product R&D Lead

Presentation: Data-Driven Intelligent Diagnostic Systems: Multi-Agent Technology Implementation in Production Environments

Session Focus

Traditional diagnostics struggle in highly dynamic, high-dimensional production environments. Zhao Qingjie will present Alibaba Cloud’s multi-agent system, combining:

  • Full-stack observability data (metrics, logs, traces, eBPF events)
  • LLM reasoning capabilities
  • Collaborative multi-agent architecture

This system enables end-to-end fault detection, root cause analysis, and repair suggestion generation in production — reducing:

  • MTTR by over 40%
  • False alerts by 65%
  • Manual interventions by 60%

---

Core Design Principles

  • Role-Based Agent Division: Perception Agent, Inference Agent, Validation Agent, Execution Agent
  • Data–Model–Action Closed Loop: Converting real-time data into semantic context and automated actions
  • Seamless SRE Integration: Safely replacing manual inspections and alert responses

---

Speaker Background

Zhao Qingjie specializes in:

  • Serverless architecture
  • AI agents
  • PaaS development
  • Large-scale distributed systems

Previously: Core PaaS Platform Lead at Baidu, maintaining 80% of Baidu’s online services under high concurrency and high availability conditions.

---

Presentation Outline

Part 1 — Introduction

  • New challenges in intelligent operations
  • Complexity trends in cloud-native environments
  • From single-model assistance → multi-agent collaboration

Part 2 — System Architecture

  • Observability data layer
  • Agent coordination engine
  • Action execution layer

---

Intelligent Agent Roles

  • Perception Agent: Data collection & anomaly detection
  • Inference Agent: Root cause analysis
  • Validation Agent: Hypothesis testing
  • Execution Agent: Automated repair & recommendations
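
The four roles above can be read as a staged pipeline that hands a shared incident record from one agent to the next. The following is a minimal sketch under stated assumptions, not the system presented in the talk: the `Incident` fields, the 500 ms threshold, the "cold start" hypothesis, and every function name are hypothetical, and the LLM reasoning step is replaced by a fixed rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """Shared state passed between agents (hypothetical structure)."""
    service: str
    latency_ms: float
    anomaly: bool = False
    hypothesis: Optional[str] = None
    validated: bool = False
    actions: List[str] = field(default_factory=list)


def perception_agent(incident: Incident, threshold_ms: float = 500.0) -> Incident:
    # Data collection and anomaly detection, reduced here to a static threshold.
    incident.anomaly = incident.latency_ms > threshold_ms
    return incident


def inference_agent(incident: Incident) -> Incident:
    # Root cause analysis. A real system would call an LLM with tools here;
    # this stand-in always proposes the same hypothesis.
    if incident.anomaly:
        incident.hypothesis = "cold start"
    return incident


def validation_agent(incident: Incident, cold_starts_observed: int) -> Incident:
    # Hypothesis testing against independent evidence.
    incident.validated = (
        incident.hypothesis == "cold start" and cold_starts_observed > 0
    )
    return incident


def execution_agent(incident: Incident) -> Incident:
    # Only validated hypotheses produce a recommendation.
    if incident.validated:
        incident.actions.append(
            f"recommend: pre-warm instances for {incident.service}"
        )
    return incident


def diagnose(service: str, latency_ms: float, cold_starts_observed: int) -> Incident:
    incident = Incident(service=service, latency_ms=latency_ms)
    incident = perception_agent(incident)
    incident = inference_agent(incident)
    incident = validation_agent(incident, cold_starts_observed)
    return execution_agent(incident)
```

Calling `diagnose("checkout-fn", 1200.0, cold_starts_observed=3)` yields one recommendation, while a latency below the threshold yields none. Keeping validation as a separate stage means an unverified hypothesis never reaches the execution step.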

---

Data Closed Loop

Unifying metrics, logs, traces, eBPF events into a consistent agent context model.
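
One way to picture a consistent context model is a single record type that every signal source is normalized into, plus a function that renders a time window of those records as text for an agent. This is only an illustrative sketch: the `Signal` fields, the `from_*` helpers, and the output format are assumptions, not the schema used by Alibaba Cloud.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Signal:
    """One normalized observation (hypothetical schema)."""
    kind: str          # "metric", "log", "trace", or "ebpf"
    service: str
    timestamp: float   # seconds since epoch
    body: Dict[str, Any]


def from_metric(service: str, ts: float, name: str, value: float) -> Signal:
    return Signal("metric", service, ts, {"name": name, "value": value})


def from_log(service: str, ts: float, level: str, message: str) -> Signal:
    return Signal("log", service, ts, {"level": level, "message": message})


def from_trace(service: str, ts: float, trace_id: str, duration_ms: float) -> Signal:
    return Signal("trace", service, ts, {"trace_id": trace_id, "duration_ms": duration_ms})


def from_ebpf(service: str, ts: float, event: str, detail: str) -> Signal:
    return Signal("ebpf", service, ts, {"event": event, "detail": detail})


def build_context(signals: List[Signal], service: str, start: float, end: float) -> str:
    """Render one service's signals in [start, end] as time-ordered text lines."""
    selected = sorted(
        (s for s in signals if s.service == service and start <= s.timestamp <= end),
        key=lambda s: s.timestamp,
    )
    return "\n".join(f"[{s.timestamp:.0f}] {s.kind}: {s.body}" for s in selected)
```

Because all four sources share one shape, filtering by service and time window needs no per-source logic, and the rendered text can be placed directly into a prompt.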

---

Technical Implementation Highlights

  • Multi-Agent Communication: Dynamic task decomposition & consensus protocols
  • LLM + Domain Knowledge: Prompt design, tool invocation, hallucination suppression
  • Security & Reliability: Permission controls, audit logs, manual circuit breakers
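
The security items in the list above (permission controls, audit logs, manual circuit breakers) can be combined in a single wrapper that every agent action passes through. The sketch below is a hypothetical illustration of that idea; `ActionGuard`, its method names, and the outcome strings are assumptions and do not describe the actual implementation.

```python
import time
from typing import Any, Callable, Dict, List, Set


class ActionGuard:
    """Gate agent actions behind permissions, a manual breaker, and an audit log."""

    def __init__(self, permissions: Dict[str, Set[str]]) -> None:
        self.permissions = permissions          # agent name -> allowed actions
        self.audit_log: List[Dict[str, Any]] = []
        self.tripped = False                    # manual circuit breaker state

    def trip(self) -> None:
        # An operator stops all automated actions.
        self.tripped = True

    def reset(self) -> None:
        self.tripped = False

    def run(self, agent: str, action: str, fn: Callable[[], str]) -> str:
        if self.tripped:
            outcome = "blocked: circuit breaker open"
        elif action not in self.permissions.get(agent, set()):
            outcome = "denied: missing permission"
        else:
            outcome = fn()
        # Every attempt is recorded, including blocked and denied ones.
        self.audit_log.append(
            {"ts": time.time(), "agent": agent, "action": action, "outcome": outcome}
        )
        return outcome
```

For example, with `ActionGuard({"execution": {"restart_pod"}})` the execution agent may restart a pod but is denied any other action, and after `trip()` is called all actions are blocked until `reset()`. Checking the breaker before permissions lets an operator halt automation without editing the permission table.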

---

Production Best Practices

Scenarios:

  • Serverless cold start anomalies
  • Database slow-query storms
  • Container resource contention

Results:

  • MTTR ↓ 40%+
  • False alerts ↓ 65%
  • Manual interventions ↓ 60%

Lessons Learned:

  • From POC → large-scale deployment
  • Cold start optimization
  • Cost vs latency trade-offs

---

Summary & Outlook

Audience Takeaways:

  • Framework for enterprise-grade intelligent agent reliability
  • Patterns for multi-agent observability design
  • Integration with open-source diagnostic standards like OpenAgentTracing

---

Conference Extras

Additional sessions:

  • Software development in the LLM era
  • Context Engineering
  • Large-model system engineering
  • Enterprise Agent design
  • Search, advertising, and recommendation (SAR) systems in large-model contexts
  • Multimodal AI innovations

Over 50 experts will share live insights at AICon Beijing.

Special Offer: 10% off tickets (save 580 RMB) — contact: 13269078023
