Lots of Insights! Transcript of Silicon Valley’s Internal Discussion on AI Agents – Oct 2, 2025

Datawhale Insights: Why 95% of AI Agents Fail in Production

At a recent Silicon Valley industry event, Oana Olteanu—a renowned venture capitalist—joined engineers and ML leads from Uber, WisdomAI, EvenUp, and Datastrato to discuss how AI Agents can succeed in production environments.

Key statistic: 95% of AI Agents fail in production — not because the models lack intelligence, but because supporting systems (context engineering, security, memory architecture) are immature or missing.

Original link: https://www.motivenotes.ai/p/what-makes-5-of-ai-agents-actually

> Most founders think they're building an AI product, but in reality, they’re building a context selection system.

---

Under the Hood of an AI Agent

Panelists explored the foundational layers necessary for production-grade AI:

  • Advanced Context Selection
  • Semantic Layers
  • Memory Orchestration
  • Governance Mechanisms
  • Multi-Model Routing Strategies

This synthesis was compiled by Datawhale from Oana’s seminar insights.

---

Context Engineering ≠ Prompt Tricks

Why RAG Is Often Enough—But Rarely Well-Built

Most panelists agreed that model fine-tuning is rarely required. If retrieval-augmented generation (RAG) is robust, performance can be excellent. Unfortunately, most current RAG systems are too simplistic.

Top failure patterns:

  • Over-indexing: Too much irrelevant data confuses the model
  • Under-indexing: Insufficient signals lead to poor answers
  • Mixing structured and unstructured data: This either breaks embeddings or oversimplifies the architecture

---

What Advanced Context Engineering Looks Like

a) Feature Selection for LLMs

Think of context engineering as feature engineering for LLMs:

  • Selective context pruning = feature selection
  • Context validation = schema/type/timeliness checks
  • Context observability = tracking which inputs improve or degrade output quality
  • Embedding augmentation with metadata

Implication: Context becomes a versionable, auditable, testable artifact—not just text.
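As a minimal sketch of the pruning and validation ideas above: the `ContextItem` schema, the thresholds, and the character budget below are illustrative assumptions, not something the panel specified. The relevance score is assumed to come from an upstream retriever.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ContextItem:
    """One candidate piece of context (hypothetical schema)."""
    text: str
    source: str
    updated_at: datetime
    relevance: float  # assumed to come from an upstream retriever, in [0, 1]


def validate(item, now, max_age):
    """Context validation: non-empty text, sane score, fresh enough."""
    if not item.text.strip():
        return False
    if not 0.0 <= item.relevance <= 1.0:
        return False
    return now - item.updated_at <= max_age


def select_context(items, now, max_age=timedelta(days=90),
                   min_relevance=0.5, budget_chars=2000):
    """Context pruning: keep valid, relevant items until the budget is spent."""
    valid = [i for i in items
             if validate(i, now, max_age) and i.relevance >= min_relevance]
    selected, used = [], 0
    for item in sorted(valid, key=lambda i: i.relevance, reverse=True):
        if used + len(item.text) > budget_chars:
            continue  # skip items that would overflow the budget
        selected.append(item)
        used += len(item.text)
    return selected


now = datetime(2025, 10, 2, tzinfo=timezone.utc)
candidates = [
    ContextItem("Q3 revenue policy", "wiki", now - timedelta(days=10), 0.9),
    ContextItem("Old pricing sheet", "drive", now - timedelta(days=400), 0.95),
    ContextItem("Lunch menu", "chat", now - timedelta(days=1), 0.1),
]
chosen = select_context(candidates, now)
print([c.source for c in chosen])  # ['wiki']
```

The stale pricing sheet is dropped despite its high score, and the lunch menu is dropped for low relevance. Because selection is an ordinary function over typed records, it can be unit-tested and versioned like any other code.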

---

b) Dual-Layer: Semantic + Metadata

  • Semantic layer: Classical vector search
  • Metadata layer: Filters by type, timestamp, permissions, domain ontology

Benefits:

Keeps retrieval relevant and structured, not just similar.
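The dual-layer idea can be sketched as a hard metadata filter followed by similarity ranking. The toy two-dimensional vectors, the `type` and `year` fields, and the `search` function are assumptions made for illustration; a real system would use an embedding model and a vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy index: each document has an embedding plus metadata (hypothetical fields).
DOCS = [
    {"id": "a", "vec": [1.0, 0.0], "type": "policy", "year": 2025},
    {"id": "b", "vec": [0.9, 0.1], "type": "memo", "year": 2025},
    {"id": "c", "vec": [0.8, 0.2], "type": "policy", "year": 2021},
    {"id": "d", "vec": [0.0, 1.0], "type": "policy", "year": 2025},
]


def search(query_vec, docs, doc_type, min_year, k=2):
    # Metadata layer: hard filters on type and recency.
    eligible = [d for d in docs
                if d["type"] == doc_type and d["year"] >= min_year]
    # Semantic layer: rank the survivors by vector similarity.
    ranked = sorted(eligible, key=lambda d: cosine(query_vec, d["vec"]),
                    reverse=True)
    return [d["id"] for d in ranked[:k]]


result = search([1.0, 0.0], DOCS, doc_type="policy", min_year=2024)
print(result)  # ['a', 'd']
```

Pure similarity search would have returned `a` and `b`; the metadata layer excludes the memo and the outdated policy even though both are semantically close to the query.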

---

c) Real Challenges in Text-to-SQL

When asked, “Has anyone deployed text-to-SQL in production?”, no one raised their hand.

Reason: Understanding natural language queries in business contexts is harder than the technical mapping.

---

Succeeding with Structured Query Systems

Winning approach:

  • Business glossary & terminology mapping
  • Query templates with constraints
  • Semantic validation before execution
  • Feedback loops to improve understanding
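A minimal sketch of the first three items, assuming a hypothetical `sales` table, a two-term glossary, and a single template. The model's job shrinks to picking a business term and a dimension, while the SQL itself comes from a constrained template. The feedback loop is omitted here.

```python
# Business glossary: maps business terms to SQL expressions (hypothetical).
GLOSSARY = {"revenue": "SUM(amount)", "orders": "COUNT(*)"}

# Only these columns may be used for grouping (hypothetical schema).
ALLOWED_DIMENSIONS = {"region", "month"}

TEMPLATE = ("SELECT {dim}, {metric} AS value "
            "FROM sales GROUP BY {dim} LIMIT {limit}")


def build_query(metric_term, dimension, limit=100):
    """Validate the request against the glossary and constraints, then fill the template."""
    term = metric_term.strip().lower()
    if term not in GLOSSARY:
        raise ValueError(f"unknown business term: {metric_term!r}")
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimension not allowed: {dimension!r}")
    if not isinstance(limit, int) or not 1 <= limit <= 1000:
        raise ValueError("limit must be an integer between 1 and 1000")
    return TEMPLATE.format(dim=dimension, metric=GLOSSARY[term], limit=limit)


print(build_query("Revenue", "region", 10))
# SELECT region, SUM(amount) AS value FROM sales GROUP BY region LIMIT 10
```

Because every slot is checked against an allow-list before execution, an unknown term or column fails loudly instead of producing a plausible but wrong query.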

---

Governance & Trust — Not Just Enterprise Concerns

Critical requirements:

  • Track input-output lineage
  • Enforce role-based access control
  • Customize output per user permissions

> “Two employees asking the same question should get different answers if permissions differ.”

Solution trend: Unified metadata catalogs embedding access policies directly into indexing/search.
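One way to picture policies embedded in search, as a sketch only: each indexed document carries its allowed roles, the filter runs inside retrieval, and every answer records the sources it used. The `Doc` and `Answer` types, the role names, and the sample index are hypothetical, and relevance ranking is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_roles: frozenset  # access policy stored alongside the indexed document


@dataclass
class Answer:
    question: str
    role: str
    text: str
    sources: list = field(default_factory=list)  # input-output lineage


INDEX = [
    Doc("hr-1", "Salary bands for 2025", frozenset({"hr"})),
    Doc("pub-1", "Holiday calendar", frozenset({"hr", "engineer"})),
]


def visible_docs(role):
    """Role-based access control applied at retrieval time, not after generation."""
    return [d for d in INDEX if role in d.allowed_roles]


def answer(question, role):
    docs = visible_docs(role)
    # A real system would pass `docs` to a model; here the text is just joined.
    return Answer(question, role,
                  text=" | ".join(d.text for d in docs),
                  sources=[d.doc_id for d in docs])


print(answer("What should I know?", "hr").sources)        # ['hr-1', 'pub-1']
print(answer("What should I know?", "engineer").sources)  # ['pub-1']
```

The same question yields different answers for the two roles, and the `sources` list gives an auditor the lineage of each response.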

---

Trust Is a Human Problem

Story: One guest’s wife bans the use of Tesla Autopilot, not because it fails to work, but because she does not trust it.

The same issue applies to enterprise AI: reliability, explainability, and auditability matter more than raw capability.

---

Common Trait in the Top 5%

Design for human-in-the-loop:

  • Humans review + override AI decisions
  • Continuous improvement via feedback loops

---

Memory: An Architectural Choice

Memory Tiers

  • User-level: Personal prefs, style, tone
  • Team-level: Shared queries, dashboards
  • Org-level: Policies, institutional knowledge

Best practice: Abstract memory into independent layers with version control.
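A minimal sketch of independent, versioned layers, assuming a simple key-value model. The `MemoryLayer` class is hypothetical, and the precedence rule (user overrides team, team overrides org) is an assumption chosen for illustration; a real system might let org-level policies win instead.

```python
from collections import ChainMap


class MemoryLayer:
    """One memory tier with append-only version history (hypothetical design)."""

    def __init__(self, name):
        self.name = name
        self._versions = [{}]  # version 0 is the empty layer

    def write(self, key, value):
        """Store a value by creating a new snapshot; returns the new version number."""
        snapshot = dict(self._versions[-1])
        snapshot[key] = value
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def read(self, version=-1):
        """Return a copy of the layer at the given version (latest by default)."""
        return dict(self._versions[version])


def resolve(user, team, org):
    # ChainMap searches left to right, so user entries shadow team and org entries.
    return ChainMap(user.read(), team.read(), org.read())


user, team, org = MemoryLayer("user"), MemoryLayer("team"), MemoryLayer("org")
org.write("tone", "formal")
org.write("retention_days", 30)
user.write("tone", "casual")

merged = resolve(user, team, org)
print(merged["tone"])            # casual
print(merged["retention_days"])  # 30
print(org.read(1))               # {'tone': 'formal'}
```

Keeping each tier separate means a user's memory can be inspected, rolled back, or deleted without touching team or organization knowledge.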

---

Personalization vs Privacy

Memory offers:

  • Behavior customization
  • Event-driven proactive assistance

Cold-start example: Uber’s BI chat tool used past queries to recommend new ones and spark conversation.

Privacy risk: Over-personalization can feel intrusive — e.g., AI knowing children's names unprompted.

---

Designing Memory Responsibly

  • Balance UX benefits with privacy concerns
  • Avoid memory scope creep so that access controls are not bypassed
  • Goal: A secure, user-controlled portable memory layer

---

Multi-Model Reasoning & Orchestration

Model routing criteria:

  • Task complexity
  • Latency
  • Cost
  • Compliance requirements
  • Query type

Example pipeline:

  • Simple queries → local models
  • Structured queries → DSL/SQL translators
  • Complex analysis → frontier models from OpenAI (GPT-4), Anthropic, or Google (Gemini)
  • Verification → dual-model redundancy

Benefit: Optimizes both performance & cost without brittleness.
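The example pipeline can be sketched as a small rule-based router. The route names, the `Request` fields, and the word-count heuristic for complexity are all assumptions made for illustration; latency and cost criteria are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    text: str
    structured: bool = False         # query type: maps onto a DSL/SQL template
    must_stay_on_prem: bool = False  # compliance requirement
    verify: bool = False             # ask a second model to cross-check


def route(req, max_simple_words=12):
    """Return the ordered list of (hypothetical) backends that should handle req."""
    if req.structured:
        plan = ["sql-translator"]
    elif req.must_stay_on_prem or len(req.text.split()) <= max_simple_words:
        # Crude complexity proxy: short questions go to a local model.
        plan = ["local-model"]
    else:
        plan = ["hosted-model-a"]
    if req.verify:
        # Dual-model redundancy: a different model checks the first answer.
        plan.append("local-verifier" if req.must_stay_on_prem else "hosted-model-b")
    return plan


print(route(Request("What is our refund window?")))
# ['local-model']
print(route(Request("total revenue by region", structured=True)))
# ['sql-translator']
long_question = ("Compare churn drivers across all regions for the last "
                 "eight quarters and explain the main anomalies")
print(route(Request(long_question, verify=True)))
# ['hosted-model-a', 'hosted-model-b']
```

Keeping the routing rules in one explicit function makes them easy to test and change, which is one way to avoid the brittleness mentioned above.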

---

Chat Interface ≠ Always Optimal

When chat shines: It lowers the learning curve for complex tasks such as BI.

When GUI shines: Visual adjustments after results are returned.

Hybrid UX approach:

  • Chat for entry
  • GUI for refinement
  • Mode choice per task

---

What’s Still Missing

1. Context Observability

Systematic tools to see which contexts improve or harm model responses.

2. Composable Memory

Secure, portable, user-organized memory not tied to any provider.

3. Domain-Aware Languages

High-level, constraint-safe DSLs instead of brittle text-to-SQL.

4. Latency-Aware UX

Match response speed to task requirements.
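For the first gap, context observability, one simple technique is leave-one-out ablation: remove each context item in turn and measure how the quality score changes. The sketch below assumes a hypothetical `score_fn` that evaluates a set of context items; in practice it would run the model and an evaluation metric, and the toy scorer here only stands in for that.

```python
def context_attribution(items, score_fn):
    """Leave-one-out ablation: positive delta means the item helped, negative means it hurt."""
    baseline = score_fn(items)
    deltas = {}
    for i, item in enumerate(items):
        without = items[:i] + items[i + 1:]
        deltas[item] = baseline - score_fn(without)
    return deltas


# Toy stand-in for a real evaluation (hypothetical): +1 per helpful item, -1 per harmful one.
HELPFUL = {"refund policy 2025"}
HARMFUL = {"refund policy 2019"}


def toy_score(items):
    return (sum(1 for i in items if i in HELPFUL)
            - sum(1 for i in items if i in HARMFUL))


context = ["refund policy 2025", "refund policy 2019", "office map"]
report = context_attribution(context, toy_score)
print(report)
# {'refund policy 2025': 1, 'refund policy 2019': -1, 'office map': 0}
```

Logging such deltas per request would show which sources consistently help, which hurt, and which are dead weight in the context window.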

---

The Future Moat in Generative AI

Not model access, but:

  • Context quality
  • Memory design
  • Stable orchestration
  • Trustworthy UX

---

5 Key Questions for Founders

  • Context capacity: Optimal window + content strategy
  • Memory boundary: Scope, storage, user inspection
  • Output traceability: Clear input-output linkage
  • Model strategy: Single vs multi, routing logic
  • Trust factors: Security + feedback design

---

Final Insight

Considering context, memory, orchestration, and trust early will define a product’s long-term viability.

Platforms like AiToEarn are exploring integrated ecosystems that combine AI-driven content generation, cross-platform publishing, analytics, orchestration, and AI model ranking, with the aim of helping creators and enterprises operationalize AI with trust, portability, and multi-channel scale.

By Honghao Wang