Drink Some VC | a16z on the “Data Moat”: The Breakthrough Lies in High-Quality Data That Remains Fragmented, Sensitive, or Hard to Access, with Data Sovereignty and Trust Becoming More Crucial
Z Potentials — 2025-11-03 11:58 Beijing
> “High-quality data often sits for long periods in fragmented, highly sensitive, or hard-to-access domains. In these areas, data sovereignty and trust tend to matter more than raw compute or general model capabilities.”

---
📌 Z Highlights
- When infrastructure providers also become competitors, startups must cultivate a walled garden of proprietary data to remain defensible.
- The true moat lies in proprietary data systems painstakingly built over years—far harder to replicate than any model architecture.
- As advantages in scale and compute converge, unique, high-trust datasets become the lasting competitive edge in AI.
---
When Infrastructure Climbs “Up the Stack”
Initially, generative AI companies like OpenAI and Anthropic functioned as infrastructure providers, inviting developers to build applications atop their APIs.
Over time, they climbed further “up the stack”:
- OpenAI’s Sora 2 → no longer just a text-to-video API, but a fully consumer-facing video generation app that competes directly with startups.
- Anthropic’s Claude Teams → not only an enterprise API, but a ready-to-use productivity suite.
Think of these companies as farms:
They used to sell ingredients (models) to restaurants (startups). Now, they also run their own restaurants.
The choice for startups: cook better with the same ingredients, or acquire ingredients that no one else can.
Strategic Question:
> When infrastructure providers are also your toughest competitors, how can a startup build defensibility?
Answer: Build a Walled Garden of Data — a proprietary dataset that rivals cannot easily access because it is:
- Proprietary — not freely available online
- Regulated or sensitive — requiring licensing or compliance
- Curated & dynamic — continuously updated and verified
---
Case Studies
1. vLex — Legal Domain Moat
- Founded in Spain in 2000, vLex aggregated legal rulings, statutes, and administrative orders from fragmented regional sources.
- Over the years, it built a comprehensive, machine-readable legal corpus, akin to LexisNexis, Westlaw, and Bloomberg Law combined.
- Generative AI advantage: its models reason over authoritative, complete, and up-to-date legal texts.
- Moat: proprietary legal database, impossible to replicate quickly.
---
2. OpenEvidence — Medical Domain Moat
- Built extensive partnerships and licensing agreements with publishers and institutions.
- Structured peer-reviewed medical research, systematic reviews, and clinical guidelines.
- AI outputs: evidence-based answers for complex clinical questions.
- Moat: trust and accuracy in life-critical contexts, far beyond public web content.
---
Key Insight:
Trusted, scarce, and domain-specific data ecosystems form the foundation of defensible AI products in law, medicine, research, and niche industries.
---
Potential “Next-Generation” Data Walled Gardens
1. Supply Chain & Logistics
- State: Scattered, low-digitization shipping manifests and customs data.
- Opportunity: Aggregate & clean for predictive logistics and trade risk AI.
- Gap: No complete global trade dataset yet.
2. Local & Municipal Government Records
- State: Permits, zoning, inspections scattered in thousands of systems.
- Opportunity: Standardize into a nationwide proprietary database.
- Gap: No equivalent to LexisNexis for local regulations.
3. Frontier Science Fields
- State: Research data in synthetic biology and quantum materials is scattered.
- Opportunity: Structure experimental data for AI-driven R&D.
- Gap: Highly decentralized, unlike medicine’s centralized ecosystems.
4. Cultural & Creative Archives
- State: Museum and historical assets remain fragmented and undigitized.
- Opportunity: License and digitize for AI cultural heritage applications.
- Gap: Few active AI integration initiatives.
5. Vertical Niche Industry Processes
- State: Veterinary records, blueprints, and niche manufacturing specs remain unstructured.
- Opportunity: Data exclusivity for defensible vertical AI products.
- Gap: Often too “small” for incumbents—ideal for startup focus.
6. Climate & Environmental Data
- State: Emissions/climate risk data split across agencies and NGOs.
- Opportunity: Structure and license for compliance, risk assessment, and renewable energy AI.
- Gap: No “Bloomberg for climate” exists.
---
Why This Matters: Building Defensibility in AI
Large model providers’ advantages:
- Larger scale
- Better compute resources
- Wider distribution
Startup moat potential:
- High-quality data in fragmented, sensitive, or proprietary domains
- Requires long-term investment in licensing and partnerships
- Once established, nearly impossible to replicate
📢 A true “data moat” can be the only durable advantage in AI.
---
Join the Conversation
Are you building the next data moat? Share your story.
Original Article: Fruits of the Walled Garden — Marc Andrusko and Alex Rampell
---
Note: This translation reflects the original source’s ideas without representing Z Potentials’ official stance.
Z Potentials provides insights in AI, robotics, globalization, and more. Join our community to share, learn, and grow.
---
----------- END -----------