Karpathy Forms Large Model “Parliament” with GPT‑5.1, Gemini 3 Pro as Ultimate Think Tank

Large Models Competing Like Fighting Crickets

From short videos to AI models, people's content consumption habits are shifting again — toward speed and efficiency.

---

Changing Reading Habits

When facing long-form articles, academic papers, or large volumes of data, an increasing number of people no longer read start-to-finish.

Instead, they jump straight to high-density, quickly digestible knowledge — often by asking a large language model (LLM) to generate a summary.

A typical example: someone comments “@Yuanbao, summarize this” — a routine interaction in 2025.

This isn’t a flaw.

It’s an upgrade in human capability in the AI era — acquiring information faster and more efficiently than ever.

---

Karpathy’s Admission

Even AI leaders share this habit.

Andrej Karpathy, OpenAI co-founder and former Director of AI at Tesla, recently posted on X (formerly Twitter):

> "I’ve started using LLMs to read everything."

Like many, Karpathy combines personal insights with LLM summaries to deepen understanding.

---

The “LLM Parliament” Concept

With so many LLM options — each excelling in different areas — Karpathy wanted higher-quality results.

So he assembled four leading models into his own multi-model “LLM Parliament”.

How It Works

Karpathy describes the process as ambient computing:

  • Distribute the question to multiple models via OpenRouter:
      • `openai/gpt-5.1`
      • `google/gemini-3-pro-preview`
      • `anthropic/claude-sonnet-4.5`
      • `x-ai/grok-4`
  • Peer review:
      • Each model sees anonymized answers from the others.
      • It reviews and ranks them.
  • Final synthesis:
      • A Chairman LLM uses the ranked answers as context to produce the final output.
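
In code, the fan-out step could look roughly like the minimal sketch below. It assumes OpenRouter's OpenAI-compatible chat completions endpoint, the `openai` Python SDK, and an `OPENROUTER_API_KEY` environment variable; `ask_council` and `COUNCIL_MODELS` are illustrative names, not Karpathy's actual code.

```python
# Minimal sketch of the fan-out step (assumptions: OpenRouter's
# OpenAI-compatible endpoint, the openai Python SDK, and an API key in
# OPENROUTER_API_KEY; the names here are illustrative, not Karpathy's).
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# The four council members listed above, by their OpenRouter slugs.
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]


def ask_council(question: str) -> dict[str, str]:
    """Send the same question to every council model and collect the answers."""
    answers: dict[str, str] = {}
    for model in COUNCIL_MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answers[model] = response.choices[0].message.content
    return answers
```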

---

Comparison to PewDiePie’s Experiment

This is reminiscent of PewDiePie’s “Large Model Committee” experiment, where eight instances of the same model (given different prompts and personalities) produced answers and voted on them.

Difference: Karpathy uses different models, yielding greater diversity.

---

The “Cyber Cricket Fight”

Placing multiple LLM answers side by side, and letting the models vote on them, is like watching a digital debate or an AI cricket fight.

Sometimes one model openly admits another’s answer is better — making this approach both fun and a novel evaluation method.

Example:

When Karpathy reads books with it, the parliament often rates GPT‑5.1 highest and Claude lowest, with Gemini and Grok in between.

Karpathy disagrees slightly: he prefers Gemini’s condensed summaries to GPT‑5.1’s verbosity, and finds Claude overly minimalistic.

---

Multi-Model Workflow Beyond Fun

This collaborative answering style has real applications for content creators and analysts.

Platforms like AiToEarn enable creators worldwide to:

  • Generate AI-assisted content
  • Publish across Douyin, Kwai, WeChat, Bilibili, Xiaohongshu, Facebook, Instagram, LinkedIn, Threads, YouTube, Pinterest, X (Twitter)
  • Analyze performance
  • Rank AI models

Such ecosystems fit naturally with LLM Parliament workflows — merging diverse AI outputs into powerful, cross-platform production pipelines.

---

Karpathy’s LLM Parliament – Three Key Stages

Stage 1: Initial Opinions

  • Send the user’s query to all models.
  • Collect and display responses in a tab view for easy comparison.

Stage 2: Peer Review

  • Each LLM sees anonymized responses from the others.
  • Each ranks them by accuracy and insightfulness.

Stage 3: Final Response

  • The Chairman LLM synthesizes all responses and rankings into the final answer.
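
Continuing the sketch above, stages 2 and 3 might be wired together roughly as follows. The prompts, the `CHAIRMAN_MODEL` choice, and the helper names are assumptions made for illustration; they are not taken from Karpathy's repository.

```python
# Continues the fan-out sketch above (reuses `client` and `COUNCIL_MODELS`).
# Prompts and the chairman choice are illustrative assumptions.
CHAIRMAN_MODEL = "google/gemini-3-pro-preview"  # arbitrary pick for this sketch


def peer_review(question: str, answers: dict[str, str]) -> dict[str, str]:
    """Stage 2: each model ranks the anonymized answers of its peers."""
    # Hide model identities behind neutral labels: Answer A, Answer B, ...
    labeled = {chr(ord("A") + i): text for i, text in enumerate(answers.values())}
    bundle = "\n\n".join(f"Answer {label}:\n{text}" for label, text in labeled.items())
    rankings: dict[str, str] = {}
    for model in COUNCIL_MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": (
                    f"Question: {question}\n\n{bundle}\n\n"
                    "Rank these answers from best to worst by accuracy and "
                    "insight, and briefly justify the order."
                ),
            }],
        )
        rankings[model] = response.choices[0].message.content
    return rankings


def chairman_synthesis(question: str, answers: dict[str, str],
                       rankings: dict[str, str]) -> str:
    """Stage 3: the Chairman model merges answers and rankings into one reply."""
    context = "\n\n".join(
        [f"Original question: {question}"]
        + [f"Council member answer:\n{a}" for a in answers.values()]
        + [f"Council member ranking:\n{r}" for r in rankings.values()]
    )
    response = client.chat.completions.create(
        model=CHAIRMAN_MODEL,
        messages=[{
            "role": "user",
            "content": context + "\n\nWrite the single best final answer.",
        }],
    )
    return response.choices[0].message.content
```

Which model chairs the council is an open design choice; this sketch simply reuses one of the four members.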

---

Could This Be a Benchmark?

Some believe this multi-model council could evolve into a benchmarking tool.

However, the design space for multi-model integration is still wide open and underexplored.

---

Try It Yourself

Karpathy has open-sourced the project.

Note: No support is provided; the code is shared as-is and won’t be updated.

We previously used vibe coding to recreate a similar project with two deployed models.

Should we also consider open-sourcing ours?

---

Integrating Multi-Model Councils with AiToEarn

Tools like AiToEarn can:

  • Connect multi-model councils to content creation pipelines
  • Automate cross-platform publishing
  • Provide analytics and model ranking
  • Support a wider multi-model reasoning system

Full resource list: AiToEarn Docs →
