AGI Now Has a Quantitative Benchmark: Bengio‑Led Team Puts Progress at 58%
Defining AGI: Moving Beyond "Human‑Like Intelligence"
For decades, Artificial General Intelligence (AGI) has been loosely described as “human‑like intelligence” — often framed as “as smart as a human”.
But how smart is that, exactly?
Recently, Turing Award laureate Yoshua Bengio, alongside the Center for AI Safety, the University of California, Berkeley, and other institutions, proposed a new, measurable definition in their work: “A Definition of AGI.”
> AGI is artificial intelligence that can match or exceed the cognitive versatility and proficiency of a well‑educated adult.

---
Core Components of the Definition
The paper’s definition focuses on two key dimensions:
- A Clear Reference Point: Using “well‑educated adult” as the benchmark avoids vague terms like “superhuman” and provides concrete, testable criteria.
- Comprehensive Competence: AGI should meet benchmarks across multiple core cognitive domains, showing balanced ability in reasoning, memory, perception, and more rather than excelling at a single task.
---
From Concept to Measurement
To make AGI measurable, the team built a quantitative evaluation framework based on the Cattell–Horn–Carroll (CHC) theory — a well‑established psychological model of human intelligence.

The CHC Model's Cognitive Domains
The CHC model breaks down general intelligence into 10 interrelated cognitive domains, ranging from perception to advanced reasoning. Researchers adapted classic human test questions to suit AI, removing items dependent on human physiology (like touch) or context-specific scenarios (like driving).
Final AGI Evaluation Domains:
- Knowledge (K) – Common sense, sciences, history, culture
- Reading/Writing (RW) – Comprehension, expression, composition
- Mathematics (M) – Calculation, quantitative reasoning
- Reasoning (R) – Logic, abstract problem‑solving
- Working Memory (WM) – Short‑term retention and processing
- Long‑Term Memory Storage (MS) – Stable knowledge preservation
- Long‑Term Memory Retrieval (MR) – Efficient recall from storage
- Visual (V) – Image recognition, spatial understanding
- Auditory (A) – Speech and sound comprehension
- Speed (S) – Rapid execution of simple cognitive tasks
---
Scoring System
Each domain is scored out of 10 points for a total of 100.
A score of 100 represents AGI‑level ability.
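The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the paper's evaluation code: the function name `total_score` and the example scores are assumptions made up for this sketch, and only the domain codes and the 10‑points‑per‑domain rule come from the article.

```python
# Domain codes follow the list in the article; everything else is illustrative.
DOMAINS = ("K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S")
MAX_PER_DOMAIN = 10  # each domain is scored out of 10, for a maximum of 100


def total_score(scores: dict[str, float]) -> float:
    """Sum per-domain scores after checking all ten domains are present and in range."""
    missing = [d for d in DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if not 0 <= scores[domain] <= MAX_PER_DOMAIN:
            raise ValueError(f"{domain} score out of range: {scores[domain]}")
    return sum(scores[d] for d in DOMAINS)


# Made-up scores for a hypothetical model, NOT figures from the paper.
example = {"K": 9, "RW": 9, "M": 8, "R": 5, "WM": 4,
           "MS": 0, "MR": 3, "V": 3, "A": 2, "S": 3}
print(total_score(example))  # 46
```

Validating that every domain is present before summing keeps a missing domain from being silently treated as a partial total.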
Benchmark Results
| Model | Score | Relative to AGI |
|-----------------|-------|-----------------|
| GPT‑4 (2023) | 27 | Far below AGI |
| GPT‑5 (2025) | 58 | Incomplete AGI |
In just two years, the total score rose from 27 to 58, an increase of about 115%, yet GPT‑5 still scores 0 in long‑term memory storage, a critical weakness.
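The 115% figure is the relative increase between the two totals in the table. A minimal check of that arithmetic, with `percent_increase` as a helper name chosen for this sketch:

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


gpt4_total = 27  # GPT-4 (2023), from the table above
gpt5_total = 58  # GPT-5 (2025), from the table above

# (58 - 27) / 27 is roughly 1.148, which rounds to 115%.
print(round(percent_increase(gpt4_total, gpt5_total)))  # 115
```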

---
Uneven Capabilities
Experiments reveal imbalances across domains:

- Strongest Areas: Knowledge, Reading/Writing, Mathematics
- Weakest Areas: Perception (Visual/Auditory), Long‑Term Memory, and certain forms of reasoning
---
Domain‑Specific Performance
Strengths: Textual & Symbolic Mastery
- Knowledge (K): GPT‑5 scores above 8
- Reading/Writing (RW): Above 8
- Mathematics (M): Above 8



These strengths reflect pattern matching over massive text datasets, the kind of task that large language model training is well suited to.
---
Weaknesses: Perception & Memory
Visual (V) & Auditory (A) Performance


- GPT‑4: No image or sound processing
- GPT‑5: Basic cat/dog classification, simple speech‑to‑text
- Still far from human‑level complex scene interpretation or emotional recognition
Long‑Term Memory Deficits


- Cannot reliably store and recall information over extended periods
- Context window expansion only extends short‑term working memory, not true long‑term capability
- Online search augments knowledge but introduces the risk of hallucinated or outdated information
---
Why External Tools Don't Count
This AGI evaluation excludes:
- Browser / internet search assistance
- External APIs
- Memory extensions beyond native capacity
The focus is purely on intrinsic cognitive ability.
No model with a zero score in a core domain can be considered AGI, regardless of its overall performance.
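The zero‑score rule can be read as a gate applied on top of the total. The sketch below is one possible way to express it, not the paper's method: the names `zero_domains` and `counts_as_agi`, the `required_total` parameter, and the example scores are all assumptions introduced for illustration.

```python
DOMAINS = ("K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S")


def zero_domains(scores: dict[str, float]) -> list[str]:
    """Return the domains where the model scores 0 (a missing domain counts as 0)."""
    return [d for d in DOMAINS if scores.get(d, 0) == 0]


def counts_as_agi(scores: dict[str, float], required_total: float = 100) -> bool:
    """Apply both conditions: the total reaches the bar AND no core domain is zero."""
    total = sum(scores.get(d, 0) for d in DOMAINS)
    return total >= required_total and not zero_domains(scores)


# Hypothetical model: perfect everywhere except long-term memory storage.
# These numbers are invented for the example, not taken from the paper.
example = {d: 10 for d in DOMAINS}
example["MS"] = 0

print(zero_domains(example))                      # ['MS']
print(counts_as_agi(example, required_total=90))  # False
```

Even with the bar hypothetically lowered to 90, the single zero in MS blocks the model, which is the point of the rule: a high overall total cannot compensate for a missing core ability.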
---
Outlook
With a measurable definition now available, the question shifts from “What is AGI?” to “When will we reach it?”.
Paper link: https://www.agidefinition.ai/paper.pdf
Reference: https://x.com/DanHendrycks/status/1978828377269117007