
AGI Now Has a Quantitative Benchmark! Led by Bengio, Progress at 58%

Honghao Wang

17 Oct 2025

Defining AGI: Moving Beyond "Human‑Like Intelligence"

For decades, Artificial General Intelligence (AGI) has been loosely described as “human‑like intelligence” — often framed as “as smart as a human”.

But how smart is that, exactly?

Recently, Turing Award laureate Yoshua Bengio, together with the Center for AI Safety, the University of California, Berkeley, and other institutions, proposed a new, measurable definition in the paper “A Definition of AGI”:

> AGI is artificial intelligence that can match or exceed the cognitive versatility and proficiency of a well‑educated adult.

---

Core Components of the Definition

The paper’s definition focuses on two key dimensions:

A Clear Reference Point
Using “well‑educated adult” as the benchmark avoids vague terms like “superhuman” and provides concrete, testable criteria.
Comprehensive Competence
AGI should meet benchmarks across multiple core cognitive domains — not just excel at one task but display balanced ability in reasoning, memory, perception, and more.

---

From Concept to Measurement

To make AGI measurable, the team built a quantitative evaluation framework based on the Cattell–Horn–Carroll (CHC) theory — a well‑established psychological model of human intelligence.

The CHC Model's Cognitive Domains

Building on the CHC model, the framework breaks general intelligence into 10 interrelated cognitive domains, ranging from perception to advanced reasoning. The researchers adapted classic human test questions for AI, removing items that depend on human physiology (such as touch) or on context-specific scenarios (such as driving).

Final AGI Evaluation Domains:

Knowledge (K) – Common sense, sciences, history, culture
Reading/Writing (RW) – Comprehension, expression, composition
Mathematics (M) – Calculation, quantitative reasoning
Reasoning (R) – Logic, abstract problem‑solving
Working Memory (WM) – Short‑term retention and processing
Long‑Term Memory Storage (MS) – Stable knowledge preservation
Long‑Term Memory Retrieval (MR) – Efficient recall from storage
Visual (V) – Image recognition, spatial understanding
Auditory (A) – Speech and sound comprehension
Speed (S) – Rapid execution of simple cognitive tasks

---

Scoring System

Each domain is scored out of 10 points for a total of 100.

A score of 100 represents AGI‑level ability.
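The scoring rule above amounts to a simple sum. The sketch below assumes each domain score is a number from 0 to 10 and that the total is the unweighted sum of the ten domains; the domain codes come from the list above, while the function name `agi_score` and the example profile are hypothetical and not taken from the paper.

```python
from typing import Mapping

# Domain codes from the evaluation list; each domain is worth up to 10 points.
DOMAINS = ("K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S")
MAX_PER_DOMAIN = 10.0


def agi_score(scores: Mapping[str, float]) -> float:
    """Sum ten per-domain scores (each 0..10) into a total out of 100.

    Hypothetical helper: assumes an unweighted sum, as described in the text.
    """
    missing = set(DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    total = 0.0
    for domain in DOMAINS:
        value = scores[domain]
        if not 0.0 <= value <= MAX_PER_DOMAIN:
            raise ValueError(f"{domain} score {value} outside 0..{MAX_PER_DOMAIN}")
        total += value
    return total


# Hypothetical profile: perfect in every domain except long-term memory storage.
profile = {d: 10.0 for d in DOMAINS}
profile["MS"] = 0.0
print(agi_score(profile))  # 90.0
```

Keeping the domains in a fixed tuple turns a missing domain into an explicit error rather than a silent zero.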

Benchmark Results

| Model | AGI Score (out of 100) | Relative to AGI |
|-----------------|-------|-----------------|
| GPT‑4 (2023) | 27 | Far below AGI |
| GPT‑5 (2025) | 58 | Incomplete AGI |

In just two years, the score rose from 27 to 58, an increase of about 115%, yet GPT‑5 still scores 0 in long‑term memory storage, a critical weakness.
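The 115% figure is the relative increase between the two totals in the table:

```latex
\frac{58 - 27}{27} = \frac{31}{27} \approx 1.148 \approx 115\%
```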

---

Uneven Capabilities

Experiments reveal imbalances across domains:

Strongest Areas: Knowledge, Reading/Writing, Mathematics
Weakest Areas: Perception (Visual/Auditory), Long‑Term Memory, and certain forms of reasoning

---

Domain‑Specific Performance

Strengths: Textual & Symbolic Mastery

Knowledge (K): GPT‑5 scores above 8 (out of 10)
Reading/Writing (RW): Above 8
Mathematics (M): Above 8

These strengths reflect pattern matching over massive datasets, exactly the kind of task that large language model training is well suited to.

---

Weaknesses: Perception & Memory

Visual (V) & Auditory (A) Performance

GPT‑4: No image or sound processing
GPT‑5: Basic cat/dog classification, simple speech‑to‑text
GPT‑5 remains far from human‑level interpretation of complex scenes or recognition of emotions

Long‑Term Memory Deficits

Cannot reliably store and recall information over extended periods
Context window expansion only extends short‑term working memory, not true long‑term capability
Online search can supplement knowledge but introduces risks of hallucination and outdated data
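The difference between a longer context window and genuine long‑term memory can be pictured as a bounded buffer. The sketch below is an analogy, not a description of how any particular model works: it assumes a context that holds a fixed number of items (the hypothetical `ContextWindow` class) and drops the oldest ones once full, so a larger capacity delays forgetting but never makes anything persist.

```python
from collections import deque


class ContextWindow:
    """Hypothetical fixed-capacity context, used only as an analogy for working memory."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards the oldest item when a new one is appended to a full buffer.
        self._items = deque(maxlen=capacity)

    def add(self, item: str) -> None:
        self._items.append(item)

    def recall(self, item: str) -> bool:
        # Only items still inside the window can be "remembered".
        return item in self._items


window = ContextWindow(capacity=3)
for fact in ["name: Ada", "city: Paris", "pet: cat", "job: engineer"]:
    window.add(fact)

print(window.recall("name: Ada"))      # False: pushed out of the window
print(window.recall("job: engineer"))  # True
```

Raising `capacity` only moves the point at which the earliest facts fall out; a fresh `ContextWindow` still starts empty, which is the gap that long‑term memory storage is meant to fill.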

---

Why External Tools Don't Count

This AGI evaluation excludes:

Browser / internet search assistance
External APIs
Memory extensions beyond native capacity

The focus is purely on intrinsic cognitive ability.

No model with a zero score in a core domain can be considered AGI, regardless of its overall performance.
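Combined with the 100‑point scale, this zero‑domain rule works as a simple gate. The sketch assumes the same 0‑to‑10 per‑domain scale; the names `zero_domains` and `qualifies_as_agi` and the sample profile are hypothetical, and using a total of 100 as the threshold follows this article's scoring description rather than any code released with the paper.

```python
from typing import Mapping

DOMAINS = ("K", "RW", "M", "R", "WM", "MS", "MR", "V", "A", "S")


def zero_domains(scores: Mapping[str, float]) -> list[str]:
    """Return the domains scored exactly zero (a missing domain counts as zero)."""
    return [d for d in DOMAINS if scores.get(d, 0.0) == 0.0]


def qualifies_as_agi(scores: Mapping[str, float], threshold: float = 100.0) -> bool:
    """Hypothetical gate: no zero-score domain AND a total at or above the threshold."""
    if zero_domains(scores):
        return False
    total = sum(scores.get(d, 0.0) for d in DOMAINS)
    return total >= threshold


# Hypothetical profile: strong everywhere, but nothing in long-term memory storage.
lopsided = {d: 10.0 for d in DOMAINS}
lopsided["MS"] = 0.0
print(zero_domains(lopsided))      # ['MS']
print(qualifies_as_agi(lopsided))  # False
```

With a threshold of 100 the zero check is technically implied, since a zero anywhere caps the total at 90; keeping it explicit means the rule still holds if a lower threshold is ever used.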

---

Outlook

With a measurable definition now available, the question shifts from “What is AGI?” to “When will we reach it?”

Paper link: https://www.agidefinition.ai/paper.pdf

Reference: https://x.com/DanHendrycks/status/1978828377269117007
