# $100 Cost, 8,000 Lines of Code to Handcraft a ChatGPT Clone
## Introduction
**Andrej Karpathy**, former Tesla AI Director, OpenAI founding member, and respected AI educator, has released a new course project: **nanochat**.
Karpathy describes it as one of his most “chaotic” and free-spirited creations — a **minimalist, from-scratch, full-stack training/inference pipeline** for building a ChatGPT-like model.
Simply spin up a cloud GPU server, run one script, and **within 4 hours** you can chat with your own trained LLM through a ChatGPT-style web UI.

---
## Core Capabilities
This ~8,000-line project supports:
- **Tokenizer training** in Rust, implemented from scratch.
- **Pretraining** a Transformer architecture LLM on the FineWeb dataset, evaluating CORE metrics.
- **Midtraining** with SmolTalk dialogue data, multiple-choice datasets, and tool-use datasets.
- **Supervised fine-tuning (SFT)** on high-quality dialogue data, evaluated on ARC-E/C, GSM8K, and HumanEval.
- Optional **Reinforcement Learning (RL)** with the GRPO algorithm on GSM8K.
- **Inference engine** with KV caching, prefill/decode, and tool use (Python interpreter), accessible via CLI or web UI (a KV-cache sketch follows this list).
- Automatic generation of a single **Markdown report card** documenting training runs, metrics, and progress.
💡 **Cost**: Around **$100** (4 hours training on 8×H100 GPUs).
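
The inference-engine bullet is the most involved piece. As a rough illustration of the KV-cache idea (not nanochat's actual engine): prefill runs attention over the whole prompt once and stores the keys and values, and decode then processes one new token at a time against the cached tensors.

```python
# Illustrative sketch of KV-cached decoding (not nanochat's actual engine).
# Shapes: (batch, heads, time, head_dim). A real engine does this per layer.
import torch
import torch.nn.functional as F

def prefill(q, k, v):
    """Attend over the full prompt once and return output plus the cache."""
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out, (k, v)                      # cache keys/values for decode

def decode_step(q_new, k_new, v_new, cache):
    """Process a single new token against everything cached so far."""
    k_cat = torch.cat([cache[0], k_new], dim=2)
    v_cat = torch.cat([cache[1], v_new], dim=2)
    out = F.scaled_dot_product_attention(q_new, k_cat, v_cat)  # single query: no mask needed
    return out, (k_cat, v_cat)

B, H, T, D = 1, 10, 16, 128                 # batch, heads, prompt length, head dim
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
out, cache = prefill(q, k, v)
q1, k1, v1 = (torch.randn(B, H, 1, D) for _ in range(3))
out1, cache = decode_step(q1, k1, v1, cache)   # one decode step; repeat per generated token
```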

---
## Performance Highlights

- **After ~12 hours** of training, nanochat surpasses GPT‑2 on the CORE metric.
- A **~$1,000 budget (~41.6 hours of training)** yields noticeably better coherence: the model can solve simple math and code problems and answer multiple-choice questions.
- Example: a depth‑30 model trained for ~24 hours (roughly the training FLOPs of GPT‑3 Small 125M) scores:
  - **MMLU:** 40+
  - **ARC-Easy:** 70+
  - **GSM8K:** 20+
---
## Project Vision
Karpathy’s aim: **Consolidate** a readable, modifiable, minimal tech stack for LLMs into one repo.
> "It may grow into a research toolkit or benchmarking tool — much like nanoGPT before it."
**nanochat** acts as the capstone project for the upcoming **LLM101n course**.
---
## Popularity
- Immediate community excitement — GitHub repo reached **4.8K stars** quickly.



---
## Architecture Overview
Karpathy notes:
- Architecture: similar to **LLaMA**, but simplified (a rough sketch follows below).
- Inspiration: borrows some features from **modded-nanoGPT**.
- The code is essentially handwritten; attempts to lean on AI coding agents (Claude, Codex) gave poor results.
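
The write-up doesn't spell out the block structure, so the following is only a hedged sketch of what "LLaMA-like, simplified" usually means in practice: RMSNorm, rotary position embeddings, and a bias-free attention + MLP block. nanochat's actual choices (norm placement, MLP flavor, the modded-nanoGPT tricks) may differ.

```python
# Hedged sketch of a simplified LLaMA-style Transformer block; nanochat's
# actual block (norms, MLP flavor, biases) may differ in the details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):                       # x: (B, H, T, head_dim)
    return x * cos + rotate_half(x) * sin

def rope_tables(T, head_dim, base=10000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(T).float(), inv_freq)    # (T, head_dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)                    # (T, head_dim)
    return emb.cos()[None, None], emb.sin()[None, None]        # (1, 1, T, head_dim)

class Block(nn.Module):
    def __init__(self, dim=1280, n_heads=10):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads  # 10 heads x 128 dims
        self.attn_norm, self.mlp_norm = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim, bias=False),
                                 nn.ReLU(),
                                 nn.Linear(4 * dim, dim, bias=False))
    def forward(self, x, cos, sin):
        B, T, C = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(B, T, C))
        return x + self.mlp(self.mlp_norm(x))

x = torch.randn(2, 16, 1280)
cos, sin = rope_tables(T=16, head_dim=128)
print(Block()(x, cos, sin).shape)                  # torch.Size([2, 16, 1280])
```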


---
# Quick-Start Guide
Launching an **8×H100 GPU node** on Lambda GPU Cloud costs roughly **$24/hour**, so time efficiency matters.
---
## Step 1: Clone the Project
Clone the nanochat repository (https://github.com/karpathy/nanochat) onto the GPU server and `cd` into it.

---
## Step 2: Install Dependencies
1. Install **uv** project manager.
2. Create and activate a `.venv` virtual environment.
3. Ensure `python` maps to the venv environment.

---
## Step 3: Install Rust/Cargo
Required to **compile the custom Rust tokenizer**.
Karpathy found:
- The pure-Python `minbpe` tokenizer too slow for training.
- The Hugging Face tokenizers library too bloated and confusing.

The custom Rust tokenizer therefore handles training (its output matches the Python reference version), while inference uses OpenAI's `tiktoken` for efficiency.
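
For a feel of the tiktoken-style encode/decode path, here is a tiny round-trip using the stock GPT-2 encoding as a stand-in; nanochat trains its own 65,536-token vocabulary, so this is purely illustrative.

```python
# Illustrative only: uses tiktoken's stock GPT-2 encoding as a stand-in
# for nanochat's custom 65,536-token vocabulary.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("Byte-level BPE round-trips arbitrary text losslessly.")
print(ids[:8], "...")            # token ids
print(enc.decode(ids))           # the exact original string comes back
```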

---
## Step 4: Download Pretraining Data
Dataset: **FineWeb‑EDU** (Karpathy's repackaged version with shuffled shards).
- **Shard size:** ~250M characters (~100MB gzipped).
- **Total shards:** 1,822.
- **For depth‑20:** 240 shards are needed (~24GB total; a quick arithmetic check follows below).
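
A back-of-envelope check of the 240-shard figure, using the ~11.2B-token budget quoted later in Step 8 and the ~4.8 chars/token compression ratio from Step 5 (my arithmetic, not taken from the repo):

```python
# Back-of-envelope check of the 240-shard figure (my arithmetic, not the repo's).
tokens_needed   = 11.2e9          # Chinchilla-style budget quoted in Step 8
chars_per_token = 4.8             # compression ratio quoted in Step 5
chars_per_shard = 250e6           # ~250M characters per shard

chars_needed  = tokens_needed * chars_per_token          # ~54B characters
shards_needed = chars_needed / chars_per_shard           # ~215 shards
print(round(shards_needed))       # ~215 -> the guide grabs 240 for headroom
```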

---
## Step 5: Train the Tokenizer
- **Vocab size:** 65,536 (2¹⁶ tokens).
- **Training time:** ~1 minute.
- Algorithm: regex splitting plus byte-level BPE, the same approach as OpenAI's tokenizers (a minimal sketch follows this list).
- Evaluates the compression ratio (~4.8 characters per token).
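
As a minimal sketch of what the trainer does conceptually, here is a greedy byte-pair merge loop in Python; the real implementation regex-splits text into chunks first and is written in Rust precisely because this naive version is slow.

```python
# Minimal byte-level BPE training sketch; illustrative only. The real trainer
# regex-splits text first and is implemented in Rust for speed.
from collections import Counter

def train_bpe(text: str, num_merges: int):
    ids = list(text.encode("utf-8"))          # start from raw bytes (0..255)
    merges, next_id = {}, 256                 # (pair) -> new token id
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]     # greedily merge the most frequent pair
        merges[pair] = next_id
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id); i += 2
            else:
                out.append(ids[i]); i += 1
        ids, next_id = out, next_id + 1
    return merges, ids

text = "low lower lowest " * 50
merges, ids = train_bpe(text, num_merges=20)
print(f"compression ratio: {len(text) / len(ids):.2f} chars/token")
```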
---
## Step 6: Download Eval Bundle
- Location: `~/.cache/nanochat/eval_bundle`
- Contains 22 datasets for CORE evaluation (ARC, HellaSwag, LAMBADA, etc.).

---
## Step 7: Configure WandB *(Optional)*

---
## Step 8: Pretraining
- Script: `scripts/base_train.py`
- Setup:
  - Depth‑20 Transformer
  - 1,280 channels, 10 attention heads × 128 dims each
  - ~560M parameters
  - Token budget follows Chinchilla scaling (~20 tokens per parameter, ~11.2B tokens); a quick sanity check of these numbers follows below

💡 **Pretraining time**: ~3 hours.
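
These numbers hang together. A quick sanity check (my arithmetic, assuming untied input/output embeddings and a standard 4× MLP, which may not be nanochat's exact layout):

```python
# Rough parameter-count sanity check for the depth-20 model
# (my arithmetic; assumes untied embeddings and a 4x MLP).
d_model, n_layers, vocab = 1280, 20, 65536

attn_params = 4 * d_model * d_model           # Wq, Wk, Wv, Wo
mlp_params  = 2 * 4 * d_model * d_model       # up- and down-projection
per_layer   = attn_params + mlp_params        # ~19.7M
emb_params  = 2 * vocab * d_model             # token embedding + LM head
total       = n_layers * per_layer + emb_params
print(f"~{total / 1e6:.0f}M parameters")          # ~561M, close to the quoted ~560M

chinchilla_tokens = 20 * total                    # ~20 tokens per parameter
print(f"~{chinchilla_tokens / 1e9:.1f}B tokens")  # ~11.2B, matching the quoted budget
```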
---
## Training Metrics
- **Bits per byte (bpb):** a tokenizer-independent loss measure (a conversion sketch follows below).
- Example result: bpb ≈ 0.81, CORE ≈ 0.22 (outperforms GPT‑2 Large).
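
Bits per byte is ordinary cross-entropy rescaled: convert the per-token loss from nats to bits (divide by ln 2), then divide by the average number of UTF-8 bytes each token covers, roughly the ~4.8 chars/token from Step 5 for mostly-ASCII text. A small sketch with an illustrative loss value chosen to line up with the quoted figures:

```python
# Convert mean next-token cross-entropy (in nats) to bits per byte.
# Tokenizer-independent: a tokenizer that packs more bytes into each token
# doesn't get credit just for emitting fewer, easier-to-predict tokens.
import math

def bits_per_byte(mean_loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * total_tokens / total_bytes

# Illustrative: ~2.7 nats/token at ~4.8 bytes/token gives ~0.81 bpb.
print(round(bits_per_byte(2.7, total_tokens=1_000, total_bytes=4_800), 2))
```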


---
## Step 9: Mid-Phase Training
Purpose:
- Adapt the base model to the special tokens used for multi-turn conversation.
- Learn the multiple-choice answer format.
- Learn tool use, i.e. running Python between `<|python_start|>` and `<|python_end|>` (a toy rendering follows below).
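
Only the `<|python_start|>` / `<|python_end|>` markers are named here; the other tags in the toy rendering below are hypothetical placeholders, meant only to show the flavor of flattening a conversation plus a tool call into a single token stream.

```python
# Hypothetical rendering of a conversation into one training string.
# Only <|python_start|> / <|python_end|> come from the write-up; the
# role tags are placeholders, not nanochat's real special tokens.
turns = [
    ("user", "What is 19 * 23?"),
    ("assistant", "<|python_start|>print(19 * 23)<|python_end|>437"),
]

def render(turns):
    return "".join(f"<|{role}_start|>{text}<|{role}_end|>" for role, text in turns)

print(render(turns))
```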

---
### Midtraining Process

Run the midtraining script on the mixture described above.

**Duration:** ~8 minutes.
---
## Step 10: Supervised Fine-Tuning (SFT)
- Fine-tune on high-quality dialogue data.
- Align the training format with the format seen at inference time.

**Duration:** ~7 minutes.
Yields slight metric improvements over the midtrained checkpoint.
---
## Step 11: Chat Interaction
Two methods:
1. CLI
2. Web UI (`chat_web` via FastAPI)
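
The write-up doesn't document `chat_web`'s API, so the snippet below is not nanochat's actual server, just a minimal illustration of what a FastAPI chat endpoint looks like, with a stubbed `generate` standing in for the real inference engine.

```python
# Minimal illustration of a FastAPI chat endpoint; not nanochat's actual API.
# `generate` is a stub standing in for the real inference engine.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

def generate(prompt: str, max_tokens: int) -> str:   # stub
    return f"(model reply to: {prompt!r})"

@app.post("/chat")
def chat(req: ChatRequest):
    return {"reply": generate(req.prompt, req.max_tokens)}

# Run with: uvicorn this_module:app --port 8000
```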


---
## Step 12: Reinforcement Learning (Optional)
- Dataset: GSM8K.
- Simplified GRPO-style loop that rewards correct final answers (a toy sketch of the advantage step follows below).
- Runtime: ~1.5 hours.
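
As a toy sketch of the group-relative idea (not nanochat's actual loop): sample several completions per GSM8K problem, reward the ones whose final answer is correct, and use each completion's reward minus the group mean, scaled by the group standard deviation, as its advantage.

```python
# Toy sketch of the group-relative advantage step in a GRPO-style loop;
# not nanochat's actual implementation.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: one scalar per sampled completion for the SAME prompt."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# 4 sampled completions for one GSM8K problem: two got the right answer.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # positive for correct, negative for incorrect
# These advantages then weight the policy-gradient loss on each completion's tokens.
```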

---
## Final Report
- **Characters:** 333,989
- **Lines:** 8,304
- **Files:** 44
- **Tokens:** ~83,497
- **Dependencies:** 2,004 lines in uv.lock
- **Total wall-clock time (through SFT):** 3h 51m → **~$92.4 cost** at $24/hour

---
# Model Scaling Tips
- Scale the model with the `--depth` parameter of `base_train.py`.
- Larger depths need a smaller per-device batch size (e.g., 32 → 16).
- The repo keeps the effective batch size constant by compensating with gradient accumulation (see the sketch below).
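
The batch-size note boils down to keeping the total rows per optimizer step constant: if the per-device batch has to shrink to fit memory, the number of gradient-accumulation micro-steps grows to compensate. A sketch of that arithmetic with illustrative numbers (not the repo's exact defaults):

```python
# Illustrative gradient-accumulation arithmetic (not the repo's exact defaults).
def grad_accum_steps(total_batch_rows: int, device_batch_rows: int, world_size: int) -> int:
    rows_per_micro_step = device_batch_rows * world_size
    assert total_batch_rows % rows_per_micro_step == 0, "batch sizes must divide evenly"
    return total_batch_rows // rows_per_micro_step

# Same effective batch of 512 rows on 8 GPUs:
print(grad_accum_steps(512, device_batch_rows=32, world_size=8))  # 2 micro-steps
print(grad_accum_steps(512, device_batch_rows=16, world_size=8))  # 4 micro-steps (deeper model)
```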


---
## About Karpathy

- Former Tesla AI Director → OpenAI → founded **Eureka Labs** in 2024.
- Courses: **CS231n** at Stanford, influential blog posts, YouTube tutorials.
- Vision: **AI-native education** combining human teachers + patient, multilingual AI guides.
---
### Eureka Labs' First Course: **LLM101n**
> Build a story-generating LLM similar to ChatGPT with companion web app.

---
**Resources**
GitHub repo: https://github.com/karpathy/nanochat
Discussion guide: https://github.com/karpathy/nanochat/discussions/1
Karpathy tweet: https://x.com/karpathy/status/1977755427569111362
---