# nanochat-d20-sft

Supervised fine-tuned language model from the nanochat project by Andrej Karpathy.

> "The best ChatGPT that $100 can buy"

## Model Details
- Architecture: GPT-based transformer
- Parameters: 561M
- Layers: 20
- Embedding dimension: 1280
- Attention heads: 10
- KV heads: 10
- Context length: 2048 tokens
- Vocabulary size: 65,536 tokens (BPE tokenizer)
- Training step: 700
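
The "d20" in the model name refers to the 20-layer depth. For reference, the hyperparameters above can be collected into a single config along these lines (a sketch only; the field names are illustrative and may not match nanochat's actual model config):

```python
# Illustrative d20 configuration assembled from the numbers above.
# Field names are hypothetical, not necessarily nanochat's exact ones.
d20_config = dict(
    n_layer=20,         # transformer blocks
    n_embd=1280,        # embedding dimension
    n_head=10,          # attention heads
    n_kv_head=10,       # KV heads (equal to n_head, i.e. standard MHA)
    sequence_len=2048,  # context length in tokens
    vocab_size=65536,   # BPE vocabulary size
)
```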
## Training
This model was trained using the nanochat pipeline, which includes:
- Base pretraining on web text (FineWeb)
- Midtraining on curated datasets
- Supervised fine-tuning (SFT) on conversational data (SmolTalk)
Training cost: about $100 (roughly 4 hours total on a single 8xH100 node)
## Evaluation Scores (at step 700)
- Validation Loss: 1.015
- MMLU: 33.9%
- ARC-Easy: 48.5%
## Usage

### Quick Start with nanochat
```bash
# Clone nanochat and store its path
git clone https://github.com/karpathy/nanochat
cd nanochat
NANOCHAT_DIR=$(pwd)

# Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

# Download the tokenizer and checkpoint files
mkdir -p ~/.cache/nanochat/chatsft_checkpoints/d20
mkdir -p ~/.cache/nanochat/tokenizer
cd ~/.cache/nanochat/tokenizer
wget https://huggingface.co/gaviego/nanochat/resolve/main/tokenizer.pkl
cd ~/.cache/nanochat/chatsft_checkpoints/d20
wget https://huggingface.co/gaviego/nanochat/resolve/main/model_000700.pt
wget https://huggingface.co/gaviego/nanochat/resolve/main/meta_000700.json

# Return to the nanochat directory and start the chat CLI
cd "$NANOCHAT_DIR"
source .venv/bin/activate
python -m scripts.chat_cli
```
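
Before starting the CLI, it can help to confirm the downloads landed where nanochat's defaults expect them. A minimal check (the paths simply mirror the `wget` commands above):

```python
from pathlib import Path

# Expected locations, mirroring the Quick Start download commands
cache = Path.home() / ".cache" / "nanochat"
expected = [
    cache / "tokenizer" / "tokenizer.pkl",
    cache / "chatsft_checkpoints" / "d20" / "model_000700.pt",
    cache / "chatsft_checkpoints" / "d20" / "meta_000700.json",
]
for path in expected:
    print("ok" if path.is_file() else "MISSING", path)
```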
### Load Programmatically
```python
import os

import torch

from nanochat.checkpoint_manager import build_model
from nanochat.engine import Engine

# Path to the checkpoint directory (expand "~" explicitly; Python does
# not expand it automatically in string paths)
checkpoint_dir = os.path.expanduser("~/.cache/nanochat/chatsft_checkpoints/d20")
step = 700
device = torch.device("cpu")  # or "cuda" if a GPU is available

# Load the model, tokenizer, and training metadata
model, tokenizer, meta = build_model(checkpoint_dir, step, device, phase="eval")

# Generate text
engine = Engine(model, tokenizer, device=device)
response = engine.generate("Hello, how are you?", max_new_tokens=100)
print(response)
```
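
This snippet assumes the files are already in the cache locations from the Quick Start; in particular, the Quick Start places `tokenizer.pkl` under `~/.cache/nanochat/tokenizer`, which appears to be where nanochat resolves the tokenizer from, so download it first.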
### Download with Hugging Face Hub
```python
from huggingface_hub import hf_hub_download

# Download the checkpoint and its metadata
model_path = hf_hub_download(
    repo_id="gaviego/nanochat",
    filename="model_000700.pt",
)
meta_path = hf_hub_download(
    repo_id="gaviego/nanochat",
    filename="meta_000700.json",
)
```
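
To fetch everything the model needs, including the tokenizer, directly into the cache layout from the Quick Start, something along these lines should work (a sketch; `local_dir` tells `hf_hub_download` where to place each file):

```python
from pathlib import Path

from huggingface_hub import hf_hub_download

# Mirror the cache layout nanochat expects (see Quick Start above)
cache = Path.home() / ".cache" / "nanochat"

hf_hub_download(
    repo_id="gaviego/nanochat",
    filename="tokenizer.pkl",
    local_dir=cache / "tokenizer",
)
for fname in ("model_000700.pt", "meta_000700.json"):
    hf_hub_download(
        repo_id="gaviego/nanochat",
        filename=fname,
        local_dir=cache / "chatsft_checkpoints" / "d20",
    )
```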
## Files

- `model_000700.pt` (2.0 GB) - PyTorch model checkpoint with weights
- `meta_000700.json` (264 bytes) - Training metadata and hyperparameters
- `tokenizer.pkl` - BPE tokenizer (used in the download steps above)
## Limitations
As a micro-scale language model trained on a $100 budget:
- Mistakes and hallucinations are common
- Limited reasoning capabilities compared to modern LLMs
- Best suited for educational purposes and experimentation
- Performance roughly comparable to early GPT-2 era models
## Citation
If you use this model, please cite the nanochat project:
```bibtex
@misc{nanochat,
  author    = {Andrej Karpathy},
  title     = {nanochat: The best ChatGPT that $100 can buy},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/karpathy/nanochat}
}
```
## License
MIT License (same as nanochat)