nanochat-d20-sft

Supervised fine-tuned language model from the nanochat project by Andrej Karpathy.

"The best ChatGPT that $100 can buy"

Model Details

  • Architecture: GPT-based transformer
  • Parameters: 561M
  • Layers: 20
  • Embedding dimension: 1280
  • Attention heads: 10
  • KV heads: 10
  • Context length: 2048 tokens
  • Vocabulary size: 65,536 tokens (BPE tokenizer)
  • Training step: 700
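
The 561M figure can be roughly reproduced from the dimensions listed above. The sketch below is a back-of-the-envelope check, not an inspection of the checkpoint. It assumes (none of this is stated in this card) a bias-free transformer with a 4x MLP expansion, no learnable normalization or positional parameters, and separate (untied) input embedding and output projection matrices.

```python
# Back-of-the-envelope parameter count from the dimensions in Model Details.
# Assumptions (not stated in this card): no biases, 4x MLP expansion,
# no learnable norm/positional parameters, untied embedding and output head.

n_layer = 20
n_embd = 1280
n_head = 10
vocab_size = 65_536

head_dim = n_embd // n_head  # per-head width

attn_params = 4 * n_embd * n_embd        # Q, K, V and output projections
mlp_params = 2 * n_embd * (4 * n_embd)   # up and down projections
block_params = attn_params + mlp_params

embedding_params = vocab_size * n_embd   # token embedding table
lm_head_params = vocab_size * n_embd     # output projection (assumed untied)

total = n_layer * block_params + embedding_params + lm_head_params
print(f"head_dim = {head_dim}")
print(f"total    = {total:,} (~{total / 1e6:.0f}M)")
```

Under these assumptions the total is 560,988,160, which rounds to the 561M listed above, and each attention head is 128 dimensions wide.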

Training

This model was trained using the nanochat pipeline, which includes:

  1. Base pretraining on web text (FineWeb)
  2. Midtraining on curated datasets
  3. Supervised fine-tuning (SFT) on conversational data (SmolTalk)

Training cost: $100 on 8xH100 GPUs (4 hours total)
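
For a sense of scale, the figures above imply the following rates. This is plain arithmetic on the numbers in this card; the derived hourly prices are implied values, not quotes from any cloud provider.

```python
# Rates implied by the training cost line above (derived, not provider quotes).
total_cost_usd = 100
num_gpus = 8
wall_clock_hours = 4

gpu_hours = num_gpus * wall_clock_hours                 # total GPU time used
node_rate_usd_per_hour = total_cost_usd / wall_clock_hours
gpu_rate_usd_per_hour = total_cost_usd / gpu_hours

print(f"GPU-hours:          {gpu_hours}")
print(f"Implied node rate:  ${node_rate_usd_per_hour:.2f}/hour")
print(f"Implied GPU rate:   ${gpu_rate_usd_per_hour:.3f}/GPU-hour")
```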

Evaluation Scores (at step 700)

  • Validation Loss: 1.015
  • MMLU: 33.9%
  • ARC-Easy: 48.5%

Usage

Quick Start with nanochat

# Clone and store the path
git clone https://github.com/karpathy/nanochat
cd nanochat
NANOCHAT_DIR=$(pwd)

# Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

# Download files
mkdir -p ~/.cache/nanochat/chatsft_checkpoints/d20
mkdir -p ~/.cache/nanochat/tokenizer

cd ~/.cache/nanochat/tokenizer
wget https://huggingface.co/gaviego/nanochat/resolve/main/tokenizer.pkl 

cd ~/.cache/nanochat/chatsft_checkpoints/d20
wget https://huggingface.co/gaviego/nanochat/resolve/main/model_000700.pt
wget https://huggingface.co/gaviego/nanochat/resolve/main/meta_000700.json

# Return to the nanochat directory, activate the environment, and start the chat CLI
cd "$NANOCHAT_DIR"
source .venv/bin/activate
python -m scripts.chat_cli
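
If the chat CLI cannot find the checkpoint or tokenizer, it may help to confirm that the three downloaded files landed where the commands above put them. The following is a minimal sketch; `EXPECTED_FILES` and `missing_files` are hypothetical names introduced here for illustration and are not part of nanochat.

```python
from pathlib import Path

# Files that the Quick Start commands place under the nanochat cache directory.
EXPECTED_FILES = [
    "tokenizer/tokenizer.pkl",
    "chatsft_checkpoints/d20/model_000700.pt",
    "chatsft_checkpoints/d20/meta_000700.json",
]


def missing_files(cache_dir):
    """Return the expected files that are absent under cache_dir."""
    root = Path(cache_dir).expanduser()  # expand "~" explicitly
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_files("~/.cache/nanochat")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```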

Load Programmatically

import os

import torch
from nanochat.checkpoint_manager import build_model
from nanochat.engine import Engine

# Path to checkpoint directory; Python does not expand "~" on its own
checkpoint_dir = os.path.expanduser("~/.cache/nanochat/chatsft_checkpoints/d20")
step = 700
device = torch.device("cpu")  # or "cuda"

# Load model, tokenizer, and training metadata
model, tokenizer, meta = build_model(checkpoint_dir, step, device, phase="eval")

# Generate text
# Note: the Engine arguments below are as given in this card; check
# nanochat/engine.py in your checkout, since the signature may differ.
engine = Engine(model, tokenizer, device=device)
response = engine.generate("Hello, how are you?", max_new_tokens=100)
print(response)

Download with Hugging Face Hub

from huggingface_hub import hf_hub_download

# Download checkpoint
model_path = hf_hub_download(
    repo_id="gaviego/nanochat",
    filename="model_000700.pt"
)
meta_path = hf_hub_download(
    repo_id="gaviego/nanochat",
    filename="meta_000700.json"
)
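
The snippet above leaves the files in the Hugging Face cache and does not fetch the tokenizer. The sketch below shows one way to place all three files in the directory layout used by the Quick Start, using the `local_dir` argument of `hf_hub_download`. `download_plan` and `fetch_all` are hypothetical helper names introduced for illustration, and the layout is taken from the Quick Start commands rather than from nanochat's own documentation.

```python
from pathlib import Path

REPO_ID = "gaviego/nanochat"


def download_plan(cache_dir="~/.cache/nanochat"):
    """Map each repository file to the local directory the Quick Start uses."""
    root = Path(cache_dir).expanduser()
    checkpoint_dir = root / "chatsft_checkpoints" / "d20"
    return {
        "tokenizer.pkl": root / "tokenizer",
        "model_000700.pt": checkpoint_dir,
        "meta_000700.json": checkpoint_dir,
    }


def fetch_all(cache_dir="~/.cache/nanochat"):
    """Download every file in the plan into its target directory."""
    # Imported here so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    paths = []
    for filename, target_dir in download_plan(cache_dir).items():
        target_dir.mkdir(parents=True, exist_ok=True)
        paths.append(
            hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=target_dir)
        )
    return paths
```

Calling `fetch_all()` requires network access and downloads about 2 GB; `download_plan()` can be called on its own to preview where each file would go.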

Files

  • model_000700.pt (2.0GB) - PyTorch model checkpoint with weights
  • meta_000700.json (264 bytes) - Training metadata and hyperparameters

Limitations

As a micro-scale language model trained on a $100 budget:

  • Makes frequent mistakes and often hallucinates
  • Limited reasoning capabilities compared to modern LLMs
  • Best suited for educational purposes and experimentation
  • Performance roughly comparable to early GPT-2 era models

Citation

If you use this model, please cite the nanochat project:

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that \$100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT License (same as nanochat)
