nanochat

nanochat is a 561M-parameter transformer language model trained for conversational AI. It demonstrates that a capable chat model can be trained on a modest hardware budget (~$100 of compute on 8x H100 GPUs).

Read about the process at https://samdobson.uk/posts/training-a-chatgpt-clone-for-cheap/

Chat with the model at https://huggingface.co/spaces/sdobson/nanochat

Model Description

  • Developed by: Andrej Karpathy
  • Trained by: Sam Dobson
  • Model type: Transformer-based causal language model
  • Language(s): English
  • License: MIT
  • Parameters: 560,988,160 (~561M)

Architecture

  • Layers: 20
  • Hidden size: 1280 channels
  • Attention heads: 10
  • Head dimension: 128
  • Vocabulary size: 65,536 tokens
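These dimensions account exactly for the parameter count above. A minimal sketch of the arithmetic, assuming a GPT-style block with a 4x MLP expansion, untied input/output embeddings, and no bias or norm parameters (assumptions for illustration, not confirmed details of the nanochat code):

  # Back-of-envelope parameter count from the dimensions above.
  n_layer, n_embd, vocab = 20, 1280, 65536

  attn = 4 * n_embd * n_embd          # Q, K, V and output projections
  mlp = 2 * n_embd * (4 * n_embd)     # up- and down-projection matrices
  blocks = n_layer * (attn + mlp)     # 393,216,000
  embeddings = 2 * vocab * n_embd     # input embedding + LM head (untied)
  print(blocks + embeddings)          # 560,988,160, the stated total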

Training Details

Training Data

nanochat was trained in three stages:

  1. Pretraining: a 100B-token subset of FineWeb-EDU, of which 11.2B tokens were processed during training
  2. Midtraining: SmolTalk conversations, MMLU multiple choice questions, GSM8K math problems
  3. Supervised Fine-tuning (SFT): Conversational adaptation data

Training Procedure

Tokenization

  • Custom Rust-based BPE tokenizer
  • Vocabulary: 65,536 tokens
  • Compression ratio: 4.8 characters per token
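For illustration, the compression ratio can be measured on any sample text. A hypothetical helper (encode stands in for whatever tokenizer interface is in use, not a specific nanochat API):

  # Hypothetical sketch: measure characters-per-token compression.
  def compression_ratio(encode, text: str) -> float:
      # ~4.8 for nanochat's tokenizer, per the figure above
      return len(text) / len(encode(text))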

Training Infrastructure

  • Hardware: 8x H100 GPUs (Lambda GPU Cloud)
  • Training time: ~3 hours for the pretraining stage
  • Estimated compute: ~4e19 FLOPs (see the check after this list)
  • Total cost: ~$100
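The compute figure is consistent with the common FLOPs ≈ 6ND rule of thumb (6 FLOPs per parameter per training token):

  # Rough compute check using FLOPs ~ 6 * N * D.
  N = 560_988_160             # parameters
  D = 11.2e9                  # pretraining tokens processed
  print(f"{6 * N * D:.1e}")   # 3.8e+19, matching the ~4e19 estimate above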

Training Stages

The model was trained in three stages:

  1. Pretraining on web text (FineWeb-EDU)
  2. Midtraining on domain-specific datasets (reasoning, conversation, maths)
  3. Supervised fine-tuning for chat optimisation

Performance

Benchmark Results

Benchmark       Score    Description
MMLU            23.99%   Multitask language understanding
GSM8K            4.47%   Grade school math problems
HumanEval        6.71%   Python code generation
ARC-Easy        24.79%   Science questions (easy)
ARC-Challenge   24.32%   Science questions (hard)
ChatCORE         1.73%   Conversational reasoning

Intended Use

Direct Use

nanochat is designed for:

  • Conversational AI applications
  • Research on efficient language model training
  • Educational purposes for understanding LLM training pipelines
  • Low-resource deployment scenarios

Downstream Use

The model can be fine-tuned for specific conversational tasks or used as a base model for further domain adaptation.

Out-of-Scope Use

  • Production-grade conversational AI (the model is relatively small and has limited capabilities)
  • Tasks requiring specialised knowledge or high accuracy
  • Critical applications where errors could cause harm

Limitations and Bias

  • Small scale: at 561M parameters, the model is significantly less capable than larger models (1B+ parameters)
  • Limited training: Trained on only 11.2B tokens, which is modest by modern standards
  • Performance: Benchmark scores indicate limited reasoning and mathematical capabilities
  • Bias: Inherits biases from training data (FineWeb-EDU, SmolTalk, etc.)
  • Language: English-only

Inference Guide

Simon Willison created a script that runs the model on CPU on macOS:

  cd /tmp
  git clone https://huggingface.co/sdobson/nanochat
  uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
    --model-dir /tmp/nanochat \
    --prompt "Tell me about dogs."

Otherwise, you can set things up manually:

  1. Download all files
  2. Put tokenizer.pkl and token_bytes.pt in ~/.cache/nanochat/tokenizer
  3. Put model_000650.pt and meta_000650.json in ~/.cache/nanochat/chatsft_checkpoints/d20
  4. Clone https://github.com/karpathy/nanochat
  5. Run uv sync followed by uv run python -m scripts.chat_web
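Before launching the web UI, you can sanity-check the downloaded files. A minimal sketch, assuming the checkpoint is a flat PyTorch state dict of tensors (depending on how it was saved, torch.load may need weights_only=False, and the weights may be wrapped under a key such as "model"):

  import json
  import torch

  # Inspect the training metadata and the first few weight tensors.
  with open("meta_000650.json") as f:
      print(json.load(f))
  state = torch.load("model_000650.pt", map_location="cpu")
  for name, tensor in list(state.items())[:5]:  # assumes a flat state dict
      print(name, tuple(tensor.shape))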

Citation

Repository: github.com/karpathy/nanochat

@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}

Model Card Author

Sam Dobson
