NanoChat SFT

This is the chat SFT checkpoint from nanochat, Andrej Karpathy's full-stack project to build an LLM.

Usage

Install transformers from this specific branch:

pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation

Then, you can run this inference snippet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


model_id="nanochat-students/d20-chat-transformers"
max_new_tokens=64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

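# return_dict=True yields a mapping with input_ids and attention_mask,
# which is what **inputs and inputs.input_ids below rely on.
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)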

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
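
The final slice in the snippet works because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that slicing with plain Python lists follows; the token IDs are hypothetical and chosen only for illustration, not taken from the nanochat tokenizer.

```python
# Hypothetical token IDs, for illustration only (not real nanochat tokens).
prompt_ids = [101, 2054, 2003, 102]

# Stand-in for one row of what generate() returns: prompt + continuation.
output_ids = prompt_ids + [3000, 1012, 0]

# Drop the prompt by slicing at its length, as the snippet does with
# inputs.input_ids.shape[1].
prompt_len = len(prompt_ids)
generated_ids = output_ids[prompt_len:]

print(generated_ids)  # [3000, 1012, 0]
```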

Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

run:
source: mid
dtype: bfloat16
device_batch_size: 4
num_epochs: 1
max_iterations: -1
target_examples_per_step: 32
unembedding_lr: 0.0040
embedding_lr: 0.2000
matrix_lr: 0.0200
weight_decay: 0.0000
init_lr_frac: 0.0200
eval_every: 100
eval_steps: 100
eval_metrics_every: 200
Training rows: 20,843
Number of iterations: 651
Training loss: 1.1904
Validation loss: 1.0664
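
The reported iteration count is consistent with dividing the training rows by `target_examples_per_step`. The sketch below checks that arithmetic and also derives two quantities that are not reported above, under stated assumptions: the GPU count (`world_size`) is hypothetical, and `init_lr_frac` is assumed to scale each base learning rate at the start of training.

```python
# Values copied from the run configuration above.
training_rows = 20_843
target_examples_per_step = 32
device_batch_size = 4

# Assumption: each optimizer step consumes target_examples_per_step examples
# and a trailing partial batch is dropped.
num_iterations = training_rows // target_examples_per_step

# Assumption: hypothetical GPU count; the card does not report it.
world_size = 8
examples_per_micro_step = device_batch_size * world_size
grad_accum_steps = target_examples_per_step // examples_per_micro_step

# Assumption: init_lr_frac multiplies each base learning rate at step 0.
init_lr_frac = 0.02
base_lrs = {"unembedding": 0.004, "embedding": 0.2, "matrix": 0.02}
initial_lrs = {name: lr * init_lr_frac for name, lr in base_lrs.items()}

print(num_iterations)    # 651, matching "Number of iterations" above
print(grad_accum_steps)  # 1 under the assumed world_size of 8
print(initial_lrs)
```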

Chat evaluation sft

timestamp: 2025-10-14 20:29:59

source: sft
task_name: None
dtype: bfloat16
temperature: 0.0000
max_new_tokens: 512
num_samples: 1
top_k: 50
batch_size: 8
model_tag: None
step: None
max_problems: None
ARC-Easy: 0.4259
ARC-Challenge: 0.2961
MMLU: 0.3250
GSM8K: 0.0432
HumanEval: 0.0549
ChatCORE metric: 0.0988
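
The ChatCORE value can be reproduced from the five task scores above if it is read as a mean of baseline-centered accuracies. This is a sketch under assumptions: the baselines used here (0.25 for the four-choice tasks, 0 for the generative tasks) are not stated in this card, and the function name `centered` is illustrative.

```python
# Accuracies copied from the evaluation above.
accuracies = {
    "ARC-Easy": 0.4259,
    "ARC-Challenge": 0.2961,
    "MMLU": 0.3250,
    "GSM8K": 0.0432,
    "HumanEval": 0.0549,
}

# Assumption: random-guess baselines (four-choice tasks score 0.25 by chance,
# generative tasks score 0).
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}

def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so chance maps to 0 and a perfect score maps to 1."""
    return (acc - baseline) / (1.0 - baseline)

chatcore = sum(centered(accuracies[t], baselines[t]) for t in accuracies) / len(accuracies)
print(round(chatcore, 4))  # 0.0988
```

Under these assumptions the result agrees with the reported 0.0988 to four decimal places.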

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio

Evaluation results

All scores are percentages on the test set, as reported by nanochat.

ARC-Challenge (AI2 Reasoning Challenge, 25-shot), normalized accuracy: 29.610
ARC-Easy (AI2 Reasoning Challenge, 25-shot), normalized accuracy: 42.590
MMLU (5-shot), accuracy: 32.500
GSM8K (5-shot), accuracy: 4.320
HumanEval, pass@1: 5.490
ChatCORE, ChatCORE metric: 9.880
