NanoChat SFT
This is the SFT checkpoint from nanochat, Andrej Karpathy's full-stack project for building an LLM.
Usage
Install transformers from this specific branch:
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
Then, you can run this inference snippet:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64

# Use the GPU when one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# return_dict=True returns a mapping with input_ids and attention_mask,
# which is what **inputs and inputs.input_ids below rely on
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs.input_ids.shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
Chat SFT Training Metrics
timestamp: 2025-10-14 20:17:42
- run:
- source: mid
- dtype: bfloat16
- device_batch_size: 4
- num_epochs: 1
- max_iterations: -1
- target_examples_per_step: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0200
- eval_every: 100
- eval_steps: 100
- eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664
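The iteration count above is consistent with the training row count and the step size. The following is a minimal sketch, assuming the number of iterations is the floor of training rows divided by target_examples_per_step (times num_epochs), and that gradient accumulation covers any gap between the per-step target and what the devices process in one pass. The function name `sft_schedule` is hypothetical, and `world_size` (the number of GPUs) is not stated in this card, so the value 8 is purely illustrative.

```python
def sft_schedule(train_rows, target_examples_per_step, device_batch_size,
                 world_size, num_epochs=1):
    """Return (num_iterations, grad_accum_steps) for one SFT run.

    Assumption: one optimizer step consumes target_examples_per_step examples,
    split across world_size devices and grad_accum_steps micro-batches.
    """
    examples_per_micro_step = device_batch_size * world_size
    if target_examples_per_step % examples_per_micro_step != 0:
        raise ValueError("target_examples_per_step must be a multiple of "
                         "device_batch_size * world_size")
    grad_accum_steps = target_examples_per_step // examples_per_micro_step
    # Partial final steps are dropped (floor division)
    num_iterations = (train_rows // target_examples_per_step) * num_epochs
    return num_iterations, grad_accum_steps


# Values from the metrics above; world_size=8 is an illustrative assumption
num_iterations, grad_accum_steps = sft_schedule(
    train_rows=20_843,
    target_examples_per_step=32,
    device_batch_size=4,
    world_size=8,
)
print(num_iterations, grad_accum_steps)
```

With these inputs the sketch reproduces the 651 iterations reported above. On a single GPU the same target would instead require 8 accumulation steps per iteration.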
Chat evaluation sft
timestamp: 2025-10-14 20:29:59
- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988
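The ChatCORE value above can be reproduced from the five task scores. The following is a sketch, assuming ChatCORE is the mean of baseline-centered accuracies, where each score is rescaled so that random guessing maps to 0 and a perfect score maps to 1. The baselines used here (0.25 for the four-option multiple-choice tasks, 0.0 for the generative tasks) are assumptions rather than values stated in this card.

```python
# Task accuracies reported above
scores = {
    "ARC-Easy": 0.4259,
    "ARC-Challenge": 0.2961,
    "MMLU": 0.3250,
    "GSM8K": 0.0432,
    "HumanEval": 0.0549,
}

# Assumed random-guess baselines: 4-way multiple choice vs. open-ended generation
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}


def centered(accuracy, baseline):
    """Rescale accuracy so that baseline -> 0.0 and a perfect score -> 1.0."""
    return (accuracy - baseline) / (1.0 - baseline)


chatcore = sum(centered(scores[t], baselines[t]) for t in scores) / len(scores)
print(f"{chatcore:.4f}")
```

Under these assumptions the result rounds to the reported 0.0988, which suggests the multiple-choice scores are only modestly above chance once the baseline is subtracted.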
Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio
Evaluation results
- normalized accuracy on AI2 Reasoning Challenge (25-Shot; matches the ARC-Challenge score above), test set, nanochat: 29.610
- normalized accuracy on AI2 Reasoning Challenge (25-Shot; matches the ARC-Easy score above), test set, nanochat: 42.590
- accuracy on MMLU (5-Shot), test set, nanochat: 32.500
- accuracy on GSM8k (5-shot), test set, nanochat: 4.320
- pass@1 on HumanEval, test set, nanochat: 5.490
- ChatCORE metric on ChatCORE, test set, nanochat: 9.880