
NanoChat RL

This is the RL-trained checkpoint from nanochat, Andrej Karpathy's full-stack project for building an LLM end to end.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# nanochat ships custom model code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]
# Render the conversation with the model's chat template, appending the
# assistant prefix so generation continues as the assistant's reply.
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)

with torch.inference_mode():
    generated = model.generate(**model_inputs, max_new_tokens=256)
# Strip the prompt tokens so only the newly generated reply is decoded.
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

Chat RL Training Metrics

timestamp: 2025-10-15 12:59:52

  • run: burtenshaw-20251015111354
  • source: sft
  • dtype: bfloat16
  • device_batch_size: 8
  • examples_per_step: 16
  • num_samples: 16
  • max_new_tokens: 256
  • temperature: 1.0000
  • top_k: 50
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • weight_decay: 0.0000
  • init_lr_frac: 0.0500
  • num_epochs: 1
  • save_every: 60
  • eval_every: 60
  • eval_examples: 400
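
The sampling settings above (temperature 1.0, top_k 50, max_new_tokens 256, num_samples 16) govern how rollouts were drawn during RL. As a minimal sketch of equivalent sampling with the transformers API, reusing the model and tokenizer from the Usage section (the prompt is illustrative; this is not the nanochat training loop itself):

import torch

# Illustrative rollout sampling with the hyperparameters listed above.
NUM_SAMPLES = 16      # num_samples: rollouts per prompt
MAX_NEW_TOKENS = 256  # max_new_tokens
TEMPERATURE = 1.0     # temperature
TOP_K = 50            # top_k

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 7 * 8?"}],  # hypothetical prompt
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

with torch.inference_mode():
    rollouts = model.generate(
        **inputs,
        do_sample=True,
        temperature=TEMPERATURE,
        top_k=TOP_K,
        max_new_tokens=MAX_NEW_TOKENS,
        num_return_sequences=NUM_SAMPLES,
    )
# One candidate completion per rollout; during RL, each would then be
# assigned a reward before the policy update.
completions = tokenizer.batch_decode(
    rollouts[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)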

Chat RL Evaluation

timestamp: 2025-10-15 13:04:39

  • source: rl
  • task_name: GSM8K
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • GSM8K: 0.0970
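
A temperature of 0.0 corresponds to greedy decoding. Below is a minimal sketch of scoring one GSM8K-style problem under these settings, again reusing the model and tokenizer from the Usage section; the prompt and the answer-extraction regex are illustrative assumptions, not nanochat's actual evaluation harness:

import re
import torch

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        do_sample=False,     # temperature 0.0 -> greedy decoding
        max_new_tokens=512,  # max_new_tokens from the table above
    )
answer_text = tokenizer.decode(
    out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
# Illustrative answer check: take the last number in the completion and
# compare it to the reference answer (72 for this problem).
numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_text.replace(",", ""))
correct = bool(numbers) and float(numbers[-1]) == 72.0
print(answer_text, correct)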

Training logs can be found here: https://huggingface.co/spaces/nanochat-students/trackio
