---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 9.7
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat RL

This is the RL-trained checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project to build an LLM, [nanochat](https://github.com/karpathy/nanochat).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
generated = model.generate(**model_inputs, max_new_tokens=256)
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```

## Chat RL Training Metrics

timestamp: 2025-10-15 12:59:52

- run: burtenshaw-20251015111354
- source: sft
- dtype: bfloat16
- device_batch_size: 8
- examples_per_step: 16
- num_samples: 16
- max_new_tokens: 256
- temperature: 1.0000
- top_k: 50
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0500
- num_epochs: 1
- save_every: 60
- eval_every: 60
- eval_examples: 400

## Chat RL Evaluation

timestamp: 2025-10-15 13:04:39

- source: rl
- task_name: GSM8K
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- GSM8K: 0.0970

Logs from the training run can be found at https://huggingface.co/spaces/nanochat-students/trackio
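
If you want to query the model with the same decoding settings reported in the evaluation above (greedy decoding, i.e. temperature 0.0, with up to 512 new tokens), a minimal sketch using the same `transformers` pattern as the Usage section is shown below. The example question is illustrative only; this is not the official GSM8K evaluation harness used to produce the score.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

# Illustrative GSM8K-style question (not pulled from the evaluation split).
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
conversation = [{"role": "user", "content": question}]
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
with torch.no_grad():
    generated = model.generate(
        **model_inputs,
        max_new_tokens=512,  # matches the evaluation config
        do_sample=False,     # greedy decoding, equivalent to temperature 0.0
    )
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```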