---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 29.61
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 42.59
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 32.50
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.32
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 9.88
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat SFT

This is the SFT checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project for building an LLM, [nanochat](https://github.com/karpathy/nanochat).
## Usage

Install transformers from this specific branch:

```sh
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```

Then, you can run this inference snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=False, dtype=torch.bfloat16
).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# return_dict=True is required so the result can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```

## Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

- run:
- source: mid
- dtype: bfloat16
- device_batch_size: 4
- num_epochs: 1
- max_iterations: -1
- target_examples_per_step: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0200
- eval_every: 100
- eval_steps: 100
- eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664

## Chat Evaluation (SFT)

timestamp: 2025-10-14 20:29:59

- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988

Logs from training can be found at https://huggingface.co/spaces/nanochat-students/trackio.
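The reported iteration count follows directly from the run configuration above. Here is a minimal sanity-check of that arithmetic, assuming one epoch, that a trailing partial step is dropped, and a single GPU for the gradient-accumulation figure (none of these assumptions are stated explicitly in the metrics):

```python
# Values copied from the training metrics above
training_rows = 20_843
target_examples_per_step = 32
device_batch_size = 4
num_epochs = 1

# One optimizer step consumes 32 examples, so one epoch over 20,843 rows
# gives 20843 // 32 = 651 iterations, matching the reported count
# (assuming the trailing partial batch is dropped).
iterations = num_epochs * (training_rows // target_examples_per_step)
print(iterations)  # 651

# Reaching 32 examples per step with a per-device batch of 4 implies
# 8 micro-batches of gradient accumulation on a single GPU; across
# multiple GPUs the accumulation count would be divided among devices.
grad_accum_steps = target_examples_per_step // device_batch_size
print(grad_accum_steps)  # 8
```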
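Note that the evaluation above decodes greedily: with a temperature of 0.0, sampling is disabled, so the top_k setting has no effect. If you want generations under the same settings through the transformers API, the sketch below maps the eval config onto `generate()` arguments. This is an approximation for illustration, not the nanochat eval harness itself, and the reported scores come from that harness:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"  # same checkpoint as the usage snippet
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [{"role": "user", "content": "What is 7 * 8?"}]
inputs = tokenizer.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # max_new_tokens from the eval config above
        do_sample=False,     # temperature 0.0000 -> greedy decoding; top_k is a no-op here
    )
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```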