nanochat-students/rl-d20

burtenshaw HF Staff committed 1 day ago

Commit 22fbb6a · verified · 1 Parent(s): f12cf25

Update README.md

Files changed (1)

README.md +77 -1

@@ -17,9 +17,85 @@ model-index:
       split: test
     metrics:
     - type: acc
-      value: 4.32
+      value: 9.7
       name: accuracy
     source:
       url: https://github.com/karpathy/nanochat
       name: nanochat
 ---
+# NanoChat RL
+This is the RL-trained checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack LLM project, [nanochat](https://github.com/karpathy/nanochat).
+## Usage
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "nanochat-students/rl-d20"
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
+model.eval()
+conversation = [
+    {"role": "user", "content": "Hello, who are you?"},
+]
+rendered = tokenizer.apply_chat_template(
+    conversation,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
+generated = model.generate(**model_inputs, max_new_tokens=256)
+output_ids = generated[0, model_inputs.input_ids.shape[1]:]
+print(tokenizer.decode(output_ids, skip_special_tokens=True))
+```
+## Chat RL Training Metrics
+timestamp: 2025-10-15 12:59:52
+- run: burtenshaw-20251015111354
+- source: sft
+- dtype: bfloat16
+- device_batch_size: 8
+- examples_per_step: 16
+- num_samples: 16
+- max_new_tokens: 256
+- temperature: 1.0000
+- top_k: 50
+- unembedding_lr: 0.0040
+- embedding_lr: 0.2000
+- matrix_lr: 0.0200
+- weight_decay: 0.0000
+- init_lr_frac: 0.0500
+- num_epochs: 1
+- save_every: 60
+- eval_every: 60
+- eval_examples: 400
+## Chat evaluation RL
+timestamp: 2025-10-15 13:04:39
+- source: rl
+- task_name: GSM8K
+- dtype: bfloat16
+- temperature: 0.0000
+- max_new_tokens: 512
+- num_samples: 1
+- top_k: 50
+- batch_size: 8
+- model_tag: None
+- step: None
+- max_problems: None
+- GSM8K: 0.0970
+Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio
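
A quick sanity check on the numbers in this diff, as a minimal sketch rather than anything taken from the nanochat code. It shows that the evaluation result `GSM8K: 0.0970` corresponds to the new model-index `value: 9.7` when expressed as a percentage, and it derives a per-step rollout count from the training settings. Two things are assumptions here: that the old `4.32` and the new `9.7` are both percentages on the same split, and that each of the `examples_per_step` prompts is sampled `num_samples` times per step. The helper names `to_percent` and `rollouts_per_step` are hypothetical.

```python
# Values copied from the README diff above.
train_config = {"device_batch_size": 8, "examples_per_step": 16, "num_samples": 16}
gsm8k_fraction = 0.0970   # "GSM8K: 0.0970" from the evaluation section
previous_value = 4.32     # model-index accuracy removed by this commit


def to_percent(fraction: float) -> float:
    """Convert a fractional accuracy to a percentage, rounded to 2 decimals."""
    return round(fraction * 100, 2)


def rollouts_per_step(config: dict) -> int:
    """Sampled completions per step.

    Assumption: every one of the examples_per_step prompts is sampled
    num_samples times; this is not confirmed by the diff itself.
    """
    return config["examples_per_step"] * config["num_samples"]


new_value = to_percent(gsm8k_fraction)
# Assumption: both values are percentages, so the difference is in points.
gain_points = round(new_value - previous_value, 2)

print(f"model-index value: {new_value}")
print(f"change vs. previous value: {gain_points} points")
print(f"rollouts per step: {rollouts_per_step(train_config)}")
```

If the assumptions hold, the metadata change in the first hunk is consistent with the evaluation log added further down in the same commit.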