Commit 22fbb6a (verified) by burtenshaw (HF Staff), 1 parent: f12cf25

Update README.md

Files changed (1): README.md (+77 -1)
@@ -17,9 +17,85 @@ model-index:
  split: test
  metrics:
  - type: acc
- value: 4.32
+ value: 9.7
  name: accuracy
  source:
  url: https://github.com/karpathy/nanochat
  name: nanochat
  ---
+
+ # NanoChat RL
+
+ This is the RL-trained checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project to build an LLM, [nanochat](https://github.com/karpathy/nanochat).
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+
+ model_name = "nanochat-students/rl-d20"
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # trust_remote_code=True runs the custom model code shipped in the repo
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
+ model.eval()
+
+ conversation = [
+     {"role": "user", "content": "Hello, who are you?"},
+ ]
+ # Render the chat template to a string that ends with the assistant prompt
+ rendered = tokenizer.apply_chat_template(
+     conversation,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
+
+ generated = model.generate(**model_inputs, max_new_tokens=256)
+ # Drop the prompt tokens so only the newly generated reply is decoded
+ output_ids = generated[0, model_inputs.input_ids.shape[1]:]
+ print(tokenizer.decode(output_ids, skip_special_tokens=True))
+ ```
+
+
+ ## Chat RL Training Metrics
+
+ timestamp: 2025-10-15 12:59:52
+
+ - run: burtenshaw-20251015111354
+ - source: sft
+ - dtype: bfloat16
+ - device_batch_size: 8
+ - examples_per_step: 16
+ - num_samples: 16
+ - max_new_tokens: 256
+ - temperature: 1.0000
+ - top_k: 50
+ - unembedding_lr: 0.0040
+ - embedding_lr: 0.2000
+ - matrix_lr: 0.0200
+ - weight_decay: 0.0000
+ - init_lr_frac: 0.0500
+ - num_epochs: 1
+ - save_every: 60
+ - eval_every: 60
+ - eval_examples: 400
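
The settings listed above interact, and a little arithmetic shows how. The sketch below is a rough reading of the field names, not a description of nanochat's actual training loop. It assumes that `num_samples` completions are drawn for each of the `examples_per_step` prompts, that `device_batch_size` caps how many completions are generated per forward pass, and that `init_lr_frac` scales each learning rate at the start of training.

```python
# Values copied from the "Chat RL Training Metrics" list.
config = {
    "device_batch_size": 8,
    "examples_per_step": 16,
    "num_samples": 16,
    "unembedding_lr": 0.004,
    "embedding_lr": 0.2,
    "matrix_lr": 0.02,
    "init_lr_frac": 0.05,
}

# Assumption: every prompt in a step gets num_samples sampled completions.
rollouts_per_step = config["examples_per_step"] * config["num_samples"]

# Assumption: completions for one prompt are generated in chunks of
# device_batch_size, so this is the number of sampling passes per prompt.
sampling_passes_per_prompt = config["num_samples"] // config["device_batch_size"]

# Assumption: each learning rate starts at init_lr_frac of its listed value.
initial_lrs = {
    name: config[name] * config["init_lr_frac"]
    for name in ("unembedding_lr", "embedding_lr", "matrix_lr")
}

print(f"rollouts per step: {rollouts_per_step}")
print(f"sampling passes per prompt: {sampling_passes_per_prompt}")
for name, lr in initial_lrs.items():
    print(f"initial {name}: {lr:.4f}")
```

Under these assumptions, each optimization step is driven by 256 sampled completions, generated in two passes of eight per prompt.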
+
+ ## Chat evaluation RL
+
+ timestamp: 2025-10-15 13:04:39
+
+ - source: rl
+ - task_name: GSM8K
+ - dtype: bfloat16
+ - temperature: 0.0000
+ - max_new_tokens: 512
+ - num_samples: 1
+ - top_k: 50
+ - batch_size: 8
+ - model_tag: None
+ - step: None
+ - max_problems: None
+ - GSM8K: 0.0970
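
The reported `GSM8K: 0.0970` is a fraction, while the `value: 9.7` in the front matter is a percentage. The following sketch checks that the two agree and estimates how many problems were answered correctly. It assumes that `max_problems: None` means the full GSM8K test split of 1,319 problems was evaluated; the commit does not state this.

```python
# Assumption: the evaluation covered the full GSM8K test split.
GSM8K_TEST_SIZE = 1319

# Value reported in the "Chat evaluation RL" list.
reported_accuracy = 0.0970

# Percentage form, as written to the model-index metadata.
accuracy_percent = round(reported_accuracy * 100, 1)

# Nearest whole number of correct answers consistent with the reported fraction.
estimated_correct = round(reported_accuracy * GSM8K_TEST_SIZE)

# Sanity check: the estimate should round back to the reported figure.
implied_accuracy = estimated_correct / GSM8K_TEST_SIZE

print(f"accuracy: {accuracy_percent}%")
print(f"estimated correct: {estimated_correct} of {GSM8K_TEST_SIZE}")
print(f"implied accuracy: {implied_accuracy:.4f}")
```

If the assumption holds, the score corresponds to roughly 128 correct answers out of 1,319.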
+
+ Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio