---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 9.7
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---
# NanoChat RL
This is the RL-trained checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack LLM project, [nanochat](https://github.com/karpathy/nanochat).
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The repository ships custom modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]

# Render the conversation to a prompt string that ends with the assistant turn marker.
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
generated = model.generate(**model_inputs, max_new_tokens=256)

# Drop the prompt tokens and decode only the newly generated ones.
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
## Chat RL Training Metrics
timestamp: 2025-10-15 12:59:52
- run: burtenshaw-20251015111354
- source: sft
- dtype: bfloat16
- device_batch_size: 8
- examples_per_step: 16
- num_samples: 16
- max_new_tokens: 256
- temperature: 1.0000
- top_k: 50
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0500
- num_epochs: 1
- save_every: 60
- eval_every: 60
- eval_examples: 400
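
To give a feel for what the settings above imply per optimizer step, here is a minimal sketch of the arithmetic. It assumes that `examples_per_step` counts prompts per step, that `num_samples` is the number of completions sampled per prompt, and that samples for one prompt are generated in chunks of `device_batch_size`; those readings are assumptions about the training script, not something this card states. The helper name `rollout_budget` is hypothetical.

```python
# Values copied from the training metrics listed above.
config = {
    "examples_per_step": 16,
    "num_samples": 16,
    "device_batch_size": 8,
    "max_new_tokens": 256,
}


def rollout_budget(cfg):
    """Derive per-step rollout counts under the assumptions stated in the lead-in."""
    # Assumed: every prompt in a step is sampled num_samples times.
    rollouts = cfg["examples_per_step"] * cfg["num_samples"]
    # Assumed: samples for one prompt are generated device_batch_size at a time
    # (ceiling division, so a partial final chunk still counts as a pass).
    passes = -(-cfg["num_samples"] // cfg["device_batch_size"])
    # Upper bound only: completions may stop before max_new_tokens.
    max_tokens = rollouts * cfg["max_new_tokens"]
    return {
        "rollouts_per_step": rollouts,
        "generation_passes_per_prompt": passes,
        "max_generated_tokens_per_step": max_tokens,
    }


budget = rollout_budget(config)
print(budget)
```

Under these assumptions a step covers 256 rollouts and at most 65,536 generated tokens, which is a rough way to compare this run against others with different batch settings.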
## Chat evaluation RL
timestamp: 2025-10-15 13:04:39
- source: rl
- task_name: GSM8K
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- GSM8K: 0.0970
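
The evaluation above uses `temperature: 0.0000`, while training rollouts use `temperature: 1.0000` with `top_k: 50`. The sketch below shows one way to translate such settings into keyword arguments for `model.generate` from the Usage section. It assumes that a temperature of zero means greedy decoding, and that the custom-code model honors the standard `do_sample`, `temperature`, `top_k`, and `max_new_tokens` arguments; the helper name `generation_kwargs` is hypothetical.

```python
def generation_kwargs(temperature, top_k, max_new_tokens):
    """Map nanochat-style sampling settings to `generate` keyword arguments."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature == 0.0:
        # Assumed: temperature 0 means greedy decoding, so sampling is
        # turned off instead of passing temperature=0 to generate().
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=temperature, top_k=top_k)
    return kwargs


# Settings from the evaluation report (greedy, up to 512 new tokens).
eval_kwargs = generation_kwargs(temperature=0.0, top_k=50, max_new_tokens=512)
# Settings from the training report (sampled rollouts, up to 256 new tokens).
train_kwargs = generation_kwargs(temperature=1.0, top_k=50, max_new_tokens=256)

print(eval_kwargs)
print(train_kwargs)
```

With the model loaded as in the Usage section, these could be passed as `model.generate(**model_inputs, **eval_kwargs)`. This is not guaranteed to reproduce the reported GSM8K score, since the original evaluation harness may differ in prompting and answer extraction.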
Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio