---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 9.7
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat RL

This is the RL-trained checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack LLM project, [nanochat](https://github.com/karpathy/nanochat).

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)

generated = model.generate(**model_inputs, max_new_tokens=256)
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
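
For interactive use you can also stream the reply token by token instead of waiting for the full generation. This is a minimal sketch using transformers' `TextStreamer`, reusing the `model`, `tokenizer`, and `model_inputs` defined above; it is an optional variant, not part of nanochat itself.

```python
from transformers import TextStreamer

# Print decoded tokens as they are generated; skip_prompt hides the rendered
# chat template so only the new assistant text is shown.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=256, streamer=streamer)
```
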
## Chat RL Training Metrics

timestamp: 2025-10-15 12:59:52

- run: burtenshaw-20251015111354
- source: sft
- dtype: bfloat16
- device_batch_size: 8
- examples_per_step: 16
- num_samples: 16
- max_new_tokens: 256
- temperature: 1.0000
- top_k: 50
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0500
- num_epochs: 1
- save_every: 60
- eval_every: 60
- eval_examples: 400

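
The sampling-related settings above (`num_samples`, `max_new_tokens`, `temperature`, `top_k`) describe how rollouts were drawn during RL. As a rough illustration only, assuming you wanted to reproduce that sampling behaviour with the Hugging Face `generate` API rather than nanochat's own engine, the configuration would map roughly like this:

```python
from transformers import GenerationConfig

# Illustrative mapping of the rollout settings listed above onto the
# Hugging Face generate() API; nanochat uses its own sampling engine.
rollout_config = GenerationConfig(
    do_sample=True,            # temperature: 1.0000 -> stochastic sampling
    temperature=1.0,
    top_k=50,                  # top_k: 50
    max_new_tokens=256,        # max_new_tokens: 256
    num_return_sequences=16,   # num_samples: 16 completions per prompt
)
# samples = model.generate(**model_inputs, generation_config=rollout_config)
```
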
## Chat RL Evaluation

timestamp: 2025-10-15 13:04:39

- source: rl
- task_name: GSM8K
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- GSM8K: 0.0970

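
The 0.0970 figure is GSM8K accuracy under greedy decoding (temperature 0, a single sample per problem). The sketch below shows roughly what such an evaluation loop looks like with `transformers` and `datasets`: it reuses `model` and `tokenizer` from the usage example and compares the last number in the completion against the reference after the `####` marker. It is an approximation, not nanochat's evaluation code.

```python
import re
from datasets import load_dataset

def last_number(text: str):
    # Grab the final numeric value in a completion or reference answer.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

gsm8k = load_dataset("gsm8k", "main", split="test")
correct, total = 0, 50  # small slice for illustration
for example in gsm8k.select(range(total)):
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["question"]}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)  # greedy = temperature 0
    completion = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
    reference = example["answer"].split("####")[-1]
    correct += last_number(completion) == last_number(reference)
print(f"GSM8K accuracy on {total} problems: {correct / total:.4f}")
```
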
Logs from training can be found on the [Trackio space](https://huggingface.co/spaces/nanochat-students/trackio).