rl-d20 / README.md

burtenshaw HF Staff

Update README.md

22fbb6a verified 1 day ago

---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 9.7
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat RL

This is the RL-trained checkpoint from [nanochat](https://github.com/karpathy/nanochat), [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project for building an LLM.

	## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The repository ships custom model code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

# Render a single-turn conversation with the model's chat template.
conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens (skip the prompt).
generated = model.generate(**model_inputs, max_new_tokens=256)
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
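
The snippet above relies on the default decoding settings. As a rough sketch, the sampling settings reported in the training and evaluation logs of this card could be mapped onto `generate` keyword arguments as follows. The helper `generation_kwargs` is hypothetical and not part of nanochat or transformers, and treating `temperature: 0.0` as greedy decoding (`do_sample=False`) is an assumption about how the evaluation was run.

```python
# Hypothetical helper: maps the sampling settings listed in this card
# onto keyword arguments accepted by transformers' `generate`.
def generation_kwargs(mode: str) -> dict:
    if mode == "eval":
        # Evaluation log: temperature 0.0, max_new_tokens 512.
        # Assumption: temperature 0.0 corresponds to greedy decoding.
        return {"max_new_tokens": 512, "do_sample": False}
    if mode == "rollout":
        # Training log: temperature 1.0, top_k 50, max_new_tokens 256.
        return {
            "max_new_tokens": 256,
            "do_sample": True,
            "temperature": 1.0,
            "top_k": 50,
        }
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with the objects defined in the snippet above:
# generated = model.generate(**model_inputs, **generation_kwargs("eval"))
```

Greedy decoding makes the output deterministic for a given prompt, which is usually what you want when comparing against a reported benchmark number.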


	## Chat RL Training Metrics

	timestamp: 2025-10-15 12:59:52

	- run: burtenshaw-20251015111354
	- source: sft
	- dtype: bfloat16
	- device_batch_size: 8
	- examples_per_step: 16
	- num_samples: 16
	- max_new_tokens: 256
	- temperature: 1.0000
	- top_k: 50
	- unembedding_lr: 0.0040
	- embedding_lr: 0.2000
	- matrix_lr: 0.0200
	- weight_decay: 0.0000
	- init_lr_frac: 0.0500
	- num_epochs: 1
	- save_every: 60
	- eval_every: 60
	- eval_examples: 400
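
To make the settings above more concrete, here is a small sketch that derives a few quantities from them. The interpretations are assumptions rather than something this card states: that each example in a step is sampled `num_samples` times, that `device_batch_size` caps the number of sequences per forward pass, and that `init_lr_frac` scales each base learning rate to its starting value.

```python
# Values copied from the training metrics list in this card.
examples_per_step = 16
num_samples = 16
device_batch_size = 8
init_lr_frac = 0.05
base_lrs = {"unembedding_lr": 0.004, "embedding_lr": 0.2, "matrix_lr": 0.02}

# Assumption: every example in a step is sampled num_samples times.
rollouts_per_step = examples_per_step * num_samples  # 16 * 16 = 256

# Assumption: device_batch_size limits sequences per forward pass,
# so one example's samples are generated in several passes.
passes_per_example = -(-num_samples // device_batch_size)  # ceiling division

# Assumption: init_lr_frac scales each base LR to its starting value.
initial_lrs = {name: lr * init_lr_frac for name, lr in base_lrs.items()}

print(f"rollouts per step: {rollouts_per_step}")
print(f"generation passes per example: {passes_per_example}")
for name, lr in initial_lrs.items():
    print(f"initial {name}: {lr:.4f}")
```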

	## Chat evaluation RL

	timestamp: 2025-10-15 13:04:39

	- source: rl
	- task_name: GSM8K
	- dtype: bfloat16
	- temperature: 0.0000
	- max_new_tokens: 512
	- num_samples: 1
	- top_k: 50
	- batch_size: 8
	- model_tag: None
	- step: None
	- max_problems: None
	- GSM8K: 0.0970
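
For a sense of how precise the GSM8K figure is, the sketch below converts the reported accuracy into an approximate count of solved problems and a binomial standard error. It assumes that `max_problems: None` means the full GSM8K test split of 1,319 problems was evaluated, and it uses the normal approximation for the interval. Neither assumption is confirmed by the log itself.

```python
import math

accuracy = 0.0970    # GSM8K value reported in the list above
n_problems = 1319    # assumption: full GSM8K test split was evaluated

# Approximate number of problems answered correctly.
correct = round(accuracy * n_problems)

# Binomial standard error of the accuracy estimate.
std_err = math.sqrt(accuracy * (1 - accuracy) / n_problems)

# 95% interval under the normal approximation.
ci_low = accuracy - 1.96 * std_err
ci_high = accuracy + 1.96 * std_err

print(f"about {correct} of {n_problems} problems correct")
print(f"standard error: {std_err:.4f}")
print(f"95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

Under these assumptions the uncertainty is a bit under one percentage point per standard error, so small differences between checkpoints on this benchmark should be read with care.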

	Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio