	---
	library_name: transformers
	license: apache-2.0
	datasets:
	- roneneldan/TinyStories
	language:
	- en
	---

	# Tiny Recursive Model (TRM)

	A compact causal language model, trained on TinyStories, that reuses a small stack of transformer layers recursively for efficient text generation. It is implemented as a custom `TinyRecursiveModel` class with a ~7.39M-parameter logic core [1].

	## Model Details

	- Model Type: Causal Language Model with Custom Recursive Architecture
	- Parameters: ~40.21M total parameters (7.39M logic core, 32.82M vocabulary)
	- Architecture: 3 physical layers, 8 recursive loops, 8 attention heads [1]
	- Vocabulary Size: 50,257 tokens
	- Context Length: 1024 tokens
	- Embedding Dimension: 512
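
The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated in this card; the variable names are illustrative, and the layer-application count assumes that every recursive loop passes through all 3 physical layers once.

```python
# Numbers copied from the Model Details list (in millions of parameters).
logic_core_m = 7.39
vocabulary_m = 32.82
total_m = round(logic_core_m + vocabulary_m, 2)

physical_layers = 3
recursive_loops = 8
# Assumption: each loop runs every physical layer exactly once.
layer_applications = physical_layers * recursive_loops

# Fraction of all parameters that sit in the reusable logic core.
logic_share = logic_core_m / total_m

print(f"Total parameters: ~{total_m}M")
print(f"Layer applications per forward pass: {layer_applications}")
print(f"Logic core share of total: {logic_share:.1%}")
```

Under that assumption, most of the parameter budget sits in the vocabulary, while the compute depth comes from reusing the small logic core.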

	## ⚠️ Important: Custom Model Class

	This model uses a custom `TinyRecursiveModel` class that is not part of the standard transformers library [1]. You must use `trust_remote_code=True` when loading the model.

	## Installation Requirements

	```bash
	pip install transformers torch
	```

	## Usage

	### Method 1: Using trust_remote_code (Recommended)

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Load the model and tokenizer (MUST use trust_remote_code=True)
	model_name = "ainz/tiny-recursive-model"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    trust_remote_code=True,  # Required for custom model class
	)

	# Generate text
	input_text = "Once upon a time"
	inputs = tokenizer(input_text, return_tensors="pt")

	with torch.no_grad():
	    outputs = model.generate(
	        inputs["input_ids"],
	        max_length=100,
	        do_sample=True,
	        temperature=0.7,
	        top_p=0.9,
	        pad_token_id=tokenizer.eos_token_id,
	    )

	generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(generated_text)
	```

	### Method 2: Manual Class Loading

	If you prefer not to use `trust_remote_code`, you can manually download and use the model files:

	```python
	import torch
	from huggingface_hub import hf_hub_download

	# Download the model files
	model_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="pytorch_model.bin")
	config_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="config.json")

	# You'll need to copy the TinyRecursiveModel class definition locally
	# Then load manually:
	# model = TinyRecursiveModel.from_pretrained("ainz/tiny-recursive-model")
	```

	### Batch Generation Example

	```python
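	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load model with trust_remote_code
	tokenizer = AutoTokenizer.from_pretrained("ainz/tiny-recursive-model")
	# Batching needs a pad token and, for decoder-only generation, left padding.
	# Assumption: the tokenizer may not define a pad token, so fall back to EOS.
	if tokenizer.pad_token is None:
	    tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "left"
	model = AutoModelForCausalLM.from_pretrained(
	    "ainz/tiny-recursive-model",
	    trust_remote_code=True,
	)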

	# Generate for multiple prompts
	prompts = [
	"The future of artificial intelligence",
	"In a distant galaxy",
	"The secret to happiness"
	]

	inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)

	with torch.no_grad():
	    outputs = model.generate(
	        inputs["input_ids"],
	        attention_mask=inputs["attention_mask"],
	        max_length=80,
	        do_sample=True,
	        temperature=0.7,
	        pad_token_id=tokenizer.eos_token_id,
	    )

	for i, output in enumerate(outputs):
	    text = tokenizer.decode(output, skip_special_tokens=True)
	    print(f"Prompt {i+1}: {text}\n")
	```

	### Advanced Generation Parameters

	```python
	# Reuses `model`, `tokenizer`, and `inputs` from the examples above.

	# More creative generation
	outputs = model.generate(
	    inputs["input_ids"],
	    max_length=150,
	    do_sample=True,
	    temperature=0.8,  # Higher = more creative
	    top_k=50,  # Consider top 50 tokens
	    top_p=0.95,  # Nucleus sampling
	    repetition_penalty=1.1,  # Reduce repetition
	    pad_token_id=tokenizer.eos_token_id,
	)

	# Deterministic generation
	outputs = model.generate(
	    inputs["input_ids"],
	    max_length=100,
	    do_sample=False,  # Greedy decoding
	    pad_token_id=tokenizer.eos_token_id,
	)
	```
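
To make the effect of `temperature` and `top_p` concrete, here is a small pure-Python sketch on a toy four-token distribution. The helper names `softmax` and `top_p_filter` and the toy logits are illustrative assumptions; the actual sampling code inside `transformers` may differ in its details.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 tokens
sharp = softmax(toy_logits, temperature=0.7)  # lower temperature
flat = softmax(toy_logits, temperature=1.5)   # higher temperature
nucleus = top_p_filter(softmax(toy_logits, temperature=0.8), top_p=0.95)

print("p(top token) at T=0.7:", round(sharp[0], 3))
print("p(top token) at T=1.5:", round(flat[0], 3))
print("tokens kept by top_p=0.95:", sorted(nucleus))
```

Lower temperatures concentrate probability on the most likely token, while nucleus sampling discards the unlikely tail before a token is drawn.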

	## Architecture Overview

	This model implements a novel recursive architecture where layers are reused multiple times through loops [1]. Key features:

	- Recursive Layers: 3 physical transformer layers recursively applied 8 times
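	- Parameter Efficiency: Because the same 3 layers are reused in every loop, the logic core stays at 7.39M parameters while each forward pass runs 24 layer applications (3 layers × 8 loops)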
	- Custom Implementation: Uses `TinyRecursiveModel` class with `TRMConfig`
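
The weight-sharing idea can be sketched in plain PyTorch. This is a hypothetical illustration, not the `TinyRecursiveModel` source: the class name `RecursiveStackSketch`, the use of `nn.TransformerEncoderLayer` with its default feed-forward width, and the choice to loop over the whole stack (rather than each layer separately) are all assumptions, so its parameter count will not match the 7.39M figure.

```python
import torch
from torch import nn


class RecursiveStackSketch(nn.Module):
    """Illustrative only: a small layer stack applied repeatedly with shared weights."""

    def __init__(self, d_model=512, n_heads=8, n_layers=3, n_loops=8):
        super().__init__()
        self.n_loops = n_loops
        # Only n_layers sets of weights are allocated, regardless of n_loops.
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )

    def forward(self, hidden):
        seq_len = hidden.size(1)
        # Causal mask: position i cannot attend to positions after i.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=hidden.device),
            diagonal=1,
        )
        for _ in range(self.n_loops):  # reuse the same stack on every loop
            for layer in self.layers:
                hidden = layer(hidden, src_mask=mask)
        return hidden


def count_params(module):
    return sum(p.numel() for p in module.parameters())


shallow = RecursiveStackSketch(n_loops=1)
deep = RecursiveStackSketch(n_loops=8)

x = torch.randn(1, 4, 512)  # (batch, sequence, embedding)
with torch.no_grad():
    y = deep.eval()(x)

print("output shape:", tuple(y.shape))
print("same parameter count:", count_params(shallow) == count_params(deep))
```

Raising `n_loops` increases compute depth but leaves the parameter count unchanged, which is the trade-off the list above describes.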

	## Model Performance

	Training completed with:
	- Final Training Loss: ~2.0
	- Training Steps: 7,032 (1 epoch)
	- Parameter Breakdown: 7.39M logic core + 32.82M vocabulary

	## Security Note

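	This model requires `trust_remote_code=True` because its architecture is defined by custom code in the model repository. Setting this flag downloads and executes that Python code on your machine, so only use it if you trust the model source.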
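
One way to reduce this risk is to review the repository code at a specific commit and then pin loading to that commit with the `revision` argument of `from_pretrained`. The sketch below is a minimal example under that assumption; the helper names `pinned_load_kwargs` and `load_pinned` are hypothetical, and the commit hash is a placeholder you must supply yourself.

```python
def pinned_load_kwargs(commit_sha: str) -> dict:
    """Build from_pretrained keyword arguments that pin remote code to one commit."""
    if not commit_sha:
        raise ValueError("Pass the commit hash you have reviewed")
    return {"trust_remote_code": True, "revision": commit_sha}


def load_pinned(model_name: str, commit_sha: str):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=commit_sha)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, **pinned_load_kwargs(commit_sha)
    )
    return tokenizer, model


# Example with a placeholder hash; replace it with the commit you inspected:
# tokenizer, model = load_pinned("ainz/tiny-recursive-model", "<full-commit-sha>")
```

Pinning means later changes to the repository code are not picked up silently.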

	## Troubleshooting

	Error loading model?
	- Make sure you're using `trust_remote_code=True`
	- Ensure you have the latest transformers version: `pip install --upgrade transformers`

	Generation issues?
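	- The model is small (7.39M logic parameters), so output quality is sensitive to sampling settings; try adjusting `temperature`, `top_k`, `top_p`, and `repetition_penalty`
	- Try different prompt formats; story-style openings such as "Once upon a time" are closest to the TinyStories training data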

	## Limitations

	- Small model size (~7M logic parameters) may limit performance compared to larger models
	- Custom architecture requires `trust_remote_code=True`
	- Best suited for creative writing and simple text completion tasks

	## Citation

	```bibtex
	@misc{tiny_recursive_model_2024,
	author = {ainz},
	title = {Tiny Recursive Model},
	year = {2025},
	publisher = {Hugging Face},
	url = {https://huggingface.co/ainz/tiny-recursive-model}
	}
	```