TinyStories Llama Model
Model Description
This is a small Llama-architecture language model trained on the TinyStories dataset. The model is designed to generate simple, coherent children's stories using only vocabulary and concepts that a typical 3-4 year old would understand.
Model Architecture: Llama 2
Training Framework: PyTorch
Implementation: Based on llama2.c
Model Details
Architecture Hyperparameters
- Dimension: 288
- Number of Layers: 6
- Number of Attention Heads: 6
- Number of KV Heads: 6
- Vocabulary Size: 32,000 (Llama 2 tokenizer)
- Maximum Sequence Length: 256 tokens
- Dropout: 0.0
- Hidden Dimension Multiple: 32
Total Parameters: ~15M
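For reference, the hyperparameters above map onto a llama2.c-style model config roughly as follows. This is a sketch only: the field names mirror llama2.c's `ModelArgs`, but the exact dataclass shown here is illustrative rather than the checkpoint's actual config.

```python
from dataclasses import dataclass

# Sketch of a llama2.c-style config holding the hyperparameters listed above.
# Field names follow llama2.c's ModelArgs; treat the dataclass as illustrative.
@dataclass
class ModelArgs:
    dim: int = 288            # transformer embedding dimension
    n_layers: int = 6         # number of transformer blocks
    n_heads: int = 6          # attention heads
    n_kv_heads: int = 6       # key/value heads (equal to n_heads -> standard multi-head attention)
    vocab_size: int = 32000   # Llama 2 SentencePiece tokenizer
    multiple_of: int = 32     # FFN hidden dim is rounded up to a multiple of this
    max_seq_len: int = 256    # maximum context length in tokens
    dropout: float = 0.0

args = ModelArgs()
head_dim = args.dim // args.n_heads  # 288 / 6 = 48 dimensions per head
```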
Training Hyperparameters
- Batch Size: 128 (micro-batch)
- Gradient Accumulation Steps: 4
- Effective Batch Size: 512
- Learning Rate: 5e-4 (max)
- Learning Rate Schedule: Cosine decay with warmup
- Warmup Iterations: 1,000
- Total Training Iterations: 100,000
- Weight Decay: 0.1
- Beta1: 0.9
- Beta2: 0.95
- Gradient Clipping: 1.0
- Optimizer: AdamW
- Precision: bfloat16 (with mixed precision training)
Tokens per Iteration: ~131,072 (4 grad accum × 1 process × 128 micro-batch × 256 seq len)
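A rough PyTorch sketch of this optimizer and learning-rate schedule is shown below. It is not the training script itself: the `min_lr` floor and the stand-in model are assumptions made only so the snippet runs on its own.

```python
import math
import torch

# Stand-in module so the snippet is self-contained; the real model is the Llama transformer.
model = torch.nn.Linear(288, 288)

max_lr = 5e-4
min_lr = 5e-5                 # assumption: the card does not state a minimum learning rate
warmup_iters = 1_000
max_iters = 100_000

optimizer = torch.optim.AdamW(
    model.parameters(), lr=max_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def get_lr(it: int) -> float:
    """Linear warmup followed by cosine decay, per the schedule above."""
    if it < warmup_iters:
        return max_lr * it / warmup_iters
    ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes from 1 down to 0
    return min_lr + coeff * (max_lr - min_lr)

# Per-iteration bookkeeping: set the LR, then clip gradients before stepping.
it = 5_000
for group in optimizer.param_groups:
    group["lr"] = get_lr(it)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```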
Intended Use
This model is intended for:
- Generating simple children's stories
- Educational demonstrations of small-scale language model training
- Research into emergent capabilities in small language models
- Experimentation with efficient inference (e.g., pure C implementation)
Limitations
- Domain-Specific: The model is trained exclusively on simple stories and will not perform well on general text generation tasks
- Vocabulary: Limited to concepts and language appropriate for very young children
- Context Length: Maximum sequence length of 256 tokens limits story length
- No Instruction Following: This is a base model without instruction tuning
Training Data
The model was trained on the TinyStories dataset, which consists of short stories synthetically generated (by GPT-3.5 and GPT-4) to contain only words that a typical 3-4 year old would understand. The dataset was created to study the capabilities of small language models.
Dataset Size: ~2.1M stories
Vocabulary: Words understandable by 3-4 year olds
Content: Simple narratives, common objects, basic emotions and actions
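As an illustration, one way to fetch and tokenize the data with the Llama 2 tokenizer is sketched below. The Hugging Face dataset id, the `datasets` usage, and the tokenizer path are assumptions, not a record of how this checkpoint was preprocessed (llama2.c ships its own TinyStories pretokenization script, which may differ).

```python
import sentencepiece as spm
from datasets import load_dataset  # assumption: data fetched via Hugging Face `datasets`

# Dataset id and tokenizer path are illustrative.
ds = load_dataset("roneneldan/TinyStories", split="train")
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # Llama 2 tokenizer

def tokenize(example):
    # Prepend BOS so each story starts a fresh sequence, then encode to token ids.
    ids = [sp.bos_id()] + sp.encode(example["text"])
    return {"ids": ids}

tokenized = ds.map(tokenize)
print(tokenized[0]["ids"][:10])
```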
Example Outputs
Prompt: "Once upon a time, there was a little girl named Lily."
Generation (temperature=0.8, top_p=0.9):
She loved to play outside in the park. One day, she saw a big, red ball.
She wanted to play with it, but it was too high. Lily's mom said, "Let's
go get it together!" They worked together and got the ball down. Lily was
so happy! She played with the ball all day long.
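The sample above was drawn with temperature 0.8 and nucleus (top-p) sampling at 0.9. A minimal sketch of that sampling step in PyTorch follows; the function and variable names are illustrative and not the model's actual generation code.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_p: float = 0.9) -> int:
    """Draw one token id from `logits` (shape [vocab_size]) with temperature and top-p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest prefix of tokens whose cumulative probability reaches top_p.
    cutoff = int(torch.searchsorted(cumulative, top_p).item()) + 1
    kept_probs = sorted_probs[:cutoff]
    kept_probs = kept_probs / kept_probs.sum()        # renormalize the nucleus
    choice = torch.multinomial(kept_probs, num_samples=1)
    return int(sorted_idx[choice].item())

# Toy example over a 32,000-token vocabulary (random logits stand in for the model's output).
next_id = sample_next_token(torch.randn(32000))
print(next_id)
```

For the pure C path, llama2.c's `run` program exposes the same temperature and top-p knobs on the command line.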
Citation
If you use this model or the llama2.c implementation, please cite:
@misc{llama2c,
  author    = {Andrej Karpathy},
  title     = {llama2.c: Inference Llama 2 in one file of pure C},
  year      = {2023},
  publisher = {GitHub},
  url       = {https://github.com/karpathy/llama2.c}
}

@article{eldan2023tinystories,
  title   = {TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author  = {Eldan, Ronen and Li, Yuanzhi},
  journal = {arXiv preprint arXiv:2305.07759},
  year    = {2023}
}
License
MIT License - See the LICENSE file for details.
Acknowledgments
- Model architecture and training code adapted from llama2.c by Andrej Karpathy
- Trained on the TinyStories dataset by Ronen Eldan and Yuanzhi Li
- Based on the Llama 2 architecture by Meta AI