---
language:
- en
license: mit
tags:
- text-generation
- llama
- tinystories
- storytelling
datasets:
- roneneldan/TinyStories
widget:
- text: "Once upon a time, there was a"
  example_title: "Story Beginning"
- text: "One day, Lily met a"
  example_title: "Character Introduction"
- text: "The little boy was very happy because"
  example_title: "Story Continuation"
---

# TinyStories Llama Model

## Model Description

This is a small Llama-architecture language model trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories). The model is designed to generate simple, coherent children's stories using vocabulary and concepts that a typical 3-4 year old would understand.

**Model Architecture:** Llama 2 (decoder-only Transformer)
**Training Framework:** PyTorch
**Implementation:** Based on [llama2.c](https://github.com/karpathy/llama2.c)

## Model Details

### Architecture Hyperparameters

- **Dimension:** 288
- **Number of Layers:** 6
- **Number of Attention Heads:** 6
- **Number of KV Heads:** 6
- **Vocabulary Size:** 32,000 (Llama 2 tokenizer)
- **Maximum Sequence Length:** 256 tokens
- **Dropout:** 0.0
- **Hidden Dimension Multiple (`multiple_of`):** 32

**Total Parameters:** ~15M (see the inference sketch after the example outputs below for these values in code)

### Training Hyperparameters

- **Batch Size:** 128 (micro-batch)
- **Gradient Accumulation Steps:** 4
- **Effective Batch Size:** 512
- **Learning Rate:** 5e-4 (max)
- **Learning Rate Schedule:** Cosine decay with warmup
- **Warmup Iterations:** 1,000
- **Total Training Iterations:** 100,000
- **Weight Decay:** 0.1
- **Beta1:** 0.9
- **Beta2:** 0.95
- **Gradient Clipping:** 1.0
- **Optimizer:** AdamW
- **Precision:** bfloat16 (mixed-precision training)

**Tokens per Iteration:** ~131,072 (4 grad accum × 1 process × 128 batch × 256 seq len)

## Intended Use

This model is intended for:

- Generating simple children's stories
- Educational demonstrations of small-scale language model training
- Research into emergent capabilities in small language models
- Experimentation with efficient inference (e.g., the pure C implementation)

## Limitations

- **Domain-Specific:** The model is trained exclusively on simple stories and will not perform well on general text generation tasks
- **Vocabulary:** Limited to concepts and language appropriate for very young children
- **Context Length:** The maximum sequence length of 256 tokens limits story length
- **No Instruction Following:** This is a base model without instruction tuning

## Training Data

The model was trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories), which consists of short stories generated to contain only words that a typical 3-4 year old would understand. The dataset was created to study the capabilities of small language models.

- **Dataset Size:** ~2.1M stories
- **Vocabulary:** Words understandable by 3-4 year olds
- **Content:** Simple narratives, common objects, basic emotions and actions

## Example Outputs

**Prompt:** "Once upon a time, there was a little girl named Lily."

**Generation (temperature=0.8, top_p=0.9):**

```
She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but it was too high. Lily's mom said, "Let's go get it together!" They worked together and got the ball down. Lily was so happy! She played with the ball all day long.
```
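Below is a minimal PyTorch inference sketch using the `ModelArgs`, `Transformer`, and `Tokenizer` classes from the llama2.c repository, configured with the architecture hyperparameters listed above. File names such as `ckpt.pt` and the assumption that this repository ships a llama2.c-style PyTorch checkpoint are illustrative, not guaranteed; the Python `generate` helper in llama2.c samples with temperature and top-k, whereas the top_p value quoted above corresponds to nucleus sampling as done by the C `run` program.

```python
# Minimal inference sketch. Assumes the llama2.c repo (model.py, tokenizer.py)
# is on PYTHONPATH and that a llama2.c-style PyTorch checkpoint is available;
# the file name "ckpt.pt" is a hypothetical placeholder.
import torch
from model import ModelArgs, Transformer  # from karpathy/llama2.c
from tokenizer import Tokenizer           # SentencePiece wrapper in llama2.c

# Architecture hyperparameters from this model card.
config = ModelArgs(
    dim=288,
    n_layers=6,
    n_heads=6,
    n_kv_heads=6,
    vocab_size=32000,
    multiple_of=32,
    max_seq_len=256,
    dropout=0.0,
)

model = Transformer(config)
checkpoint = torch.load("ckpt.pt", map_location="cpu")  # hypothetical file name
state_dict = checkpoint["model"]
# Checkpoints saved from a torch.compile'd model prefix keys with "_orig_mod.";
# llama2.c's sample.py strips this prefix before loading.
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)
model.eval()

tokenizer = Tokenizer()  # defaults to the Llama 2 tokenizer.model bundled with llama2.c
prompt = "Once upon a time, there was a little girl named Lily."
idx = torch.tensor([tokenizer.encode(prompt, bos=True, eos=False)], dtype=torch.long)

with torch.inference_mode():
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=300)

print(tokenizer.decode(out[0].tolist()))
```

For efficient inference, the same checkpoint can also be exported to llama2.c's binary format and sampled with the pure-C `run` program mentioned under Intended Use.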
## Citation

If you use this model or the llama2.c implementation, please cite:

```bibtex
@misc{llama2c,
  author = {Andrej Karpathy},
  title = {llama2.c: Inference Llama 2 in one file of pure C},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/karpathy/llama2.c}
}

@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}
```

## License

MIT License - See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Model architecture and training code adapted from [llama2.c](https://github.com/karpathy/llama2.c) by Andrej Karpathy
- Trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories) by Ronen Eldan and Yuanzhi Li
- Based on the Llama 2 architecture by Meta AI