TinyStories Llama Model

Model Description

This is a small Llama-architecture language model trained on the TinyStories dataset. The model is designed to generate simple, coherent children's stories using a vocabulary and concepts that a typical 3-4 year old would understand.

Model Architecture: Llama 2
Training Framework: PyTorch
Implementation: Based on llama2.c

Model Details

Architecture Hyperparameters

  • Dimension: 288
  • Number of Layers: 6
  • Number of Attention Heads: 6
  • Number of KV Heads: 6
  • Vocabulary Size: 32,000 (Llama 2 tokenizer)
  • Maximum Sequence Length: 256 tokens
  • Dropout: 0.0
  • Hidden Dimension Multiple: 32

Total Parameters: ~15M
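
For reference, these hyperparameters map onto a llama2.c-style configuration roughly as follows. This is an illustrative sketch, not the exact training config: field names follow llama2.c's ModelArgs, and defaults such as norm_eps are omitted.

from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Values copied from the list above; field names follow llama2.c's model.py.
    dim: int = 288            # embedding / residual stream width
    n_layers: int = 6
    n_heads: int = 6
    n_kv_heads: int = 6       # equal to n_heads, so no grouped-query sharing
    vocab_size: int = 32000   # Llama 2 SentencePiece tokenizer
    multiple_of: int = 32     # FFN hidden size is rounded up to a multiple of this
    max_seq_len: int = 256
    dropout: float = 0.0

args = ModelArgs()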

Training Hyperparameters

  • Batch Size: 128 (micro-batch)
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 512
  • Learning Rate: 5e-4 (max)
  • Learning Rate Schedule: Cosine decay with warmup
  • Warmup Iterations: 1,000
  • Total Training Iterations: 100,000
  • Weight Decay: 0.1
  • Beta1: 0.9
  • Beta2: 0.95
  • Gradient Clipping: 1.0
  • Optimizer: AdamW
  • Precision: bfloat16 (with mixed precision training)

Tokens per Iteration: 131,072 (4 grad accum × 1 process × 128 micro-batch × 256 seq len)
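
A minimal sketch of this setup in PyTorch, assuming a llama2.c-style training loop. The learning-rate floor (min_lr) is not stated in this card and is assumed to be 0 here, and a small Linear module stands in for the 15M-parameter Transformer so the snippet runs on its own.

import math
import torch

# Cosine decay with linear warmup, matching the schedule listed above.
def get_lr(it, max_lr=5e-4, min_lr=0.0, warmup_iters=1000, max_iters=100000):
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)

# AdamW with the betas and weight decay listed above (stand-in model for illustration).
model = torch.nn.Linear(288, 32000)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)

# One optimizer update = 4 accumulated micro-batches, then gradient clipping at 1.0.
grad_accum_steps, iter_num = 4, 0
for _ in range(grad_accum_steps):
    x = torch.randn(8, 288)                      # dummy micro-batch
    loss = model(x).mean()
    (loss / grad_accum_steps).backward()         # average gradients across micro-batches
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
for group in optimizer.param_groups:
    group["lr"] = get_lr(iter_num)               # set the scheduled learning rate
optimizer.step()
optimizer.zero_grad(set_to_none=True)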

Intended Use

This model is intended for:

  • Generating simple children's stories
  • Educational demonstrations of small-scale language model training
  • Research into emergent capabilities in small language models
  • Experimentation with efficient inference (e.g., pure C implementation)

Limitations

  • Domain-Specific: The model is trained exclusively on simple stories and will not perform well on general text generation tasks
  • Vocabulary: Limited to concepts and language appropriate for very young children
  • Context Length: Maximum sequence length of 256 tokens limits story length
  • No Instruction Following: This is a base model without instruction tuning

Training Data

The model was trained on the TinyStories dataset, which consists of short stories generated to contain only words that a typical 3-4 year old would understand. The dataset was created to study the capabilities of small language models.

Dataset Size: ~2.1M stories
Vocabulary: Words understandable by 3-4 year olds
Content: Simple narratives, common objects, basic emotions and actions
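
The corpus is available on the Hugging Face Hub. A minimal loading sketch, assuming the public roneneldan/TinyStories dataset (one story per record in a "text" field) matches the export used for training:

from datasets import load_dataset

# Assumed to match the training corpus; the card does not name a specific export.
stories = load_dataset("roneneldan/TinyStories", split="train")
print(len(stories))              # on the order of 2.1M stories
print(stories[0]["text"][:200])  # preview the first story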

Example Outputs

Prompt: "Once upon a time, there was a little girl named Lily."

Generation (temperature=0.8, top_p=0.9):

She loved to play outside in the park. One day, she saw a big, red ball. 
She wanted to play with it, but it was too high. Lily's mom said, "Let's 
go get it together!" They worked together and got the ball down. Lily was 
so happy! She played with the ball all day long.
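
To reproduce this kind of sample, a minimal generation sketch is shown below, assuming the safetensors checkpoint loads through transformers' standard Llama classes (if you are working with a llama2.c .bin export instead, use run.c from that repository).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sdobson/tinystories-llama-15m"   # this repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Once upon a time, there was a little girl named Lily."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                            temperature=0.8, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))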

Citation

If you use this model or the llama2.c implementation, please cite:

@misc{llama2c,
  author = {Andrej Karpathy},
  title = {llama2.c: Inference Llama 2 in one file of pure C},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/karpathy/llama2.c}
}

@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}

License

MIT License - See the LICENSE file for details.

Acknowledgments

  • Model architecture and training code adapted from llama2.c by Andrej Karpathy
  • Trained on the TinyStories dataset by Ronen Eldan and Yuanzhi Li
  • Based on the Llama 2 architecture by Meta AI