Lumees-362M
Model Description
Lumees-362M is a highly efficient 362M-parameter transformer optimized for educational content generation and creative writing. It reaches a validation perplexity of 5.47 on its in-domain validation set, a standout efficiency result for the 300M-parameter class.
Key Features
- 🎯 Domain Specialization: Exceptional performance in educational and creative content
- ⚡ Extreme Efficiency: 5.47 PPL with only 362M parameters (roughly 4-5x lower perplexity than similarly sized models such as GPT-2 Medium)
- 🏗️ Modern Architecture: RoPE positional encoding, RMSNorm, SwiGLU activation
- 📝 Superior Generation: Beautiful, coherent long-form text generation
- 🌍 Multilingual Tokenizer: 89-language capable tokenizer (250K vocabulary)
Model Architecture
- Architecture: RoPE Transformer
- Parameters: 362,318,784
- Hidden Size: 768
- Number of Layers: 24
- Number of Attention Heads: 12
- Head Dimension: 64
- Feed-Forward Dimension: 3072 (4x hidden size)
- Vocabulary Size: 250,000
- Max Sequence Length: 1024
- Position Encoding: Rotary Position Embedding (RoPE)
- Normalization: RMS Normalization
- Activation: SwiGLU
- Dropout: 0.0
- Weight Tying: Yes (embedding and lm_head)
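The normalization and activation choices above follow the now-common RoPE/RMSNorm/SwiGLU recipe. The snippet below is an illustrative PyTorch sketch of those two building blocks using the dimensions listed above; it is not the model's actual source code, and the exact projection layout in Lumees-362M may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the activations, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated linear unit followed by a down-projection."""
    def __init__(self, hidden_size: int = 768, ffn_size: int = 3072):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.down_proj = nn.Linear(ffn_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```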
Training Details
Training Data
- Domain: High-quality educational content, scientific materials, and creative writing
- Languages: Primarily English, with multilingual tokenizer support
- Quality: Tier-1, manually curated data
Training Results
- Validation PPL: 5.47
- Training PPL: 8.43
- Training Stability: Excellent (gradient norm ~0.4)
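Perplexity is the exponential of the mean per-token cross-entropy, so the 5.47 validation PPL corresponds to roughly 1.70 nats per token. Below is a minimal sketch of computing perplexity for a text sample, assuming the standard Hugging Face causal-LM loss convention (labels are shifted internally when passed to the model):

```python
import math
import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Return exp(mean cross-entropy per token) for a single text sample."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```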
Performance
Benchmarks
| Metric | Value | Comparison |
|---|---|---|
| Validation Perplexity | 5.47 | 4-5x lower than GPT-2 Medium |
| Parameters | 362M | Similar to GPT-2 Medium (355M) |
| Efficiency Ratio | 0.0151 PPL/M params (5.47 ÷ 362) | High efficiency |
Capabilities
- Educational Content: Strongest domain, with a final target of 3-4 PPL after further training
- Creative Writing: Beautiful narrative generation with sophisticated vocabulary
- Scientific Communication: Excellent at explaining complex concepts
- Character Development: Rich character interactions and dialogue
- Long-form Coherence: Maintains coherence across extended sequences
Usage
Direct Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lumees/lumees-362m-base")
model = AutoModelForCausalLM.from_pretrained("lumees/lumees-362m-base")

# Generate text
inputs = tokenizer("Once upon a time, in a bustling city where dreams came alive,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
Generation Parameters
Recommended settings (applied in the sketch below):
- Temperature: 0.7-0.9
- Top-p: 0.85-0.95
- Top-k: 40-60
- Repetition Penalty: 1.1
- Max Length: 200-600 tokens
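The sketch below plugs these recommended values into the standard transformers generate API, reusing the model, tokenizer, and inputs from the Direct Usage snippet; the specific values (0.8, 0.9, 50, 400) are one illustrative pick from the ranges above.

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=400,        # within the recommended 200-600 token range
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```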
Example Generations
Creative Storytelling
Prompt: "Once upon a time, in a bustling city where dreams came alive,"
Generation: "Once upon a time, in a bustling city where dreams came alive, there was an art gallery named 'Pink & Blue.' This quaint little store offered a diverse range of artwork from various artists and eras. Among them were two young friends - Mia, who loved exploring new styles and techniques, and Ben, passionate about capturing nature's beauty on canvas..."
Scientific Narrative
Prompt: "The scientist looked at the data in disbelief and whispered,"
Generation: "The scientist looked at the data in disbelief and whispered, 'This shows how certain conditions can lead to undesirable outcomes.' She then asked, 'But why does that matter if something causes you to get sick?' Mr. Wise Owl explained, 'You see, when we take care of our bodies, especially those living with infectious diseases...'"
Limitations
- Domain Focus: Optimized for educational/creative content; may underperform on general web text
- Context Length: Current limit of 1024 tokens (extension to 4096+ planned)
- Multilingual: While tokenizer supports 89 languages, model primarily trained on English
- Specialized Training: May require fine-tuning for domains outside educational/creative content
Ethical Considerations
Intended Use
- Educational content generation
- Creative writing assistance
- Science communication
- Research and academic applications
Limitations and Biases
- Training data focused on educational content may introduce domain-specific biases
- Model should not be used for generating harmful, toxic, or misleading content
- Outputs should be reviewed for accuracy, especially for factual claims
- Not suitable for high-stakes decision making without human oversight
Future Development
This model serves as the foundation for a planned scaling strategy:
- 724M Model: Multilingual expansion with general knowledge
- 1.4B Model: Global language coverage with advanced capabilities
- Context Extension: RoPE-based scaling to 4096-32768 tokens (one common approach is sketched below)
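RoPE-based context extension generally works by rescaling rotary positions or frequencies so that longer sequences map into the position range seen during training. The sketch below illustrates one common approach, linear position interpolation; it is illustrative only and not necessarily the method planned for Lumees.

```python
import torch

def rope_frequencies(head_dim: int = 64, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for one attention head."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(seq_len: int, trained_len: int = 1024, head_dim: int = 64) -> torch.Tensor:
    """Rotation angles with linear position interpolation: positions are squeezed
    back into the trained range when seq_len exceeds it (e.g. 1024/4096 = 0.25)."""
    scale = min(1.0, trained_len / seq_len)
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, rope_frequencies(head_dim))
```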
Citation
If you use this model in your research, please cite:
```bibtex
@misc{lumees362m2025,
  title={Lumees-362M: Efficient Domain-Specialized Language Model},
  author={Hasan KURŞUN and Kerem Berkay YANIK},
  year={2025},
  note={Achieving 5.47 PPL with 362M parameters through strategic domain specialization},
  url={lumees.io}
}
```
Model Card Authors
- Developed by: Hasan KURŞUN, Kerem Berkay YANIK
- Model Type: Causal Language Model
- Language: English (primary), 89-language tokenizer support
- License: Apache 2.0
- Contact: hello@lumees.io