Lumees-362M
Model Description
Lumees-362M is a highly efficient 362M-parameter transformer optimized for educational content generation and creative writing. It reaches a validation perplexity of 5.47 on its in-domain validation set, a standout efficiency result for the 300M-parameter class.
Key Features
- 🎯 Domain Specialization: Exceptional performance in educational and creative content
- ⚡ Extreme Efficiency: 5.47 PPL with only 362M parameters (roughly 4-5x lower perplexity than similarly sized models such as GPT-2 Medium)
- 🏗️ Modern Architecture: RoPE positional encoding, RMSNorm, SwiGLU activation
- 📝 Superior Generation: Beautiful, coherent long-form text generation
- 🌍 Multilingual Tokenizer: 89-language capable tokenizer (250K vocabulary)
Model Architecture
- Architecture: RoPE Transformer
- Parameters: 362,318,784
- Hidden Size: 768
- Number of Layers: 24
- Number of Attention Heads: 12
- Head Dimension: 64
- Feed-Forward Dimension: 3072 (4x hidden size)
- Vocabulary Size: 250,000
- Max Sequence Length: 1024
- Position Encoding: Rotary Position Embedding (RoPE)
- Normalization: RMS Normalization
- Activation: SwiGLU
- Dropout: 0.0
- Weight Tying: Yes (embedding and lm_head)
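The normalization and activation choices above follow the now-common RoPE/RMSNorm/SwiGLU recipe. The snippet below is an illustrative PyTorch sketch of those two building blocks using the dimensions listed above; it is not the model's actual source code, and the exact projection layout in Lumees-362M may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales by the RMS of the activations, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated linear unit followed by a down-projection."""
    def __init__(self, hidden_size: int = 768, ffn_size: int = 3072):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_size, bias=False)
        self.down_proj = nn.Linear(ffn_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```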
Training Details
Training Data
- Domain: High-quality educational content, scientific materials, and creative writing
- Languages: Primarily English, with multilingual tokenizer support
- Quality: Tier-1, manually curated data
Training Results
- Validation PPL: 5.47
- Training PPL: 8.43
- Training Stability: Excellent (gradient norm ~0.4)
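Perplexity is the exponential of the mean per-token cross-entropy, so the 5.47 validation PPL corresponds to roughly 1.70 nats per token. Below is a minimal sketch of computing perplexity for a text sample, assuming the standard Hugging Face causal-LM loss convention (labels are shifted internally when passed to the model):

```python
import math
import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Return exp(mean cross-entropy per token) for a single text sample."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```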
Performance
Benchmarks
| Metric | Value | Comparison |
|---|---|---|
| Validation Perplexity | 5.47 | 4-5x lower than GPT-2 Medium |
| Parameters | 362M | Similar to GPT-2 Medium (355M) |
| Efficiency Ratio | 0.0151 PPL/M params (5.47 ÷ 362) | High efficiency |
Capabilities
- Educational Content: Strongest domain, with a final target of 3-4 PPL after further training
- Creative Writing: Beautiful narrative generation with sophisticated vocabulary
- Scientific Communication: Excellent at explaining complex concepts
- Character Development: Rich character interactions and dialogue
- Long-form Coherence: Maintains coherence across extended sequences
Usage
Direct Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lumees/lumees-362m-base")
model = AutoModelForCausalLM.from_pretrained("lumees/lumees-362m-base")

# Generate text
inputs = tokenizer("Once upon a time, in a bustling city where dreams came alive,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
Generation Parameters
Recommended settings (applied in the sketch below):
- Temperature: 0.7-0.9
- Top-p: 0.85-0.95
- Top-k: 40-60
- Repetition Penalty: 1.1
- Max Length: 200-600 tokens
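The sketch below plugs these recommended values into the standard transformers generate API, reusing the model, tokenizer, and inputs from the Direct Usage snippet; the specific values (0.8, 0.9, 50, 400) are one illustrative pick from the ranges above.

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=400,        # within the recommended 200-600 token range
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```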
Example Generations
Creative Storytelling
Prompt: "Once upon a time, in a bustling city where dreams came alive,"
Generation: "Once upon a time, in a bustling city where dreams came alive, there was an art gallery named 'Pink & Blue.' This quaint little store offered a diverse range of artwork from various artists and eras. Among them were two young friends - Mia, who loved exploring new styles and techniques, and Ben, passionate about capturing nature's beauty on canvas..."
Scientific Narrative
Prompt: "The scientist looked at the data in disbelief and whispered,"
Generation: "The scientist looked at the data in disbelief and whispered, 'This shows how certain conditions can lead to undesirable outcomes.' She then asked, 'But why does that matter if something causes you to get sick?' Mr. Wise Owl explained, 'You see, when we take care of our bodies, especially those living with infectious diseases...'"
Limitations
- Domain Focus: Optimized for educational/creative content; may underperform on general web text
- Context Length: Current limit of 1024 tokens (extension to 4096+ planned)
- Multilingual: While tokenizer supports 89 languages, model primarily trained on English
- Specialized Training: May require fine-tuning for domains outside educational/creative content
Ethical Considerations
Intended Use
- Educational content generation
- Creative writing assistance
- Science communication
- Research and academic applications
Limitations and Biases
- Training data focused on educational content may introduce domain-specific biases
- Model should not be used for generating harmful, toxic, or misleading content
- Outputs should be reviewed for accuracy, especially for factual claims
- Not suitable for high-stakes decision making without human oversight
Future Development
This model serves as the foundation for a planned scaling strategy:
- 724M Model: Multilingual expansion with general knowledge
- 1.4B Model: Global language coverage with advanced capabilities
- Context Extension: RoPE-based scaling to 4096-32768 tokens (one common approach is sketched below)
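RoPE-based context extension generally works by rescaling rotary positions or frequencies so that longer sequences map into the position range seen during training. The sketch below illustrates one common approach, linear position interpolation; it is illustrative only and not necessarily the method planned for Lumees.

```python
import torch

def rope_frequencies(head_dim: int = 64, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for one attention head."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(seq_len: int, trained_len: int = 1024, head_dim: int = 64) -> torch.Tensor:
    """Rotation angles with linear position interpolation: positions are squeezed
    back into the trained range when seq_len exceeds it (e.g. 1024/4096 = 0.25)."""
    scale = min(1.0, trained_len / seq_len)
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, rope_frequencies(head_dim))
```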
Citation
If you use this model in your research, please cite:
```bibtex
@misc{lumees362m2025,
  title={Lumees-362M: Efficient Domain-Specialized Language Model},
  author={Hasan KURŞUN and Kerem Berkay YANIK},
  year={2025},
  note={Achieving 5.47 PPL with 362M parameters through strategic domain specialization},
  url={lumees.io}
}
```
Model Card Authors
- Developed by: Hasan KURŞUN, Kerem Berkay YANIK
- Model Type: Causal Language Model
- Language: English (primary), 89-language tokenizer support
- License: Apache 2.0
- Contact: hello@lumees.io