Lumees-362M

Model Description

Lumees-362M is a 362M-parameter transformer model optimized for educational content generation and creative writing. It reports a validation perplexity of 5.47 on its in-domain validation set, a notably strong efficiency result for the ~360M-parameter class.

Key Features

  • 🎯 Domain Specialization: Exceptional performance in educational and creative content
  • ⚡ Extreme Efficiency: 5.47 PPL with only 362M parameters, substantially lower perplexity per parameter than comparable general-purpose models on in-domain data
  • 🏗️ Modern Architecture: RoPE positional encoding, RMSNorm, SwiGLU activation
  • 📝 Superior Generation: Beautiful, coherent long-form text generation
  • 🌍 Multilingual Tokenizer: 89-language capable tokenizer (250K vocabulary)

Model Architecture

Architecture: RoPE Transformer
Parameters: 362,318,784
Hidden Size: 768
Number of Layers: 24
Number of Attention Heads: 12
Head Dimension: 64
Feed Forward Dimension: 3072 (4x hidden size)
Vocabulary Size: 250,000
Max Sequence Length: 1024
Position Encoding: Rotary Position Embedding (RoPE)
Normalization: RMS Normalization
Activation: SwiGLU
Dropout: 0.0
Weight Tying: Yes (embedding and lm_head)
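
The stack described above follows the now-common RoPE + RMSNorm + SwiGLU decoder recipe. As an illustration only, below is a minimal PyTorch sketch of one such decoder block using the listed hyperparameters; the module names and implementation details are assumptions and do not necessarily match the released checkpoint's state dict.

# Illustrative sketch only: a decoder block with RMSNorm, SwiGLU and RoPE,
# using the hyperparameters listed above. Names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, HEADS, HEAD_DIM, FFN_DIM = 768, 12, 64, 3072

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps
    def forward(self, x):
        # Normalize by root-mean-square only: no mean-centering, no bias.
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by position-dependent angles.
    b, h, t, d = x.shape
    pos = torch.arange(t, device=x.device, dtype=torch.float32)
    freqs = base ** (-torch.arange(0, d, 2, device=x.device, dtype=torch.float32) / d)
    angles = torch.outer(pos, freqs)              # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., ::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., ::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class DecoderBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn_norm = RMSNorm(HIDDEN)
        self.qkv = nn.Linear(HIDDEN, 3 * HIDDEN, bias=False)
        self.attn_out = nn.Linear(HIDDEN, HIDDEN, bias=False)
        self.ffn_norm = RMSNorm(HIDDEN)
        # SwiGLU: gate and up projections, elementwise product, then down projection.
        self.gate = nn.Linear(HIDDEN, FFN_DIM, bias=False)
        self.up = nn.Linear(HIDDEN, FFN_DIM, bias=False)
        self.down = nn.Linear(FFN_DIM, HIDDEN, bias=False)
    def forward(self, x):
        b, t, _ = x.shape
        h = self.attn_norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = rope(q.view(b, t, HEADS, HEAD_DIM).transpose(1, 2))
        k = rope(k.view(b, t, HEADS, HEAD_DIM).transpose(1, 2))
        v = v.view(b, t, HEADS, HEAD_DIM).transpose(1, 2)
        a = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.attn_out(a.transpose(1, 2).reshape(b, t, HIDDEN))
        h = self.ffn_norm(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))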

Training Details

Training Data

  • Domain: High-quality educational content, scientific materials, and creative writing
  • Languages: Primarily English with multilingual tokenizer support
  • Quality: Tier 1 exceptional quality with manual curation

Training Results

  • Validation PPL: 5.47
  • Training PPL: 8.43
  • Training Stability: Excellent (gradient norm ~0.4)
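
Perplexity here is the exponential of the mean per-token cross-entropy loss on the validation split. A minimal sketch of how such a number could be reproduced with the released checkpoint is shown below; the actual validation corpus is not distributed with this card, so `val_texts` is a placeholder.

# Sketch: perplexity as exp(mean cross-entropy) over held-out text.
# `val_texts` is a placeholder; the real validation set is not included with this card.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lumees/lumees-362m-base")
model = AutoModelForCausalLM.from_pretrained("lumees/lumees-362m-base").eval()

val_texts = ["Photosynthesis converts light energy into chemical energy stored in glucose."]

losses, counts = [], []
with torch.no_grad():
    for text in val_texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
        out = model(**enc, labels=enc["input_ids"])
        n = enc["input_ids"].shape[1] - 1   # number of predicted (shifted) tokens
        losses.append(out.loss.item() * n)  # loss is a per-token mean; re-weight by token count
        counts.append(n)

ppl = math.exp(sum(losses) / sum(counts))
print(f"Perplexity: {ppl:.2f}")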

Performance

Benchmarks

| Metric                | Value               | Comparison                     |
|-----------------------|---------------------|--------------------------------|
| Validation Perplexity | 5.47                | 4-5x better than GPT-2 Medium  |
| Parameters            | 362M                | Similar to GPT-2 Medium (355M) |
| Efficiency Ratio      | 0.0166 PPL/M params | High efficiency                |

Capabilities

  • Educational Content: World-class performance (targeting 3-4 PPL after further training)
  • Creative Writing: Beautiful narrative generation with sophisticated vocabulary
  • Scientific Communication: Excellent at explaining complex concepts
  • Character Development: Rich character interactions and dialogue
  • Long-form Coherence: Maintains coherence across extended sequences

Usage

Direct Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lumees/lumees-362m-base")
model = AutoModelForCausalLM.from_pretrained("lumees/lumees-362m-base")

# Generate text
inputs = tokenizer("Once upon a time, in a bustling city where dreams came alive,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=200,          # total length in tokens, including the prompt
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Generation Parameters

Recommended Settings:
  Temperature: 0.7-0.9
  Top-p: 0.85-0.95
  Top-k: 40-60
  Repetition Penalty: 1.1
  Max Length: 200-600 tokens
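
As a concrete example, these settings map directly onto `generate()` keyword arguments. The snippet below reuses `model`, `tokenizer`, and `inputs` from the Direct Usage section; the specific values are one point within the recommended ranges.

# Recommended settings applied via generate() keyword arguments.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,          # 0.7-0.9
    top_p=0.9,                # 0.85-0.95
    top_k=50,                 # 40-60
    repetition_penalty=1.1,
    max_new_tokens=400,       # 200-600 tokens of newly generated text
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))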

Example Generations

Creative Storytelling

Prompt: "Once upon a time, in a bustling city where dreams came alive,"

Generation: "Once upon a time, in a bustling city where dreams came alive, there was an art gallery named 'Pink & Blue.' This quaint little store offered a diverse range of artwork from various artists and eras. Among them were two young friends - Mia, who loved exploring new styles and techniques, and Ben, passionate about capturing nature's beauty on canvas..."

Scientific Narrative

Prompt: "The scientist looked at the data in disbelief and whispered,"

Generation: "The scientist looked at the data in disbelief and whispered, 'This shows how certain conditions can lead to undesirable outcomes.' She then asked, 'But why does that matter if something causes you to get sick?' Mr. Wise Owl explained, 'You see, when we take care of our bodies, especially those living with infectious diseases...'"

Limitations

  • Domain Focus: Optimized for educational/creative content; may underperform on general web text
  • Context Length: Current limit of 1024 tokens (extension to 4096+ planned)
  • Multilingual: While tokenizer supports 89 languages, model primarily trained on English
  • Specialized Training: May require fine-tuning for domains outside educational/creative content

Ethical Considerations

Intended Use

  • Educational content generation
  • Creative writing assistance
  • Science communication
  • Research and academic applications

Limitations and Biases

  • Training data focused on educational content may introduce domain-specific biases
  • Model should not be used for generating harmful, toxic, or misleading content
  • Outputs should be reviewed for accuracy, especially for factual claims
  • Not suitable for high-stakes decision making without human oversight

Future Development

This model serves as the foundation for a planned scaling strategy:

  • 724M Model: Multilingual expansion with general knowledge
  • 1.4B Model: Global language coverage with advanced capabilities
  • Context Extension: RoPE-based scaling to 4096-32768 tokens (one common approach is sketched below)
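
The card does not specify which extension method is planned. One widely used option for RoPE models is position interpolation (Chen et al., 2023), where position indices are rescaled so that a longer context reuses the angle range seen during training. The sketch below illustrates the idea only and is not necessarily the approach the authors intend.

# Sketch of RoPE position interpolation: stretch a model trained at 1024 tokens
# to a longer context by compressing position indices into the trained range.
# Illustrative only; not necessarily the Lumees roadmap's method.
import torch

def rope_angles(seq_len, head_dim, base=10000.0, trained_len=1024):
    # Scale positions so seq_len positions span the same angle range as trained_len.
    scale = min(1.0, trained_len / seq_len)
    pos = torch.arange(seq_len, dtype=torch.float32) * scale
    freqs = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    return torch.outer(pos, freqs)  # (seq_len, head_dim // 2)

# At 4096 tokens, positions are compressed 4x into the trained 0..1024 range.
angles = rope_angles(4096, 64)
print(angles.shape)  # torch.Size([4096, 32])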

Citation

If you use this model in your research, please cite:

@misc{lumees362m2025,
  title={Lumees-362M: Efficient Domain-Specialized Language Model},
  author={Hasan KURŞUN and Kerem Berkay YANIK},
  year={2025},
  note={Achieving 5.47 PPL with 362M parameters through strategic domain specialization},
  url={lumees.io}
}

Model Card Authors

  • Developed by: Hasan KURŞUN, Kerem Berkay YANIK
  • Model Type: Causal Language Model
  • Language: English (primary), 89-language tokenizer support
  • License: Apache 2.0
  • Contact: hello@lumees.io
