Dense-5L-ArXiv-Code-SimpleStories

Model Description

This is a 5-layer dense transformer model trained on a combination of ArXiv papers, code repositories, and the SimpleStories dataset. It uses a standard decoder-only transformer architecture for causal language modeling.

Model Details

Architecture

  • Model Type: Dense Transformer for Causal Language Modeling
  • Architecture: DenseTransformerForCausalLM
  • Parameters: ~50M parameters
  • Layers: 5 transformer layers
  • Hidden Size: 768
  • Attention Heads: 12 (with 8 key-value heads for efficiency)
  • Vocabulary Size: 50,256 tokens
  • Max Sequence Length: 1024 tokens
  • Context Window: 512 tokens (with windowing support)
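
The sizes listed above imply a few derived quantities. The following is a minimal sketch of that arithmetic, assuming the common convention that the per-head dimension equals the hidden size divided by the number of query heads and that the token embedding is a single vocabulary-by-hidden table; neither assumption is confirmed by this card, and the card does not say how 12 query heads are grouped over 8 key-value heads.

```python
# Values taken from the architecture list above.
hidden_size = 768
num_query_heads = 12
num_kv_heads = 8
vocab_size = 50_256

# Assumption: head_dim = hidden_size / num_query_heads (common convention).
head_dim = hidden_size // num_query_heads   # 64
q_width = num_query_heads * head_dim        # 768
kv_width = num_kv_heads * head_dim          # 512

# Assumption: one embedding table of shape (vocab_size, hidden_size).
embedding_params = vocab_size * hidden_size  # 38,596,608

print(head_dim, q_width, kv_width, embedding_params)
```

Under these assumptions the key and value projections are two thirds the width of the query projection, and the embedding table accounts for most of the stated ~50M parameters.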

Training Details

  • Training Data: ArXiv papers, code repositories, and SimpleStories
  • Training Epochs: 1
  • Batch Size: 256
  • Learning Rate: 1e-3
  • Optimizer: AdamW (β1=0.9, β2=0.999)
  • Dropout: 0.1 (attention and hidden layers)
  • Normalization: RMSNorm (ε=1e-6)
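
For readers who want to reproduce the optimizer settings, the following is a hedged sketch of how the listed hyperparameters map onto PyTorch's `torch.optim.AdamW`. The `stand_in` module is a hypothetical placeholder for the real model, and weight decay is not stated in this card, so it is left at the library default.

```python
import torch
from torch import nn

# Hypothetical stand-in for the real model; only used to supply parameters.
stand_in = nn.Linear(768, 768)

# Learning rate and betas come from the training details above.
# Weight decay is not given in the card, so the PyTorch default is used.
optimizer = torch.optim.AdamW(
    stand_in.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
)
```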

Model Features

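  • Rotary Position Embeddings (RoPE): Encode token positions by rotating query and key vectors, so attention scores depend on relative position
  • Grouped-Query Attention: 12 query heads share 8 key-value heads, which reduces the size of the key-value projections and cache
  • SwiGLU Activation: Gated (SiLU-based) activation in the feed-forward layers
  • RMSNorm: Root-mean-square normalization (ε=1e-6), a lighter-weight alternative to LayerNorm, for improved training stability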
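
To make the features listed above concrete, here is a minimal pure-Python sketch of the element-wise math behind RMSNorm, SwiGLU gating, and a rotary rotation of one feature pair. It illustrates the standard formulations only; the function names are hypothetical and this is not the model's actual implementation, which operates on tensors and includes learned projections.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu_gate(gate, up):
    """Element-wise part of SwiGLU: silu(gate) * up.
    In a full feed-forward layer, gate and up are two linear projections
    of the same input, and the result feeds a third (down) projection."""
    return [silu(g) * u for g, u in zip(gate, up)]

def rotate_pair(x0, x1, position, freq):
    """Rotary embedding for one feature pair: rotate (x0, x1) by position * freq."""
    angle = position * freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c

normed = rms_norm([3.0, 4.0], [1.0, 1.0])
gated = swiglu_gate([0.0, 2.0], [5.0, 1.0])
rotated = rotate_pair(3.0, 4.0, position=7, freq=0.1)
```

The rotation preserves the length of each feature pair, which is why rotary embeddings change only the relative angle between query and key vectors.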

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "your-username/dense-5l-arxiv-code-simplestories"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

Text Generation

# Generate text
prompt = "The fundamental theorem of calculus states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # match the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=200,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Code Generation

# Generate Python code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # match the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=150,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)

Intended Use

Primary Use Cases

  • Research: Academic research in natural language processing and code generation
  • Educational: Learning about transformer architectures and language modeling
  • Prototyping: Building applications that require text and code generation capabilities

Suitable Tasks

  • Text completion and generation
  • Code completion and synthesis
  • Story generation
  • Academic writing assistance
  • Programming tutorials and explanations

Limitations and Biases

Known Limitations

  • Context Length: Maximum sequence length of 1024 tokens (prompt plus generated text)
  • Model Size: At ~50M parameters, the model holds far less knowledge than larger models
  • Training Data: Performance dependent on the quality and coverage of training datasets
  • Arithmetic: May struggle with complex mathematical calculations
  • Factual Accuracy: May generate plausible but incorrect information
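
One practical consequence of the context-length limit is that long prompts need trimming before generation. The helper below is a hypothetical sketch, not part of this model or of the Transformers library: it keeps the most recent prompt tokens so that the prompt plus the requested new tokens fits within 1024 tokens.

```python
def fit_to_context(token_ids, max_new_tokens, max_length=1024):
    """Keep the most recent prompt tokens so prompt + generation fits in max_length."""
    budget = max_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than max_length")
    # Slicing from the end keeps the whole list when it is already short enough.
    return token_ids[-budget:]

long_prompt = list(range(2000))  # stand-in for real token ids
trimmed = fit_to_context(long_prompt, max_new_tokens=200)
```

Keeping the tail rather than the head is a design choice that suits completion tasks, where the text nearest the cursor matters most.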

Potential Biases

  • Dataset Bias: Reflects biases present in ArXiv papers, code repositories, and SimpleStories
  • Language Bias: Primarily trained on English text
  • Domain Bias: May perform better on academic and programming content than general conversation

Training Data

The model was trained on a curated dataset combining:

  1. ArXiv Papers: Academic papers covering various scientific disciplines
  2. Code Repositories: Open-source code from various programming languages and projects
  3. SimpleStories: Simplified narrative text for improving text generation capabilities

Evaluation

Performance Metrics

  • Perplexity: [Add your perplexity scores]
  • BLEU Score: [Add BLEU scores for code generation]
  • Human Evaluation: [Add human evaluation results]

Benchmark Results

[Add your benchmark results here, e.g.:]
- HumanEval: XX/100
- MBPP: XX/100
- HellaSwag: XX.X%
- PIQA: XX.X%

Environmental Impact

  • Training Time: [Add training duration]
  • Hardware: [Add hardware specifications]
  • Carbon Footprint: [Add estimated carbon footprint if available]

Technical Specifications

Hardware Requirements

  • Minimum RAM: 4GB for inference
  • Recommended GPU: NVIDIA GTX 1080 or better
  • CPU: Modern multi-core processor
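
As a rough check on these requirements, the weight memory can be estimated from the parameter count. This is a back-of-the-envelope sketch assuming exactly 50M parameters (the card says ~50M) stored in float32; it covers weights only and ignores activations, the key-value cache, and framework overhead.

```python
approx_params = 50_000_000   # assumption: the card's "~50M" taken as exactly 50M
bytes_per_param = 4          # float32, as in the loading example

weight_bytes = approx_params * bytes_per_param
weight_megabytes = weight_bytes / 1_000_000

print(f"Approximate weight memory: {weight_megabytes:.0f} MB")
```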

Software Requirements

  • Python 3.8+
  • PyTorch 1.11+
  • Transformers 4.20+
  • CUDA 11.0+ (for GPU acceleration)

Citation

@misc{dense5l2024,
  title={Dense-5L-ArXiv-Code-SimpleStories: A Compact Transformer for Multi-Domain Text Generation},
  author={[Your Name]},
  year={2024},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/your-username/dense-5l-arxiv-code-simplestories}
}

License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.

Model Card Authors

Pranav Karra - pranavkarra001@gmail.com

Contact

For questions or issues regarding this model, please contact the model card author listed above.


Disclaimer: This model is provided for research and educational purposes. Users should be aware of potential biases and limitations when using this model in applications.
