Dense-5L-ArXiv-Code-SimpleStories

Model Description

This is a 5-layer dense transformer model trained on a combination of ArXiv papers, code repositories, and the SimpleStories dataset. The model uses a standard transformer architecture optimized for causal language modeling.

Model Details

Architecture

  • Model Type: Dense Transformer for Causal Language Modeling
  • Architecture: DenseTransformerForCausalLM
  • Parameters: ~50M
  • Layers: 5 transformer layers
  • Hidden Size: 768
  • Attention Heads: 12 (with 8 key-value heads for efficiency)
  • Vocabulary Size: 50,256 tokens
  • Max Sequence Length: 1024 tokens
  • Context Window: 512 tokens (with windowing support)
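
The figures above are enough to estimate the size of the key-value cache during generation. The sketch below is a back-of-the-envelope calculation, not the model's own code: it assumes the per-head dimension equals hidden size divided by the number of query heads, and that keys and values are cached for every layer in float32.

```python
# Values taken from the architecture list above.
NUM_LAYERS = 5
HIDDEN_SIZE = 768
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 8
MAX_SEQ_LEN = 1024

# Assumption: head_dim = hidden_size / num_query_heads.
HEAD_DIM = HIDDEN_SIZE // NUM_QUERY_HEADS


def kv_cache_bytes(seq_len: int, bytes_per_value: int = 4, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for seq_len tokens."""
    # The factor 2 accounts for storing both keys and values.
    values_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM
    return batch_size * seq_len * values_per_token * bytes_per_value


if __name__ == "__main__":
    full = kv_cache_bytes(MAX_SEQ_LEN)
    print(f"head_dim = {HEAD_DIM}")
    print(f"KV cache at {MAX_SEQ_LEN} tokens (float32): {full / 2**20:.1f} MiB")
```

Under these assumptions the cache scales linearly with the number of key-value heads, which is where the saving from using 8 instead of 12 comes from.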

Training Details

  • Training Data: ArXiv papers, code repositories, and SimpleStories
  • Training Epochs: 1
  • Batch Size: 256
  • Learning Rate: 1e-3
  • Optimizer: AdamW (β1=0.9, β2=0.999)
  • Dropout: 0.1 (attention and hidden layers)
  • Normalization: RMSNorm (ε=1e-6)
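
To make the normalization setting concrete, here is a minimal pure-Python sketch of RMSNorm using the ε listed above. It illustrates the general technique only; the function name and the optional gain argument are hypothetical, and the model's actual implementation may differ in detail.

```python
import math
from typing import List, Optional

EPS = 1e-6  # epsilon from the training details above


def rms_norm(x: List[float], weight: Optional[List[float]] = None, eps: float = EPS) -> List[float]:
    """Divide x by its root mean square, then apply an elementwise gain."""
    if weight is None:
        weight = [1.0] * len(x)  # no gain applied
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0]))
```

Unlike LayerNorm, this does not subtract the mean or add a bias; it only rescales the vector so that its root mean square is close to 1.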

Model Features

  • Rotary Position Embeddings: Encode token positions by rotating query and key vectors
  • Grouped-Query Attention: Efficient attention with 12 query heads and 8 key-value heads
  • SwiGLU Activation: Gated activation function in the feed-forward layers
  • RMSNorm: Root-mean-square normalization for improved training stability
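
As an illustration of the first feature, the following sketch applies a rotary position embedding to a single head vector. It assumes the interleaved-pair convention and a frequency base of 10000, neither of which is confirmed for this model; `apply_rope` and `dot` are hypothetical helper names.

```python
import math
from typing import List


def apply_rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.4]
    # Both pairs of positions are 2 apart, so the scores should match.
    print(dot(apply_rope(q, 3), apply_rope(k, 1)))
    print(dot(apply_rope(q, 12), apply_rope(k, 10)))
```

Because each pair is only rotated, vector norms are unchanged and the query-key score depends on the distance between the two positions rather than on their absolute values.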

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "your-username/dense-5l-arxiv-code-simplestories"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto",  # automatic device placement; requires the `accelerate` package
)

Text Generation

# Generate text
prompt = "The fundamental theorem of calculus states that"
# Move the input tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=200,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Code Generation

# Generate Python code
prompt = "def fibonacci(n):"
# Move the input tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=150,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
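
The two examples use different temperatures (0.7 for prose, 0.2 for code). The sketch below shows what that setting does to a next-token distribution: logits are divided by the temperature before the softmax. The logits are made up for illustration, and `softmax_with_temperature` is a hypothetical helper, not part of the Transformers API.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (1.0, 0.7, 0.2):
        probs = softmax_with_temperature(logits, t)
        print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the highest-scoring token, which tends to suit code, where a single wrong token can break the output.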

Intended Use

Primary Use Cases

  • Research: Academic research in natural language processing and code generation
  • Educational: Learning about transformer architectures and language modeling
  • Prototyping: Building applications that require text and code generation capabilities

Suitable Tasks

  • Text completion and generation
  • Code completion and synthesis
  • Story generation
  • Academic writing assistance
  • Programming tutorials and explanations

Limitations and Biases

Known Limitations

  • Context Length: Prompt and generated text together cannot exceed the 1024-token maximum sequence length
  • Model Size: At ~50M parameters, the model has limited knowledge compared to larger models
  • Training Data: Performance dependent on the quality and coverage of training datasets
  • Arithmetic: May struggle with complex mathematical calculations
  • Factual Accuracy: May generate plausible but incorrect information
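
One practical way to work within the context-length limit is to trim long prompts before generation. The helper below is a hypothetical sketch, not part of the model's code: it keeps the most recent tokens and reserves room for the tokens to be generated.

```python
from typing import List

MAX_SEQ_LEN = 1024  # maximum sequence length from the limitations above


def fit_prompt(token_ids: List[int], max_new_tokens: int, max_seq_len: int = MAX_SEQ_LEN) -> List[int]:
    """Keep the most recent tokens so that prompt + generation fits in max_seq_len."""
    if not 0 < max_new_tokens < max_seq_len:
        raise ValueError("max_new_tokens must be between 1 and max_seq_len - 1")
    budget = max_seq_len - max_new_tokens  # tokens available for the prompt
    return token_ids[-budget:]


if __name__ == "__main__":
    long_prompt = list(range(2000))  # stand-in for tokenizer output
    trimmed = fit_prompt(long_prompt, max_new_tokens=200)
    print(len(trimmed))
```

Dropping the oldest tokens is only one possible policy; for some tasks it may be better to keep the start of the prompt, for example when it contains instructions.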

Potential Biases

  • Dataset Bias: Reflects biases present in ArXiv papers, code repositories, and SimpleStories
  • Language Bias: Primarily trained on English text
  • Domain Bias: May perform better on academic and programming content than general conversation

Training Data

The model was trained on a curated dataset combining:

  1. ArXiv Papers: Academic papers covering various scientific disciplines
  2. Code Repositories: Open-source code from various programming languages and projects
  3. SimpleStories: Simplified narrative text for improving text generation capabilities

Evaluation

Performance Metrics

  • Perplexity: [Add your perplexity scores]
  • BLEU Score: [Add BLEU scores for code generation]
  • Human Evaluation: [Add human evaluation results]
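
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities; the input values are invented for illustration and are not results for this model.

```python
import math
from typing import List


def perplexity(token_log_probs: List[float]) -> float:
    """Return exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Hypothetical case: the model assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 4))
```

A model that assigns probability 0.25 to every token is as uncertain as a uniform choice among four options, so its perplexity is 4.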

Benchmark Results

[Add your benchmark results here, e.g.:]
- HumanEval: XX/100
- MBPP: XX/100
- HellaSwag: XX.X%
- PIQA: XX.X%

Environmental Impact

  • Training Time: [Add training duration]
  • Hardware: [Add hardware specifications]
  • Carbon Footprint: [Add estimated carbon footprint if available]

Technical Specifications

Hardware Requirements

  • Minimum RAM: 4GB for inference
  • Recommended GPU: NVIDIA GTX 1080 or better
  • CPU: Modern multi-core processor
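
As a rough sanity check on these requirements, the weights alone take up the parameter count times the bytes per parameter. The sketch below uses the approximate "~50M" figure from the architecture list; activations, the KV cache, and framework overhead are not included.

```python
APPROX_PARAMS = 50_000_000  # "~50M" from the architecture list (approximate)


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the weights alone."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    for name, size in (("float32", 4), ("float16", 2)):
        mib = weight_bytes(APPROX_PARAMS, size) / 2**20
        print(f"{name}: {mib:.1f} MiB")
```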

Software Requirements

  • Python 3.8+
  • PyTorch 1.11+
  • Transformers 4.20+
  • CUDA 11.0+ (for GPU acceleration)

Citation

@misc{dense5l2024,
  title={Dense-5L-ArXiv-Code-SimpleStories: A Compact Transformer for Multi-Domain Text Generation},
  author={[Your Name]},
  year={2024},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/your-username/dense-5l-arxiv-code-simplestories}
}

License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.

Model Card Authors

Pranav Karra - pranavkarra001@gmail.com

Contact

For questions or issues regarding this model, please contact the model card author listed above.

Disclaimer: This model is provided for research and educational purposes. Users should be aware of potential biases and limitations when using this model in applications.
