Dense-5L-ArXiv-Code-SimpleStories

Model Description

This is a 5-layer dense transformer model trained on a combination of ArXiv papers, code repositories, and the SimpleStories dataset. It uses a standard transformer architecture for causal language modeling.

Model Details

Architecture

Model Type: Dense Transformer for Causal Language Modeling
Architecture: DenseTransformerForCausalLM
Parameters: ~50M
Layers: 5 transformer layers
Hidden Size: 768
Attention Heads: 12 (with 8 key-value heads for efficiency)
Vocabulary Size: 50,256 tokens
Max Sequence Length: 1024 tokens
Context Window: 512 tokens (with windowing support)
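The attention figures above imply some simple shape arithmetic, sketched below. This is an illustration under stated assumptions, not the model's actual configuration code: it assumes the per-head dimension is the hidden size divided by the number of query heads, and that keys and values are cached in float32. The card does not describe how 12 query heads are mapped onto 8 key-value heads, so the sketch only compares cache sizes.

```python
# Shape arithmetic for the attention configuration listed in the card.
# Assumption: head_dim = hidden_size / num_query_heads (not stated in the card).
HIDDEN_SIZE = 768
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 8
NUM_LAYERS = 5
MAX_SEQ_LEN = 1024
BYTES_PER_VALUE = 4  # float32 (assumption)

head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS  # 64
kv_width = NUM_KV_HEADS * head_dim         # 512: output width of each K/V projection


def kv_cache_bytes(num_kv_heads: int, seq_len: int) -> int:
    """Bytes needed to cache keys and values for one sequence across all layers."""
    return 2 * NUM_LAYERS * num_kv_heads * head_dim * seq_len * BYTES_PER_VALUE


gqa_bytes = kv_cache_bytes(NUM_KV_HEADS, MAX_SEQ_LEN)     # 8 key-value heads
mha_bytes = kv_cache_bytes(NUM_QUERY_HEADS, MAX_SEQ_LEN)  # one KV head per query head

print(f"head_dim={head_dim}, K/V projection width={kv_width}")
print(f"KV cache at {MAX_SEQ_LEN} tokens: {gqa_bytes / 2**20:.1f} MiB with 8 KV heads, "
      f"{mha_bytes / 2**20:.1f} MiB with 12 KV heads")
```

Under these assumptions, using 8 key-value heads instead of 12 reduces the key-value cache by one third.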

Training Details

Training Data: ArXiv papers, code repositories, and SimpleStories
Training Epochs: 1
Batch Size: 256
Learning Rate: 1e-3
Optimizer: AdamW (β1=0.9, β2=0.999)
Dropout: 0.1 (attention and hidden layers)
Normalization: RMSNorm (ε=1e-6)

Model Features

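Rotary Position Embeddings: Encode token position by rotating query and key vectors, so attention scores depend on relative position
Grouped-Query Attention: 12 query heads share 8 key-value heads, which shrinks the key-value cache
SwiGLU Activation: Gated feed-forward activation (a SiLU-activated gate multiplied by a linear projection)
RMSNorm: Root-mean-square normalization, a simpler alternative to LayerNorm, for improved training stability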
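Two of the features above, RMSNorm and SwiGLU, are compact enough to sketch in plain Python. This is a minimal illustration on lists of floats, not the model's implementation: the function names are hypothetical, the default ε is taken from the Training Details, and in a real feed-forward layer the `gate` and `up` vectors would come from two learned linear projections of the input, followed by a down projection.

```python
import math


def rms_norm(x, weight=None, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a per-element gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    if weight is None:
        weight = [1.0] * len(x)  # identity gain when no learned weight is given
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Elementwise SwiGLU gating: silu(gate) * up."""
    return [silu(g) * u for g, u in zip(gate, up)]


normed = rms_norm([3.0, 4.0])
gated = swiglu_gate([0.0, 2.0], [5.0, 1.0])
print(normed)
print(gated)
```

Unlike LayerNorm, `rms_norm` does not subtract the mean and has no bias term; it only rescales the vector so that its mean square is approximately 1.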

Usage

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "your-username/dense-5l-arxiv-code-simplestories"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"  # requires the accelerate package
)

Text Generation

# Generate text
prompt = "The fundamental theorem of calculus states that"
# Move the inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=200,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Code Generation

# Generate Python code
prompt = "def fibonacci(n):"
# Move the inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=150,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
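The two examples use temperature=0.7 for prose and temperature=0.2 for code. A minimal sketch of what the temperature does to the next-token distribution follows; the logits are made-up values for three candidate tokens, and the function is an illustration rather than the library's internal sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, which is why the code example uses a smaller value than the prose example.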

Intended Use

Primary Use Cases

Research: Academic research in natural language processing and code generation
Educational: Learning about transformer architectures and language modeling
Prototyping: Building applications that require text and code generation capabilities

Suitable Tasks

Text completion and generation
Code completion and synthesis
Story generation
Academic writing assistance
Programming tutorials and explanations

Limitations and Biases

Known Limitations

Context Length: Maximum sequence length is 1024 tokens
Model Size: At roughly 50M parameters, the model has limited knowledge compared to larger models
Training Data: Performance dependent on the quality and coverage of training datasets
Arithmetic: May struggle with complex mathematical calculations
Factual Accuracy: May generate plausible but incorrect information
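One practical consequence of the context-length limit is that long prompts must be trimmed before generation. The sketch below shows one way to do that on a list of token IDs, assuming the 1024-token limit covers the prompt and the generated tokens together. `fit_prompt` is a hypothetical helper, not part of the model or of Transformers, and keeping the most recent tokens is only one possible policy.

```python
def fit_prompt(token_ids, max_new_tokens, max_positions=1024):
    """Keep the most recent prompt tokens so that prompt + generation fits the limit."""
    if max_new_tokens >= max_positions:
        raise ValueError("max_new_tokens must be smaller than max_positions")
    budget = max_positions - max_new_tokens  # tokens left for the prompt
    return token_ids[-budget:]


prompt_ids = list(range(1500))  # stand-in for a tokenized prompt that is too long
kept = fit_prompt(prompt_ids, max_new_tokens=200)
print(len(kept), kept[0], kept[-1])
```

Prompts that already fit within the budget are returned unchanged.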

Potential Biases

Dataset Bias: Reflects biases present in ArXiv papers, code repositories, and SimpleStories
Language Bias: Primarily trained on English text
Domain Bias: May perform better on academic and programming content than general conversation

Training Data

The model was trained on a curated dataset combining:

ArXiv Papers: Academic papers covering various scientific disciplines
Code Repositories: Open-source code from various programming languages and projects
SimpleStories: Simplified narrative text for improving text generation capabilities

Evaluation

Performance Metrics

Perplexity: [Add your perplexity scores]
BLEU Score: [Add BLEU scores for code generation]
Human Evaluation: [Add human evaluation results]

Benchmark Results

[Add your benchmark results here, e.g.:]
- HumanEval: XX/100
- MBPP: XX/100
- HellaSwag: XX.X%
- PIQA: XX.X%

Environmental Impact

Training Time: [Add training duration]
Hardware: [Add hardware specifications]
Carbon Footprint: [Add estimated carbon footprint if available]

Technical Specifications

Hardware Requirements

Minimum RAM: 4 GB for inference
Recommended GPU: NVIDIA GTX 1080 or better
CPU: Modern multi-core processor
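For context on these requirements, the memory taken by the weights alone can be estimated from the parameter count. The sketch below uses the approximate 50M figure from the Architecture section; it ignores activations, the key-value cache, and framework overhead, so actual usage will be higher.

```python
PARAM_COUNT = 50_000_000  # approximate figure from the card

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(param_count: int, dtype: str) -> float:
    """Memory for the weights alone, in megabytes (10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_megabytes(PARAM_COUNT, dtype):.0f} MB")
```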

Software Requirements

Python 3.8+
PyTorch 1.11+
Transformers 4.20+
CUDA 11.0+ (for GPU acceleration)

Citation

@misc{dense5l2024,
  title={Dense-5L-ArXiv-Code-SimpleStories: A Compact Transformer for Multi-Domain Text Generation},
  author={[Your Name]},
  year={2024},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/your-username/dense-5l-arxiv-code-simplestories}
}

License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.

Model Card Authors

Pranav Karra - pranavkarra001@gmail.com

Contact

For questions or issues regarding this model, please:

Open an issue on the model repository
Contact: pranavkarra001@gmail.com

Disclaimer: This model is provided for research and educational purposes. Users should be aware of potential biases and limitations when using this model in applications.
