Dense-5L-ArXiv-Code-SimpleStories
Model Description
This is a 5-layer dense transformer trained for causal language modeling on a combination of ArXiv papers, code repositories, and the SimpleStories dataset.
Model Details
Architecture
- Model Type: Dense Transformer for Causal Language Modeling
- Architecture: DenseTransformerForCausalLM
- Parameters: ~50M
- Layers: 5 transformer layers
- Hidden Size: 768
- Attention Heads: 12 (with 8 key-value heads for efficiency)
- Vocabulary Size: 50,256 tokens
- Max Sequence Length: 1024 tokens
- Context Window: 512 tokens (with windowing support)
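The listed hyperparameters determine most of the weight shapes, so a rough parameter breakdown can be sketched as below. This is an estimate under stated assumptions, not the model's actual accounting: it assumes no bias terms, tied input and output embeddings, and ignores normalization weights. The `intermediate_size` value is hypothetical, since the feed-forward width is not given in this card, so the total should not be expected to match the reported ~50M exactly.

```python
def estimate_params(
    vocab_size=50_256,
    hidden_size=768,
    num_layers=5,
    num_heads=12,
    num_kv_heads=8,
    intermediate_size=2048,  # hypothetical: not stated in the model card
):
    """Rough weight count from the architecture list (no biases, no norm weights)."""
    head_dim = hidden_size // num_heads   # 768 / 12 = 64
    kv_dim = num_kv_heads * head_dim      # 8 * 64 = 512
    embedding = vocab_size * hidden_size  # assumes tied input/output embeddings
    # Q and output projections are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # SwiGLU uses three matrices: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    return {
        "embedding": embedding,
        "attention_per_layer": attention,
        "mlp_per_layer": mlp,
        "total": embedding + num_layers * (attention + mlp),
    }


counts = estimate_params()
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

The embedding and attention figures follow directly from the values listed above; only the feed-forward term depends on the assumed width.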
Training Details
- Training Data: ArXiv papers, code repositories, and SimpleStories
- Training Epochs: 1
- Batch Size: 256
- Learning Rate: 1e-3
- Optimizer: AdamW (β1=0.9, β2=0.999)
- Dropout: 0.1 (attention and hidden layers)
- Normalization: RMSNorm (ε=1e-6)
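For readers who want to reproduce the optimizer settings listed above, a minimal PyTorch sketch follows. The `nn.Linear` module is only a stand-in for the real model, and weight decay is not stated in this card, so the sketch leaves PyTorch's default in place; treat both as assumptions.

```python
import torch
from torch import nn

# Stand-in module; in practice the model's parameters would be passed instead.
stand_in_model = nn.Linear(768, 768)

# Learning rate and betas are taken from the training details above.
# Weight decay is not listed in the card, so PyTorch's default is used here.
optimizer = torch.optim.AdamW(
    stand_in_model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
)
```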
Model Features
- Rotary Position Embeddings: For better handling of positional information
- Grouped-Query Attention: Efficient attention with 12 query heads and 8 key-value heads
- SwiGLU Activation: Modern activation function in feed-forward layers
- RMSNorm: Root-mean-square normalization for improved training stability
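Two of the features above, RMSNorm and the SwiGLU feed-forward block, are compact enough to sketch in PyTorch. The classes below are illustrative and are not taken from this model's source code: the class names, the layer names (`gate_proj`, `up_proj`, `down_proj`), the absence of bias terms, and the hidden width of 2048 are all assumptions. Only the hidden size of 768 and ε=1e-6 come from this card.

```python
import torch
from torch import nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scales each vector by the inverse of its root mean square."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike LayerNorm, no mean is subtracted and there is no bias.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


# Toy forward pass: batch of 2 sequences, 4 tokens each, hidden size 768.
x = torch.randn(2, 4, 768)
normed = RMSNorm(768)(x)
y = SwiGLU(768, 2048)(normed)  # 2048 is a hypothetical feed-forward width
print(y.shape)
```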
Usage
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "your-username/dense-5l-arxiv-code-simplestories"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto",  # requires the accelerate package
)
Text Generation
# Generate text
prompt = "The fundamental theorem of calculus states that"
# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=200,  # total length, prompt included
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
Code Generation
# Generate Python code
prompt = "def fibonacci(n):"
# Move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=150,  # total length, prompt included
        temperature=0.2,  # low temperature keeps sampling close to greedy
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
Intended Use
Primary Use Cases
- Research: Academic research in natural language processing and code generation
- Educational: Learning about transformer architectures and language modeling
- Prototyping: Building applications that require text and code generation capabilities
Suitable Tasks
- Text completion and generation
- Code completion and synthesis
- Story generation
- Academic writing assistance
- Programming tutorials and explanations
Limitations and Biases
Known Limitations
- Context Length: Maximum sequence length of 1024 tokens
- Model Size: At ~50M parameters, the model has limited knowledge compared to larger models
- Training Data: Performance depends on the quality and coverage of the training datasets
- Arithmetic: May struggle with complex mathematical calculations
- Factual Accuracy: May generate plausible but incorrect information
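One practical consequence of the 1024-token limit is that long prompts need trimming before generation. The helper below is a hypothetical sketch, not part of this model's code: `fit_prompt` and its arguments are illustrative names, and keeping the most recent tokens is one possible policy among several.

```python
def fit_prompt(token_ids, max_new_tokens, max_seq_len=1024):
    """Keep the most recent prompt tokens so prompt + generation fits the limit."""
    if not 0 < max_new_tokens < max_seq_len:
        raise ValueError("max_new_tokens must be between 1 and max_seq_len - 1")
    budget = max_seq_len - max_new_tokens  # tokens left for the prompt
    return token_ids[-budget:]


# Toy example: a 1500-token prompt with room reserved for 200 new tokens.
prompt_ids = list(range(1500))
trimmed = fit_prompt(prompt_ids, max_new_tokens=200)
print(len(trimmed))
```

Prompts shorter than the budget pass through unchanged.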
Potential Biases
- Dataset Bias: Reflects biases present in ArXiv papers, code repositories, and SimpleStories
- Language Bias: Primarily trained on English text
- Domain Bias: May perform better on academic and programming content than general conversation
Training Data
The model was trained on a curated dataset combining:
- ArXiv Papers: Academic papers covering various scientific disciplines
- Code Repositories: Open-source code from various programming languages and projects
- SimpleStories: Simplified narrative text for improving text generation capabilities
Evaluation
Performance Metrics
- Perplexity: [Add your perplexity scores]
- BLEU Score: [Add BLEU scores for code generation]
- Human Evaluation: [Add human evaluation results]
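Perplexity, the first metric listed above, is the exponential of the mean negative log-likelihood per token. The sketch below shows only that arithmetic on a toy input. The `perplexity` function is illustrative and assumes natural-log token probabilities have already been obtained from the model; it does not reproduce any evaluation run for this model.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: four tokens, each assigned probability 0.25 by a hypothetical model.
toy_log_probs = [math.log(0.25)] * 4
ppl = perplexity(toy_log_probs)
print(ppl)
```

A model that spreads probability uniformly over four choices has a perplexity of about 4, which is what the toy input illustrates.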
Benchmark Results
[Add your benchmark results here, e.g.:]
- HumanEval: XX/100
- MBPP: XX/100
- HellaSwag: XX.X%
- PIQA: XX.X%
Environmental Impact
- Training Time: [Add training duration]
- Hardware: [Add hardware specifications]
- Carbon Footprint: [Add estimated carbon footprint if available]
Technical Specifications
Hardware Requirements
- Minimum RAM: 4GB for inference
- Recommended GPU: NVIDIA GTX 1080 or better
- CPU: Modern multi-core processor
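As a rough cross-check on the figures above, the memory needed for the weights alone can be estimated from the parameter count and the data type. The sketch assumes the approximate ~50M parameter figure from this card and covers weights only; activations, the key-value cache, and framework overhead are not included, so actual usage will be higher.

```python
def weight_memory_mib(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


APPROX_PARAMS = 50_000_000  # "~50M" from the model card; approximate

fp32_mib = weight_memory_mib(APPROX_PARAMS, 4)  # float32, as in the loading example
fp16_mib = weight_memory_mib(APPROX_PARAMS, 2)  # half precision, for comparison
print(f"float32: {fp32_mib:.1f} MiB, float16: {fp16_mib:.1f} MiB")
```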
Software Requirements
- Python 3.8+
- PyTorch 1.11+
- Transformers 4.20+
- CUDA 11.0+ (for GPU acceleration)
Citation
@misc{dense5l2024,
  title={Dense-5L-ArXiv-Code-SimpleStories: A Compact Transformer for Multi-Domain Text Generation},
  author={[Your Name]},
  year={2024},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/your-username/dense-5l-arxiv-code-simplestories}
}
License
This model is released under the Apache 2.0 License. See the LICENSE file for more details.
Model Card Authors
Pranav Karra - pranavkarra001@gmail.com
Contact
For questions or issues regarding this model, please:
- Open an issue on the model repository
- Contact: pranavkarra001@gmail.com
Disclaimer: This model is provided for research and educational purposes. Users should be aware of potential biases and limitations when using this model in applications.