---
license: mit
datasets:
- yahma/alpaca-cleaned
---
## Model Details
This model builds on the neuromorphic **Llama-SNN-LTC** base architecture, which incorporates **Spiking Neural Networks (SNNs)** and **Liquid Time Constants (LTCs)**, and is fine-tuned for instruction following on the Alpaca Cleaned dataset.
**Model Type**: Instruction-Following Language Model with Neuromorphic Enhancements
**Supported Languages**: English
**Number of Parameters**: 155.8M
**Context Length**: 1024 tokens
**Base Architecture**: Llama with SNN/LTC modifications
**Base Model**: rootxhacker/arthemis-lm
**Fine-tuning Data**: Alpaca Cleaned (~52K instruction-response pairs)
### Architecture Features
- **Spiking Neural Networks** in attention mechanisms for temporal processing
- **Liquid Time Constants** in feed-forward layers for adaptive dynamics
- **12-layer transformer backbone** with neuromorphic enhancements
- **RoPE positional encoding** for sequence understanding
- **Custom surrogate gradient training** for differentiable spike computation
- **Instruction-following fine-tuning** for enhanced conversational abilities
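As a rough illustration of the surrogate-gradient idea listed above: the forward pass emits a hard 0/1 spike, while the backward pass substitutes a smooth derivative so that gradients can flow through the threshold. The sketch below uses a fast-sigmoid surrogate, which is one common choice. The function names, the `slope` parameter, and the choice of surrogate are assumptions for illustration and are not taken from this model's code; the threshold of 1.0 mirrors the `spiking_threshold` value in the configuration.

```python
def spike_forward(membrane: float, threshold: float = 1.0) -> float:
    """Hard threshold: emit a spike (1.0) once the potential reaches the threshold."""
    return 1.0 if membrane >= threshold else 0.0


def spike_surrogate_grad(membrane: float, threshold: float = 1.0, slope: float = 10.0) -> float:
    """Fast-sigmoid surrogate for d(spike)/d(membrane).

    The true derivative of the step function is zero almost everywhere,
    so training replaces it with 1 / (1 + slope * |v - threshold|)^2.
    `slope` is a hypothetical sharpness parameter.
    """
    return 1.0 / (1.0 + slope * abs(membrane - threshold)) ** 2


for v in (0.5, 0.99, 1.0, 1.5):
    print(v, spike_forward(v), round(spike_surrogate_grad(v), 4))
```

The surrogate peaks at the threshold and decays on either side, so neurons close to firing receive the largest learning signal.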
Here are the main model configuration values:
```
hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
```
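For readers who want these values in code, here is a minimal sketch that collects them in a dataclass and derives two quantities implied by them. The class name `ArthemisConfig` and the two helper properties are hypothetical and are not part of the model's actual loading code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArthemisConfig:  # hypothetical name, for illustration only
    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden dimension.
        return self.hidden_size // self.num_attention_heads

    @property
    def uses_grouped_query_attention(self) -> bool:
        # Equal query and key/value head counts mean plain multi-head attention.
        return self.num_key_value_heads != self.num_attention_heads


config = ArthemisConfig()
print(config.head_dim, config.uses_grouped_query_attention)
```

With 12 heads over a hidden size of 768, each head works on 64 dimensions, and because `num_key_value_heads` equals `num_attention_heads` the configuration describes standard multi-head attention.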
## Usage
### Install dependencies
```bash
pip install transformers torch numpy
```
## Inference
The full inference code is available in this gist: https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea
### Run code!
```python
# Note: This model requires a custom implementation due to its SNN/LTC architecture.
# The standard transformers library cannot load this model directly.
# For custom loading, you'll need the specialized architecture:
import torch
from transformers import AutoTokenizer

from custom_model import LlamaSNNLTCModel

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
tokenizer.pad_token = tokenizer.eos_token

# Load the instruction-tuned model
model = LlamaSNNLTCModel.from_pretrained("rootxhacker/arthemis-instruct")


# For instruction-following generation
def generate_instruction_response(instruction, input_text="", model=None, tokenizer=None, max_length=150):
    model.eval()
    device = next(model.parameters()).device

    # Reset model states for clean generation
    model.reset_states()

    # Format prompt in Alpaca style
    if input_text.strip():
        prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"

    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    input_ids = inputs['input_ids']

    with torch.no_grad():
        # max_length counts prompt tokens plus generated tokens
        for _ in range(max_length - input_ids.shape[1]):
            outputs = model(input_ids)
            logits = outputs['logits'][0, -1, :]

            # Sample with temperature for more natural responses
            logits = logits / 0.7
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)

            input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=-1)

            # Stop once the model emits the end-of-sequence token
            if next_token.item() == tokenizer.eos_token_id:
                break

    generated = tokenizer.decode(input_ids[0], skip_special_tokens=True)

    # Extract just the response part
    if "### Response:\n" in generated:
        response = generated.split("### Response:\n")[-1].strip()
        return response
    return generated


# Example usage
instruction = "Explain what artificial intelligence is in simple terms."
response = generate_instruction_response(instruction, model=model, tokenizer=tokenizer)
print(f"Instruction: {instruction}")
print(f"Response: {response}")
```
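The sampling step in the loop above divides the logits by 0.7 before the softmax. The following standalone sketch shows what that does to a distribution, using only the standard library so it can run without the model. The toy logits and the helper name `softmax_with_temperature` are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.7):
    """Return probabilities for logits / temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # toy values, not real model output
probs_default = softmax_with_temperature(logits, temperature=1.0)
probs_sharp = softmax_with_temperature(logits, temperature=0.7)

# Draw one token index from the sharpened distribution.
rng = random.Random(0)
next_token = rng.choices(range(len(logits)), weights=probs_sharp, k=1)[0]
print(probs_default, probs_sharp, next_token)
```

A temperature below 1.0 moves probability mass toward the highest-scoring token, which makes the output more focused while still leaving room for variation.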
## Evaluation
I ran the evaluation with the code in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300
### Results Comparison
| Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
|-------|--------|--------|-----------|------|------------|-------|-------|-------|-----|
| **rootxhacker/arthemis-lm** | **155.8M** | **<$50** | **24.65** | **20.60** | **48.10** | **28.20** | **22.20** | **39.80** | **30.59** |
| google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |
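The Avg column appears to be the unweighted mean of the six benchmark scores. The short check below reproduces it from the table values; the half-up rounding mode and the helper name `macro_average` are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table, in column order:
# HellaSwag, OBQA, WinoGrande, ARC_e, ARC_c, BoolQ
scores = {
    "rootxhacker/arthemis-lm": ["24.65", "20.60", "48.10", "28.20", "22.20", "39.80"],
    "google/bert-large-uncased": ["24.53", "26.20", "49.80", "25.08", "25.68", "40.86"],
}


def macro_average(values):
    """Unweighted mean of the benchmark scores, rounded half-up to two decimals."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {name: macro_average(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(name, avg)
```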
## Technical Specifications
```
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Fine-tuning Dataset: Alpaca Cleaned (52K instructions)
```
## Training Details
The model was fine-tuned from rootxhacker/arthemis-lm using:
- **Base Model**: rootxhacker/arthemis-lm (pretrained neuromorphic LLM)
- **Dataset**: Alpaca Cleaned (~52K instruction-response pairs)
- **Hardware**: Google Colab Pro Plus (A100 GPU)
- **Training Steps**: 5,000 steps
- **Batch Size**: 4 with gradient accumulation
- **Learning Rate**: 5e-5 (lower for fine-tuning)
- **Precision**: FP32 for stability with neuromorphic components
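To get a feel for how much data these settings cover, the sketch below estimates the effective batch size and the number of passes over the dataset. The card does not state the number of gradient accumulation steps, so `accumulation_steps=4` is a placeholder, and the function assumes that each of the 5,000 steps is one optimizer update.

```python
def training_coverage(steps, batch_size, accumulation_steps, dataset_size):
    """Estimate how many examples and epochs a run covers.

    Assumes `steps` counts optimizer updates, each consuming
    batch_size * accumulation_steps examples.
    """
    effective_batch = batch_size * accumulation_steps
    examples_seen = steps * effective_batch
    return effective_batch, examples_seen, examples_seen / dataset_size


# accumulation_steps=4 is a hypothetical value; the card does not state it.
effective_batch, examples_seen, epochs = training_coverage(
    steps=5_000, batch_size=4, accumulation_steps=4, dataset_size=52_000
)
print(effective_batch, examples_seen, round(epochs, 2))
```

Under those assumptions the run would see about 80,000 examples, or roughly one and a half passes over the ~52K pairs. A different accumulation setting scales the estimate proportionally.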
### Key Features
- **Instruction Format**: Uses Alpaca's structured instruction format
- **Response Generation**: Optimized for helpful, accurate responses
- **Neuromorphic Preservation**: Maintains SNN/LTC benefits during fine-tuning
- **Budget-Conscious**: Additional fine-tuning cost under $10
## Fine-tuning Process
The fine-tuning process involved:
1. **Base Model Loading**: Started from the pretrained arthemis-lm checkpoint
2. **Data Formatting**: Converted Alpaca instructions to proper format
3. **Careful Training**: Lower learning rate to preserve base model knowledge
4. **State Management**: Proper handling of SNN/LTC states during training
5. **Validation**: Continuous monitoring of instruction-following quality
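The data formatting step can be sketched as follows. It reuses the prompt template from the inference example and assumes records with `instruction`, `input`, and `output` fields, as in the Alpaca Cleaned dataset. The helper names and the `<|endoftext|>` end-of-sequence marker (the GPT-2 style token used by the DialoGPT tokenizer) are assumptions about how the training text was assembled, not the author's actual preprocessing code.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Build the Alpaca-style prompt used in the inference example."""
    if input_text.strip():
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def format_training_example(record: dict, eos_token: str = "<|endoftext|>") -> str:
    """Join prompt and target response, ending with an end-of-sequence marker."""
    prompt = build_prompt(record["instruction"], record.get("input", ""))
    return prompt + record["output"].strip() + eos_token


record = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
text = format_training_example(record)
print(text)
```

Ending each example with the end-of-sequence token gives the model a signal to stop, which the generation loop relies on when it checks for `tokenizer.eos_token_id`.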
## Limitations
- **Training Data**: Limited to Alpaca Cleaned dataset scope
- **Context Length**: Maximum 1024 tokens
- **Domain**: Primarily English instructions
- **Custom Architecture**: Requires specialized loading code
- **Scale**: Smaller than commercial instruction models
## Model Sources
- **Repository**: [Coming Soon]
- **Base Model**: [rootxhacker/arthemis-lm](https://huggingface.co/rootxhacker/arthemis-lm)
- **Hugging Face**: [rootxhacker/arthemis-instruct](https://huggingface.co/rootxhacker/arthemis-instruct)
## Future Work
- Scale instruction dataset for broader capabilities
- Add multi-turn conversation support
- Implement reinforcement learning from human feedback (RLHF)
- Explore specialized instruction types (coding, math, reasoning)
- Compare instruction-following efficiency with standard transformers
## Acknowledgments
Special thanks to **keeeeenw** for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work extends those principles to instruction-following capabilities while exploring neuromorphic computing approaches.
Thanks to the Stanford Alpaca team for the high-quality instruction dataset that made this fine-tuning possible.
## Citation
```bibtex
@misc{arthemis-instruct-2024,
title={Arthemis-Instruct: A Neuromorphic Instruction-Following Model with Spiking Neural Networks and Liquid Time Constants},
author={rootxhacker},
year={2024},
howpublished={\url{https://huggingface.co/rootxhacker/arthemis-instruct}}
}
```
## License
Apache License 2.0