---
license: mit
datasets:
- yahma/alpaca-cleaned
---
## Model Details

This model builds upon the neuromorphic **Llama-SNN-LTC** base architecture, incorporating **Spiking Neural Networks (SNNs)** and **Liquid Time Constants (LTCs)**, and fine-tunes it specifically for instruction following using the Alpaca Cleaned dataset.

**Model Type**: Instruction-Following Language Model with Neuromorphic Enhancements  
**Supported Languages**: English  
**Number of Parameters**: 155.8M  
**Context Length**: 1024 tokens  
**Base Architecture**: Llama with SNN/LTC modifications  
**Base Model**: rootxhacker/arthemis-lm  
**Fine-tuning Data**: Alpaca Cleaned (~52K instruction-response pairs)

### Architecture Features
- **Spiking Neural Networks** in attention mechanisms for temporal processing
- **Liquid Time Constants** in feed-forward layers for adaptive dynamics
- **12-layer transformer backbone** with neuromorphic enhancements
- **RoPE positional encoding** for sequence understanding
- **Custom surrogate gradient training** for differentiable spike computation
- **Instruction-following fine-tuning** for enhanced conversational abilities
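
To make the surrogate-gradient idea concrete, here is a minimal sketch in plain Python. It assumes a fast-sigmoid surrogate, which is one common choice; this card does not state which surrogate function the model actually uses, and the `slope` value is purely illustrative. Only the threshold of 1.0 is taken from the configuration in this card.

```python
def spike_forward(membrane_potential: float, threshold: float = 1.0) -> float:
    """Forward pass: a hard step that emits 1.0 once the potential reaches the threshold."""
    return 1.0 if membrane_potential >= threshold else 0.0


def spike_surrogate_grad(
    membrane_potential: float,
    threshold: float = 1.0,
    slope: float = 10.0,  # hypothetical sharpness, not taken from the model
) -> float:
    """Backward pass: smooth stand-in for the step's derivative (fast-sigmoid shape).

    The true derivative of the step is zero almost everywhere, so training
    substitutes this bump, which peaks at the threshold and decays away from it.
    """
    return 1.0 / (1.0 + slope * abs(membrane_potential - threshold)) ** 2


for v in (0.5, 0.9, 1.0, 1.2):
    print(v, spike_forward(v), round(spike_surrogate_grad(v), 4))
```

The forward pass stays binary, while the backward pass sees a nonzero gradient near the threshold, which is what lets spike-based layers be trained with ordinary backpropagation.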

These are the main model configuration values:

```
hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
```
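
For readers who want these values in code, the sketch below wraps them in a dataclass and derives the per-head dimension. The class name `ArthemisConfigSketch` and its two properties are hypothetical helpers for illustration; they are not part of the released model code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArthemisConfigSketch:
    """Illustrative container for the values listed in this card (not the real config class)."""

    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

    @property
    def head_dim(self) -> int:
        """Width of each attention head; hidden_size must divide evenly across heads."""
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    @property
    def uses_grouped_query_attention(self) -> bool:
        """True only when there are fewer key/value heads than query heads."""
        return self.num_key_value_heads != self.num_attention_heads


cfg = ArthemisConfigSketch()
print(cfg.head_dim, cfg.uses_grouped_query_attention)
```

With 12 query heads and 12 key/value heads, the configuration corresponds to standard multi-head attention rather than grouped-query attention.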

## Usage

### Install dependencies
```bash
pip install transformers torch numpy
```

## Inference
The full inference code is available in this gist: https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea

### Run code!
```python
# Note: This model requires custom implementation due to SNN/LTC architecture
# Standard transformers library cannot load this model directly

# For custom loading, you'll need the specialized architecture:
import torch
from custom_model import LlamaSNNLTCModel
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
tokenizer.pad_token = tokenizer.eos_token

# Load the instruction-tuned model
model = LlamaSNNLTCModel.from_pretrained("rootxhacker/arthemis-instruct")

# For instruction-following generation
def generate_instruction_response(instruction, input_text="", model=None, tokenizer=None, max_length=150):
    model.eval()
    device = next(model.parameters()).device
    
    # Reset model states for clean generation
    model.reset_states()
    
    # Format prompt in Alpaca style
    if input_text.strip():
        prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    input_ids = inputs['input_ids']
    
    with torch.no_grad():
        for _ in range(max_length - input_ids.shape[1]):
            outputs = model(input_ids)
            logits = outputs['logits'][0, -1, :]
            
            # Sample with temperature for more natural responses
            logits = logits / 0.7
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            
            input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=-1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    generated = tokenizer.decode(input_ids[0], skip_special_tokens=True)
    
    # Extract just the response part
    if "### Response:\n" in generated:
        response = generated.split("### Response:\n")[-1].strip()
        return response
    
    return generated

# Example usage
instruction = "Explain what artificial intelligence is in simple terms."
response = generate_instruction_response(instruction, model=model, tokenizer=tokenizer)
print(f"Instruction: {instruction}")
print(f"Response: {response}")
```
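
The prompt handling in the example above can also be factored into two small helpers, which makes it easy to check the formatting without loading the model. This is a sketch; the names `build_alpaca_prompt` and `extract_response` are hypothetical and do not come from the gist.

```python
RESPONSE_MARKER = "### Response:\n"


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction (and optional input) in the Alpaca layout used above."""
    if input_text.strip():
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"{RESPONSE_MARKER}"
        )
    return f"### Instruction:\n{instruction}\n\n{RESPONSE_MARKER}"


def extract_response(generated: str) -> str:
    """Return only the text after the last response marker, or the input unchanged."""
    if RESPONSE_MARKER in generated:
        return generated.split(RESPONSE_MARKER)[-1].strip()
    return generated


prompt = build_alpaca_prompt("Summarize the text.", "Spiking networks communicate with discrete events.")
print(prompt)
print(extract_response(prompt + "They use spikes."))
```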


## Evaluation

I evaluated the model using this script: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300

### Results Comparison

| Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
|-------|--------|--------|-----------|------|------------|-------|-------|-------|-----|
| **rootxhacker/arthemis-lm** | **155.8M** | **<$50** | **24.65** | **20.60** | **48.10** | **28.20** | **22.20** | **39.80** | **30.59** |
| google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |
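
The Avg column appears to be the unweighted mean of the six task scores. The short check below, which uses only the numbers from the table, reproduces it to within rounding; the helper name `macro_average` is just for illustration.

```python
# Task order: HellaSwag, OBQA, WinoGrande, ARC_e, ARC_c, BoolQ
scores = {
    "rootxhacker/arthemis-lm": [24.65, 20.60, 48.10, 28.20, 22.20, 39.80],
    "google/bert-large-uncased": [24.53, 26.20, 49.80, 25.08, 25.68, 40.86],
}


def macro_average(values):
    """Unweighted mean across tasks."""
    return sum(values) / len(values)


averages = {name: macro_average(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```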


## Technical Specifications

```
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Fine-tuning Dataset: Alpaca Cleaned (52K instructions)
```

## Training Details

The model was fine-tuned with the following setup:
- **Base Model**: rootxhacker/arthemis-lm (pretrained neuromorphic LLM)
- **Dataset**: Alpaca Cleaned (~52K instruction-response pairs)
- **Hardware**: Google Colab Pro Plus (A100 GPU)
- **Training Steps**: 5,000 steps
- **Batch Size**: 4 with gradient accumulation
- **Learning Rate**: 5e-5 (lower for fine-tuning)
- **Precision**: FP32 for stability with neuromorphic components

### Key Features
- **Instruction Format**: Uses Alpaca's structured instruction format
- **Response Generation**: Optimized for helpful, accurate responses
- **Neuromorphic Preservation**: Maintains SNN/LTC benefits during fine-tuning
- **Budget-Conscious**: Additional fine-tuning cost under $10

## Fine-tuning Process

The fine-tuning process involved:
1. **Base Model Loading**: Started from the pretrained arthemis-lm checkpoint
2. **Data Formatting**: Converted Alpaca examples to the `### Instruction:` / `### Input:` / `### Response:` prompt format
3. **Careful Training**: Lower learning rate to preserve base model knowledge
4. **State Management**: Proper handling of SNN/LTC states during training
5. **Validation**: Continuous monitoring of instruction-following quality
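
To illustrate why state management matters (step 4), here is a toy leaky integrate-and-fire neuron in plain Python. It is a generic textbook-style sketch, not the model's implementation: the class name and the `decay` value are hypothetical, and only the threshold of 1.0 and the `reset_states` method name mirror this card.

```python
class LeakyIntegrateFireNeuron:
    """Toy stateful neuron: the membrane potential persists between calls to step()."""

    def __init__(self, threshold: float = 1.0, decay: float = 0.9):
        self.threshold = threshold
        self.decay = decay  # hypothetical leak factor, not taken from the model
        self.membrane_potential = 0.0

    def step(self, input_current: float) -> int:
        """Leak, integrate the input, and emit a spike (1) if the threshold is reached."""
        self.membrane_potential = self.decay * self.membrane_potential + input_current
        if self.membrane_potential >= self.threshold:
            self.membrane_potential = 0.0  # reset after firing
            return 1
        return 0

    def reset_states(self) -> None:
        """Clear leftover potential so a new sequence starts from a clean state."""
        self.membrane_potential = 0.0


neuron = LeakyIntegrateFireNeuron()
neuron.step(0.6)                  # end of sequence A leaves residual potential
carried_over = neuron.step(0.6)   # sequence B starts without a reset
neuron.reset_states()
neuron.step(0.6)                  # sequence A again
neuron.reset_states()
clean_start = neuron.step(0.6)    # sequence B after a reset
print(carried_over, clean_start)
```

Without a reset, potential left over from one sequence changes how the neuron responds to the first input of the next, which is why the inference example calls `model.reset_states()` before generating.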


## Limitations

- **Training Data**: Limited to Alpaca Cleaned dataset scope
- **Context Length**: Maximum 1024 tokens
- **Domain**: Primarily English instructions
- **Custom Architecture**: Requires specialized loading code
- **Scale**: Smaller than commercial instruction models
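
Because of the 1024-token context limit, long prompts need to be trimmed before generation. The sketch below shows one possible policy, keeping the most recent tokens and reserving room for the response; the function name `fit_prompt_to_context` and the left-truncation policy are assumptions for illustration, not behavior of the released code.

```python
from typing import List

MAX_CONTEXT_TOKENS = 1024  # context length stated in this card


def fit_prompt_to_context(
    prompt_ids: List[int],
    max_new_tokens: int,
    max_context: int = MAX_CONTEXT_TOKENS,
) -> List[int]:
    """Keep the last tokens of the prompt so prompt + generation fits the context window."""
    if not 0 < max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be between 1 and max_context - 1")
    budget = max_context - max_new_tokens
    # Slicing from the end preserves the trailing response marker of an Alpaca prompt.
    return prompt_ids[-budget:]


long_prompt = list(range(2000))
trimmed = fit_prompt_to_context(long_prompt, max_new_tokens=150)
print(len(trimmed), trimmed[0], trimmed[-1])
```

Truncating from the left keeps the end of the prompt intact but can cut off the start of the instruction, so shortening the input text itself is usually preferable when that is possible.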

## Model Sources

- **Repository**: [Coming Soon]
- **Base Model**: [rootxhacker/arthemis-lm](https://huggingface.co/rootxhacker/arthemis-lm)
- **Hugging Face**: [rootxhacker/arthemis-instruct](https://huggingface.co/rootxhacker/arthemis-instruct)

## Future Work

- Scale instruction dataset for broader capabilities
- Add multi-turn conversation support
- Implement reinforcement learning from human feedback (RLHF)
- Explore specialized instruction types (coding, math, reasoning)
- Compare instruction-following efficiency with standard transformers

## Acknowledgments

Special thanks to **keeeeenw** for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work extends those principles to instruction-following capabilities while exploring neuromorphic computing approaches.

Thanks to the Stanford Alpaca team for the high-quality instruction dataset that made this fine-tuning possible.

## Citation

```bibtex
@misc{arthemis-instruct-2024,
  title={Arthemis-Instruct: A Neuromorphic Instruction-Following Model with Spiking Neural Networks and Liquid Time Constants},
  author={rootxhacker},
  year={2024},
  howpublished={\url{https://huggingface.co/rootxhacker/arthemis-instruct}}
}
```

## License

Apache License 2.0