---
language: en
license: apache-2.0
tags:
- code-generation
- causal-lm
- steering
- contrastive-activation-addition
- caa
- qwen
- wisent
library_name: transformers
datasets:
- evalplus/mbppplus
metrics:
- pass@1
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
model-index:
- name: wisent-ai/qwen2.5-coder-7b-wisent-caa
  results:
  - task:
      type: code-generation
      name: Code Generation
    dataset:
      type: mbppplus
      name: MBPP Plus
    metrics:
    - type: pass@1
      value: 0.521
      name: Pass@1
---
# Wisent-Qwen2.5-Coder-7B-Instruct with CAA Steering
## Model Description
This is an enhanced version of Qwen2.5-Coder-7B-Instruct that integrates **Contrastive Activation Addition (CAA)** steering directly into the model architecture. The steering parameters have been optimized using Optuna to improve code generation quality on the MBPP Plus benchmark.
### Key Features
- **Automatic CAA Steering**: No manual hook management required
- **Optimized Parameters**: Layer 24, α=1.4
- **Trait-Based Organization**: Steering vectors organized by traits
- **Runtime Configurable**: Adjust or disable steering on the fly
- **HuggingFace Compatible**: Works with the standard transformers API
## Installation
```bash
pip install transformers torch safetensors
# Or install from requirements.txt
pip install -r requirements.txt
```
## Hardware Requirements
### Minimum Requirements:
- **GPU Memory**: 16GB VRAM (for inference with bfloat16)
- **System RAM**: 32GB recommended
- **Storage**: 15MB for this repository (model configuration + steering vectors); the base Qwen2.5-Coder weights are loaded separately
### Recommended Setup:
- **GPU**: NVIDIA RTX 4090, A100, or similar
- **CUDA**: 11.8 or newer
- **Python**: 3.8-3.11
### Performance Notes:
- Model automatically loads base Qwen2.5-Coder weights (7B parameters)
- CAA steering adds minimal computational overhead (~1-2% inference time)
- Supports CPU inference but GPU recommended for practical use
- Memory usage: ~14GB GPU memory for bfloat16 inference
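The ~14GB figure matches a back-of-the-envelope estimate of weight storage alone. The sketch below assumes the nominal 7B parameter count stated above and 2 bytes per parameter for bfloat16; it ignores activations and the KV cache, so real usage will be somewhat higher. The helper name `weight_memory_gb` is illustrative only.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate memory needed for model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal 7B parameters stored as bfloat16 (2 bytes each)
print(f"{weight_memory_gb(7e9, 2):.1f} GB")
```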
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model - CAA steering is automatically applied!
model = AutoModelForCausalLM.from_pretrained("./huggingface_qwen25-7b-coder-caa", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./huggingface_qwen25-7b-coder-caa")
# Generate code
prompt = "Write a Python function to calculate the factorial of a number"
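# Move inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)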
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Advanced Usage
### Adjusting Steering Strength
```python
# Set a custom steering strength (the optimized value is 1.4)
model.set_caa_alpha(1.2)
# Use a weaker setting to stay closer to baseline behavior
model.set_caa_alpha(0.5)
```
### Disabling CAA Steering
```python
# Disable CAA to get baseline model behavior
model.set_caa_enabled(False)
# Re-enable CAA
model.set_caa_enabled(True)
```
### Accessing Steering Configuration
```python
print(f"CAA Layer: {model.caa_layer_id}")
print(f"CAA Alpha: {model.caa_alpha}")
print(f"Steering Method: {model.steering_method}")
```
### Trait-Based Vector Organization
The model uses a trait-based organization for steering vectors:
```
vectors/
├── mbpp_plus/      # Current: Optimized for MBPP Plus benchmark
├── safety/         # Future: Safety-aligned behavior
├── creativity/     # Future: Enhanced creative outputs
├── helpfulness/    # Future: Improved helpfulness
└── reasoning/      # Future: Enhanced logical reasoning
```
To switch traits, simply update the configuration:
```json
{
"steering_vector_path": "./vectors/safety/steering_vector.safetensors"
}
```
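As a rough sketch, the same change can be scripted. The helper below is hypothetical (`set_trait` is not part of the model's API) and assumes the `steering_vector_path` key lives in a JSON configuration file whose path you supply. Only the `mbpp_plus` vector is listed as currently available, so pointing at another trait assumes that vector file exists.

```python
import json
from pathlib import Path


def set_trait(config_path: str, trait: str) -> dict:
    """Point `steering_vector_path` at vectors/{trait}/ and save the config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Path layout follows the vectors/{trait}/ convention described above
    config["steering_vector_path"] = f"./vectors/{trait}/steering_vector.safetensors"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The model would need to be reloaded after the file changes for the new path to be picked up.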
## Technical Details
### CAA Steering Parameters
- **Steering Method**: Contrastive Activation Addition (CAA)
- **Optimal Layer**: 24 (out of 28 transformer layers)
- **Steering Strength (α)**: 1.4
- **Vector Format**: Safetensors format for efficient loading and HuggingFace compatibility
- **Vector Dimension**: 3584 (pre-normalized during training)
- **Storage Path**: `./vectors/mbpp_plus/steering_vector.safetensors`
### How It Works
1. **Trait-based Organization**: Steering vectors are organized by behavioral traits (`vectors/{trait}/`)
2. **Dynamic Loading**: The model loads the specified steering vector from the configured path
3. **Layer Application**: Steering is applied to hidden states at layer 24 during forward pass
4. **Generation Integration**: Steering affects the last token position during generation
5. **Configurable Strength**: The α parameter (1.4 in the optimized configuration) controls steering intensity
6. **Pre-optimized Vectors**: Steering vectors are pre-normalized and ready for immediate use
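The core arithmetic of steps 3 to 5 can be sketched in plain Python. This is an illustration of the general CAA update (add α times the steering vector to the hidden state at the last token position), not the code in `modeling_wisent_qwen.py`; the function name `apply_caa` and the list-based representation are assumptions made for readability.

```python
from typing import List


def apply_caa(hidden: List[List[float]], vector: List[float], alpha: float) -> List[List[float]]:
    """Return a copy of `hidden` (seq_len x dim) with alpha * vector added
    to the last token position only."""
    if not hidden:
        return []
    if len(hidden[-1]) != len(vector):
        raise ValueError("steering vector dimension must match the hidden size")
    steered = [row[:] for row in hidden]  # leave the input untouched
    steered[-1] = [h + alpha * v for h, v in zip(steered[-1], vector)]
    return steered


# Toy example with a hidden size of 2 instead of 3584
states = [[1.0, 2.0], [3.0, 4.0]]
print(apply_caa(states, [1.0, -1.0], alpha=0.5))
```

Setting `alpha` to 0 leaves the hidden states unchanged, which is the arithmetic equivalent of disabling steering.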
### Optimization Process
The CAA parameters were optimized using:
- **Framework**: Optuna with TPE sampler
- **Search Space**: Layers 15-28, α ∈ [0.1, 5.0]
- **Objective**: Maximize accuracy on MBPP Plus validation set
- **Best Performance**: 52.1% accuracy on MBPP Plus (378 problems)
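To make the search space concrete, here is a minimal stand-in that samples the same ranges with plain random search. It is not Optuna and does not use the TPE sampler; `random_search` and `toy_objective` are hypothetical names, and the toy objective is a placeholder for a real MBPP Plus evaluation run.

```python
import random
from typing import Callable, Dict, Tuple


def random_search(
    objective: Callable[[int, float], float],
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Sample (layer, alpha) pairs from the stated search space and keep the best."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        layer = rng.randint(15, 28)    # inclusive layer range
        alpha = rng.uniform(0.1, 5.0)  # steering strength range
        score = objective(layer, alpha)
        if score > best_score:
            best_params, best_score = {"layer": layer, "alpha": alpha}, score
    return best_params, best_score


def toy_objective(layer: int, alpha: float) -> float:
    # Placeholder score; a real objective would run the benchmark
    return -abs(layer - 24) - abs(alpha - 1.4)


params, score = random_search(toy_objective, n_trials=200)
print(params, score)
```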
## Model Architecture
```
WisentQwen2ForCausalLM
├── Base: Qwen2.5-Coder-7B-Instruct
├── CAA Integration: Layer 24
├── Steering Vector: ./vectors/mbpp_plus/steering_vector.safetensors
└── Auto-applied during generation
```
## File Structure
```
huggingface_qwen25-7b-coder-caa/
├── config.json                 # Model configuration with CAA params
├── modeling_wisent_qwen.py     # Custom model class
├── tokenizer files             # Standard Qwen tokenizer
├── wisent_config.json          # Optimization results
└── vectors/                    # Trait-based steering vectors
    └── mbpp_plus/
        └── steering_vector.safetensors  # MBPP Plus optimized steering vector
```
## Evaluation
### MBPP Plus Benchmark
The steering parameters were optimized with Optuna on MBPP Plus tasks. For reliable performance metrics, evaluate on the complete MBPP Plus dataset (378 problems), available as [evalplus/mbppplus](https://huggingface.co/datasets/evalplus/mbppplus).
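With one generated sample per problem, pass@1 reduces to the fraction of problems whose sample passes all tests. The helper below is a minimal sketch of that calculation; `pass_at_1` is an illustrative name, and the input is assumed to be a list of per-problem pass/fail results produced by your evaluation harness.

```python
from typing import Sequence


def pass_at_1(passed: Sequence[bool]) -> float:
    """Fraction of problems solved, given one pass/fail result per problem."""
    if not passed:
        raise ValueError("need at least one problem result")
    return sum(passed) / len(passed)


# Toy example with four problems, three of them solved
print(pass_at_1([True, False, True, True]))
```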
### Running Evaluation
```python
# Use with bigcode-evaluation-harness
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"./huggingface_qwen25-7b-coder-caa",
trust_remote_code=True
)
# CAA steering is automatically applied during evaluation!
# No manual hooks or modifications needed
```
## Citation
If you use this model, please cite:
```bibtex
@software{wisent_qwen_caa_2025,
title={Wisent-Qwen2.5-Coder with CAA Steering},
author={Wisent AI},
year={2025},
url={https://github.com/wisent-ai/wisent-guard}
}
```
## License
This model inherits the license from the base Qwen2.5-Coder-7B-Instruct model. Please refer to the original model's license for usage terms.
## Acknowledgments
- Base model: Qwen2.5-Coder-7B-Instruct by Alibaba
- CAA method: Contrastive Activation Addition
- Optimization: Optuna framework
- Implementation: Wisent Guard framework