# System Patterns: Morris Bot Architecture

## System Architecture Overview

### High-Level Architecture
```
Data Collection → Preprocessing → Enhancement → Fine-tuning → Inference → Web Interface
     ↓              ↓             ↓            ↓           ↓           ↓
  scraper.py → preprocess.py → enhance.py → finetune.py → model → app.py (Gradio)
```

### Core Components
1. **Data Pipeline**: Web scraping → JSON storage → Enhancement → Dataset preparation
2. **Enhancement Pipeline**: System prompt improvement → Non-telecom examples → Style optimization
3. **Training Pipeline**: Enhanced LoRA fine-tuning → Multiple checkpoints → Enhanced adapter storage
4. **Inference Pipeline**: Enhanced model loading → Style-aware generation → Response formatting
5. **User Interface**: Enhanced Gradio web app → Apple Silicon optimization → Real-time generation

## Key Technical Decisions

### Model Selection: Zephyr-7B-Beta
**Decision**: Use HuggingFaceH4/zephyr-7b-beta as base model
**Rationale**:
- Instruction-tuned, so it follows generation prompts more reliably
- No authentication required (unlike some Mistral variants)
- 7B parameters: Good balance of capability vs. resource requirements
- Strong performance on text generation tasks

**Alternative Considered**: Direct Mistral-7B
**Why Rejected**: Zephyr's instruction-tuning provides better prompt adherence
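
A minimal sketch of loading this base model with the Hugging Face `transformers` library; the model ID matches the decision above, while the dtype choice is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "HuggingFaceH4/zephyr-7b-beta"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,  # half precision keeps the 7B model within modest memory budgets
)
```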

### Fine-tuning Approach: LoRA (Low-Rank Adaptation)
**Decision**: Use LoRA instead of full fine-tuning
**Rationale**:
- **Memory Efficiency**: Only 0.58% of parameters trainable (42.5M vs 7.24B)
- **Hardware Compatibility**: Fits in 8GB RAM on Apple Silicon
- **Training Speed**: ~18 minutes vs hours for full fine-tuning
- **Preservation**: Keeps base model knowledge while adding specialization

**Configuration**:
```python
# LoRA configuration expressed as a peft LoraConfig (a sketch; the concrete
# target-module names are an assumption standing in for "all attention layers")
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # rank: balance of efficiency vs. capacity
    lora_alpha=32,     # scaling factor
    lora_dropout=0.1,  # regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

### Hardware Optimization: Apple Silicon MPS
**Decision**: Optimize for Apple M1/M2/M3 chips with MPS backend
**Rationale**:
- **Target Hardware**: Many developers use MacBooks
- **Performance**: MPS provides significant acceleration over CPU
- **Memory**: Unified memory architecture efficient for ML workloads
- **Accessibility**: Makes fine-tuning accessible without expensive GPUs

**Implementation Pattern**:
```python
# Automatic device detection with a memory-efficient dtype per backend
import torch

if torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16  # half precision keeps the 7B model within unified memory
elif torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
else:
    device = "cpu"
    dtype = torch.float32  # full precision on the CPU fallback
```

## Design Patterns in Use

### Data Processing Pipeline Pattern
**Pattern**: ETL (Extract, Transform, Load) with validation
**Implementation**:
1. **Extract**: Web scraping with rate limiting and error handling
2. **Transform**: Text cleaning, format standardization, instruction formatting
3. **Load**: JSON storage with validation and dataset splitting
4. **Validate**: Content quality checks and format verification
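
A condensed sketch of the transform, load, and validate steps, assuming scraped articles are dicts with `title` and `body` fields (the field names and output path are illustrative, not the project's actual schema):

```python
import json

def transform_article(article: dict) -> dict:
    """Clean scraped text and wrap it in an instruction-style training example."""
    body = " ".join(article["body"].split())  # collapse stray whitespace
    return {
        "instruction": f"Write an article in the style of Iain Morris about: {article['title']}",
        "output": body,
    }

def is_valid(example: dict, min_chars: int = 500) -> bool:
    """Basic quality gate: non-empty instruction and a reasonably long output."""
    return bool(example["instruction"]) and len(example["output"]) >= min_chars

def load_examples(raw_articles: list, path: str = "data/train_dataset.json") -> None:
    examples = [transform_article(a) for a in raw_articles]
    examples = [e for e in examples if is_valid(e)]
    with open(path, "w") as f:
        json.dump(examples, f, indent=2)
```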

### Model Adapter Pattern
**Pattern**: Adapter pattern for model extensions
**Implementation**:
- Base model remains unchanged
- LoRA adapters provide specialization
- Easy swapping between different fine-tuned versions
- Preserves ability to use base model capabilities
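
A minimal sketch of the pattern with `peft`, assuming the `model` object from the loading sketch above and the enhanced adapter directory referenced later in this document:

```python
from peft import PeftModel

# The base model stays untouched; the adapter directory supplies the specialization.
styled_model = PeftModel.from_pretrained(model, "models/iain-morris-model-enhanced")

# Swapping fine-tuned versions means pointing at a different adapter directory:
# styled_model = PeftModel.from_pretrained(model, "<other-adapter-dir>")
```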

### Configuration Management Pattern
**Pattern**: Centralized configuration with environment-specific overrides
**Implementation**:
```python
# Training configuration centralized in finetune.py
TRAINING_CONFIG = {
    "learning_rate": 1e-4,
    "num_epochs": 2,
    "batch_size": 1,
    "gradient_accumulation_steps": 8
}

# Hardware-specific overrides
if device == "mps":
    TRAINING_CONFIG["fp16"] = False  # Not supported on MPS
    TRAINING_CONFIG["dataloader_num_workers"] = 0
```

### Error Handling and Logging Pattern
**Pattern**: Comprehensive logging with graceful degradation
**Implementation**:
- Structured logging to `morris_bot.log`
- Try-catch blocks with informative error messages
- Fallback behaviors (CPU if MPS fails, etc.)
- Progress tracking during long operations
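
A minimal sketch of the pattern; the log file name comes from the list above, while the fallback logic is illustrative:

```python
import logging

import torch

logging.basicConfig(
    filename="morris_bot.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("morris_bot")

def select_device() -> str:
    """Prefer MPS, but degrade gracefully to CPU if it is unavailable or fails."""
    try:
        if torch.backends.mps.is_available():
            return "mps"
    except Exception as exc:
        logger.warning("MPS check failed, falling back to CPU: %s", exc)
    return "cpu"
```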

## Component Relationships

### Enhanced Data Flow Architecture
```
Raw Articles → Enhanced Dataset → Style-Optimized Training → Enhanced Model
     ↓              ↓                      ↓                     ↓
Raw JSON → Improved Prompts → Non-telecom Examples → Enhanced LoRA Adapters
     ↓              ↓                      ↓                     ↓
Original → System Prompt Update → Topic Diversification → Multi-topic Capability
     ↓              ↓                      ↓                     ↓
Web Interface ← Enhanced Inference ← Enhanced Model ← Style-Aware Training
```

### Enhanced Dependency Relationships
- **app.py** depends on enhanced model in `models/iain-morris-model-enhanced/`
- **finetune.py** depends on enhanced dataset in `data/enhanced_train_dataset.json`
- **update_system_prompt.py** enhances training data with improved style guidance
- **add_non_telecom_examples.py** expands dataset with topic diversity
- **test_enhanced_model.py** validates enhanced model performance
- **ENHANCEMENT_SUMMARY.md** documents all improvements and changes

### Enhanced Model Architecture
```
Base Model: Zephyr-7B-Beta (7.24B parameters)
     ↓
Enhanced LoRA Adapters (42.5M trainable parameters)
     ↓
Style-Aware Generation with:
- Doom-laden openings
- Cynical wit and expertise  
- Signature phrases ("What could possibly go wrong?")
- Dark analogies and visceral metaphors
- British cynicism with parenthetical snark
- Multi-topic versatility (telecom + non-telecom)
```

### State Management
- **Model State**: Stored as LoRA adapter files (safetensors format)
- **Training State**: Checkpoints saved during training for recovery
- **Data State**: JSON files with versioning through filenames
- **Application State**: Stateless web interface, model loaded on demand
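
As a short illustration of the model-state point, saving a fine-tuned `peft` model writes only the adapter weights and config into the target directory (safetensors by default in recent peft releases), not a full 7B checkpoint:

```python
# Writes adapter weights plus adapter_config.json; the 7B base model is untouched.
styled_model.save_pretrained("models/iain-morris-model-enhanced")
```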

## Critical Implementation Paths

### Training Pipeline Critical Path
1. **Data Validation**: Ensure training examples meet quality standards
2. **Model Loading**: Base model download and initialization
3. **LoRA Setup**: Adapter configuration and parameter freezing
4. **Training Loop**: Gradient computation and adapter updates
5. **Checkpoint Saving**: Periodic saves for recovery
6. **Final Export**: Adapter weights saved for inference
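
A condensed sketch of steps 3-6 using the Hugging Face `Trainer`; the hyperparameters echo the configuration section above, while `train_dataset`, the output paths, and the omitted tokenization/collation details are assumptions:

```python
from peft import get_peft_model
from transformers import Trainer, TrainingArguments

# Step 3: wrap the base model with LoRA adapters (base weights stay frozen)
peft_model = get_peft_model(model, lora_config)

# Steps 4-5: training loop with periodic checkpoints for recovery
args = TrainingArguments(
    output_dir="models/checkpoints",   # illustrative checkpoint directory
    learning_rate=1e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    save_steps=50,
    logging_steps=10,
)
trainer = Trainer(model=peft_model, args=args, train_dataset=train_dataset)
trainer.train()

# Step 6: export only the adapter weights for inference
peft_model.save_pretrained("models/iain-morris-model-enhanced")
```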

### Inference Pipeline Critical Path
1. **Model Loading**: Base model + LoRA adapter loading
2. **Prompt Formatting**: User input β†’ instruction format
3. **Generation**: Model forward pass with sampling parameters
4. **Post-processing**: Clean output, format for display
5. **Response**: Return formatted article to user interface
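
A sketch of steps 2-4, assuming the `tokenizer`, `styled_model`, and `device` objects from the earlier sketches; the prompt template and sampling values are illustrative:

```python
def generate_article(topic: str, max_new_tokens: int = 600) -> str:
    # Step 2: wrap the user input in the instruction format used during training
    prompt = f"Write an article in the style of Iain Morris about: {topic}"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Step 3: forward pass with sampling parameters
    output_ids = styled_model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
    )

    # Step 4: decode only the newly generated tokens, dropping the prompt
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```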

### Error Recovery Patterns
- **Training Interruption**: Resume from last checkpoint
- **Memory Overflow**: Reduce batch size, enable gradient checkpointing
- **Model Loading Failure**: Fallback to CPU, reduce precision
- **Generation Timeout**: Implement timeout with partial results
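
For the training-interruption case, the Hugging Face `Trainer` can resume from the latest checkpoint in its output directory:

```python
# True tells Trainer to locate the most recent checkpoint under output_dir automatically
trainer.train(resume_from_checkpoint=True)
```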

## Performance Optimization Patterns

### Memory Management
- **Gradient Accumulation**: Simulate larger batch sizes without memory increase
- **Mixed Precision**: float16 where supported for memory efficiency
- **Model Sharding**: LoRA adapters separate from base model
- **Garbage Collection**: Explicit cleanup after training steps
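
A small sketch of the explicit-cleanup point; `torch.mps.empty_cache()` is assumed to be available (it exists in recent PyTorch releases):

```python
import gc

import torch

def free_memory() -> None:
    """Explicit cleanup between heavy training or generation steps."""
    gc.collect()
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
    elif torch.cuda.is_available():
        torch.cuda.empty_cache()
```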

### Compute Optimization
- **Hardware Detection**: Automatic selection of best available device
- **Batch Processing**: Process multiple examples efficiently
- **Caching**: Tokenized datasets cached for repeated training runs
- **Parallel Processing**: Multi-threading where beneficial

### User Experience Optimization
- **Lazy Loading**: Model loaded only when needed
- **Progress Indicators**: Real-time feedback during long operations
- **Parameter Validation**: Input validation before expensive operations
- **Responsive Interface**: Non-blocking UI during generation
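
A sketch of the lazy-loading pattern as it might appear in the Gradio app; `load_model` is a hypothetical helper standing in for the actual model-loading code in `app.py`:

```python
import gradio as gr

_model_ready = False  # the heavyweight model is attached on first request, not at import time

def generate(topic: str) -> str:
    global _model_ready
    if not _model_ready:
        load_model()  # hypothetical helper: loads base model + enhanced LoRA adapter
        _model_ready = True
    return generate_article(topic)  # inference sketch above

demo = gr.Interface(fn=generate, inputs="text", outputs="text", title="Morris Bot")
demo.launch()
```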

## Scalability Considerations

### Current Limitations
- **Single Model**: Only one fine-tuned model at a time
- **Local Deployment**: No distributed inference capability
- **Memory Bound**: Limited by single machine memory
- **Sequential Processing**: One generation request at a time

### Future Scalability Patterns
- **Model Versioning**: Support multiple LoRA adapters
- **Distributed Inference**: Model serving across multiple devices
- **Batch Generation**: Process multiple requests simultaneously
- **Cloud Deployment**: Container-based scaling patterns

## Security and Ethics Patterns

### Data Handling
- **Public Data Only**: Scrape only publicly available articles
- **Rate Limiting**: Respectful scraping with delays
- **Attribution**: Clear marking of AI-generated content
- **Privacy**: No personal data collection or storage

### Model Safety
- **Content Filtering**: Basic checks on generated content
- **Human Review**: Emphasis on human oversight requirement
- **Educational Use**: Clear guidelines for appropriate use
- **Transparency**: Open documentation of training process