# System Patterns: Morris Bot Architecture
## System Architecture Overview
### High-Level Architecture
```
Data Collection → Preprocessing → Enhancement → Fine-tuning → Inference → Web Interface
       ↓               ↓              ↓              ↓            ↓             ↓
  scraper.py   →  preprocess.py →  enhance.py  →  finetune.py →  model   →  app.py (Gradio)
```
### Core Components
1. **Data Pipeline**: Web scraping → JSON storage → Enhancement → Dataset preparation
2. **Enhancement Pipeline**: System prompt improvement → Non-telecom examples → Style optimization
3. **Training Pipeline**: Enhanced LoRA fine-tuning → Multiple checkpoints → Enhanced adapter storage
4. **Inference Pipeline**: Enhanced model loading → Style-aware generation → Response formatting
5. **User Interface**: Enhanced Gradio web app → Apple Silicon optimization → Real-time generation
## Key Technical Decisions
### Model Selection: Zephyr-7B-Beta
**Decision**: Use HuggingFaceH4/zephyr-7b-beta as base model
**Rationale**:
- Instruction-tuned for better following of generation prompts
- No authentication required (unlike some Mistral variants)
- 7B parameters: Good balance of capability vs. resource requirements
- Strong performance on text generation tasks
**Alternative Considered**: Direct Mistral-7B
**Why Rejected**: Zephyr's instruction-tuning provides better prompt adherence
### Fine-tuning Approach: LoRA (Low-Rank Adaptation)
**Decision**: Use LoRA instead of full fine-tuning
**Rationale**:
- **Memory Efficiency**: Only 0.58% of parameters trainable (42.5M vs 7.24B)
- **Hardware Compatibility**: Fits in 8GB RAM on Apple Silicon
- **Training Speed**: ~18 minutes vs hours for full fine-tuning
- **Preservation**: Keeps base model knowledge while adding specialization
**Configuration**:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # rank: balance of efficiency vs. capacity
    lora_alpha=32,      # scaling factor
    lora_dropout=0.1,   # regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # all attention layers
    task_type="CAUSAL_LM",
)
```
### Hardware Optimization: Apple Silicon MPS
**Decision**: Optimize for Apple M1/M2/M3 chips with MPS backend
**Rationale**:
- **Target Hardware**: Many developers use MacBooks
- **Performance**: MPS provides significant acceleration over CPU
- **Memory**: Unified memory architecture efficient for ML workloads
- **Accessibility**: Makes fine-tuning accessible without expensive GPUs
**Implementation Pattern**:
```python
import torch

# Automatic device detection
if torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16  # memory efficient
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
```
## Design Patterns in Use
### Data Processing Pipeline Pattern
**Pattern**: ETL (Extract, Transform, Load) with validation
**Implementation**:
1. **Extract**: Web scraping with rate limiting and error handling
2. **Transform**: Text cleaning, format standardization, instruction formatting
3. **Load**: JSON storage with validation and dataset splitting
4. **Validate**: Content quality checks and format verification
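A minimal sketch of this ETL flow, assuming `requests` and `BeautifulSoup` for extraction. The function names, instruction wording, and the 500-character quality threshold are illustrative, not the actual `scraper.py`/`preprocess.py` API:

```python
import json
import time

import requests
from bs4 import BeautifulSoup

def fetch_article(url: str) -> str:
    """Extract: download one article with basic error handling."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser").get_text(
        separator="\n", strip=True
    )

def to_example(raw_text: str) -> dict:
    """Transform: clean whitespace and wrap in instruction format."""
    return {
        "instruction": "Write an article in the style of Iain Morris.",
        "output": " ".join(raw_text.split()),
    }

def run_pipeline(urls: list[str], out_path: str = "data/raw_articles.json") -> None:
    examples = []
    for url in urls:
        try:
            example = to_example(fetch_article(url))
            if len(example["output"]) > 500:  # Validate: drop trivially short pages
                examples.append(example)
        except requests.RequestException as err:
            print(f"Skipping {url}: {err}")
        time.sleep(2)  # rate limiting between requests
    with open(out_path, "w") as f:  # Load: JSON storage
        json.dump(examples, f, indent=2)
```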
### Model Adapter Pattern
**Pattern**: Adapter pattern for model extensions
**Implementation**:
- Base model remains unchanged
- LoRA adapters provide specialization
- Easy swapping between different fine-tuned versions
- Preserves ability to use base model capabilities
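A sketch of this pattern with the `peft` library; the second adapter path and the `"v1"` name are hypothetical, included only to show swapping:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Base model loads once and its weights are never modified
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)

# Attach the enhanced LoRA adapter on top of the frozen base
model = PeftModel.from_pretrained(base, "models/iain-morris-model-enhanced")

# Swap in another fine-tuned version without reloading base weights
model.load_adapter("models/iain-morris-model", adapter_name="v1")  # hypothetical path
model.set_adapter("v1")

# Disabling adapters recovers plain base-model behaviour on demand
with model.disable_adapter():
    pass  # generation here uses only base Zephyr capabilities
```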
### Configuration Management Pattern
**Pattern**: Centralized configuration with environment-specific overrides
**Implementation**:
```python
# Training configuration centralized in finetune.py
TRAINING_CONFIG = {
    "learning_rate": 1e-4,
    "num_epochs": 2,
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Hardware-specific overrides
if device == "mps":
    TRAINING_CONFIG["fp16"] = False  # not supported on MPS
    TRAINING_CONFIG["dataloader_num_workers"] = 0
```
### Error Handling and Logging Pattern
**Pattern**: Comprehensive logging with graceful degradation
**Implementation**:
- Structured logging to `morris_bot.log`
- Try-catch blocks with informative error messages
- Fallback behaviors (CPU if MPS fails, etc.)
- Progress tracking during long operations
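A minimal sketch of the structured logging plus one graceful-degradation check; the MPS smoke test is an assumption about how the fallback might be probed, not the project's exact code:

```python
import logging

import torch

logging.basicConfig(
    filename="morris_bot.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("morris_bot")

def pick_device() -> str:
    """Prefer MPS, but degrade gracefully to CPU if it fails at runtime."""
    if torch.backends.mps.is_available():
        try:
            torch.ones(1, device="mps")  # smoke-test the backend
            return "mps"
        except RuntimeError as err:
            logger.warning("MPS failed (%s); falling back to CPU", err)
    return "cpu"

logger.info("Selected device: %s", pick_device())
```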
## Component Relationships
### Enhanced Data Flow Architecture
```
Raw Articles → Enhanced Dataset → Style-Optimized Training → Enhanced Model
      ↓               ↓                      ↓                      ↓
Raw JSON → Improved Prompts → Non-telecom Examples → Enhanced LoRA Adapters
      ↓               ↓                      ↓                      ↓
Original → System Prompt Update → Topic Diversification → Multi-topic Capability
      ↓               ↓                      ↓                      ↓
Web Interface ← Enhanced Inference ← Enhanced Model ← Style-Aware Training
```
### Enhanced Dependency Relationships
- **app.py** depends on enhanced model in `models/iain-morris-model-enhanced/`
- **finetune.py** depends on enhanced dataset in `data/enhanced_train_dataset.json`
- **update_system_prompt.py** enhances training data with improved style guidance
- **add_non_telecom_examples.py** expands dataset with topic diversity
- **test_enhanced_model.py** validates enhanced model performance
- **ENHANCEMENT_SUMMARY.md** documents all improvements and changes
### Enhanced Model Architecture
```
Base Model: Zephyr-7B-Beta (7.24B parameters)
        ↓
Enhanced LoRA Adapters (42.5M trainable parameters)
        ↓
Style-Aware Generation with:
- Doom-laden openings
- Cynical wit and expertise
- Signature phrases ("What could possibly go wrong?")
- Dark analogies and visceral metaphors
- British cynicism with parenthetical snark
- Multi-topic versatility (telecom + non-telecom)
```
### State Management
- **Model State**: Stored as LoRA adapter files (safetensors format)
- **Training State**: Checkpoints saved during training for recovery
- **Data State**: JSON files with versioning through filenames
- **Application State**: Stateless web interface, model loaded on demand
## Critical Implementation Paths
### Training Pipeline Critical Path
1. **Data Validation**: Ensure training examples meet quality standards
2. **Model Loading**: Base model download and initialization
3. **LoRA Setup**: Adapter configuration and parameter freezing
4. **Training Loop**: Gradient computation and adapter updates
5. **Checkpoint Saving**: Periodic saves for recovery
6. **Final Export**: Adapter weights saved for inference
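This path condensed into one hedged sketch using `transformers` and `peft`, assuming instruction/output records in the enhanced dataset; the tokenization details, `max_length`, and `save_steps` values are illustrative rather than the project's exact settings:

```python
import json

import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "HuggingFaceH4/zephyr-7b-beta"

# 1-2. Validated data, then base model download and initialization
with open("data/enhanced_train_dataset.json") as f:
    examples = [e for e in json.load(f) if e.get("output")]
dataset = Dataset.from_list(examples)

tokenizer = AutoTokenizer.from_pretrained(BASE)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)

# 3. LoRA setup: only adapter weights are trainable, base stays frozen
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

def tokenize(example):
    return tokenizer(example["output"], truncation=True, max_length=1024)

# 4-5. Training loop with periodic checkpoints for recovery
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="models/checkpoints",
        learning_rate=1e-4, num_train_epochs=2,
        per_device_train_batch_size=1, gradient_accumulation_steps=8,
        save_steps=50,
    ),
    train_dataset=dataset.map(tokenize, remove_columns=dataset.column_names),
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()

# 6. Final export: adapter weights saved (safetensors) for inference
model.save_pretrained("models/iain-morris-model-enhanced")
```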
### Inference Pipeline Critical Path
1. **Model Loading**: Base model + LoRA adapter loading
2. **Prompt Formatting**: User input β instruction format
3. **Generation**: Model forward pass with sampling parameters
4. **Post-processing**: Clean output, format for display
5. **Response**: Return formatted article to user interface
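The same path as a sketch; the prompt wording and sampling parameters (temperature 0.8, top-p 0.9, 512 new tokens) are illustrative choices rather than the tuned values in `app.py`:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "HuggingFaceH4/zephyr-7b-beta"

# 1. Base model + LoRA adapter loading
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16),
    "models/iain-morris-model-enhanced",
).to("mps" if torch.backends.mps.is_available() else "cpu")

def generate_article(topic: str) -> str:
    # 2. Prompt formatting: user input -> Zephyr chat template
    messages = [{"role": "user",
                 "content": f"Write an Iain Morris-style article about {topic}."}]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    # 3. Generation: model forward pass with sampling parameters
    output = model.generate(inputs, max_new_tokens=512,
                            do_sample=True, temperature=0.8, top_p=0.9)
    # 4-5. Post-processing: strip the prompt, decode, return clean text
    return tokenizer.decode(output[0][inputs.shape[-1]:],
                            skip_special_tokens=True).strip()
```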
### Error Recovery Patterns
- **Training Interruption**: Resume from last checkpoint
- **Memory Overflow**: Reduce batch size, enable gradient checkpointing
- **Model Loading Failure**: Fallback to CPU, reduce precision
- **Generation Timeout**: Implement timeout with partial results
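A simplified sketch of the first two recovery behaviours around the `Trainer` API; production code would rebuild the trainer with new arguments rather than mutate them in place, and the exact exception types vary by backend:

```python
import gc

def train_with_recovery(trainer):
    """Resume after interruption; on memory overflow, shrink the batch."""
    try:
        trainer.train(resume_from_checkpoint=True)  # pick up last checkpoint
    except ValueError:
        trainer.train()  # no checkpoint exists yet: start a fresh run
    except RuntimeError:  # covers CUDA/MPS out-of-memory errors
        gc.collect()
        args = trainer.args
        args.per_device_train_batch_size = max(
            1, args.per_device_train_batch_size // 2
        )
        args.gradient_accumulation_steps *= 2  # keep effective batch size
        trainer.train()
```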
## Performance Optimization Patterns
### Memory Management
- **Gradient Accumulation**: Simulate larger batch sizes without memory increase
- **Mixed Precision**: float16 where supported for memory efficiency
- **Adapter Separation**: LoRA adapter weights stored separately from the frozen base model
- **Garbage Collection**: Explicit cleanup after training steps
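The explicit-cleanup point as a small helper; `torch.mps.empty_cache()` assumes PyTorch 2.0 or newer:

```python
import gc

import torch

def free_memory() -> None:
    """Explicit cleanup between training steps or generations."""
    gc.collect()  # drop Python-level references first
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()  # release cached MPS allocations
    elif torch.cuda.is_available():
        torch.cuda.empty_cache()
```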
### Compute Optimization
- **Hardware Detection**: Automatic selection of best available device
- **Batch Processing**: Process multiple examples efficiently
- **Caching**: Tokenized datasets cached for repeated training runs
- **Parallel Processing**: Multi-threading where beneficial
### User Experience Optimization
- **Lazy Loading**: Model loaded only when needed
- **Progress Indicators**: Real-time feedback during long operations
- **Parameter Validation**: Input validation before expensive operations
- **Responsive Interface**: Non-blocking UI during generation
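A sketch of lazy loading combined with up-front parameter validation; the `pipeline` call stands in for the project's actual enhanced-model loader:

```python
from functools import lru_cache

from transformers import pipeline

@lru_cache(maxsize=1)
def get_generator():
    """Lazy loading: the expensive model load runs once, on first call."""
    return pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

def generate(prompt: str) -> str:
    # Parameter validation before any expensive work
    if not prompt or len(prompt) > 500:
        raise ValueError("Prompt must be 1-500 characters")
    return get_generator()(prompt, max_new_tokens=256)[0]["generated_text"]
```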
## Scalability Considerations
### Current Limitations
- **Single Model**: Only one fine-tuned model at a time
- **Local Deployment**: No distributed inference capability
- **Memory Bound**: Limited by single machine memory
- **Sequential Processing**: One generation request at a time
### Future Scalability Patterns
- **Model Versioning**: Support multiple LoRA adapters
- **Distributed Inference**: Model serving across multiple devices
- **Batch Generation**: Process multiple requests simultaneously
- **Cloud Deployment**: Container-based scaling patterns
## Security and Ethics Patterns
### Data Handling
- **Public Data Only**: Scrape only publicly available articles
- **Rate Limiting**: Respectful scraping with delays
- **Attribution**: Clear marking of AI-generated content
- **Privacy**: No personal data collection or storage
### Model Safety
- **Content Filtering**: Basic checks on generated content
- **Human Review**: Emphasis on human oversight requirement
- **Educational Use**: Clear guidelines for appropriate use
- **Transparency**: Open documentation of training process