# System Patterns: Morris Bot Architecture
## System Architecture Overview
### High-Level Architecture
```
Data Collection → Preprocessing → Enhancement → Fine-tuning → Inference → Web Interface
       ↓               ↓              ↓             ↓            ↓            ↓
  scraper.py  →  preprocess.py  →  enhance.py →  finetune.py →  model  →  app.py (Gradio)
```
### Core Components
1. **Data Pipeline**: Web scraping → JSON storage → Enhancement → Dataset preparation
2. **Enhancement Pipeline**: System prompt improvement → Non-telecom examples → Style optimization
3. **Training Pipeline**: Enhanced LoRA fine-tuning → Multiple checkpoints → Enhanced adapter storage
4. **Inference Pipeline**: Enhanced model loading → Style-aware generation → Response formatting
5. **User Interface**: Enhanced Gradio web app → Apple Silicon optimization → Real-time generation
## Key Technical Decisions
### Model Selection: Zephyr-7B-Beta
**Decision**: Use HuggingFaceH4/zephyr-7b-beta as base model
**Rationale**:
- Instruction-tuned, so it adheres more reliably to generation prompts
- No authentication required (unlike some Mistral variants)
- 7B parameters: Good balance of capability vs. resource requirements
- Strong performance on text generation tasks
**Alternative Considered**: Direct Mistral-7B
**Why Rejected**: Zephyr's instruction-tuning provides better prompt adherence
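For reference, a minimal sketch of loading the base model with the `transformers` library (no access token required, unlike gated Mistral variants):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "HuggingFaceH4/zephyr-7b-beta"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
```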
### Fine-tuning Approach: LoRA (Low-Rank Adaptation)
**Decision**: Use LoRA instead of full fine-tuning
**Rationale**:
- **Memory Efficiency**: Only 0.58% of parameters trainable (42.5M vs 7.24B)
- **Hardware Compatibility**: Fits in 8GB RAM on Apple Silicon
- **Training Speed**: ~18 minutes vs hours for full fine-tuning
- **Preservation**: Keeps base model knowledge while adding specialization
**Configuration**:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # rank: balance of efficiency vs. capacity
    lora_alpha=32,     # scaling factor
    lora_dropout=0.1,  # regularization
    # all attention projection layers in Zephyr's Mistral-style architecture
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```
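Applying this config wraps the frozen base model in trainable adapters; `print_trainable_parameters()` reports the ~0.58% figure quoted above (a sketch, assuming `model` is the loaded base model):
```python
from peft import get_peft_model

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# e.g. trainable params: ~42.5M || all params: ~7.24B || trainable%: ~0.58
```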
### Hardware Optimization: Apple Silicon MPS
**Decision**: Optimize for Apple M1/M2/M3 chips with MPS backend
**Rationale**:
- **Target Hardware**: Many developers use MacBooks
- **Performance**: MPS provides significant acceleration over CPU
- **Memory**: Unified memory architecture efficient for ML workloads
- **Accessibility**: Makes fine-tuning accessible without expensive GPUs
**Implementation Pattern**:
```python
import torch

# Automatic device detection
if torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16  # memory efficient
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
```
## Design Patterns in Use
### Data Processing Pipeline Pattern
**Pattern**: ETL (Extract, Transform, Load) with validation
**Implementation**:
1. **Extract**: Web scraping with rate limiting and error handling
2. **Transform**: Text cleaning, format standardization, instruction formatting
3. **Load**: JSON storage with validation and dataset splitting
4. **Validate**: Content quality checks and format verification
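A condensed sketch of the Extract and Load stages (helper names are hypothetical; the real logic lives in scraper.py and preprocess.py):
```python
import json
import logging
import time

import requests

def extract_articles(urls, delay_seconds=2.0):
    """Extract: fetch pages with rate limiting and per-URL error handling."""
    pages = []
    for url in urls:
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            pages.append({"url": url, "html": response.text})
        except requests.RequestException as exc:
            logging.warning("Skipping %s: %s", url, exc)
        time.sleep(delay_seconds)  # respectful delay between requests
    return pages

def load_examples(examples, path):
    """Load: persist transformed, validated examples as JSON."""
    with open(path, "w") as fh:
        json.dump(examples, fh, indent=2)
```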
### Model Adapter Pattern
**Pattern**: Adapter pattern for model extensions
**Implementation**:
- Base model remains unchanged
- LoRA adapters provide specialization
- Easy swapping between different fine-tuned versions
- Preserves ability to use base model capabilities
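The pattern in code: the base model is loaded once and any LoRA adapter is attached on top (a sketch using `peft`; swapping fine-tuned versions just means pointing at a different adapter directory):
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# Base weights stay frozen and unchanged; only the adapter differs per version
model = PeftModel.from_pretrained(base, "models/iain-morris-model-enhanced")
```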
### Configuration Management Pattern
**Pattern**: Centralized configuration with environment-specific overrides
**Implementation**:
```python
# Training configuration centralized in finetune.py
TRAINING_CONFIG = {
    "learning_rate": 1e-4,
    "num_epochs": 2,
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Hardware-specific overrides
if device == "mps":
    TRAINING_CONFIG["fp16"] = False  # fp16 training not supported on MPS
    TRAINING_CONFIG["dataloader_num_workers"] = 0
```
### Error Handling and Logging Pattern
**Pattern**: Comprehensive logging with graceful degradation
**Implementation**:
- Structured logging to `morris_bot.log`
- Try-catch blocks with informative error messages
- Fallback behaviors (CPU if MPS fails, etc.)
- Progress tracking during long operations
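A minimal sketch of the pattern, assuming `model` is already loaded (the exact log format and fallback wiring in the codebase may differ):
```python
import logging

logging.basicConfig(
    filename="morris_bot.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("morris_bot")

# Graceful degradation: fall back to CPU if the MPS backend fails
try:
    model.to("mps")
except RuntimeError as exc:
    logger.warning("MPS failed (%s); falling back to CPU", exc)
    model.to("cpu")
```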
## Component Relationships
### Enhanced Data Flow Architecture
```
Raw Articles  →  Enhanced Dataset   →  Style-Optimized Training →  Enhanced Model
     ↓                  ↓                        ↓                       ↓
  Raw JSON    →  Improved Prompts   →  Non-telecom Examples     →  Enhanced LoRA Adapters
     ↓                  ↓                        ↓                       ↓
  Original    → System Prompt Update → Topic Diversification    →  Multi-topic Capability
     ↓                  ↓                        ↓                       ↓
Web Interface ← Enhanced Inference  ←     Enhanced Model        ←  Style-Aware Training
```
### Enhanced Dependency Relationships
- **app.py** depends on enhanced model in `models/iain-morris-model-enhanced/`
- **finetune.py** depends on enhanced dataset in `data/enhanced_train_dataset.json`
- **update_system_prompt.py** enhances training data with improved style guidance
- **add_non_telecom_examples.py** expands dataset with topic diversity
- **test_enhanced_model.py** validates enhanced model performance
- **ENHANCEMENT_SUMMARY.md** documents all improvements and changes
### Enhanced Model Architecture
```
Base Model: Zephyr-7B-Beta (7.24B parameters)
        ↓
Enhanced LoRA Adapters (42.5M trainable parameters)
        ↓
Style-Aware Generation with:
- Doom-laden openings
- Cynical wit and expertise
- Signature phrases ("What could possibly go wrong?")
- Dark analogies and visceral metaphors
- British cynicism with parenthetical snark
- Multi-topic versatility (telecom + non-telecom)
```
### State Management
- **Model State**: Stored as LoRA adapter files (safetensors format)
- **Training State**: Checkpoints saved during training for recovery
- **Data State**: JSON files with versioning through filenames
- **Application State**: Stateless web interface, model loaded on demand
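For example, adapter weights are exported via `peft`'s standard save path, which recent versions serialize as safetensors by default (sketch):
```python
# Writes adapter_config.json and adapter_model.safetensors (not full weights)
peft_model.save_pretrained("models/iain-morris-model-enhanced")
```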
## Critical Implementation Paths
### Training Pipeline Critical Path
1. **Data Validation**: Ensure training examples meet quality standards
2. **Model Loading**: Base model download and initialization
3. **LoRA Setup**: Adapter configuration and parameter freezing
4. **Training Loop**: Gradient computation and adapter updates
5. **Checkpoint Saving**: Periodic saves for recovery
6. **Final Export**: Adapter weights saved for inference
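Steps 4–6 map onto the standard `transformers` Trainer API; a sketch, assuming `peft_model` and `train_dataset` come from the earlier steps (hyperparameter values mirror TRAINING_CONFIG above):
```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="models/iain-morris-model-enhanced",
    learning_rate=1e-4,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    save_steps=50,  # periodic checkpoint saving for recovery
)
trainer = Trainer(model=peft_model, args=args, train_dataset=train_dataset)
trainer.train()
trainer.save_model()  # final export of adapter weights for inference
```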
### Inference Pipeline Critical Path
1. **Model Loading**: Base model + LoRA adapter loading
2. **Prompt Formatting**: User input → instruction format
3. **Generation**: Model forward pass with sampling parameters
4. **Post-processing**: Clean output, format for display
5. **Response**: Return formatted article to user interface
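Steps 2–4 condensed into code (a sketch; `system_prompt`, `topic`, and the sampling values are illustrative, and the instruction format is applied via Zephyr's chat template on the tokenizer):
```python
# 2. Prompt formatting: wrap the user topic in the instruction format
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Write an article about: {topic}"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# 3. Generation with sampling parameters
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=512, temperature=0.8, top_p=0.9, do_sample=True
)

# 4. Post-processing: decode only the newly generated tokens
article = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```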
### Error Recovery Patterns
- **Training Interruption**: Resume from last checkpoint
- **Memory Overflow**: Reduce batch size, enable gradient checkpointing
- **Model Loading Failure**: Fallback to CPU, reduce precision
- **Generation Timeout**: Implement timeout with partial results
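For instance, the Trainer can pick up from the newest checkpoint in `output_dir` after an interruption:
```python
# Resume fine-tuning from the most recent saved checkpoint
trainer.train(resume_from_checkpoint=True)
```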
## Performance Optimization Patterns
### Memory Management
- **Gradient Accumulation**: Simulate larger batch sizes without memory increase
- **Mixed Precision**: float16 where supported for memory efficiency
- **Model Sharding**: LoRA adapters separate from base model
- **Garbage Collection**: Explicit cleanup after training steps
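A sketch of the explicit-cleanup step on the supported backends (`torch.mps.empty_cache()` requires a recent PyTorch):
```python
import gc

import torch

# Release dropped Python objects, then free cached accelerator memory
gc.collect()
if torch.backends.mps.is_available():
    torch.mps.empty_cache()
elif torch.cuda.is_available():
    torch.cuda.empty_cache()
```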
### Compute Optimization
- **Hardware Detection**: Automatic selection of best available device
- **Batch Processing**: Process multiple examples efficiently
- **Caching**: Tokenized datasets cached for repeated training runs
- **Parallel Processing**: Multi-threading where beneficial
### User Experience Optimization
- **Lazy Loading**: Model loaded only when needed
- **Progress Indicators**: Real-time feedback during long operations
- **Parameter Validation**: Input validation before expensive operations
- **Responsive Interface**: Non-blocking UI during generation
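The lazy-loading pattern in miniature (`load_enhanced_model` is a hypothetical loader standing in for the app.py equivalent):
```python
_model = None

def get_model():
    """Load the model on first use rather than at app startup."""
    global _model
    if _model is None:
        _model = load_enhanced_model()  # hypothetical loader
    return _model
```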
## Scalability Considerations
### Current Limitations
- **Single Model**: Only one fine-tuned model at a time
- **Local Deployment**: No distributed inference capability
- **Memory Bound**: Limited by single machine memory
- **Sequential Processing**: One generation request at a time
### Future Scalability Patterns
- **Model Versioning**: Support multiple LoRA adapters
- **Distributed Inference**: Model serving across multiple devices
- **Batch Generation**: Process multiple requests simultaneously
- **Cloud Deployment**: Container-based scaling patterns
## Security and Ethics Patterns
### Data Handling
- **Public Data Only**: Scrape only publicly available articles
- **Rate Limiting**: Respectful scraping with delays
- **Attribution**: Clear marking of AI-generated content
- **Privacy**: No personal data collection or storage
### Model Safety
- **Content Filtering**: Basic checks on generated content
- **Human Review**: Emphasis on human oversight requirement
- **Educational Use**: Clear guidelines for appropriate use
- **Transparency**: Open documentation of training process