# System Patterns: Morris Bot Architecture

## System Architecture Overview

### High-Level Architecture

```
Data Collection → Preprocessing → Enhancement → Fine-tuning → Inference → Web Interface
       ↓               ↓              ↓              ↓            ↓             ↓
  scraper.py  →  preprocess.py →  enhance.py  →  finetune.py →  model  →  app.py (Gradio)
```

### Core Components

1. **Data Pipeline**: Web scraping → JSON storage → Enhancement → Dataset preparation
2. **Enhancement Pipeline**: System prompt improvement → Non-telecom examples → Style optimization
3. **Training Pipeline**: Enhanced LoRA fine-tuning → Multiple checkpoints → Enhanced adapter storage
4. **Inference Pipeline**: Enhanced model loading → Style-aware generation → Response formatting
5. **User Interface**: Enhanced Gradio web app → Apple Silicon optimization → Real-time generation

## Key Technical Decisions

### Model Selection: Zephyr-7B-Beta

**Decision**: Use HuggingFaceH4/zephyr-7b-beta as the base model

**Rationale**:
- Instruction-tuned, so it follows generation prompts more reliably
- No authentication required (unlike some Mistral variants)
- 7B parameters: a good balance of capability vs.
  resource requirements
- Strong performance on text generation tasks

**Alternative Considered**: Direct Mistral-7B

**Why Rejected**: Zephyr's instruction-tuning provides better prompt adherence

### Fine-tuning Approach: LoRA (Low-Rank Adaptation)

**Decision**: Use LoRA instead of full fine-tuning

**Rationale**:
- **Memory Efficiency**: Only 0.58% of parameters trainable (42.5M vs. 7.24B)
- **Hardware Compatibility**: Fits in 8GB RAM on Apple Silicon
- **Training Speed**: ~18 minutes vs. hours for full fine-tuning
- **Preservation**: Keeps base model knowledge while adding specialization

**Configuration**:
```python
# LoRA hyperparameters (peft.LoraConfig)
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,              # rank: balance of efficiency vs. capacity
    lora_alpha=32,     # scaling factor
    lora_dropout=0.1,  # regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # all attention layers
)
```

### Hardware Optimization: Apple Silicon MPS

**Decision**: Optimize for Apple M1/M2/M3 chips with the MPS backend

**Rationale**:
- **Target Hardware**: Many developers use MacBooks
- **Performance**: MPS provides significant acceleration over CPU
- **Memory**: Unified memory architecture is efficient for ML workloads
- **Accessibility**: Makes fine-tuning accessible without expensive GPUs

**Implementation Pattern**:
```python
import torch

# Automatic device detection
if torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16  # memory efficient
elif torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
else:
    device = "cpu"
    dtype = torch.float32
```

## Design Patterns in Use

### Data Processing Pipeline Pattern

**Pattern**: ETL (Extract, Transform, Load) with validation

**Implementation**:
1. **Extract**: Web scraping with rate limiting and error handling
2. **Transform**: Text cleaning, format standardization, instruction formatting
3. **Load**: JSON storage with validation and dataset splitting
4. **Validate**: Content quality checks and format verification

### Model Adapter Pattern

**Pattern**: Adapter pattern for model extensions

**Implementation**:
- Base model remains unchanged
- LoRA adapters provide the specialization
- Easy swapping between different fine-tuned versions
- Preserves the ability to use base model capabilities

### Configuration Management Pattern

**Pattern**: Centralized configuration with environment-specific overrides

**Implementation**:
```python
# Training configuration centralized in finetune.py
TRAINING_CONFIG = {
    "learning_rate": 1e-4,
    "num_epochs": 2,
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
}

# Hardware-specific overrides
if device == "mps":
    TRAINING_CONFIG["fp16"] = False  # fp16 training not supported on MPS
    TRAINING_CONFIG["dataloader_num_workers"] = 0
```

### Error Handling and Logging Pattern

**Pattern**: Comprehensive logging with graceful degradation

**Implementation**:
- Structured logging to `morris_bot.log`
- Try-catch blocks with informative error messages
- Fallback behaviors (CPU if MPS fails, etc.)
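A minimal sketch of this logging-plus-graceful-degradation pattern; the helper and logger names are illustrative assumptions, not taken from the actual codebase:

```python
import logging

logging.basicConfig(level=logging.INFO)  # the project writes to morris_bot.log
logger = logging.getLogger("morris_bot")

def load_with_fallback(loaders):
    """Try each (device_name, loader) pair in order; fall back on failure."""
    for device_name, loader in loaders:
        try:
            model = loader()
            logger.info("Model loaded on %s", device_name)
            return device_name, model
        except Exception as exc:
            logger.warning("Load on %s failed (%s); falling back", device_name, exc)
    raise RuntimeError("All load attempts failed")

def load_on_mps():
    raise RuntimeError("MPS backend unavailable")  # simulate an MPS failure

device, model = load_with_fallback([
    ("mps", load_on_mps),
    ("cpu", lambda: "cpu-model"),  # placeholder for a real model object
])
# device == "cpu": the CPU fallback succeeded
```

Each fallback attempt is logged rather than silently swallowed, so the log file records which device ultimately served the model.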
- Progress tracking during long operations

## Component Relationships

### Enhanced Data Flow Architecture

```
Raw Articles  →  Enhanced Dataset      →  Style-Optimized Training  →  Enhanced Model
     ↓                  ↓                          ↓                         ↓
 Raw JSON     →  Improved Prompts      →  Non-telecom Examples      →  Enhanced LoRA Adapters
     ↓                  ↓                          ↓                         ↓
 Original     →  System Prompt Update  →  Topic Diversification     →  Multi-topic Capability
     ↓                  ↓                          ↓                         ↓
Web Interface ←  Enhanced Inference    ←  Enhanced Model            ←  Style-Aware Training
```

### Enhanced Dependency Relationships

- **app.py** depends on the enhanced model in `models/iain-morris-model-enhanced/`
- **finetune.py** depends on the enhanced dataset in `data/enhanced_train_dataset.json`
- **update_system_prompt.py** enhances training data with improved style guidance
- **add_non_telecom_examples.py** expands the dataset with topic diversity
- **test_enhanced_model.py** validates enhanced model performance
- **ENHANCEMENT_SUMMARY.md** documents all improvements and changes

### Enhanced Model Architecture

```
Base Model: Zephyr-7B-Beta (7.24B parameters)
    ↓
Enhanced LoRA Adapters (42.5M trainable parameters)
    ↓
Style-Aware Generation with:
- Doom-laden openings
- Cynical wit and expertise
- Signature phrases ("What could possibly go wrong?")
- Dark analogies and visceral metaphors
- British cynicism with parenthetical snark
- Multi-topic versatility (telecom + non-telecom)
```

### State Management

- **Model State**: Stored as LoRA adapter files (safetensors format)
- **Training State**: Checkpoints saved during training for recovery
- **Data State**: JSON files with versioning through filenames
- **Application State**: Stateless web interface; model loaded on demand

## Critical Implementation Paths

### Training Pipeline Critical Path

1. **Data Validation**: Ensure training examples meet quality standards
2. **Model Loading**: Base model download and initialization
3. **LoRA Setup**: Adapter configuration and parameter freezing
4. **Training Loop**: Gradient computation and adapter updates
5. **Checkpoint Saving**: Periodic saves for recovery
6. **Final Export**: Adapter weights saved for inference

### Inference Pipeline Critical Path

1. **Model Loading**: Base model + LoRA adapter loading
2. **Prompt Formatting**: User input → instruction format
3. **Generation**: Model forward pass with sampling parameters
4. **Post-processing**: Clean output and format for display
5. **Response**: Return the formatted article to the user interface

### Error Recovery Patterns

- **Training Interruption**: Resume from the last checkpoint
- **Memory Overflow**: Reduce batch size, enable gradient checkpointing
- **Model Loading Failure**: Fall back to CPU, reduce precision
- **Generation Timeout**: Implement a timeout with partial results

## Performance Optimization Patterns

### Memory Management

- **Gradient Accumulation**: Simulate larger batch sizes without a memory increase
- **Mixed Precision**: float16 where supported for memory efficiency
- **Adapter Separation**: LoRA adapter weights stored apart from the base model
- **Garbage Collection**: Explicit cleanup after training steps

### Compute Optimization

- **Hardware Detection**: Automatic selection of the best available device
- **Batch Processing**: Process multiple examples efficiently
- **Caching**: Tokenized datasets cached for repeated training runs
- **Parallel Processing**: Multi-threading where beneficial

### User Experience Optimization

- **Lazy Loading**: Model loaded only when needed
- **Progress Indicators**: Real-time feedback during long operations
- **Parameter Validation**: Input validation before expensive operations
- **Responsive Interface**: Non-blocking UI during generation

## Scalability Considerations

### Current Limitations

- **Single Model**: Only one fine-tuned model at a time
- **Local Deployment**: No distributed inference capability
- **Memory Bound**: Limited by single-machine memory
- **Sequential Processing**: One generation request at a time

### Future Scalability Patterns

- **Model Versioning**: Support multiple LoRA
  adapters
- **Distributed Inference**: Model serving across multiple devices
- **Batch Generation**: Process multiple requests simultaneously
- **Cloud Deployment**: Container-based scaling patterns

## Security and Ethics Patterns

### Data Handling

- **Public Data Only**: Scrape only publicly available articles
- **Rate Limiting**: Respectful scraping with delays
- **Attribution**: Clear marking of AI-generated content
- **Privacy**: No personal data collection or storage

### Model Safety

- **Content Filtering**: Basic checks on generated content
- **Human Review**: Emphasis on the human oversight requirement
- **Educational Use**: Clear guidelines for appropriate use
- **Transparency**: Open documentation of the training process
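The rate-limiting discipline for respectful scraping can be sketched as a small helper. This is a minimal illustration only; the class name and delay value are assumptions, not taken from scraper.py:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval_s=2.0):
        self.min_interval_s = min_interval_s
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self):
        # Sleep just long enough that requests are at least
        # min_interval_s apart, then record the new timestamp.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval_s:
            time.sleep(self.min_interval_s - elapsed)
        self._last = time.monotonic()

# Demo with a short interval: three calls, so two enforced delays
limiter = RateLimiter(min_interval_s=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()
elapsed = time.monotonic() - start
```

Calling `limiter.wait()` before each HTTP request guarantees the scraper never hits the target site faster than the configured interval, regardless of how fast parsing completes.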