# Technical Context: Morris Bot

## Technology Stack

### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via the PEFT library
- **Framework**: PyTorch with the Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency

### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (project structure suggests GitHub)

### Key Dependencies
```text
# Core ML Stack
torch>=2.0.0            # PyTorch framework
transformers>=4.35.0    # HuggingFace transformers
peft>=0.6.0             # Parameter-efficient fine-tuning
datasets>=2.14.0        # Dataset handling
accelerate>=0.24.0      # Training acceleration

# Web Interface
gradio>=4.0.0           # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0  # Web scraping
requests>=2.31.0        # HTTP requests
pandas>=2.0.0           # Data manipulation
numpy>=1.24.0           # Numerical computing

# Utilities
tqdm>=4.65.0            # Progress bars
# logging and json are part of the Python standard library
# and do not need to be listed in requirements.txt
```

## Development Setup

### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: Stable internet connection for model downloads

### Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py
```

### Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"               # Apple Silicon
    dtype = torch.float16
    quantization_config = None   # BitsAndBytes is not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"              # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)  # quantization settings elided
else:
    device = "cpu"               # CPU fallback
    dtype = torch.float32
    quantization_config = None   # no quantization on CPU
```
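Once a device is selected, `src/finetune.py` loads the base model and attaches LoRA adapters. The sketch below is illustrative rather than the project's exact code: the hyperparameters (`r`, `lora_alpha`, `target_modules`) are assumed placeholder values, while the real settings live in `TRAINING_CONFIG` inside `src/finetune.py`.

```python
# Illustrative LoRA setup; hyperparameter values are assumptions,
# not the values hard-coded in src/finetune.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # float16 for memory efficiency
)
model.to("mps")  # or "cuda" / "cpu", per the detection logic above

lora_config = LoraConfig(
    r=16,             # assumed adapter rank
    lora_alpha=32,    # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Wrapping the model this way is what keeps only a small fraction of the 7B parameters trainable, as quantified under Memory Management below.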
else: device = "cpu" # CPU fallback dtype = torch.float32 ``` ## Technical Constraints ### Apple Silicon Specific - **MPS Backend**: Metal Performance Shaders for acceleration - **Quantization**: BitsAndBytesConfig not supported on MPS - **DataLoader**: num_workers=0 required for stability - **Memory**: Unified memory architecture, efficient but limited ### Memory Management - **Model Size**: 7B parameters ≈ 14GB in float32, 7GB in float16 - **LoRA Efficiency**: Only 42.5M parameters trainable (0.58% of total) - **Gradient Accumulation**: Simulate larger batches without memory increase - **Batch Size**: Limited to 1 on consumer hardware ### Training Constraints - **Epochs**: Enhanced model uses 4 epochs for better style learning - **Learning Rate**: Enhanced model uses 5e-5 for stable training - **Sequence Length**: Max 2048 tokens per example - **Dataset Size**: Enhanced model trained on 126 examples with topic diversity ## Tool Usage Patterns ### Model Training Workflow ```bash # Full pipeline python run_pipeline.py --all # Individual steps python src/scraper.py # Collect articles python src/preprocess.py # Prepare training data python src/finetune.py # Train model python test_finetuned_model.py # Validate results ``` ### Development Testing ```bash # Enhanced model testing python test_enhanced_model.py # Enhanced style testing python test_enhanced_style.py # Original model test python test_finetuned_model.py # Setup verification python test_setup.py # Web interface python app.py ``` ### Enhanced Model Tools ```bash # Update system prompts in training data python update_system_prompt.py # Add non-telecom examples to dataset python add_non_telecom_examples.py # Train enhanced model python src/finetune.py # Uses enhanced dataset automatically ``` ### Data Management ```bash # Check training data python -c "import json; print(len(json.load(open('data/train_dataset.json'))))" # Validate training examples python validate_training_examples.py # Generate additional examples python generate_training_examples.py ``` ## File Structure and Conventions ### Project Organization ``` morris-bot/ ├── src/ # Core source code │ ├── finetune.py # Training logic │ ├── preprocess.py # Data preparation │ ├── scraper.py # Web scraping │ └── utils.py # Helper functions ├── data/ # Training and processed data ├── models/ # Trained model storage ├── memory-bank/ # Documentation and context └── logs/ # Training and application logs ``` ### Naming Conventions - **Files**: snake_case (e.g., `test_finetuned_model.py`) - **Classes**: PascalCase (e.g., `MorrisBotTrainer`) - **Functions**: snake_case (e.g., `load_model_and_tokenizer`) - **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`) ### Configuration Management - **Training Config**: Centralized in `src/finetune.py` - **Model Paths**: Relative paths from project root - **Device Detection**: Automatic with fallbacks - **Logging**: Structured logging to `morris_bot.log` ## Performance Characteristics ### Training Performance - **Apple M3**: ~18 minutes for 2 epochs - **Apple M1/M2**: ~25 minutes for 2 epochs - **NVIDIA RTX 4090**: ~10 minutes for 2 epochs - **CPU Only**: 4-6 hours for 2 epochs ### Inference Performance - **Apple Silicon**: 2-3 seconds per article - **NVIDIA GPU**: 1-2 seconds per article - **CPU**: 15-30 seconds per article ### Memory Usage - **Training**: ~8GB RAM (with LoRA) - **Inference**: ~6GB RAM (model loaded) - **Storage**: ~5GB for complete setup ## Integration Patterns ### Web Interface Integration - **Framework**: Gradio for rapid 
### Data Pipeline Integration
- **Input**: Raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset
- **Output**: Instruction-formatted training examples
- **Validation**: Quality checks at each stage

### Model Serving Integration
- **Loading**: Base model + LoRA adapters
- **Tokenization**: Automatic tokenizer selection
- **Generation**: Configurable sampling parameters
- **Post-processing**: Text cleaning and formatting

## Development Tools and Debugging

### Logging Configuration
```python
# Structured logging setup
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```

### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training

### Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log
```

## Deployment Considerations

### Local Deployment
- **Requirements**: Python environment with dependencies installed
- **Model Storage**: Local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860
- **Scaling**: Single user, single model instance

### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: Dedicated inference servers
- **Load Balancing**: Multiple model instances
- **Monitoring**: Performance and usage metrics

### Security Considerations
- **Model Access**: Local filesystem only
- **Web Interface**: Local network access by default
- **Data Privacy**: No user data persistence
- **Content Safety**: Basic output validation recommended (see the sketch after this list)
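As a starting point for that recommended output validation, here is a hedged sketch; the specific checks, markers, and length limit are illustrative assumptions, not an existing part of the codebase.

```python
# Illustrative output validation; checks and limits are assumptions,
# not code that ships with the project.
def validate_output(text: str, max_chars: int = 8000) -> str:
    """Basic sanity checks before displaying generated text."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("Model returned empty output")
    # Strip any Zephyr chat-template markers that survive decoding
    for marker in ("<|system|>", "<|user|>", "<|assistant|>"):
        cleaned = cleaned.replace(marker, "")
    # Truncate runaway generations to a displayable length
    return cleaned[:max_chars].strip()
```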