# Technical Context: Morris Bot

## Technology Stack

### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via PEFT library
- **Framework**: PyTorch with Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency

### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (project structure suggests GitHub)

### Key Dependencies
```python
# Core ML Stack
torch>=2.0.0                    # PyTorch framework
transformers>=4.35.0            # HuggingFace transformers
peft>=0.6.0                     # Parameter-efficient fine-tuning
datasets>=2.14.0                # Dataset handling
accelerate>=0.24.0              # Training acceleration

# Web Interface
gradio>=4.0.0                   # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0          # Web scraping
requests>=2.31.0                # HTTP requests
pandas>=2.0.0                   # Data manipulation
numpy>=1.24.0                   # Numerical computing

# Utilities
tqdm>=4.65.0                    # Progress bars
logging                         # Built-in logging
json                           # Built-in JSON handling
```

## Development Setup

### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: Stable internet for model downloads

### Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py
```

### Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # Not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32
```

## Technical Constraints

### Apple Silicon Specific
- **MPS Backend**: Metal Performance Shaders for acceleration
- **Quantization**: BitsAndBytesConfig not supported on MPS
- **DataLoader**: num_workers=0 required for stability
- **Memory**: Unified memory architecture, efficient but limited

### Memory Management
- **Model Size**: 7B parameters ≈ 14GB in float32, 7GB in float16
- **LoRA Efficiency**: Only 42.5M parameters trainable (0.58% of total)
- **Gradient Accumulation**: Simulate larger batches without memory increase
- **Batch Size**: Limited to 1 on consumer hardware

### Training Constraints
- **Epochs**: Enhanced model uses 4 epochs for better style learning
- **Learning Rate**: Enhanced model uses 5e-5 for stable training
- **Sequence Length**: Max 2048 tokens per example
- **Dataset Size**: Enhanced model trained on 126 examples with topic diversity

## Tool Usage Patterns

### Model Training Workflow
```bash
# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py           # Collect articles
python src/preprocess.py        # Prepare training data
python src/finetune.py          # Train model
python test_finetuned_model.py  # Validate results
```

### Development Testing
```bash
# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py
```

### Enhanced Model Tools
```bash
# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py  # Uses enhanced dataset automatically
```

### Data Management
```bash
# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py
```

## File Structure and Conventions

### Project Organization
```
morris-bot/
├── src/                    # Core source code
│   ├── finetune.py        # Training logic
│   ├── preprocess.py      # Data preparation
│   ├── scraper.py         # Web scraping
│   └── utils.py           # Helper functions
├── data/                  # Training and processed data
├── models/                # Trained model storage
├── memory-bank/           # Documentation and context
└── logs/                  # Training and application logs
```

### Naming Conventions
- **Files**: snake_case (e.g., `test_finetuned_model.py`)
- **Classes**: PascalCase (e.g., `MorrisBotTrainer`)
- **Functions**: snake_case (e.g., `load_model_and_tokenizer`)
- **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`)

### Configuration Management
- **Training Config**: Centralized in `src/finetune.py`
- **Model Paths**: Relative paths from project root
- **Device Detection**: Automatic with fallbacks
- **Logging**: Structured logging to `morris_bot.log`

## Performance Characteristics

### Training Performance
- **Apple M3**: ~18 minutes for 2 epochs
- **Apple M1/M2**: ~25 minutes for 2 epochs
- **NVIDIA RTX 4090**: ~10 minutes for 2 epochs
- **CPU Only**: 4-6 hours for 2 epochs

### Inference Performance
- **Apple Silicon**: 2-3 seconds per article
- **NVIDIA GPU**: 1-2 seconds per article
- **CPU**: 15-30 seconds per article

### Memory Usage
- **Training**: ~8GB RAM (with LoRA)
- **Inference**: ~6GB RAM (model loaded)
- **Storage**: ~5GB for complete setup

## Integration Patterns

### Web Interface Integration
- **Framework**: Gradio for rapid prototyping
- **Model Loading**: Lazy loading on first generation request
- **State Management**: Stateless interface, model cached in memory
- **Error Handling**: Graceful degradation with user feedback

### Data Pipeline Integration
- **Input**: Raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset
- **Output**: Instruction-formatted training examples
- **Validation**: Quality checks at each stage

### Model Serving Integration
- **Loading**: Base model + LoRA adapters
- **Tokenization**: Automatic tokenizer selection
- **Generation**: Configurable sampling parameters
- **Post-processing**: Text cleaning and formatting

## Development Tools and Debugging

### Logging Configuration
```python
# Structured logging setup
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```

### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training

### Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log
```

## Deployment Considerations

### Local Deployment
- **Requirements**: Python environment with dependencies
- **Model Storage**: Local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860
- **Scaling**: Single user, single model instance

### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: Dedicated inference servers
- **Load Balancing**: Multiple model instances
- **Monitoring**: Performance and usage metrics

### Security Considerations
- **Model Access**: Local filesystem only
- **Web Interface**: Local network access by default
- **Data Privacy**: No user data persistence
- **Content Safety**: Basic output validation recommended