Technical Context: Morris Bot

Technology Stack

Core ML Technologies

Base Model: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
Fine-tuning: LoRA (Low-Rank Adaptation) via PEFT library
Framework: PyTorch with Transformers library
Hardware Acceleration: Apple Silicon MPS / NVIDIA CUDA
Precision: float16 for memory efficiency

Development Environment

Language: Python 3.8+
Package Manager: pip with requirements.txt
Virtual Environment: venv (recommended)
IDE Support: VSCode with Python extensions
Version Control: Git (project structure suggests GitHub)

Key Dependencies

# Core ML Stack
torch>=2.0.0                    # PyTorch framework
transformers>=4.35.0            # HuggingFace transformers
peft>=0.6.0                     # Parameter-efficient fine-tuning
datasets>=2.14.0                # Dataset handling
accelerate>=0.24.0              # Training acceleration

# Web Interface
gradio>=4.0.0                   # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0          # Web scraping
requests>=2.31.0                # HTTP requests
pandas>=2.0.0                   # Data manipulation
numpy>=1.24.0                   # Numerical computing

# Utilities
tqdm>=4.65.0                    # Progress bars
logging                         # Built-in logging
json                           # Built-in JSON handling

Development Setup

Hardware Requirements

Minimum: 8GB RAM, 5GB free disk space
Recommended: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
Storage: ~5GB for model files, ~1GB for training data
Network: Stable internet for model downloads

Installation Process

# Environment setup
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py

Hardware Detection Logic

# Automatic device selection (from src/finetune.py)
import torch

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # Not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32

Technical Constraints

Apple Silicon Specific

MPS Backend: Metal Performance Shaders for acceleration
Quantization: BitsAndBytesConfig not supported on MPS
DataLoader: num_workers=0 required for stability
Memory: Unified memory architecture, efficient but limited

Memory Management

Model Size: 7B parameters ≈ 14GB in float32, 7GB in float16
LoRA Efficiency: Only 42.5M parameters trainable (0.58% of total)
Gradient Accumulation: Simulate larger batches without memory increase
Batch Size: Limited to 1 on consumer hardware

Training Constraints

Epochs: Enhanced model uses 4 epochs for better style learning
Learning Rate: Enhanced model uses 5e-5 for stable training
Sequence Length: Max 2048 tokens per example
Dataset Size: Enhanced model trained on 126 examples with topic diversity

Tool Usage Patterns

Model Training Workflow

# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py           # Collect articles
python src/preprocess.py        # Prepare training data
python src/finetune.py          # Train model
python test_finetuned_model.py  # Validate results

Development Testing

# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py

Enhanced Model Tools

# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py  # Uses enhanced dataset automatically

Data Management

# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py

File Structure and Conventions

Project Organization

morris-bot/
├── src/                    # Core source code
│   ├── finetune.py        # Training logic
│   ├── preprocess.py      # Data preparation
│   ├── scraper.py         # Web scraping
│   └── utils.py           # Helper functions
├── data/                  # Training and processed data
├── models/                # Trained model storage
├── memory-bank/           # Documentation and context
└── logs/                  # Training and application logs

Naming Conventions

Files: snake_case (e.g., test_finetuned_model.py)
Classes: PascalCase (e.g., MorrisBotTrainer)
Functions: snake_case (e.g., load_model_and_tokenizer)
Constants: UPPER_CASE (e.g., TRAINING_CONFIG)

Configuration Management

Training Config: Centralized in src/finetune.py
Model Paths: Relative paths from project root
Device Detection: Automatic with fallbacks
Logging: Structured logging to morris_bot.log

Performance Characteristics

Training Performance

Apple M3: ~18 minutes for 2 epochs
Apple M1/M2: ~25 minutes for 2 epochs
NVIDIA RTX 4090: ~10 minutes for 2 epochs
CPU Only: 4-6 hours for 2 epochs

Inference Performance

Apple Silicon: 2-3 seconds per article
NVIDIA GPU: 1-2 seconds per article
CPU: 15-30 seconds per article

Memory Usage

Training: ~8GB RAM (with LoRA)
Inference: ~6GB RAM (model loaded)
Storage: ~5GB for complete setup

Integration Patterns

Web Interface Integration

Framework: Gradio for rapid prototyping
Model Loading: Lazy loading on first generation request
State Management: Stateless interface, model cached in memory
Error Handling: Graceful degradation with user feedback

Data Pipeline Integration

Input: Raw HTML from Light Reading articles
Processing: BeautifulSoup → JSON → HuggingFace Dataset
Output: Instruction-formatted training examples
Validation: Quality checks at each stage

Model Serving Integration

Loading: Base model + LoRA adapters
Tokenization: Automatic tokenizer selection
Generation: Configurable sampling parameters
Post-processing: Text cleaning and formatting

Development Tools and Debugging

Logging Configuration

# Structured logging setup
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)

Debug Utilities

Model Testing: test_finetuned_model.py for quick validation
Setup Verification: test_setup.py for environment checks
Training Validation: validate_training_examples.py for data quality
Progress Tracking: tqdm progress bars during training

Common Debug Commands

# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log

Deployment Considerations

Local Deployment

Requirements: Python environment with dependencies
Model Storage: Local filesystem (~5GB)
Interface: Gradio web server on localhost:7860
Scaling: Single user, single model instance

Production Considerations (Future)

Containerization: Docker for consistent deployment
Model Serving: Dedicated inference servers
Load Balancing: Multiple model instances
Monitoring: Performance and usage metrics

Security Considerations

Model Access: Local filesystem only
Web Interface: Local network access by default
Data Privacy: No user data persistence
Content Safety: Basic output validation recommended