morris-bot / memory-bank /techContext.md
eusholli's picture
Upload folder using huggingface_hub
599c2c0 verified

A newer version of the Gradio SDK is available: 5.41.1

Upgrade

Technical Context: Morris Bot

Technology Stack

Core ML Technologies

  • Base Model: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
  • Fine-tuning: LoRA (Low-Rank Adaptation) via PEFT library
  • Framework: PyTorch with Transformers library
  • Hardware Acceleration: Apple Silicon MPS / NVIDIA CUDA
  • Precision: float16 for memory efficiency

Development Environment

  • Language: Python 3.8+
  • Package Manager: pip with requirements.txt
  • Virtual Environment: venv (recommended)
  • IDE Support: VSCode with Python extensions
  • Version Control: Git (project structure suggests GitHub)

Key Dependencies

# Core ML Stack
torch>=2.0.0                    # PyTorch framework
transformers>=4.35.0            # HuggingFace transformers
peft>=0.6.0                     # Parameter-efficient fine-tuning
datasets>=2.14.0                # Dataset handling
accelerate>=0.24.0              # Training acceleration

# Web Interface
gradio>=4.0.0                   # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0          # Web scraping
requests>=2.31.0                # HTTP requests
pandas>=2.0.0                   # Data manipulation
numpy>=1.24.0                   # Numerical computing

# Utilities
tqdm>=4.65.0                    # Progress bars
logging                         # Built-in logging
json                           # Built-in JSON handling

Development Setup

Hardware Requirements

  • Minimum: 8GB RAM, 5GB free disk space
  • Recommended: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
  • Storage: ~5GB for model files, ~1GB for training data
  • Network: Stable internet for model downloads

Installation Process

# Environment setup
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py

Hardware Detection Logic

# Automatic device selection (from src/finetune.py)
import torch

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # Not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32

Technical Constraints

Apple Silicon Specific

  • MPS Backend: Metal Performance Shaders for acceleration
  • Quantization: BitsAndBytesConfig not supported on MPS
  • DataLoader: num_workers=0 required for stability
  • Memory: Unified memory architecture, efficient but limited

Memory Management

  • Model Size: 7B parameters β‰ˆ 14GB in float32, 7GB in float16
  • LoRA Efficiency: Only 42.5M parameters trainable (0.58% of total)
  • Gradient Accumulation: Simulate larger batches without memory increase
  • Batch Size: Limited to 1 on consumer hardware

Training Constraints

  • Epochs: Enhanced model uses 4 epochs for better style learning
  • Learning Rate: Enhanced model uses 5e-5 for stable training
  • Sequence Length: Max 2048 tokens per example
  • Dataset Size: Enhanced model trained on 126 examples with topic diversity

Tool Usage Patterns

Model Training Workflow

# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py           # Collect articles
python src/preprocess.py        # Prepare training data
python src/finetune.py          # Train model
python test_finetuned_model.py  # Validate results

Development Testing

# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py

Enhanced Model Tools

# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py  # Uses enhanced dataset automatically

Data Management

# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py

File Structure and Conventions

Project Organization

morris-bot/
β”œβ”€β”€ src/                    # Core source code
β”‚   β”œβ”€β”€ finetune.py        # Training logic
β”‚   β”œβ”€β”€ preprocess.py      # Data preparation
β”‚   β”œβ”€β”€ scraper.py         # Web scraping
β”‚   └── utils.py           # Helper functions
β”œβ”€β”€ data/                  # Training and processed data
β”œβ”€β”€ models/                # Trained model storage
β”œβ”€β”€ memory-bank/           # Documentation and context
└── logs/                  # Training and application logs

Naming Conventions

  • Files: snake_case (e.g., test_finetuned_model.py)
  • Classes: PascalCase (e.g., MorrisBotTrainer)
  • Functions: snake_case (e.g., load_model_and_tokenizer)
  • Constants: UPPER_CASE (e.g., TRAINING_CONFIG)

Configuration Management

  • Training Config: Centralized in src/finetune.py
  • Model Paths: Relative paths from project root
  • Device Detection: Automatic with fallbacks
  • Logging: Structured logging to morris_bot.log

Performance Characteristics

Training Performance

  • Apple M3: ~18 minutes for 2 epochs
  • Apple M1/M2: ~25 minutes for 2 epochs
  • NVIDIA RTX 4090: ~10 minutes for 2 epochs
  • CPU Only: 4-6 hours for 2 epochs

Inference Performance

  • Apple Silicon: 2-3 seconds per article
  • NVIDIA GPU: 1-2 seconds per article
  • CPU: 15-30 seconds per article

Memory Usage

  • Training: ~8GB RAM (with LoRA)
  • Inference: ~6GB RAM (model loaded)
  • Storage: ~5GB for complete setup

Integration Patterns

Web Interface Integration

  • Framework: Gradio for rapid prototyping
  • Model Loading: Lazy loading on first generation request
  • State Management: Stateless interface, model cached in memory
  • Error Handling: Graceful degradation with user feedback

Data Pipeline Integration

  • Input: Raw HTML from Light Reading articles
  • Processing: BeautifulSoup β†’ JSON β†’ HuggingFace Dataset
  • Output: Instruction-formatted training examples
  • Validation: Quality checks at each stage

Model Serving Integration

  • Loading: Base model + LoRA adapters
  • Tokenization: Automatic tokenizer selection
  • Generation: Configurable sampling parameters
  • Post-processing: Text cleaning and formatting

Development Tools and Debugging

Logging Configuration

# Structured logging setup
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)

Debug Utilities

  • Model Testing: test_finetuned_model.py for quick validation
  • Setup Verification: test_setup.py for environment checks
  • Training Validation: validate_training_examples.py for data quality
  • Progress Tracking: tqdm progress bars during training

Common Debug Commands

# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log

Deployment Considerations

Local Deployment

  • Requirements: Python environment with dependencies
  • Model Storage: Local filesystem (~5GB)
  • Interface: Gradio web server on localhost:7860
  • Scaling: Single user, single model instance

Production Considerations (Future)

  • Containerization: Docker for consistent deployment
  • Model Serving: Dedicated inference servers
  • Load Balancing: Multiple model instances
  • Monitoring: Performance and usage metrics

Security Considerations

  • Model Access: Local filesystem only
  • Web Interface: Local network access by default
  • Data Privacy: No user data persistence
  • Content Safety: Basic output validation recommended