Technical Context: Morris Bot
Technology Stack
Core ML Technologies
- Base Model: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- Fine-tuning: LoRA (Low-Rank Adaptation) via PEFT library
- Framework: PyTorch with Transformers library
- Hardware Acceleration: Apple Silicon MPS / NVIDIA CUDA
- Precision: float16 for memory efficiency
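For orientation, here is a minimal sketch of how this stack fits together: the base model is loaded in float16 and the LoRA adapters are attached via PEFT. The adapter path follows the project layout described later and is an assumption.

```python
# Minimal sketch: base model in float16 with LoRA adapters attached via PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, "models/lora_adapters")  # adapter path is an assumption
```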
Development Environment
- Language: Python 3.8+
- Package Manager: pip with requirements.txt
- Virtual Environment: venv (recommended)
- IDE Support: VSCode with Python extensions
- Version Control: Git (project structure suggests GitHub)
Key Dependencies
```
# Core ML Stack
torch>=2.0.0             # PyTorch framework
transformers>=4.35.0     # HuggingFace Transformers
peft>=0.6.0              # Parameter-efficient fine-tuning
datasets>=2.14.0         # Dataset handling
accelerate>=0.24.0       # Training acceleration

# Web Interface
gradio>=4.0.0            # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0   # Web scraping
requests>=2.31.0         # HTTP requests
pandas>=2.0.0            # Data manipulation
numpy>=1.24.0            # Numerical computing

# Utilities
tqdm>=4.65.0             # Progress bars
# logging and json come from the Python standard library (no install needed)
```
Development Setup
Hardware Requirements
- Minimum: 8GB RAM, 5GB free disk space
- Recommended: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- Storage: ~5GB for model files, ~1GB for training data
- Network: Stable internet for model downloads
Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py
```
Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"                 # Apple Silicon
    dtype = torch.float16
    quantization_config = None     # BitsAndBytes not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"                # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)  # config details elided in source
else:
    device = "cpu"                 # CPU fallback
    dtype = torch.float32
```
Technical Constraints
Apple Silicon Specific
- MPS Backend: Metal Performance Shaders for acceleration
- Quantization: BitsAndBytesConfig not supported on MPS
- DataLoader: num_workers=0 required for stability
- Memory: Unified memory shared between CPU and GPU; efficient, but capped by total system RAM
Memory Management
- Model Size: 7B parameters ≈ 14GB in float32, ≈ 7GB in float16
- LoRA Efficiency: Only 42.5M parameters trainable (0.58% of total)
- Gradient Accumulation: Simulate larger batches without memory increase
- Batch Size: Limited to 1 on consumer hardware
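To illustrate how LoRA keeps the trainable share this small, the sketch below builds a PEFT model and prints its trainable-parameter counts. The rank, alpha, and target modules are assumptions, not the project's actual values.

```python
# Sketch of a LoRA configuration (r, alpha, and target modules are assumed values).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# `model` is the base model loaded as in the earlier sketch.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```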
Training Constraints
- Epochs: Enhanced model uses 4 epochs for better style learning
- Learning Rate: Enhanced model uses 5e-5 for stable training
- Sequence Length: Max 2048 tokens per example
- Dataset Size: Enhanced model trained on 126 examples with topic diversity
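Translated into Hugging Face TrainingArguments, these constraints would look roughly like the following; the gradient-accumulation value is an assumption.

```python
# Sketch: TrainingArguments reflecting the constraints above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/lora_adapters",
    num_train_epochs=4,                # enhanced model
    learning_rate=5e-5,                # enhanced model
    per_device_train_batch_size=1,     # consumer-hardware limit
    gradient_accumulation_steps=8,     # assumed value; simulates a larger effective batch
    dataloader_num_workers=0,          # required for MPS stability
    logging_steps=10,
)
```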
Tool Usage Patterns
Model Training Workflow
```bash
# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py             # Collect articles
python src/preprocess.py          # Prepare training data
python src/finetune.py            # Train model
python test_finetuned_model.py    # Validate results
```
Development Testing
```bash
# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py
```
Enhanced Model Tools
```bash
# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py   # Uses enhanced dataset automatically
```
Data Management
```bash
# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py
```
File Structure and Conventions
Project Organization
```
morris-bot/
├── src/              # Core source code
│   ├── finetune.py   # Training logic
│   ├── preprocess.py # Data preparation
│   ├── scraper.py    # Web scraping
│   └── utils.py      # Helper functions
├── data/             # Training and processed data
├── models/           # Trained model storage
├── memory-bank/      # Documentation and context
└── logs/             # Training and application logs
```
Naming Conventions
- Files: snake_case (e.g., `test_finetuned_model.py`)
- Classes: PascalCase (e.g., `MorrisBotTrainer`)
- Functions: snake_case (e.g., `load_model_and_tokenizer`)
- Constants: UPPER_CASE (e.g., `TRAINING_CONFIG`)
Configuration Management
- Training Config: Centralized in `src/finetune.py`
- Model Paths: Relative paths from project root
- Device Detection: Automatic with fallbacks
- Logging: Structured logging to `morris_bot.log`
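The shape of such a centralized config might look like the following; the keys and values are illustrative, not the actual contents of `src/finetune.py`.

```python
# Hypothetical shape of the centralized training config in src/finetune.py.
TRAINING_CONFIG = {
    "base_model": "HuggingFaceH4/zephyr-7b-beta",
    "adapter_dir": "models/lora_adapters",   # relative path from project root
    "num_epochs": 4,
    "learning_rate": 5e-5,
    "max_seq_length": 2048,
    "log_file": "morris_bot.log",
}
```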
Performance Characteristics
Training Performance
- Apple M3: ~18 minutes for 2 epochs
- Apple M1/M2: ~25 minutes for 2 epochs
- NVIDIA RTX 4090: ~10 minutes for 2 epochs
- CPU Only: 4-6 hours for 2 epochs
Inference Performance
- Apple Silicon: 2-3 seconds per article
- NVIDIA GPU: 1-2 seconds per article
- CPU: 15-30 seconds per article
Memory Usage
- Training: ~8GB RAM (with LoRA)
- Inference: ~6GB RAM (model loaded)
- Storage: ~5GB for complete setup
Integration Patterns
Web Interface Integration
- Framework: Gradio for rapid prototyping
- Model Loading: Lazy loading on first generation request
- State Management: Stateless interface, model cached in memory
- Error Handling: Graceful degradation with user feedback
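A minimal sketch of the lazy-loading pattern described above; `load_model` and `generate_text` are illustrative names, not the app's actual functions.

```python
# Sketch: Gradio interface with lazy model loading and graceful error handling.
import gradio as gr

_model = None  # cached in process memory after the first request

def generate_article(prompt):
    global _model
    if _model is None:           # lazy load on the first generation request
        _model = load_model()    # hypothetical loader: base model + LoRA adapters
    try:
        return _model.generate_text(prompt)   # hypothetical generation method
    except Exception as exc:     # graceful degradation with user feedback
        return f"Generation failed: {exc}"

demo = gr.Interface(fn=generate_article, inputs="text", outputs="text")
demo.launch()   # serves on localhost:7860 by default
```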
Data Pipeline Integration
- Input: Raw HTML from Light Reading articles
- Processing: BeautifulSoup → JSON → HuggingFace Dataset
- Output: Instruction-formatted training examples
- Validation: Quality checks at each stage
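A hedged sketch of the HTML-to-example stage; the selectors and field names are assumptions about Light Reading's markup and the project's schema.

```python
# Sketch: turn raw article HTML into an instruction-formatted training example.
from bs4 import BeautifulSoup

def html_to_example(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1")                                   # assumed selector
    title = title_tag.get_text(strip=True) if title_tag else ""
    body = "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))
    return {
        "instruction": f"Write a Light Reading-style article titled: {title}",
        "output": body,
    }
```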
Model Serving Integration
- Loading: Base model + LoRA adapters
- Tokenization: Automatic tokenizer selection
- Generation: Configurable sampling parameters
- Post-processing: Text cleaning and formatting
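Generation itself might look like the following; the sampling values are assumptions, and `model`/`tokenizer` are the objects loaded in the earlier sketch.

```python
# Sketch: configurable sampling followed by simple post-processing.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,   # assumed sampling parameters
    top_p=0.9,
)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```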
Development Tools and Debugging
Logging Configuration
```python
# Structured logging setup
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```
Debug Utilities
- Model Testing: `test_finetuned_model.py` for quick validation
- Setup Verification: `test_setup.py` for environment checks
- Training Validation: `validate_training_examples.py` for data quality
- Progress Tracking: tqdm progress bars during training
Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log
```
Deployment Considerations
Local Deployment
- Requirements: Python environment with dependencies
- Model Storage: Local filesystem (~5GB)
- Interface: Gradio web server on localhost:7860
- Scaling: Single user, single model instance
Production Considerations (Future)
- Containerization: Docker for consistent deployment
- Model Serving: Dedicated inference servers
- Load Balancing: Multiple model instances
- Monitoring: Performance and usage metrics
Security Considerations
- Model Access: Local filesystem only
- Web Interface: Local network access by default
- Data Privacy: No user data persistence
- Content Safety: Basic output validation recommended
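As a starting point for the recommended output validation, a minimal sketch; the length cap and blocklist are illustrative.

```python
# Minimal sketch of basic output validation (cap and blocklist are illustrative).
BLOCKLIST = {"confidential", "internal use only"}

def validate_output(text: str, max_chars: int = 8000) -> str:
    text = text[:max_chars]                  # cap runaway generations
    if any(term in text.lower() for term in BLOCKLIST):
        raise ValueError("Generated text failed the content check")
    return text
```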