# Technical Context: Morris Bot
## Technology Stack
### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via PEFT library
- **Framework**: PyTorch with Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency
### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (project structure suggests GitHub)
### Key Dependencies
```text
# Core ML Stack
torch>=2.0.0            # PyTorch framework
transformers>=4.35.0    # HuggingFace transformers
peft>=0.6.0             # Parameter-efficient fine-tuning
datasets>=2.14.0        # Dataset handling
accelerate>=0.24.0      # Training acceleration

# Web Interface
gradio>=4.0.0           # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0  # Web scraping
requests>=2.31.0        # HTTP requests
pandas>=2.0.0           # Data manipulation
numpy>=1.24.0           # Numerical computing

# Utilities
tqdm>=4.65.0            # Progress bars
# logging and json ship with the Python standard library (no pip install needed)
```
## Development Setup
### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: Stable internet for model downloads
### Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Dependencies
pip install -r requirements.txt
# Verify installation
python test_setup.py
```
### Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # bitsandbytes quantization not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)  # quantized loading on CUDA
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32
    quantization_config = None
```
## Technical Constraints
### Apple Silicon Specific
- **MPS Backend**: Metal Performance Shaders for acceleration
- **Quantization**: BitsAndBytesConfig not supported on MPS
- **DataLoader**: num_workers=0 required for stability
- **Memory**: Unified memory architecture, efficient but limited
### Memory Management
- **Model Size**: 7B parameters ≈ 28GB in float32 (4 bytes each), ≈ 14GB in float16
- **LoRA Efficiency**: Only 42.5M parameters trainable (0.58% of total; see the sketch after this list)
- **Gradient Accumulation**: Simulate larger batches without memory increase
- **Batch Size**: Limited to 1 on consumer hardware
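Trainable-parameter counts like those above are what PEFT reports via `print_trainable_parameters()`. A minimal sketch of the setup, assuming typical rank, alpha, and target-module choices (the actual values in `src/finetune.py` may differ):
```python
# Hypothetical LoRA setup; rank, alpha, and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

config = LoraConfig(
    r=16,                     # low-rank dimension (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```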
### Training Constraints
- **Epochs**: Enhanced model uses 4 epochs for better style learning
- **Learning Rate**: Enhanced model uses 5e-5 for stable training
- **Sequence Length**: Max 2048 tokens per example
- **Dataset Size**: Enhanced model trained on 126 examples with topic diversity (see the sketch after this list)
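A hedged sketch of training arguments matching these constraints; the output directory, accumulation steps, and logging cadence are illustrative assumptions, not the values in `src/finetune.py`:
```python
# Illustrative TrainingArguments; paths and step counts are assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/lora_adapters",  # assumed output path
    num_train_epochs=4,                 # enhanced model: 4 epochs
    learning_rate=5e-5,                 # enhanced model: stable learning rate
    per_device_train_batch_size=1,      # consumer-hardware limit
    gradient_accumulation_steps=8,      # simulates a larger effective batch (assumed)
    dataloader_num_workers=0,           # required for MPS stability
    logging_steps=10,                   # assumed logging cadence
)
# The 2048-token cap is enforced at tokenization time (max_length=2048), not here.
```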
## Tool Usage Patterns
### Model Training Workflow
```bash
# Full pipeline
python run_pipeline.py --all
# Individual steps
python src/scraper.py # Collect articles
python src/preprocess.py # Prepare training data
python src/finetune.py # Train model
python test_finetuned_model.py # Validate results
```
### Development Testing
```bash
# Enhanced model testing
python test_enhanced_model.py
# Enhanced style testing
python test_enhanced_style.py
# Original model test
python test_finetuned_model.py
# Setup verification
python test_setup.py
# Web interface
python app.py
```
### Enhanced Model Tools
```bash
# Update system prompts in training data
python update_system_prompt.py
# Add non-telecom examples to dataset
python add_non_telecom_examples.py
# Train enhanced model
python src/finetune.py # Uses enhanced dataset automatically
```
### Data Management
```bash
# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"
# Validate training examples
python validate_training_examples.py
# Generate additional examples
python generate_training_examples.py
```
## File Structure and Conventions
### Project Organization
```
morris-bot/
├── src/              # Core source code
│   ├── finetune.py   # Training logic
│   ├── preprocess.py # Data preparation
│   ├── scraper.py    # Web scraping
│   └── utils.py      # Helper functions
├── data/             # Training and processed data
├── models/           # Trained model storage
├── memory-bank/      # Documentation and context
└── logs/             # Training and application logs
```
### Naming Conventions
- **Files**: snake_case (e.g., `test_finetuned_model.py`)
- **Classes**: PascalCase (e.g., `MorrisBotTrainer`)
- **Functions**: snake_case (e.g., `load_model_and_tokenizer`)
- **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`)
### Configuration Management
- **Training Config**: Centralized in `src/finetune.py` (see the sketch after this list)
- **Model Paths**: Relative paths from project root
- **Device Detection**: Automatic with fallbacks
- **Logging**: Structured logging to `morris_bot.log`
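A hypothetical shape for that centralized config, illustrating the `UPPER_CASE` constant convention; every key and value here is an assumption:
```python
# Hypothetical TRAINING_CONFIG; the real dict lives in src/finetune.py.
TRAINING_CONFIG = {
    "base_model": "HuggingFaceH4/zephyr-7b-beta",
    "output_dir": "models/lora_adapters",  # relative to project root
    "num_epochs": 4,
    "learning_rate": 5e-5,
    "max_seq_length": 2048,
    "log_file": "morris_bot.log",
}
```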
## Performance Characteristics
### Training Performance
- **Apple M3**: ~18 minutes for 2 epochs
- **Apple M1/M2**: ~25 minutes for 2 epochs
- **NVIDIA RTX 4090**: ~10 minutes for 2 epochs
- **CPU Only**: 4-6 hours for 2 epochs
### Inference Performance
- **Apple Silicon**: 2-3 seconds per article
- **NVIDIA GPU**: 1-2 seconds per article
- **CPU**: 15-30 seconds per article
### Memory Usage
- **Training**: ~8GB RAM (with LoRA)
- **Inference**: ~6GB RAM (model loaded)
- **Storage**: ~5GB for complete setup
## Integration Patterns
### Web Interface Integration
- **Framework**: Gradio for rapid prototyping
- **Model Loading**: Lazy loading on first generation request (see the sketch after this list)
- **State Management**: Stateless interface, model cached in memory
- **Error Handling**: Graceful degradation with user feedback
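A minimal sketch of the lazy-loading pattern, using a plain `transformers` pipeline over the base model for brevity (the real `app.py` presumably loads the LoRA-adapted model):
```python
# Lazy-loading sketch; the actual implementation lives in app.py.
import gradio as gr
from transformers import pipeline

_generator = None  # module-level cache: model stays in memory after first load

def generate_article(prompt: str) -> str:
    global _generator
    if _generator is None:
        # First request pays the load cost; later requests reuse the cache.
        _generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
    try:
        return _generator(prompt, max_new_tokens=512)[0]["generated_text"]
    except Exception as exc:
        return f"Generation failed: {exc}"  # graceful degradation with user feedback

demo = gr.Interface(fn=generate_article, inputs="text", outputs="text")
demo.launch()  # Gradio serves on localhost:7860 by default
```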
### Data Pipeline Integration
- **Input**: Raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset (condensed in the sketch after this list)
- **Output**: Instruction-formatted training examples
- **Validation**: Quality checks at each stage
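A condensed, hedged sketch of these stages; the URL and instruction text are placeholders, and the real logic is split across `src/scraper.py` and `src/preprocess.py`:
```python
# Condensed pipeline sketch; URL and instruction text are placeholders.
import json
import requests
from bs4 import BeautifulSoup
from datasets import Dataset

# 1. Raw HTML -> plain text
html = requests.get("https://www.lightreading.com/...").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(separator="\n", strip=True)

# 2. Plain text -> instruction-formatted JSON example
example = {"instruction": "Write an article in the author's style.", "output": text}
with open("data/train_dataset.json", "w") as f:
    json.dump([example], f, indent=2)

# 3. JSON examples -> HuggingFace Dataset
dataset = Dataset.from_list(json.load(open("data/train_dataset.json")))
```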
### Model Serving Integration
- **Loading**: Base model + LoRA adapters (see the sketch after this list)
- **Tokenization**: Automatic tokenizer selection
- **Generation**: Configurable sampling parameters
- **Post-processing**: Text cleaning and formatting
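A hedged serving sketch following these patterns, assuming the adapter path `models/lora_adapters` (which matches the debug commands below) and illustrative sampling values:
```python
# Serving sketch; adapter path and sampling values are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "models/lora_adapters")  # assumed path
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

inputs = tokenizer("Write an article about 5G rollouts.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # sampling values below are illustrative, not fixed
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).strip())  # basic cleanup
```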
## Development Tools and Debugging
### Logging Configuration
```python
# Structured logging setup
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```
### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training
### Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/
# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"
# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"
# Monitor training logs
tail -f morris_bot.log
```
## Deployment Considerations
### Local Deployment
- **Requirements**: Python environment with dependencies
- **Model Storage**: Local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860
- **Scaling**: Single user, single model instance
### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: Dedicated inference servers
- **Load Balancing**: Multiple model instances
- **Monitoring**: Performance and usage metrics
### Security Considerations
- **Model Access**: Local filesystem only
- **Web Interface**: Local network access by default
- **Data Privacy**: No user data persistence
- **Content Safety**: Basic output validation recommended