# Technical Context: Morris Bot

## Technology Stack

### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via the PEFT library (see the sketch after this list)
- **Framework**: PyTorch with the Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency
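A minimal sketch of how these pieces fit together with PEFT; the LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`) are illustrative placeholders, not the project's actual values:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model in float16 for memory efficiency.
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)

# Illustrative LoRA settings; the project's real values live in src/finetune.py.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```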
### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (the project layout follows GitHub conventions)
### Key Dependencies

```text
# Core ML Stack
torch>=2.0.0            # PyTorch framework
transformers>=4.35.0    # HuggingFace transformers
peft>=0.6.0             # Parameter-efficient fine-tuning
datasets>=2.14.0        # Dataset handling
accelerate>=0.24.0      # Training acceleration

# Web Interface
gradio>=4.0.0           # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0  # Web scraping
requests>=2.31.0        # HTTP requests
pandas>=2.0.0           # Data manipulation
numpy>=1.24.0           # Numerical computing

# Utilities
tqdm>=4.65.0            # Progress bars
# logging and json are Python standard-library modules (no install required)
```
## Development Setup

### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: stable internet connection for model downloads

### Installation Process

```bash
# Environment setup
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py
```
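A rough sketch of the kind of checks `test_setup.py` runs (assumed behavior, not the actual script): the core libraries import cleanly and a hardware backend is visible.

```python
# Sanity checks: core libraries import and an accelerator is detected.
import torch
import transformers
import peft

print(f"torch {torch.__version__}, transformers {transformers.__version__}, peft {peft.__version__}")
print(f"MPS available:  {torch.backends.mps.is_available()}")
print(f"CUDA available: {torch.cuda.is_available()}")
```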
### Hardware Detection Logic

```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # bitsandbytes quantization is not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    # Illustrative 4-bit settings; the exact config lives in src/finetune.py
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32
    quantization_config = None
```
## Technical Constraints

### Apple Silicon Specific
- **MPS Backend**: Metal Performance Shaders for acceleration
- **Quantization**: BitsAndBytesConfig is not supported on MPS
- **DataLoader**: `num_workers=0` required for stability (see the training sketch below)
- **Memory**: unified memory architecture; efficient, but capacity is shared with the rest of the system
### Memory Management
- **Model Size**: 7B parameters ≈ 28GB in float32, ≈ 14GB in float16
- **LoRA Efficiency**: only 42.5M trainable parameters (0.58% of the total)
- **Gradient Accumulation**: simulates larger effective batches without increasing memory use (see the sketch below)
- **Batch Size**: limited to 1 on consumer hardware
### Training Constraints
- **Epochs**: enhanced model uses 4 epochs for better style learning
- **Learning Rate**: enhanced model uses 5e-5 for stable training
- **Sequence Length**: max 2048 tokens per example
- **Dataset Size**: enhanced model trained on 126 examples with topic diversity
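A minimal sketch of how these constraints map onto HuggingFace `TrainingArguments`; the epoch count, learning rate, batch size, and worker count restate the constraints above, while `output_dir` and the accumulation step count are illustrative assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/lora_adapters",  # assumed adapter path (see File Structure below)
    num_train_epochs=4,                 # enhanced model: 4 epochs
    learning_rate=5e-5,                 # enhanced model: stable learning rate
    per_device_train_batch_size=1,      # consumer-hardware limit
    gradient_accumulation_steps=8,      # illustrative: effective batch size of 8
    dataloader_num_workers=0,           # required for MPS stability
    logging_steps=10,
)
```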
## Tool Usage Patterns

### Model Training Workflow

```bash
# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py            # Collect articles
python src/preprocess.py         # Prepare training data
python src/finetune.py           # Train model
python test_finetuned_model.py   # Validate results
```
### Development Testing

```bash
# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py
```
### Enhanced Model Tools

```bash
# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py   # Uses the enhanced dataset automatically
```
### Data Management

```bash
# Check training data size
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py
```
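As a rough illustration of the kind of structural check `validate_training_examples.py` might perform (a sketch that assumes instruction-formatted examples with `instruction` and `output` keys; the real script may check more):

```python
import json

# Load the training set and flag structurally incomplete examples.
with open("data/train_dataset.json") as f:
    examples = json.load(f)

bad = [i for i, ex in enumerate(examples)
       if not ex.get("instruction") or not ex.get("output")]
print(f"{len(examples)} examples, {len(bad)} incomplete: {bad[:10]}")
```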
## File Structure and Conventions

### Project Organization

```
morris-bot/
├── src/                  # Core source code
│   ├── finetune.py       # Training logic
│   ├── preprocess.py     # Data preparation
│   ├── scraper.py        # Web scraping
│   └── utils.py          # Helper functions
├── data/                 # Training and processed data
├── models/               # Trained model storage
├── memory-bank/          # Documentation and context
└── logs/                 # Training and application logs
```
### Naming Conventions
- **Files**: snake_case (e.g., `test_finetuned_model.py`)
- **Classes**: PascalCase (e.g., `MorrisBotTrainer`)
- **Functions**: snake_case (e.g., `load_model_and_tokenizer`)
- **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`)

### Configuration Management
- **Training Config**: centralized in `src/finetune.py` (a hypothetical shape is sketched below)
- **Model Paths**: relative paths from the project root
- **Device Detection**: automatic, with fallbacks
- **Logging**: structured logging to `morris_bot.log`
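A hypothetical shape for that centralized constant; the values restate figures documented elsewhere in this file, and the exact keys in `src/finetune.py` may differ:

```python
# Hypothetical shape of the centralized training config in src/finetune.py.
TRAINING_CONFIG = {
    "base_model": "HuggingFaceH4/zephyr-7b-beta",
    "num_epochs": 4,
    "learning_rate": 5e-5,
    "max_seq_length": 2048,
    "batch_size": 1,
    "output_dir": "models/lora_adapters",
    "log_file": "morris_bot.log",
}
```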
## Performance Characteristics

### Training Performance
- **Apple M3**: ~18 minutes for 2 epochs
- **Apple M1/M2**: ~25 minutes for 2 epochs
- **NVIDIA RTX 4090**: ~10 minutes for 2 epochs
- **CPU Only**: 4-6 hours for 2 epochs

### Inference Performance
- **Apple Silicon**: 2-3 seconds per article
- **NVIDIA GPU**: 1-2 seconds per article
- **CPU**: 15-30 seconds per article

### Memory Usage
- **Training**: ~8GB RAM (with LoRA)
- **Inference**: ~6GB RAM (model loaded)
- **Storage**: ~5GB for the complete setup
## Integration Patterns

### Web Interface Integration
- **Framework**: Gradio for rapid prototyping
- **Model Loading**: lazy loading on the first generation request (see the sketch below)
- **State Management**: stateless interface; the model is cached in memory
- **Error Handling**: graceful degradation with user feedback
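A minimal sketch of the lazy-loading pattern with Gradio; `load_model` and the generation wrapper are hypothetical stand-ins, not the actual `app.py` code:

```python
import gradio as gr

_model = None  # cached across requests; loaded on the first call

def load_model():
    # Hypothetical stand-in for the real loader (base model + LoRA adapters).
    class Stub:
        def generate_text(self, prompt: str) -> str:
            return f"[generated article for: {prompt}]"
    return Stub()

def generate_article(prompt: str) -> str:
    global _model
    if _model is None:          # lazy load: the first request pays the loading cost
        _model = load_model()
    return _model.generate_text(prompt)

demo = gr.Interface(fn=generate_article, inputs="text", outputs="text")
demo.launch()
```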
### Data Pipeline Integration
- **Input**: raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset
- **Output**: instruction-formatted training examples
- **Validation**: quality checks at each stage (see the sketch below)
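A condensed sketch of that flow; the selectors, field names, and sample HTML are assumptions for illustration, while the real logic lives in `src/scraper.py` and `src/preprocess.py`:

```python
import json
from bs4 import BeautifulSoup
from datasets import Dataset

def html_to_example(html: str) -> dict:
    # Parse raw article HTML into one instruction-formatted example.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    body = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
    return {"instruction": f"Write an article titled: {title}", "output": body}

# Stand-in for scraped pages; the scraper collects these from Light Reading.
raw_html_pages = [
    "<html><head><title>Example headline</title></head>"
    "<body><p>Article body text.</p></body></html>"
]

examples = [html_to_example(h) for h in raw_html_pages]
with open("data/train_dataset.json", "w") as f:
    json.dump(examples, f)                 # JSON stage
dataset = Dataset.from_list(examples)      # HuggingFace Dataset stage
```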
### Model Serving Integration
- **Loading**: base model + LoRA adapters (see the sketch below)
- **Tokenization**: automatic tokenizer selection
- **Generation**: configurable sampling parameters
- **Post-processing**: text cleaning and formatting
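A sketch of serving the fine-tuned model by applying the LoRA adapters to the base model with PEFT; the adapter path matches the debug commands below, and the prompt and sampling parameters are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "models/lora_adapters")  # apply LoRA weights
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

inputs = tokenizer("Write a lede about a telecom earnings call:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```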
## Development Tools and Debugging

### Logging Configuration

```python
# Structured logging setup: write to morris_bot.log and mirror to the console
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```
### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training

### Common Debug Commands

```bash
# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log
```
## Deployment Considerations

### Local Deployment
- **Requirements**: Python environment with dependencies installed
- **Model Storage**: local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860 (see the launch sketch below)
- **Scaling**: single user, single model instance
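For reference, pinning a Gradio app to the documented host and port looks like this (a sketch; the placeholder interface stands in for whatever `app.py` builds):

```python
import gradio as gr

# Placeholder interface; app.py builds the real one.
demo = gr.Interface(fn=lambda s: s, inputs="text", outputs="text")

# Serve only on the local machine at the documented port.
demo.launch(server_name="127.0.0.1", server_port=7860)
```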
### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: dedicated inference servers
- **Load Balancing**: multiple model instances
- **Monitoring**: performance and usage metrics

### Security Considerations
- **Model Access**: local filesystem only
- **Web Interface**: local network access by default
- **Data Privacy**: no user data persistence
- **Content Safety**: basic output validation is recommended