# Technical Context: Morris Bot
## Technology Stack
### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via PEFT library
- **Framework**: PyTorch with Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency
### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (project structure suggests GitHub)
### Key Dependencies
```text
# Core ML Stack
torch>=2.0.0            # PyTorch framework
transformers>=4.35.0    # HuggingFace transformers
peft>=0.6.0             # Parameter-efficient fine-tuning
datasets>=2.14.0        # Dataset handling
accelerate>=0.24.0      # Training acceleration

# Web Interface
gradio>=4.0.0           # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0  # Web scraping
requests>=2.31.0        # HTTP requests
pandas>=2.0.0           # Data manipulation
numpy>=1.24.0           # Numerical computing

# Utilities
tqdm>=4.65.0            # Progress bars
# logging and json ship with the Python standard library (no pip install needed)
```
## Development Setup
### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: Stable internet for model downloads
### Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Dependencies
pip install -r requirements.txt
# Verify installation
python test_setup.py
```
### Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # bitsandbytes quantization not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)  # quantized loading on CUDA
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32
    quantization_config = None
```
## Technical Constraints
### Apple Silicon Specific
- **MPS Backend**: Metal Performance Shaders for acceleration
- **Quantization**: BitsAndBytesConfig not supported on MPS
- **DataLoader**: num_workers=0 required for stability
- **Memory**: Unified memory architecture, efficient but limited
### Memory Management
- **Model Size**: 7B parameters ≈ 28GB in float32 (4 bytes each), ≈ 14GB in float16
- **LoRA Efficiency**: Only 42.5M parameters trainable (0.58% of total; see the sketch after this list)
- **Gradient Accumulation**: Simulate larger batches without memory increase
- **Batch Size**: Limited to 1 on consumer hardware
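Trainable-parameter counts like those above are what PEFT reports via `print_trainable_parameters()`. A minimal sketch of the setup, assuming typical rank, alpha, and target-module choices (the actual values in `src/finetune.py` may differ):
```python
# Hypothetical LoRA setup; rank, alpha, and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

config = LoraConfig(
    r=16,                     # low-rank dimension (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```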
### Training Constraints
- **Epochs**: Enhanced model uses 4 epochs for better style learning
- **Learning Rate**: Enhanced model uses 5e-5 for stable training
- **Sequence Length**: Max 2048 tokens per example
- **Dataset Size**: Enhanced model trained on 126 examples with topic diversity (see the sketch after this list)
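A hedged sketch of training arguments matching these constraints; the output directory, accumulation steps, and logging cadence are illustrative assumptions, not the values in `src/finetune.py`:
```python
# Illustrative TrainingArguments; paths and step counts are assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/lora_adapters",  # assumed output path
    num_train_epochs=4,                 # enhanced model: 4 epochs
    learning_rate=5e-5,                 # enhanced model: stable learning rate
    per_device_train_batch_size=1,      # consumer-hardware limit
    gradient_accumulation_steps=8,      # simulates a larger effective batch (assumed)
    dataloader_num_workers=0,           # required for MPS stability
    logging_steps=10,                   # assumed logging cadence
)
# The 2048-token cap is enforced at tokenization time (max_length=2048), not here.
```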
## Tool Usage Patterns
### Model Training Workflow
```bash
# Full pipeline
python run_pipeline.py --all
# Individual steps
python src/scraper.py # Collect articles
python src/preprocess.py # Prepare training data
python src/finetune.py # Train model
python test_finetuned_model.py # Validate results
```
### Development Testing
```bash
# Enhanced model testing
python test_enhanced_model.py
# Enhanced style testing
python test_enhanced_style.py
# Original model test
python test_finetuned_model.py
# Setup verification
python test_setup.py
# Web interface
python app.py
```
### Enhanced Model Tools
```bash
# Update system prompts in training data
python update_system_prompt.py
# Add non-telecom examples to dataset
python add_non_telecom_examples.py
# Train enhanced model
python src/finetune.py # Uses enhanced dataset automatically
```
### Data Management
```bash
# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"
# Validate training examples
python validate_training_examples.py
# Generate additional examples
python generate_training_examples.py
```
## File Structure and Conventions
### Project Organization
```
morris-bot/
├── src/              # Core source code
│   ├── finetune.py   # Training logic
│   ├── preprocess.py # Data preparation
│   ├── scraper.py    # Web scraping
│   └── utils.py      # Helper functions
├── data/             # Training and processed data
├── models/           # Trained model storage
├── memory-bank/      # Documentation and context
└── logs/             # Training and application logs
```
### Naming Conventions
- **Files**: snake_case (e.g., `test_finetuned_model.py`)
- **Classes**: PascalCase (e.g., `MorrisBotTrainer`)
- **Functions**: snake_case (e.g., `load_model_and_tokenizer`)
- **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`)
### Configuration Management
- **Training Config**: Centralized in `src/finetune.py` (see the sketch after this list)
- **Model Paths**: Relative paths from project root
- **Device Detection**: Automatic with fallbacks
- **Logging**: Structured logging to `morris_bot.log`
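A hypothetical shape for that centralized config, illustrating the `UPPER_CASE` constant convention; every key and value here is an assumption:
```python
# Hypothetical TRAINING_CONFIG; the real dict lives in src/finetune.py.
TRAINING_CONFIG = {
    "base_model": "HuggingFaceH4/zephyr-7b-beta",
    "output_dir": "models/lora_adapters",  # relative to project root
    "num_epochs": 4,
    "learning_rate": 5e-5,
    "max_seq_length": 2048,
    "log_file": "morris_bot.log",
}
```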
## Performance Characteristics
### Training Performance
- **Apple M3**: ~18 minutes for 2 epochs
- **Apple M1/M2**: ~25 minutes for 2 epochs
- **NVIDIA RTX 4090**: ~10 minutes for 2 epochs
- **CPU Only**: 4-6 hours for 2 epochs
### Inference Performance
- **Apple Silicon**: 2-3 seconds per article
- **NVIDIA GPU**: 1-2 seconds per article
- **CPU**: 15-30 seconds per article
### Memory Usage
- **Training**: ~8GB RAM (with LoRA)
- **Inference**: ~6GB RAM (model loaded)
- **Storage**: ~5GB for complete setup
## Integration Patterns
### Web Interface Integration
- **Framework**: Gradio for rapid prototyping
- **Model Loading**: Lazy loading on first generation request (see the sketch after this list)
- **State Management**: Stateless interface, model cached in memory
- **Error Handling**: Graceful degradation with user feedback
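A minimal sketch of the lazy-loading pattern, using a plain `transformers` pipeline over the base model for brevity (the real `app.py` presumably loads the LoRA-adapted model):
```python
# Lazy-loading sketch; the actual implementation lives in app.py.
import gradio as gr
from transformers import pipeline

_generator = None  # module-level cache: model stays in memory after first load

def generate_article(prompt: str) -> str:
    global _generator
    if _generator is None:
        # First request pays the load cost; later requests reuse the cache.
        _generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
    try:
        return _generator(prompt, max_new_tokens=512)[0]["generated_text"]
    except Exception as exc:
        return f"Generation failed: {exc}"  # graceful degradation with user feedback

demo = gr.Interface(fn=generate_article, inputs="text", outputs="text")
demo.launch()  # Gradio serves on localhost:7860 by default
```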
### Data Pipeline Integration
- **Input**: Raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset (condensed in the sketch after this list)
- **Output**: Instruction-formatted training examples
- **Validation**: Quality checks at each stage
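A condensed, hedged sketch of these stages; the URL and instruction text are placeholders, and the real logic is split across `src/scraper.py` and `src/preprocess.py`:
```python
# Condensed pipeline sketch; URL and instruction text are placeholders.
import json
import requests
from bs4 import BeautifulSoup
from datasets import Dataset

# 1. Raw HTML -> plain text
html = requests.get("https://www.lightreading.com/...").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(separator="\n", strip=True)

# 2. Plain text -> instruction-formatted JSON example
example = {"instruction": "Write an article in the author's style.", "output": text}
with open("data/train_dataset.json", "w") as f:
    json.dump([example], f, indent=2)

# 3. JSON examples -> HuggingFace Dataset
dataset = Dataset.from_list(json.load(open("data/train_dataset.json")))
```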
### Model Serving Integration
- **Loading**: Base model + LoRA adapters (see the sketch after this list)
- **Tokenization**: Automatic tokenizer selection
- **Generation**: Configurable sampling parameters
- **Post-processing**: Text cleaning and formatting
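A hedged serving sketch following these patterns, assuming the adapter path `models/lora_adapters` (which matches the debug commands below) and illustrative sampling values:
```python
# Serving sketch; adapter path and sampling values are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "models/lora_adapters")  # assumed path
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

inputs = tokenizer("Write an article about 5G rollouts.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # sampling values below are illustrative, not fixed
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).strip())  # basic cleanup
```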
## Development Tools and Debugging
### Logging Configuration
```python
# Structured logging setup
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```
### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training
### Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/
# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"
# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"
# Monitor training logs
tail -f morris_bot.log
```
## Deployment Considerations
### Local Deployment
- **Requirements**: Python environment with dependencies
- **Model Storage**: Local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860
- **Scaling**: Single user, single model instance
### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: Dedicated inference servers
- **Load Balancing**: Multiple model instances
- **Monitoring**: Performance and usage metrics
### Security Considerations
- **Model Access**: Local filesystem only
- **Web Interface**: Local network access by default
- **Data Privacy**: No user data persistence
- **Content Safety**: Basic output validation recommended