# Technical Context: Morris Bot

## Technology Stack

### Core ML Technologies
- **Base Model**: HuggingFaceH4/zephyr-7b-beta (7 billion parameters)
- **Fine-tuning**: LoRA (Low-Rank Adaptation) via PEFT library (see the sketch after this list)
- **Framework**: PyTorch with Transformers library
- **Hardware Acceleration**: Apple Silicon MPS / NVIDIA CUDA
- **Precision**: float16 for memory efficiency
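
The LoRA setup named above might look like the following sketch; the rank, alpha, and target modules are illustrative assumptions, not values copied from `src/finetune.py`:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)
lora_config = LoraConfig(
    r=16,                                 # rank: hypothetical value
    lora_alpha=32,                        # scaling factor: hypothetical value
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()        # reports the small trainable fraction
```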

### Development Environment
- **Language**: Python 3.8+
- **Package Manager**: pip with requirements.txt
- **Virtual Environment**: venv (recommended)
- **IDE Support**: VSCode with Python extensions
- **Version Control**: Git (project structure suggests GitHub)

### Key Dependencies
```text
# Core ML Stack
torch>=2.0.0                    # PyTorch framework
transformers>=4.35.0            # HuggingFace transformers
peft>=0.6.0                     # Parameter-efficient fine-tuning
datasets>=2.14.0                # Dataset handling
accelerate>=0.24.0              # Training acceleration

# Web Interface
gradio>=4.0.0                   # Web UI framework

# Data Processing
beautifulsoup4>=4.12.0          # Web scraping
requests>=2.31.0                # HTTP requests
pandas>=2.0.0                   # Data manipulation
numpy>=1.24.0                   # Numerical computing

# Utilities
tqdm>=4.65.0                    # Progress bars
# logging and json are part of the Python standard library; no install needed
```

## Development Setup

### Hardware Requirements
- **Minimum**: 8GB RAM, 5GB free disk space
- **Recommended**: 16GB RAM, Apple Silicon M1/M2/M3 or NVIDIA GPU
- **Storage**: ~5GB for model files, ~1GB for training data
- **Network**: Stable internet for model downloads

### Installation Process
```bash
# Environment setup
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Dependencies
pip install -r requirements.txt

# Verify installation
python test_setup.py
```

### Hardware Detection Logic
```python
# Automatic device selection (from src/finetune.py)
import torch
from transformers import BitsAndBytesConfig

if torch.backends.mps.is_available():
    device = "mps"              # Apple Silicon
    dtype = torch.float16
    quantization_config = None  # BitsAndBytesConfig not supported on MPS
elif torch.cuda.is_available():
    device = "cuda"             # NVIDIA GPU
    dtype = torch.float16
    quantization_config = BitsAndBytesConfig(...)  # actual arguments live in src/finetune.py
else:
    device = "cpu"              # CPU fallback
    dtype = torch.float32
    quantization_config = None  # no quantization on CPU
```

## Technical Constraints

### Apple Silicon Specific
- **MPS Backend**: Metal Performance Shaders for acceleration
- **Quantization**: BitsAndBytesConfig not supported on MPS
- **DataLoader**: `num_workers=0` required for stability (reflected in the sketch below)
- **Memory**: Unified memory shared between CPU and GPU; efficient, but total capacity bounds the models that fit
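
A minimal sketch of how these constraints might translate into HuggingFace `TrainingArguments` (the parameter names are real; the accumulation value is an assumption):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/lora_adapters",
    per_device_train_batch_size=1,   # consumer-hardware memory limit
    gradient_accumulation_steps=8,   # assumed value; simulates a larger batch
    dataloader_num_workers=0,        # required for stability on MPS
)
```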

### Memory Management
- **Model Size**: 7B parameters ≈ 29GB in float32, ≈ 14.5GB in float16 (see the check below)
- **LoRA Efficiency**: Only 42.5M parameters trainable (0.58% of total)
- **Gradient Accumulation**: Simulate larger batches without memory increase
- **Batch Size**: Limited to 1 on consumer hardware
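
As a back-of-the-envelope check on those figures (4 bytes per float32 parameter, 2 per float16):

```python
params = 7.24e9  # zephyr-7b-beta (Mistral-7B base) parameter count, approximate
print(f"float32: {params * 4 / 1e9:.1f} GB")    # ~29.0 GB
print(f"float16: {params * 2 / 1e9:.1f} GB")    # ~14.5 GB
print(f"LoRA fraction: {42.5e6 / params:.2%}")  # ~0.59% trainable
```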

### Training Constraints
- **Epochs**: Enhanced model uses 4 epochs for better style learning
- **Learning Rate**: Enhanced model uses 5e-5 for stable training
- **Sequence Length**: Max 2048 tokens per example (see the tokenizer sketch below)
- **Dataset Size**: Enhanced model trained on 126 examples with topic diversity
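
The 2048-token cap would typically be enforced at tokenization time; a minimal sketch with the standard `transformers` tokenizer API (the sample text is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
encoded = tokenizer(
    "Example training text for the style model...",
    truncation=True,
    max_length=2048,  # sequence-length constraint from above
)
print(len(encoded["input_ids"]))
```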

## Tool Usage Patterns

### Model Training Workflow
```bash
# Full pipeline
python run_pipeline.py --all

# Individual steps
python src/scraper.py           # Collect articles
python src/preprocess.py        # Prepare training data
python src/finetune.py          # Train model
python test_finetuned_model.py  # Validate results
```

### Development Testing
```bash
# Enhanced model testing
python test_enhanced_model.py

# Enhanced style testing
python test_enhanced_style.py

# Original model test
python test_finetuned_model.py

# Setup verification
python test_setup.py

# Web interface
python app.py
```

### Enhanced Model Tools
```bash
# Update system prompts in training data
python update_system_prompt.py

# Add non-telecom examples to dataset
python add_non_telecom_examples.py

# Train enhanced model
python src/finetune.py  # Uses enhanced dataset automatically
```

### Data Management
```bash
# Check training data
python -c "import json; print(len(json.load(open('data/train_dataset.json'))))"

# Validate training examples
python validate_training_examples.py

# Generate additional examples
python generate_training_examples.py
```

## File Structure and Conventions

### Project Organization
```
morris-bot/
β”œβ”€β”€ src/                    # Core source code
β”‚   β”œβ”€β”€ finetune.py        # Training logic
β”‚   β”œβ”€β”€ preprocess.py      # Data preparation
β”‚   β”œβ”€β”€ scraper.py         # Web scraping
β”‚   └── utils.py           # Helper functions
β”œβ”€β”€ data/                  # Training and processed data
β”œβ”€β”€ models/                # Trained model storage
β”œβ”€β”€ memory-bank/           # Documentation and context
└── logs/                  # Training and application logs
```

### Naming Conventions
- **Files**: snake_case (e.g., `test_finetuned_model.py`)
- **Classes**: PascalCase (e.g., `MorrisBotTrainer`)
- **Functions**: snake_case (e.g., `load_model_and_tokenizer`)
- **Constants**: UPPER_CASE (e.g., `TRAINING_CONFIG`)

### Configuration Management
- **Training Config**: Centralized in `src/finetune.py` (hypothetical shape sketched below)
- **Model Paths**: Relative paths from project root
- **Device Detection**: Automatic with fallbacks
- **Logging**: Structured logging to `morris_bot.log`
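
A hypothetical shape for that centralized config; the key names are invented for illustration, while the values echo the constraints documented above:

```python
# Illustrative only; the real configuration lives in src/finetune.py.
TRAINING_CONFIG = {
    "base_model": "HuggingFaceH4/zephyr-7b-beta",
    "num_epochs": 4,                       # enhanced-model setting
    "learning_rate": 5e-5,                 # enhanced-model setting
    "max_seq_length": 2048,
    "output_dir": "models/lora_adapters",  # relative path from project root
    "log_file": "morris_bot.log",
}
```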

## Performance Characteristics

### Training Performance
- **Apple M3**: ~18 minutes for 2 epochs
- **Apple M1/M2**: ~25 minutes for 2 epochs
- **NVIDIA RTX 4090**: ~10 minutes for 2 epochs
- **CPU Only**: 4-6 hours for 2 epochs

### Inference Performance
- **Apple Silicon**: 2-3 seconds per article
- **NVIDIA GPU**: 1-2 seconds per article
- **CPU**: 15-30 seconds per article

### Memory Usage
- **Training**: ~8GB RAM (with LoRA)
- **Inference**: ~6GB RAM (model loaded)
- **Storage**: ~5GB for complete setup

## Integration Patterns

### Web Interface Integration
- **Framework**: Gradio for rapid prototyping
- **Model Loading**: Lazy loading on first generation request (sketched after this list)
- **State Management**: Stateless interface, model cached in memory
- **Error Handling**: Graceful degradation with user feedback
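
A minimal sketch of the lazy-loading pattern in Gradio; `load_morris_model` is a hypothetical stand-in for the project's actual loading code:

```python
import gradio as gr

_model = None  # cached in memory after the first load

def load_morris_model():
    # Hypothetical stub; the real loader combines the base model and
    # LoRA adapters (see Model Serving Integration below).
    return lambda prompt: f"Generated article for: {prompt}"

def generate_article(prompt: str) -> str:
    global _model
    if _model is None:              # lazy load on first generation request
        _model = load_morris_model()
    try:
        return _model(prompt)
    except Exception as exc:        # graceful degradation with user feedback
        return f"Generation failed: {exc}"

demo = gr.Interface(fn=generate_article, inputs="text", outputs="text")
demo.launch()                       # serves on localhost:7860 by default
```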

### Data Pipeline Integration
- **Input**: Raw HTML from Light Reading articles
- **Processing**: BeautifulSoup → JSON → HuggingFace Dataset (condensed in the sketch below)
- **Output**: Instruction-formatted training examples
- **Validation**: Quality checks at each stage
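
A condensed sketch of those stages; the HTML snippet and field names are illustrative, and the real logic lives in `src/scraper.py` and `src/preprocess.py`:

```python
import json
from bs4 import BeautifulSoup
from datasets import Dataset

raw_html = "<html><head><title>5G rollout</title></head><body><p>Article text.</p></body></html>"

# BeautifulSoup -> JSON: extract structured fields from the raw article HTML
soup = BeautifulSoup(raw_html, "html.parser")
record = {"title": soup.title.get_text(), "text": soup.get_text(" ", strip=True)}
print(json.dumps(record))

# JSON -> HuggingFace Dataset: wrap the records for training
dataset = Dataset.from_list([record])
```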

### Model Serving Integration
- **Loading**: Base model + LoRA adapters (see the sketch after this list)
- **Tokenization**: Automatic tokenizer selection
- **Generation**: Configurable sampling parameters
- **Post-processing**: Text cleaning and formatting
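
A minimal sketch of that serving path with the standard `transformers` and `peft` APIs; the adapter path follows the project layout above, and the sampling values are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "models/lora_adapters")  # base + LoRA
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

inputs = tokenizer("Write a short update on 5G rollouts.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # assumed value
    do_sample=True,
    temperature=0.7,      # assumed sampling parameters
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```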

## Development Tools and Debugging

### Logging Configuration
```python
# Structured logging setup
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('morris_bot.log'),
        logging.StreamHandler()
    ]
)
```

### Debug Utilities
- **Model Testing**: `test_finetuned_model.py` for quick validation
- **Setup Verification**: `test_setup.py` for environment checks
- **Training Validation**: `validate_training_examples.py` for data quality
- **Progress Tracking**: tqdm progress bars during training

### Common Debug Commands
```bash
# Check model files
ls -la models/lora_adapters/

# Verify training data
python -c "import json; data=json.load(open('data/train_dataset.json')); print(f'Examples: {len(data)}')"

# Test hardware acceleration
python -c "import torch; print(f'MPS: {torch.backends.mps.is_available()}, CUDA: {torch.cuda.is_available()}')"

# Monitor training logs
tail -f morris_bot.log
```

## Deployment Considerations

### Local Deployment
- **Requirements**: Python environment with dependencies
- **Model Storage**: Local filesystem (~5GB)
- **Interface**: Gradio web server on localhost:7860
- **Scaling**: Single user, single model instance

### Production Considerations (Future)
- **Containerization**: Docker for consistent deployment
- **Model Serving**: Dedicated inference servers
- **Load Balancing**: Multiple model instances
- **Monitoring**: Performance and usage metrics

### Security Considerations
- **Model Access**: Local filesystem only
- **Web Interface**: Local network access by default
- **Data Privacy**: No user data persistence
- **Content Safety**: Basic output validation recommended