# Deployment Pipeline
```mermaid
graph TB
%% Input Sources
subgraph "Inputs"
TRAINED_MODEL[Trained Model<br/>Local directory]
TRAINING_CONFIG[Training Config<br/>JSON/YAML]
TRAINING_RESULTS[Training Results<br/>Metrics & logs]
MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
end
%% Model Publishing
subgraph "Model Publishing"
PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]
subgraph "Publishing Steps"
REPO_CREATION[Repository Creation<br/>HF Hub API]
FILE_UPLOAD[File Upload<br/>Model files to HF]
METADATA_UPLOAD[Metadata Upload<br/>Config & results]
end
end
%% Model Card Generation
subgraph "Model Card Generation"
CARD_SCRIPT[generate_model_card.py<br/>Card Generator]
subgraph "Card Components"
TEMPLATE_LOAD[Template Loading<br/>model_card.md]
VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
end
end
%% Demo Space Deployment
subgraph "Demo Space Deployment"
DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]
subgraph "Space Setup"
SPACE_CREATION[Space Repository<br/>Create HF Space]
TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
ENV_INJECTION[Environment Setup<br/>Model config injection]
SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
end
end
%% Space Building & Testing
subgraph "Space Building"
BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
end
%% Live Demo
subgraph "Live Demo Space"
GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
USER_INTERACTION[User Interaction<br/>Audio upload/playback]
end
%% External Services
subgraph "External Services"
HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
HF_SPACES[HF Spaces Platform<br/>Demo hosting]
end
%% Flow Connections
TRAINED_MODEL --> PUSH_SCRIPT
TRAINING_CONFIG --> PUSH_SCRIPT
TRAINING_RESULTS --> PUSH_SCRIPT
MODEL_METADATA --> PUSH_SCRIPT
PUSH_SCRIPT --> REPO_CREATION
REPO_CREATION --> FILE_UPLOAD
FILE_UPLOAD --> METADATA_UPLOAD
METADATA_UPLOAD --> CARD_SCRIPT
TRAINING_CONFIG --> CARD_SCRIPT
TRAINING_RESULTS --> CARD_SCRIPT
CARD_SCRIPT --> TEMPLATE_LOAD
TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING
CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
METADATA_UPLOAD --> DEPLOY_SCRIPT
DEPLOY_SCRIPT --> SPACE_CREATION
SPACE_CREATION --> TEMPLATE_COPY
TEMPLATE_COPY --> ENV_INJECTION
ENV_INJECTION --> SECRET_SETUP
SECRET_SETUP --> BUILD_TRIGGER
BUILD_TRIGGER --> DEPENDENCY_INSTALL
DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
MODEL_DOWNLOAD --> APP_INITIALIZATION
APP_INITIALIZATION --> GRADIO_INTERFACE
GRADIO_INTERFACE --> MODEL_INFERENCE
MODEL_INFERENCE --> USER_INTERACTION
HF_HUB --> MODEL_DOWNLOAD
HF_SPACES --> GRADIO_INTERFACE
%% Styling
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px
class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
class HF_HUB,HF_SPACES external
```
## Deployment Pipeline Overview
This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.
### Input Sources
#### Trained Model Artifacts
- **Model Files**: `model.safetensors`, `config.json`, `tokenizer.json`
- **Training Config**: Hyperparameters and training setup
- **Training Results**: Metrics, loss curves, evaluation results
- **Model Metadata**: Name, description, base model information
### Model Publishing Phase
#### push_to_huggingface.py Script
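```python
# Assumes HuggingFacePusher is importable from push_to_huggingface.py;
# adjust the import path to match your checkout.
from push_to_huggingface import HuggingFacePusher

# Initialize the publisher
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token,
)

# Push the model together with its training config and results
success = pusher.push_model(training_config, results)
```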
#### Publishing Steps
1. **Repository Creation**: Create HF Hub repository
2. **File Upload**: Upload all model files
3. **Metadata Upload**: Upload training config and results
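The three publishing steps can be sketched with the `huggingface_hub` client. This is a minimal illustration, not the code in `push_to_huggingface.py`: the function name `publish_model` and the file names `training_config.json` and `training_results.json` are assumptions. The `api` argument is expected to behave like `huggingface_hub.HfApi`, whose `create_repo`, `upload_folder`, and `upload_file` methods are used here.

```python
import json
from pathlib import Path


def publish_model(api, model_dir, repo_id, training_config, results):
    """Run the three publishing steps against an HfApi-like client."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model directory not found: {model_dir}")

    # 1. Repository creation (exist_ok makes the step safe to re-run)
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # 2. File upload: send the whole model directory in one commit
    api.upload_folder(folder_path=str(model_dir), repo_id=repo_id, repo_type="model")

    # 3. Metadata upload: serialize config and results as JSON files
    #    (file names are hypothetical, chosen for this sketch)
    for filename, payload in (
        ("training_config.json", training_config),
        ("training_results.json", results),
    ):
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=filename,
            repo_id=repo_id,
            repo_type="model",
        )
    return f"https://huggingface.co/{repo_id}"
```

In real use, `api` would be `HfApi(token=hf_token)` from `huggingface_hub`. Passing the client in as an argument keeps the function easy to exercise with a stub when no network access is available.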
### Model Card Generation
#### generate_model_card.py Script
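```python
# Assumes ModelCardGenerator is importable from generate_model_card.py;
# adjust the import path to match your checkout.
from generate_model_card import ModelCardGenerator

# Create the generator
generator = ModelCardGenerator()

# Values injected into the template
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}

# Render the model card
content = generator.generate_model_card(variables)
```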
#### Card Processing
1. **Template Loading**: Load from `templates/model_card.md`
2. **Variable Replacement**: Inject actual values
3. **Conditional Processing**: Handle optional sections
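To make the variable replacement and conditional processing steps concrete, here is a small self-contained renderer. The template syntax is an assumption made for this sketch (`{{name}}` placeholders and `{{#if name}}...{{/if}}` blocks); the real `templates/model_card.md` may use a different notation, and `render_card` is a hypothetical name rather than a method of `ModelCardGenerator`.

```python
import re

# Assumed syntax: {{name}} placeholders and {{#if name}}...{{/if}} blocks.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)


def render_card(template, variables):
    """Fill placeholders, then keep or drop conditional sections."""

    # Variable replacement: unknown placeholders are left untouched
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    text = PLACEHOLDER.sub(substitute, template)

    # Conditional processing: keep a block only when its variable is truthy
    def keep_or_drop(match):
        return match.group(2) if variables.get(match.group(1)) else ""

    return IF_BLOCK.sub(keep_or_drop, text)


# In practice the template would be read from disk, for example with
# Path("templates/model_card.md").read_text(encoding="utf-8").
template = (
    "# {{model_name}}\n"
    "Base model: {{base_model}}\n"
    "{{#if quantized}}A quantized variant is available.\n{{/if}}"
)
card = render_card(
    template,
    {"model_name": "my-voxtral", "base_model": "example-org/base-model", "quantized": False},
)
print(card)
```

This sketch does not support nested `{{#if}}` blocks; a template engine such as Jinja2 would be a reasonable choice if the card needs richer logic.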
### Demo Space Deployment
#### deploy_demo_space.py Script
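```python
# Assumes DemoSpaceDeployer is importable from deploy_demo_space.py;
# adjust the import path to match your checkout.
from deploy_demo_space import DemoSpaceDeployer

# Initialize the deployer
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral",  # selects the demo_voxtral/ template
)

# Create the Space, copy the template, and configure it
success = deployer.deploy()
```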
#### Space Setup Process
1. **Space Creation**: Create HF Space repository
2. **Template Copying**: Copy demo template files
3. **Environment Injection**: Set model-specific variables
4. **Secret Configuration**: Configure HF_TOKEN and model variables
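The four setup steps map fairly directly onto `huggingface_hub` calls. The sketch below is an illustration under assumptions, not the body of `DemoSpaceDeployer.deploy()`: the function name `deploy_space` is hypothetical, and `api` is expected to behave like `huggingface_hub.HfApi`, which provides `create_repo`, `upload_folder`, `add_space_variable`, and `add_space_secret`.

```python
def deploy_space(api, space_id, template_dir, model_id, model_name, hf_token):
    """Run the four Space setup steps against an HfApi-like client."""
    # 1. Space creation: a Gradio Space repository
    api.create_repo(
        repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True
    )

    # 2. Template copying: upload app.py, requirements.txt, and README.md
    api.upload_folder(
        folder_path=str(template_dir), repo_id=space_id, repo_type="space"
    )

    # 3. Environment injection: non-sensitive values go in Space variables
    for key, value in {"HF_MODEL_ID": model_id, "MODEL_NAME": model_name}.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)

    # 4. Secret configuration: the token is stored as a secret, not a variable
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)

    return f"https://huggingface.co/spaces/{space_id}"
```

Keeping the model ID and display name in variables while placing only `HF_TOKEN` in a secret means the non-sensitive settings stay visible in the Space settings page, and the token does not.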
### Space Building Process
#### Automatic Build Trigger
- **Dependency Installation**: `pip install -r requirements.txt`
- **Model Download**: Download model from HF Hub
- **App Initialization**: Setup Gradio application
#### Demo Template Structure
```
templates/spaces/demo_voxtral/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # Space documentation
```
### Live Demo Features
#### Gradio Interface
- **Audio Upload**: File upload or recording
- **Real-time Inference**: Live ASR transcription
- **Interactive Controls**: Model parameters, settings
#### Model Inference Pipeline
- **Audio Processing**: Convert to model inputs
- **Transcription Generation**: Run ASR inference
- **Result Display**: Show transcription with confidence
### Configuration Management
#### Environment Variables
```python
import os

# Set in Space secrets/environment
os.environ["HF_MODEL_ID"] = model_id
os.environ["MODEL_NAME"] = model_name
os.environ["HF_TOKEN"] = token  # For model access
```
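On the Space side, the demo application has to read these values back at startup. A minimal sketch of that follows, assuming the three variable names shown above; `DemoConfig` and `load_demo_config` are hypothetical names, not taken from the template's `app.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    model_id: str
    model_name: str
    token: Optional[str]


def load_demo_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Read the demo settings from the environment, failing early if unusable."""
    model_id = env.get("HF_MODEL_ID", "").strip()
    if not model_id:
        raise RuntimeError("HF_MODEL_ID is not set; configure it in the Space settings")
    # Fall back to the last path segment of the repo id for display purposes
    model_name = env.get("MODEL_NAME") or model_id.split("/")[-1]
    # HF_TOKEN is only needed when the model repository is private or gated
    return DemoConfig(model_id, model_name, env.get("HF_TOKEN") or None)
```

Failing at startup with a clear message tends to be easier to diagnose from the Space build logs than a model-loading error raised later with an empty model ID.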
#### Demo-Specific Settings
- **Model Configuration**: Base model, subfolder, quantization
- **UI Branding**: Custom titles, descriptions, links
- **Example Prompts**: Pre-configured demo examples
### Error Handling & Monitoring
#### Build Process Monitoring
- **Build Logs**: Real-time build status
- **Error Detection**: Failed dependency installation
- **Retry Logic**: Automatic rebuild on failure
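Build status can be checked programmatically by polling the Space runtime. The sketch below assumes an `api` object that behaves like `huggingface_hub.HfApi`, whose `get_space_runtime` method returns an object with a `stage` attribute. The function name `wait_for_space`, the default timings, and the choice of which stages count as failures are assumptions for illustration.

```python
import time

# Stages treated as terminal failures in this sketch
FAILURE_STAGES = ("BUILD_ERROR", "CONFIG_ERROR", "RUNTIME_ERROR", "NO_APP_FILE")


def wait_for_space(api, space_id, timeout=900.0, interval=10.0, sleep=time.sleep):
    """Poll the Space stage until it is running, has failed, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        stage = api.get_space_runtime(repo_id=space_id).stage
        if stage == "RUNNING":
            return stage
        if stage in FAILURE_STAGES:
            raise RuntimeError(f"Space {space_id} failed with stage {stage}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Space {space_id} still in stage {stage} after {timeout}s"
            )
        sleep(interval)
```

A caller that wants retry behavior could catch the `RuntimeError` and request a rebuild, for example through `HfApi.restart_space`, before polling again. Whether the pipeline's own retry logic works this way is not stated in this document.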
#### Runtime Monitoring
- **Space Health**: Uptime and responsiveness
- **Model Loading**: Successful model initialization
- **Inference Errors**: Runtime error handling
### Security Considerations
#### Token Management
- **Read-Only Tokens**: Use read-only tokens for demo spaces
- **Secret Storage**: Secure storage of HF_TOKEN
- **Access Control**: Proper repository permissions
#### Resource Management
- **Memory Limits**: Space hardware constraints
- **Timeout Handling**: Inference timeout protection
- **Rate Limiting**: Prevent abuse
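Timeout handling and rate limiting can both be implemented with the standard library. The following is one possible sketch, not the demo's actual mechanism: `run_with_timeout` and `SlidingWindowLimiter` are hypothetical names, and the limits are placeholders.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# A single worker serializes inference calls, which suits a one-model demo
_executor = ThreadPoolExecutor(max_workers=1)


def run_with_timeout(fn, *args, timeout_s=60.0):
    """Return fn(*args), or raise TimeoutError if it exceeds timeout_s."""
    future = _executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only prevents work that has not started yet;
        # a call that is already running continues in the background.
        future.cancel()
        raise TimeoutError(f"Inference exceeded {timeout_s}s") from None


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_s-second window."""

    def __init__(self, max_calls, window_s, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._calls = deque()

    def allow(self):
        now = self.clock()
        # Forget calls that have aged out of the window
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A request handler would call `limiter.allow()` first and reject the request when it returns `False`, then wrap the transcription call in `run_with_timeout`. Because Python threads cannot be interrupted, the timeout protects the user-facing response time rather than freeing the compute immediately.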
### Integration Points
#### With Training Scripts
- **Training Config**: Used for model card generation
- **Training Results**: Included in model metadata
- **Model Path**: Direct path to trained model files
#### With Interface (interface.py)
- **Parameter Passing**: Deployment settings from UI
- **Progress Updates**: Deployment progress to user
- **Result Links**: Direct links to deployed spaces
### Deployment Workflows
#### Full Pipeline (Recommended)
1. Train model → Generate model card → Push to Hub → Deploy demo
2. All steps are automated through a single interface action
3. Comprehensive error handling and rollback
#### Manual Deployment
1. Use individual scripts for granular control
2. Custom configuration and branding
3. Debugging and troubleshooting capabilities
#### CI/CD Integration
- **Automated Triggers**: GitHub Actions integration
- **Version Control**: Model versioning and releases
- **Testing**: Automated demo testing
### Performance Optimization
#### Space Hardware Selection
- **CPU Basic**: Free tier, sufficient for small models
- **GPU Options**: For larger models requiring acceleration
- **Memory Scaling**: Based on model size requirements
#### Model Optimization
- **Quantization**: 4-bit quantization for smaller footprint
- **Model Sharding**: Split large models across memory
- **Caching**: Model caching for faster cold starts
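The effect of quantization on hardware selection comes down to simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch below shows that calculation; the 3-billion-parameter figure is a hypothetical example rather than a property of any particular model, and the estimate ignores activations, caches, and the small overhead of quantization metadata.

```python
# Bytes stored per parameter for common weight formats
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_footprint_gib(num_params, dtype):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical 3-billion-parameter model, used only to show the arithmetic
for dtype in ("fp16", "4bit"):
    print(f"{dtype}: {weight_footprint_gib(3_000_000_000, dtype):.2f} GiB")
```

For this example the weights need roughly 5.6 GiB in fp16 and 1.4 GiB at 4 bits, which is the kind of difference that can decide whether a model fits on a given Space hardware tier.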
### Monitoring & Analytics
#### Space Analytics
- **Usage Metrics**: Daily active users, session duration
- **Performance Metrics**: Inference latency, error rates
- **User Feedback**: Demo effectiveness and issues
#### Model Analytics
- **Download Stats**: Model popularity and usage
- **Citation Tracking**: Academic and research usage
- **Community Feedback**: GitHub issues and discussions
See also:
- [Architecture Overview](architecture.md)
- [Training Pipeline](training-pipeline.md)
- [Data Flow](data-flow.md)