# Deployment Pipeline
```mermaid
graph TB
    %% Input Sources
    subgraph "Inputs"
        TRAINED_MODEL[Trained Model<br/>Local directory]
        TRAINING_CONFIG[Training Config<br/>JSON/YAML]
        TRAINING_RESULTS[Training Results<br/>Metrics & logs]
        MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
    end
    %% Model Publishing
    subgraph "Model Publishing"
        PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]
        subgraph "Publishing Steps"
            REPO_CREATION[Repository Creation<br/>HF Hub API]
            FILE_UPLOAD[File Upload<br/>Model files to HF]
            METADATA_UPLOAD[Metadata Upload<br/>Config & results]
        end
    end
    %% Model Card Generation
    subgraph "Model Card Generation"
        CARD_SCRIPT[generate_model_card.py<br/>Card Generator]
        subgraph "Card Components"
            TEMPLATE_LOAD[Template Loading<br/>model_card.md]
            VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
            CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
        end
    end
    %% Demo Space Deployment
    subgraph "Demo Space Deployment"
        DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]
        subgraph "Space Setup"
            SPACE_CREATION[Space Repository<br/>Create HF Space]
            TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
            ENV_INJECTION[Environment Setup<br/>Model config injection]
            SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
        end
    end
    %% Space Building & Testing
    subgraph "Space Building"
        BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
        DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
        MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
        APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
    end
    %% Live Demo
    subgraph "Live Demo Space"
        GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
        MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
        USER_INTERACTION[User Interaction<br/>Audio upload/playback]
    end
    %% External Services
    subgraph "External Services"
        HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
        HF_SPACES[HF Spaces Platform<br/>Demo hosting]
    end
    %% Flow Connections
    TRAINED_MODEL --> PUSH_SCRIPT
    TRAINING_CONFIG --> PUSH_SCRIPT
    TRAINING_RESULTS --> PUSH_SCRIPT
    MODEL_METADATA --> PUSH_SCRIPT
    PUSH_SCRIPT --> REPO_CREATION
    REPO_CREATION --> FILE_UPLOAD
    FILE_UPLOAD --> METADATA_UPLOAD
    METADATA_UPLOAD --> CARD_SCRIPT
    TRAINING_CONFIG --> CARD_SCRIPT
    TRAINING_RESULTS --> CARD_SCRIPT
    CARD_SCRIPT --> TEMPLATE_LOAD
    TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
    VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING
    CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
    METADATA_UPLOAD --> DEPLOY_SCRIPT
    DEPLOY_SCRIPT --> SPACE_CREATION
    SPACE_CREATION --> TEMPLATE_COPY
    TEMPLATE_COPY --> ENV_INJECTION
    ENV_INJECTION --> SECRET_SETUP
    SECRET_SETUP --> BUILD_TRIGGER
    BUILD_TRIGGER --> DEPENDENCY_INSTALL
    DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
    MODEL_DOWNLOAD --> APP_INITIALIZATION
    APP_INITIALIZATION --> GRADIO_INTERFACE
    GRADIO_INTERFACE --> MODEL_INFERENCE
    MODEL_INFERENCE --> USER_INTERACTION
    HF_HUB --> MODEL_DOWNLOAD
    HF_SPACES --> GRADIO_INTERFACE
    %% Styling
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px
    class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
    class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
    class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
    class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
    class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
    class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
    class HF_HUB,HF_SPACES external
```
## Deployment Pipeline Overview
This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.
### Input Sources
#### Trained Model Artifacts
- **Model Files**: `model.safetensors`, `config.json`, `tokenizer.json`
- **Training Config**: Hyperparameters and training setup
- **Training Results**: Metrics, loss curves, evaluation results
- **Model Metadata**: Name, description, base model information
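Before publishing, a quick pre-flight check can confirm that the output directory actually contains the artifacts listed above. This is a minimal standard-library sketch. `missing_model_files` and `REQUIRED_FILES` are hypothetical names, not part of the project's scripts. Sharded checkpoints, which have several `*.safetensors` files plus an index, would need a looser check.

```python
from pathlib import Path
from typing import Iterable, List

# Mirrors the "Model Files" list above (hypothetical constant).
REQUIRED_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def missing_model_files(model_dir: str, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required artifact names that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Model directory not found: {root}")
    # Keep the order of `required` so error messages are stable.
    return [name for name in required if not (root / name).is_file()]
```

An empty list means the directory is ready to push. Any other result lists exactly what is missing, so the check can fail early with a clear message.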
### Model Publishing Phase
#### push_to_huggingface.py Script
```python
# Initialize publisher (output_dir, repo_name, and hf_token are defined by the calling script)
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token
)
# Push model, training config, and results
success = pusher.push_model(training_config, results)
```
#### Publishing Steps
1. **Repository Creation**: Create the model repository on the Hugging Face Hub
2. **File Upload**: Upload all model files
3. **Metadata Upload**: Upload the training config and results
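The three publishing steps map naturally onto `huggingface_hub.HfApi` calls: `create_repo`, `upload_folder`, and `upload_file`. This sketch shows that mapping; it is not the actual `HuggingFacePusher` implementation. The helper name `publish_model` is an assumption, and so are the metadata file names `training_config.json` and `training_results.json`. The `api` parameter lets the function run against a stub without network access.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def publish_model(
    model_dir: str,
    repo_id: str,
    training_config: Dict[str, Any],
    results: Dict[str, Any],
    token: Optional[str] = None,
    api: Any = None,
) -> str:
    """Run the three publishing steps and return the model URL (hypothetical helper)."""
    if api is None:
        # Imported lazily so the function can be exercised offline with a stub `api`.
        from huggingface_hub import HfApi

        api = HfApi(token=token)

    # 1. Repository creation; exist_ok avoids failing on re-runs.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # 2. File upload: every file in the trained model directory.
    api.upload_folder(folder_path=str(Path(model_dir)), repo_id=repo_id, repo_type="model")

    # 3. Metadata upload: config and results serialized as JSON (file names are assumptions).
    for path_in_repo, payload in (
        ("training_config.json", training_config),
        ("training_results.json", results),
    ):
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="model",
        )
    return f"https://huggingface.co/{repo_id}"
```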
### Model Card Generation
#### generate_model_card.py Script
```python
# Create generator
generator = ModelCardGenerator()
# Collect the template variables
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
# Generate card
content = generator.generate_model_card(variables)
```
#### Card Processing
1. **Template Loading**: Load the template from `templates/model_card.md`
2. **Variable Replacement**: Substitute configuration and result values into the template placeholders
3. **Conditional Processing**: Include or omit optional sections (e.g., for quantized models)
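The three card-processing steps can be sketched with two regular expressions. The placeholder syntax used here is an assumption made for illustration: `{{name}}` for values and `{{#if flag}} ... {{/if}}` for optional sections. The real `templates/model_card.md` and `ModelCardGenerator` may use a different syntax.

```python
import re
from typing import Any, Mapping

# Assumed syntax: {{name}} for values, {{#if flag}}...{{/if}} for optional sections.
_IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render_model_card(template: str, variables: Mapping[str, Any]) -> str:
    """Resolve conditional sections, then substitute variables (hypothetical helper)."""
    # Conditional processing: keep a section's body only when its flag is truthy.
    text = _IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", template
    )
    # Variable replacement: unknown placeholders are left untouched so they stay visible.
    return _VARIABLE.sub(
        lambda m: str(variables[m.group(1)]) if m.group(1) in variables else m.group(0),
        text,
    )
```

Conditional sections are resolved first, so placeholders inside a dropped section are never substituted. Nested `{{#if}}` blocks are not supported, because the non-greedy pattern pairs each opening tag with the first `{{/if}}` that follows.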
### Demo Space Deployment
#### deploy_demo_space.py Script
```python
# Initialize deployer
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral"
)
# Deploy space
success = deployer.deploy()
```
#### Space Setup Process
1. **Space Creation**: Create the HF Space repository
2. **Template Copying**: Copy the demo template files
3. **Environment Injection**: Set model-specific variables
4. **Secret Configuration**: Configure HF_TOKEN and model variables
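Here is a sketch of the four setup steps using `huggingface_hub.HfApi`. It calls `create_repo` with `repo_type="space"` and `space_sdk="gradio"`, uses `upload_folder` for the template, and sets configuration with `add_space_variable` and `add_space_secret`. The helper name `setup_demo_space` is hypothetical, and this is not the `DemoSpaceDeployer` code. The variable names follow the Environment Variables section below.

```python
from pathlib import Path
from typing import Any


def setup_demo_space(
    space_id: str,
    template_dir: str,
    model_id: str,
    model_name: str,
    hf_token: str,
    api: Any = None,
) -> str:
    """Create a Gradio Space, copy the template, set variables and secrets (hypothetical helper)."""
    if api is None:
        from huggingface_hub import HfApi  # lazy import; pass a stub `api` to test offline

        api = HfApi(token=hf_token)

    # 1. Space creation with the Gradio SDK.
    api.create_repo(repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True)

    # 2. Template copying, e.g. templates/spaces/demo_voxtral/.
    api.upload_folder(folder_path=str(Path(template_dir)), repo_id=space_id, repo_type="space")

    # 3. Environment injection: non-sensitive values as public Space variables.
    api.add_space_variable(repo_id=space_id, key="HF_MODEL_ID", value=model_id)
    api.add_space_variable(repo_id=space_id, key="MODEL_NAME", value=model_name)

    # 4. Secret configuration: the token is stored as an encrypted Space secret.
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)

    return f"https://huggingface.co/spaces/{space_id}"
```

Non-sensitive values go into public variables, while the token goes into an encrypted secret so it never appears in the repository files. Changing variables or secrets may restart the Space. Configuring them before the upload is an alternative ordering that could avoid an extra restart.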
### Space Building Process
#### Automatic Build Trigger
Once the files are pushed, the Space starts building automatically:
- **Dependency Installation**: `pip install -r requirements.txt`
- **Model Download**: Download the model from the HF Hub
- **App Initialization**: Set up the Gradio application
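For the model download step, `app.py` could fetch the repository with `huggingface_hub.snapshot_download`. That function stores files in the local Hugging Face cache and reuses them on later calls. In this hedged sketch, `fetch_model` is a hypothetical helper. Its optional `subfolder` argument (see Demo-Specific Settings below) restricts the download via `allow_patterns`. The `downloader` parameter exists only so the logic can be tested offline.

```python
import os
from typing import Callable, Optional


def fetch_model(
    model_id: str,
    token: Optional[str] = None,
    subfolder: Optional[str] = None,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Download (or reuse from cache) a model repo and return its local path (hypothetical helper)."""
    if downloader is None:
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    # Restrict the download to one subfolder when the repo holds several variants.
    patterns = [f"{subfolder}/*"] if subfolder else None
    local_dir = downloader(repo_id=model_id, token=token, allow_patterns=patterns)
    return os.path.join(local_dir, subfolder) if subfolder else local_dir
```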
#### Demo Template Structure
```
templates/spaces/demo_voxtral/
├── app.py            # Main Gradio application
├── requirements.txt  # Python dependencies
└── README.md         # Space documentation
```
### Live Demo Features
#### Gradio Interface
- **Audio Upload**: File upload or recording
- **Real-time Inference**: Live ASR transcription
- **Interactive Controls**: Model parameters, settings
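Here is a minimal sketch of such an interface. It assumes Gradio 4.x, where `gr.Audio` takes a `sources` list, and a hypothetical `asr` callable that maps an audio file path to text. The wrapper handles the empty-input case and shows runtime errors in the UI instead of crashing the app. The title string is a placeholder.

```python
from typing import Any, Callable, Optional


def make_transcribe(asr: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Wrap a transcription callable for use as a Gradio event handler (hypothetical helper)."""

    def transcribe(audio_path: Optional[str]) -> str:
        # Gradio passes None when the user submits without audio.
        if not audio_path:
            return "Please upload or record an audio clip."
        try:
            return asr(audio_path)
        except Exception as exc:  # show runtime errors in the UI instead of crashing
            return f"Transcription failed: {exc}"

    return transcribe


def build_demo(transcribe: Callable[[Optional[str]], str]) -> Any:
    """Build the interface; assumes Gradio 4.x (`sources` argument of gr.Audio)."""
    import gradio as gr

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(sources=["upload", "microphone"], type="filepath", label="Audio"),
        outputs=gr.Textbox(label="Transcription"),
        title="Voxtral ASR demo",  # placeholder title
    )
```

In `app.py` this would be wired up as `build_demo(make_transcribe(asr)).launch()`. Keeping `transcribe` free of Gradio imports makes it testable without a UI.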
#### Model Inference Pipeline
- **Audio Processing**: Convert audio to model inputs
- **Transcription Generation**: Run ASR inference
- **Result Display**: Show the transcription with a confidence score
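The document does not specify how the confidence value is computed. One common choice, assumed here, is the geometric mean of the per-token probabilities, `exp(mean(log p_i))`. It can be computed from token log-probabilities such as those returned by `compute_transition_scores` in transformers. Both helper names are hypothetical.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability, exp(mean log p), in [0, 1] (assumed metric)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def format_result(text: str, token_logprobs: Sequence[float]) -> str:
    """Render the transcription with its confidence as a percentage."""
    return f"{text}  (confidence: {sequence_confidence(token_logprobs):.0%})"
```

Averaging in log space normalizes for length, so a long transcript is not scored lower just because it contains more tokens.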
### Configuration Management
#### Environment Variables
```python
import os

# Set in Space secrets/environment (model_id, model_name, and token are defined by the caller)
os.environ['HF_MODEL_ID'] = model_id
os.environ['MODEL_NAME'] = model_name
os.environ['HF_TOKEN'] = token  # For model access
```
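On the Space side, `app.py` can read these values back at startup and fail fast when the model ID is missing. This is a minimal sketch. `DemoConfig` and `load_demo_config` are hypothetical names, and the fallback for `MODEL_NAME` (the last segment of the repo ID) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    model_id: str
    model_name: str
    hf_token: Optional[str]


def load_demo_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Read the variables listed above; fail fast if the model ID is missing (hypothetical helper)."""
    model_id = env.get("HF_MODEL_ID", "").strip()
    if not model_id:
        raise RuntimeError("HF_MODEL_ID is not set; configure it as a Space variable.")
    return DemoConfig(
        model_id=model_id,
        # Fall back to the repo name when no display name is configured.
        model_name=env.get("MODEL_NAME") or model_id.split("/")[-1],
        # Optional: a token is only needed for private or gated model repos.
        hf_token=env.get("HF_TOKEN") or None,
    )
```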
#### Demo-Specific Settings
- **Model Configuration**: Base model, subfolder, quantization
- **UI Branding**: Custom titles, descriptions, links
- **Example Prompts**: Pre-configured demo examples
### Error Handling & Monitoring
#### Build Process Monitoring
- **Build Logs**: Real-time build status
- **Error Detection**: Failed dependency installation
- **Retry Logic**: Automatic rebuild on failure
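Build status can be polled with `HfApi.get_space_runtime`, whose `stage` field reports values such as `BUILDING`, `RUNNING`, or `BUILD_ERROR`. A rebuild can be requested with `HfApi.restart_space`. Below is a sketch of the retry loop described above. `wait_until_running`, the default intervals, and the restart limit are all assumptions. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable

# Stages treated as failures (values of huggingface_hub's SpaceStage).
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}


def wait_until_running(
    api: Any,
    space_id: str,
    max_restarts: int = 1,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the Space runs; restart it on failure up to max_restarts times (hypothetical helper)."""
    restarts = 0
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        runtime = api.get_space_runtime(space_id)
        # SpaceStage is a str-valued enum; .value also covers plain strings from stubs.
        stage = getattr(runtime.stage, "value", runtime.stage)
        if stage == "RUNNING":
            return stage
        if stage in FAILED_STAGES:
            if restarts >= max_restarts:
                raise RuntimeError(f"{space_id} failed with stage {stage}")
            restarts += 1
            api.restart_space(space_id)  # request a fresh build/start
        sleep(poll_seconds)
    raise TimeoutError(f"{space_id} did not reach RUNNING within {timeout_seconds}s")
```

A restart only helps with transient failures. A broken `requirements.txt` will fail again, which is why the number of restarts is capped.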
#### Runtime Monitoring
- **Space Health**: Uptime and responsiveness
- **Model Loading**: Successful model initialization
- **Inference Errors**: Runtime error handling
### Security Considerations
#### Token Management
- **Read-Only Tokens**: Use read-only tokens for demo spaces
- **Secret Storage**: Store HF_TOKEN as a Space secret, never in code
- **Access Control**: Proper repository permissions
#### Resource Management
- **Memory Limits**: Space hardware constraints
- **Timeout Handling**: Inference timeout protection
- **Rate Limiting**: Prevent abuse
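Timeout handling and rate limiting can be sketched with the standard library. `concurrent.futures` bounds how long a caller waits for inference, and a per-client sliding window caps request rates. Both helpers are hypothetical. Gradio's built-in queue (`demo.queue()`) is a complementary way to limit concurrency.

```python
import time
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Deque, Dict

# A single worker serializes inference, matching small Space hardware.
_EXECUTOR = ThreadPoolExecutor(max_workers=1)


def run_with_timeout(fn: Callable[..., Any], *args: Any, timeout_s: float = 60.0) -> Any:
    """Return fn(*args), or raise TimeoutError if the result takes longer than timeout_s."""
    future = _EXECUTOR.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread cannot be killed; it keeps running in the background.
        raise TimeoutError(f"Inference exceeded {timeout_s}s") from None


class SlidingWindowLimiter:
    """Allow at most max_calls per client within any window_s-second window."""

    def __init__(self, max_calls: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # rejected requests are not recorded
        calls.append(now)
        return True
```

A timed-out job keeps occupying the single worker and delays later requests. The timeout therefore protects the user-facing response rather than freeing compute.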
### Integration Points
#### With Training Scripts
- **Training Config**: Used for model card generation
- **Training Results**: Included in model metadata
- **Model Path**: Direct path to trained model files
#### With Interface (interface.py)
- **Parameter Passing**: Deployment settings from UI
- **Progress Updates**: Deployment progress to user
- **Result Links**: Direct links to deployed spaces
### Deployment Workflows
#### Full Pipeline (Recommended)
1. Train model → Generate model card → Push to Hub → Deploy demo
2. All steps automated through a single interface action
3. Comprehensive error handling and rollback
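A full-pipeline run with progress reporting and rollback can be sketched as an ordered list of steps. Each step has an optional compensating `undo` action, and these run in reverse order when a later step fails. The `Step` and `run_pipeline` names are hypothetical. The `report` callback is where `interface.py` could forward progress updates. An undo for a Hub step could, for example, call `HfApi.delete_repo`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # compensating action, if any


def run_pipeline(steps: List[Step], report: Callable[[str], None] = print) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for i, step in enumerate(steps, start=1):
        report(f"[{i}/{len(steps)}] {step.name}...")
        try:
            step.run()
        except Exception as exc:
            report(f"{step.name} failed: {exc}; rolling back")
            for done in reversed(completed):
                if done.undo is None:
                    continue
                try:
                    done.undo()
                    report(f"Rolled back: {done.name}")
                except Exception as undo_exc:  # keep rolling back the remaining steps
                    report(f"Rollback of {done.name} failed: {undo_exc}")
            return False
        completed.append(step)
    report("Pipeline finished")
    return True
```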
#### Manual Deployment
1. Use individual scripts for granular control
2. Custom configuration and branding
3. Debugging and troubleshooting capabilities
#### CI/CD Integration
- **Automated Triggers**: GitHub Actions integration
- **Version Control**: Model versioning and releases
- **Testing**: Automated demo testing
### Performance Optimization
#### Space Hardware Selection
- **CPU Basic**: Free tier, sufficient for small models
- **GPU Options**: For larger models requiring acceleration
- **Memory Scaling**: Based on model size requirements
#### Model Optimization
- **Quantization**: 4-bit quantization for smaller footprint
- **Model Sharding**: Split large models across memory
- **Caching**: Model caching for faster cold starts
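A back-of-the-envelope estimate links model size, precision, and hardware choice: weight memory ≈ parameters × bits per parameter / 8. It counts weights only. Activations, the KV cache, and quantization metadata add overhead, so the `headroom` factor below (an assumed 20%) is a rough allowance, not a measured value. Both helper names are hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def fits(num_params: int, bits_per_param: float, memory_gb: float, headroom: float = 1.2) -> bool:
    """Rough check with an assumed headroom factor for activations and runtime overhead."""
    return weight_memory_gb(num_params, bits_per_param) * headroom <= memory_gb
```

Take a hypothetical 3B-parameter model. `weight_memory_gb(3_000_000_000, 16)` returns 6.0 (GB), and `weight_memory_gb(3_000_000_000, 4)` returns 1.5. That gap is why 4-bit quantization can make a smaller hardware tier viable.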
### Monitoring & Analytics
#### Space Analytics
- **Usage Metrics**: Daily active users, session duration
- **Performance Metrics**: Inference latency, error rates
- **User Feedback**: Demo effectiveness and issues
#### Model Analytics
- **Download Stats**: Model popularity and usage
- **Citation Tracking**: Academic and research usage
- **Community Feedback**: GitHub issues and discussions
See also:
- [Architecture Overview](architecture.md)
- [Training Pipeline](training-pipeline.md)
- [Data Flow](data-flow.md)