# Deployment Pipeline
```mermaid
graph TB
%% Input Sources
subgraph "Inputs"
    TRAINED_MODEL["Trained Model<br/>Local directory"]
    TRAINING_CONFIG["Training Config<br/>JSON/YAML"]
    TRAINING_RESULTS["Training Results<br/>Metrics & logs"]
    MODEL_METADATA["Model Metadata<br/>Name, description, etc."]
end
%% Model Publishing
subgraph "Model Publishing"
    PUSH_SCRIPT["push_to_huggingface.py<br/>Model Publisher"]
    subgraph "Publishing Steps"
        REPO_CREATION["Repository Creation<br/>HF Hub API"]
        FILE_UPLOAD["File Upload<br/>Model files to HF"]
        METADATA_UPLOAD["Metadata Upload<br/>Config & results"]
    end
end
%% Model Card Generation
subgraph "Model Card Generation"
    CARD_SCRIPT["generate_model_card.py<br/>Card Generator"]
    subgraph "Card Components"
        TEMPLATE_LOAD["Template Loading<br/>model_card.md"]
        VARIABLE_REPLACEMENT["Variable Replacement<br/>Config injection"]
        CONDITIONAL_PROCESSING["Conditional Sections<br/>Quantized models, etc."]
    end
end
%% Demo Space Deployment
subgraph "Demo Space Deployment"
    DEPLOY_SCRIPT["deploy_demo_space.py<br/>Space Deployer"]
    subgraph "Space Setup"
        SPACE_CREATION["Space Repository<br/>Create HF Space"]
        TEMPLATE_COPY["Template Copying<br/>demo_voxtral/ files"]
        ENV_INJECTION["Environment Setup<br/>Model config injection"]
        SECRET_SETUP["Secret Configuration<br/>HF_TOKEN, model vars"]
    end
end
%% Space Building & Testing
subgraph "Space Building"
    BUILD_TRIGGER["Build Trigger<br/>Automatic build start"]
    DEPENDENCY_INSTALL["Dependency Installation<br/>requirements.txt"]
    MODEL_DOWNLOAD["Model Download<br/>From HF Hub"]
    APP_INITIALIZATION["App Initialization<br/>Gradio app setup"]
end
%% Live Demo
subgraph "Live Demo Space"
    GRADIO_INTERFACE["Gradio Interface<br/>Interactive demo"]
    MODEL_INFERENCE["Model Inference<br/>Real-time ASR"]
    USER_INTERACTION["User Interaction<br/>Audio upload/playback"]
end
%% External Services
subgraph "External Services"
    HF_HUB["Hugging Face Hub<br/>Model & Space hosting"]
    HF_SPACES["HF Spaces Platform<br/>Demo hosting"]
end
%% Flow Connections
TRAINED_MODEL --> PUSH_SCRIPT
TRAINING_CONFIG --> PUSH_SCRIPT
TRAINING_RESULTS --> PUSH_SCRIPT
MODEL_METADATA --> PUSH_SCRIPT
PUSH_SCRIPT --> REPO_CREATION
REPO_CREATION --> FILE_UPLOAD
FILE_UPLOAD --> METADATA_UPLOAD
METADATA_UPLOAD --> CARD_SCRIPT
TRAINING_CONFIG --> CARD_SCRIPT
TRAINING_RESULTS --> CARD_SCRIPT
CARD_SCRIPT --> TEMPLATE_LOAD
TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING
CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
METADATA_UPLOAD --> DEPLOY_SCRIPT
DEPLOY_SCRIPT --> SPACE_CREATION
SPACE_CREATION --> TEMPLATE_COPY
TEMPLATE_COPY --> ENV_INJECTION
ENV_INJECTION --> SECRET_SETUP
SECRET_SETUP --> BUILD_TRIGGER
BUILD_TRIGGER --> DEPENDENCY_INSTALL
DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
MODEL_DOWNLOAD --> APP_INITIALIZATION
APP_INITIALIZATION --> GRADIO_INTERFACE
GRADIO_INTERFACE --> MODEL_INFERENCE
MODEL_INFERENCE --> USER_INTERACTION
HF_HUB --> MODEL_DOWNLOAD
HF_SPACES --> GRADIO_INTERFACE
%% Styling
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px
class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
class HF_HUB,HF_SPACES external
```
## Deployment Pipeline Overview
This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.
### Input Sources
#### Trained Model Artifacts
- **Model Files**: `model.safetensors`, `config.json`, `tokenizer.json`
- **Training Config**: Hyperparameters and training setup
- **Training Results**: Metrics, loss curves, evaluation results
- **Model Metadata**: Name, description, base model information
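Before publishing, it can help to confirm that the output directory actually contains the model files listed above, so the pipeline fails fast instead of uploading an incomplete model. A minimal stdlib sketch; `missing_artifacts` is a hypothetical helper, and the expected file names are an assumption copied from the list above (adjust them if your checkpoint uses a different tokenizer file or only sharded weights).

```python
from pathlib import Path

# Assumed file names, mirroring the artifact list above; adjust to your run
REQUIRED_FILES = ("config.json", "tokenizer.json")
# Accept a single weights file or a sharded checkpoint (model-00001-of-0000N...)
WEIGHT_PATTERNS = ("model.safetensors", "model-*.safetensors")


def missing_artifacts(model_dir: str) -> list[str]:
    """Hypothetical helper: return expected artifacts absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    has_weights = any(any(root.glob(pattern)) for pattern in WEIGHT_PATTERNS)
    if not has_weights:
        missing.append("model.safetensors")
    return sorted(missing)
```

An empty list means the directory looks complete; anything else can be reported to the user before a repository is created.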
### Model Publishing Phase
#### push_to_huggingface.py Script
```python
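# Assumes push_to_huggingface.py is importable (e.g. run from its directory)
from push_to_huggingface import HuggingFacePusher

# Initialize the publisher with the local model directory and target repository
pusher = HuggingFacePusher(
    model_path=output_dir,  # local directory containing the trained model
    repo_name=repo_name,    # Hub repository to create or update
    token=hf_token,         # Hugging Face token with write access
)
# Push model files plus training config and results; `success` reports the outcome
success = pusher.push_model(training_config, results)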
```
#### Publishing Steps
1. **Repository Creation**: Create HF Hub repository
2. **File Upload**: Upload all model files
3. **Metadata Upload**: Upload training config and results
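The three publishing steps map naturally onto `huggingface_hub`'s `HfApi` methods `create_repo`, `upload_folder` and `upload_file`. The sketch below illustrates that mapping; it is not the code inside `push_to_huggingface.py`. The function name `publish_model` and the metadata file names `training_config.json` and `training_results.json` are hypothetical. The `api` object is passed in so the flow can be exercised without network access.

```python
import json
from typing import Any


def publish_model(api: Any, repo_id: str, model_dir: str,
                  training_config: dict, results: dict) -> str:
    """Hypothetical sketch of the publishing steps (not the project's code).

    `api` should behave like huggingface_hub.HfApi; the metadata file
    names below are assumptions.
    """
    # 1. Repository creation; exist_ok=True makes re-runs safe
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # 2. File upload: every file in the local model directory
    api.upload_folder(folder_path=model_dir, repo_id=repo_id,
                      repo_type="model", commit_message="Upload trained model")

    # 3. Metadata upload: config and results serialized to JSON bytes
    for name, payload in (("training_config.json", training_config),
                          ("training_results.json", results)):
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=name,
            repo_id=repo_id,
            repo_type="model",
            commit_message=f"Add {name}",
        )
    return f"https://huggingface.co/{repo_id}"
```

With `from huggingface_hub import HfApi`, calling `publish_model(HfApi(token=hf_token), repo_id, output_dir, training_config, results)` would perform the real upload.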
### Model Card Generation
#### generate_model_card.py Script
```python
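# Assumes generate_model_card.py is importable (e.g. run from its directory)
from generate_model_card import ModelCardGenerator

# Create the generator
generator = ModelCardGenerator()
# Values injected into the template's placeholders
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
# Returns the rendered model card content
content = generator.generate_model_card(variables)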
```
#### Card Processing
1. **Template Loading**: Load from `templates/model_card.md`
2. **Variable Replacement**: Inject actual values
3. **Conditional Processing**: Handle optional sections
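The variable-replacement and conditional steps can be illustrated with a small regex-based renderer. The template syntax here, `{{variable}}` placeholders and `{{#if flag}}...{{/if}}` blocks, is an assumption for illustration; check `templates/model_card.md` for the syntax `generate_model_card.py` actually uses. `render_card` is a hypothetical name, and nested conditionals are not supported.

```python
import re

# Assumed template syntax: {{name}} placeholders and {{#if flag}}...{{/if}} blocks
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)


def render_card(template: str, variables: dict) -> str:
    """Hypothetical renderer: substitute values, then resolve optional sections."""
    # Variable replacement; unknown placeholders are left untouched
    text = PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), template
    )
    # Conditional processing: keep a block's body only if its flag is truthy
    return IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", text
    )
```

Because `{{#if ...}}` and `{{/if}}` contain characters outside `\w`, the placeholder pass leaves them intact for the conditional pass, matching the step order listed above.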
### Demo Space Deployment
#### deploy_demo_space.py Script
```python
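# Assumes deploy_demo_space.py is importable (e.g. run from its directory)
from deploy_demo_space import DemoSpaceDeployer

# Initialize the deployer for the Voxtral demo
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,  # account that will own the Space
    model_id=model_id,     # Hub model the demo will load
    demo_type="voxtral",   # which demo template to deploy
)
# Create and configure the Space; `success` reports the outcome
success = deployer.deploy()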
```
#### Space Setup Process
1. **Space Creation**: Create HF Space repository
2. **Template Copying**: Copy demo template files
3. **Environment Injection**: Set model-specific variables
4. **Secret Configuration**: Configure HF_TOKEN and model variables
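The space setup steps can likewise be expressed with `HfApi` calls: `create_repo` with `repo_type="space"` and `space_sdk="gradio"`, `upload_folder` for the template files, and `add_space_variable` / `add_space_secret` for configuration. This is a hedged sketch of that mapping, not what `deploy_demo_space.py` does internally; the function name `setup_space` is hypothetical, and the variable names `HF_MODEL_ID` and `MODEL_NAME` are taken from the environment section of this document. As with publishing, `api` is injected so the flow can be tested offline.

```python
from typing import Any


def setup_space(api: Any, space_id: str, template_dir: str,
                model_id: str, model_name: str, hf_token: str) -> str:
    """Hypothetical sketch: create a Gradio Space, copy the template, configure it.

    `api` should behave like huggingface_hub.HfApi.
    """
    # 1. Space creation: a Space is a repo of type "space" with an SDK
    api.create_repo(repo_id=space_id, repo_type="space",
                    space_sdk="gradio", exist_ok=True)

    # 2. Template copying: upload app.py, requirements.txt, README.md
    api.upload_folder(folder_path=template_dir, repo_id=space_id,
                      repo_type="space")

    # 3. Environment injection: non-sensitive settings as public variables
    for key, value in {"HF_MODEL_ID": model_id, "MODEL_NAME": model_name}.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)

    # 4. Secret configuration: the token is stored as a secret, never a variable
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)
    return f"https://huggingface.co/spaces/{space_id}"
```

Following the security notes later in this document, the token passed here would ideally be a read-only token scoped to what the demo needs.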
### Space Building Process
#### Automatic Build Trigger
- **Dependency Installation**: `pip install -r requirements.txt`
- **Model Download**: Download model from HF Hub
- **App Initialization**: Setup Gradio application
#### Demo Template Structure
```
templates/spaces/demo_voxtral/
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
└── README.md # Space documentation
```
### Live Demo Features
#### Gradio Interface
- **Audio Upload**: File upload or recording
- **Real-time Inference**: Live ASR transcription
- **Interactive Controls**: Model parameters, settings
#### Model Inference Pipeline
- **Audio Processing**: Convert to model inputs
- **Transcription Generation**: Run ASR inference
- **Result Display**: Show transcription with confidence
### Configuration Management
#### Environment Variables
```python
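import os

# In a deployed Space these come from the Space's variables and secrets;
# setting them here mirrors that configuration (e.g. for local testing)
os.environ['HF_MODEL_ID'] = model_id
os.environ['MODEL_NAME'] = model_name
os.environ['HF_TOKEN'] = token  # for model access; keep it secret, never commit it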
```
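On the Space side, `app.py` would read these values back at startup. A minimal sketch; `DemoConfig`, `load_config`, and the fallback display name are hypothetical, and only the three variable names come from the snippet above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    model_id: str
    model_name: str
    hf_token: Optional[str]  # may be None for public models


def load_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Hypothetical loader: read Space variables/secrets, failing early if unset."""
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        raise RuntimeError("HF_MODEL_ID is not set; configure it in the Space settings")
    return DemoConfig(
        model_id=model_id,
        # Fall back to the repository name if no display name was provided
        model_name=env.get("MODEL_NAME") or model_id.split("/")[-1],
        hf_token=env.get("HF_TOKEN"),
    )
```

Accepting a mapping instead of reading `os.environ` directly keeps the loader easy to test with plain dictionaries.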
#### Demo-Specific Settings
- **Model Configuration**: Base model, subfolder, quantization
- **UI Branding**: Custom titles, descriptions, links
- **Example Prompts**: Pre-configured demo examples
### Error Handling & Monitoring
#### Build Process Monitoring
- **Build Logs**: Real-time build status
- **Error Detection**: Failed dependency installation
- **Retry Logic**: Automatic rebuild on failure
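One way to implement build monitoring with a retry is to poll the Space's runtime stage. `HfApi.get_space_runtime` returns an object whose `stage` takes values such as `"BUILDING"`, `"RUNNING"`, `"BUILD_ERROR"` or `"RUNTIME_ERROR"`, and `HfApi.restart_space` triggers a rebuild. The function name `wait_for_space`, the polling interval and the retry budget are assumptions; `api` and `sleep` are injected so the loop can be tested without network access or real delays.

```python
import time
from typing import Any, Callable

# Failure stages that a rebuild might fix; compared with ==, so plain strings
# and huggingface_hub's str-based SpaceStage enum both work
FAILED_STAGES = ("BUILD_ERROR", "RUNTIME_ERROR")


def wait_for_space(api: Any, space_id: str, max_restarts: int = 1,
                   poll_seconds: float = 10.0, max_polls: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical helper: poll until RUNNING, restarting after a failure.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    for _ in range(max_polls):
        stage = api.get_space_runtime(space_id).stage
        if stage == "RUNNING":
            return restarts
        if stage in FAILED_STAGES:
            if restarts >= max_restarts:
                raise RuntimeError(f"{space_id} failed at stage {stage}")
            restarts += 1
            # factory_reboot=True rebuilds without cached requirements
            api.restart_space(space_id, factory_reboot=True)
        sleep(poll_seconds)
    raise TimeoutError(f"{space_id} not running after {max_polls} polls")
```

A small `max_restarts` keeps a genuinely broken build (for example, an unsatisfiable `requirements.txt`) from looping indefinitely.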
#### Runtime Monitoring
- **Space Health**: Uptime and responsiveness
- **Model Loading**: Successful model initialization
- **Inference Errors**: Runtime error handling
### Security Considerations
#### Token Management
- **Read-Only Tokens**: Use read-only tokens for demo spaces
- **Secret Storage**: Store `HF_TOKEN` as a Space secret, never in code or in a public variable
- **Access Control**: Proper repository permissions
#### Resource Management
- **Memory Limits**: Space hardware constraints
- **Timeout Handling**: Inference timeout protection
- **Rate Limiting**: Prevent abuse
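Timeout handling and rate limiting are generic techniques; the sketch below shows one stdlib-only way to apply them around an inference call. `run_with_timeout`, `TokenBucket`, and the limits are hypothetical and not taken from the demo template. Note that a thread-pool timeout only stops *waiting* for the result: the worker thread keeps running the inference until it returns.

```python
import concurrent.futures as cf
import time
from typing import Callable

_EXECUTOR = cf.ThreadPoolExecutor(max_workers=2)


def run_with_timeout(fn: Callable, *args, timeout_s: float = 60.0):
    """Return fn(*args), or raise TimeoutError if it takes longer than timeout_s."""
    future = _EXECUTOR.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        # The worker keeps running; we only stop waiting for its result
        raise TimeoutError(f"inference exceeded {timeout_s}s") from None


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A token bucket tolerates short bursts (a user recording several clips in a row) while still bounding sustained load on limited Space hardware.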
### Integration Points
#### With Training Scripts
- **Training Config**: Used for model card generation
- **Training Results**: Included in model metadata
- **Model Path**: Direct path to trained model files
#### With Interface (interface.py)
- **Parameter Passing**: Deployment settings from UI
- **Progress Updates**: Deployment progress to user
- **Result Links**: Direct links to deployed spaces
### Deployment Workflows
#### Full Pipeline (Recommended)
1. Train model → Generate model card → Push to Hub → Deploy demo
2. All steps automated through single interface action
3. Comprehensive error handling and rollback
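This document does not specify how rollback works. One common pattern is to pair each step with an undo action and, on failure, undo the completed steps in reverse order while reporting progress to the caller (for example, the UI in `interface.py`). The `Step` type, `run_pipeline`, and the progress callback are hypothetical; for a repository created on the Hub, an undo action could call `HfApi.delete_repo`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # hypothetical, e.g. delete a created repo


def run_pipeline(steps: list[Step], progress: Callable[[str], None] = print) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    done: list[Step] = []
    for i, step in enumerate(steps, start=1):
        progress(f"[{i}/{len(steps)}] {step.name}")
        try:
            step.run()
        except Exception as exc:
            progress(f"{step.name} failed: {exc}; rolling back")
            for prev in reversed(done):
                if prev.undo is not None:
                    try:
                        prev.undo()
                    except Exception as undo_exc:  # keep rolling back the rest
                        progress(f"undo of {prev.name} failed: {undo_exc}")
            return False
        done.append(step)
    progress("pipeline complete")
    return True
```

Passing the progress callback in lets the same runner print to a terminal in manual use or stream status lines to a UI.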
#### Manual Deployment
1. Use individual scripts for granular control
2. Custom configuration and branding
3. Debugging and troubleshooting capabilities
#### CI/CD Integration
- **Automated Triggers**: GitHub Actions integration
- **Version Control**: Model versioning and releases
- **Testing**: Automated demo testing
### Performance Optimization
#### Space Hardware Selection
- **CPU Basic**: Free tier, sufficient for small models
- **GPU Options**: For larger models requiring acceleration
- **Memory Scaling**: Based on model size requirements
#### Model Optimization
- **Quantization**: 4-bit quantization for smaller footprint
- **Model Sharding**: Split large checkpoints into shards that can be loaded piece by piece or spread across devices
- **Caching**: Model caching for faster cold starts
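Weight memory is roughly the parameter count times the bytes per parameter, which is why quantization and hardware choice depend on each other. For example, 3 billion parameters at 16-bit precision need about 3e9 × 2 bytes = 6 GB for weights alone, versus 1.5 GB at 4-bit. The sketch below turns that arithmetic into a hypothetical hardware suggestion. The flavor names are Hugging Face Spaces hardware identifiers and the memory sizes are as listed by Hugging Face at the time of writing (verify before relying on them); the 1.2× overhead factor is an assumption, and activations and framework overhead are not modeled precisely.

```python
# (flavor, memory in GB, is_gpu): VRAM for GPU flavors, RAM for cpu-basic.
# Sizes as listed by Hugging Face at the time of writing; verify before use.
HARDWARE = [
    ("cpu-basic", 16, False),
    ("t4-small", 16, True),
    ("a10g-small", 24, True),
    ("a100-large", 80, True),
]


def weight_gb(num_params: float, bits: int) -> float:
    """Approximate weight memory in GB (1e9 bytes): params x bits / 8."""
    return num_params * bits / 8 / 1e9


def suggest_hardware(num_params: float, bits: int = 16, gpu: bool = True,
                     overhead: float = 1.2) -> str:
    """Hypothetical heuristic: smallest listed flavor fitting weights x overhead."""
    needed = weight_gb(num_params, bits) * overhead
    for flavor, memory_gb, is_gpu in HARDWARE:
        if is_gpu == gpu and needed <= memory_gb:
            return flavor
    raise ValueError(f"~{needed:.1f} GB needed; no listed flavor is large enough")
```

The suggested flavor could then be requested with `HfApi.request_space_hardware`; flavors other than `cpu-basic` are paid.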
### Monitoring & Analytics
#### Space Analytics
- **Usage Metrics**: Daily active users, session duration
- **Performance Metrics**: Inference latency, error rates
- **User Feedback**: Demo effectiveness and issues
#### Model Analytics
- **Download Stats**: Model popularity and usage
- **Citation Tracking**: Academic and research usage
- **Community Feedback**: GitHub issues and discussions
See also:
- [Architecture Overview](architecture.md)
- [Training Pipeline](training-pipeline.md)
- [Data Flow](data-flow.md)