Joseph Pollack
# Deployment Pipeline
```mermaid
graph TB
%% Input Sources
subgraph "Inputs"
TRAINED_MODEL[Trained Model<br/>Local directory]
TRAINING_CONFIG[Training Config<br/>JSON/YAML]
TRAINING_RESULTS[Training Results<br/>Metrics & logs]
MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
end
%% Model Publishing
subgraph "Model Publishing"
PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]
subgraph "Publishing Steps"
REPO_CREATION[Repository Creation<br/>HF Hub API]
FILE_UPLOAD[File Upload<br/>Model files to HF]
METADATA_UPLOAD[Metadata Upload<br/>Config & results]
end
end
%% Model Card Generation
subgraph "Model Card Generation"
CARD_SCRIPT[generate_model_card.py<br/>Card Generator]
subgraph "Card Components"
TEMPLATE_LOAD[Template Loading<br/>model_card.md]
VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
end
end
%% Demo Space Deployment
subgraph "Demo Space Deployment"
DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]
subgraph "Space Setup"
SPACE_CREATION[Space Repository<br/>Create HF Space]
TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
ENV_INJECTION[Environment Setup<br/>Model config injection]
SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
end
end
%% Space Building & Testing
subgraph "Space Building"
BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
end
%% Live Demo
subgraph "Live Demo Space"
GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
USER_INTERACTION[User Interaction<br/>Audio upload/playback]
end
%% External Services
subgraph "External Services"
HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
HF_SPACES[HF Spaces Platform<br/>Demo hosting]
end
%% Flow Connections
TRAINED_MODEL --> PUSH_SCRIPT
TRAINING_CONFIG --> PUSH_SCRIPT
TRAINING_RESULTS --> PUSH_SCRIPT
MODEL_METADATA --> PUSH_SCRIPT
PUSH_SCRIPT --> REPO_CREATION
REPO_CREATION --> FILE_UPLOAD
FILE_UPLOAD --> METADATA_UPLOAD
METADATA_UPLOAD --> CARD_SCRIPT
TRAINING_CONFIG --> CARD_SCRIPT
TRAINING_RESULTS --> CARD_SCRIPT
CARD_SCRIPT --> TEMPLATE_LOAD
TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING
CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
METADATA_UPLOAD --> DEPLOY_SCRIPT
DEPLOY_SCRIPT --> SPACE_CREATION
SPACE_CREATION --> TEMPLATE_COPY
TEMPLATE_COPY --> ENV_INJECTION
ENV_INJECTION --> SECRET_SETUP
SECRET_SETUP --> BUILD_TRIGGER
BUILD_TRIGGER --> DEPENDENCY_INSTALL
DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
MODEL_DOWNLOAD --> APP_INITIALIZATION
APP_INITIALIZATION --> GRADIO_INTERFACE
GRADIO_INTERFACE --> MODEL_INFERENCE
MODEL_INFERENCE --> USER_INTERACTION
HF_HUB --> MODEL_DOWNLOAD
HF_SPACES --> GRADIO_INTERFACE
%% Styling
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px
class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
class HF_HUB,HF_SPACES external
```
## Deployment Pipeline Overview
This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.
### Input Sources
#### Trained Model Artifacts
- **Model Files**: `model.safetensors`, `config.json`, `tokenizer.json`
- **Training Config**: Hyperparameters and training setup
- **Training Results**: Metrics, loss curves, evaluation results
- **Model Metadata**: Name, description, base model information
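
Before publishing, it can help to confirm that the expected artifacts are actually present. The following is a minimal sketch, assuming the three file names from the "Model Files" bullet are required; the helper name `missing_model_files` is hypothetical, and sharded checkpoints (several `.safetensors` files) would need a looser check.

```python
from pathlib import Path

# Assumed required files, copied from the "Model Files" bullet above.
REQUIRED_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means the directory is ready to be handed to the publisher; a non-empty list names what still has to be produced by training.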
### Model Publishing Phase
#### push_to_huggingface.py Script
```python
# Initialize publisher
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token
)
# Push model
success = pusher.push_model(training_config, results)
```
#### Publishing Steps
1. **Repository Creation**: Create HF Hub repository
2. **File Upload**: Upload all model files
3. **Metadata Upload**: Upload training config and results
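
The three steps above can be sketched against the `huggingface_hub` client. This is not the code of `push_to_huggingface.py`; it is an illustration that assumes `api` is a `huggingface_hub.HfApi` instance (or any object with the same `create_repo`, `upload_folder`, and `upload_file` methods). The function name `publish_model` and the metadata file names are hypothetical.

```python
import json


def publish_model(api, model_dir, repo_id, training_config, results):
    """Run the three publishing steps against an HfApi-like client."""
    # 1. Repository creation; exist_ok=True makes re-runs idempotent.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

    # 2. File upload: push the whole local model directory.
    api.upload_folder(repo_id=repo_id, folder_path=str(model_dir), repo_type="model")

    # 3. Metadata upload: config and results as JSON files.
    #    The file names below are assumptions made for this sketch.
    metadata = (
        ("training_config.json", training_config),
        ("training_results.json", results),
    )
    for name, payload in metadata:
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=name,
            repo_id=repo_id,
            repo_type="model",
        )
    return f"https://huggingface.co/{repo_id}"
```

Passing the client in as a parameter keeps the function testable without network access, since a stub object can stand in for `HfApi`.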
### Model Card Generation
#### generate_model_card.py Script
```python
# Create generator
generator = ModelCardGenerator()
# Generate card
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
content = generator.generate_model_card(variables)
```
#### Card Processing
1. **Template Loading**: Load from `templates/model_card.md`
2. **Variable Replacement**: Inject actual values
3. **Conditional Processing**: Handle optional sections
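
To make variable replacement and conditional sections concrete, here is a small standalone renderer. The template syntax (`{{name}}` placeholders and `{{#if name}}...{{/if}}` blocks) is an assumption for illustration; the actual syntax used by `templates/model_card.md` is not shown in this document. In practice the template text would be read from that file first.

```python
import re

# Hypothetical syntax: {{name}} placeholders, {{#if name}}...{{/if}} blocks.
_IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_card(template, variables):
    """Resolve conditional sections first, then substitute placeholders."""
    def keep_if_truthy(match):
        # Keep the block body only when the named variable is truthy.
        return match.group(2) if variables.get(match.group(1)) else ""

    text = _IF_BLOCK.sub(keep_if_truthy, template)
    # Unknown placeholders are left untouched so gaps stay visible.
    return _PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )


template = (
    "# {{model_name}}\n"
    "Base: {{base_model}}\n"
    "{{#if quantized}}Quantized variant available.\n{{/if}}"
)
card = render_card(
    template, {"model_name": "demo", "base_model": "base", "quantized": False}
)
```

Resolving conditionals before placeholders means a dropped section never triggers a lookup for variables that only exist for, say, quantized models.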
### Demo Space Deployment
#### deploy_demo_space.py Script
```python
# Initialize deployer
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral"
)
# Deploy space
success = deployer.deploy()
```
#### Space Setup Process
1. **Space Creation**: Create HF Space repository
2. **Template Copying**: Copy demo template files
3. **Environment Injection**: Set model-specific variables
4. **Secret Configuration**: Configure HF_TOKEN and model variables
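
The four setup steps map fairly directly onto `huggingface_hub` calls. The sketch below is an illustration rather than the contents of `deploy_demo_space.py`: it assumes `api` behaves like `huggingface_hub.HfApi`, that the demo uses the Gradio SDK, and that non-sensitive values are stored as Space variables while the token is stored as a secret. The function name `deploy_space` is hypothetical.

```python
def deploy_space(api, space_id, template_dir, model_id, model_name, hf_token):
    """Create a Space, upload the demo template, and configure it."""
    # 1. Space creation: a Space repository using the Gradio SDK.
    api.create_repo(
        repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True
    )

    # 2. Template copying: upload the demo template directory as-is.
    api.upload_folder(
        repo_id=space_id, folder_path=str(template_dir), repo_type="space"
    )

    # 3. Environment injection: non-sensitive values as Space variables.
    for key, value in {"HF_MODEL_ID": model_id, "MODEL_NAME": model_name}.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)

    # 4. Secret configuration: the token is stored as a secret, not a variable.
    #    Per the security notes in this document, prefer a read-only token.
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)

    return f"https://huggingface.co/spaces/{space_id}"
```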
### Space Building Process
#### Automatic Build Trigger
- **Dependency Installation**: `pip install -r requirements.txt`
- **Model Download**: Download model from HF Hub
- **App Initialization**: Setup Gradio application
#### Demo Template Structure
```
templates/spaces/demo_voxtral/
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
└── README.md # Space documentation
```
### Live Demo Features
#### Gradio Interface
- **Audio Upload**: File upload or recording
- **Real-time Inference**: Live ASR transcription
- **Interactive Controls**: Model parameters, settings
#### Model Inference Pipeline
- **Audio Processing**: Convert to model inputs
- **Transcription Generation**: Run ASR inference
- **Result Display**: Show transcription with confidence
### Configuration Management
#### Environment Variables
```python
import os

# Set in Space secrets/environment
os.environ['HF_MODEL_ID'] = model_id
os.environ['MODEL_NAME'] = model_name
os.environ['HF_TOKEN'] = token # For model access
```
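
On the demo side, the app has to read these values back at startup. A minimal sketch of that step, assuming the three variable names shown above; the `DemoConfig` dataclass and `load_demo_config` function are hypothetical names, not part of the demo template.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    model_id: str
    model_name: str
    token: Optional[str]


def load_demo_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Read demo settings from the environment, failing early if unusable."""
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        raise RuntimeError("HF_MODEL_ID is not set; cannot choose a model to load")
    return DemoConfig(
        model_id=model_id,
        # Fall back to the repo id when no display name was provided.
        model_name=env.get("MODEL_NAME", model_id),
        # An empty string is treated the same as a missing token.
        token=env.get("HF_TOKEN") or None,
    )
```

Failing at startup with a clear message tends to be easier to diagnose from the build logs than a model-loading error later on.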
#### Demo-Specific Settings
- **Model Configuration**: Base model, subfolder, quantization
- **UI Branding**: Custom titles, descriptions, links
- **Example Prompts**: Pre-configured demo examples
### Error Handling & Monitoring
#### Build Process Monitoring
- **Build Logs**: Real-time build status
- **Error Detection**: Failed dependency installation
- **Retry Logic**: Automatic rebuild on failure
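
How the retry logic is implemented is not specified here. One common client-side approach is to wrap a deployment step in a retry loop with exponential backoff, sketched below; the helper name `run_with_retries` and its defaults are assumptions.

```python
import time


def run_with_retries(step, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the loop can be exercised in tests without actually waiting.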
#### Runtime Monitoring
- **Space Health**: Uptime and responsiveness
- **Model Loading**: Successful model initialization
- **Inference Errors**: Runtime error handling
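
A basic health check can poll the Space's stage until it reports that it is running. The sketch below assumes `api` behaves like `huggingface_hub.HfApi`, whose `get_space_runtime` returns an object with a `stage` attribute, and that stage values such as `RUNNING`, `BUILD_ERROR`, `RUNTIME_ERROR`, and `CONFIG_ERROR` are used; the function name and the timeout defaults are hypothetical.

```python
import time


def wait_until_running(api, space_id, timeout_s=600, poll_s=10,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll the Space stage until it is RUNNING, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        stage = api.get_space_runtime(repo_id=space_id).stage
        if stage == "RUNNING":
            return True
        # Terminal failure stages: no point in waiting any longer.
        if stage in ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"):
            raise RuntimeError(f"Space {space_id} entered stage {stage}")
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A `False` return means the Space was still building when the timeout expired, which is distinct from a hard failure and may simply call for a longer wait.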
### Security Considerations
#### Token Management
- **Read-Only Tokens**: Use read-only tokens for demo spaces
- **Secret Storage**: Secure storage of HF_TOKEN
- **Access Control**: Proper repository permissions
#### Resource Management
- **Memory Limits**: Space hardware constraints
- **Timeout Handling**: Inference timeout protection
- **Rate Limiting**: Prevent abuse
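
For the rate-limiting point, one simple option is a per-client sliding-window limiter placed in front of the inference function. This is a sketch under assumptions: the class name, the limits, and the idea of keying on a client identifier are illustrative and not taken from the demo code.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_calls per client within any period_s window."""

    def __init__(self, max_calls, period_s, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock
        self._calls = defaultdict(deque)  # client_id -> recent call times

    def allow(self, client_id):
        now = self.clock()
        window = self._calls[client_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.period_s:
            window.popleft()
        if len(window) >= self.max_calls:
            return False
        window.append(now)
        return True
```

A request handler would call `limiter.allow(client_id)` before running inference and return a "please retry later" message when it yields `False`.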
### Integration Points
#### With Training Scripts
- **Training Config**: Used for model card generation
- **Training Results**: Included in model metadata
- **Model Path**: Direct path to trained model files
#### With Interface (interface.py)
- **Parameter Passing**: Deployment settings from UI
- **Progress Updates**: Deployment progress to user
- **Result Links**: Direct links to deployed spaces
### Deployment Workflows
#### Full Pipeline (Recommended)
1. Train model → Generate model card → Push to Hub → Deploy demo
2. All steps automated through single interface action
3. Comprehensive error handling and rollback
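
The rollback behavior can be pictured as a list of steps, each paired with an optional undo action. The runner below is a generic sketch, not the project's orchestration code; `run_pipeline` and the `(name, action, undo)` tuple shape are assumptions.

```python
def run_pipeline(steps):
    """Run (name, action, undo) steps in order; undo finished ones on failure."""
    completed = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            # Undo completed steps in reverse order, then report the failure.
            for _, done_undo in reversed(completed):
                if done_undo is not None:
                    done_undo()
            finished = [done_name for done_name, _ in completed]
            raise RuntimeError(
                f"step '{name}' failed; rolled back {finished}"
            ) from exc
        completed.append((name, undo))
    return [done_name for done_name, _ in completed]
```

For example, the undo for "Push to Hub" might delete the newly created repository, while a step such as "Generate model card" may need no undo at all and can pass `None`.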
#### Manual Deployment
1. Use individual scripts for granular control
2. Custom configuration and branding
3. Debugging and troubleshooting capabilities
#### CI/CD Integration
- **Automated Triggers**: GitHub Actions integration
- **Version Control**: Model versioning and releases
- **Testing**: Automated demo testing
### Performance Optimization
#### Space Hardware Selection
- **CPU Basic**: Free tier, sufficient for small models
- **GPU Options**: For larger models requiring acceleration
- **Memory Scaling**: Based on model size requirements
#### Model Optimization
- **Quantization**: 4-bit quantization for smaller footprint
- **Model Sharding**: Split large models across memory
- **Caching**: Model caching for faster cold starts
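
A rough way to relate quantization to hardware choice is to estimate weight storage as parameters × bits ÷ 8. The sketch below uses a hypothetical 3-billion-parameter model; it ignores activations, caches, and quantization overhead such as scale factors, so real memory use will be higher.

```python
def weight_footprint_gb(n_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical 3-billion-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_footprint_gb(3e9, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimate from 6.0 GB to 1.5 GB, which can be the difference between needing a larger hardware tier and fitting on a smaller one.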
### Monitoring & Analytics
#### Space Analytics
- **Usage Metrics**: Daily active users, session duration
- **Performance Metrics**: Inference latency, error rates
- **User Feedback**: Demo effectiveness and issues
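
How these metrics are collected is not described here. As an illustration, inference latency and error rate can be summarized from per-request records with nearest-rank percentiles; the function name and record shape below are assumptions.

```python
import math


def summarize_requests(records):
    """Summarize (latency_seconds, succeeded) pairs into basic metrics."""
    records = list(records)
    if not records:
        return {"count": 0, "error_rate": 0.0, "p50_s": None, "p95_s": None}

    latencies = sorted(latency for latency, _ in records)

    def percentile(p):
        # Nearest-rank method: smallest value with at least p% at or below it.
        rank = max(1, math.ceil(p / 100 * len(latencies)))
        return latencies[rank - 1]

    errors = sum(1 for _, succeeded in records if not succeeded)
    return {
        "count": len(records),
        "error_rate": errors / len(records),
        "p50_s": percentile(50),
        "p95_s": percentile(95),
    }
```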
#### Model Analytics
- **Download Stats**: Model popularity and usage
- **Citation Tracking**: Academic and research usage
- **Community Feedback**: GitHub issues and discussions
See also:
- [Architecture Overview](architecture.md)
- [Training Pipeline](training-pipeline.md)
- [Data Flow](data-flow.md)