Deployment Pipeline
graph TB
%% Input Sources
subgraph "Inputs"
TRAINED_MODEL[Trained Model<br/>Local directory]
TRAINING_CONFIG[Training Config<br/>JSON/YAML]
TRAINING_RESULTS[Training Results<br/>Metrics & logs]
MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
end
%% Model Publishing
subgraph "Model Publishing"
PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]
subgraph "Publishing Steps"
REPO_CREATION[Repository Creation<br/>HF Hub API]
FILE_UPLOAD[File Upload<br/>Model files to HF]
METADATA_UPLOAD[Metadata Upload<br/>Config & results]
end
end
%% Model Card Generation
subgraph "Model Card Generation"
CARD_SCRIPT[generate_model_card.py<br/>Card Generator]
subgraph "Card Components"
TEMPLATE_LOAD[Template Loading<br/>model_card.md]
VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
end
end
%% Demo Space Deployment
subgraph "Demo Space Deployment"
DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]
subgraph "Space Setup"
SPACE_CREATION[Space Repository<br/>Create HF Space]
TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
ENV_INJECTION[Environment Setup<br/>Model config injection]
SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
end
end
%% Space Building & Testing
subgraph "Space Building"
BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
end
%% Live Demo
subgraph "Live Demo Space"
GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
USER_INTERACTION[User Interaction<br/>Audio upload/playback]
end
%% External Services
subgraph "External Services"
HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
HF_SPACES[HF Spaces Platform<br/>Demo hosting]
end
%% Flow Connections
TRAINED_MODEL --> PUSH_SCRIPT
TRAINING_CONFIG --> PUSH_SCRIPT
TRAINING_RESULTS --> PUSH_SCRIPT
MODEL_METADATA --> PUSH_SCRIPT
PUSH_SCRIPT --> REPO_CREATION
REPO_CREATION --> FILE_UPLOAD
FILE_UPLOAD --> METADATA_UPLOAD
METADATA_UPLOAD --> CARD_SCRIPT
TRAINING_CONFIG --> CARD_SCRIPT
TRAINING_RESULTS --> CARD_SCRIPT
CARD_SCRIPT --> TEMPLATE_LOAD
TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING
CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
METADATA_UPLOAD --> DEPLOY_SCRIPT
DEPLOY_SCRIPT --> SPACE_CREATION
SPACE_CREATION --> TEMPLATE_COPY
TEMPLATE_COPY --> ENV_INJECTION
ENV_INJECTION --> SECRET_SETUP
SECRET_SETUP --> BUILD_TRIGGER
BUILD_TRIGGER --> DEPENDENCY_INSTALL
DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
MODEL_DOWNLOAD --> APP_INITIALIZATION
APP_INITIALIZATION --> GRADIO_INTERFACE
GRADIO_INTERFACE --> MODEL_INFERENCE
MODEL_INFERENCE --> USER_INTERACTION
HF_HUB --> MODEL_DOWNLOAD
HF_SPACES --> GRADIO_INTERFACE
%% Styling
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px
class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
class HF_HUB,HF_SPACES external
Deployment Pipeline Overview
This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.
Input Sources
Trained Model Artifacts
- Model Files: model.safetensors, config.json, tokenizer.json
- Training Config: Hyperparameters and training setup
- Training Results: Metrics, loss curves, evaluation results
- Model Metadata: Name, description, base model information
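Before publishing, it can help to confirm that the local model directory actually contains the files listed above. The following is a minimal sketch, not part of the project's scripts: the helper name `missing_model_files` is hypothetical, and treating all three files as mandatory is an assumption (sharded checkpoints, for example, use different weight file names).

```python
from pathlib import Path

# File names taken from the list above; whether all three are mandatory
# for every checkpoint is an assumption of this sketch.
EXPECTED_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty return value means the directory looks complete; anything else can be reported to the user before any upload starts.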
Model Publishing Phase
push_to_huggingface.py Script
# Initialize publisher
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token
)
# Push model
success = pusher.push_model(training_config, results)
Publishing Steps
- Repository Creation: Create HF Hub repository
- File Upload: Upload all model files
- Metadata Upload: Upload training config and results
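The three publishing steps can be sketched against the `huggingface_hub` client. This is an illustration under stated assumptions, not the actual body of `push_to_huggingface.py`: the function name `publish_model` and the file names `training_config.json` and `training_results.json` are hypothetical. The `api` argument is expected to behave like `huggingface_hub.HfApi`, whose `create_repo`, `upload_folder`, and `upload_file` methods are called with keyword arguments.

```python
import json


def publish_model(api, repo_id, model_dir, training_config, results):
    """Run the three publishing steps against an HfApi-like client."""
    # 1. Repository creation (exist_ok avoids failing on re-runs).
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # 2. File upload: push the whole local model directory.
    api.upload_folder(repo_id=repo_id, folder_path=str(model_dir))
    # 3. Metadata upload: config and results as JSON files
    #    (file names are hypothetical).
    for name, payload in (("training_config.json", training_config),
                          ("training_results.json", results)):
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=name,
            repo_id=repo_id,
        )
    return f"https://huggingface.co/{repo_id}"
```

Passing the client in as a parameter keeps the function testable with a stub; in real use one would pass `HfApi(token=hf_token)`.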
Model Card Generation
generate_model_card.py Script
# Create generator
generator = ModelCardGenerator()
# Generate card
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
content = generator.generate_model_card(variables)
Card Processing
- Template Loading: Load from templates/model_card.md
- Variable Replacement: Inject actual values
- Conditional Processing: Handle optional sections
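Variable replacement and conditional sections can be illustrated with a small renderer. The template syntax used here (`{{name}}` placeholders and `{{#if flag}}...{{/if}}` blocks) is an assumption made for the example; the real `templates/model_card.md` may use a different syntax, and `render_card` is a hypothetical name rather than a method of `ModelCardGenerator`.

```python
import re

# Hypothetical syntax: {{#if flag}}...{{/if}} blocks and {{name}} placeholders.
_IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render_card(template, variables):
    """Resolve conditional sections first, then substitute variables."""
    def keep_or_drop(match):
        # Keep the block body only when the flag is present and truthy.
        return match.group(2) if variables.get(match.group(1)) else ""

    text = _IF_BLOCK.sub(keep_or_drop, template)
    # Unknown placeholders are left untouched so gaps stay visible.
    return _VARIABLE.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )
```

Resolving conditionals before substituting values means a section for quantized models disappears entirely when the flag is unset, instead of leaving empty placeholders behind. Nested conditionals are not handled by this sketch.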
Demo Space Deployment
deploy_demo_space.py Script
# Initialize deployer
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral"
)
# Deploy space
success = deployer.deploy()
Space Setup Process
- Space Creation: Create HF Space repository
- Template Copying: Copy demo template files
- Environment Injection: Set model-specific variables
- Secret Configuration: Configure HF_TOKEN and model variables
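The four setup steps map fairly directly onto `huggingface_hub` calls. The sketch below is an assumption about how they could be wired together, not the actual implementation of `DemoSpaceDeployer`; `deploy_space` is a hypothetical name, and `api` is expected to behave like `huggingface_hub.HfApi` (`create_repo`, `upload_folder`, `add_space_variable`, `add_space_secret`).

```python
def deploy_space(api, space_id, template_dir, model_id, hf_token):
    """Run the four Space setup steps against an HfApi-like client."""
    # 1. Space creation: a Gradio Space repository.
    api.create_repo(repo_id=space_id, repo_type="space",
                    space_sdk="gradio", exist_ok=True)
    # 2. Template copying: upload the demo template files.
    api.upload_folder(repo_id=space_id, repo_type="space",
                      folder_path=str(template_dir))
    # 3. Environment injection: non-sensitive, publicly visible variable.
    api.add_space_variable(repo_id=space_id, key="HF_MODEL_ID", value=model_id)
    # 4. Secret configuration: the token is stored as a secret, not a variable.
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=hf_token)
    return f"https://huggingface.co/spaces/{space_id}"
```

Keeping the token in a secret and the model ID in a variable reflects the split between sensitive and non-sensitive settings; uploading files is what triggers the Space build.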
Space Building Process
Automatic Build Trigger
- Dependency Installation: pip install -r requirements.txt
- Model Download: Download model from HF Hub
- App Initialization: Setup Gradio application
Demo Template Structure
templates/spaces/demo_voxtral/
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
└── README.md # Space documentation
Live Demo Features
Gradio Interface
- Audio Upload: File upload or recording
- Real-time Inference: Live ASR transcription
- Interactive Controls: Model parameters, settings
Model Inference Pipeline
- Audio Processing: Convert to model inputs
- Transcription Generation: Run ASR inference
- Result Display: Show transcription with confidence
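The three inference stages can be sketched without committing to a particular model API. Everything here is illustrative: `pcm16_to_mono_float` and `run_demo_inference` are hypothetical names, the `transcribe` callable stands in for whatever wraps the model, and the assumption that audio arrives as interleaved 16-bit PCM samples may not match the real demo.

```python
def pcm16_to_mono_float(samples, channels=1):
    """Convert interleaved 16-bit PCM samples to mono floats in [-1.0, 1.0)."""
    if channels < 1 or len(samples) % channels:
        raise ValueError("sample count must be a multiple of channels")
    frames = (samples[i:i + channels] for i in range(0, len(samples), channels))
    # Average the channels of each frame, then scale to the float range.
    return [sum(frame) / (channels * 32768.0) for frame in frames]


def run_demo_inference(samples, transcribe, channels=1):
    """Audio processing -> transcription -> display-ready text."""
    audio = pcm16_to_mono_float(samples, channels)
    if not audio:
        return "No audio received."
    text = transcribe(audio)  # hypothetical callable wrapping the model
    return text.strip() or "(no speech detected)"
```

Separating the audio conversion from the model call makes it possible to test the preprocessing on its own and to swap the model wrapper without touching the UI code.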
Configuration Management
Environment Variables
import os

# Set in Space secrets/environment
os.environ['HF_MODEL_ID'] = model_id
os.environ['MODEL_NAME'] = model_name
os.environ['HF_TOKEN'] = token # For model access
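On the demo side, the application would read these values back at startup. A minimal sketch, assuming `HF_MODEL_ID` is required while `MODEL_NAME` and `HF_TOKEN` are optional; the `DemoSettings` class and `load_settings` function are hypothetical and not taken from the demo's `app.py`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemoSettings:
    model_id: str
    model_name: str
    token: Optional[str]


def load_settings(env=None):
    """Read the demo configuration from environment variables."""
    env = os.environ if env is None else env
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        raise RuntimeError("HF_MODEL_ID is not set")
    # Fall back to the last path component of the repo id for display.
    model_name = env.get("MODEL_NAME") or model_id.split("/")[-1]
    return DemoSettings(model_id, model_name, env.get("HF_TOKEN"))
```

Failing early on a missing model ID surfaces a misconfigured Space in the startup logs instead of at the first inference request.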
Demo-Specific Settings
- Model Configuration: Base model, subfolder, quantization
- UI Branding: Custom titles, descriptions, links
- Example Prompts: Pre-configured demo examples
Error Handling & Monitoring
Build Process Monitoring
- Build Logs: Real-time build status
- Error Detection: Failed dependency installation
- Retry Logic: Automatic rebuild on failure
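Build monitoring with a bounded number of rebuilds can be sketched as a polling loop. This is an assumption about how such logic might look, not the project's implementation. `wait_for_space` is a hypothetical name; `get_stage` and `restart` are injected callables which, with `huggingface_hub`, could for instance wrap `HfApi.get_space_runtime(repo_id).stage` and `HfApi.restart_space(repo_id)`. The stage names follow the `SpaceStage` values in that library.

```python
import time


def wait_for_space(get_stage, restart, max_restarts=2, poll_seconds=10.0,
                   max_polls=60, sleep=time.sleep):
    """Poll a Space's stage; restart on failure up to max_restarts times.

    Returns True once the stage is RUNNING, False if the restart budget
    or the polling budget is exhausted.
    """
    restarts = 0
    for _ in range(max_polls):
        stage = get_stage()
        if stage == "RUNNING":
            return True
        if stage in ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"):
            if restarts >= max_restarts:
                return False
            restart()
            restarts += 1
        sleep(poll_seconds)
    return False  # still not running after max_polls checks
```

Capping both restarts and polls keeps a persistently broken build, such as an unsatisfiable `requirements.txt`, from looping forever.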
Runtime Monitoring
- Space Health: Uptime and responsiveness
- Model Loading: Successful model initialization
- Inference Errors: Runtime error handling
Security Considerations
Token Management
- Read-Only Tokens: Use read-only tokens for demo spaces
- Secret Storage: Secure storage of HF_TOKEN
- Access Control: Proper repository permissions
Resource Management
- Memory Limits: Space hardware constraints
- Timeout Handling: Inference timeout protection
- Rate Limiting: Prevent abuse
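Timeout protection and rate limiting can both be approximated with the standard library. The sketch below shows one possible approach, a per-client sliding-window limiter and a thread-based timeout wrapper; the names `SlidingWindowLimiter` and `run_with_timeout` are hypothetical, and nothing here is claimed to be how the demo or the Spaces platform enforces limits.

```python
import time
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class SlidingWindowLimiter:
    """Allow at most max_calls per client within the last `window` seconds."""

    def __init__(self, max_calls, window, clock=time.monotonic):
        self.max_calls, self.window, self.clock = max_calls, window, clock
        self._calls = defaultdict(deque)

    def allow(self, client_id):
        now = self.clock()
        calls = self._calls[client_id]
        while calls and now - calls[0] >= self.window:
            calls.popleft()  # forget calls that have left the window
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


_POOL = ThreadPoolExecutor(max_workers=1)


def run_with_timeout(fn, timeout_seconds, *args):
    """Return fn(*args), or None if it does not finish in time."""
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The worker thread keeps running; only the caller stops waiting.
        return None
```

Note that a thread-based timeout cannot interrupt the underlying computation, so a stuck inference call still occupies the single worker until it returns.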
Integration Points
With Training Scripts
- Training Config: Used for model card generation
- Training Results: Included in model metadata
- Model Path: Direct path to trained model files
With Interface (interface.py)
- Parameter Passing: Deployment settings from UI
- Progress Updates: Deployment progress to user
- Result Links: Direct links to deployed spaces
Deployment Workflows
Full Pipeline (Recommended)
- Train model → Generate model card → Push to Hub → Deploy demo
- All steps automated through a single interface action
- Comprehensive error handling and rollback
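The combination of sequential steps and rollback can be sketched generically. This is an illustration of the pattern, assuming each step can supply an optional undo action (for a created repository, that might be a call such as `HfApi.delete_repo`); `run_pipeline` is a hypothetical helper and does not describe how the interface actually implements rollback.

```python
def run_pipeline(steps):
    """Run (name, action, undo) steps in order; undo completed steps on failure.

    Returns (True, None) on success or (False, failed_step_name).
    """
    completed = []
    for name, action, undo in steps:
        try:
            action()
        except Exception:
            # Roll back in reverse order; undo may be None for steps
            # that leave nothing behind.
            for _, done_undo in reversed(completed):
                if done_undo is not None:
                    done_undo()
            return False, name
        completed.append((name, undo))
    return True, None
```

Returning the name of the failed step lets the interface report exactly where the pipeline stopped.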
Manual Deployment
- Use individual scripts for granular control
- Custom configuration and branding
- Debugging and troubleshooting capabilities
CI/CD Integration
- Automated Triggers: GitHub Actions integration
- Version Control: Model versioning and releases
- Testing: Automated demo testing
Performance Optimization
Space Hardware Selection
- CPU Basic: Free tier, sufficient for small models
- GPU Options: For larger models requiring acceleration
- Memory Scaling: Based on model size requirements
Model Optimization
- Quantization: 4-bit quantization for smaller footprint
- Model Sharding: Split large models across memory
- Caching: Model caching for faster cold starts
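The effect of quantization on footprint follows from simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below makes that estimate explicit; `weight_footprint_gib` is a hypothetical helper, and the estimate covers weights only, ignoring activations, caches, and the small overhead of quantization scales.

```python
# Nominal storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gib(num_params, dtype):
    """Approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30
```

For a hypothetical 3-billion-parameter model, this gives about 5.6 GiB in bf16 and about 1.4 GiB with 4-bit weights, a fourfold reduction that can decide whether a model fits a given Space hardware tier.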
Monitoring & Analytics
Space Analytics
- Usage Metrics: Daily active users, session duration
- Performance Metrics: Inference latency, error rates
- User Feedback: Demo effectiveness and issues
Model Analytics
- Download Stats: Model popularity and usage
- Citation Tracking: Academic and research usage
- Community Feedback: GitHub issues and discussions