Deployment Pipeline

graph TB
    %% Input Sources
    subgraph "Inputs"
        TRAINED_MODEL[Trained Model<br/>Local directory]
        TRAINING_CONFIG[Training Config<br/>JSON/YAML]
        TRAINING_RESULTS[Training Results<br/>Metrics & logs]
        MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
    end

    %% Model Publishing
    subgraph "Model Publishing"
        PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]

        subgraph "Publishing Steps"
            REPO_CREATION[Repository Creation<br/>HF Hub API]
            FILE_UPLOAD[File Upload<br/>Model files to HF]
            METADATA_UPLOAD[Metadata Upload<br/>Config & results]
        end
    end

    %% Model Card Generation
    subgraph "Model Card Generation"
        CARD_SCRIPT[generate_model_card.py<br/>Card Generator]

        subgraph "Card Components"
            TEMPLATE_LOAD[Template Loading<br/>model_card.md]
            VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
            CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
        end
    end

    %% Demo Space Deployment
    subgraph "Demo Space Deployment"
        DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]

        subgraph "Space Setup"
            SPACE_CREATION[Space Repository<br/>Create HF Space]
            TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
            ENV_INJECTION[Environment Setup<br/>Model config injection]
            SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
        end
    end

    %% Space Building & Testing
    subgraph "Space Building"
        BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
        DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
        MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
        APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
    end

    %% Live Demo
    subgraph "Live Demo Space"
        GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
        MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
        USER_INTERACTION[User Interaction<br/>Audio upload/playback]
    end

    %% External Services
    subgraph "External Services"
        HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
        HF_SPACES[HF Spaces Platform<br/>Demo hosting]
    end

    %% Flow Connections
    TRAINED_MODEL --> PUSH_SCRIPT
    TRAINING_CONFIG --> PUSH_SCRIPT
    TRAINING_RESULTS --> PUSH_SCRIPT
    MODEL_METADATA --> PUSH_SCRIPT

    PUSH_SCRIPT --> REPO_CREATION
    REPO_CREATION --> FILE_UPLOAD
    FILE_UPLOAD --> METADATA_UPLOAD

    METADATA_UPLOAD --> CARD_SCRIPT
    TRAINING_CONFIG --> CARD_SCRIPT
    TRAINING_RESULTS --> CARD_SCRIPT

    CARD_SCRIPT --> TEMPLATE_LOAD
    TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
    VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING

    CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
    METADATA_UPLOAD --> DEPLOY_SCRIPT

    DEPLOY_SCRIPT --> SPACE_CREATION
    SPACE_CREATION --> TEMPLATE_COPY
    TEMPLATE_COPY --> ENV_INJECTION
    ENV_INJECTION --> SECRET_SETUP

    SECRET_SETUP --> BUILD_TRIGGER
    BUILD_TRIGGER --> DEPENDENCY_INSTALL
    DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
    MODEL_DOWNLOAD --> APP_INITIALIZATION

    APP_INITIALIZATION --> GRADIO_INTERFACE
    GRADIO_INTERFACE --> MODEL_INFERENCE
    MODEL_INFERENCE --> USER_INTERACTION

    HF_HUB --> MODEL_DOWNLOAD
    HF_SPACES --> GRADIO_INTERFACE

    %% Styling
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px

    class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
    class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
    class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
    class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
    class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
    class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
    class HF_HUB,HF_SPACES external

Deployment Pipeline Overview

This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.

Input Sources

Trained Model Artifacts

Model Files: model.safetensors, config.json, tokenizer.json
Training Config: Hyperparameters and training setup
Training Results: Metrics, loss curves, evaluation results
Model Metadata: Name, description, base model information
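Before publishing, it can help to fail fast when the output directory is incomplete. The sketch below is a hypothetical pre-flight check (missing_artifacts is not part of the project's scripts). It treats config.json and at least one *.safetensors file as required, so sharded checkpoints such as model-00001-of-00002.safetensors also pass, and lets the caller name extra files such as tokenizer.json.

```python
from pathlib import Path


def missing_artifacts(model_dir, extra_files=("tokenizer.json",)):
    """Return a list of problems found in a trained-model directory.

    Hypothetical check: requires config.json, at least one *.safetensors
    file (single or sharded), plus every name in extra_files.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    for name in ("config.json", *extra_files):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    # Accept both model.safetensors and sharded model-0000X-of-0000Y files
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weights found")
    return problems
```

A caller could stop early with something like: problems = missing_artifacts(output_dir), then raise if the list is non-empty.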

Model Publishing Phase

push_to_huggingface.py Script

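# Assumes push_to_huggingface.py is importable (e.g. run from its directory)
from push_to_huggingface import HuggingFacePusher

# output_dir, repo_name, hf_token, training_config and results
# are assumed to come from the finished training run

# Initialize publisher
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token
)

# Push model files, training config and results; returns a success flag
success = pusher.push_model(training_config, results)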

Publishing Steps

Repository Creation: Create HF Hub repository
File Upload: Upload all model files
Metadata Upload: Upload training config and results
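The three publishing steps map naturally onto methods of huggingface_hub.HfApi (create_repo, upload_folder, upload_file). The sketch below assumes that mapping; it is not the project's HuggingFacePusher implementation. The API object is passed in so that a real HfApi instance can be used in production and a stand-in object in tests. The file names training_config.json and training_results.json are hypothetical.

```python
import json


def publish_model(api, model_dir, repo_id, training_config, results, private=False):
    """Create the repo, upload the model folder, then the metadata files.

    `api` is expected to behave like huggingface_hub.HfApi.
    Returns the list of metadata paths uploaded.
    """
    # Step 1: repository creation (no error if the repo already exists)
    api.create_repo(repo_id, repo_type="model", private=private, exist_ok=True)

    # Step 2: upload every file in the trained-model directory
    api.upload_folder(
        folder_path=str(model_dir),
        repo_id=repo_id,
        repo_type="model",
        commit_message="Upload trained model",
    )

    # Step 3: upload training config and results as JSON (hypothetical names)
    uploaded = []
    for path_in_repo, payload in (
        ("training_config.json", training_config),
        ("training_results.json", results),
    ):
        api.upload_file(
            path_or_fileobj=json.dumps(payload, indent=2).encode("utf-8"),
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="model",
            commit_message=f"Add {path_in_repo}",
        )
        uploaded.append(path_in_repo)
    return uploaded
```

In real use this would be called as publish_model(HfApi(token=hf_token), output_dir, repo_id, training_config, results), with HfApi imported from huggingface_hub.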

Model Card Generation

generate_model_card.py Script

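# Assumes generate_model_card.py is importable
from generate_model_card import ModelCardGenerator

# Create generator
generator = ModelCardGenerator()

# Generate card; model_name, repo_id and base_model come from the training run
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
content = generator.generate_model_card(variables)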

Card Processing

Template Loading: Load from templates/model_card.md
Variable Replacement: Inject actual values
Conditional Processing: Handle optional sections
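The three card-processing steps can be illustrated with a tiny renderer. The placeholder syntax here ({{name}} for variables and {{#if flag}}...{{/if}} for optional sections) is an assumption for illustration; check templates/model_card.md for the syntax that ModelCardGenerator actually uses. Nested conditionals are not supported in this sketch.

```python
import re

# Assumed syntax: {{#if flag}} ... {{/if}} blocks and {{name}} placeholders
_IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def load_template(path="templates/model_card.md"):
    """Template loading step."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def render_card(template, variables):
    """Render a model-card template: conditionals first, then variables."""
    # Conditional processing: keep a block's body only if its flag is truthy
    text = _IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", template
    )
    # Variable replacement: unknown placeholders stay visible for review
    return _VARIABLE.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )
```

Leaving unknown placeholders in place, rather than replacing them with empty strings, makes a forgotten variable easy to spot in the published card.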

Demo Space Deployment

deploy_demo_space.py Script

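# Assumes deploy_demo_space.py is importable
from deploy_demo_space import DemoSpaceDeployer

# Initialize deployer with the Voxtral demo template
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral"
)

# Deploy space; returns a success flag
success = deployer.deploy()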

Space Setup Process

Space Creation: Create HF Space repository
Template Copying: Copy demo template files
Environment Injection: Set model-specific variables
Secret Configuration: Configure HF_TOKEN and model variables
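One possible mapping of the four setup steps onto huggingface_hub.HfApi is sketched below: create_repo with repo_type="space" and space_sdk="gradio", upload_folder for the template files, and add_space_variable / add_space_secret for configuration. This is an assumption about how a deployer like DemoSpaceDeployer could work, not its actual code. The variable names mirror the ones listed under Environment Variables, and api is passed in so a stand-in can be used for testing.

```python
def setup_space(api, space_id, template_dir, model_id, model_name, hf_token=None):
    """Create a Gradio Space, copy the demo template, and configure it.

    `api` is expected to behave like huggingface_hub.HfApi.
    Returns the public URL of the Space.
    """
    # Space creation
    api.create_repo(space_id, repo_type="space", space_sdk="gradio", exist_ok=True)

    # Template copying: app.py, requirements.txt, README.md
    api.upload_folder(
        folder_path=str(template_dir),
        repo_id=space_id,
        repo_type="space",
        commit_message="Add Voxtral demo template",
    )

    # Environment injection: public, non-sensitive settings as Space variables
    for key, value in {"HF_MODEL_ID": model_id, "MODEL_NAME": model_name}.items():
        api.add_space_variable(space_id, key, value)

    # Secret configuration: only when a token is needed (e.g. a private model)
    if hf_token:
        api.add_space_secret(space_id, "HF_TOKEN", hf_token)
    return f"https://huggingface.co/spaces/{space_id}"
```

Keeping the model ID as a variable and the token as a secret means the token never appears in the Space's public settings or files.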

Space Building Process

Automatic Build Trigger

Dependency Installation: pip install -r requirements.txt
Model Download: Download model from HF Hub
App Initialization: Set up the Gradio application

Demo Template Structure

templates/spaces/demo_voxtral/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # Space config (YAML front matter) and docs

Live Demo Features

Gradio Interface

Audio Upload: File upload or recording
Real-time Inference: Live ASR transcription
Interactive Controls: Model parameters, settings

Model Inference Pipeline

Audio Processing: Convert to model inputs
Transcription Generation: Run ASR inference
Result Display: Show transcription with confidence
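The Audio Processing step usually means bringing whatever the UI hands over into the format the model's processor expects. This sketch assumes gr.Audio(type="numpy"), which yields a (sample_rate, samples) pair, typically with 16-bit integer samples, and assumes a 16 kHz mono target, as is common for Whisper-style audio encoders; confirm the rate against the model's processor config. Plain lists keep it dependency-free. The linear interpolation has no anti-aliasing filter, so a real app would rather use a proper resampler such as librosa.resample or torchaudio.functional.resample.

```python
TARGET_RATE = 16_000  # assumed model input rate; check the processor config


def to_mono_float(samples):
    """Convert int16 samples (mono, or frames of channels) to mono floats in [-1, 1]."""
    mono = []
    for frame in samples:
        if isinstance(frame, (list, tuple)):
            value = sum(frame) / len(frame)  # downmix by averaging channels
        else:
            value = frame
        mono.append(value / 32768.0)  # int16 full scale
    return mono


def resample_linear(signal, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (no low-pass filtering)."""
    if src_rate == dst_rate or not signal:
        return list(signal)
    n_out = int(round(len(signal) * dst_rate / src_rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[right] * frac)
    return out


def prepare_audio(gradio_audio):
    """Turn a (sample_rate, samples) pair into 16 kHz mono floats."""
    sample_rate, samples = gradio_audio
    if hasattr(samples, "tolist"):  # numpy arrays -> nested Python lists
        samples = samples.tolist()
    return resample_linear(to_mono_float(samples), sample_rate)
```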

Configuration Management

Environment Variables

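import os

# Configured as Space secrets/variables; the Space exposes them
# to app.py as environment variables at runtime
model_id = os.environ["HF_MODEL_ID"]
model_name = os.environ.get("MODEL_NAME", model_id)
hf_token = os.environ.get("HF_TOKEN")  # For private or gated model access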

Demo-Specific Settings

Model Configuration: Base model, subfolder, quantization
UI Branding: Custom titles, descriptions, links
Example Prompts: Pre-configured demo examples
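Inside app.py, the environment variables and demo settings can be gathered into one object at startup. In this sketch only HF_MODEL_ID, MODEL_NAME and HF_TOKEN come from this document; MODEL_SUBFOLDER, QUANTIZATION, DEMO_TITLE and the defaults are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DemoConfig:
    model_id: str
    model_name: str
    hf_token: Optional[str] = None
    subfolder: Optional[str] = None     # hypothetical: weights in a repo subfolder
    quantization: Optional[str] = None  # hypothetical: e.g. "4bit"
    title: str = "Voxtral ASR Demo"     # hypothetical UI branding default

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DemoConfig":
        """Build the config from Space variables/secrets; fail fast if unset."""
        model_id = env.get("HF_MODEL_ID")
        if not model_id:
            raise RuntimeError("HF_MODEL_ID is not set in the Space settings")
        return cls(
            model_id=model_id,
            model_name=env.get("MODEL_NAME") or model_id,
            hf_token=env.get("HF_TOKEN") or None,
            subfolder=env.get("MODEL_SUBFOLDER") or None,
            quantization=env.get("QUANTIZATION") or None,
            title=env.get("DEMO_TITLE") or cls.title,
        )
```

Failing at startup when HF_MODEL_ID is missing surfaces a misconfigured Space in the logs immediately, instead of at the first inference request.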

Error Handling & Monitoring

Build Process Monitoring

Build Logs: Real-time build status
Error Detection: Failed dependency installation
Retry Logic: Automatic rebuild on failure
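Build monitoring with a bounded retry can be approximated by polling the Space's runtime stage. HfApi.get_space_runtime returns an object whose stage field reports values such as BUILDING, RUNNING, BUILD_ERROR and RUNTIME_ERROR, and HfApi.restart_space restarts the Space. The retry budget, poll interval, timeout and the choice to restart on failure are assumptions. A failed dependency installation will usually fail again until requirements.txt is fixed, so the retry count should stay small.

```python
import time

DONE = ("RUNNING",)
FAILED = ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR")


def wait_for_space(api, space_id, max_restarts=1, poll_seconds=10.0,
                   timeout_seconds=1800.0, sleep=time.sleep, clock=time.monotonic):
    """Poll a Space until it runs; restart a bounded number of times on failure.

    `api` is expected to behave like huggingface_hub.HfApi.
    Returns "RUNNING", the failing stage name, or "TIMEOUT".
    """
    restarts = 0
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        stage = api.get_space_runtime(space_id).stage
        if stage in DONE:
            return "RUNNING"
        if stage in FAILED:
            if restarts >= max_restarts:
                # Stage may be a str-based enum; report its plain value
                return getattr(stage, "value", stage)
            restarts += 1
            api.restart_space(space_id)
        sleep(poll_seconds)
    return "TIMEOUT"
```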

Runtime Monitoring

Space Health: Uptime and responsiveness
Model Loading: Successful model initialization
Inference Errors: Runtime error handling
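For runtime error handling, the inference call can be wrapped so that a failure or a slow request produces a readable message in the UI instead of a stack trace. This is a sketch around a hypothetical transcribe callable with an assumed 60-second default. Note that concurrent.futures cannot kill a running thread: the timeout only stops the user from waiting, and the worker keeps running until the model returns.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

logger = logging.getLogger("demo")
_executor = ThreadPoolExecutor(max_workers=1)  # one inference at a time


def safe_transcribe(transcribe, audio, timeout_seconds=60.0):
    """Run transcribe(audio) and return (text, error_message)."""
    if audio is None:
        return "", "Please upload or record some audio first."
    future = _executor.submit(transcribe, audio)
    try:
        return future.result(timeout=timeout_seconds), None
    except FutureTimeout:
        logger.warning("Inference exceeded %.0f s", timeout_seconds)
        return "", "Transcription timed out; try a shorter clip."
    except Exception:  # surface any model error to the UI, log the traceback
        logger.exception("Inference failed")
        return "", "Transcription failed; see the Space logs for details."
```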

Security Considerations

Token Management

Read-Only Tokens: Use read-only tokens for demo spaces
Secret Storage: Keep HF_TOKEN in Space secrets, never in code or committed files
Access Control: Grant write access to the model and Space repositories only where needed

Resource Management

Memory Limits: Space hardware constraints
Timeout Handling: Inference timeout protection
Rate Limiting: Prevent abuse
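Rate limiting can be as simple as a per-client sliding window. The sketch below is an in-memory illustration with assumed limits; how a client is identified (for example, from the request object Gradio can pass to a handler) is left to the app, and the counters reset whenever the Space restarts.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client key -> request timestamps
        self._lock = threading.Lock()    # handlers may run concurrently

    def allow(self, client_key):
        with self._lock:
            now = self.clock()
            hits = self._hits[client_key]
            # Drop timestamps that have left the window
            while hits and now - hits[0] >= self.window_seconds:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False
            hits.append(now)
            return True
```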

Integration Points

With Training Scripts

Training Config: Used for model card generation
Training Results: Included in model metadata
Model Path: Direct path to trained model files

With Interface (interface.py)

Parameter Passing: Deployment settings from UI
Progress Updates: Deployment progress to user
Result Links: Direct links to deployed spaces

Deployment Workflows

Full Pipeline (Recommended)

Train model → Generate model card → Push to Hub → Deploy demo
All steps automated through a single interface action
Comprehensive error handling and rollback
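A single interface action with progress updates and rollback can be structured as an ordered list of steps, each with an optional undo. The sketch is hypothetical: step names, the progress callback signature (fraction, message) and the best-effort rollback policy are assumptions, not the code in interface.py, where such a callback could forward messages to the UI.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_pipeline(steps: List[Step], progress: Callable[[float, str], None]) -> bool:
    """Run (name, action, undo) steps in order; undo completed steps on failure."""
    completed = []
    for index, (name, action, undo) in enumerate(steps):
        fraction = index / len(steps)
        progress(fraction, f"{name}...")
        try:
            action()
        except Exception as exc:  # report any step failure to the user
            progress(fraction, f"{name} failed: {exc}")
            # Best-effort rollback of finished steps, newest first
            for done_name, done_undo in reversed(completed):
                if done_undo is None:
                    continue
                try:
                    done_undo()
                    progress(fraction, f"Rolled back {done_name}")
                except Exception as undo_exc:
                    progress(fraction, f"Rollback of {done_name} failed: {undo_exc}")
            return False
        completed.append((name, undo))
    progress(1.0, "Done")
    return True
```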

Manual Deployment

Use individual scripts for granular control
Custom configuration and branding
Debugging and troubleshooting capabilities

CI/CD Integration

Automated Triggers: GitHub Actions integration
Version Control: Model versioning and releases
Testing: Automated demo testing

Performance Optimization

Space Hardware Selection

CPU Basic: Free tier, sufficient for small models
GPU Options: For larger models requiring acceleration
Memory Scaling: Based on model size requirements
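Memory scaling can be estimated from parameter count: weights take roughly parameters × bits per weight / 8 bytes, plus headroom for activations and runtime overhead. The 1.2 overhead factor and the tier table below are rough assumptions for illustration; check the current Spaces hardware list for actual memory sizes and flavor names before relying on them.

```python
# Approximate memory per flavor in GB (RAM for CPU tiers, VRAM for GPU tiers).
# Assumed values for illustration; verify against the current Spaces hardware list.
HARDWARE_TIERS = [
    ("cpu-basic", 16, False),
    ("t4-small", 16, True),
    ("a10g-small", 24, True),
]


def estimate_memory_gb(num_params, bits_per_weight=16, overhead=1.2):
    """Weights in GB (params * bits / 8 bytes) times an assumed overhead factor."""
    return num_params * bits_per_weight / 8 / 1e9 * overhead


def pick_hardware(num_params, bits_per_weight=16, need_gpu=False):
    """Return the first (smallest) listed tier that fits, or None if none does."""
    required = estimate_memory_gb(num_params, bits_per_weight)
    for flavor, memory_gb, has_gpu in HARDWARE_TIERS:
        if need_gpu and not has_gpu:
            continue
        if memory_gb >= required:
            return flavor
    return None
```

For a hypothetical 3-billion-parameter model, bf16 weights need about 3e9 × 16 / 8 = 6 GB, or about 7.2 GB with the assumed overhead; at 4 bits the estimate drops to about 1.8 GB. Many 4-bit quantization backends need a GPU, so need_gpu=True is usually the right call when quantizing.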

Model Optimization

Quantization: 4-bit quantization for smaller footprint
Model Sharding: Split large models across available devices (GPU and CPU memory)
Caching: Model caching for faster cold starts

Monitoring & Analytics

Space Analytics

Usage Metrics: Daily active users, session duration
Performance Metrics: Inference latency, error rates
User Feedback: Demo effectiveness and issues
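Inference latency and error rate can be tracked in-process before reaching for external tooling. A minimal sketch, assuming the app records one sample per request; it keeps a bounded window of recent requests, computes percentiles with the nearest-rank method, and resets when the Space restarts.

```python
import math
import threading
from collections import deque


class InferenceMetrics:
    """Thread-safe latency (seconds) and error-rate tracking over recent requests."""

    def __init__(self, window=1000):
        self._lock = threading.Lock()
        self._samples = deque(maxlen=window)  # (latency_seconds, ok) pairs

    def record(self, latency_seconds, ok=True):
        with self._lock:
            self._samples.append((latency_seconds, ok))

    def error_rate(self):
        with self._lock:
            if not self._samples:
                return 0.0
            errors = sum(1 for _, ok in self._samples if not ok)
            return errors / len(self._samples)

    def percentile(self, pct):
        """Nearest-rank percentile, e.g. pct=95 for p95 latency; None if empty."""
        with self._lock:
            ordered = sorted(lat for lat, _ in self._samples)
        if not ordered:
            return None
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]
```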

Model Analytics

Download Stats: Model popularity and usage
Citation Tracking: Academic and research usage
Community Feedback: GitHub issues and discussions