VoxFactory / docs /deployment-pipeline.md
Joseph Pollack

Deployment Pipeline

graph TB
    %% Input Sources
    subgraph "Inputs"
        TRAINED_MODEL[Trained Model<br/>Local directory]
        TRAINING_CONFIG[Training Config<br/>JSON/YAML]
        TRAINING_RESULTS[Training Results<br/>Metrics & logs]
        MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
    end

    %% Model Publishing
    subgraph "Model Publishing"
        PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]

        subgraph "Publishing Steps"
            REPO_CREATION[Repository Creation<br/>HF Hub API]
            FILE_UPLOAD[File Upload<br/>Model files to HF]
            METADATA_UPLOAD[Metadata Upload<br/>Config & results]
        end
    end

    %% Model Card Generation
    subgraph "Model Card Generation"
        CARD_SCRIPT[generate_model_card.py<br/>Card Generator]

        subgraph "Card Components"
            TEMPLATE_LOAD[Template Loading<br/>model_card.md]
            VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
            CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
        end
    end

    %% Demo Space Deployment
    subgraph "Demo Space Deployment"
        DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]

        subgraph "Space Setup"
            SPACE_CREATION[Space Repository<br/>Create HF Space]
            TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
            ENV_INJECTION[Environment Setup<br/>Model config injection]
            SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
        end
    end

    %% Space Building & Testing
    subgraph "Space Building"
        BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
        DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
        MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
        APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
    end

    %% Live Demo
    subgraph "Live Demo Space"
        GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
        MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
        USER_INTERACTION[User Interaction<br/>Audio upload/playback]
    end

    %% External Services
    subgraph "External Services"
        HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
        HF_SPACES[HF Spaces Platform<br/>Demo hosting]
    end

    %% Flow Connections
    TRAINED_MODEL --> PUSH_SCRIPT
    TRAINING_CONFIG --> PUSH_SCRIPT
    TRAINING_RESULTS --> PUSH_SCRIPT
    MODEL_METADATA --> PUSH_SCRIPT

    PUSH_SCRIPT --> REPO_CREATION
    REPO_CREATION --> FILE_UPLOAD
    FILE_UPLOAD --> METADATA_UPLOAD

    METADATA_UPLOAD --> CARD_SCRIPT
    TRAINING_CONFIG --> CARD_SCRIPT
    TRAINING_RESULTS --> CARD_SCRIPT

    CARD_SCRIPT --> TEMPLATE_LOAD
    TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
    VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING

    CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
    METADATA_UPLOAD --> DEPLOY_SCRIPT

    DEPLOY_SCRIPT --> SPACE_CREATION
    SPACE_CREATION --> TEMPLATE_COPY
    TEMPLATE_COPY --> ENV_INJECTION
    ENV_INJECTION --> SECRET_SETUP

    SECRET_SETUP --> BUILD_TRIGGER
    BUILD_TRIGGER --> DEPENDENCY_INSTALL
    DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
    MODEL_DOWNLOAD --> APP_INITIALIZATION

    APP_INITIALIZATION --> GRADIO_INTERFACE
    GRADIO_INTERFACE --> MODEL_INFERENCE
    MODEL_INFERENCE --> USER_INTERACTION

    HF_HUB --> MODEL_DOWNLOAD
    HF_SPACES --> GRADIO_INTERFACE

    %% Styling
    classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
    classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px

    class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
    class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
    class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
    class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
    class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
    class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
    class HF_HUB,HF_SPACES external

Deployment Pipeline Overview

This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.

Input Sources

Trained Model Artifacts

  • Model Files: model.safetensors, config.json, tokenizer.json
  • Training Config: Hyperparameters and training setup
  • Training Results: Metrics, loss curves, evaluation results
  • Model Metadata: Name, description, base model information
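
Before publishing, it can help to confirm that the model directory actually contains the files listed above. The following is a minimal sketch, assuming the three file names from the list are all required as single files; `missing_model_files` is a hypothetical helper and not part of the repository's scripts.

```python
from pathlib import Path

# File names taken from the list above. Treating all three as required
# single files is an assumption (sharded checkpoints would need a looser check).
REQUIRED_FILES = ("model.safetensors", "config.json", "tokenizer.json")


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty return value means the directory is ready to hand to the publishing step; a non-empty one names exactly what is missing.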

Model Publishing Phase

push_to_huggingface.py Script

# Initialize publisher
pusher = HuggingFacePusher(
    model_path=output_dir,
    repo_name=repo_name,
    token=hf_token
)

# Push model
success = pusher.push_model(training_config, results)

Publishing Steps

  1. Repository Creation: Create HF Hub repository
  2. File Upload: Upload all model files
  3. Metadata Upload: Upload training config and results
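
The three steps above can be sketched with the `huggingface_hub` client. This is not the body of `HuggingFacePusher`; the function names and the metadata file names (`training_config.json`, `training_results.json`) are assumptions made for illustration.

```python
import json


def build_metadata_files(training_config: dict, results: dict) -> dict[str, bytes]:
    """Serialize config and results to JSON payloads keyed by repo path.

    The two file names are illustrative assumptions.
    """
    return {
        "training_config.json": json.dumps(training_config, indent=2).encode("utf-8"),
        "training_results.json": json.dumps(results, indent=2).encode("utf-8"),
    }


def publish_model(model_dir: str, repo_id: str, token: str,
                  training_config: dict, results: dict) -> None:
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # 1. Repository creation (no error if the repository already exists).
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # 2. File upload: push the whole local model directory.
    api.upload_folder(folder_path=model_dir, repo_id=repo_id, repo_type="model")
    # 3. Metadata upload: push each JSON payload as its own file.
    for path_in_repo, payload in build_metadata_files(training_config, results).items():
        api.upload_file(path_or_fileobj=payload, path_in_repo=path_in_repo,
                        repo_id=repo_id, repo_type="model")
```

Keeping the serialization in a separate pure function makes the metadata step easy to check without network access.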

Model Card Generation

generate_model_card.py Script

# Create generator
generator = ModelCardGenerator()

# Generate card
variables = {
    "model_name": model_name,
    "repo_name": repo_id,
    "base_model": base_model,
    # ... other variables
}
content = generator.generate_model_card(variables)

Card Processing

  1. Template Loading: Load from templates/model_card.md
  2. Variable Replacement: Inject actual values
  3. Conditional Processing: Handle optional sections
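
The processing order above can be illustrated with a small renderer. The placeholder syntax (`{{name}}` for variables, `{{#if flag}}...{{/if}}` for optional sections) is an assumption; the actual syntax of templates/model_card.md is not shown here, and `render_card` is a hypothetical stand-in for `generate_model_card`.

```python
import re

# Assumed template syntax: {{name}} and {{#if flag}}...{{/if}}.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")
IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)


def render_card(template: str, variables: dict) -> str:
    # Variable replacement: unknown names are left in place so gaps stay visible.
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    text = VARIABLE.sub(substitute, template)

    # Conditional processing: keep a block only when its flag is truthy.
    def keep_or_drop(match):
        flag, body = match.group(1), match.group(2)
        return body if variables.get(flag) else ""

    return IF_BLOCK.sub(keep_or_drop, text)
```

Template loading would then be a plain file read, for example `Path("templates/model_card.md").read_text()`, followed by one call to `render_card`.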

Demo Space Deployment

deploy_demo_space.py Script

# Initialize deployer
deployer = DemoSpaceDeployer(
    hf_token=token,
    hf_username=username,
    model_id=model_id,
    demo_type="voxtral"
)

# Deploy space
success = deployer.deploy()

Space Setup Process

  1. Space Creation: Create HF Space repository
  2. Template Copying: Copy demo template files
  3. Environment Injection: Set model-specific variables
  4. Secret Configuration: Configure HF_TOKEN and model variables
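
A sketch of the four setup steps using the `huggingface_hub` client follows. It is not the implementation of `DemoSpaceDeployer`; the function names are hypothetical, and the split between public variables and private secrets is an assumption. The variable names themselves come from the Environment Variables section of this document.

```python
def space_settings(model_id: str, model_name: str, token: str) -> tuple[dict, dict]:
    """Split settings into public Space variables and private Space secrets."""
    variables = {"HF_MODEL_ID": model_id, "MODEL_NAME": model_name}
    secrets = {"HF_TOKEN": token}
    return variables, secrets


def deploy_space(space_id: str, template_dir: str, model_id: str,
                 model_name: str, token: str) -> None:
    # Imported lazily so space_settings works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # 1. Space creation (Gradio SDK; no error if the Space already exists).
    api.create_repo(repo_id=space_id, repo_type="space",
                    space_sdk="gradio", exist_ok=True)
    # 2. Template copying: upload the demo template directory as-is.
    api.upload_folder(folder_path=template_dir, repo_id=space_id, repo_type="space")
    variables, secrets = space_settings(model_id, model_name, token)
    # 3. Environment injection: model-specific, non-sensitive values.
    for key, value in variables.items():
        api.add_space_variable(repo_id=space_id, key=key, value=value)
    # 4. Secret configuration: values that must not be publicly readable.
    for key, value in secrets.items():
        api.add_space_secret(repo_id=space_id, key=key, value=value)
```

In this sketch the token stored as a secret is the same one used for deployment; a separate read-only token would usually be the safer choice for the Space itself.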

Space Building Process

Automatic Build Trigger

  • Dependency Installation: pip install -r requirements.txt
  • Model Download: Download model from HF Hub
  • App Initialization: Setup Gradio application

Demo Template Structure

templates/spaces/demo_voxtral/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # Space documentation

Live Demo Features

Gradio Interface

  • Audio Upload: File upload or recording
  • Real-time Inference: Live ASR transcription
  • Interactive Controls: Model parameters, settings

Model Inference Pipeline

  • Audio Processing: Convert to model inputs
  • Transcription Generation: Run ASR inference
  • Result Display: Show transcription with confidence

Configuration Management

Environment Variables

import os

# Set in Space secrets/environment
os.environ['HF_MODEL_ID'] = model_id
os.environ['MODEL_NAME'] = model_name
os.environ['HF_TOKEN'] = token  # For model access
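
On the demo side, the app would read these values back at startup. The sketch below assumes `HF_MODEL_ID` is mandatory while the other two are optional; `load_demo_settings` is a hypothetical helper, and falling back to the model ID for the display name is an illustrative choice.

```python
import os
from typing import Mapping


def load_demo_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the demo configuration from environment variables."""
    model_id = env.get("HF_MODEL_ID")
    if not model_id:
        # Fail early with a clear message rather than at model download time.
        raise RuntimeError("HF_MODEL_ID is not set")
    return {
        "model_id": model_id,
        "model_name": env.get("MODEL_NAME", model_id),  # assumed fallback
        "token": env.get("HF_TOKEN"),  # None for public models
    }
```

Accepting the environment as a parameter lets the function be exercised with a plain dictionary.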

Demo-Specific Settings

  • Model Configuration: Base model, subfolder, quantization
  • UI Branding: Custom titles, descriptions, links
  • Example Prompts: Pre-configured demo examples

Error Handling & Monitoring

Build Process Monitoring

  • Build Logs: Real-time build status
  • Error Detection: Failed dependency installation
  • Retry Logic: Automatic rebuild on failure
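
One way a deployer could implement the monitoring and retry behaviour above is to poll the Space stage and restart on failure. This is a sketch under assumptions: `wait_for_space` is hypothetical, the limits are arbitrary defaults, and the stage names are the ones reported by `huggingface_hub`.

```python
import time
from typing import Callable

FAILED_STAGES = ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR")


def wait_for_space(get_stage: Callable[[], str], restart: Callable[[], None],
                   max_restarts: int = 1, poll_seconds: float = 10.0,
                   max_polls: int = 60) -> bool:
    """Poll until the Space is RUNNING; restart it on failure up to max_restarts."""
    restarts = 0
    for _ in range(max_polls):
        stage = get_stage()
        if stage == "RUNNING":
            return True
        if stage in FAILED_STAGES:
            if restarts >= max_restarts:
                return False  # give up after the allowed number of retries
            restart()
            restarts += 1
        time.sleep(poll_seconds)
    return False  # still not running after max_polls checks
```

With the real client, the two callables could be `lambda: api.get_space_runtime(space_id).stage` and `lambda: api.restart_space(space_id)`, where `api` is an `HfApi` instance.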

Runtime Monitoring

  • Space Health: Uptime and responsiveness
  • Model Loading: Successful model initialization
  • Inference Errors: Runtime error handling

Security Considerations

Token Management

  • Read-Only Tokens: Use read-only tokens for demo spaces
  • Secret Storage: Secure storage of HF_TOKEN
  • Access Control: Proper repository permissions

Resource Management

  • Memory Limits: Space hardware constraints
  • Timeout Handling: Inference timeout protection
  • Rate Limiting: Prevent abuse
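
Timeout protection and rate limiting can both be sketched with the standard library. Neither helper below comes from the demo template; the names and the sliding-window policy are assumptions chosen for illustration.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_s: float, *args):
    """Return fn(*args), or None if it does not finish within timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args).result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        pool.shutdown(wait=False)  # do not block on the abandoned worker


class SlidingWindowLimiter:
    """Allow at most max_calls per window_s seconds."""

    def __init__(self, max_calls: int, window_s: float, clock=time.monotonic):
        self.max_calls, self.window_s, self.clock = max_calls, window_s, clock
        self.calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

Note that the timeout only stops waiting for the result: the worker thread keeps running until the function returns, so it limits user-facing latency rather than resource use.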

Integration Points

With Training Scripts

  • Training Config: Used for model card generation
  • Training Results: Included in model metadata
  • Model Path: Direct path to trained model files

With Interface (interface.py)

  • Parameter Passing: Deployment settings from UI
  • Progress Updates: Deployment progress to user
  • Result Links: Direct links to deployed spaces

Deployment Workflows

Full Pipeline (Recommended)

  1. Train model → Generate model card → Push to Hub → Deploy demo
  2. All steps automated through single interface action
  3. Comprehensive error handling and rollback
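
The error handling and rollback mentioned above can be sketched as a step runner that undoes completed steps in reverse order when a later one fails. This is an illustration, not the interface's actual orchestration code; `run_pipeline` and the step tuple layout are hypothetical.

```python
from typing import Callable, List, Tuple

# Each step is (name, run, undo); undo reverses the effect of run.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    done: List[Step] = []
    log: List[str] = []
    for step in steps:
        name, run, _undo = step
        try:
            run()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for prev_name, _run, undo in reversed(done):
                undo()
                log.append(f"rolled back: {prev_name}")
            return False, log
        done.append(step)
        log.append(f"done: {name}")
    return True, log
```

For a deployment, an undo callable might delete a repository or Space that the corresponding step created; the returned log could feed the progress updates shown to the user.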

Manual Deployment

  1. Use individual scripts for granular control
  2. Custom configuration and branding
  3. Debugging and troubleshooting capabilities

CI/CD Integration

  • Automated Triggers: GitHub Actions integration
  • Version Control: Model versioning and releases
  • Testing: Automated demo testing

Performance Optimization

Space Hardware Selection

  • CPU Basic: Free tier, sufficient for small models
  • GPU Options: For larger models requiring acceleration
  • Memory Scaling: Based on model size requirements

Model Optimization

  • Quantization: 4-bit quantization for smaller footprint
  • Model Sharding: Split large models across memory
  • Caching: Model caching for faster cold starts
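
To see why 4-bit quantization matters for hardware selection, weight memory can be estimated as parameters × bits per weight ÷ 8. The sketch below uses a hypothetical 3-billion-parameter model and counts weights only; activations, caches, and quantization overhead are ignored.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 3B-parameter model, chosen only to make the arithmetic concrete.
fp16_gb = weight_footprint_gb(3e9, 16)  # 16-bit weights
int4_gb = weight_footprint_gb(3e9, 4)   # 4-bit quantized weights
```

Under these assumptions the weights take about 6 GB at 16 bits and about 1.5 GB at 4 bits, a fourfold reduction.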

Monitoring & Analytics

Space Analytics

  • Usage Metrics: Daily active users, session duration
  • Performance Metrics: Inference latency, error rates
  • User Feedback: Demo effectiveness and issues

Model Analytics

  • Download Stats: Model popularity and usage
  • Citation Tracking: Academic and research usage
  • Community Feedback: GitHub issues and discussions

See also: