Spaces:

Tonic
/

VoxFactory

Running

App Files Files Community

VoxFactory / docs /deployment-pipeline.md

Joseph Pollack

adds docs

a3a3978 unverified 9 days ago

preview code

raw

history blame contribute delete

10.3 kB

	# Deployment Pipeline

	```mermaid
	graph TB
	%% Input Sources
	subgraph "Inputs"
	TRAINED_MODEL[Trained Model<br/>Local directory]
	TRAINING_CONFIG[Training Config<br/>JSON/YAML]
	TRAINING_RESULTS[Training Results<br/>Metrics & logs]
	MODEL_METADATA[Model Metadata<br/>Name, description, etc.]
	end

	%% Model Publishing
	subgraph "Model Publishing"
	PUSH_SCRIPT[push_to_huggingface.py<br/>Model Publisher]

	subgraph "Publishing Steps"
	REPO_CREATION[Repository Creation<br/>HF Hub API]
	FILE_UPLOAD[File Upload<br/>Model files to HF]
	METADATA_UPLOAD[Metadata Upload<br/>Config & results]
	end
	end

	%% Model Card Generation
	subgraph "Model Card Generation"
	CARD_SCRIPT[generate_model_card.py<br/>Card Generator]

	subgraph "Card Components"
	TEMPLATE_LOAD[Template Loading<br/>model_card.md]
	VARIABLE_REPLACEMENT[Variable Replacement<br/>Config injection]
	CONDITIONAL_PROCESSING[Conditional Sections<br/>Quantized models, etc.]
	end
	end

	%% Demo Space Deployment
	subgraph "Demo Space Deployment"
	DEPLOY_SCRIPT[deploy_demo_space.py<br/>Space Deployer]

	subgraph "Space Setup"
	SPACE_CREATION[Space Repository<br/>Create HF Space]
	TEMPLATE_COPY[Template Copying<br/>demo_voxtral/ files]
	ENV_INJECTION[Environment Setup<br/>Model config injection]
	SECRET_SETUP[Secret Configuration<br/>HF_TOKEN, model vars]
	end
	end

	%% Space Building & Testing
	subgraph "Space Building"
	BUILD_TRIGGER[Build Trigger<br/>Automatic build start]
	DEPENDENCY_INSTALL[Dependency Installation<br/>requirements.txt]
	MODEL_DOWNLOAD[Model Download<br/>From HF Hub]
	APP_INITIALIZATION[App Initialization<br/>Gradio app setup]
	end

	%% Live Demo
	subgraph "Live Demo Space"
	GRADIO_INTERFACE[Gradio Interface<br/>Interactive demo]
	MODEL_INFERENCE[Model Inference<br/>Real-time ASR]
	USER_INTERACTION[User Interaction<br/>Audio upload/playback]
	end

	%% External Services
	subgraph "External Services"
	HF_HUB[Hugging Face Hub<br/>Model & Space hosting]
	HF_SPACES[HF Spaces Platform<br/>Demo hosting]
	end

	%% Flow Connections
	TRAINED_MODEL --> PUSH_SCRIPT
	TRAINING_CONFIG --> PUSH_SCRIPT
	TRAINING_RESULTS --> PUSH_SCRIPT
	MODEL_METADATA --> PUSH_SCRIPT

	PUSH_SCRIPT --> REPO_CREATION
	REPO_CREATION --> FILE_UPLOAD
	FILE_UPLOAD --> METADATA_UPLOAD

	METADATA_UPLOAD --> CARD_SCRIPT
	TRAINING_CONFIG --> CARD_SCRIPT
	TRAINING_RESULTS --> CARD_SCRIPT

	CARD_SCRIPT --> TEMPLATE_LOAD
	TEMPLATE_LOAD --> VARIABLE_REPLACEMENT
	VARIABLE_REPLACEMENT --> CONDITIONAL_PROCESSING

	CONDITIONAL_PROCESSING --> DEPLOY_SCRIPT
	METADATA_UPLOAD --> DEPLOY_SCRIPT

	DEPLOY_SCRIPT --> SPACE_CREATION
	SPACE_CREATION --> TEMPLATE_COPY
	TEMPLATE_COPY --> ENV_INJECTION
	ENV_INJECTION --> SECRET_SETUP

	SECRET_SETUP --> BUILD_TRIGGER
	BUILD_TRIGGER --> DEPENDENCY_INSTALL
	DEPENDENCY_INSTALL --> MODEL_DOWNLOAD
	MODEL_DOWNLOAD --> APP_INITIALIZATION

	APP_INITIALIZATION --> GRADIO_INTERFACE
	GRADIO_INTERFACE --> MODEL_INFERENCE
	MODEL_INFERENCE --> USER_INTERACTION

	HF_HUB --> MODEL_DOWNLOAD
	HF_SPACES --> GRADIO_INTERFACE

	%% Styling
	classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
	classDef publishing fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
	classDef generation fill:#fff3e0,stroke:#f57c00,stroke-width:2px
	classDef deployment fill:#fce4ec,stroke:#c2185b,stroke-width:2px
	classDef building fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
	classDef demo fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
	classDef external fill:#f5f5f5,stroke:#424242,stroke-width:2px

	class TRAINED_MODEL,TRAINING_CONFIG,TRAINING_RESULTS,MODEL_METADATA input
	class PUSH_SCRIPT,REPO_CREATION,FILE_UPLOAD,METADATA_UPLOAD publishing
	class CARD_SCRIPT,TEMPLATE_LOAD,VARIABLE_REPLACEMENT,CONDITIONAL_PROCESSING generation
	class DEPLOY_SCRIPT,SPACE_CREATION,TEMPLATE_COPY,ENV_INJECTION,SECRET_SETUP deployment
	class BUILD_TRIGGER,DEPENDENCY_INSTALL,MODEL_DOWNLOAD,APP_INITIALIZATION building
	class GRADIO_INTERFACE,MODEL_INFERENCE,USER_INTERACTION demo
	class HF_HUB,HF_SPACES external
	```

	## Deployment Pipeline Overview

	This diagram illustrates the complete deployment pipeline that takes a trained Voxtral model and makes it available as an interactive demo on Hugging Face Spaces.

	### Input Sources

	#### Trained Model Artifacts
	- Model Files: `model.safetensors`, `config.json`, `tokenizer.json`
	- Training Config: Hyperparameters and training setup
	- Training Results: Metrics, loss curves, evaluation results
	- Model Metadata: Name, description, base model information

	### Model Publishing Phase

	#### push_to_huggingface.py Script
	```python
	# Initialize publisher
	pusher = HuggingFacePusher(
	model_path=output_dir,
	repo_name=repo_name,
	token=hf_token
	)

	# Push model
	success = pusher.push_model(training_config, results)
	```

	#### Publishing Steps
	1. Repository Creation: Create HF Hub repository
	2. File Upload: Upload all model files
	3. Metadata Upload: Upload training config and results

	### Model Card Generation

	#### generate_model_card.py Script
	```python
	# Create generator
	generator = ModelCardGenerator()

	# Generate card
	variables = {
	"model_name": model_name,
	"repo_name": repo_id,
	"base_model": base_model,
	# ... other variables
	}
	content = generator.generate_model_card(variables)
	```

	#### Card Processing
	1. Template Loading: Load from `templates/model_card.md`
	2. Variable Replacement: Inject actual values
	3. Conditional Processing: Handle optional sections

	### Demo Space Deployment

	#### deploy_demo_space.py Script
	```python
	# Initialize deployer
	deployer = DemoSpaceDeployer(
	hf_token=token,
	hf_username=username,
	model_id=model_id,
	demo_type="voxtral"
	)

	# Deploy space
	success = deployer.deploy()
	```

	#### Space Setup Process
	1. Space Creation: Create HF Space repository
	2. Template Copying: Copy demo template files
	3. Environment Injection: Set model-specific variables
	4. Secret Configuration: Configure HF_TOKEN and model variables

	### Space Building Process

	#### Automatic Build Trigger
	- Dependency Installation: `pip install -r requirements.txt`
	- Model Download: Download model from HF Hub
	- App Initialization: Setup Gradio application

	#### Demo Template Structure
	```
	templates/spaces/demo_voxtral/
	├── app.py # Main Gradio application
	├── requirements.txt # Python dependencies
	└── README.md # Space documentation
	```

	### Live Demo Features

	#### Gradio Interface
	- Audio Upload: File upload or recording
	- Real-time Inference: Live ASR transcription
	- Interactive Controls: Model parameters, settings

	#### Model Inference Pipeline
	- Audio Processing: Convert to model inputs
	- Transcription Generation: Run ASR inference
	- Result Display: Show transcription with confidence

	### Configuration Management

	#### Environment Variables
	```python
	# Set in Space secrets/environment
	os.environ['HF_MODEL_ID'] = model_id
	os.environ['MODEL_NAME'] = model_name
	os.environ['HF_TOKEN'] = token # For model access
	```

	#### Demo-Specific Settings
	- Model Configuration: Base model, subfolder, quantization
	- UI Branding: Custom titles, descriptions, links
	- Example Prompts: Pre-configured demo examples

	### Error Handling & Monitoring

	#### Build Process Monitoring
	- Build Logs: Real-time build status
	- Error Detection: Failed dependency installation
	- Retry Logic: Automatic rebuild on failure

	#### Runtime Monitoring
	- Space Health: Uptime and responsiveness
	- Model Loading: Successful model initialization
	- Inference Errors: Runtime error handling

	### Security Considerations

	#### Token Management
	- Read-Only Tokens: Use read-only tokens for demo spaces
	- Secret Storage: Secure storage of HF_TOKEN
	- Access Control: Proper repository permissions

	#### Resource Management
	- Memory Limits: Space hardware constraints
	- Timeout Handling: Inference timeout protection
	- Rate Limiting: Prevent abuse

	### Integration Points

	#### With Training Scripts
	- Training Config: Used for model card generation
	- Training Results: Included in model metadata
	- Model Path: Direct path to trained model files

	#### With Interface (interface.py)
	- Parameter Passing: Deployment settings from UI
	- Progress Updates: Deployment progress to user
	- Result Links: Direct links to deployed spaces

	### Deployment Workflows

	#### Full Pipeline (Recommended)
	1. Train model → Generate model card → Push to Hub → Deploy demo
	2. All steps automated through single interface action
	3. Comprehensive error handling and rollback

	#### Manual Deployment
	1. Use individual scripts for granular control
	2. Custom configuration and branding
	3. Debugging and troubleshooting capabilities

	#### CI/CD Integration
	- Automated Triggers: GitHub Actions integration
	- Version Control: Model versioning and releases
	- Testing: Automated demo testing

	### Performance Optimization

	#### Space Hardware Selection
	- CPU Basic: Free tier, sufficient for small models
	- GPU Options: For larger models requiring acceleration
	- Memory Scaling: Based on model size requirements

	#### Model Optimization
	- Quantization: 4-bit quantization for smaller footprint
	- Model Sharding: Split large models across memory
	- Caching: Model caching for faster cold starts

	### Monitoring & Analytics

	#### Space Analytics
	- Usage Metrics: Daily active users, session duration
	- Performance Metrics: Inference latency, error rates
	- User Feedback: Demo effectiveness and issues

	#### Model Analytics
	- Download Stats: Model popularity and usage
	- Citation Tracking: Academic and research usage
	- Community Feedback: GitHub issues and discussions

	See also:
	- [Architecture Overview](architecture.md)
	- [Training Pipeline](training-pipeline.md)
	- [Data Flow](data-flow.md)