
Deployment Guide for Hugging Face Spaces

This guide provides step-by-step instructions for deploying the Multi-Modal Knowledge Distillation application to Hugging Face Spaces.

πŸ“‹ Pre-Deployment Checklist

βœ… Project Structure Complete

  • All required files and directories are present
  • Python syntax validation passed
  • Frontend files are properly structured

βœ… Configuration Validated

  • requirements.txt contains all necessary dependencies
  • spaces_config.yaml is properly configured
  • API endpoints are implemented and accessible

βœ… Documentation Complete

  • Comprehensive README.md with usage instructions
  • API documentation included
  • Troubleshooting guide provided

πŸš€ Deployment Steps

Step 1: Create Hugging Face Space

  1. Go to Hugging Face Spaces

  2. Configure Space Settings

    • Space name: multi-modal-knowledge-distillation (or your preferred name)
    • License: MIT
    • SDK: Gradio
    • Hardware: T4 small (minimum) or T4 medium (recommended)
    • Visibility: Public or Private (your choice)
  3. Initialize Repository

    • Choose "Initialize with README"
    • Click "Create Space"

Step 2: Upload Project Files

Upload all the following files to your Space repository:

Core Application Files

app.py                    # Main FastAPI application
requirements.txt          # Python dependencies
spaces_config.yaml       # Hugging Face Spaces configuration
README.md                # Project documentation
.gitignore               # Git ignore rules

Source Code

src/
β”œβ”€β”€ __init__.py          # Package initialization
β”œβ”€β”€ model_loader.py      # Model loading utilities
β”œβ”€β”€ distillation.py      # Knowledge distillation engine
└── utils.py             # Utility functions

Frontend Files

templates/
└── index.html           # Main web interface

static/
β”œβ”€β”€ css/
β”‚   └── style.css        # Application styles
└── js/
    └── main.js          # Frontend JavaScript

Directory Structure (will be created automatically)

uploads/                 # Uploaded model files
models/                  # Trained models
temp/                    # Temporary files
logs/                    # Application logs
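These directories can be created at application startup so the app never fails on a missing path. A minimal sketch (the function name `ensure_runtime_dirs` is illustrative, not part of the project's actual code):

```python
from pathlib import Path

# Runtime directories the app expects, as listed above.
RUNTIME_DIRS = ["uploads", "models", "temp", "logs"]

def ensure_runtime_dirs(base: str = ".") -> list:
    """Create each runtime directory if missing and return the paths."""
    paths = []
    for name in RUNTIME_DIRS:
        p = Path(base) / name
        p.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
        paths.append(p)
    return paths
```

Calling this once at startup (e.g. near the top of `app.py`) keeps the rest of the code free of existence checks.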

Step 3: Configure Hardware

  1. Go to Space Settings

    • Click on "Settings" tab in your Space
    • Navigate to "Hardware" section
  2. Select Hardware

    • Minimum: T4 small (16GB RAM, 1x T4 GPU)
    • Recommended: T4 medium (32GB RAM, 1x T4 GPU)
    • For large models: A10G small or larger
  3. Apply Changes

    • Click "Update hardware"
    • Your Space will restart with new hardware

Step 4: Monitor Deployment

  1. Build Process

    • Watch the "Logs" tab for build progress
    • Build typically takes 5-10 minutes
    • Dependencies will be installed automatically
  2. Common Build Issues

    • PyTorch installation: May take several minutes
    • CUDA compatibility: Ensure PyTorch version supports your hardware
    • Memory issues: Upgrade hardware if needed
  3. Successful Deployment

    • Space status shows "Running"
    • Application is accessible via the Space URL
    • Health check endpoint responds correctly

πŸ”§ Configuration Options

Environment Variables

You can set these in your Space settings:

# Server Configuration
PORT=7860                 # Default port (usually not needed)
HOST=0.0.0.0             # Default host

# Resource Limits
MAX_FILE_SIZE=5368709120  # 5GB max file size
MAX_MODELS=10            # Maximum teacher models
MAX_TRAINING_TIME=3600   # 1 hour training limit

# GPU Configuration
CUDA_VISIBLE_DEVICES=0   # GPU device selection

Hardware Recommendations

| Use Case      | Hardware   | RAM  | GPU  | Cost   |
|---------------|------------|------|------|--------|
| Demo/Testing  | CPU Basic  | 16GB | None | Free   |
| Small Models  | T4 small   | 16GB | T4   | Low    |
| Production    | T4 medium  | 32GB | T4   | Medium |
| Large Models  | A10G small | 24GB | A10G | High   |

πŸ§ͺ Testing Your Deployment

1. Health Check

curl https://your-username-your-space-name.hf.space/health

2. Web Interface

  • Visit your Space URL
  • Test file upload functionality
  • Verify model selection works
  • Check training configuration options

3. API Endpoints

Test key endpoints:

  • GET / - Main interface
  • POST /upload - File upload
  • GET /models - List models
  • WebSocket /ws/{session_id} - Real-time updates

πŸ› Troubleshooting

Build Failures

PyTorch Installation Issues:

# Check that the CUDA version is compatible with your Space hardware.
# Note: +cuXXX builds are not on PyPI; point pip at the PyTorch wheel index
# by adding this line to requirements.txt alongside the pin:
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118

Memory Issues During Build:

  • Upgrade to higher hardware tier
  • Reduce dependency versions
  • Remove unnecessary packages

Runtime Issues

Out of Memory:

  • Increase hardware tier
  • Reduce batch size in training
  • Implement model sharding

Model Loading Failures:

  • Check file format compatibility
  • Verify Hugging Face model exists
  • Ensure sufficient disk space

WebSocket Connection Issues:

  • Check browser compatibility
  • Verify firewall settings
  • Try refreshing the page

Performance Issues

Slow Training:

  • Upgrade to GPU hardware
  • Increase batch size
  • Use mixed precision training

High Memory Usage:

  • Monitor system resources
  • Implement automatic cleanup
  • Reduce model cache size

πŸ“Š Monitoring and Maintenance

Logs and Monitoring

  • Check Space logs regularly
  • Monitor resource usage
  • Set up alerts for failures

Updates and Maintenance

  • Keep dependencies updated
  • Monitor for security issues
  • Regular cleanup of temporary files
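Cleanup of temporary files is easy to automate with an age-based sweep over the `temp/` directory. A minimal sketch (`prune_old_files` is an illustrative name; the project's actual cleanup code may differ):

```python
import time
from pathlib import Path

def prune_old_files(directory: str, max_age_seconds: float = 3600) -> int:
    """Delete files older than max_age_seconds; return how many were removed."""
    removed = 0
    now = time.time()
    for path in Path(directory).rglob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed += 1
    return removed
```

Running this periodically (e.g. from a background task) keeps `temp/` and `uploads/` from filling the Space's disk.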

Scaling Considerations

  • Monitor user load
  • Consider multiple Space instances
  • Implement load balancing if needed

πŸ”’ Security Best Practices

File Upload Security

  • Validate all uploaded files
  • Implement size limits
  • Scan for malicious content
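The first two checks above can be expressed as a small gatekeeper that runs before a file is written to disk. A sketch under the 5 GB limit from the configuration section; the allow-list of extensions and the function name are hypothetical, not the project's actual rules:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pt", ".bin", ".safetensors", ".onnx"}  # hypothetical allow-list
MAX_FILE_SIZE = 5 * 1024**3  # matches the 5 GB limit above

def validate_upload(filename: str, size_bytes: int):
    """Basic checks before accepting an uploaded model file."""
    if "/" in filename or "\\" in filename or ".." in filename:
        return False, "suspicious filename"  # block path traversal
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {ext or '(none)'}"
    if size_bytes <= 0 or size_bytes > MAX_FILE_SIZE:
        return False, "file size out of range"
    return True, "ok"
```

Content scanning (the third bullet) needs deeper inspection, e.g. refusing pickle-based formats in favor of safetensors.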

API Security

  • Implement rate limiting
  • Validate all inputs
  • Use HTTPS only
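Rate limiting is commonly implemented as a token bucket kept per client. A self-contained sketch of the idea (illustrative only; in production a middleware such as `slowapi` for FastAPI is the usual choice):

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter, one instance per client."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refuse the request otherwise."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```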

Resource Protection

  • Monitor resource usage
  • Implement timeouts
  • Automatic cleanup procedures

πŸ“ˆ Performance Optimization

Model Loading

  • Cache frequently used models
  • Implement lazy loading
  • Use model compression
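Caching and lazy loading can both be had from a memoized loader: the model is loaded on first request and reused afterwards. A sketch using `functools.lru_cache` (the `get_model`/`load_model` names are placeholders for whatever `src/model_loader.py` actually exposes):

```python
from functools import lru_cache

@lru_cache(maxsize=4)  # keep at most 4 models resident; tune to available RAM
def get_model(model_id: str):
    """Load a model once and reuse it on later requests (lazy loading)."""
    return load_model(model_id)

def load_model(model_id: str):
    # Stand-in loader so the sketch is self-contained; swap in the real one.
    return {"id": model_id}
```

A bounded `maxsize` also addresses the "reduce model cache size" advice above: the least recently used model is evicted when the limit is hit.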

Training Optimization

  • Use mixed precision
  • Implement gradient checkpointing
  • Optimize batch sizes

Frontend Performance

  • Minimize JavaScript bundle
  • Optimize CSS delivery
  • Use CDN for static assets

🎯 Success Metrics

Your deployment is successful when:

βœ… Functionality

  • All API endpoints respond correctly
  • File uploads work without errors
  • Training completes successfully
  • Model downloads work properly

βœ… Performance

  • Page loads in < 3 seconds
  • Training starts within 30 seconds
  • Real-time updates work smoothly
  • Resource usage is within limits

βœ… User Experience

  • Interface is responsive on all devices
  • Error messages are clear and helpful
  • Progress tracking works accurately
  • Documentation is accessible

πŸ“ž Support and Resources


Your Multi-Modal Knowledge Distillation application is now ready for production deployment! πŸŽ‰