# Usage Guide - WAN 2.2 Image-to-Video LoRA Demo

## Quick Start

### 1. Deploying to Hugging Face Spaces
To deploy this demo to Hugging Face Spaces:
```bash
# Install git-lfs if not already installed
git lfs install

# Create a new Space on huggingface.co, then clone it
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Copy all files from this demo into the clone
cp -r /path/to/this-demo/* YOUR_SPACE_NAME/
cd YOUR_SPACE_NAME

# Commit and push
git add .
git commit -m "Initial commit: WAN 2.2 Image-to-Video LoRA Demo"
git push
```
### 2. Running Locally
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```
The app will be available at http://localhost:7860
## Using the Demo

### Basic Usage
1. **Upload Image**: Click the image upload area and select an image file
2. **Enter Prompt**: Type a description of the motion you want (e.g., "A person walking forward, cinematic")
3. **Click Generate**: Wait for the video to be generated (first run will download the model)
4. **View Result**: The generated video will appear in the output area
### Advanced Settings
Expand the "Advanced Settings" accordion to access:
**Inference Steps (20-100)**: More steps = higher quality but slower generation
- 20-30: Fast, lower quality
- 50: Balanced (recommended)
- 80-100: Slow, highest quality

**Guidance Scale (1.0-15.0)**: How closely to follow the prompt
- 1.0-3.0: More creative, less faithful to prompt
- 6.0: Balanced (recommended)
- 10.0-15.0: Very faithful to prompt, less creative

**Use LoRA**: Enable/disable LoRA fine-tuning

**LoRA Type**:
- High-Noise: Best for dynamic, action-heavy scenes
- Low-Noise: Best for subtle, smooth motions
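
For reference, this is roughly how those settings map onto a Diffusers pipeline call. A minimal sketch, assuming the `CogVideoXImageToVideoPipeline` mentioned below; the model ID here is a placeholder, and the real values live in `app.py`:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Model ID is an assumption; match it to MODEL_ID in app.py
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
).to("cuda")

video = pipe(
    image=load_image("input.jpg"),
    prompt="A cat walking through a garden, sunny day, high quality",
    num_inference_steps=50,  # "Inference Steps" slider
    guidance_scale=6.0,      # "Guidance Scale" slider
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```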
## Example Prompts

### Good Prompts
- "A cat walking through a garden, sunny day, high quality"
- "Waves crashing on a beach, sunset lighting, cinematic"
- "A car driving down a highway, fast motion, 4k"
- "Smoke rising from a campfire, slow motion"
### Tips for Better Results
- **Be Specific**: Include details about motion, lighting, and quality
- **Use Keywords**: "cinematic", "high quality", "4k", "smooth"
- **Describe Motion**: Clearly state what should move and how
- **Consider Style**: Add style descriptors like "photorealistic" or "animated"
## Troubleshooting

### Out of Memory Error
If you encounter OOM errors:
- The model requires significant VRAM (16 GB+ recommended)
- On Hugging Face Spaces, ensure you're using at least `gpu-medium` hardware
- For local runs, try reducing the number of frames or enabling CPU offloading, as shown below
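
With Diffusers, CPU offloading and VAE tiling usually give the biggest savings. A minimal sketch, assuming a pipeline object `pipe` as in `app.py` (exact methods can vary by Diffusers version):

```python
# Trade speed for VRAM: submodules move to the GPU only while in use
pipe.enable_model_cpu_offload()

# Decode the VAE in tiles/slices to cut peak memory during video decoding
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
```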
### Slow Generation
- First generation will be slower (model downloads)
- Reduce inference steps for faster results
- Ensure GPU is being used (check logs for "Loading model on cuda")
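
If you're unsure whether the GPU is visible at all, a quick check from the same environment the app runs in:

```python
import torch

# True means the pipeline will load on CUDA; False means it fell back to CPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```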
### Model Not Loading
If the model fails to load:
- Check your internet connection (model is ~20GB)
- Ensure sufficient disk space
- For Hugging Face Spaces, check your Space's logs
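
To avoid timeouts on first launch, you can also pre-fetch the weights with `huggingface_hub` before starting the app. The repo ID below is a placeholder; use the one set as `MODEL_ID` in `app.py`:

```python
from huggingface_hub import snapshot_download

# Downloads (or resumes) the full model repo into the local HF cache
snapshot_download("THUDM/CogVideoX-5b-I2V")
```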
## Customization

### Using Your Own LoRA Files
To use your own LoRA weights:
1. Upload your LoRA `.safetensors` files to Hugging Face
2. Update the URLs in `app.py`:

```python
HIGH_NOISE_LORA_URL = "https://huggingface.co/YOUR_USERNAME/YOUR_REPO/resolve/main/your_lora.safetensors"
```

3. Uncomment and implement the LoRA loading code in the `generate_video` function
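
If you implement that step with standard Diffusers LoRA support, it could look like the sketch below. The repo, file name, and adapter name are placeholders:

```python
# Inside generate_video, after the pipeline is created:
pipe.load_lora_weights(
    "YOUR_USERNAME/YOUR_REPO",            # repo holding the .safetensors file
    weight_name="your_lora.safetensors",
    adapter_name="high_noise",
)
pipe.set_adapters(["high_noise"], adapter_weights=[1.0])  # LoRA strength
```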
### Changing the Model
To use a different model:
1. Update `MODEL_ID` in `app.py`
2. Ensure the model is compatible with `CogVideoXImageToVideoPipeline`
3. Adjust memory optimizations if needed
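
For example (a sketch only; the checkpoint name is illustrative, and any repo exposing the same pipeline class should work):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline

MODEL_ID = "THUDM/CogVideoX1.5-5B-I2V"  # placeholder; set your own in app.py
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16
)
```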
## Performance Notes
- **GPU (A10G/T4)**: ~2-3 minutes per video
- **GPU (A100)**: ~1-2 minutes per video
- **CPU**: Not recommended (20+ minutes)
## API Access
For programmatic access, you can use the Gradio Client:
```python
from gradio_client import Client

client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")

# Note: recent gradio_client versions expect files wrapped in handle_file(),
# e.g. image=handle_file("path/to/image.jpg")
result = client.predict(
    image="path/to/image.jpg",
    prompt="A cat walking",
    api_name="/predict",
)
print(result)  # the server's output, e.g. a path to the generated video
```
## Credits
- **Model**: CogVideoX by THUDM
- **Framework**: Hugging Face Diffusers
- **Interface**: Gradio
## License
Apache 2.0 - See LICENSE file for details