
Usage Guide - WAN 2.2 Image-to-Video LoRA Demo

Quick Start

1. Deploying to Hugging Face Spaces

To deploy this demo to Hugging Face Spaces:

# Install git-lfs if not already installed
git lfs install

# Create a new Space on huggingface.co
# Then clone your Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# Copy all files from this demo into the Space repository
cp -r /path/to/this-demo/* .

# Commit and push
git add .
git commit -m "Initial commit: WAN 2.2 Image-to-Video LoRA Demo"
git push

2. Running Locally

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py

The app will be available at http://localhost:7860

Using the Demo

Basic Usage

  1. Upload Image: Click the image upload area and select an image file
  2. Enter Prompt: Type a description of the motion you want (e.g., "A person walking forward, cinematic")
  3. Click Generate: Wait for the video to be generated (first run will download the model)
  4. View Result: The generated video will appear in the output area

Advanced Settings

Expand the "Advanced Settings" accordion to access:

  • Inference Steps (20-100): More steps = higher quality but slower generation
    • 20-30: Fast, lower quality
    • 50: Balanced (recommended)
    • 80-100: Slow, highest quality
  • Guidance Scale (1.0-15.0): How closely to follow the prompt
    • 1.0-3.0: More creative, less faithful to the prompt
    • 6.0: Balanced (recommended)
    • 10.0-15.0: Very faithful to the prompt, less creative
  • Use LoRA: Enable or disable LoRA fine-tuning
  • LoRA Type:
    • High-Noise: Best for dynamic, action-heavy scenes
    • Low-Noise: Best for subtle, smooth motion
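The recommendations above can be collected into presets that map onto diffusers-style pipeline keyword arguments. A minimal sketch (the preset names and the generation_kwargs helper are illustrative assumptions, not code from app.py):

```python
# Hypothetical presets reflecting the recommendations above.
# Keys match the corresponding diffusers pipeline keyword arguments.
PRESETS = {
    "fast":     {"num_inference_steps": 25, "guidance_scale": 6.0},
    "balanced": {"num_inference_steps": 50, "guidance_scale": 6.0},
    "quality":  {"num_inference_steps": 90, "guidance_scale": 6.0},
}

def generation_kwargs(preset="balanced", **overrides):
    """Return pipeline kwargs for a preset, with optional per-call overrides."""
    kwargs = dict(PRESETS[preset])
    kwargs.update(overrides)
    return kwargs

# e.g. generation_kwargs("fast", guidance_scale=3.0) for a quick, loose draft
```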

Example Prompts

Good Prompts

  • "A cat walking through a garden, sunny day, high quality"
  • "Waves crashing on a beach, sunset lighting, cinematic"
  • "A car driving down a highway, fast motion, 4k"
  • "Smoke rising from a campfire, slow motion"

Tips for Better Results

  1. Be Specific: Include details about motion, lighting, and quality
  2. Use Keywords: "cinematic", "high quality", "4k", "smooth"
  3. Describe Motion: Clearly state what should move and how
  4. Consider Style: Add style descriptors like "photorealistic" or "animated"
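The tips above can be sketched as a small helper that assembles a prompt from a motion description plus style keywords (build_prompt is a hypothetical illustration, not a function in app.py):

```python
def build_prompt(motion, style=("cinematic", "high quality")):
    """Join a motion description with style/quality keywords, comma-separated."""
    return ", ".join([motion, *style])

# build_prompt("A cat walking through a garden, sunny day")
# -> "A cat walking through a garden, sunny day, cinematic, high quality"
```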

Troubleshooting

Out of Memory Error

If you encounter OOM errors:

  1. The model requires significant VRAM (16GB+ recommended)
  2. On Hugging Face Spaces, ensure you're using at least gpu-medium hardware
  3. For local runs, try reducing the number of frames or using CPU offloading
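For local runs, CPU offloading and VAE tiling are the usual diffusers memory levers. A sketch of where they would go (apply_memory_optimizations is a hypothetical wrapper; whether each call helps depends on your GPU):

```python
def apply_memory_optimizations(pipe):
    """Reduce peak VRAM for a diffusers video pipeline (hypothetical helper).

    Trades generation speed for memory; useful when hitting OOM errors.
    """
    # Move idle submodules (text encoder, transformer, VAE) to CPU between uses
    pipe.enable_model_cpu_offload()
    # Decode video latents tile-by-tile instead of in one large pass
    pipe.vae.enable_tiling()
    return pipe
```

In app.py this would be applied once, right after the pipeline is created and before the first generation.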

Slow Generation

  • First generation will be slower (model downloads)
  • Reduce inference steps for faster results
  • Ensure GPU is being used (check logs for "Loading model on cuda")

Model Not Loading

If the model fails to load:

  1. Check your internet connection (model is ~20GB)
  2. Ensure sufficient disk space
  3. For Hugging Face Spaces, check your Space's logs

Customization

Using Your Own LoRA Files

To use your own LoRA weights:

  1. Upload your LoRA .safetensors files to Hugging Face
  2. Update the URLs in app.py:
     HIGH_NOISE_LORA_URL = "https://huggingface.co/YOUR_USERNAME/YOUR_REPO/resolve/main/your_lora.safetensors"
  3. Uncomment and implement the LoRA loading code in the generate_video function
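The loading step might look roughly like the following, using diffusers' load_lora_weights; the helper name, adapter name, and scale are illustrative assumptions, not code from app.py:

```python
def attach_lora(pipe, repo_or_path, adapter_name="custom", scale=1.0):
    """Load LoRA weights into a diffusers pipeline (hypothetical sketch)."""
    # repo_or_path: a Hub repo id or a local .safetensors path
    pipe.load_lora_weights(repo_or_path, adapter_name=adapter_name)
    # Weight the adapter's influence on generation
    pipe.set_adapters([adapter_name], adapter_weights=[scale])
    return pipe
```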

Changing the Model

To use a different model:

  1. Update MODEL_ID in app.py
  2. Ensure the model is compatible with CogVideoXImageToVideoPipeline
  3. Adjust memory optimizations if needed
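The swap can be sketched as a helper that rebuilds the pipeline from the new MODEL_ID; build_pipeline is hypothetical, and the pipeline class and kwargs should be whatever app.py already uses:

```python
def build_pipeline(pipeline_cls, model_id, **kwargs):
    """Rebuild the pipeline from a different checkpoint (hypothetical helper).

    pipeline_cls must match the new model's architecture,
    e.g. CogVideoXImageToVideoPipeline for CogVideoX checkpoints.
    """
    return pipeline_cls.from_pretrained(model_id, **kwargs)
```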

Performance Notes

  • GPU (A10G/T4): ~2-3 minutes per video
  • GPU (A100): ~1-2 minutes per video
  • CPU: Not recommended (20+ minutes)

API Access

For programmatic access, you can use the Gradio Client:

from gradio_client import Client

client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
result = client.predict(
    image="path/to/image.jpg",
    prompt="A cat walking",
    api_name="/predict"
)

Credits

  • Model: CogVideoX by THUDM
  • Framework: Hugging Face Diffusers
  • Interface: Gradio

License

Apache 2.0 - See LICENSE file for details