Spaces:
Running
Running
# Models Overview | |
WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations. | |
## Wan 2.1 Text2Video Models | |
Please note that that the term *Text2Video* refers to the underlying Wan architecture but as it has been greatly improved overtime many derived Text2Video models can now generate videos using images. | |
#### Wan 2.1 Text2Video 1.3B | |
- **Size**: 1.3 billion parameters | |
- **VRAM**: 6GB minimum | |
- **Speed**: Fast generation | |
- **Quality**: Good quality for the size | |
- **Best for**: Quick iterations, lower-end hardware | |
- **Command**: `python wgp.py --t2v-1-3B` | |
#### Wan 2.1 Text2Video 14B | |
- **Size**: 14 billion parameters | |
- **VRAM**: 12GB+ recommended | |
- **Speed**: Slower but higher quality | |
- **Quality**: Excellent detail and coherence | |
- **Best for**: Final production videos | |
- **Command**: `python wgp.py --t2v-14B` | |
#### Wan Vace 1.3B | |
- **Type**: ControlNet for advanced video control | |
- **VRAM**: 6GB minimum | |
- **Features**: Motion transfer, object injection, inpainting | |
- **Best for**: Advanced video manipulation | |
- **Command**: `python wgp.py --vace-1.3B` | |
#### Wan Vace 14B | |
- **Type**: Large ControlNet model | |
- **VRAM**: 12GB+ recommended | |
- **Features**: All Vace features with higher quality | |
- **Best for**: Professional video editing workflows | |
#### MoviiGen (Experimental) | |
- **Resolution**: Claims 1080p capability | |
- **VRAM**: 20GB+ required | |
- **Speed**: Very slow generation | |
- **Features**: Should generate cinema like video, specialized for 2.1 / 1 ratios | |
- **Status**: Experimental, feedback welcome | |
<BR> | |
## Wan 2.1 Image-to-Video Models | |
#### Wan 2.1 Image2Video 14B | |
- **Size**: 14 billion parameters | |
- **VRAM**: 12GB+ recommended | |
- **Speed**: Slower but higher quality | |
- **Quality**: Excellent detail and coherence | |
- **Best for**: Most Loras available work with this model | |
- **Command**: `python wgp.py --i2v-14B` | |
#### FLF2V | |
- **Type**: Start/end frame specialist | |
- **Resolution**: Optimized for 720p | |
- **Official**: Wan team supported | |
- **Use case**: Image-to-video with specific endpoints | |
<BR> | |
## Wan 2.1 Specialized Models | |
#### FantasySpeaking | |
- **Type**: Talking head animation | |
- **Input**: Voice track + image | |
- **Works on**: People and objects | |
- **Use case**: Lip-sync and voice-driven animation | |
#### Phantom | |
- **Type**: Person/object transfer | |
- **Resolution**: Works well at 720p | |
- **Requirements**: 30+ steps for good results | |
- **Best for**: Transferring subjects between videos | |
#### Recam Master | |
- **Type**: Viewpoint change | |
- **Requirements**: 81+ frame input videos, 15+ denoising steps | |
- **Use case**: View same scene from different angles | |
#### Sky Reels v2 | |
- **Type**: Diffusion Forcing model | |
- **Specialty**: "Infinite length" videos | |
- **Features**: High quality continuous generation | |
<BR> | |
## Wan Fun InP Models | |
#### Wan Fun InP 1.3B | |
- **Size**: 1.3 billion parameters | |
- **VRAM**: 6GB minimum | |
- **Quality**: Good for the size, accessible to lower hardware | |
- **Best for**: Entry-level image animation | |
- **Command**: `python wgp.py --i2v-1-3B` | |
#### Wan Fun InP 14B | |
- **Size**: 14 billion parameters | |
- **VRAM**: 12GB+ recommended | |
- **Quality**: Better end image support | |
- **Limitation**: Existing loras don't work as well | |
<BR> | |
## Wan Special Loras | |
### Safe-Forcing lightx2v Lora | |
- **Type**: Distilled model (Lora implementation) | |
- **Speed**: 4-8 steps generation, 2x faster (no classifier free guidance) | |
- **Compatible**: Works with t2v and i2v Wan 14B models | |
- **Setup**: Requires Safe-Forcing lightx2v Lora (see [LORAS.md](LORAS.md)) | |
### Causvid Lora | |
- **Type**: Distilled model (Lora implementation) | |
- **Speed**: 4-12 steps generation, 2x faster (no classifier free guidance) | |
- **Compatible**: Works with Wan 14B models | |
- **Setup**: Requires CausVid Lora (see [LORAS.md](LORAS.md)) | |
<BR> | |
## Hunyuan Video Models | |
#### Hunyuan Video Text2Video | |
- **Quality**: Among the best open source t2v models | |
- **VRAM**: 12GB+ recommended | |
- **Speed**: Slower generation but excellent results | |
- **Features**: Superior text adherence and video quality, up to 10s of video | |
- **Best for**: High-quality text-to-video generation | |
#### Hunyuan Video Custom | |
- **Specialty**: Identity preservation | |
- **Use case**: Injecting specific people into videos | |
- **Quality**: Excellent for character consistency | |
- **Best for**: Character-focused video generation | |
#### Hunyuan Video Avater | |
- **Specialty**: Generate up to 15s of high quality speech / song driven Video . | |
- **Use case**: Injecting specific people into videos | |
- **Quality**: Excellent for character consistency | |
- **Best for**: Character-focused video generation, Video synchronized with voice | |
<BR> | |
## LTX Video Models | |
#### LTX Video 13B | |
- **Specialty**: Long video generation | |
- **Resolution**: Fast 720p generation | |
- **VRAM**: Optimized by WanGP (4x reduction in requirements) | |
- **Best for**: Longer duration videos | |
#### LTX Video 13B Distilled | |
- **Speed**: Generate in less than one minute | |
- **Quality**: Very high quality despite speed | |
- **Best for**: Rapid prototyping and quick results | |
<BR> | |
## Model Selection Guide | |
### By Hardware (VRAM) | |
#### 6-8GB VRAM | |
- Wan 2.1 T2V 1.3B | |
- Wan Fun InP 1.3B | |
- Wan Vace 1.3B | |
#### 10-12GB VRAM | |
- Wan 2.1 T2V 14B | |
- Wan Fun InP 14B | |
- Hunyuan Video (with optimizations) | |
- LTX Video 13B | |
#### 16GB+ VRAM | |
- All models supported | |
- Longer videos possible | |
- Higher resolutions | |
- Multiple simultaneous Loras | |
#### 20GB+ VRAM | |
- MoviiGen (experimental 1080p) | |
- Very long videos | |
- Maximum quality settings | |
### By Use Case | |
#### Quick Prototyping | |
1. **LTX Video 13B Distilled** - Fastest, high quality | |
2. **Wan 2.1 T2V 1.3B** - Fast, good quality | |
3. **CausVid Lora** - 4-12 steps, very fast | |
#### Best Quality | |
1. **Hunyuan Video** - Overall best t2v quality | |
2. **Wan 2.1 T2V 14B** - Excellent Wan quality | |
3. **Wan Vace 14B** - Best for controlled generation | |
#### Advanced Control | |
1. **Wan Vace 14B/1.3B** - Motion transfer, object injection | |
2. **Phantom** - Person/object transfer | |
3. **FantasySpeaking** - Voice-driven animation | |
#### Long Videos | |
1. **LTX Video 13B** - Specialized for length | |
2. **Sky Reels v2** - Infinite length videos | |
3. **Wan Vace + Sliding Windows** - Up to 1 minute | |
#### Lower Hardware | |
1. **Wan Fun InP 1.3B** - Image-to-video | |
2. **Wan 2.1 T2V 1.3B** - Text-to-video | |
3. **Wan Vace 1.3B** - Advanced control | |
<BR> | |
## Performance Comparison | |
### Speed (Relative) | |
1. **CausVid Lora** (4-12 steps) - Fastest | |
2. **LTX Video Distilled** - Very fast | |
3. **Wan 1.3B models** - Fast | |
4. **Wan 14B models** - Medium | |
5. **Hunyuan Video** - Slower | |
6. **MoviiGen** - Slowest | |
### Quality (Subjective) | |
1. **Hunyuan Video** - Highest overall | |
2. **Wan 14B models** - Excellent | |
3. **LTX Video models** - Very good | |
4. **Wan 1.3B models** - Good | |
5. **CausVid** - Good (varies with steps) | |
### VRAM Efficiency | |
1. **Wan 1.3B models** - Most efficient | |
2. **LTX Video** (with WanGP optimizations) | |
3. **Wan 14B models** | |
4. **Hunyuan Video** | |
5. **MoviiGen** - Least efficient | |
<BR> | |
## Model Switching | |
WanGP allows switching between models without restarting: | |
1. Use the dropdown menu in the web interface | |
2. Models are loaded on-demand | |
3. Previous model is unloaded to save VRAM | |
4. Settings are preserved when possible | |
<BR> | |
## Tips for Model Selection | |
### First Time Users | |
Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware. | |
### Production Work | |
Use **Hunyuan Video** or **Wan 14B** models for final output quality. | |
### Experimentation | |
**CausVid Lora** or **LTX Distilled** for rapid iteration and testing. | |
### Specialized Tasks | |
- **VACE** for advanced control | |
- **FantasySpeaking** for talking heads | |
- **LTX Video** for long sequences | |
### Hardware Optimization | |
Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs. |