---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-video
tags:
- zen
- hanzo-ai
- video-generation
- text-to-video
- image-to-video
- wan2.2
- diffusion
base_model: Wan-AI/Wan2.2-TI2V-5B
---
# Zen Director
A video generation model based on Wan 2.2, specialized for text-to-video and image-to-video generation.
## Base Model
Built on **[Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)** - Text-to-Image-to-Video model with 5B parameters.
**Note:** This model is based on Wan 2.2. Wan 2.5 has been announced but is not yet open-source; we will upgrade when it becomes available.
## Capabilities
- **Text-to-Video**: Generate videos from text descriptions
- **Image-to-Video**: Animate static images into videos
- **High Resolution**: Supports video generation at up to 1280x720
- **Efficient**: Compact 5B architecture suited to single-GPU inference
## Model Details
- **Architecture**: Diffusion Transformer (DiT)
- **Parameters**: 5B total
- **Base**: Wan 2.2 TI2V
- **Resolution**: Up to 1280x720
- **Frame Rate**: 24 FPS
- **Duration**: Up to 5 seconds
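The frame rate and duration figures above determine the `num_frames` value used in the examples below:
```python
# 24 FPS x 5 s maximum duration = 120 frames per clip,
# matching num_frames=120 in the usage examples.
fps = 24
max_seconds = 5
num_frames = fps * max_seconds  # 120
```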
## Installation
```bash
pip install diffusers transformers accelerate torch
pip install av opencv-python pillow
```
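To confirm the environment is set up before downloading the 5B checkpoint, a quick sanity check (a minimal sketch; it loads no model weights):
```python
import torch
import diffusers

# Video generation at this scale effectively requires a CUDA GPU.
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
```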
## Usage
### Text-to-Video
```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the model
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate video from text
prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
output = pipe(prompt, num_frames=120, height=720, width=1280)
video = output.frames[0]  # take the first (and only) video in the batch

# Save video
export_to_video(video, "output.mp4", fps=24)
```
### Image-to-Video
```python
from PIL import Image

# Load the starting image
image = Image.open("input.jpg").convert("RGB")

# Generate video from the image (reuses the pipeline loaded above)
output = pipe(
    prompt="Animate this image with gentle camera movement",
    image=image,
    num_frames=120
)
video = output.frames[0]

export_to_video(video, "animated.mp4", fps=24)
```
## Performance
- **Inference Speed**: ~2-3 seconds/frame on A100
- **Memory**: Requires 24GB+ VRAM for full resolution
- **Precision**: FP16 recommended; on consumer GPUs, combine with CPU offloading (see the sketch below)
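For GPUs below the 24GB recommendation, the standard diffusers offloading hooks can reduce peak VRAM at the cost of speed. A minimal sketch, assuming the model loads through the generic `DiffusionPipeline` interface shown above:
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16
)

# Move submodules to the GPU only while they run (requires `accelerate`);
# lowers peak VRAM in exchange for slower generation.
pipe.enable_model_cpu_offload()

# For very tight memory budgets, sequential offloading is slower still but lighter:
# pipe.enable_sequential_cpu_offload()
```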
## Roadmap
- โœ… **v1.0** - Wan 2.2 TI2V-5B base (current)
- ๐Ÿ”„ **v2.0** - Upgrade to Wan 2.5 when open-source
- ๐Ÿ“‹ **Future** - Fine-tuning for specific styles and domains
## Limitations
- Requires high-end GPU (24GB+ VRAM recommended)
- Video duration limited to 5 seconds
- Best results with detailed, specific prompts (see the example after this list)
- Some motion artifacts in complex scenes
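As an illustration of the level of prompt detail that tends to work well (a hypothetical example, not a tuned recommendation; `negative_prompt` is assumed to be supported, as in most diffusers text-to-video pipelines):
```python
# Hypothetical detailed prompt; vaguer prompts tend to produce less coherent motion.
prompt = (
    "A red vintage car driving along a coastal road at golden hour, "
    "camera tracking from the side, soft lens flare, photorealistic"
)
negative_prompt = "blurry, distorted, flickering, low quality"

video = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_frames=120,
    height=720,
    width=1280,
).frames[0]
```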
## Citation
```bibtex
@misc{zen-director-2025,
  title={Zen Director: Video Generation with Wan 2.2},
  author={Hanzo AI},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
}

@article{wan2024,
  title={Wan 2.2: High-Quality Video Generation},
  author={Wan-AI Team},
  journal={arXiv preprint},
  year={2024}
}
```
## License
Apache 2.0
---
**Note**: Zen Director is based on Wan 2.2 and will be upgraded to Wan 2.5 once it becomes open-source.