---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-video
tags:
- zen
- hanzo-ai
- video-generation
- text-to-video
- image-to-video
- wan2.2
- diffusion
base_model: Wan-AI/Wan2.2-TI2V-5B
---

# Zen Director

Video generation model based on Wan 2.2, specialized for text-to-video and image-to-video generation.

## Base Model

Built on **[Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)** - a 5B-parameter text- and image-to-video (TI2V) model.

**Note:** This model is based on Wan 2.2. Wan 2.5 has been announced but is not yet open source; we will upgrade to Wan 2.5 when it becomes available.

## Capabilities

- **Text-to-Video**: Generate videos from text descriptions
- **Image-to-Video**: Animate static images into videos
- **High Resolution**: Generates video at up to 1280x720
- **Efficient**: Compact 5B architecture optimized for fast inference

## Model Details

- **Architecture**: Diffusion Transformer (DiT)
- **Parameters**: 5B total
- **Base**: Wan 2.2 TI2V
- **Resolution**: Up to 1280x720
- **Frame Rate**: 24 FPS
- **Duration**: Up to 5 seconds (120 frames at 24 FPS)

## Installation

```bash
pip install diffusers transformers accelerate torch
pip install av opencv-python pillow
```

## Usage

### Text-to-Video

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the model
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate video from text
prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
output = pipe(prompt, num_frames=120, height=720, width=1280)
video = output.frames[0]  # first (and only) video in the batch

# Save video
export_to_video(video, "output.mp4", fps=24)
```

### Image-to-Video

```python
from PIL import Image

# Load the starting image (reuses the pipeline loaded above)
image = Image.open("input.jpg")

# Generate video from image
output = pipe(
    prompt="Animate this image with gentle camera movement",
    image=image,
    num_frames=120
)
video = output.frames[0]

export_to_video(video, "animated.mp4", fps=24)
```

## Performance

- **Inference Speed**: ~2-3 seconds/frame on A100
- **Memory**: Requires 24GB+ VRAM for full resolution
- **Precision**: FP16 recommended for consumer GPUs (see the memory-saving sketch at the end of this card)

## Roadmap

- ✅ **v1.0** - Wan 2.2 TI2V-5B base (current)
- 🔄 **v2.0** - Upgrade to Wan 2.5 when open-source
- 📋 **Future** - Fine-tuning for specific styles and domains

## Limitations

- Requires a high-end GPU (24GB+ VRAM recommended)
- Video duration limited to 5 seconds
- Best results with detailed, specific prompts
- Some motion artifacts in complex scenes

## Citation

```bibtex
@misc{zen-director-2025,
  title={Zen Director: Video Generation with Wan 2.2},
  author={Hanzo AI},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
}

@article{wan2024,
  title={Wan 2.2: High-Quality Video Generation},
  author={Wan-AI Team},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Apache 2.0
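
## Memory-Efficient Inference

The 24GB+ VRAM figure above assumes the whole pipeline stays resident on one GPU. On smaller cards, diffusers' offloading hooks can trade speed for memory. The snippet below is a minimal sketch built on the same loading call as the usage examples; `enable_model_cpu_offload()` is a standard diffusers/accelerate feature, while the VAE tiling call is an assumption guarded with `hasattr` because not every pipeline exposes it.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load in half precision, as recommended in the Performance section.
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16,
)

# Instead of pipe.to("cuda"), stream submodules to the GPU one at a time.
# Each denoising step is slower, but peak VRAM drops substantially.
pipe.enable_model_cpu_offload()

# Optional: decode video latents in tiles if this pipeline's VAE supports it
# (assumption: not every diffusers VAE implements enable_tiling()).
if hasattr(pipe, "vae") and hasattr(pipe.vae, "enable_tiling"):
    pipe.vae.enable_tiling()

prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
output = pipe(prompt, num_frames=120, height=720, width=1280)
export_to_video(output.frames[0], "output_low_vram.mp4", fps=24)
```

If memory is still tight, lowering `num_frames`, `height`, or `width` reduces the activation footprint roughly in proportion to the number of latent tokens.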