---
license: apache-2.0
language:
  - en
  - zh
pipeline_tag: text-to-video
tags:
  - zen
  - hanzo-ai
  - video-generation
  - text-to-video
  - image-to-video
  - wan2.2
  - diffusion
base_model: Wan-AI/Wan2.2-TI2V-5B
---

# Zen Director
A video generation model based on Wan 2.2, specialized for text-to-video and image-to-video tasks.

## Base Model

Built on [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B), a text-and-image-to-video (TI2V) model with 5B parameters.

**Note:** This model is based on Wan 2.2. Wan 2.5 has been announced but is not yet open-source; we will upgrade when it becomes available.
## Capabilities

- **Text-to-Video**: Generate videos from text descriptions
- **Image-to-Video**: Animate static images into videos
- **High Resolution**: Supports video generation at up to 1280x720
- **Efficient**: MoE architecture optimized for fast inference
## Model Details

- **Architecture**: Mixture-of-Experts (MoE) Transformer
- **Parameters**: 5B total
- **Base**: Wan 2.2 TI2V
- **Resolution**: Up to 1280x720
- **Frame Rate**: 24 FPS
- **Duration**: Up to 5 seconds (see the frame-count sketch below)
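Frame count and clip duration are tied together by the 24 FPS frame rate: a 5-second clip is 120 frames, which is the `num_frames` value used in the usage examples below. A minimal sketch of that arithmetic (the helper name `frames_for` is ours, not part of any API):

```python
FPS = 24          # model frame rate
MAX_SECONDS = 5   # maximum clip duration

def frames_for(seconds: float) -> int:
    """Clamp the requested duration to the 5-second limit and
    convert it to a frame count at 24 FPS."""
    return int(min(seconds, MAX_SECONDS) * FPS)

print(frames_for(5))  # 120 -- the num_frames used in the examples below
```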
## Installation

```bash
pip install diffusers transformers accelerate torch
pip install av opencv-python pillow
```
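Before loading the 5B checkpoint, it is worth confirming the environment is in order. A quick sanity-check sketch using only standard torch/diffusers attributes:

```python
import torch
import diffusers

# Confirm the core packages import and report their versions.
print(f"diffusers {diffusers.__version__}, torch {torch.__version__}")

# The model expects a CUDA GPU (24GB+ VRAM recommended; see Performance).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```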
## Usage

### Text-to-Video

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the model in half precision
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate a 5-second video from text (120 frames at 24 FPS)
prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
video = pipe(prompt, num_frames=120, height=720, width=1280).frames[0]

# Save the video
export_to_video(video, "output.mp4", fps=24)
```
### Image-to-Video

```python
from PIL import Image

# Load the starting image
image = Image.open("input.jpg")

# Generate a video from the image
video = pipe(
    prompt="Animate this image with gentle camera movement",
    image=image,
    num_frames=120
).frames[0]

export_to_video(video, "animated.mp4", fps=24)
```
## Performance

- **Inference Speed**: ~2-3 seconds/frame on an A100 (see the timing sketch below)
- **Memory**: Requires 24GB+ VRAM for full resolution
- **Precision**: FP16 recommended for consumer GPUs
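To check the seconds-per-frame figure on your own hardware, a rough timing sketch (results vary with GPU, resolution, and step count):

```python
import time

prompt = "A serene sunset over a calm ocean"  # any short test prompt
num_frames = 24  # a 1-second clip at 24 FPS

# Time one generation and report per-frame latency.
start = time.perf_counter()
_ = pipe(prompt, num_frames=num_frames, height=720, width=1280).frames[0]
print(f"{(time.perf_counter() - start) / num_frames:.2f} s/frame")
```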
## Roadmap

- **v1.0** (current) - Wan 2.2 TI2V-5B base
- **v2.0** (planned) - Upgrade to Wan 2.5 when it becomes open-source
- **Future** - Fine-tuning for specific styles and domains
## Limitations

- Requires a high-end GPU (24GB+ VRAM recommended)
- Video duration limited to 5 seconds
- Best results with detailed, specific prompts
- Some motion artifacts in complex scenes
## Citation

```bibtex
@misc{zen-director-2025,
  title={Zen Director: Video Generation with Wan 2.2},
  author={Hanzo AI},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
}

@article{wan2024,
  title={Wan 2.2: High-Quality Video Generation},
  author={Wan-AI Team},
  journal={arXiv preprint},
  year={2024}
}
```
## License

Apache 2.0

**Note:** Based on Wan 2.2; will be upgraded to Wan 2.5 when it becomes open-source.