---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-video
tags:
- zen
- hanzo-ai
- video-generation
- text-to-video
- image-to-video
- wan2.2
- diffusion
base_model: Wan-AI/Wan2.2-TI2V-5B
---

# Zen Director

Video generation model based on Wan 2.2, specialized for text-to-video and image-to-video generation.

## Base Model

Built on **[Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)**, a text-to-image-to-video model with 5B parameters.

**Note:** This release is based on Wan 2.2. Wan 2.5 has been announced but is not yet open source; we will upgrade when it becomes available.

## Capabilities

- **Text-to-Video**: Generate videos from text descriptions
- **Image-to-Video**: Animate static images into videos
- **High Resolution**: Generates video at up to 1280x720 and 24 FPS
- **Efficient**: Compact 5B model suited to single-GPU inference

## Model Details

- **Architecture**: Diffusion Transformer (DiT)
- **Parameters**: 5B total
- **Base**: Wan 2.2 TI2V
- **Resolution**: Up to 1280x720
- **Frame Rate**: 24 FPS
- **Duration**: Up to 5 seconds

## Installation

```bash
pip install diffusers transformers accelerate torch
pip install av opencv-python pillow
```

## Usage

### Text-to-Video

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the model in half precision to reduce memory use
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate video from text (120 frames at 24 FPS = 5 seconds)
prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
video = pipe(prompt, num_frames=120, height=720, width=1280).frames[0]

# Save video
export_to_video(video, "output.mp4", fps=24)
```
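
The defaults above can be adjusted for quality and reproducibility. A minimal sketch using the standard diffusers sampler arguments (`num_inference_steps`, `guidance_scale`, `generator`); the step count and guidance value shown here are illustrative, not tuned recommendations:

```python
import torch

# Seed the generator for reproducible output
generator = torch.Generator(device="cuda").manual_seed(42)

video = pipe(
    prompt="A timelapse of storm clouds rolling over a mountain ridge",
    num_frames=120,
    height=720,
    width=1280,
    num_inference_steps=40,  # fewer steps is faster but lower fidelity
    guidance_scale=5.0,      # higher values follow the prompt more literally
    generator=generator,
).frames[0]
export_to_video(video, "storm.mp4", fps=24)
```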

### Image-to-Video

```python
from PIL import Image
from diffusers.utils import export_to_video

# Load the starting image (reuses the `pipe` loaded above)
image = Image.open("input.jpg").convert("RGB")

# Generate video conditioned on the image
video = pipe(
    prompt="Animate this image with gentle camera movement",
    image=image,
    num_frames=120
).frames[0]

export_to_video(video, "animated.mp4", fps=24)
```
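
If the source image does not match the output resolution, resizing it up front can help avoid unexpected cropping or stretching. A small sketch, assuming a 1280x720 target to match the resolution listed under Model Details:

```python
from PIL import Image

# Resize the conditioning image to the intended output resolution
image = Image.open("input.jpg").convert("RGB").resize((1280, 720))
```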

## Performance

- **Inference Speed**: ~2-3 seconds per frame on an A100
- **Memory**: Requires 24GB+ VRAM at full resolution
- **Quantization**: FP16 recommended for consumer GPUs (see the offloading sketch below)
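
For GPUs under the 24GB mark, the generic diffusers memory helpers can trade speed for headroom. A minimal sketch, assuming the standard offloading and VAE tiling APIs apply to this pipeline (verify on your own setup):

```python
# Keep submodules on the CPU and move them to the GPU only when needed
pipe.enable_model_cpu_offload()

# Decode video latents in tiles to cap VAE memory, if the VAE supports it
if hasattr(pipe, "vae") and hasattr(pipe.vae, "enable_tiling"):
    pipe.vae.enable_tiling()
```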

## Roadmap

- **v1.0** - Wan 2.2 TI2V-5B base (current)
- **v2.0** - Upgrade to Wan 2.5 when it becomes open source
- **Future** - Fine-tuning for specific styles and domains

## Limitations

- Requires a high-end GPU (24GB+ VRAM recommended)
- Video duration is limited to 5 seconds
- Best results come from detailed, specific prompts (see the example below)
- Some motion artifacts can appear in complex scenes
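
To illustrate the point about prompt detail, here is a vague and a detailed prompt for the same scene (both invented for this example):

```python
# Vague: leaves framing, lighting, and motion to chance
prompt_vague = "a city street at night"

# Detailed: specifies camera motion, lighting, subjects, and depth of field
prompt_detailed = (
    "A slow dolly shot down a rain-soaked city street at night, neon signs "
    "reflecting off puddles, pedestrians with umbrellas, shallow depth of field"
)
```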

## Citation

```bibtex
@misc{zen-director-2025,
  title={Zen Director: Video Generation with Wan 2.2},
  author={Hanzo AI},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
}

@article{wan2024,
  title={Wan 2.2: High-Quality Video Generation},
  author={Wan-AI Team},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Apache 2.0

---

**Note**: Based on Wan 2.2. Will be upgraded to Wan 2.5 when it becomes open-source.