---
license: apache-2.0
tags:
- zen-research
- zen-ai
- hypermodal
- text-to-video
language:
- en
library_name: transformers
pipeline_tag: text-to-video
---

# zen-director

5B-parameter text/image-to-video generation model for professional video synthesis.

## Model Details

- **Developed by**: Zen Research Authors
- **Organization**: Zen Research DAO under [Zoo Labs Inc](https://github.com/zenlm) (501(c)(3) Non-Profit)
- **Location**: San Francisco, California, USA
- **Model type**: text-to-video
- **Architecture**: Diffusion Transformer (5B)
- **Parameters**: 5B
- **License**: Apache 2.0
- **Training**: Trained with [Zen Gym](https://github.com/zenlm/zen-gym)
- **Inference**: Optimized for [Zen Engine](https://github.com/zenlm/zen-engine)

## 🌟 Zen AI Ecosystem

This model is part of the **Zen Research** hypermodal AI family, a comprehensive open-source AI ecosystem.

### Complete Model Family

**Language Models:**
- [zen-nano-0.6b](https://huggingface.co/zenlm/zen-nano-0.6b) - 0.6B edge model (44K tokens/sec)
- [zen-eco-4b-instruct](https://huggingface.co/zenlm/zen-eco-4b-instruct) - 4B instruction model
- [zen-eco-4b-thinking](https://huggingface.co/zenlm/zen-eco-4b-thinking) - 4B reasoning model
- [zen-agent-4b](https://huggingface.co/zenlm/zen-agent-4b) - 4B tool-calling agent

**3D & World Generation:**
- [zen-3d](https://huggingface.co/zenlm/zen-3d) - Controllable 3D asset generation
- [zen-voyager](https://huggingface.co/zenlm/zen-voyager) - Camera-controlled world exploration
- [zen-world](https://huggingface.co/zenlm/zen-world) - Large-scale world simulation

**Video Generation:**
- [zen-director](https://huggingface.co/zenlm/zen-director) - Text/image-to-video (5B)
- [zen-video](https://huggingface.co/zenlm/zen-video) - Professional video synthesis
- [zen-video-i2v](https://huggingface.co/zenlm/zen-video-i2v) - Image-to-video animation

**Audio Generation:**
- [zen-musician](https://huggingface.co/zenlm/zen-musician) - Music generation (7B)
- [zen-foley](https://huggingface.co/zenlm/zen-foley) - Video-to-audio Foley effects

**Infrastructure:**
- [Zen Gym](https://github.com/zenlm/zen-gym) - Unified training platform
- [Zen Engine](https://github.com/zenlm/zen-engine) - High-performance inference

## Usage

### Quick Start

```python
from zen_director import ZenDirectorPipeline

# Load the text-to-video pipeline (a diffusion model, so it is not
# loaded through transformers' AutoModelForCausalLM)
pipeline = ZenDirectorPipeline.from_pretrained("zenlm/zen-director")

video = pipeline(
    prompt="A cinematic shot of a sunset over mountains",
    num_frames=120,
    fps=24,
    resolution=(1280, 720)
)
video.save("output.mp4")
```

### With Zen Engine

```bash
# High-performance inference via Zen Engine
zen-engine serve --model zenlm/zen-director --port 3690
```

```python
# OpenAI-compatible API
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3690/v1")
response = client.chat.completions.create(
    model="zenlm/zen-director",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Training

Fine-tune with [Zen Gym](https://github.com/zenlm/zen-gym):

```bash
git clone https://github.com/zenlm/zen-gym
cd zen-gym

# LoRA fine-tuning
llamafactory-cli train --config configs/zen_lora.yaml \
    --model_name_or_path zenlm/zen-director

# GRPO reinforcement learning (40-60% memory reduction)
llamafactory-cli train --config configs/zen_grpo.yaml \
    --model_name_or_path zenlm/zen-director
```

Supported methods: LoRA, QLoRA, DoRA, GRPO, GSPO, DPO, PPO, KTO, ORPO, SimPO, Unsloth

## Performance

- **Speed**: ~60s for a 5-second video (RTX 4090)
- **Resolution**: Up to 1280x720 at 24 FPS
- **Duration**: Up to 10 seconds
- **Quality**: Professional-grade video synthesis

## Ethical Considerations

- **Open Research**: Released under Apache 2.0 for maximum accessibility
- **Environmental Impact**: Optimized for eco-friendly deployment
- **Transparency**: Full training details and model architecture disclosed
- **Safety**: Comprehensive testing and evaluation
- **Non-Profit**: Developed by Zoo Labs Inc (501(c)(3)) for public benefit

## Citation

```bibtex
@misc{zendirector2025,
  title={zen-director: 5B parameter text/image-to-video generation model for professional video synthesis},
  author={Zen Research Authors},
  year={2025},
  publisher={Zoo Labs Inc},
  organization={Zen Research DAO},
  url={https://huggingface.co/zenlm/zen-director}
}
```

## Links

- **Organization**: [github.com/zenlm](https://github.com/zenlm) • [huggingface.co/zenlm](https://huggingface.co/zenlm)
- **Training Platform**: [Zen Gym](https://github.com/zenlm/zen-gym)
- **Inference Engine**: [Zen Engine](https://github.com/zenlm/zen-engine)
- **Parent Org**: [Zoo Labs Inc](https://github.com/zenlm) (501(c)(3) Non-Profit, San Francisco)
- **Contact**: dev@hanzo.ai • +1 (913) 777-4443

## License

Apache License 2.0

Copyright 2025 Zen Research Authors

---

**Zen Research** - Building open, eco-friendly AI for everyone 🌱
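## Appendix: Parameter Sanity Check

The Performance limits (up to 10 seconds of video at 24 FPS, up to 1280x720) imply a simple relationship, `num_frames = duration × fps`, matching the Quick Start's 120 frames for a 5-second clip. A minimal pre-flight check is sketched below; `check_params` and its constants are illustrative helpers, not part of the zen-director API.

```python
# Illustrative sanity check against the limits stated in the
# Performance section; not part of the zen-director package.

MAX_DURATION_S = 10          # maximum clip duration in seconds
MAX_RESOLUTION = (1280, 720) # maximum width x height
DEFAULT_FPS = 24             # default frame rate

def check_params(duration_s, fps=DEFAULT_FPS, resolution=MAX_RESOLUTION):
    """Validate a request and return the frame count to pass as num_frames."""
    if duration_s <= 0 or duration_s > MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    if resolution[0] > MAX_RESOLUTION[0] or resolution[1] > MAX_RESOLUTION[1]:
        raise ValueError(f"resolution exceeds {MAX_RESOLUTION}")
    return int(duration_s * fps)

print(check_params(5))  # 5 s at 24 FPS -> 120 frames
```

The returned value can be passed directly as `num_frames` to the pipeline call shown in the Quick Start.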