---
license: apache-2.0
tags:
- zen-research
- zen-ai
- hypermodal
- text-to-video
language:
- en
library_name: transformers
pipeline_tag: text-to-video
---

# zen-director

5B-parameter text/image-to-video generation model for professional video synthesis.

## Model Details

- **Developed by**: Zen Research Authors
- **Organization**: Zen Research DAO under [Zoo Labs Inc](https://github.com/zenlm) (501(c)(3) Non-Profit)
- **Location**: San Francisco, California, USA
- **Model type**: text-to-video
- **Architecture**: Diffusion Transformer (5B)
- **Parameters**: 5B
- **License**: Apache 2.0
- **Training**: Trained with [Zen Gym](https://github.com/zenlm/zen-gym)
- **Inference**: Optimized for [Zen Engine](https://github.com/zenlm/zen-engine)

## 🌟 Zen AI Ecosystem

This model is part of the **Zen Research** hypermodal AI family, a comprehensive open-source AI ecosystem.

### Complete Model Family

**Language Models:**
- [zen-nano-0.6b](https://huggingface.co/zenlm/zen-nano-0.6b) - 0.6B edge model (44K tokens/sec)
- [zen-eco-4b-instruct](https://huggingface.co/zenlm/zen-eco-4b-instruct) - 4B instruction model
- [zen-eco-4b-thinking](https://huggingface.co/zenlm/zen-eco-4b-thinking) - 4B reasoning model
- [zen-agent-4b](https://huggingface.co/zenlm/zen-agent-4b) - 4B tool-calling agent

**3D & World Generation:**
- [zen-3d](https://huggingface.co/zenlm/zen-3d) - Controllable 3D asset generation
- [zen-voyager](https://huggingface.co/zenlm/zen-voyager) - Camera-controlled world exploration
- [zen-world](https://huggingface.co/zenlm/zen-world) - Large-scale world simulation

**Video Generation:**
- [zen-director](https://huggingface.co/zenlm/zen-director) - Text/image-to-video (5B)
- [zen-video](https://huggingface.co/zenlm/zen-video) - Professional video synthesis
- [zen-video-i2v](https://huggingface.co/zenlm/zen-video-i2v) - Image-to-video animation

**Audio Generation:**
- [zen-musician](https://huggingface.co/zenlm/zen-musician) - Music generation (7B)
- [zen-foley](https://huggingface.co/zenlm/zen-foley) - Video-to-audio Foley effects

**Infrastructure:**
- [Zen Gym](https://github.com/zenlm/zen-gym) - Unified training platform
- [Zen Engine](https://github.com/zenlm/zen-engine) - High-performance inference

## Usage

### Quick Start

```python
from zen_director import ZenDirectorPipeline

# Load the text-to-video pipeline (a diffusion model, so it is not
# loaded through transformers' AutoModelForCausalLM)
pipeline = ZenDirectorPipeline.from_pretrained("zenlm/zen-director")

video = pipeline(
    prompt="A cinematic shot of a sunset over mountains",
    num_frames=120,
    fps=24,
    resolution=(1280, 720)
)
video.save("output.mp4")
```

### With Zen Engine

```bash
# High-performance inference via Zen Engine
zen-engine serve --model zenlm/zen-director --port 3690
```

```python
# OpenAI-compatible API
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3690/v1")
response = client.chat.completions.create(
    model="zenlm/zen-director",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Training

Fine-tune with [Zen Gym](https://github.com/zenlm/zen-gym):

```bash
git clone https://github.com/zenlm/zen-gym
cd zen-gym

# LoRA fine-tuning
llamafactory-cli train --config configs/zen_lora.yaml \
    --model_name_or_path zenlm/zen-director

# GRPO reinforcement learning (40-60% memory reduction)
llamafactory-cli train --config configs/zen_grpo.yaml \
    --model_name_or_path zenlm/zen-director
```

Supported methods: LoRA, QLoRA, DoRA, GRPO, GSPO, DPO, PPO, KTO, ORPO, SimPO, Unsloth

## Performance

- **Speed**: ~60s for a 5-second video (RTX 4090)
- **Resolution**: Up to 1280x720 at 24 FPS
- **Duration**: Up to 10 seconds
- **Quality**: Professional-grade video synthesis

## Ethical Considerations

- **Open Research**: Released under Apache 2.0 for maximum accessibility
- **Environmental Impact**: Optimized for eco-friendly deployment
- **Transparency**: Full training details and model architecture disclosed
- **Safety**: Comprehensive testing and evaluation
- **Non-Profit**: Developed by Zoo Labs Inc (501(c)(3)) for public benefit

## Citation

```bibtex
@misc{zendirector2025,
  title={zen-director: 5B parameter text/image-to-video generation model for professional video synthesis},
  author={Zen Research Authors},
  year={2025},
  publisher={Zoo Labs Inc},
  organization={Zen Research DAO},
  url={https://huggingface.co/zenlm/zen-director}
}
```

## Links

- **Organization**: [github.com/zenlm](https://github.com/zenlm) • [huggingface.co/zenlm](https://huggingface.co/zenlm)
- **Training Platform**: [Zen Gym](https://github.com/zenlm/zen-gym)
- **Inference Engine**: [Zen Engine](https://github.com/zenlm/zen-engine)
- **Parent Org**: [Zoo Labs Inc](https://github.com/zenlm) (501(c)(3) Non-Profit, San Francisco)
- **Contact**: dev@hanzo.ai • +1 (913) 777-4443

## License

Apache License 2.0

Copyright 2025 Zen Research Authors

---

**Zen Research** - Building open, eco-friendly AI for everyone 🌱
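## Appendix: Parameter Sanity Check

The Performance limits (up to 10 seconds of video at 24 FPS, up to 1280x720) imply a simple relationship, `num_frames = duration × fps`, matching the Quick Start's 120 frames for a 5-second clip. A minimal pre-flight check is sketched below; `check_params` and its constants are illustrative helpers, not part of the zen-director API.

```python
# Illustrative sanity check against the limits stated in the
# Performance section; not part of the zen-director package.

MAX_DURATION_S = 10          # maximum clip duration in seconds
MAX_RESOLUTION = (1280, 720) # maximum width x height
DEFAULT_FPS = 24             # default frame rate

def check_params(duration_s, fps=DEFAULT_FPS, resolution=MAX_RESOLUTION):
    """Validate a request and return the frame count to pass as num_frames."""
    if duration_s <= 0 or duration_s > MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    if resolution[0] > MAX_RESOLUTION[0] or resolution[1] > MAX_RESOLUTION[1]:
        raise ValueError(f"resolution exceeds {MAX_RESOLUTION}")
    return int(duration_s * fps)

print(check_params(5))  # 5 s at 24 FPS -> 120 frames
```

The returned value can be passed directly as `num_frames` to the pipeline call shown in the Quick Start.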