# Zen-Omni (30B)

*Part of the Zen AI Model Family*
## Model Description

- **Parameters:** 30B
- **Base Model:** Qwen/Qwen2.5-32B-Instruct
- **Specialization:** Multimodal understanding & generation
- **Training:** Multimodal training with vision-language pairs
- **Context:** 32K-128K tokens
- **Thinking:** Up to 256,000 tokens
## Files in This Repository

This repository contains **all** formats and quantizations:
### SafeTensors (Original)

- `model.safetensors` - full-precision weights
- `config.json` - model configuration
- `tokenizer.json` - fast tokenizer
### GGUF Quantized

- `zen-omni-30b-instruct-Q4_K_M.gguf` - 4-bit (recommended)
- `zen-omni-30b-instruct-Q5_K_M.gguf` - 5-bit (balanced)
- `zen-omni-30b-instruct-Q8_0.gguf` - 8-bit (high quality)
### MLX (Apple Silicon)

- `mlx-4bit/` - 4-bit quantized for M-series
- `mlx-8bit/` - 8-bit quantized for M-series
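To fetch a single quantization instead of cloning the whole repository, the `huggingface_hub` client can download individual files. A minimal sketch (the filename must match one of the GGUF files listed above):

```python
from huggingface_hub import hf_hub_download

# Download only the 4-bit GGUF file; returns the local cache path.
gguf_path = hf_hub_download(
    repo_id="zenlm/zen-omni-30b-instruct",
    filename="zen-omni-30b-instruct-Q4_K_M.gguf",
)
print(gguf_path)
```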
## Performance
| Benchmark | Score | Rank |
|---|---|---|
| MMLU | 68.4% | Top 10% |
| GSM8K | 71.2% | Top 15% |
| HumanEval | 48.3% | Top 20% |
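These scores should in principle be reproducible with a standard harness. A hedged sketch using EleutherAI's lm-evaluation-harness Python API (the exact task variants and few-shot settings behind the table above are not documented here, so treat them as assumptions):

```python
import lm_eval

# Assumed task names and settings; adjust to match the original evaluation.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=zenlm/zen-omni-30b-instruct",
    tasks=["mmlu", "gsm8k"],
    batch_size=8,
)
print(results["results"])
```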
## Quick Start

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("zenlm/zen-omni-30b-instruct", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-omni-30b-instruct")

# Build the prompt with thinking mode enabled
messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
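For interactive use, streaming avoids waiting for the full completion. A small sketch using `transformers.TextStreamer`, reusing `model`, `tokenizer`, and `inputs` from above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```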
### GGUF with llama.cpp

```bash
./main -m zen-omni-30b-instruct-Q4_K_M.gguf -p "Your prompt" -n 512
```

(Recent llama.cpp releases name the CLI binary `llama-cli` instead of `main`.)
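If you prefer to drive the GGUF weights from Python, the `llama-cpp-python` bindings wrap the same runtime. A minimal sketch, assuming `pip install llama-cpp-python` and the Q4_K_M file downloaded locally:

```python
from llama_cpp import Llama

# Load the 4-bit GGUF weights; n_ctx sets the context window in tokens.
llm = Llama(model_path="zen-omni-30b-instruct-Q4_K_M.gguf", n_ctx=8192)

out = llm("Your prompt", max_tokens=512)
print(out["choices"][0]["text"])
```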
### MLX for Apple Silicon

```python
from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-omni-30b-instruct")
response = generate(model, tokenizer, "Your prompt", max_tokens=200)
```
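To load one of the quantized folders listed above and apply the chat template, something like the following should work (the local subfolder path is an assumption based on the repository layout; `mlx_lm.load` accepts a local path or a Hub repo id):

```python
from mlx_lm import load, generate

# Assumption: the 4-bit MLX weights were downloaded to this local folder,
# e.g. via huggingface_hub.snapshot_download(..., allow_patterns=["mlx-4bit/*"]).
model, tokenizer = load("./zen-omni-30b-instruct/mlx-4bit")

messages = [{"role": "user", "content": "Your question here"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
```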
## Unique Training Background

Multimodal training with vision-language pairs.

This model was specifically optimized for multimodal understanding & generation, with careful attention to:

- Inference efficiency
- Memory footprint (see the rough sizing sketch below)
- Quality preservation
- Thinking capabilities
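As a back-of-the-envelope illustration of the memory-footprint point: weight size scales with bits per parameter. The sketch below is an approximation only; it ignores KV cache and runtime overhead, and the bits-per-weight values for the K-quants are rough averages:

```python
# Rough weight-only memory estimates for a 30B-parameter model.
PARAMS = 30e9
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q5_K_M", 5.5), ("Q4_K_M", 4.5)]:
    gib = PARAMS * bits / 8 / 2**30
    print(f"{name:7s} ~{gib:.1f} GiB")
# FP16 ~55.9 GiB, Q8_0 ~29.7 GiB, Q5_K_M ~19.2 GiB, Q4_K_M ~15.7 GiB
```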
Part of the Zen Family • Collection • GitHub