# MD3P-Int8 - INT8 Quantized Moondream3 for MLX
An INT8 quantized version of Moondream3, offering a balance between model quality and size for MLX deployment.
## Model Details

| Component | Original (BF16) | This Model |
|---|---|---|
| MoE Experts (layers 4-23) | BF16 | INT8 |
| Vision Encoder | BF16 | BF16 (preserved) |
| Text Attention | BF16 | INT8 |
| Text MLP (layers 0-3) | BF16 | INT8 |
| Embeddings | BF16 | BF16 (preserved) |
| Total Size | ~12 GB | ~10 GB |
## Quantization Details
- Method: Affine quantization (bits=8, group_size=64)
- Target: Text-model layers (attention, MLP, MoE experts)
- Preserved: Vision encoder and embeddings kept at BF16 to retain quality (see the sketch after this list)
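
Conceptually, this corresponds to applying MLX's built-in affine quantization selectively. The sketch below is illustrative only: the toy model, module names, and predicate are assumptions rather than the actual conversion script. It shows how `mlx.nn.quantize` can target text-model `Linear` layers while leaving vision and embedding modules in BF16.

```python
# Illustrative sketch only: a toy model standing in for Moondream3's structure.
import mlx.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_proj = nn.Linear(512, 512)  # stand-in for a vision-encoder layer
        self.embed = nn.Embedding(1000, 512)    # stand-in for token embeddings
        self.text_mlp = nn.Linear(512, 2048)    # stand-in for a text-model layer

def keep_or_quantize(path, module):
    # Preserve anything under the vision tower and all embeddings in BF16;
    # quantize the remaining Linear layers (attention, MLP, MoE experts).
    if path.startswith("vision") or isinstance(module, nn.Embedding):
        return False
    return isinstance(module, nn.Linear)

model = ToyModel()
# Replaces eligible Linear layers in place with QuantizedLinear
# (8-bit affine quantization, 64-element groups).
nn.quantize(model, group_size=64, bits=8, class_predicate=keep_or_quantize)
```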
## Comparison with INT4 Variants
| Model | Size | Quality | Use Case |
|---|---|---|---|
| md3p-int8 (this model) | 10 GB | Higher | Desktop/server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained devices |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6 GB limit) |
## Usage
This model is designed for use with MLX-based Moondream implementations.
```python
# Example with mlx-lm or a similar MLX loader; image inputs may need a
# Moondream-specific MLX implementation rather than plain mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("lewi/md3p-int8")
print(generate(model, tokenizer, prompt="Describe this model.", max_tokens=100))
```
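
To confirm the stored layout, note that quantized layers in an MLX checkpoint carry a packed `weight` tensor plus per-group `scales` and `biases`. A minimal inspection sketch follows, assuming the weights have been downloaded locally as a single `model.safetensors` file (the actual repo may be sharded differently):

```python
# Inspect which layers were quantized and their group layout.
import mlx.core as mx

weights = mx.load("model.safetensors")  # assumed local path; repo may be sharded
for name in sorted(weights):
    if name.endswith(".scales"):
        base = name[: -len(".scales")]
        packed = weights[base + ".weight"]   # packed 8-bit values
        scales = weights[name]               # one scale per 64-element group
        print(f"{base}: packed weight {packed.shape}, scales {scales.shape}")
```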
## Source & License
- Original Model: moondream/moondream3-preview
- License: Apache 2.0 (same as original)
## Acknowledgments
Thanks to the Moondream team for the original model and Apache 2.0 license.