MiniMax-M2.5 — SWAN Mixed-Precision (4-bit avg)

This is MiniMax-M2.5 quantized using SWAN (Statistical Weight Analysis for N-bit allocation) — a data-free per-tensor mixed-precision quantization method for MLX on Apple Silicon.

Key Features

  • Data-free quantization: No calibration dataset required — uses weight statistics only
  • Per-tensor bit allocation: Each tensor is assigned 2-, 4-, 8-, or 16-bit precision based on sensitivity analysis
  • MLX native: Ready for inference on Apple Silicon via mlx_lm

Results

| Metric | SWAN (this model) | Uniform 4-bit | SWAN vs. Uniform |
|---|---|---|---|
| PPL (WikiText-2, mean) | 8.787 | 8.957 | -1.9% |
| PPL (WikiText-2, median) | 9.169 | 9.399 | -2.4% |
| PPL (WikiText-2, trimmed) | 8.748 | 8.924 | -2.0% |
| Model size | 118 GB | 120 GB | -1.7% |
| Peak memory | 121 GB | 123 GB | -1.6% |

Evaluation config: WikiText-2 test split, sequence length 2048, 256 samples, seed 42.
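The three PPL rows above are different summaries of the same 256 per-sample perplexities. A minimal sketch of that aggregation (the 10% trim fraction is an assumption; the card does not state it):

```python
# Summarize per-sample perplexities as mean, median, and trimmed mean.
# trim_frac=0.1 (drop lowest/highest 10%) is an assumed value.
import statistics

def summarize_ppl(ppls, trim_frac=0.1):
    """Return (mean, median, trimmed mean) of per-sample perplexities."""
    s = sorted(ppls)
    k = int(len(s) * trim_frac)          # samples to drop at each tail
    trimmed = s[k:len(s) - k] if k else s
    return (statistics.fmean(ppls),
            statistics.median(ppls),
            statistics.fmean(trimmed))

mean_ppl, median_ppl, trimmed_ppl = summarize_ppl([8.1, 8.6, 9.2, 9.5, 14.0])
```

The trimmed mean discards outlier samples at both tails, which is why it can sit below the plain mean when a few hard sequences inflate it.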

Usage

```bash
pip install mlx-lm

# Generate text
python -m mlx_lm.generate \
    --model baa-ai/MiniMax-M2.5-SWAN-4bit \
    --prompt "Hello, how are you?"

# Interactive chat
python -m mlx_lm.chat --model baa-ai/MiniMax-M2.5-SWAN-4bit
```

Quantization Details

| Setting | Value |
|---|---|
| Source model | MiniMaxAI/MiniMax-M2.5 (FP8, 229B params, 10B active) |
| Method | SWAN v4 (adaptive normalization + optimized thresholds) |
| Normalization | Adaptive (percentile-based; chosen for MoE weight distributions) |
| Thresholds | t2=0.20, t8=0.75, t16=0.95 (grid-search optimized) |
| Average bits | 3.77 bpw |

Bit Distribution

| Precision | Parameters | Percentage |
|---|---|---|
| 2-bit (group_size=32) | 90.7B | 39.7% |
| 4-bit (group_size=128) | 115.6B | 50.5% |
| 8-bit (kept at FP8) | 17.6B | 7.7% |
| 16-bit (protected) | 4.8B | 2.1% |
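The 3.77 bpw figure follows directly from this table: weight each precision by its parameter count.

```python
# Verify the reported average of 3.77 bits per weight from the
# Bit Distribution table (parameter counts in billions).
params_b = {2: 90.7, 4: 115.6, 8: 17.6, 16: 4.8}
total = sum(params_b.values())          # ~228.7B, i.e. ~229B total
avg_bits = sum(bits * n for bits, n in params_b.items()) / total
print(round(avg_bits, 2))  # 3.77
```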

Hardware Requirements

  • Apple Silicon with at least 128 GB unified memory (192 GB recommended)
  • Peak memory during inference: ~121 GB

About SWAN

SWAN computes four sensitivity metrics per tensor:

  • SVD spectral concentration
  • Excess kurtosis
  • Output noise amplification
  • Reconstruction error proxy (NRMSE)

These are combined into a composite score that drives automatic bit-width allocation — without any calibration data.
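To make the allocation step concrete, here is an illustrative sketch: one of the four metrics (excess kurtosis) computed from weight statistics alone, and a normalized composite score mapped to a bit-width with the card's thresholds (t2=0.20, t8=0.75, t16=0.95). The threshold semantics (higher score = more sensitive = more bits) and the use of a single metric in place of the full four-metric composite are assumptions for illustration.

```python
# Sketch of SWAN-style data-free allocation: a per-tensor sensitivity
# score in [0, 1] is bucketed into 2/4/8/16-bit by fixed thresholds.
import statistics

def excess_kurtosis(w):
    """Excess kurtosis of a flat weight list (0 for a Gaussian)."""
    mu = statistics.fmean(w)
    var = statistics.fmean((x - mu) ** 2 for x in w)
    m4 = statistics.fmean((x - mu) ** 4 for x in w)
    return m4 / (var ** 2) - 3.0

def allocate_bits(score, t2=0.20, t8=0.75, t16=0.95):
    """Map a normalized sensitivity score to a bit-width (assumed semantics)."""
    if score < t2:
        return 2    # least sensitive: aggressive 2-bit
    if score < t8:
        return 4    # bulk of tensors: 4-bit
    if score < t16:
        return 8    # sensitive: kept at 8-bit (FP8)
    return 16       # most sensitive: protected at 16-bit

print(allocate_bits(0.10), allocate_bits(0.50),
      allocate_bits(0.80), allocate_bits(0.99))  # 2 4 8 16
```

Heavy-tailed weight distributions (high excess kurtosis) quantize poorly at low bit-widths, which is why kurtosis is a useful data-free sensitivity signal.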

Paper: SWAN: Data-Free Mixed-Precision Quantization for LLMs via Multi-Metric Sensitivity Analysis (Black Sheep AI Research, 2026)
