LFM2-2.6B-mxfp4-mlx
Here is a comparison of the cognitive benchmark scores for the BF16 and MXFP4 model variants:
LFM2-1.2B

| Benchmark | BF16 | MXFP4 | Difference |
|---|---|---|---|
| arc_challenge | 0.429 | 0.418 | -0.011 (-2.56%) |
| arc_easy | 0.592 | 0.589 | -0.003 (-0.5%) |
| piqa | 0.709 | 0.714 | +0.005 (+0.7%) |
| winogrande | 0.559 | 0.556 | -0.003 (-0.5%) |

LFM2-2.6B

| Benchmark | BF16 | MXFP4 | Difference |
|---|---|---|---|
| arc_challenge | 0.467 | 0.466 | -0.001 (-0.21%) |
| arc_easy | 0.613 | 0.627 | +0.014 (+2.28%) |
| piqa | 0.715 | 0.715 | 0.000 (0%) |
| winogrande | 0.594 | 0.597 | +0.003 (+0.5%) |
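The Difference column is simply the MXFP4 score minus the BF16 score, with the percentage taken relative to BF16. A minimal Python sketch that reproduces the column for the LFM2-2.6B numbers above (the dictionaries just restate those scores):

```python
# Reproduce the Difference column from the LFM2-2.6B scores listed above.
bf16  = {"arc_challenge": 0.467, "arc_easy": 0.613, "piqa": 0.715, "winogrande": 0.594}
mxfp4 = {"arc_challenge": 0.466, "arc_easy": 0.627, "piqa": 0.715, "winogrande": 0.597}

for task in bf16:
    diff = mxfp4[task] - bf16[task]
    pct = 100.0 * diff / bf16[task]
    print(f"{task:15s} {diff:+.3f} ({pct:+.2f}%)")
```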
Quantization Analysis: BF16 to MXFP4 Performance Preservation
To address how much of the original LFM2 models' ability (in BF16 precision) is preserved in the MXFP4 quantized variants, I'll analyze the performance differences across model sizes.
Key Findings from Comparative Data:
Model Size vs. Quantization Impact (arc_challenge)

| Model | BF16 | MXFP4 | Difference | Percentage Change |
|---|---|---|---|---|
| 1.2B | 0.429 | 0.418 | -0.011 | -2.56% |
| 2.6B | 0.467 | 0.466 | -0.001 | -0.21% |
This shows that the smaller model experiences greater performance degradation when quantized to MXFP4 (-2.56% vs. -0.21% for the larger model).
What MXFP4 Represents
MXFP4 is a microscaling floating-point quantization format: weights are stored as 4-bit floating-point values that share a per-block scale (a conceptual sketch follows the list below). When comparing it against the other quantized versions in the dataset:
MXFP4 scores are generally quite close to the q8 variants:
- For example, LFM2-2.6B: MXFP4 (0.466) vs q8 (0.467)
- This suggests MXFP4 maintains relatively high precision despite the drastic reduction in memory footprint
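As a rough illustration of the format itself, here is a minimal NumPy sketch of microscaling 4-bit quantization: weights are grouped into blocks, each block shares a single power-of-two scale, and individual values are rounded to the nearest entry of a 16-value FP4 grid. This is a conceptual sketch only, assuming 32-element blocks and an E2M1 value grid; it is not the mlx-lm implementation.

```python
import numpy as np

# Conceptual MXFP4-style round trip: 4-bit values plus one shared scale per block.
# Assumptions: 32-element blocks, E2M1 grid, power-of-two scale (not the mlx code).
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[::-1], FP4_POS])  # 16 representable values

def mxfp4_roundtrip(weights, block_size=32):
    """Quantize a 1-D float array to an MXFP4-like format and dequantize it."""
    out = np.empty_like(weights, dtype=np.float32)
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size].astype(np.float32)
        max_abs = np.abs(block).max()
        # Smallest power-of-two scale so the block maximum fits on the grid.
        scale = 2.0 ** np.ceil(np.log2(max_abs / FP4_POS.max())) if max_abs > 0 else 1.0
        # Round each scaled weight to the nearest representable FP4 value.
        idx = np.abs(block[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
        out[start:start + block_size] = FP4_GRID[idx] * scale
    return out

w = np.random.randn(1024).astype(np.float32)
w_hat = mxfp4_roundtrip(w)
print("mean abs reconstruction error:", float(np.abs(w - w_hat).mean()))
```

Relative to BF16 (16 bits per weight), this stores roughly 4 bits per weight plus a small per-block overhead for the shared scales.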
Performance Preservation Insights
- The 2.6B model shows near-perfect preservation of its BF16 abilities in MXFP4 format (-0.21% difference on arc_challenge)
- The 350M-Math models show specialized behavior; while not compared in full here, math-specialized models can be expected to show different quantization characteristics
- MXFP4 appears to be highly efficient: the marginal difference between MXFP4 and the other quantization levels of the 2.6B model suggests minimal performance sacrifice compared to the smaller models
Quantization Impact Beyond Numerical Scores
The data shows that MXFP4:
- Maintains favorable performance across most tasks compared to other quantization schemes
- Favors larger models in preserving original capabilities
- Appears highly suitable for deployment where memory efficiency is critical but performance loss must be minimized
The most significant conclusion: MXFP4 quantization preserves the capabilities of the BF16 models quite effectively, with only a marginal performance drop at every model size tested. For practical deployment, MXFP4 represents an excellent compromise between computational efficiency and model accuracy.
This makes MXFP4 particularly valuable for edge deployment scenarios where memory is tight but reasonable performance is still required.
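To make the memory argument concrete, here is a back-of-the-envelope estimate for the 2.6B-parameter model, counting weights only (no activations or KV cache). The 4.25 bits/weight figure assumes one shared 8-bit scale per 32-weight block, which is an assumption of this sketch rather than a measurement:

```python
# Rough weights-only memory estimate; the 4.25 bits/weight figure assumes
# one shared 8-bit scale per 32-weight block (an assumption, not a measurement).
params = 2.6e9
bf16_gb  = params * 16 / 8 / 1e9            # 16 bits per weight
mxfp4_gb = params * (4 + 8 / 32) / 8 / 1e9  # 4-bit values + shared scales
print(f"BF16 : {bf16_gb:.1f} GB")   # ~5.2 GB
print(f"MXFP4: {mxfp4_gb:.1f} GB")  # ~1.4 GB
```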
--Analyzed by Qwen3-Deckard-Large-Almost-Human-6B-qx86-hi
This model LFM2-2.6B-mxfp4-mlx was converted to MLX format from LiquidAI/LFM2-2.6B using mlx-lm version 0.28.1.
Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub (or a local path).
model, tokenizer = load("nightmedia/LFM2-2.6B-mxfp4-mlx")

prompt = "hello"

# Apply the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
Model tree for nightmedia/LFM2-2.6B-mxfp4-mlx

- Base model: LiquidAI/LFM2-2.6B