LFM2-2.6B-mxfp4-mlx

Here is a comparison of how the cognitive benchmark scores differ between the BF16 and MXFP4 model variants:

LFM2-1.2B       BF16    MXFP4   Difference
arc_challenge   0.429   0.418   -0.011 (-2.56%)
arc_easy        0.592   0.589   -0.003 (-0.51%)
piqa            0.709   0.714   +0.005 (+0.71%)
winogrande      0.559   0.556   -0.003 (-0.54%)

LFM2-2.6B       BF16    MXFP4   Difference
arc_challenge   0.467   0.466   -0.001 (-0.21%)
arc_easy        0.613   0.627   +0.014 (+2.28%)
piqa            0.715   0.715    0.000 ( 0.00%)
winogrande      0.594   0.597   +0.003 (+0.51%)
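
The differences and percentage changes above are plain arithmetic on the paired scores; a minimal sketch that recomputes them (values copied from the tables above) is:

# Recompute the BF16 -> MXFP4 deltas shown above (scores copied from the tables).
scores = {
    "LFM2-1.2B": {"arc_challenge": (0.429, 0.418), "arc_easy": (0.592, 0.589),
                  "piqa": (0.709, 0.714), "winogrande": (0.559, 0.556)},
    "LFM2-2.6B": {"arc_challenge": (0.467, 0.466), "arc_easy": (0.613, 0.627),
                  "piqa": (0.715, 0.715), "winogrande": (0.594, 0.597)},
}

for model, tasks in scores.items():
    for task, (bf16, mxfp4) in tasks.items():
        diff = mxfp4 - bf16
        pct = 100.0 * diff / bf16          # relative change vs. the BF16 baseline
        print(f"{model:10s} {task:14s} {diff:+.3f} ({pct:+.2f}%)")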

Quantization Analysis: BF16 to MXFP4 Performance Preservation

To assess how much of the original LFM2 models' capability (in BF16 precision) is preserved in the MXFP4 quantized variants, I'll analyze the performance differences across the model sizes above.

Key Findings from Comparative Data:

Model Size vs. Quantization Impact (arc_challenge)

Model       BF16    MXFP4   Difference  Change
LFM2-1.2B   0.429   0.418   -0.011      -2.56%
LFM2-2.6B   0.467   0.466   -0.001      -0.21%

This shows that the smaller model experiences greater performance degradation when quantized to MXFP4 (-2.56% vs. -0.21% for the larger model).

What MXFP4 Represents

MXFP4 is a microscaling 4-bit floating-point quantization format: weights are stored as 4-bit floats (E2M1), and each small block of weights shares a single power-of-two scale factor, giving roughly 4 bits per weight (a small sketch of the block layout follows the list below). When comparing it against the other quantized versions in the dataset:

MXFP4 scores are generally quite close to those of the q8 variants:

  • For example, LFM2-2.6B: MXFP4 (0.466) vs q8 (0.467)
  • This suggests MXFP4 maintains relatively high precision despite the drastic reduction in memory footprint
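
As an illustration of the block layout described above, here is a minimal numpy sketch of MXFP4-style quantization, assuming 32-element blocks, a power-of-two shared scale, and round-to-nearest onto the E2M1 grid; the actual MLX kernels may choose scales and rounding differently.

import numpy as np

# Magnitudes representable by a 4-bit E2M1 float: sign x {0, 0.5, 1, 1.5, 2, 3, 4, 6}
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_mxfp4(block: np.ndarray):
    """Illustrative MXFP4-style quantization of one 32-weight block (not the MLX kernel)."""
    amax = np.abs(block).max()
    # Pick a power-of-two shared scale so every element fits in the E2M1 range [-6, 6].
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0)) if amax > 0 else 1.0
    scaled = block / scale
    # Round each element to the nearest representable E2M1 value, preserving its sign.
    candidates = np.sign(scaled)[:, None] * E2M1_GRID            # shape (32, 8)
    idx = np.abs(scaled[:, None] - candidates).argmin(axis=1)
    nearest = candidates[np.arange(len(block)), idx]
    return scale, nearest                                        # dequantized weight = scale * nearest

block = np.random.randn(32).astype(np.float32)                   # one 32-element block of weights
scale, q = quantize_block_mxfp4(block)
print("shared scale:", scale)
print("max abs reconstruction error:", np.abs(block - scale * q).max())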

Performance Preservation Insights

The 2.6B model shows near-complete preservation of its BF16 capabilities in MXFP4 format (-0.21% on arc_challenge; arc_easy and winogrande actually improve slightly).

The 350M-Math models are not included in the comparison above; as math-specialized models, they can be expected to show different quantization characteristics and should be evaluated separately.

MXFP4 appears to be highly efficient: for the 2.6B model, the small gap between MXFP4 and the other quantization levels suggests minimal performance sacrifice, whereas the smaller 1.2B model gives up somewhat more accuracy.

Quantization Impact Beyond Numerical Scores

The data shows that MXFP4:

  • Maintains favorable performance across most tasks compared to other quantization schemes
  • Preserves the original capabilities better in larger models than in smaller ones
  • Appears well suited for deployment where memory efficiency is critical but performance loss must be minimized (a rough size estimate follows below)
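
To put the memory argument in numbers, here is a back-of-the-envelope estimate for the weights alone, assuming ~4 bits per weight plus one shared 8-bit scale per 32-weight block for MXFP4; any tensors kept at higher precision would increase the MXFP4 figure.

# Rough weight-memory estimate for a 2.6B-parameter model (weights only).
params = 2.6e9

bf16_bytes  = params * 16 / 8                      # 16 bits per weight
mxfp4_bits  = 4 + 8 / 32                           # 4-bit element + one shared 8-bit scale per 32 weights
mxfp4_bytes = params * mxfp4_bits / 8

print(f"BF16 : {bf16_bytes / 1e9:.2f} GB")         # ~5.20 GB
print(f"MXFP4: {mxfp4_bytes / 1e9:.2f} GB")        # ~1.38 GB, roughly 3.8x smaller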

The most significant conclusion: MXFP4 quantization preserves the core capabilities of the BF16 models quite effectively, with only a marginal performance drop across the model sizes tested. For practical deployment, MXFP4 represents an excellent compromise between efficiency and model accuracy.

This makes MXFP4 particularly valuable for edge deployment scenarios where memory is constrained but reasonable performance is still required.

--Analyzed by Qwen3-Deckard-Large-Almost-Human-6B-qx86-hi

This model LFM2-2.6B-mxfp4-mlx was converted to MLX format from LiquidAI/LFM2-2.6B using mlx-lm version 0.28.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized model and tokenizer (downloads from the Hugging Face Hub if needed)
model, tokenizer = load("nightmedia/LFM2-2.6B-mxfp4-mlx")

prompt = "hello"

# Wrap the prompt with the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)