Just some testing.
The bf16 layers are compressed with DFloat11 lossless compression and mixed with NVFP4-quantized layers.
This isn't the perfect balance between NVFP4 layers and DFloat11-compressed layers, and the right split changes a good amount from model to model, but it is a start.
`flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors`
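To see how a mixed checkpoint like this is laid out, you can list each tensor's name, dtype, and shape with the safetensors library. This is a minimal sketch; the actual tensor naming and packing inside the file is whatever the DFloat11/NVFP4 tooling produced, and the idea that dtype alone distinguishes the two formats is an assumption.

```python
# Minimal sketch: list tensor names, dtypes, and shapes in the checkpoint
# to get a rough view of which layers are NVFP4-packed vs. DFloat11-compressed.
# Assumes the file sits in the current directory; how each format is actually
# stored in the file is up to the packing tooling.
from safetensors import safe_open

path = "flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors"

with safe_open(path, framework="pt", device="cpu") as f:
    meta = f.metadata()  # may carry format hints written by the packer
    if meta:
        print("header metadata:", meta)
    for name in f.keys():
        ts = f.get_slice(name)  # header-only read, no tensor data loaded
        print(f"{name}: dtype={ts.get_dtype()}, shape={ts.get_shape()}")
```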
Other models I have done come out at 86% of the original size with 100% accuracy using plain DFloat11 compression, and around 74.4% of the original size with 100% accuracy using NVFP4 mixed with DFloat11 compression.
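As a quick back-of-the-envelope check of what those percentages mean in bytes, here is a small sketch; the 4B parameter count is taken from the model name, and the ratios are the ones reported above.

```python
# Rough size arithmetic using the ratios reported above.
# Assumes ~4e9 parameters (from the "4B" in the model name) stored as
# bf16 (2 bytes each) before compression.
params = 4e9
bf16_gb = params * 2 / 1e9            # baseline bf16 checkpoint size
print(f"bf16 baseline:        {bf16_gb:.2f} GB")   # ~8.00 GB
print(f"plain DFloat11 (86%): {bf16_gb * 0.86:.2f} GB")   # ~6.88 GB
print(f"NVFP4 mix (74.4%):    {bf16_gb * 0.744:.2f} GB")  # ~5.95 GB
```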
The balance to find is between layers where we want NVFP4 speed and layers kept under DFloat11 lossless compression, which is slower than plain bf16 but faster than offloading the model to system RAM. This matters more for larger models with many bf16 layers; Wan, Qwen, and LTX are high on the list to do next.
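As a purely hypothetical illustration of that kind of split (not the rule used to build this checkpoint), a heuristic might route big matmul weights, where NVFP4 throughput pays off most, to NVFP4, and keep small or numerically sensitive tensors (norms, biases, embeddings) losslessly compressed:

```python
# Hypothetical per-layer split heuristic, for illustration only;
# not the rule used to produce this checkpoint.
# Large 2-D weights get NVFP4 for speed; everything else stays
# losslessly compressed with DFloat11.
def choose_format(name: str, shape: tuple[int, ...]) -> str:
    sensitive = ("norm", "bias", "embed")
    if any(tok in name.lower() for tok in sensitive):
        return "dfloat11"                     # keep exact values
    if len(shape) == 2 and shape[0] * shape[1] >= 1 << 20:
        return "nvfp4"                        # big matmul weight: speed wins
    return "dfloat11"                         # small/misc tensors: lossless

# Example: a transformer block's weights (names are made up)
print(choose_format("blocks.0.attn.qkv.weight", (9216, 3072)))  # nvfp4
print(choose_format("blocks.0.norm1.weight", (3072,)))          # dfloat11
```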
Base model: black-forest-labs/FLUX.2-klein-4B