Just some testing.
The bf16 layers are compressed with DFloat11 lossless compression and mixed with NVFP4-quantized layers.
This isn't the perfect balance between NVFP4 layers and DFloat11-compressed layers, and the right split changes a good amount from model to model, but it is a start.
`flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors`
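To see how a mixed checkpoint like this is laid out, you can list each tensor's name, dtype, and shape with the safetensors library. This is a minimal sketch; the actual tensor naming and packing inside the file is whatever the DFloat11/NVFP4 tooling produced, and the idea that dtype alone distinguishes the two formats is an assumption.

```python
# Minimal sketch: list tensor names, dtypes, and shapes in the checkpoint
# to get a rough view of which layers are NVFP4-packed vs. DFloat11-compressed.
# Assumes the file sits in the current directory; how each format is actually
# stored in the file is up to the packing tooling.
from safetensors import safe_open

path = "flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors"

with safe_open(path, framework="pt", device="cpu") as f:
    meta = f.metadata()  # may carry format hints written by the packer
    if meta:
        print("header metadata:", meta)
    for name in f.keys():
        ts = f.get_slice(name)  # header-only read, no tensor data loaded
        print(f"{name}: dtype={ts.get_dtype()}, shape={ts.get_shape()}")
```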
Other models I have done come out at 86% of the original size with 100% accuracy using plain DFloat11 compression, and around 74.4% of the original size with 100% accuracy using NVFP4 mixed with DFloat11 compression.
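As a quick back-of-the-envelope check of what those percentages mean in bytes, here is a small sketch; the 4B parameter count is taken from the model name, and the ratios are the ones reported above.

```python
# Rough size arithmetic using the ratios reported above.
# Assumes ~4e9 parameters (from the "4B" in the model name) stored as
# bf16 (2 bytes each) before compression.
params = 4e9
bf16_gb = params * 2 / 1e9            # baseline bf16 checkpoint size
print(f"bf16 baseline:        {bf16_gb:.2f} GB")   # ~8.00 GB
print(f"plain DFloat11 (86%): {bf16_gb * 0.86:.2f} GB")   # ~6.88 GB
print(f"NVFP4 mix (74.4%):    {bf16_gb * 0.744:.2f} GB")  # ~5.95 GB
```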
The balance to find is between layers where we want NVFP4 speed and layers kept under DFloat11 lossless compression, which is slower than plain bf16 but faster than offloading the model to system RAM. This matters more for larger models with many bf16 layers; Wan, Qwen, and LTX are high on the list to do next.
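As a purely hypothetical illustration of that kind of split (not the rule used to build this checkpoint), a heuristic might route big matmul weights, where NVFP4 throughput pays off most, to NVFP4, and keep small or numerically sensitive tensors (norms, biases, embeddings) losslessly compressed:

```python
# Hypothetical per-layer split heuristic, for illustration only;
# not the rule used to produce this checkpoint.
# Large 2-D weights get NVFP4 for speed; everything else stays
# losslessly compressed with DFloat11.
def choose_format(name: str, shape: tuple[int, ...]) -> str:
    sensitive = ("norm", "bias", "embed")
    if any(tok in name.lower() for tok in sensitive):
        return "dfloat11"                     # keep exact values
    if len(shape) == 2 and shape[0] * shape[1] >= 1 << 20:
        return "nvfp4"                        # big matmul weight: speed wins
    return "dfloat11"                         # small/misc tensors: lossless

# Example: a transformer block's weights (names are made up)
print(choose_format("blocks.0.attn.qkv.weight", (9216, 3072)))  # nvfp4
print(choose_format("blocks.0.norm1.weight", (3072,)))          # dfloat11
```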
Base model: black-forest-labs/FLUX.2-klein-4B