High-quality FP4 models produced with quantization-aware training (QAT), for use with the fp_quant vLLM/Transformers integration on NVIDIA Blackwell GPUs. See https://arxiv.org/abs/2509.23202
- ISTA-DASLab/Llama-3.2-1B-Instruct-FPQuant-QAT-NVFP4
  Text Generation • 0.8B • Updated • 17
- ISTA-DASLab/Llama-3.2-1B-Instruct-FPQuant-QAT-MXFP4
  Text Generation • 0.8B • Updated • 14
- ISTA-DASLab/Llama-3.2-3B-Instruct-FPQuant-QAT-NVFP4
  Text Generation • 2B • Updated • 21
- ISTA-DASLab/Llama-3.2-3B-Instruct-FPQuant-QAT-MXFP4
  Text Generation • 2B • Updated • 15
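
As a rough illustration of how one of the checkpoints listed above might be used, the sketch below builds a repository ID from a model size and FP4 format, then loads it through the standard Transformers API. The helper names `repo_id` and `generate` are hypothetical and not part of any library. The sketch assumes that the `transformers`, `accelerate`, and `fp_quant` packages are installed, that a Blackwell GPU is available, and that each checkpoint ships its own quantization config so that no extra arguments are required; none of this is confirmed by the listing itself.

```python
# Minimal sketch; helper names are hypothetical, not part of any library.
ORG = "ISTA-DASLab"
SIZES = ("1B", "3B")          # model sizes that appear in this collection
FORMATS = ("NVFP4", "MXFP4")  # FP4 formats that appear in this collection


def repo_id(size: str, fmt: str) -> str:
    """Build the repository ID for a given model size and FP4 format."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {SIZES}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}, expected one of {FORMATS}")
    return f"{ORG}/Llama-3.2-{size}-Instruct-FPQuant-QAT-{fmt}"


def generate(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a checkpoint and generate a completion for the prompt.

    Assumption: the checkpoint's stored quantization config is picked up
    automatically by from_pretrained, and the fp_quant kernels are installed.
    """
    # Imported lazily so repo_id() is usable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map requires the accelerate package.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

On a suitable machine, `generate(repo_id("1B", "NVFP4"), "Hello")` would download the checkpoint on first use and return the decoded completion.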