FP8 Model with Precision Recovery

  • Source: https://huggingface.co/LifuWang/DistillT5
  • File: model.safetensors
  • FP8 Format: E5M2 (see the round-trip sketch below)
  • Architecture: all
  • Precision Recovery Type: LoRA
  • Precision Recovery File: model-lora-r64-all.safetensors (if available)
  • FP8 File: model-fp8-e5m2.safetensors
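
E5M2 stores each weight in 8 bits: 1 sign bit, 5 exponent bits, and 2 mantissa bits, trading mantissa precision for dynamic range. Below is a minimal sketch of the quantize/dequantize round trip and the error it introduces, assuming PyTorch >= 2.1 for the torch.float8_e5m2 dtype (the tensor w is a stand-in, not a weight from this repository):

import torch

# Stand-in weight tensor; any float32 tensor illustrates the round trip
w = torch.randn(256, 256)

w_fp8 = w.to(torch.float8_e5m2)   # quantize to FP8 E5M2
w_deq = w_fp8.to(torch.float32)   # dequantize back to float32

# Mean absolute round-trip error; the LoRA recovery file exists to
# correct exactly this kind of residual
print(f"mean |error|: {(w - w_deq).abs().mean():.6f}")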

Usage (Inference)

import os
import torch
from safetensors.torch import load_file

# Load FP8 model
fp8_state = load_file("model-fp8-e5m2.safetensors")

# Load the precision recovery file if it exists on disk
recovery_state = {}
if os.path.exists("model-lora-r64-all.safetensors"):
    recovery_state = load_file("model-lora-r64-all.safetensors")

# Reconstruct high-precision weights
reconstructed = {}
for key in fp8_state:
    # Dequantize FP8 to float32
    fp_weight = fp8_state[key].to(torch.float32)
    
    if recovery_state:
        # For LoRA approach
        if f"lora_A.{key}" in recovery_state and f"lora_B.{key}" in recovery_state:
            A = recovery_state[f"lora_A.{key}"].to(torch.float32)
            B = recovery_state[f"lora_B.{key}"].to(torch.float32)
            error_correction = B @ A
            reconstructed[key] = fp_weight + error_correction
        # For correction factor approach
        elif f"correction.{key}" in recovery_state:
            correction = recovery_state[f"correction.{key}"].to(torch.float32)
            reconstructed[key] = fp_weight + correction
        else:
            # No recovery entry for this key; keep the dequantized weight
            reconstructed[key] = fp_weight
    else:
        # No recovery file on disk; use dequantized FP8 weights as-is
        reconstructed[key] = fp_weight

print("Model reconstructed with FP8 error recovery")

Note: This precision recovery targets FP8 quantization errors. Average quantization error: 0.052733
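
For context, LoRA-style recovery files like this one are commonly produced by a truncated SVD of each layer's quantization error. The sketch below shows that general technique for a single 2-D weight; the exact procedure used to build this repository's recovery file is not documented here, so treat this as an assumption:

import torch

def lora_from_error(w_full: torch.Tensor, rank: int = 64):
    # Error introduced by the FP8 E5M2 round trip
    error = w_full - w_full.to(torch.float8_e5m2).to(torch.float32)
    # Rank-r factorization so that B @ A approximates the error
    U, S, Vh = torch.linalg.svd(error, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    B = U[:, :rank] * sqrt_s           # shape: (out_features, rank)
    A = sqrt_s[:, None] * Vh[:rank]    # shape: (rank, in_features)
    return A, B

At rank 64 (matching the r64 in the filename) the two factors are small relative to the full weight yet capture the dominant structure of the error, which is why adding B @ A back recovers most of the lost precision.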
