FP8 Model with Precision Recovery
- Source:
https://huggingface.co/LifuWang/DistillT5 - File:
model.safetensors - FP8 Format:
E5M2 - Architecture: all
- Precision Recovery Type: LoRA
- Precision Recovery File:
model-lora-r64-all.safetensorsif available - FP8 File:
model-fp8-e5m2.safetensors
Usage (Inference)
from safetensors.torch import load_file
import torch
# Load FP8 model
fp8_state = load_file("model-fp8-e5m2.safetensors")
# Load precision recovery file if available
recovery_state = {}
if "model-lora-r64-all.safetensors":
recovery_state = load_file("model-lora-r64-all.safetensors")
# Reconstruct high-precision weights
reconstructed = {}
for key in fp8_state:
# Dequantize FP8 to target precision
fp_weight = fp8_state[key].to(torch.float32)
if recovery_state:
# For LoRA approach
if f"lora_A.{key}" in recovery_state and f"lora_B.{key}" in recovery_state:
A = recovery_state[f"lora_A.{key}"].to(torch.float32)
B = recovery_state[f"lora_B.{key}"].to(torch.float32)
error_correction = B @ A
reconstructed[key] = fp_weight + error_correction
# For correction factor approach
elif f"correction.{key}" in recovery_state:
correction = recovery_state[f"correction.{key}"].to(torch.float32)
reconstructed[key] = fp_weight + correction
else:
reconstructed[key] = fp_weight
else:
reconstructed[key] = fp_weight
print("Model reconstructed with FP8 error recovery")
Note: This precision recovery targets FP8 quantization errors. Average quantization error: 0.052733
- Downloads last month
- 97
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support