codemichaeld
/

T5Base_fp8

@@ -3,36 +3,56 @@ library_name: diffusers
 tags:
 - fp8
 - safetensors
-- lora
-- low-rank
 - diffusion
 - converted-by-gradio
 ---
-# FP8 Model with Low-Rank LoRA
 - **Source**: `https://huggingface.co/LifuWang/DistillT5`
 - **File**: `model.safetensors`
 - **FP8 Format**: `E5M2`
-- **LoRA Rank**: 128
 - **Architecture**: text_encoder
-- **LoRA File**: `model-lora-r128.safetensors`
 - **FP8 File**: `model-fp8-e5m2.safetensors`
 ## Usage (Inference)
 ```python
 from safetensors.torch import load_file
 import torch
 # Load FP8 model
 fp8_state = load_file("model-fp8-e5m2.safetensors")
-lora_state = load_file("model-lora-r128.safetensors")
-# Reconstruct approximate original weights
 reconstructed = {}
 for key in fp8_state:
-    if f"lora_A.{key}" in lora_state and f"lora_B.{key}" in lora_state:
-        A = lora_state[f"lora_A.{key}"].to(torch.float32)
-        B = lora_state[f"lora_B.{key}"].to(torch.float32)
-        lora_weight = B @ A  # (out_features, rank) @ (rank, in_features) -> (out_features, in_features)
-        fp8_weight = fp8_state[key].to(torch.float32)
-        reconstructed[key] = fp8_weight + lora_weight
     else:
-        reconstructed[key] = fp8_state[key].to(torch.float32)
 ```
-> Requires PyTorch ≥ 2.1 for FP8 support.

 tags:
 - fp8
 - safetensors
+- precision-recovery
 - diffusion
 - converted-by-gradio
 ---
+# FP8 Model with Precision Recovery
 - **Source**: `https://huggingface.co/LifuWang/DistillT5`
 - **File**: `model.safetensors`
 - **FP8 Format**: `E5M2`
 - **Architecture**: text_encoder
+- **Precision Recovery Type**: LoRA
+- **Precision Recovery File**: `model-lora-r64-text_encoder.safetensors` if available
 - **FP8 File**: `model-fp8-e5m2.safetensors`
 ## Usage (Inference)
 ```python
 from safetensors.torch import load_file
 import torch
 # Load FP8 model
 fp8_state = load_file("model-fp8-e5m2.safetensors")
+# Load precision recovery file if available
+recovery_state = {}
+if "model-lora-r64-text_encoder.safetensors":
+    recovery_state = load_file("model-lora-r64-text_encoder.safetensors")
+# Reconstruct high-precision weights
 reconstructed = {}
 for key in fp8_state:
+    # Dequantize FP8 to target precision
+    fp_weight = fp8_state[key].to(torch.float32)
+    if recovery_state:
+        # For LoRA approach
+        if f"lora_A.{key}" in recovery_state and f"lora_B.{key}" in recovery_state:
+            A = recovery_state[f"lora_A.{key}"].to(torch.float32)
+            B = recovery_state[f"lora_B.{key}"].to(torch.float32)
+            error_correction = B @ A
+            reconstructed[key] = fp_weight + error_correction
+        # For correction factor approach
+        elif f"correction.{key}" in recovery_state:
+            correction = recovery_state[f"correction.{key}"].to(torch.float32)
+            reconstructed[key] = fp_weight + correction
+        else:
+            reconstructed[key] = fp_weight
     else:
+        reconstructed[key] = fp_weight
+print("Model reconstructed with FP8 error recovery")
 ```
+> **Note**: This precision recovery targets FP8 quantization errors.
+> Average quantization error: 0.052733