---
library_name: diffusers
tags:
  - fp8
  - safetensors
  - lora
  - low-rank
  - diffusion
  - converted-by-gradio
---

# FP8 Model with Low-Rank LoRA

- **Source:** https://huggingface.co/LifuWang/DistillT5
- **File:** model.safetensors
- **FP8 Format:** E5M2
- **LoRA Rank:** 128
- **Architecture:** text_encoder
- **LoRA File:** model-lora-r128.safetensors
- **FP8 File:** model-fp8-e5m2.safetensors

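The two companion files are a split of the original checkpoint: an FP8 (E5M2) copy of every weight plus a rank-128 low-rank term that captures most of the quantization error. A minimal sketch of how such a pair can be produced for a single 2-D weight is shown below; the function name and the SVD-based residual factorization are illustrative assumptions, not the converter's actual code.

```python
import torch

def split_fp8_plus_lora(weight: torch.Tensor, rank: int = 128):
    """Hypothetical split of one 2-D weight into an FP8 base plus a low-rank residual."""
    w = weight.to(torch.float32)
    w_fp8 = w.to(torch.float8_e5m2)             # lossy FP8 (E5M2) copy
    residual = w - w_fp8.to(torch.float32)      # quantization error
    # Rank-128 approximation of the residual via truncated SVD
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    B = U[:, :rank] * S[:rank]                  # (out_features, rank)
    A = Vh[:rank, :]                            # (rank, in_features)
    return w_fp8, A, B                          # B @ A ≈ residual
```
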
## Usage (Inference)

```python
from safetensors.torch import load_file
import torch

# Load the FP8 base weights and the low-rank LoRA correction
fp8_state = load_file("model-fp8-e5m2.safetensors")
lora_state = load_file("model-lora-r128.safetensors")

# Reconstruct approximate original weights
reconstructed = {}
for key in fp8_state:
    if f"lora_A.{key}" in lora_state and f"lora_B.{key}" in lora_state:
        A = lora_state[f"lora_A.{key}"].to(torch.float32)
        B = lora_state[f"lora_B.{key}"].to(torch.float32)
        lora_weight = B @ A  # (out_features, rank) @ (rank, in_features) -> (out_features, in_features)
        fp8_weight = fp8_state[key].to(torch.float32)
        reconstructed[key] = fp8_weight + lora_weight
    else:
        reconstructed[key] = fp8_state[key].to(torch.float32)
```

Requires PyTorch ≥ 2.1 for FP8 support.
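Once `reconstructed` is built, the float32 weights can be loaded back into a model. The snippet below is a sketch only: it assumes the keys match a `transformers` T5 encoder state dict and that the source repo can be loaded with `T5EncoderModel`; adjust the class, checkpoint path, and key prefixes to whatever the original DistillT5 export actually uses.

```python
from transformers import T5EncoderModel

# Assumption: the source repo exposes a standard T5 encoder; swap in the
# correct class/path if the original export differs.
text_encoder = T5EncoderModel.from_pretrained("LifuWang/DistillT5")
missing, unexpected = text_encoder.load_state_dict(reconstructed, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```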