---
library_name: diffusers
tags:
  - fp8
  - safetensors
  - lora
  - low-rank
  - diffusion
  - converted-by-gradio
---

# FP8 Model with Low-Rank LoRA

- **Source:** https://huggingface.co/LifuWang/DistillT5
- **File:** model.safetensors
- **FP8 Format:** E5M2
- **LoRA Rank:** 128
- **Architecture:** text_encoder
- **LoRA File:** model-lora-r128.safetensors
- **FP8 File:** model-fp8-e5m2.safetensors

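The two companion files are a split of the original checkpoint: an FP8 (E5M2) copy of every weight plus a rank-128 low-rank term that captures most of the quantization error. A minimal sketch of how such a pair can be produced for a single 2-D weight is shown below; the function name and the SVD-based residual factorization are illustrative assumptions, not the converter's actual code.

```python
import torch

def split_fp8_plus_lora(weight: torch.Tensor, rank: int = 128):
    """Hypothetical split of one 2-D weight into an FP8 base plus a low-rank residual."""
    w = weight.to(torch.float32)
    w_fp8 = w.to(torch.float8_e5m2)             # lossy FP8 (E5M2) copy
    residual = w - w_fp8.to(torch.float32)      # quantization error
    # Rank-128 approximation of the residual via truncated SVD
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    B = U[:, :rank] * S[:rank]                  # (out_features, rank)
    A = Vh[:rank, :]                            # (rank, in_features)
    return w_fp8, A, B                          # B @ A ≈ residual
```
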
## Usage (Inference)

```python
from safetensors.torch import load_file
import torch

# Load the FP8 base weights and the low-rank LoRA correction
fp8_state = load_file("model-fp8-e5m2.safetensors")
lora_state = load_file("model-lora-r128.safetensors")

# Reconstruct approximate original weights
reconstructed = {}
for key in fp8_state:
    if f"lora_A.{key}" in lora_state and f"lora_B.{key}" in lora_state:
        A = lora_state[f"lora_A.{key}"].to(torch.float32)
        B = lora_state[f"lora_B.{key}"].to(torch.float32)
        lora_weight = B @ A  # (out_features, rank) @ (rank, in_features) -> (out_features, in_features)
        fp8_weight = fp8_state[key].to(torch.float32)
        reconstructed[key] = fp8_weight + lora_weight
    else:
        reconstructed[key] = fp8_state[key].to(torch.float32)
```

Requires PyTorch ≥ 2.1 for FP8 support.
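Once `reconstructed` is built, the float32 weights can be loaded back into a model. The snippet below is a sketch only: it assumes the keys match a `transformers` T5 encoder state dict and that the source repo can be loaded with `T5EncoderModel`; adjust the class, checkpoint path, and key prefixes to whatever the original DistillT5 export actually uses.

```python
from transformers import T5EncoderModel

# Assumption: the source repo exposes a standard T5 encoder; swap in the
# correct class/path if the original export differs.
text_encoder = T5EncoderModel.from_pretrained("LifuWang/DistillT5")
missing, unexpected = text_encoder.load_state_dict(reconstructed, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```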