How to use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch device_map to "mps" on Apple devices
pipe = DiffusionPipeline.from_pretrained("fill-in-base-model", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("codemichaeld/sdxl_clip_g_fp8")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

FP8 Model with Low-Rank LoRA

  • Source: https://huggingface.co/hum-ma/SDXL-models-GGUF/clip
  • File: clip_g.safetensors
  • FP8 Format: E5M2
  • LoRA Rank: 64
  • LoRA File: clip_g-lora-r64.safetensors

Usage (Inference)

from safetensors.torch import load_file
import torch

# Load the FP8 weights and the LoRA correction factors
fp8_state = load_file("clip_g-fp8-e5m2.safetensors")
lora_state = load_file("clip_g-lora-r64.safetensors")

# Reconstruct approximate original weights
reconstructed = {}
for key in fp8_state:
    if f"lora_A.{key}" in lora_state and f"lora_B.{key}" in lora_state:
        A = lora_state[f"lora_A.{key}"].to(torch.float32)
        B = lora_state[f"lora_B.{key}"].to(torch.float32)
        lora_weight = B @ A  # (out, rank) @ (rank, in) -> (out, in)
        fp8_weight = fp8_state[key].to(torch.float32)
        reconstructed[key] = fp8_weight + lora_weight
    else:
        reconstructed[key] = fp8_state[key].to(torch.float32)

Requires PyTorch ≥ 2.1, the first release that provides the torch.float8_e5m2 dtype.
