Dynamic 8-bit quantization of Lightricks/LTX-2 using SDNQ.

This model uses per-layer, fine-grained quantization.
The dtype for each layer is selected dynamically by trial and error: dtypes are tried until the std-normalized MSE loss falls below the selected threshold.

The minimum allowed dtype is set to int8 and the std-normalized MSE loss threshold is set to 2e-4.
This produced a mixed-precision model with int8 and float8_e4m3fn dtypes.
SVD quantization and group sizes are disabled.
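
The selection loop described above can be illustrated with a minimal, standard-library-only sketch. It is not SDNQ's implementation: the function names, the candidate list of integer bit widths (a stand-in for the real int8 / float8 dtypes), and the exact loss formula (MSE computed after dividing both tensors by the standard deviation of the original weights) are all assumptions made for illustration.

```python
import random
import statistics


def quantize_dequantize(weights, bits):
    """Symmetric per-tensor integer quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def std_normalized_mse(weights, restored):
    """Assumed loss: MSE after dividing both tensors by std(weights)."""
    std = statistics.pstdev(weights) or 1.0
    total = sum(((w - r) / std) ** 2 for w, r in zip(weights, restored))
    return total / len(weights)


def pick_bits(weights, candidates=(8, 12, 16), threshold=2e-4):
    """Try candidates from lowest precision upward; return the first that passes.

    `candidates` is a hypothetical ladder of integer bit widths, not SDNQ's
    actual dtype list. Returns (None, None) if no candidate meets the threshold.
    """
    for bits in candidates:
        loss = std_normalized_mse(weights, quantize_dequantize(weights, bits))
        if loss < threshold:
            return bits, loss
    return None, None


# Two toy "layers": one well-behaved, one with a single large outlier weight.
random.seed(0)
smooth_layer = [random.gauss(0.0, 1.0) for _ in range(1000)]
outlier_layer = smooth_layer + [50.0]

for name, layer in (("smooth", smooth_layer), ("outlier", outlier_layer)):
    bits, loss = pick_bits(layer)
    print(name, bits, loss)
```

In this toy setup, a single outlier stretches the quantization scale enough that the lowest-precision candidate no longer meets the threshold, so that layer ends up with a higher-precision candidate. This is the kind of per-layer decision that leads to a mixed-precision result.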

Usage:

pip install sdnq

import torch
import diffusers
from diffusers.pipelines.ltx2.export_utils import encode_video
from sdnq import SDNQConfig # import sdnq to register it into diffusers and transformers
from sdnq.common import use_torch_compile as triton_is_available
from sdnq.loader import apply_sdnq_options_to_model

pipe = diffusers.LTX2Pipeline.from_pretrained("Disty0/LTX-2-SDNQ-8bit-dynamic", torch_dtype=torch.bfloat16)

# Enable INT8 and FP8 MatMul for AMD, Intel ARC and Nvidia GPUs:
if triton_is_available and (torch.cuda.is_available() or torch.xpu.is_available()):
    pipe.transformer = apply_sdnq_options_to_model(pipe.transformer, use_quantized_matmul=True)
    pipe.text_encoder = apply_sdnq_options_to_model(pipe.text_encoder, use_quantized_matmul=True)
    # pipe.transformer = torch.compile(pipe.transformer) # optional for faster speeds

pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

prompt = "A close-up of a cheerful girl puppet with curly auburn yarn hair and wide button eyes, holding a small red umbrella above her head. Rain falls gently around her. She looks upward and begins to sing with joy in English: \"It's raining, it's raining, I love it when its raining.\" Her fabric mouth opening and closing to a melodic tune. Her hands grip the umbrella handle as she sways slightly from side to side in rhythm. The camera holds steady as the rain sparkles against the soft lighting. Her eyes blink occasionally as she sings."
negative_prompt = "blurry, low quality, still frame, frames, watermark, overlay, titles, has blurbox, has subtitles"

frame_rate = 25.0
video, audio = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=40,
    guidance_scale=4.0,
    generator=torch.manual_seed(10),
    output_type="np",
    return_dict=False,
)
video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)


encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,  # should be 24000
    output_path="ltx2_t2v_sdnq-8bit-dynamic.mp4",
)