---
library_name: diffusers
pipeline_tag: image-to-video
base_model: Lightricks/LTX-2.3
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
license: other
license_name: ltx-video-2-open-source-license
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
---

# LTX-2.3 Distilled (Diffusers)

Diffusers-format weights for the distilled LTX-2.3 model from [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). The distilled checkpoint samples in 8 denoising steps with a guidance scale of 1.0 (classifier-free guidance effectively disabled). It gives up the ability to tune the step count and guidance strength in exchange for substantially faster inference.

The non-distilled base model is available at [`diffusers/LTX-2.3-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Diffusers).

## Usage

Requires a recent build of `diffusers` with LTX-2 support:

```bash
pip install -U git+https://github.com/huggingface/diffusers
```
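
Before running the examples, you can confirm that the installed build exposes the pipeline classes used on this card. The check below is a convenience sketch, not part of `diffusers`; the helper name `missing_ltx2_pipelines` is hypothetical. It only verifies that the class names resolve, not that the build is recent enough for every feature.

```python
import importlib

# Pipeline classes used by the examples on this card.
REQUIRED_PIPELINES = ("LTX2Pipeline", "LTX2ConditionPipeline", "LTX2HDRPipeline")


def _exposes(module, name):
    """True if `module.name` resolves; diffusers imports many classes lazily."""
    try:
        getattr(module, name)
    except Exception:  # AttributeError, or a lazy submodule import that fails
        return False
    return True


def missing_ltx2_pipelines(names=REQUIRED_PIPELINES):
    """Return the names the installed diffusers does not expose (all of them if it is absent)."""
    try:
        diffusers = importlib.import_module("diffusers")
    except ImportError:
        return list(names)
    return [name for name in names if not _exposes(diffusers, name)]


if __name__ == "__main__":
    missing = missing_ltx2_pipelines()
    print("missing:", missing or "none")
```

If anything is reported missing, reinstall from the main branch with the command above.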

The distilled checkpoint uses a fixed sigma schedule, so always pass `sigmas=DISTILLED_SIGMA_VALUES` (from `diffusers.pipelines.ltx2.utils`), `num_inference_steps=8`, and `guidance_scale=1.0`.
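
Because these three values must travel together, one option is to build them in one place and unpack them into each call. This is a sketch; `distilled_sampling_kwargs` is a hypothetical helper, and it takes the schedule as an argument so the sketch itself does not import `diffusers`.

```python
def distilled_sampling_kwargs(sigmas):
    """Fixed sampling arguments for the distilled checkpoint.

    Pass DISTILLED_SIGMA_VALUES (from diffusers.pipelines.ltx2.utils) as `sigmas`;
    it is a parameter here only so this sketch does not import diffusers.
    """
    sigmas = list(sigmas)
    if not sigmas:
        raise ValueError("expected the distilled sigma schedule, got an empty sequence")
    return {
        "num_inference_steps": 8,  # the distilled model runs in 8 steps
        "sigmas": sigmas,          # fixed schedule; do not substitute your own
        "guidance_scale": 1.0,     # classifier-free guidance effectively off
    }
```

Usage would look like `pipe(prompt=prompt, ..., **distilled_sampling_kwargs(DISTILLED_SIGMA_VALUES))`, which keeps the three settings from drifting apart between scripts.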

### Text-to-video + audio

```python
import torch
from diffusers import LTX2Pipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES

pipe = LTX2Pipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
# Keep submodules on the CPU until they are needed to reduce peak GPU memory.
pipe.enable_model_cpu_offload()

prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
frame_rate = 24.0

video, audio = pipe(
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    # Fixed settings required by the distilled checkpoint.
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)

# Write the first sample's frames and audio track to a single MP4.
encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_t2v.mp4",
)
```
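
With `num_frames=121` at `frame_rate=24.0`, the clip above lasts 121 / 24 ≈ 5.04 s. To target a different duration while keeping `num_frames` of the form `8k + 1` (see Notes), a helper along these lines can pick the frame count. It is a hypothetical sketch, not a `diffusers` utility.

```python
import math


def num_frames_for_duration(seconds, frame_rate=24.0):
    """Smallest valid frame count (8k + 1) covering at least `seconds` of video."""
    if seconds <= 0 or frame_rate <= 0:
        raise ValueError("seconds and frame_rate must be positive")
    needed = seconds * frame_rate        # frames required at this rate
    k = math.ceil((needed - 1) / 8)      # round up to the next 8k + 1
    return 8 * max(k, 0) + 1


def clip_duration(num_frames, frame_rate=24.0):
    """Length in seconds of a clip with `num_frames` frames."""
    return num_frames / frame_rate
```

For a 5-second clip at 24 fps this returns 121, matching the example above.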

### First-last-frame-to-video (FLF2V)

```python
import torch
from diffusers import LTX2ConditionPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
from diffusers.utils import load_image

pipe = LTX2ConditionPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")

# Pin the first frame (index 0) and the last frame (index -1) of the clip.
conditions = [
    LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
    LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
]

prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
frame_rate = 24.0

# With return_dict=False the pipeline returns a (video, audio) tuple.
video, audio = pipe(
    conditions=conditions,
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)

encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_flf2v.mp4",
)
```

### HDR generation (IC-LoRA)

```python
import torch
from safetensors import safe_open
from diffusers import LTX2HDRPipeline
from diffusers.pipelines.ltx2.export_utils import encode_hdr_tensor_to_mp4
from diffusers.pipelines.ltx2.pipeline_ltx2_hdr_lora import LTX2HDRReferenceCondition
from diffusers.pipelines.ltx2.utils import DISTILLED_SIGMA_VALUES
from diffusers.utils import load_video

pipe = LTX2HDRPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Load the HDR IC-LoRA and activate it at full strength.
pipe.load_lora_weights(
    "Lightricks/LTX-2.3-22b-IC-LoRA-HDR",
    adapter_name="hdr_lora",
    weight_name="ltx-2.3-22b-ic-lora-hdr-0.9.safetensors",
)
pipe.set_adapters("hdr_lora", 1.0)

# Reference video to convert (local path).
reference_video = load_video("input.mp4")
ref_cond = LTX2HDRReferenceCondition(frames=reference_video, strength=1.0)

# Precomputed scene embeddings, passed below in place of a text prompt (local path).
with safe_open("ltx-2.3-22b-ic-lora-hdr-scene-emb.safetensors", framework="pt", device="cuda") as f:
    connector_video_embeds = f.get_tensor("video_context")
    connector_audio_embeds = f.get_tensor("audio_context")

hdr_video = pipe(
    reference_conditions=[ref_cond],
    connector_video_embeds=connector_video_embeds,
    connector_audio_embeds=connector_audio_embeds,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=24.0,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="pt",
    return_dict=False,
)[0]  # first element of the returned tuple: the video

encode_hdr_tensor_to_mp4(hdr_video[0], output_mp4="ltx2_hdr.mp4", frame_rate=24.0)
```

## Notes

- `width` and `height` must be divisible by 32, and `num_frames` must have the form `8k + 1` (for example, 121 = 8 × 15 + 1).
- See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
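
When the requested size comes from user input or a source video, a small helper can round it down to valid values before calling the pipeline. This is a sketch based on the constraints listed above; `valid_generation_size` is a hypothetical name. Rounding down (rather than to the nearest value) is a design choice that keeps the request from growing.

```python
def round_down_to_multiple(value, multiple=32):
    """Round `value` down to a multiple of `multiple`, using `multiple` as the floor."""
    return max(multiple, int(value) // multiple * multiple)


def valid_generation_size(width, height, num_frames):
    """Round width/height down to multiples of 32 and num_frames down to 8k + 1."""
    frames = max(1, (int(num_frames) - 1) // 8 * 8 + 1)
    return round_down_to_multiple(width), round_down_to_multiple(height), frames


if __name__ == "__main__":
    # A 1280x720 request with 120 frames becomes 1280x704 with 113 frames.
    print(valid_generation_size(1280, 720, 120))
```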

## License

These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).