Instructions to use ibyteohdear/Lightricks-LTX-2.3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use ibyteohdear/Lightricks-LTX-2.3 with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline from diffusers.utils import load_image, export_to_video # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("ibyteohdear/Lightricks-LTX-2.3", dtype=torch.bfloat16, device_map="cuda") pipe.to("cuda") prompt = "A man with short gray hair plays a red electric guitar." image = load_image( "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png" ) output = pipe(image=image, prompt=prompt).frames[0] export_to_video(output, "output.mp4") - Notebooks
- Google Colab
- Kaggle
- LTX-2.3 Model Card
- Model Checkpoints
- Online demo
- Run locally
- Train the model
- LTX 2.3 Music Video Creator V5.1
- Included Workflows
- Full walkthrough video on entire process. Please watch the full video and follow along.
- Sample Videos
- Sample 1 - Text to video using my Lux_Sensual Style LoRa. Light Post-Editing
- Sample 2 - Text to video using my Fantasy Painter Style LoRa. Straight From ComfyUI
- Sample 3 - Text to Video using my CyberPunk Style LoRa. Straight From ComfyUI
- Sample 4 - Text to Video using a character LoRa. Straight From ComfyUI
- Sample 5 - Image to Video using Z-image turbo with a character lora and LTX character lora. Straight From ComfyUI
- Sample 6 - Text to video with a custom LTX lora trained on both character on location. Light Post-Editing
- Sample 7 - Text to video using my Fantasy Realism LoRa. Straight From ComfyUI
- Sample 8 - Text to video using a LTX character loRa. Straight From ComfyUI
- Sample 9 - Text to video with Fantasy Realism Style LoRa. Straight From ComfyUI
- Sample 10 - Image to video using Zimage and LTX character loRa. CapCut FX And Overlays
- Sample 11 - Image to video using Z-image character loRA. CapCut FX And Overlays
- Walkthrough
- Requirements
- Community And Downloads
- Included Workflows
LTX-2.3 Model Card
This model card focuses on the LTX-2.3 model, which is a significant update to the LTX-2 model with improved audio and visual quality as well as enhanced prompt adherence. LTX-2 was presented in the paper LTX-2: Efficient Joint Audio-Visual Foundation Model.
π»π» If you want to dive in right to the code - it is available here. πΎπΎ
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
Model Checkpoints
| Name | Notes |
|---|---|
| ltx-2.3-22b-dev | The full model, flexible and trainable in bf16 |
| ltx-2.3-22b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-1.1 | The distilled v1.1 version of the full model, 8 steps, CFG=1 - A different aesthetic experience and improved audio compared to v1.0 |
| ltx-2.3-22b-distilled-lora-384 | A LoRA version of the distilled model applicable to the full model |
| ltx-2.3-22b-distilled-lora-384-1.1 | A LoRA version of the v1.1 distilled model applicable to the full model |
| ltx-2.3-spatial-upscaler-x2-1.1 | An x2 spatial upscaler for the ltx-2.3 latents, used in multi stage (multiscale) pipelines for higher resolution |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | An x1.5 spatial upscaler for the ltx-2.3 latents, used in multi stage (multiscale) pipelines for higher resolution |
| ltx-2.3-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2.3 latents, used in multi stage (multiscale) pipelines for higher FPS |
Model Details
- Developed by: Lightricks
- Model type: Diffusion-based audio-video foundation model
- Language(s): English
Online demo
LTX-2.3 is accessible right away via the API Playground.
Run locally
Direct use license
You can use the models - full, distilled, upscalers and any derivatives of the models - for purposes under the license.
ComfyUI
We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager. For manual installation information, please refer to our documentation site.
PyTorch codebase
The LTX-2 codebase is a monorepo with several packages. From model definition in 'ltx-core' to pipelines in 'ltx-pipelines' and training capabilities in 'ltx-trainer'. The codebase was tested with Python >=3.12, CUDA version >12.7, and supports PyTorch ~= 2.7.
Installation
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
# From the repository root
uv sync
source .venv/bin/activate
Inference
To use our model, please follow the instructions in our ltx-pipelines package.
Diffusers π§¨
LTX-2.3 support in the Diffusers Python library is coming soon!
General tips:
- Width & height settings must be divisible by 32. Frame count must be divisible by 8 + 1.
- In case the resolution or number of frames are not divisible by 32 or 8 + 1, the input should be padded with -1 and then cropped to the desired resolution and number of frames.
- For tips on writing effective prompts, please visit our Prompting guide
Limitations
- This model is not intended or able to provide factual information.
- As a statistical model this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that matches the prompts perfectly.
- Prompt following is heavily influenced by the prompting-style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.
Train the model
The base (dev) model is fully trainable.
It's extremely easy to reproduce the LoRAs and IC-LoRAs we publish with the model by following the instructions on the LTX-2 Trainer Readme.
Training for motion, style or likeness (sound+appearance) can take less than an hour in many settings.
Citation
@article{hacohen2025ltx2,
title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Guy Shiran and Itay Chachy and Jonathan Chetboun and Michael Finkelson and Michael Kupchick and Nir Zabari and Nitzan Guetta and Noa Kotler and Ofir Bibi and Ori Gordon and Poriya Panet and Roi Benita and Shahar Armon and Victor Kulikov and Yaron Inger and Yonatan Shiftan and Zeev Melumian and Zeev Farbman},
journal={arXiv preprint arXiv:2601.03233},
year={2025}
}
--
Sulphur 2
An uncensored video generation model based on LTX 2.3 supporting both t2v and i2v natively, as well as all of the other ltx 2.3 formats.
Join our Discord
Support the next version of the project, even just a few dollars would go a long way: Kofi
Get Started: To get started with the model, I recommend downloading either of the dev versions, (fp8mixed or bf16) and downloading the distill lora provided. By the way, I'm aware the workflows contain sulphur_final right now, just use the lora or use the full models, don't use both at the same time.
This model contains a prompt enhancer. The easiest way to get started with the prompt enhancer is by using it on lmstudio. The way to accomplish this is by going to your model folder inside lmstudio, then opening it up in your file explorer. Create a folder named "Sulphur", then a folder inside that called "promptenhancer". Inside that folder, place the gguf file and the mmproj file. Once you've done that, you should be able to load the prompt enhancer in lmstudio. There is no system prompt for it, just send the text (and an image) you'd like to be enhanced.
*As a note, this readme will contain better setup instructions and how to train on top of the model soon.
Links
Credits
- (TenStrip) β Testing & model merging (His i2v merge of sulphur 2, highly recommend for i2v)
- @s1lv3rc01n β Testing & model merging/quantizing (silveroxides)
- @mov7162 β Musubi Tuner guidance
- And many others, if you'd like to be on the credits and I didn't place you here, message me I likely assumed you didn't want to be here.
Funders
- Anonymous funder #1 β Supported the original Sulphur
- Anonymous funder #2 β Made Sulphur 2 possible; this model wouldn't exist without them
Thank you to everyone who contributed.
10 Eros
v1.2 Changelog: Leveraged tuned connector data to reduce face drift and aid long prompts/director. Also using sulphur EXP weights on top of v1 to hone the most explicit motions. All common issues like mistaken extra anatomy, subtitles, unexpected transitions, etc all still present from v1.
https://huggingface.co/TenStrip/LTX2.3-10Eros_Workflows
Quants: https://huggingface.co/vantagewithai/LTX2.3-10Eros-GGUF/tree/main
Nodes: https://github.com/TenStrip/10S-Comfy-nodes
Reliant on https://huggingface.co/SulphurAI/Sulphur-2-base This is a different merge attempt for ideal I2V use. It uses layer scaled merges of different steps, it's not a straight weight merge. It behaves much nicer than lora load and respects prompt. Prompt should be enhanced, LTX has very little self reasoning and input when it is conditioned, first frame and all following motions, evolutions, and audio must be commanded-you will get nothing if you don't ask it.
BF16 loads as a checkpoint with clip and VAEs.
Fp8_mixed_learned is the better FP8 version and is a full checkpoint as well, quant by S1LV3RC01N.
Kijai split files are for 10Eros FP8 Transformer version, but it has a different structure and variance. That one goes inside diffusion_models: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main
!!! Larger distilled Loras will harm the model's fine tune, try the cond_safe ones: https://huggingface.co/TenStrip/LTX2.3_Distilled_Lora_1.1_Experiments/tree/main
For prompt enhancement, try this foreword in Grok or Uncensored LLM:
Generate a video scene script with a description based on the attached image for an LLM that has a tokenizer that uses interleaved attention to support long-context understanding that is fed into a multimodal video model. Strict specification, follow up to the word: No timestamps. No unnecessary embellishment. Output only plain English text and make it a copy box.
First, describe the image initial scene in concise natural language; subject(s), subject(s) appearance, subject(s) composition and pose, background, and context.
Next, formulate a naturally evolving scenario that would take place describing every moving body part, composition change, and manipulation from the uploaded initial frame that would be reflected in the video models post-latent evolution output. If the image is explicit or sexual in nature, use full anatomical terminology and spice it up slightly with visually representable erotic themes.
Center the prompt around this basic idea: [ concept ]
interweave this dialogue or sound concept into the scene with descriptions of voice tone followed by the lines delivered in quotations, in a temporal sequence between or during motions. Dialogue should be concise and non-rambling as it will take away from video quality: [ dialogue ]
Inside that prompt describe only notable audio and audio queues, both normal and explicit; background noise as well as foley and natural sounds. In a temporal sequence paired with coinciding motions. In the case of absent dialogue or soundscapes and only if background music is fitting; describe a fitting genre and melodic tone with matching mood.
Output only text following above instruction. Follow-up suggestions should be on the topic of expanding or changing motion or dialogue from the output text.
LTX 2.3 Music Video Creator V5.1
ComfyUI workflows for creating music-videos with LTX 2.3. This release includes a prompt-creation workflow plus both text-to-video and image-to-video music video workflows.
These workflows are designed for creators who want a fast and almost fully automated setup for building cinematic music video clips, generating scene prompts, adding optional LoRAs, and controlling advanced prompt details.
Included Workflows
Important: You must run the Prompt Creator workflow first before using the T2V or I2V video workflows.
LTX2.3_Music_Video_Creator_Prompt_Creator_V5.jsonLTX2.3_Music_Video_Creator_T2V_V5.1.jsonLTX2.3_Music_Video_Creator_I2V_V5.1.json
Full walkthrough video on entire process. Please watch the full video and follow along.
Sample Videos
These samples were created with the LTX 2.3 Music Video Creator workflows.
Sample 1 - Text to video using my Lux_Sensual Style LoRa. Light Post-Editing
This sample includes light post-editing in CapCut. I ran the workflow a few times to get different shots, then edited the final version together.
Sample 2 - Text to video using my Fantasy Painter Style LoRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 3 - Text to Video using my CyberPunk Style LoRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 4 - Text to Video using a character LoRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 5 - Image to Video using Z-image turbo with a character lora and LTX character lora. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 6 - Text to video with a custom LTX lora trained on both character on location. Light Post-Editing
This sample includes light post-editing in CapCut. I ran the workflow a few times to get different shots, then edited the final version together.
Sample 7 - Text to video using my Fantasy Realism LoRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 8 - Text to video using a LTX character loRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 9 - Text to video with Fantasy Realism Style LoRa. Straight From ComfyUI
This sample is straight from ComfyUI with no post-editing.
Sample 10 - Image to video using Zimage and LTX character loRa. CapCut FX And Overlays
This sample includes CapCut FX, filters, and overlays. The generated video itself came straight from ComfyUI.
Sample 11 - Image to video using Z-image character loRA. CapCut FX And Overlays
This sample includes CapCut FX, filters, and overlays. The generated video itself came straight from ComfyUI.
Walkthrough
Create automated AI music videos with my full LTX 2.3 workflow for ComfyUI free and local.
In this walkthrough, I show how the workflow takes a song, analyzes the timing, creates scene prompts from lyrics, and generates a finished music video using LTX 2.3.
The workflow is split into two parts.
π΅ Workflow 1 handles audio upload, beat detection, scene timing, lyrics, style and theme, story idea, subjects and locations, and prompt generation.
π¬ Workflow 2 handles the actual video generation, including an image-to-video workflow with Z-Image Turbo and LTX 2.3, plus a text-to-video workflow with LTX and LoRA support. Both include advanced prompt controls, scene generation, Remake Mode, and final video stitching.
β¨ This workflow is designed to reduce manual setup time while still giving you control over style, characters, camera motion, timing, seeds, LoRAs, and final edits.
π‘ For the best results, I recommend starting with the default settings first, then experimenting with LoRAs, seeds, advanced settings, and Remake Mode as you get more comfortable.
Requirements
- ComfyUI
- LTX 2.3 models
- Z-Image Turbo model
- FFmpeg installed for audio stitching
- My vrgamedevgirl custom nodes
- Impact Pack custom node for auto-queue
- llama-cpp-python
Community And Downloads
Join my Discord for support, updates, beta features, and to share your work: HERE
Download my custom nodes and workflows: HERE
#ComfyUI #LTX #AIvideo #AIMusicVideo #TextToVideo #ImageToVideo #AIWorkflow #GenerativeAI
import os
import torch
from safetensors.torch import load_file, save_file
import os
from huggingface_hub import HfApi, snapshot_download
HF_TOKEN = os.getenv("HF_TOKEN")
api = HfApi()
DEST_REPO = "ibyteohdear/Lightricks-LTX-2.3"
# Path to the folder where you cloned/downloaded the 'tree/main' files
input_folder = snapshot_download(
"maximsobolev275/10Eros-v12-lora-r768-rsvd",
allow_patterns=[
"*",
]
)
output_file = "10Eros_v12_r768.safetensors"
combined_state_dict = {}
# The repo splits weights by layers/keys (e.g., u_matrices, v_matrices)
# Loop through the downloaded safetensors chunks and pack them into a standard LoRA dictionary
for file_name in os.listdir(input_folder):
if file_name.endswith(".safetensors") and file_name != output_file:
file_path = os.path.join(input_folder, file_name)
print(f"Loading chunk: {file_name}")
chunk_dict = load_file(file_path)
for key, tensor in chunk_dict.items():
# Remap the internal RSVD keys to standard LoRA keys if necessary
# Standard formatting typically expects: lora_down.weight and lora_up.weight
combined_state_dict[key] = tensor
# Save as a single, unified LoRA file
print(f"Compiling into final LoRA: {output_file}")
save_file(combined_state_dict, output_file)
print("Done! Place this file in your models/loras folder.")
api.upload_file(
path_or_fileobj=output_file,,
path_in_repo="10Eros_v12_r768.safetensors",
repo_id=DEST_REPO,
token=HF_TOKEN,
)
import spaces
import os
import sys
import torch
import shutil
from huggingface_hub import hf_hub_download, snapshot_download
HF_TOKEN = os.environ.get("HF_TOKEN")
BASE = "/tmp/hf"
P_REPO = "ibyteohdear/ltx-2.3-packages"
@spaces.GPU(duration=600)
def run_pipeline():
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1. Download base weights (The complete 46.1 GB flat file)
print("Downloading base BF16 checkpoint...")
vanilla_base_path = hf_hub_download(
repo_id="ibyteohdear/Lightricks-LTX-2.3",
filename="sulphur_distil_bf16.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 2. Download LoRA
print("Downloading LoRA weights...")
lora_path = hf_hub_download(
repo_id="ibyteohdear/Lightricks-LTX-2.3",
filename="10Eros_v12_r768.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 3. Environment/Packages setup
print("Downloading packages snapshot...")
ltx_path = snapshot_download(
repo_id=P_REPO,
local_dir=f"{BASE}/ltx",
token=HF_TOKEN
)
sys.path.insert(0, f"{ltx_path}/packages/ltx-pipelines/src")
sys.path.insert(0, f"{ltx_path}/packages/ltx-core/src")
from safetensors.torch import load_file, save_file, safe_open
# 4. Load the entire original 46.1 GB dictionary into RAM memory
print("Loading tensors into RAM...")
base_state_dict = load_file(vanilla_base_path, device="cpu")
# 5. Load LoRA weights
lora_state_dict = load_file(lora_path, device="cpu")
# 6. Apply LoRA updates directly onto matching base keys
print("Starting LoRA fusion loop...")
baked_count = 0
added_count = 0
skipped_count = 0
lora_strength = 1.0
# Scan the LoRA dictionary keys
for lora_key in list(lora_state_dict.keys()):
# Path variation A: Standard low-rank naming (.lora_down / .lora_up)
if ".lora_down.weight" in lora_key:
prefix = lora_key.split(".lora_down.weight")[0]
down_key = f"{prefix}.lora_down.weight"
up_key = f"{prefix}.lora_up.weight"
if up_key in lora_state_dict:
# Comfy keys prefix with "diffusion_model.". Strip it to match the base checkpoint structure.
target_base_key = f"{prefix}.weight".replace("model.diffusion_model.", "").replace("diffusion_model.", "")
# Fallback to raw key if it doesn't match stripped layout configurations
if target_base_key not in base_state_dict:
target_base_key = f"{prefix}.weight"
# CRITICAL: Only touch the base model if the key exists there natively!
if target_base_key in base_state_dict:
try:
# Move only the small targeted matrices to GPU sequentially for low VRAM consumption
W_base = base_state_dict[target_base_key].to(device=device, dtype=torch.bfloat16)
U = lora_state_dict[up_key].to(device=device, dtype=torch.bfloat16)
D = lora_state_dict[down_key].to(device=device, dtype=torch.bfloat16)
# Compute matrix low-rank product adjustment: ΞW = (Up x Down) * strength
delta_W = torch.matmul(U, D) * lora_strength
# Store result back into the system RAM checkpoint array
base_state_dict[target_base_key] = (W_base + delta_W).cpu()
baked_count += 1
except Exception as e:
print(f"Failed to bake layer {target_base_key}: {e}")
skipped_count += 1
else:
skipped_count += 1
# Path variation B: Low-rank dimension naming (.lora_A / .lora_B)
elif ".lora_A.weight" in lora_key:
prefix = lora_key.split(".lora_A.weight")[0]
a_key = f"{prefix}.lora_A.weight"
b_key = f"{prefix}.lora_B.weight"
if b_key not in lora_state_dict:
continue
target_base_key = f"{prefix}.weight".replace("model.diffusion_model.", "").replace("diffusion_model.", "")
if target_base_key not in base_state_dict and f"{prefix}.weight" in base_state_dict:
target_base_key = f"{prefix}.weight"
if target_base_key in base_state_dict:
try:
W_base = base_state_dict[target_base_key].to(device=device, dtype=torch.bfloat16)
A = lora_state_dict[a_key].to(device=device, dtype=torch.bfloat16)
B = lora_state_dict[b_key].to(device=device, dtype=torch.bfloat16)
delta_W = torch.matmul(B, A) * lora_strength
if delta_W.shape != W_base.shape:
delta_W = torch.matmul(A, B) * lora_strength
base_state_dict[target_base_key] = (W_base + delta_W).cpu()
baked_count += 1
except Exception as e:
print(f"Failed to bake layer {target_base_key}: {e}")
skipped_count += 1
else:
skipped_count += 1
# --- PROOF & LOGGING REGION ---
print("\n==================================================")
print(" FUSION VERIFICATION ")
print("==================================================")
print(f" Successfully Baked Layers : {baked_count}")
print(f" Newly Injected Multi-Layers: {added_count}")
print(f" Skipped / Mismatched Keys : {skipped_count}")
print("==================================================")
if baked_count == 0 and added_count == 0:
print("β CRITICAL WARNING: Zero operations were completed. Output will be unmodified!")
return
else:
print("β
SUCCESS: BF16 loop fusion complete.\n")
# 7. Extract the original header metadata so the inference app knows the exact shapes
try:
with safe_open(vanilla_base_path, framework="pt", device="cpu") as f:
original_metadata = f.metadata()
except Exception as e:
original_metadata = None
# 8. DISK MANAGEMENT: Wipe cache down to free disk space before exporting
print("Cleaning cache directory...")
try:
if os.path.exists("/tmp/hf_cache"):
shutil.rmtree("/tmp/hf_cache")
except Exception as e:
pass
# 9. Write out and upload file
from huggingface_hub import HfApi
api = HfApi()
DEST_REPO = "ibyteohdear/Lightricks-LTX-2.3"
output_filename = "/tmp/LTX2.3_DISTILLED_BAKED.safetensors"
print("Saving the new baked safetensors file...")
if original_metadata:
save_file(base_state_dict, output_filename, metadata=original_metadata)
else:
save_file(base_state_dict, output_filename)
print(f"Uploading target file to Hugging Face: {DEST_REPO}...")
api.upload_file(
path_or_fileobj=output_filename,
path_in_repo="LTX2.3_DISTILLED_BAKED_LTX_10Eros_v12_r768.safetensors",
repo_id=DEST_REPO,
token=HF_TOKEN,
)
print("Pipeline execution complete.")
if __name__ == "__main__":
run_pipeline()
import spaces
import os
import sys
import torch
import shutil
from huggingface_hub import hf_hub_download, snapshot_download
HF_TOKEN = os.environ.get("HF_TOKEN")
BASE = "/tmp/hf"
P_REPO = "ibyteohdear/ltx-2.3-packages"
@spaces.GPU(duration=600)
def run_pipeline():
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1. Download base weights (FP8 mixed checkpoint)
print("Downloading base FP8 mixed checkpoint...")
vanilla_base_path = hf_hub_download(
repo_id="ibyteohdear/Lightricks-LTX-2.3",
filename="sulphur_distil_fp8mixed.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 2. Download LoRA
print("Downloading LoRA weights...")
lora_path = hf_hub_download(
repo_id="ibyteohdear/Lightricks-LTX-2.3",
filename="10Eros_v12_r768.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 3. Environment/Packages setup
print("Downloading packages snapshot...")
ltx_path = snapshot_download(
repo_id=P_REPO,
local_dir=f"{BASE}/ltx",
token=HF_TOKEN
)
sys.path.insert(0, f"{ltx_path}/packages/ltx-pipelines/src")
sys.path.insert(0, f"{ltx_path}/packages/ltx-core/src")
from safetensors.torch import load_file, save_file, safe_open
# 4. Load weights into memory
print("Loading tensors into RAM...")
base_state_dict = load_file(vanilla_base_path, device="cpu")
lora_state_dict = load_file(lora_path, device="cpu")
lora_keys = list(lora_state_dict.keys())
# 5. Apply LoRA updates directly onto matching base keys
print("Starting LoRA fusion loop...")
baked_count = 0
added_count = 0
skipped_count = 0
lora_strength = 1.0
for lora_key in lora_keys:
# Match using the true suffix found in logs: .lora_A.weight
if ".lora_A.weight" in lora_key:
prefix = lora_key.split(".lora_A.weight")[0]
a_key = f"{prefix}.lora_A.weight"
b_key = f"{prefix}.lora_B.weight"
if b_key in lora_state_dict:
# Try matching with standard prefix cleanups if raw mismatch happens
target_base_key = f"{prefix}.weight".replace("model.diffusion_model.", "").replace("diffusion_model.", "")
# Fallback to the raw structured base key if stripped version isn't present
if target_base_key not in base_state_dict:
target_base_key = f"{prefix}.weight"
# CRITICAL: Only touch the base model if the target key exists natively
if target_base_key in base_state_dict:
try:
# Store original datatype (preserves FP8 / BF16 mixed layout)
orig_dtype = base_state_dict[target_base_key].dtype
# Upcast to high-precision for tensor math on GPU
W_base = base_state_dict[target_base_key].to(device=device, dtype=torch.bfloat16)
A = lora_state_dict[a_key].to(device=device, dtype=torch.bfloat16)
B = lora_state_dict[b_key].to(device=device, dtype=torch.bfloat16)
# Compute LoRA adjustment: ΞW = (B x A) * strength
delta_W = torch.matmul(B, A) * lora_strength
# Dynamic shape orientation handling
if delta_W.shape != W_base.shape:
delta_W = torch.matmul(A, B) * lora_strength
fused_W = W_base + delta_W
# Downcast back to match original FP8/BF16 configuration
base_state_dict[target_base_key] = fused_W.to(dtype=orig_dtype).cpu()
baked_count += 1
except Exception as e:
print(f"Failed to bake layer {target_base_key}: {e}")
skipped_count += 1
else:
# SAFE: Unique LoRA framework elements are safely ignored
# keeping the base dictionary pristine for offloading setups.
skipped_count += 1
# --- PROOF & LOGGING REGION ---
print("\n==================================================")
print(" FUSION VERIFICATION ")
print("==================================================")
print(f" Successfully Baked Layers : {baked_count}")
print(f" Newly Injected Multi-Layers: {added_count}")
print(f" Skipped / Mismatched Keys : {skipped_count}")
print("==================================================")
if baked_count == 0 and added_count == 0:
print("β CRITICAL WARNING: Zero operations were completed. Output will be unmodified!")
return
else:
print("β
SUCCESS: FP8 loop fusion complete.\n")
# 6. Extract the original header metadata
try:
with safe_open(vanilla_base_path, framework="pt", device="cpu") as f:
original_metadata = f.metadata()
except Exception as e:
original_metadata = None
# 7. DISK MANAGEMENT: Clear space before writing out huge outputs
print("Cleaning cache directory...")
try:
if os.path.exists("/tmp/hf_cache"):
shutil.rmtree("/tmp/hf_cache")
except Exception as e:
pass
# 8. Write out and upload file
from huggingface_hub import HfApi
api = HfApi()
DEST_REPO = "ibyteohdear/Lightricks-LTX-2.3"
output_filename = "/tmp/LTX2.3_DISTILLED_BAKED.safetensors"
print("Saving the new baked safetensors file...")
if original_metadata:
save_file(base_state_dict, output_filename, metadata=original_metadata)
else:
save_file(base_state_dict, output_filename)
print(f"Uploading target file to Hugging Face: {DEST_REPO}...")
api.upload_file(
path_or_fileobj=output_filename,
path_in_repo="LTX2.3_DISTILLED_BAKED_LTX_10Eros_v12_r768_fp8.safetensors",
repo_id=DEST_REPO,
token=HF_TOKEN,
)
print("Pipeline execution complete.")
if __name__ == "__main__":
run_pipeline()
can also try
import spaces
import os
import sys
import torch
import shutil
from huggingface_hub import hf_hub_download, snapshot_download
HF_TOKEN = os.environ.get("HF_TOKEN")
BASE = "/tmp/hf"
P_REPO = "ibyteohdear/ltx-2.3-packages"
@spaces.GPU(duration=600)
def run_pipeline():
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1. Download base weights (The 1.1 distilled variant)
print("Downloading 1.1 Distilled base checkpoint...")
vanilla_base_path = hf_hub_download(
repo_id="ibyteohdear/Lightricks-LTX-2.3",
filename="ltx-2.3-22b-distilled-1.1.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 2. Download LoRA
print("Downloading LoRA weights...")
lora_path = hf_hub_download(
repo_id="maximsobolev275/LTX-10Eros-LoRA-r768",
filename="LTX_10Eros-v12_LoRA_fro99-avgrank91.safetensors",
token=HF_TOKEN,
cache_dir="/tmp/hf_cache"
)
# 3. Environment/Packages setup
print("Downloading packages snapshot...")
ltx_path = snapshot_download(
repo_id=P_REPO,
local_dir=f"{BASE}/ltx",
token=HF_TOKEN
)
sys.path.insert(0, f"{ltx_path}/packages/ltx-pipelines/src")
sys.path.insert(0, f"{ltx_path}/packages/ltx-core/src")
from safetensors.torch import load_file, save_file, safe_open
# 4. Load the entire original 46.1 GB dictionary into RAM memory
print("Loading tensors into RAM...")
base_state_dict = load_file(vanilla_base_path, device="cpu")
# 5. Load LoRA weights
lora_state_dict = load_file(lora_path, device="cpu")
# 6. Apply LoRA updates directly onto matching base keys
print("Starting LoRA fusion loop...")
baked_count = 0
added_count = 0
skipped_count = 0
lora_strength = 1.0
# Build a lookup map of base keys using simplified structural names
def simplify_key(k):
return k.replace("model.diffusion_model.", "").replace("diffusion_model.", "").replace("transformer.", "")
base_lookup = {simplify_key(k): k for k in base_state_dict.keys()}
for lora_key in list(lora_state_dict.keys()):
target_base_key = None
is_type_a = False
# Path variation A: Standard low-rank naming (.lora_down / .lora_up)
if ".lora_down.weight" in lora_key:
prefix = lora_key.split(".lora_down.weight")[0]
down_key = f"{prefix}.lora_down.weight"
up_key = f"{prefix}.lora_up.weight"
if up_key in lora_state_dict:
is_type_a = True
simple_prefix = simplify_key(f"{prefix}.weight")
target_base_key = base_lookup.get(simple_prefix)
# Path variation B: Low-rank dimension naming (.lora_A / .lora_B)
elif ".lora_A.weight" in lora_key:
prefix = lora_key.split(".lora_A.weight")[0]
a_key = f"{prefix}.lora_A.weight"
b_key = f"{prefix}.lora_B.weight"
if b_key in lora_state_dict:
simple_prefix = simplify_key(f"{prefix}.weight")
target_base_key = base_lookup.get(simple_prefix)
# If a valid layout structure match was identified, fuse the tensors
if target_base_key and target_base_key in base_state_dict:
try:
W_base = base_state_dict[target_base_key].to(device=device, dtype=torch.bfloat16)
if is_type_a:
U = lora_state_dict[up_key].to(device=device, dtype=torch.bfloat16)
D = lora_state_dict[down_key].to(device=device, dtype=torch.bfloat16)
delta_W = torch.matmul(U, D) * lora_strength
else:
A = lora_state_dict[a_key].to(device=device, dtype=torch.bfloat16)
B = lora_state_dict[b_key].to(device=device, dtype=torch.bfloat16)
delta_W = torch.matmul(B, A) * lora_strength
if delta_W.shape != W_base.shape:
delta_W = torch.matmul(A, B) * lora_strength
base_state_dict[target_base_key] = (W_base + delta_W).cpu()
baked_count += 1
except Exception as e:
print(f"Failed to bake layer {target_base_key}: {e}")
skipped_count += 1
else:
if ".lora_down.weight" in lora_key or ".lora_A.weight" in lora_key:
skipped_count += 1
# --- PROOF & LOGGING REGION ---
print("\n==================================================")
print(" FUSION VERIFICATION ")
print("==================================================")
print(f" Successfully Baked Layers : {baked_count}")
print(f" Newly Injected Multi-Layers: {added_count}")
print(f" Skipped / Mismatched Keys : {skipped_count}")
print("==================================================")
if baked_count == 0 and added_count == 0:
print("β CRITICAL WARNING: Zero operations were completed. Output will be unmodified!")
return
else:
print("β
SUCCESS: Dynamic key resolution fusion complete.\n")
# 7. Extract original header metadata
try:
with safe_open(vanilla_base_path, framework="pt", device="cpu") as f:
original_metadata = f.metadata()
except Exception as e:
original_metadata = None
# 8. DISK MANAGEMENT: Wipe cache down to free disk space
print("Cleaning cache directory...")
try:
if os.path.exists("/tmp/hf_cache"):
shutil.rmtree("/tmp/hf_cache")
except Exception as e:
pass
# 9. Write out and upload file
from huggingface_hub import HfApi
api = HfApi()
DEST_REPO = "ibyteohdear/Lightricks-LTX-2.3"
output_filename = "/tmp/LTX2.3_DISTILLED_BAKED.safetensors"
print("Saving the new baked safetensors file...")
if original_metadata:
save_file(base_state_dict, output_filename, metadata=original_metadata)
else:
save_file(base_state_dict, output_filename)
print(f"Uploading target file to Hugging Face: {DEST_REPO}...")
api.upload_file(
path_or_fileobj=output_filename,
path_in_repo="LTX2.3_DISTILLED_1.1_BAKED_v12_LoRA_fro99-avgrank91.safetensors",
repo_id=DEST_REPO,
token=HF_TOKEN,
)
print("Pipeline execution complete.")
if __name__ == "__main__":
run_pipeline()
- Downloads last month
- 6,076
