---
pipeline_tag: image-to-video
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-video
- ltxv
- lightricks
pinned: true
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
library_name: diffusers
demo: https://app.ltx.studio/ltx-2-playground/i2v
---

**Split version of the LTX-2 checkpoint - Model/VAE/Audio VAE/Text Encoder**

**Original model link:** [https://huggingface.co/Lightricks/LTX-2](https://huggingface.co/Lightricks/LTX-2)

**Watch us on YouTube:** [@VantageWithAI](https://www.youtube.com/@vantagewithai)
|
# LTX-2 Model Card
This model card focuses on the LTX-2 model; the codebase is available [here](https://github.com/Lightricks/LTX-2).

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

[LTX-2 on YouTube](https://www.youtube.com/watch?v=8fWAJXZJbRA)
|
# Model Checkpoints

| Name | Notes |
|------|-------|
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the LTX-2 latents, used in multi-stage (multiscale) pipelines for higher FPS |
|
## Model Details
- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English
|
# Online demo
LTX-2 is accessible right away via the following links:
- [LTX-Studio text-to-video](https://app.ltx.studio/ltx-2-playground/t2v)
- [LTX-Studio image-to-video](https://app.ltx.studio/ltx-2-playground/i2v)
|
# Run locally
|
## Direct use license
You can use the models - full, distilled, upscalers, and any derivatives of the models - for purposes permitted under the [license](./LICENSE).
|
## ComfyUI
We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager.
For manual installation information, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).
|
## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.
The codebase was tested with Python >= 3.12 and CUDA > 12.7, and supports PyTorch ~= 2.7.
|
## Diffusers 🧨

LTX-2 is supported in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for image-to-video generation.
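
The snippet below is a minimal sketch of how such a call typically looks in Diffusers. It reuses the `LTXImageToVideoPipeline` class from the existing LTX integration; the exact pipeline class, checkpoint identifier, and default settings for LTX-2 may differ, so treat these names as assumptions and check the Diffusers documentation for the supported API.

```python
# Minimal sketch, not a verified LTX-2 recipe: the pipeline class, checkpoint id,
# and parameter values below are assumptions based on the existing Diffusers LTX API.
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-2",            # or a local path to this split checkpoint
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = load_image("input.png")
result = pipe(
    image=image,
    prompt="A calm lake at sunrise, gentle ripples, cinematic lighting",
    width=768,                     # divisible by 32
    height=512,                    # divisible by 32
    num_frames=121,                # a multiple of 8, plus 1
)
export_to_video(result.frames[0], "output.mp4", fps=24)
```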
|
## General tips:
* Width and height must be divisible by 32, and the frame count must be a multiple of 8 plus 1 (e.g. 9, 121, 257).
* If the resolution or number of frames does not meet these constraints, pad the input with -1 and then crop back to the desired resolution and number of frames (see the sketch after this list).
* For tips on writing effective prompts, please visit our [Prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2).
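
Below is a hypothetical helper that illustrates this pad-then-crop rule. The tensor layout and the [-1, 1] value range are assumptions for illustration only; the official pipelines may already handle this internally.

```python
# Hypothetical sketch of the padding rule above: round height/width up to a
# multiple of 32 and the frame count up to (a multiple of 8) + 1, filling the
# padded region with -1. Layout (frames, channels, height, width) is assumed.
import torch
import torch.nn.functional as F

def pad_to_valid_shape(video: torch.Tensor) -> tuple[torch.Tensor, tuple[int, int, int]]:
    f, _, h, w = video.shape
    target_h = -(-h // 32) * 32            # ceil to a multiple of 32
    target_w = -(-w // 32) * 32            # ceil to a multiple of 32
    target_f = -(-(f - 1) // 8) * 8 + 1    # ceil to a multiple of 8, plus 1
    # F.pad lists pads from the last dimension inward:
    # (w_left, w_right, h_top, h_bottom, c_front, c_back, f_front, f_back)
    padded = F.pad(
        video,
        (0, target_w - w, 0, target_h - h, 0, 0, 0, target_f - f),
        value=-1.0,
    )
    return padded, (f, h, w)               # original sizes, used to crop the result back

# After generation, crop the output back, e.g. out = out[:f, :, :h, :w].
```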
|
### Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompt perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.
|
# Train the model

The base (dev) model is fully trainable.

It's extremely easy to reproduce the LoRAs and IC-LoRAs we publish with the model by following the instructions on the [LTX-2 Trainer Readme](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-trainer/README.md).

Training for motion, style or likeness (sound + appearance) can take less than an hour in many settings.
|