---
pipeline_tag: image-to-video
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-video
- ltxv
- lightricks
pinned: true
language:
- en
- de
- es
- fr
- ja
- ko
- zh
- it
- pt
license: other
license_name: ltx-2-open-weights-license
license_link: https://static.lightricks.com/legal/ltx-2-open-weights-license-0.X.pdf
library_name: diffusers
demo: https://app.ltx.studio/ltx-2-playground/i2v
---

# LTX-2 Model Card

This model card focuses on the LTX-2 model. The codebase is available [here](https://github.com/Lightricks/LTX-2).

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

# Model Checkpoints

| Name | Notes |
|------|-------|
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the ltx-2 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2 latents, used in multi-stage (multiscale) pipelines for higher FPS |

## Model Details

- **Developed by:** Lightricks
- **Model type:** Diffusion-based audio-video foundation model
- **Language(s):** English

# Online demo

LTX-2 is accessible right away
via the following links:
- [LTX-Studio text-to-video](https://app.ltx.studio/ltx-2-playground/t2v)
- [LTX-Studio image-to-video](https://app.ltx.studio/ltx-2-playground/i2v)

# Run locally

## Direct use license

You can use the models (full, distilled, upscalers, and any derivatives of the models) for the purposes permitted under the [license](https://static.lightricks.com/legal/ltx-2-open-weights-license-0.X.pdf).

## ComfyUI

We recommend using the built-in LTXVideo nodes, which can be found in the ComfyUI Manager. For manual installation instructions, please refer to our [documentation site](https://docs.ltx.video/open-source-model/integration-tools/comfy-ui).

## PyTorch codebase

The [LTX-2 codebase](https://github.com/Lightricks/LTX-2) is a monorepo with several packages, from the model definition in `ltx-core` to pipelines in `ltx-pipelines` and training capabilities in `ltx-trainer`.

The codebase was tested with Python >=3.12 and CUDA version >12.7, and supports PyTorch ~= 2.7.

### Installation

```bash
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate
```

### Inference

To use our model, please follow the instructions in our [ltx-pipelines](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/README.md) package.

## Diffusers 🧨

LTX-2 is supported in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for image-to-video generation.

## General tips:

* Width and height must be divisible by 32. The frame count must be a multiple of 8 plus 1, that is, of the form 8n + 1 (for example 9, 17, or 121).
* If the resolution or the number of frames does not satisfy these constraints, the input should be padded with -1 and then cropped to the desired resolution and number of frames.
* For tips on writing effective prompts, please visit our [Prompting guide](https://ltx.video/blog/how-to-prompt-for-ltx-2).

### Limitations

- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate videos that match the prompts perfectly.
- Prompt following is heavily influenced by the prompting style.
- The model may generate content that is inappropriate or offensive.
- When generating audio without speech, the audio may be of lower quality.

## Image-to-video examples

| | | |
|:---:|:---:|:---:|
| ![example1](./media/ltx-video_i2v_example_00001.gif) | ![example2](./media/ltx-video_i2v_example_00002.gif) | ![example3](./media/ltx-video_i2v_example_00003.gif) |
| ![example4](./media/ltx-video_i2v_example_00004.gif) | ![example5](./media/ltx-video_i2v_example_00005.gif) | ![example6](./media/ltx-video_i2v_example_00006.gif) |
| ![example7](./media/ltx-video_i2v_example_00007.gif) | ![example8](./media/ltx-video_i2v_example_00008.gif) | ![example9](./media/ltx-video_i2v_example_00009.gif) |
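
## Computing valid sizes

The size constraints from the general tips (width and height divisible by 32, frame count a multiple of 8 plus 1) can be checked before running a pipeline. The following is a minimal sketch in plain Python, not part of the LTX-2 codebase: the helper names `ceil_to_multiple` and `padded_shape` are hypothetical, and rounding *up* is an assumption based on the tip that inputs are padded and then cropped back to the requested size.

```python
def ceil_to_multiple(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return -(-value // multiple) * multiple


def padded_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Return the smallest valid (width, height, num_frames) covering the request.

    Hypothetical helper: width and height are rounded up to multiples of 32,
    and the frame count is rounded up to the form 8 * n + 1.
    """
    if min(width, height, num_frames) < 1:
        raise ValueError("width, height and num_frames must be positive")
    padded_width = ceil_to_multiple(width, 32)
    padded_height = ceil_to_multiple(height, 32)
    # Subtract the "+ 1" frame, round the remainder up to a multiple of 8,
    # then add the extra frame back.
    padded_frames = ceil_to_multiple(num_frames - 1, 8) + 1
    return padded_width, padded_height, padded_frames


if __name__ == "__main__":
    # 720 is not divisible by 32 and 120 is not of the form 8n + 1.
    print(padded_shape(1280, 720, 120))  # (1280, 736, 121)
```

The difference between the padded and requested values gives the amount to crop from the output after generation.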