LTX-2 / README.md

ofirbibi

Update README.md

7ea9334 28 days ago

metadata

pipeline_tag: image-to-video
tags:
  - image-to-video
  - text-to-video
  - video-to-video
  - image-text-to-video
  - audio-to-video
  - text-to-audio
  - video-to-audio
  - audio-to-audio
  - text-to-audio-video
  - image-to-audio-video
  - image-text-to-audio-video
  - ltx-2
  - ltx-video
  - ltxv
  - lightricks
pinned: true
language:
  - en
  - de
  - es
  - fr
  - ja
  - ko
  - zh
  - it
  - pt
license: other
license_name: ltx-2-open-weights-license
license_link: https://static.lightricks.com/legal/ltx-2-open-weights-license-0.X.pdf
library_name: diffusers
demo: https://app.ltx.studio/ltx-2-playground/i2v

LTX-2 Model Card

This model card focuses on the LTX-2 model. The codebase is available at https://github.com/Lightricks/LTX-2.

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

Model Checkpoints

| Name | Notes |
| --- | --- |
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the ltx-2 latents, used in multi stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2 latents, used in multi stage (multiscale) pipelines for higher FPS |

Model Details

Developed by: Lightricks
Model type: Diffusion-based audio-video foundation model
Language(s): English

Online demo

LTX-2 is accessible right away via the following links:

Run locally

Direct use license

You can use the models (full, distilled, upscalers, and any derivatives of the models) for purposes permitted under the license.

ComfyUI

We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager. For manual installation information, please refer to our documentation site.

PyTorch codebase

The LTX-2 codebase is a monorepo with several packages, from model definitions in 'ltx-core' to pipelines in 'ltx-pipelines' and training capabilities in 'ltx-trainer'. The codebase was tested with Python >=3.12 and CUDA >12.7, and supports PyTorch ~=2.7.

Installation

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate

Inference

To use our model, please follow the instructions in our ltx-pipelines package.

Diffusers 🧨

LTX-2 is supported in the Diffusers Python library for image-to-video generation.

General tips:

Width and height must be divisible by 32. Frame count must be a multiple of 8 plus 1 (for example, 97 or 121).
If the resolution is not divisible by 32 or the frame count is not a multiple of 8 plus 1, pad the input with -1 and then crop the output to the desired resolution and number of frames.
For tips on writing effective prompts, please visit our Prompting guide.
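The size constraints in the tips above can be checked before calling a pipeline. The following is a minimal sketch in plain Python; the helper names `round_up` and `padded_shape` are hypothetical and are not part of the LTX-2 codebase or Diffusers. It only computes the padded sizes. The actual padding with -1 and the final crop depend on the tensor layout of the pipeline you use, so they are not shown here.

```python
def round_up(value: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def padded_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Round a requested (width, height, num_frames) up to sizes the tips allow.

    Hypothetical helper: width and height become multiples of 32,
    and the frame count takes the form 8 * k + 1.
    """
    padded_width = round_up(width, 32)
    padded_height = round_up(height, 32)
    # Round (num_frames - 1) up to a multiple of 8, then add the extra frame back.
    padded_frames = round_up(max(num_frames - 1, 0), 8) + 1
    return padded_width, padded_height, padded_frames


requested = (1280, 720, 100)
padded = padded_shape(*requested)
print(padded)  # (1280, 736, 105)
```

In this example the width already satisfies the constraint, the height is padded from 720 to 736, and the frame count from 100 to 105. After generation, the output would be cropped back to the requested 1280 x 720 and 100 frames.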

Limitations

This model is not intended or able to provide factual information.
As a statistical model, this checkpoint might amplify existing societal biases.
The model may fail to generate videos that match the prompts perfectly.
Prompt following is heavily influenced by the prompting style.
The model may generate content that is inappropriate or offensive.
When generating audio without speech, the audio may be of lower quality.

Image-to-video examples