metadata
pipeline_tag: image-to-video
tags:
  - image-to-video
  - text-to-video
  - video-to-video
  - image-text-to-video
  - audio-to-video
  - text-to-audio
  - video-to-audio
  - audio-to-audio
  - text-to-audio-video
  - image-to-audio-video
  - image-text-to-audio-video
  - ltx-2
  - ltx-video
  - ltxv
  - lightricks
pinned: true
language:
  - en
  - de
  - es
  - fr
  - ja
  - ko
  - zh
  - it
  - pt
license: other
license_name: ltx-2-open-weights-license
license_link: https://static.lightricks.com/legal/ltx-2-open-weights-license-0.X.pdf
library_name: diffusers
demo: https://app.ltx.studio/ltx-2-playground/i2v

LTX-2 Model Card

This model card describes the LTX-2 model. The codebase is available at https://github.com/Lightricks/LTX-2.

LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.


Model Checkpoints

| Name | Notes |
|------|-------|
| ltx-2-19b-dev | The full model, flexible and trainable in bf16 |
| ltx-2-19b-dev-fp8 | The full model in fp8 quantization |
| ltx-2-19b-dev-fp4 | The full model in nvfp4 quantization |
| ltx-2-19b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2-19b-distilled-lora-384 | A LoRA version of the distilled model applicable to the full model |
| ltx-2-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the ltx-2 latents, used in multi stage (multiscale) pipelines for higher resolution |
| ltx-2-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2 latents, used in multi stage (multiscale) pipelines for higher FPS |

Model Details

  • Developed by: Lightricks
  • Model type: Diffusion-based audio-video foundation model
  • Language(s): English

Online demo

LTX-2 is accessible right away via the following link: https://app.ltx.studio/ltx-2-playground/i2v

Run locally

Direct use license

You may use the models (full, distilled, upscalers, and any derivatives of the models) for purposes permitted under the license.

ComfyUI

We recommend you use the built-in LTXVideo nodes that can be found in the ComfyUI Manager. For manual installation information, please refer to our documentation site.

PyTorch codebase

The LTX-2 codebase is a monorepo with several packages, from model definition in 'ltx-core' to pipelines in 'ltx-pipelines' and training capabilities in 'ltx-trainer'. The codebase was tested with Python >=3.12 and CUDA >12.7, and supports PyTorch ~=2.7.
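
As a rough illustration of the version constraints above, the following sketch checks three version strings against them. The helper names `parse_version` and `meets_tested_requirements` are hypothetical and not part of the LTX-2 codebase. The sketch assumes that "~= 2.7" follows the usual compatible-release meaning (at least 2.7, still within the 2.x series) and that the CUDA constraint is compared on major.minor only.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as '2.7.1+cu128' into (2, 7, 1), dropping any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_tested_requirements(python: str, cuda: str, torch: str) -> dict[str, bool]:
    """Hypothetical helper: compare versions against the constraints stated above."""
    torch_version = parse_version(torch)
    return {
        # Python >= 3.12
        "python": parse_version(python) >= (3, 12),
        # CUDA > 12.7, compared on major.minor only (an assumption of this sketch)
        "cuda": parse_version(cuda)[:2] > (12, 7),
        # PyTorch ~= 2.7, read as: at least 2.7 and still a 2.x release
        "torch": torch_version >= (2, 7) and torch_version[0] == 2,
    }


print(meets_tested_requirements("3.12.4", "12.8", "2.7.1+cu128"))
```

In a real environment the three strings could come from `platform.python_version()`, `torch.version.cuda` (which is `None` on CPU-only builds), and `torch.__version__`.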

Installation

git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2

# From the repository root
uv sync
source .venv/bin/activate

Inference

To use our model, please follow the instructions in our ltx-pipelines package.

Diffusers 🧨

LTX-2 is supported in the Diffusers Python library for image-to-video generation.

General tips:

  • Width and height must be divisible by 32. The frame count must be a multiple of 8 plus 1 (for example 9, 17, or 121).
  • If the resolution is not divisible by 32, or the frame count is not of the form 8k + 1, the input should be padded with -1 to a valid size and the result then cropped to the desired resolution and number of frames.
  • For tips on writing effective prompts, please visit our Prompting guide
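
The size rules above can be sketched as a small helper that rounds a requested shape up to the nearest valid one. This is a minimal sketch, not code from the LTX-2 or Diffusers libraries: the function names are hypothetical, and rounding up (rather than down) is an assumption that matches the pad-then-crop advice.

```python
def next_valid_size(size: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= size."""
    if size < 1:
        raise ValueError("size must be positive")
    return -(-size // multiple) * multiple  # ceiling division, then scale back


def next_valid_frame_count(num_frames: int, stride: int = 8) -> int:
    """Smallest frame count of the form stride * k + 1 that is >= num_frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    return -(-(num_frames - 1) // stride) * stride + 1


def padded_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Hypothetical helper: the shape to pad to before generation."""
    return (
        next_valid_size(width),
        next_valid_size(height),
        next_valid_frame_count(num_frames),
    )


# A 1280x720 request with 100 frames would be padded to this shape,
# and the generated result cropped back to 1280x720 and 100 frames.
print(padded_shape(1280, 720, 100))
```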

Limitations

  • This model is not intended or able to provide factual information.
  • As a statistical model this checkpoint might amplify existing societal biases.
  • The model may fail to generate videos that match the prompts perfectly.
  • Prompt following is heavily influenced by the prompting style.
  • The model may generate content that is inappropriate or offensive.
  • When generating audio without speech, the audio may be of lower quality.

Image-to-video examples

(Grid of nine image-to-video examples: example1 through example9.)