
LTX-2 Image-to-Video Adapter LoRA

A high-rank LoRA adapter for LTX-Video 2 that substantially improves image-to-video generation quality. No complex workflows, no image preprocessing, no compression tricks -- just a direct image embedding pipeline that works.

What This Is

This LoRA was trained on 30,000 generated videos spanning a wide range of subjects, styles, and motion types. The result is a highly generalized adapter that strengthens LTX-2's ability to take a single image and produce coherent, high-fidelity video from it.

Key Specs

| Parameter | Value |
|---|---|
| Base Model | LTX-Video 2 |
| LoRA Rank | 256 |
| Training Set | ~30,000 generated videos |
| Training Scope | Visual only (no explicit audio training) |
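
To give a rough sense of what rank 256 implies, a LoRA pair adds two low-rank factors per adapted weight matrix, for r · (d_in + d_out) extra parameters each. The width below is a hypothetical illustration, not LTX-2's actual dimensions:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank, d_in), B is (d_out, rank)."""
    return rank * d_in + d_out * rank

# Hypothetical transformer width; LTX-2's real layer sizes may differ.
d = 4096
full = d * d                          # parameters in the original d x d matrix
added = lora_param_count(d, d, 256)   # parameters added by a rank-256 adapter
print(f"LoRA adds {added:,} params vs. {full:,} in the full matrix")
```

Even at this high rank, the adapter stays a small fraction of the base weights, which is why it remains cheap to load alongside the checkpoint.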

What It Does

  • Improved image fidelity -- the generated video maintains stronger adherence to the source image with less drift or distortion across frames.
  • Better motion coherence -- subjects move more naturally and consistently throughout the clip.
  • Broader generalization -- performs well across diverse subjects and scenes without needing per-category tuning.
  • Zero-workflow overhead -- no ControlNet, no IP-Adapter stacking, no image manipulation required. Load the LoRA, attach an image embedding, prompt, and generate.

A Note on Audio

Audio was not explicitly trained into this LoRA. However, due to the nature of how LTX-2 handles its latent space, there are subtle shifts in audio output compared to the base model. This is a side effect of the training process, not an intentional feature.

Usage (ComfyUI)

  1. Place the LoRA file in your ComfyUI/models/loras/ directory.
  2. Add an LTX-2 model loader node and load the base LTX-2 checkpoint.
  3. Add a Load LoRA node and select this adapter.
  4. Connect an image embedding node with your source image.
  5. Add your text prompt and generate.

No additional nodes, preprocessing steps, or auxiliary models are needed.
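
Step 1 above is a single copy into the ComfyUI models tree. The filename here is a hypothetical stand-in; use whatever file you actually downloaded:

```shell
# Create the loras directory if it doesn't already exist.
mkdir -p ComfyUI/models/loras
# "ltx2_i2v_adapter.safetensors" is a placeholder name standing in for the
# downloaded LoRA file; substitute the real filename.
touch ltx2_i2v_adapter.safetensors
cp ltx2_i2v_adapter.safetensors ComfyUI/models/loras/
```

ComfyUI's Load LoRA node will then list the file in its dropdown after a refresh or restart.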

Examples

Three reference videos demonstrating the adapter's output quality are included in this repository.

Model Details

  • Architecture: LoRA (Low-Rank Adaptation) applied to LTX-Video 2's transformer layers
  • Rank 256 provides a high-capacity adaptation while remaining efficient to load and merge
  • Training data was intentionally diverse to avoid overfitting to any single domain, producing a general-purpose image-to-video adapter rather than a style-specific fine-tune
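
The merge mentioned above follows the standard LoRA update: the adapted weight is W' = W + (α/r) · B · A, where A and B are the two rank-r factors. A minimal numpy sketch with a hypothetical layer width (the rank matches this adapter):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 512, 256, 256.0     # d is a hypothetical width; r is the LoRA rank

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # down-projection factor (trained)
B = np.zeros((d, r))              # up-projection factor, zero-initialized

# Merging folds the low-rank update into the base weight. With B still at its
# zero init (as before training), the merged weight equals the base weight.
W_merged = W + (alpha / r) * (B @ A)
print(np.allclose(W_merged, W))  # True: B @ A is all zeros here
```

After training, B is nonzero and the same one-line merge bakes the adaptation into the checkpoint, which is why rank-256 LoRAs remain cheap to load and merge despite their capacity.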

License

Please refer to the LTX-Video license for base model terms.