LTX-2 Image-to-Video Adapter LoRA
A high-rank LoRA adapter for LTX-Video 2 that substantially improves image-to-video generation quality. No complex workflows, no image preprocessing, no compression tricks -- just a direct image embedding pipeline that works.
What This Is
This LoRA was trained on 30,000 generated videos spanning a wide range of subjects, styles, and motion types. The result is a highly generalized adapter that strengthens LTX-2's ability to take a single image and produce coherent, high-fidelity video from it.
Key Specs
| Parameter | Value |
|---|---|
| Base Model | LTX-Video 2 |
| LoRA Rank | 256 |
| Training Set | ~30,000 generated videos |
| Training Scope | Visual only (no explicit audio training) |
What It Does
- Improved image fidelity -- the generated video maintains stronger adherence to the source image with less drift or distortion across frames.
- Better motion coherence -- subjects move more naturally and consistently throughout the clip.
- Broader generalization -- performs well across diverse subjects and scenes without needing per-category tuning.
- Zero-workflow overhead -- no ControlNet, no IP-Adapter stacking, no image manipulation required. Load the LoRA, attach an image embedding, prompt, and generate.
A Note on Audio
Audio was not explicitly trained into this LoRA. However, because of how LTX-2 handles its latent space, the audio output shifts subtly compared to the base model. This is a side effect of the training process, not an intentional feature.
Usage (ComfyUI)
- Place the LoRA file in your ComfyUI/models/loras/ directory.
- Add an LTX-2 model loader node and load the base LTX-2 checkpoint.
- Add a Load LoRA node and select this adapter.
- Connect an image embedding node with your source image.
- Add your text prompt and generate.
No additional nodes, preprocessing steps, or auxiliary models are needed.
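The first step above (placing the file) can also be scripted. The following is a minimal sketch, not part of the adapter itself: the function name `install_lora` and the example file name are hypothetical, and it assumes a standard ComfyUI layout with a models/loras/ folder under the install root.

```python
import shutil
from pathlib import Path


def install_lora(lora_file: str, comfyui_root: str) -> Path:
    """Copy a LoRA file into <comfyui_root>/models/loras/ and return its new path."""
    target_dir = Path(comfyui_root) / "models" / "loras"
    # Create the folder if this ComfyUI install does not have it yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(lora_file).name
    shutil.copy2(lora_file, destination)
    return destination


if __name__ == "__main__":
    # Hypothetical paths; replace with the downloaded file and your install root.
    print(install_lora("ltx2_i2v_adapter.safetensors", "ComfyUI"))
```

After copying, ComfyUI may need a restart or a refresh of its node list before the file appears in the Load LoRA node.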
Workflow
Examples
Reference videos demonstrating the adapter's output quality:
Model Details
- Architecture: LoRA (Low-Rank Adaptation) applied to LTX-Video 2's transformer layers
- Rank: 256, a high-capacity adaptation that remains efficient to load and merge
- Training data: intentionally diverse to avoid overfitting to any single domain, producing a general-purpose image-to-video adapter rather than a style-specific fine-tune
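To give a rough sense of what rank 256 means for size, the sketch below counts the parameters a LoRA pair adds to a single linear layer. In standard LoRA, a weight of shape d_out x d_in is adapted by two factors of shapes d_out x r and r x d_in, so the adapter adds r * (d_in + d_out) parameters. The layer width of 4096 is a hypothetical value chosen for illustration, not LTX-2's actual dimension.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the full (d_out x d_in) weight matrix being adapted."""
    return d_in * d_out


HIDDEN = 4096  # hypothetical layer width, NOT LTX-2's actual dimension
RANK = 256     # from the Key Specs table

added = lora_param_count(HIDDEN, HIDDEN, RANK)
full = full_param_count(HIDDEN, HIDDEN)
print(f"LoRA adds {added:,} parameters vs {full:,} in the full weight ({added / full:.1%})")
# prints: LoRA adds 2,097,152 parameters vs 16,777,216 in the full weight (12.5%)
```

Under this assumed width, the adapter stores one eighth of the parameters of the layer it modifies. The real ratio depends on the actual layer shapes and on which layers the LoRA targets.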
License
Please refer to the LTX-Video license for base model terms.
