---
license: apache-2.0
tags:
- video-generation
- multi-reference
- LTX-2.3
base_model:
- Lightricks/LTX-2.3
github: https://github.com/liconstudio/ComfyUI-Licon-MSR
library_name: diffusers
---
## Overview

This model implements a novel approach to multi-reference video generation using **Multiple Subject Reference (MSR)**. Instead of introducing additional encoder branches or fusion modules, we transform multiple static reference images into a pseudo-video sequence that shares the same representation space as the target video.
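
The idea of turning static references into a pseudo-video can be sketched as follows. This is a minimal illustration, not the plugin's actual preprocessing: it treats each reference image as one frame and stacks the frames along a time axis, so the result has the same layout as a short clip. The names `build_pseudo_video` and `clip_shape`, the toy nested-list image type, and the same-resolution requirement are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Toy image type: a height x width grid of grayscale values (assumption for illustration).
Image = List[List[float]]


def build_pseudo_video(references: Sequence[Image]) -> List[Image]:
    """Stack static reference images into a frame sequence (time, height, width)."""
    if not references:
        raise ValueError("at least one reference image is required")
    height, width = len(references[0]), len(references[0][0])
    for index, image in enumerate(references):
        # Assumed constraint: every frame of a clip shares one resolution.
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError(f"reference {index} does not match {height}x{width}")
    # Copy rows so later edits to the clip do not alter the source images.
    return [[list(row) for row in image] for image in references]


def clip_shape(clip: List[Image]) -> Tuple[int, int, int]:
    """Return (frames, height, width) for a pseudo-video clip."""
    return (len(clip), len(clip[0]), len(clip[0][0]))


# Three hypothetical references: a subject, a prop, and a background.
subject = [[0.1, 0.2], [0.3, 0.4]]
prop = [[0.5, 0.5], [0.5, 0.5]]
background = [[0.9, 0.8], [0.7, 0.6]]

pseudo_video = build_pseudo_video([subject, prop, background])
print(clip_shape(pseudo_video))
```

In the real pipeline the stacked frames would presumably be encoded by the base model's video encoder; that step is outside the scope of this sketch.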
## Usage

This LoRA requires the **[ComfyUI-Licon-MSR](https://github.com/liconstudio/ComfyUI-Licon-MSR)** plugin for ComfyUI. A sample workflow is included in the model files for easy testing and experimentation.
## Key Features

### Multi-Reference Visual Memory

- **Token-level reference preservation**: Multiple reference images are encoded as video latents, preserving fine-grained visual information at the token level rather than compressing it into a single embedding
- **Native self-attention retrieval**: The target video tokens access reference tokens directly through the model's existing self-attention mechanism, so no new architectural components are needed
- **In-context conditioning**: References serve as "visual memory" within the main token sequence, not as external conditioning inputs
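
The in-context mechanism described above can be illustrated with a toy self-attention pass. This is a sketch under strong simplifying assumptions, not the model's implementation: tokens are 2-dimensional vectors, the query and key projections are the identity, and the names `softmax` and `self_attention_weights` are hypothetical. It only shows that once reference tokens sit in the same sequence as target tokens, ordinary self-attention lets each target token put weight on them.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention_weights(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention weights with identity Q/K projections (toy assumption)."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in tokens])
        for query in tokens
    ]


# Hypothetical tokens: two from reference images, two from the target video.
reference_tokens = [[1.0, 0.0], [0.0, 1.0]]
target_tokens = [[0.9, 0.1], [0.2, 0.8]]

# References and target share one sequence; no separate conditioning branch.
sequence = reference_tokens + target_tokens
weights = self_attention_weights(sequence)

num_refs = len(reference_tokens)
for row in weights[num_refs:]:
    # Share of each target token's attention that lands on reference tokens.
    print(round(sum(row[:num_refs]), 3))
```

Each target token here places more weight on the reference token it resembles, which is the retrieval behavior the list above describes in words.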
### Flexible Reference Composition

- **2 to 5 reference images**: Supports a variable number of reference inputs, with composition complexity increasing as more are added
- **Complementary semantic roles**: Each reference image can carry different information:
  - Subject identity
  - Object/prop details
  - Scene/background
  - Local textures
  - Multiple viewpoints
## What It Can Do

### Identity Preservation Across References

Generate videos where multiple reference identities are simultaneously preserved:

- Multiple characters from different reference images
- Character + object combinations
- Object + scene compositions

### Relation-Based Composition

Beyond mere identity preservation, the model can compose references based on textual relation descriptions:

- Action interactions (handing, picking up, pushing)
- Spatial relationships (left-right, foreground-background)
- Temporal event structures (start → process → result)

### Cross-Reference Attribute Selection

The model learns to selectively retrieve attributes from different references:

- Face from reference A, clothing from reference B
- Object identity from one reference, pose/position from another
- Background elements from scene references
## Usage Tips (V1 Version)

- **Prompt description**: Describe the reference images concisely but accurately. Both over-description and under-description degrade consistency
- **High-motion scenes**: 50 fps is recommended for smooth, coherent motion
- **Generation reliability**: Expect to need 2-3 sampling runs to get an accurate result
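
The frame rate and retry tips above translate into simple arithmetic. The sketch below is only a planning helper, assuming a hypothetical 4-second clip and consecutive seeds; the function names and dictionary keys are invented for illustration and do not correspond to any ComfyUI node parameters. The base model may also impose its own constraints on valid frame counts, which this sketch does not model.

```python
from typing import Dict, List


def frame_count(duration_seconds: float, fps: int = 50) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(duration_seconds * fps)


def sampling_plan(
    base_seed: int,
    runs: int = 3,
    duration_seconds: float = 4.0,
    fps: int = 50,
) -> List[Dict[str, int]]:
    """One entry per sampling run, each with a distinct seed (hypothetical layout)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    frames = frame_count(duration_seconds, fps)
    return [
        {"seed": base_seed + offset, "fps": fps, "frames": frames}
        for offset in range(runs)
    ]


# Three attempts at a 4-second, 50 fps high-motion clip.
plan = sampling_plan(base_seed=1234)
for run in plan:
    print(run)
```

Running several seeds up front and keeping the best result matches the 2-3 attempts the tips suggest.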
## Results Showcase

### V1 Version

| Reference Images | Generated Video |
|:---:|:---:|
| <img src="validition_v1/01/1.jpg" width="80"> <img src="validition_v1/01/2.jpg" width="80"> <img src="validition_v1/01/bg.png" width="80"> | [▶ Play](validition_v1/01/video.mp4) |
| <img src="validition_v1/07/1.jpg" width="80"> <img src="validition_v1/07/2.jpg" width="80"> <img src="validition_v1/07/bg.png" width="80"> | [▶ Play](validition_v1/07/video.mp4) |
| <img src="validition_v1/05/1.png" width="70"> <img src="validition_v1/05/2.png" width="70"> <img src="validition_v1/05/5.png" width="70"> <img src="validition_v1/05/bg.png" width="70"> | [▶ Play](validition_v1/05/video.mp4) |

---