---
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
language:
- en
- zh
library_name: diffusers
tags:
- video-generation
- video-reasoning
- logical-reasoning
- lora
- ltx-2.3
base_model:
- Lightricks/LTX-2.3
---
# LTX-2 VBVR LoRA - Video Reasoning
LoRA fine-tuned weights for LTX-2.3 22B, trained on the VBVR (Video Benchmark for Video Reasoning) dataset.
## Training Data
**To ensure training quality, we preprocessed the full set of 1,000,000 videos from the official dataset and sample from it randomly during training to maintain data diversity. We adopt the official hyperparameters (batch_size=16, rank=32); keeping the rank moderate avoids the catastrophic forgetting that an excessively large rank can cause.**
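The random-sampling step described above can be sketched as follows. This is a minimal illustration, not the actual training code: `sample_batch`, the constants, and the seed handling are all ours.

```python
import random

# The preprocessed dataset is indexed by integer id here; a real pipeline
# would map ids to video files. Values mirror the card's stated setup.
NUM_VIDEOS = 1_000_000
BATCH_SIZE = 16

def sample_batch(num_videos: int, batch_size: int, rng: random.Random) -> list[int]:
    """Draw a batch of distinct video indices uniformly at random."""
    return rng.sample(range(num_videos), batch_size)

rng = random.Random(42)  # fixed seed only for reproducibility of this demo
batch = sample_batch(NUM_VIDEOS, BATCH_SIZE, rng)
```

Sampling without replacement within a batch keeps each step's 16 videos distinct, while drawing fresh batches every step covers the full pool over training.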
The VBVR dataset contains 200 reasoning task categories, with ~5,000 variants per task, totaling ~1M videos. Main task types include:
- **Object Trajectory**: Objects moving to target positions
- **Physical Reasoning**: Rolling balls, collisions, gravity
- **Causal Relationships**: Conditional triggers, chain reactions
- **Spatial Relationships**: Relative positions, path planning
## Model Details
| Item | Details |
|------|---------|
| Base Model | ltx-2.3-22b-dev |
| Training Method | LoRA Fine-tuning |
| LoRA Rank | 32 |
| Effective Batch Size | 16 |
| Mixed Precision | BF16 |
## TODO List
### Dataset Release Plan
| Dataset | Videos | Status |
|---------|--------|--------|
| VBVR-96K | 96,000 | ✅ Released |
| VBVR-240K | 240,000 | 🔄 Processing |
| VBVR-480K | 480,000 | 📋 Planned |
## LoRA Capabilities
This LoRA adapter enhances the base LTX-2 model for production video generation workflows:
- **Enhanced Complex Prompt Understanding**: Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.
- **Improved Motion Dynamics**: Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.
- **Temporal Consistency**: Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.
- **Precise Timing Control**: Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.
- **Multi-Object Interaction**: Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.
- **Camera and Framing Stability**: Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.
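A typical way to use a LoRA adapter like this one is via diffusers. The sketch below is an assumption, not verified usage for this checkpoint: the `LTXPipeline` class name, the LoRA repo id placeholder, and the generation parameters are illustrative, so check the diffusers documentation for current LTX-2 support.

```python
import torch
from diffusers import LTXPipeline  # assumed pipeline class for LTX-2; verify against diffusers docs

# Load the base model in BF16 (matching the training precision) and attach
# the LoRA adapter. The LoRA repo id below is a placeholder.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-2.3",
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("<this-lora-repo-id>")  # placeholder id
pipe.to("cuda")

video = pipe(
    prompt=(
        "A red ball rolls down a ramp, collides with a blue cube, "
        "and pushes it off the table."
    ),
    num_frames=121,          # illustrative values, not tuned recommendations
    num_inference_steps=40,
).frames[0]
```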
## Training Configuration
| Config | Value |
|--------|-------|
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Gradient Accumulation | 16 steps |
| Gradient Clipping | 1.0 |
| Optimizer | AdamW |
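The cosine schedule in the table decays the learning rate from its base value toward zero over training. A minimal stdlib sketch of that curve (warmup and a minimum-LR floor omitted; the step count is our assumption based on the ~6,000 training steps reported below):

```python
import math

BASE_LR = 1e-4
TOTAL_STEPS = 6_000  # assumed horizon for illustration

def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, base_lr: float = BASE_LR) -> float:
    """Cosine-annealed learning rate: base_lr at step 0, decaying to 0 at total_steps."""
    progress = min(step / total_steps, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

start_lr = cosine_lr(0)            # equals BASE_LR
end_lr = cosine_lr(TOTAL_STEPS)    # decays to ~0
```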
## Evaluation Metrics

| Metric | Value |
|--------|-------|
| Training Steps | ~6,000 |
| Final Loss | ~0.008 |
| Loss Reduction | ~43% (from ~0.014 to ~0.008) |
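As a sanity check, the relative loss reduction follows directly from the two reported values (it rounds to 43%):

```python
initial_loss = 0.014
final_loss = 0.008

# Relative reduction: (0.014 - 0.008) / 0.014 ≈ 0.4286
reduction = (initial_loss - final_loss) / initial_loss
print(f"{reduction:.0%}")  # prints "43%"
```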
## Video Demo
### Training Progress Comparison
#### Step 0 (Base Model)
Initial model output.
#### Step 6000 (Fine-tuned)
After 6K steps of training.
## Dataset
This model is trained on the VBVR (Video Benchmark for Video Reasoning) dataset from [video-reason.com](https://video-reason.com/).
## Contact
For questions or suggestions, please open an issue on Hugging Face or contact the author directly.