---
license: other
license_name: ltx-2-community-license-agreement
license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
language:
  - en
  - zh
library_name: diffusers
tags:
  - video-generation
  - video-reasoning
  - logical-reasoning
  - lora
  - ltx-2.3
base_model:
  - Lightricks/LTX-2.3
---

# LTX-2 VBVR LoRA - Video Reasoning

LoRA fine-tuned weights for LTX-2.3 22B on the VBVR (Video Benchmark for Video Reasoning) dataset.

## Training Data

To ensure training quality, we preprocessed all 1,000,000 videos from the official dataset and randomly sampled from this pool during training to maintain data diversity. We adopted the official parameters, batch_size=16 and rank=32, keeping the rank modest to avoid the catastrophic forgetting that an excessively large rank can cause.
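The random-sampling strategy described above can be sketched in plain Python. This is only an illustration of the idea; the actual sampler used in the training pipeline is not published in this card, and `sample_training_batch` is a hypothetical helper name:

```python
import random

def sample_training_batch(pool_size, batch_size=16, seed=None):
    """Draw a random batch of distinct video indices from the
    preprocessed pool (1,000,000 videos in this card), so that each
    optimization step sees a diverse slice of the dataset."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), batch_size)

batch = sample_training_batch(1_000_000, batch_size=16, seed=0)
```

Sampling without replacement within a batch (as `random.sample` does) guarantees no duplicate clips in a single step, while reseeding or omitting the seed across steps keeps coverage of the full pool varied.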

The VBVR dataset contains 200 reasoning task categories, with ~5,000 variants per task, totaling ~1M videos. Main task types include:

- **Object Trajectory:** Objects moving to target positions
- **Physical Reasoning:** Rolling balls, collisions, gravity
- **Causal Relationships:** Conditional triggers, chain reactions
- **Spatial Relationships:** Relative positions, path planning

## Model Details

| Item | Details |
| --- | --- |
| Base Model | ltx-2.3-22b-dev |
| Training Method | LoRA Fine-tuning |
| LoRA Rank | 32 |
| Effective Batch Size | 16 |
| Mixed Precision | BF16 |
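For intuition, a rank-32 LoRA adapter follows the standard decomposition `W' = W + (alpha / rank) * B @ A`, with only `A` and `B` trained. A minimal NumPy sketch follows; the layer dimensions and the `alpha` value are illustrative assumptions, not the actual LTX-2.3 layer sizes:

```python
import numpy as np

rank, d_in, d_out = 32, 256, 256        # rank from the card; dims illustrative
alpha = 32.0                            # common default: alpha == rank (assumption)

W = np.random.randn(d_out, d_in)        # frozen base weight
A = np.random.randn(rank, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))             # trainable up-projection, zero-initialized

def lora_forward(x):
    """y = W x + (alpha / rank) * B (A x) -- the standard LoRA update."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = np.random.randn(d_in)
# With B zero-initialized, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), W @ x)
```

The zero-initialized `B` is why LoRA fine-tuning starts from the base model's behavior and drifts only as far as the low-rank update allows, which is also why a modest rank limits forgetting.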

## TODO List

### Dataset Release Plan

| Dataset | Videos | Status |
| --- | --- | --- |
| VBVR-96K | 96,000 | ✅ Released |
| VBVR-240K | 240,000 | 🔄 Processing |
| VBVR-480K | 480,000 | 📋 Planned |

## LoRA Capabilities

This LoRA adapter enhances the base LTX-2 model for production video generation workflows:

- **Enhanced Complex Prompt Understanding:** Accurately interprets multi-object, multi-condition prompts with detailed spatial descriptions and temporal sequences, reducing prompt misinterpretation in production scenarios.

- **Improved Motion Dynamics:** Generates smooth, physically plausible object movements with natural acceleration, deceleration, and trajectory curves, avoiding robotic or unnatural motion patterns.

- **Temporal Consistency:** Maintains object appearance, lighting, and scene coherence throughout the video sequence, reducing flickering and frame-to-frame artifacts common in generated videos.

- **Precise Timing Control:** Enables accurate control over action duration, pacing, and synchronization between multiple moving elements based on prompt semantics.

- **Multi-Object Interaction:** Handles complex scenes with multiple objects interacting simultaneously, including collisions, following, avoiding, and coordinated movements.

- **Camera and Framing Stability:** Maintains consistent camera perspective and framing throughout the sequence, avoiding unwanted camera shake or unexpected viewpoint changes.

## Training Configuration

| Config | Value |
| --- | --- |
| Learning Rate | 1e-4 |
| Scheduler | Cosine |
| Gradient Accumulation | 16 steps |
| Gradient Clipping | 1.0 |
| Optimizer | AdamW |
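The cosine schedule implied by the table can be sketched as follows. This assumes a decay from the base learning rate to zero over the ~6,000 training steps reported below, with no warmup; warmup settings are not stated in this card:

```python
import math

BASE_LR, TOTAL_STEPS = 1e-4, 6000   # values from the card
ACCUM_STEPS, CLIP_NORM = 16, 1.0    # optimizer step every 16 micro-batches

def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Cosine decay from base_lr at step 0 to ~0 at total_steps."""
    progress = min(step / total_steps, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

assert abs(cosine_lr(0) - BASE_LR) < 1e-12           # starts at the base LR
assert abs(cosine_lr(TOTAL_STEPS // 2) - BASE_LR / 2) < 1e-12
assert cosine_lr(TOTAL_STEPS) < 1e-9                 # decays to ~0
```

With gradient accumulation over 16 steps, gradients are summed across 16 micro-batches before a single AdamW update (with gradients clipped to norm 1.0), which is what yields the effective batch size of 16 listed under Model Details.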

## Evaluation Metrics

### Loss Training Curve

| Metric | Value |
| --- | --- |
| Training Steps | ~6,000 |
| Final Loss | ~0.008 |
| Loss Reduction | ~43% (from 0.014 to 0.008) |

## Video Demo

### Training Progress Comparison

**Step 0 (Base Model)**

Initial model output.

**Step 6000 (Fine-tuned)**

After 6,000 steps of training.

## Dataset

This model is trained on the VBVR (Video Benchmark for Video Reasoning) dataset from video-reason.com.

## Contact

For questions or suggestions, please open an issue on Hugging Face or contact the author directly.