MUZED Motion LoRA - LTX-2 19B
Motion LoRA for contemporary dance video generation on LTX-2 19B. Trained on real dancer footage to capture fluid upper-body and full-body movement patterns.
Quick Start
# Using LTX-2 inference pipeline
python -m ltx_pipelines.ti2vid_one_stage \
--checkpoint-path /path/to/ltx-2-19b-dev.safetensors \
--gemma-root /path/to/gemma-text-encoder \
--lora /path/to/lora_weights_step_01500.safetensors 1.0 \
--prompt "DNCMOV upper body dance movement, fluid arm extension with gentle torso rotation, contemporary dance" \
--negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
--width 1536 --height 864 --num-frames 161 --frame-rate 30 \
--num-inference-steps 50 --video-cfg-guidance-scale 4.0 \
--output-path output.mp4
With first-frame conditioning (I2V)
# Add --image flag for image-to-video
--image first_frame.png 0 1.0
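The two invocations above differ only in the trailing `--image` arguments, so they can be assembled from one helper. The following is a minimal sketch, not part of the LTX-2 tooling: `build_ltx2_command` is a hypothetical name, the flags are exactly those shown above, and the `0 1.0` values after the image path are copied verbatim from the example because this card does not spell out their meaning.

```python
import shlex
from typing import List, Optional

NEGATIVE_PROMPT = "worst quality, inconsistent motion, blurry, jittery, distorted"


def build_ltx2_command(
    prompt: str,
    lora_path: str,
    lora_strength: float = 1.0,
    image_path: Optional[str] = None,
    output_path: str = "output.mp4",
) -> List[str]:
    """Assemble the argv list for the one-stage pipeline shown in Quick Start.

    Hypothetical helper: it only rearranges the flags from this card and
    does not validate them against the actual CLI.
    """
    cmd = [
        "python", "-m", "ltx_pipelines.ti2vid_one_stage",
        "--checkpoint-path", "/path/to/ltx-2-19b-dev.safetensors",
        "--gemma-root", "/path/to/gemma-text-encoder",
        "--lora", lora_path, str(lora_strength),
        "--prompt", prompt,
        "--negative-prompt", NEGATIVE_PROMPT,
        "--width", "1536", "--height", "864",
        "--num-frames", "161", "--frame-rate", "30",
        "--num-inference-steps", "50",
        "--video-cfg-guidance-scale", "4.0",
        "--output-path", output_path,
    ]
    if image_path is not None:
        # Values "0" and "1.0" are copied from the I2V example above.
        cmd += ["--image", image_path, "0", "1.0"]
    return cmd


if __name__ == "__main__":
    cmd = build_ltx2_command(
        "DNCMOV upper body dance movement, contemporary dance",
        "/path/to/lora_weights_step_01500.safetensors",
        image_path="first_frame.png",
    )
    # Print a copy-pasteable shell line instead of launching the job.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the job; passing a list rather than a string avoids shell quoting problems with the prompt.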
Trigger Token
DNCMOV: must be included at the start of your prompt.
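A small guard can make sure the trigger token is never forgotten when prompts are generated programmatically. This is an illustrative sketch; `with_trigger` is a hypothetical helper, not part of any LTX-2 package.

```python
TRIGGER = "DNCMOV"


def with_trigger(prompt: str) -> str:
    """Return the prompt with the trigger token as its first word."""
    stripped = prompt.strip()
    first_word = stripped.split(maxsplit=1)[:1]
    if first_word == [TRIGGER]:
        # Already starts with the trigger token; leave it unchanged.
        return stripped
    # Prepend the token; strip() covers the empty-prompt case.
    return f"{TRIGGER} {stripped}".strip()
```

For example, `with_trigger("fluid arm extension, contemporary dance")` returns `"DNCMOV fluid arm extension, contemporary dance"`.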
Training Details
| Parameter | Value |
|---|---|
| Base model | LTX-2 19B (ltx-2-19b-dev.safetensors) |
| Training mode | LoRA |
| Rank | 64 |
| Alpha | 64 |
| Target modules | attn1.* + attn2.* + ff.net.* (video-only) |
| Resolution | 1536×864×161 (27,216 tokens) |
| Frame rate | 30 fps |
| Dataset | 125 clips × 5.37s from 3 source videos |
| Steps | 1500 (ongoing to 2000) |
| Learning rate | 1e-4 (linear decay) |
| Batch size | 1 (grad accumulation 2) |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled |
| First-frame conditioning | 50% (supports both T2V and I2V) |
| Hardware | 1×H200 (141GB) |
| Training time | |
| Final loss | 0.229 |
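The 27,216-token figure in the table can be reproduced with a short calculation. The sketch below assumes a video VAE that compresses 32× along each spatial axis and 8× along time, with the first frame encoded on its own. These factors are an assumption, not something this card states, although they do reproduce the number exactly. The effective batch size and the LoRA scale (alpha / rank, as in the standard LoRA formulation) follow from the table values.

```python
def latent_tokens(width: int, height: int, num_frames: int,
                  spatial: int = 32, temporal: int = 8) -> int:
    """Count latent tokens for one clip under the assumed compression factors."""
    if width % spatial or height % spatial:
        raise ValueError("width and height must be multiples of the spatial factor")
    if (num_frames - 1) % temporal:
        raise ValueError("num_frames must be of the form temporal * k + 1")
    # First frame on its own, then one latent frame per `temporal` frames.
    latent_frames = (num_frames - 1) // temporal + 1
    return (width // spatial) * (height // spatial) * latent_frames


tokens = latent_tokens(1536, 864, 161)   # 48 * 27 * 21 = 27216
effective_batch = 1 * 2                  # batch size * gradient accumulation
lora_scale = 64 / 64                     # alpha / rank = 1.0

print(tokens, effective_batch, lora_scale)
```

Under the same assumptions, frame counts of the form 8k + 1 (such as 161) divide cleanly into latent frames, which would explain the otherwise odd clip length.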
Target Modules (video-only attention + FFN)
attn1.to_k, attn1.to_q, attn1.to_v, attn1.to_out.0
attn2.to_k, attn2.to_q, attn2.to_v, attn2.to_out.0
ff.net.0.proj, ff.net.2
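One way to read "video-only" is that a layer qualifies only when its name ends with one of the listed suffixes at a component boundary. The sketch below illustrates that reading. The full module names in the example are hypothetical, since this card does not list the real parameter paths of LTX-2, and the trainer's actual matching logic may differ.

```python
TARGET_SUFFIXES = (
    "attn1.to_k", "attn1.to_q", "attn1.to_v", "attn1.to_out.0",
    "attn2.to_k", "attn2.to_q", "attn2.to_v", "attn2.to_out.0",
    "ff.net.0.proj", "ff.net.2",
)


def is_target(module_name: str) -> bool:
    """True if the name ends with a target suffix on a '.' boundary."""
    return any(
        module_name == suffix or module_name.endswith("." + suffix)
        for suffix in TARGET_SUFFIXES
    )


# Hypothetical module names, for illustration only.
names = [
    "transformer_blocks.0.attn1.to_k",
    "transformer_blocks.0.audio_attn1.to_k",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.audio_ff.net.2",
]
selected = [name for name in names if is_target(name)]
print(selected)
```

Requiring the leading dot is what keeps a name such as `audio_attn1.to_k` from matching `attn1.to_k`; a plain `endswith` on the bare suffix would select both.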
Dataset
125 clips at 1536×864, 161 frames (5.37s) each, 30fps. Source: 3 real dancer videos, of which 2 are upper-body (canon) and 1 is full-body. Captions describe movement type, body parts, direction, and energy using dance vocabulary. Captioned with Gemini 2.0 Flash + Qwen2.5-Omni, manually curated.
Movement vocabulary: flowing, sharp, rhythmic, sustained, percussive, lateral, circular, contracting, expanding, meditative, explosive.
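The dataset figures are easy to sanity-check. This short calculation uses only the numbers stated above (125 clips, 161 frames, 30 fps) to derive the per-clip duration and the total amount of training footage.

```python
NUM_CLIPS = 125
FRAMES_PER_CLIP = 161
FPS = 30

clip_seconds = FRAMES_PER_CLIP / FPS          # about 5.37 s per clip
total_frames = NUM_CLIPS * FRAMES_PER_CLIP    # 20125 frames
total_minutes = total_frames / FPS / 60       # about 11.2 minutes of footage

print(f"{clip_seconds:.2f} s per clip, {total_frames} frames, {total_minutes:.1f} min total")
```

Roughly eleven minutes of footage is a small dataset, which fits the card's description of a LoRA specialised on a narrow movement style.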
Checkpoints
| File | Step | Loss | Notes |
|---|---|---|---|
| lora_weights_step_01400.safetensors | 1400 | ~0.23 | |
| lora_weights_step_01500.safetensors | 1500 | 0.229 | End of first run |
More checkpoints (1600-2000) will be added as training continues.
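Because the step number is embedded in each file name, the most recent checkpoint can be chosen automatically as new ones are added. A minimal sketch, assuming future files keep the `lora_weights_step_NNNNN.safetensors` naming pattern shown in the table (the helper names are hypothetical):

```python
import re
from typing import Iterable

_STEP_RE = re.compile(r"lora_weights_step_(\d+)\.safetensors$")


def checkpoint_step(filename: str) -> int:
    """Extract the training step from a checkpoint file name."""
    match = _STEP_RE.search(filename)
    if match is None:
        raise ValueError(f"not a LoRA checkpoint name: {filename!r}")
    return int(match.group(1))


def latest_checkpoint(filenames: Iterable[str]) -> str:
    """Return the file name with the highest training step."""
    return max(filenames, key=checkpoint_step)


files = [
    "lora_weights_step_01400.safetensors",
    "lora_weights_step_01500.safetensors",
]
print(latest_checkpoint(files))
```

The latest checkpoint is not necessarily the best one; comparing generations from several steps is still worthwhile.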
Pipeline
Raw video → Slice (161f) → Caption (Gemini) → Curate → VAE encode → Train LoRA
↓
Generate (T2V / I2V) ← Load LoRA + base model
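The slicing step can be sketched as computing frame windows of 161 frames over each source video. The card does not say whether the clips overlap or how leftover frames were handled, so the version below assumes non-overlapping windows by default and drops any incomplete tail; `clip_windows` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def clip_windows(total_frames: int, clip_len: int = 161,
                 stride: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of full-length clips.

    stride defaults to clip_len (no overlap); a smaller stride gives
    overlapping clips. Trailing frames that cannot fill a clip are dropped.
    """
    step = clip_len if stride is None else stride
    if clip_len <= 0 or step <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [
        (start, start + clip_len)
        for start in range(0, total_frames - clip_len + 1, step)
    ]


# A 500-frame source yields three full clips; the last 17 frames are dropped.
print(clip_windows(500))
```

Each window could then be cut with any video tool and passed on to captioning and curation.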
Recommended Inference Settings
| Parameter | Value |
|---|---|
| Resolution | 1536×864×161 (match training) |
| Guidance (CFG) | 3.0-5.0 |
| STG scale | 1.0 |
| Inference steps | 40-50 |
| LoRA strength | 0.8-1.0 |
| Frame rate | 30 fps |
| Quantization | fp8-cast (for GPUs < 80GB) |
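The recommended ranges above can be encoded as a simple pre-flight check before launching a long generation job. This is an illustrative sketch with hypothetical names. The 80 GB threshold comes from the table; how `fp8-cast` is actually passed to the pipeline is not covered by this card.

```python
from typing import Dict, List, Optional, Tuple

# Recommended ranges from the table above (inclusive bounds).
RECOMMENDED: Dict[str, Tuple[float, float]] = {
    "cfg": (3.0, 5.0),
    "steps": (40, 50),
    "lora_strength": (0.8, 1.0),
}


def out_of_range(settings: Dict[str, float]) -> List[str]:
    """List the setting names that fall outside the recommended ranges."""
    return [
        name
        for name, (low, high) in RECOMMENDED.items()
        if name in settings and not (low <= settings[name] <= high)
    ]


def pick_quantization(vram_gb: float) -> Optional[str]:
    """Suggest fp8-cast for GPUs with less than 80 GB, else no quantization."""
    return "fp8-cast" if vram_gb < 80 else None


print(out_of_range({"cfg": 7.5, "steps": 50, "lora_strength": 1.0}))
print(pick_quantization(48))
```

Values outside the ranges are not errors, only departures from the settings the author reports using.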
License
LTX2 Community License