# MUZED Motion LoRA — LTX-2 19B

Motion LoRA for contemporary dance video generation on LTX-2 19B. Trained on real dancer footage to capture fluid upper-body and full-body movement patterns.

## Quick Start

```shell
# Using LTX-2 inference pipeline
python -m ltx_pipelines.ti2vid_one_stage \
  --checkpoint-path /path/to/ltx-2-19b-dev.safetensors \
  --gemma-root /path/to/gemma-text-encoder \
  --lora /path/to/lora_weights_step_01500.safetensors 1.0 \
  --prompt "DNCMOV upper body dance movement, fluid arm extension with gentle torso rotation, contemporary dance" \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --width 1536 --height 864 --num-frames 161 --frame-rate 30 \
  --num-inference-steps 50 --video-cfg-guidance-scale 4.0 \
  --output-path output.mp4
```
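For batch generation over several prompts, the command above can be assembled programmatically. A minimal sketch; the paths are the same placeholders as in the example, and `build_ltx2_command` is a hypothetical helper, not part of the release:

```python
# Sketch: build the ti2vid_one_stage argument list from the Quick Start example
# so it can be reused across prompts. Paths are placeholders, not real files.
import shlex

def build_ltx2_command(prompt, checkpoint, gemma_root, lora,
                       lora_strength=1.0, output="output.mp4"):
    """Assemble the CLI invocation shown in Quick Start as an argv list."""
    return [
        "python", "-m", "ltx_pipelines.ti2vid_one_stage",
        "--checkpoint-path", checkpoint,
        "--gemma-root", gemma_root,
        "--lora", lora, str(lora_strength),
        "--prompt", f"DNCMOV {prompt}",  # trigger token must lead the prompt
        "--negative-prompt",
        "worst quality, inconsistent motion, blurry, jittery, distorted",
        "--width", "1536", "--height", "864",
        "--num-frames", "161", "--frame-rate", "30",
        "--num-inference-steps", "50", "--video-cfg-guidance-scale", "4.0",
        "--output-path", output,
    ]

cmd = build_ltx2_command(
    "fluid arm extension with gentle torso rotation",
    "/path/to/ltx-2-19b-dev.safetensors",
    "/path/to/gemma-text-encoder",
    "/path/to/lora_weights_step_01500.safetensors",
)
print(shlex.join(cmd))
```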

### With first-frame conditioning (I2V)

```shell
# Add --image flag for image-to-video
--image first_frame.png 0 1.0
```

## Trigger Token

`DNCMOV` — must be included at the start of your prompt.
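A small helper (purely illustrative, not part of the release) can guarantee the trigger token leads every prompt before it reaches the pipeline:

```python
# Ensure the DNCMOV trigger token is prepended exactly once.
def with_trigger(prompt: str, token: str = "DNCMOV") -> str:
    prompt = prompt.strip()
    return prompt if prompt.startswith(token) else f"{token} {prompt}"

print(with_trigger("upper body dance movement, fluid arm extension"))
# DNCMOV upper body dance movement, fluid arm extension
```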

## Training Details

| Parameter | Value |
| --- | --- |
| Base model | LTX-2 19B (`ltx-2-19b-dev.safetensors`) |
| Training mode | LoRA |
| Rank | 64 |
| Alpha | 64 |
| Target modules | `attn1.*` + `attn2.*` + `ff.net.*` (video-only) |
| Resolution | 1536×864×161 (27,216 tokens) |
| Frame rate | 30 fps |
| Dataset | 125 clips × 5.37 s from 3 source videos |
| Steps | 1500 (ongoing to 2000) |
| Learning rate | 1e-4 (linear decay) |
| Batch size | 1 (grad accumulation 2) |
| Mixed precision | bf16 |
| Gradient checkpointing | Enabled |
| First-frame conditioning | 50% (supports both T2V and I2V) |
| Hardware | 1×H200 (141 GB) |
| Training time | 12.4 h for 1500 steps (29.8 s/step) |
| Final loss | 0.229 |
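The wall-clock figures above are internally consistent, as a quick arithmetic check shows (the 2000-step projection assumes the same per-step rate):

```python
# Consistency check: 1500 steps at 29.8 s/step.
steps, sec_per_step = 1500, 29.8
hours = steps * sec_per_step / 3600
print(f"{hours:.1f} h")  # 12.4 h, matching the reported training time

# Projected wall time for the full 2000-step run at the same rate.
projected = 2000 * sec_per_step / 3600
print(f"{projected:.1f} h")
```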

## Target Modules (video-only attention + FFN)

```
attn1.to_k, attn1.to_q, attn1.to_v, attn1.to_out.0
attn2.to_k, attn2.to_q, attn2.to_v, attn2.to_out.0
ff.net.0.proj, ff.net.2
```
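In practice a target list like this is usually applied by suffix-matching against the model's parameter names. A hedged sketch (the `blocks.N.` prefixes are illustrative, not LTX-2's actual naming):

```python
# Select LoRA target modules by matching name suffixes against the list above.
import re

TARGET_PATTERNS = [
    r"attn1\.to_(q|k|v)$", r"attn1\.to_out\.0$",
    r"attn2\.to_(q|k|v)$", r"attn2\.to_out\.0$",
    r"ff\.net\.0\.proj$", r"ff\.net\.2$",
]

def is_lora_target(name: str) -> bool:
    """True if a module name ends with one of the targeted suffixes."""
    return any(re.search(p, name) for p in TARGET_PATTERNS)

names = [
    "blocks.0.attn1.to_q", "blocks.0.attn2.to_out.0",
    "blocks.0.ff.net.0.proj", "blocks.0.norm1",  # norm layers are excluded
]
print([n for n in names if is_lora_target(n)])
```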

## Dataset

125 clips at 1536×864, 161 frames (5.37 s) each, 30 fps. Source: 3 real dancer videos — 2 upper-body (canon), 1 full-body. Captions describe movement type, body parts, direction, and energy using dance vocabulary. Captioned with Gemini 2.0 Flash + Qwen2.5-Omni, manually curated.

Movement vocabulary: flowing, sharp, rhythmic, sustained, percussive, lateral, circular, contracting, expanding, meditative, explosive.
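The clip length and total footage follow directly from the figures above:

```python
# 161 frames at 30 fps per clip, 125 clips in total.
frames, fps, clips = 161, 30, 125
clip_sec = frames / fps
total_min = clips * clip_sec / 60
print(f"{clip_sec:.2f} s per clip, {total_min:.1f} min total")
# 5.37 s per clip, ~11.2 min of curated footage
```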

## Checkpoints

| File | Step | Loss | Notes |
| --- | --- | --- | --- |
| `lora_weights_step_01400.safetensors` | 1400 | ~0.23 | |
| `lora_weights_step_01500.safetensors` | 1500 | 0.229 | End of first run |

More checkpoints (1600-2000) will be added as training continues.

## Pipeline

```
Raw video → Slice (161f) → Caption (Gemini) → Curate → VAE encode → Train LoRA
                                                                         ↓
                                          Generate (T2V / I2V) ← Load LoRA + base model
```
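The "Slice (161f)" stage can be sketched as computing fixed-length frame windows over each raw video. A minimal illustration; the non-overlapping stride is an assumption, not stated in this card:

```python
# Compute (start, end) frame indices for full 161-frame clips.
def slice_windows(total_frames: int, clip_len: int = 161, stride: int = 161):
    """Return non-overlapping windows; partial trailing clips are dropped."""
    return [(s, s + clip_len)
            for s in range(0, total_frames - clip_len + 1, stride)]

print(slice_windows(500))  # [(0, 161), (161, 322), (322, 483)]
```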

## Recommended Inference Settings

| Parameter | Value |
| --- | --- |
| Resolution | 1536×864×161 (match training) |
| Guidance (CFG) | 3.0–5.0 |
| STG scale | 1.0 |
| Inference steps | 40–50 |
| LoRA strength | 0.8–1.0 |
| Frame rate | 30 fps |
| Quantization | fp8-cast (for GPUs < 80 GB) |
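The 27,216-token figure quoted with the 1536×864×161 resolution can be reproduced assuming the LTX family's 32× spatial and 8× causal temporal VAE compression (these factors come from LTX-Video, not from this card, so treat them as an assumption):

```python
# Latent token count for 1536x864x161, assuming 32x spatial and 8x causal
# temporal compression (first frame kept, remaining frames grouped by 8).
w, h, f = 1536, 864, 161
tokens = (w // 32) * (h // 32) * ((f - 1) // 8 + 1)
print(tokens)  # 27216
```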

## License

LTX2 Community License
