# MUZED Motion LoRA — LTX-2 19B

Motion LoRA for contemporary dance video generation on LTX-2 19B. Trained on real dancer footage to capture fluid upper-body and full-body movement patterns.

## Quick Start

```shell
# Using LTX-2 inference pipeline
python -m ltx_pipelines.ti2vid_one_stage \
  --checkpoint-path /path/to/ltx-2-19b-dev.safetensors \
  --gemma-root /path/to/gemma-text-encoder \
  --lora /path/to/lora_weights_step_01500.safetensors 1.0 \
  --prompt "DNCMOV upper body dance movement, fluid arm extension with gentle torso rotation, contemporary dance" \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --width 1536 --height 864 --num-frames 161 --frame-rate 30 \
  --num-inference-steps 50 --video-cfg-guidance-scale 4.0 \
  --output-path output.mp4
```
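For batch generation over several prompts, the command above can be assembled programmatically. A minimal sketch; the paths are the same placeholders as in the example, and `build_ltx2_command` is a hypothetical helper, not part of the release:

```python
# Sketch: build the ti2vid_one_stage argument list from the Quick Start example
# so it can be reused across prompts. Paths are placeholders, not real files.
import shlex

def build_ltx2_command(prompt, checkpoint, gemma_root, lora,
                       lora_strength=1.0, output="output.mp4"):
    """Assemble the CLI invocation shown in Quick Start as an argv list."""
    return [
        "python", "-m", "ltx_pipelines.ti2vid_one_stage",
        "--checkpoint-path", checkpoint,
        "--gemma-root", gemma_root,
        "--lora", lora, str(lora_strength),
        "--prompt", f"DNCMOV {prompt}",  # trigger token must lead the prompt
        "--negative-prompt",
        "worst quality, inconsistent motion, blurry, jittery, distorted",
        "--width", "1536", "--height", "864",
        "--num-frames", "161", "--frame-rate", "30",
        "--num-inference-steps", "50", "--video-cfg-guidance-scale", "4.0",
        "--output-path", output,
    ]

cmd = build_ltx2_command(
    "fluid arm extension with gentle torso rotation",
    "/path/to/ltx-2-19b-dev.safetensors",
    "/path/to/gemma-text-encoder",
    "/path/to/lora_weights_step_01500.safetensors",
)
print(shlex.join(cmd))
```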

### With first-frame conditioning (I2V)

```shell
# Add --image flag for image-to-video
--image first_frame.png 0 1.0
```

## Trigger Token

`DNCMOV` — must be included at the start of your prompt.
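A small helper (purely illustrative, not part of the release) can guarantee the trigger token leads every prompt before it reaches the pipeline:

```python
# Ensure the DNCMOV trigger token is prepended exactly once.
def with_trigger(prompt: str, token: str = "DNCMOV") -> str:
    prompt = prompt.strip()
    return prompt if prompt.startswith(token) else f"{token} {prompt}"

print(with_trigger("upper body dance movement, fluid arm extension"))
# DNCMOV upper body dance movement, fluid arm extension
```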

## Training Details

| Parameter | Value |
| --- | --- |
| Base model | LTX-2 19B (`ltx-2-19b-dev.safetensors`) |
| Training mode | LoRA |
| Rank | 64 |
| Alpha | 64 |
| Target modules | `attn1.*` + `attn2.*` + `ff.net.*` (video-only) |
| Resolution | 1536×864×161 (27,216 tokens) |
| Frame rate | 30 fps |
| Dataset | 125 clips × 5.37 s from 3 source videos |
| Steps | 1500 (ongoing to 2000) |
| Learning rate | 1e-4 (linear decay) |
| Batch size | 1 (grad accumulation 2) |
| Mixed precision | bf16 |
| Gradient checkpointing | Enabled |
| First-frame conditioning | 50% (supports both T2V and I2V) |
| Hardware | 1×H200 (141 GB) |
| Training time | 12.4 h for 1500 steps (29.8 s/step) |
| Final loss | 0.229 |
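The wall-clock figures above are internally consistent, as a quick arithmetic check shows (the 2000-step projection assumes the same per-step rate):

```python
# Consistency check: 1500 steps at 29.8 s/step.
steps, sec_per_step = 1500, 29.8
hours = steps * sec_per_step / 3600
print(f"{hours:.1f} h")  # 12.4 h, matching the reported training time

# Projected wall time for the full 2000-step run at the same rate.
projected = 2000 * sec_per_step / 3600
print(f"{projected:.1f} h")
```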

## Target Modules (video-only attention + FFN)

```
attn1.to_k, attn1.to_q, attn1.to_v, attn1.to_out.0
attn2.to_k, attn2.to_q, attn2.to_v, attn2.to_out.0
ff.net.0.proj, ff.net.2
```
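In practice a target list like this is usually applied by suffix-matching against the model's parameter names. A hedged sketch (the `blocks.N.` prefixes are illustrative, not LTX-2's actual naming):

```python
# Select LoRA target modules by matching name suffixes against the list above.
import re

TARGET_PATTERNS = [
    r"attn1\.to_(q|k|v)$", r"attn1\.to_out\.0$",
    r"attn2\.to_(q|k|v)$", r"attn2\.to_out\.0$",
    r"ff\.net\.0\.proj$", r"ff\.net\.2$",
]

def is_lora_target(name: str) -> bool:
    """True if a module name ends with one of the targeted suffixes."""
    return any(re.search(p, name) for p in TARGET_PATTERNS)

names = [
    "blocks.0.attn1.to_q", "blocks.0.attn2.to_out.0",
    "blocks.0.ff.net.0.proj", "blocks.0.norm1",  # norm layers are excluded
]
print([n for n in names if is_lora_target(n)])
```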

## Dataset

125 clips at 1536×864, 161 frames (5.37 s) each, 30 fps. Source: 3 real dancer videos — 2 upper-body (canon), 1 full-body. Captions describe movement type, body parts, direction, and energy using dance vocabulary. Captioned with Gemini 2.0 Flash + Qwen2.5-Omni, manually curated.

Movement vocabulary: flowing, sharp, rhythmic, sustained, percussive, lateral, circular, contracting, expanding, meditative, explosive.
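The clip length and total footage follow directly from the figures above:

```python
# 161 frames at 30 fps per clip, 125 clips in total.
frames, fps, clips = 161, 30, 125
clip_sec = frames / fps
total_min = clips * clip_sec / 60
print(f"{clip_sec:.2f} s per clip, {total_min:.1f} min total")
# 5.37 s per clip, ~11.2 min of curated footage
```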

## Checkpoints

| File | Step | Loss | Notes |
| --- | --- | --- | --- |
| `lora_weights_step_01400.safetensors` | 1400 | ~0.23 | |
| `lora_weights_step_01500.safetensors` | 1500 | 0.229 | End of first run |

More checkpoints (1600-2000) will be added as training continues.

## Pipeline

```
Raw video → Slice (161f) → Caption (Gemini) → Curate → VAE encode → Train LoRA
                                                                         ↓
                                          Generate (T2V / I2V) ← Load LoRA + base model
```
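The "Slice (161f)" stage can be sketched as computing fixed-length frame windows over each raw video. A minimal illustration; the non-overlapping stride is an assumption, not stated in this card:

```python
# Compute (start, end) frame indices for full 161-frame clips.
def slice_windows(total_frames: int, clip_len: int = 161, stride: int = 161):
    """Return non-overlapping windows; partial trailing clips are dropped."""
    return [(s, s + clip_len)
            for s in range(0, total_frames - clip_len + 1, stride)]

print(slice_windows(500))  # [(0, 161), (161, 322), (322, 483)]
```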

## Recommended Inference Settings

| Parameter | Value |
| --- | --- |
| Resolution | 1536×864×161 (match training) |
| Guidance (CFG) | 3.0–5.0 |
| STG scale | 1.0 |
| Inference steps | 40–50 |
| LoRA strength | 0.8–1.0 |
| Frame rate | 30 fps |
| Quantization | fp8-cast (for GPUs < 80 GB) |
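The 27,216-token figure quoted with the 1536×864×161 resolution can be reproduced assuming the LTX family's 32× spatial and 8× causal temporal VAE compression (these factors come from LTX-Video, not from this card, so treat them as an assumption):

```python
# Latent token count for 1536x864x161, assuming 32x spatial and 8x causal
# temporal compression (first frame kept, remaining frames grouped by 8).
w, h, f = 1536, 864, 161
tokens = (w // 32) * (h // 32) * ((f - 1) // 8 + 1)
print(tokens)  # 27216
```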

## License

LTX2 Community License
