MotionLCM - Latent Consistency Model for Human Motion Generation

Text-to-motion baseline integrated into the hftrainer Model Zoo. Our reproduction vendors the MLD motion VAE, the latent consistency denoiser, the LCM scheduler wiring, and the SentenceT5 text wrapper into hftrainer.models.motion.motionlcm._motionlcm, so inference no longer imports the upstream repository at runtime.

Task: Text-to-Motion (T2M)
Bundle / Pipeline: MotionLCMBundle / MotionLCMPipeline
Motion representation: HumanML3D-263 (263-dim, 20 fps, 22 joints)
Backbone: MLD VAE + latent consistency denoiser; 1 LCM step by default
Text encoder: SentenceT5-Large (sentence-transformers/sentence-t5-large, frozen)
Paper: MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model, Dai et al., ECCV 2024
Original code: https://github.com/Dai-Wenxun/MotionLCM

Weights

Current hftrainer artifact:

Artifact: MotionLCM HumanML3D
Location: ZeyuLing/hftrainer-motionlcm-humanml3d (Hub) / checkpoints/motionlcm/humanml3d (local)
Contents: vae.safetensors + denoiser.safetensors + motionlcm_config.json + Mean.npy / Std.npy
Status: uploaded hftrainer artifact; v1 benchmark checkpoint

The artifact loads through the same from_pretrained interface as the published model-zoo checkpoints; pass either the Hub repo id, as below, or the local checkpoints/motionlcm/humanml3d directory:

from hftrainer.pipelines.motionlcm import MotionLCMPipeline

pipe = MotionLCMPipeline.from_pretrained(
    "ZeyuLing/hftrainer-motionlcm-humanml3d",
    device="cuda",
)
motions = pipe.infer_t2m(
    ["a person walks forward then sits down"],
    [120],
    num_inference_steps=1,
)

Package the artifact from upstream checkpoints:

python3 scripts/eval/convert_motionlcm_checkpoint.py \
    --vae_ckpt ref_repo/MotionLCM/experiments_t2m/mld_humanml/mld_humanml_v1.ckpt \
    --denoiser_ckpt ref_repo/MotionLCM/experiments_t2m/motionlcm_humanml/motionlcm_humanml_v1.ckpt \
    --out_dir checkpoints/motionlcm/humanml3d \
    --verify

The frozen SentenceT5-Large encoder is resolved by name rather than duplicated inside the artifact. For fully offline use, snapshot the text encoder into the local Hugging Face cache before calling from_pretrained.
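A minimal sketch of that prefetch step, assuming hftrainer's name-based resolution goes through the standard Hugging Face cache (this card does not say so explicitly). It uses huggingface_hub.snapshot_download; the prefetch_text_encoder helper is a hypothetical name, not part of hftrainer.

```python
# Text encoder named on this card; hftrainer resolves it by name at load time.
TEXT_ENCODER_ID = "sentence-transformers/sentence-t5-large"


def prefetch_text_encoder(repo_id: str = TEXT_ENCODER_ID) -> str:
    """Download repo_id into the local Hugging Face cache and return the snapshot path."""
    # Lazy import keeps this helper importable on machines without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

Run it once on a machine with network access. Later sessions can then start with HF_HUB_OFFLINE=1 set in the environment so that huggingface_hub only reads from the cache.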

The published artifact uses the upstream v1 benchmark checkpoints, mld_humanml_v1.ckpt and motionlcm_humanml_v1.ckpt. The non-v1 files in the same upstream folder belong to a different latent-shape family; in the hftrainer evaluator they produced collapsed dynamic features (FID=44.24, Diversity=5.70 on HML3D-263), so do not use those numbers as model-card metrics.
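Telling the two families apart can be sketched from parameter shapes alone. This sketch assumes an MLD-style VAE in which vae.global_motion_token stacks one mean query and one log-variance query per latent token, giving shape (2 * num_tokens, token_dim); the helper names and the example shapes in the test are illustrative, not read from the released files. The actual hftrainer loader also inspects vae.latent_pre.weight, which this sketch ignores.

```python
from typing import Mapping, Tuple

Shape = Tuple[int, ...]


def infer_latent_shape(shapes: Mapping[str, Shape]) -> Tuple[int, int]:
    """Return (num_latent_tokens, token_dim) from a checkpoint's parameter shapes.

    Assumption: vae.global_motion_token holds a mean query and a log-variance
    query per latent token, i.e. its shape is (2 * num_tokens, token_dim).
    """
    key = "vae.global_motion_token"
    if key not in shapes:
        raise KeyError(f"{key} not found; cannot infer latent shape")
    rows, dim = shapes[key]
    if rows % 2:
        raise ValueError(f"expected an even number of query tokens, got {rows}")
    return rows // 2, dim


def is_v1_family(shapes: Mapping[str, Shape]) -> bool:
    """Treat a single latent token as the one-token v1 family described on this card."""
    num_tokens, _ = infer_latent_shape(shapes)
    return num_tokens == 1
```

A shape map can be built from any loaded state dict with {k: tuple(v.shape) for k, v in state_dict.items()}; upstream .ckpt files may nest or prefix keys differently, so adjust the key before relying on it.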


Motion representation

HumanML3D-263, the standard redundant T2M feature (Guo et al.), 20 fps, 22-joint SMPL skeleton. Per frame (263 dims):

Slice Dim Meaning
root_rot_vel 1 root angular velocity (about Y)
root_lin_vel 2 root linear velocity (XZ plane)
root_y 1 root height
ric_data 63 local joint positions (21x3)
rot_data 126 local joint rotations (21x6, cont. 6D)
local_vel 66 local joint velocities (22x3)
foot_contact 4 binary foot-contact labels
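The layout above can be turned into named slices with a few lines of NumPy. HML263_LAYOUT and split_hml263 are illustrative names for this sketch, not hftrainer APIs; the widths are copied from the table and sum to 263.

```python
import numpy as np

# (name, width) in the order of the HumanML3D-263 layout table; widths sum to 263.
HML263_LAYOUT = [
    ("root_rot_vel", 1),
    ("root_lin_vel", 2),
    ("root_y", 1),
    ("ric_data", 63),
    ("rot_data", 126),
    ("local_vel", 66),
    ("foot_contact", 4),
]


def split_hml263(motion: np.ndarray) -> dict:
    """Split an array whose last axis is 263 into named views (no copies)."""
    if motion.shape[-1] != 263:
        raise ValueError(f"expected last dim 263, got {motion.shape[-1]}")
    parts, start = {}, 0
    for name, width in HML263_LAYOUT:
        parts[name] = motion[..., start:start + width]
        start += width
    return parts
```

For example, split_hml263(motion)["ric_data"].reshape(-1, 21, 3) recovers per-joint local positions for a (frames, 263) array.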

MotionLCM samples in the MLD latent space and decodes directly back to HumanML3D-263. Convert to SMPL or MotionStreamer-272 with hftrainer.motion.representation.convert when cross-model comparison requires another evaluator space.


Evaluation

Generation follows the official HumanML3D protocol: standard test split, native 263-dim @ 20 fps, first caption, and one prediction per test id.

# 1) generate
python3 scripts/eval/motionlcm_t2m_h3d263.py \
    --data_root ref_repo/CondMDI/dataset/HumanML3D \
    --model_path checkpoints/motionlcm/humanml3d \
    --num_inference_steps 1 \
    --out_dir outputs/evaluation/motionlcm_h3d263_official/motionlcm_263

# 2) score with the HumanML3D-263 evaluator
python3 scripts/eval/verify_evaluators.py --which hml263 \
    --hml263-pred outputs/evaluation/motionlcm_h3d263_official/motionlcm_263
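The "first caption, one prediction per test id" rule can be sketched as follows, assuming the standard HumanML3D directory layout: test.txt lists motion ids, and texts/<id>.txt holds one caption#tokens#start#end line per caption. first_captions is a hypothetical helper; motionlcm_t2m_h3d263.py may apply further filtering (for example on motion length) that this sketch omits.

```python
from pathlib import Path
from typing import Dict


def first_captions(data_root: str, split: str = "test") -> Dict[str, str]:
    """Map each id listed in <split>.txt to the first caption in texts/<id>.txt."""
    root = Path(data_root)
    split_lines = (root / f"{split}.txt").read_text(encoding="utf-8").splitlines()
    ids = [line.strip() for line in split_lines if line.strip()]
    captions = {}
    for motion_id in ids:
        text_file = root / "texts" / f"{motion_id}.txt"
        if not text_file.exists():
            continue  # skip ids that have no caption file
        for line in text_file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                # Keep only the raw caption before the first '#'.
                captions[motion_id] = line.split("#", 1)[0].strip()
                break
    return captions
```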

Report the LCM step count (--num_inference_steps) alongside any metrics. The model-zoo table should use metrics copied from the generated evaluator JSON, not handwritten values.
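A sketch of producing table rows directly from the evaluator output, tagged with the LCM step count. The JSON key names used here (fid, diversity, mm_dist, r_precision) are hypothetical placeholders; check verify_hml263.json for the real schema before using it.

```python
import json
from pathlib import Path

# Hypothetical key names; adjust to the actual schema of verify_hml263.json.
METRIC_KEYS = [
    ("FID ↓", "fid"),
    ("Diversity →", "diversity"),
    ("MM-Dist ↓", "mm_dist"),
]


def metric_rows(json_path: str, num_inference_steps: int) -> list:
    """Format evaluator metrics straight from JSON, prefixed with the LCM step count."""
    metrics = json.loads(Path(json_path).read_text(encoding="utf-8"))
    rows = [f"LCM steps: {num_inference_steps}"]
    for label, key in METRIC_KEYS:
        rows.append(f"{label} {metrics[key]:.4f}")
    top = metrics["r_precision"]  # hypothetical: [top-1, top-2, top-3]
    rows.append("R-Precision Top-1 / 2 / 3 ↑ " + " / ".join(f"{v:.4f}" for v in top))
    return rows
```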

HumanML3D-263 evaluator (fixed v1 artifact, n=3970)

Metric JSON: outputs/evaluation/motionlcm_hml3d_v1_fixed_20260617/metrics/verify_hml263.json.

Metric hftrainer
FID ↓ 0.2921
R-Precision Top-1 / 2 / 3 ↑ 0.4958 / 0.6906 / 0.7883
Diversity → 9.5662
MM-Dist ↓ 3.0813
GT(real) R-Precision Top-1 / 2 / 3 0.5135 / 0.7108 / 0.8069
GT(real) Diversity / MM-Dist 9.4527 / 2.9323

The previous metrics were flagged as untrustworthy by a debugging sanity check that found feature-distribution collapse: the old non-v1 artifact produced root and velocity features whose variance was roughly an order of magnitude too small, and foot contacts that were almost always on. The fixed v1 artifact brings root velocity, local velocity, root height, and foot-contact statistics close to the HumanML3D test distribution, as checked before metric evaluation.
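The sanity check described above can be sketched as a comparison of per-slice spread and foot-contact rates between generated and real frames. This assumes both arrays are denormalized (N, 263) HumanML3D features; the 0.1 std-ratio floor (one order of magnitude) and the 0.5 contact threshold are illustrative choices, not values taken from the hftrainer evaluator.

```python
import numpy as np

# Slice offsets within the 263-dim HumanML3D feature.
SLICES = {
    "root_rot_vel": slice(0, 1),
    "root_lin_vel": slice(1, 3),
    "root_y": slice(3, 4),
    "local_vel": slice(193, 259),
}
FOOT_CONTACT = slice(259, 263)


def collapse_report(pred: np.ndarray, real: np.ndarray, std_ratio_floor: float = 0.1) -> dict:
    """Compare generated vs. real (N, 263) feature statistics.

    A slice is flagged as collapsed when its generated std (pooled over the
    slice) is below std_ratio_floor times the real std. Foot contacts are
    reported as the fraction of values above 0.5.
    """
    report = {}
    for name, sl in SLICES.items():
        ratio = pred[:, sl].std() / max(real[:, sl].std(), 1e-8)
        report[name] = {"std_ratio": float(ratio), "collapsed": bool(ratio < std_ratio_floor)}
    report["foot_contact_on_rate"] = {
        "pred": float((pred[:, FOOT_CONTACT] > 0.5).mean()),
        "real": float((real[:, FOOT_CONTACT] > 0.5).mean()),
    }
    return report
```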

MotionStreamer-272 evaluator (cross-representation, n=7392)

Metric JSON: outputs/evaluation/motionlcm_hml3d_v1_fixed_ms272_ik8_20260617/metrics/verify_ms272.json.

Metric MotionLCM (HML263 → SMPL135 → MS272) MS272 GT(real)
FID ↓ 149.9622 0.0
R-Precision Top-1 / 2 / 3 ↑ 0.4428 / 0.6059 / 0.6904 0.7059 / 0.8569 / 0.9106
Diversity → 24.7223 27.2813
MM-Dist ↓ 20.3028 15.0066

As with other native HML263 baselines, the MS272 row includes a representation bridge (HML263 -> SMPL motion_135 via IK refine-80 -> MotionStreamer-272) and should be interpreted as a cross-representation diagnostic, not a native MotionLCM paper number.


Implementation notes

  • Vendored, ref_repo-independent: hftrainer/models/motion/motionlcm/ holds the MLD VAE, latent denoiser, text wrapper, scheduler config, and generation helper with package-local imports.
  • Checkpoint architecture is inferred from raw weights: upstream releases include both one-token v1 and sixteen-token checkpoint families; raw loading reads vae.global_motion_token / vae.latent_pre.weight so the artifact is built with the matching latent shape.
  • Sub-modules: vae + denoiser + scheduler; the default generation path uses distilled classifier-free guidance folded into the timestep conditioning.
  • Normalization travels with the checkpoint: Mean.npy / Std.npy are the HumanML3D training stats embedded in the artifact.
  • Text encoder: SentenceT5-Large is frozen and currently resolved by name; keep this explicit in any published Hub card.
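Applying the bundled statistics can be sketched as a z-score round trip, assuming the VAE operates on features normalized as (x - Mean) / Std, the usual HumanML3D convention. The pipeline presumably handles this internally; the sketch is only for working with raw VAE or denoiser outputs, and the helper names are hypothetical.

```python
import numpy as np


def load_stats(mean_path: str, std_path: str):
    """Load Mean.npy / Std.npy as float32 vectors of length 263."""
    mean = np.load(mean_path).astype(np.float32)
    std = np.load(std_path).astype(np.float32)
    if mean.shape != (263,) or std.shape != (263,):
        raise ValueError(f"expected (263,) stats, got {mean.shape} and {std.shape}")
    return mean, std


def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Raw HumanML3D-263 features -> normalized model space."""
    return (x - mean) / std


def denormalize(x_norm: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Normalized model output -> raw HumanML3D-263 features."""
    return x_norm * std + mean
```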