InterGen: Diffusion-based Multi-human Motion Generation

Two-person text-to-motion baseline integrated into the hftrainer Model Zoo. The reproduction is self-contained and independent of external source trees: InterGen's inference runtime lives in hftrainer.models.motion.intergen.network, and the checkpoint plus normalization stats live in checkpoints/intergen/hftrainer_interhuman.

| Field | Value |
| --- | --- |
| Task | Two-person Text-to-Motion |
| Bundle | InterGenBundle |
| Processed HF artifact | ZeyuLing/hftrainer-intergen-interhuman |
| Local artifact | checkpoints/intergen/hftrainer_interhuman |
| Motion representation | InterHuman native-262 per person, 30 fps |
| Text encoder | CLIP ViT-L/14@336px (frozen) |
| Paper | InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions, IJCV 2024, arXiv:2304.05684 |
| Original code | https://github.com/tr3e/intergen |

Weights

| Artifact | Location | Contents | Status |
| --- | --- | --- | --- |
| InterGen InterHuman | checkpoints/intergen/hftrainer_interhuman (local) / ZeyuLing/hftrainer-intergen-interhuman (Hub) | intergen.ckpt, global_mean.npy, global_std.npy, intergen_config.json, generated README.md | hftrainer inference artifact |

Load from Hugging Face:

from hftrainer.models.motion.intergen import InterGenBundle

bundle = InterGenBundle.from_pretrained(
    "ZeyuLing/hftrainer-intergen-interhuman",
    device="cuda",
)
motion = bundle.generate(
    ["two people shake hands and then walk apart"],
    motion_len=120,
    seed=1234,
)  # (1, 120, 2, 262), denormalized InterHuman native-262

Use the local artifact in the same way:

from hftrainer.models.motion.intergen import InterGenBundle

bundle = InterGenBundle.from_pretrained(
    "checkpoints/intergen/hftrainer_interhuman",
    device="cuda",
)
motion = bundle.generate(
    ["one person walks toward another person"],
    motion_len=210,
    seed=123,
)  # (B, T, 2, 262), denormalized InterHuman native-262

The runtime never imports third_party/intergen, ref_repo, _vendor, or a copied upstream package path. InterGenBundle loads the native hftrainer network modules and the artifact stats directly.

The Hub checkpoint is an inference-only hftrainer artifact: intergen.ckpt contains the model state_dict and lightweight metadata, while optimizer, callback, scheduler, and PyTorch-Lightning loop states are removed.
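
The stripping step described above can be sketched as follows. This is a minimal illustration, not the script used to build the artifact: `strip_checkpoint` and the `meta` key are hypothetical names, and the dropped top-level keys are the ones PyTorch Lightning commonly writes into its checkpoints.

```python
from typing import Any, Dict

# Top-level keys kept in the inference artifact. "meta" is a hypothetical
# name for the lightweight metadata; the real key may differ.
KEEP_KEYS = ("state_dict", "meta")


def strip_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a training checkpoint with only inference-time entries."""
    if "state_dict" not in ckpt:
        raise KeyError("checkpoint has no 'state_dict' entry")
    return {key: ckpt[key] for key in KEEP_KEYS if key in ckpt}


# Toy stand-in for a Lightning checkpoint (real ones hold tensors).
training_ckpt = {
    "state_dict": {"decoder.weight": [0.0, 1.0]},
    "optimizer_states": [{"lr": 1e-4}],
    "lr_schedulers": [{"last_epoch": 10}],
    "callbacks": {},
    "loops": {},
    "meta": {"source": "example"},
}
inference_ckpt = strip_checkpoint(training_ckpt)
print(sorted(inference_ckpt))
```

With a real file, the same function would sit between `torch.load(path, map_location="cpu")` and `torch.save(inference_ckpt, out_path)`.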

Motion Representation

InterGen uses the InterHuman native-262 feature per person:

| Slice | Dim | Meaning |
| --- | --- | --- |
| joint positions | 66 | 22 joints × xyz |
| joint velocities | 66 | 22 joints × xyz velocity |
| local rotations | 126 | 21 joints × 6D rotation |
| foot contacts | 4 | binary contact labels |

InterGenBundle.generate returns (B, T, 2, 262) after de-normalization with the packaged InterGen training stats. Use hftrainer.motion.representation.interhuman262 for SMPL-X / joints conversion when needed.
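
As an illustration of the layout in the table above, the following sketch splits a native-262 array into its four blocks with plain NumPy. It assumes the blocks are stored in the order listed in the table; `split_native262` is a hypothetical helper, not part of hftrainer, and the input is a zero placeholder rather than real model output.

```python
import numpy as np

# Slice boundaries follow the table above: 66 + 66 + 126 + 4 = 262.
# The ordering of the blocks is assumed to match the table order.
SLICES = {
    "positions": slice(0, 66),
    "velocities": slice(66, 132),
    "rotations_6d": slice(132, 258),
    "foot_contacts": slice(258, 262),
}


def split_native262(motion: np.ndarray) -> dict:
    """Split a (..., 262) array into its four named blocks."""
    if motion.shape[-1] != 262:
        raise ValueError(f"expected last dim 262, got {motion.shape[-1]}")
    lead = motion.shape[:-1]
    return {
        "positions": motion[..., SLICES["positions"]].reshape(*lead, 22, 3),
        "velocities": motion[..., SLICES["velocities"]].reshape(*lead, 22, 3),
        "rotations_6d": motion[..., SLICES["rotations_6d"]].reshape(*lead, 21, 6),
        "foot_contacts": motion[..., SLICES["foot_contacts"]],
    }


# Placeholder with the documented output shape (B, T, 2, 262).
motion = np.zeros((1, 120, 2, 262), dtype=np.float32)
parts = split_native262(motion)

# Index 0 / 1 on the third axis selects one of the two people.
person_a = {name: block[:, :, 0] for name, block in parts.items()}
print({name: block.shape for name, block in parts.items()})
```

For anything beyond inspection, such as SMPL-X or joint conversion, the packaged hftrainer.motion.representation.interhuman262 module remains the intended route.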

For visualization, the model-zoo web viewer fits the joint-position block of the native-262 output to a body-only SMPL mesh. That mesh bridge is a viewer convenience; the canonical model output and evaluator input remain native InterHuman-262.

Evaluation

The official InterGen evaluation path is InterCLIP over native InterHuman-262 packs. In hftrainer this is:

python3 tools/eval_interclip_2p_native262.py \
  --gt outputs/evaluation/interhuman_gt_native262.npz \
  --pred InterGen=outputs/evaluation/intergen_native262.npz \
  --out-json outputs/evaluation/intergen_interclip262_metrics.json

Input packs are .npz files with m1, m2, lens, and texts arrays:

np.savez(path, m1=m1, m2=m2, lens=lens, texts=texts)
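
A pack with these four arrays could be assembled roughly as shown below. The shapes are assumptions, not taken from the evaluator: m1 and m2 are taken to be zero-padded (N, T_max, 262) arrays, lens the true frame counts, and texts a string array. `build_pack` is a hypothetical helper; check tools/eval_interclip_2p_native262.py for the layout it actually expects.

```python
import os
import tempfile

import numpy as np


def build_pack(motions, texts):
    """Pad per-sample (T_i, 2, 262) motions and split them into m1 / m2."""
    lens = np.array([m.shape[0] for m in motions], dtype=np.int64)
    t_max = int(lens.max())
    m1 = np.zeros((len(motions), t_max, 262), dtype=np.float32)
    m2 = np.zeros_like(m1)
    for i, m in enumerate(motions):
        m1[i, : lens[i]] = m[:, 0]  # first person
        m2[i, : lens[i]] = m[:, 1]  # second person
    return {"m1": m1, "m2": m2, "lens": lens, "texts": np.array(texts)}


# Two placeholder samples of different lengths.
motions = [np.ones((120, 2, 262), np.float32), np.ones((90, 2, 262), np.float32)]
texts = ["two people shake hands", "one person walks toward another person"]

pack = build_pack(motions, texts)
path = os.path.join(tempfile.mkdtemp(), "example_native262.npz")
np.savez(path, **pack)

# Reload to confirm the file holds the four expected arrays.
loaded = np.load(path)
print(sorted(loaded.files), loaded["m1"].shape, loaded["lens"].tolist())
```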

Verification

Parity with the original source tree was checked on a short deterministic sample:

| Check | Result |
| --- | --- |
| checkpoint load missing / unexpected | 0 / 0 |
| source vs hftrainer output max abs diff | 0.0 |
| source vs hftrainer output mean abs diff | 0.0 |
| artifact reload smoke | Bundle.from_pretrained(...) + one text prompt |

The parity run used the same checkpoint, prompt, and seed for both implementations, with the default ddim50 sampling strategy on a short sequence. The viewer smoke test was also run on "two people shake hands and then walk apart" with motion_len=120 and seed=1234; the generated InterHuman-262 output was fitted to SMPL for inspection.
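
The two difference metrics in the table can be computed as in the sketch below. `parity_report` is a hypothetical helper, and the arrays are random placeholders standing in for the source and hftrainer outputs, so the printed zeros only show the computation and do not reproduce the reported run.

```python
import numpy as np


def parity_report(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Max and mean absolute difference between two same-shaped outputs."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the subtraction itself adds no rounding noise.
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return {"max_abs_diff": float(diff.max()), "mean_abs_diff": float(diff.mean())}


# Placeholder outputs with the documented (B, T, 2, 262) layout.
rng = np.random.default_rng(1234)
reference = rng.standard_normal((1, 60, 2, 262)).astype(np.float32)
candidate = reference.copy()

report = parity_report(reference, candidate)
print(report)
```

The missing / unexpected counts are the kind of figure PyTorch returns from `load_state_dict(..., strict=False)`, whose result exposes `missing_keys` and `unexpected_keys` lists.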
