---
license: cc-by-nc-4.0
tags:
- text-to-motion
- bimanual-hands
- diffusion
library_name: pytorch
---

# HandX: Diffusion Text-to-Motion Checkpoints

Diffusion checkpoints for **HandX: Scaling Bimanual Motion and Interaction Generation** (CVPR 2026).
The models generate two-hand motion from text, using separate text branches for the left hand, the right hand,
and their interaction. The architecture is an MDM-style diffusion model with a frozen T5-base text encoder.

- Paper: https://arxiv.org/abs/2603.28766
- Dataset: https://huggingface.co/datasets/alexzhang598/HandX

## Checkpoints

| Folder | Decoder layers | `latent_dim` | Notes |
|--------|----------------|--------------|-------|
| `layers4` | 4 | 256 | |
| `layers8` | 8 | 512 | |
| `layers12` | 12 | 512 | Best model in the paper |

Each folder contains the weights (`model.pt`) and the matching `config.yaml`.
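
To fetch a single variant folder in one call instead of file by file, `huggingface_hub.snapshot_download` with `allow_patterns` can restrict the download to that folder. This is a minimal sketch that assumes the repo layout described above (`<variant>/model.pt` and `<variant>/config.yaml`); the helper names `VARIANTS`, `variant_files`, and `fetch_variant` are hypothetical and not part of the HandX code.

```python
import os

# Variant table from this model card: folder -> (decoder layers, latent_dim).
VARIANTS = {
    "layers4": (4, 256),
    "layers8": (8, 512),
    "layers12": (12, 512),  # best model in the paper
}

REPO_ID = "alexzhang598/HandX-diffusion"


def variant_files(variant: str) -> tuple[str, str]:
    """Return the repo-relative paths of a variant's weights and config (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return f"{variant}/model.pt", f"{variant}/config.yaml"


def fetch_variant(variant: str) -> tuple[str, str]:
    """Download only one variant folder and return local (model.pt, config.yaml) paths."""
    # Imported lazily so the table helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    weights, config = variant_files(variant)
    # allow_patterns limits the snapshot to files inside the chosen folder.
    local_dir = snapshot_download(REPO_ID, allow_patterns=[f"{variant}/*"])
    return os.path.join(local_dir, weights), os.path.join(local_dir, config)
```

For example, `weights, config = fetch_variant("layers12")` yields local paths that could stand in for the two `hf_hub_download` calls in the loading snippet.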

## Loading

```python
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf

# Run from the `diffusion/` directory of the HandX repo so that `src` is importable.
from src.diffusion.utils.model_utils import create_model_and_diffusion

repo_id = "alexzhang598/HandX-diffusion"
variant = "layers12"  # one of: layers4, layers8, layers12

# Build the model and diffusion process from the variant's config.
cfg = OmegaConf.load(hf_hub_download(repo_id, f"{variant}/config.yaml"))
model, diffusion = create_model_and_diffusion(cfg.model)

# Load the trained weights on CPU.
ckpt_path = hf_hub_download(repo_id, f"{variant}/model.pt")
sd = torch.load(ckpt_path, map_location="cpu")["state_dict"]
model.load_state_dict(sd, strict=False)  # missing keys are the frozen T5 encoder (loaded from t5-base)
model.eval()  # switch to inference mode (disables dropout)
```

The checkpoints load with a standard `load_state_dict(..., strict=False)`. The only missing keys are
the frozen T5 encoder weights, which are loaded from `t5-base` when the model is constructed.
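
Because `strict=False` also silently skips any other mismatched key, it can be worth checking the value that `load_state_dict` returns: a named tuple with `missing_keys` and `unexpected_keys`. The helper below is a minimal sketch; the name `check_load_result` is hypothetical, and so is any T5 key prefix you pass in, so print `missing_keys` once to find the prefix the HandX model actually uses for its frozen encoder.

```python
from collections.abc import Iterable


def check_load_result(result, frozen_prefixes: Iterable[str]) -> None:
    """Raise if loading left anything mismatched besides the frozen encoder weights.

    `result` is the value returned by `torch.nn.Module.load_state_dict(..., strict=False)`,
    which exposes `missing_keys` and `unexpected_keys` lists.
    `frozen_prefixes` are parameter-name prefixes of the frozen T5 encoder
    (hypothetical; inspect `result.missing_keys` to find the real ones).
    """
    prefixes = tuple(frozen_prefixes)
    unexpected = list(result.unexpected_keys)
    # Missing keys are acceptable only if they belong to the frozen encoder.
    other_missing = [k for k in result.missing_keys if not k.startswith(prefixes)]
    if unexpected or other_missing:
        raise RuntimeError(
            f"unexpected keys: {unexpected[:5]}, "
            f"missing non-encoder keys: {other_missing[:5]}"
        )
```

Usage: `check_load_result(model.load_state_dict(sd, strict=False), ["<t5 prefix>"])`, where `<t5 prefix>` is a placeholder for whatever prefix the frozen encoder's parameters carry.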