---
license: cc-by-nc-4.0
tags:
- text-to-motion
- bimanual-hands
- diffusion
library_name: pytorch
---
# HandX: Diffusion Text-to-Motion Checkpoints
Diffusion checkpoints for **HandX: Scaling Bimanual Motion and Interaction Generation** (CVPR 2026).
They generate two-hand motion from text (separate text branches for the left hand, right hand,
and their interaction), using an MDM-style diffusion model with a frozen T5-base text encoder.
- 📄 Paper: https://arxiv.org/abs/2603.28766
- 📦 Dataset: https://huggingface.co/datasets/alexzhang598/HandX
## Checkpoints
| Folder | Decoder layers | latent_dim | Notes |
|--------|----------------|------------|-------|
| `layers4` | 4 | 256 | |
| `layers8` | 8 | 512 | |
| `layers12` | 12 | 512 | Best model in the paper |
Each folder has `model.pt` (weights) and `config.yaml`.
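
As a small convenience, the folder layout can be captured in a lookup that validates a variant name before any download is attempted. This is only a sketch: `VARIANTS` and `checkpoint_files` are illustrative names, not part of the HandX codebase, and the values are copied from the table in this section.

```python
# Illustrative helper (not part of the HandX repo): maps a variant name to
# the two files each checkpoint folder is documented to contain.
VARIANTS = {
    "layers4": {"decoder_layers": 4, "latent_dim": 256},
    "layers8": {"decoder_layers": 8, "latent_dim": 512},
    "layers12": {"decoder_layers": 12, "latent_dim": 512},
}


def checkpoint_files(variant):
    """Return the in-repo paths of the weights and config for `variant`."""
    if variant not in VARIANTS:
        known = ", ".join(sorted(VARIANTS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    return {
        "weights": f"{variant}/model.pt",
        "config": f"{variant}/config.yaml",
    }
```

The returned paths can be passed as the filename argument of `hf_hub_download`, so a typo in the variant name fails early with a clear message instead of a failed download.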
## Loading
```python
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf
# run from the `diffusion/` directory of the HandX repo
from src.diffusion.utils.model_utils import create_model_and_diffusion
variant = "layers12"
cfg = OmegaConf.load(hf_hub_download("alexzhang598/HandX-diffusion", f"{variant}/config.yaml"))
model, diffusion = create_model_and_diffusion(cfg.model)
sd = torch.load(
    hf_hub_download("alexzhang598/HandX-diffusion", f"{variant}/model.pt"),
    map_location="cpu",
)["state_dict"]
model.load_state_dict(sd, strict=False) # missing keys are the frozen T5 encoder (loaded from t5-base)
```
The checkpoints load with a standard `load_state_dict(..., strict=False)` call. The only missing keys are
the frozen T5 weights, which are restored from `t5-base` when the model is constructed.
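
Because `strict=False` silently tolerates any missing key, it can be worth checking that nothing other than the T5 encoder was skipped. The sketch below filters the `missing_keys` list that `load_state_dict` returns. The helper name and the `"text_encoder."` prefix are assumptions for illustration; inspect the actual key names in your model and adjust the prefix accordingly.

```python
def non_frozen_missing(missing_keys, frozen_prefixes=("text_encoder.",)):
    """Return the missing keys that do NOT belong to the frozen text encoder.

    `frozen_prefixes` is a hypothetical default: the real prefix depends on
    how the HandX model names its T5 submodule.
    """
    prefixes = tuple(frozen_prefixes)
    return [key for key in missing_keys if not key.startswith(prefixes)]


# Intended use together with the loading snippet:
#   result = model.load_state_dict(sd, strict=False)
#   leftover = non_frozen_missing(result.missing_keys)
#   assert not leftover, f"unexpected missing weights: {leftover}"
#   assert not result.unexpected_keys
```

An empty result means every skipped key sits under the assumed encoder prefix, which matches the behaviour described above.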