	---
	license: cc-by-nc-4.0
	tags:
	- text-to-motion
	- bimanual-hands
	- diffusion
	library_name: pytorch
	---

	# HandX — Diffusion Text-to-Motion Checkpoints

	Diffusion checkpoints for *HandX: Scaling Bimanual Motion and Interaction Generation* (CVPR 2026).
	The models generate two-hand motion from text, with separate text branches for the left hand,
	the right hand, and their interaction. They use an MDM-style diffusion model with a frozen
	T5-base text encoder.

	- 📄 Paper: https://arxiv.org/abs/2603.28766
	- 📦 Dataset: https://huggingface.co/datasets/alexzhang598/HandX

	## Checkpoints

	| Folder | Decoder layers | latent_dim | Notes |
	|--------|----------------|------------|-------|
	| `layers4` | 4 | 256 | |
	| `layers8` | 8 | 512 | |
	| `layers12` | 12 | 512 | Best model in the paper |

	Each folder has `model.pt` (weights) and `config.yaml`.
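
As a minimal sketch of the layout above, the two repo-relative file paths for a variant can be derived from its folder name. `VARIANTS` and `checkpoint_files` are hypothetical names introduced here for illustration and are not part of the HandX repo; only the folder names, layer counts, and `latent_dim` values come from the table.

```python
# Hypothetical lookup table mirroring the checkpoint table in this card.
VARIANTS = {
    "layers4": {"decoder_layers": 4, "latent_dim": 256},
    "layers8": {"decoder_layers": 8, "latent_dim": 512},
    "layers12": {"decoder_layers": 12, "latent_dim": 512},
}


def checkpoint_files(variant):
    """Return the repo-relative weight and config paths for one variant."""
    if variant not in VARIANTS:
        known = ", ".join(sorted(VARIANTS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    return {
        "weights": f"{variant}/model.pt",
        "config": f"{variant}/config.yaml",
    }
```

The returned paths are the filenames passed to `hf_hub_download` in the loading snippet.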

	## Loading

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from omegaconf import OmegaConf
	# run from the `diffusion/` directory of the HandX repo
	from src.diffusion.utils.model_utils import create_model_and_diffusion

	variant = "layers12"  # one of "layers4", "layers8", "layers12"
	# download the config for this variant and build the model from it
	cfg = OmegaConf.load(hf_hub_download("alexzhang598/HandX-diffusion", f"{variant}/config.yaml"))
	model, diffusion = create_model_and_diffusion(cfg.model)
	# download the weights and load them onto the CPU
	sd = torch.load(
	    hf_hub_download("alexzhang598/HandX-diffusion", f"{variant}/model.pt"),
	    map_location="cpu",
	)["state_dict"]
	model.load_state_dict(sd, strict=False)  # missing keys are the frozen T5 encoder (loaded from t5-base)
	```

	The checkpoints load with a standard `load_state_dict(..., strict=False)`; the only missing keys are
	the frozen T5 weights, restored from `t5-base` at construction.
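
PyTorch's `load_state_dict` returns a result with `missing_keys` and `unexpected_keys` lists, so the statement above can be checked rather than assumed. The helper below is a hypothetical sketch, not part of the HandX repo. The `t5_prefixes` argument is an assumption: the attribute name under which the model stores its T5 encoder is not given here, so read the prefix off the missing keys you actually observe.

```python
def check_missing_keys(missing_keys, unexpected_keys, t5_prefixes):
    """Raise if anything other than the frozen T5 weights failed to load.

    Returns the number of missing keys, all of which match a T5 prefix.
    """
    prefixes = tuple(t5_prefixes)
    # Keys that are missing but do not belong to the T5 encoder.
    stray = [k for k in missing_keys if not k.startswith(prefixes)]
    if stray or unexpected_keys:
        raise RuntimeError(
            f"checkpoint mismatch: non-T5 missing keys={stray}, "
            f"unexpected keys={list(unexpected_keys)}"
        )
    return len(missing_keys)


# Possible usage after the loading snippet (the prefix is a guess):
# result = model.load_state_dict(sd, strict=False)
# check_missing_keys(result.missing_keys, result.unexpected_keys, ["text_encoder."])
```

Failing loudly on stray keys guards against `strict=False` silently hiding a mismatch between the checkpoint and the config.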