	---
	license: mit
	tags:
	- motion-capture
	- pose-estimation
	- inverse-kinematics
	- animation
	- arbitrary-skeleton
	library_name: pytorch
	---

	# pose2rot — Joint Positions → 6D Rotations for Arbitrary Skeletons

	Pretrained checkpoints for the pose2rot model (`Pose2RotMemoryRestModel`). Given a sequence of
	3D joint positions, the model predicts per-joint 6D rotations, from which forward kinematics
	recovers the full skeletal animation. One model handles arbitrary skeletons across 72 animal
	species (quadrupeds, bipeds, birds, reptiles, dinosaurs, arthropods, limbless snakes) via T5
	joint-name embeddings, skeleton graph attention, and rest-pose FiLM conditioning.
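For readers unfamiliar with the 6D rotation representation, the sketch below shows the usual Gram-Schmidt conversion from a 6D vector to a 3x3 rotation matrix. It is a minimal pure-Python illustration, not code from the repository. The function name `rot6d_to_matrix` is hypothetical, and the layout (first three values are one vector, last three the other, orthonormalized into the first two columns) is an assumption. Some libraries use rows instead, so check the repo's own conversion before relying on it.

```python
import math


def rot6d_to_matrix(d6):
    """Convert a 6D rotation to a 3x3 rotation matrix via Gram-Schmidt.

    Assumption (not confirmed by the model card): d6 = [a1x, a1y, a1z, a2x, a2y, a2z]
    and the orthonormalized vectors form the first two COLUMNS of the matrix.
    """
    a1, a2 = list(d6[:3]), list(d6[3:])

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    def cross(u, v):
        return [
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        ]

    b1 = normalize(a1)                       # first basis vector
    proj = dot(b1, a2)                       # component of a2 along b1
    b2 = normalize([y - proj * x for x, y in zip(b1, a2)])  # remove it, renormalize
    b3 = cross(b1, b2)                       # third axis completes a right-handed frame
    # b1, b2, b3 are columns, so row i collects their i-th components.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


# The two input vectors need not be unit length or orthogonal.
R = rot6d_to_matrix([2.0, 0.0, 0.0, 1.0, 3.0, 0.0])
```

Because any two non-parallel vectors map to a valid rotation, a network can regress the six numbers freely, which is the usual motivation for this representation.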

	Code, training recipe, eval & QA scripts: https://github.com/CHDTevior/pose2rot

	This is a derivative work of [MocapAnything](https://github.com/phongdaot/MocapAnything)
	(MIT, © 2026 Dao Thien Phong; arXiv:2604.28130 MoCapAnything V2). The model has ~29.7M parameters.

	## Checkpoints

	| file | training data | use case |
	|---|---|---|
	| `pose2rot_v9_alldata_epoch60.pt` | all 72 species | best for demos and inference when the species was seen in training |
	| `pose2rot_v10_heldout_epoch60.pt` | seen/rare/unseen held-out split (test motions excluded) | the paper's primary model for honest cross-topology evaluation |
	| `pose2rot_v8b_best_epoch40.pt` | all species (earlier converged best) | reference |

	Each `.pt` file holds a dict with the keys `model_state`, `optimizer_state`, and `epoch`. The
	matching configs are `config_v9_alldata.yaml` and `config_v10_split_heldout.yaml`; their `model`
	section instantiates `Pose2RotMemoryRestModel`.

	## Results

	Geodesic angle error in degrees (lower is better). For reference, MoCapAnything V2 reports
	6.54° on unseen species and V1 about 17°.

	| model | seen | rare | unseen | overall |
	|---|---|---|---|---|
	| v9 all-data (oracle) | 7.2° | 5.9° | 6.4° | 6.53° ≈ MoCapAnything 6.54° |
	| v10 true held-out | 9.8° | 12.7° | 40.9° | 28.0° |

	When the species is seen in training, the model matches SOTA (6.5°). On a true held-out test,
	cross-topology generalization is the limiting factor: unseen species with close relatives in the
	training set generalize partially (Goat 17°, Coyote 19°), while topologically distinctive ones do
	not (Pigeon ~67°, Spider ~73°).
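To make the metric concrete, here is a minimal sketch of the geodesic angle between two rotation matrices: the angle of the relative rotation, computed from its trace. The helper name `geodesic_angle_deg` is hypothetical, and how the repository averages the error over joints, frames, and species is not stated here, so this covers only the per-rotation formula.

```python
import math


def geodesic_angle_deg(R_pred, R_true):
    """Angle in degrees of the relative rotation R_pred^T @ R_true.

    Uses theta = arccos((trace(R_pred^T R_true) - 1) / 2).
    """
    # trace(A^T B) equals the sum of elementwise products of A and B.
    trace = sum(R_pred[i][j] * R_true[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
RZ90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90° about the z axis

err = geodesic_angle_deg(I3, RZ90)  # a quarter turn apart
```

The error ranges from 0° (identical rotations) to 180°, which puts the ~67° and ~73° figures for the hardest unseen species in context.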

	## Usage

	```python
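	import torch
	from utils.config_utils import load_yaml_config, instantiate_from_config  # from the GitHub repo

	# Build the model from the config's model section and move it to the GPU.
	cfg = load_yaml_config("config_v9_alldata.yaml")
	model = instantiate_from_config(cfg["model"]).eval().cuda()

	# Load the checkpoint on CPU; load_state_dict copies the weights onto the model's device.
	ckpt = torch.load("pose2rot_v9_alldata_epoch60.pt", map_location="cpu")
	model.load_state_dict(ckpt["model_state"])

	# batch dict: position[B,T,J,3] + rest pose + T5 joint embeddings + skeleton graph + reference
	# (see data/loader_v2.py in the GitHub repo); its tensors must be on the same device as the model.
	with torch.no_grad():
	    pred_rot6d = model(batch)["pred_rot6d"]  # [B,T,J,6]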
	```
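The predicted rotations become an animation only after forward kinematics. The sketch below shows that step for a single frame in pure Python, once the 6D outputs have been converted to 3x3 matrices. It is an illustration under stated assumptions, not the repository's implementation: the function name `forward_kinematics` is hypothetical, rotations are assumed to be local (relative to the parent joint), offsets are assumed to come from the rest pose, and parents are assumed to precede their children in the joint order.

```python
def forward_kinematics(parents, offsets, local_rots, root_pos=(0.0, 0.0, 0.0)):
    """Compute global joint positions for one frame.

    parents[j]    : index of joint j's parent, -1 for the root
                    (assumption: parents appear before their children)
    offsets[j]    : rest-pose offset of joint j from its parent (3-vector)
    local_rots[j] : 3x3 rotation of joint j relative to its parent
                    (assumption: the model's rotations are local)
    """
    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    def matvec(A, v):
        return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

    global_rots, positions = [], []
    for j, p in enumerate(parents):
        if p < 0:
            # Root: its local rotation is already global.
            global_rots.append(local_rots[j])
            positions.append([float(x) for x in root_pos])
        else:
            # Accumulate rotation down the chain.
            global_rots.append(matmul(global_rots[p], local_rots[j]))
            # The bone to joint j is oriented by the parent's global rotation.
            rotated = matvec(global_rots[p], offsets[j])
            positions.append([a + b for a, b in zip(positions[p], rotated)])
    return positions


# Three-joint chain along +x; the middle joint bends 90° about z.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
RZ90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
chain = forward_kinematics(
    parents=[-1, 0, 1],
    offsets=[[0, 0, 0], [1, 0, 0], [1, 0, 0]],
    local_rots=[I3, RZ90, I3],
)
# The last joint ends up at (1, 1, 0): one unit along x, then one unit along y.
```

Because the parent list and offsets are inputs rather than constants, the same routine works for any skeleton topology.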

	See https://github.com/CHDTevior/pose2rot for the full data pipeline, training, and evaluation.

	## License & Citation

	MIT. Built on [MocapAnything](https://github.com/phongdaot/MocapAnything) (Dao Thien Phong, MIT).
	Please also cite MoCapAnything (arXiv:2604.28130).