---
license: mit
tags:
- motion-capture
- pose-estimation
- inverse-kinematics
- animation
- arbitrary-skeleton
library_name: pytorch
---

# pose2rot: Joint Positions → 6D Rotations for Arbitrary Skeletons

Pretrained checkpoints for the **pose2rot** model (`Pose2RotMemoryRestModel`): given a sequence of
3D joint **positions**, it predicts per-joint **6D rotations**, from which forward kinematics recovers the full
skeletal animation. One model handles **arbitrary skeletons** across 72 animal species (quadrupeds,
bipeds, birds, reptiles, dinosaurs, arthropods, limbless snakes) via T5 joint-name embeddings,
skeleton graph attention, and rest-pose FiLM conditioning.
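
The 6D output has to be turned back into rotation matrices before it can drive a skeleton. Below is a minimal NumPy sketch of that step, assuming the common continuous 6D representation: the six numbers are read as two 3-vectors, Gram-Schmidt orthonormalizes them into the first two columns of the matrix, and the third column is their cross product. The function name `rot6d_to_matrix` and the column layout are assumptions made for illustration; the GitHub repo is the reference for the convention the checkpoints were trained with.

```python
import numpy as np

def rot6d_to_matrix(rot6d, eps=1e-8):
    """Map [..., 6] arrays to [..., 3, 3] rotation matrices via Gram-Schmidt.

    ASSUMPTION: rot6d[..., :3] and rot6d[..., 3:] are the (unnormalized)
    first and second columns of the rotation matrix.
    """
    rot6d = np.asarray(rot6d, dtype=np.float64)
    a1, a2 = rot6d[..., :3], rot6d[..., 3:]

    # First basis vector: normalize a1.
    b1 = a1 / (np.linalg.norm(a1, axis=-1, keepdims=True) + eps)

    # Second basis vector: remove the component of a2 along b1, then normalize.
    a2_orth = a2 - np.sum(b1 * a2, axis=-1, keepdims=True) * b1
    b2 = a2_orth / (np.linalg.norm(a2_orth, axis=-1, keepdims=True) + eps)

    # Third basis vector: the cross product gives a right-handed frame.
    b3 = np.cross(b1, b2)

    # Stack b1, b2, b3 as the columns of the matrix.
    return np.stack([b1, b2, b3], axis=-1)
```

Under these assumptions, `rot6d_to_matrix(pred_rot6d.detach().cpu().numpy())` would turn a `[B,T,J,6]` prediction into a `[B,T,J,3,3]` array of rotation matrices.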

**Code, training recipe, eval & QA scripts:** https://github.com/CHDTevior/pose2rot

This is a derivative work of [MocapAnything](https://github.com/phongdaot/MocapAnything)
(MIT, © 2026 Dao Thien Phong; arXiv:2604.28130 MoCapAnything V2). ~29.7M params.

## Checkpoints

| file | training data | use case |
|---|---|---|
| `pose2rot_v9_alldata_epoch60.pt` | all 72 species | **best for demos / inference** (the target species is seen in training) |
| `pose2rot_v10_heldout_epoch60.pt` | seen/rare/unseen held-out split (test motions excluded) | the **decisive paper model** for honest cross-topology eval |
| `pose2rot_v8b_best_epoch40.pt` | all species (earlier converged best) | reference |

Each `.pt` holds `{model_state, optimizer_state, epoch}`. The matching configs are `config_v9_alldata.yaml` and
`config_v10_split_heldout.yaml`; their `model` section instantiates `Pose2RotMemoryRestModel`.
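
A quick way to sanity-check a downloaded checkpoint is to look at its epoch and count the elements stored under `model_state`. The sketch below assumes only the three keys listed above; the helper names `summarize_checkpoint` and `load_and_summarize` are hypothetical and not part of the repo. The element count includes any non-trainable buffers in the state dict, so it may differ slightly from the quoted parameter count.

```python
def summarize_checkpoint(ckpt):
    """Return (epoch, number of tensors, total element count) for a checkpoint dict.

    ASSUMPTION: ckpt has the keys `model_state` and `epoch` described in the
    model card, and every value in `model_state` exposes `.numel()`.
    """
    state = ckpt["model_state"]
    n_elements = sum(t.numel() for t in state.values())
    return ckpt.get("epoch"), len(state), n_elements


def load_and_summarize(path):
    """Load a .pt file on CPU and summarize it (hypothetical helper)."""
    import torch  # imported lazily so summarize_checkpoint stays dependency-free

    ckpt = torch.load(path, map_location="cpu")
    return summarize_checkpoint(ckpt)
```

For example, `load_and_summarize("pose2rot_v9_alldata_epoch60.pt")` would return a tuple whose last entry should be in the neighborhood of the ~29.7M parameters stated above.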

## Results (geodesic angle error, degrees; MoCapAnything V2 reports 6.54° unseen / V1 ~17°)

| model | seen | rare | unseen | overall |
|---|---|---|---|---|
| v9 all-data (oracle) | 7.2° | 5.9° | 6.4° | **6.53°** ≈ MoCapAnything 6.54° |
| v10 true held-out | 9.8° | 12.7° | 40.9° | 28.0° |

When the species is **seen** during training, the model matches SOTA (6.5°). On a **true held-out** test, cross-topology
generalization hits a ceiling: unseen species with close training relatives generalize partially (Goat 17°,
Coyote 19°), while topologically distinctive ones do not (Pigeon ~67°, Spider ~73°).
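
For reference, the geodesic angle between a predicted and a ground-truth rotation is the angle of the relative rotation, θ = arccos((tr(R_predᵀ R_true) − 1) / 2). The sketch below implements that standard formula in NumPy; the name `geodesic_angle_deg` is hypothetical, and how the repo's evaluation script averages over joints, frames, and species is not specified here.

```python
import numpy as np

def geodesic_angle_deg(R_pred, R_true):
    """Geodesic angle in degrees between rotation matrices of shape [..., 3, 3]."""
    R_pred = np.asarray(R_pred, dtype=np.float64)
    R_true = np.asarray(R_true, dtype=np.float64)

    # Relative rotation that takes the prediction onto the ground truth.
    R_rel = np.swapaxes(R_pred, -1, -2) @ R_true

    # For a rotation by angle theta, trace(R) = 1 + 2 cos(theta).
    trace = np.trace(R_rel, axis1=-2, axis2=-1)

    # Clip to guard against values slightly outside [-1, 1] from round-off.
    cos_theta = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```

A mean over the returned array gives a single error figure for a clip, under the assumption that all joints and frames are weighted equally.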

## Usage

```python
import torch
from utils.config_utils import load_yaml_config, instantiate_from_config  # from the GitHub repo

# Build the model from the config's model section, then load the checkpoint weights.
cfg = load_yaml_config("config_v9_alldata.yaml")
model = instantiate_from_config(cfg["model"]).eval().cuda()
ckpt = torch.load("pose2rot_v9_alldata_epoch60.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])

# batch dict: position[B,T,J,3] + rest pose + T5 joint embeddings + skeleton graph + reference (see GitHub data/loader_v2.py)
with torch.no_grad():  # inference only, no gradients needed
    pred_rot6d = model(batch)["pred_rot6d"]  # [B,T,J,6]
```
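
Once the predicted rotations are available as matrices, forward kinematics turns them back into joint positions. The sketch below is a generic NumPy implementation under several assumptions that the repo may not share: rotations are local (parent-relative), `parents` lists each joint's parent index with `-1` for the root and every parent appearing before its children, and `offsets` holds each joint's rest-pose offset from its parent. The name `forward_kinematics` and its arguments are hypothetical.

```python
import numpy as np

def forward_kinematics(local_rots, offsets, parents, root_pos=None):
    """Compute joint positions [T, J, 3] from local rotations.

    local_rots: [T, J, 3, 3] parent-relative rotation matrices (assumed).
    offsets:    [J, 3] rest-pose offset of each joint from its parent.
    parents:    length-J sequence; parents[j] < j, and -1 marks a root.
    root_pos:   optional [T, 3] (or [3]) root translation; defaults to the origin.
    """
    local_rots = np.asarray(local_rots, dtype=np.float64)
    offsets = np.asarray(offsets, dtype=np.float64)
    T, J = local_rots.shape[:2]

    global_rots = np.empty_like(local_rots)
    positions = np.zeros((T, J, 3))

    for j in range(J):
        p = parents[j]
        if p < 0:
            # Root joint: its local rotation is already global.
            global_rots[:, j] = local_rots[:, j]
            if root_pos is not None:
                positions[:, j] = root_pos
        else:
            # Accumulate rotation down the chain.
            global_rots[:, j] = global_rots[:, p] @ local_rots[:, j]
            # The bone offset is rotated by the parent's global rotation.
            rotated = np.einsum("tab,b->ta", global_rots[:, p], offsets[j])
            positions[:, j] = positions[:, p] + rotated

    return positions
```

Comparing these reconstructed positions with the input positions is one simple consistency check on a prediction, provided the rest-pose offsets and parent indices match the skeleton the batch was built from.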

See https://github.com/CHDTevior/pose2rot for the full data pipeline, training, and evaluation.

## License & Citation

MIT. Built on [MocapAnything](https://github.com/phongdaot/MocapAnything) (Dao Thien Phong, MIT).
Please also cite MoCapAnything (arXiv:2604.28130).