| --- |
| license: apache-2.0 |
| tags: |
| - robotics |
| - grasping |
| - graph-neural-network |
| - point-cloud |
| - contact-prediction |
| - morphology |
| - cross-attention |
| - cvae |
| - coordinate-space |
| --- |
| |
| # Graspmax — GeoMatch v2 · GeoMatch++ · GeoMatch v1 · RobotFingerPrint |
|
|
| **Graspmax** contains geometry-aware contact/coordinate prediction models for dexterous robotic |
| grasping, trained on the CMapDataset / GenDexGrasp dataset across 5 robot end-effectors |
| (EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand). |
|
|
| > **⚠️ Version notice:** GeoMatch v1 and GeoMatch++ were trained with a corrupted |
| > `robot_keypoints.json` (a spurious 2× scale factor and a ShadowHand axis swap applied at the wrong stage). |
| > **Use GeoMatch v2 for any new work.** v1 and GeoMatch++ are kept for reproducibility only. |
| |
| --- |
| |
| ## Models at a Glance |
| |
| | Model | Status | Folder | Val loss | Val acc | |
| |---|---|---|---|---| |
| | **GeoMatch v2** | ✅ **Recommended** | `geomatch_v2/` | **1.594** | **0.695** | |
| | GeoMatch++ | ⚠️ Deprecated (built on v1 encoders) | `geomatch_pp/` | 0.350 | 0.940 | |
| | GeoMatch v1 | ⚠️ Deprecated (corrupted keypoints) | `geomatch_v1/` | 0.435 | 0.959 | |
| | **RobotFingerPrint** | ✅ Paper reproduction | `robotfingerprint/` | see below | — | |
|
|
| > The lower loss and higher accuracy of v1 and GeoMatch++ are artefacts of training on corrupted |
| > keypoints: the 2× scale inflated keypoint distances, making the contact maps geometrically |
| > trivial to predict. v2 trains on correct geometry and is the only GeoMatch model that produces |
| > valid IK targets during grasp generation. |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ### GeoMatch (v1 and v2 share the same architecture) |
|
|
| Dual GCN encoder (object + robot surface) → L2-normalised embeddings → linear projection heads |
| (512→64) × 2 → 5 autoregressive MLP modules → per-keypoint BCE contact map prediction. |
|
|
| > Based on: [*Geometry Matching for Multi-Embodiment Grasping*](https://openreview.net/forum?id=oyWkrG-LD5) (CoRL 2023) |
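
One way to read the pipeline above is as an autoregressive factorisation over keypoints. The notation below is an assumption made for illustration, not taken from the source: $O$ is the object point cloud, $G$ the gripper, $c_i$ the contact location of keypoint $i$, and six keypoints are assumed (matching the tensor shapes in the Usage section), with the first predicted directly and the remaining five by the autoregressive MLP modules.

```latex
p(c_1, \dots, c_6 \mid O, G)
  = p(c_1 \mid O, G) \prod_{i=2}^{6} p(c_i \mid c_1, \dots, c_{i-1}, O, G)

\mathcal{L}_{\mathrm{BCE}}
  = -\sum_{i=1}^{6} \sum_{j=1}^{N}
    \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log (1 - \hat{y}_{ij}) \right]
```

Here $y_{ij} \in \{0, 1\}$ marks whether object point $j$ is the contact for keypoint $i$, $\hat{y}_{ij}$ is the predicted probability, and $N$ is the number of object points. Any weighting or averaging the training code applies is not shown.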
|
|
| ### GeoMatch++ |
|
|
| Extends GeoMatch with a **morphology encoder** (GCN over the robot kinematic-tree graph, 9D node |
| features, 32 nodes) and a **DCP-style cross-attention transformer** that fuses object geometry with |
| robot morphology before contact prediction. Pretrained GeoMatch v1 encoders are frozen. |
|
|
| > Based on: [*GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping*](https://arxiv.org/abs/2412.18998) |
|
|
| ### RobotFingerPrint |
|
|
| A conditional VAE (`GcsCVAE`) that predicts a per-point **Unified Gripper Coordinate Space (UGCS)** |
| 2D coordinate map over an object's point cloud, conditioned on the object geometry. Unlike GeoMatch's |
| discrete per-keypoint contact classification, RobotFingerPrint regresses a continuous (u, v) |
| coordinate for every object point, which is what allows a single trained model to transfer grasps |
| across grippers with a *different number of fingers* without any manual re-targeting. |
|
|
| - **Encoder**: PointNet-style per-point Conv1d stack over `(object_pc, gt_uv)` → max-pool global |
| feature → linear heads to VAE latent mean/logvar (`encoder_layers_size=[5, 64, 128, 512, 512]`, |
| `latent_size=128`) |
| - **Decoder**: per-point features + global object feature + latent code → Conv1d stack |
| (`decoder_decoder_layers_size=[64+512+128, 512, 64]`) → two parallel U/V prediction heads |
| (`uv_layers_size=[64, 32, 1]`) |
| - **Loss**: reconstruction (weighted L2 on predicted vs. ground-truth UV coordinates) + annealed |
| KL divergence (weight increased every `ann_per_epochs` epochs following a temperature schedule) |
|
|
| > Based on: [*RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis*](https://arxiv.org/abs/2409.14519) (Khargonkar, Casas, Prabhakaran, Xiang) |
|
|
| ### Component Comparison (GeoMatch family) |
|
|
| | Component | GeoMatch v1 / v2 | GeoMatch++ | |
| |---|---|---| |
| | Object GCN encoder | 3 layers × 256 → 512, trainable | Same, **frozen** (from GeoMatch v1) | |
| | Robot surface GCN | 3 layers × 256 → 512, trainable | Same, **frozen** (from GeoMatch v1) | |
| | Morphology encoder | — | **NEW** GCN(9 → 256×3 → 512), trainable | |
| | Cross-attention | — | **NEW** DCP transformer (512-dim, 4 heads, 1 layer) | |
| | Projection heads | Linear(512→64) × 2 | Same, re-initialised | |
| | AR keypoint modules | 5× MLP | Same, re-initialised | |
| | **Total params** | **~1.9M** | **~6.4M (~5.8M trainable)** | |
|
|
| --- |
|
|
| ## What Changed in v2 (Keypoint Bug Fix) |
|
|
| GeoMatch requires a `robot_keypoints.json` that defines canonical 3D keypoint positions for each |
| robot in rest-pose world space. The v1 keypoints had two bugs: |
|
|
| **Bug 1 — 2× scale factor:** The generation script applied `world_pos *= 2.0`, citing |
| `HandModel`'s `hand_scale=2.0` class default. However, every actual call site passes |
| `hand_scale=1.0`, overriding that default. Because the scale was applied before the inverse-FK |
| projection that `HandModel.get_canonical_keypoints()` uses (`T⁻¹[2p;1] ≠ 2·T⁻¹[p;1]`), the |
| distortion was not uniform — it grew with each link's distance from the kinematic root, corrupting |
| both training labels and inference IK targets. |
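
The inequality in Bug 1 can be checked numerically. The sketch below is a minimal illustration, not the project's code: it uses one rigid transform `T(x) = R·x + t` with a made-up rotation and made-up offsets, and compares scaling before the inverse transform against scaling after it. Algebraically the gap is `Rᵀt`, so its norm equals the link's offset from the root.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def inverse_transform(R, t, p):
    """Apply T^-1 to world point p, where T maps a local point x to R @ x + t."""
    return matvec(transpose(R), [pi - ti for pi, ti in zip(p, t)])

def scale_error(R, t, p):
    """Norm of T^-1(2p) - 2 * T^-1(p)."""
    scaled_first = inverse_transform(R, t, [2.0 * c for c in p])
    scaled_after = [2.0 * c for c in inverse_transform(R, t, p)]
    return math.dist(scaled_first, scaled_after)

R = rot_z(math.radians(30.0))     # hypothetical link orientation
p = [0.02, 0.01, 0.15]            # hypothetical world-space keypoint (metres)
near_root = [0.0, 0.0, 0.03]      # hypothetical link offset: 3 cm from the root
far_from_root = [0.0, 0.0, 0.12]  # hypothetical link offset: 12 cm from the root

err_near = scale_error(R, near_root, p)
err_far = scale_error(R, far_from_root, p)
print(f"error near root: {err_near:.3f} m, far from root: {err_far:.3f} m")
```

With these values the error is 0.03 m for the near link and 0.12 m for the far one, independent of `p`, which matches the statement that the distortion grows with distance from the kinematic root.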
|
|
| **Bug 2 — ShadowHand axis-swap at wrong stage:** The `[x, -z, y]` axis permutation for ShadowHand |
| was applied to the final world-space `world_pos` (after FK). The reference implementation |
| (`gripper_utils.py`) applies it to raw mesh points in link-local space *before* the visual-origin |
| transform. Rotation and axis permutation do not commute, so the wrong stage produced scrambled |
| keypoint positions for any ShadowHand link with a non-zero visual-origin rotation. |
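
The non-commutation in Bug 2 is easy to reproduce in isolation. The sketch below is a simplified stand-in for the real pipeline: a single z-axis rotation plays the role of the visual-origin rotation, translations are omitted, and the point is made up. It applies the `[x, -z, y]` swap before and after the rotation and measures the gap.

```python
import math

def swap_axes(p):
    """ShadowHand axis permutation from the text: [x, y, z] -> [x, -z, y]."""
    x, y, z = p
    return [x, -z, y]

def rotate_z(p, theta):
    """Rotate p about the z-axis by theta radians (stand-in for a visual-origin rotation)."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y, s * x + c * y, z]

p = [0.01, 0.02, 0.03]  # hypothetical mesh point in link-local space (metres)

gaps = {}
for degrees in (0.0, 90.0):
    theta = math.radians(degrees)
    swap_then_rotate = rotate_z(swap_axes(p), theta)  # order used by the reference
    rotate_then_swap = swap_axes(rotate_z(p, theta))  # order used by the v1 script
    gaps[degrees] = math.dist(swap_then_rotate, rotate_then_swap)
    print(f"rotation {degrees:5.1f} deg: gap = {gaps[degrees]:.4f} m")
```

The two orders agree when the rotation is zero and diverge otherwise, which is why only links with a non-zero visual-origin rotation were affected.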
|
|
| Both bugs were confirmed by comparing `generate_keypoints_json.py` against `gripper_utils.py` and |
| verified by observing that v1 ShadowHand tip keypoints had y-values of ~−0.84 m (outside any |
| physical hand envelope) versus the corrected ~0.01 m. |
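
A check of this kind can be automated. The sketch below assumes a hypothetical layout for `robot_keypoints.json` (robot name → link name → `[x, y, z]` in metres) and a hypothetical 0.5 m bound; the actual file layout and a sensible bound may differ, and `find_out_of_envelope` is an illustrative helper, not part of the repository.

```python
def find_out_of_envelope(keypoints, max_abs_coord=0.5):
    """Return (robot, link, xyz) for keypoints with any |coordinate| above the bound."""
    offenders = []
    for robot, links in keypoints.items():
        for link, xyz in links.items():
            if any(abs(c) > max_abs_coord for c in xyz):
                offenders.append((robot, link, xyz))
    return offenders

# Illustrative values only: the y-coordinates echo the ~-0.84 m (v1) and ~0.01 m (v2)
# ShadowHand tip values quoted above; the link name and the x/z values are made up.
v1_like = {"shadowhand": {"fftip": [0.03, -0.84, 0.20]}}
v2_like = {"shadowhand": {"fftip": [0.03, 0.01, 0.20]}}

print(find_out_of_envelope(v1_like))  # flags the tip keypoint
print(find_out_of_envelope(v2_like))  # nothing flagged
```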
|
|
| --- |
|
|
| ## Training Details |
|
|
| ### GeoMatch v2 ✅ (Recommended) |
|
|
| | Setting | Value | |
| |---|---| |
| | Dataset | CMapDataset (ContactDB + YCB), fixed keypoints | |
| | End-effectors | EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand | |
| | Batch size | 256 | |
| | Optimizer | Adam (β₁=0.9, β₂=0.99) | |
| | Learning rate | 1e-4 | |
| | Epochs | 200 | |
| | Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 | |
| | Training time | 8.58 hours | |
| | Precision | FP32 | |
| | **Final val loss** | **1.594** | |
| | **Final val accuracy** | **0.695** | |
|
|
| #### GeoMatch v2 Training Curves |
|
|
| | Epoch | Val Loss | Val Accuracy | |
| |---|---|---| |
| | 0 | 1.935 | 0.205 | |
| | 25 | 1.731 | 0.563 | |
| | 50 | 1.675 | 0.580 | |
| | 100 | 1.649 | 0.632 | |
| | 150 | 1.603 | 0.656 | |
| | 199 | 1.594 | 0.695 | |
|
|
| --- |
|
|
| ### GeoMatch++ ⚠️ (Deprecated — built on GeoMatch v1 encoders) |
|
|
| | Setting | Value | |
| |---|---| |
| | Initialisation | Pretrained **GeoMatch v1** encoders (frozen) | |
| | Trainable params | ~5.8M | |
| | Batch size | 32 per GPU × 8 GPUs = 256 effective | |
| | Optimizer | Adam (β₁=0.9, β₂=0.99) | |
| | Learning rate | 5e-5 | |
| | Epochs | 150 | |
| | Hardware | 8× AMD Instinct MI300X, ROCm 6.2.4 (DDP) | |
| | Training time | ~2.8 hours | |
| | Precision | FP32 | |
| | **Final val loss** | **0.350** *(artefact of corrupted training data)* | |
| | **Final val accuracy** | **0.940** *(artefact of corrupted training data)* | |
|
|
| #### GeoMatch++ Training Curves |
|
|
| | Epoch | Val Loss | Val Accuracy | |
| |---|---|---| |
| | 0 | 0.465 | 0.999 | |
| | 25 | 0.370 | 0.880 | |
| | 89 | 0.362 | 0.902 | |
| | 149 | 0.350 | 0.940 | |
|
|
| --- |
|
|
| ### GeoMatch v1 ⚠️ (Deprecated — corrupted keypoints) |
|
|
| | Setting | Value | |
| |---|---| |
| | Dataset | CMapDataset (ContactDB + YCB), **corrupted keypoints** | |
| | Batch size | 256 | |
| | Optimizer | Adam (β₁=0.9, β₂=0.99) | |
| | Learning rate | 1e-4 | |
| | Epochs | 200 | |
| | Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 | |
| | Training time | 22.18 hours | |
| | Precision | FP32 | |
| | **Final val loss** | **0.435** *(artefact of corrupted training data)* | |
| | **Final val accuracy** | **0.959** *(artefact of corrupted training data)* | |
|
|
| --- |
|
|
| ### RobotFingerPrint ✅ (Paper reproduction — 4 experiments) |
|
|
| All 4 experiments use the exact recipe published in the README of the paper authors' repository: |
| `--n_epochs 16 --ann_temp 1.5 --ann_per_epochs 2`, plus code defaults `lr=1e-4`, `batch_size=64`, |
| `lw_recon=1000.0`, `lw_kld=0.01`, `attn_alpha=3`, Adam(β₁=0.9, β₂=0.999), seed=42. |
|
|
| | Setting | Value | |
| |---|---| |
| | Dataset | GenDexGrasp CMapDataset-sqrt_align + RobotFingerPrint UGCS coordinates | |
| | Batch size | 64 | |
| | Optimizer | Adam (β₁=0.9, β₂=0.999) | |
| | Learning rate | 1e-4 (StepLR decay disabled — `decay_lr_freq=1000` > total epochs) | |
| | Epochs | 16 | |
| | KL annealing | temperature 1.5, weight increased every 2 epochs | |
| | Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4, single GPU per run | |
| | Precision | FP32 | |
| |
| | Experiment | Seen grippers | Held-out (unseen) | Training time | Final val recon loss | Final val KLD | Final val overall | |
| |---|---|---|---|---|---|---| |
| | `fullrobots/` | ezgripper, barrett, robotiq_3finger, allegro, shadowhand | none (main paper result) | 449 s | **0.2495** | 112.57 | 268.72 | |
| | `unseen_barrett/` | ezgripper, robotiq_3finger, allegro, shadowhand | barrett | 366 s | **0.2568** | 105.99 | 274.90 | |
| | `unseen_ezgripper/` | barrett, robotiq_3finger, allegro, shadowhand | ezgripper | 383 s | **0.2791** | 109.36 | 297.77 | |
| | `unseen_shadowhand/` | ezgripper, barrett, robotiq_3finger, allegro | shadowhand | 366 s | **0.2590** | 111.11 | 278.02 | |
| |
| > "Val recon loss" is the reconstruction term of the CVAE loss (weighted L2 between predicted and |
| > ground-truth UGCS coordinates); lower is better. In each of the 3 unseen-gripper runs, one gripper |
| > is withheld from training, and the reported validation metrics are computed only on the validation |
| > split of the 4 remaining seen grippers, as in the paper's ablation setup. These numbers therefore do |
| > not measure zero-shot transfer to the held-out gripper by themselves: that requires the downstream |
| > grasp generation + IsaacGym stability test pipeline, which these checkpoints alone do not cover. |
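
The "Final val overall" column is numerically consistent with a weighted sum `lw_recon · recon + w_KL · KLD`, where `w_KL` starts at `lw_kld` and is multiplied by `ann_temp` every `ann_per_epochs` epochs. The sketch below is inferred from the table and the recipe above, not copied from `loss.py`; the function names and the exact stepping rule (epochs counted from 0, first increase at epoch 2) are assumptions.

```python
def kl_weight(epoch, lw_kld=0.01, ann_temp=1.5, ann_per_epochs=2):
    """KL weight at a 0-indexed epoch under a multiplicative step schedule (assumed form)."""
    return lw_kld * ann_temp ** (epoch // ann_per_epochs)

def overall_loss(recon, kld, epoch, lw_recon=1000.0):
    """Weighted sum of the reconstruction and KL terms."""
    return lw_recon * recon + kl_weight(epoch) * kld

final_epoch = 15  # last of 16 epochs, counting from 0

# (experiment, final val recon, final val KLD, reported overall) from the table above
rows = [
    ("fullrobots", 0.2495, 112.57, 268.72),
    ("unseen_barrett", 0.2568, 105.99, 274.90),
    ("unseen_ezgripper", 0.2791, 109.36, 297.77),
    ("unseen_shadowhand", 0.2590, 111.11, 278.02),
]
for name, recon, kld, reported in rows:
    estimate = overall_loss(recon, kld, final_epoch)
    print(f"{name:18s} estimate {estimate:7.2f}  reported {reported:7.2f}")
```

Under these assumptions the final KL weight is 0.01 × 1.5⁷ ≈ 0.171, and every estimate lands within 0.05 of the reported value, which is the rounding margin implied by a reconstruction loss quoted to four decimals and multiplied by 1000.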
| |
| --- |
| |
| ## Checkpoints |
| |
| ### GeoMatch v2 ✅ (Use these) |
| |
| | File | Epoch | Val Loss | Notes | |
| |---|---|---|---| |
| | `geomatch_v2/checkpoint_epoch50.pth` | 50 | 1.675 | Early convergence | |
| | `geomatch_v2/checkpoint_epoch100.pth` | 100 | 1.649 | Mid-training | |
| | `geomatch_v2/checkpoint_epoch150.pth` | 150 | 1.603 | Near-converged | |
| | `geomatch_v2/final.pth` | 199 | 1.594 | **Final model (recommended)** | |
|
|
| ### GeoMatch++ ⚠️ (Deprecated) |
|
|
| | File | Epoch | Notes | |
| |---|---|---| |
| | `geomatch_pp/checkpoint_epoch50.pth` | 50 | Early convergence | |
| | `geomatch_pp/checkpoint_epoch100.pth` | 100 | Mid-training | |
| | `geomatch_pp/checkpoint_epoch140.pth` | 140 | Near-converged | |
| | `geomatch_pp/final.pth` | 149 | Final (deprecated) | |
|
|
| ### GeoMatch v1 ⚠️ (Deprecated) |
|
|
| | File | Epoch | Notes | |
| |---|---|---| |
| | `geomatch_v1/checkpoint_epoch50.pth` | 50 | Early convergence | |
| | `geomatch_v1/checkpoint_epoch100.pth` | 100 | Mid-training | |
| | `geomatch_v1/checkpoint_epoch150.pth` | 150 | Near-converged | |
| | `geomatch_v1/final.pth` | 199 | Final (deprecated) | |
|
|
| ### RobotFingerPrint ✅ (final checkpoint only per experiment) |
|
|
| | File | Experiment | Notes | |
| |---|---|---| |
| | `robotfingerprint/fullrobots/final.ckpt` | All 5 grippers seen | **Main paper result — recommended** | |
| | `robotfingerprint/unseen_barrett/final.ckpt` | Barrett held out | Generalization ablation | |
| | `robotfingerprint/unseen_ezgripper/final.ckpt` | EZGripper held out | Generalization ablation | |
| | `robotfingerprint/unseen_shadowhand/final.ckpt` | ShadowHand held out | Generalization ablation | |
|
|
| > These are PyTorch Lightning checkpoints containing model weights and hyperparameters. Only the |
| > final training epoch of each run is published; intermediate per-epoch checkpoints and optimizer |
| > snapshots are not included. |
|
|
| --- |
|
|
| ## Usage |
|
|
| ### GeoMatch v2 (Recommended) |
|
|
| ```python |
| import torch, sys |
| sys.path.append(".") |
| import config |
| from models.geomatch import GeoMatch |
| |
| model = GeoMatch(config).cuda() |
| model.load_state_dict(torch.load("geomatch_v2/final.pth", map_location="cuda")) |
| model.eval() |
| |
| with torch.no_grad(): |
|     contact_map, keypoint_probs = model( |
|         obj_pc,               # [B, 2048, 3]     object point cloud |
|         robot_pc,             # [B, 6, 3]        robot surface points (6 keypoints) |
|         robot_key_point_idx,  # [B, 6]           keypoint indices into robot_pc |
|         obj_adj,              # [B, 2048, 2048]  object adjacency (sparse COO) |
|         robot_adj,            # [B, 6, 6]        robot adjacency |
|         xyz_prev,             # [B, 6, 3]        previous keypoint positions |
|     ) |
| # contact_map:    [B, 2048, 6, 1], per-object-point × per-keypoint contact probability |
| # keypoint_probs: [B, 2048, 5, 1], autoregressive keypoint contact probabilities |
| ``` |
|
|
| ### GeoMatch++ (Deprecated — kept for reproducibility) |
|
|
| ```python |
| import torch, sys |
| sys.path.append(".") |
| import config |
| from models.geomatch_pp import GeoMatchPP |
| |
| model = GeoMatchPP(config).cuda() |
| model.load_state_dict(torch.load("geomatch_pp/final.pth", map_location="cuda")) |
| model.eval() |
| |
| with torch.no_grad(): |
|     contact_map, keypoint_probs = model( |
|         obj_pc,               # [B, 2048, 3] |
|         robot_pc,             # [B, 6, 3] |
|         robot_key_point_idx,  # [B, 6] |
|         obj_adj,              # [B, 2048, 2048] |
|         robot_adj,            # [B, 6, 6] |
|         xyz_prev,             # [B, 6, 3] |
|         morph_features,       # [B, 32, 9]   morphology node features |
|         morph_adj,            # [B, 32, 32]  morphology adjacency |
|     ) |
| ``` |
|
|
| Morphology graphs are pre-built per robot using `preprocess_morphology.py` → `gnn_morphology_new.pt`. |
|
|
| ### RobotFingerPrint |
|
|
| Lightning checkpoints store hyperparameters alongside weights, so the model reconstructs itself |
| directly from the `.ckpt` file — no separate config needed: |
|
|
| ```python |
| import sys |
| import torch |
| sys.path.append(".") |
| from robotfingerprint.model.grasp_network import GcsGraspModel |
| |
| model = GcsGraspModel.load_from_checkpoint("robotfingerprint/fullrobots/final.ckpt") |
| model.eval().cuda() |
| |
| with torch.no_grad(): |
|     # input_pc: [B, N, 3] object point cloud. gt_gcs is only needed during training; |
|     # at inference time, use model.model.predict(input_pc) for sampling-based prediction. |
|     pred_uv = model.model.predict(input_pc.cuda()) |
|     # pred_uv: [B, N, 2] predicted Unified Gripper Coordinate Space (u, v) per object point |
| ``` |
|
|
| For the full downstream pipeline (coordinate inference on held-out objects → grasp generation → |
| IsaacGym stability testing), see the |
| [RobotFingerPrint repository](https://github.com/IRVLUTD/robot-finger-print) `gcs_gdx_inf_cvae.py` |
| and `gcs_gdx_grasp_gen.py` scripts, using these checkpoints via `--logdir`/`--ckpt`. |
|
|
| --- |
|
|
| ## Repository Structure |
|
|
| ``` |
| geomatch_v1/                 # GeoMatch v1 checkpoints (deprecated, corrupted keypoints) |
|   checkpoint_epoch50.pth |
|   checkpoint_epoch100.pth |
|   checkpoint_epoch150.pth |
|   final.pth |
| geomatch_pp/                 # GeoMatch++ checkpoints (deprecated, built on v1 encoders) |
|   checkpoint_epoch50.pth |
|   checkpoint_epoch100.pth |
|   checkpoint_epoch140.pth |
|   final.pth |
| geomatch_v2/                 # GeoMatch v2 checkpoints (recommended) |
|   checkpoint_epoch50.pth |
|   checkpoint_epoch100.pth |
|   checkpoint_epoch150.pth |
|   final.pth |
| robotfingerprint/            # RobotFingerPrint (arXiv:2409.14519) paper reproduction |
|   fullrobots/final.ckpt |
|   unseen_barrett/final.ckpt |
|   unseen_ezgripper/final.ckpt |
|   unseen_shadowhand/final.ckpt |
|   model/ |
|     grasp_network.py         # GcsGraspModel (Lightning module) |
|     modules.py               # GcsCVAE, PointNetCmapEncoder/Decoder |
|     loss.py                  # GcsLoss (recon + annealed KLD) |
| models/                      # Shared GeoMatch source (v1/v2/++) |
|   geomatch.py |
|   geomatch_pp.py |
|   gnn.py |
|   mlp.py |
| config.py                    # Hyperparameters for GeoMatch models |
| generate_keypoints_json.py   # Fixed keypoint generator (used for v2 training data) |
| ``` |
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{geomatch2024, |
|   title = {Geometry Matching for Multi-Embodiment Grasping}, |
|   booktitle = {Conference on Robot Learning (CoRL)}, |
|   year = {2023}, |
| } |
| |
| @article{geomatch_pp2024, |
|   title = {GeoMatch++: Morphology Conditioned Geometry Matching for Multi-Embodiment Grasping}, |
|   journal = {arXiv preprint arXiv:2412.18998}, |
|   year = {2024}, |
| } |
| |
| @article{khargonkar2024robotfingerprint, |
| title = {RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis}, |
| author = {Khargonkar, Ninad and Casas, Luis Felipe and Prabhakaran, Balakrishnan and Xiang, Yu}, |
| journal = {arXiv preprint arXiv:2409.14519}, |
| year = {2024}, |
| } |
| ``` |
|
|
| --- |
|
|
| ## License |
|
|
| Original GeoMatch code © 2023 DeepMind Technologies Limited, licensed under the Apache License 2.0. |
| The GeoMatch++ extension, v2 training, and all GeoMatch checkpoints were produced by |
| [Dimios45](https://huggingface.co/Dimios45) as part of the Graspmax project. |
|
|
| RobotFingerPrint model source (`robotfingerprint/model/`) is from the original authors' |
| [repository](https://github.com/IRVLUTD/robot-finger-print) (MIT-style license, see their repo for |
| exact terms); checkpoints in `robotfingerprint/` were trained by |
| [Dimios45](https://huggingface.co/Dimios45) reproducing the paper's published recipe. |
|
|