---
license: apache-2.0
tags:
- robotics
- grasping
- graph-neural-network
- point-cloud
- contact-prediction
- morphology
- cross-attention
- cvae
- coordinate-space
---
# Graspmax — GeoMatch v2 · GeoMatch++ · GeoMatch v1 · RobotFingerPrint
**Graspmax** contains geometry-aware contact/coordinate prediction models for dexterous robotic
grasping, trained on the CMapDataset / GenDexGrasp dataset across 5 robot end-effectors
(EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand).
> **⚠️ Version notice:** GeoMatch v1 and GeoMatch++ were trained with a corrupted
> `robot_keypoints.json` (a spurious 2× scale factor and a ShadowHand axis swap applied at the wrong stage).
> **Use GeoMatch v2 for any new work.** v1 and GeoMatch++ are kept for reproducibility only.
---
## Models at a Glance
| Model | Status | Folder | Val loss | Val acc |
|---|---|---|---|---|
| **GeoMatch v2** | ✅ **Recommended** | `geomatch_v2/` | **1.594** | **0.695** |
| GeoMatch++ | ⚠️ Deprecated (built on v1 encoders) | `geomatch_pp/` | 0.350 | 0.940 |
| GeoMatch v1 | ⚠️ Deprecated (corrupted keypoints) | `geomatch_v1/` | 0.435 | 0.959 |
| **RobotFingerPrint** | ✅ Paper reproduction | `robotfingerprint/` | see below | — |
> The lower loss and higher accuracy of v1 and GeoMatch++ are artefacts of training on corrupted
> keypoints: the 2× scale inflated keypoint distances, which made the contact maps geometrically
> trivial to predict. v2 trains on correct geometry and is the only model that produces valid
> IK targets during grasp generation.
---
## Architecture
### GeoMatch (v1 and v2 share the same architecture)
Dual GCN encoder (object + robot surface) → L2-normalised embeddings → linear projection heads
(512→64) × 2 → 5 autoregressive MLP modules → per-keypoint BCE contact map prediction.
> Based on: [*Geometry Matching for Multi-Embodiment Grasping*](https://openreview.net/forum?id=oyWkrG-LD5) (NeurIPS 2024)
### GeoMatch++
Extends GeoMatch with a **morphology encoder** (GCN over the robot kinematic-tree graph, 9D node
features, 32 nodes) and a **DCP-style cross-attention transformer** that fuses object geometry with
robot morphology before contact prediction. Pretrained GeoMatch v1 encoders are frozen.
> Based on: [*GeoMatch++: Morphology-Aware Grasping via Correspondence Learning*](https://arxiv.org/abs/2412.18998)
### RobotFingerPrint
A conditional VAE (`GcsCVAE`) that predicts a per-point **Unified Gripper Coordinate Space (UGCS)**
2D coordinate map over an object's point cloud, conditioned on the object geometry. Unlike GeoMatch's
discrete per-keypoint contact classification, RobotFingerPrint regresses a continuous (u, v)
coordinate for every object point, which is what allows a single trained model to transfer grasps
across grippers with a *different number of fingers* without any manual re-targeting.
- **Encoder**: PointNet-style per-point Conv1d stack over `(object_pc, gt_uv)` → max-pool global
feature → linear heads to VAE latent mean/logvar (`encoder_layers_size=[5, 64, 128, 512, 512]`,
`latent_size=128`)
- **Decoder**: per-point features + global object feature + latent code → Conv1d stack
(`decoder_decoder_layers_size=[64+512+128, 512, 64]`) → two parallel U/V prediction heads
(`uv_layers_size=[64, 32, 1]`)
- **Loss**: reconstruction (weighted L2 on predicted vs. ground-truth UV coordinates) + annealed
KL divergence (weight increased every `ann_per_epochs` epochs following a temperature schedule)
> Based on: [*RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis*](https://arxiv.org/abs/2409.14519) (Khargonkar, Casas, Prabhakaran, Xiang)
### Component Comparison (GeoMatch family)
| Component | GeoMatch v1 / v2 | GeoMatch++ |
|---|---|---|
| Object GCN encoder | 3 layers × 256 → 512, trainable | Same, **frozen** (from GeoMatch v1) |
| Robot surface GCN | 3 layers × 256 → 512, trainable | Same, **frozen** (from GeoMatch v1) |
| Morphology encoder | — | **NEW** GCN(9 → 256×3 → 512), trainable |
| Cross-attention | — | **NEW** DCP transformer (512-dim, 4 heads, 1 layer) |
| Projection heads | Linear(512→64) × 2 | Same, re-initialised |
| AR keypoint modules | 5× MLP | Same, re-initialised |
| **Total params** | **~1.9M** | **~6.4M (~5.8M trainable)** |
---
## What Changed in v2 (Keypoint Bug Fix)
GeoMatch requires a `robot_keypoints.json` that defines canonical 3D keypoint positions for each
robot in rest-pose world space. The v1 keypoints had two bugs:
**Bug 1 — 2× scale factor:** The generation script applied `world_pos *= 2.0`, citing
`HandModel`'s `hand_scale=2.0` class default. However, every actual call site passes
`hand_scale=1.0`, overriding that default. Because the scale was applied before the inverse-FK
projection that `HandModel.get_canonical_keypoints()` uses (`T⁻¹[2p;1] ≠ 2·T⁻¹[p;1]`), the
distortion was not uniform — it grew with each link's distance from the kinematic root, corrupting
both training labels and inference IK targets.
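
The inequality above can be checked numerically. The following is a minimal sketch, not the repository's code: the link transform (a 90° rotation about z plus a translation `t`) and the point `p` are made-up values, and `inverse_apply` is a hypothetical helper standing in for the inverse-FK projection. Expanding `Rᵀ(2p − t) − 2·Rᵀ(p − t)` gives `Rᵀt`, so the residual has length `|t|` and grows with the link frame's offset from the origin.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def inverse_apply(R, t, x):
    """Apply the inverse of the rigid transform x -> R @ x + t, i.e. R^T (x - t)."""
    d = [x[i] - t[i] for i in range(3)]
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]


def residual(R, t, p):
    """Difference between scaling before and scaling after the inverse transform."""
    scaled_first = inverse_apply(R, t, [2.0 * v for v in p])
    scaled_after = [2.0 * v for v in inverse_apply(R, t, p)]
    return [a - b for a, b in zip(scaled_first, scaled_after)]


R = rot_z(math.pi / 2)      # hypothetical link rotation
p = [0.02, 0.01, 0.03]      # hypothetical world-space keypoint (metres)

near = residual(R, [0.0, 0.0, 0.05], p)   # link frame close to the origin
far = residual(R, [0.10, 0.0, 0.30], p)   # link frame further away

# The residual length matches the length of each translation vector.
print(math.hypot(*near), math.hypot(*far))
```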
**Bug 2 — ShadowHand axis-swap at wrong stage:** The `[x, -z, y]` axis permutation for ShadowHand
was applied to the final world-space `world_pos` (after FK). The reference implementation
(`gripper_utils.py`) applies it to raw mesh points in link-local space *before* the visual-origin
transform. Rotation and axis permutation do not commute, so the wrong stage produced scrambled
keypoint positions for any ShadowHand link with a non-zero visual-origin rotation.
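
A small numeric check illustrates why the stage matters. This sketch assumes a hypothetical visual-origin rotation of 90° about z and ignores translation and FK entirely; `swap_axes` mirrors the `[x, -z, y]` permutation described above, and all other names are illustrative. With an identity rotation the two orders agree, which matches the statement that only links with a non-zero visual-origin rotation are affected.

```python
def swap_axes(p):
    """Axis permutation [x, y, z] -> [x, -z, y]."""
    x, y, z = p
    return [x, -z, y]


def apply(R, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # hypothetical visual-origin rotation


def swap_before(R, p):
    """Reference order: permute in link-local space, then rotate."""
    return apply(R, swap_axes(p))


def swap_after(R, p):
    """Buggy order: rotate first, then permute the result."""
    return swap_axes(apply(R, p))


p = [1, 2, 3]
print(swap_before(IDENTITY, p), swap_after(IDENTITY, p))  # same point
print(swap_before(ROT_Z_90, p), swap_after(ROT_Z_90, p))  # different points
```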
Both bugs were confirmed by comparing `generate_keypoints_json.py` against `gripper_utils.py` and
verified by observing that v1 ShadowHand tip keypoints had y-values of ~−0.84 m (outside any
physical hand envelope) versus the corrected ~0.01 m.
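
A simple range check would flag keypoints like these. The sketch below assumes a hypothetical JSON layout (robot name mapped to a list of `[x, y, z]` points in metres) and a hypothetical 0.5 m envelope; the actual schema of `robot_keypoints.json` may differ, and the sample x and z values are invented for illustration. Only the y-values echo the figures quoted above.

```python
import json


def out_of_envelope(keypoints, limit_m=0.5):
    """Return (robot, index, point) for each keypoint with a coordinate beyond +/- limit_m."""
    bad = []
    for robot, points in keypoints.items():
        for i, pt in enumerate(points):
            if any(abs(c) > limit_m for c in pt):
                bad.append((robot, i, pt))
    return bad


# Illustrative data only: y mirrors the ~-0.84 m (v1) and ~0.01 m (corrected) values.
v1_like = json.loads('{"shadowhand": [[0.03, -0.84, 0.25]]}')
v2_like = json.loads('{"shadowhand": [[0.03, 0.01, 0.25]]}')

print(out_of_envelope(v1_like))  # one flagged keypoint
print(out_of_envelope(v2_like))  # nothing flagged
```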

---
## Training Details
### GeoMatch v2 ✅ (Recommended)
| Setting | Value |
|---|---|
| Dataset | CMapDataset (ContactDB + YCB), fixed keypoints |
| End-effectors | EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand |
| Batch size | 256 |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 1e-4 |
| Epochs | 200 |
| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
| Training time | 8.58 hours |
| Precision | FP32 |
| **Final val loss** | **1.594** |
| **Final val accuracy** | **0.695** |
#### GeoMatch v2 Training Curves
| Epoch | Val Loss | Val Accuracy |
|---|---|---|
| 0 | 1.935 | 0.205 |
| 25 | 1.731 | 0.563 |
| 50 | 1.675 | 0.580 |
| 100 | 1.649 | 0.632 |
| 150 | 1.603 | 0.656 |
| 199 | 1.594 | 0.695 |
---
### GeoMatch++ ⚠️ (Deprecated — built on GeoMatch v1 encoders)
| Setting | Value |
|---|---|
| Initialisation | Pretrained **GeoMatch v1** encoders (frozen) |
| Trainable params | ~5.8M |
| Batch size | 32 per GPU × 8 GPUs = 256 effective |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 5e-5 |
| Epochs | 150 |
| Hardware | 8× AMD Instinct MI300X, ROCm 6.2.4 (DDP) |
| Training time | ~2.8 hours |
| Precision | FP32 |
| **Final val loss** | **0.350** *(artefact of corrupted training data)* |
| **Final val accuracy** | **0.940** *(artefact of corrupted training data)* |
#### GeoMatch++ Training Curves
| Epoch | Val Loss | Val Accuracy |
|---|---|---|
| 0 | 0.465 | 0.999 |
| 25 | 0.370 | 0.880 |
| 89 | 0.362 | 0.902 |
| 149 | 0.350 | 0.940 |
---
### GeoMatch v1 ⚠️ (Deprecated — corrupted keypoints)
| Setting | Value |
|---|---|
| Dataset | CMapDataset (ContactDB + YCB), **corrupted keypoints** |
| Batch size | 256 |
| Optimizer | Adam (β₁=0.9, β₂=0.99) |
| Learning rate | 1e-4 |
| Epochs | 200 |
| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
| Training time | 22.18 hours |
| Precision | FP32 |
| **Final val loss** | **0.435** *(artefact of corrupted training data)* |
| **Final val accuracy** | **0.959** *(artefact of corrupted training data)* |
---
### RobotFingerPrint ✅ (Paper reproduction — 4 experiments)
All 4 experiments use the recipe published by the paper's authors in the RobotFingerPrint repository's README:
`--n_epochs 16 --ann_temp 1.5 --ann_per_epochs 2`, plus the code defaults `lr=1e-4`, `batch_size=64`,
`lw_recon=1000.0`, `lw_kld=0.01`, `attn_alpha=3`, Adam(β₁=0.9, β₂=0.999), seed=42.
| Setting | Value |
|---|---|
| Dataset | GenDexGrasp CMapDataset-sqrt_align + RobotFingerPrint UGCS coordinates |
| Batch size | 64 |
| Optimizer | Adam (β₁=0.9, β₂=0.999) |
| Learning rate | 1e-4 (StepLR decay disabled — `decay_lr_freq=1000` > total epochs) |
| Epochs | 16 |
| KL annealing | temperature 1.5, weight increased every 2 epochs |
| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4, single GPU per run |
| Precision | FP32 |

| Experiment | Seen grippers | Held-out (unseen) | Training time | Final val recon loss | Final val KLD | Final val overall |
|---|---|---|---|---|---|---|
| `fullrobots/` | ezgripper, barrett, robotiq_3finger, allegro, shadowhand | none (main paper result) | 449 s | **0.2495** | 112.57 | 268.72 |
| `unseen_barrett/` | ezgripper, robotiq_3finger, allegro, shadowhand | barrett | 366 s | **0.2568** | 105.99 | 274.90 |
| `unseen_ezgripper/` | barrett, robotiq_3finger, allegro, shadowhand | ezgripper | 383 s | **0.2791** | 109.36 | 297.77 |
| `unseen_shadowhand/` | ezgripper, barrett, robotiq_3finger, allegro | shadowhand | 366 s | **0.2590** | 111.11 | 278.02 |
> "Val recon loss" is the reconstruction term of the CVAE loss (weighted L2 between predicted and
> ground-truth UGCS coordinates); lower is better. Each of the 3 unseen-gripper runs trains with one
> gripper held out, following the paper's ablation setup, to test how well the learned coordinate
> space generalizes to a gripper never seen during training. The validation numbers above are
> computed only on the validation split of the 4 remaining seen grippers, so they do not by
> themselves measure zero-shot transfer. A full cross-gripper evaluation requires the downstream
> grasp generation + IsaacGym stability test pipeline, which these checkpoints alone do not cover.
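
The three loss columns can be related to each other. One reading of the annealing settings that reproduces the "Final val overall" column to within rounding is `overall = lw_recon * recon + lw_kld * ann_temp ** (epoch // ann_per_epochs) * KLD`, evaluated at the last zero-based epoch (15). This formula is inferred from the numbers in the table and is not taken from `loss.py`; `kld_weight` and `overall_loss` are hypothetical helper names.

```python
def kld_weight(epoch, lw_kld=0.01, ann_temp=1.5, ann_per_epochs=2):
    """Assumed schedule: multiply the KL weight by ann_temp every ann_per_epochs epochs."""
    return lw_kld * ann_temp ** (epoch // ann_per_epochs)


def overall_loss(recon, kld, epoch, lw_recon=1000.0):
    """Assumed combination of the reconstruction and annealed KL terms."""
    return lw_recon * recon + kld_weight(epoch) * kld


# (recon, KLD, reported overall) copied from the table above.
rows = {
    "fullrobots": (0.2495, 112.57, 268.72),
    "unseen_barrett": (0.2568, 105.99, 274.90),
    "unseen_ezgripper": (0.2791, 109.36, 297.77),
    "unseen_shadowhand": (0.2590, 111.11, 278.02),
}

last_epoch = 15  # 16 epochs, zero-based (an assumption)
estimates = {name: overall_loss(r, k, last_epoch) for name, (r, k, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name}: estimated {estimates[name]:.2f}, reported {reported:.2f}")
```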
---
## Checkpoints
### GeoMatch v2 ✅ (Use these)
| File | Epoch | Val Loss | Notes |
|---|---|---|---|
| `geomatch_v2/checkpoint_epoch50.pth` | 50 | 1.675 | Early convergence |
| `geomatch_v2/checkpoint_epoch100.pth` | 100 | 1.649 | Mid-training |
| `geomatch_v2/checkpoint_epoch150.pth` | 150 | 1.603 | Near-converged |
| `geomatch_v2/final.pth` | 199 | 1.594 | **Final model (recommended)** |
### GeoMatch++ ⚠️ (Deprecated)
| File | Epoch | Notes |
|---|---|---|
| `geomatch_pp/checkpoint_epoch50.pth` | 50 | Early convergence |
| `geomatch_pp/checkpoint_epoch100.pth` | 100 | Mid-training |
| `geomatch_pp/checkpoint_epoch140.pth` | 140 | Near-converged |
| `geomatch_pp/final.pth` | 149 | Final (deprecated) |
### GeoMatch v1 ⚠️ (Deprecated)
| File | Epoch | Notes |
|---|---|---|
| `geomatch_v1/checkpoint_epoch50.pth` | 50 | Early convergence |
| `geomatch_v1/checkpoint_epoch100.pth` | 100 | Mid-training |
| `geomatch_v1/checkpoint_epoch150.pth` | 150 | Near-converged |
| `geomatch_v1/final.pth` | 199 | Final (deprecated) |
### RobotFingerPrint ✅ (final checkpoint only per experiment)
| File | Experiment | Notes |
|---|---|---|
| `robotfingerprint/fullrobots/final.ckpt` | All 5 grippers seen | **Main paper result — recommended** |
| `robotfingerprint/unseen_barrett/final.ckpt` | Barrett held out | Generalization ablation |
| `robotfingerprint/unseen_ezgripper/final.ckpt` | EZGripper held out | Generalization ablation |
| `robotfingerprint/unseen_shadowhand/final.ckpt` | ShadowHand held out | Generalization ablation |
> These are PyTorch Lightning checkpoints containing model weights and hyperparameters, without
> optimizer state. Only the final epoch of each run is published; intermediate per-epoch
> checkpoints are not included.
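
To fetch a checkpoint programmatically, something like the following could work. It is a sketch under assumptions: the repo id `Dimios45/Graspmax` is inferred from the project and author names in this README and is not confirmed here, `checkpoint_filename` and `download` are hypothetical helpers, and only the final checkpoints are covered. `hf_hub_download` is the `huggingface_hub` download function; it is imported lazily so the path helper works without that package installed.

```python
GEOMATCH_FOLDERS = {"geomatch_v1", "geomatch_pp", "geomatch_v2"}
RFP_EXPERIMENTS = {"fullrobots", "unseen_barrett", "unseen_ezgripper", "unseen_shadowhand"}


def checkpoint_filename(model, experiment=None):
    """Build the in-repo path of a final checkpoint, following the tables above."""
    if model in GEOMATCH_FOLDERS:
        return f"{model}/final.pth"
    if model == "robotfingerprint" and experiment in RFP_EXPERIMENTS:
        return f"robotfingerprint/{experiment}/final.ckpt"
    raise ValueError(f"unknown model/experiment: {model!r}, {experiment!r}")


def download(filename, repo_id="Dimios45/Graspmax"):
    """Download one file from the Hub; repo_id is an assumption, adjust as needed."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


print(checkpoint_filename("geomatch_v2"))
print(checkpoint_filename("robotfingerprint", "fullrobots"))
# local_path = download(checkpoint_filename("geomatch_v2"))  # requires network access
```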
---
## Usage
### GeoMatch v2 (Recommended)
```python
import sys

import torch

sys.path.append(".")  # run from the repo root so that config.py and models/ are importable
import config
from models.geomatch import GeoMatch

model = GeoMatch(config).cuda()
model.load_state_dict(torch.load("geomatch_v2/final.pth", map_location="cuda"))
model.eval()

# The input tensors below must be prepared by the caller with the shapes noted.
with torch.no_grad():
    contact_map, keypoint_probs = model(
        obj_pc,               # [B, 2048, 3] object point cloud
        robot_pc,             # [B, 6, 3] robot surface points (6 keypoints)
        robot_key_point_idx,  # [B, 6] keypoint indices into robot_pc
        obj_adj,              # [B, 2048, 2048] object adjacency (sparse COO)
        robot_adj,            # [B, 6, 6] robot adjacency
        xyz_prev,             # [B, 6, 3] previous keypoint positions
    )
# contact_map: [B, 2048, 6, 1], per-object-point × per-keypoint contact probability
# keypoint_probs: [B, 2048, 5, 1], autoregressive keypoint contact probabilities
```
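
The usage snippet expects precomputed adjacency matrices. How this repository builds them is not stated here; a common choice for point-cloud GCNs is a symmetric k-nearest-neighbour graph, sketched below on a toy cloud in pure Python. `knn_adjacency` and the value of `k` are illustrative assumptions, and a real 2048-point cloud would call for a vectorised or sparse implementation.

```python
import math


def knn_adjacency(points, k):
    """Dense symmetric 0/1 adjacency linking each point to its k nearest neighbours."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            adj[i][j] = 1
            adj[j][i] = 1  # symmetrise so the graph is undirected
    return adj


# Toy cloud on a line, spaced so that nearest neighbours are unambiguous.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
adj = knn_adjacency(cloud, k=1)
for row in adj:
    print(row)
```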
### GeoMatch++ (Deprecated — kept for reproducibility)
```python
import sys

import torch

sys.path.append(".")  # run from the repo root so that config.py and models/ are importable
import config
from models.geomatch_pp import GeoMatchPP

model = GeoMatchPP(config).cuda()
model.load_state_dict(torch.load("geomatch_pp/final.pth", map_location="cuda"))
model.eval()

# The input tensors below must be prepared by the caller with the shapes noted.
with torch.no_grad():
    contact_map, keypoint_probs = model(
        obj_pc,               # [B, 2048, 3]
        robot_pc,             # [B, 6, 3]
        robot_key_point_idx,  # [B, 6]
        obj_adj,              # [B, 2048, 2048]
        robot_adj,            # [B, 6, 6]
        xyz_prev,             # [B, 6, 3]
        morph_features,       # [B, 32, 9] morphology node features
        morph_adj,            # [B, 32, 32] morphology adjacency
    )
```
Morphology graphs are pre-built per robot using `preprocess_morphology.py` → `gnn_morphology_new.pt`.
### RobotFingerPrint
Lightning checkpoints store hyperparameters alongside weights, so the model reconstructs itself
directly from the `.ckpt` file — no separate config needed:
```python
import sys

import torch

sys.path.append(".")  # run from the repo root so that robotfingerprint/ is importable
from robotfingerprint.model.grasp_network import GcsGraspModel

model = GcsGraspModel.load_from_checkpoint("robotfingerprint/fullrobots/final.ckpt")
model.eval().cuda()

with torch.no_grad():
    # input_pc: [B, N, 3] object point cloud. gt_gcs is only used to establish shape during
    # training; at inference time use model.model.predict(input_pc) for sampling-based prediction.
    pred_uv = model.model.predict(input_pc.cuda())
# pred_uv: [B, N, 2] predicted Unified Gripper Coordinate Space (u, v) per object point
```
For the full downstream pipeline (coordinate inference on held-out objects → grasp generation →
IsaacGym stability testing), see the
[RobotFingerPrint repository](https://github.com/IRVLUTD/robot-finger-print) `gcs_gdx_inf_cvae.py`
and `gcs_gdx_grasp_gen.py` scripts, using these checkpoints via `--logdir`/`--ckpt`.

---
## Repository Structure
```
geomatch_v1/                  # GeoMatch v1 checkpoints (deprecated, corrupted keypoints)
    checkpoint_epoch50.pth
    checkpoint_epoch100.pth
    checkpoint_epoch150.pth
    final.pth
geomatch_pp/                  # GeoMatch++ checkpoints (deprecated, built on v1 encoders)
    checkpoint_epoch50.pth
    checkpoint_epoch100.pth
    checkpoint_epoch140.pth
    final.pth
geomatch_v2/                  # GeoMatch v2 checkpoints (recommended)
    checkpoint_epoch50.pth
    checkpoint_epoch100.pth
    checkpoint_epoch150.pth
    final.pth
robotfingerprint/             # RobotFingerPrint (arXiv:2409.14519) paper reproduction
    fullrobots/final.ckpt
    unseen_barrett/final.ckpt
    unseen_ezgripper/final.ckpt
    unseen_shadowhand/final.ckpt
    model/
        grasp_network.py      # GcsGraspModel (Lightning module)
        modules.py            # GcsCVAE, PointNetCmapEncoder/Decoder
        loss.py               # GcsLoss (recon + annealed KLD)
models/                       # Shared GeoMatch source (v1/v2/++)
    geomatch.py
    geomatch_pp.py
    gnn.py
    mlp.py
config.py                     # Hyperparameters for GeoMatch models
generate_keypoints_json.py    # Fixed keypoint generator (used for v2 training data)
```
---
## Citation
```bibtex
@inproceedings{geomatch2024,
title = {Geometry Matching for Multi-Embodiment Grasping},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2024},
}
@article{geomatch_pp2024,
title = {GeoMatch++: Morphology-Aware Grasping via Correspondence Learning},
journal = {arXiv preprint arXiv:2412.18998},
year = {2024},
}
@article{khargonkar2024robotfingerprint,
title = {RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis},
author = {Khargonkar, Ninad and Casas, Luis Felipe and Prabhakaran, Balakrishnan and Xiang, Yu},
journal = {arXiv preprint arXiv:2409.14519},
year = {2024},
}
```
---
## License
Original GeoMatch code © 2023 DeepMind Technologies Limited, licensed under the Apache License 2.0.

The GeoMatch++ extension, v2 training, and all GeoMatch checkpoints were produced by
[Dimios45](https://huggingface.co/Dimios45) as part of the Graspmax project.

RobotFingerPrint model source (`robotfingerprint/model/`) is from the original authors'
[repository](https://github.com/IRVLUTD/robot-finger-print) (MIT-style license; see their repo for
exact terms). The checkpoints in `robotfingerprint/` were trained by
[Dimios45](https://huggingface.co/Dimios45) to reproduce the paper's published recipe.