	---
	license: apache-2.0
	tags:
	- robotics
	- grasping
	- graph-neural-network
	- point-cloud
	- contact-prediction
	- morphology
	- cross-attention
	- cvae
	- coordinate-space
	---

	# Graspmax — GeoMatch v2 · GeoMatch++ · GeoMatch v1 · RobotFingerPrint

	Graspmax contains geometry-aware contact/coordinate prediction models for dexterous robotic
	grasping, trained on the CMapDataset / GenDexGrasp dataset across 5 robot end-effectors
	(EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand).

	> ⚠️ Version notice: GeoMatch v1 and GeoMatch++ were trained with a corrupted
	> `robot_keypoints.json` (a spurious 2× scale factor and a ShadowHand axis swap applied at the wrong stage).
	> Use GeoMatch v2 for any new work. v1 and GeoMatch++ are kept for reproducibility only.

	---

	## Models at a Glance

	| Model | Status | Folder | Val loss | Val acc |
	|---|---|---|---|---|
	| GeoMatch v2 | ✅ Recommended | `geomatch_v2/` | 1.594 | 0.695 |
	| GeoMatch++ | ⚠️ Deprecated (built on v1 encoders) | `geomatch_pp/` | 0.350 | 0.940 |
	| GeoMatch v1 | ⚠️ Deprecated (corrupted keypoints) | `geomatch_v1/` | 0.435 | 0.959 |
	| RobotFingerPrint | ✅ Paper reproduction | `robotfingerprint/` | see below | — |

	> The lower loss and higher accuracy of v1 and GeoMatch++ are artefacts of training on corrupted
	> keypoints: the 2× scale inflated keypoint distances, making the contact maps geometrically
	> trivial to predict. v2 trains on correct geometry and is the only model that produces valid
	> IK targets during grasp generation.

	---

	## Architecture

	### GeoMatch (v1 and v2 share the same architecture)

	Dual GCN encoder (object + robot surface) → L2-normalised embeddings → linear projection heads
	(512→64) × 2 → 5 autoregressive MLP modules → per-keypoint BCE contact map prediction.

	> Based on: [Geometry Matching for Multi-Embodiment Grasping](https://openreview.net/forum?id=oyWkrG-LD5) (NeurIPS 2024)

	### GeoMatch++

	Extends GeoMatch with a morphology encoder (GCN over the robot kinematic-tree graph, 9D node
	features, 32 nodes) and a DCP-style cross-attention transformer that fuses object geometry with
	robot morphology before contact prediction. Pretrained GeoMatch v1 encoders are frozen.

	> Based on: [GeoMatch++: Morphology-Aware Grasping via Correspondence Learning](https://arxiv.org/abs/2412.18998)

	### RobotFingerPrint

	A conditional VAE (`GcsCVAE`) that predicts a per-point Unified Gripper Coordinate Space (UGCS)
	2D coordinate map over an object's point cloud, conditioned on the object geometry. Unlike GeoMatch's
	discrete per-keypoint contact classification, RobotFingerPrint regresses a continuous (u, v)
	coordinate for every object point, which is what allows a single trained model to transfer grasps
	across grippers with a different number of fingers without any manual re-targeting.

	- Encoder: PointNet-style per-point Conv1d stack over `(object_pc, gt_uv)` → max-pool global
	feature → linear heads to VAE latent mean/logvar (`encoder_layers_size=[5, 64, 128, 512, 512]`,
	`latent_size=128`)
	- Decoder: per-point features + global object feature + latent code → Conv1d stack
	(`decoder_decoder_layers_size=[64+512+128, 512, 64]`) → two parallel U/V prediction heads
	(`uv_layers_size=[64, 32, 1]`)
	- Loss: reconstruction (weighted L2 on predicted vs. ground-truth UV coordinates) + annealed
	KL divergence (weight increased every `ann_per_epochs` epochs following a temperature schedule)

	> Based on: [RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis](https://arxiv.org/abs/2409.14519) (Khargonkar, Casas, Prabhakaran, Xiang)

	### Component Comparison (GeoMatch family)

	| Component | GeoMatch v1 / v2 | GeoMatch++ |
	|---|---|---|
	| Object GCN encoder | 3 layers × 256 → 512, trainable | Same, frozen (from GeoMatch v1) |
	| Robot surface GCN | 3 layers × 256 → 512, trainable | Same, frozen (from GeoMatch v1) |
	| Morphology encoder | — | NEW GCN(9 → 256×3 → 512), trainable |
	| Cross-attention | — | NEW DCP transformer (512-dim, 4 heads, 1 layer) |
	| Projection heads | Linear(512→64) × 2 | Same, re-initialised |
	| AR keypoint modules | 5× MLP | Same, re-initialised |
	| Total params | ~1.9M | ~6.4M (~5.8M trainable) |

	---

	## What Changed in v2 (Keypoint Bug Fix)

	GeoMatch requires a `robot_keypoints.json` that defines canonical 3D keypoint positions for each
	robot in rest-pose world space. The v1 keypoints had two bugs:

	Bug 1 — 2× scale factor: The generation script applied `world_pos *= 2.0`, citing
	`HandModel`'s `hand_scale=2.0` class default. However, every actual call site passes
	`hand_scale=1.0`, overriding that default. Because the scale was applied before the inverse-FK
	projection that `HandModel.get_canonical_keypoints()` uses (`T⁻¹[2p;1] ≠ 2·T⁻¹[p;1]`), the
	distortion was not uniform — it grew with each link's distance from the kinematic root, corrupting
	both training labels and inference IK targets.
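
The non-uniformity can be illustrated with a toy calculation. The sketch below is not the repository's `HandModel` code: it assumes a single rigid link frame (a rotation about z plus a translation `t`) and uses hypothetical helper names. It compares scaling a world-space point before the inverse transform with scaling the result afterwards. For a rigid transform the two differ by `(k - 1) * ||t||`, so the error grows with the distance of the link origin from the root.

```python
import math

def inverse_rigid(point, angle_z, translation):
    """Map a world-space point into a link frame: R^T (p - t), with R a z-rotation."""
    dx = point[0] - translation[0]
    dy = point[1] - translation[1]
    dz = point[2] - translation[2]
    c, s = math.cos(angle_z), math.sin(angle_z)
    # Transpose of the z-rotation matrix applied to (p - t).
    return (c * dx + s * dy, -s * dx + c * dy, dz)

def scale(point, k):
    return tuple(k * v for v in point)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scale_mismatch(point, angle_z, translation, k=2.0):
    """Return || T^-1(k p) - k T^-1(p) || for the toy transform T = (R_z, t)."""
    scaled_before = inverse_rigid(scale(point, k), angle_z, translation)
    scaled_after = scale(inverse_rigid(point, angle_z, translation), k)
    return distance(scaled_before, scaled_after)

# Hypothetical numbers: the same world point seen from two link frames.
p = (0.02, 0.01, 0.15)
near_root = scale_mismatch(p, 0.3, (0.0, 0.0, 0.01))   # link origin 1 cm from the root
far_link = scale_mismatch(p, 0.3, (0.0, 0.03, 0.12))   # link origin about 12 cm away

print(f"mismatch near root: {near_root:.4f} m")
print(f"mismatch far link:  {far_link:.4f} m")
```

In this toy setting the mismatch equals the length of the link translation, which is why a single global rescaling of the keypoints cannot undo the bug.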

	Bug 2 — ShadowHand axis-swap at wrong stage: The `[x, -z, y]` axis permutation for ShadowHand
	was applied to the final world-space `world_pos` (after FK). The reference implementation
	(`gripper_utils.py`) applies it to raw mesh points in link-local space before the visual-origin
	transform. Rotation and axis permutation do not commute, so the wrong stage produced scrambled
	keypoint positions for any ShadowHand link with a non-zero visual-origin rotation.
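
A small numeric check makes the ordering issue concrete. This is a sketch under simplifying assumptions, not the code from `gripper_utils.py`: the visual-origin transform is reduced to a pure rotation about z, and the function names are illustrative.

```python
import math

def permute_shadowhand(v):
    """Apply the [x, -z, y] axis permutation described above."""
    x, y, z = v
    return (x, -z, y)

def rotate_z(v, angle):
    """Rotate v about the z-axis; stands in for a visual-origin rotation."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

local_point = (0.01, 0.02, 0.03)   # hypothetical mesh point in link-local space
angle = math.pi / 2                # hypothetical visual-origin rotation

# Reference order: permute in link-local space, then apply the rotation.
reference_order = rotate_z(permute_shadowhand(local_point), angle)
# Buggy order: apply the rotation first, then permute the result.
buggy_order = permute_shadowhand(rotate_z(local_point, angle))

print("reference:", tuple(round(c, 4) for c in reference_order))
print("buggy:    ", tuple(round(c, 4) for c in buggy_order))
```

With `angle = 0` both orders agree. Note that the permutation is itself a 90° rotation about x, so in this toy a visual-origin rotation about x would also commute with it; rotations about other axes do not.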

	Both bugs were confirmed by comparing `generate_keypoints_json.py` against `gripper_utils.py` and
	verified by observing that v1 ShadowHand tip keypoints had y-values of ~−0.84 m (outside any
	physical hand envelope) versus the corrected ~0.01 m.

	---

	## Training Details

	### GeoMatch v2 ✅ (Recommended)

	| Setting | Value |
	|---|---|
	| Dataset | CMapDataset (ContactDB + YCB), fixed keypoints |
	| End-effectors | EZGripper, Barrett, Robotiq 3-Finger, Allegro, ShadowHand |
	| Batch size | 256 |
	| Optimizer | Adam (β₁=0.9, β₂=0.99) |
	| Learning rate | 1e-4 |
	| Epochs | 200 |
	| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
	| Training time | 8.58 hours |
	| Precision | FP32 |
	| Final val loss | 1.594 |
	| Final val accuracy | 0.695 |

	#### GeoMatch v2 Training Curves

	| Epoch | Val Loss | Val Accuracy |
	|---|---|---|
	| 0 | 1.935 | 0.205 |
	| 25 | 1.731 | 0.563 |
	| 50 | 1.675 | 0.580 |
	| 100 | 1.649 | 0.632 |
	| 150 | 1.603 | 0.656 |
	| 199 | 1.594 | 0.695 |

	---

	### GeoMatch++ ⚠️ (Deprecated — built on GeoMatch v1 encoders)

	| Setting | Value |
	|---|---|
	| Initialisation | Pretrained GeoMatch v1 encoders (frozen) |
	| Trainable params | ~5.8M |
	| Batch size | 32 per GPU × 8 GPUs = 256 effective |
	| Optimizer | Adam (β₁=0.9, β₂=0.99) |
	| Learning rate | 5e-5 |
	| Epochs | 150 |
	| Hardware | 8× AMD Instinct MI300X, ROCm 6.2.4 (DDP) |
	| Training time | ~2.8 hours |
	| Precision | FP32 |
	| Final val loss | 0.350 (artefact of corrupted training data) |
	| Final val accuracy | 0.940 (artefact of corrupted training data) |

	#### GeoMatch++ Training Curves

	| Epoch | Val Loss | Val Accuracy |
	|---|---|---|
	| 0 | 0.465 | 0.999 |
	| 25 | 0.370 | 0.880 |
	| 89 | 0.362 | 0.902 |
	| 149 | 0.350 | 0.940 |

	---

	### GeoMatch v1 ⚠️ (Deprecated — corrupted keypoints)

	| Setting | Value |
	|---|---|
	| Dataset | CMapDataset (ContactDB + YCB), corrupted keypoints |
	| Batch size | 256 |
	| Optimizer | Adam (β₁=0.9, β₂=0.99) |
	| Learning rate | 1e-4 |
	| Epochs | 200 |
	| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4 |
	| Training time | 22.18 hours |
	| Precision | FP32 |
	| Final val loss | 0.435 (artefact of corrupted training data) |
	| Final val accuracy | 0.959 (artefact of corrupted training data) |

	---

	### RobotFingerPrint ✅ (Paper reproduction — 4 experiments)

	All 4 experiments use the recipe published by the paper's authors in their repository's README:
	`--n_epochs 16 --ann_temp 1.5 --ann_per_epochs 2`, plus code defaults `lr=1e-4`, `batch_size=64`,
	`lw_recon=1000.0`, `lw_kld=0.01`, `attn_alpha=3`, Adam(β₁=0.9, β₂=0.999), seed=42.

	| Setting | Value |
	|---|---|
	| Dataset | GenDexGrasp CMapDataset-sqrt_align + RobotFingerPrint UGCS coordinates |
	| Batch size | 64 |
	| Optimizer | Adam (β₁=0.9, β₂=0.999) |
	| Learning rate | 1e-4 (StepLR decay disabled — `decay_lr_freq=1000` > total epochs) |
	| Epochs | 16 |
	| KL annealing | temperature 1.5, weight increased every 2 epochs |
	| Hardware | AMD Instinct MI300X (192 GB HBM3), ROCm 6.2.4, single GPU per run |
	| Precision | FP32 |

	| Experiment | Seen grippers | Held-out (unseen) | Training time | Final val recon loss | Final val KLD | Final val overall |
	|---|---|---|---|---|---|---|
	| `fullrobots/` | ezgripper, barrett, robotiq_3finger, allegro, shadowhand | none (main paper result) | 449 s | 0.2495 | 112.57 | 268.72 |
	| `unseen_barrett/` | ezgripper, robotiq_3finger, allegro, shadowhand | barrett | 366 s | 0.2568 | 105.99 | 274.90 |
	| `unseen_ezgripper/` | barrett, robotiq_3finger, allegro, shadowhand | ezgripper | 383 s | 0.2791 | 109.36 | 297.77 |
	| `unseen_shadowhand/` | ezgripper, barrett, robotiq_3finger, allegro | shadowhand | 366 s | 0.2590 | 111.11 | 278.02 |

	> "Val recon loss" is the reconstruction term of the CVAE loss (weighted L2 between predicted and
	> ground-truth UGCS coordinates); lower is better. Each of the 3 unseen-gripper runs holds one
	> gripper out of training to test how well the learned coordinate space generalizes. The losses
	> reported here, however, are computed only on the validation split of the 4 remaining seen
	> grippers, as in the paper's ablation setup. A full cross-gripper zero-shot evaluation requires
	> the downstream grasp generation and IsaacGym stability test pipeline, which these checkpoints
	> alone do not cover.
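
As a consistency check on the table, the sketch below recomputes the "overall" column from the other two. It rests on two assumptions that this README does not state explicitly: that the overall loss is `lw_recon * recon + w * KLD`, and that the KL weight `w` starts at `lw_kld` and is multiplied by `ann_temp` once every `ann_per_epochs` epochs (giving `w = 0.01 * 1.5**7` in the last of 16 epochs). The function names are illustrative and not taken from the RobotFingerPrint code.

```python
def kld_weight(epoch, lw_kld=0.01, ann_temp=1.5, ann_per_epochs=2):
    """Assumed schedule: multiply the KL weight by ann_temp every ann_per_epochs epochs."""
    return lw_kld * ann_temp ** (epoch // ann_per_epochs)

def overall_loss(recon, kld, epoch, lw_recon=1000.0):
    """Assumed composition of the overall loss from its two reported terms."""
    return lw_recon * recon + kld_weight(epoch) * kld

# (val recon loss, val KLD, reported val overall), copied from the table above.
reported = {
    "fullrobots": (0.2495, 112.57, 268.72),
    "unseen_barrett": (0.2568, 105.99, 274.90),
    "unseen_ezgripper": (0.2791, 109.36, 297.77),
    "unseen_shadowhand": (0.2590, 111.11, 278.02),
}

FINAL_EPOCH = 15  # zero-based index of the last of 16 epochs

for name, (recon, kld, overall) in reported.items():
    estimate = overall_loss(recon, kld, FINAL_EPOCH)
    print(f"{name}: estimated {estimate:.2f}, reported {overall:.2f}")
```

Under these assumptions the estimates agree with the reported values to within 0.05, which is the precision implied by the four-digit reconstruction losses.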

	---

	## Checkpoints

	### GeoMatch v2 ✅ (Use these)

	| File | Epoch | Val Loss | Notes |
	|---|---|---|---|
	| `geomatch_v2/checkpoint_epoch50.pth` | 50 | 1.675 | Early convergence |
	| `geomatch_v2/checkpoint_epoch100.pth` | 100 | 1.649 | Mid-training |
	| `geomatch_v2/checkpoint_epoch150.pth` | 150 | 1.603 | Near-converged |
	| `geomatch_v2/final.pth` | 199 | 1.594 | Final model (recommended) |

	### GeoMatch++ ⚠️ (Deprecated)

	| File | Epoch | Notes |
	|---|---|---|
	| `geomatch_pp/checkpoint_epoch50.pth` | 50 | Early convergence |
	| `geomatch_pp/checkpoint_epoch100.pth` | 100 | Mid-training |
	| `geomatch_pp/checkpoint_epoch140.pth` | 140 | Near-converged |
	| `geomatch_pp/final.pth` | 149 | Final (deprecated) |

	### GeoMatch v1 ⚠️ (Deprecated)

	| File | Epoch | Notes |
	|---|---|---|
	| `geomatch_v1/checkpoint_epoch50.pth` | 50 | Early convergence |
	| `geomatch_v1/checkpoint_epoch100.pth` | 100 | Mid-training |
	| `geomatch_v1/checkpoint_epoch150.pth` | 150 | Near-converged |
	| `geomatch_v1/final.pth` | 200 | Final (deprecated) |

	### RobotFingerPrint ✅ (final checkpoint only per experiment)

	| File | Experiment | Notes |
	|---|---|---|
	| `robotfingerprint/fullrobots/final.ckpt` | All 5 grippers seen | Main paper result — recommended |
	| `robotfingerprint/unseen_barrett/final.ckpt` | Barrett held out | Generalization ablation |
	| `robotfingerprint/unseen_ezgripper/final.ckpt` | EZGripper held out | Generalization ablation |
	| `robotfingerprint/unseen_shadowhand/final.ckpt` | ShadowHand held out | Generalization ablation |

	> These are PyTorch Lightning checkpoints (model weights + hyperparameters, without optimizer or
	> per-epoch intermediates). Only the final training epoch of each run is published; intermediate
	> per-epoch checkpoints are not included here.

	---

	## Usage

	### GeoMatch v2 (Recommended)

	```python
	import torch, sys
	sys.path.append(".")
	import config
	from models.geomatch import GeoMatch

	model = GeoMatch(config).cuda()
	model.load_state_dict(torch.load("geomatch_v2/final.pth", map_location="cuda"))
	model.eval()

	# The input tensors below must be prepared by the caller with the shapes noted.
	with torch.no_grad():
	    contact_map, keypoint_probs = model(
	        obj_pc,               # [B, 2048, 3] object point cloud
	        robot_pc,             # [B, 6, 3] robot surface points (6 keypoints)
	        robot_key_point_idx,  # [B, 6] keypoint indices into robot_pc
	        obj_adj,              # [B, 2048, 2048] object adjacency (sparse COO)
	        robot_adj,            # [B, 6, 6] robot adjacency
	        xyz_prev,             # [B, 6, 3] previous keypoint positions
	    )
	# contact_map: [B, 2048, 6, 1], per-object-point × per-keypoint contact probability
	# keypoint_probs: [B, 2048, 5, 1], autoregressive keypoint contact probabilities
	```

	### GeoMatch++ (Deprecated — kept for reproducibility)

	```python
	import torch, sys
	sys.path.append(".")
	import config
	from models.geomatch_pp import GeoMatchPP

	model = GeoMatchPP(config).cuda()
	model.load_state_dict(torch.load("geomatch_pp/final.pth", map_location="cuda"))
	model.eval()

	# The input tensors below must be prepared by the caller with the shapes noted.
	with torch.no_grad():
	    contact_map, keypoint_probs = model(
	        obj_pc,               # [B, 2048, 3]
	        robot_pc,             # [B, 6, 3]
	        robot_key_point_idx,  # [B, 6]
	        obj_adj,              # [B, 2048, 2048]
	        robot_adj,            # [B, 6, 6]
	        xyz_prev,             # [B, 6, 3]
	        morph_features,       # [B, 32, 9] morphology node features
	        morph_adj,            # [B, 32, 32] morphology adjacency
	    )
	```

	Morphology graphs are pre-built per robot with `preprocess_morphology.py`, which produces `gnn_morphology_new.pt`.

	### RobotFingerPrint

	Lightning checkpoints store hyperparameters alongside weights, so the model reconstructs itself
	directly from the `.ckpt` file — no separate config needed:

	```python
	import sys
	import torch
	sys.path.append(".")
	from robotfingerprint.model.grasp_network import GcsGraspModel

	model = GcsGraspModel.load_from_checkpoint("robotfingerprint/fullrobots/final.ckpt")
	model.eval().cuda()

	with torch.no_grad():
	    # input_pc: [B, N, 3] object point cloud, prepared by the caller. gt_gcs is only used to
	    # establish shape during training; at inference time use model.model.predict(input_pc)
	    # for sampling-based prediction.
	    pred_uv = model.model.predict(input_pc.cuda())
	# pred_uv: [B, N, 2] predicted Unified Gripper Coordinate Space (u, v) per object point
	```

	For the full downstream pipeline (coordinate inference on held-out objects → grasp generation →
	IsaacGym stability testing), see the
	[RobotFingerPrint repository](https://github.com/IRVLUTD/robot-finger-print) `gcs_gdx_inf_cvae.py`
	and `gcs_gdx_grasp_gen.py` scripts, using these checkpoints via `--logdir`/`--ckpt`.

	---

	## Repository Structure

	```
	geomatch_v1/                    # GeoMatch v1 checkpoints (deprecated, corrupted keypoints)
	    checkpoint_epoch50.pth
	    checkpoint_epoch100.pth
	    checkpoint_epoch150.pth
	    final.pth
	geomatch_pp/                    # GeoMatch++ checkpoints (deprecated, built on v1 encoders)
	    checkpoint_epoch50.pth
	    checkpoint_epoch100.pth
	    checkpoint_epoch140.pth
	    final.pth
	geomatch_v2/                    # GeoMatch v2 checkpoints (recommended)
	    checkpoint_epoch50.pth
	    checkpoint_epoch100.pth
	    checkpoint_epoch150.pth
	    final.pth
	robotfingerprint/               # RobotFingerPrint (arXiv:2409.14519) paper reproduction
	    fullrobots/final.ckpt
	    unseen_barrett/final.ckpt
	    unseen_ezgripper/final.ckpt
	    unseen_shadowhand/final.ckpt
	    model/
	        grasp_network.py        # GcsGraspModel (Lightning module)
	        modules.py              # GcsCVAE, PointNetCmapEncoder/Decoder
	        loss.py                 # GcsLoss (recon + annealed KLD)
	models/                         # Shared GeoMatch source (v1/v2/++)
	    geomatch.py
	    geomatch_pp.py
	    gnn.py
	    mlp.py
	config.py                       # Hyperparameters for GeoMatch models
	generate_keypoints_json.py      # Fixed keypoint generator (used for v2 training data)
	```
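
After cloning or downloading the repository, a short script can confirm that the checkpoint files listed above are in place. This is an illustrative helper, not part of the repository; the function name `missing_checkpoints` is hypothetical, and the paths are copied from the tree above.

```python
from pathlib import Path

# Checkpoint files per folder, as listed in the repository structure.
CHECKPOINTS = {
    "geomatch_v1": ["checkpoint_epoch50.pth", "checkpoint_epoch100.pth",
                    "checkpoint_epoch150.pth", "final.pth"],
    "geomatch_pp": ["checkpoint_epoch50.pth", "checkpoint_epoch100.pth",
                    "checkpoint_epoch140.pth", "final.pth"],
    "geomatch_v2": ["checkpoint_epoch50.pth", "checkpoint_epoch100.pth",
                    "checkpoint_epoch150.pth", "final.pth"],
    "robotfingerprint": ["fullrobots/final.ckpt", "unseen_barrett/final.ckpt",
                         "unseen_ezgripper/final.ckpt", "unseen_shadowhand/final.ckpt"],
}

def missing_checkpoints(root):
    """Return the expected checkpoint paths that are absent under `root`."""
    root = Path(root)
    return [
        f"{folder}/{name}"
        for folder, names in CHECKPOINTS.items()
        for name in names
        if not (root / folder / name).is_file()
    ]

if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print("missing checkpoints:")
        for path in missing:
            print("  " + path)
    else:
        print("all checkpoints present")
```

Run it from the repository root; an empty result means every file in the tree above was found.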

	---

	## Citation

	```bibtex
	@inproceedings{geomatch2024,
	title = {Geometry Matching for Multi-Embodiment Grasping},
	booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
	year = {2024},
	}

	@article{geomatch_pp2024,
	title = {GeoMatch++: Morphology-Aware Grasping via Correspondence Learning},
	journal = {arXiv preprint arXiv:2412.18998},
	year = {2024},
	}

	@article{khargonkar2024robotfingerprint,
	title = {RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis},
	author = {Khargonkar, Ninad and Casas, Luis Felipe and Prabhakaran, Balakrishnan and Xiang, Yu},
	journal = {arXiv preprint arXiv:2409.14519},
	year = {2024},
	}
	```

	---

	## License

	Original GeoMatch code © 2023 DeepMind Technologies Limited, licensed under the Apache License 2.0.
	GeoMatch++ extension, v2 training, and all GeoMatch checkpoints produced by
	[Dimios45](https://huggingface.co/Dimios45) as part of the Graspmax project.

	RobotFingerPrint model source (`robotfingerprint/model/`) is from the original authors'
	[repository](https://github.com/IRVLUTD/robot-finger-print) (MIT-style license, see their repo for
	exact terms); checkpoints in `robotfingerprint/` were trained by
	[Dimios45](https://huggingface.co/Dimios45) reproducing the paper's published recipe.