OpenTouch VT2P Encoder

This repository contains the OpenTouch native retrieval encoder trained for the vt2p task:

Task: visual + tactile <-> pose
Model config: OpenTouch-DINOv3-B16-AllModalities
Sequence length: 20
Stride: 10
Training data: OpenTouch official retrieval HF dataset, converted locally to datasets/opentouch_official_retrieval_hf
Initialization: warm-started from LeoJiangOR/opentouch-vp2t-encoder-best (local VP2T checkpoint epoch_280.pt)
Released checkpoint: epoch_300.pt
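Sequence length 20 and stride 10 suggest that each sample is a 20-frame window and that consecutive windows overlap by 10 frames. The sketch below shows that windowing. It assumes the frames are already time-aligned across modalities and that incomplete trailing windows are dropped rather than padded; the card does not say how the pipeline handles them. `sliding_windows` is a hypothetical helper, not part of OpenTouch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], seq_len: int = 20, stride: int = 10) -> List[List[T]]:
    """Split a time-aligned frame sequence into fixed-length windows.

    Hypothetical helper. Assumption: a trailing window shorter than
    seq_len is dropped, not padded.
    """
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    windows = []
    # Window starts at 0, stride, 2*stride, ... while a full window still fits.
    for start in range(0, len(frames) - seq_len + 1, stride):
        windows.append(list(frames[start:start + seq_len]))
    return windows


# A 50-frame clip yields four overlapping windows.
clip = list(range(50))
print([w[0] for w in sliding_windows(clip)])  # [0, 10, 20, 30]
```

With stride equal to half the sequence length, every frame after the first 10 appears in two windows. This doubles the number of training samples per clip compared with non-overlapping windows.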

Validation metrics peaked at epoch 295. Checkpoints were saved only every 10 epochs, so no epoch-295 checkpoint exists; the released checkpoint is the final one, saved at epoch 300.
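The best epoch can be located in results/results.jsonl, which holds the full validation history. The sketch below assumes that each line is a JSON object with an epoch number and one value per metric. The key names used here (`epoch`, `vt2p_mAP`) are hypothetical and should be replaced with whatever the file actually contains. The sample rows reuse the mAP values reported in the tables below.

```python
import json
from typing import Iterable, Optional, Tuple


def best_epoch(lines: Iterable[str], metric: str, epoch_key: str = "epoch") -> Tuple[int, float]:
    """Return (epoch, value) for the row with the highest `metric`.

    Assumption: one JSON object per non-empty line; key names are hypothetical.
    """
    best: Optional[Tuple[int, float]] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if metric not in row:
            continue  # skip rows that do not report this metric
        value = float(row[metric])
        if best is None or value > best[1]:
            best = (int(row[epoch_key]), value)
    if best is None:
        raise ValueError(f"no rows contain {metric!r}")
    return best


# Illustrative rows; for the real file use:
#   with open("results/results.jsonl") as f: best_epoch(f, "<metric key>")
history = [
    '{"epoch": 295, "vt2p_mAP": 0.1164}',
    '{"epoch": 300, "vt2p_mAP": 0.1137}',
]
print(best_epoch(history, "vt2p_mAP"))  # (295, 0.1164)
```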

Metrics

First full-validation evaluation, epoch 5:

Direction	R@1	R@5	R@10	mAP
visual+tactile -> pose	0.0097	0.0456	0.0794	0.0375
pose -> visual+tactile	0.0094	0.0429	0.0690	0.0343

Best observed full-validation evaluation, epoch 295:

Direction	R@1	R@5	R@10	mAP
visual+tactile -> pose	0.0466	0.1722	0.2553	0.1164
pose -> visual+tactile	0.0476	0.1601	0.2405	0.1131

Final saved checkpoint, epoch 300:

Direction	R@1	R@5	R@10	mAP
visual+tactile -> pose	0.0469	0.1648	0.2506	0.1137
pose -> visual+tactile	0.0446	0.1554	0.2439	0.1089
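The card does not define how R@K and mAP are computed. The sketch below shows one common definition for a square similarity matrix. It assumes that query i's only correct match is gallery item i. With a single positive, average precision equals 1 / rank, so mAP reduces to mean reciprocal rank. If the real evaluation counts several windows as positives, the numbers will differ. Rows are visual+tactile queries for the "visual+tactile -> pose" direction; the transposed matrix gives "pose -> visual+tactile".

```python
from typing import Dict, Sequence


def retrieval_metrics(sim: Sequence[Sequence[float]], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """R@K and mAP for a square query x gallery similarity matrix.

    Assumption: one positive per query, located on the diagonal.
    Ties are resolved in the query's favour (only strictly higher scores
    push the true match down).
    """
    n = len(sim)
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    for i, row in enumerate(sim):
        # Rank of the true match = 1 + number of other items scored strictly higher.
        rank = 1 + sum(1 for j, s in enumerate(row) if j != i and s > row[i])
        for k in ks:
            if rank <= k:
                hits[k] += 1
        ap_sum += 1.0 / rank  # AP with a single positive
    metrics = {f"R@{k}": hits[k] / n for k in ks}
    metrics["mAP"] = ap_sum / n
    return metrics


# Toy 3x3 example: rows = visual+tactile queries, columns = pose gallery.
sim = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.1], [0.2, 0.7, 0.6]]
vt_to_pose = retrieval_metrics(sim, ks=(1, 2))
pose_to_vt = retrieval_metrics([list(col) for col in zip(*sim)], ks=(1, 2))
print({k: round(v, 3) for k, v in vt_to_pose.items()})  # {'R@1': 0.333, 'R@2': 1.0, 'mAP': 0.667}
```

Because the two directions rank over different galleries, their scores need not match. This is consistent with the small asymmetries between the two rows of each table.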

Files

epoch_300.pt: released final checkpoint
config/OpenTouch-DINOv3-B16-AllModalities.json: model config
results/results.jsonl: full validation history
params.txt: training hyperparameters
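The internal layout of epoch_300.pt is not documented here. Training-loop checkpoints often store the weights under a `state_dict` key alongside the epoch and optimizer state. Under DistributedDataParallel, the keys also carry a `module.` prefix. The sketch below extracts the weights under those two assumptions. It is plain Python, so it can be checked without PyTorch; the key names in the demo are hypothetical.

```python
from typing import Any, Dict, Mapping


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Return model weights from a loaded training checkpoint.

    Assumptions: weights live under "state_dict" or at the top level, and
    keys may start with the "module." prefix added by DistributedDataParallel.
    """
    weights = checkpoint.get("state_dict", checkpoint)
    prefix = "module."
    # Strip the DDP wrapper prefix so keys match an unwrapped model.
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in weights.items()}


# Hypothetical checkpoint contents, for illustration only.
demo = {"epoch": 300, "state_dict": {"module.visual.proj": 1, "logit_scale": 2}}
print(list(extract_state_dict(demo)))  # ['visual.proj', 'logit_scale']
```

With PyTorch installed, the real file would be loaded with `torch.load("epoch_300.pt", map_location="cpu")` before calling the helper. Depending on the PyTorch version and what the checkpoint stores, `weights_only=False` may be required. The result would then be passed to `load_state_dict` on a model built from config/OpenTouch-DINOv3-B16-AllModalities.json.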
