OpenTouch VT2P Encoder

This repository contains the OpenTouch native retrieval encoder trained for the vt2p task:

  • Task: visual + tactile <-> pose
  • Model config: OpenTouch-DINOv3-B16-AllModalities
  • Sequence length: 20
  • Stride: 10
  • Training data: the official OpenTouch retrieval HF dataset, converted locally and stored at datasets/opentouch_official_retrieval_hf
  • Initialization: warm-started from LeoJiangOR/opentouch-vp2t-encoder-best (local VP2T checkpoint epoch_280.pt)
  • Released checkpoint: epoch_300.pt
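
The sequence length and stride above suggest sliding-window sampling over each recording. The repository's data loader is not shown here, so the following is only a minimal sketch under the assumption that windows are contiguous runs of 20 frames starting every 10 frames and that incomplete trailing windows are dropped. The function name window_starts is hypothetical.

```python
def window_starts(num_frames, seq_len=20, stride=10):
    """Return the start index of every full window of seq_len frames.

    Assumption (not confirmed by the repository): windows are contiguous,
    start every `stride` frames, and partial trailing windows are dropped.
    """
    if num_frames < seq_len:
        return []
    return list(range(0, num_frames - seq_len + 1, stride))


# A 55-frame recording yields four overlapping windows.
starts = window_starts(55)
print(starts)  # [0, 10, 20, 30]
```

With a stride of half the sequence length, consecutive windows share 10 frames, so most frames appear in two windows.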

The best validation metrics in the run were observed at epoch 295. Because checkpoints were saved only every 10 epochs, no checkpoint exists for that epoch, and the released checkpoint is the final saved one, epoch 300.

Metrics

First full-validation evaluation, epoch 5:

Direction                 R@1     R@5     R@10    mAP
visual+tactile -> pose    0.0097  0.0456  0.0794  0.0375
pose -> visual+tactile    0.0094  0.0429  0.0690  0.0343

Best observed validation, epoch 295:

Direction                 R@1     R@5     R@10    mAP
visual+tactile -> pose    0.0466  0.1722  0.2553  0.1164
pose -> visual+tactile    0.0476  0.1601  0.2405  0.1131

Final saved checkpoint, epoch 300:

Direction                 R@1     R@5     R@10    mAP
visual+tactile -> pose    0.0469  0.1648  0.2506  0.1137
pose -> visual+tactile    0.0446  0.1554  0.2439  0.1089
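
For readers unfamiliar with the columns, R@K and mAP can be sketched as follows. This is not the repository's evaluation code, which is not shown here. It is the usual definition under the assumption that each query has exactly one correct candidate (the one at the same index), in which case average precision reduces to the reciprocal rank of that candidate. The function name retrieval_metrics and the toy similarity matrix are illustrative only.

```python
def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K and mAP from a square similarity matrix.

    sim[i][j] is the similarity of query i to candidate j.
    Assumption: candidate i is the single correct match for query i.
    """
    n = len(sim)
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of candidates scoring strictly higher than the match.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    metrics = {f"R@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # With one relevant item per query, average precision is 1 / rank.
    metrics["mAP"] = sum(1.0 / r for r in ranks) / n
    return metrics


# Toy example: queries 0 and 2 rank their match first, query 1 ranks it third.
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.1, 0.2, 0.7],
]
m = retrieval_metrics(sim, ks=(1, 2))
print(m)
```

Under this reading, a reported R@1 of 0.0469 would mean the correct item is ranked first for about 4.7% of queries.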

Files

  • epoch_300.pt: released final checkpoint
  • config/OpenTouch-DINOv3-B16-AllModalities.json: model config
  • results/results.jsonl: full validation history
  • params.txt: training hyperparameters
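
The validation history can be scanned to find the best epoch for a given metric. The schema of results/results.jsonl is not documented here, so the field names "epoch" and "mAP" below are hypothetical and the sample lines are built from the visual+tactile -> pose mAP values reported above. Check the real keys before adapting this sketch.

```python
import json


def best_record(lines, key):
    """Return the JSONL record with the highest value for `key`.

    `lines` is any iterable of JSON strings, e.g. an open file handle.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    return max(records, key=lambda r: r[key])


# Hypothetical schema; values taken from the Metrics section of this card.
sample = [
    '{"epoch": 5, "mAP": 0.0375}',
    '{"epoch": 295, "mAP": 0.1164}',
    '{"epoch": 300, "mAP": 0.1137}',
]
best = best_record(sample, "mAP")
print(best["epoch"])  # 295
```

To run it on the real history, pass an open file handle for results/results.jsonl instead of the sample list, using whatever metric key the file actually contains.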