v2t Unified SSVTP Diffusion

Trained vision-to-tactile (v2t) unified generator checkpoint from this paper's training pipeline.

Source

This artifact comes from the repository https://github.com/howard-lynn-ye/vr-teleop-isaacsim. Paper draft: https://github.com/howard-lynn-ye/vr-teleop-isaacsim/blob/main/paper/main.tex.

Pinned source commit for the uploaded training code context: 8084ec32d37f07d57bd71e424d6144aae9e4f8b9.

Contents

  • best_unified.pt: best EMA unified generator checkpoint.
  • final_unified.pt: final unified generator checkpoint.
  • train.log: training log.
  • val.jsonl: validation trace.
  • eval/: generated tactile evaluation report and sample grid.
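The val.jsonl trace can be scanned with standard JSONL tooling. A minimal sketch, assuming each line is a JSON record with "epoch" and "val_loss" fields (these field names are assumptions, not confirmed by this repository; adjust them to the actual schema):

```python
import json

def best_epoch(jsonl_text):
    # Parse one JSON record per non-empty line and return the epoch with
    # the lowest validation loss. Field names are assumed, not verified.
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    best = min(records, key=lambda r: r["val_loss"])
    return best["epoch"], best["val_loss"]

# Illustrative trace using the loss range reported in the caveats below.
sample = "\n".join([
    '{"epoch": 1, "val_loss": 1.108}',
    '{"epoch": 2, "val_loss": 0.41}',
    '{"epoch": 3, "val_loss": 0.0283}',
])
print(best_epoch(sample))  # -> (3, 0.0283)
```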

The duplicate samples/ PNG directory is intentionally omitted here because it is already tracked in recent commits of the main GitHub repository.

Reproducibility

  • Source dataset line: SSVTP V+T pairs from mlfu7/Touch-Vision-Language-Dataset.
  • Canonical Phase 1 split reference: DATA_SEED=42.
  • Training script in the main repo: https://github.com/howard-lynn-ye/vr-teleop-isaacsim/blob/8084ec32d37f07d57bd71e424d6144aae9e4f8b9/path_d/code/train/train_v2t_unified.py.
  • Eval report in this repo records 46 SSVTP test pairs, DINOv2 encoders, and the caption-proxy probe.
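The canonical DATA_SEED=42 split can be reproduced with a seeded deterministic shuffle. A minimal sketch, not the training script's actual implementation: the 80/10/10 ratio and the 460-pair total are illustrative assumptions (chosen so the test split comes out to the 46 pairs the eval report records):

```python
import random

DATA_SEED = 42  # canonical Phase 1 split seed from the repo

def split_indices(n_pairs, train_frac=0.8, val_frac=0.1):
    # Deterministically shuffle pair indices with a dedicated RNG so the
    # split is reproducible regardless of global random state.
    idx = list(range(n_pairs))
    random.Random(DATA_SEED).shuffle(idx)
    n_train = int(n_pairs * train_frac)
    n_val = int(n_pairs * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(460)
print(len(train), len(val), len(test))  # -> 368 46 46
```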

Critical Caveats

  • Validation loss improved from about 1.108 to 0.0283, but qualitative inspection showed visual mode collapse: smooth global tactile averages without contact-marker or gel texture structure.
  • The eval artifact marks Good gen-T: False; generated tactile features underperform the vision-only caption-proxy probe.
  • This is not deployment-ready. Treat it as scaffolding for future TVL + SSVTP cross-dataset retraining and sensor-unified pretraining.
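The collapse described above (a low loss alongside smooth, structureless outputs) is exactly the case a scalar loss misses, so qualitative checks matter. A minimal smell-test sketch, assuming generated tactile frames are available as nested lists of pixel intensities; thresholds and representation are illustrative, not part of the repo's eval:

```python
import statistics

def spatial_std(image):
    # Per-image spatial standard deviation: a collapsed "global average"
    # output is nearly uniform, so its spatial std is close to zero,
    # while real gel frames with contact markers have visible variance.
    pixels = [p for row in image for p in row]
    return statistics.pstdev(pixels)

collapsed = [[0.5, 0.5], [0.5, 0.5]]  # smooth global average (toy example)
textured = [[0.1, 0.9], [0.8, 0.2]]   # marker-like structure (toy example)
print(spatial_std(collapsed) < spatial_std(textured))  # -> True
```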
