---
license: mit
tags:
- sonar
- underwater-robotics
- self-supervised-learning
- vision-transformer
- sim-to-real
- synthetic-data
- ijepa
- dino
- side-scan-sonar
- autonomous-underwater-vehicle
- cvpr2026
- macvi
datasets:
- bashakamal/ps3-simulator
metrics:
- accuracy
- f1
model-index:
- name: I-JEPA ViT-S/16 PS3
  results:
  - task:
      type: image-classification
      name: Sonar Object Classification
    dataset:
      name: KLSG + SCTD (Real Sonar)
      type: real-sonar
    metrics:
    - type: accuracy
      value: 70.9
    - type: f1
      value: 0.733
---

# PS3 Simulator: Pretrained Sonar Weights

Pretrained ViT-S/16 weights for underwater sonar object classification, trained exclusively on **synthetic PS3 Simulator data**. No real sonar data is used at any training stage.

Accepted at **MaCVi Workshop @ CVPR 2026**.

---

## What is this?

Real sonar data is expensive to collect, access-restricted, and often classified, which makes it very difficult to train models for underwater object recognition. PS3 Simulator addresses this by generating physics-parametrised synthetic side-scan sonar (SSS) images in Blender, then pretraining I-JEPA, a structure-aware self-supervised learning framework, purely on that synthetic data. These weights are the result of that training.
---

## Available Weights

| File | Model | Pretraining data | Epochs |
|------|-------|------------------|--------|
| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 Synthetic | 200 |
| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 Synthetic | 200 |

---

## Results on Real Sonar

Train: synthetic PS3 only. Test: 876 real KLSG/SCTD images.

| Model | Acc (%) | ± Std | F1 |
|-------|---------|-------|-----|
| Random Init | 23.0 | ±0.9 | 0.239 |
| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
| **I-JEPA PS3 (this repo)** | **70.9** | **±4.5** | **0.733** |
| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |

† Upper bound: larger backbone (ViT-H), not directly comparable.

---

## Usage

### Download weights

```python
from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")
ckpt = torch.load(path, map_location='cpu')
print(ckpt.keys())  # ['target_encoder', 'encoder', 'predictor', ...]
```

### Load I-JEPA backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# ViT-S/16 feature extractor: num_classes=0 removes the classification head
backbone = timm.create_model(
    'vit_small_patch16_224', pretrained=False, num_classes=0)
ckpt = torch.load(path, map_location='cpu')

# Use target_encoder (EMA stable encoder); strip DDP 'module.' prefixes
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# I-JEPA's ViT has no [CLS] token, so its pos_embed is [1,196,384]
# (14x14 patches), while timm's ViT expects [1,197,384].
# Prepend timm's own CLS position embedding to match the shapes.
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :].detach()
    sd['pos_embed'] = torch.cat(
        [cls_pe, sd['pos_embed']], dim=1)

# strict=False because cls_token is not in the checkpoint and keeps timm's
# initialisation. timm pools this [CLS] token by default (global_pool='token'),
# so the pooled feature comes from a token that I-JEPA did not pretrain.
msg = backbone.load_state_dict(sd, strict=False)
print('Missing keys:', msg.missing_keys)  # check nothing else was skipped
backbone.eval()
print('I-JEPA PS3 backbone loaded.')
```

### Load DINO backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# ViT-S/16 feature extractor: num_classes=0 removes the classification head
backbone = timm.create_model(
    'vit_small_patch16_224', pretrained=False, num_classes=0)
# weights_only=False: DINO checkpoints can also pickle non-tensor objects
# (e.g. training args), which PyTorch >= 2.6 rejects by default.
# Only do this for files you trust.
ckpt = torch.load(path, map_location='cpu', weights_only=False)

# Strip the DDP 'module.' and DINO wrapper 'backbone.' prefixes; the DINO
# projection-head ('head.*') keys are skipped by strict=False
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}
msg = backbone.load_state_dict(sd, strict=False)
print('Missing keys:', msg.missing_keys)
backbone.eval()
print('DINO PS3 backbone loaded.')
```

### Full evaluation pipeline

```bash
# Clone the repo and install dependencies
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt

# Then open and run the notebook:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
```

---

## Training Details

### I-JEPA Pretraining

```
Model    : ViT-S/16
Data     : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs   : 200
Optimizer: AdamW
Config   : configs/sonar_vits16.yaml
```

### Fine-tuning Protocol

```
Backbone : I-JEPA pretrained ViT-S/16
Head     : LayerNorm → Linear(384,256) → GELU → Dropout(0.1) → Linear(256,2)
lr       : 1e-4
Epochs   : 100 (early stopping, patience=20)
Data     : Labeled synthetic PS3 images
Test     : Real KLSG/SCTD sonar (never seen during training)
```

---

## PS3 Simulator Dataset

Physics-parametrised synthetic SSS dataset:

```
Images   : 1,008
Classes  : Ship, Plane
Altitude : 50m, 70m, 100m
Seabed   : Sand, Gravel
Angles   : Varied grazing angles
Metadata : Per-image JSON with physical params
```

Dataset: coming soon  
GitHub: [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)

---

## Citation

```bibtex
@inproceedings{basha2026ps3,
  title     = {PS3 Simulator: Physics-Parametrised Synthetic Sonar for Self-Supervised Sim-to-Real Transfer},
  author    = {Basha, Kamal S and Nambiar, Athira},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (MaCVi)},
  year      = {2026}
}
```

---

## Acknowledgements

- [I-JEPA](https://github.com/facebookresearch/ijepa): Facebook Research
- [DINO](https://github.com/facebookresearch/dino): Facebook Research
- [timm](https://github.com/huggingface/pytorch-image-models): Hugging Face
- [Blender MCP](https://github.com/ahujasid/blender-mcp): 3D generation
- [SeabedObjects-KLSG](https://github.com/mvaldenegro/marine-debris-fls-datasets)

---

## License

MIT License
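
The classifier head listed in the Fine-tuning Protocol above can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' training code: `build_head`, `SonarClassifier`, and `dummy_backbone` are hypothetical names, and a tiny stand-in backbone replaces the ViT-S/16 so the example runs without downloading weights. In practice the backbone would be one of the timm models loaded in the Usage section, which return 384-d pooled features when created with `num_classes=0`.

```python
import torch
import torch.nn as nn


def build_head(embed_dim: int = 384, hidden: int = 256,
               num_classes: int = 2, dropout: float = 0.1) -> nn.Sequential:
    """Hypothetical helper mirroring the listed head:
    LayerNorm -> Linear(384,256) -> GELU -> Dropout(0.1) -> Linear(256,2)."""
    return nn.Sequential(
        nn.LayerNorm(embed_dim),
        nn.Linear(embed_dim, hidden),
        nn.GELU(),
        nn.Dropout(dropout),
        nn.Linear(hidden, num_classes),
    )


class SonarClassifier(nn.Module):
    """Hypothetical wrapper: a backbone that returns (B, 384) pooled
    features, followed by the classification head."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 384):
        super().__init__()
        self.backbone = backbone
        self.head = build_head(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


# Stand-in backbone (assumption, for illustration only): flattens a tiny
# 3x8x8 input and projects it to 384-d "features" in place of the ViT-S/16.
dummy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 384))
model = SonarClassifier(dummy_backbone)

model.eval()  # disables dropout for inference
with torch.no_grad():
    logits = model(torch.randn(4, 3, 8, 8))
print(logits.shape)  # torch.Size([4, 2])
```

Swapping `dummy_backbone` for a loaded `backbone` (and feeding 224×224 inputs) yields two logits per image, one per class (Ship, Plane). The protocol only specifies lr=1e-4, up to 100 epochs, and early stopping with patience=20, so the optimizer, loss, and training loop are deliberately left out of this sketch.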