---
license: mit
tags:
- sonar
- underwater-robotics
- self-supervised-learning
- vision-transformer
- sim-to-real
- synthetic-data
- ijepa
- dino
- side-scan-sonar
- autonomous-underwater-vehicle
- cvpr2026
- macvi
datasets:
- bashakamal/ps3-simulator
metrics:
- accuracy
- f1
model-index:
- name: I-JEPA ViT-S/16 PS3
  results:
  - task:
      type: image-classification
      name: Sonar Object Classification
    dataset:
      name: KLSG + SCTD (Real Sonar)
      type: real-sonar
    metrics:
    - type: accuracy
      value: 70.9
    - type: f1
      value: 0.733
---

# PS3 Simulator: Pretrained Sonar Weights

Pretrained ViT-S/16 weights for underwater sonar object classification, trained exclusively on **synthetic PS3 Simulator data**. No real sonar data is used at any training stage; real sonar images are used only for testing.

Accepted at **MaCVi Workshop @ CVPR 2026**.

---

## What is this?

Real sonar data is expensive to collect and is often restricted or classified, which makes it difficult to train models for underwater object recognition.

PS3 Simulator addresses this by generating physics-parametrised synthetic Side-Scan Sonar (SSS) images in Blender and then pretraining I-JEPA, a structure-aware self-supervised learning framework, purely on that synthetic data.

These weights are the resulting checkpoints, together with a DINO ViT-S/16 baseline pretrained on the same synthetic data.

---

## Available Weights

| File | Model | Pretraining data | Pretraining epochs |
|------|-------|------------------|--------------------|
| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 synthetic | 200 |
| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 synthetic | 200 |

---

## Results on Real Sonar

Fine-tuned on labeled synthetic PS3 images only; tested on 876 real KLSG/SCTD images.

| Model | Acc (%) | ±Std | F1 |
|-------|---------|------|-----|
| Random Init | 23.0 | ±0.9 | 0.239 |
| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
| **I-JEPA PS3 (this repo)** | **70.9** | **±4.5** | **0.733** |
| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |

† Upper bound only: ViT-H is a much larger backbone than ViT-S/16, so this row is not directly comparable.
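
The card does not state how F1 is averaged or what the ± spread is computed over. As an illustration only, the sketch below assumes macro-averaged F1 over the two classes (Ship, Plane) and a sample standard deviation across repeated fine-tuning runs; the helper names and label encoding are hypothetical, and this is not the evaluation code that produced the table.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Percentage of predictions equal to the ground-truth label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def summarize(run_accuracies):
    """Format accuracies from repeated runs as 'mean ±sample std'."""
    return f"{mean(run_accuracies):.1f} ±{stdev(run_accuracies):.1f}"


# Toy labels: 0 = Ship, 1 = Plane (hypothetical encoding)
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))            # 60.0
print(round(macro_f1(y_true, y_pred), 3))  # 0.583
print(summarize([70.0, 72.0, 74.0]))       # 72.0 ±2.0
```

Per-class F1 is used instead of plain accuracy because a real test set may be class-imbalanced, where accuracy alone can look better than the per-class behaviour.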

---

## Usage

### Download weights

```python
from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

ckpt = torch.load(path, map_location='cpu')
print(list(ckpt.keys()))
# includes 'encoder', 'predictor', 'target_encoder', ...
```

### Load I-JEPA backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# Build ViT-S/16 without a classifier (num_classes=0 returns pooled features)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Use target_encoder (the EMA encoder, more stable than 'encoder')
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# I-JEPA's ViT has no CLS token, so its pos_embed is [1, 196, 384], while
# timm's is [1, 197, 384]. Prepend timm's own CLS slot to match shapes.
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :].detach()
    sd['pos_embed'] = torch.cat([cls_pe, sd['pos_embed']], dim=1)

# strict=False because cls_token is not in the checkpoint. It stays randomly
# initialised (as does its pos_embed slot) and is learned during fine-tuning.
msg = backbone.load_state_dict(sd, strict=False)
print(msg)  # expected: missing_keys=['cls_token'], unexpected_keys=[]
backbone.eval()
print('I-JEPA PS3 backbone loaded.')
```
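
The 196 vs 197 mismatch handled above comes from the patch grid: a 224×224 image split into 16×16 patches gives 14 × 14 = 196 patch tokens, and timm's ViT adds one CLS token. A small helper (the name `num_tokens` is hypothetical) makes the arithmetic explicit:

```python
def num_tokens(img_size: int, patch_size: int, cls_token: bool) -> int:
    """Number of positional-embedding slots for a square ViT input."""
    if img_size % patch_size:
        raise ValueError("image size must be divisible by patch size")
    grid = img_size // patch_size  # patches per side
    return grid * grid + (1 if cls_token else 0)


print(num_tokens(224, 16, cls_token=False))  # I-JEPA ViT-S/16: 196
print(num_tokens(224, 16, cls_token=True))   # timm vit_small_patch16_224: 197
```

By the same arithmetic, a 256×256 input would need 16 × 16 = 256 patch slots, so other input sizes require interpolating `pos_embed` rather than only prepending a CLS slot.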
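
### Load DINO backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# Build ViT-S/16 without a classifier (num_classes=0 returns pooled features)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

# DINO checkpoints typically also store the training args (an
# argparse.Namespace), which torch>=2.6 rejects under its default
# weights_only=True. Only disable it for files you trust.
ckpt = torch.load(path, map_location='cpu', weights_only=False)

# Strip the DDP ('module.') and wrapper ('backbone.') prefixes, then drop
# the DINO projection head, which the timm backbone does not have.
# (Standard DINO checkpoints also store 'teacher' weights; DINO's own
# evaluation scripts default to those.)
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}
sd = {k: v for k, v in sd.items() if not k.startswith('head.')}

msg = backbone.load_state_dict(sd, strict=False)
print(msg)  # expected: <All keys matched successfully>
backbone.eval()
print('DINO PS3 backbone loaded.')
```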

### Full evaluation pipeline

```bash
# Clone the repo and install dependencies
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt

# Then open and run:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
```

---

## Training Details

### I-JEPA Pretraining

```
Model    : ViT-S/16
Data     : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs   : 200
Optimizer: AdamW
Config   : configs/sonar_vits16.yaml
```
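
The `target_encoder` used for downstream loading is an exponential moving average (EMA) of the online `encoder`: after each optimizer step its weights are blended toward the encoder's instead of being trained by gradients. A minimal PyTorch sketch of that update follows. The helper names `momentum_at` and `ema_update` are hypothetical, and the linear momentum ramp from 0.996 to 1.0 is the schedule used in the public I-JEPA configs, assumed here rather than read from `configs/sonar_vits16.yaml`.

```python
import copy

import torch
import torch.nn as nn


def momentum_at(step: int, total_steps: int,
                start: float = 0.996, end: float = 1.0) -> float:
    """Linear EMA momentum ramp from `start` to `end` (assumed schedule)."""
    return start + (end - start) * step / total_steps


@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, m: float) -> None:
    """In place: target = m * target + (1 - m) * online, per parameter."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)


# Toy example: the target starts as a frozen copy of the online encoder.
encoder = nn.Linear(4, 4)
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    encoder.weight.add_(1.0)  # stand-in for an optimizer step on the encoder

m = momentum_at(step=0, total_steps=1000)  # 0.996 at the start of training
ema_update(target_encoder, encoder, m)
# The target weights moved only (1 - m) of the way toward the updated encoder.
```

With m close to 1, the target changes slowly and averages over many encoder states, which is the sense in which it is the more stable encoder to export.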

### Fine-tuning Protocol

```
Backbone : I-JEPA pretrained ViT-S/16
Head     : LayerNorm → Linear(384,256) → GELU
           → Dropout(0.1) → Linear(256,2)
lr       : 1e-4
Epochs   : 100 (early stopping, patience=20)
Data     : Labeled synthetic PS3 images
Test     : Real KLSG/SCTD sonar (never seen during training)
```
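
The head and early-stopping rule can be sketched in PyTorch as follows. Layer sizes, dropout, and patience are taken from the protocol above; the helper names `make_head` and `EarlyStopping`, and the assumption that a higher-is-better validation score is monitored, are illustrative and not taken from the repository.

```python
import torch
import torch.nn as nn


def make_head(embed_dim: int = 384, hidden: int = 256,
              num_classes: int = 2) -> nn.Sequential:
    """LayerNorm -> Linear(384,256) -> GELU -> Dropout(0.1) -> Linear(256,2)."""
    return nn.Sequential(
        nn.LayerNorm(embed_dim),
        nn.Linear(embed_dim, hidden),
        nn.GELU(),
        nn.Dropout(0.1),
        nn.Linear(hidden, num_classes),
    )


class EarlyStopping:
    """Stop after `patience` epochs without a new best validation score."""

    def __init__(self, patience: int = 20):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score: float) -> bool:
        """Record one epoch's score (higher is better); True means stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy check: random 384-d vectors stand in for pooled ViT-S/16 features.
head = make_head()
head.eval()  # disable dropout for a deterministic forward pass
logits = head(torch.randn(4, 384))
print(logits.shape)  # torch.Size([4, 2])
```

Because the timm backbone created with `num_classes=0` returns pooled 384-d features, the full classifier would be `nn.Sequential(backbone, make_head())`, and each epoch would end with `if stopper.step(val_score): break` for a `stopper = EarlyStopping(patience=20)`.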

---

## PS3 Simulator Dataset

Physics-parametrised synthetic SSS dataset:

```
Images   : 1,008
Classes  : Ship, Plane
Altitude : 50 m, 70 m, 100 m
Seabed   : Sand, Gravel
Angles   : Varied grazing angles
Metadata : Per-image JSON with physical parameters
```
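
Because each image carries its physical parameters in a JSON sidecar, subsets can be selected by condition. The sketch below assumes hypothetical field names (`image`, `class`, `altitude_m`, `seabed`) and inline example records, since the metadata schema is not documented here; it enumerates the 2 × 3 × 2 = 12 class/altitude/seabed combinations listed above and counts the images in each.

```python
import json
from collections import Counter
from itertools import product

# Hypothetical sidecar records; the real JSON schema may use other field names.
records = [
    '{"image": "ship_0001.png", "class": "Ship", "altitude_m": 50, "seabed": "Sand"}',
    '{"image": "plane_0001.png", "class": "Plane", "altitude_m": 100, "seabed": "Gravel"}',
    '{"image": "ship_0002.png", "class": "Ship", "altitude_m": 50, "seabed": "Sand"}',
]
meta = [json.loads(r) for r in records]

# All class x altitude x seabed combinations listed in the card: 2 x 3 x 2 = 12
conditions = list(product(["Ship", "Plane"], [50, 70, 100], ["Sand", "Gravel"]))

# Images per condition, e.g. to check coverage before training
counts = Counter((m["class"], m["altitude_m"], m["seabed"]) for m in meta)
for cond in conditions:
    print(cond, counts[cond])

# Select one physical condition, e.g. sand seabed imaged from 50 m
sand_50m = [m["image"] for m in meta
            if m["seabed"] == "Sand" and m["altitude_m"] == 50]
print(sand_50m)  # ['ship_0001.png', 'ship_0002.png']
```

With real files, each record would instead come from `json.load` on the image's sidecar file.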

- Dataset: coming soon
- GitHub: [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)

---

## Citation

```bibtex
@inproceedings{basha2026ps3,
  title     = {PS3 Simulator: Physics-Parametrised Synthetic
               Sonar for Self-Supervised Sim-to-Real Transfer},
  author    = {Basha, Kamal S and Nambiar, Athira},
  booktitle = {Proceedings of the IEEE/CVF Conference on
               Computer Vision and Pattern Recognition
               Workshops (MaCVi)},
  year      = {2026}
}
```

---

## Acknowledgements

- [I-JEPA](https://github.com/facebookresearch/ijepa) (Facebook Research)
- [DINO](https://github.com/facebookresearch/dino) (Facebook Research)
- [timm](https://github.com/huggingface/pytorch-image-models) (Hugging Face)
- [Blender MCP](https://github.com/ahujasid/blender-mcp) (3D generation)
- [SeabedObjects-KLSG](https://github.com/mvaldenegro/marine-debris-fls-datasets)

---

## License

MIT License