PS3 Simulator — Pretrained Sonar Weights

Pretrained ViT-S/16 weights for underwater sonar object classification, trained exclusively on synthetic PS3 Simulator data — no real sonar data used at any stage.

Accepted at MaCVi Workshop @ CVPR 2026.


What is this?

Real sonar data is expensive, access-restricted, and often classified, which makes it extremely difficult to train AI models for underwater object recognition.

PS3 Simulator solves this by generating physics-parametrised synthetic Side-Scan Sonar (SSS) images using Blender, then training I-JEPA — a structure-aware self-supervised learning framework — purely on synthetic data.

These weights are the result of that training.


Available Weights

File                     Model            Pretrain       Epochs
jepa-ep200.pth.tar       I-JEPA ViT-S/16  PS3 Synthetic  200
dino_ps3_checkpoint.pth  DINO ViT-S/16    PS3 Synthetic  200

Results on Real Sonar

Train: Synthetic PS3 only | Test: 876 real KSLG/SCTD images

Model                    Acc (%)  ±Std   F1
Random Init              23.0     ±0.9   0.239
DINO PS3 (this repo)     58.8     ±11.9  0.639
I-JEPA PS3 (this repo)   70.9     ±4.5   0.733
DINO ImageNet            78.8     ±0.9   0.810
I-JEPA ImageNet ViT-H†   86.0     ±0.1   0.796

†Upper bound — larger backbone, not directly comparable


Usage

Download weights

from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

ckpt = torch.load(path, map_location='cpu')
print(ckpt.keys())
# dict_keys(['target_encoder', 'encoder', 'predictor', ...])

Load I-JEPA backbone

import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# Load backbone
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Use target_encoder (EMA stable encoder)
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# Fix pos_embed shape [1,196,384] -> [1,197,384]
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :]
    sd['pos_embed'] = torch.cat(
        [cls_pe, sd['pos_embed']], dim=1)

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('I-JEPA PS3 backbone loaded.')

Load DINO backbone

import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# Load backbone
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')
# Strip DDP and DINO wrapper prefixes from the student weights
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('DINO PS3 backbone loaded.')

Full evaluation pipeline

# Clone repo and run notebook
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt

# Open and run:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb

Training Details

I-JEPA Pretraining

Model    : ViT-S/16
Data     : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs   : 200
Optimizer: AdamW
Config   : configs/sonar_vits16.yaml

Fine-tuning Protocol

Backbone : I-JEPA pretrained ViT-S/16
Head     : LayerNorm → Linear(384,256) → GELU 
           → Dropout(0.1) → Linear(256,2)
lr       : 1e-4
Epochs   : 100 (early stopping, patience=20)
Data     : Labeled synthetic PS3 images
Test     : Real KSLG/SCTD sonar (never seen)
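The fine-tuning head above maps the backbone's 384-d features to the two classes. A minimal PyTorch sketch of that head, exactly as listed (LayerNorm → Linear → GELU → Dropout → Linear):

```python
import torch
import torch.nn as nn

# Classification head from the fine-tuning protocol:
# 384-d ViT-S features -> 2 classes (Ship, Plane)
head = nn.Sequential(
    nn.LayerNorm(384),
    nn.Linear(384, 256),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(256, 2),
)

# Dummy batch of backbone features
feats = torch.randn(4, 384)
logits = head(feats)
print(logits.shape)  # torch.Size([4, 2])
```

In training, this head sits on top of the I-JEPA backbone and both are optimised with AdamW at lr 1e-4 as noted above.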

PS3 Simulator Dataset

Physics-parametrised synthetic SSS dataset:

Images   : 1,008
Classes  : Ship, Plane
Altitude : 50m, 70m, 100m
Seabed   : Sand, Gravel
Angles   : Varied grazing angles
Metadata : Per-image JSON with physical params
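Since each image ships with a JSON metadata file of physical parameters, samples can be filtered by acquisition conditions. A sketch with hypothetical field names (the actual JSON schema may differ — check the per-image files in the released dataset):

```python
import json
from pathlib import Path

# Hypothetical metadata layout; real field names may differ
sample = {
    "class": "Ship",          # Ship or Plane
    "altitude_m": 70,         # 50, 70, or 100
    "seabed": "Sand",         # Sand or Gravel
    "grazing_angle_deg": 35.0,
}
Path("example_meta.json").write_text(json.dumps(sample))

# Load the metadata back and select, e.g., mid-altitude captures
meta = json.loads(Path("example_meta.json").read_text())
if meta["altitude_m"] == 70:
    print(meta["class"], meta["seabed"])
```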

Dataset  : [coming soon]
GitHub   : github.com/bashakamal/ps3-simulator


Citation

@inproceedings{basha2026ps3,
  title     = {PS3 Simulator: Physics-Parametrised Synthetic 
               Sonar for Self-Supervised Sim-to-Real Transfer},
  author    = {Basha, Kamal S and Nambiar, Athira},
  booktitle = {Proceedings of the IEEE/CVF Conference on 
               Computer Vision and Pattern Recognition 
               Workshops (MaCVi)},
  year      = {2026}
}

License

MIT License
