# PS3 Simulator — Pretrained Sonar Weights

Pretrained ViT-S/16 weights for underwater sonar object classification, trained exclusively on synthetic PS3 Simulator data — no real sonar data is used at any stage.

**Accepted at the MaCVi Workshop @ CVPR 2026.**
## What is this?
Real sonar data is expensive, access-restricted, and often classified, which makes it extremely difficult to train AI models for underwater object recognition.
PS3 Simulator addresses this by generating physics-parametrised synthetic Side-Scan Sonar (SSS) images in Blender, then pretraining I-JEPA — a structure-aware self-supervised learning framework — purely on that synthetic data.
These weights are the result of that training.
## Available Weights
| File | Model | Pretrain | Epochs |
|---|---|---|---|
| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 Synthetic | 200 |
| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 Synthetic | 200 |
## Results on Real Sonar

**Train:** synthetic PS3 only · **Test:** 876 real KLSG/SCTD images
| Model | Acc (%) | ±Std | F1 |
|---|---|---|---|
| Random Init | 23.0 | ±0.9 | 0.239 |
| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
| I-JEPA PS3 (this repo) | 70.9 | ±4.5 | 0.733 |
| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |

† Upper bound — larger backbone, not directly comparable.
## Usage

### Download weights

```python
from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3 synthetic data
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

ckpt = torch.load(path, map_location='cpu')
print(list(ckpt.keys()))
# ['target_encoder', 'encoder', 'predictor', ...]
```
### Load I-JEPA backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# Build the backbone (no classification head)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Use the target_encoder (EMA-stabilised encoder)
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# I-JEPA uses no [CLS] token: pad pos_embed [1,196,384] -> [1,197,384]
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :]
    sd['pos_embed'] = torch.cat(
        [cls_pe, sd['pos_embed']], dim=1)

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('I-JEPA PS3 backbone loaded.')
```
### Load DINO backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# Build the backbone (no classification head)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Strip DDP ('module.') and DINO wrapper ('backbone.') prefixes
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('DINO PS3 backbone loaded.')
```
### Full evaluation pipeline

```bash
# Clone the repo and install dependencies
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt

# Open and run:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
```
## Training Details

### I-JEPA Pretraining

```text
Model    : ViT-S/16
Data     : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs   : 200
Optimizer: AdamW
Config   : configs/sonar_vits16.yaml
```
### Fine-tuning Protocol

```text
Backbone : I-JEPA pretrained ViT-S/16
Head     : LayerNorm → Linear(384,256) → GELU
           → Dropout(0.1) → Linear(256,2)
LR       : 1e-4
Epochs   : 100 (early stopping, patience=20)
Data     : Labeled synthetic PS3 images
Test     : Real KLSG/SCTD sonar (never seen during training)
```
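The head described above maps directly onto PyTorch modules. A sketch matching the listed layers (the two output classes are Ship and Plane):

```python
import torch
import torch.nn as nn

# Fine-tuning head from the protocol:
# LayerNorm -> Linear(384,256) -> GELU -> Dropout(0.1) -> Linear(256,2)
head = nn.Sequential(
    nn.LayerNorm(384),
    nn.Linear(384, 256),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(256, 2),
)

# One 384-dim backbone feature in, two class logits (Ship / Plane) out.
logits = head(torch.randn(8, 384))
print(logits.shape)  # torch.Size([8, 2])
```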
## PS3 Simulator Dataset

Physics-parametrised synthetic SSS dataset:

```text
Images  : 1,008
Classes : Ship, Plane
Altitude: 50 m, 70 m, 100 m
Seabed  : Sand, Gravel
Angles  : Varied grazing angles
Metadata: Per-image JSON with physical params
```
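The per-image JSON makes it straightforward to filter the dataset by physical parameters. A hypothetical sketch — the field names (`altitude_m`, `seabed`, `class`, `grazing_angle_deg`) are illustrative placeholders, not the actual PS3 schema:

```python
import json

# Hypothetical metadata record; real field names may differ.
record = json.loads("""
{
  "image": "ship_0001.png",
  "class": "Ship",
  "altitude_m": 70,
  "seabed": "Gravel",
  "grazing_angle_deg": 12.5
}
""")

# Example filter: keep only 70 m captures over a gravel seabed.
keep = record["altitude_m"] == 70 and record["seabed"] == "Gravel"
print(keep)  # True
```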
**Dataset:** [coming soon] · **Code:** [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)
## Citation

```bibtex
@inproceedings{basha2026ps3,
  title     = {PS3 Simulator: Physics-Parametrised Synthetic
               Sonar for Self-Supervised Sim-to-Real Transfer},
  author    = {Basha, Kamal S. and Nambiar, Athira},
  booktitle = {Proceedings of the IEEE/CVF Conference on
               Computer Vision and Pattern Recognition
               Workshops (MaCVi)},
  year      = {2026}
}
```
## Acknowledgements

- I-JEPA — Facebook Research
- DINO — Facebook Research
- timm — Hugging Face
- Blender MCP — 3D scene generation
- SeabedObjects-KLSG — real sonar evaluation data
## License

MIT License