# PS3 Simulator — Pretrained Sonar Weights

Pretrained ViT-S/16 weights for underwater sonar object classification, trained exclusively on synthetic PS3 Simulator data — no real sonar data is used at any stage.

**Accepted at the MaCVi Workshop @ CVPR 2026.**
## What is this?
Real sonar data is expensive, access-restricted, and often classified, which makes it extremely difficult to train AI models for underwater object recognition.
PS3 Simulator addresses this by generating physics-parametrised synthetic Side-Scan Sonar (SSS) images in Blender, then pretraining I-JEPA — a structure-aware self-supervised learning framework — purely on that synthetic data.
These weights are the result of that training.
## Available Weights
| File | Model | Pretrain | Epochs |
|---|---|---|---|
| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 Synthetic | 200 |
| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 Synthetic | 200 |
## Results on Real Sonar

**Train:** synthetic PS3 only · **Test:** 876 real KLSG/SCTD images
| Model | Acc (%) | ±Std | F1 |
|---|---|---|---|
| Random Init | 23.0 | ±0.9 | 0.239 |
| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
| I-JEPA PS3 (this repo) | 70.9 | ±4.5 | 0.733 |
| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |

† Upper bound — larger backbone, not directly comparable.
## Usage

### Download weights

```python
from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3 synthetic data
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

ckpt = torch.load(path, map_location='cpu')
print(list(ckpt.keys()))
# ['target_encoder', 'encoder', 'predictor', ...]
```
### Load I-JEPA backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# Build the backbone (no classification head)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Use the target_encoder (EMA-stabilised encoder)
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# I-JEPA uses no [CLS] token: pad pos_embed [1,196,384] -> [1,197,384]
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :]
    sd['pos_embed'] = torch.cat(
        [cls_pe, sd['pos_embed']], dim=1)

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('I-JEPA PS3 backbone loaded.')
```
### Load DINO backbone

```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# Build the backbone (no classification head)
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Strip DDP ('module.') and DINO wrapper ('backbone.') prefixes
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('DINO PS3 backbone loaded.')
```
### Full evaluation pipeline

```bash
# Clone the repo and install dependencies
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt

# Open and run:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
```
## Training Details

### I-JEPA Pretraining

```text
Model    : ViT-S/16
Data     : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs   : 200
Optimizer: AdamW
Config   : configs/sonar_vits16.yaml
```
### Fine-tuning Protocol

```text
Backbone : I-JEPA pretrained ViT-S/16
Head     : LayerNorm → Linear(384,256) → GELU
           → Dropout(0.1) → Linear(256,2)
LR       : 1e-4
Epochs   : 100 (early stopping, patience=20)
Data     : Labeled synthetic PS3 images
Test     : Real KLSG/SCTD sonar (never seen during training)
```
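The head described above maps directly onto PyTorch modules. A sketch matching the listed layers (the two output classes are Ship and Plane):

```python
import torch
import torch.nn as nn

# Fine-tuning head from the protocol:
# LayerNorm -> Linear(384,256) -> GELU -> Dropout(0.1) -> Linear(256,2)
head = nn.Sequential(
    nn.LayerNorm(384),
    nn.Linear(384, 256),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(256, 2),
)

# One 384-dim backbone feature in, two class logits (Ship / Plane) out.
logits = head(torch.randn(8, 384))
print(logits.shape)  # torch.Size([8, 2])
```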
## PS3 Simulator Dataset

Physics-parametrised synthetic SSS dataset:

```text
Images  : 1,008
Classes : Ship, Plane
Altitude: 50 m, 70 m, 100 m
Seabed  : Sand, Gravel
Angles  : Varied grazing angles
Metadata: Per-image JSON with physical params
```
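The per-image JSON makes it straightforward to filter the dataset by physical parameters. A hypothetical sketch — the field names (`altitude_m`, `seabed`, `class`, `grazing_angle_deg`) are illustrative placeholders, not the actual PS3 schema:

```python
import json

# Hypothetical metadata record; real field names may differ.
record = json.loads("""
{
  "image": "ship_0001.png",
  "class": "Ship",
  "altitude_m": 70,
  "seabed": "Gravel",
  "grazing_angle_deg": 12.5
}
""")

# Example filter: keep only 70 m captures over a gravel seabed.
keep = record["altitude_m"] == 70 and record["seabed"] == "Gravel"
print(keep)  # True
```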
**Dataset:** [coming soon] · **Code:** [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)
## Citation

```bibtex
@inproceedings{basha2026ps3,
  title     = {PS3 Simulator: Physics-Parametrised Synthetic
               Sonar for Self-Supervised Sim-to-Real Transfer},
  author    = {Basha, Kamal S. and Nambiar, Athira},
  booktitle = {Proceedings of the IEEE/CVF Conference on
               Computer Vision and Pattern Recognition
               Workshops (MaCVi)},
  year      = {2026}
}
```
## Acknowledgements

- I-JEPA — Facebook Research
- DINO — Facebook Research
- timm — Hugging Face
- Blender MCP — 3D scene generation
- SeabedObjects-KLSG — real sonar evaluation data
## License

MIT License