---
license: mit
tags:
- sonar
- underwater-robotics
- self-supervised-learning
- vision-transformer
- sim-to-real
- synthetic-data
- ijepa
- dino
- side-scan-sonar
- autonomous-underwater-vehicle
- cvpr2026
- macvi
datasets:
- bashakamal/ps3-simulator
metrics:
- accuracy
- f1
model-index:
- name: I-JEPA ViT-S/16 PS3
  results:
  - task:
      type: image-classification
      name: Sonar Object Classification
    dataset:
      name: KLSG + SCTD (Real Sonar)
      type: real-sonar
    metrics:
    - type: accuracy
      value: 70.9
    - type: f1
      value: 0.733
---
# PS3 Simulator — Pretrained Sonar Weights
Pretrained ViT-S/16 weights for underwater
sonar object classification, trained exclusively
on **synthetic PS3 Simulator data** —
no real sonar data used at any stage.
Accepted at **MaCVi Workshop @ CVPR 2026**.
---
## What is this?
Real sonar data is expensive to collect and often
restricted or classified, which makes it hard to
train models for underwater object recognition.
PS3 Simulator addresses this by generating
physics-parametrised synthetic Side-Scan Sonar
(SSS) images in Blender, then pretraining
I-JEPA, a structure-aware self-supervised
learning framework, purely on that synthetic data.
The weights in this repository are the result of that training.
---
## Available Weights
| File | Model | Pretrain | Epochs |
|------|-------|----------|--------|
| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 Synthetic | 200 |
| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 Synthetic | 200 |
---
## Results on Real Sonar
Training data: synthetic PS3 only. Test data: 876 real KLSG/SCTD images.
| Model | Acc (%) | ±Std | F1 |
|-------|---------|------|-----|
| Random Init | 23.0 | ±0.9 | 0.239 |
| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
| **I-JEPA PS3 (this repo)** | **70.9** | **±4.5** | **0.733** |
| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |
†Upper bound — larger backbone, not directly comparable
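The table reports accuracy as a mean with a standard deviation, alongside F1. As a rough sketch of how such figures can be aggregated over repeated runs, the snippet below assumes macro-averaged F1 and a sample standard deviation (neither choice is stated in this card). The labels are toy values for illustration, not predictions from any model in this repository, and `accuracy` and `macro_f1` are hypothetical helper names.

```python
from statistics import mean, stdev

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)

# Toy labels only (0 = ship, 1 = plane); NOT results from the paper.
y_true = [0, 0, 1, 1, 1, 0]
runs = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
]

# One accuracy and one F1 per run, then aggregate across runs.
accs = [100 * accuracy(y_true, r) for r in runs]
f1s = [macro_f1(y_true, r) for r in runs]
print(f"Acc: {mean(accs):.1f} ± {stdev(accs):.1f} | F1: {mean(f1s):.3f}")
```

With real evaluations, each entry in `runs` would be the prediction list from one seed or one fine-tuning run on the same test set.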
---
## Usage
### Download weights
```python
from huggingface_hub import hf_hub_download
import torch

# I-JEPA ViT-S/16 pretrained on PS3
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

ckpt = torch.load(path, map_location='cpu')
print(ckpt.keys())
# ['target_encoder', 'encoder', 'predictor', ...]
```
### Load I-JEPA backbone
```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="jepa-ep200.pth.tar")

# Build a ViT-S/16 backbone without a classification head
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

ckpt = torch.load(path, map_location='cpu')

# Use target_encoder (the EMA encoder, which is more stable)
sd = {k.replace('module.', ''): v
      for k, v in ckpt['target_encoder'].items()}

# Fix pos_embed shape [1,196,384] -> [1,197,384]:
# the checkpoint has patch positions only, timm also expects
# a class-token slot, so prepend timm's own slot.
if sd['pos_embed'].shape != backbone.pos_embed.shape:
    cls_pe = backbone.pos_embed[:, :1, :].detach()
    sd['pos_embed'] = torch.cat(
        [cls_pe, sd['pos_embed']], dim=1)

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('I-JEPA PS3 backbone loaded.')
```
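The positional-embedding fix in the loader can be isolated into a small helper, which makes the shape change easier to check on its own. This is a minimal sketch: `prepend_cls_pos_embed` is a hypothetical name, and filling the class-token slot with zeros when none is supplied is an assumption of this example (the loader above reuses the slot from the freshly initialised timm model instead).

```python
import torch

def prepend_cls_pos_embed(patch_pe, cls_pe=None):
    """Prepend a class-token slot to a patch-only positional embedding.

    patch_pe: tensor of shape [1, N, D], e.g. [1, 196, 384]
    cls_pe:   tensor of shape [1, 1, D], or None to use zeros (assumption)
    """
    if cls_pe is None:
        cls_pe = torch.zeros(1, 1, patch_pe.shape[-1], dtype=patch_pe.dtype)
    return torch.cat([cls_pe, patch_pe], dim=1)

# Random stand-in for the checkpoint's pos_embed (196 patches, 384 dims).
patch_pe = torch.randn(1, 196, 384)
full_pe = prepend_cls_pos_embed(patch_pe)
print(full_pe.shape)  # torch.Size([1, 197, 384])
```

The patch positions are copied unchanged; only one extra slot is added at index 0.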
### Load DINO backbone
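```python
import timm
import torch
from huggingface_hub import hf_hub_download

# Download weights
path = hf_hub_download(
    repo_id="kamalbasha/ps3-simulator",
    filename="dino_ps3_checkpoint.pth")

# Build a ViT-S/16 backbone without a classification head
backbone = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0)

# Note: PyTorch >= 2.6 defaults to weights_only=True. If loading fails
# because the checkpoint stores non-tensor objects, pass
# weights_only=False (only for files you trust).
ckpt = torch.load(path, map_location='cpu')

# Strip the DataParallel and wrapper prefixes from the student weights
sd = {k.replace('module.', '').replace('backbone.', ''): v
      for k, v in ckpt['student'].items()}

backbone.load_state_dict(sd, strict=False)
backbone.eval()
print('DINO PS3 backbone loaded.')
```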
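Both loaders call `load_state_dict(..., strict=False)`, which skips keys that do not match instead of raising. A quick way to confirm that the weights actually landed is to inspect the returned `missing_keys` and `unexpected_keys`. The sketch below uses a toy module and a made-up state dict so it runs without downloading anything; `TinyBackbone` and `load_and_report` are hypothetical names, not part of this repository.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Toy stand-in for a ViT backbone (illustration only)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 8)
        self.norm = nn.LayerNorm(8)

def load_and_report(model, state_dict):
    """Load non-strictly and return (missing, unexpected) key lists."""
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)

# Made-up checkpoint with wrapper prefixes and an extra head.
ckpt_sd = {
    "module.backbone.proj.weight": torch.zeros(8, 4),
    "module.backbone.proj.bias": torch.zeros(8),
    "module.head.weight": torch.zeros(2, 8),
}
sd = {k.replace("module.", "").replace("backbone.", ""): v
      for k, v in ckpt_sd.items()}

missing, unexpected = load_and_report(TinyBackbone(), sd)
print("missing:", missing)        # parameters left at their initial values
print("unexpected:", unexpected)  # checkpoint entries that were ignored
```

A long `missing` list usually means the key prefixes were not stripped correctly and the backbone is still randomly initialised. Note that a shape mismatch still raises an error even with `strict=False`.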
### Full evaluation pipeline
```bash
# Clone repo and run notebook
git clone https://github.com/bashakamal/ps3-simulator
cd ps3-simulator
pip install -r requirements.txt
# Open and run:
# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
```
---
## Training Details
### I-JEPA Pretraining
```
Model : ViT-S/16
Data : 1,008 synthetic PS3 SSS images (unlabeled)
Epochs : 200
Optimizer: AdamW
Config : configs/sonar_vits16.yaml
```
### Fine-tuning Protocol
```
Backbone : I-JEPA pretrained ViT-S/16
Head : LayerNorm → Linear(384,256) → GELU
→ Dropout(0.1) → Linear(256,2)
lr : 1e-4
Epochs : 100 (early stopping, patience=20)
Data : Labeled synthetic PS3 images
Test : Real KLSG/SCTD sonar (never seen during training)
```
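The head listed in the protocol can be written as a small PyTorch module. This is a minimal sketch built only from the layer sizes given above; `build_head` is a hypothetical name, and the random `features` tensor stands in for the 384-dimensional output of the ViT-S/16 backbone.

```python
import torch
import torch.nn as nn

def build_head(embed_dim=384, hidden_dim=256, num_classes=2, dropout=0.1):
    """LayerNorm -> Linear -> GELU -> Dropout -> Linear, as listed above."""
    return nn.Sequential(
        nn.LayerNorm(embed_dim),
        nn.Linear(embed_dim, hidden_dim),
        nn.GELU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_dim, num_classes),
    )

head = build_head()
head.eval()  # disable dropout for a deterministic forward pass

# Stand-in for backbone features: batch of 4, 384 dims each.
features = torch.randn(4, 384)
with torch.no_grad():
    logits = head(features)
print(logits.shape)  # torch.Size([4, 2])
```

In practice the head would be applied to `backbone(images)` and trained with the learning rate and early stopping listed in the protocol.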
---
## PS3 Simulator Dataset
Physics-parametrised synthetic SSS dataset:
```
Images : 1,008
Classes : Ship, Plane
Altitude : 50m, 70m, 100m
Seabed : Sand, Gravel
Angles : Varied grazing angles
Metadata : Per-image JSON with physical params
```
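The listed classes, altitudes, and seabed types form a small grid of conditions. The sketch below enumerates that grid and checks how the 1,008 images would divide across it. Even coverage is an assumption of this example, not something stated here; grazing angle is omitted because its values are not listed.

```python
from itertools import product

# Values taken from the dataset summary above.
CLASSES = ["Ship", "Plane"]
ALTITUDES_M = [50, 70, 100]
SEABEDS = ["Sand", "Gravel"]
TOTAL_IMAGES = 1008

# Every (class, altitude, seabed) combination.
grid = list(product(CLASSES, ALTITUDES_M, SEABEDS))

# If images were spread evenly (assumption), this is the count per cell.
per_cell, remainder = divmod(TOTAL_IMAGES, len(grid))
print(len(grid), per_cell, remainder)  # 12 84 0
```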
Dataset: [coming soon]
GitHub: [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)
---
## Citation
```bibtex
@inproceedings{basha2026ps3,
title = {PS3 Simulator: Physics-Parametrised Synthetic
Sonar for Self-Supervised Sim-to-Real Transfer},
author = {Basha, Kamal S and Nambiar, Athira},
booktitle = {Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition
Workshops (MaCVi)},
year = {2026}
}
```
---
## Acknowledgements
- [I-JEPA](https://github.com/facebookresearch/ijepa) — Facebook Research
- [DINO](https://github.com/facebookresearch/dino) — Facebook Research
- [timm](https://github.com/huggingface/pytorch-image-models) — HuggingFace
- [Blender MCP](https://github.com/ahujasid/blender-mcp) — 3D generation
- [SeabedObjects-KLSG](https://github.com/mvaldenegro/marine-debris-fls-datasets)
---
## License
MIT License