	---
	license: mit
	tags:
	- sonar
	- underwater-robotics
	- self-supervised-learning
	- vision-transformer
	- sim-to-real
	- synthetic-data
	- ijepa
	- dino
	- side-scan-sonar
	- autonomous-underwater-vehicle
	- cvpr2026
	- macvi
	datasets:
	- bashakamal/ps3-simulator
	metrics:
	- accuracy
	- f1
	model-index:
	- name: I-JEPA ViT-S/16 PS3
	  results:
	  - task:
	      type: image-classification
	      name: Sonar Object Classification
	    dataset:
	      name: KLSG + SCTD (Real Sonar)
	      type: real-sonar
	    metrics:
	    - type: accuracy
	      value: 70.9
	    - type: f1
	      value: 0.733
	---

	# PS3 Simulator — Pretrained Sonar Weights

	Pretrained ViT-S/16 weights for underwater
	sonar object classification, trained exclusively
	on synthetic PS3 Simulator data. No real sonar
	data is used at any training stage; real sonar
	images appear only at test time.

	Accepted at MaCVi Workshop @ CVPR 2026.

	---

	## What is this?

	Real sonar data is expensive to collect and
	often restricted or classified, which makes it
	hard to train models for underwater object recognition.

	PS3 Simulator addresses this by generating
	physics-parametrised synthetic Side-Scan Sonar
	(SSS) images in Blender, then training
	I-JEPA (a structure-aware self-supervised
	learning framework) purely on that synthetic data.

	These weights are the result of that training.

	---

	## Available Weights

	| File | Model | Pretrain | Epochs |
	|------|-------|----------|--------|
	| `jepa-ep200.pth.tar` | I-JEPA ViT-S/16 | PS3 Synthetic | 200 |
	| `dino_ps3_checkpoint.pth` | DINO ViT-S/16 | PS3 Synthetic | 200 |

	---

	## Results on Real Sonar

	Train: synthetic PS3 only | Test: 876 real KLSG/SCTD images

	| Model | Acc (%) | ±Std | F1 |
	|-------|---------|------|-----|
	| Random Init | 23.0 | ±0.9 | 0.239 |
	| DINO PS3 (this repo) | 58.8 | ±11.9 | 0.639 |
	| I-JEPA PS3 (this repo) | 70.9 | ±4.5 | 0.733 |
	| DINO ImageNet | 78.8 | ±0.9 | 0.810 |
	| I-JEPA ImageNet ViT-H† | 86.0 | ±0.1 | 0.796 |

	†Upper bound — larger backbone, not directly comparable
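
For readers reproducing the table, here is a minimal sketch of how accuracy, a binary F1 score, and a mean ± std summary over repeated runs could be computed. It assumes the ± column is a standard deviation over several fine-tuning runs and that F1 is taken with respect to one positive class; the card states neither, and the labels and run values below are toy data, not results from this repo.

```python
from statistics import mean, stdev

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """F1 for one positive class (assumed averaging; not stated on the card)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize(values):
    """Mean and sample standard deviation across runs."""
    return mean(values), stdev(values)

# Toy labels: 0 and 1 stand in for the two classes.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(f"acc = {accuracy(y_true, y_pred):.1f}%  f1 = {f1_binary(y_true, y_pred):.3f}")

# Toy per-run accuracies (made up for illustration).
run_mean, run_std = summarize([70.0, 72.0, 68.0])
print(f"{run_mean:.1f} ± {run_std:.1f}")
```

`statistics.stdev` is the sample standard deviation; swap in `statistics.pstdev` if the population form is wanted.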

	---

	## Usage

	### Download weights

	```python
	from huggingface_hub import hf_hub_download
	import torch

	# I-JEPA ViT-S/16 pretrained on PS3
	path = hf_hub_download(
	    repo_id="kamalbasha/ps3-simulator",
	    filename="jepa-ep200.pth.tar")

	ckpt = torch.load(path, map_location='cpu')
	print(ckpt.keys())
	# ['target_encoder', 'encoder', 'predictor', ...]
	```

	### Load I-JEPA backbone

	```python
	import timm
	import torch
	from huggingface_hub import hf_hub_download

	# Download weights
	path = hf_hub_download(
	repo_id="kamalbasha/ps3-simulator",
	filename="jepa-ep200.pth.tar")

	# Load backbone
	backbone = timm.create_model(
	    'vit_small_patch16_224',
	    pretrained=False,
	    num_classes=0)

	ckpt = torch.load(path, map_location='cpu')

	# Use target_encoder (the EMA copy of the encoder)
	sd = {k.replace('module.', ''): v
	      for k, v in ckpt['target_encoder'].items()}

	# Fix pos_embed shape [1,196,384] -> [1,197,384]:
	# the checkpoint has no class-token slot, so prepend the
	# model's own (untrained) one to match timm's layout.
	if sd['pos_embed'].shape != backbone.pos_embed.shape:
	    cls_pe = backbone.pos_embed[:, :1, :].detach()
	    sd['pos_embed'] = torch.cat(
	        [cls_pe, sd['pos_embed']], dim=1)

	backbone.load_state_dict(sd, strict=False)
	backbone.eval()
	print('I-JEPA PS3 backbone loaded.')
	```
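
The `pos_embed` fix above follows from ViT token arithmetic: a 224×224 input cut into 16×16 patches gives 14×14 = 196 patch tokens, and the timm model adds one class token on top. The helper below is a small illustrative sketch of that count; `num_pos_tokens` is a made-up name, not part of timm or I-JEPA.

```python
def num_pos_tokens(img_size: int, patch_size: int, has_cls_token: bool) -> int:
    """Number of position-embedding slots for a square image and square patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    per_side = img_size // patch_size
    return per_side * per_side + (1 if has_cls_token else 0)

# Checkpoint layout (patch tokens only) vs. timm layout (class token + patches).
ijepa_tokens = num_pos_tokens(224, 16, has_cls_token=False)
timm_tokens = num_pos_tokens(224, 16, has_cls_token=True)
print(ijepa_tokens, timm_tokens)  # 196 197
```

Because the extra slot is copied from the freshly created timm model, it carries no pretrained information; only the 196 patch positions come from the checkpoint.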

	### Load DINO backbone

	```python
	import timm
	import torch
	from huggingface_hub import hf_hub_download

	# Download weights
	path = hf_hub_download(
	repo_id="kamalbasha/ps3-simulator",
	filename="dino_ps3_checkpoint.pth")

	# Load backbone
	backbone = timm.create_model(
	    'vit_small_patch16_224',
	    pretrained=False,
	    num_classes=0)

	ckpt = torch.load(path, map_location='cpu')
	sd = {k.replace('module.', '').replace('backbone.', ''): v
	      for k, v in ckpt['student'].items()}

	backbone.load_state_dict(sd, strict=False)
	backbone.eval()
	print('DINO PS3 backbone loaded.')
	```
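
Both loaders rename checkpoint keys and then call `load_state_dict(..., strict=False)`, which does not raise on mismatched names. The sketch below mirrors the renaming step on plain dictionaries so it can be checked without downloading anything. `strip_prefixes`, `unmatched_keys`, and the sample keys are hypothetical; the real key names depend on how each checkpoint was saved. Unlike the `str.replace` calls above, this variant removes only leading prefixes.

```python
def strip_prefixes(state_dict, prefixes=("module.", "backbone.")):
    """Return a copy of state_dict with leading wrapper prefixes removed."""
    cleaned = {}
    for key, value in state_dict.items():
        changed = True
        while changed:  # prefixes can be stacked, e.g. "module.backbone."
            changed = False
            for prefix in prefixes:
                if key.startswith(prefix):
                    key = key[len(prefix):]
                    changed = True
        cleaned[key] = value
    return cleaned

def unmatched_keys(checkpoint_keys, model_keys):
    """Return (missing, unexpected) key lists relative to the model."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)

# Hypothetical checkpoint keys; values are placeholders for tensors.
raw = {
    "module.backbone.cls_token": "t0",
    "module.backbone.blocks.0.attn.qkv.weight": "t1",
    "module.head.mlp.0.weight": "t2",
}
cleaned = strip_prefixes(raw)
model_keys = ["cls_token", "pos_embed", "blocks.0.attn.qkv.weight"]
missing, unexpected = unmatched_keys(cleaned, model_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

On the real model, the object returned by `load_state_dict(sd, strict=False)` exposes `missing_keys` and `unexpected_keys`, so printing it gives the same check.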

	### Full evaluation pipeline

	```bash
	# Clone repo and run notebook
	git clone https://github.com/bashakamal/ps3-simulator
	cd ps3-simulator
	pip install -r requirements.txt

	# Open and run:
	# stage3_evaluation/PS3_Stage2_Evaluation.ipynb
	```

	---

	## Training Details

	### I-JEPA Pretraining

	```
	Model     : ViT-S/16
	Data      : 1,008 synthetic PS3 SSS images (unlabeled)
	Epochs    : 200
	Optimizer : AdamW
	Config    : configs/sonar_vits16.yaml
	```

	### Fine-tuning Protocol

	```
	Backbone : I-JEPA pretrained ViT-S/16
	Head     : LayerNorm → Linear(384,256) → GELU
	           → Dropout(0.1) → Linear(256,2)
	lr       : 1e-4
	Epochs   : 100 (early stopping, patience=20)
	Data     : Labeled synthetic PS3 images
	Test     : Real KLSG/SCTD sonar (never seen during training)
	```
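
The early-stopping rule in the protocol (up to 100 epochs, patience 20) can be sketched as below. This is an illustration under assumptions, not the repo's training loop: it assumes the monitored quantity is a validation loss on held-out synthetic images, which the card does not specify, and the `EarlyStopping` class and the loss curve are made up for the example.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Toy loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 97  # 100 epochs in total
stopper = EarlyStopping(patience=20)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"best epoch {stopper.best_epoch}, stopped at epoch {stopped_at}")
```

In a real loop, the weights from `best_epoch` would typically be saved and restored before evaluating on the real sonar test set.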

	---

	## PS3 Simulator Dataset

	Physics-parametrised synthetic SSS dataset:

	```
	Images   : 1,008
	Classes  : Ship, Plane
	Altitude : 50 m, 70 m, 100 m
	Seabed   : Sand, Gravel
	Angles   : Varied grazing angles
	Metadata : Per-image JSON with physical params
	```
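
As a rough illustration of how the per-image metadata might be used, the sketch below enumerates the class, altitude, and seabed combinations listed above and filters a single record. The JSON field names (`image`, `label`, `altitude_m`, `seabed`, `grazing_angle_deg`), the file name, and the angle value are all hypothetical, since the dataset schema has not been published.

```python
import json
from itertools import product

CLASSES = ["ship", "plane"]
ALTITUDES_M = [50, 70, 100]
SEABEDS = ["sand", "gravel"]

# Every (class, altitude, seabed) combination listed on the card.
conditions = list(product(CLASSES, ALTITUDES_M, SEABEDS))
print(len(conditions), "conditions")

# Hypothetical metadata record; field names and values are assumptions.
record_json = json.dumps({
    "image": "example_0001.png",
    "label": "ship",
    "altitude_m": 70,
    "seabed": "gravel",
    "grazing_angle_deg": 30.0,
})

def matches(record, **wanted):
    """True if the record has every requested field value."""
    return all(record.get(key) == value for key, value in wanted.items())

record = json.loads(record_json)
print(matches(record, altitude_m=70, seabed="gravel"))
```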

	Dataset: [coming soon]
	GitHub: [github.com/bashakamal/ps3-simulator](https://github.com/bashakamal/ps3-simulator)

	---

	## Citation

	```bibtex
	@inproceedings{basha2026ps3,
	  title     = {PS3 Simulator: Physics-Parametrised Synthetic
	               Sonar for Self-Supervised Sim-to-Real Transfer},
	  author    = {Basha, Kamal S and Nambiar, Athira},
	  booktitle = {Proceedings of the IEEE/CVF Conference on
	               Computer Vision and Pattern Recognition
	               Workshops (MaCVi)},
	  year      = {2026}
	}
	```

	---

	## Acknowledgements

	- [I-JEPA](https://github.com/facebookresearch/ijepa) — Facebook Research
	- [DINO](https://github.com/facebookresearch/dino) — Facebook Research
	- [timm](https://github.com/huggingface/pytorch-image-models) — HuggingFace
	- [Blender MCP](https://github.com/ahujasid/blender-mcp) — 3D generation
	- [SeabedObjects-KLSG](https://github.com/mvaldenegro/marine-debris-fls-datasets)

	---

	## License

	MIT License