Model Card for EPFL-ECEO/coralscapes-vit-b-dpt

A DPT-style segmentation model with a DINOv3 ViT-B/16 backbone, fine-tuned on the Coralscapes dataset (The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs).

Model Details

How to Get Started with the Model

The simplest way to use this model to segment an image of the Coralscapes dataset is as follows:

Install dependencies:

pip install torch transformers safetensors huggingface_hub datasets pillow

Run the model:

from pathlib import Path
import importlib.util
import torch
from datasets import load_dataset
from huggingface_hub import snapshot_download
from PIL import Image
REPO_ID = "EPFL-ECEO/coralscapes-vit-b-dpt"  # replace if needed

# 1) Download model repo snapshot
root = Path(snapshot_download(REPO_ID))

# 2) Load self-contained model code from the repo
spec = importlib.util.spec_from_file_location("coralscapes_hub_model", root / "coralscapes_hub_model.py")
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# 3) Build model + load weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = mod.Dinov3DPTSegmenter.from_pretrained(root, map_location=device).eval()

# 4) Load one test image from HF dataset
ds = load_dataset("EPFL-ECEO/coralscapes", split="test")
image = ds[42]["image"].convert("RGB")  # PIL image
image = image.resize((1376, 768), resample=Image.BILINEAR)  # target (W, H); both sides divisible by 16; fixed size regardless of the source aspect ratio

# 5) Preprocess + inference
batch = model.processor(images=image, return_tensors="pt", do_resize=False)["pixel_values"].to(device)
with torch.no_grad():
    logits = model(batch)             # shape [1, C, H, W], C = number of classes
pred = logits.argmax(dim=1)[0].cpu()  # shape [H, W], class IDs
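The snippet resizes every image to a fixed 1376x768, the resolution used for the single-pass results. If you would rather keep the source aspect ratio, a target size with both sides divisible by 16 can be computed as in this sketch. The helper `snap_size` and its `long_side` default are hypothetical, and the sketch assumes, as the snippet's comment suggests, that the model accepts any input whose sides are multiples of 16; accuracy at other resolutions is not reported in this card.

```python
from typing import Tuple


def snap_size(width: int, height: int, long_side: int = 1376, multiple: int = 16) -> Tuple[int, int]:
    """Hypothetical helper: choose a (W, H) target whose longer side is about
    `long_side`, keeping the source aspect ratio as closely as possible and
    rounding both sides to the nearest multiple of `multiple` (minimum one multiple)."""
    scale = long_side / max(width, height)

    def snap(side: int) -> int:
        # Scale the side, round to the nearest multiple, never go below one multiple.
        return max(multiple, round(side * scale / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(snap_size(1376, 768))   # (1376, 768): already divisible by 16
    print(snap_size(2048, 1024))  # (1376, 688): 2:1 aspect ratio kept
```

For example, `image.resize(snap_size(*image.size), resample=Image.BILINEAR)` could stand in for the fixed-size resize in step 4, since PIL's `Image.size` is already `(W, H)`.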

Training & Evaluation Details

Data

The model is trained on extended versions of the train and val splits of the Coralscapes dataset, a general-purpose dense semantic segmentation dataset for coral reefs.

Results

Single pass (whole image resized to 768x1376, H x W):

Test Accuracy: 82.461
Test Mean IoU: 57.806

Double pass (the left and right 1024x1024 halves of each image processed separately, as in the paper):

Test Accuracy: 82.809
Test Mean IoU: 58.708
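The double-pass setting runs the model on the left and right halves of each image separately and joins the predictions. The sketch here shows that bookkeeping in pure Python, together with the usual way accuracy and mean IoU are computed from an accumulated confusion matrix. It is not the authors' evaluation code: the helper names, the optional `ignore_index`, treating "accuracy" as pixel accuracy, skipping classes that appear in neither prediction nor ground truth, and reporting in percent are all assumptions. Each half is 1024x1024 only if the source image is 2048 pixels wide and 1024 tall, which is also assumed in the example.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), the box format of PIL's Image.crop
LabelMap = List[List[int]]       # rows of class IDs, shape [H][W]


def half_boxes(width: int, height: int) -> Tuple[Box, Box]:
    """Crop boxes for the left and right halves of a width x height image."""
    mid = width // 2
    return (0, 0, mid, height), (mid, 0, width, height)


def stitch_halves(left: LabelMap, right: LabelMap) -> LabelMap:
    """Join two per-half label maps side by side by concatenating each row."""
    if len(left) != len(right):
        raise ValueError("both halves must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def update_confusion(cm: List[List[int]], pred: LabelMap, target: LabelMap,
                     ignore_index: Optional[int] = None) -> None:
    """Add one image to a confusion matrix indexed as cm[true_class][predicted_class]."""
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:
                continue  # pixel excluded from evaluation (hypothetical ignore label)
            cm[t][p] += 1


def accuracy_and_miou(cm: List[List[int]]) -> Tuple[float, float]:
    """Pixel accuracy and mean IoU over classes, both in percent."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class something else
        if tp + fp + fn > 0:                       # skip classes absent from both maps
            ious.append(tp / (tp + fp + fn))
    return 100.0 * correct / total, 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    left_box, right_box = half_boxes(2048, 1024)
    print(left_box, right_box)  # (0, 0, 1024, 1024) (1024, 0, 2048, 1024)
    pred = stitch_halves([[0, 1], [1, 1]], [[2, 2], [0, 2]])
    target = [[0, 1, 2, 1], [1, 1, 0, 2]]
    cm = [[0] * 3 for _ in range(3)]
    update_confusion(cm, pred, target)
    acc, miou = accuracy_and_miou(cm)
    print(f"accuracy={acc:.2f} mIoU={miou:.2f}")  # accuracy=87.50 mIoU=80.56
```

With the real model, you would crop each half with PIL's `image.crop(box)`, run it through `model.processor(..., do_resize=False)` and the model as in the snippet, and then either stitch the two argmax maps as above or concatenate the two logit tensors along the width axis with `torch.cat([left_logits, right_logits], dim=-1)` before taking the argmax. Accumulating one confusion matrix over the whole test split before computing the metrics gives dataset-level scores rather than a per-image average.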

Citation

If you use this model, please cite:

@inproceedings{sauder2025coralscapes,
  title={The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs},
  author={Sauder, Jonathan and Domazetoski, Viktor and Banc-Prandi, Guilhem and Perna, Gabriela and Meibom, Anders and Tuia, Devis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision: Joint Workshop on Marine Vision},
  pages={2115--2122},
  year={2025}
}


Weights: Safetensors
Model size: 0.1B params
Tensor type: F32
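The listed size gives a rough idea of the memory needed just to hold the weights: at 4 bytes per F32 value, about 0.1 billion parameters take roughly 0.4 GB. Because "0.1B" is a rounded figure, treat this as an order-of-magnitude estimate; activations for a 1376x768 input come on top of it and are not estimated here. The function name is illustrative.

```python
# Rough weight-memory estimate, assuming ~0.1e9 parameters (the rounded "0.1B" above).
NUM_PARAMS = 0.1e9
BYTES_PER_PARAM = 4  # F32 stores each value in 4 bytes


def weights_gigabytes(num_params: float, bytes_per_param: int) -> float:
    """Bytes needed for the raw weights, expressed in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"~{weights_gigabytes(NUM_PARAMS, BYTES_PER_PARAM):.1f} GB")  # prints ~0.4 GB
```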


Model tree for EPFL-ECEO/coralscapes-vit-b-dpt

Base model: facebook/dinov3-vit7b16-pretrain-lvd1689m
Finetuned: facebook/dinov3-vitb16-pretrain-lvd1689m
Finetuned: this model

Dataset used to train EPFL-ECEO/coralscapes-vit-b-dpt: EPFL-ECEO/coralscapes

Paper for EPFL-ECEO/coralscapes-vit-b-dpt

The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs

Paper • arXiv 2503.20000 • Published Mar 25, 2025