Krypsis-COD 🐙 — underwater camouflaged-animal segmentation, tiny

A compact (4.8 M params, 4.6 MB int8) segmenter for camouflaged marine animals. The frontier for this task is giant SAM/SAM2 adapters (Dual-SAM, MAS-SAM, SAM2-WaveUNet) at hundreds of MB. Krypsis-COD is ~130× smaller, yet it keeps the ingredients the COD literature identifies as mattering most: a pretrained backbone, boundary supervision, and frequency cues.

Held-out COD10K-Aquatic frames (flounder, frogfish, stingray, turtle — not in training): raw input → Krypsis-COD segmentation.


Parameters	`4,770,950` (~4.8 M)
Size on disk	`19.2` MB fp32 · `9.7` MB fp16 · `4.6` MB int8
Backbone	`pvt_v2_b0` (ImageNet-pretrained)
Input	3×352×352 RGB
Output	camouflaged-animal mask + edge map
Prior stem	AquaWave — 0 learnable params (UDCP + WB-residual + Haar high-freq)
CPU latency	`270` ms/image (single thread)

Method (what the literature says actually works)

Pretrained PVTv2 backbone. Camouflage is a global-context problem; transformer features + ImageNet pretraining are what most competitive COD models rely on.
AquaWave prior stem (0 params) feeds physics + frequency cues straight into the backbone: an Underwater Dark Channel Prior transmission map, a grey-world white-balance colour residual, and a Haar high-frequency energy map — light attenuation leaks in colour, camouflage leaks in frequency.
Edge / boundary co-supervision — the most-cited COD lever; an edge head is supervised by the mask boundary and fused into the mask head.
Deep multi-level supervision over the FPN decoder.
Trained at 352² on ~7,000 images (CAMO + COD10K corpus), mixed precision, discriminative LR (low for the pretrained backbone, high for the new heads).
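
The three AquaWave cues named in the method list can be approximated in a few lines of NumPy. This is a minimal sketch, not the repository's implementation: the function names, the 15-pixel patch, the background-light estimate, and the way the cues are stacked are all assumptions. The transmission map follows the usual UDCP idea of taking the dark channel over green and blue only.

```python
import numpy as np

def min_filter(x, k):
    """Sliding k×k minimum with edge padding (k odd)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.full_like(x, np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, xp[dy:dy + h, dx:dx + w])
    return out

def udcp_transmission(img, k=15):
    """UDCP-style transmission estimate from the green/blue dark channel."""
    gb = img[..., 1:]                              # (H, W, 2): green, blue
    dark = min_filter(gb.min(axis=-1), k)
    # Assumed background light: G/B colour at the brightest dark-channel pixel.
    iy, ix = np.unravel_index(np.argmax(dark), dark.shape)
    a = np.maximum(gb[iy, ix], 1e-3)
    norm_dark = min_filter((gb / a).min(axis=-1), k)
    return np.clip(1.0 - norm_dark, 0.0, 1.0)

def grey_world_residual(img):
    """Difference between a grey-world white-balanced image and the input."""
    means = img.reshape(-1, 3).mean(axis=0)
    gain = means.mean() / np.maximum(means, 1e-6)
    return np.clip(img * gain, 0.0, 1.0) - img

def haar_highfreq_energy(img):
    """Energy of the three one-level Haar detail bands (H and W must be even)."""
    g = img.mean(axis=-1)
    a, b = g[0::2, 0::2], g[0::2, 1::2]
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    energy = lh ** 2 + hl ** 2 + hh ** 2
    # Nearest-neighbour upsample back to the input resolution.
    return np.repeat(np.repeat(energy, 2, axis=0), 2, axis=1)

def aquawave_like_cues(img, patch=15):
    """Stack the cues as (H, W, 5): transmission, WB residual (3), Haar energy.

    img is an (H, W, 3) float array in [0, 1]. The channel layout is an
    assumption; how the real stem feeds these into the backbone is not
    specified here.
    """
    t = udcp_transmission(img, patch)[..., None]
    wb = grey_world_residual(img)
    hf = haar_highfreq_energy(img)[..., None]
    return np.concatenate([t, wb, hf], axis=-1).astype(np.float32)
```

None of these operations has a learnable weight, which is consistent with the 0-parameter figure quoted for the stem.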

See PAPER.md and RESEARCH_DOSSIER.md for the full method and reading list.
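
The discriminative learning rate from the training recipe above (low for the pretrained backbone, high for the new heads) is typically expressed with optimizer parameter groups. The sketch below is hypothetical throughout: `ToySegmenter`, the `backbone.` name prefix, the choice of AdamW, and both rates are placeholders, not values taken from this project.

```python
import torch
from torch import nn

class ToySegmenter(nn.Module):
    """Stand-in model: one 'backbone' layer plus mask and edge heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)
        self.mask_head = nn.Conv2d(8, 1, 1)
        self.edge_head = nn.Conv2d(8, 1, 1)

def build_optimizer(model, backbone_lr=1e-5, head_lr=1e-4):
    """Put backbone parameters in a low-LR group and everything else in a high-LR group."""
    backbone, heads = [], []
    for name, param in model.named_parameters():
        (backbone if name.startswith("backbone.") else heads).append(param)
    return torch.optim.AdamW([
        {"params": backbone, "lr": backbone_lr},  # pretrained: small steps
        {"params": heads, "lr": head_lr},         # freshly initialised: larger steps
    ])

optimizer = build_optimizer(ToySegmenter())
```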

Results

CAMO test (standard COD benchmark):

metric	Krypsis-COD
S-measure ↑	`0.748`
weighted-F ↑	`0.643`
E-measure ↑	`0.793`
MAE ↓	`0.114`
IoU ↑	`0.563`
Dice ↑	`0.678`

Held-out (COD10K-heavy, unseen): S-measure 0.864 · MAE 0.037.
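
For readers who want to check the simpler numbers on their own predictions, MAE, IoU and Dice can be computed as below. This is a sketch under stated assumptions: the 0.5 binarisation threshold for IoU/Dice and the function names are illustrative, and the evaluation script behind the table may differ. S-measure, weighted-F and E-measure are more involved and are not reproduced here.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] prediction map and a {0, 1} mask."""
    return float(np.abs(pred - gt).mean())

def iou_dice(pred, gt, thresh=0.5, eps=1e-8):
    """IoU and Dice after binarising the prediction at `thresh` (assumed 0.5)."""
    p = pred >= thresh
    g = gt >= 0.5
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    iou = (inter + eps) / (union + eps)
    dice = (2 * inter + eps) / (p.sum() + g.sum() + eps)
    return float(iou), float(dice)
```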

This is not a SOTA-accuracy record: the SAM-scale giants score higher. It is a strong accuracy-per-byte point, with competitive structure metrics at roughly 1/130 of the size.

Usage

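import json

import numpy as np
import torch
from PIL import Image

from krypsis.cod import KrypsisCOD

# Read the model configuration shipped alongside the weights.
with open("config.json") as f:
    cfg = json.load(f)

# pretrained=False: the checkpoint loaded below supplies the weights.
m = KrypsisCOD(backbone=cfg["backbone"].split()[0], pretrained=False)
m.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
m.eval()

# Resize to the 352×352 input resolution and scale pixel values to [0, 1].
img = np.asarray(Image.open("reef.jpg").convert("RGB").resize((352, 352)), dtype=np.float32) / 255.0
x = torch.from_numpy(img).permute(2, 0, 1)[None]  # HWC -> 1×3×352×352

with torch.no_grad():
    # Sigmoid of the mask output, thresholded at 0.5, gives a boolean mask.
    mask = torch.sigmoid(m(x, want_aux=False)["mask"])[0, 0] > 0.5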
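
The snippet above yields a 352×352 boolean mask. To overlay it on the original photo it usually has to be mapped back to the source resolution; nearest-neighbour resampling keeps it binary. The helper below is a hypothetical convenience function, not part of the `krypsis` package.

```python
import numpy as np
from PIL import Image

def mask_to_original_size(mask_352, original_size):
    """Resize a (352, 352) boolean mask to `original_size`.

    original_size is (width, height), the order PIL's Image.size uses.
    Returns a boolean array of shape (height, width).
    """
    m = Image.fromarray(mask_352.astype(np.uint8) * 255)  # 8-bit greyscale
    m = m.resize(original_size, Image.NEAREST)            # no interpolation blur
    return np.asarray(m) > 127
```

With the variables from the usage code, this would be called as `mask_to_original_size(mask.numpy(), Image.open("reef.jpg").size)`.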

Training & data

Data: Umair2002/COD_CAMO_train_data (~7,000 paired camouflage images+masks) for training; PassbyGrocer/CAMO test for the benchmark.
Compute: single Modal A10G GPU, 30 epochs.
Loss: boundary-weighted structure loss + Dice + deep supervision + edge BCE.
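
One plausible reading of the loss line above is sketched here in PyTorch. It is an assumption-laden illustration rather than the training code: the structure loss uses the boundary-weighted BCE + IoU formulation popularised by F3Net and PraNet, the edge target is a morphological gradient of the mask, and the weights `w_dice` and `w_edge` are placeholders.

```python
import torch
import torch.nn.functional as F

def structure_loss(logits, mask):
    """Boundary-weighted BCE + weighted IoU (F3Net/PraNet-style formulation)."""
    # Pixels whose neighbourhood average differs from their label sit near a boundary.
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(logits, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    pred = torch.sigmoid(logits)
    inter = (pred * mask * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(logits, mask, eps=1.0):
    """Soft Dice loss on sigmoid probabilities."""
    pred = torch.sigmoid(logits)
    inter = (pred * mask).sum(dim=(2, 3))
    denom = pred.sum(dim=(2, 3)) + mask.sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

def edge_target(mask, k=3):
    """Binary boundary map: dilation minus erosion of the mask."""
    dil = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    ero = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)
    return dil - ero

def total_loss(side_logits, edge_logits, mask, w_dice=1.0, w_edge=1.0):
    """Deep supervision over all side outputs, plus BCE on the edge head."""
    loss = 0.0
    for s in side_logits:
        # Side outputs may be lower resolution; bring them to mask size first.
        s = F.interpolate(s, size=mask.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + structure_loss(s, mask) + w_dice * dice_loss(s, mask)
    edge = edge_target(mask)
    return loss + w_edge * F.binary_cross_entropy_with_logits(edge_logits, edge)
```

Here `mask` is a float tensor of shape (N, 1, H, W) with values in {0, 1}, `side_logits` is a list of raw decoder outputs, and `edge_logits` is the raw edge-head output at mask resolution.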

Limitations

Camouflage is hard; this tiny model trails the giant SAM adapters on raw accuracy. Held-out IoU on small COD10K objects is lower than its structure scores suggest. Not a substitute for expert identification.

Citation

@software{krypsis_cod_2026,
  title  = {Krypsis-COD: tiny underwater camouflaged-animal segmentation with a
            pretrained backbone and a zero-parameter physics-frequency prior},
  year   = {2026},
  url    = {https://huggingface.co/ryanrana/krypsis-nano}
}
