Krypsis-COD: underwater camouflaged-animal segmentation, tiny
A compact (4.8 M params, 4.6 MB int8) segmenter for camouflaged marine animals. The frontier for this task is giant SAM/SAM2 adapters (Dual-SAM, MAS-SAM, SAM2-WaveUNet) at hundreds of MB; Krypsis-COD is ~130× smaller yet built the right way: a pretrained backbone with the boundary + frequency supervision the COD literature shows actually matters.
Held-out COD10K-Aquatic frames (flounder, frogfish, stingray, turtle; not in training): raw input → Krypsis-COD segmentation.
| | |
|---|---|
| Parameters | 4,770,950 (~4.8 M) |
| Size on disk | 19.2 MB fp32 · 9.7 MB fp16 · 4.6 MB int8 |
| Backbone | pvt_v2_b0 (ImageNet-pretrained) |
| Input | 3×352×352 RGB |
| Output | camouflaged-animal mask + edge map |
| Prior stem | AquaWave: 0 learnable params (UDCP + WB-residual + Haar high-freq) |
| CPU latency | 270 ms/image (single thread) |
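The source does not state how the CPU latency figure was measured. As a rough sketch of one way to take such a measurement, the helper below times any zero-argument callable and reports the median after a few warm-up calls. The function name, warm-up count, and repeat count are assumptions for illustration.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_once: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock time of run_once() in milliseconds, after warm-up calls."""
    # Warm-up calls are not timed; they absorb one-off costs such as lazy initialisation.
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For a single-thread PyTorch measurement, one would typically call `torch.set_num_threads(1)` first and pass a lambda that runs the model on one 1×3×352×352 tensor inside `torch.no_grad()`.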
Method (what the literature says actually works)
- Pretrained PVTv2 backbone. Camouflage is a global-context problem; transformer features + ImageNet pretraining are what every competitive COD model relies on.
- AquaWave prior stem (0 params) feeds physics + frequency cues straight into the backbone: an Underwater Dark Channel Prior transmission map, a grey-world white-balance colour residual, and a Haar high-frequency energy map. Light attenuation leaks in colour; camouflage leaks in frequency.
- Edge / boundary co-supervision, the most-cited COD lever: an edge head is supervised by the mask boundary and fused into the mask head.
- Deep multi-level supervision over the FPN decoder.
- Trained at 352² on ~7,000 images (CAMO + COD10K corpus), mixed precision, discriminative LR (low for the pretrained backbone, high for the new heads).
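The list above names the three AquaWave cues but not their formulas. The following NumPy sketch shows one plausible reading of each and is not the shipped implementation. The function names, the 15-pixel patch, the crude background-light estimate, and the 5-channel output layout are all assumptions. Inputs are H×W×3 float arrays in [0, 1] with even height and width.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def min_filter(x: np.ndarray, k: int) -> np.ndarray:
    """k x k sliding minimum with edge padding (k odd)."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    return sliding_window_view(padded, (k, k)).min(axis=(-2, -1))


def udcp_transmission(img: np.ndarray, k: int = 15) -> np.ndarray:
    """Transmission estimate from the green/blue dark channel (UDCP-style)."""
    gb = img[..., 1:3]  # UDCP uses only green and blue; red attenuates fastest in water
    ambient = gb.reshape(-1, 2).max(axis=0) + 1e-6  # crude background light (assumption)
    dark = min_filter((gb / ambient).min(axis=-1), k)
    return 1.0 - dark


def grey_world_residual(img: np.ndarray) -> np.ndarray:
    """Difference between the grey-world balanced image and the input."""
    channel_mean = img.reshape(-1, 3).mean(axis=0) + 1e-6
    gains = channel_mean.mean() / channel_mean
    return img * gains - img


def haar_highfreq_energy(img: np.ndarray) -> np.ndarray:
    """Energy of the three one-level Haar detail bands, upsampled back to H x W."""
    grey = img.mean(axis=-1)
    a, b = grey[0::2, 0::2], grey[0::2, 1::2]
    c, d = grey[1::2, 0::2], grey[1::2, 1::2]
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    energy = lh**2 + hl**2 + hh**2
    return np.repeat(np.repeat(energy, 2, axis=0), 2, axis=1)


def aquawave_priors(img: np.ndarray) -> np.ndarray:
    """Stack the prior maps into an H x W x 5 array (layout is an assumption)."""
    t = udcp_transmission(img)[..., None]
    wb = grey_world_residual(img)
    hf = haar_highfreq_energy(img)[..., None]
    return np.concatenate([t, wb, hf], axis=-1)
```

None of these maps has trainable weights, which is consistent with the "0 params" claim. How the real stem normalises and injects them into the backbone is described in PAPER.md, not here.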
See PAPER.md and RESEARCH_DOSSIER.md for the full method and reading list.
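Edge co-supervision needs a boundary target derived from each ground-truth mask. The source does not say how that target is built. One common choice, sketched here as an assumption, is the morphological gradient (dilation minus erosion) with a 3×3 window; `mask_to_edge` is a hypothetical helper name.

```python
import numpy as np


def mask_to_edge(mask: np.ndarray) -> np.ndarray:
    """Boundary map of a binary H x W mask: 3x3 dilation minus 3x3 erosion."""
    m = mask.astype(bool)
    h, w = m.shape
    padded = np.pad(m, 1, mode="edge")
    # Nine shifted views give every pixel's 3x3 neighbourhood.
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    stacked = np.stack(shifts)
    dilated = stacked.any(axis=0)  # any neighbour is foreground
    eroded = stacked.all(axis=0)   # all neighbours are foreground
    return (dilated & ~eroded).astype(np.float32)
```

The result is a band roughly two pixels wide around the object outline; a wider window gives a thicker, easier-to-learn target.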
Results
CAMO test (standard COD benchmark):
| metric | Krypsis-COD |
|---|---|
| S-measure ↑ | 0.748 |
| weighted-F ↑ | 0.643 |
| E-measure ↑ | 0.793 |
| MAE ↓ | 0.114 |
| IoU ↑ | 0.563 |
| Dice ↑ | 0.678 |
Held-out (COD10K-heavy, unseen): S-measure 0.864 · MAE 0.037.
This is not a SOTA-accuracy record; the SAM-scale giants score higher. It is a strong accuracy-per-byte point: competitive structure metrics at roughly 1/130 the size.
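For readers who want to check the simpler numbers on their own predictions, here is a minimal NumPy sketch of MAE, IoU, and Dice for a single image. The 0.5 threshold and the convention of returning 1.0 when both masks are empty are assumptions, since the source does not state the evaluation protocol. S-measure, E-measure, and weighted-F are more involved and are not covered.

```python
from typing import Tuple

import numpy as np


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a [0, 1] prediction map and a binary mask."""
    return float(np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean())


def iou_and_dice(pred: np.ndarray, gt: np.ndarray, thresh: float = 0.5) -> Tuple[float, float]:
    """IoU and Dice after binarising the prediction at `thresh`."""
    p = pred > thresh
    g = gt > 0.5
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    total = p.sum() + g.sum()
    iou = inter / union if union else 1.0     # both empty: treated as a perfect match
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)
```

Benchmark scores are normally the mean of these per-image values over the whole test set.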
Usage
```python
import json

import numpy as np
import torch
from PIL import Image
from krypsis.cod import KrypsisCOD

with open("config.json") as f:
    cfg = json.load(f)
m = KrypsisCOD(backbone=cfg["backbone"].split()[0], pretrained=False)
m.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
m.eval()

# Resize to the 352x352 input size, scale to [0, 1], and reorder HWC -> 1x3xHxW.
img = np.asarray(Image.open("reef.jpg").convert("RGB").resize((352, 352)), np.float32) / 255.
x = torch.from_numpy(img).permute(2, 0, 1)[None]
with torch.no_grad():
    mask = torch.sigmoid(m(x, want_aux=False)["mask"])[0, 0] > 0.5  # boolean H x W mask
```
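The usage snippet resizes the image to 352×352 before inference, so the resulting mask is also 352×352. To overlay it on the original photo it has to be mapped back to the original resolution. A dependency-light way to do that is nearest-neighbour indexing, sketched below; `resize_mask_nearest` is a hypothetical helper, not part of the `krypsis` package.

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D mask to (out_h, out_w)."""
    in_h, in_w = mask.shape
    # For each output row/column, pick the source index it falls into.
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return mask[rows[:, None], cols[None, :]]
```

With the tensor from the snippet, this would be called as `resize_mask_nearest(mask.numpy(), orig_h, orig_w)`, where `orig_h` and `orig_w` are the dimensions of the image before resizing.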
Training & data
- Data: Umair2002/COD_CAMO_train_data (~7,000 paired camouflage images + masks) for training; PassbyGrocer/CAMOtest for the benchmark.
- Compute: single Modal A10G GPU, 30 epochs.
- Loss: boundary-weighted structure loss + Dice + deep supervision + edge BCE.
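The loss terms listed above can be combined in several ways, and the source does not give the exact weighting. The PyTorch sketch below assumes the boundary-weighted structure loss follows the weighted BCE + weighted IoU formulation common in COD codebases (e.g. F3Net, PraNet), including its conventional constants (31-pixel window, weight factor 5). The function names, the unit `edge_weight`, and the plain summation over decoder levels are assumptions.

```python
import torch
import torch.nn.functional as F


def structure_loss(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Boundary-weighted BCE + IoU: pixels near mask edges receive larger weights."""
    # A pixel differs most from its local average near the object boundary.
    weight = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    bce = F.binary_cross_entropy_with_logits(logits, mask, reduction="none")
    wbce = (weight * bce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))
    prob = torch.sigmoid(logits)
    inter = (prob * mask * weight).sum(dim=(2, 3))
    union = ((prob + mask) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()


def dice_loss(logits: torch.Tensor, mask: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Soft Dice loss on sigmoid probabilities."""
    prob = torch.sigmoid(logits)
    inter = (prob * mask).sum(dim=(2, 3))
    denom = prob.sum(dim=(2, 3)) + mask.sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()


def total_loss(mask_logits_per_level, edge_logits, mask, edge, edge_weight: float = 1.0):
    """Deep supervision: sum mask losses over every decoder level, then add edge BCE."""
    loss = edge_weight * F.binary_cross_entropy_with_logits(edge_logits, edge)
    for logits in mask_logits_per_level:
        # Side outputs may be lower resolution; upsample to the mask size first.
        logits = F.interpolate(logits, size=mask.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + structure_loss(logits, mask) + dice_loss(logits, mask)
    return loss
```

All tensors are expected as N×1×H×W floats, with `mask` and `edge` holding values in {0, 1}.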
Limitations
Camouflage is hard; this tiny model trails the giant SAM adapters on raw accuracy. Held-out IoU on small COD10K objects is lower than its structure scores suggest. Not a substitute for expert identification.
Citation
@software{krypsis_cod_2026,
title = {Krypsis-COD: tiny underwater camouflaged-animal segmentation with a
pretrained backbone and a zero-parameter physics-frequency prior},
year = {2026},
url = {https://huggingface.co/ryanrana/krypsis-nano}
}
