# Segmentation Heads
A systematic study of segmentation head architectures operating on frozen vision transformer features. Given a dense spatial feature grid from any frozen backbone, what is the most parameter-efficient architecture for per-pixel semantic classification?
Standard practice treats the backbone and segmentation decoder as a joint system. Recent universal encoders produce spatial features of sufficient quality that the backbone can remain frozen while a lightweight head is trained on segmentation data. Under this regime, the head is the only variable.
This repository contains an arena framework for rapid comparison of segmentation head candidates and a collection of architectures spanning conventional decoders through novel minimal-parameter designs. All heads consume the same spatial feature tensor and produce per-pixel class predictions. The reference backbone is EUPE-ViT-B (86M parameters, frozen), but the framework is backbone-agnostic — the same heads can be evaluated against any frozen ViT that produces a stride-16 spatial feature grid.
## Heads
Twelve architectures, all consuming a `[B, 768, H, W]` spatial feature tensor and producing `[B, 150, H_out, W_out]` ADE20K class logits. Each head lives in its own folder under `heads/` with a single `head.py` implementation.
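To make the shared contract concrete, here is a minimal sketch of the `linear_probe` baseline (BatchNorm followed by a 1×1 convolution). Class and argument names are assumptions for illustration, not the repository's actual code:

```python
import torch
from torch import nn

class LinearProbeHead(nn.Module):
    """Sketch of the linear_probe baseline: normalize the frozen features,
    then classify each spatial position with a 1x1 conv (a per-pixel
    linear layer)."""
    def __init__(self, in_dim: int = 768, num_classes: int = 150):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_dim)
        self.classifier = nn.Conv2d(in_dim, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [B, 768, H, W] frozen backbone features
        # returns: [B, 150, H, W] ADE20K class logits
        return self.classifier(self.bn(feats))

feats = torch.randn(2, 768, 32, 32)   # stride-16 grid for a 512x512 crop
logits = LinearProbeHead()(feats)
print(tuple(logits.shape))            # (2, 150, 32, 32)
```

Every other head in the table below implements the same `[B, 768, H, W] → [B, 150, H_out, W_out]` mapping with a different internal mechanism.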
| Name | Architecture | Origin |
|---|---|---|
| `linear_probe` | BatchNorm + 1×1 conv; the EUPE paper baseline | Bolya et al., 2025 (PEspatial recipe) |
| `cofiber_linear` | Adjoint cofiber decomposition + shared 1×1 conv per scale | Original |
| `cofiber_threshold` | Cofiber decomposition + per-scale LayerNorm + prototype classification | Original |
| `prototype_bank` | Per-class learned prototypes, cosine similarity, no conv | Original |
| `wavelet` | Haar wavelet decomposition + per-subband classification | Original |
| `patch_attention` | Each patch attends to its k nearest neighbors before classifying | Original |
| `graph_crf` | k-NN graph in feature space, gated message passing | Original |
| `hypercolumn_linear` | Concatenated features from intermediate ViT blocks + single linear layer | Hariharan et al., 2015 |
| `info_bottleneck` | Projection to d ≪ 768 dimensions, classification from the compressed representation | Original |
| `tropical` | Tropical inner product in place of the standard dot product | Original |
| `compression` | Surprise-based feature modulation + linear classification | Original |
| `curvature` | Discrete Riemannian curvature modulation + linear classification | Original |
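As one example of a conv-free design from the table, `prototype_bank` can be sketched as cosine similarity against one learned prototype per class. The class name and the learnable temperature are assumptions for illustration; the repository's implementation may differ:

```python
import torch
from torch import nn
import torch.nn.functional as F

class PrototypeBankHead(nn.Module):
    """Sketch of prototype_bank: logits are scaled cosine similarities
    between each spatial feature and a bank of per-class prototypes.
    No convolution anywhere."""
    def __init__(self, in_dim: int = 768, num_classes: int = 150):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, in_dim))
        self.logit_scale = nn.Parameter(torch.tensor(10.0))  # assumed temperature

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        f = F.normalize(feats, dim=1)            # [B, C, H, W], unit features
        p = F.normalize(self.prototypes, dim=1)  # [K, C], unit prototypes
        sim = torch.einsum('bchw,kc->bkhw', f, p)
        return self.logit_scale * sim            # [B, K, H, W] class logits

sim_logits = PrototypeBankHead()(torch.randn(2, 768, 16, 16))
print(tuple(sim_logits.shape))                   # (2, 150, 16, 16)
```

The parameter count here is essentially one 150×768 matrix plus a scalar, which is what makes prototype-style heads attractive for the parameter-efficiency question posed above.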
## Arena Framework
`arena.py` runs any head by name against cached ADE20K backbone features. The arena pre-extracts features once, then each candidate trains and evaluates without touching the backbone again. Training uses cross-entropy at 512×512 resolution over the 150-class ADE20K label space; evaluation reports mean Intersection-over-Union (mIoU).
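For reference, the mIoU metric the arena reports can be sketched as an average of per-class IoU over the classes present; the function name and exact averaging/ignore-index conventions are assumptions here and may differ from the repository's evaluation code:

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor,
             num_classes: int = 150, ignore_index: int = 255) -> float:
    """Mean IoU over classes that appear in prediction or target.
    pred/target: integer class maps of identical shape."""
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = (p | t).sum().item()
        if union == 0:
            continue  # class absent from both: skip, don't count as 0 or 1
        ious.append((p & t).sum().item() / union)
    return sum(ious) / max(len(ious), 1)

pred   = torch.tensor([[0, 0], [1, 1]])
target = torch.tensor([[0, 1], [1, 1]])
print(mean_iou(pred, target))   # 0.5833... (IoU 1/2 for class 0, 2/3 for class 1)
```

Because features are cached, a full train-plus-eval cycle for one head touches only the head's own parameters, which is what makes screening twelve candidates tractable.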
## Status
Heads are implemented and importable through the `heads/` registry. The arena screening sweep across all 12 heads has not yet been run on a fresh ADE20K cache; results will be published here when available.