phanerozoic committed on
Commit
a2e791d
·
verified ·
1 Parent(s): ddfe4a1

Document the 12 head architectures and arena protocol

  1. README.md +27 -0
README.md CHANGED
@@ -18,3 +18,30 @@ A systematic study of segmentation head architectures operating on frozen vision
  Standard practice treats the backbone and segmentation decoder as a joint system. Recent universal encoders produce spatial features of sufficient quality that the backbone can remain frozen while a lightweight head is trained on segmentation data. Under this regime, the head is the only variable.
 
  This repository contains an arena framework for rapid comparison of segmentation head candidates and a collection of architectures spanning conventional decoders through novel minimal-parameter designs. All heads consume the same spatial feature tensor and produce per-pixel class predictions. The reference backbone is [EUPE-ViT-B](https://huggingface.co/facebook/EUPE-ViT-B) (86M parameters, frozen), but the framework is backbone-agnostic: the same heads can be evaluated against any frozen ViT that produces a stride-16 spatial feature grid.
+
+ ## Heads
+
+ Twelve architectures, all consuming a `[B, 768, H, W]` spatial feature tensor and producing `[B, 150, H_out, W_out]` ADE20K class logits. Each head lives in its own folder under `heads/` with a single `head.py` implementation.
+
+ | Name | Architecture | Origin |
+ |------|-------------|--------|
+ | `linear_probe` | BatchNorm + 1×1 conv. The EUPE paper baseline. | Bolya et al., 2025 (PEspatial recipe) |
+ | `cofiber_linear` | Adjoint cofiber decomposition + shared 1×1 conv per scale | Original |
+ | `cofiber_threshold` | Cofiber decomposition + per-scale LayerNorm + prototype classification | Original |
+ | `prototype_bank` | Per-class learned prototypes, cosine similarity, no conv | Original |
+ | `wavelet` | Haar wavelet decomposition + per-subband classification | Original |
+ | `patch_attention` | Each patch attends to its k nearest neighbors before classifying | Original |
+ | `graph_crf` | k-NN graph in feature space, gated message passing | Original |
+ | `hypercolumn_linear` | Concatenate features from intermediate ViT blocks, single linear layer | Hariharan et al., 2015 |
+ | `info_bottleneck` | Project to d ≪ 768 dimensions, classify from the compressed representation | Original |
+ | `tropical` | Tropical inner product replaces standard dot product | Original |
+ | `compression` | Surprise-based feature modulation + linear classification | Original |
+ | `curvature` | Discrete Riemannian curvature modulation + linear classification | Original |
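
To make the shared input/output contract concrete, here is a minimal PyTorch sketch of what a `linear_probe`-style head could look like: BatchNorm over the 768 channels followed by a 1×1 convolution to 150 classes. The class name `LinearProbeHead`, its constructor arguments, and the bilinear upsampling step are assumptions for illustration; the actual `heads/linear_probe/head.py` may differ.

```python
import torch
import torch.nn.functional as F
from torch import nn


class LinearProbeHead(nn.Module):
    """Hypothetical sketch: BatchNorm over channels, then a 1x1 conv classifier."""

    def __init__(self, in_channels: int = 768, num_classes: int = 150):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [B, 768, H, W] -> logits: [B, 150, H, W]
        return self.classifier(self.norm(feats))


head = LinearProbeHead()

# A 512x512 input at stride 16 yields a 32x32 feature grid.
feats = torch.randn(2, 768, 32, 32)
logits = head(feats)

# Assumption: logits are upsampled bilinearly to label resolution before the loss.
upsampled = F.interpolate(logits, size=(512, 512), mode="bilinear", align_corners=False)

# Trainable parameters only (BatchNorm running statistics are buffers).
num_params = sum(p.numel() for p in head.parameters())
```

Under these assumptions the head has 768 × 2 BatchNorm parameters plus 768 × 150 + 150 convolution parameters, which is small next to the 86M-parameter frozen backbone.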
+
+ ## Arena Framework
+
+ `arena.py` runs any head by name against cached ADE20K backbone features. The arena pre-extracts features once, then each candidate trains and evaluates without touching the backbone again. Training is cross-entropy at 512×512 resolution against the 150-class ADE20K label space; evaluation reports mean Intersection-over-Union (mIoU).
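
For reference, mIoU can be computed from a confusion matrix as sketched below. This is an illustrative NumPy version, not the code in `arena.py`: the function name `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, and the choice to average only over classes that appear in the prediction or the ground truth are all assumptions.

```python
import numpy as np


def mean_iou(pred, target, num_classes=150, ignore_index=255):
    """Mean IoU over classes present in pred or target (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels marked as unlabeled (assumed convention).
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # confusion[t, p] counts pixels with true class t predicted as class p.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    # Skip classes that never occur, so they do not count as IoU = 0.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Tiny two-class example with one ignored pixel.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 255]])
score = mean_iou(pred, target, num_classes=2)
```

In the example, each class has one correctly labeled pixel and a union of two pixels, so both per-class IoUs are 0.5. For a dataset-level score, the confusion matrix would typically be accumulated over all validation images before the IoU is taken.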
44
+
45
+ ## Status
46
+
47
+ Heads are implemented and importable through the `heads/` registry. The arena screening sweep across all 12 heads has not yet been run on a fresh ADE20K cache; results will be published here when available.