Prism

A multi-task head that attaches to a frozen vision backbone and produces detection, segmentation, and depth predictions from a single shared feature decomposition. Developed on EUPE-ViT-B (86M parameters, Meta FAIR) but applicable to any backbone that outputs dense spatial features.

The spatial features from the backbone are separated into scale bands by iteratively subtracting a downsampled-then-upsampled copy of the features from the features themselves. Each subtraction isolates the information present at one spatial resolution but absent from the next coarser resolution. This decomposition requires no learned parameters — it is a fixed sequence of average pooling, bilinear upsampling, and subtraction. The resulting scale bands are the input to all task heads.

Each task is a linear projection on the shared scale bands. Detection, segmentation, and depth differ only in their output weight matrices. The decomposition and the per-scale normalization are shared across all tasks and computed once per image.

Image → EUPE-ViT-B (frozen, 86M params) → spatial features [768, H, W]

  Scale decomposition (0 learned params):
    band_0 = features - upsample(pool(features))          stride 16
    band_1 = pool(features) - upsample(pool(pool(features)))   stride 32
    band_2 = pool(pool(features))                          stride 64

  Task outputs (one linear layer each):
    ├── Detection    (768 → 85 per location)    ~65K params
    ├── Segmentation (768 → 150 per location)   ~115K params
    ├── Depth        (768 → 256 per location)   ~197K params
    └── [any spatial task] (768 → N)

All three task heads combined: approximately 377K learned parameters. The decomposition and normalization add zero. For comparison, the standard Argus multi-task heads (FCOS detection + linear segmentation + DPT depth) total approximately 30M parameters on the same backbone.

The scale decomposition is derived from the adjoint structure of the upsample/downsample pair. A machine-checked proof of the decomposition's cross-scale orthogonality properties is available at phanerozoic/threshold-cofiber-detection.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Depth Estimation

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support