Model Card for CARL

CARL is a camera-agnostic general-purpose feature extractor for spectral image analysis. It learns representations that transfer across spectral sensors with distinct channel counts and wavelength coverages, and supports downstream tasks such as:

  • classification
  • semantic segmentation
  • regression

This model is designed for spectral imagery, including RGB, multispectral, and hyperspectral data.

Model description

CARL consists of a spectral encoder and a spatial encoder, which together enable spatio-spectral feature extraction. Any transformer-based spatial encoder can be integrated, for example EVA-02, DINOv2, DINOv3, or Perception Encoder.

A dedicated self-supervised pretraining strategy bridges strong pretrained spatial encoders to the spectral encoder. This allows flexible input channel counts while preserving robust feature extraction capabilities.

Published models:

  • CARL-EVA02-B: a CARL model with an EVA-02-B spatial encoder, pretrained on remote sensing data using the CARL-SSL self-supervised strategy.

Other configurations and pretrained models will be added in the future.

Model architecture

The GitHub repository contains the full implementation of the CARL architecture, including the spectral encoder, spatial encoder integration, and downstream heads for classification and segmentation.

Expected inputs

CARL expects:

  • images with shape (B, C, H, W)
  • wavelengths with shape (B, C)
  • wavelengths expressed in micrometers
  • normalized image inputs (for example using dataset-level mean/std or per-image normalization)

Here, C denotes the spectral channel dimension and may vary across sensors.
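The input conventions above can be sketched with NumPy as follows. This is a minimal illustration, not code from the CARL repository: the helper name prepare_inputs, the choice of per-image, per-channel normalization, and the example band wavelengths are all assumptions made for the example.

```python
import numpy as np

def prepare_inputs(images, wavelengths_nm, eps=1e-6):
    """Normalize a batch and convert wavelengths from nanometers to micrometers.

    images:         float array, shape (B, C, H, W)
    wavelengths_nm: center wavelength per channel in nm, shape (C,) or (B, C)
    """
    images = np.asarray(images, dtype=np.float32)
    if images.ndim != 4:
        raise ValueError(f"expected (B, C, H, W), got shape {images.shape}")
    B, C = images.shape[:2]

    # Per-image, per-channel normalization over the spatial dimensions
    # (one of the options named above; dataset-level mean/std also works).
    mean = images.mean(axis=(2, 3), keepdims=True)
    std = images.std(axis=(2, 3), keepdims=True)
    images = (images - mean) / (std + eps)

    # Nanometers -> micrometers, broadcast to one row per image: (B, C).
    wl = np.asarray(wavelengths_nm, dtype=np.float32) / 1000.0
    wl = np.broadcast_to(wl, (B, C)).copy()
    return images, wl

# Illustrative 4-channel sensor (blue, green, red, near-infrared).
rng = np.random.default_rng(0)
batch = rng.uniform(0.0, 1.0, size=(2, 4, 32, 32))
x, wavelengths = prepare_inputs(batch, [490.0, 560.0, 665.0, 842.0])
print(x.shape, wavelengths.shape)  # (2, 4, 32, 32) (2, 4)
```

Because C is read from the batch itself, the same helper applies unchanged to an RGB image (C = 3) or a hyperspectral cube with hundreds of channels, as long as one wavelength is supplied per channel.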

Outputs

CARL produces spatio-spectral feature maps that can be used for downstream tasks. For example, the output of the spatial encoder can be pooled and fed into a linear head for classification, or passed through a segmentation head for pixel-wise predictions.
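The two uses described above can be sketched with NumPy. The token layout (B, N, D), the 14 × 14 patch grid, the feature width, and the random head weights are assumptions for illustration only; they are not CARL's documented output format, and the actual heads live in the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for encoder output: B images, N = grid * grid patch tokens, D features.
# These shapes are assumed for the example.
B, grid, D, num_classes = 2, 14, 768, 10
tokens = rng.standard_normal((B, grid * grid, D)).astype(np.float32)

# Randomly initialized linear head; in practice these weights would be trained.
W = (0.01 * rng.standard_normal((D, num_classes))).astype(np.float32)
b = np.zeros(num_classes, dtype=np.float32)

# Classification: average over tokens, then apply the head once per image.
pooled = tokens.mean(axis=1)                 # (B, D)
class_logits = pooled @ W + b                # (B, num_classes)

# Segmentation: apply a linear head per token, then restore the patch grid.
token_logits = tokens @ W + b                # (B, N, num_classes)
seg_logits = token_logits.reshape(B, grid, grid, num_classes).transpose(0, 3, 1, 2)

print(class_logits.shape, seg_logits.shape)  # (2, 10) (2, 10, 14, 14)
```

In this sketch the segmentation logits are at patch resolution; pixel-wise predictions would additionally require upsampling to (H, W).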

Evaluation

Results

See the associated paper for details on the evaluation protocols.

The following results were obtained by linear probing on the CARL-SSL checkpoint.

| Model | m-ben | m-eurosat | m-forestnet | m-crop-type | SegMunich | Wuhan | LoveDA Rural | WHU-OHS | Avg. rank (vs. 6 models) |
|---|---|---|---|---|---|---|---|---|---|
| CARL-EVA02-B | 69.0 | 84.4 | 47.0 | 26.5 | 38.9 | 21.5 | 21.7 | 21.7 | 1.6 |
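Linear probing, as used for the results above, trains only a linear classifier on features from a frozen encoder. The following is a generic single-label sketch on synthetic features. It is not the paper's evaluation protocol, and the function name fit_linear_probe, the hyperparameters, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, lr=0.1, steps=200):
    """Fit a softmax linear classifier on fixed (frozen) features by gradient descent."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Synthetic stand-in for features extracted once by a frozen encoder.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
centers = 3.0 * rng.standard_normal((3, 16))
features = centers[labels] + rng.standard_normal((300, 16))
features = (features - features.mean(axis=0)) / features.std(axis=0)

W, b = fit_linear_probe(features, labels, num_classes=3)
accuracy = ((features @ W + b).argmax(axis=1) == labels).mean()
print(f"train accuracy: {accuracy:.2f}")
```

Since the encoder stays fixed, features can be computed once and cached, and the probe's accuracy then reflects the quality of the representation rather than the capacity of the head.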

Repository and paper

Citation

@inproceedings{
baumann2026carl,
title={{CARL}: Camera-Agnostic Representation Learning for Spectral Image Analysis},
author={Alexander Baumann and Leonardo Ayala and Silvia Seidlitz and Jan Sellner and Alexander Studier-Fischer and Berkin {\"O}zdemir and Lena Maier-Hein and Slobodan Ilic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=TpbhS1yfz0}
}