Model Card for CARL

CARL is a camera-agnostic, general-purpose feature extractor for spectral image analysis. It learns representations that transfer across spectral sensors with different channel counts and wavelength ranges, and supports downstream tasks such as:

classification
semantic segmentation
regression

This model is designed for spectral imagery, including RGB, multispectral, and hyperspectral data.

Model description

CARL consists of a spectral encoder and a spatial encoder, which together enable spatio-spectral feature extraction. Crucially, any transformer-based spatial encoder can be integrated, for example EVA-02, DINOv2, DINOv3, or Perception Encoder.

A dedicated self-supervised pretraining strategy (CARL-SSL) bridges strong pretrained spatial encoders to the spectral encoder. This allows flexible input channel counts while preserving the robust feature extraction of the spatial encoder.
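To illustrate how a variable number of channels can be mapped to a fixed-size representation, the following is a toy sketch in NumPy. It is not the CARL spectral encoder: the sinusoidal wavelength encoding, the single learned query, and the function names `wavelength_encoding` and `aggregate_channels` are all assumptions made for illustration. See the repository for the actual implementation.

```python
import numpy as np

def wavelength_encoding(wavelengths_um, dim):
    """Sinusoidal encoding of wavelengths, (B, C) -> (B, C, dim). Illustrative only."""
    freqs = 2.0 ** np.arange(dim // 2)                      # (dim/2,)
    angles = wavelengths_um[..., None] * freqs * np.pi      # (B, C, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def aggregate_channels(channel_tokens, wavelengths_um, query):
    """Attention-pool C channel tokens into one vector per sample.

    channel_tokens: (B, C, D), wavelengths_um: (B, C), query: (D,)
    Returns an array of shape (B, D) that does not depend on C.
    """
    dim = channel_tokens.shape[-1]
    # Hypothetical design: inject wavelength information additively.
    keys = channel_tokens + wavelength_encoding(wavelengths_um, dim)
    scores = keys @ query / np.sqrt(dim)                    # (B, C)
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over channels
    return np.einsum("bc,bcd->bd", weights, keys)

rng = np.random.default_rng(0)
query = rng.normal(size=16)

# A 3-channel (RGB-like) input and a 100-channel (hyperspectral-like) input.
rgb_wl = np.tile([0.65, 0.55, 0.45], (2, 1))
hsi_wl = np.tile(np.linspace(0.4, 1.0, 100), (2, 1))
rgb = aggregate_channels(rng.normal(size=(2, 3, 16)), rgb_wl, query)
hsi = aggregate_channels(rng.normal(size=(2, 100, 16)), hsi_wl, query)
print(rgb.shape, hsi.shape)
```

Both inputs yield the same output shape, so a spatial encoder that expects a fixed feature dimension could consume either one.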

Published models:

CARL-EVA02-B: a CARL model with an EVA-02-B spatial encoder, pretrained on remote sensing data using the CARL-SSL self-supervised strategy.

Other configurations and pretrained models will be added in the future.

Model architecture

The GitHub repository contains the full implementation of the CARL architecture, including the spectral encoder, spatial encoder integration, and downstream heads for classification and segmentation.

Expected inputs

CARL expects:

images with shape (B, C, H, W)
wavelengths with shape (B, C)
wavelengths expressed in micrometers
normalized image inputs (for example using dataset-level mean/std or per-image normalization)

Here, C denotes the spectral channel dimension and may vary across sensors.
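The input requirements above can be sketched as a small preprocessing helper. This is a minimal example under stated assumptions: `prepare_inputs` is a hypothetical function (not part of the CARL code base), band centers are assumed to be given in nanometers, and per-image, per-channel normalization is chosen as one of the options mentioned above.

```python
import numpy as np

def prepare_inputs(images, wavelengths_nm, eps=1e-6):
    """Normalize images and build the wavelength array.

    images: (B, C, H, W) raw values.
    wavelengths_nm: (C,) band center wavelengths in nanometers (assumption).
    Returns normalized images (B, C, H, W) and wavelengths in micrometers (B, C).
    """
    images = np.asarray(images, dtype=np.float32)
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=np.float32)
    if images.ndim != 4 or images.shape[1] != wavelengths_nm.shape[0]:
        raise ValueError("expected images (B, C, H, W) and one wavelength per channel")

    # Per-image, per-channel normalization over the spatial dimensions.
    mean = images.mean(axis=(2, 3), keepdims=True)
    std = images.std(axis=(2, 3), keepdims=True)
    normalized = (images - mean) / (std + eps)

    # Convert nanometers to micrometers and repeat for every sample in the batch.
    batch, channels = images.shape[:2]
    wavelengths_um = np.broadcast_to(wavelengths_nm / 1000.0, (batch, channels)).copy()
    return normalized, wavelengths_um

rng = np.random.default_rng(0)
raw = rng.uniform(0, 4000, size=(2, 4, 8, 8))  # synthetic 4-band batch
images, wavelengths = prepare_inputs(raw, [490, 560, 665, 842])
print(images.shape, wavelengths.shape)
```

Dataset-level statistics can be substituted for the per-image mean and standard deviation where they are available.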

Outputs

CARL produces spatio-spectral feature maps that can be used for downstream tasks. For example, the output of the spatial encoder can be pooled and fed into a linear head for classification, or passed through a segmentation head for pixel-wise predictions.
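The two uses described above can be sketched as follows. This is a hedged illustration, not the heads shipped in the repository: it assumes the spatial encoder returns patch tokens of shape (B, N, D) laid out on a regular grid, and the function names, the nearest-neighbor upsampling, and the random weights are placeholders.

```python
import numpy as np

def classification_logits(tokens, weight, bias):
    """Mean-pool patch tokens (B, N, D) and apply a linear head -> (B, K)."""
    pooled = tokens.mean(axis=1)
    return pooled @ weight + bias

def segmentation_logits(tokens, grid_hw, weight, bias, patch_size):
    """Apply a per-patch linear head, then upsample to pixels -> (B, K, H, W)."""
    batch, num_tokens, _ = tokens.shape
    grid_h, grid_w = grid_hw
    if num_tokens != grid_h * grid_w:
        raise ValueError("number of tokens does not match the patch grid")
    logits = (tokens @ weight + bias).reshape(batch, grid_h, grid_w, -1)
    # Nearest-neighbor upsampling: repeat each patch prediction over its pixels.
    logits = logits.repeat(patch_size, axis=1).repeat(patch_size, axis=2)
    return logits.transpose(0, 3, 1, 2)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 16, 32))   # stand-in for encoder output, 4x4 patch grid
weight = rng.normal(size=(32, 5))       # untrained head with 5 classes
bias = np.zeros(5)

cls_out = classification_logits(tokens, weight, bias)
seg_out = segmentation_logits(tokens, (4, 4), weight, bias, patch_size=8)
print(cls_out.shape, seg_out.shape)
```

In practice a learned decoder usually replaces the nearest-neighbor upsampling for segmentation; the sketch only shows how the shapes relate.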

Evaluation

Results

See the associated paper for details on the evaluation protocols.

The following results are obtained using linear probing based on the CARL-SSL checkpoint.

Dataset	m-ben	m-eurosat	m-forestnet	m-crop-type	SegMunich	Wuhan	LoveDA Rural	WHU-OHS	Avg. rank (vs. 6 models)
CARL-EVA02-B	69.0	84.4	47.0	26.5	38.9	21.5	21.7	21.7	1.6
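Linear probing, as used for the results above, trains only a linear layer on top of frozen encoder features. The sketch below shows the general technique with softmax regression on synthetic features. It does not reproduce the paper's protocol: the data, the hyperparameters, and the `linear_probe` function are assumptions for illustration.

```python
import numpy as np

def linear_probe(features, labels, num_classes, lr=0.05, steps=300):
    """Fit a softmax-regression head on frozen features (N, D) with integer labels (N,)."""
    num_samples, dim = features.shape
    weight = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ weight + bias
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / num_samples                 # cross-entropy gradient
        weight -= lr * features.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weight, bias

# Synthetic stand-in for frozen encoder features: three noisy clusters.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
centers = 2.0 * rng.normal(size=(3, 8))
features = centers[labels] + rng.normal(size=(300, 8))

# Fit on the first 200 samples, evaluate on the remaining 100.
weight, bias = linear_probe(features[:200], labels[:200], num_classes=3)
predictions = (features[200:] @ weight + bias).argmax(axis=1)
accuracy = (predictions == labels[200:]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the encoder stays frozen, the features can be computed once and cached before the head is trained.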

Repository and paper

Paper: https://arxiv.org/abs/2504.19223
Code and training pipeline: see the CARL GitHub repository

Citation

@inproceedings{
baumann2026carl,
title={{CARL}: Camera-Agnostic Representation Learning for Spectral Image Analysis},
author={Alexander Baumann and Leonardo Ayala and Silvia Seidlitz and Jan Sellner and Alexander Studier-Fischer and Berkin {\"O}zdemir and Lena Maier-Hein and Slobodan Ilic},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=TpbhS1yfz0}
}

