|
|
--- |
|
|
pipeline_tag: image-feature-extraction |
|
|
license: mit |
|
|
datasets: |
|
|
- IGNF/FLAIR-HUB |
|
|
--- |
|
|
|
|
|
# RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation |
|
|
|
|
|
[Paper](https://huggingface.co/papers/2512.05025) | [Code](https://github.com/nicolashoudre/RAMEN) |
|
|
|
|
|
RAMEN is a resolution-adjustable multimodal encoder that learns a shared visual representation across Earth Observation (EO) data in a fully sensor-agnostic manner. It treats modality and spatial/temporal resolutions as key input features, enabling coherent analysis across modalities. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost. |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://github.com/nicolashoudre/RAMEN/raw/main/.figures/Intro_RAMEN.png" alt="RAMEN workflow" width="400"/> |
|
|
</p> |
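
As a quick illustration of the resolution-adjustable idea, here is a minimal usage sketch. The class name `RamenEncoder`, the checkpoint identifier, and the `input_gsd`/`output_gsd` arguments are assumptions made for illustration, not the actual API; the real interface is documented in the [code repository](https://github.com/nicolashoudre/RAMEN).

```python
import torch

# Hypothetical interface sketch -- class name, checkpoint id, and argument
# names are illustrative, not the actual RAMEN API.
from ramen import RamenEncoder

encoder = RamenEncoder.from_pretrained("nicolashoudre/RAMEN")  # assumed id

# A Sentinel-2 patch: (batch, channels, height, width) at 10 m GSD.
s2_patch = torch.randn(1, 10, 128, 128)

# Spatial resolution is an *output* parameter: requesting a coarser feature
# map (e.g. 40 m instead of 10 m) trades spatial detail for compute.
features = encoder(
    s2_patch,
    input_gsd=10.0,   # native ground sampling distance of the input
    output_gsd=40.0,  # desired resolution of the returned feature map
)
```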
|
|
|
|
|
## Key features |
|
|
|
|
|
- 🛰️ **Sensor-agnostic foundation model**: RAMEN supports any multispectral, SAR, or elevation modality: just specify the input shape, channel count, and native spatial resolution (GSD)!
|
|
- 🔧 **Adjustable feature map resolution**: Customize the resolution of feature maps to suit specific downstream tasks and computational constraints. |
|
|
- 🌍 **Multimodal data fusion**: Effectively combine data from multiple modalities into a unified representation (see the sketch after this list).
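
To make the sensor-agnostic and fusion features concrete, below is a hedged sketch of what a multimodal call could look like. The dictionary keys, channel counts, and per-modality GSD metadata are assumptions for illustration only; refer to the repository for the real interface.

```python
import torch

from ramen import RamenEncoder  # hypothetical import, as above

encoder = RamenEncoder.from_pretrained("nicolashoudre/RAMEN")  # assumed id

# Each modality is described only by its tensor shape and native GSD,
# so any multispectral, SAR, or elevation source can be plugged in.
inputs = {
    "s2":  torch.randn(1, 10, 128, 128),  # multispectral, 10 m GSD
    "s1":  torch.randn(1, 2, 128, 128),   # SAR (VV/VH), 10 m GSD
    "dem": torch.randn(1, 1, 256, 256),   # elevation, 5 m GSD
}
input_gsd = {"s2": 10.0, "s1": 10.0, "dem": 5.0}

# All modalities are assumed to be fused into one shared feature map,
# returned at the requested output resolution (here 20 m).
fused = encoder(inputs, input_gsd=input_gsd, output_gsd=20.0)
```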
|
|
|
|
|
## PANGAEA Bench evaluation |
|
|
|
|
|
All downstream-task results reported for RAMEN were obtained using the [PANGAEA](https://github.com/VMarsocci/pangaea-bench) benchmark. The table below summarizes the main results on eight tasks (scores are mIoU, %).
|
|
|
|
|
| Model | BurnSr | MADOS | PASTIS | Sen1Fl11 | DEN | CTM-SS | SN7 | AI4Farms | Avg. mIoU | Avg. Rank | |
|
|
|-------|---------|--------|--------|----------|------|--------|------|-----------|-----------|-----------| |
|
|
| CROMA | 82.42 | 67.55 | 32.32 | 90.89 | 38.29 | 49.38 | 59.28 | 25.65 | 55.72 | 6.50 | |
|
|
| DOFA | 80.63 | 59.58 | 30.02 | 89.37 | 39.29 | 51.33 | **61.84** | 27.07 | 54.89 | 7.50 | |
|
|
| TerraMind-B | 82.42 | 69.52 | 40.51 | 90.62 | 37.87 | **55.80** | 60.61 | 28.12 | 58.18 | 4.25 | |
|
|
| TerraMind-L | 82.93 | **75.57** | **43.13** | 90.78 | 37.89 | 55.04 | 59.98 | 27.47 | 59.10 | 3.75 | |
|
|
| **RAMEN (ours)** | **85.02** | 69.72 | 42.29 | **91.03** | **39.85** | 53.27 | 60.31 | **38.78** | **60.03** | **2.63** | |
|
|
|
|
|
More information on reproducing these results and integrating RAMEN into PANGAEA can be found in the [`pangaea-bench`](https://github.com/nicolashoudre/RAMEN/tree/main/pangaea-bench) folder.
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use RAMEN, please cite our paper: |
|
|
|
|
|
```bibtex |
|
|
@article{RAMEN, |
|
|
title={{RAMEN}: Resolution-Adjustable Multimodal Encoder for Earth Observation}, |
|
|
author={Nicolas Houdré and Diego Marcos and Hugo Riffaud de Turckheim and Dino Ienco and Laurent Wendling and Camille Kurtz and Sylvain Lobry}, |
|
|
journal={arXiv preprint arXiv:2512.05025}, |
|
|
year={2025} |
|
|
} |
|
|
``` |