Add initial model card for RAMEN
#1 by nielsr (HF Staff) - opened

README.md ADDED
@@ -0,0 +1,47 @@
---
pipeline_tag: image-feature-extraction
license: mit
---

# RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation

[Paper](https://huggingface.co/papers/2512.05025) | [Code](https://github.com/nicolashoudre/RAMEN)

RAMEN is a resolution-adjustable multimodal encoder that learns a shared visual representation across Earth Observation (EO) data in a fully sensor-agnostic manner. It treats modality and spatial/temporal resolution as key input features, enabling coherent analysis across modalities. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost.

<p align="center">
<img src="https://github.com/nicolashoudre/RAMEN/raw/main/.figures/Intro_RAMEN.png" alt="RAMEN workflow" width="400"/>
</p>

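To make the precision/compute trade-off concrete, here is a back-of-the-envelope sketch; the tile extent and GSD values below are illustrative assumptions, not numbers from the paper. The size of the output feature map, and hence the downstream compute, grows quadratically with the ratio of tile extent to requested output GSD.

```python
# Illustrative arithmetic only: the feature-map size, and hence downstream
# cost, scales with (tile_extent / output_gsd) ** 2. Values are made up.
tile_extent_m = 2560  # ground extent of one input tile, in meters
for output_gsd_m in (10, 20, 40):
    cells = (tile_extent_m // output_gsd_m) ** 2
    print(f"{output_gsd_m:>3} m output GSD -> {cells:>5} feature cells")
# 10 m -> 65536 cells; 40 m -> 4096 cells, i.e. 16x fewer to process
```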

## Key features

- 🛰️ **Sensor-agnostic foundation model**: RAMEN supports any multispectral, SAR, or elevation-map modality. Just specify the input shape, number of channels, and original spatial resolution (GSD)!
- 🔧 **Adjustable feature-map resolution**: Customize the resolution of the output feature maps to suit specific downstream tasks and computational constraints (see the sketch after this list).
- 🌍 **Multimodal data fusion**: Effectively combine data from multiple modalities into a unified representation.

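As a concrete illustration of this interface, the sketch below shows the shape of a sensor-agnostic, resolution-adjustable encoding call. It is a toy stand-in, not the released RAMEN API: the `encode` function, its arguments, and the pooling/projection logic are illustrative assumptions; see the [code repository](https://github.com/nicolashoudre/RAMEN) for the actual interface.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a resolution-adjustable encoder -- NOT the RAMEN API.
# It only illustrates the idea: modality metadata (channel count, input GSD)
# comes in with the data, and the output GSD is a user-chosen knob.
def encode(image: torch.Tensor, input_gsd_m: float, output_gsd_m: float) -> torch.Tensor:
    c = image.shape[1]
    # Pixels per output cell so each cell covers output_gsd_m of ground.
    patch = max(1, round(output_gsd_m / input_gsd_m))
    feats = F.avg_pool2d(image, kernel_size=patch)
    # Project any channel count into a shared embedding dimension.
    proj = torch.nn.Conv2d(c, 256, kernel_size=1)
    return proj(feats)

# A 13-band, 10 m GSD Sentinel-2 tile, encoded at a coarser 40 m output GSD:
s2 = torch.randn(1, 13, 128, 128)
feats = encode(s2, input_gsd_m=10.0, output_gsd_m=40.0)
print(feats.shape)  # torch.Size([1, 256, 32, 32])
```

A 2-channel SAR tile at a different input GSD would go through the same call with different metadata, which is what makes the interface sensor-agnostic.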

## PANGAEA Bench evaluation

All downstream task results reported for RAMEN were obtained with the [PANGAEA](https://github.com/VMarsocci/pangaea-bench) benchmark. The main results on eight tasks are summarized below (per-task scores are mIoU; best values in **bold**).

| Model | BurnSr | MADOS | PASTIS | Sen1Fl11 | DEN | CTM-SS | SN7 | AI4Farms | Avg. mIoU | Avg. Rank |
|-------|--------|-------|--------|----------|-----|--------|-----|----------|-----------|-----------|
| CROMA | 82.42 | 67.55 | 32.32 | 90.89 | 38.29 | 49.38 | 59.28 | 25.65 | 55.72 | 6.50 |
| DOFA | 80.63 | 59.58 | 30.02 | 89.37 | 39.29 | 51.33 | **61.84** | 27.07 | 54.89 | 7.50 |
| TerraMind-B | 82.42 | 69.52 | 40.51 | 90.62 | 37.87 | **55.80** | 60.61 | 28.12 | 58.18 | 4.25 |
| TerraMind-L | 82.93 | **75.57** | **43.13** | 90.78 | 37.89 | 55.04 | 59.98 | 27.47 | 59.10 | 3.75 |
| **RAMEN (ours)** | **85.02** | 69.72 | 42.29 | **91.03** | **39.85** | 53.27 | 60.31 | **38.78** | **60.03** | **2.63** |

More information on how to reproduce these results and integrate RAMEN into PANGAEA can be found in the [`pangaea-bench`](https://github.com/nicolashoudre/RAMEN/tree/main/pangaea-bench) folder.

## Citation

If you use RAMEN, please cite our paper:

```bibtex
@article{RAMEN,
  title={{RAMEN}: Resolution-Adjustable Multimodal Encoder for Earth Observation},
  author={Nicolas Houdré and Diego Marcos and Hugo Riffaud de Turckheim and Dino Ienco and Laurent Wendling and Camille Kurtz and Sylvain Lobry},
  journal={arXiv preprint arXiv:2512.05025},
  year={2025}
}
```