Leveraging Multimodal Geospatial Foundation Models for Near-Real-Time Global Flood Mapping

by mirelatulbure - opened

Floods are the deadliest and most economically disruptive weather-related hazard worldwide. Timely flood-extent maps derived from Earth Observation (EO) data provide equitable, globally consistent coverage and are therefore crucial for emergency response managers, city planners, and residents preparing for and recovering from floods. Historical flood maps allow us to quantify damage and identify areas that have flooded in the past and are thus more likely to flood again. Yet global flood intelligence remains limited by the scarcity of labeled data, cloud cover, inconsistent sensor coverage, region-specific model failures, and the difficulty of deploying robust models immediately after an event.

In 2024, the warmest year on record, extreme flood events affected communities across five continents. Floods in Europe (below) after Storm Boris, September 2024. Photos: New York Times.

[Photos: flooding in Poland and Romania]

Self-supervised Geospatial Foundation Models (GFMs) such as ESA–IBM TerraMind (TM; Jakubik et al. 2025) promise to overcome these challenges by learning generalizable representations from massive archives of multisensor Sentinel-1 (S1, Synthetic Aperture Radar, SAR) and Sentinel-2 (S2, optical) satellite imagery. GFMs also promise to generalize better than task-specific supervised deep learning models.

Yet, despite rapid progress (more than 50 GFMs to date; Lu et al. 2024), the development of geospatial GFMs has not advanced the science of task-specific applications (Longepe et al. 2025). There has been no systematic, global-scale evaluation of multimodal GFMs for flood-extent mapping, and little understanding of how they compare with widely used deep learning models such as the U-Net.

This study fills that gap, offering the first comprehensive benchmark of TM for global operational flood mapping using FloodsNet, a harmonized multimodal dataset of 85 flood events, assembled from four existing datasets: WorldFloods (Mateo-Garcia et al. 2021), Sen1Floods11 (Bonafilia et al. 2020), the U.S. Geological Survey Flood Training dataset (Sleeter et al. 2020), and the United Nations Operational Satellite Applications Programme dataset (Nemni et al. 2020). Here, we focus on multi-sensor, observation-based flood mapping to accurately represent past and current flood events.

We fine-tuned four TM backbone configurations (base vs. large models; frozen vs. unfrozen backbones) and compared them against the TM Sen1Floods11 example and a U-Net trained on both FloodsNet and Sen1Floods11.
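The frozen-vs-unfrozen distinction boils down to whether the pretrained encoder's weights are updated during fine-tuning, or only the segmentation head's. The sketch below illustrates this in PyTorch with a toy encoder and head; the module names (`backbone`, `head`, `SegModel`) are placeholders for illustration, not TerraMind's actual API, which is accessed through the TerraTorch library in practice.

```python
import torch
from torch import nn

class SegModel(nn.Module):
    """Toy stand-in for a pretrained encoder plus segmentation head.
    The real study fine-tunes TerraMind backbones; this sketch only
    illustrates the frozen-vs-unfrozen configurations with generic modules."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(8, 2, 1)  # 2 classes: water / non-water

    def forward(self, x):
        return self.head(self.backbone(x))

def configure(model, freeze_backbone):
    """Freeze or unfreeze the backbone, then build an optimizer over
    only the parameters that will actually be updated."""
    for p in model.backbone.parameters():
        p.requires_grad = not freeze_backbone
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4)

frozen = SegModel()
opt_frozen = configure(frozen, freeze_backbone=True)      # only the head trains
unfrozen = SegModel()
opt_unfrozen = configure(unfrozen, freeze_backbone=False)  # full fine-tuning
```

A frozen backbone trains far fewer parameters per step, which is why it is cheaper; unfreezing lets the encoder adapt its representations to the flood task, at higher compute cost.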

Our main results include:

  • The base-unfrozen configuration provided the best balance of accuracy, precision, and recall at lower computational cost than the large model (Table 1, below).
  • The large unfrozen model achieved the highest recall. Models trained on FloodsNet outperformed the Sen1Floods11-trained example in recall, while achieving similar overall accuracy.
  • U-Net achieved higher recall than all TM configurations, although with slightly lower accuracy and precision.

Table 1. Performance metrics per TerraMind backbone configuration (blue rows: frozen; yellow rows: unfrozen). The four models were trained on FloodsNet and tested on FloodsNet (darker tones) vs. Sen1Floods11 (lighter tones). The model in the TerraMind Example (green rows) was trained on Sen1Floods11 and tested on FloodsNet vs. Sen1Floods11. Values in bold are the best accuracy metrics achieved. Precision and recall refer to the class of interest (“water”).
[Table 1: performance metrics per TerraMind configuration]
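As noted in the Table 1 caption, precision and recall are computed for the "water" class only, not averaged over classes. A minimal sketch of these per-class metrics for binary flood masks (1 = water, 0 = non-water), assuming NumPy integer arrays:

```python
import numpy as np

def water_metrics(pred, ref):
    """Precision, recall, and overall accuracy, with precision and
    recall computed for the positive 'water' class (label 1)."""
    tp = np.sum((pred == 1) & (ref == 1))  # water correctly detected
    fp = np.sum((pred == 1) & (ref == 0))  # false alarms
    fn = np.sum((pred == 0) & (ref == 1))  # missed water
    tn = np.sum((pred == 0) & (ref == 0))  # non-water correctly detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / pred.size
    return precision, recall, accuracy

ref = np.array([[1, 1, 0, 0],
                [1, 0, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 0, 1, 0]])
p, r, a = water_metrics(pred, ref)  # p = 2/3, r = 2/3, a = 0.75
```

For flood response, recall (missed water) and precision (false alarms) trade off differently: a high-recall map risks over-warning, while a high-precision map risks missing inundated areas.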

Figure 1 (below). Predicted flood maps using the TerraMind base–unfrozen configuration fine-tuned on FloodsNet for the USA 2019 flood event. Columns show: (1) Sentinel-2 false-color composite (R: SWIR1, G: NIR narrow, B: Red), (2) reference label (white = water, black = non-water), (3) predicted flood extent (grey = no data), and (4) model confidence (softmax probabilities).
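The confidence column in Figure 1 is the softmax probability of each pixel's winning class. A minimal sketch of how such a per-pixel prediction and confidence map can be derived from raw model logits (the function name and array layout, classes × H × W, are illustrative assumptions):

```python
import numpy as np

def confidence_map(logits):
    """Per-pixel class prediction and confidence from raw logits of
    shape (classes, H, W). Confidence is the softmax probability
    assigned to the predicted (argmax) class at each pixel."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    pred = probs.argmax(axis=0)   # predicted class map
    conf = probs.max(axis=0)      # probability of the predicted class
    return pred, conf
```

Low-confidence pixels often cluster along water boundaries and in mixed or shadowed terrain, which makes such maps useful for flagging where a predicted flood extent should be treated with caution.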


Key takeaways:

  • Integrating freely available, complementary, all-weather cloud-penetrating radar and optical (S1 & S2) data is essential for global flood applications.
  • GFMs, such as TM, have the potential to reduce the compute and data needed to train models that produce accurate flood maps.
  • Flood maps produced from GFMs have higher precision but lower recall than those from the U-Net.
  • A paradigm shift is needed from technology-centric (model-centric) development to impact-driven development focused on downstream tasks.
  • As with the deep learning trajectory around 2016, GFMs have seen incremental improvements with limited consideration of their societal, environmental, and economic impacts.
  • A stronger focus is needed on the deployment phase (downstream tasks), widely regarded as the ultimate goal of Responsible AI in EO (Ghamisi et al. 2025).

A preprint of this manuscript is available on arXiv: https://arxiv.org/abs/2512.02055. The manuscript is currently under review in the ISPRS Journal of Photogrammetry and Remote Sensing.

Contact points:
Dr. Júlio Caineta (julio.caineta@ncsu.edu), Postdoctoral Fellow, North Carolina State University, USA
Dr. Mirela G. Tulbure (mtulbure@ncsu.edu), Professor of Geospatial Analytics, North Carolina State University, USA

References:

  • Bonafilia, Derrick, Beth Tellman, Tyler Anderson, and Erica Issenberg. 2020. “Sen1Floods11: A Georeferenced Dataset to Train and Test Deep Learning Flood Algorithms for Sentinel-1.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 835–45. https://doi.org/10.1109/CVPRW50498.2020.00113.
  • Ghamisi, Pedram, Weikang Yu, Andrea Marinoni, et al. 2025. “Responsible Artificial Intelligence for Earth Observation: Achievable and Realistic Paths to Serve the Collective Good.” IEEE Geoscience and Remote Sensing Magazine, 2–26. https://doi.org/10.1109/MGRS.2025.3529726.
  • Jakubik, Johannes, Felix Yang, Benedikt Blumenstiel, et al. 2025. “TerraMind: Large-Scale Generative Multimodality for Earth Observation.” arXiv:2504.11171. Preprint, arXiv, July 4. https://doi.org/10.48550/arXiv.2504.11171.
  • Lu, Siqi, Junlin Guo, James R. Zimmer-Dauphinee, et al. 2024. “AI Foundation Models in Remote Sensing: A Survey.” arXiv:2408.03464. Preprint, arXiv, August 6. https://doi.org/10.48550/arXiv.2408.03464.
  • Mateo-Garcia, Gonzalo, Joshua Veitch-Michaelis, Lewis Smith, et al. 2021. “Towards Global Flood Mapping Onboard Low Cost Satellites with Machine Learning.” Scientific Reports 11 (1): 1–12. https://doi.org/10.1038/s41598-021-86650-z.
  • Nemni, Edoardo, Joseph Bullock, Samir Belabbes, and Lars Bromley. 2020. “Fully Convolutional Neural Network for Rapid Flood Segmentation in Synthetic Aperture Radar Imagery.” Remote Sensing 12 (16): 2532. https://doi.org/10.3390/rs12162532.
  • Longepe, Nicolas, Hamed Alemohammad, Anca Anghelea, et al. 2025. “Earth Action in Transition: Highlights from the 2025 ESA-NASA International Workshop on AI Foundation Models for EO.” Authorea, October 14. https://doi.org/10.22541/au.175346055.53428479/v2.
  • Sleeter, Rachel, Elizabeth Carter, John W Jones, et al. 2020. “Satellite-Derived Training Data for Automated Flood Detection in the Continental U.S.” U.S. Geological Survey. https://doi.org/10.5066/P9C7HYRV.

Acknowledgements. This work was supported by a NASA Terrestrial Hydrology Project (Grant Number 80NSSC21K0980).
