arxiv:2603.01576

Cryo-Bench: Benchmarking Foundation Models for Cryosphere Applications

Published on Mar 2 · Submitted by Lalit Maurya on Mar 3

Abstract

Geo-Foundation Models demonstrate adaptable performance across cryospheric applications under varying fine-tuning strategies, showing superior results when encoder parameters are optimized rather than frozen.

AI-generated summary

Geo-Foundation Models (GFMs) have been evaluated across diverse Earth observation tasks spanning multiple domains and have demonstrated strong potential for producing reliable maps even with sparse labels. However, benchmarking GFMs for cryosphere applications has remained limited, primarily due to the lack of suitable evaluation datasets. To address this gap, we introduce Cryo-Bench, a benchmark compiled to evaluate GFM performance across key cryospheric components. Cryo-Bench includes debris-covered glaciers, glacial lakes, sea ice, and calving fronts, spanning multiple sensors and broad geographic regions. We evaluate 14 GFMs alongside UNet and ViT baselines to assess their advantages, limitations, and optimal usage strategies. With a frozen encoder, UNet achieves the highest average mIoU of 66.38, followed by TerraMind at 64.02, across the five evaluation datasets included in Cryo-Bench. In the few-shot setting (10% of the input data), GFMs such as DOFA and TerraMind outperform UNet, achieving mIoU scores of 59.53, 56.62, and 56.60, respectively, compared to UNet's 56.60. When fully fine-tuning GFMs, we observe inconsistent performance across datasets and models. However, tuning the learning rate alongside fine-tuning substantially improves GFM performance; for example, evaluation on two representative datasets (GLID and CaFFe) shows an average relative improvement of 12.77%. Despite having minimal cryosphere representation in their pretraining data, GFMs exhibit notable domain adaptation capabilities and produce meaningful results across tasks. Based on our findings, we recommend encoder fine-tuning with hyperparameter optimization to achieve the best possible performance, and frozen encoders when users need quick results without extensive experimentation. Code: https://github.com/Sk-2103/Cryo-Bench

Community


Cryo‑Bench is the first comprehensive benchmark designed to evaluate how well Geospatial Foundation Models (GFMs) handle one of Earth's most challenging and climate‑critical domains: the cryosphere.

GFMs are rapidly transforming Earth observation research, but one domain remains notably underrepresented: Earth's frozen regions. Cryosphere applications are largely absent from existing GFM pretraining datasets and rarely appear as downstream evaluation tasks. This makes them an ideal stress test for examining a model's ability to generalize to difficult, unseen environments.

🌍𝐖𝐡𝐚𝐭’𝐬 𝐈𝐧𝐬𝐢𝐝𝐞 𝐂𝐫𝐲𝐨‑𝐁𝐞𝐧𝐜𝐡?
We evaluated 14 GFMs and UNet/ViT baselines across five datasets that span four essential cryospheric tasks:
• Supraglacial debris mapping
• Glacial lake detection
• Sea‑ice segmentation
• Calving‑front detection
And we tested them under 3 evaluation protocols:
• Frozen encoder
• Learning‑rate–optimized fine‑tuning
• Few‑shot learning (10% data)
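The first two protocols differ only in which parameters receive gradients. A minimal PyTorch sketch of that distinction is below; the toy `SegModel`, its `encoder`/`decoder` names, and the AdamW choice are illustrative assumptions, not the paper's actual architectures or optimizer settings.

```python
import torch
import torch.nn as nn

class SegModel(nn.Module):
    """Toy encoder-decoder stand-in for a GFM with a segmentation head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(8, 2, 1)  # 2-class segmentation head

    def forward(self, x):
        return self.decoder(self.encoder(x))

def configure(model, protocol, lr=1e-4):
    """Build an optimizer for a given evaluation protocol (hypothetical helper)."""
    if protocol == "frozen":
        for p in model.encoder.parameters():
            p.requires_grad = False          # pretrained encoder stays fixed
        params = model.decoder.parameters()  # only the head is trained
    elif protocol == "finetune":
        params = model.parameters()          # full fine-tuning; lr should be tuned
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return torch.optim.AdamW(params, lr=lr)

model = SegModel()
opt = configure(model, "frozen")
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

In the frozen setting only the decoder's parameters remain trainable, which is why that protocol gives quick results without extensive experimentation, while full fine-tuning exposes many more parameters (and is correspondingly sensitive to the learning rate).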

🔍 𝐊𝐞𝐲 𝐅𝐢𝐧𝐝𝐢𝐧𝐠𝐬
✔️ UNet (frozen encoder) leads with an average mIoU of 66.38, followed closely by TerraMind at 64.02.
✔️ In few-shot settings, models like DOFA and TerraMind outperform UNet, demonstrating impressive adaptability.
✔️ Full fine‑tuning shows inconsistent performance, but—importantly—learning‑rate tuning significantly boosts results, with an average relative improvement of 12.77% on the GLID and CaFFe datasets.
✔️ Despite minimal cryosphere data in pretraining, GFMs show surprisingly strong domain adaptation.
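For reference, the mIoU numbers above are mean Intersection-over-Union: per-class IoU averaged over classes. A minimal NumPy sketch (the skip-absent-classes convention is one common choice; the paper may average differently):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Tiny 2x4 example: class 0 -> IoU 3/4, class 1 -> IoU 4/5
pred   = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
score = mean_iou(pred, target, num_classes=2)  # ≈ 0.775
```

The benchmark scores (e.g. UNet's 66.38) are this quantity, averaged over the evaluation datasets and reported on a 0–100 scale.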

