	---
	language:
	- en
	license: apache-2.0
	tags:
	- remote-sensing
	- earth-observation
	- cloud-imputation
	- modis
	- prithvi
	- 3d-unet
	- difficulty-index
	datasets:
	- mod09ga
	- esa-worldcover-2021
	---

	# Dual Validation Framework for Analysis-Ready Satellite Data

	This repository contains the curated multitemporal dataset and the trained model weights for the paper: "A Dual Validation Framework for Analysis-Ready Satellite Data: A Scalable Pipeline and Stratified Performance Analysis."

	For the source code, data curation pipeline, and inference scripts, please visit our [GitHub Repository](https://github.com/TadieB/dual_validation_framework).

	## 🌍 Overview
	The convergence of petabyte-scale satellite archives and foundation models requires rigorous validation frameworks. This repository hosts data and models used to demonstrate a novel Dual Validation Framework.

	Our framework introduces a composite Difficulty Index (DI) that combines spatial heterogeneity, phenological variability, and cloud persistence, and uses it to stratify model performance beyond standard aggregate metrics. The dataset targets the cloud-gap imputation task on multitemporal Earth observation data.
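
As a rough illustration of how a composite index of this kind can stratify results, the sketch below min-max normalizes three per-sample components, averages them with equal weights, and splits the samples into three strata by rank. The equal weighting, the tercile split, the toy values, and all names (`difficulty_index`, `stratify`, `mean_by_stratum`) are assumptions made for illustration; the DI defined in the paper may be computed differently.

```python
from typing import Dict, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def difficulty_index(
    heterogeneity: Sequence[float],
    phenology: Sequence[float],
    cloud_persistence: Sequence[float],
) -> List[float]:
    """Hypothetical composite: equal-weight mean of the normalized components."""
    parts = [min_max(heterogeneity), min_max(phenology), min_max(cloud_persistence)]
    return [sum(triple) / 3.0 for triple in zip(*parts)]


def stratify(di: Sequence[float]) -> List[str]:
    """Assign Low / Medium / High labels by rank terciles (assumed split)."""
    order = sorted(range(len(di)), key=lambda i: di[i])
    labels = [""] * len(di)
    for rank, idx in enumerate(order):
        tercile = min(3 * rank // len(di), 2)
        labels[idx] = ("Low", "Medium", "High")[tercile]
    return labels


def mean_by_stratum(errors: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Average a per-sample error within each stratum."""
    out: Dict[str, float] = {}
    for name in ("Low", "Medium", "High"):
        vals = [e for e, lab in zip(errors, labels) if lab == name]
        if vals:
            out[name] = sum(vals) / len(vals)
    return out


# Toy values for six samples (made up, not taken from the dataset).
het = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
phe = [0.0, 0.1, 0.3, 0.6, 0.7, 1.0]
cld = [0.2, 0.1, 0.5, 0.4, 0.9, 0.8]
rmse = [0.01, 0.02, 0.03, 0.03, 0.08, 0.10]

di = difficulty_index(het, phe, cld)
labels = stratify(di)
per_stratum = mean_by_stratum(rmse, labels)
aggregate = sum(rmse) / len(rmse)
```

On these toy values the aggregate error is about 0.045 while the High stratum averages about 0.09, which is the kind of gap a single aggregate number hides.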

	## 📁 Repository Structure

	### 1. Dataset (`/dataset`)
	The dataset is provided in cloud-optimized Zarr format, whose chunked layout supports high-throughput parallel reads suitable for distributed deep learning.
	* Source: MOD09GA (MODIS/Terra Surface Reflectance Daily, Level-2G, Collection 6.1)
	* Spatial Resolution: 500 m
	* Temporal Window: June 14 – July 3, 2021 (20 consecutive days)
	* Region of Interest: Central Europe (lon 0°–20°E, lat 40°–60°N)
	* Ancillary Data: ESA WorldCover 2021 (10 m, for spatial heterogeneity extraction)
	* Structure: `[Time, Bands, Height, Width]` tensor representing surface reflectance and multi-label usability masks.
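
The mapping from the `Time` axis to calendar dates follows from the temporal window listed above. A minimal sketch, assuming index 0 corresponds to June 14, 2021 and one slice per day; how the store actually encodes its time coordinate is not specified here, and `time_axis` is a hypothetical helper:

```python
from datetime import date, timedelta

START = date(2021, 6, 14)  # first day of the temporal window
N_DAYS = 20                # consecutive daily acquisitions


def time_axis(start: date = START, n_days: int = N_DAYS) -> list:
    """Return one calendar date per index along the assumed Time axis."""
    return [start + timedelta(days=i) for i in range(n_days)]


dates = time_axis()
print(dates[0].isoformat(), dates[-1].isoformat(), len(dates))
```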

	### 2. Model Weights (`/experiments_unified`)
	This repository includes the `.pth` PyTorch weights for the models evaluated in the study:
	* 3D U-Net (`unet_3d_best.pth`): Task-specific convolutional architecture trained directly on the 500 m MODIS data. Among the evaluated models, it shows the strongest structural reconstruction and sample efficiency (highest SSIM and lowest RMSE).
	* Prithvi Foundation Model (`prithvi_finetuned.pth` and `prithvi_frozen.pth`): Weights for the fully fine-tuned and frozen variants of the Prithvi-EO-2.0 Vision Transformer. These variants show robust priors for spectral fidelity (lowest SAM).

	## 🚀 Usage
	The weights and Zarr datasets hosted here are intended for use with the data loaders and evaluation engines provided in our GitHub repository.

	Examples of downloading and loading the dataset and models are available in the [GitHub README](https://github.com/TadieB/dual_validation_framework).

	## 📊 Key Findings
	* Stratified Performance: Aggregate metrics obscure severe architectural failure modes. The evaluated deep learning architectures degrade significantly in structurally coherent reconstruction within the High-Difficulty stratum of our DI.
	* Architectural Tradeoffs: The 3D U-Net excels in structural preservation (SSIM) by leveraging local spatial convolutions, whereas the Prithvi foundation model excels in radiometric consistency (SAM) across domain gaps.
	* Evaluation Artifacts: The study highlights the mathematical instability of applying global structural metrics (such as SSIM) to highly structured phenological regions, and it argues for masked pixel-wise evaluation in local imputation tasks.
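
The masked, pixel-wise evaluation referred to above can be sketched as follows. This is a minimal illustration, not the evaluation engine from the GitHub repository: the function names, the `[Bands, Height, Width]` array layout, and the convention that `mask` is `True` on the imputed (cloud-gap) pixels are all assumptions.

```python
import numpy as np


def masked_rmse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """RMSE over masked pixels only. pred/target: [B, H, W]; mask: [H, W] bool."""
    if not mask.any():
        return float("nan")
    diff = pred[:, mask] - target[:, mask]  # shape [B, n_masked]
    return float(np.sqrt(np.mean(diff ** 2)))


def masked_sam(pred: np.ndarray, target: np.ndarray, mask: np.ndarray,
               eps: float = 1e-12) -> float:
    """Mean spectral angle (radians) between per-pixel spectra on masked pixels."""
    if not mask.any():
        return float("nan")
    p = pred[:, mask]
    t = target[:, mask]
    dot = np.sum(p * t, axis=0)
    norm = np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0)
    cos = np.clip(dot / np.maximum(norm, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))


# Toy example: 2 bands on a 2x2 grid, with one pixel inside the gap mask.
target = np.ones((2, 2, 2))
pred = target.copy()
pred[:, 0, 0] = [1.0, 0.0]   # error inside the gap
pred[:, 1, 1] = 5.0          # error outside the gap, ignored by the mask
mask = np.array([[True, False], [False, False]])

rmse = masked_rmse(pred, target, mask)
sam = masked_sam(pred, target, mask)
```

Because both metrics are computed only where `mask` is `True`, the large error at pixel `(1, 1)` does not affect the result. SAM depends only on the direction of each spectrum, so a prediction that is a uniform rescaling of the target has near-zero SAM but non-zero RMSE.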

	## 📝 Citation
	If you use this dataset, the model weights, or the dual validation methodology in your research, please cite our paper:

	```bibtex
	@article{tadie2026dualvalidation,
	title={A Dual Validation Framework for Analysis-Ready Satellite Data: A Scalable Pipeline and Stratified Performance Analysis},
	author={Tadie B. Medimem and Farid Melgani and Sandro Luigi Fiore and Valentine G. Anantharaj},
	journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
	year={2026},
	publisher={IEEE}
	}
	```