---
license: cc-by-nc-4.0
tags:
- hydrology
- time-series
- anomaly-detection
- quality-control
- streamflow
- stage-discharge
- foundation-model
language:
- en
---

# HydroGEM

HydroGEM is a self-supervised foundation model for continental-scale streamflow quality control. For discharge and stage time series, it produces per-timestep anomaly probabilities and deploy-safe suggested reconstructions. Both are intended for human-in-the-loop review.

## Status and citation

We are preparing the journal manuscript for submission to *Environmental Modelling and Software* (EMS). Until the journal version is available, please cite the preprint:

HydroGEM preprint (arXiv): https://arxiv.org/abs/2512.14106

## What is in this repository

This Hugging Face repository provides minimal inference artifacts for reproducibility and evaluation.

### Inference model

- `hydrogem_inference.pt`: HydroGEM inference checkpoint

### Notebooks

- `HydroGEM_Inference_ECCC_Tutorial.ipynb`: ECCC zero-shot inference tutorial
- `HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb`: USGS synthetic benchmark inference tutorial

### Mini data and metadata

- `test_synthetic_mini.pkl`: small test set used by the notebooks
- `site_inventory_mini.json`: small site metadata file used by the notebooks

### Interactive result viewers

- `HydroGEM_ECCC_ZeroShot_Results.html`: interactive ECCC results viewer
- `USGS_*_Results.html`: interactive USGS results viewers, including synthetic anomaly detection and showcase pages
- `HydroGEM Synthetic Anomaly Dashboard.html`: dashboard visualizing the synthetic injected anomalies, with single-segment examples, geographic context, duration diversity, and multiple plots per example

### Documentation

- `USGS_synthetic_test_set_documentation.pdf`: describes how the USGS synthetic benchmark test set was constructed and evaluated
- `ECCC_sites_and data labelling.pdf`: documents the ECCC sites and the weak-label construction used for the zero-shot evaluation in the manuscript

Training code and full-scale data pipelines are not included.

## Synthetic anomaly dashboard

The file `HydroGEM Synthetic Anomaly Dashboard.html` is an interactive viewer created to validate and communicate the synthetic injected anomaly test set. It includes true single-segment examples for each anomaly type and equation form, paired clean and corrupted signals, and compact diagnostics such as time series overlays, rating impact plots, normalized residuals, and pattern-specific panels.

For more examples than the single figure in the paper, open the HTML dashboard.

## Quickstart

Choose one notebook and run all cells:

- ECCC: `HydroGEM_Inference_ECCC_Tutorial.ipynb`
- USGS: `HydroGEM_USGS Real Data_With SyntheticAnomalies_Benchmark_Tutorial_*.ipynb`

Each notebook loads `hydrogem_inference.pt` and runs inference on the provided mini dataset.

## Inputs and outputs

Inputs are paired discharge and stage time series. In the notebooks they are supplied in physical units and transformed internally into the model's normalized space, as described in the paper.

Outputs include:

- an anomaly probability for each timestep
- a binary detection mask derived using the threshold set in the notebook
- suggested reconstructions for discharge and stage, in the same units shown in the notebook

All suggested reconstructions require expert review before any operational use.

## License

This repository is released under **CC BY-NC 4.0** for research and non-commercial use.

For deployment, integration, redistribution, or other licensing requests, please contact: ihaq@uvm.edu

## Contact

Ijaz Ul Haq

PhD in Computer Science, University of Vermont

Senior Research Analyst, Water Resources Institute, University of Vermont

Email: ihaq@uvm.edu