---
license: mit
library_name: pytorch
pipeline_tag: image-classification
tags:
- sngp
- uncertainty-estimation
- out-of-distribution-detection
- biomedical-imaging
- digital-pathology
- histopathology
- model-calibration
- reliable-ai
datasets:
- acevedo2020
- jung2022
- tang2019
- wong2022
- kather2016
- kather2018
---

# SNGP Models for Uncertainty-Aware Biomedical Image Classification

## Model Details

### Model Description

This repository contains trained Spectral-normalized Neural Gaussian Process (SNGP) models for uncertainty-aware image classification across several biomedical imaging tasks: white blood cells, amyloid plaques, and colorectal histopathology.

SNGP augments a standard deep neural network by applying spectral normalization to its hidden layers and replacing the final dense layer with a Gaussian process layer. This improves uncertainty estimation and out-of-distribution (OOD) detection while requiring only a single forward pass.
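
The two ingredients named above can be illustrated without a deep learning framework. The following is a minimal, dependency-free sketch and not the code used in this repository: `spectral_normalize` rescales a weight matrix so that its largest singular value (estimated by power iteration) does not exceed a bound, and `make_rff_features` builds random Fourier features, a common way to approximate a Gaussian process layer with an RBF kernel. All function names, the bound of 0.95, and the feature count are assumptions chosen for illustration.

```python
import math
import random


def largest_singular_value(W, n_iter=100, seed=0):
    """Estimate the spectral norm of W (a list of rows) by power iteration on W^T W."""
    rng = random.Random(seed)
    n_cols = len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]  # u = W v
        v = [sum(row[j] * u_i for row, u_i in zip(W, u)) for j in range(n_cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    u = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in u))


def spectral_normalize(W, bound=0.95):
    """Rescale W so its spectral norm is at most `bound` (hypothetical value)."""
    sigma = largest_singular_value(W)
    if sigma <= bound:
        return W
    return [[bound * w / sigma for w in row] for row in W]


def make_rff_features(in_dim, num_features=4000, length_scale=1.0, seed=0):
    """Return a function mapping a hidden vector to random Fourier features."""
    rng = random.Random(seed)
    omega = [[rng.gauss(0.0, 1.0) / length_scale for _ in range(in_dim)]
             for _ in range(num_features)]
    phase = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(h):
        return [scale * math.cos(sum(w * x for w, x in zip(row, h)) + b)
                for row, b in zip(omega, phase)]

    return features


# Toy weight matrix with spectral norm 3, rescaled to the bound.
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectral_normalize(W, bound=0.95)

# Inner products of the features approximate the RBF kernel between inputs.
phi = make_rff_features(in_dim=2)
x, y = [0.0, 0.0], [1.0, 0.0]
approx_kernel = sum(a * b for a, b in zip(phi(x), phi(y)))
exact_kernel = math.exp(-0.5)  # RBF kernel at squared distance 1, length scale 1
```

In a full SNGP model, a linear output layer on top of such features produces the logits, and a posterior covariance over that layer's weights yields the predictive variance; that part is omitted here.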

- **Developed by:** Uma Meleti, Jeffrey J. Nirschl
- **Affiliation:** University of Wisconsin-Madison
- **Model type:** Convolutional neural network (ResNet18 backbone) with SNGP head
- **License:** MIT
- **Paper:** https://arxiv.org/abs/2602.02370
- **Repository:** https://github.com/nirschl-lab/sngp_core

---

## How to Get Started with the Model
Load pretrained SNGP models from the Hugging Face Hub using the provided inference utilities.

### Installation
#### Clone repo and install
```bash
# Clone repository
git clone https://github.com/nirschl-lab/sngp_core
cd sngp_core

# Install uv
curl -Ls https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync
```

#### Python API
SNGP inference with uncertainty quantification:
```python
import torch
from scripts.example_inference import quick_sngp_inference

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder input batch [batch_size, channels, height, width];
# replace the random tensor with preprocessed images
batch = torch.randn(4, 3, 224, 224)

# Load model from Hugging Face Hub and run inference
results = quick_sngp_inference(
    "wong_sngp_resnet18",
    batch,
    device=device,
)

# Outputs:
# - results["logits"]: Raw model outputs
# - results["predictions"]: Predicted class indices
# - results["confidence"]: Prediction confidence scores
# - results["variance"]: Uncertainty estimates
# - results["probabilities"]: Class probabilities

print(f"Predictions: {results['predictions'].tolist()}")
print(f"Confidence: {results['confidence'].tolist()}")
print(f"Uncertainty (variance): {results['variance'].tolist()}")
```

#### Baseline inference (deterministic)
```python
import torch
from scripts.example_inference import quick_baseline_inference

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder input batch; replace with preprocessed images
batch = torch.randn(4, 3, 224, 224)

results = quick_baseline_inference(
    "wong_baseline_resnet18",
    batch,
    device=device,
)

print(f"Predictions: {results['predictions'].tolist()}")
print(f"Confidence: {results['confidence'].tolist()}")
```

---
## Uses

### Direct Use
- Image classification on biomedical imaging datasets
- Estimation of predictive uncertainty via entropy- or logit-based measures
- Detection of out-of-distribution (OOD) samples in medical imaging workflows
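
As an illustration of the entropy-based measure mentioned in the list above, the sketch below computes the Shannon entropy of a class-probability vector and normalizes it to [0, 1] by the log of the number of classes. It assumes the probabilities are plain Python lists (for example, the rows of `results["probabilities"].tolist()`); the function names are hypothetical and not part of the repository.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(probs):
    """Entropy divided by log(num_classes): 0 is fully confident, 1 is uniform."""
    num_classes = len(probs)
    if num_classes < 2:
        return 0.0
    return predictive_entropy(probs) / math.log(num_classes)


# Made-up probability vectors, used only to show the call.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
scores = [normalized_entropy(p) for p in (confident, uncertain)]
```

Normalizing by the number of classes makes scores comparable across tasks with different label sets.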

### Downstream Use
- Integration into clinical decision-support pipelines (research only)
- Benchmarking uncertainty estimation methods (SNGP vs. MC Dropout vs. deterministic baseline)
- Domain shift detection across institutions or datasets
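
When benchmarking methods for OOD or domain-shift detection, a common summary metric is the area under the ROC curve (AUROC) of the uncertainty score, with OOD samples treated as the positive class. The following is a small sketch under that assumption; `auroc` is a hypothetical helper, and the scores are made-up numbers used only to show the call.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random
    in-distribution sample; ties count as one half."""
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Hypothetical uncertainty scores for in-distribution and OOD images.
id_uncertainty = [0.1, 0.2, 0.3]
ood_uncertainty = [0.25, 0.4, 0.5]
score = auroc(id_uncertainty, ood_uncertainty)
```

A value of 0.5 means the score does not separate the two groups, and 1.0 means perfect separation. The pairwise loop is quadratic in the number of samples, so a rank-based implementation is preferable for large evaluation sets.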

### Out-of-Scope Use
- Clinical diagnosis without expert oversight
- Deployment in safety-critical settings without validation
- Use on imaging modalities or domains not represented in training data

---

## Bias, Risks, and Limitations

### Limitations
- Performance depends on how closely the input domain matches the training data (scanner, staining, preprocessing)
- OOD detection is not guaranteed to capture all distribution shifts
- Models were trained on a limited set of public datasets and may not generalize to all populations

### Risks
- Misinterpretation of uncertainty estimates as calibrated probabilities
- Overconfident predictions on near-OOD samples
- Dataset-specific biases (e.g., acquisition site, staining protocols)
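
Whether confidence scores can be read as probabilities is an empirical question that can be checked on held-out data. One widely used diagnostic is the expected calibration error (ECE), sketched below with equal-width confidence bins. The helper name, the bin count, and the toy inputs are assumptions for illustration; this is not the evaluation code behind any reported result.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Toy example: the model claims 90% confidence but is right 3 times out of 4.
toy_confidences = [0.9, 0.9, 0.9, 0.9]
toy_correct = [True, True, True, False]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
```

An ECE near zero on local validation data suggests the confidence scores track accuracy there; it says nothing about inputs from a different scanner, stain, or site.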

### Recommendations
- Always use with a human in the loop (e.g., pathologist review)
- Validate on local institutional data before deployment
- Set uncertainty thresholds for rejection conservatively
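
One simple way to act on the last recommendation is to choose the rejection threshold from in-distribution validation data and defer everything above it to an expert. The sketch below assumes per-sample uncertainty scores as plain Python floats (for example, derived from `results["variance"]`). The helper names and the `keep_fraction` parameter are hypothetical; a suitable value depends on the application and should come from local validation.

```python
def pick_rejection_threshold(val_uncertainties, keep_fraction=0.9):
    """Return the validation uncertainty at the `keep_fraction` quantile."""
    if not val_uncertainties:
        raise ValueError("need at least one validation score")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ordered = sorted(val_uncertainties)
    # Rounding down keeps fewer samples, which errs on the side of rejection.
    k = max(1, int(keep_fraction * len(ordered)))
    return ordered[k - 1]


def triage(uncertainties, threshold):
    """Split sample indices into auto-accepted and deferred-for-review."""
    accepted = [i for i, u in enumerate(uncertainties) if u <= threshold]
    deferred = [i for i, u in enumerate(uncertainties) if u > threshold]
    return accepted, deferred


# Made-up validation scores and a made-up batch of new samples.
val_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
threshold = pick_rejection_threshold(val_scores, keep_fraction=0.5)
accepted, deferred = triage([0.12, 0.48, 0.25, 0.90], threshold)
```

Samples in `deferred` would go to pathologist review rather than being reported automatically.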