# Mag-DINO: Hierarchical Multi-Magnification Self-Supervised Learning (20x/40x) for Prostate Cancer Gleason Grading
Summary. This repository provides pretrained self-supervised Vision Transformer (ViT) models (DINO-v1/v2) for automated Gleason grade classification from prostate cancer whole-slide images (WSIs). Beyond high accuracy, the models emphasize reproducibility and potential clinical decision-support value by addressing inter-observer variability and workload in pathology.
## Description
Mag-DINO is a hierarchical, multi-magnification histopathology pipeline that combines information from 20x and 40x views of prostate tissue. The framework uses self-supervised representation learning (DINO family) to obtain robust visual embeddings from WSI-derived patches, then trains downstream classifiers for Gleason grade prediction. The central idea is to capture both broader glandular context (lower magnification) and fine cellular morphology (higher magnification) within a unified workflow.
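To make the multi-magnification idea concrete, here is a minimal, hypothetical sketch of late fusion by concatenation. `fuse_embeddings` and the toy vectors are illustrative only and are not the repository's actual fusion code, which may combine magnifications differently (e.g., hierarchical pooling).

```python
def fuse_embeddings(emb_20x, emb_40x):
    """Concatenate a 20x (glandular context) and a 40x (cellular detail)
    embedding vector into one joint representation. Hypothetical sketch."""
    return list(emb_20x) + list(emb_40x)

# Two toy 4-dim embeddings stand in for real DINO features (e.g., 768-dim).
fused = fuse_embeddings([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8])
print(len(fused))  # combined dimensionality: 8
```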
## Dataset Information
- Primary dataset: TCGA-PRAD (The Cancer Genome Atlas - Prostate Adenocarcinoma)
- Scale used in this study: 403 patients, 449 diagnostic slides, 81,126 tiles (224x224)
- Task: multi-class Gleason pattern/grade group classification
- Split strategy: patient-level split (80% train / 20% test) to prevent leakage
- Data access/reference: please cite the dataset DOI/reference used in your manuscript and include the same citation in downstream publications.
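The patient-level split above can be sketched as follows. `patient_level_split` and the toy tile records are hypothetical, not the study's exact code; the point is that every tile from a given patient lands in the same fold, so no patient leaks across the train/test boundary.

```python
import random

def patient_level_split(tile_records, train_frac=0.8, seed=42):
    """Split tiles so that all tiles from one patient land in the same fold.

    `tile_records` is a list of (patient_id, tile_path) pairs. This is a
    sketch of the leakage-free 80/20 split described above, not the exact
    study code.
    """
    patients = sorted({pid for pid, _ in tile_records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = int(len(patients) * train_frac)
    train_ids = set(patients[:n_train])
    train = [r for r in tile_records if r[0] in train_ids]
    test = [r for r in tile_records if r[0] not in train_ids]
    return train, test

# Toy cohort: 10 patients, 3 tiles each.
tiles = [(f"TCGA-{i:02d}", f"tile_{i}_{j}.png") for i in range(10) for j in range(3)]
train, test = patient_level_split(tiles)
# No patient appears in both folds:
assert {p for p, _ in train}.isdisjoint({p for p, _ in test})
```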
## Clinical Motivation & Impact
Manual Gleason grading is time-consuming and subject to inter-observer variability, which can affect diagnostic consistency and treatment planning. By learning label-efficient, stain-robust representations with self-supervised learning (SSL), these models aim to:
- improve grading reproducibility,
- reduce pathologist workload,
- and facilitate timely, consistent decision-making as part of a decision-support pipeline.
Note: These models are intended for research and development of clinical decision-support systems; they are not cleared for direct clinical use.
## Model Description
- Framework: DINO (self-distillation without labels)
- Backbones evaluated: ViT-B/16 (DINO-v1), ViT-L/14 (DINO-v2)
- Downstream heads: MLP (best), k-NN, CNN heads (e.g., DenseNet)
- Training objective: SSL pretraining on histology tiles, followed by supervised fine-tuning for Gleason classes
- Library: PyTorch / Hugging Face Transformers
## Data & Training
- Dataset: TCGA-PRAD
- Cases / slides / patches: 403 patients, 449 diagnostic slides, 81,126 224x224 tiles
- Split: patient-level (no patient leakage), 80% train / 20% test
- Preprocessing: standard WSI tiling; color/stain variability present in TCGA
- Goal: multi-class Gleason grade classification
### Classes
3+3, 3+4, 3+5, 4+3, 4+4, 4+5, 5+3, 5+4, 5+5
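The nine Gleason score classes above can be encoded as an integer-to-label mapping. The ordering below is a hypothetical illustration; the authoritative mapping for the released weights is `model.config.id2label`.

```python
# Hypothetical label mapping for the nine Gleason score classes listed above;
# check model.config.id2label for the released weights' actual ordering.
GLEASON_CLASSES = ["3+3", "3+4", "3+5", "4+3", "4+4", "4+5", "5+3", "5+4", "5+5"]
id2label = {i: c for i, c in enumerate(GLEASON_CLASSES)}
label2id = {c: i for i, c in id2label.items()}

print(id2label[3])  # "4+3"
```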
## Results
Best overall configuration: DINO-v1 ViT-B/16 + MLP
| Model | Backbone | Classifier | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| DINO-v1 (ViT-B/16) | B/16 | MLP | 90.40 | 90.40 | 90.39 | 90.36 |
| DINO-v1 (ViT-B/16) | B/16 | k-NN | 89.36 | 89.19 | 89.38 | 89.16 |
| DINO-v2 (ViT-L/14) | L/14 | MLP | 83.31 | 83.16 | 83.31 | 83.17 |
| CNN (DenseNet) | - | FC head | 87.86 | 87.17 | 87.86 | 87.20 |
- ROC-AUC: 0.991 (MLP)
- Agreement: Cohen's κ = 0.89 (almost perfect agreement on the Landis & Koch scale)
- Feature analysis: SSL features were robust to stain variation and histologic heterogeneity; features 11 and 13 were the most discriminative, while feature 19 was weak.
Interpretation. SSL-derived ViT features outperformed supervised baselines and were more robust to known sources of variability in computational pathology, supporting reproducible grading and potential clinical workflow integration (after appropriate validation).
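For readers unfamiliar with the agreement metric above, here is a minimal reference implementation of Cohen's kappa (observed agreement corrected for chance agreement). The toy label lists are illustrative only; the study's reported κ of 0.89 comes from comparing model predictions with reference grades.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is the agreement expected by chance from the
    marginal label distributions. Minimal reference implementation."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two raters agree on 5 of 6 Gleason scores.
a = ["3+3", "3+4", "4+3", "4+4", "3+3", "3+4"]
b = ["3+3", "3+4", "4+3", "4+4", "3+3", "4+3"]
print(round(cohens_kappa(a, b), 3))  # 0.778
```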
## Key Contributions
- A unified multi-SSL pipeline (DINO-v1/v2, iBOT, token registration) for Gleason grading.
- Evidence that SSL features improve robustness/generalization over supervised baselines.
- Feature-level statistical validation (e.g., ANOVA, discriminant power).
- Pretrained weights released for reproducibility and benchmarking.
## Intended Use
Research use only. Suitable for:
- prototyping automated pathology tools,
- benchmarking histopathology classifiers,
- exploring self-supervised learning in medical imaging,
- building decision-support pipelines (with additional validation).
⚠️ Not for clinical use. External, multi-center validation and regulatory clearance are required prior to any deployment impacting patient care.
## Limitations & Ethical Considerations
- Domain shift: Trained on TCGA-PRAD; performance may vary with scanner type, staining protocol, lab workflow, or demographics.
- Generalization: Requires multi-institutional external validation.
- Fairness & bias: Assess subgroup performance before deployment.
- Human-in-the-loop: Models should augment, not replace, expert pathology review.
## How to Use

```python
from transformers import AutoImageProcessor, ViTForImageClassification
from PIL import Image
import torch

model_id = "buseyaren/self-supervised-prostate-cancer"
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)
model.eval()

img = Image.open("example_tile.png").convert("RGB")
inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)
pred_id = probs.argmax(dim=-1).item()
pred_score = probs[0, pred_id].item()

id2label = getattr(model.config, "id2label", {}) or {}
print("Predicted class:", id2label.get(pred_id, pred_id), f"(p={pred_score:.3f})")
```
## Code Information
Repository scripts are organized under codes/:
- `codes/extraction-wsis/`: WSI download/filtering, slide-level views, tissue-aware patch extraction.
- `codes/training/`: DINO/ViT self-supervised training utilities.
- `codes/feature-extraction/`: embedding extraction from trained/self-supervised backbones.
- `codes/classification/`: downstream classifiers (MLP, k-NN, logistic regression, weighted variants).
- `codes/evaluation/`: evaluation scripts for linear/k-NN and related metrics.
Representative scripts:
- `codes/extraction-wsis/automatic_download_aws.py`
- `codes/extraction-wsis/crop_20x_50percent.py`
- `codes/feature-extraction/extract_features_andknn.py`
- `codes/classification/mlp_classifier.py`
- `codes/evaluation/eval_linear.py`
- `codes/training/main_dino.py`
## Usage Instructions
1) Environment setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
If no requirements.txt is provided in your local copy, install core packages manually (see Requirements section).
2) Data preparation (WSI and patch extraction)
```bash
python codes/extraction-wsis/automatic_download_aws.py
python codes/extraction-wsis/data_filtering.py
python codes/extraction-wsis/crop_20x_50percent.py
```
3) Self-supervised training (optional, for reproduction)
```bash
python codes/training/main_dino.py
```
4) Feature extraction
```bash
python codes/feature-extraction/extract_features_andknn.py
```
5) Downstream training and evaluation
```bash
python codes/classification/mlp_classifier.py
python codes/evaluation/eval_linear.py
```
6) Hugging Face inference with released model weights
Use the Python inference snippet in the How to Use section above.
## Requirements
Core dependencies:
- Python >= 3.9
- PyTorch
- torchvision
- transformers
- numpy
- scikit-learn
- pandas
- Pillow
- matplotlib
- OpenSlide (for WSI reading/extraction scripts)
System-level note: OpenSlide may require OS-level installation (e.g., libopenslide) in addition to Python bindings.
## Methodology
- WSI curation and filtering: collect diagnostic slides and metadata, then apply quality/label filtering.
- Hierarchical multi-magnification patching: extract tissue-relevant patches at 20x/40x views.
- Self-supervised representation learning: pretrain/transfer DINO-based ViT encoders on histology data.
- Feature extraction: generate fixed embeddings for train/test cohorts.
- Downstream classification: train MLP, k-NN, and linear baselines on extracted embeddings.
- Evaluation and analysis: report Accuracy/Precision/Recall/F1, ROC-AUC, agreement metrics (Cohen's kappa), and feature-level statistics.
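The tissue-aware patching step above can be sketched as a simple brightness filter: WSI backgrounds are near-white, so patches that are mostly bright are discarded. The threshold values and function names below are illustrative assumptions; the actual logic in `crop_20x_50percent.py` may differ.

```python
def tissue_fraction(gray_patch, background_threshold=220):
    """Fraction of pixels darker than a background threshold (0-255 grayscale).
    Minimal sketch of tissue detection; real pipelines often use stain-aware
    methods (e.g., Otsu thresholding in a saturation channel)."""
    pixels = [px for row in gray_patch for px in row]
    tissue = sum(px < background_threshold for px in pixels)
    return tissue / len(pixels)

def keep_patch(gray_patch, min_tissue=0.5):
    """Keep a patch only if at least `min_tissue` of it is tissue."""
    return tissue_fraction(gray_patch) >= min_tissue

# Toy 2x2 grayscale patches standing in for real 224x224 tiles.
mostly_tissue = [[100, 90], [250, 80]]   # 3 of 4 pixels are dark (tissue)
mostly_blank  = [[250, 245], [230, 90]]  # 3 of 4 pixels are near-white
print(keep_patch(mostly_tissue), keep_patch(mostly_blank))  # True False
```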
## Pipeline Scripts (optional)
In addition to the pretrained Hugging Face model, this repository also includes research scripts under codes/ to reproduce parts of the end-to-end pipeline:
- `codes/extraction-wsis/`: WSI (whole-slide image) preparation, filtering, and patch/overview extraction (e.g., slide downloading, slide-level views, and tissue-aware patch extraction).
- `codes/training/`: DINO / ViT self-supervised pretraining code.
- `codes/feature-extraction/`: embedding/feature extraction from images using a DINO checkpoint (and related utilities).
- `codes/classification/`: classical downstream classifiers trained on extracted embeddings (e.g., MLP / k-NN variants, logistic regression, etc.).
- `codes/evaluation/`: evaluation utilities for the downstream classifiers.
These scripts are research prototypes and may require you to edit hard-coded paths/parameters at the top of each file to match your dataset layout and compute environment.
### Representative scripts (edit paths/params in the files)
- `codes/extraction-wsis/automatic_download_aws.py`: download TCGA WSI files from S3 URLs (uses `curl`).
- `codes/extraction-wsis/data_filtering.py`: filter a TCGA CSV and copy `.svs` files; can also organize by `gleason_grade`.
- `codes/extraction-wsis/whole_view_wsis.py` and `codes/extraction-wsis/extract_slide_level.py`: create low-magnification overview images and inspect slide metadata.
- `codes/extraction-wsis/crop_20x_50percent.py`: OpenSlide-based tissue-aware patch extraction (saves 256x256 patches).
- `codes/feature-extraction/extract_features_andknn.py`: extract DINO teacher embeddings and train/evaluate a k-NN classifier.
- `codes/classification/`: downstream classifiers trained on extracted embeddings (`mlp_classifier.py`, `knn_classifier*.py`, `mlp_classifier_weighted.py`, `logistic_regression_classifier.py`).
- `codes/evaluation/`: evaluation utilities (`eval_linear.py`, `eval_knn.py`).
- `codes/training/main_dino.py`: DINO-style ViT self-supervised pretraining (requires DINO/ViT dependencies such as `vision_transformer`).
### Minimal workflow (high level)

1. Prepare your WSIs locally (and update paths inside `codes/extraction-wsis/` scripts).
2. Extract tissue-aware patches (256x256) or create the intermediate images required by your setup.
3. Extract embeddings/features from patches using a DINO checkpoint (`codes/feature-extraction/`).
4. Train and evaluate classical classifiers on the embeddings (`codes/classification/` and `codes/evaluation/`).
### Notes

- OpenSlide is typically required for WSI reading in the extraction scripts.
- Use patient-level splitting to reduce the risk of data leakage.
- Downstream classifiers expect precomputed artifacts with specific filenames. `codes/classification/mlp_classifier.py` and `codes/classification/logistic_regression_classifier.py` use these hard-coded relative filenames, so the files must be reachable from the script's working directory (or you must edit the paths in the scripts): `features_train_epoch64.npy`, `labels_train_epoch64.npy`, `features_test_epoch64.npy`, `labels_test_epoch64.npy`, `case_ids_train.pkl`, `case_ids_test.pkl`.
- You can rename the top-level folder (e.g., `features/` -> `embeddings/` or any other name), but you must either run the classifier script from the folder containing the files listed above, or update `train_feat_path`, `train_lab_path`, `test_feat_path`, `test_lab_path`, `train_case_ids_path`, and `test_case_ids_path` at the top of the script.
- Feature consistency check: `codes/feature-extraction/check_features.py` currently checks `features_train_epoch140.npy` / `features_test_epoch140.npy`.
- For certain DINO-v1/v2 backbone configurations, we provide diagnostic outputs (e.g., confusion-matrix visualizations) as performance summaries, while the corresponding full embedding artifacts are not included in the released package.
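One way to keep those hard-coded artifact paths manageable is to collect them in a single mapping and verify their presence up front. `BASE` below is a hypothetical location; point it at wherever your feature-extraction step wrote its outputs, and note the key names mirror the script variables listed above.

```python
from pathlib import Path

# Artifact filenames expected by the downstream classifier scripts,
# gathered in one place. BASE is a hypothetical directory -- adjust it
# (or the individual paths) to match your layout.
BASE = Path("features")
ARTIFACTS = {
    "train_feat_path": BASE / "features_train_epoch64.npy",
    "train_lab_path": BASE / "labels_train_epoch64.npy",
    "test_feat_path": BASE / "features_test_epoch64.npy",
    "test_lab_path": BASE / "labels_test_epoch64.npy",
    "train_case_ids_path": BASE / "case_ids_train.pkl",
    "test_case_ids_path": BASE / "case_ids_test.pkl",
}

# Fail early with a readable message instead of a mid-run FileNotFoundError.
missing = [name for name, path in ARTIFACTS.items() if not path.exists()]
if missing:
    print("Missing artifacts:", ", ".join(sorted(missing)))
```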
## Citations
If you use this repository, please cite:
- This Mag-DINO work (replace with your final paper citation once available).
- DINO / DINOv2 foundational papers used for self-supervised representation learning.
- TCGA-PRAD dataset reference (including DOI/official citation used in your manuscript).
```bibtex
@misc{magdino2026,
  title        = {Mag-DINO: Hierarchical Multi-Magnification Self-Supervised Learning for Prostate Cancer Gleason Grading},
  author       = {Yaren Buse and collaborators},
  year         = {2026},
  howpublished = {GitHub/Hugging Face repository},
  note         = {Research code and pretrained models}
}
```