Mag-DINO: Hierarchical Multi-Magnification Self-Supervised Learning (20x/40x) for Prostate Cancer Gleason Grading

Summary. This repository provides pretrained self-supervised Vision Transformer (ViT) models (DINO-v1/v2) for automated Gleason grade classification from prostate cancer whole-slide images (WSIs). Beyond high accuracy, the models emphasize reproducibility and potential clinical decision-support value by addressing inter-observer variability and workload in pathology.


Description

Mag-DINO is a hierarchical, multi-magnification histopathology pipeline that combines information from 20x and 40x views of prostate tissue. The framework uses self-supervised representation learning (DINO family) to obtain robust visual embeddings from WSI-derived patches, then trains downstream classifiers for Gleason grade prediction. The central idea is to capture both broader glandular context (lower magnification) and fine cellular morphology (higher magnification) within a unified workflow.
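The multi-magnification idea can be sketched as fusing embeddings extracted from co-registered 20x and 40x patches before classification. The snippet below is a minimal illustration with random vectors standing in for DINO features; the repository's actual fusion strategy may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for the same tissue region at two magnifications.
# The 768-d size matches ViT-B/16 features; real vectors come from the DINO encoder.
emb_20x = rng.standard_normal(768)   # broader glandular context
emb_40x = rng.standard_normal(768)   # fine cellular morphology

# Simple late fusion: concatenate into a single 1536-d feature vector
# that a downstream classifier (MLP, k-NN, ...) consumes.
fused = np.concatenate([emb_20x, emb_40x])
print(fused.shape)  # (1536,)
```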


Dataset Information

  • Primary dataset: TCGA-PRAD (The Cancer Genome Atlas - Prostate Adenocarcinoma)
  • Scale used in this study: 403 patients, 449 diagnostic slides, 81,126 tiles (224x224)
  • Task: multi-class Gleason pattern/grade group classification
  • Split strategy: patient-level split (80% train / 20% test) to prevent leakage
  • Data access/reference: please cite the dataset DOI/reference used in your manuscript and include the same citation in downstream publications.
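A patient-level split of this kind can be reproduced with scikit-learn's grouped splitters; the tile labels and patient IDs below are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-ins: 1,000 tiles drawn from 50 patients.
rng = np.random.default_rng(42)
patient_ids = rng.integers(0, 50, size=1000)   # group = patient
labels = rng.integers(0, 9, size=1000)         # 9 Gleason classes

# 80/20 split at the patient level: all tiles of a patient land on one side,
# which prevents leakage between train and test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(patient_ids, labels, groups=patient_ids))

train_patients = set(patient_ids[train_idx])
test_patients = set(patient_ids[test_idx])
assert train_patients.isdisjoint(test_patients)  # no patient appears on both sides
```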

🩺 Clinical Motivation & Impact

Manual Gleason grading is time-consuming and subject to inter-observer variability, which can affect diagnostic consistency and treatment planning. By learning label-efficient, stain-robust representations with self-supervised learning (SSL), these models aim to:

  • improve grading reproducibility,
  • reduce pathologist workload,
  • and facilitate timely, consistent decision-making as part of a decision-support pipeline.

Note: These models are intended for research and development of clinical decision-support systems; they are not cleared for direct clinical use.


πŸ“ Model Description

  • Framework: DINO (self-distillation without labels)
  • Backbones evaluated: ViT-B/16 (DINO-v1), ViT-L/14 (DINO-v2)
  • Downstream heads: MLP (best), k-NN, CNN heads (e.g., DenseNet)
  • Training objective: SSL pretraining on histology tiles, followed by supervised fine-tuning for Gleason classes
  • Library: PyTorch / Hugging Face Transformers
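Training an MLP head on top of frozen SSL embeddings can be sketched as follows. This uses scikit-learn's `MLPClassifier` as a simplified stand-in (the repository's own head in `codes/classification/mlp_classifier.py` may be implemented differently), and the 768-d feature size assumes ViT-B/16 embeddings:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder arrays standing in for frozen DINO features and Gleason labels.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 768))
y_train = rng.integers(0, 9, size=200)   # 9 Gleason classes

# Small MLP head trained on top of the frozen SSL embeddings
# (the backbone itself is not updated at this stage).
head = MLPClassifier(hidden_layer_sizes=(256,), max_iter=50, random_state=0)
head.fit(X_train, y_train)

preds = head.predict(rng.standard_normal((10, 768)))
print(preds.shape)  # (10,)
```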

🧪 Data & Training

  • Dataset: TCGA-PRAD
  • Cases / slides / patches: 403 patients, 449 diagnostic slides, 81,126 224×224 tiles
  • Split: patient-level, 80% train / 20% test (no patient leakage)
  • Preprocessing: standard WSI tiling; color/stain variability present in TCGA
  • Goal: multi-class Gleason grade classification

Classes

3+3, 3+4, 3+5, 4+3, 4+4, 4+5, 5+3, 5+4, 5+5
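For reference, the nine score combinations can be indexed as below. Treat this mapping as illustrative only: the authoritative class order is whatever the released weights store in `model.config.id2label`.

```python
# Illustrative index mapping for the nine Gleason score classes;
# check model.config.id2label for the order actually used by the weights.
GLEASON_CLASSES = ["3+3", "3+4", "3+5", "4+3", "4+4", "4+5", "5+3", "5+4", "5+5"]
id2label = dict(enumerate(GLEASON_CLASSES))
label2id = {v: k for k, v in id2label.items()}
print(id2label[3])  # 4+3
```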


📈 Results

Best overall configuration: DINO-v1 ViT-B/16 + MLP

All metrics are reported in %.

| Model | Backbone | Classifier | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| DINO-v1 | ViT-B/16 | MLP | 90.40 | 90.40 | 90.39 | 90.36 |
| DINO-v1 | ViT-B/16 | k-NN | 89.36 | 89.19 | 89.38 | 89.16 |
| DINO-v2 | ViT-L/14 | MLP | 83.31 | 83.16 | 83.31 | 83.17 |
| CNN (DenseNet) | – | FC head | 87.86 | 87.17 | 87.86 | 87.20 |
  • ROC-AUC: 0.991 (MLP)
  • Agreement: Cohen's κ = 0.89 (almost perfect on the Landis–Koch scale)
  • Feature analysis: SSL features showed robustness to stain variation & histologic heterogeneity; features 11 & 13 were most discriminative (feature 19 weak).

Interpretation. SSL-derived ViT features outperformed supervised baselines and were more robust to known sources of variability in computational pathology, supporting reproducible grading and potential clinical workflow integration (after appropriate validation).


🚀 Key Contributions

  • A unified multi-SSL pipeline (DINO-v1/v2, iBOT, and register tokens) for Gleason grading.
  • Evidence that SSL features improve robustness/generalization over supervised baselines.
  • Feature-level statistical validation (e.g., ANOVA, discriminant power).
  • Pretrained weights released for reproducibility and benchmarking.

🩸 Intended Use

Research use only. Suitable for:

  • prototyping automated pathology tools,
  • benchmarking histopathology classifiers,
  • exploring self-supervised learning in medical imaging,
  • building decision-support pipelines (with additional validation).

⚠️ Not for clinical use. External, multi-center validation and regulatory clearance are required prior to any deployment impacting patient care.


βš–οΈ Limitations & Ethical Considerations

  • Domain shift: Trained on TCGA-PRAD; performance may vary with scanner type, staining protocol, lab workflow, or demographics.
  • Generalization: Requires multi-institutional external validation.
  • Fairness & bias: Assess subgroup performance before deployment.
  • Human-in-the-loop: Models should augment, not replace, expert pathology review.

🔬 How to Use

from transformers import AutoImageProcessor, ViTForImageClassification
from PIL import Image
import torch

model_id = "buseyaren/self-supervised-prostate-cancer"

processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)
model.eval()

img = Image.open("example_tile.png").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)
    pred_id = probs.argmax(dim=-1).item()
    pred_score = probs[0, pred_id].item()

id2label = model.config.id2label if hasattr(model.config, "id2label") else {}
print("Predicted class:", id2label.get(pred_id, pred_id), f"(p={pred_score:.3f})")
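Tile-level predictions like the one above are typically aggregated into a slide-level call. A simple majority vote over a slide's tiles is one possible aggregation (not necessarily the scheme used in the paper):

```python
from collections import Counter

# Hypothetical per-tile predicted class ids for one slide,
# e.g., collected by running the inference snippet over every tile.
tile_preds = [3, 3, 1, 3, 4, 3, 1, 3]

# Majority vote: the most frequent tile-level class becomes the slide label.
slide_label, votes = Counter(tile_preds).most_common(1)[0]
print(slide_label, votes)  # 3 5
```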

Usage Instructions

1) Environment setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If no requirements.txt is provided in your local copy, install core packages manually (see Requirements section).

2) Data preparation (WSI and patch extraction)

python codes/extraction-wsis/automatic_download_aws.py
python codes/extraction-wsis/data_filtering.py
python codes/extraction-wsis/crop_20x_50percent.py

3) Self-supervised training (optional, for reproduction)

python codes/training/main_dino.py

4) Feature extraction

python codes/feature-extraction/extract_features_andknn.py

5) Downstream training and evaluation

python codes/classification/mlp_classifier.py
python codes/evaluation/eval_linear.py

6) Hugging Face inference with released model weights

Use the Python inference snippet in the How to Use section above.


Requirements

Core dependencies:

  • Python >= 3.9
  • PyTorch
  • torchvision
  • transformers
  • numpy
  • scikit-learn
  • pandas
  • Pillow
  • matplotlib
  • OpenSlide (for WSI reading/extraction scripts)

System-level note: OpenSlide may require OS-level installation (e.g., libopenslide) in addition to Python bindings.


Methodology

  1. WSI curation and filtering: collect diagnostic slides and metadata, then apply quality/label filtering.
  2. Hierarchical multi-magnification patching: extract tissue-relevant patches at 20x/40x views.
  3. Self-supervised representation learning: pretrain/transfer DINO-based ViT encoders on histology data.
  4. Feature extraction: generate fixed embeddings for train/test cohorts.
  5. Downstream classification: train MLP, k-NN, and linear baselines on extracted embeddings.
  6. Evaluation and analysis: report Accuracy/Precision/Recall/F1, ROC-AUC, agreement metrics (Cohen's kappa), and feature-level statistics.
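Step 6 can be sketched with scikit-learn on placeholder predictions; the weighted averaging choice below is an assumption, since the exact averaging used for the reported metrics is not specified here:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_recall_fscore_support)

# Placeholder ground truth and predictions over the 9 Gleason classes,
# with ~90% of predictions matching the true label.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 9, size=500)
y_pred = np.where(rng.random(500) < 0.9, y_true, rng.integers(0, 9, size=500))

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
kappa = cohen_kappa_score(y_true, y_pred)  # chance-corrected agreement
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f} kappa={kappa:.3f}")
```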

🧩 Pipeline Scripts (optional)

In addition to the pretrained Hugging Face model, this repository also includes research scripts under codes/ to reproduce parts of the end-to-end pipeline:

  • codes/extraction-wsis/: WSI (whole-slide image) preparation, filtering, and patch/overview extraction (e.g., slide downloading, slide-level views, and tissue-aware patch extraction).
  • codes/training/: DINO / ViT self-supervised pretraining code.
  • codes/feature-extraction/: embedding/feature extraction from images using a DINO checkpoint (and related utilities).
  • codes/classification/: classical downstream classifiers trained on extracted embeddings (e.g., MLP / k-NN variants, logistic regression, etc.).
  • codes/evaluation/: evaluation utilities for the downstream classifiers.

These scripts are research prototypes and may require you to edit hard-coded paths/parameters at the top of each file to match your dataset layout and compute environment.

Representative scripts (edit paths/params in the files)

  • codes/extraction-wsis/automatic_download_aws.py: download TCGA WSI files from S3 URLs (uses curl).
  • codes/extraction-wsis/data_filtering.py: filter a TCGA CSV and copy .svs files; can also organize by gleason_grade.
  • codes/extraction-wsis/whole_view_wsis.py and codes/extraction-wsis/extract_slide_level.py: create low-magnification overview images and inspect slide metadata.
  • codes/extraction-wsis/crop_20x_50percent.py: OpenSlide-based tissue-aware patch extraction (saves 256x256 patches).
  • codes/feature-extraction/extract_features_andknn.py: extract DINO teacher embeddings and train/evaluate a k-NN classifier.
  • codes/classification/: downstream classifiers trained on extracted embeddings (mlp_classifier.py, knn_classifier*.py, mlp_classifier_weighted.py, logistic_regression_classifier.py).
  • codes/evaluation/: evaluation utilities (eval_linear.py, eval_knn.py).
  • codes/training/main_dino.py: DINO-style ViT self-supervised pretraining (requires DINO/ViT dependencies such as vision_transformer).

Minimal workflow (high level)

  1. Prepare your WSIs locally (and update paths inside codes/extraction-wsis/ scripts).
  2. Extract tissue-aware patches (256x256) or create the intermediate images required by your setup.
  3. Extract embeddings/features from patches using a DINO checkpoint (codes/feature-extraction/).
  4. Train and evaluate classical classifiers on the embeddings (codes/classification/ and codes/evaluation/).

Notes

  • OpenSlide is typically required for WSI reading in the extraction scripts.
  • Use patient-level splitting to reduce the risk of data leakage.
  • Downstream classifiers expect precomputed artifacts with specific filenames: features_train_epoch64.npy, labels_train_epoch64.npy, features_test_epoch64.npy, labels_test_epoch64.npy, case_ids_train.pkl, case_ids_test.pkl. codes/classification/mlp_classifier.py and codes/classification/logistic_regression_classifier.py reference these hard-coded relative paths, so the files must be reachable from the script's working directory (or you must edit the paths in the scripts).
  • You can rename the top-level folder (e.g., features/ -> embeddings/ or any other name), but you must either run the classifier script from the folder containing the files listed above, or update train_feat_path, train_lab_path, test_feat_path, test_lab_path, train_case_ids_path, and test_case_ids_path at the top of the script.
  • Feature consistency check: codes/feature-extraction/check_features.py currently checks features_train_epoch140.npy / features_test_epoch140.npy.
  • For certain DINO-v1/v2 backbone configurations, we provide diagnostic outputs (e.g., confusion-matrix visualizations) as performance summaries, while the corresponding full embedding artifacts are not included in the released package.
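The expected artifact layout can be sanity-checked with a small loader. The dummy save step below only stands in for files produced by the feature-extraction stage, and the 768-d embedding size is an assumption (ViT-B/16 features):

```python
import os
import pickle
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()

# Stand-in artifacts: the real files come from codes/feature-extraction/.
np.save(os.path.join(workdir, "features_train_epoch64.npy"),
        np.zeros((100, 768), dtype=np.float32))
np.save(os.path.join(workdir, "labels_train_epoch64.npy"),
        np.zeros(100, dtype=np.int64))
with open(os.path.join(workdir, "case_ids_train.pkl"), "wb") as fh:
    pickle.dump([f"case_{i % 10}" for i in range(100)], fh)

# Loader mirroring what the downstream classifier scripts expect on disk.
feats = np.load(os.path.join(workdir, "features_train_epoch64.npy"))
labels = np.load(os.path.join(workdir, "labels_train_epoch64.npy"))
with open(os.path.join(workdir, "case_ids_train.pkl"), "rb") as fh:
    case_ids = pickle.load(fh)

# Rows must align one-to-one across features, labels, and case ids.
assert feats.shape[0] == labels.shape[0] == len(case_ids)
```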

Citations

If you use this repository, please cite:

  1. This Mag-DINO work (replace with your final paper citation once available).
  2. DINO / DINOv2 foundational papers used for self-supervised representation learning.
  3. TCGA-PRAD dataset reference (including DOI/official citation used in your manuscript).
@misc{magdino2026,
  title        = {Mag-DINO: Hierarchical Multi-Magnification Self-Supervised Learning for Prostate Cancer Gleason Grading},
  author       = {Yaren Buse and collaborators},
  year         = {2026},
  howpublished = {GitHub/Hugging Face repository},
  note         = {Research code and pretrained models}
}