Commit 27b04b3 · Parent(s): a795080

docs: add model card and data scale figure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .gitattributes +1 -0
- README.md +282 -0
- assets/data_scale_overview.png +3 -0
.gitattributes CHANGED

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
```
README.md ADDED

@@ -0,0 +1,282 @@
---
license: cc-by-nc-sa-4.0
library_name: moozy
pipeline_tag: feature-extraction
base_model: 1aurent/vit_small_patch8_224.lunit_dino
tags:
- pathology
- computational-pathology
- digital-pathology
- foundation-model
- whole-slide-image
- vision-transformer
- self-supervised-learning
- slide-encoder
- case-encoder
- histopathology
- medical-imaging
- multiple-instance-learning
- slide-level-representation
- patient-level-representation
- multi-task-learning
- survival-analysis
- cancer
- oncology
- tissue-classification
- mutation-prediction
- TCGA
- CPTAC
- pytorch
- transformer
datasets:
- MahmoodLab/Patho-Bench
metrics:
- f1
- roc_auc
- accuracy
language:
- en
model-index:
- name: MOOZY
  results:
  - task:
      type: image-classification
      name: Residual Cancer Burden Classification
    dataset:
      type: bc_therapy
      name: BC Therapy
    metrics:
    - type: f1
      value: 0.56
      name: Weighted F1
    - type: roc_auc
      value: 0.74
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.51
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: TP53 Mutation Prediction
    dataset:
      type: cptac_brca
      name: CPTAC-BRCA
    metrics:
    - type: f1
      value: 0.87
      name: Weighted F1
    - type: roc_auc
      value: 0.86
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.86
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: BAP1 Mutation Prediction
    dataset:
      type: cptac_ccrcc
      name: CPTAC-CCRCC
    metrics:
    - type: f1
      value: 0.89
      name: Weighted F1
    - type: roc_auc
      value: 0.79
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.78
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: ACVR2A Mutation Prediction
    dataset:
      type: cptac_coad
      name: CPTAC-COAD
    metrics:
    - type: f1
      value: 0.91
      name: Weighted F1
    - type: roc_auc
      value: 0.91
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.90
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: Histologic Grade Classification
    dataset:
      type: cptac_lscc
      name: CPTAC-LSCC
    metrics:
    - type: f1
      value: 0.78
      name: Weighted F1
    - type: roc_auc
      value: 0.75
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.77
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: KRAS Mutation Prediction
    dataset:
      type: cptac_luad
      name: CPTAC-LUAD
    metrics:
    - type: f1
      value: 0.85
      name: Weighted F1
    - type: roc_auc
      value: 0.80
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.79
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: IDH Status Classification
    dataset:
      type: ebrains
      name: EBRAINS
    metrics:
    - type: f1
      value: 0.97
      name: Weighted F1
    - type: roc_auc
      value: 0.99
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.97
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: Treatment Response Prediction
    dataset:
      type: mbc
      name: MBC
    metrics:
    - type: f1
      value: 0.58
      name: Weighted F1
    - type: roc_auc
      value: 0.68
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.48
      name: Balanced Accuracy
---

# MOOZY: A Patient-First Foundation Model for Computational Pathology

<p align="center">
<a href="https://github.com/AtlasAnalyticsLab/MOOZY"><img src="https://img.shields.io/badge/GitHub-Repository-181717?logo=github" alt="GitHub"></a>
<a href="https://pypi.org/project/moozy/"><img src="https://img.shields.io/pypi/v/moozy?logo=pypi&logoColor=white&label=PyPI" alt="PyPI"></a>
<a href="#citation"><img src="https://img.shields.io/badge/Paper-Coming%20Soon-B31B1B" alt="Paper"></a>
</p>

MOOZY is a slide- and patient-level foundation model for computational pathology in which the patient case, not the individual slide, is the core unit of representation. A vision-only slide encoder pretrained with masked self-distillation on 77,134 public slides is aligned with clinical semantics through multi-task supervision over 333 tasks (205 classification, 128 survival) from 56 public datasets spanning 23 anatomical sites. A case transformer explicitly models dependencies across all slides from the same patient, replacing the naive early/late fusion used by prior methods. The full model has 85.77M parameters and was trained entirely on public data.



## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
  - [From pre-computed H5 feature files](#from-pre-computed-h5-feature-files)
  - [From raw whole-slide images](#from-raw-whole-slide-images)
  - [Python API](#python-api)
  - [Arguments](#arguments)
  - [Output format](#output-format)
- [Architecture](#architecture)
- [Tasks](#tasks)
- [Citation](#citation)
- [License](#license)

## Installation

```bash
pip install moozy
```

The checkpoint and task definitions are downloaded automatically from this repository on first use.

## Usage

### From pre-computed H5 feature files

This is the faster path. Pass `.h5` files containing patch features extracted with `lunit_vit_small_patch8_dino` at 224x224 patch size. Compatible with [AtlasPatch](https://github.com/AtlasAnalyticsLab/AtlasPatch) and [TRIDENT](https://github.com/mahmoodlab/TRIDENT) outputs.

```bash
moozy encode slide_1.h5 slide_2.h5 --output case_embedding.h5
```
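A compatible input file can be sketched by writing a dummy patch-feature H5 with `h5py`. The 384-D feature size follows from the ViT-S/8 patch encoder; the `features`/`coords` dataset names are an assumption based on common AtlasPatch/TRIDENT-style layouts, so verify them against your extractor's actual output:

```python
import h5py
import numpy as np

# Hypothetical sketch of one slide's patch-feature file: N patches, each with a
# 384-D ViT-S/8 feature and an (x, y) pixel coordinate. The dataset names
# "features" and "coords" are assumptions, not a documented contract.
n_patches, feat_dim = 1000, 384
with h5py.File("slide_1.h5", "w") as f:
    f.create_dataset(
        "features", data=np.random.rand(n_patches, feat_dim).astype(np.float32)
    )
    f.create_dataset(
        "coords", data=np.random.randint(0, 50_000, size=(n_patches, 2))
    )
```

A file shaped like this (one per slide in the case) is what the `moozy encode` command above consumes.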

### From raw whole-slide images

Pass slide files directly (`.svs`, `.tiff`, `.ndpi`, `.mrxs`, etc.). MOOZY calls [AtlasPatch](https://github.com/AtlasAnalyticsLab/AtlasPatch) under the hood to segment tissue, extract patches, and compute features. This path requires `atlas-patch`, `sam2`, and the OpenSlide system library (see the [AtlasPatch installation guide](https://github.com/AtlasAnalyticsLab/AtlasPatch#installation)).

```bash
moozy encode slide_1.svs slide_2.svs --output case_embedding.h5 --target_mag 20
```

### Python API

```python
from moozy.encoding import run_encoding

# From H5 feature files
run_encoding(
    slide_paths=["slide_1.h5", "slide_2.h5"],
    output_path="case_embedding.h5",
)

# From raw slides
run_encoding(
    slide_paths=["slide_1.svs", "slide_2.svs"],
    output_path="case_embedding.h5",
    target_mag=20,
)
```

### Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `SLIDES` | (required) | One or more H5 feature files or raw slide files forming a single case. The two types cannot be mixed. |
| `--output`, `-o` | (required) | Output H5 file path. |
| `--mixed_precision` | off | Enable bfloat16 mixed precision. |
| `--target_mag` | 20 | Magnification for patch extraction from raw slides. Ignored for H5 input. |
| `--step_size` | 224 | Stride between patch centers in pixels. Set below 224 for overlapping patches. Ignored for H5 input. |
| `--mpp_csv` | - | CSV with `wsi,mpp` columns for microns-per-pixel overrides. Ignored for H5 input. |
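For scanners that omit resolution metadata, an `--mpp_csv` override file can be produced with the stdlib `csv` module. The `wsi` and `mpp` column names come from the table above; the file paths and MPP values below are purely illustrative:

```python
import csv

# Illustrative microns-per-pixel overrides. Roughly, 0.25 mpp corresponds to a
# 40x scan and 0.50 mpp to a 20x scan; the slide names here are made up.
rows = [
    {"wsi": "slide_1.svs", "mpp": 0.25},
    {"wsi": "slide_2.svs", "mpp": 0.50},
]
with open("mpp_overrides.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wsi", "mpp"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file is then passed as `moozy encode slide_1.svs slide_2.svs -o case_embedding.h5 --mpp_csv mpp_overrides.csv`.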

### Output format

The output H5 file contains a `features` dataset (768-D float32 case embedding) and a `coords` dataset with slide metadata.
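Reading the embedding back needs only `h5py`. The `features` dataset name and 768-D size come from the description above; the reader function and the synthetic file used to exercise it are a sketch, not part of the `moozy` API:

```python
import h5py
import numpy as np

def load_case_embedding(path):
    """Return the 768-D case embedding stored in a MOOZY output file."""
    with h5py.File(path, "r") as f:
        emb = np.asarray(f["features"])
    return emb.squeeze()  # tolerate a (1, 768) vs (768,) layout

# Synthetic stand-in for a real MOOZY output, just to exercise the reader.
with h5py.File("case_embedding.h5", "w") as f:
    f.create_dataset("features", data=np.zeros((1, 768), dtype=np.float32))

emb = load_case_embedding("case_embedding.h5")
```

`emb` is then a plain `(768,)` float32 vector, ready to feed a downstream linear probe or survival head.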

## Architecture

| Component | Architecture | Params | Output dim |
|-----------|-------------|--------|------------|
| Patch encoder | ViT-S/8 (Lunit DINO) | 21.67M | 384 |
| Slide encoder | ViT, 6 layers, 768-D, 12 heads, 2D ALiBi | 42.8M | 768 |
| Case transformer | 3 layers, 12 heads | 21.3M | 768 |
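The dimension flow through the three stages can be illustrated with random projections standing in for the real networks. Only the tensor shapes match the table above; everything else is a toy stand-in, not the MOOZY forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three stages; only the shapes mirror MOOZY.
def patch_encoder(n_patches):
    # ViT-S/8: one 384-D feature per tissue patch.
    return rng.standard_normal((n_patches, 384))

def slide_encoder(patch_feats):
    # Slide ViT: variable-length patch set -> one 768-D slide embedding.
    proj = rng.standard_normal((384, 768))
    return (patch_feats @ proj).mean(axis=0)

def case_transformer(slide_embs):
    # Case transformer: all slide embeddings of a patient -> one 768-D case embedding.
    return np.stack(slide_embs).mean(axis=0)

# A two-slide case with 1,000 and 1,500 tissue patches respectively.
slides = [patch_encoder(1000), patch_encoder(1500)]
case = case_transformer([slide_encoder(s) for s in slides])
```

The point of the sketch is the shape contract: per-patch 384-D features collapse to one 768-D vector per slide, and the case stage attends across slides rather than concatenating or averaging features early.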

## Tasks

This repository includes 333 task definitions in the `tasks/` directory. Each task has a `config.yaml` (task type, organ, label mapping) and a `task.csv` (annotations and splits). The tasks cover 205 classification and 128 survival endpoints across 32 TCGA cohorts, 14 CPTAC cohorts, the REG dataset, and other public sources.

## Citation

```bibtex
@article{moozy,
  title  = {MOOZY: A Patient-First Foundation Model for Computational Pathology},
  author = {TODO},
  year   = {TODO},
}
```

## License

[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Research and non-commercial use only.
assets/data_scale_overview.png ADDED (stored with Git LFS)