yousefkotp and Claude Opus 4.6 (1M context) committed
Commit 27b04b3 · 1 parent: a795080

docs: add model card and data scale figure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +282 -0
  3. assets/data_scale_overview.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,282 @@
---
license: cc-by-nc-sa-4.0
library_name: moozy
pipeline_tag: feature-extraction
base_model: 1aurent/vit_small_patch8_224.lunit_dino
tags:
- pathology
- computational-pathology
- digital-pathology
- foundation-model
- whole-slide-image
- vision-transformer
- self-supervised-learning
- slide-encoder
- case-encoder
- histopathology
- medical-imaging
- multiple-instance-learning
- slide-level-representation
- patient-level-representation
- multi-task-learning
- survival-analysis
- cancer
- oncology
- tissue-classification
- mutation-prediction
- TCGA
- CPTAC
- pytorch
- transformer
datasets:
- MahmoodLab/Patho-Bench
metrics:
- f1
- roc_auc
- accuracy
language:
- en
model-index:
- name: MOOZY
  results:
  - task:
      type: image-classification
      name: Residual Cancer Burden Classification
    dataset:
      type: bc_therapy
      name: BC Therapy
    metrics:
    - type: f1
      value: 0.56
      name: Weighted F1
    - type: roc_auc
      value: 0.74
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.51
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: TP53 Mutation Prediction
    dataset:
      type: cptac_brca
      name: CPTAC-BRCA
    metrics:
    - type: f1
      value: 0.87
      name: Weighted F1
    - type: roc_auc
      value: 0.86
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.86
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: BAP1 Mutation Prediction
    dataset:
      type: cptac_ccrcc
      name: CPTAC-CCRCC
    metrics:
    - type: f1
      value: 0.89
      name: Weighted F1
    - type: roc_auc
      value: 0.79
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.78
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: ACVR2A Mutation Prediction
    dataset:
      type: cptac_coad
      name: CPTAC-COAD
    metrics:
    - type: f1
      value: 0.91
      name: Weighted F1
    - type: roc_auc
      value: 0.91
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.90
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: Histologic Grade Classification
    dataset:
      type: cptac_lscc
      name: CPTAC-LSCC
    metrics:
    - type: f1
      value: 0.78
      name: Weighted F1
    - type: roc_auc
      value: 0.75
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.77
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: KRAS Mutation Prediction
    dataset:
      type: cptac_luad
      name: CPTAC-LUAD
    metrics:
    - type: f1
      value: 0.85
      name: Weighted F1
    - type: roc_auc
      value: 0.80
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.79
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: IDH Status Classification
    dataset:
      type: ebrains
      name: EBRAINS
    metrics:
    - type: f1
      value: 0.97
      name: Weighted F1
    - type: roc_auc
      value: 0.99
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.97
      name: Balanced Accuracy
  - task:
      type: image-classification
      name: Treatment Response Prediction
    dataset:
      type: mbc
      name: MBC
    metrics:
    - type: f1
      value: 0.58
      name: Weighted F1
    - type: roc_auc
      value: 0.68
      name: Weighted ROC-AUC
    - type: accuracy
      value: 0.48
      name: Balanced Accuracy
---

# MOOZY: A Patient-First Foundation Model for Computational Pathology

<p align="center">
<a href="https://github.com/AtlasAnalyticsLab/MOOZY"><img src="https://img.shields.io/badge/GitHub-Repository-181717?logo=github" alt="GitHub"></a>
<a href="https://pypi.org/project/moozy/"><img src="https://img.shields.io/pypi/v/moozy?logo=pypi&logoColor=white&label=PyPI" alt="PyPI"></a>
<a href="#citation"><img src="https://img.shields.io/badge/Paper-Coming%20Soon-B31B1B" alt="Paper"></a>
</p>

MOOZY is a slide- and patient-level foundation model for computational pathology that treats the patient case, not the individual slide, as the core unit of representation. A vision-only slide encoder, pretrained with masked self-distillation on 77,134 public slides, is aligned with clinical semantics through multi-task supervision over 333 tasks (205 classification, 128 survival) drawn from 56 public datasets spanning 23 anatomical sites. A case transformer explicitly models dependencies across all slides from the same patient, replacing the naive early or late fusion used by prior methods. The full model has 85.77M parameters and was trained entirely on public data.

![MOOZY data scale](assets/data_scale_overview.png)

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
  - [From pre-computed H5 feature files](#from-pre-computed-h5-feature-files)
  - [From raw whole-slide images](#from-raw-whole-slide-images)
  - [Python API](#python-api)
  - [Arguments](#arguments)
  - [Output format](#output-format)
- [Architecture](#architecture)
- [Tasks](#tasks)
- [Citation](#citation)
- [License](#license)

## Installation

```bash
pip install moozy
```

The checkpoint and task definitions are downloaded automatically from this repository on first use.

## Usage

### From pre-computed H5 feature files

This is the faster path. Pass `.h5` files containing patch features extracted with `lunit_vit_small_patch8_dino` from 224×224 patches. Outputs from [AtlasPatch](https://github.com/AtlasAnalyticsLab/AtlasPatch) and [TRIDENT](https://github.com/mahmoodlab/TRIDENT) are compatible.

```bash
moozy encode slide_1.h5 slide_2.h5 --output case_embedding.h5
```
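
Before encoding, it can help to confirm that each input file holds a 2-D matrix of 384-dimensional patch features, matching the ViT-S/8 output width listed under Architecture. The sketch below assumes a TRIDENT/CLAM-style layout with a `features` dataset of shape `(num_patches, 384)`, optionally with a leading batch axis of 1; the dataset name, the use of `h5py`, and the helper names `check_feature_shape` and `inspect_h5` are assumptions for illustration, not part of the MOOZY API.

```python
from typing import Sequence

EXPECTED_DIM = 384  # ViT-S/8 patch-feature width from the Architecture table


def check_feature_shape(shape: Sequence[int], expected_dim: int = EXPECTED_DIM) -> int:
    """Return the number of patches for a feature-array shape, or raise ValueError.

    Accepts (num_patches, dim) or (1, num_patches, dim); the optional singleton
    batch axis is an assumption about how some tools write their features.
    """
    dims = tuple(int(d) for d in shape)
    if len(dims) == 3 and dims[0] == 1:
        dims = dims[1:]  # drop a singleton batch axis
    if len(dims) != 2:
        raise ValueError(f"expected a 2-D feature matrix, got shape {tuple(shape)}")
    num_patches, dim = dims
    if dim != expected_dim:
        raise ValueError(f"expected {expected_dim}-D patch features, got {dim}-D")
    if num_patches == 0:
        raise ValueError("feature matrix has no patches")
    return num_patches


def inspect_h5(path: str, key: str = "features") -> int:
    """Hypothetical helper: check only the dataset shape of an H5 file (needs h5py)."""
    import h5py  # imported lazily so check_feature_shape works without h5py

    with h5py.File(path, "r") as f:
        return check_feature_shape(f[key].shape)
```

For example, `inspect_h5("slide_1.h5")` returns the patch count for a well-formed file and raises `ValueError` if the features came from a different patch encoder (for instance, a 768-D or 1024-D model).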

### From raw whole-slide images

Pass slide files (`.svs`, `.tiff`, `.ndpi`, `.mrxs`, etc.) directly. Under the hood, MOOZY calls [AtlasPatch](https://github.com/AtlasAnalyticsLab/AtlasPatch) to segment tissue, extract patches, and compute features. This path requires `atlas-patch`, `sam2`, and the OpenSlide system library (see the [AtlasPatch installation guide](https://github.com/AtlasAnalyticsLab/AtlasPatch#installation)).

```bash
moozy encode slide_1.svs slide_2.svs --output case_embedding.h5 --target_mag 20
```
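
AtlasPatch handles tiling internally, but the effect of `--step_size` (see Arguments) on the number of patches can be estimated with simple grid arithmetic. This sketch assumes a plain sliding window of 224-pixel patches over a region measured at the target magnification and ignores tissue filtering and edge padding; the helper names are hypothetical, and AtlasPatch's actual grid logic may differ.

```python
PATCH_SIZE = 224  # patch side length in pixels


def positions_along_axis(length: int, step: int, patch: int = PATCH_SIZE) -> int:
    """Number of full patches that fit along one axis with the given stride."""
    if step <= 0:
        raise ValueError("step must be positive")
    if length < patch:
        return 0
    return (length - patch) // step + 1


def grid_size(width: int, height: int, step: int, patch: int = PATCH_SIZE) -> int:
    """Total patches in a width x height region (no tissue filtering)."""
    return positions_along_axis(width, step, patch) * positions_along_axis(height, step, patch)


def overlap_fraction(step: int, patch: int = PATCH_SIZE) -> float:
    """Fraction of a patch's width shared with its neighbour along one axis."""
    return max(0.0, 1.0 - step / patch)


if __name__ == "__main__":
    # A 2240 x 2240 px tissue region at the target magnification.
    for step in (224, 112):
        print(step, grid_size(2240, 2240, step), overlap_fraction(step))
```

Under these assumptions the default stride of 224 tiles the region into 100 non-overlapping patches, while a stride of 112 gives 50% overlap and 361 patches, roughly 3.6 times the feature-extraction work.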

### Python API

```python
from moozy.encoding import run_encoding

# From H5 feature files
run_encoding(
    slide_paths=["slide_1.h5", "slide_2.h5"],
    output_path="case_embedding.h5",
)

# From raw slides
run_encoding(
    slide_paths=["slide_1.svs", "slide_2.svs"],
    output_path="case_embedding.h5",
    target_mag=20,
)
```
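
Because each `run_encoding` call produces one case embedding, a whole cohort can be encoded by grouping slides per patient and calling it once per group. The sketch below assumes TCGA-style file names, where the first 12 characters of the barcode (for example `TCGA-A1-A0SB`) identify the patient; `patient_id`, `group_by_patient`, `encode_cohort`, and the directory layout are illustrative, and only the `run_encoding(slide_paths=..., output_path=...)` call comes from the example above.

```python
from collections import defaultdict
from pathlib import Path


def patient_id(path) -> str:
    """Patient ID from a TCGA-style slide file name (first 12 barcode characters)."""
    return Path(path).name[:12]


def group_by_patient(paths) -> dict:
    """Map patient ID -> sorted list of that patient's slide paths."""
    groups = defaultdict(list)
    for p in paths:
        groups[patient_id(p)].append(str(p))
    return {pid: sorted(files) for pid, files in sorted(groups.items())}


def encode_cohort(feature_dir: str, out_dir: str) -> None:
    """Hypothetical driver: write one MOOZY case embedding per patient."""
    from moozy.encoding import run_encoding  # requires `pip install moozy`

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    slides = sorted(Path(feature_dir).glob("*.h5"))
    for pid, slide_paths in group_by_patient(slides).items():
        run_encoding(slide_paths=slide_paths, output_path=str(out / f"{pid}.h5"))
```

Grouping before encoding is what lets the case transformer see every slide of a patient together; encoding slides one at a time would produce per-slide rather than per-patient embeddings.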

### Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `SLIDES` | (required) | One or more H5 feature files or raw slide files forming a single case. The two types cannot be mixed in one call. |
| `--output`, `-o` | (required) | Output H5 file path. |
| `--mixed_precision` | off | Enable bfloat16 mixed precision. |
| `--target_mag` | 20 | Magnification for patch extraction from raw slides. Ignored for H5 input. |
| `--step_size` | 224 | Stride between patch centers, in pixels. Values below 224 produce overlapping patches. Ignored for H5 input. |
| `--mpp_csv` | none | CSV with `wsi,mpp` columns giving per-slide microns-per-pixel overrides. Ignored for H5 input. |
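
The `--mpp_csv` option takes a two-column CSV, presumably useful when a slide's scanner metadata lacks a reliable microns-per-pixel value. A minimal standard-library sketch of writing one follows. Whether the `wsi` column expects the bare file name, the full path, or the name without extension is not specified here, so the values below and the helper names `mpp_csv_text` and `write_mpp_csv` are assumptions. As a rough convention, about 0.25 µm/px corresponds to a 40× scan and about 0.5 µm/px to 20×.

```python
import csv
import io


def mpp_csv_text(mpp_by_slide: dict[str, float]) -> str:
    """Render a `wsi,mpp` CSV (header plus one row per slide) as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["wsi", "mpp"])
    for wsi, mpp in sorted(mpp_by_slide.items()):
        if mpp <= 0:
            raise ValueError(f"mpp must be positive for {wsi}")
        writer.writerow([wsi, mpp])
    return buf.getvalue()


def write_mpp_csv(path: str, mpp_by_slide: dict[str, float]) -> None:
    """Write the CSV to disk so it can be passed to `--mpp_csv`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(mpp_csv_text(mpp_by_slide))


if __name__ == "__main__":
    write_mpp_csv("mpp_overrides.csv", {"slide_1.svs": 0.25, "slide_2.svs": 0.5})
```

The resulting file would then be passed as `moozy encode slide_1.svs slide_2.svs --output case_embedding.h5 --mpp_csv mpp_overrides.csv`.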

### Output format

The output H5 file contains a `features` dataset holding the 768-dimensional float32 case embedding and a `coords` dataset holding slide metadata.
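
Downstream, the case embedding is typically read back as a flat vector and compared across patients or fed to a lightweight classifier. The sketch below reads the `features` dataset with `h5py` and flattens it, since whether it is stored as shape `(768,)` or `(1, 768)` is an assumption here, then compares two cases by cosine similarity. The helper names are hypothetical.

```python
import math
from typing import Sequence

EMBED_DIM = 768  # case-embedding width stated in the output format above


def load_case_embedding(path: str) -> list[float]:
    """Read and flatten the `features` dataset of a MOOZY output file (needs h5py)."""
    import h5py  # lazy import: cosine_similarity below does not need it

    with h5py.File(path, "r") as f:
        vec = [float(x) for x in f["features"][()].reshape(-1)]
    if len(vec) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} values, got {len(vec)}")
    return vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

For two encoded patients, `cosine_similarity(load_case_embedding("patient_a.h5"), load_case_embedding("patient_b.h5"))` gives a value in [-1, 1], with higher values indicating more similar case representations.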

## Architecture

| Component | Architecture | Parameters | Output dim |
|-----------|--------------|------------|------------|
| Patch encoder | ViT-S/8 (Lunit DINO) | 21.67M | 384 |
| Slide encoder | ViT, 6 layers, 768-D, 12 heads, 2D ALiBi | 42.8M | 768 |
| Case transformer | 3 layers, 12 heads | 21.3M | 768 |
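
As a sanity check on the table, the case transformer's size can be reproduced with standard transformer arithmetic. Assuming each layer is a conventional block (Q, K, V and output projections with biases, an MLP with 4× hidden width, and two LayerNorms), a layer of width d has 12d² + 13d parameters, independent of the number of heads. This is a back-of-the-envelope sketch, not MOOZY's implementation, and the helper names are hypothetical.

```python
def block_params(d: int, mlp_ratio: int = 4) -> int:
    """Parameters in one standard transformer block of width d (head count does not matter)."""
    attn = 4 * (d * d + d)  # Q, K, V and output projections, with biases
    hidden = mlp_ratio * d
    mlp = (d * hidden + hidden) + (hidden * d + d)  # two linear layers, with biases
    norms = 2 * (2 * d)  # two LayerNorms, each with weight and bias
    return attn + mlp + norms


def stack_params(num_layers: int, d: int = 768) -> int:
    """Parameters in a stack of identical blocks (no embeddings or extra heads)."""
    return num_layers * block_params(d)


if __name__ == "__main__":
    case = stack_params(3)   # case transformer: 3 layers at 768-D
    slide = stack_params(6)  # slide-encoder blocks only: 6 layers at 768-D
    print(f"case transformer ~ {case / 1e6:.2f}M, slide blocks ~ {slide / 1e6:.2f}M")
```

Under these assumptions three 768-D blocks come to about 21.26M parameters, consistent with the 21.3M in the table. Six such blocks give about 42.53M, so the slide encoder's remaining ~0.3M would sit in components outside the blocks, which this sketch does not model.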

## Tasks

This repository includes 333 task definitions in the `tasks/` directory. Each task has a `config.yaml` (task type, organ, label mapping) and a `task.csv` (annotations and splits). Together they cover 205 classification and 128 survival endpoints across 32 TCGA cohorts, 14 CPTAC cohorts, the REG dataset, and other public sources.
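
To work with the task definitions locally, a first step is to enumerate them. The sketch below relies only on the layout described above, namely that each task's `config.yaml` and `task.csv` sit in the same folder; the folder nesting, the local `tasks` path, and the helper name `list_task_dirs` are assumptions. Parsing `config.yaml` would additionally need a YAML library such as PyYAML, and its keys are not documented here, so the sketch stops at discovery.

```python
from pathlib import Path


def list_task_dirs(root: str = "tasks") -> list[Path]:
    """Return every directory under `root` holding both config.yaml and task.csv.

    Searches recursively, so it works whether tasks are stored flat or grouped
    into sub-folders (the exact nesting is an assumption).
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        cfg.parent
        for cfg in base.rglob("config.yaml")
        if (cfg.parent / "task.csv").is_file()
    )


if __name__ == "__main__":
    tasks = list_task_dirs("tasks")
    print(f"found {len(tasks)} task definitions")
```

Against a complete copy of the `tasks/` directory, the count should match the 333 definitions stated above; a smaller number suggests an incomplete download.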

## Citation

```bibtex
@article{moozy,
  title  = {MOOZY: A Patient-First Foundation Model for Computational Pathology},
  author = {TODO},
  year   = {TODO},
}
```

## License

[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Research and non-commercial use only.
assets/data_scale_overview.png ADDED

Git LFS Details

  • SHA256: b7688172dd32c46cf6c3ceb9fa7860137354eafc79ad0def5bf0b1d799ebe230
  • Pointer size: 131 Bytes
  • Size of remote file: 372 kB