---
license: apache-2.0
library_name: timm
tags:
- histopathology
- pathology
- dino
- vision-transformer
- prostate
- feature-extraction
pipeline_tag: image-feature-extraction
---

# Prost40M

**Prost40M** is a prostatectomy-specific foundation model pretrained with DINO on a large corpus of H&E prostatectomy slides.
It is designed as a strong feature extractor for computational pathology tasks where subtle prostate-specific morphology matters.

## Model At a Glance

| Field | Value |
| --- | --- |
| Model name | Prost40M |
| Backbone architecture | `vit_small` |
| Input size (pixels) | `224 x 224` |
| Patch size (pixels) | `14` |
| Embedding dimension | `384` |
| Released weights | Teacher backbone encoder |
| Domain | H&E prostatectomy histopathology |

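The input and patch sizes in the table determine how many tokens the backbone processes per tile. As a quick check, the arithmetic can be sketched as below. It assumes standard non-overlapping ViT patching plus a single class token, which is typical for DINO-style ViTs but is not stated in this card.

```python
# Token-count arithmetic for one tile, using the values from the table.
# Assumption: non-overlapping square patches and one class token.
INPUT_SIZE = 224   # pixels per side
PATCH_SIZE = 14    # pixels per side
EMBED_DIM = 384    # channels per token

patches_per_side = INPUT_SIZE // PATCH_SIZE
num_patches = patches_per_side ** 2
num_tokens = num_patches + 1  # plus the assumed class token

print(f"{patches_per_side} x {patches_per_side} grid = {num_patches} patch tokens")
print(f"{num_tokens} tokens of dimension {EMBED_DIM} per tile")
```

Because 224 is an exact multiple of 14, the tile divides into a whole number of patches with no padding or cropping.
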
## Quickstart

```python
import torch
import timm
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

# Load the released teacher backbone from the Hugging Face Hub
model = timm.create_model("hf-hub:waticlems/Prost40M", pretrained=True)
model.eval()

# Build the preprocessing pipeline from the model's pretrained config
transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

# Embed a single RGB tile
img = Image.open("tile.png").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension
with torch.inference_mode():
    embedding = model(x)  # shape: [1, 384]
print(embedding.shape)
```

## Motivation

Large pathology foundation models are typically trained on broad, multi-organ
data. Their generic features transfer well across many settings but can be less
sensitive to the fine-grained morphology of a specific organ. Prost40M was
developed to evaluate the value of organ-specific pretraining in prostate
histopathology.

## Training Data

- Approximately 40 million image tiles at `0.50` microns per pixel
- 1888 H&E-stained prostatectomy slides
  - 449 slides from 403 patients in the TCGA-PRAD cohort
  - 1439 slides from 508 patients in the LEOPARD cohort

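The cohort figures listed under Training Data are internally consistent, and they give a rough sense of scale. The snippet below is a back-of-the-envelope check only. The per-slide average uses the approximate 40 million figure, the patient total assumes the two cohorts share no patients, and the field of view assumes a 224-pixel tile at 0.50 microns per pixel, which may differ from the tile size actually sampled during pretraining.

```python
# Figures taken from the Training Data list.
tcga_slides, tcga_patients = 449, 403
leopard_slides, leopard_patients = 1439, 508
total_slides = 1888
approx_tiles = 40_000_000

# The two cohorts should account for every slide.
assert tcga_slides + leopard_slides == total_slides

# Assumption: no patient appears in both cohorts.
total_patients = tcga_patients + leopard_patients

# Rough average, since the tile count itself is approximate.
tiles_per_slide = approx_tiles / total_slides

# Assumption: a 224-pixel tile at 0.50 microns per pixel.
tile_px, mpp = 224, 0.50
field_of_view_um = tile_px * mpp

print(f"patients: {total_patients}")
print(f"average tiles per slide: {tiles_per_slide:,.0f}")
print(f"tile field of view: {field_of_view_um:.0f} um per side")
```
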
## Intended Use

- Tile-level feature extraction for downstream prostate histopathology tasks

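One common way to use frozen tile embeddings downstream is k-nearest-neighbour classification by cosine similarity. The sketch below uses only the standard library and tiny made-up 3-dimensional vectors in place of real 384-dimensional Prost40M embeddings. The function names and the `benign`/`tumor` labels are hypothetical, and this is not an evaluation protocol described in this card.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_predict(query, reference_embeddings, reference_labels, k=3):
    """Majority label among the k reference embeddings most similar to query."""
    ranked = sorted(
        zip(reference_embeddings, reference_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy stand-ins for tile embeddings (real ones would have 384 dimensions).
reference_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.3],
]
reference_labels = ["benign", "benign", "tumor", "tumor"]  # hypothetical labels

prediction = knn_predict([0.8, 0.3, 0.0], reference_embeddings, reference_labels, k=3)
print(prediction)
```

In practice the reference embeddings would come from running the model on labelled tiles, and a vectorised library would replace the pure-Python loops for anything beyond a handful of tiles.
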
## Limitations

- Performance can degrade under domain shift (scanner, stain protocol, center)
- Learned representations reflect dataset composition and preprocessing choices

## License

Apache-2.0

## Citation

If you use **Prost40M**, cite:

- _citation to be added soon_