---
license: apache-2.0
library_name: timm
tags:
  - histopathology
  - pathology
  - dino
  - vision-transformer
  - prostate
  - feature-extraction
pipeline_tag: image-feature-extraction
---

# Prost40M

**Prost40M** is a prostatectomy-specific foundation model pretrained with DINO on approximately 40 million tiles from 1,888 H&E prostatectomy slides.  
It is intended as a tile-level feature extractor for computational pathology tasks where subtle prostate-specific morphology matters.


## Model at a Glance

| Field | Value |
| --- | --- |
| Model name | Prost40M |
| Backbone architecture | `vit_small` |
| Input size | `224 x 224` |
| Patch size | `14` |
| Embedding dimension | `384` |
| Released weights | Teacher backbone encoder |
| Domain | H&E prostatectomy histopathology |
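
The input and patch sizes in the table imply a fixed patch grid per tile. The arithmetic below is a small sketch of that relationship; it combines the table values with the `0.50` microns-per-pixel tile resolution listed under Training Data. Whether a class token or other extra tokens are prepended is not stated in this card, so only patch tokens are counted.

```python
# Values taken from the table above and the Training Data section.
input_size = 224   # tile side length in pixels
patch_size = 14    # ViT patch side length in pixels
mpp = 0.50         # microns per pixel of the pretraining tiles

# The input must divide evenly into patches.
assert input_size % patch_size == 0

grid_side = input_size // patch_size   # patches along one side of the tile
num_patch_tokens = grid_side ** 2      # patch tokens per tile (extra tokens not counted)
tile_extent_um = input_size * mpp      # physical width of one tile in microns
patch_extent_um = patch_size * mpp     # physical width of one patch in microns

print(grid_side, num_patch_tokens, tile_extent_um, patch_extent_um)
```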

## Quickstart

```python
import torch
import timm
from PIL import Image
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform

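# Download the released teacher backbone from the Hugging Face Hub
model = timm.create_model("hf-hub:waticlems/Prost40M", pretrained=True)
model.eval()  # disable dropout and other training-only behavior

# Build the preprocessing pipeline from the model's pretrained config
transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

img = Image.open("tile.png").convert("RGB")  # path to a single H&E tile
x = transform(img).unsqueeze(0)  # add a batch dimension: [1, 3, 224, 224]
with torch.inference_mode():
    embedding = model(x)  # shape: [1, 384]
print(embedding.shape)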
```

## Motivation

Large pathology foundation models are typically trained on broad, multi-organ
data. Their generic features transfer well across many settings, but can be less
sensitive to fine-grained morphology of a specific organ. Prost40M was developed
to evaluate the value of organ-specific pretraining in prostate histopathology.

## Training Data

- Approximately 40 million image tiles at `0.50` microns per pixel (mpp)
- 1,888 H&E-stained prostatectomy slides
  - 449 slides from 403 patients in the TCGA-PRAD cohort
  - 1,439 slides from 508 patients in the LEOPARD cohort
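
Because the tiles were sampled at `0.50` microns per pixel, it is probably sensible to extract inference tiles at the same resolution. The sketch below shows one way to compute how many level-0 pixels to read from a slide so that, after resizing, the tile matches that resolution. The helper `level0_read_size` is hypothetical and not part of this repository, and the `224`-pixel tile size is assumed from the model's input size rather than from the pretraining pipeline.

```python
def level0_read_size(native_mpp: float, target_mpp: float = 0.50, tile_size: int = 224) -> int:
    """Side length, in level-0 pixels, that covers one tile_size tile at target_mpp.

    Hypothetical helper: native_mpp is the slide's level-0 resolution in microns per pixel.
    """
    if native_mpp <= 0 or target_mpp <= 0:
        raise ValueError("microns-per-pixel values must be positive")
    # Same physical field of view, expressed in pixels at the slide's native resolution.
    return round(tile_size * target_mpp / native_mpp)


# A slide scanned at 0.25 mpp needs a region twice as wide, then a 2x downsample.
size_at_025 = level0_read_size(0.25)
# A slide already at 0.50 mpp can be read at the tile size directly.
size_at_050 = level0_read_size(0.50)
print(size_at_025, size_at_050)
```

The region read at this size would then be resized to `224 x 224` before applying the model's transform.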

## Intended Use

- Tile-level feature extraction for downstream prostate histopathology tasks
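
As an illustration of a downstream use, the sketch below fits a nearest-centroid classifier on tile embeddings using cosine similarity. This is only one simple probing option and is not an evaluation protocol from the Prost40M authors. The embeddings here are synthetic random stand-ins with the model's `384` dimensions; in practice they would come from the Quickstart code. The function names `fit_centroids` and `predict` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def fit_centroids(embeddings, labels):
    """Return the sorted class labels and one unit-length mean embedding per class."""
    embeddings = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(centroids)


def predict(embeddings, classes, centroids):
    """Assign each embedding to the class whose centroid is most similar."""
    sims = l2_normalize(np.asarray(embeddings, dtype=np.float64)) @ centroids.T
    return classes[sims.argmax(axis=1)]


# Synthetic stand-in for [N, 384] tile embeddings with two well-separated classes.
rng = np.random.default_rng(0)
dim = 384
class_means = rng.normal(size=(2, dim))
train_labels = np.repeat([0, 1], 50)
train_emb = class_means[train_labels] + 0.1 * rng.normal(size=(100, dim))

classes, centroids = fit_centroids(train_emb, train_labels)
preds = predict(train_emb, classes, centroids)
print(centroids.shape, (preds == train_labels).mean())
```

A nearest-centroid probe has no hyperparameters, which makes it a quick sanity check of embedding quality before training a linear or attention-based head.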

## Limitations

- Performance can degrade under domain shift (scanner, stain protocol, center)
- Learned representations reflect dataset composition and preprocessing choices

## License

Apache-2.0

## Citation

If you use **Prost40M**, cite:

- _citation to be added soon_