Whole Slide Concepts: A Supervised Foundation Model For Pathological Images
Model card for the Whole Slide Concepts model.
First, install the medicalmultitaskmodeling (mmm) package together with the m3-sdk:
pip install medicalmultitaskmodeling m3-sdk
To download the model, run the following:
from mmm.api.M3Model import M3Model, M3_MODELS, WSC_MTL_TINY

# The downloaded .zip contains the .pt weights for the encoder, squeezer and grouper.
# Load the selected model.
model = M3Model(M3_MODELS[WSC_MTL_TINY])

# Should print dict_keys(['encoder', 'squeezer', 'grouper'])
print(model.keys())
Here is an inference example for a bag of 10 instances. Each instance is passed through the encoder to obtain a pyramid representation, which is then compressed by the squeezer module. Finally, the grouper combines the individual instances into a single WSI vector. Note the two arguments of the grouper's forward pass: the first is the instance representations; the second is a tensor of identifiers indicating which instances of the batch belong to the same bag.
import torch

# Test the forward pass with a bag of 10 random instances.
test = torch.rand((10, 3, 224, 224))
with torch.no_grad():
    model_in = test.to(model.device)
    # First obtain the pyramid features of the encoder.
    pyramid = model['encoder'](model_in)
    # Then compress them into the latent representation of each instance.
    _, latent = model['squeezer'](pyramid)
    latent = torch.nn.Flatten(start_dim=1)(latent)
    # Finally, group all instances into one WSI vector.
    # The second argument defines which instances of the batch belong together.
    # Here, all 10 instances belong to the same bag, so we simply use ones.
    wsi_vector, attention_per_head = model['grouper'](
        latent, torch.ones((10,), dtype=torch.long, device=model.device)
    )

# wsi_vector has shape (1, 768); attention_per_head has shape (10, 8).
print(wsi_vector.shape, attention_per_head.shape)
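To illustrate how the bag-identifier argument partitions a batch, here is a minimal, self-contained sketch. The function `group_by_bag` is hypothetical (not part of the mmm API) and uses simple mean pooling as a stand-in for the grouper's attention-based pooling; it only demonstrates that instances sharing an identifier are pooled into one bag vector, so a mixed batch of two slides yields two WSI vectors.

```python
import torch

def group_by_bag(latent: torch.Tensor, bag_ids: torch.Tensor) -> torch.Tensor:
    """Pool instance vectors into one vector per bag id (mean pooling).

    Hypothetical stand-in for the grouper: the real module uses
    multi-head attention, but the bag-id semantics are the same.
    """
    bags = []
    for bag in torch.unique(bag_ids):
        # Select all instances whose id matches this bag and pool them.
        bags.append(latent[bag_ids == bag].mean(dim=0))
    return torch.stack(bags)

# Two slides in one batch: instances 0-5 belong to slide 0, instances 6-9 to slide 1.
latent = torch.rand((10, 768))
bag_ids = torch.tensor([0] * 6 + [1] * 4)
per_slide = group_by_bag(latent, bag_ids)
print(per_slide.shape)  # torch.Size([2, 768])
```

With the actual model, you would pass such a mixed `bag_ids` tensor as the second argument of the grouper's forward pass instead of the all-ones tensor used above.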
Please Cite
@misc{nicke2026slideconceptssupervisedfoundation,
title={Whole Slide Concepts: A Supervised Foundation Model For Pathological Images},
author={Till Nicke and Daniela Schacherer and Jan Raphael Schäfer and Natalia Artysh and Antje Prasse and André Homeyer and Andrea Schenk and Henning Höfener and Johannes Lotz},
year={2026},
eprint={2507.05742},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2507.05742},
}
@article{schafer2024overcoming,
title={Overcoming data scarcity in biomedical imaging with a foundational multi-task model},
author={Sch{\"a}fer, Raphael and Nicke, Till and H{\"o}fener, Henning and Lange, Annkristin and Merhof, Dorit and Feuerhake, Friedrich and Schulz, Volkmar and Lotz, Johannes and Kiessling, Fabian},
journal={Nature Computational Science},
volume={4},
number={7},
pages={495--509},
year={2024},
publisher={Nature Publishing Group US New York}
}