Initial migration from moritzschaefer/spatialwhisperer (analysis e48e9670)

- README.md +106 -0
- spatialwhisperer.ckpt +3 -0
README.md (added)

---
license: cc-by-nc-4.0
tags:
- histopathology
- spatial-transcriptomics
- multimodal
- vision-language
- clip
- cell-type-annotation
library_name: pytorch
pipeline_tag: zero-shot-image-classification
---

# SpatialWhisperer

SpatialWhisperer is a trimodal embedding model that aligns hematoxylin & eosin (H&E) image patches, gene-expression profiles, and free-text descriptions in a shared 2048-dimensional space. It enables zero-shot cell-type annotation of H&E patches and natural-language querying over histopathology and spatial transcriptomics data.

This repository hosts the main checkpoint (seed=0) from the ICML 2026 paper *Trimodal Learning Enhances Zero-Shot Histopathology Annotation*, in which the method appears under the anonymized name `\ourmethod`.

## Model architecture

Three encoders project into a shared embedding space:

| Modality       | Encoder               | Freezing |
|----------------|-----------------------|----------|
| Image (H&E)    | UNI2                  | locked   |
| Transcriptome  | Geneformer (12-layer) | locked   |
| Text           | BioBERT v1.1          | unlocked |

Following the LiT (Locked-image text Tuning) convention, the freezing pattern is **LUL**, read in the order image, text, transcriptome: the image tower is locked, the text tower is unlocked, and the transcriptome tower is locked. Only the text tower and the three projection heads are trained, and the projection dimension is 2048.

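The following minimal sketch shows how a LiT-style pattern string maps onto the three towers. It turns `"LUL"` into per-tower trainability flags. The helper names (`parse_freezing_pattern`, `TowerSpec`) are illustrative assumptions and are not part of the cellwhisperer code. The fixed tower order (image, text, transcriptome, as read above) is also an assumption.

```python
from dataclasses import dataclass

# Assumed tower order, matching how "LUL" is read in this card.
TOWER_ORDER = ("image", "text", "transcriptome")


@dataclass(frozen=True)
class TowerSpec:
    name: str
    trainable: bool


def parse_freezing_pattern(pattern: str) -> list[TowerSpec]:
    """Map a LiT-style pattern such as "LUL" to per-tower trainability.

    "L" = locked (frozen), "U" = unlocked (trained). Projection heads are
    trained regardless, so they are not encoded in the pattern.
    """
    if len(pattern) != len(TOWER_ORDER):
        raise ValueError(f"expected {len(TOWER_ORDER)} letters, got {pattern!r}")
    specs = []
    for name, letter in zip(TOWER_ORDER, pattern.upper()):
        if letter not in ("L", "U"):
            raise ValueError(f"invalid letter {letter!r} for tower {name!r}")
        specs.append(TowerSpec(name=name, trainable=(letter == "U")))
    return specs


if __name__ == "__main__":
    for spec in parse_freezing_pattern("LUL"):
        print(f"{spec.name:<14} {'trained' if spec.trainable else 'locked'}")
```

In a PyTorch model, each flag would typically be applied with `for p in encoder.parameters(): p.requires_grad_(spec.trainable)`. The projection heads stay trainable in every case.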
## Training data

Three paired datasets supply the training pairs, covering the image–transcriptome and transcriptome–text pairings:

- **HEST-1K**: H&E ↔ spatial gene expression (Visium-style spots)
- **cellxgene_census**: gene expression ↔ free-text cell/sample metadata
- **ARCHS4/GEO**: gene expression ↔ free-text sample descriptions

Training ran for 4 epochs using AdamW (learning rate 1e-5, cosine schedule with 3% warmup) at batch size 512 on a single H100 GPU. This checkpoint corresponds to epoch 3, global step 14624.

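The schedule can be sketched as linear warmup followed by cosine decay. This sketch rests on three assumptions:

- the checkpoint's global step (14624) marks the end of training;
- each global step is one optimizer step;
- the cosine decays to zero.

The scheduler actually used for training may differ. Under these assumptions, 4 epochs give 3656 steps per epoch, and 3% warmup is about 439 steps.

```python
import math

PEAK_LR = 1e-5          # from the card
EPOCHS = 4              # from the card
WARMUP_FRACTION = 0.03  # from the card ("warmup 3%")
TOTAL_STEPS = 14_624    # assumption: checkpoint global step == last training step


def lr_at(step: int,
          total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR,
          warmup_fraction: float = WARMUP_FRACTION,
          min_lr: float = 0.0) -> float:
    """Linear warmup, then cosine decay to min_lr (one common variant, assumed here)."""
    warmup_steps = max(1, round(warmup_fraction * total_steps))
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = TOTAL_STEPS // EPOCHS
    warmup_steps = round(WARMUP_FRACTION * TOTAL_STEPS)
    print(f"steps/epoch={steps_per_epoch}, warmup steps={warmup_steps}")
    for s in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(s, f"{lr_at(s):.3e}")
```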
## Evaluation

Reported AUROC on cell-type benchmarks (mean across cell types):

| Benchmark | SpatialWhisperer | Best published baseline | Relative Δ vs. baseline |
|-----------|------------------|-------------------------|-------------------------|
| PathoCell | **0.630**        | 0.554                   | +13.7%                  |
| Lizard    | (see paper)      | (see paper)             | +15.9%                  |
| PanNuke   | (see paper)      | (see paper)             | +13.7%                  |

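For reference, here is a minimal sketch of the metric as described above. It computes one-vs-rest AUROC per cell type, averages those values without weighting, and adds the relative change used in the last column. The unweighted average and the function names are assumptions; the paper defines the exact evaluation protocol.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary AUROC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auroc(score_matrix: Sequence[Sequence[float]],
               true_types: Sequence[str],
               cell_types: Sequence[str]) -> float:
    """Unweighted mean of one-vs-rest AUROCs across cell types.

    score_matrix[i][j] is the similarity of patch i to cell type j.
    """
    per_type = []
    for j, cell_type in enumerate(cell_types):
        scores = [row[j] for row in score_matrix]
        labels = [1 if t == cell_type else 0 for t in true_types]
        per_type.append(auroc(scores, labels))
    return sum(per_type) / len(per_type)


def relative_gain(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline` (0.137 means +13.7%)."""
    return (new - baseline) / baseline
```

`relative_gain(0.630, 0.554)` is about 0.137, which matches the +13.7% reported for PathoCell.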
Modality-pair benchmarks (Tabula Sapiens, HEST-1K, Skin Conditions) confirm that the trimodal model retains per-pair performance under low-n subsampling. See the paper for full numbers.

## How to use

The checkpoint is a stripped Lightning checkpoint (~505 MiB). It contains a `state_dict` of 236 tensors, covering the trained BioBERT text tower and the three 2048-d projection heads, plus its `hyper_parameters` block. **Foundation model weights are NOT included.** The locked UNI2 image encoder and the locked Geneformer transcriptome encoder are re-instantiated at load time from their original providers and remain under their respective licenses. The checkpoint's `hyper_parameters.model_config.use_cache = True` flag triggers the `FrozenCachedModel` wrapping, which excludes the locked towers from the `state_dict` during loading.

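A small inspection helper can check what the file actually contains before you wire up the full loader. This sketch assumes the usual Lightning checkpoint keys: `state_dict`, `hyper_parameters`, and `optimizer_states` when optimizer state is kept. `summarize_checkpoint` is a hypothetical helper. Because `model_config` may be stored either as a dict or as an object, both forms are handled.

```python
from collections.abc import Mapping
from typing import Any


def _get(obj: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict-like or attribute-style object."""
    if isinstance(obj, Mapping):
        return obj.get(key, default)
    return getattr(obj, key, default)


def summarize_checkpoint(ckpt: Mapping) -> dict:
    """Summarize a Lightning-style checkpoint dict (key names assumed, see lead-in)."""
    state_dict = ckpt.get("state_dict", {})
    hparams = ckpt.get("hyper_parameters", {})
    model_config = _get(hparams, "model_config", {})
    # Top-level module names give a quick view of which towers/heads are stored.
    prefixes = sorted({name.split(".", 1)[0] for name in state_dict})
    return {
        "num_tensors": len(state_dict),
        "top_level_prefixes": prefixes,
        "use_cache": _get(model_config, "use_cache"),
        "has_optimizer_state": "optimizer_states" in ckpt,
    }
```

To load the real file you would call something like `torch.load("spatialwhisperer.ckpt", map_location="cpu", weights_only=False)`. Be aware of two things before doing so:

- `weights_only=False` unpickles arbitrary objects, so only use it on a trusted file.
- Unpickling the `hyper_parameters` block may require the cellwhisperer package to be importable.

Per this card, you would expect 236 tensors, `use_cache` set to `True`, and no optimizer state.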
Loading requires two things:

- the model code from <https://github.com/Good-Lab/spatialwhisperer>, which provides the `cellwhisperer` Python package;
- the foundation models (UNI2, Geneformer, BioBERT v1.1), which the cellwhisperer setup scripts download.

```python
from cellwhisperer.utils.model_io import load_cellwhisperer_model

model, tokenizer, transcriptome_proc, image_proc = load_cellwhisperer_model(
    model_path="hf://Good-Lab/spatialwhisperer"
)
# model is a TranscriptomeTextDualEncoderLightning in eval mode
```

While the repository is private, export a Hugging Face token with read access first:

```bash
export HUGGINGFACE_TOKEN=$(pass api_keys/huggingface_write)  # or any read token with access to the repo
```

For zero-shot cell-type annotation, compute image–text similarities:

1. Encode the image patches and the cell-type name strings.
2. Take the cosine similarity between them in the shared 2048-d space.
3. Assign each patch the most similar cell type.

See `examples/zero_shot_celltype.py` in the model code repository.

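Below is a minimal sketch of that similarity step in plain Python. It uses toy 3-d vectors in place of the 2048-d embeddings produced by the image and text towers. `zero_shot_annotate` is a hypothetical helper. The card does not specify how cell-type names are phrased before text encoding (bare names or prompt templates), so the example script referenced above remains the reference.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def zero_shot_annotate(patch_embeddings: Sequence[Vector],
                       class_embeddings: Sequence[Vector],
                       class_names: Sequence[str]) -> list[tuple[str, list[float]]]:
    """Label each patch with the class whose text embedding is most cosine-similar."""
    classes = [l2_normalize(c) for c in class_embeddings]
    results = []
    for patch in patch_embeddings:
        p = l2_normalize(patch)
        sims = [sum(a * b for a, b in zip(p, c)) for c in classes]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((class_names[best], sims))
    return results


if __name__ == "__main__":
    patches = [[1.0, 0.1, 0.0], [0.0, 0.2, 3.0]]
    texts = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    for label, sims in zero_shot_annotate(patches, texts, ["tumor", "lymphocyte"]):
        print(label, [round(s, 3) for s in sims])
```

Normalizing both sides first means the argmax over dot products is the argmax over cosine similarities. This also makes the scores comparable across patches, for example when they are fed into a per-cell-type AUROC.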
## Intended use & limitations

**Intended.** Research on multimodal histopathology, cell-type annotation, spatial transcriptomics analysis, and natural-language querying over H&E and gene-expression data.

**Not intended.** Clinical diagnosis or treatment decisions. The model was trained on academic datasets and is not validated for clinical use.

**Known limitations.**
- Trained on Visium-scale spots (~55 μm); finer-grained image–expression alignment is not guaranteed.
- The BioBERT vocabulary constrains the text tower; rare technical terms may be out of distribution.
- The image tower (UNI2) is locked, so performance is likely to be lower on tissue types that are poorly represented in UNI2's pretraining data.

## File contents

- `spatialwhisperer.ckpt`: Lightning checkpoint (`state_dict` + `hyper_parameters`; optimizer and scheduler state stripped).
- `README.md`: this card.

## Citation

```bibtex
@inproceedings{schaefer2026spatialwhisperer,
  title     = {Trimodal Learning Enhances Zero-Shot Histopathology Annotation},
  author    = {Schaefer, Moritz and others},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026},
}
```

## License

CC BY-NC 4.0 (research use). Foundation model weights (UNI2, Geneformer, BioBERT) carry their own licenses; please consult the upstream repositories.

spatialwhisperer.ckpt (added)

version https://git-lfs.github.com/spec/v1
oid sha256:ee4e5afb0d8b6b8776f66b8b7623660ecf8864cb0cbbbc70ed69326322b4ed48
size 529993226
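These three lines are a Git LFS pointer: the checkpoint itself is identified by its SHA-256 hash and its size (529,993,226 bytes, about 505 MiB). Below is a minimal sketch for verifying a downloaded copy against the pointer. The helper names are illustrative and do not come from any Git LFS tooling.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ee4e5afb0d8b6b8776f66b8b7623660ecf8864cb0cbbbc70ed69326322b4ed48
size 529993226
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_chunks(chunks, pointer_text: str) -> bool:
    """Hash an iterable of byte chunks and compare size and SHA-256 with the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == int(fields["size"]) and digest.hexdigest() == expected_hex


def verify_file(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Stream a file in 1 MiB chunks so the ~505 MiB checkpoint is never fully in memory."""
    with open(path, "rb") as fh:
        return verify_chunks(iter(lambda: fh.read(chunk_size), b""), pointer_text)
```

For an intact download, `verify_file("spatialwhisperer.ckpt", POINTER)` should return `True`.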