Good-Lab committed on
Commit
3caa347
·
verified ·
1 Parent(s): 10ad5ec

Initial migration from moritzschaefer/spatialwhisperer (analysis e48e9670)

Files changed (2)
  1. README.md +106 -0
  2. spatialwhisperer.ckpt +3 -0
README.md ADDED
@@ -0,0 +1,106 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - histopathology
+ - spatial-transcriptomics
+ - multimodal
+ - vision-language
+ - clip
+ - cell-type-annotation
+ library_name: pytorch
+ pipeline_tag: zero-shot-image-classification
+ ---
+
+ # SpatialWhisperer
+
+ SpatialWhisperer is a trimodal embedding model that aligns hematoxylin & eosin (H&E) image patches, gene-expression profiles, and free-text descriptions in a shared 2048-dimensional space. It enables zero-shot cell-type annotation of H&E patches and natural-language querying over histopathology and spatial transcriptomics data.
+
+ This repository hosts the main checkpoint (seed=0) from the ICML 2026 paper *Trimodal Learning Enhances Zero-Shot Histopathology Annotation*, in which the model appears under the anonymized placeholder `\ourmethod`.
+
+ ## Model architecture
+
+ Three encoders project into a shared embedding space:
+
+ | Modality      | Encoder          | Freezing |
+ |---------------|------------------|----------|
+ | Image (H&E)   | UNI2             | locked   |
+ | Transcriptome | Geneformer (12L) | locked   |
+ | Text          | BioBERT v1.1     | unlocked |
+
+ Following the LiT convention, the freezing pattern is **LUL** (image locked, text unlocked, transcriptome locked). Only the text tower and the three projection heads are trained. The projection dimension is 2048.
+
+ ## Training data
+
+ Three paired datasets provide the cross-modal training pairs:
+
+ - **HEST-1K**: H&E ↔ spatial gene expression (Visium-style spots)
+ - **cellxgene_census**: gene expression ↔ free-text cell/sample metadata
+ - **ARCHS4/GEO**: gene expression ↔ free-text sample descriptions
+
+ Training ran for 4 epochs with AdamW (learning rate 1e-5, cosine schedule with 3% warmup, batch size 512) on a single H100 GPU. This checkpoint was saved at epoch 3, global step 14624.
+
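The schedule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the card does not give the warmup shape, the final learning rate, or the total number of optimizer steps, so the sketch assumes linear warmup and cosine decay to zero, and the function name `lr_at` and the `total_steps` value in the usage note are hypothetical.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 1e-5,
          warmup_frac: float = 0.03) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Assumptions (not stated in the card): warmup is linear and the
    schedule decays all the way to zero at `total_steps`.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly up to base_lr over the warmup window.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clipped to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

For example, with a hypothetical `total_steps=1000` the warmup covers the first 30 steps, `lr_at(29, 1000)` reaches the base rate of 1e-5, and the rate falls to half of that midway through the decay phase.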
+ ## Evaluation
+
+ Reported AUROC on cell-type benchmarks (mean across cell types):
+
+ | Benchmark | SpatialWhisperer | Best published baseline | Δ rel. |
+ |-----------|------------------|-------------------------|--------|
+ | PathoCell | **0.630**        | 0.554                   | +13.7% |
+ | Lizard    | (see paper)      | not listed              | +15.9% |
+ | PanNuke   | (see paper)      | not listed              | +13.7% |
+
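The "Δ rel." column appears to be the relative gain over the baseline, (model − baseline) / baseline. The card does not define the column, so that reading is an assumption, but the PathoCell row is consistent with it, as this short check shows (the helper name `relative_gain` is illustrative):

```python
def relative_gain(model_score: float, baseline_score: float) -> float:
    """Relative improvement of the model over the baseline, as a fraction."""
    return (model_score - baseline_score) / baseline_score


# PathoCell row of the table: 0.630 vs. 0.554 AUROC.
pathocell = relative_gain(0.630, 0.554)
print(f"PathoCell: {pathocell:+.1%}")  # prints "PathoCell: +13.7%"
```

The Lizard and PanNuke rows cannot be checked the same way because the card lists neither the model scores nor the baselines for them.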
+ Modality-pair benchmarks (Tabula Sapiens, HEST-1K, Skin Conditions) confirm that the trimodal model retains per-pair performance under low-n subsampling. See the paper for full numbers.
+
+ ## How to use
+
+ The checkpoint is a stripped Lightning state-dict (~505 MiB, 236 tensors covering the trained BioBERT text tower and the three 2048-d projection heads) plus its `hyper_parameters` block. **Foundation model weights are NOT included**: the locked UNI2 image encoder and locked Geneformer transcriptome encoder are re-instantiated at load time from their original providers (and remain under their respective licenses). The checkpoint's `hyper_parameters.model_config.use_cache = True` flag triggers the `FrozenCachedModel` wrapping that excludes the locked towers from `state_dict` during load.
+
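To see what a stripped checkpoint of this kind contains, one can group the `state_dict` entries by their top-level module name. The sketch below works on any Lightning-style checkpoint dictionary (the `state_dict` and `hyper_parameters` keys are standard Lightning checkpoint keys). The tensor names in the toy example are hypothetical stand-ins, not the real parameter names in `spatialwhisperer.ckpt`.

```python
from collections import Counter
from typing import Any, Mapping


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Count state_dict entries per top-level module and list hparam keys."""
    state_dict = ckpt.get("state_dict", {})
    # "text_model.encoder.weight" -> "text_model"
    per_prefix = Counter(name.split(".", 1)[0] for name in state_dict)
    return {
        "n_tensors": len(state_dict),
        "tensors_per_prefix": dict(per_prefix),
        "hparam_keys": sorted(ckpt.get("hyper_parameters", {})),
    }


# Toy stand-in for a loaded checkpoint; all tensor names are hypothetical.
toy_ckpt = {
    "state_dict": {
        "text_model.embeddings.weight": None,
        "text_model.encoder.layer0.weight": None,
        "text_projection.weight": None,
    },
    "hyper_parameters": {"model_config": {"use_cache": True}},
}
summary = summarize_checkpoint(toy_ckpt)
print(summary)
```

With PyTorch installed, the real dictionary would typically come from `torch.load(path, map_location="cpu")`. For this checkpoint the summary should report 236 tensors if the description above is accurate.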
+ Loading requires the `cellwhisperer` package from the model code repository at <https://github.com/Good-Lab/spatialwhisperer>, plus the foundation models (UNI2, Geneformer, BioBERT v1.1), which the cellwhisperer setup scripts download.
+
+ ```python
+ from cellwhisperer.utils.model_io import load_cellwhisperer_model
+
+ model, tokenizer, transcriptome_proc, image_proc = load_cellwhisperer_model(
+     model_path="hf://Good-Lab/spatialwhisperer"
+ )
+ # model is a TranscriptomeTextDualEncoderLightning in eval mode
+ ```
+
+ While the repo is private, export a token first:
+
+ ```bash
+ export HUGGINGFACE_TOKEN=$(pass api_keys/huggingface_write) # or any read token with access
+ ```
+
+ To compute image–text similarities for zero-shot cell-type annotation, encode the patches and the class-name strings, then take the cosine similarity in the shared 2048-d space. See `examples/zero_shot_celltype.py` in the model code repository.
+
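The scoring step of that procedure can be sketched independently of the model. The example below is not the repository's `examples/zero_shot_celltype.py`. It assumes the patch and class-name embeddings have already been produced by the encoders, uses toy 3-d vectors in place of the 2048-d embeddings, and the names `cosine` and `zero_shot_label` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(
    patch_emb: Sequence[float],
    class_embs: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching class name and all per-class similarities."""
    scores = {name: cosine(patch_emb, emb) for name, emb in class_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for text-tower outputs of the class names
# and an image-tower output for one H&E patch.
class_embs = {"lymphocyte": [1.0, 0.0, 0.0], "fibroblast": [0.0, 1.0, 0.0]}
patch_emb = [0.9, 0.1, 0.0]

label, scores = zero_shot_label(patch_emb, class_embs)
print(label)  # prints "lymphocyte"
```

In practice the same computation is usually done in batch by L2-normalizing both embedding matrices and taking a single matrix product.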
+ ## Intended use & limitations
+
+ **Intended.** Research on multimodal histopathology, cell-type annotation, spatial transcriptomics analysis, and natural-language querying over H&E and gene-expression data.
+
+ **Not intended.** Clinical diagnosis or treatment decisions. The model was trained on academic datasets and is not validated for clinical use.
+
+ **Known limitations.**
+ - Trained on Visium-scale spots (~55 μm); finer-grained image–expression alignment is not guaranteed.
+ - BioBERT vocabulary constrains the text tower; rare technical terms may be out-of-distribution.
+ - The image tower (UNI2) is locked; performance on tissue types poorly represented in UNI2's pretraining data is likely to be lower.
+
+ ## File contents
+
+ - `spatialwhisperer.ckpt`: Lightning checkpoint (state_dict + hyper_parameters; optimizer/scheduler state stripped).
+ - `README.md`: this card.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{schaefer2026spatialwhisperer,
+   title = {Trimodal Learning Enhances Zero-Shot Histopathology Annotation},
+   author = {Schaefer, Moritz and others},
+   booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
+   year = {2026},
+ }
+ ```
+
+ ## License
+
+ CC BY-NC 4.0 (research use). Foundation model weights (UNI2, Geneformer, BioBERT) carry their own licenses; please consult upstream repositories.
spatialwhisperer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee4e5afb0d8b6b8776f66b8b7623660ecf8864cb0cbbbc70ed69326322b4ed48
+ size 529993226
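The three pointer lines above are enough to verify a downloaded copy of the checkpoint. A minimal sketch, assuming the file has been fetched to a local path of your choosing (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical):

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Hash in chunks so a ~500 MiB file is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ee4e5afb0d8b6b8776f66b8b7623660ecf8864cb0cbbbc70ed69326322b4ed48\n"
    "size 529993226\n"
)
```

Calling `matches_pointer(Path("spatialwhisperer.ckpt"), pointer)` on the downloaded file should then return `True` if the download is intact.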