---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- cell-classification
- cell-detection
- digital-pathology
- whole-slide-image
- H&E
- computational-pathology
- pytorch
---
                  

# HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification

HNE2Cell detects and classifies **16 cell types** from H&E-stained whole-slide images (WSI).
It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.

## Cell Types (16 classes)

| ID | Cell Type | Color |
|----|-----------|-------|
| 0 | Background | - |
| 1 | **Malignant** | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |
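
For downstream analysis it can help to have the table above as a lookup. The mapping below is a minimal sketch copied from that table; the helper name `celltype_name` is hypothetical, and the names written by the pipeline may be spelled differently (the sample centroid CSV later in this README shows `CD4T` without a space).

```python
# Class IDs and names copied from the "Cell Types" table in this README.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def celltype_name(class_id: int) -> str:
    """Return the class name for an ID; raises KeyError for unknown IDs."""
    return CELL_TYPES[class_id]
```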

## Pipeline Overview

The full pipeline consists of three steps:

```
┌──────────────┐      ┌───────────────┐      ┌──────────────────┐
│ 1. Normalize │ ──→  │  2. Patchify  │ ──→  │   3. Inference   │
│  (Reinhard)  │      │ (256px, 64ov) │      │ (Cell Detection) │
└──────────────┘      └───────────────┘      └──────────────────┘
    SVS / TIF            PNG patches          Masks + Centroids
```
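
To make the patch geometry concrete, the sketch below computes patch offsets along one image axis for 256 px patches with a 64 px overlap. It assumes the stride is `patch_size - overlap` (192 px) and that a final patch is placed flush with the image edge; both are assumptions for illustration, and `patchify.py` may handle strides and borders differently. The function name `patch_origins` is hypothetical.

```python
def patch_origins(length: int, patch_size: int = 256, overlap: int = 64) -> list[int]:
    """Start offsets (in pixels) of overlapping patches along one axis."""
    stride = patch_size - overlap  # assumed: neighbours share `overlap` pixels
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    if length <= patch_size:
        return [0]
    origins = list(range(0, length - patch_size + 1, stride))
    # Assumed edge handling: add one last patch flush with the border
    # so that the final pixels are not dropped.
    if origins[-1] + patch_size < length:
        origins.append(length - patch_size)
    return origins


print(patch_origins(1000))  # [0, 192, 384, 576, 744]
```

Applying the same function to the image width and height gives the full grid of top-left corners.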

### Magnification

| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| **Accuracy** | Best: fine-grained cell boundaries | Good: may miss small immune cells |
| **Speed** | More patches per slide | Fewer patches, faster |
| **Use when** | Immune cell subtyping matters | Quick screening / large cohorts |

> **40x is strongly recommended.** The model was primarily trained on 40x data. 20x works but expect reduced precision for small cells (lymphocytes, DCs).

---

## System Requirements

### Software dependencies (tested versions)

Core packages (as reported in the manuscript):

- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7

Additional utilities required by the pipeline scripts:

- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for `.svs` files)

### Operating systems tested

- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS

(Not tested on Windows/macOS.)

### Hardware requirements

> **Note:** WSI processing is memory-intensive. This pipeline is designed for
> server- or workstation-class hardware, not standard desktops.

**Minimum (small WSIs, ~1–2 GB):**
- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free

**Recommended (typical WSIs, 2–10 GB):**
- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (intermediate `Aligned-hne.tif` can be 20–50 GB per slide)

**Tested configurations:**
- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04

CPU-only inference is not supported in practice: full WSI inference would take
days even on a high-core-count CPU.

---

## Installation Guide

### Recommended: Conda environment from `cellvit_rv3.yml`

The repository includes a frozen conda environment file with all dependencies pinned
to the exact versions used in the manuscript.

```bash
# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml

# 2. Create environment
conda env create -f cellvit_rv3.yml

# 3. Activate
conda activate cellvit_rv3
```

**Typical install time:** ~10–15 minutes on a Linux server with a stable network connection
(dominated by the PyTorch + CUDA toolkit download).
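
After activating the environment, a quick check that the installed packages match the tested versions can catch a partial install. The script below is a sketch using only the standard library; the pins are copied from the dependency list above, with `torch` used as the distribution name for the package listed as `pytorch` (an assumption about how it is registered in your environment). The function name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from "Software dependencies (tested versions)".
PINNED = {
    "torch": "2.5.1",  # listed as "pytorch" above; "torch" is the PyPI name
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def check_versions(pins: dict[str, str]) -> dict[str, str]:
    """Compare installed distribution versions against the expected pins."""
    report = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Some builds append a local suffix such as "+cu121"; treat it as a match.
        matches = found == expected or found.startswith(expected + "+")
        report[name] = "ok" if matches else f"mismatch (found {found}, expected {expected})"
    return report


if __name__ == "__main__":
    for name, status in check_versions(PINNED).items():
        print(f"{name:15s} {status}")
```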


### Download the model

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt"
)
```

---

## Demo: Reproducible Walkthrough

To verify your installation, run the pipeline on the example slide included in this
repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).

### Download the model, example slide, and reference image

```python
from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="HNE2cell_pub_patch73_jit.pt"
)

slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs"
)

ref_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="standard-ilc.tif"
)
```

### Run the pipeline

```bash
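# Place the downloaded slide in a working directory.
# <slide_path> is the cache path returned by hf_hub_download above. The commands
# below also expect the model and reference image in the current directory, so
# copy them from model_path and ref_path, or pass those paths directly.
mkdir -p example/slides
cp <slide_path> example/slides/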

# Step 1: Normalize
python normalize.py \
    --input_dir ./example/slides \
    --target ./standard-ilc.tif

# Step 2: Patchify at 40x
python patchify.py \
    --input_dir ./example/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Inference
python inference.py \
    --input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
    --output_dir ./example/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

### Expected output

```
example/results/
├── Aligned-hne.tif        # Normalized full-resolution H&E
├── Aligned-hne.jpg        # 4× preview
├── patch_*_mask.png       # Per-patch cell type masks
└── patch_*_centroid.csv   # Cell centroids with type labels
```

**Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):**
Approximately **63,000 cells** are detected across the 16 classes.
Small variation (± a few percent) is expected between hardware configurations.

### Expected runtime

| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |

A system without sufficient RAM (<32 GB) will fail at the normalization step
due to full-resolution image loading.
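
A rough estimate of the uncompressed image size shows why RAM is the limiting factor. The sketch below assumes an 8-bit RGB image held fully in memory; the slide dimensions are hypothetical, and the actual peak usage of `normalize.py` is likely higher if it keeps floating-point or LAB copies of the image.

```python
def uncompressed_gib(width_px: int, height_px: int,
                     channels: int = 3, bytes_per_sample: int = 1) -> float:
    """Size in GiB of a dense image array with the given shape."""
    return width_px * height_px * channels * bytes_per_sample / 2**30


# Hypothetical 40x slide of 100,000 x 80,000 pixels, 8-bit RGB.
print(f"{uncompressed_gib(100_000, 80_000):.1f} GiB")  # 22.4 GiB
```

A single float32 copy of the same image (`bytes_per_sample=4`) would be four times larger, which is consistent with the recommendation of 128 GB of RAM or more for typical slides.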

> The example slide is from **TCGA-LUSC** and is redistributed under the
> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).

---

## Instructions for Use (On Your Own Data)

```bash
# Step 1: Color normalization (Reinhard method)
python normalize.py \
    --input_dir /path/to/slides \
    --target /path/to/standard-ilc.tif

# Step 2: Extract patches (40x recommended)
python patchify.py \
    --input_dir /path/to/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Cell detection & classification
python inference.py \
    --input_dir /path/to/patch_folders \
    --output_dir /path/to/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

---

## Input / Output Details

### Input

| Step | Input | Format |
|------|-------|--------|
| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
| Inference | Patches | `256×256 px` PNG files |

### Output

| File | Description |
|------|-------------|
| `Aligned-hne.tif` | Full-resolution normalized H&E image |
| `Aligned-hne.jpg` | 4× downsampled preview |
| `recon.tif` | Tissue-only reconstruction (intermediate) |
| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |

### Centroid CSV format

```csv
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
```
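
As an illustration of working with these files, the sketch below tallies cells per type across every `*_centroid.csv` in a results directory, using only the standard library. It assumes the column names shown above; the function name `count_cell_types` and the example path are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cell_types(results_dir: str) -> Counter:
    """Count rows per `celltype_name` over all centroid CSVs in a directory."""
    counts: Counter = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["celltype_name"]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path: point this at your own --output_dir.
    for name, n in count_cell_types("./example/results").most_common():
        print(f"{name:15s} {n}")
```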

To convert patch-local coordinates to WSI-global coordinates:

```python
# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
```
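
A runnable version of the conversion above might look like the following sketch. It assumes the patch name ends in `_{x_offset}_{y_offset}` as described, and that offsets and centroids are in the same pixel units. The function names are hypothetical; pass the patch name itself (for example the `slide_id` value), not a `*_centroid.csv` filename.

```python
from pathlib import Path


def patch_offsets(patch_name: str) -> tuple[int, int]:
    """Extract (x_offset, y_offset) from '{prefix}_{x_offset}_{y_offset}[.png]'."""
    stem = Path(patch_name).stem
    _prefix, x_str, y_str = stem.rsplit("_", 2)
    return int(x_str), int(y_str)


def to_global(x: float, y: float, patch_name: str) -> tuple[float, float]:
    """Shift patch-local centroid coordinates into WSI-global coordinates."""
    x_offset, y_offset = patch_offsets(patch_name)
    return x + x_offset, y + y_offset


print(to_global(112.5, 87.5, "patch_192_384.png"))  # (304.5, 471.5)
```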

---

## Normalization Reference Image

The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
(`standard-ilc.tif`) that represents your target stain appearance. The reference image is
included in this repository, or you can supply your own.
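
The core of Reinhard normalization is a per-channel shift and scale so that the source statistics match the reference. The sketch below shows that step for a single channel in plain Python. It is an illustration of the technique, not the code in `normalize.py`: the RGB to LAB conversion (for example with `skimage.color.rgb2lab`) is omitted, a real implementation would vectorize this over arrays, and the function name `reinhard_channel` is hypothetical.

```python
from statistics import fmean, pstdev


def reinhard_channel(source: list[float], target: list[float]) -> list[float]:
    """Rescale `source` so its mean and standard deviation match `target`.

    Apply this separately to the L, A and B channels of the source image,
    with `target` being the same channel of the reference image.
    """
    src_mean, src_std = fmean(source), pstdev(source)
    tgt_mean, tgt_std = fmean(target), pstdev(target)
    if src_std == 0:
        # A flat channel has no contrast to rescale; move it to the target mean.
        return [tgt_mean for _ in source]
    scale = tgt_std / src_std
    return [(value - src_mean) * scale + tgt_mean for value in source]
```

After the three channels are matched, the image is converted back from LAB to RGB.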

---

## File Structure

```
HNE2Cell/
├── README.md                                                    # This file
├── HNE2cell_pub_patch73_jit.pt                                  # TorchScript model
├── normalize.py                                                 # Step 1: Reinhard normalization
├── patchify.py                                                  # Step 2: Patch extraction
├── inference.py                                                 # Step 3: Model inference
├── post_processing.py                                           # Cell post-processing module
├── tools.py                                                     # Utility functions
├── standard-ilc.tif                                             # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs   # Example slide (TCGA-LUSC)
```

---

  
## Citation

If you use HNE2Cell in your research, please cite:

```bibtex
@misc{hne2cell,
  title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
  year={2026},
  url={https://huggingface.co/roobee79/HNE2Cell}
}
```

The example slide is derived from data generated by the TCGA:
<https://portal.gdc.cancer.gov/>.


## License

This repository uses a dual licensing scheme commonly adopted in academic ML/biomedical projects (e.g., SAM, LLaMA):

- **Source code** (`.py` files): Released under the **MIT License**. See `LICENSE`.
- **Model weights** (`HNE2cell_pub_patch73_jit.pt`): Released under 
  **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**. 
  See `MODEL_LICENSE`. Free for academic and non-commercial research use.
  
  For commercial licensing, please contact:  
  **Ewha University-Industry Collaboration Foundation**  
  Technology Commercialization Team  
  Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea  
  Web: <https://research.ewha.ac.kr> | <https://epm.ewha.ac.kr>
- **Example slide** (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by 
  the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).