---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- cell-classification
- cell-detection
- digital-pathology
- whole-slide-image
- H&E
- computational-pathology
- pytorch
---
# HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification
HNE2Cell detects cells in H&E-stained whole-slide images (WSI) and classifies them into **16 classes**: 15 cell types plus background.
It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.
## Cell Types (16 classes)
| ID | Cell Type | Color |
|----|-----------|-------|
| 0 | Background | - |
| 1 | **Malignant** | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |
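When post-processing results, it can help to have the table above as a lookup. The sketch below copies the IDs and names from the table; the helper name `class_name` is illustrative and not part of the repository, and the spelling written to the output CSV may differ slightly from the table (the example CSV later in this README shows `CD4T`).

```python
# Class IDs and names copied from the table above.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def class_name(class_id: int) -> str:
    """Return the table name for a class ID; raises KeyError for unknown IDs."""
    return CELL_TYPES[class_id]


if __name__ == "__main__":
    for class_id in sorted(CELL_TYPES):
        print(class_id, class_name(class_id))
```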
## Pipeline Overview
The full pipeline consists of three steps:
```
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ 1. Normalize  │ ──▶ │ 2. Patchify    │ ──▶ │ 3. Inference     │
│  (Reinhard)   │     │ (256px, 64ov)  │     │ (Cell Detection) │
└───────────────┘     └────────────────┘     └──────────────────┘
   SVS / TIF             PNG patches           Masks + Centroids
```
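To get a feel for how many patches a slide produces, the sketch below computes patch origins along one axis. It assumes that "64ov" means a 64 px overlap between neighbouring 256 px patches (a stride of 192 px) and that the last patch is shifted to sit flush with the image edge. `patchify.py` may handle edges and background tiles differently, so treat `patch_origins` as a hypothetical helper for estimation only.

```python
def patch_origins(length: int, patch_size: int = 256, overlap: int = 64) -> list[int]:
    """Return the start offsets of patches covering `length` pixels along one axis."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must be in [0, patch_size)")
    if length <= patch_size:
        # The image fits inside a single patch.
        return [0]
    stride = patch_size - overlap  # 192 px with the default settings
    origins = list(range(0, length - patch_size + 1, stride))
    # Assumption: add one final patch flush with the far edge if the grid stops short.
    if origins[-1] + patch_size < length:
        origins.append(length - patch_size)
    return origins


if __name__ == "__main__":
    width, height = 40_000, 30_000  # illustrative slide dimensions in pixels
    n_patches = len(patch_origins(width)) * len(patch_origins(height))
    print(f"upper bound on patches before background filtering: {n_patches}")
```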
### Magnification
| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| **Accuracy** | Best: fine-grained cell boundaries | Good, but may miss small immune cells |
| **Speed** | More patches per slide | Fewer patches, faster |
| **Use when** | Immune cell subtyping matters | Quick screening / large cohorts |
> **40x is strongly recommended.** The model was primarily trained on 40x data. 20x works but expect reduced precision for small cells (lymphocytes, DCs).
---
## System Requirements
### Software dependencies (tested versions)
Core packages (as reported in the manuscript):
- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7
Additional utilities required by the pipeline scripts:
- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for `.svs` files)
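A quick way to confirm that an environment matches the pinned versions is to compare them with what is installed. The sketch below uses only the standard library; the distribution names are assumptions based on the usual PyPI names (PyTorch is published as `torch`), and `check_pins` is an illustrative helper, not part of the repository.

```python
from importlib import metadata

# Pins copied from the list above, keyed by the assumed PyPI distribution name.
PINNED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def installed_version(dist: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINNED, lookup=installed_version):
    """Return {dist: (expected, found)} for every missing or mismatched package."""
    problems = {}
    for dist, expected in pins.items():
        found = lookup(dist)
        # Local build tags such as "2.5.1+cu121" still count as a match.
        if found is None or found.split("+")[0] != expected:
            problems[dist] = (expected, found)
    return problems


if __name__ == "__main__":
    for dist, (expected, found) in check_pins().items():
        print(f"{dist}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is present at the listed version.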
### Operating systems tested
- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS
(Not tested on Windows/macOS.)
### Hardware requirements
> **Note:** WSI processing is memory-intensive. This pipeline is designed for
> server- or workstation-class hardware, not standard desktops.
**Minimum (small WSIs, ~1–2 GB):**
- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free
**Recommended (typical WSIs, 2–10 GB):**
- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (intermediate `Aligned-hne.tif` can be 20–50 GB per slide)
**Tested configurations:**
- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04
CPU-only inference is not supported in practice: full WSI inference would take
days even on a high-core-count CPU.
---
## Installation Guide
### Recommended: Conda environment from `cellvit_rv3.yml`
The repository includes a frozen conda environment file with all dependencies pinned
to the exact versions used in the manuscript.
```bash
# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml
# 2. Create environment
conda env create -f cellvit_rv3.yml
# 3. Activate
conda activate cellvit_rv3
```
**Typical install time:** ~10–15 minutes on a Linux server with a stable network connection
(dominated by the PyTorch + CUDA toolkit download).
### Download the model
```python
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
repo_id="roobee79/HNE2Cell",
filename="HNE2cell_pub_patch73_jit.pt"
)
```
---
## Demo: Reproducible Walkthrough
To verify your installation, run the pipeline on the example slide included in this
repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).
### Download the model, example slide, and reference image
```python
from huggingface_hub import hf_hub_download
REPO_ID = "roobee79/HNE2Cell"
model_path = hf_hub_download(
repo_id=REPO_ID,
filename="HNE2cell_pub_patch73_jit.pt"
)
slide_path = hf_hub_download(
repo_id=REPO_ID,
filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs"
)
ref_path = hf_hub_download(
repo_id=REPO_ID,
filename="standard-ilc.tif"
)
```
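`hf_hub_download` returns a path inside the local Hugging Face cache, so the files usually need to be copied into a working directory before the command-line steps can find them. A minimal sketch, assuming the working layout used in the next section; the helper name `stage` is illustrative:

```python
import shutil
from pathlib import Path


def stage(src: str, dest_dir: str) -> Path:
    """Copy one downloaded file into dest_dir and return the path of the copy."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.copy follows symlinks, so the copy is a regular file
    # even when the cache entry is a symlink.
    return Path(shutil.copy(src, dest))


# Example usage with the paths returned above (uncomment after downloading):
# slide_copy = stage(slide_path, "example/slides")
# ref_copy = stage(ref_path, ".")
# model_copy = stage(model_path, ".")
```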
### Run the pipeline
```bash
# Place the downloaded slide in a working directory
mkdir -p example/slides
cp <slide_path> example/slides/
# Step 1: Normalize
python normalize.py \
--input_dir ./example/slides \
--target ./standard-ilc.tif
# Step 2: Patchify at 40x
python patchify.py \
--input_dir ./example/slides \
--magnification 40 \
--patch_size 256 \
--overlap 64 \
--workers 8
# Step 3: Inference
python inference.py \
--input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
--output_dir ./example/results \
--model_path ./HNE2cell_pub_patch73_jit.pt \
--magnification 40 \
--batch_size 32
```
### Expected output
```
example/results/
├── Aligned-hne.tif          # Normalized full-resolution H&E
├── Aligned-hne.jpg          # 4× preview
├── patch_*_mask.png         # Per-patch cell type masks
└── patch_*_centroid.csv     # Cell centroids with type labels
```
**Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):**
Approximately **63,000 cells** are detected across the 16 classes.
Small variation (± a few percent) is expected between hardware configurations.
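To compare your run against this figure, the per-patch centroid files can be tallied by class. The sketch below assumes the `*_centroid.csv` naming and the `celltype_name` column documented in this README; `count_cells` is an illustrative helper. Whether cells in the overlap between neighbouring patches are already de-duplicated by `inference.py` is not stated here, so a raw tally may not match the reported total exactly.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir: str) -> Counter:
    """Count rows per celltype_name across all *_centroid.csv files in a folder."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["celltype_name"]] += 1
    return counts


if __name__ == "__main__":
    counts = count_cells("example/results")
    for name, n in counts.most_common():
        print(f"{name}: {n}")
    print("total:", sum(counts.values()))
```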
### Expected runtime
| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |
Systems with less than 32 GB of RAM will fail at the normalization step,
which loads the full-resolution image into memory.
> The example slide is from **TCGA-LUSC** and is redistributed under the
> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).
---
## Instructions for Use (On Your Own Data)
```bash
# Step 1: Color normalization (Reinhard method)
python normalize.py \
--input_dir /path/to/slides \
--target /path/to/standard-ilc.tif
# Step 2: Extract patches (40x recommended)
python patchify.py \
--input_dir /path/to/slides \
--magnification 40 \
--patch_size 256 \
--overlap 64 \
--workers 8
# Step 3: Cell detection & classification
python inference.py \
--input_dir /path/to/patch_folders \
--output_dir /path/to/results \
--model_path ./HNE2cell_pub_patch73_jit.pt \
--magnification 40 \
--batch_size 32
```
---
## Input / Output Details
### Input
| Step | Input | Format |
|------|-------|--------|
| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
| Inference | Patches | `256×256 px` PNG files |
### Output
| File | Description |
|------|-------------|
| `Aligned-hne.tif` | Full-resolution normalized H&E image |
| `Aligned-hne.jpg` | 4× downsampled preview |
| `recon.tif` | Tissue-only reconstruction (intermediate) |
| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |
### Centroid CSV format
```csv
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
```
To convert patch-local coordinates to WSI-global coordinates:
```python
# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
```
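A runnable version of the conversion above might look like the following. It assumes the offsets are the last two underscore-separated integers of the patch name (as in `patch_0_0`) and that they are expressed in the same pixel units as the patch-local coordinates; both helper names are illustrative.

```python
import re
from pathlib import Path

# Matches the trailing "_{x_offset}_{y_offset}" of a patch name.
_OFFSETS = re.compile(r"_(\d+)_(\d+)$")


def patch_offsets(patch_name: str) -> tuple[int, int]:
    """Extract (x_offset, y_offset) from a patch name such as 'patch_192_384.png'."""
    stem = Path(patch_name).stem
    match = _OFFSETS.search(stem)
    if match is None:
        raise ValueError(f"no offsets found in patch name: {patch_name!r}")
    return int(match.group(1)), int(match.group(2))


def to_global(patch_name: str, x: float, y: float) -> tuple[float, float]:
    """Convert patch-local (x, y) to WSI-global coordinates."""
    x_offset, y_offset = patch_offsets(patch_name)
    return x + x_offset, y + y_offset


if __name__ == "__main__":
    print(to_global("patch_192_384.png", 112.5, 87.5))
```

The `slide_id` column of the centroid CSV can be passed directly as `patch_name`, since a value like `patch_0_0` has no extension to strip.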
---
## Normalization Reference Image
The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
(`standard-ilc.tif`) that represents your target stain appearance. The reference image is
included in this repository, or you can supply your own.
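At its core, Reinhard normalization shifts and scales each colour channel of the source image so that its mean and standard deviation match those of the reference. The sketch below shows that per-channel step in plain Python on a flat list of channel values. It is an illustration of the general technique, not the code in `normalize.py`, which may add steps such as tissue masking; a real implementation would first convert RGB to LAB (for example with `skimage.color.rgb2lab`) and operate on arrays.

```python
from statistics import fmean, pstdev


def match_channel(source, target_mean, target_std):
    """Shift and scale one colour channel to the target mean and standard deviation."""
    src_mean = fmean(source)
    src_std = pstdev(source)
    if src_std == 0:
        # A flat channel has no spread to rescale; just move it to the target mean.
        return [target_mean for _ in source]
    scale = target_std / src_std
    return [(value - src_mean) * scale + target_mean for value in source]


if __name__ == "__main__":
    # Toy L-channel values; the target statistics would come from the reference image.
    normalized = match_channel([40.0, 55.0, 70.0], target_mean=60.0, target_std=5.0)
    print(normalized, fmean(normalized), pstdev(normalized))
```

Applying the same function to the L, A, and B channels in turn, then converting back to RGB, gives the normalized image.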
---
## File Structure
```
HNE2Cell/
├── README.md                          # This file
├── HNE2cell_pub_patch73_jit.pt        # TorchScript model
├── normalize.py                       # Step 1: Reinhard normalization
├── patchify.py                        # Step 2: Patch extraction
├── inference.py                       # Step 3: Model inference
├── post_processing.py                 # Cell post-processing module
├── tools.py                           # Utility functions
├── standard-ilc.tif                   # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs  # Example slide (TCGA-LUSC)
```
---
## Citation
If you use HNE2Cell in your research, please cite:
```bibtex
@misc{hne2cell,
title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
year={2026},
url={https://huggingface.co/roobee79/HNE2Cell}
}
```
The example slide is derived from data generated by the TCGA:
<https://portal.gdc.cancer.gov/>.
## License
This repository applies separate license terms to the source code, the model weights, and the example slide:
- **Source code** (`.py` files): Released under the **MIT License**. See `LICENSE`.
- **Model weights** (`HNE2cell_pub_patch73_jit.pt`): Released under
**Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**.
See `MODEL_LICENSE`. Free for academic and non-commercial research use.
For commercial licensing, please contact:
**Ewha University-Industry Collaboration Foundation**
Technology Commercialization Team
Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea
Web: <https://research.ewha.ac.kr> | <https://epm.ewha.ac.kr>
- **Example slide** (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by
the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).