HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification
HNE2Cell detects and classifies 16 cell types from H&E-stained whole-slide images (WSI). It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.
Cell Types (16 classes)
| ID | Cell Type | Color |
|---|---|---|
| 0 | Background | - |
| 1 | Malignant | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |
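For downstream analysis it can be convenient to have the table above as a lookup. The mapping below is transcribed directly from the table; the names `CELL_TYPES` and `cell_type_name` are illustrative and are not part of the repository scripts. Note that the name strings written to the centroid CSV may be spelled differently (the CSV example in this README shows `CD4T`), so keying on the integer ID is the safer choice.

```python
# ID -> name mapping transcribed from the cell type table.
# CELL_TYPES and cell_type_name are illustrative names, not repository code.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def cell_type_name(class_id: int) -> str:
    """Return the class name for an integer ID, or raise for unknown IDs."""
    try:
        return CELL_TYPES[class_id]
    except KeyError:
        raise ValueError(f"Unknown class ID: {class_id}") from None
```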
Pipeline Overview
The full pipeline consists of three steps:
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ 1. Normalize  │ ──▶ │ 2. Patchify    │ ──▶ │ 3. Inference     │
│ (Reinhard)    │     │ (256px, 64ov)  │     │ (Cell Detection) │
└───────────────┘     └────────────────┘     └──────────────────┘
    SVS / TIF             PNG patches          Masks + Centroids
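To make the "256px, 64ov" setting concrete, here is a minimal sketch of how overlapping patch origins could be laid out on an image. It assumes the stride is `patch_size - overlap` (192 px with the defaults) and that the last patch on each axis is shifted back to stay inside the image; both are assumptions, and `patch_origins` is a hypothetical helper, so `patchify.py` may handle edges differently.

```python
def patch_origins(width, height, patch_size=256, overlap=64):
    """Return (x, y) top-left corners of overlapping patches covering an image.

    Assumption: stride = patch_size - overlap, and the final patch on each
    axis is clamped so that it ends exactly at the image border.
    """
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")

    def starts_along(length):
        if length <= patch_size:
            return [0]
        starts = list(range(0, length - patch_size + 1, stride))
        if starts[-1] + patch_size < length:
            starts.append(length - patch_size)  # cover the remaining border
        return starts

    return [(x, y) for y in starts_along(height) for x in starts_along(width)]
```

With the defaults, a 640 px wide strip is covered by patches starting at x = 0, 192, and 384, so neighbouring patches share a 64 px band.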
Magnification
| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| Accuracy | Best: fine-grained cell boundaries | Good: may miss small immune cells |
| Speed | More patches per slide | Fewer patches, faster |
| Use when | Immune cell subtyping matters | Quick screening / large cohorts |
40x is strongly recommended. The model was primarily trained on 40x data. 20x works but expect reduced precision for small cells (lymphocytes, DCs).
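The speed difference between the two magnifications follows from simple arithmetic: halving the magnification halves each image dimension, so the patch grid shrinks by roughly a factor of four. The sketch below estimates grid sizes under the assumption of a regular grid with stride `patch_size - overlap`. It counts every grid position, so any tissue filtering the pipeline applies would reduce the real numbers. The function names are illustrative.

```python
import math


def grid_patch_count(width, height, patch_size=256, overlap=64):
    """Number of grid patches needed to cover a width x height image."""
    stride = patch_size - overlap

    def along(length):
        if length <= patch_size:
            return 1
        # ceil((length - patch_size) / stride) + 1, simplified
        return math.ceil((length - overlap) / stride)

    return along(width) * along(height)


def compare_magnifications(width_40x, height_40x):
    """Return (patches at 40x, patches at 20x) for a slide sized at 40x."""
    at_40x = grid_patch_count(width_40x, height_40x)
    at_20x = grid_patch_count(width_40x // 2, height_40x // 2)
    return at_40x, at_20x
```

For a hypothetical 100,000 × 100,000 px slide at 40x, this gives 271,441 grid positions at 40x versus 68,121 at 20x.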
System Requirements
Software dependencies (tested versions)
Core packages (as reported in the manuscript):
- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7
Additional utilities required by the pipeline scripts:
- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for .svs files)
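If you install packages manually instead of using the provided environment file, a quick check against the pinned versions can catch mismatches early. The following is a sketch using the standard-library `importlib.metadata`. The distribution names are assumptions (in particular, PyTorch is published on PyPI as `torch`), and `find_mismatches` is a hypothetical helper rather than a script shipped with the repository.

```python
from importlib import metadata

# Pins copied from the core package list above.
# Assumption: distribution names as used by pip (PyTorch -> "torch").
PINNED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def find_mismatches(pins=PINNED, get_version=metadata.version):
    """Return {name: (wanted, found)} for missing or differing packages."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "2.5.1+cu121" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```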
Operating systems tested
- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS
(Not tested on Windows/macOS.)
Hardware requirements
Note: WSI processing is memory-intensive. This pipeline is designed for server- or workstation-class hardware, not standard desktops.
Minimum (small WSIs, ~1–2 GB):
- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free
Recommended (typical WSIs, 2–10 GB):
- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (the intermediate Aligned-hne.tif can be 20–50 GB per slide)
Tested configurations:
- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04
CPU-only inference is not supported in practice: full WSI inference would take days even on a high-core-count CPU.
Installation Guide
Recommended: Conda environment from cellvit_rv3.yml
The repository includes a frozen conda environment file with all dependencies pinned to the exact versions used in the manuscript.
# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml
# 2. Create environment
conda env create -f cellvit_rv3.yml
# 3. Activate
conda activate cellvit_rv3
Typical install time: ~10–15 minutes on a Linux server with a stable network connection (dominated by the PyTorch + CUDA toolkit download).
Download the model
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt",
)
Demo: Reproducible Walkthrough
To verify your installation, run the pipeline on the example slide included in this
repository (TCGA-56-8628-01Z-00-DX1, LUSC, ~36 MB).
Download the model, example slide, and reference image
from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="HNE2cell_pub_patch73_jit.pt",
)
slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs",
)
ref_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="standard-ilc.tif",
)
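By default `hf_hub_download` returns a path inside the local Hugging Face cache, while the demo commands refer to `./standard-ilc.tif`, `./HNE2cell_pub_patch73_jit.pt`, and `example/slides/`. One way to bridge the two is to copy the downloaded files into the working directory. The helper below is a hypothetical sketch (`stage_demo_files` is not part of the repository) that takes the three paths returned above.

```python
import shutil
from pathlib import Path


def stage_demo_files(model_path, slide_path, ref_path, workdir="."):
    """Copy cached downloads to the locations used by the demo commands."""
    workdir = Path(workdir)
    slides_dir = workdir / "example" / "slides"
    slides_dir.mkdir(parents=True, exist_ok=True)

    # (source, destination) pairs; file names are kept unchanged.
    plan = {
        "model": (model_path, workdir / Path(model_path).name),
        "reference": (ref_path, workdir / Path(ref_path).name),
        "slide": (slide_path, slides_dir / Path(slide_path).name),
    }
    staged = {}
    for key, (src, dst) in plan.items():
        shutil.copy(src, dst)  # follows cache symlinks and copies the content
        staged[key] = dst
    return staged
```

Calling `stage_demo_files(model_path, slide_path, ref_path)` from the directory that contains the pipeline scripts would place the files where the commands in the next step expect them.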
Run the pipeline
# Place the downloaded slide in a working directory
# (replace <slide_path> with the path returned by hf_hub_download above)
mkdir -p example/slides
cp <slide_path> example/slides/
# Step 1: Normalize
python normalize.py \
--input_dir ./example/slides \
--target ./standard-ilc.tif
# Step 2: Patchify at 40x
python patchify.py \
--input_dir ./example/slides \
--magnification 40 \
--patch_size 256 \
--overlap 64 \
--workers 8
# Step 3: Inference
python inference.py \
--input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
--output_dir ./example/results \
--model_path ./HNE2cell_pub_patch73_jit.pt \
--magnification 40 \
--batch_size 32
Expected output
example/results/
├── Aligned-hne.tif          # Normalized full-resolution H&E
├── Aligned-hne.jpg          # 4× preview
├── patch_*_mask.png         # Per-patch cell type masks
└── patch_*_centroid.csv     # Cell centroids with type labels
Expected results on the example slide (TCGA-56-8628-01Z-00-DX1):
Approximately 63,000 cells are detected across the 16 classes.
Small variation (± a few percent) is expected between hardware configurations.
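To compare your run against the expected total, you can tally the rows in the per-patch centroid files. The sketch below assumes every `*_centroid.csv` in the results directory has the header row shown in the "Centroid CSV format" section. `count_cells` is a hypothetical helper, and it does not attempt to remove cells that may be reported twice in the overlap between neighbouring patches.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir):
    """Count cells per type across all *_centroid.csv files in a directory."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["celltype_name"]] += 1
    return counts


if __name__ == "__main__":
    totals = count_cells("./example/results")
    for name, n in totals.most_common():
        print(f"{name}: {n}")
    print("total:", sum(totals.values()))
```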
Expected runtime
| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |
A system without sufficient RAM (<32 GB) will fail at the normalization step due to full-resolution image loading.
The example slide is from TCGA-LUSC and is redistributed under the NIH Genomic Data Sharing Policy.
Instructions for Use (On Your Own Data)
# Step 1: Color normalization (Reinhard method)
python normalize.py \
--input_dir /path/to/slides \
--target /path/to/standard-ilc.tif
# Step 2: Extract patches (40x recommended)
python patchify.py \
--input_dir /path/to/slides \
--magnification 40 \
--patch_size 256 \
--overlap 64 \
--workers 8
# Step 3: Cell detection & classification
python inference.py \
--input_dir /path/to/patch_folders \
--output_dir /path/to/results \
--model_path ./HNE2cell_pub_patch73_jit.pt \
--magnification 40 \
--batch_size 32
Input / Output Details
Input
| Step | Input | Format |
|---|---|---|
| Normalize | Raw WSI | .svs, .tif, .tiff, .ndpi |
| Patchify | Normalized image | Aligned-hne.tif (from Step 1) |
| Inference | Patches | 256×256 px PNG files |
Output
| File | Description |
|---|---|
| Aligned-hne.tif | Full-resolution normalized H&E image |
| Aligned-hne.jpg | 4× downsampled preview |
| recon.tif | Tissue-only reconstruction (intermediate) |
| *_mask.png | Per-patch cell segmentation mask (colored by type) |
| *_centroid.csv | Cell centroids with columns: slide_id, x, y, celltype, celltype_name |
Centroid CSV format
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
To convert patch-local coordinates to WSI-global coordinates:
# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
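The conversion above can be written as a small function. This sketch assumes the patch name ends in `_{x_offset}_{y_offset}` (optionally followed by `.png`), as the comment describes, and that the offsets are in pixels of the same image the patch was cut from. `patch_offsets` and `to_global` are illustrative names.

```python
import re

# Matches "{prefix}_{x_offset}_{y_offset}" with an optional ".png" suffix.
_PATCH_NAME = re.compile(r"^(?P<prefix>.+)_(?P<x>\d+)_(?P<y>\d+)(?:\.png)?$")


def patch_offsets(patch_name):
    """Extract (x_offset, y_offset) from a patch name such as 'patch_192_384'."""
    match = _PATCH_NAME.match(patch_name)
    if match is None:
        raise ValueError(f"Unexpected patch name: {patch_name!r}")
    return int(match["x"]), int(match["y"])


def to_global(patch_name, x, y):
    """Convert patch-local (x, y) to WSI-global coordinates."""
    x_offset, y_offset = patch_offsets(patch_name)
    return x + x_offset, y + y_offset
```

For example, a centroid at (10.5, 20.0) in `patch_192_384` maps to (202.5, 404.0) in global coordinates.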
Normalization Reference Image
The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
(standard-ilc.tif) that represents your target stain appearance. The reference image is
included in this repository, or you can supply your own.
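As a rough illustration of the idea, Reinhard normalization shifts and scales each LAB channel of the source image so that its mean and standard deviation match those of the reference. The sketch below shows only that statistics-matching step on plain lists of (L, a, b) pixel tuples. The RGB-to-LAB conversion (for example with scikit-image's `rgb2lab` and `lab2rgb`) is omitted, the function names are hypothetical, and `normalize.py` may differ in details such as tissue masking or array handling.

```python
from statistics import fmean, pstdev


def match_channel(values, target_mean, target_std, eps=1e-6):
    """Shift and scale one channel so its mean/std match the target's."""
    mean, std = fmean(values), pstdev(values)
    scale = target_std / (std + eps)  # eps avoids division by zero
    return [(v - mean) * scale + target_mean for v in values]


def reinhard_transfer(source_lab, target_lab):
    """Match per-channel LAB statistics of source pixels to target pixels.

    Both arguments are lists of (L, a, b) tuples; the result has the same
    length and order as source_lab.
    """
    matched_channels = []
    for src, tgt in zip(zip(*source_lab), zip(*target_lab)):
        matched_channels.append(match_channel(src, fmean(tgt), pstdev(tgt)))
    return list(zip(*matched_channels))
```

A real implementation would operate on image arrays rather than Python lists, but the per-channel formula is the same.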
File Structure
HNE2Cell/
├── README.md                      # This file
├── HNE2cell_pub_patch73_jit.pt    # TorchScript model
├── normalize.py                   # Step 1: Reinhard normalization
├── patchify.py                    # Step 2: Patch extraction
├── inference.py                   # Step 3: Model inference
├── post_processing.py             # Cell post-processing module
├── tools.py                       # Utility functions
├── standard-ilc.tif               # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs   # Example slide (TCGA-LUSC)
Citation
If you use HNE2Cell in your research, please cite:
@misc{hne2cell,
title={Spatial transcriptomics-supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
year={2026},
url={https://huggingface.co/roobee79/HNE2Cell}
}
The example slide is derived from data generated by the TCGA: https://portal.gdc.cancer.gov/.
License
This repository uses a dual licensing scheme commonly adopted in academic ML/biomedical projects (e.g., SAM, LLaMA):
- Source code (.py files): Released under the MIT License. See LICENSE.
- Model weights (HNE2cell_pub_patch73_jit.pt): Released under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See MODEL_LICENSE. Free for academic and non-commercial research use.
  For commercial licensing, please contact:
  Ewha University-Industry Collaboration Foundation
  Technology Commercialization Team
  Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea
  Web: https://research.ewha.ac.kr | https://epm.ewha.ac.kr
- Example slide (TCGA-56-8628-*.svs): Derived from TCGA-LUSC, governed by the NIH Genomic Data Sharing Policy.