HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification

HNE2Cell detects cells in H&E-stained whole-slide images (WSIs) and assigns each one to one of 16 classes (15 cell types plus background). It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.

Cell Types (16 classes)

ID   Cell Type       Color
0    Background      -
1    Malignant       🔴 Red
2    CD4 T           🔵 Dodger Blue
3    CD8 T           🔵 Royal Blue
4    B               🔵 Blue
5    Plasma          🔵 Cornflower Blue
6    Macrophage      🔵 Powder Blue
7    Myeloid         🔵 Steel Blue
8    DC              🔵 Deep Sky Blue
9    Fibroblast      🟢 Forest Green
10   Endothelial     🟢 Medium Sea Green
11   Pericyte        🟢 Lime Green
12   Epithelial      🟠 Dark Orange
13   Immune_Other    ⚪ Light Blue
14   Stromal_Other   🟤 Olive Drab
15   Dead            ⚫ Grey
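
For downstream analysis it can help to have the table above as a lookup. The following is a minimal sketch that copies the IDs and names from the table; the variable and function names are illustrative and not part of the repository. Note that the sample centroid CSV in this README spells one label as "CD4T", so the names written by the pipeline may differ slightly from the table labels.

```python
# Class IDs and names copied from the cell type table in this README.
# CELL_TYPES and class_name are illustrative names, not part of the repository.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def class_name(class_id: int) -> str:
    """Return the table label for a class ID, or raise KeyError if unknown."""
    return CELL_TYPES[int(class_id)]


if __name__ == "__main__":
    for class_id in sorted(CELL_TYPES):
        print(class_id, class_name(class_id))
```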

Pipeline Overview

The full pipeline consists of three steps:

┌────────────────┐      ┌────────────────┐      ┌──────────────────┐
│  1. Normalize  │ ──→  │  2. Patchify   │ ──→  │  3. Inference    │
│   (Reinhard)   │      │ (256px, 64ov)  │      │ (Cell Detection) │
└────────────────┘      └────────────────┘      └──────────────────┘
    SVS / TIF              PNG patches           Masks + Centroids

Magnification

           40x (Recommended)                    20x (Supported)
Accuracy   Best: fine-grained cell boundaries   Good: may miss small immune cells
Speed      More patches per slide               Fewer patches, faster
Use when   Immune cell subtyping matters        Quick screening / large cohorts

40x is strongly recommended. The model was primarily trained on 40x data. 20x works but expect reduced precision for small cells (lymphocytes, DCs).
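
To get a feel for the speed difference, the patch count can be estimated from the slide dimensions. The sketch below assumes a plain sliding window with stride = patch size - overlap (256 - 64 = 192 px) that keeps a final partial window. How patchify.py actually handles edges and tissue filtering is not stated here, so treat the result as an upper bound. The slide dimensions are hypothetical.

```python
def patches_along(length_px: int, patch_size: int = 256, overlap: int = 64) -> int:
    """Window positions along one axis for a sliding window.

    Assumption: stride = patch_size - overlap, and a final partial
    window is kept so the whole axis is covered.
    """
    stride = patch_size - overlap
    if length_px <= patch_size:
        return 1
    # Integer ceiling of (length_px - patch_size) / stride, plus the first window.
    return -(-(length_px - patch_size) // stride) + 1


def patch_count(width_px: int, height_px: int, patch_size: int = 256, overlap: int = 64) -> int:
    """Upper bound on the number of patches for a full image."""
    return (patches_along(width_px, patch_size, overlap)
            * patches_along(height_px, patch_size, overlap))


# Hypothetical slide: 80,000 x 60,000 px at 40x, so 40,000 x 30,000 px at 20x.
n40 = patch_count(80_000, 60_000)
n20 = patch_count(40_000, 30_000)
print(f"40x: {n40} patches, 20x: {n20} patches, ratio {n40 / n20:.2f}")
```

Halving the magnification halves each image dimension, so the patch count drops to roughly a quarter.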


System Requirements

Software dependencies (tested versions)

Core packages (as reported in the manuscript):

  • Python 3.10
  • pytorch == 2.5.1
  • timm == 1.0.8
  • transformers == 4.44.0
  • scanpy == 1.10.3
  • squidpy == 1.5.0
  • spatialdata == 0.2.5
  • scikit-image == 0.24.0
  • scikit-learn == 1.2.2
  • scipy == 1.13.1
  • shapely == 2.0.7

Additional utilities required by the pipeline scripts:

  • torchvision (matching the PyTorch 2.5.1 release)
  • tifffile, Pillow, opencv-python-headless, pandas, tqdm
  • huggingface_hub
  • openslide-python (optional, for .svs files)

Operating systems tested

  • Ubuntu 22.04 LTS
  • Ubuntu 20.04 LTS

(Not tested on Windows/macOS.)

Hardware requirements

Note: WSI processing is memory-intensive. This pipeline is designed for server- or workstation-class hardware, not standard desktops.

Minimum (small WSIs, ~1–2 GB):

  • GPU: NVIDIA GPU with ≥12 GB VRAM
  • RAM: 32 GB (64 GB strongly recommended)
  • Disk: 100 GB free

Recommended (typical WSIs, 2–10 GB):

  • GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
  • RAM: ≥128 GB
  • Disk: 500 GB+ free (intermediate Aligned-hne.tif can be 20–50 GB per slide)

Tested configurations:

  • NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
  • NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04

CPU-only inference is not supported in practice: full WSI inference would take days even on a high-core-count CPU.
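
The RAM and disk figures above follow from the size of an uncompressed full-resolution image. As a rough sketch, assuming 8-bit RGB and hypothetical slide dimensions (the function name is illustrative):

```python
def rgb_image_gib(width_px: int, height_px: int, channels: int = 3, bytes_per_sample: int = 1) -> float:
    """Size in GiB of an uncompressed image held fully in memory."""
    return width_px * height_px * channels * bytes_per_sample / 2**30


# Hypothetical 40x slide of 100,000 x 80,000 px, 8-bit RGB.
print(f"{rgb_image_gib(100_000, 80_000):.1f} GiB")
```

Any intermediate copy made during processing (for example, a floating-point version of the image) would multiply this figure, which is why headroom beyond the raw image size matters.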


Installation Guide

Recommended: Conda environment from cellvit_rv3.yml

The repository includes a frozen conda environment file with all dependencies pinned to the exact versions used in the manuscript.

# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml

# 2. Create environment
conda env create -f cellvit_rv3.yml

# 3. Activate
conda activate cellvit_rv3

Typical install time: ~10–15 minutes on a Linux server with a stable network connection (dominated by the PyTorch + CUDA toolkit download).
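
After activating the environment, a quick version check can confirm that the core pins were installed. This is a minimal sketch using only the standard library; the pins are copied from the dependency list above, "torch" is the distribution name under which PyTorch is normally registered, and the function name is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list in this README.
PINNED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def check_versions(pins: dict) -> dict:
    """Map each package name to (status, installed_version)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        # Drop local build suffixes such as "+cu121" before comparing.
        status = "ok" if found.split("+")[0] == wanted else "mismatch"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for name, (status, found) in check_versions(PINNED).items():
        print(f"{name:15s} {status:9s} {found}")
```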

Download the model

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt"
)

Demo: Reproducible Walkthrough

To verify your installation, run the pipeline on the example slide included in this repository (TCGA-56-8628-01Z-00-DX1, LUSC, ~36 MB).

Download the model, example slide, and reference image

from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="HNE2cell_pub_patch73_jit.pt"
)

slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs"
)

ref_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="standard-ilc.tif"
)

Run the pipeline

# Place the downloaded slide in a working directory
# (replace <slide_path> with the path returned by hf_hub_download above)
mkdir -p example/slides
cp <slide_path> example/slides/

# Step 1: Normalize
python normalize.py \
    --input_dir ./example/slides \
    --target ./standard-ilc.tif

# Step 2: Patchify at 40x
python patchify.py \
    --input_dir ./example/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Inference
python inference.py \
    --input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
    --output_dir ./example/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32

Expected output

example/results/
├── Aligned-hne.tif        # Normalized full-resolution H&E
├── Aligned-hne.jpg        # 4× preview
├── patch_*_mask.png       # Per-patch cell type masks
└── patch_*_centroid.csv   # Cell centroids with type labels

Expected results on the example slide (TCGA-56-8628-01Z-00-DX1): approximately 63,000 cells are detected across the 16 classes. Small variation (± a few percent) is expected between hardware configurations.
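
To compare your run against the expected total, the per-patch centroid files can be tallied. The sketch below assumes each `*_centroid.csv` has the header row and `celltype_name` column shown in the centroid CSV format in this README; the function name is illustrative.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir) -> Counter:
    """Count cells per celltype_name across all *_centroid.csv files in a folder."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                counts[row["celltype_name"]] += 1
    return counts


# Usage (assuming the demo output folder from this walkthrough):
#   counts = count_cells("example/results")
#   print(sum(counts.values()), counts.most_common())
```

If patches overlap, a cell in an overlap region may be reported in more than one patch file. Whether the pipeline already removes such duplicates is not stated here, so a raw tally may differ from the reported total.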

Expected runtime

Hardware                                      Full pipeline runtime
NVIDIA A100 (40 GB) + 256 GB RAM              ~20 min
NVIDIA RTX 3060 (12 GB) + 64 GB RAM           ~30 min

A system with less than 32 GB of RAM will fail at the normalization step, which loads the full-resolution image into memory.

The example slide is from TCGA-LUSC and is redistributed under the NIH Genomic Data Sharing Policy.


Instructions for Use (On Your Own Data)

# Step 1: Color normalization (Reinhard method)
python normalize.py \
    --input_dir /path/to/slides \
    --target /path/to/standard-ilc.tif

# Step 2: Extract patches (40x recommended)
python patchify.py \
    --input_dir /path/to/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Cell detection & classification
python inference.py \
    --input_dir /path/to/patch_folders \
    --output_dir /path/to/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32

Input / Output Details

Input

Step        Input              Format
Normalize   Raw WSI            .svs, .tif, .tiff, .ndpi
Patchify    Normalized image   Aligned-hne.tif (from Step 1)
Inference   Patches            256×256 px PNG files

Output

File              Description
Aligned-hne.tif   Full-resolution normalized H&E image
Aligned-hne.jpg   4× downsampled preview
recon.tif         Tissue-only reconstruction (intermediate)
*_mask.png        Per-patch cell segmentation mask (colored by type)
*_centroid.csv    Cell centroids with columns: slide_id, x, y, celltype, celltype_name

Centroid CSV format

slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...

To convert patch-local coordinates to WSI-global coordinates:

# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
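
The conversion above can be written as a small runnable helper. This sketch assumes the patch name ends in `_{x_offset}_{y_offset}` with integer pixel offsets, as in the filename pattern above, and that the offsets are in the same pixel space as the centroid coordinates. The function names are illustrative.

```python
from pathlib import Path


def parse_offsets(patch_name: str):
    """Split '{prefix}_{x_offset}_{y_offset}[.png]' into (prefix, x_offset, y_offset)."""
    stem = Path(patch_name).stem  # drops a trailing extension such as .png
    prefix, x_off, y_off = stem.rsplit("_", 2)
    return prefix, int(x_off), int(y_off)


def to_global(patch_name: str, x: float, y: float):
    """Convert patch-local (x, y) to WSI-global coordinates."""
    _, x_off, y_off = parse_offsets(patch_name)
    return x + x_off, y + y_off


if __name__ == "__main__":
    # The slide_id column in the sample CSV carries the patch name without extension.
    print(to_global("patch_0_0", 112.3, 87.5))
    print(to_global("patch_192_384.png", 10.5, 20.0))
```

Splitting from the right keeps prefixes that themselves contain underscores intact.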

Normalization Reference Image

The pipeline uses Reinhard color normalization in LAB color space. You need a reference image (standard-ilc.tif) that represents your target stain appearance. The reference image is included in this repository, or you can supply your own.
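
The core of Reinhard normalization is a per-channel statistics match: each LAB channel of the source image is shifted and scaled so that its mean and standard deviation equal those of the reference. The sketch below shows that step for a single channel given as a flat list of values. It is an illustration of the general technique, not the code in normalize.py; the RGB/LAB conversion (for example via scikit-image's `rgb2lab` and `lab2rgb`) and any background masking are left out, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def match_channel(source, target_mean: float, target_std: float):
    """Shift and scale one channel so its mean/std match the target statistics.

    Assumption: `source` is a flat sequence of values for a single LAB channel.
    """
    src_mean = fmean(source)
    src_std = pstdev(source)
    if src_std == 0:
        # A flat channel has no spread to rescale; map every value to the target mean.
        return [target_mean for _ in source]
    scale = target_std / src_std
    return [(v - src_mean) * scale + target_mean for v in source]


if __name__ == "__main__":
    # Toy values standing in for one channel of a source image.
    normalized = match_channel([10.0, 20.0, 30.0, 40.0], target_mean=50.0, target_std=5.0)
    print(normalized, fmean(normalized), pstdev(normalized))
```

In a full implementation the same function would be applied to the L, a, and b channels independently, with the target statistics computed once from the reference image.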


File Structure

HNE2Cell/
├── README.md                                                    # This file
├── HNE2cell_pub_patch73_jit.pt                                  # TorchScript model
├── normalize.py                                                 # Step 1: Reinhard normalization
├── patchify.py                                                  # Step 2: Patch extraction
├── inference.py                                                 # Step 3: Model inference
├── post_processing.py                                           # Cell post-processing module
├── tools.py                                                     # Utility functions
├── standard-ilc.tif                                             # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs   # Example slide (TCGA-LUSC)

Citation

If you use HNE2Cell in your research, please cite:

@misc{hne2cell,
  title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
  year={2026},
  url={https://huggingface.co/roobee79/HNE2Cell}
}

The example slide is derived from data generated by the TCGA: https://portal.gdc.cancer.gov/.

License

This repository uses a dual licensing scheme commonly adopted in academic ML/biomedical projects (e.g., SAM, LLaMA):

  • Source code (.py files): Released under the MIT License. See LICENSE.

  • Model weights (HNE2cell_pub_patch73_jit.pt): Released under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See MODEL_LICENSE. Free for academic and non-commercial research use.

    For commercial licensing, please contact:
    Ewha University-Industry Collaboration Foundation
    Technology Commercialization Team
    Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea
    Web: https://research.ewha.ac.kr | https://epm.ewha.ac.kr

  • Example slide (TCGA-56-8628-*.svs): Derived from TCGA-LUSC, governed by the NIH Genomic Data Sharing Policy.
