---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- cell-classification
- cell-detection
- digital-pathology
- whole-slide-image
- H&E
- computational-pathology
- pytorch
---
# HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification
HNE2Cell detects and classifies **16 cell types** from H&E-stained whole-slide images (WSI).
It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.
## Cell Types (16 classes)
| ID | Cell Type | Color |
|----|-----------|-------|
| 0 | Background | n/a |
| 1 | **Malignant** | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |
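For downstream analysis it can help to have the table above available as a lookup. The sketch below only transcribes the ID and name columns; `CELL_TYPES` and `cell_type_name` are illustrative names and are not part of the repository's scripts.

```python
# Class IDs and names transcribed from the table above.
# CELL_TYPES and cell_type_name are hypothetical helpers, not HNE2Cell APIs.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def cell_type_name(class_id: int) -> str:
    """Return the class name for an ID, or 'Unknown' for IDs outside 0-15."""
    return CELL_TYPES.get(class_id, "Unknown")
```

The `celltype_name` strings written by the pipeline may be spelled slightly differently (the CSV example in this README shows `CD4T`), so joining on the numeric `celltype` column is likely the safer choice.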
## Pipeline Overview
The full pipeline consists of three steps:
```
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ 1. Normalize  │ ──→ │  2. Patchify   │ ──→ │   3. Inference   │
│  (Reinhard)   │     │ (256px, 64ov)  │     │ (Cell Detection) │
└───────────────┘     └────────────────┘     └──────────────────┘
    SVS / TIF            PNG patches          Masks + Centroids
```
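To make the "256px, 64ov" tiling concrete, here is a minimal sketch of how overlapping patch origins could be laid out. It assumes a stride of `patch_size - overlap` (192 px) and a final tile shifted back to end at the image border; `patchify.py` may handle borders and tissue filtering differently, and the function names are illustrative.

```python
def patch_origins(length: int, patch_size: int = 256, overlap: int = 64) -> list:
    """Top-left offsets along one axis for overlapping tiles.

    Assumption: stride = patch_size - overlap, with one extra tile clamped to
    the border when the regular grid leaves a strip uncovered.
    """
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must be in [0, patch_size)")
    if length <= patch_size:
        return [0]
    stride = patch_size - overlap
    origins = list(range(0, length - patch_size + 1, stride))
    if origins[-1] + patch_size < length:
        origins.append(length - patch_size)  # cover the remaining border strip
    return origins


def patch_grid(width: int, height: int, patch_size: int = 256, overlap: int = 64) -> list:
    """All (x, y) patch origins for an image, in row-major order."""
    xs = patch_origins(width, patch_size, overlap)
    ys = patch_origins(height, patch_size, overlap)
    return [(x, y) for y in ys for x in xs]
```

With the defaults, neighbouring patches share a 64 px band, so a cell cut by one patch edge should appear whole in the adjacent patch.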
### Magnification
| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| **Accuracy** | Best: fine-grained cell boundaries | Good: may miss small immune cells |
| **Speed** | More patches per slide | Fewer patches, faster |
| **Use when** | Immune cell subtyping matters | Quick screening / large cohorts |
> **40x is strongly recommended.** The model was primarily trained on 40x data. 20x works but expect reduced precision for small cells (lymphocytes, DCs).
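Before choosing `--magnification`, it can be useful to check what a slide was scanned at. The sketch below reads OpenSlide-style metadata keys (`openslide.objective-power`, `openslide.mpp-x`); the microns-per-pixel thresholds are a common rule of thumb rather than values taken from HNE2Cell, and `guess_magnification` is a hypothetical helper.

```python
from typing import Mapping, Optional


def guess_magnification(props: Mapping[str, str]) -> Optional[int]:
    """Estimate the scan magnification from slide metadata.

    `props` is any mapping of OpenSlide-style property names to strings,
    for example `openslide.OpenSlide(path).properties`.
    """
    power = props.get("openslide.objective-power")
    if power:
        return int(round(float(power)))
    mpp = props.get("openslide.mpp-x")
    if mpp:
        mpp_value = float(mpp)
        if mpp_value < 0.35:
            return 40  # roughly 0.25 um/px (assumed threshold)
        if mpp_value < 0.7:
            return 20  # roughly 0.5 um/px (assumed threshold)
    return None  # metadata missing or outside the expected range
```

A return value of `None` suggests checking the scanner documentation by hand instead of guessing.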
---
## System Requirements
### Software dependencies (tested versions)
Core packages (as reported in the manuscript):
- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7
Additional utilities required by the pipeline scripts:
- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for `.svs` files)
### Operating systems tested
- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS
(Not tested on Windows/macOS.)
### Hardware requirements
> **Note:** WSI processing is memory-intensive. This pipeline is designed for
> server- or workstation-class hardware, not standard desktops.
**Minimum (small WSIs, ~1–2 GB):**
- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free
**Recommended (typical WSIs, 2–10 GB):**
- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (intermediate `Aligned-hne.tif` can be 20–50 GB per slide)
**Tested configurations:**
- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04
CPU-only inference is not supported in practice: full WSI inference would take
days even on a high-core-count CPU.
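The minimum figures above can be checked before starting a long run. This is a rough sketch for Linux that uses only the standard library plus an optional PyTorch import; the `preflight` name and the 10% slack (nominal "32 GB" machines report slightly less in GiB) are assumptions of this example.

```python
import os
import shutil

GIB = 1024 ** 3


def preflight(workdir=".", min_ram_gb=32, min_disk_gb=100, min_vram_gb=12, slack=0.9):
    """Compare this machine with the minimum requirements listed above."""
    ram_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / GIB
    disk_gb = shutil.disk_usage(workdir).free / GIB
    vram_gb = None
    try:
        import torch  # optional, so the check also runs before PyTorch is installed
        if torch.cuda.is_available():
            vram_gb = torch.cuda.get_device_properties(0).total_memory / GIB
    except ImportError:
        pass
    return {
        "ram_gb": round(ram_gb, 1),
        "ram_ok": ram_gb >= min_ram_gb * slack,
        "disk_free_gb": round(disk_gb, 1),
        "disk_ok": disk_gb >= min_disk_gb,
        "vram_gb": None if vram_gb is None else round(vram_gb, 1),
        "gpu_ok": vram_gb is not None and vram_gb >= min_vram_gb * slack,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

Point `workdir` at the disk that will hold the intermediate `Aligned-hne.tif`, since that file dominates disk usage.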
---
## Installation Guide
### Recommended: Conda environment from `cellvit_rv3.yml`
The repository includes a frozen conda environment file with all dependencies pinned
to the exact versions used in the manuscript.
```bash
# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml
# 2. Create environment
conda env create -f cellvit_rv3.yml
# 3. Activate
conda activate cellvit_rv3
```
**Typical install time:** ~10–15 minutes on a Linux server with a stable network connection
(dominated by the PyTorch + CUDA toolkit download).
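After the environment is created, the installed versions can be compared with the pinned versions from the dependency list. This is a minimal sketch; it assumes the usual PyPI distribution names (for example `torch` for PyTorch), and `check_versions` is an illustrative helper rather than a script shipped with the repository.

```python
from importlib import metadata

# Versions copied from the "Software dependencies" list in this README.
PINNED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def check_versions(pinned=PINNED):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build suffixes such as "+cu121" are ignored in the comparison.
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (installed, expected, matches)
    return report


if __name__ == "__main__":
    for name, (installed, expected, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:15s} installed={installed} expected={expected} {status}")
```

A mismatch does not necessarily break the pipeline, but it is the first thing to rule out if results differ from the demo.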
### Download the model
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt",
)
```
---
## Demo: Reproducible Walkthrough
To verify your installation, run the pipeline on the example slide included in this
repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).
### Download the model, example slide, and reference image
```python
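from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

# Files land in the local Hugging Face cache; each call returns the cached path.
model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="HNE2cell_pub_patch73_jit.pt",
)
slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs",
)
ref_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="standard-ilc.tif",
)

# Print the paths so they can be copied or passed to the pipeline scripts.
print(model_path, slide_path, ref_path, sep="\n")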
```
### Run the pipeline
```bash
# Place the downloaded slide in a working directory
mkdir -p example/slides
cp <slide_path> example/slides/

# Step 1: Normalize
python normalize.py \
    --input_dir ./example/slides \
    --target ./standard-ilc.tif

# Step 2: Patchify at 40x
python patchify.py \
    --input_dir ./example/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Inference
python inference.py \
    --input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
    --output_dir ./example/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```
### Expected output
```
example/results/
├── Aligned-hne.tif         # Normalized full-resolution H&E
├── Aligned-hne.jpg         # 4× preview
├── patch_*_mask.png        # Per-patch cell type masks
└── patch_*_centroid.csv    # Cell centroids with type labels
```
**Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):**
Approximately **63,000 cells** are detected across the 16 classes.
Small variation (within a few percent) is expected between hardware configurations.
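One way to compare a run against the expected total is to tally the rows of every `*_centroid.csv` in the results folder. The sketch below assumes the CSV columns documented in this README (`celltype_name` in particular); `count_cells` is an illustrative helper.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir: str) -> Counter:
    """Tally rows per celltype_name over every *_centroid.csv in results_dir."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                counts[row["celltype_name"]] += 1
    return counts


if __name__ == "__main__":
    counts = count_cells("./example/results")
    for name, n in counts.most_common():
        print(f"{name:15s} {n}")
    print("total", sum(counts.values()))
```

If the inference step does not already de-duplicate cells in the 64 px overlap between neighbouring patches, a plain row count may overstate the total.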
### Expected runtime
| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |
A system without sufficient RAM (<32 GB) will fail at the normalization step
due to full-resolution image loading.
> The example slide is from **TCGA-LUSC** and is redistributed under the
> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).
---
## Instructions for Use (On Your Own Data)
```bash
# Step 1: Color normalization (Reinhard method)
python normalize.py \
    --input_dir /path/to/slides \
    --target /path/to/standard-ilc.tif

# Step 2: Extract patches (40x recommended)
python patchify.py \
    --input_dir /path/to/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Cell detection & classification
python inference.py \
    --input_dir /path/to/patch_folders \
    --output_dir /path/to/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```
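For cohorts with many slide folders, the three commands above can be generated from Python instead of being retyped. This sketch only builds argument lists from the flags shown in this README and does not execute anything; `build_commands` and its parameter names are illustrative.

```python
import shlex


def build_commands(slides_dir, reference, patch_dir, results_dir, model_path,
                   magnification=40, patch_size=256, overlap=64,
                   workers=8, batch_size=32):
    """Return the three pipeline invocations as argv lists (nothing is run)."""
    return [
        ["python", "normalize.py",
         "--input_dir", slides_dir,
         "--target", reference],
        ["python", "patchify.py",
         "--input_dir", slides_dir,
         "--magnification", str(magnification),
         "--patch_size", str(patch_size),
         "--overlap", str(overlap),
         "--workers", str(workers)],
        ["python", "inference.py",
         "--input_dir", patch_dir,
         "--output_dir", results_dir,
         "--model_path", model_path,
         "--magnification", str(magnification),
         "--batch_size", str(batch_size)],
    ]


if __name__ == "__main__":
    commands = build_commands(
        "/path/to/slides", "/path/to/standard-ilc.tif",
        "/path/to/patch_folders", "/path/to/results",
        "./HNE2cell_pub_patch73_jit.pt",
    )
    for command in commands:
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` so that a failure in one step stops the run before the next step starts.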
---
## Input / Output Details
### Input
| Step | Input | Format |
|------|-------|--------|
| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
| Inference | Patches | `256×256 px` PNG files |
### Output
| File | Description |
|------|-------------|
| `Aligned-hne.tif` | Full-resolution normalized H&E image |
| `Aligned-hne.jpg` | 4× downsampled preview |
| `recon.tif` | Tissue-only reconstruction (intermediate) |
| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |
### Centroid CSV format
```csv
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
```
To convert patch-local coordinates to WSI-global coordinates:
```python
# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
```
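The conversion above can be sketched as a small helper that parses the offsets from a patch name. It assumes the `{prefix}_{x_offset}_{y_offset}` pattern described above, with offsets in pixels of the same image the patch coordinates refer to; `patch_offsets` and `to_global` are illustrative names.

```python
import re

# Matches the trailing "_{x_offset}_{y_offset}" with an optional ".png" suffix.
_OFFSET_RE = re.compile(r"_(\d+)_(\d+)(?:\.png)?$")


def patch_offsets(patch_id: str) -> tuple:
    """Extract (x_offset, y_offset) from '{prefix}_{x_offset}_{y_offset}[.png]'."""
    match = _OFFSET_RE.search(patch_id)
    if match is None:
        raise ValueError(f"cannot parse patch offsets from {patch_id!r}")
    return int(match.group(1)), int(match.group(2))


def to_global(patch_id: str, x: float, y: float) -> tuple:
    """Shift patch-local (x, y) by the patch origin to get WSI-global coordinates."""
    x_offset, y_offset = patch_offsets(patch_id)
    return x + x_offset, y + y_offset
```

For example, `to_global(row["slide_id"], float(row["x"]), float(row["y"]))` could be applied to each row read from a centroid CSV.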
---
## Normalization Reference Image
The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
(`standard-ilc.tif`) that represents your target stain appearance. The reference image is
included in this repository, or you can supply your own.
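As a rough illustration of the Reinhard method, the sketch below matches the per-channel mean and standard deviation of a source image to those of a reference, with both already converted to LAB (for example with `skimage.color.rgb2lab`). It is not the code in `normalize.py`, which may mask background or process the slide in tiles; `reinhard_transfer` is a hypothetical name.

```python
import numpy as np


def reinhard_transfer(source_lab: np.ndarray, target_lab: np.ndarray) -> np.ndarray:
    """Match per-channel mean and std of source_lab to those of target_lab.

    Both inputs are arrays of shape (H, W, 3) that are already in LAB space.
    """
    src = source_lab.astype(np.float64)
    tgt = target_lab.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    # Guard against division by zero on perfectly flat channels.
    scale = tgt_std / np.maximum(src_std, 1e-8)
    return (src - src_mean) * scale + tgt_mean
```

The result would then be converted back to RGB (for example with `skimage.color.lab2rgb`) and clipped to the valid range before saving.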
---
## File Structure
```
HNE2Cell/
├── README.md                      # This file
├── HNE2cell_pub_patch73_jit.pt    # TorchScript model
├── normalize.py                   # Step 1: Reinhard normalization
├── patchify.py                    # Step 2: Patch extraction
├── inference.py                   # Step 3: Model inference
├── post_processing.py             # Cell post-processing module
├── tools.py                       # Utility functions
├── standard-ilc.tif               # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs   # Example slide (TCGA-LUSC)
```
---
## Citation
If you use HNE2Cell in your research, please cite:
```bibtex
@misc{hne2cell,
  title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
  year={2026},
  url={https://huggingface.co/roobee79/HNE2Cell}
}
```
The example slide is derived from data generated by the TCGA:
<https://portal.gdc.cancer.gov/>.
## License
This repository applies separate licenses to the source code, the model weights, and the example slide:
- **Source code** (`.py` files): Released under the **MIT License**. See `LICENSE`.
- **Model weights** (`HNE2cell_pub_patch73_jit.pt`): Released under
  **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**.
  See `MODEL_LICENSE`. Free for academic and non-commercial research use.
  For commercial licensing, please contact:
  **Ewha University-Industry Collaboration Foundation**,
  Technology Commercialization Team,
  Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea.
  Web: <https://research.ewha.ac.kr> | <https://epm.ewha.ac.kr>
- **Example slide** (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by
the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).