	---
	license: apache-2.0
	pipeline_tag: image-segmentation
	tags:
	- cell-classification
	- cell-detection
	- digital-pathology
	- whole-slide-image
	- H&E
	- computational-pathology
	- pytorch
	---


	# HNE2Cell — H&E Whole-Slide Image Cell Detection & Classification

	HNE2Cell detects and classifies cells in H&E-stained whole-slide images (WSI) into 16 classes (15 cell types plus background).
	It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.

	## Cell Types (16 classes)

	| ID | Cell Type | Color |
	|----|-----------|-------|
	| 0 | Background | n/a |
	| 1 | Malignant | 🔴 Red |
	| 2 | CD4 T | 🔵 Dodger Blue |
	| 3 | CD8 T | 🔵 Royal Blue |
	| 4 | B | 🔵 Blue |
	| 5 | Plasma | 🔵 Cornflower Blue |
	| 6 | Macrophage | 🔵 Powder Blue |
	| 7 | Myeloid | 🔵 Steel Blue |
	| 8 | DC | 🔵 Deep Sky Blue |
	| 9 | Fibroblast | 🟢 Forest Green |
	| 10 | Endothelial | 🟢 Medium Sea Green |
	| 11 | Pericyte | 🟢 Lime Green |
	| 12 | Epithelial | 🟠 Dark Orange |
	| 13 | Immune_Other | ⚪ Light Blue |
	| 14 | Stromal_Other | 🟤 Olive Drab |
	| 15 | Dead | ⚫ Grey |
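For scripts that post-process the outputs, the table can be kept as a Python lookup. The sketch below transcribes the IDs and names above; the RGB tuples are the standard CSS values for the listed color names and are an assumption, since this README does not state the exact colors written into the mask PNGs.

```python
# Class IDs and names transcribed from the table above.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}

# Assumption: standard CSS RGB values for the color names in the table.
# The mask PNGs may use different values; verify before relying on them.
CSS_COLORS = {
    1: (255, 0, 0),       # red
    2: (30, 144, 255),    # dodgerblue
    3: (65, 105, 225),    # royalblue
    4: (0, 0, 255),       # blue
    5: (100, 149, 237),   # cornflowerblue
    6: (176, 224, 230),   # powderblue
    7: (70, 130, 180),    # steelblue
    8: (0, 191, 255),     # deepskyblue
    9: (34, 139, 34),     # forestgreen
    10: (60, 179, 113),   # mediumseagreen
    11: (50, 205, 50),    # limegreen
    12: (255, 140, 0),    # darkorange
    13: (173, 216, 230),  # lightblue
    14: (107, 142, 35),   # olivedrab
    15: (128, 128, 128),  # grey
}

# Cell classes only (Background excluded), e.g. for legends or summaries.
CELL_CLASS_IDS = [i for i in CELL_TYPES if i != 0]
```

For example, `CELL_TYPES[9]` gives `"Fibroblast"`, and `CELL_CLASS_IDS` skips Background when building a legend or per-class summary.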

	## Pipeline Overview

	The full pipeline consists of three steps:

	```
	┌───────────────┐     ┌───────────────────────┐     ┌──────────────────┐
	│ 1. Normalize  │ ──→ │ 2. Patchify           │ ──→ │ 3. Inference     │
	│ (Reinhard)    │     │ (256px, 64px overlap) │     │ (Cell Detection) │
	└───────────────┘     └───────────────────────┘     └──────────────────┘
	    SVS / TIF                PNG patches             Masks + Centroids
	```
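Step 2 tiles the slide into 256 px patches with a 64 px overlap. Assuming "overlap" means that adjacent patches share 64 px (so the stride is 256 - 64 = 192 px), the top-left offsets of the patch grid can be sketched as below. `patch_offsets` is a hypothetical helper, not part of `patchify.py`; the real script may handle the right and bottom edges (and tissue filtering) differently.

```python
def patch_offsets(width, height, patch_size=256, overlap=64):
    """Return (x, y) top-left offsets of a sliding-window patch grid.

    Assumes stride = patch_size - overlap and adds one extra column/row
    flush with the right/bottom edge so the whole image is covered.
    An image smaller than one patch gets a single window at (0, 0).
    """
    if overlap >= patch_size:
        raise ValueError("overlap must be smaller than patch_size")
    stride = patch_size - overlap

    def starts(length):
        if length <= patch_size:
            return [0]
        positions = list(range(0, length - patch_size + 1, stride))
        # Add a final edge-aligned window if the stride stopped short of the edge.
        if positions[-1] != length - patch_size:
            positions.append(length - patch_size)
        return positions

    return [(x, y) for y in starts(height) for x in starts(width)]
```

For a 1000 × 500 px region this yields 5 columns × 3 rows = 15 patches. The extra edge-aligned column and row overlap their neighbors by more than 64 px, which is one common way to avoid leaving an uncovered strip.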

	### Magnification

	| | 40x (Recommended) | 20x (Supported) |
	|---|---|---|
	| Accuracy | Best: fine-grained cell boundaries | Good, but may miss small immune cells |
	| Speed | Slower (more patches per slide) | Faster (roughly 4× fewer patches) |
	| Use when | Immune cell subtyping matters | Quick screening / large cohorts |

	> 40x is strongly recommended: the model was primarily trained on 40x data. 20x works, but expect reduced precision for small cells (lymphocytes, DCs).
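Before choosing `--magnification`, it is worth checking what the scanner recorded. A minimal sketch, assuming the slide is readable by openslide-python: the `openslide.objective-power` property holds the objective power when the vendor records it; otherwise the microns-per-pixel value (`openslide.mpp-x`) can be used with the common approximation of about 0.25 µm/px at 40x and 0.5 µm/px at 20x. `estimate_magnification` is a hypothetical helper.

```python
def estimate_magnification(properties):
    """Estimate scan magnification from an OpenSlide-style properties mapping.

    Uses the recorded objective power if present; otherwise falls back to the
    approximation magnification ~= 10 / mpp (0.25 um/px -> 40x, 0.5 um/px -> 20x).
    Returns None if neither property is available.
    """
    power = properties.get("openslide.objective-power")
    if power:
        return float(power)
    mpp = properties.get("openslide.mpp-x")
    if mpp:
        return float(round(10.0 / float(mpp)))
    return None
```

Usage, assuming openslide-python is installed: `estimate_magnification(openslide.OpenSlide(path).properties)`. A result near 40 suggests `--magnification 40`; a result near 20 suggests `--magnification 20`.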

	---

	## System Requirements

	### Software dependencies (tested versions)

	Core packages (as reported in the manuscript):

	- Python 3.10
	- pytorch == 2.5.1
	- timm == 1.0.8
	- transformers == 4.44.0
	- scanpy == 1.10.3
	- squidpy == 1.5.0
	- spatialdata == 0.2.5
	- scikit-image == 0.24.0
	- scikit-learn == 1.2.2
	- scipy == 1.13.1
	- shapely == 2.0.7

	Additional utilities required by the pipeline scripts:

	- torchvision (matching the PyTorch 2.5.1 release)
	- tifffile, Pillow, opencv-python-headless, pandas, tqdm
	- huggingface_hub
	- openslide-python (optional, for `.svs` files)
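To confirm that an existing environment matches the tested versions, the installed distributions can be compared against the pins above with the standard library's `importlib.metadata`. Note that PyTorch is distributed on PyPI as `torch`. This is a convenience sketch; `check_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the list above (PyPI distribution names).
TESTED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def check_versions(pins=TESTED):
    """Return {package: (installed or None, pinned)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "2.5.1+cu121" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != pinned:
            mismatches[name] = (installed, pinned)
    return mismatches
```

`check_versions()` returns an empty dict when every pinned package is installed at the tested version.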

	### Operating systems tested

	- Ubuntu 22.04 LTS
	- Ubuntu 20.04 LTS

	(Not tested on Windows/macOS.)

	### Hardware requirements

	> Note: WSI processing is memory-intensive. This pipeline is designed for
	> server- or workstation-class hardware, not standard desktops.

	Minimum (small WSIs, ~1–2 GB):
	- GPU: NVIDIA GPU with ≥12 GB VRAM
	- RAM: 32 GB (64 GB strongly recommended)
	- Disk: 100 GB free

	Recommended (typical WSIs, 2–10 GB):
	- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
	- RAM: ≥128 GB
	- Disk: 500 GB+ free (intermediate `Aligned-hne.tif` can be 20–50 GB per slide)

	Tested configurations:
	- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
	- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04

	CPU-only inference is not practical: full-WSI inference would take
	days even on a high-core-count CPU.
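A quick pre-flight check against the minimum configuration above (≥12 GB VRAM, 32 GB RAM) can save a failed run. The sketch below reads total RAM via `os.sysconf` (POSIX, which covers the tested Ubuntu systems) and GPU memory via `torch.cuda`; the thresholds come from the list above, while `preflight` and the 0.9 tolerance factor are hypothetical choices.

```python
import os

MIN_RAM_GB = 32    # from the "Minimum" configuration above
MIN_VRAM_GB = 12
SLACK = 0.9        # tolerate memory reserved by the OS / GPU driver
GIB = 1024 ** 3


def total_ram_gb():
    """Total physical memory in GiB (POSIX sysconf; works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def gpu_vram_gb(device=0):
    """Total memory of a CUDA device in GiB, or None if no CUDA GPU is usable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / GIB


def preflight(ram_gb, vram_gb):
    """Return a list of problems; an empty list means the minimum is met."""
    problems = []
    if ram_gb < SLACK * MIN_RAM_GB:
        problems.append(f"RAM {ram_gb:.0f} GB is below the {MIN_RAM_GB} GB minimum")
    if vram_gb is None:
        problems.append("no CUDA GPU detected; CPU-only inference is impractical")
    elif vram_gb < SLACK * MIN_VRAM_GB:
        problems.append(f"GPU memory {vram_gb:.0f} GB is below the {MIN_VRAM_GB} GB minimum")
    return problems
```

Usage on the target machine: `print(preflight(total_ram_gb(), gpu_vram_gb()))`. The 0.9 factor keeps a 32 GB server or a 12 GB card from being flagged just because the OS or driver reports slightly less than the nominal size.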

	---

	## Installation Guide

	### Recommended: Conda environment from `cellvit_rv3.yml`

	The repository includes a frozen conda environment file with all dependencies pinned
	to the exact versions used in the manuscript.

	```bash
	# 1. Download environment file
	wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml

	# 2. Create environment
	conda env create -f cellvit_rv3.yml

	# 3. Activate
	conda activate cellvit_rv3
	```

	Typical install time: ~10–15 minutes on a Linux server with a stable network connection
	(dominated by the PyTorch + CUDA toolkit download).


	### Download the model

	```python
	from huggingface_hub import hf_hub_download

	model_path = hf_hub_download(
	    repo_id="roobee79/HNE2Cell",
	    filename="HNE2cell_pub_patch73_jit.pt",
	)
	```
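The checkpoint is a TorchScript archive (see the file list), so it can be loaded with `torch.jit.load` without the original model code. The sketch below only loads the model and switches it to eval mode; the expected input tensor layout and the structure of the outputs are not documented in this README, so `inference.py` remains the reference for actually running it. `load_hne2cell` is a hypothetical helper.

```python
def load_hne2cell(model_path, device=None):
    """Load the TorchScript checkpoint onto `device` (GPU if available) in eval mode."""
    import torch  # imported here so the helper can be defined without torch installed

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.jit.load(model_path, map_location=device)
    model.eval()
    return model, device
```

Usage: `model, device = load_hne2cell(model_path)`.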

	---

	## Demo: Reproducible Walkthrough

	To verify your installation, run the pipeline on the example slide included in this
	repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).

	### Download the model, example slide, and reference image

	```python
	from huggingface_hub import hf_hub_download

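	REPO_ID = "roobee79/HNE2Cell"

	# local_dir="." saves each file in the current directory, where the
	# commands below expect the model and the reference image.
	model_path = hf_hub_download(
	    repo_id=REPO_ID,
	    filename="HNE2cell_pub_patch73_jit.pt",
	    local_dir=".",
	)

	slide_path = hf_hub_download(
	    repo_id=REPO_ID,
	    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs",
	    local_dir=".",
	)

	ref_path = hf_hub_download(
	    repo_id=REPO_ID,
	    filename="standard-ilc.tif",
	    local_dir=".",
	)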
	```
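The demo commands call `normalize.py`, `patchify.py`, and `inference.py`, which live in the same repository. One way to fetch them (together with `post_processing.py` and `tools.py`) is `huggingface_hub.snapshot_download` with an `allow_patterns` filter; this is a suggestion rather than the documented procedure, and cloning the repository with git works as well. `fetch_pipeline_scripts` is a hypothetical helper.

```python
def fetch_pipeline_scripts(repo_id="roobee79/HNE2Cell", dest="."):
    """Download only the pipeline's Python files from the Hub into `dest`."""
    from huggingface_hub import snapshot_download

    # allow_patterns restricts the download to the .py files, skipping the
    # model weights and the example slide.
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=["*.py"],
        local_dir=dest,
    )
```

Calling `fetch_pipeline_scripts()` places the `.py` files in the current directory, where the commands below invoke them.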

	### Run the pipeline

	```bash
	# Copy the downloaded slide into a working directory
	# (<slide_path> is the local path returned by hf_hub_download above)
	mkdir -p example/slides
	cp <slide_path> example/slides/

	# Step 1: Normalize
	python normalize.py \
	--input_dir ./example/slides \
	--target ./standard-ilc.tif

	# Step 2: Patchify at 40x
	python patchify.py \
	--input_dir ./example/slides \
	--magnification 40 \
	--patch_size 256 \
	--overlap 64 \
	--workers 8

	# Step 3: Inference ("..." in --input_dir abbreviates the full per-slide folder name)
	python inference.py \
	--input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
	--output_dir ./example/results \
	--model_path ./HNE2cell_pub_patch73_jit.pt \
	--magnification 40 \
	--batch_size 32
	```

	### Expected output

	```
	example/results/
	├── Aligned-hne.tif # Normalized full-resolution H&E
	├── Aligned-hne.jpg # 4× preview
	├── patch_*_mask.png # Per-patch cell type masks
	└── patch_*_centroid.csv # Cell centroids with type labels
	```

	Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):
	approximately 63,000 cells are detected in total.
	Small variation (within a few percent) is expected between hardware configurations.
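To compare a run against this figure, the per-patch centroid CSVs can be tallied by cell type. A minimal sketch using only the standard library, assuming the columns documented under "Centroid CSV format". Because neighboring patches overlap by 64 px, this naive sum may count some cells twice unless the pipeline already removes duplicates, so treat it as an approximate check. `count_cells` is a hypothetical helper.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir):
    """Tally rows of all *_centroid.csv files under results_dir by celltype_name."""
    counts = Counter()
    for csv_path in Path(results_dir).rglob("*_centroid.csv"):
        with open(csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                counts[row["celltype_name"]] += 1
    return counts
```

Usage: `counts = count_cells("example/results")`, then `sum(counts.values())` for the total and `counts.most_common(5)` for the most frequent types.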

	### Expected runtime

	| Hardware | Full pipeline runtime |
	|---|---|
	| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
	| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |

	A system without sufficient RAM (<32 GB) will fail at the normalization step
	due to full-resolution image loading.

	> The example slide is from TCGA-LUSC and is redistributed under the
	> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).

	---

	## Instructions for Use (On Your Own Data)

	```bash
	# Step 1: Color normalization (Reinhard method)
	python normalize.py \
	--input_dir /path/to/slides \
	--target /path/to/standard-ilc.tif

	# Step 2: Extract patches (40x recommended)
	python patchify.py \
	--input_dir /path/to/slides \
	--magnification 40 \
	--patch_size 256 \
	--overlap 64 \
	--workers 8

	# Step 3: Cell detection & classification
	python inference.py \
	--input_dir /path/to/patch_folders \
	--output_dir /path/to/results \
	--model_path ./HNE2cell_pub_patch73_jit.pt \
	--magnification 40 \
	--batch_size 32
	```
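When processing many slides it can be convenient to chain the three commands from Python. The sketch below runs the same scripts with the same flags as above via `subprocess`, stopping at the first failing step; the paths are placeholders, and `build_commands` / `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys


def build_commands(slide_dir, target, patch_dir, out_dir, model_path,
                   magnification=40, workers=8, batch_size=32):
    """Return argv lists for the normalize, patchify, and inference steps."""
    py = sys.executable
    return [
        [py, "normalize.py", "--input_dir", slide_dir, "--target", target],
        [py, "patchify.py", "--input_dir", slide_dir,
         "--magnification", str(magnification), "--patch_size", "256",
         "--overlap", "64", "--workers", str(workers)],
        [py, "inference.py", "--input_dir", patch_dir, "--output_dir", out_dir,
         "--model_path", model_path, "--magnification", str(magnification),
         "--batch_size", str(batch_size)],
    ]


def run_pipeline(commands):
    """Run each step in order; raises CalledProcessError if a step fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Passing a single `magnification` value to both Step 2 and Step 3 keeps patch extraction and inference consistent, which is easy to get wrong when editing the shell commands by hand.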

	---

	## Input / Output Details

	### Input

	| Step | Input | Format |
	|------|-------|--------|
	| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
	| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
	| Inference | Patches | `256×256 px` PNG files |

	### Output

	| File | Description |
	|------|-------------|
	| `Aligned-hne.tif` | Full-resolution normalized H&E image |
	| `Aligned-hne.jpg` | 4× downsampled preview |
	| `recon.tif` | Tissue-only reconstruction (intermediate) |
	| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
	| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |

	### Centroid CSV format

	```csv
	slide_id,x,y,celltype,celltype_name
	patch_0_0,112.3,87.5,1,Malignant
	patch_0_0,45.1,201.2,2,CD4T
	...
	```

	To convert patch-local coordinates to WSI-global coordinates:

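	```python
	# Patch filenames follow {prefix}_{x_offset}_{y_offset}.png; slide_id values
	# in the centroid CSV (e.g. "patch_0_0") appear to follow the same pattern.
	def to_global(patch_id, x, y):
	    stem = patch_id.removesuffix(".png")
	    _prefix, x_offset, y_offset = stem.rsplit("_", 2)
	    return x + int(x_offset), y + int(y_offset)
	
	x_global, y_global = to_global("patch_0_0", 112.3, 87.5)
	```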
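Because patches overlap by 64 px, a cell near a patch border can appear in more than one centroid CSV once coordinates are made global. If the pipeline does not already remove these duplicates (this README does not say), one simple option is greedy suppression: keep a centroid only if no already-kept centroid lies within a small radius. The 6 px default radius is an illustrative assumption, not a value from the manuscript, and `dedupe_centroids` is a hypothetical helper.

```python
import math
from collections import defaultdict


def dedupe_centroids(cells, radius=6.0):
    """Greedy duplicate removal for WSI-global centroids.

    `cells` is an iterable of (x, y, celltype) tuples in global pixels.
    A cell is dropped if an already-kept cell lies within `radius` pixels,
    regardless of class. A uniform grid with bucket size `radius` limits
    each lookup to the 3x3 neighboring buckets.
    """
    grid = defaultdict(list)
    kept = []
    for x, y, celltype in cells:
        gx, gy = int(x // radius), int(y // radius)
        duplicate = any(
            math.hypot(x - kx, y - ky) <= radius
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for kx, ky in grid[(gx + dx, gy + dy)]
        )
        if not duplicate:
            grid[(gx, gy)].append((x, y))
            kept.append((x, y, celltype))
    return kept
```

A typical use is to convert every row of every centroid CSV to global coordinates, then pass the combined list through `dedupe_centroids` before counting or spatial analysis. Ignoring the class label when suppressing handles the case where the same cell received different labels in two overlapping patches.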

	---

	## Normalization Reference Image

	The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
	(`standard-ilc.tif`) that represents your target stain appearance. The reference image is
	included in this repository, or you can supply your own.
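Reinhard normalization matches the per-channel mean and standard deviation of an image in LAB space to those of the reference. A minimal sketch of that idea, assuming scikit-image's `rgb2lab` / `lab2rgb` for the color conversion; `normalize.py` may differ in details such as excluding background pixels from the statistics or processing large images in tiles, and both function names are hypothetical.

```python
import numpy as np


def match_lab_stats(src_lab, ref_lab, eps=1e-8):
    """Shift and scale each LAB channel of src to the mean and std of ref."""
    src_mean = src_lab.reshape(-1, 3).mean(axis=0)
    src_std = src_lab.reshape(-1, 3).std(axis=0)
    ref_mean = ref_lab.reshape(-1, 3).mean(axis=0)
    ref_std = ref_lab.reshape(-1, 3).std(axis=0)
    return (src_lab - src_mean) / (src_std + eps) * ref_std + ref_mean


def reinhard_normalize(src_rgb, ref_rgb):
    """Normalize a uint8 RGB image to the color statistics of a reference image."""
    from skimage.color import lab2rgb, rgb2lab  # scikit-image is a listed dependency

    out_lab = match_lab_stats(rgb2lab(src_rgb), rgb2lab(ref_rgb))
    out_rgb = lab2rgb(out_lab)  # float RGB in [0, 1]
    return (np.clip(out_rgb, 0, 1) * 255).round().astype(np.uint8)
```

Converting an entire slide to LAB at once is memory-intensive (see the RAM requirements above), so for full WSIs the statistics are usually computed once and the same shift and scale applied tile by tile.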

	---

	## File Structure

	```
	HNE2Cell/
	├── README.md # This file
	├── HNE2cell_pub_patch73_jit.pt # TorchScript model
	├── normalize.py # Step 1: Reinhard normalization
	├── patchify.py # Step 2: Patch extraction
	├── inference.py # Step 3: Model inference
	├── post_processing.py # Cell post-processing module
	├── tools.py # Utility functions
	├── standard-ilc.tif # Reference image for normalization
	└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs # Example slide (TCGA-LUSC)
	```

	---


	## Citation

	If you use HNE2Cell in your research, please cite:

	```bibtex
	@misc{hne2cell,
	title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
	year={2026},
	url={https://huggingface.co/roobee79/HNE2Cell}
	}
	```

	The example slide is derived from data generated by the TCGA:
	<https://portal.gdc.cancer.gov/>.


	## License

	This repository uses separate licenses for source code, model weights, and example data:

	- Source code (`.py` files): Released under the MIT License. See `LICENSE`.
	- Model weights (`HNE2cell_pub_patch73_jit.pt`): Released under
	Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
	See `MODEL_LICENSE`. Free for academic and non-commercial research use.
	- Example slide (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by
	the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).

	For commercial licensing, please contact:
	Ewha University-Industry Collaboration Foundation
	Technology Commercialization Team
	Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea
	Web: <https://research.ewha.ac.kr> | <https://epm.ewha.ac.kr>