---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- cell-classification
- cell-detection
- digital-pathology
- whole-slide-image
- H&E
- computational-pathology
- pytorch
---

# HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification

HNE2Cell detects and classifies **16 cell types** from H&E-stained whole-slide images (WSI).
It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.

## Cell Types (16 classes)

| ID | Cell Type | Color |
|----|-----------|-------|
| 0 | Background | - |
| 1 | **Malignant** | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |

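For downstream analysis it can help to have the class table available in code. The mapping below is a minimal sketch that copies the IDs and names from the table above; `CELL_TYPES` and `celltype_name` are illustrative names, not part of the HNE2Cell scripts. The `celltype_name` column written by the pipeline may spell some names differently (the sample CSV in this README shows `CD4T`), so keying on the numeric ID is probably the safer choice.

```python
# Class IDs and names copied from the table in this README.
# CELL_TYPES and celltype_name are hypothetical helpers for illustration.
CELL_TYPES = {
    0: "Background",
    1: "Malignant",
    2: "CD4 T",
    3: "CD8 T",
    4: "B",
    5: "Plasma",
    6: "Macrophage",
    7: "Myeloid",
    8: "DC",
    9: "Fibroblast",
    10: "Endothelial",
    11: "Pericyte",
    12: "Epithelial",
    13: "Immune_Other",
    14: "Stromal_Other",
    15: "Dead",
}


def celltype_name(class_id: int) -> str:
    """Return the class name for a numeric ID, or raise for unknown IDs."""
    try:
        return CELL_TYPES[class_id]
    except KeyError:
        raise ValueError(f"unknown class ID: {class_id}") from None
```
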
## Pipeline Overview

The full pipeline consists of three steps:

```
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ 1. Normalize  │ ──▶ │ 2. Patchify    │ ──▶ │ 3. Inference     │
│  (Reinhard)   │     │ (256px, 64ov)  │     │ (Cell Detection) │
└───────────────┘     └────────────────┘     └──────────────────┘
    SVS / TIF             PNG patches          Masks + Centroids
```

### Magnification

| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| **Accuracy** | Best: fine-grained cell boundaries | Good, but may miss small immune cells |
| **Speed** | Slower: more patches per slide | Faster: fewer patches per slide |
| **Use when** | Immune cell subtyping matters | Quick screening / large cohorts |

> **40x is strongly recommended.** The model was primarily trained on 40x data. 20x works, but expect reduced precision for small cells (lymphocytes, DCs).

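The speed difference follows from the patch count. The sketch below estimates how many 256 px patches a slide yields, assuming the stride is `patch_size - overlap` (192 px with the default settings) and that a partial patch at the right and bottom edges is still emitted. `patchify.py` may handle edges and background tiles differently, `patch_grid` is a hypothetical helper, and the slide dimensions are made up for illustration. Halving each side, as a 20x scan of the same tissue would, leaves roughly a quarter of the patches.

```python
import math


def patch_grid(width, height, patch_size=256, overlap=64):
    """Return (columns, rows, total) for a sliding window over the image.

    Assumes stride = patch_size - overlap and that edge remainders still
    produce one extra (partial or padded) patch.
    """
    stride = patch_size - overlap
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols, rows, cols * rows


# Hypothetical slide: 40,000 x 30,000 px at 40x, half that per side at 20x.
print(patch_grid(40_000, 30_000))  # (208, 156, 32448)
print(patch_grid(20_000, 15_000))  # (104, 78, 8112)
```
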
---

## System Requirements

### Software dependencies (tested versions)

Core packages (as reported in the manuscript):

- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7

Additional utilities required by the pipeline scripts:

- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for `.svs` files)

### Operating systems tested

- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS

(Not tested on Windows/macOS.)

### Hardware requirements

> **Note:** WSI processing is memory-intensive. This pipeline is designed for
> server- or workstation-class hardware, not standard desktops.

**Minimum (small WSIs, ~1–2 GB):**
- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free

**Recommended (typical WSIs, 2–10 GB):**
- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (intermediate `Aligned-hne.tif` can be 20–50 GB per slide)

**Tested configurations:**
- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04

CPU-only inference is not practical: full WSI inference would take
days even on a high-core-count CPU.

---

## Installation Guide

### Recommended: Conda environment from `cellvit_rv3.yml`

The repository includes a frozen conda environment file with all dependencies pinned
to the exact versions used in the manuscript.

```bash
# 1. Download environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml

# 2. Create environment
conda env create -f cellvit_rv3.yml

# 3. Activate
conda activate cellvit_rv3
```

**Typical install time:** ~10–15 minutes on a Linux server with a stable network connection
(dominated by the PyTorch + CUDA toolkit download).

### Download the model

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt"
)
```

---

## Demo: Reproducible Walkthrough

To verify your installation, run the pipeline on the example slide included in this
repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).

### Download the model, example slide, and reference image

```python
from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="HNE2cell_pub_patch73_jit.pt"
)

slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs"
)

ref_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="standard-ilc.tif"
)
```

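`hf_hub_download` returns paths inside the local Hugging Face cache, while the commands in the next step refer to `./standard-ilc.tif`, `./HNE2cell_pub_patch73_jit.pt`, and `example/slides/`. One way to bridge the two, sketched here with only the standard library, is to copy each file into place. `stage` is a hypothetical helper, not part of the repository; passing the cache paths directly to the scripts is likely to work equally well.

```python
import shutil
from pathlib import Path


def stage(src_path, dest_dir):
    """Copy one downloaded file into dest_dir and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy follows symlinks, so cache entries are copied as real files.
    return Path(shutil.copy(src_path, dest_dir))


# Usage with the variables from the download snippet (assumed to be in scope):
# stage(slide_path, "example/slides")  # slide for Step 1
# stage(ref_path, ".")                 # ./standard-ilc.tif
# stage(model_path, ".")               # ./HNE2cell_pub_patch73_jit.pt
```
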
### Run the pipeline

```bash
# Copy the downloaded slide into a working directory
# (replace <slide_path> with the path returned by hf_hub_download)
mkdir -p example/slides
cp "<slide_path>" example/slides/

# --target and --model_path below assume both files are in the current
# directory; otherwise pass the paths returned by hf_hub_download.

# Step 1: Normalize
python normalize.py \
    --input_dir ./example/slides \
    --target ./standard-ilc.tif

# Step 2: Patchify at 40x
python patchify.py \
    --input_dir ./example/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Inference
python inference.py \
    --input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
    --output_dir ./example/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

### Expected output

```
example/results/
├── Aligned-hne.tif         # Normalized full-resolution H&E
├── Aligned-hne.jpg         # 4× preview
├── patch_*_mask.png        # Per-patch cell type masks
└── patch_*_centroid.csv    # Cell centroids with type labels
```

**Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):**
Approximately **63,000 cells** are detected across the 16 classes.
Small variation (± a few percent) is expected between hardware configurations.

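To compare your run against the expected total, the per-patch centroid files can be tallied. The following is a minimal sketch using only the standard library; it assumes the `*_centroid.csv` files sit directly in the results folder and carry the `celltype` column described in this README. `count_cells` is a hypothetical helper. Patches overlap by 64 px, so if the pipeline does not deduplicate cells in the overlap regions, a naive tally like this one may overcount.

```python
import csv
from collections import Counter
from pathlib import Path


def count_cells(results_dir):
    """Tally detected cells per class ID across all *_centroid.csv files."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                counts[int(row["celltype"])] += 1
    return counts


# Usage (the path is the demo output folder from this README):
# counts = count_cells("example/results")
# print(sum(counts.values()), dict(sorted(counts.items())))
```
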
### Expected runtime

| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |

Systems with less than 32 GB of RAM will fail at the normalization step,
which loads the full-resolution image into memory.

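The RAM requirement can be sanity-checked with simple arithmetic: an uncompressed 8-bit RGB image needs width × height × 3 bytes. The sketch below uses hypothetical slide dimensions, and `image_footprint_gib` is an illustrative helper. That `normalize.py` holds at least one full-resolution copy in memory (possibly a wider floating-point one during color conversion) is an assumption, not something this README documents.

```python
def image_footprint_gib(width, height, channels=3, bytes_per_sample=1):
    """Uncompressed in-memory size of one image, in GiB."""
    return width * height * channels * bytes_per_sample / 2**30


# Hypothetical 100,000 x 80,000 px slide.
as_uint8 = image_footprint_gib(100_000, 80_000)                        # about 22.4 GiB
as_float32 = image_footprint_gib(100_000, 80_000, bytes_per_sample=4)  # about 89.4 GiB
print(f"uint8: {as_uint8:.1f} GiB, float32: {as_float32:.1f} GiB")
```

Under these assumptions a single 8-bit copy of a large slide already approaches the 32 GB minimum.
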
> The example slide is from **TCGA-LUSC** and is redistributed under the
> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).

---

## Instructions for Use (On Your Own Data)

```bash
# Step 1: Color normalization (Reinhard method)
python normalize.py \
    --input_dir /path/to/slides \
    --target /path/to/standard-ilc.tif

# Step 2: Extract patches (40x recommended)
python patchify.py \
    --input_dir /path/to/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Cell detection & classification
python inference.py \
    --input_dir /path/to/patch_folders \
    --output_dir /path/to/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

---

## Input / Output Details

### Input

| Step | Input | Format |
|------|-------|--------|
| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
| Inference | Patches | `256×256 px` PNG files |

### Output

| File | Description |
|------|-------------|
| `Aligned-hne.tif` | Full-resolution normalized H&E image |
| `Aligned-hne.jpg` | 4× downsampled preview |
| `recon.tif` | Tissue-only reconstruction (intermediate) |
| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |

### Centroid CSV format

```csv
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
```

To convert patch-local coordinates to WSI-global coordinates:

```python
# Parse patch filename: {prefix}_{x_offset}_{y_offset}.png
x_global = x + x_offset
y_global = y + y_offset
```

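The conversion above can be made runnable as follows. This is a sketch under stated assumptions: patch names follow the `{prefix}_{x_offset}_{y_offset}` pattern given above, the two trailing fields are integer pixel offsets of the patch's top-left corner, and the offsets are expressed in the same pixel grid as the centroids. `parse_patch_offsets` and `to_global` are hypothetical helpers, not functions from `tools.py`.

```python
from pathlib import Path


def parse_patch_offsets(patch_name):
    """Extract (x_offset, y_offset) from '{prefix}_{x_offset}_{y_offset}[.png]'."""
    # Drop any directory part and a trailing ".png", but keep dots in the prefix.
    stem = Path(patch_name).name.removesuffix(".png")
    _prefix, x_offset, y_offset = stem.rsplit("_", 2)
    return int(x_offset), int(y_offset)


def to_global(patch_name, x, y):
    """Shift a patch-local centroid by the offsets encoded in the patch name."""
    x_offset, y_offset = parse_patch_offsets(patch_name)
    return x + x_offset, y + y_offset


# Example with a made-up patch name and a centroid from the sample CSV.
print(to_global("patch_192_384.png", 112.3, 87.5))
```
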
---

## Normalization Reference Image

The pipeline uses Reinhard color normalization in LAB color space. You need a reference image
(`standard-ilc.tif`) that represents your target stain appearance. The reference image is
included in this repository, or you can supply your own.

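Reinhard normalization matches the per-channel mean and standard deviation of the source image to those of the reference: each value becomes `(v - mean_src) * (std_ref / std_src) + mean_ref`. The sketch below shows only that statistics-matching step on plain Python lists and assumes the pixels have already been converted to LAB. It is not the code in `normalize.py`; `match_channel` and `reinhard_match` are hypothetical names.

```python
from statistics import fmean, pstdev


def match_channel(source, target_mean, target_std):
    """Shift and scale one channel so its mean/std match the target's."""
    src_mean = fmean(source)
    src_std = pstdev(source)
    # A flat channel has zero spread; leave its scale unchanged.
    scale = target_std / src_std if src_std > 0 else 1.0
    return [(v - src_mean) * scale + target_mean for v in source]


def reinhard_match(source_lab, target_lab):
    """Apply match_channel to every channel.

    Both arguments are dicts such as {"L": [...], "a": [...], "b": [...]}
    holding flattened LAB values (color conversion is not shown here).
    """
    return {
        name: match_channel(values, fmean(target_lab[name]), pstdev(target_lab[name]))
        for name, values in source_lab.items()
    }
```
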
---

## File Structure

```
HNE2Cell/
├── README.md                      # This file
├── HNE2cell_pub_patch73_jit.pt    # TorchScript model
├── normalize.py                   # Step 1: Reinhard normalization
├── patchify.py                    # Step 2: Patch extraction
├── inference.py                   # Step 3: Model inference
├── post_processing.py             # Cell post-processing module
├── tools.py                       # Utility functions
├── standard-ilc.tif               # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs  # Example slide (TCGA-LUSC)
```

---

## Citation

If you use HNE2Cell in your research, please cite:

```bibtex
@misc{hne2cell,
  title={Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
  year={2026},
  url={https://huggingface.co/roobee79/HNE2Cell}
}
```

The example slide is derived from data generated by the TCGA:
<https://portal.gdc.cancer.gov/>.

## License

This repository uses a dual licensing scheme commonly adopted in academic ML/biomedical projects (e.g., SAM, LLaMA):

- **Source code** (`.py` files): Released under the **MIT License**. See `LICENSE`.
- **Model weights** (`HNE2cell_pub_patch73_jit.pt`): Released under
  **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**.
  See `MODEL_LICENSE`. Free for academic and non-commercial research use.

  For commercial licensing, please contact:
  **Ewha University-Industry Collaboration Foundation**,
  Technology Commercialization Team,
  Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea.
  Web: <https://research.ewha.ac.kr> | <https://epm.ewha.ac.kr>
- **Example slide** (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by
  the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).