---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- cell-classification
- cell-detection
- digital-pathology
- whole-slide-image
- H&E
- computational-pathology
- pytorch
---

# HNE2Cell: H&E Whole-Slide Image Cell Detection & Classification

HNE2Cell detects cells in H&E-stained whole-slide images (WSIs) and classifies them into **16 classes** (15 cell types plus background). It takes 256×256 px patches as input and outputs per-cell contours, centroids, and type labels.

## Cell Types (16 classes)

| ID | Cell Type | Color |
|----|-----------|-------|
| 0 | Background | – |
| 1 | **Malignant** | 🔴 Red |
| 2 | CD4 T | 🔵 Dodger Blue |
| 3 | CD8 T | 🔵 Royal Blue |
| 4 | B | 🔵 Blue |
| 5 | Plasma | 🔵 Cornflower Blue |
| 6 | Macrophage | 🔵 Powder Blue |
| 7 | Myeloid | 🔵 Steel Blue |
| 8 | DC | 🔵 Deep Sky Blue |
| 9 | Fibroblast | 🟢 Forest Green |
| 10 | Endothelial | 🟢 Medium Sea Green |
| 11 | Pericyte | 🟢 Lime Green |
| 12 | Epithelial | 🟠 Dark Orange |
| 13 | Immune_Other | ⚪ Light Blue |
| 14 | Stromal_Other | 🟤 Olive Drab |
| 15 | Dead | ⚫ Grey |

## Pipeline Overview

The full pipeline consists of three steps:

```
┌───────────────┐     ┌────────────────┐     ┌──────────────────┐
│ 1. Normalize  │ ──→ │ 2. Patchify    │ ──→ │ 3. Inference     │
│  (Reinhard)   │     │ (256px, 64ov)  │     │ (Cell Detection) │
└───────────────┘     └────────────────┘     └──────────────────┘
    SVS / TIF            PNG patches          Masks + Centroids
```

### Magnification

| | 40x (Recommended) | 20x (Supported) |
|---|---|---|
| **Accuracy** | Best: fine-grained cell boundaries | Good: may miss small immune cells |
| **Speed** | Slower: more patches per slide | Faster: fewer patches per slide |
| **Use when** | Immune cell subtyping matters | Quick screening / large cohorts |

> **40x is strongly recommended.** The model was primarily trained on 40x data. 20x works, but expect reduced precision for small cells (lymphocytes, DCs).

---

## System Requirements

### Software dependencies (tested versions)

Core packages (as reported in the manuscript):

- Python 3.10
- pytorch == 2.5.1
- timm == 1.0.8
- transformers == 4.44.0
- scanpy == 1.10.3
- squidpy == 1.5.0
- spatialdata == 0.2.5
- scikit-image == 0.24.0
- scikit-learn == 1.2.2
- scipy == 1.13.1
- shapely == 2.0.7

Additional utilities required by the pipeline scripts:

- torchvision (matching the PyTorch 2.5.1 release)
- tifffile, Pillow, opencv-python-headless, pandas, tqdm
- huggingface_hub
- openslide-python (optional, for `.svs` files)

### Operating systems tested

- Ubuntu 22.04 LTS
- Ubuntu 20.04 LTS

(Not tested on Windows/macOS.)

### Hardware requirements

> **Note:** WSI processing is memory-intensive. This pipeline is designed for
> server- or workstation-class hardware, not standard desktops.
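Since the pipeline fails outright when RAM is insufficient, a quick pre-flight check against the RAM and VRAM tiers in this section can save a long failed run. The sketch below is not one of the repository's scripts; it assumes a Linux system (where `os.sysconf` exposes physical memory) and uses PyTorch only if it is importable with CUDA available. The thresholds are copied from the minimum and recommended figures in this README, and the names `ram_tier`, `total_ram_gb`, and `total_vram_gb` are hypothetical helpers.

```python
import os

GIB = 1024 ** 3

# Thresholds copied from this README's hardware requirements.
MIN_RAM_GB = 32
RECOMMENDED_RAM_GB = 128
MIN_VRAM_GB = 12
# The OS reports slightly less than the installed capacity (kernel-reserved
# memory is excluded), so allow ~10% slack before flagging a tier as unmet.
SLACK = 0.9


def total_ram_gb() -> float:
    """Total physical RAM in GiB via POSIX sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def total_vram_gb():
    """Total VRAM of GPU 0 in GiB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / GIB


def ram_tier(ram_gb: float) -> str:
    """Map a RAM size onto the README's minimum / recommended tiers."""
    if ram_gb < MIN_RAM_GB * SLACK:
        return "insufficient"
    if ram_gb < RECOMMENDED_RAM_GB * SLACK:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"RAM: {ram:.1f} GiB -> {ram_tier(ram)} tier")
    vram = total_vram_gb()
    if vram is None:
        print("No CUDA GPU visible; CPU-only inference is not practical.")
    elif vram < MIN_VRAM_GB * SLACK:
        print(f"GPU 0 has {vram:.1f} GiB VRAM, below the {MIN_VRAM_GB} GB minimum.")
    else:
        print(f"GPU 0: {vram:.1f} GiB VRAM")
```

The slack factor is a design choice: comparing raw `MemTotal` against exact marketing sizes would wrongly flag a 32 GB machine as insufficient.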
**Minimum (small WSIs, ~1–2 GB):**

- GPU: NVIDIA GPU with ≥12 GB VRAM
- RAM: 32 GB (64 GB strongly recommended)
- Disk: 100 GB free

**Recommended (typical WSIs, 2–10 GB):**

- GPU: NVIDIA A100 / RTX 4090 / RTX 3090 (≥24 GB VRAM)
- RAM: ≥128 GB
- Disk: 500 GB+ free (the intermediate `Aligned-hne.tif` can be 20–50 GB per slide)

**Tested configurations:**

- NVIDIA A100 (40 GB VRAM), 256 GB RAM, Ubuntu 22.04
- NVIDIA RTX 3060 (12 GB VRAM), 64 GB RAM, Ubuntu 22.04

CPU-only inference is not practical: full-WSI inference would take days, even on a high-core-count CPU.

---

## Installation Guide

### Recommended: Conda environment from `cellvit_rv3.yml`

The repository includes a frozen conda environment file with all dependencies pinned to the exact versions used in the manuscript.

```bash
# 1. Download the environment file
wget https://huggingface.co/roobee79/HNE2Cell/resolve/main/cellvit_rv3.yml

# 2. Create the environment
conda env create -f cellvit_rv3.yml

# 3. Activate it
conda activate cellvit_rv3
```

**Typical install time:** ~10–15 minutes on a Linux server with a stable network connection (dominated by the PyTorch + CUDA toolkit download).

### Download the model

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="roobee79/HNE2Cell",
    filename="HNE2cell_pub_patch73_jit.pt"
)
```

---

## Demo: Reproducible Walkthrough

To verify your installation, run the pipeline on the example slide included in this repository (`TCGA-56-8628-01Z-00-DX1`, LUSC, ~36 MB).
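Before running the demo, you may also want to confirm that the active environment has the versions listed under Software dependencies. Below is a minimal sketch using the standard-library `importlib.metadata`. It looks packages up by their PyPI distribution names (PyTorch is published as `torch`), strips local build tags such as `+cu121` before comparing, and is not one of the repository's scripts; `check_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from "Software dependencies (tested versions)".
# Keys are PyPI distribution names (PyTorch is distributed as "torch").
PINNED = {
    "torch": "2.5.1",
    "timm": "1.0.8",
    "transformers": "4.44.0",
    "scanpy": "1.10.3",
    "squidpy": "1.5.0",
    "spatialdata": "0.2.5",
    "scikit-image": "0.24.0",
    "scikit-learn": "1.2.2",
    "scipy": "1.13.1",
    "shapely": "2.0.7",
}


def check_versions(pinned: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    if not problems:
        print("All pinned packages match.")
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```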
### Download the model, example slide, and reference image

```python
from huggingface_hub import hf_hub_download

REPO_ID = "roobee79/HNE2Cell"

# local_dir="." saves the files into the current directory, where the
# commands below expect them (otherwise they stay in the Hugging Face cache).
model_path = hf_hub_download(
    repo_id=REPO_ID, filename="HNE2cell_pub_patch73_jit.pt", local_dir="."
)
slide_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs",
    local_dir=".",
)
ref_path = hf_hub_download(
    repo_id=REPO_ID, filename="standard-ilc.tif", local_dir="."
)
```

### Run the pipeline

```bash
# Place the downloaded slide in a working directory
mkdir -p example/slides
cp ./TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs example/slides/

# Step 1: Normalize
python normalize.py \
    --input_dir ./example/slides \
    --target ./standard-ilc.tif

# Step 2: Patchify at 40x
python patchify.py \
    --input_dir ./example/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Inference
python inference.py \
    --input_dir ./example/slides/TCGA-56-8628-01Z-00-DX1.../patches \
    --output_dir ./example/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

### Expected output

```
example/results/
├── Aligned-hne.tif          # Normalized full-resolution H&E
├── Aligned-hne.jpg          # 4× preview
├── patch_*_mask.png         # Per-patch cell type masks
└── patch_*_centroid.csv     # Cell centroids with type labels
```

**Expected results on the example slide (`TCGA-56-8628-01Z-00-DX1`):** Approximately **63,000 cells** are detected across the 16 classes. Small variation (± a few percent) is expected between hardware configurations.

### Expected runtime

| Hardware | Full pipeline runtime |
|---|---|
| NVIDIA A100 (40 GB) + 256 GB RAM | ~20 min |
| NVIDIA RTX 3060 (12 GB) + 64 GB RAM | ~30 min |

Systems with less than 32 GB of RAM will fail at the normalization step, which loads the full-resolution image into memory.

> The example slide is from **TCGA-LUSC** and is redistributed under the
> [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).
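To compare a run against the expected count of roughly 63,000 cells, the per-patch centroid CSVs can be tallied by cell type. This minimal sketch assumes the `patch_*_centroid.csv` naming from the expected output tree and the CSV columns documented under Input / Output Details (`slide_id, x, y, celltype, celltype_name`); `tally_cells` is a hypothetical helper. It does not deduplicate cells that fall in the 64 px overlap between neighboring patches, so its raw total may run higher than the figure above if that figure counts each cell once.

```python
import csv
from collections import Counter
from pathlib import Path


def tally_cells(results_dir: str) -> Counter:
    """Count cells per type name across all *_centroid.csv files in results_dir."""
    counts = Counter()
    for csv_path in sorted(Path(results_dir).glob("*_centroid.csv")):
        with open(csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                counts[row["celltype_name"]] += 1
    return counts


if __name__ == "__main__":
    counts = tally_cells("example/results")
    print(f"Total cells: {sum(counts.values()):,}")
    for name, n in counts.most_common():
        print(f"{name:>15}: {n:,}")
```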
---

## Instructions for Use (On Your Own Data)

```bash
# Step 1: Color normalization (Reinhard method)
python normalize.py \
    --input_dir /path/to/slides \
    --target /path/to/standard-ilc.tif

# Step 2: Extract patches (40x recommended)
python patchify.py \
    --input_dir /path/to/slides \
    --magnification 40 \
    --patch_size 256 \
    --overlap 64 \
    --workers 8

# Step 3: Cell detection & classification
python inference.py \
    --input_dir /path/to/patch_folders \
    --output_dir /path/to/results \
    --model_path ./HNE2cell_pub_patch73_jit.pt \
    --magnification 40 \
    --batch_size 32
```

---

## Input / Output Details

### Input

| Step | Input | Format |
|------|-------|--------|
| Normalize | Raw WSI | `.svs`, `.tif`, `.tiff`, `.ndpi` |
| Patchify | Normalized image | `Aligned-hne.tif` (from Step 1) |
| Inference | Patches | `256×256 px` PNG files |

### Output

| File | Description |
|------|-------------|
| `Aligned-hne.tif` | Full-resolution normalized H&E image |
| `Aligned-hne.jpg` | 4× downsampled preview |
| `recon.tif` | Tissue-only reconstruction (intermediate) |
| `*_mask.png` | Per-patch cell segmentation mask (colored by type) |
| `*_centroid.csv` | Cell centroids with columns: `slide_id, x, y, celltype, celltype_name` |

### Centroid CSV format

```csv
slide_id,x,y,celltype,celltype_name
patch_0_0,112.3,87.5,1,Malignant
patch_0_0,45.1,201.2,2,CD4T
...
```

To convert patch-local coordinates to WSI-global coordinates, add the offsets encoded in the patch filename:

```python
# Patch filenames follow {prefix}_{x_offset}_{y_offset}.png
stem = "patch_512_1024"  # hypothetical patch name (filename without ".png")
x, y = 112.3, 87.5       # patch-local centroid read from the CSV
x_offset, y_offset = (int(v) for v in stem.rsplit("_", 2)[1:])
x_global = x + x_offset
y_global = y + y_offset
```

---

## Normalization Reference Image

The pipeline uses Reinhard color normalization in LAB color space, which requires a reference image (`standard-ilc.tif`) representing the target stain appearance. A reference image is included in this repository; you can also supply your own.
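Reinhard normalization works by matching the per-channel mean and standard deviation of a source image to those of the reference in a perceptual color space. The sketch below shows only that statistics transfer, on arrays already converted to LAB (for example with `skimage.color.rgb2lab`). It illustrates the technique and is not the code in `normalize.py`; details such as excluding background pixels before computing statistics are deliberately omitted, and `reinhard_transfer` is a hypothetical name.

```python
import numpy as np


def reinhard_transfer(src_lab: np.ndarray, ref_lab: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Match per-channel mean/std of src_lab (H, W, 3) to those of ref_lab (H', W', 3)."""
    src = src_lab.reshape(-1, 3).astype(np.float64)
    ref = ref_lab.reshape(-1, 3).astype(np.float64)
    src_mean, src_std = src.mean(axis=0), src.std(axis=0)
    ref_mean, ref_std = ref.mean(axis=0), ref.std(axis=0)
    # Standardize each source channel, then rescale to the reference statistics.
    out = (src - src_mean) / (src_std + eps) * ref_std + ref_mean
    return out.reshape(src_lab.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(50.0, 10.0, size=(64, 64, 3))  # stand-in for a LAB patch
    ref = rng.normal(70.0, 5.0, size=(32, 32, 3))   # stand-in for the LAB reference
    out = reinhard_transfer(src, ref)
    print(out.reshape(-1, 3).mean(axis=0), ref.reshape(-1, 3).mean(axis=0))
```

With scikit-image installed, `lab2rgb(reinhard_transfer(rgb2lab(img), rgb2lab(ref)))` would return the normalized image as RGB floats in [0, 1]. Computing the statistics once over the whole reference image, rather than per patch, keeps the target appearance consistent across a slide.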
---

## File Structure

```
HNE2Cell/
├── README.md                       # This file
├── HNE2cell_pub_patch73_jit.pt     # TorchScript model
├── cellvit_rv3.yml                 # Conda environment (pinned versions)
├── normalize.py                    # Step 1: Reinhard normalization
├── patchify.py                     # Step 2: Patch extraction
├── inference.py                    # Step 3: Model inference
├── post_processing.py              # Cell post-processing module
├── tools.py                        # Utility functions
├── standard-ilc.tif                # Reference image for normalization
└── TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895.svs  # Example slide (TCGA-LUSC)
```

---

## Citation

If you use HNE2Cell in your research, please cite:

```bibtex
@misc{hne2cell,
  title = {Spatial transcriptomics–supervised deep learning enables single-cell mapping of tumor immune architecture from routine histology},
  year  = {2026},
  url   = {https://huggingface.co/roobee79/HNE2Cell}
}
```

The example slide is derived from data generated by the TCGA: .

## License

This repository uses separate licenses for the source code, the model weights, and the example data:

- **Source code** (`.py` files): Released under the **MIT License**. See `LICENSE`.
- **Model weights** (`HNE2cell_pub_patch73_jit.pt`): Released under **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)**. See `MODEL_LICENSE`. Free for academic and non-commercial research use. For commercial licensing, please contact:

  **Ewha University-Industry Collaboration Foundation**\
  Technology Commercialization Team\
  Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, Republic of Korea\
  Web: |

- **Example slide** (`TCGA-56-8628-*.svs`): Derived from TCGA-LUSC, governed by the [NIH Genomic Data Sharing Policy](https://sharing.nih.gov/genomic-data-sharing-policy).