license: cc-by-nc-nd-4.0
language:
  - en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - multimodal
  - pathology
  - arxiv:2512.21058
extra_gated_prompt: >-
  The UniPath-7B model and its associated materials are released under the
  CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic
  research purposes only, with proper citation required. Any commercial usage,
  redistribution, or derivative work (including training models based on this
  model or generating datasets from its outputs) is strictly prohibited without
  prior written approval.

  Users must register with an official institutional email address (generic
  domains such as @gmail, @qq, @hotmail, etc. will not be accepted). By
  requesting access, you confirm that your information is accurate and current,
  and that you agree to comply with all terms listed herein. If other members of
  your organization wish to use the model, they must register independently and
  agree to the same terms.
extra_gated_fields:
  Full name (first and last): text
  Institutional affiliation (no abbreviations): text
  Role/Position:
    type: select
    options:
      - Faculty/Principal Investigator
      - PhD Student
      - Postdoctoral Researcher
      - Research Staff
      - Other
  Official institutional email (**must match your Hugging Face primary email; generic domains will be denied**): text
  Intended research use (be specific): text
  I agree to use this model only for non-commercial academic purposes: checkbox
  I agree not to redistribute this model or share it outside of my individual usage: checkbox
  I confirm that all submitted information is accurate and up to date: checkbox

[CVPR 2026] Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

🏠 Project Page | 📖 Paper | 🤗 UniPath-1M | 🤗 UniPath-68K | 🧠 Model Weight

Abstract: In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image–text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image–text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control reaching 98.7% of the real-image level.

Highlights

Semantic-first pathology generation: shifts from pixel-level imitation to diagnosis-aware semantic control.
Multi-Stream Control: combines raw text, distilled diagnostic semantic tokens, and prototype-level morphology guidance.
Data curation at scale: uses a 2.65M image–text corpus plus a finely annotated, high-quality 68K subset for training.
Pathology-specific evaluation: introduces a four-tier benchmark protocol for generation quality and controllability.

Installation

Recommended environment:

Python 3.11
GPU with at least 24 GB VRAM

Install dependencies:

pip install -r requirements.txt
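
The recommended environment can be checked with a short script before running inference. This is a minimal sketch, not part of the repository: it assumes PyTorch is among the packages installed by requirements.txt (this README does not say so), and the names `vram_ok` and `check_environment` are hypothetical. Memory is compared in decimal gigabytes so that cards marketed as 24 GB are less likely to be flagged because of rounding.

```python
import sys

RECOMMENDED_PYTHON = (3, 11)  # from the "Recommended environment" list
MIN_VRAM_BYTES = 24 * 10**9   # 24 GB, counted in decimal gigabytes


def vram_ok(total_bytes):
    """True if the reported GPU memory meets the 24 GB recommendation."""
    return total_bytes >= MIN_VRAM_BYTES


def check_environment():
    """Collect warnings about the local setup; an empty list means nothing was flagged."""
    warnings = []
    if sys.version_info[:2] != RECOMMENDED_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        warnings.append(f"Python {found} found; 3.11 is recommended.")
    try:
        import torch  # assumption: PyTorch is installed by requirements.txt
    except ImportError:
        warnings.append("PyTorch is not importable, so the GPU could not be checked.")
        return warnings
    if not torch.cuda.is_available():
        warnings.append("No CUDA GPU is visible to PyTorch.")
    elif not vram_ok(torch.cuda.get_device_properties(0).total_memory):
        warnings.append("GPU 0 reports less than 24 GB of memory.")
    return warnings


for message in check_environment():
    print("warning:", message)
```

The script only prints warnings and never exits with an error, since the environment listed above is a recommendation rather than a stated hard requirement.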

Quickstart

Run from the repository root:

python src/inference.py \
  --model_path /path/to/checkpoints \
  --rag_root_dir /path/to/RAG_8K \
  --output_dir ./generated_images \
  --num_seeds 5
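
When inference is launched from another Python script (for example, to sweep over several output directories), the same command can be assembled as an argument list. The sketch below uses only the four flags shown in the Quickstart; the helper names `build_inference_command` and `run_inference` are hypothetical and do not exist in the repository.

```python
import subprocess
import sys


def build_inference_command(model_path, rag_root_dir,
                            output_dir="./generated_images", num_seeds=5):
    """Assemble the Quickstart command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "src/inference.py",
        "--model_path", str(model_path),
        "--rag_root_dir", str(rag_root_dir),
        "--output_dir", str(output_dir),
        "--num_seeds", str(num_seeds),
    ]


def run_inference(model_path, rag_root_dir, **kwargs):
    """Run the command; must be called with the repository root as working directory."""
    cmd = build_inference_command(model_path, rag_root_dir, **kwargs)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(cmd, check=True)


# Show the command that would be executed, without running it.
print(" ".join(build_inference_command("/path/to/checkpoints", "/path/to/RAG_8K")))
```

Using `sys.executable` keeps the child process in the same interpreter (and virtual environment) as the calling script.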

RAG Data Requirements

src/inference.py expects --rag_root_dir to contain:

<RAG_ROOT>/
  llm_filtered_vocab_gemini_pro.txt
  keyword_inverted_index.json
  selected_8k.h5
  images/
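
A quick way to confirm this layout before launching inference is to check each entry explicitly. The following is a minimal sketch rather than repository code: `missing_rag_entries` is a hypothetical helper, and it only verifies that the listed files and the images/ directory exist, not that their contents are valid.

```python
from pathlib import Path

# Entries listed in the layout above; a trailing slash marks a directory.
REQUIRED_ENTRIES = (
    "llm_filtered_vocab_gemini_pro.txt",
    "keyword_inverted_index.json",
    "selected_8k.h5",
    "images/",
)


def missing_rag_entries(rag_root):
    """Return the required entries that are absent (or of the wrong kind) under rag_root."""
    root = Path(rag_root)
    missing = []
    for entry in REQUIRED_ENTRIES:
        path = root / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

For example, `missing_rag_entries("/path/to/RAG_8K")` returns an empty list when all four entries are in place, and otherwise names the ones that still need to be downloaded or moved.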

Acknowledgements

This repository substantially reuses and adapts components from:

Patho-R1: https://github.com/Wenchuan-Zhang/Patho-R1
BLIP3o: https://github.com/JiuhaiChen/BLIP3o/tree/main
PixCell-256: https://huggingface.co/StonyBrook-CVLab/PixCell-256

We thank the original authors for open-sourcing their code and model weights.

Citation

If you find UniPath useful, please cite:

@article{han2025beyond,
  title={Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control},
  author={Han, Minghao and Liu, YiChen and Liu, Yizhou and Chen, Zizhi and Tang, Jingqun and Wu, Xuecheng and Yang, Dingkang and Zhang, Lihua},
  journal={arXiv preprint arXiv:2512.21058},
  year={2025}
}