license: cc-by-nc-nd-4.0
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- pathology
- arxiv:2512.21058
extra_gated_prompt: >-
  The UniPath-7B model and its associated materials are released under the
  CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic
  research purposes only, with proper citation required. Any commercial usage,
  redistribution, or derivative work (including training models based on this
  model or generating datasets from its outputs) is strictly prohibited without
  prior written approval.

  Users must register with an official institutional email address (generic
  domains such as @gmail, @qq, @hotmail, etc. will not be accepted). By
  requesting access, you confirm that your information is accurate and current,
  and that you agree to comply with all terms listed herein. If other members of
  your organization wish to use the model, they must register independently and
  agree to the same terms.
extra_gated_fields:
  Full name (first and last): text
  Institutional affiliation (no abbreviations): text
  Role/Position:
    type: select
    options:
      - Faculty/Principal Investigator
      - PhD Student
      - Postdoctoral Researcher
      - Research Staff
      - Other
  Official institutional email (**must match your Hugging Face primary email; generic domains will be denied**): text
  Intended research use (be specific): text
  I agree to use this model only for non-commercial academic purposes: checkbox
  I agree not to redistribute this model or share it outside of my individual usage: checkbox
  I confirm that all submitted information is accurate and up to date: checkbox
[CVPR 2026] Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
🏠 Project Page | 📖 Paper | 🤗 UniPath-1M | 🤗 UniPath-68K | 🧠 Model Weight
Abstract: In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image–text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image–text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho‑FID of 80.9 (51% better than the second-best) and fine-grained semantic control that reaches 98.7% of the level measured on real images.
Highlights
- Semantic-first pathology generation: shifts from pixel-level imitation to diagnosis-aware semantic control.
- Multi-Stream Control: combines raw text, distilled diagnostic semantic tokens, and prototype-level morphology guidance.
- Data curation at scale: uses a 2.65M corpus plus a high-quality 68K subset for robust training.
- Pathology-specific evaluation: introduces a four-tier benchmark protocol for generation quality and controllability.
Installation
Recommended environment:
- Python 3.11
- GPU with at least 24 GB VRAM
Install dependencies:
pip install -r requirements.txt
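To confirm the machine matches the recommended environment, a quick check along the following lines may help. This is a minimal sketch, not part of the repository: the helper names (`python_matches`, `bytes_to_gib`, `gpu_total_bytes`) are hypothetical, and it assumes PyTorch is among the installed dependencies, which the requirements file would need to confirm.

```python
import sys

RECOMMENDED_PYTHON = (3, 11)  # from the recommended environment above


def python_matches(version_info=sys.version_info, recommended=RECOMMENDED_PYTHON):
    """Return True if the interpreter's major.minor equals the recommended one."""
    return tuple(version_info[:2]) == tuple(recommended)


def bytes_to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


def gpu_total_bytes():
    """Return total memory of GPU 0 in bytes, or None if it cannot be queried."""
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print("Python 3.11:", python_matches())
    total = gpu_total_bytes()
    if total is None:
        print("No CUDA GPU detected (or PyTorch is not installed)")
    else:
        # Compare against the 24 GB recommendation by eye: a card sold as
        # 24 GB may report slightly less than 24 GiB here.
        print(f"GPU 0 memory: {bytes_to_gib(total):.1f} GiB")
```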
Quickstart
Run from repository root:
python src/inference.py \
  --model_path /path/to/checkpoints \
  --rag_root_dir /path/to/RAG_8K \
  --output_dir ./generated_images \
  --num_seeds 5
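When the same command has to be launched from another Python script (for example, to sweep over several checkpoints), it can be assembled as an argument list. The sketch below is an illustration only: `build_inference_command` is a hypothetical helper, and it uses just the four flags shown in the command above; any other options of `src/inference.py` are not covered.

```python
import shlex
import sys


def build_inference_command(model_path, rag_root_dir,
                            output_dir="./generated_images",
                            num_seeds=5,
                            script="src/inference.py"):
    """Return the Quickstart command as an argument list for subprocess."""
    return [
        sys.executable, script,
        "--model_path", str(model_path),
        "--rag_root_dir", str(rag_root_dir),
        "--output_dir", str(output_dir),
        "--num_seeds", str(num_seeds),
    ]


if __name__ == "__main__":
    cmd = build_inference_command("/path/to/checkpoints", "/path/to/RAG_8K")
    # Print the command for inspection; to actually launch it from the
    # repository root, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.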
RAG Data Requirements
src/inference.py expects --rag_root_dir to contain:
<RAG_ROOT>/
    llm_filtered_vocab_gemini_pro.txt
    keyword_inverted_index.json
    selected_8k.h5
    images/
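A missing file in this directory is easier to catch before inference starts than partway through a run. The following sketch checks a candidate directory against the layout listed above. It only tests that the entries exist, not that their contents are valid, and `missing_rag_entries` is a hypothetical helper rather than something provided by the repository.

```python
import sys
from pathlib import Path

# Entries taken from the layout listed above.
REQUIRED_FILES = (
    "llm_filtered_vocab_gemini_pro.txt",
    "keyword_inverted_index.json",
    "selected_8k.h5",
)
REQUIRED_DIRS = ("images",)


def missing_rag_entries(rag_root):
    """Return the names of required files and directories absent from rag_root."""
    root = Path(rag_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    rag_root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_rag_entries(rag_root)
    if missing:
        print(f"{rag_root} is missing: {', '.join(missing)}")
    else:
        print(f"{rag_root} contains all expected entries")
```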
Acknowledgements
This repository substantially reuses and adapts components from:
- Patho-R1: https://github.com/Wenchuan-Zhang/Patho-R1
- BLIP3o: https://github.com/JiuhaiChen/BLIP3o/tree/main
- PixCell-256: https://huggingface.co/StonyBrook-CVLab/PixCell-256
We thank the original authors for open-sourcing their code and model weights.
Citation
If you find UniPath useful, please cite:
@article{han2025beyond,
title={Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control},
author={Han, Minghao and Liu, YiChen and Liu, Yizhou and Chen, Zizhi and Tang, Jingqun and Wu, Xuecheng and Yang, Dingkang and Zhang, Lihua},
journal={arXiv preprint arXiv:2512.21058},
year={2025}
}