---
license: cc-by-nc-nd-4.0
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- pathology
- arxiv:2512.21058
extra_gated_prompt: >-
  The UniPath-7B model and its associated materials are released under the
  CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic
  research purposes only, with proper citation required. Any commercial usage,
  redistribution, or derivative work (including training models based on this
  model or generating datasets from its outputs) is strictly prohibited without
  prior written approval.

  Users must register with an official institutional email address (generic
  domains such as @gmail, @qq, @hotmail, etc. will not be accepted). By
  requesting access, you confirm that your information is accurate and current,
  and that you agree to comply with all terms listed herein. If other members
  of your organization wish to use the model, they must register independently
  and agree to the same terms.
extra_gated_fields:
  "Full name (first and last)": text
  "Institutional affiliation (no abbreviations)": text
  "Role/Position":
    type: select
    options:
      - Faculty/Principal Investigator
      - PhD Student
      - Postdoctoral Researcher
      - Research Staff
      - Other
  "Official institutional email (**must match your Hugging Face primary email; generic domains will be denied**)": text
  "Intended research use (be specific)": text
  "I agree to use this model only for non-commercial academic purposes": checkbox
  "I agree not to redistribute this model or share it outside of my individual usage": checkbox
  "I confirm that all submitted information is accurate and up to date": checkbox
---
|
<div align="center">
<br>
<h1>[CVPR 2026] Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control</h1>

<p align="center">
<a href="https://github.com/Hanminghao/UniPath">🏠 Project Page</a> |
<a href="https://arxiv.org/abs/2512.21058">📖 Paper</a> |
<a href="https://huggingface.co/datasets/minghaofdu/UniPath-1M">🤗 UniPath-1M</a> |
<a href="https://huggingface.co/datasets/minghaofdu/UniPath-68K">🤗 UniPath-68K</a> |
<a href="https://huggingface.co/minghaofdu/UniPath-7B">🧠 Model Weight</a>
</p>
</div>

<img src="docs/logo.png" width="200px" align="right" />
**Abstract:** In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image–text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image–text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control reaching 98.7% of the real-image level.
---

<p align="center">
  <img src="overall.png" width="85%" />
</p>
## Highlights
- **Semantic-first pathology generation:** shifts from pixel-level imitation to diagnosis-aware semantic control.
- **Multi-Stream Control:** combines raw text, distilled diagnostic semantic tokens, and prototype-level morphology guidance.
- **Data curation at scale:** uses a 2.65M corpus plus a high-quality 68K subset for robust training.
- **Pathology-specific evaluation:** introduces a four-tier benchmark protocol for generation quality and controllability.
## Installation
Recommended environment:
- Python 3.11
- GPU with at least 24 GB VRAM

Install dependencies:
```bash
pip install -r requirements.txt
```
## Quickstart
Run from the repository root:
```bash
python src/inference.py \
    --model_path /path/to/checkpoints \
    --rag_root_dir /path/to/RAG_8K \
    --output_dir ./generated_images \
    --num_seeds 5
```
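For scripted or batch runs, the same CLI can be driven from Python. A minimal sketch using only the flags documented above; `build_inference_cmd` is a hypothetical helper of ours, and the paths are placeholders:

```python
import subprocess

def build_inference_cmd(model_path, rag_root_dir, output_dir, num_seeds=5):
    """Assemble the inference command from the documented CLI flags."""
    return [
        "python", "src/inference.py",
        "--model_path", str(model_path),
        "--rag_root_dir", str(rag_root_dir),
        "--output_dir", str(output_dir),
        "--num_seeds", str(num_seeds),
    ]

cmd = build_inference_cmd("/path/to/checkpoints", "/path/to/RAG_8K", "./generated_images")
# subprocess.run(cmd, check=True)  # uncomment to actually launch generation
```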
|
## RAG Data Requirements
`src/inference.py` expects `--rag_root_dir` to contain:
```
<RAG_ROOT>/
    llm_filtered_vocab_gemini_pro.txt
    keyword_inverted_index.json
    selected_8k.h5
    images/
```
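A missing file in this layout typically surfaces only after model loading, so it can be worth checking up front. A minimal sketch using only the entry names listed above; the helper name is ours:

```python
from pathlib import Path

# Entries that src/inference.py expects under --rag_root_dir (per the layout above).
REQUIRED_ENTRIES = [
    "llm_filtered_vocab_gemini_pro.txt",
    "keyword_inverted_index.json",
    "selected_8k.h5",
    "images",  # directory of retrieval images
]

def missing_rag_entries(rag_root):
    """Return the required entries that are absent from rag_root."""
    root = Path(rag_root)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]

# Example:
# problems = missing_rag_entries("/path/to/RAG_8K")
# if problems:
#     raise FileNotFoundError(f"RAG root is missing: {problems}")
```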
|
## Acknowledgements
This repository substantially reuses and adapts components from:
- **Patho-R1:** https://github.com/Wenchuan-Zhang/Patho-R1
- **BLIP3o:** https://github.com/JiuhaiChen/BLIP3o/tree/main
- **PixCell-256:** https://huggingface.co/StonyBrook-CVLab/PixCell-256

We thank the original authors for open-sourcing their code and model weights.
## Citation
If you find UniPath useful, please cite:
```bibtex
@article{han2025beyond,
  title={Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control},
  author={Han, Minghao and Liu, YiChen and Liu, Yizhou and Chen, Zizhi and Tang, Jingqun and Wu, Xuecheng and Yang, Dingkang and Zhang, Lihua},
  journal={arXiv preprint arXiv:2512.21058},
  year={2025}
}
```