---
license: cc-by-nc-nd-4.0
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- pathology
- arxiv:2512.21058
extra_gated_prompt: >-
  The UniPath-7B model and its associated materials are released under the
  CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic
  research purposes only, with proper citation required. Any commercial usage,
  redistribution, or derivative work (including training models based on this
  model or generating datasets from its outputs) is strictly prohibited without
  prior written approval.

  Users must register with an official institutional email address (generic
  domains such as @gmail, @qq, @hotmail, etc. will not be accepted). By
  requesting access, you confirm that your information is accurate and current,
  and that you agree to comply with all terms listed herein. If other members
  of your organization wish to use the model, they must register independently
  and agree to the same terms.
extra_gated_fields:
  "Full name (first and last)": text
  "Institutional affiliation (no abbreviations)": text
  "Role/Position":
    type: select
    options:
      - Faculty/Principal Investigator
      - PhD Student
      - Postdoctoral Researcher
      - Research Staff
      - Other
  "Official institutional email (**must match your Hugging Face primary email; generic domains will be denied**)": text
  "Intended research use (be specific)": text
  "I agree to use this model only for non-commercial academic purposes": checkbox
  "I agree not to redistribute this model or share it outside of my individual usage": checkbox
  "I confirm that all submitted information is accurate and up to date": checkbox
---
|
<div align="center">
<br>
<h1>[CVPR 2026] Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control</h1>

<p align="center">
<a href="https://github.com/Hanminghao/UniPath">🏠 Project Page</a> |
<a href="https://arxiv.org/abs/2512.21058">📖 Paper</a> |
<a href="https://huggingface.co/datasets/minghaofdu/UniPath-1M">🤗 UniPath-1M</a> |
<a href="https://huggingface.co/datasets/minghaofdu/UniPath-68K">🤗 UniPath-68K</a> |
<a href="https://huggingface.co/minghaofdu/UniPath-7B">🧠 Model Weight</a>
</p>
</div>

<img src="docs/logo.png" width="200px" align="right" />
**Abstract:** In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image–text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image–text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control reaching 98.7% of the real-image level.
---

<p align="center">
  <img src="overall.png" width="85%" />
</p>
## Highlights
- **Semantic-first pathology generation:** shifts from pixel-level imitation to diagnosis-aware semantic control.
- **Multi-Stream Control:** combines raw text, distilled diagnostic semantic tokens, and prototype-level morphology guidance.
- **Data curation at scale:** uses a 2.65M corpus plus a high-quality 68K subset for robust training.
- **Pathology-specific evaluation:** introduces a four-tier benchmark protocol for generation quality and controllability.
## Installation
Recommended environment:
- Python 3.11
- GPU with at least 24 GB VRAM

Install dependencies:
```bash
pip install -r requirements.txt
```
## Quickstart
Run from the repository root:
```bash
python src/inference.py \
    --model_path /path/to/checkpoints \
    --rag_root_dir /path/to/RAG_8K \
    --output_dir ./generated_images \
    --num_seeds 5
```
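For scripted or batch runs, the same CLI can be driven from Python. A minimal sketch using only the flags documented above; `build_inference_cmd` is a hypothetical helper of ours, and the paths are placeholders:

```python
import subprocess

def build_inference_cmd(model_path, rag_root_dir, output_dir, num_seeds=5):
    """Assemble the inference command from the documented CLI flags."""
    return [
        "python", "src/inference.py",
        "--model_path", str(model_path),
        "--rag_root_dir", str(rag_root_dir),
        "--output_dir", str(output_dir),
        "--num_seeds", str(num_seeds),
    ]

cmd = build_inference_cmd("/path/to/checkpoints", "/path/to/RAG_8K", "./generated_images")
# subprocess.run(cmd, check=True)  # uncomment to actually launch generation
```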
|
## RAG Data Requirements
`src/inference.py` expects `--rag_root_dir` to contain:
```
<RAG_ROOT>/
    llm_filtered_vocab_gemini_pro.txt
    keyword_inverted_index.json
    selected_8k.h5
    images/
```
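A missing file in this layout typically surfaces only after model loading, so it can be worth checking up front. A minimal sketch using only the entry names listed above; the helper name is ours:

```python
from pathlib import Path

# Entries that src/inference.py expects under --rag_root_dir (per the layout above).
REQUIRED_ENTRIES = [
    "llm_filtered_vocab_gemini_pro.txt",
    "keyword_inverted_index.json",
    "selected_8k.h5",
    "images",  # directory of retrieval images
]

def missing_rag_entries(rag_root):
    """Return the required entries that are absent from rag_root."""
    root = Path(rag_root)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]

# Example:
# problems = missing_rag_entries("/path/to/RAG_8K")
# if problems:
#     raise FileNotFoundError(f"RAG root is missing: {problems}")
```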
|
## Acknowledgements
This repository substantially reuses and adapts components from:
- **Patho-R1:** https://github.com/Wenchuan-Zhang/Patho-R1
- **BLIP3o:** https://github.com/JiuhaiChen/BLIP3o/tree/main
- **PixCell-256:** https://huggingface.co/StonyBrook-CVLab/PixCell-256

We thank the original authors for open-sourcing their code and model weights.
## Citation
If you find UniPath useful, please cite:
```bibtex
@article{han2025beyond,
  title={Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control},
  author={Han, Minghao and Liu, YiChen and Liu, Yizhou and Chen, Zizhi and Tang, Jingqun and Wu, Xuecheng and Yang, Dingkang and Zhang, Lihua},
  journal={arXiv preprint arXiv:2512.21058},
  year={2025}
}
```