	---
	license: cc-by-nc-4.0
	language:
	- en
	pipeline_tag: image-to-text
	tags:
	- medical
	- pathology
	- vision-language
	- contrastive-learning
	- fine-grained
	- multimodal
	library_name: transformers
	---
	# PathFLIP

	Model weights for the paper *PathFLIP: Fine-Grained Language-Image Pretraining for Versatile Pathology Image Understanding*.

	## Overview

	PathFLIP is a pathology vision-language model (VLM) that aligns fine-grained morphological sub-captions with their corresponding regions in whole slide images (WSIs). Prior pathology VLMs pair an entire slide with a single report-level anchor. PathFLIP instead establishes region-statement correspondence through a region Q-Former and a region-level contrastive objective with caption-swapped negatives, so it learns region-level alignment without any manual spatial annotation. This fine-grained supervision enables strong slide-level classification and retrieval performance and gives rise to an emergent visual grounding capability.
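
The region-level contrastive objective described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss over cosine similarities, and it takes a "caption-swapped negative" to mean pairing a region with a sub-caption that belongs to a different region, which may differ from the paper's definition. The function names, the temperature value, and the 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def region_contrastive_loss(region_embs, caption_embs, temperature=0.07):
    """InfoNCE-style loss: region i should match sub-caption i.

    Every other sub-caption j != i acts as a swapped (negative) caption
    for region i. Returns the mean cross-entropy over all regions.
    The temperature default is an assumed value, not taken from the paper.
    """
    losses = []
    for i, region in enumerate(region_embs):
        logits = [cosine(region, cap) / temperature for cap in caption_embs]
        # Numerically stable log-sum-exp over all candidate captions.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Negative log-probability of the matched caption.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: two regions and their matching sub-captions.
regions = [[1.0, 0.0], [0.0, 1.0]]
captions_aligned = [[0.9, 0.1], [0.1, 0.9]]
# Swap the captions so each region is paired with the wrong statement.
captions_swapped = [captions_aligned[1], captions_aligned[0]]

loss_aligned = region_contrastive_loss(regions, captions_aligned)
loss_swapped = region_contrastive_loss(regions, captions_swapped)
```

In this toy setup the loss is close to zero when each region is paired with its own sub-caption and large when the captions are swapped, which is the signal that pushes the model toward region-level alignment without spatial labels.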

	## Model Details

	- Base model: Qwen3-0.6B
	- Training data: [FGC-4K Dataset](https://huggingface.co/datasets/jshhhh/PathFLIP/)
	- Tasks: classification, image-text retrieval, visual grounding, visual question answering (VQA)
	- Languages: English
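
To fetch the released files locally, one option is `snapshot_download` from the `huggingface_hub` package (installed separately with `pip install huggingface_hub`). The sketch below assumes the model repository ID is `jshhhh/PathFLIP`, as suggested by this page, and `download_pathflip` is a hypothetical helper name. This card does not state which class loads the weights, so the sketch only downloads the files and does not instantiate the model.

```python
REPO_ID = "jshhhh/PathFLIP"  # assumed model repo ID, taken from this page


def download_pathflip(local_dir=None):
    """Download every file in the model repo and return the local folder path.

    With local_dir=None the files go to the default Hugging Face cache.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example usage (requires network access):
# weights_dir = download_pathflip()
```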

	## License

	This model is released under CC BY-NC 4.0: free for academic and research use, but not for commercial use or clinical deployment.