jshhhh/PathFLIP

jshhhh committed on May 21

Commit 15a2001 (verified), parent: bbc2b79

Update README.md

Files changed (1)

README.md +29 -0

@@ -1,3 +1,32 @@
 ---
 license: cc-by-nc-4.0
+language:
+- en
+pipeline_tag: image-to-text
+tags:
+- medical
+- pathology
+- vision-language
+- contrastive-learning
+- fine-grained
+- multimodal
+library_name: transformers
 ---
+# PathFLIP
+Model weights for the paper *PathFLIP: Fine-Grained Language-Image Pretraining for Versatile Pathology Image Understanding*.
+## Overview
+PathFLIP is a pathology vision-language model that aligns fine-grained morphological sub-captions with their corresponding regions in whole-slide images (WSIs). Unlike prior pathology VLMs, which pair an entire slide with a single report-level anchor, PathFLIP models region-statement correspondence through a region Q-Former and a region-level contrastive objective with caption-swapped negatives, learning region-level alignment without any manual spatial annotation. This fine-grained supervision yields strong slide-level classification and retrieval performance and gives rise to an emergent visual grounding capability.
+## Model Details
+- **Base model**: *Qwen3-0.6B*
+- **Training data**: [FGC-4K Dataset](https://huggingface.co/datasets/jshhhh/PathFLIP/)
+- **Tasks**: classification, image-text retrieval, visual grounding, visual question answering (VQA)
+- **Languages**: English
+## License
+This model is released under CC BY-NC 4.0. It is free for academic and research use, but **not for commercial use or clinical deployment**.
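The model card names a region-level contrastive objective with caption-swapped negatives but does not spell it out. As a rough illustration only, the Python sketch below computes a standard InfoNCE loss in the region-to-caption direction: each region embedding should score highest against its own sub-caption, while the other in-batch captions plus a set of extra hard-negative captions act as negatives. All function names, the temperature of 0.07, and the assumption that "caption-swapped negatives" are simply additional negative caption embeddings per region are hypothetical and not taken from the PathFLIP code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: List[float], index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at `index`."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum_exp


def region_caption_infonce(
    region_embs: List[Vector],
    caption_embs: List[Vector],
    swapped_caption_embs: List[List[Vector]],
    temperature: float = 0.07,
) -> float:
    """Hypothetical region-to-caption InfoNCE loss (one direction only).

    region_embs[i] is paired with caption_embs[i], its positive sub-caption.
    All other in-batch captions act as negatives, and swapped_caption_embs[i]
    holds extra hard-negative caption embeddings for region i. Treating
    "caption-swapped negatives" this way is an assumption of this sketch.
    """
    regions = [l2_normalize(r) for r in region_embs]
    captions = [l2_normalize(c) for c in caption_embs]
    total = 0.0
    for i, region in enumerate(regions):
        # Candidate captions: the whole batch (positive at index i) plus hard negatives.
        candidates = captions + [l2_normalize(c) for c in swapped_caption_embs[i]]
        logits = [dot(region, c) / temperature for c in candidates]
        total -= log_softmax_at(logits, i)  # cross-entropy with target index i
    return total / len(regions)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[[0.0, 1.0]], [[1.0, 0.0]]]  # one hypothetical hard negative per region
    print(region_caption_infonce(regions, captions, swapped))        # small: pairs match
    print(region_caption_infonce(regions, captions[::-1], swapped))  # large: pairs swapped
```

CLIP-style training usually adds the symmetric caption-to-region term and averages the two directions; it is omitted here to keep the sketch short.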