---
license: cc-by-nc-nd-4.0
language:
  - en
tags:
  - Pathology
  - arxiv:2505.11404
extra_gated_prompt: >-
  The Patho-CLIP-L model and its associated materials are released under the
  CC-BY-NC-ND 4.0 license. Access is restricted to non-commercial, academic
  research purposes only, with proper citation required. Any commercial usage,
  redistribution, or derivative work (including training models based on this
  model or generating datasets from its outputs) is strictly prohibited without
  prior written approval.

  Users must register with an official institutional email address (generic
  domains such as @gmail, @qq, @hotmail, etc. will not be accepted). By
  requesting access, you confirm that your information is accurate and current,
  and that you agree to comply with all terms listed herein. If other members
  of your organization wish to use the model, they must register independently
  and agree to the same terms.
extra_gated_fields:
  Full name (first and last): text
  Institutional affiliation (no abbreviations): text
  Role/Position:
    type: select
    options:
      - Faculty/Principal Investigator
      - PhD Student
      - Postdoctoral Researcher
      - Research Staff
      - Other
  Official institutional email (**must match your Hugging Face primary email; generic domains will be denied**): text
  Intended research use (be specific): text
  I agree to use this model only for non-commercial academic purposes: checkbox
  I agree not to redistribute this model or share it outside of my individual usage: checkbox
  I confirm that all submitted information is accurate and up to date: checkbox
---

[arXiv](https://arxiv.org/abs/2505.11404) | [GitHub Repo] | [Cite]

## Introduction📝

To bridge the gap between fine-grained tissue morphology and clinical semantic understanding in pathology, we present Patho-CLIP-L, a vision-language model tailored for high-resolution cross-modal representation learning in pathological diagnosis.

Patho-CLIP-L is built on the OpenAI CLIP-L architecture and trained through a two-stage progressive paradigm:

- **Stage I:** Contrastive pretraining on PathGen-1.6M, focusing on cell morphology and tissue organization to embed high-resolution visual priors.

- **Stage II:** Joint training on a 3.5M composite corpus comprising PathGen-1.6M, Quilt-1M, PathCap, and a textbook-derived dataset, to integrate domain-specific semantics with morphological features.
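
For reference, contrastive pretraining of this kind typically optimizes the standard CLIP symmetric InfoNCE objective. The sketch below shows that generic loss in PyTorch; it is an illustration of the technique, not the authors' released training code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Both feature tensors are (batch, dim) and L2-normalized;
    logit_scale is the learned temperature scalar from CLIP.
    """
    logits = logit_scale * image_features @ text_features.T  # (batch, batch)
    targets = torch.arange(logits.shape[0], device=logits.device)
    # Row i's positive is column i: each image matches its own caption.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```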

This strategy enables Patho-CLIP-L to achieve strong performance in semantic alignment, cross-modal retrieval, and tissue-level discrimination, offering a robust foundation for downstream pathology tasks.
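
Because training was done with OpenCLIP (see Acknowledgements), the released weights should load through the standard OpenCLIP API. Below is a minimal zero-shot classification sketch; the `ViT-L-14` model tag, the checkpoint file name `patho-clip-l.pt`, the image path, and the text prompts are illustrative assumptions, not a confirmed interface.

```python
import torch
import open_clip
from PIL import Image

# Load the checkpoint with OpenCLIP. "ViT-L-14" and the local file name
# "patho-clip-l.pt" are placeholders; substitute the actual released weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="patho-clip-l.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("patch.png")).unsqueeze(0)  # one tissue tile
text = tokenizer(["an H&E image of adenocarcinoma",
                  "an H&E image of normal mucosa"])       # example prompts

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Zero-shot probabilities over the candidate text prompts
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```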

## Acknowledgements🎖

We gratefully acknowledge the OpenCLIP project for providing an efficient and extensible implementation of CLIP models. Its flexible training pipeline, broad model support, and strong community contributions greatly facilitated the development and training of Patho-CLIP-L.

We thank the authors and maintainers for their excellent work.

## Citation❤️

If you find our work helpful, a citation would be greatly appreciated:

```bibtex
@article{zhang2025patho,
  title={Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner},
  author={Zhang, Wenchuan and Zhang, Penghao and Guo, Jingru and Cheng, Tao and Chen, Jie and Zhang, Shuwan and Zhang, Zhang and Yi, Yuhao and Bu, Hong},
  journal={arXiv preprint arXiv:2505.11404},
  year={2025}
}
```