cwangrun committed
Commit 83af716 · verified · 1 Parent(s): 0d50af0

Update README.md

Files changed (1)
  1. README.md +26 -16
README.md CHANGED
@@ -8,23 +8,39 @@ base_model:
 pipeline_tag: zero-shot-image-classification
 tags:
 - medical
+datasets:
+- simwit/mimic-cxr
+- danjacobellis/chexpert
+- rajpurkarlab/ReXGradient-160K
+- BahaaEldin0/NIH-Chest-Xray-14
+- SampadKar/vindr-cxr
+metrics:
+- accuracy
+- bleu
 ---
 # CheXficient
 
-CheXficient is a vision-language foundation model for chest X-ray (CXR) interpretation, developed to enhance both data- and computation-efficiency.
-It enables joint image-text representation learning and supports prompt-based zero-shot classification.
+[Paper](https://arxiv.org/abs/2602.22843) | [GitHub](https://github.com/cwangrun/CheXficient)
+
+CheXficient is a vision-language foundation model for chest X-ray (CXR) interpretation, designed to improve both **data efficiency** and **computational efficiency** during pretraining.
+
+Instead of scaling indiscriminately to ever-larger datasets, CheXficient adopts a principled data curation strategy that selectively prioritizes informative training samples.
+This approach demonstrates that active, structured data selection can serve as a cost-effective alternative to brute-force dataset enlargement.
+
+The model follows a dual-encoder architecture and supports prompt-based zero-shot classification via joint image-text representation learning.
 
-This repository provides a Hugging Face-compatible implementation for seamless integration into research workflows.
 
 ------------------------------------------------------------------------
 
 ## Model Overview
 
-- Architecture: Vision-Language dual encoder
-- Input: Chest X-ray image + text prompts
-- Output: Image-text similarity logits and embeddings
-- Framework: PyTorch + Hugging Face Transformers
-- Intended Use: Research in medical AI and multimodal learning
+- **Architecture:** Vision-language dual encoder
+- **Image Backbone:** DINOv2 (base)
+- **Text Backbone:** BioClinicalBERT
+- **Input:** Chest X-ray image + text prompts
+- **Output:** Image-text similarity logits and embeddings
+- **Framework:** PyTorch + Hugging Face Transformers
+- **Intended Use:** Research in medical AI and multimodal learning
 
 ------------------------------------------------------------------------
 
@@ -88,21 +104,15 @@ print(probs)
 ```
 
 
-## Intended Use
-
-- Zero-shot CXR findings classification
-- Prompt-based disease detection
-
-
 ------------------------------------------------------------------------
 
 ## Citation
 
 ``` bibtex
 @article{chexficient2024,
-title={CheXficient: Efficient Vision-Language Learning for Chest X-ray Understanding},
+title={A data- and compute-efficient chest X-ray foundation model beyond aggressive scaling},
 author={...},
 journal={...},
-year={2024}
+year={2026}
 }
 ```
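The usage snippet referenced in the second hunk ends at `print(probs)`; the code itself falls outside the diff. As a hedged sketch only (not the repository's actual API), CLIP-style dual-encoder zero-shot scoring of the kind the README describes amounts to: normalize the image embedding and one text embedding per candidate prompt, take scaled dot products as logits, and softmax over the prompts. All tensor names and the prompt list below are illustrative stand-ins.

```python
import torch
import torch.nn.functional as F

def zero_shot_probs(img_emb: torch.Tensor, txt_embs: torch.Tensor,
                    logit_scale: float = 100.0) -> torch.Tensor:
    """CLIP-style zero-shot scoring: cosine similarity between one image
    embedding and one text embedding per candidate prompt, softmaxed."""
    img = F.normalize(img_emb, dim=-1)     # (D,)
    txt = F.normalize(txt_embs, dim=-1)    # (num_prompts, D)
    logits = logit_scale * (txt @ img)     # (num_prompts,) similarity logits
    return logits.softmax(dim=-1)

# Illustrative stand-ins for encoder outputs; real embeddings would come
# from the model's image and text towers.
torch.manual_seed(0)
image_embedding = torch.randn(512)
prompt_embeddings = torch.randn(3, 512)  # e.g. "no finding", "cardiomegaly", "pleural effusion"

probs = zero_shot_probs(image_embedding, prompt_embeddings)
print(probs)  # one probability per prompt, summing to 1
```

The softmax over prompts is what turns raw image-text similarity logits into the per-finding probabilities the README's `print(probs)` line displays.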
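The commit's new README text describes the data curation strategy only at a high level, and the actual selection criterion is not shown in this diff. A minimal sketch of the general pattern it gestures at, under the assumption of a per-sample informativeness score (here an illustrative image-report alignment score, not CheXficient's criterion) followed by top-k selection:

```python
import torch
import torch.nn.functional as F

def select_informative(img_embs: torch.Tensor, txt_embs: torch.Tensor,
                       k: int) -> torch.Tensor:
    """Score each paired (image, report) sample and keep the k pairs the
    current model finds least aligned -- a common active-selection heuristic.
    Illustrative criterion only; CheXficient's actual rule is not in this diff."""
    img = F.normalize(img_embs, dim=-1)
    txt = F.normalize(txt_embs, dim=-1)
    alignment = (img * txt).sum(dim=-1)       # cosine similarity per pair, (N,)
    return torch.topk(-alignment, k).indices  # indices of the k least-aligned pairs

torch.manual_seed(0)
pool_img = torch.randn(1000, 512)   # hypothetical candidate pool
pool_txt = torch.randn(1000, 512)
chosen = select_informative(pool_img, pool_txt, k=100)
print(chosen.shape)  # torch.Size([100])
```

Selecting a fixed budget of informative pairs rather than training on the full pool is the cost-control lever the README contrasts with brute-force dataset enlargement.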