---
license: mit
language:
- en
base_model:
- facebook/dinov2-small
- emilyalsentzer/Bio_ClinicalBERT
pipeline_tag: zero-shot-image-classification
tags:
- medical
datasets:
- simwit/mimic-cxr
- danjacobellis/chexpert
- rajpurkarlab/ReXGradient-160K
- BahaaEldin0/NIH-Chest-Xray-14
- SampadKar/vindr-cxr
metrics:
- accuracy
- bleu
---

# CheXficient

[Paper](https://arxiv.org/abs/2602.22843) | [GitHub](https://github.com/cwangrun/CheXficient)

CheXficient is a vision-language foundation model for chest X-ray (CXR) interpretation, designed to improve both **data efficiency** and **computational efficiency** during pretraining.

Instead of scaling indiscriminately to ever-larger datasets, CheXficient adopts a principled data curation strategy to selectively prioritize informative training samples. This approach demonstrates that active, structured data selection can serve as a cost-effective alternative to brute-force dataset enlargement.

The model follows a dual-encoder architecture and supports prompt-based zero-shot classification via joint image-text representation learning.
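The prompt-based zero-shot classification described above can be sketched, framework-free, as follows. This is a minimal illustration assuming CLIP-style scoring: L2-normalized image and text embeddings, cosine similarity scaled by a temperature factor, and a softmax over the candidate prompts. The toy embeddings, the `logit_scale` value of 100, and the helper names `l2_normalize` and `zero_shot_probs` are hypothetical; they are not CheXficient's actual outputs or internals.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(
    image_emb: Sequence[float],
    prompt_embs: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # hypothetical temperature; the real value is model-specific
) -> List[float]:
    """Return one probability per prompt via scaled cosine similarity and softmax."""
    img = l2_normalize(image_emb)
    # Cosine similarity between the image and each prompt, scaled into logits.
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(p)))
        for p in prompt_embs
    ]
    # Numerically stable softmax over the prompt dimension.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-d embeddings standing in for real encoder outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "Pneumonia": [1.0, 0.0, 0.1],
    "no Pneumonia": [0.0, 1.0, 0.3],
}

probs = zero_shot_probs(image_embedding, list(prompt_embeddings.values()))
for label, p in zip(prompt_embeddings, probs):
    print(f"{label}: {p:.3f}")
```

Because each prompt is scored independently against the same image embedding, new labels can be added at inference time simply by writing new prompts, with no retraining.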
------------------------------------------------------------------------

## Model Overview

- **Architecture:** Vision-language dual encoder
- **Image Backbone:** DINOv2 (base)
- **Text Backbone:** BioClinicalBERT
- **Input:** Chest X-ray image + text prompts
- **Output:** Image-text similarity logits and embeddings
- **Framework:** PyTorch + Hugging Face Transformers
- **Intended Use:** Research in medical AI and multimodal learning

------------------------------------------------------------------------

## Installation

``` bash
pip install torch torchvision transformers pillow
```

------------------------------------------------------------------------

## Load the Model

``` python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

repo_id = "StanfordAIMI/CheXficient"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True lets Transformers run the custom model code hosted in the repository.
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)

# Switch to inference mode (e.g. disables dropout).
model.eval()
```

------------------------------------------------------------------------

## Zero-Shot Classification Example

``` python
# Load a chest X-ray and convert it to 3-channel RGB.
image = Image.open("./CXR/images/5AF3BB6C1BCC83C.png").convert("RGB")
# Candidate text prompts; the model scores the image against each one.
text = ["Pneumonia", "no Pneumonia"]

image_inputs = image_processor(images=image, return_tensors="pt").to(device)
text_inputs = tokenizer(text, padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(
        pixel_values=image_inputs["pixel_values"],
        text_tokens=text_inputs,
    )

print(outputs)
```

Optionally, convert the logits into probabilities over the text prompts:

``` python
import torch.nn.functional as F

# Image-text similarity logits (CLIP-style naming: one row per image, one column per prompt).
logits = outputs["logits_per_image"]
# Softmax over the last dimension yields a distribution over the candidate prompts.
probs = F.softmax(logits, dim=-1)
print(probs)
```

------------------------------------------------------------------------

## Citation

``` bibtex
@article{chexficient2024,
  title={A data- and compute-efficient chest X-ray foundation model beyond aggressive scaling},
  author={...},
  journal={...},
  year={2026}
}
```