---
license: mit
language:
- en
base_model:
- facebook/dinov2-small
- emilyalsentzer/Bio_ClinicalBERT
pipeline_tag: zero-shot-image-classification
tags:
- medical
datasets:
- simwit/mimic-cxr
- danjacobellis/chexpert
- rajpurkarlab/ReXGradient-160K
- BahaaEldin0/NIH-Chest-Xray-14
- SampadKar/vindr-cxr
metrics:
- accuracy
- bleu
---
# CheXficient

[Paper](https://arxiv.org/abs/2602.22843) | [GitHub](https://github.com/cwangrun/CheXficient)

CheXficient is a vision-language foundation model for chest X-ray (CXR) interpretation, designed to improve both **data efficiency** and **computational efficiency** during pretraining.

Instead of scaling indiscriminately to ever-larger datasets, CheXficient adopts a principled data curation strategy that selectively prioritizes informative training samples.
This approach demonstrates that active, structured data selection can serve as a cost-effective alternative to brute-force dataset enlargement.

The model follows a dual-encoder architecture and supports prompt-based zero-shot classification via joint image-text representation learning.

---
## Model Overview

- **Architecture:** Vision-language dual encoder
- **Image Backbone:** DINOv2 (small)
- **Text Backbone:** Bio_ClinicalBERT
- **Input:** Chest X-ray image + text prompts
- **Output:** Image-text similarity logits and embeddings
- **Framework:** PyTorch + Hugging Face Transformers
- **Intended Use:** Research in medical AI and multimodal learning
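In CLIP-style dual encoders, the similarity logits are typically a temperature-scaled dot product of L2-normalized image and text embeddings. A minimal sketch in plain PyTorch, with an illustrative embedding size and temperature (not values taken from the CheXficient paper):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for the two encoder outputs: 1 image embedding, 2 text embeddings.
# The 512-dim projection size and the temperature are illustrative assumptions.
image_emb = torch.randn(1, 512)
text_emb = torch.randn(2, 512)

# L2-normalize so the dot product equals cosine similarity.
image_emb = F.normalize(image_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)

# Temperature-scaled similarity: one row per image, one column per prompt.
logit_scale = 100.0
logits_per_image = logit_scale * image_emb @ text_emb.t()
print(logits_per_image.shape)  # torch.Size([1, 2])
```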
---

## Installation

```bash
pip install torch torchvision transformers pillow
```
---

## Load the Model

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, AutoImageProcessor

repo_id = "StanfordAIMI/CheXficient"
device = "cuda" if torch.cuda.is_available() else "cpu"

# The model ships custom modeling code, so trust_remote_code=True is required.
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True
).to(device)

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)

# Switch to inference mode (disables dropout, etc.).
model.eval()
```
---

## Zero-Shot Classification Example

```python
# Load a chest X-ray and define the candidate class prompts.
image = Image.open("./CXR/images/5AF3BB6C1BCC83C.png").convert("RGB")
text = ["Pneumonia", "no Pneumonia"]

# Preprocess the image and tokenize the prompts.
image_inputs = image_processor(images=image, return_tensors="pt").to(device)
text_inputs = tokenizer(text, padding=True, return_tensors="pt").to(device)

# Forward pass: returns image-text similarity logits and embeddings.
with torch.no_grad():
    outputs = model(
        pixel_values=image_inputs["pixel_values"],
        text_tokens=text_inputs,
    )

print(outputs)
```
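The plain class names above can also be wrapped in natural-language prompt templates, a common practice in CLIP-style zero-shot classification. The template below is an illustrative assumption, not a phrasing prescribed by CheXficient:

```python
def build_prompts(labels, template="A chest X-ray showing {}."):
    """Wrap each class name in a natural-language prompt template.

    The template string is a hypothetical example; in practice it is
    worth tuning on a validation set for the model at hand.
    """
    return [template.format(label.lower()) for label in labels]

prompts = build_prompts(["Pneumonia", "no Pneumonia"])
print(prompts)  # ['A chest X-ray showing pneumonia.', 'A chest X-ray showing no pneumonia.']
```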
Optional probability conversion:

```python
import torch.nn.functional as F

# Softmax over the prompt axis turns similarity logits into class probabilities.
logits = outputs["logits_per_image"]
probs = F.softmax(logits, dim=-1)
print(probs)
```
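To map the probabilities back to a predicted label, take the argmax over the prompt axis. A self-contained sketch with dummy logits standing in for `outputs["logits_per_image"]`:

```python
import torch
import torch.nn.functional as F

# Dummy similarity logits: 1 image scored against 2 prompts.
logits = torch.tensor([[4.2, 1.3]])
labels = ["Pneumonia", "no Pneumonia"]

probs = F.softmax(logits, dim=-1)
pred = labels[probs.argmax(dim=-1).item()]
print(pred)  # Pneumonia
```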
---

## Citation

```bibtex
@article{chexficient2026,
  title={A data- and compute-efficient chest X-ray foundation model beyond aggressive scaling},
  author={...},
  journal={...},
  year={2026}
}
```