peterhan91 committed on
Commit d48cf41 · verified · 1 Parent(s): b352ee7

Upload folder using huggingface_hub

Files changed (2):
  1. README.md +85 -3
  2. best_model.pt +3 -0
README.md CHANGED
@@ -1,3 +1,85 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
language:
- en
tags:
- medical-imaging
- chest-x-ray
- radiology
- clip
- dinov2
- vision-language-model
- contrastive-learning
datasets:
- mimic-cxr
- chexpert-plus
- rexgradient
pipeline_tag: zero-shot-image-classification
---
+
20
+ # CLEAR β€” Vision-Language Backbone
21
+
22
+ This repository hosts the pretrained **vision-language backbone** for CLEAR (*Concept-Level Embeddings for Auditable Radiology*). The checkpoint contains the contrastive image–text encoder that maps chest X-rays and radiological text into a shared embedding space. It is used to compute concept similarity scores over 368,294 radiological observations, which form the input to CLEAR's downstream auditable inference pipeline.
23
+
24
+ For the full CLEAR pipeline (concept bank, LLM embeddings, zero-shot inference, benchmarking, and concept bottleneck experiments), see the **[GitHub repository](https://github.com/peterhan91/CLEAR)**.
25
+
26
+ ## Checkpoint Details
27
+
28
+ | Attribute | Value |
29
+ |-----------|-------|
30
+ | **Image Encoder** | DINOv2 ViT-B/14 with registers |
31
+ | **Text Encoder** | 12-layer Transformer (GPT-2 style, 512-dim, 8 heads) |
32
+ | **Embedding Dimension** | 768 |
33
+ | **Input Resolution** | 448 x 448 pixels |
34
+ | **File** | `best_model.pt` |
35
+
36
+ ## Training Data
37
+
38
+ The model was trained on **873,342 CXR–report pairs** (239,091 patients) using symmetric contrastive learning (InfoNCE):
39
+
40
+ - MIMIC-CXR (377,110 pairs)
41
+ - CheXpert-Plus (223,228 pairs)
42
+ - ReXGradient (273,004 pairs)
43
+
44
+ Training details: 3 x NVIDIA A6000, effective batch size 768, AdamW (lr=1e-4, weight decay 0.2), cosine annealing with 500 warmup steps, 40 epochs.
45
+
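The symmetric InfoNCE objective can be sketched as follows. This is a numpy toy, not the training code: matched image/text pairs sit on the diagonal of the similarity matrix, and the loss averages the image-to-text and text-to-image cross-entropies. The temperature and batch size here are illustrative, not the checkpoint's hyperparameters.

```python
import numpy as np

def symmetric_info_nce(img_emb: np.ndarray, txt_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B); matched pairs on the diagonal
    labels = np.arange(len(logits))

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 768))
loss = symmetric_info_nce(a, a)  # identical embeddings -> near-minimal loss
```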
## How This Checkpoint Fits into CLEAR

```
CXR Image
    │
    ▼
┌─────────────────────────┐
│ Image Encoder (DINOv2)  │ ◄── this checkpoint
└────────────┬────────────┘
             │ image features
             ▼
cosine similarity with
368,294 concept text embeddings ◄── this checkpoint (text encoder)
             │
             ▼
concept score vector s ∈ R^368,294
             │
             ▼
projection via LLM embeddings ◄── concept bank (GitHub repo assets/)
             │
             ▼
CLEAR image embedding
             │
             ▼
zero-shot / linear probe / CBM ◄── downstream tasks (GitHub repo)
```

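The flow above can be sketched numerically. Random placeholder matrices stand in for the real encoders and concept bank, the concept count is shrunk from 368,294, and the LLM embedding width (4096) is a hypothetical value; only the 768-dim shared embedding matches the card.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shrunk/hypothetical sizes: the real bank has 368,294 concepts,
# and the LLM embedding width is not stated on this card.
n_concepts, emb_dim, llm_dim = 1_000, 768, 4096

def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder stand-ins for the real model outputs / concept bank
image_feature = l2n(rng.normal(size=emb_dim))                    # image encoder output
concept_text_emb = l2n(rng.normal(size=(n_concepts, emb_dim)))   # text encoder outputs
llm_concept_emb = rng.normal(size=(n_concepts, llm_dim))         # concept bank (assets/)

s = concept_text_emb @ image_feature   # concept score vector: cosine similarities
clear_embedding = s @ llm_concept_emb  # projection via LLM embeddings
```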
73
+ ## Citation
74
+
75
+ ```bibtex
76
+ @article{han2025clear,
77
+ title={CLEAR: An Auditable Foundation Model for Radiology Grounded in Clinical Concepts},
78
+ author={Han, Tianyu and Wu, Riga and Tian, Yu and Khader, Firas and Adams, Lisa C. and Bressem, Keno K. and Davatzikos, Christos and Kather, Jakob Nikolas and Shen, Li and Mankoff, David A. and Barbosa Jr, Eduardo Mortani and Truhn, Daniel},
79
+ year={2025}
80
+ }
81
+ ```
82
+
83
+ ## License
84
+
85
+ Apache-2.0
best_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da1617a7d47970667a7a158eabfa4c4f3798d5d03b7f98caec150d8ee6ab6a03
+ size 603060057