peterhan91 committed on
Commit d48cf41 · verified · 1 Parent(s): b352ee7

Upload folder using huggingface_hub

Files changed (2):
  1. README.md +85 -3
  2. best_model.pt +3 -0
README.md CHANGED
@@ -1,3 +1,85 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
language:
- en
tags:
- medical-imaging
- chest-x-ray
- radiology
- clip
- dinov2
- vision-language-model
- contrastive-learning
datasets:
- mimic-cxr
- chexpert-plus
- rexgradient
pipeline_tag: zero-shot-image-classification
---
+
20
+ # CLEAR β€” Vision-Language Backbone
21
+
22
+ This repository hosts the pretrained **vision-language backbone** for CLEAR (*Concept-Level Embeddings for Auditable Radiology*). The checkpoint contains the contrastive image–text encoder that maps chest X-rays and radiological text into a shared embedding space. It is used to compute concept similarity scores over 368,294 radiological observations, which form the input to CLEAR's downstream auditable inference pipeline.
23
+
24
+ For the full CLEAR pipeline (concept bank, LLM embeddings, zero-shot inference, benchmarking, and concept bottleneck experiments), see the **[GitHub repository](https://github.com/peterhan91/CLEAR)**.
25
+
26
+ ## Checkpoint Details
27
+
28
+ | Attribute | Value |
29
+ |-----------|-------|
30
+ | **Image Encoder** | DINOv2 ViT-B/14 with registers |
31
+ | **Text Encoder** | 12-layer Transformer (GPT-2 style, 512-dim, 8 heads) |
32
+ | **Embedding Dimension** | 768 |
33
+ | **Input Resolution** | 448 x 448 pixels |
34
+ | **File** | `best_model.pt` |
35
+
36
+ ## Training Data
37
+
38
+ The model was trained on **873,342 CXR–report pairs** (239,091 patients) using symmetric contrastive learning (InfoNCE):
39
+
40
+ - MIMIC-CXR (377,110 pairs)
41
+ - CheXpert-Plus (223,228 pairs)
42
+ - ReXGradient (273,004 pairs)
43
+
44
+ Training details: 3 x NVIDIA A6000, effective batch size 768, AdamW (lr=1e-4, weight decay 0.2), cosine annealing with 500 warmup steps, 40 epochs.
45
+
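The symmetric InfoNCE objective can be sketched as follows. This is a numpy toy, not the training code: matched image/text pairs sit on the diagonal of the similarity matrix, and the loss averages the image-to-text and text-to-image cross-entropies. The temperature and batch size here are illustrative, not the checkpoint's hyperparameters.

```python
import numpy as np

def symmetric_info_nce(img_emb: np.ndarray, txt_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (B, B); matched pairs on the diagonal
    labels = np.arange(len(logits))

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 768))
loss = symmetric_info_nce(a, a)  # identical embeddings -> near-minimal loss
```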
## How This Checkpoint Fits into CLEAR

```
CXR Image
    │
    ▼
┌─────────────────────────┐
│ Image Encoder (DINOv2)  │ ◄── this checkpoint
└────────────┬────────────┘
             │ image features
             ▼
cosine similarity with
368,294 concept text embeddings ◄── this checkpoint (text encoder)
             │
             ▼
concept score vector s ∈ R^368,294
             │
             ▼
projection via LLM embeddings ◄── concept bank (GitHub repo assets/)
             │
             ▼
CLEAR image embedding
             │
             ▼
zero-shot / linear probe / CBM ◄── downstream tasks (GitHub repo)
```

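The flow above can be sketched numerically. Random placeholder matrices stand in for the real encoders and concept bank, the concept count is shrunk from 368,294, and the LLM embedding width (4096) is a hypothetical value; only the 768-dim shared embedding matches the card.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shrunk/hypothetical sizes: the real bank has 368,294 concepts,
# and the LLM embedding width is not stated on this card.
n_concepts, emb_dim, llm_dim = 1_000, 768, 4096

def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder stand-ins for the real model outputs / concept bank
image_feature = l2n(rng.normal(size=emb_dim))                    # image encoder output
concept_text_emb = l2n(rng.normal(size=(n_concepts, emb_dim)))   # text encoder outputs
llm_concept_emb = rng.normal(size=(n_concepts, llm_dim))         # concept bank (assets/)

s = concept_text_emb @ image_feature   # concept score vector: cosine similarities
clear_embedding = s @ llm_concept_emb  # projection via LLM embeddings
```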
73
+ ## Citation
74
+
75
+ ```bibtex
76
+ @article{han2025clear,
77
+ title={CLEAR: An Auditable Foundation Model for Radiology Grounded in Clinical Concepts},
78
+ author={Han, Tianyu and Wu, Riga and Tian, Yu and Khader, Firas and Adams, Lisa C. and Bressem, Keno K. and Davatzikos, Christos and Kather, Jakob Nikolas and Shen, Li and Mankoff, David A. and Barbosa Jr, Eduardo Mortani and Truhn, Daniel},
79
+ year={2025}
80
+ }
81
+ ```
82
+
83
+ ## License
84
+
85
+ Apache-2.0
best_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da1617a7d47970667a7a158eabfa4c4f3798d5d03b7f98caec150d8ee6ab6a03
+ size 603060057