peterhan91 committed (verified) · Commit 0cd3b3f · Parent(s): 3279277

Upload README.md with huggingface_hub

Files changed (1): README.md +58 -0

  This repository hosts the pretrained vision-language backbone for CLEAR (*Concept-Level Embeddings for Auditable Radiology*). The checkpoint contains a contrastive image–text encoder that maps chest X-rays and radiological text into a shared 768-dimensional embedding space. Given a chest X-ray, the image encoder produces a feature vector whose cosine similarity with each of 368,294 text-encoded radiological observations yields a concept score vector. This concept score vector is then projected through LLM-derived semantic embeddings to produce the final CLEAR image embedding used for zero-shot classification, supervised linear probing, and concept bottleneck models. The concept bank, LLM embeddings, and downstream inference code are available in the [GitHub repository](https://github.com/peterhan91/CLEAR).
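The scoring step described above can be sketched at a shape level. This is a minimal illustration only, with random tensors and a shrunken concept bank standing in for the real image and text encoders (the released bank has 368,294 concepts; the encoder code lives in the GitHub repository):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the real encoder outputs: one 768-dim image feature
# and a bank of 1,000 concept text embeddings (real bank: 368,294).
image_feat = torch.randn(1, 768)
concept_feats = torch.randn(1000, 768)

# Cosine similarity of the image against every concept -> concept score vector
scores = F.cosine_similarity(image_feat, concept_feats, dim=-1)
print(scores.shape)  # torch.Size([1000])
```

Each entry of `scores` is the cosine similarity between the image and one text-encoded observation, which is what the projection step then consumes.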
## Files in This Repository

| File | Description |
|------|-------------|
| `best_model.pt` | CLEAR vision-language backbone checkpoint (DINOv2 ViT-B/14 + text encoder) |
| `mimic_concepts.csv` | Full concept vocabulary (368,294 radiological observations extracted from reports) |
| `concept_embeddings_368294.pt` | Precomputed SFR-Embedding-Mistral embeddings for all 368,294 concepts (4096-dim) |

## Quick Start

```python
import torch
from PIL import Image
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, InterpolationMode
from huggingface_hub import hf_hub_download

# Download the checkpoint
ckpt_path = hf_hub_download(repo_id="peterhan91/CLEAR", filename="best_model.pt")

# Clone the CLEAR repo for model code: git clone https://github.com/peterhan91/CLEAR
from clear.model import CLIP
from clear import tokenize
from examples.train import load_clip

# Load the CLEAR checkpoint (DINOv2 ViT-B/14 + text encoder)
model = load_clip(
    model_path=ckpt_path,
    use_dinov2=True,
    dinov2_model_name="dinov2_vitb14",
)
model.eval()

# Preprocess a chest X-ray
preprocess = Compose([
    Resize(448, interpolation=InterpolationMode.BICUBIC),
    CenterCrop(448),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
device = next(model.parameters()).device
image = preprocess(Image.open("path/to/cxr.jpg").convert("RGB")).unsqueeze(0).to(device)

# Encode image and text
with torch.no_grad():
    image_features = model.encode_image(image)
    text_tokens = tokenize(["pleural effusion", "no pleural effusion"]).to(device)
    text_features = model.encode_text(text_tokens)

# Cosine similarity -> softmax probability
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
logits = (image_features @ text_features.T).softmax(dim=-1)

print(logits)  # [[prob_positive, prob_negative]]
```

The full CLEAR pipeline projects these concept similarity scores through LLM-derived semantic embeddings (SFR-Embedding-Mistral) for auditable zero-shot classification. See the [GitHub repository](https://github.com/peterhan91/CLEAR) for benchmarking, concept bottleneck models, and model auditing scripts.
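At a shape level, that projection is a matrix product between concept score vectors and the precomputed LLM embedding matrix. The sketch below uses toy dimensions and random tensors; the exact projection (including any normalization) is defined in the GitHub repository, and the released assets use 368,294 concepts with 4096-dim SFR-Embedding-Mistral vectors:

```python
import torch

# Toy dimensions standing in for the real assets
# (real: 368,294 concepts, 4096-dim LLM embeddings).
num_concepts, llm_dim = 1000, 64
concept_scores = torch.randn(2, num_concepts)            # scores for 2 images
concept_embeddings = torch.randn(num_concepts, llm_dim)  # LLM semantic embeddings

# Project concept scores through the LLM embedding matrix
clear_embeddings = concept_scores @ concept_embeddings
print(clear_embeddings.shape)  # torch.Size([2, 64])
```

Each row of `clear_embeddings` is a CLEAR image embedding expressed in the LLM semantic space, which is what downstream zero-shot classification and linear probing operate on.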

## Checkpoint Details

| Attribute | Value |