This repository hosts the pretrained vision-language backbone for CLEAR (*Concept-Level Embeddings for Auditable Radiology*). The checkpoint contains a contrastive image–text encoder that maps chest X-rays and radiological text into a shared 768-dimensional embedding space. Given a chest X-ray, the image encoder produces a feature vector whose cosine similarity with each of 368,294 text-encoded radiological observations yields a concept score vector. This concept score vector is then projected through LLM-derived semantic embeddings to produce the final CLEAR image embedding used for zero-shot classification, supervised linear probing, and concept bottleneck models. The concept bank, LLM embeddings, and downstream inference code are available in the [GitHub repository](https://github.com/peterhan91/CLEAR).
## Files in This Repository

| File | Description |
|------|-------------|
| `best_model.pt` | CLEAR vision-language backbone checkpoint (DINOv2 ViT-B/14 + text encoder) |
| `mimic_concepts.csv` | Full concept vocabulary (368,294 radiological observations extracted from reports) |
| `concept_embeddings_368294.pt` | Precomputed SFR-Embedding-Mistral embeddings for all 368,294 concepts (4096-dim) |

## Quick Start

```python
import torch
from PIL import Image
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize, InterpolationMode
from huggingface_hub import hf_hub_download

# Download the checkpoint
ckpt_path = hf_hub_download(repo_id="peterhan91/CLEAR", filename="best_model.pt")

# Clone the CLEAR repo for model code: git clone https://github.com/peterhan91/CLEAR
from clear.model import CLIP
from clear import tokenize
from examples.train import load_clip

# Load the CLEAR checkpoint (DINOv2 ViT-B/14 + text encoder)
model = load_clip(
    model_path=ckpt_path,
    use_dinov2=True,
    dinov2_model_name="dinov2_vitb14",
)
model.eval()

# Preprocess a chest X-ray
preprocess = Compose([
    Resize(448, interpolation=InterpolationMode.BICUBIC),
    CenterCrop(448),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
device = next(model.parameters()).device
image = preprocess(Image.open("path/to/cxr.jpg").convert("RGB")).unsqueeze(0).to(device)

# Encode image and text
with torch.no_grad():
    image_features = model.encode_image(image)
    text_tokens = tokenize(["pleural effusion", "no pleural effusion"]).to(device)
    text_features = model.encode_text(text_tokens)

# Cosine similarity → softmax probability
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
logits = (image_features @ text_features.T).softmax(dim=-1)

print(logits)  # [[prob_positive, prob_negative]]
```
The full CLEAR pipeline projects these concept similarity scores through LLM-derived semantic embeddings (SFR-Embedding-Mistral) for auditable zero-shot classification. See the [GitHub repository](https://github.com/peterhan91/CLEAR) for benchmarking, concept bottleneck models, and model auditing scripts.
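The projection step can be sketched in a few lines of tensor algebra. This is an illustrative sketch only, not the repository's API: the variable names, the lack of score normalization, and the toy sizes (the real concept bank is 368,294 concepts with 4096-dim LLM embeddings) are all assumptions for demonstration.

```python
import torch

# Toy sizes for illustration; real CLEAR uses 368,294 concepts x 4096-dim LLM embeddings.
n_concepts, llm_dim = 1000, 64

torch.manual_seed(0)
# Stand-ins for the Quick Start encoder outputs (L2-normalized 768-dim features)
image_feat = torch.randn(1, 768)
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
concept_feats = torch.randn(n_concepts, 768)  # text-encoded concept bank
concept_feats = concept_feats / concept_feats.norm(dim=-1, keepdim=True)
# Stand-in for the precomputed SFR-Embedding-Mistral concept embeddings
llm_emb = torch.randn(n_concepts, llm_dim)

# 1) Concept scores: cosine similarity of the image with every concept
scores = image_feat @ concept_feats.T          # shape (1, n_concepts)

# 2) Project the score vector through the LLM embedding matrix
clear_emb = scores @ llm_emb                   # shape (1, llm_dim)
print(clear_emb.shape)
```

In the released pipeline, the scores would come from the Quick Start encoders and the embedding matrix from `concept_embeddings_368294.pt`; see the GitHub repository for the exact projection used.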
## Checkpoint Details

| Attribute | Value |