lukeingawesome/TILA

Image Feature Extraction

medical-imaging

temporal-analysis

interval-change

vision-language


lukeingawesome committed 13 days ago

Commit 82069f1 · verified

1 Parent(s): 21c8bfc

Upload folder using huggingface_hub

Files changed (2)

README.md +0 -34
model.py +1 -5

README.md CHANGED

@@ -120,40 +120,6 @@ python inference.py \
     --previous_image /path/to/previous.png
 ```
-## Model Architecture
-```
-IMAGE ENCODER:
-  Input: current CXR [B, 3, 448, 448] + previous CXR [B, 3, 448, 448]
-    |
-    +-- ResNet-50 backbone (shared weights, processes both images)
-    |     -> patch features [B, 2048, 14, 14]
-    |
-    +-- 1x1 Conv projection (2048 -> 256)
-    |
-    +-- Vision Transformer Pooler (3 blocks, 8 heads)
-    |     -> temporal difference features [B, 256, 14, 14]
-    |
-    +-- Concatenate [static, temporal] -> [B, 512, 14, 14]
-    |
-    +-- MLP Projector (512 -> 128)
-          -> image embedding [B, 128]            <-- get_embeddings()
-TEXT ENCODER:
-  Input: tokenized text
-    |
-    +-- CXR-BERT (12 layers, 768-dim)
-    |     -> CLS token [B, 768]
-    |
-    +-- LayerNorm + Linear (768 -> 128)
-          -> text embedding [B, 128]             <-- encode_text()
-CLASSIFIER:
-  image embedding [B, 128]
-    |
-    +-- Linear (128 -> 64) -> ReLU -> Linear (64 -> 1)
-          -> change probability [B]              <-- get_interval_change_prediction()
-```
 ## Preprocessing Raw Images
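
The removed "Model Architecture" section lists tensor shapes for the image path. As a reading aid, the sketch below traces those shapes in plain Python. It is not code from the repository: the helper names `backbone_shape` and `trace_image_path` are hypothetical, the backbone is assumed to be a standard ResNet-50 with a total stride of 32 (which gives 448 / 32 = 14), and the README does not say how the `[B, 512, 14, 14]` map is reduced to `[B, 128]`, so that step is recorded only as an output shape.

```python
# Shape walk-through of the image path described in the removed README section.
# Assumption: helper names are hypothetical; ResNet-50 total stride is 32.

def backbone_shape(batch, height, width, channels=2048, stride=32):
    """Feature-map shape after a stride-32 CNN backbone."""
    return (batch, channels, height // stride, width // stride)


def trace_image_path(batch=2, size=448):
    """Return the shape at each stage named in the removed diagram."""
    shapes = {}
    shapes["patch"] = backbone_shape(batch, size, size)   # ResNet-50 features
    b, _, h, w = shapes["patch"]
    shapes["projected"] = (b, 256, h, w)      # 1x1 conv projection, 2048 -> 256
    shapes["temporal"] = (b, 256, h, w)       # Vision Transformer Pooler output
    shapes["concat"] = (b, 256 + 256, h, w)   # [static, temporal] on the channel axis
    shapes["embedding"] = (b, 128)            # MLP projector, 512 -> 128 (pooling unspecified)
    shapes["change_prob"] = (b,)              # Linear 128 -> 64 -> ReLU -> Linear 64 -> 1
    return shapes


if __name__ == "__main__":
    for stage, shape in trace_image_path().items():
        print(f"{stage:12s} {shape}")
```

The same trace applies to both the current and the previous image, since the diagram states that the backbone weights are shared.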

model.py CHANGED

@@ -481,11 +481,7 @@ class TILAModel(_BASE_CLASS):
         tokens = self._tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
         tokens = {k: v.to(device) for k, v in tokens.items()}
         self.eval()
-        # Run text encoder in float32 for numerical stability
-        with torch.autocast(device_type=device.type if isinstance(device, torch.device) else "cuda", enabled=False):
-            self.text_encoder.float()
-            emb = self.text_encoder(tokens)
-            self.text_encoder.to(next(self.image_encoder.parameters()).dtype)
+        emb = self.text_encoder(tokens)
         return F.normalize(emb.float(), p=2, dim=1)
     @torch.no_grad()
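
After this change, the text encoder runs in whatever dtype and autocast context is active, and only its output is cast to float32 before `F.normalize(emb.float(), p=2, dim=1)`. As a reminder of what that final call computes, here is a minimal pure-Python sketch of row-wise L2 normalization. It assumes the documented PyTorch behaviour of dividing each row by `max(norm, eps)` with the default `eps=1e-12`; the function names `l2_normalize` and `cosine` are hypothetical and are not part of `model.py`.

```python
import math


def l2_normalize(rows, eps=1e-12):
    """Scale each row to unit L2 norm, mirroring F.normalize(x, p=2, dim=1)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        denom = max(norm, eps)  # eps guards against division by zero
        out.append([x / denom for x in row])
    return out


def cosine(u, v):
    """Dot product; equals cosine similarity when u and v are unit-norm."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    embeddings = l2_normalize([[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]])
    print(embeddings)
    print(cosine(embeddings[0], embeddings[1]))
```

Because the outputs are unit-norm, a plain dot product between two text embeddings (or between a text and an image embedding) can be read as a cosine similarity.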