---
title: "ViT Base Patch16 384 – GI Endoscopy Classifier"
tags:
- vision-transformer
- vit
- image-classification
- medical-imaging
- gastrointestinal
- pytorch
- timm
library_name: timm
license: other
language: en
pipeline_tag: image-classification
---

## ViT Base (384px) for GI Endoscopy

A 23-class gastrointestinal endoscopy classifier built on ViT Base Patch16 384, trained with timm using strong augmentation (MixUp + Albumentations) and evaluated with test-time augmentation (TTA). Includes traced TorchScript weights (`vit_best_traced.pt`) for drop-in PyTorch inference.

### Highlights

- Validation: **92.18%** accuracy (best at epoch 21); Test (TTA): **93.25% Acc / 92.19% Prec / 93.25% Rec / 92.59% F1**
- Robust pipeline: MixUp, CoarseDropout, and TTA for generalization
- Ready to serve: TorchScript traced weights; standard 384x384 ImageNet normalization
- Scope: 23 GI classes; input is 384x384 RGB

### Performance

| Split | Accuracy | Precision | Recall | F1 | Notes |
| --- | --- | --- | --- | --- | --- |
| Validation (best) | 92.18% | – | – | – | Best at epoch 21 |
| Test (TTA) | 93.25% | 92.19% | 93.25% | 92.59% | 800 batches, TTA enabled |

### Model Card

- Backbone: `vit_base_patch16_384` (≈86.1M params)
- Classes: 23
- Input: 384x384 RGB; mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
- Training recipe:
  - Epochs: 25; batch size: 2
  - LR: 1e-5 base with a warmup schedule
  - Augmentation: MixUp; Albumentations with CoarseDropout
  - Evaluation: TTA on validation/test; checkpoint saved each epoch (best at epoch 21)
- Data scale (from logs): 7,463 train / 1,599 val / 1,600 test images across 23 classes

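MixUp, used throughout the recipe above, blends each batch with a shuffled copy of itself and mixes the one-hot targets by the same weight. A minimal PyTorch sketch (illustrative only — the `mixup` helper and `alpha=0.2` are assumptions, not the exact trainer code):

```python
import torch

def mixup(images, labels, num_classes, alpha=0.2):
    """Blend a batch with a shuffled copy of itself; mix targets by the same lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    targets = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed, targets

# toy batch at the model's input size (4 images, 23 classes)
x = torch.randn(4, 3, 384, 384)
y = torch.tensor([0, 5, 12, 22])
mx, ty = mixup(x, y, num_classes=23)
print(mx.shape, ty.shape)  # torch.Size([4, 3, 384, 384]) torch.Size([4, 23])
```

Soft targets like `ty` pair with a soft-label cross-entropy loss rather than hard class indices.
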
### Inference (PyTorch TorchScript)

```python
import torch
from PIL import Image
from torchvision import transforms

model = torch.jit.load("vit_best_traced.pt")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("path/to/image.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # add batch dimension -> (1, 3, 384, 384)
with torch.no_grad():
    probs = model(tensor).softmax(dim=1)
    pred = probs.argmax(dim=1).item()

print("Predicted class index:", pred)
```

### Training Log (key checkpoints)

- Epoch 8: Val Acc 90.56%
- Epoch 12: Val Acc 91.62%
- Epoch 16: Val Acc 91.74%
- Epoch 18: Val Acc 92.06%
- Epoch 20: Val Acc 92.12%
- Epoch 21: **Val Acc 92.18%** (best)
- Test (TTA): Acc 93.25%, Precision 92.19%, Recall 93.25%, F1 92.59%

### Files

- `vit_best_traced.pt` — TorchScript traced weights for ViT Base Patch16 384 (best checkpoint)

### Notes & Responsible Use

- Trainer: custom `AdvancedMemoryEfficientTrainer` with MixUp and Albumentations-based TTA.
- Warnings during training: Albumentations recommends `Affine` over `ShiftScaleRotate`; CoarseDropout arguments were adjusted accordingly. Metrics were unaffected.
- Medical context: research use only; not a regulated medical device. Validate clinically and keep a human in the loop.

### Reproduce / Fine-tune Tips

- Keep preprocessing (384x384 resize + ImageNet normalization) consistent to match the reported metrics.
- Start from `vit_base_patch16_384` in timm; use MixUp and similar augmentations.
- Enable TTA for best accuracy; disable it for faster inference.

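The exact TTA transforms used here are not documented; a common minimal form averages probabilities over the original and horizontally flipped views. A hedged sketch (the `tta_predict` helper and the tiny stand-in model are illustrative assumptions, not this repository's implementation):

```python
import torch

def tta_predict(model, batch):
    """Average softmax probabilities over the original and horizontally flipped views."""
    with torch.no_grad():
        p = model(batch).softmax(dim=1)
        p_flip = model(torch.flip(batch, dims=[3])).softmax(dim=1)  # flip along width
    return (p + p_flip) / 2

# works with any classifier; a tiny linear stand-in keeps the sketch runnable
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 384 * 384, 23))
model.eval()
batch = torch.randn(2, 3, 384, 384)
probs = tta_predict(model, batch)
print(probs.shape)  # torch.Size([2, 23])
```

Replace the stand-in with the traced model (`torch.jit.load("vit_best_traced.pt")`) for real inference; skip the flipped pass when latency matters.
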
### Changelog

- 2025-12-29: Initial public release with traced weights and metrics.