ayanahmedkhan committed (verified) · commit 1e6b95a · parent 587d6e5

Upload readme.md with huggingface_hub
---
title: "ViT Base Patch16 384 – GI Endoscopy Classifier"
tags:
- vision-transformer
- vit
- image-classification
- medical-imaging
- gastrointestinal
- pytorch
- timm
library_name: timm
license: other
language: en
pipeline_tag: image-classification
---

## ViT Base (384px) for GI Endoscopy

A 23-class gastrointestinal endoscopy classifier built on ViT Base Patch16 384, trained with timm and strong augmentation (MixUp + Albumentations), and evaluated with test-time augmentation (TTA). Traced TorchScript weights (`vit_best_traced.pt`) are included for drop-in PyTorch inference.

### Highlights
- Validation accuracy: **92.18%** (best, epoch 21); test with TTA: **93.25% accuracy / 92.19% precision / 93.25% recall / 92.59% F1**
- Robust pipeline: MixUp, CoarseDropout, and TTA for better generalization
- Ready to serve: TorchScript traced weights; standard 384×384 ImageNet normalization
- Scope: 23 GI classes; RGB 384×384 input

### Performance

| Split | Accuracy | Precision | Recall | F1 | Notes |
| --- | --- | --- | --- | --- | --- |
| Validation (best) | 92.18% | – | – | – | Best at epoch 21 |
| Test (TTA) | 93.25% | 92.19% | 93.25% | 92.59% | 800 batches, TTA enabled |

### Model Card
- Backbone: `vit_base_patch16_384` (≈86.1M params)
- Classes: 23
- Input: 384×384 RGB, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
- Training recipe:
  - Epochs: 25; batch size: 2
  - LR: 1e-5 base with a warmup schedule
  - Augmentation: MixUp; Albumentations with CoarseDropout
  - Evaluation: TTA on validation/test; best checkpoint saved each epoch (best at epoch 21)
- Data scale (from logs): 7,463 train / 1,599 val / 1,600 test images across 23 classes

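The MixUp step in the recipe can be sketched as follows. This is a minimal illustration, not the trainer's actual code; the `alpha=0.2` value and the single-lambda-per-batch choice are assumptions.

```python
import torch

def mixup_batch(images, one_hot_targets, alpha=0.2):
    """Blend each sample with a randomly chosen partner from the same batch.

    Minimal MixUp sketch; alpha and the per-batch lambda are assumptions,
    not the exact settings used in this model's training run.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_targets = lam * one_hot_targets + (1 - lam) * one_hot_targets[perm]
    return mixed_images, mixed_targets, lam
```

Because the mixed targets are convex combinations of one-hot vectors, they still sum to 1 per row and work directly with a soft-target cross-entropy loss.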
### Inference (PyTorch TorchScript)

```python
import torch
from PIL import Image
from torchvision import transforms

# Load the traced model; no model-class definition is needed for TorchScript.
model = torch.jit.load("vit_best_traced.pt")
model.eval()

# Match the training-time preprocessing: 384x384 resize + ImageNet normalization.
preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("path/to/image.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # add batch dimension -> (1, 3, 384, 384)
with torch.no_grad():
    probs = model(tensor).softmax(dim=1)
pred = probs.argmax(dim=1).item()

print("Predicted class index:", pred)
```

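The exact TTA recipe is not published with this card; a common minimal variant is to average softmax probabilities over the original image and a horizontally flipped copy. The sketch below assumes that variant:

```python
import torch

def predict_tta(model, tensor):
    """Average class probabilities over the original and a horizontally flipped view.

    Minimal TTA sketch; the training pipeline's actual augmentation set may differ.
    """
    views = [tensor, torch.flip(tensor, dims=[3])]  # flip along the width axis of (N, C, H, W)
    with torch.no_grad():
        probs = torch.stack([model(v).softmax(dim=1) for v in views]).mean(dim=0)
    return probs
```

With the traced model loaded as above, `probs = predict_tta(model, tensor)` replaces the single-view forward pass at roughly double the inference cost.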
### Training Log (key checkpoints)
- Epoch 8: Val Acc 90.56%
- Epoch 12: Val Acc 91.62%
- Epoch 16: Val Acc 91.74%
- Epoch 18: Val Acc 92.06%
- Epoch 20: Val Acc 92.12%
- Epoch 21: **Val Acc 92.18%** (best)
- Test (TTA): Acc 93.25%, Precision 92.19%, Recall 93.25%, F1 92.59%

### Files
- `vit_best_traced.pt` — TorchScript traced weights for ViT Base Patch16 384 (best checkpoint)

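For reference, a traced checkpoint like `vit_best_traced.pt` can be produced from any eval-mode model with `torch.jit.trace`. The tiny stand-in model below is hypothetical; the actual export presumably traced the trained ViT the same way:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; in practice this would be the trained ViT.
model = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (N, 3, 384, 384) -> (N, 3, 1, 1)
    nn.Flatten(),             # -> (N, 3)
    nn.Linear(3, 23),         # -> (N, 23) class logits
).eval()

example = torch.randn(1, 3, 384, 384)  # the example input fixes the traced shapes
traced = torch.jit.trace(model, example)
traced.save("traced_demo.pt")          # reload anywhere with torch.jit.load
```

Tracing records the operations run on the example input, so the saved file carries both weights and graph and needs no Python model class at load time.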
### Notes & Responsible Use
- Trainer: custom `AdvancedMemoryEfficientTrainer` with MixUp and Albumentations-based TTA.
- Warnings during training: Albumentations recommends `Affine` over `ShiftScaleRotate`, and CoarseDropout arguments were adjusted for newer API versions; metrics were unaffected.
- Medical context: research use only; not a regulated medical device. Validate clinically and keep a human in the loop.

### Reproduce / Fine-tune Tips
- Keep preprocessing (384×384 resize + ImageNet normalization) consistent to match the reported metrics.
- Start from `vit_base_patch16_384` in timm; use MixUp and similar augmentations.
- Enable TTA for best accuracy; disable it for faster inference.

### Changelog
- 2025-12-29: Initial public release with traced weights and metrics.