---
license: mit
language:
- en
tags:
- image-classification
- medical-imaging
- chest-xray
- pneumonia
- yolo
- ultralytics
datasets:
- keremberke/chest-xray-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Automated Classification of Pneumonia in Medical Radiography

**Model by:** Siri Suwannatee | BDATA 497: Computer Vision Techniques

## Model Description

This model is a chest X-ray (CXR) image classifier that distinguishes between three classes: **Normal**, **Bacterial Pneumonia**, and **Viral Pneumonia**. It was developed as an AI-powered screening tool to prioritize high-risk cases for specialist review, helping reduce the time-to-decision in clinical workflows.

The model is intended to act as a **triage assistant**: it flags high-risk (pneumonia) cases for comprehensive expert review while routing low-risk (normal) cases to standard review queues. It is **not** intended for standalone clinical diagnosis.

**Training approach:** Fine-tuned from YOLOv26n (Ultralytics classification head) on a balanced, 3-class chest X-ray dataset.

**Intended use cases:**
- Hospital or clinic screening pipelines to prioritize radiologist workload
- Academic/research exploration of CNN-based CXR classification
- Educational demonstrations of automated medical image triage

---

## Training Data

### Dataset Source

**Chest X-Ray Images (Pneumonia)**, published on Mendeley Data:
> Kermany, D., Zhang, K., & Goldbaum, M. (2018). *Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification.* Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2

### Number of Images and Classes

The original dataset contains **5,856 chest X-ray images** in two classes: Normal and Pneumonia.

### Annotation Process (Value Added)

This project used **Annotation Option B: pre-annotated single source with modifications.** The original binary Pneumonia label was refined into two sub-categories, **Bacterial** and **Viral**, using publisher-provided file metadata. Image filenames in the source dataset encode the pneumonia type (e.g., `person112_bacteria_539.jpeg`, `person1613_virus_2799.jpeg`), which allowed programmatic re-labeling without manual annotation.

This refinement adds clinically meaningful granularity: bacterial and viral pneumonias have different treatment pathways, so distinguishing them is a valuable model capability.

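The filename-based re-labeling can be sketched as below. This is an illustrative reconstruction, not the project's actual script: the function names, output folder names, and directory layout are assumptions.

```python
import shutil
from pathlib import Path

def label_from_filename(filename: str) -> str:
    """Infer the 3-class label from a source-dataset filename.

    The publisher encodes the pneumonia subtype in the name, e.g.
    'person112_bacteria_539.jpeg' or 'person1613_virus_2799.jpeg';
    files with neither token are treated as Normal.
    """
    name = filename.lower()
    if "_bacteria_" in name:
        return "bacterial_pneumonia"
    if "_virus_" in name:
        return "viral_pneumonia"
    return "normal"

def relabel(src_dir: str, dst_dir: str) -> None:
    """Copy every image into a per-class folder under dst_dir."""
    for img in Path(src_dir).rglob("*.jpeg"):
        out = Path(dst_dir) / label_from_filename(img.name)
        out.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, out / img.name)
```

Run over the source download, this would produce a folder-per-class layout suitable for classification training.
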
### Class Distribution

| Class | Count | % of Total |
|-------|-------|------------|
| Normal | 1,583 | 27.0% |
| Bacterial Pneumonia | 2,780 | 47.5% |
| Viral Pneumonia | 1,493 | 25.5% |
| **Total** | **5,856** | **100%** |

The dataset is **imbalanced**: Bacterial Pneumonia is overrepresented. To address this, the training experiments were run on both the imbalanced dataset and a **balanced dataset** (downsampled to 1,493 images per class, 4,479 images in total).

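The balancing step amounts to a random downsample of every class to the minority-class count. A minimal sketch (illustrative; assumes per-class lists of file paths, not the project's actual code):

```python
import random

def downsample_to_minority(class_to_items: dict, seed: int = 0) -> dict:
    """Randomly downsample every class to the size of the smallest one."""
    n_min = min(len(v) for v in class_to_items.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    return {c: rng.sample(items, n_min) for c, items in class_to_items.items()}
```

Applied to the counts above (1,583 / 2,780 / 1,493), this yields 1,493 images per class, 4,479 in total.
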
### Train / Validation / Test Split

| Split | Ratio |
|-------|-------|
| Train | 70% |
| Validation | 20% |
| Test | 10% |

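A 70/20/10 split can be sketched as follows (illustrative only; the project may have used a different splitting utility):

```python
import random

def split_70_20_10(items, seed: int = 0):
    """Shuffle and split into train (70%), validation (20%), test (10%)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.7 * len(items))
    n_val = int(0.2 * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

Note that an image-level shuffle like this can place images from the same patient into different splits, which is the leakage risk acknowledged under Known Biases below.
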
### Data Augmentation

No additional augmentation beyond resizing was applied. Images were resized to **640×640 pixels** as required by the YOLO architecture.

### Known Biases and Limitations in Training Data

- **Pediatric bias:** The source dataset was collected primarily from pediatric patients at Guangzhou Women and Children's Medical Center. Performance on adult populations may differ.
- **Geographic/demographic bias:** Single-institution data from China limits generalizability to other populations, imaging equipment, or acquisition protocols.
- **Metadata-based annotation:** The Bacterial/Viral split was derived from filename metadata rather than independent clinical re-annotation. Any labeling errors in the source dataset propagate into this model.
- **Class imbalance:** The raw dataset has ~1.86× more bacterial than viral pneumonia samples, which can bias model predictions toward the more common class if not corrected.
- **No patient-level split:** Images from the same patient may appear across train/validation/test sets, potentially inflating reported metrics.

---

## Training Procedure

### Training Framework

- **Framework:** PyTorch + Ultralytics
- **Hardware:** NVIDIA L4 GPU, 24 GB VRAM
- **Training time:** ~70 epochs per model (with early stopping)

### Preprocessing

- Resize all images to **640×640 pixels**
- No additional normalization or augmentation beyond framework defaults

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 0.0001 (1e-4) |
| Optimizer | Adam |
| Loss Function | CrossEntropy |
| Batch Size | 16 |
| Epochs | 70 |
| Early Stopping Patience | 12 (monitored on validation loss) |

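Assembled into an Ultralytics training call, the hyperparameters above correspond roughly to the sketch below. This is illustrative, not the project's actual script: the checkpoint name `yolo26n-cls.pt` and the dataset path are assumptions, and Ultralytics applies its own defaults for anything not listed.

```python
from ultralytics import YOLO

# Load a pretrained classification checkpoint to fine-tune
# (checkpoint name is an assumption; substitute the one you use)
model = YOLO("yolo26n-cls.pt")

model.train(
    data="path/to/cxr_dataset",  # placeholder: folder-per-class image layout
    epochs=70,
    patience=12,        # early stopping patience on validation
    batch=16,
    imgsz=640,          # resize to 640x640
    optimizer="Adam",
    lr0=1e-4,           # initial learning rate
)
```
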
### Model Architectures Compared

Four lightweight architectures were trained and compared:

1. **MobileNet-V3**
2. **EfficientNet-V2**
3. **YOLOv11n** (classification)
4. **YOLOv26n** (classification) ← *selected final model*

---

## Evaluation Results

### Overall Performance Summary (Balanced Dataset)

| Model | Accuracy | Recall | Macro F1 | Precision |
|-------|----------|--------|----------|-----------|
| MobileNet-V3 | 0.74 | 0.76 | 0.73 | 0.79 |
| EfficientNet-V2 | 0.70 | 0.73 | 0.68 | 0.78 |
| YOLOv11n | 0.83 | 0.84 | 0.82 | 0.84 |
| **YOLOv26n** | **0.89** | **0.88** | **0.88** | **0.88** |

*Minimum target = 0.80 on all metrics. Both YOLO models meet this threshold; MobileNet-V3 and EfficientNet-V2 fall short.*

### Detailed Per-Class Performance: YOLOv26n (Final Model)

| Class | Precision | Recall | F1 | Test Set Count |
|-------|-----------|--------|----|----------------|
| Bacterial | 0.91 | 0.94 | 0.93 | ~242 |
| Normal | 0.97 | 0.85 | 0.91 | ~234 |
| Viral | 0.74 | 0.84 | 0.79 | ~148 |
| **Macro Avg** | **0.88** | **0.88** | **0.88** | ~624 |

### Confusion Matrix: YOLOv26n

| | Pred: Bacterial | Pred: Normal | Pred: Viral |
|--|----------------|--------------|-------------|
| **True: Bacterial** | **228** | 4 | 10 |
| **True: Normal** | 1 | **200** | 33 |
| **True: Viral** | 21 | 2 | **125** |

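The per-class precision, recall, and F1 figures follow directly from the confusion matrix. The snippet below recomputes them from the reported counts (pure Python, no dependencies) as a consistency check:

```python
# Rows = true class, columns = predicted class: Bacterial, Normal, Viral
CM = [
    [228, 4, 10],   # True: Bacterial
    [1, 200, 33],   # True: Normal
    [21, 2, 125],   # True: Viral
]

def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class index."""
    k = len(cm)
    out = []
    for i in range(k):
        tp = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(k))  # column sum
        actual_i = sum(cm[i])                          # row sum
        p = tp / predicted_i
        r = tp / actual_i
        out.append((p, r, 2 * p * r / (p + r)))
    return out
```

Rounded to two decimals, the results match the per-class table (e.g., Viral precision 125/168 ≈ 0.74, Viral F1 ≈ 0.79).
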
### Inference Latency

All models run well below the 100 ms latency target on the test hardware:

| Model | Inference Latency |
|-------|------------------|
| MobileNet-V3 | 13.92 ms |
| EfficientNet-V2 | 14.83 ms |
| YOLOv11n | 15.88 ms |
| YOLOv26n | 14.00 ms |

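Latency figures of this kind can be measured with a simple wall-clock loop. The helper below is a sketch of the measurement setup (the warmup count and run count are assumptions, not the project's recorded procedure):

```python
import time

def mean_latency_ms(fn, n_warmup: int = 5, n_runs: int = 50) -> float:
    """Average wall-clock time of fn() in milliseconds, after warmup runs."""
    for _ in range(n_warmup):
        fn()  # warmup: exclude one-time caching/initialization costs
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - t0) / n_runs * 1000.0
```

For example, `mean_latency_ms(lambda: model.predict("chest_xray.jpg", verbose=False))` would time repeated single-image inference.
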
### Performance Analysis

**YOLOv26n was selected as the final model** based on the highest accuracy (0.89), Macro F1 (0.88), and recall (0.88) on the balanced test set, all exceeding the 0.80 minimum target.

**What the model does well:**
- Bacterial Pneumonia is classified with high confidence (F1 = 0.93, Recall = 0.94), likely because bacterial pneumonia produces more visually distinct consolidation patterns in CXRs.
- Normal lungs are detected with very high precision (0.97): when the model says "Normal," it is almost always correct. This is critical for a triage tool, where false negatives (missed pneumonia) are more dangerous than false positives.

**Where the model struggles:**
- **Viral Pneumonia is the weakest class** (Precision = 0.74, F1 = 0.79, below the 0.80 target). The confusion matrix shows that 21 Viral cases were misclassified as Bacterial. This is clinically plausible: early viral pneumonia produces subtle, diffuse interstitial patterns that are hard to distinguish from bacterial consolidation, even for human radiologists.
- **Normal → Viral confusion:** 33 Normal cases were predicted as Viral. This false positive rate could cause unnecessary specialist reviews, but is safer than missed pneumonia.
- **Class imbalance hurts every model:** the balanced dataset consistently improved performance across all four architectures, confirming that class imbalance was a meaningful problem.

---

## Limitations and Biases

### Known Failure Cases

- **Viral Pneumonia misclassified as Bacterial:** The model confuses 21 of 148 viral test cases (14%) as bacterial. In practice, both are pneumonia, so this is a severity-2 error (wrong subtype, correct disease category), not a severity-1 error (missed disease entirely).
- **Normal X-rays with subtle findings:** 33 Normal images were predicted as Viral Pneumonia. Images near the decision boundary (for example, mild atelectasis or pleural effusion in otherwise healthy patients) may trigger false positives.

### Poor Performing Classes

**Viral Pneumonia** has below-target precision (0.74), meaning the model over-predicts this class. The likely cause is the visual similarity between early viral pneumonia (bilateral ground-glass opacity) and normal lung parenchyma with mild variation, as well as overlap with bacterial consolidation in more advanced cases.

### Data Biases

- **Pediatric population:** Sourced exclusively from a children's hospital. Lung anatomy, disease presentation, and imaging protocols differ between pediatric and adult patients. Do not use this model on adult CXRs without further validation.
- **Single institution / single scanner:** Scanner brand, kVp settings, and image processing pipeline all affect CXR appearance. Out-of-distribution images may degrade performance significantly.
- **Metadata-derived labels:** The Bacterial/Viral annotation comes from filename metadata, not re-reviewed clinical records. Mislabeled source images directly impact model quality and evaluation metrics.

### Environmental / Contextual Limitations

- The model assumes standard PA (posterior-anterior) chest X-ray orientation. Portable/AP views or rotated images may produce unreliable predictions.
- Performance on low-resolution or heavily compressed images has not been evaluated.
- Presence of medical devices (pacemakers, central lines, NG tubes) may confuse the classifier.

### Inappropriate Use Cases

**This model should NOT be used for:**
- Standalone clinical diagnosis or as a replacement for radiologist review
- Adult patient populations (not validated)
- Emergency or acute care settings where false negatives carry life-threatening consequences
- Differentiating COVID-19 from other viral pneumonias (not trained on COVID data)
- Any deployment without physician oversight and institutional validation

### Ethical Considerations

Medical AI tools carry significant ethical risk. This model is a research/educational prototype trained on a limited, non-diverse dataset. Deploying it in clinical settings without rigorous prospective validation, diverse population testing, and regulatory approval (e.g., FDA 510(k) clearance) would be inappropriate and potentially harmful. The model should never be used as the sole basis for a treatment decision.

### Sample Size Limitations

- The Viral Pneumonia test set contains only ~148 images, making precision/recall estimates for this class statistically noisier than for Bacterial (~242) or Normal (~234).
- Further evaluation on an external, adult, multi-institution dataset is needed before any clinical consideration.

---

## How to Use

The model was deployed as a Streamlit web application. Users can upload a JPG/PNG chest X-ray image and receive a predicted class (Normal, Bacterial, or Viral) along with a probability distribution across all three classes.

```python
from ultralytics import YOLO

# Load the fine-tuned classification weights
model = YOLO("path/to/yolo26n_cxr.pt")

# Classify a single chest X-ray image
results = model.predict("chest_xray.jpg")
probs = results[0].probs
print(probs)                          # Probability distribution over the three classes
print(results[0].names[probs.top1])  # Top-1 predicted class label
```

---

## Citation

If you use this model, please cite the original dataset:

> Kermany, D., Zhang, K., & Goldbaum, M. (2018). *Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification.* Mendeley Data, V2. https://doi.org/10.17632/rscbjbr9sj.2