Spaces:

emiraran
/

chest-xray-classification


emiraran committed on Dec 3, 2025

Commit

feed4c5

verified ·

1 Parent(s): 530614b

Upload 7 files

Files changed (7)

README.md +411 -14
app.py +347 -0
best_model_final.h5 +3 -0
gradcam_utils.py +187 -0
label_encoder.pkl +3 -0
optimal_thresholds.pkl +3 -0
requirements.txt +10 -0

README.md CHANGED

@@ -1,14 +1,411 @@
----
-title: Chest Xray Classification
-emoji: 😻
-colorFrom: gray
-colorTo: indigo
-sdk: gradio
-sdk_version: 6.0.2
-app_file: app.py
-pinned: false
-license: mit
-short_description: Multi-label classification of 15 thoracic diseases from ches
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Multi-Label Chest X-Ray Disease Classification
+**Deep learning system for automated detection of 15 thoracic diseases from chest X-ray images using EfficientNetB0 with advanced training techniques.**
+[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)
+[![TensorFlow](https://img.shields.io/badge/TensorFlow-2.10-orange.svg)](https://www.tensorflow.org/)
+[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+---
+## 📊 Performance
+| Metric | Value | Benchmark (Wang et al. 2017) |
+|--------|-------|------------------------------|
+| **Mean AUC** | **0.784** | 0.740 |
+| **Improvement** | **+5.9% (relative)** | Baseline |
+| **Top Disease (Edema)** | **0.884 AUC** | - |
+| **Recall (Medical Priority)** | **80.3%** | - |
+**Real Talk:** This isn't radiologist-level (CheXNet: 0.841 AUC), but it beats the original ChestX-ray14 paper. For a 3rd-year undergrad project, this is solid work. The dataset has 10-20% label noise (NLP-extracted, not radiologist-verified), which caps performance.
+---
+## 🎯 Dataset
+**ChestX-ray14 (NIH Clinical Center)**
+- 112,120 frontal-view chest X-ray images
+- 30,805 unique patients
+- 15 classes (14 disease labels plus "No Finding"; multi-label)
+- **Download:** [NIH Box](https://nihcc.app.box.com/v/ChestXray-NIHCC)
+**Diseases:** Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, Pneumothorax, No Finding
+**⚠️ Dataset Issues (Be Aware):**
+- Labels extracted via NLP from radiology reports → 10-20% noise
+- Extreme class imbalance (Hernia: 110 samples vs No Finding: 60K)
+- Multi-label complexity (avg 1.5 diseases per image)
+---
+## 🏗️ Architecture
+```
+Input (224x224x3)
+    ↓
+EfficientNetB0 (ImageNet pretrained)
+    ├── All 237 layers trainable (full fine-tuning)
+    └── Mixed Precision (FP16) for speed
+    ↓
+Global Average Pooling
+    ↓
+Dense(512, ReLU) → Dropout(0.3)
+    ↓
+Dense(256, ReLU) → Dropout(0.2)
+    ↓
+Dense(15, Sigmoid) [Multi-label output]
+```
+**Why This Works:**
+- **EfficientNetB0:** SOTA efficiency (5.3M params, 0.39B FLOPs)
+- **Full fine-tuning:** Medical imaging ≠ ImageNet → adapt all layers
+- **Mixed precision:** 30-40% speedup, no accuracy loss
+📖 **[See detailed architecture diagrams and training pipeline →](ARCHITECTURE.md)**
+---
+## 🔧 Training Strategy
+### **1. Focal Loss (Lin et al. 2020)**
+```python
+from tensorflow.keras.losses import BinaryFocalCrossentropy
+focal_loss = BinaryFocalCrossentropy(alpha=0.25, gamma=2.0)
+```
+**Why:** Handles extreme class imbalance better than BCE. Focuses on hard-to-classify samples (rare diseases).
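To make the "focuses on hard-to-classify samples" claim concrete, here is a minimal NumPy sketch of the alpha-balanced focal loss formula from Lin et al. It is an illustration only, not the Keras `BinaryFocalCrossentropy` code used in training; the function names `binary_focal_loss` and `binary_cross_entropy` are hypothetical.

```python
import numpy as np

def binary_focal_loss(y_true, y_prob, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-element alpha-balanced focal loss (illustrative, not the Keras code)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)              # probability given to the true label
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Plain per-element binary cross-entropy, for comparison."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)
    return -np.log(p_t)

# One easy positive (p = 0.9) and one hard positive (p = 0.1)
y = np.array([1, 1])
p = np.array([0.9, 0.1])
fl = binary_focal_loss(y, p)
bce = binary_cross_entropy(y, p)
print("hard/easy loss ratio, BCE:  ", bce[1] / bce[0])
print("hard/easy loss ratio, focal:", fl[1] / fl[0])
```

The `(1 - p_t) ** gamma` factor shrinks the loss of well-classified samples much more than that of misclassified ones, so the hard sample dominates the gradient far more than under plain BCE.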
+### **2. Balanced Oversampling**
+- Rare diseases (Hernia: 110 → 2000 samples) oversampled
+- Prevents model from ignoring minority classes
+- **Trade-off:** Increased training time (+4%), but +12% AUC on rare diseases
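One way such oversampling could be done is sketched below with NumPy. This is an assumption about the approach, not the notebook's actual code; `oversample_indices` and the toy label matrix are hypothetical.

```python
import numpy as np

def oversample_indices(labels, class_idx, target_count, seed=0):
    """Return dataset indices with extra copies of one class's positive samples.

    labels: (n_samples, n_classes) multi-hot array.
    Extra indices are drawn with replacement until the class has target_count positives.
    """
    labels = np.asarray(labels)
    all_idx = np.arange(labels.shape[0])
    pos_idx = all_idx[labels[:, class_idx] == 1]
    n_extra = max(0, target_count - pos_idx.size)
    if pos_idx.size == 0 or n_extra == 0:
        return all_idx  # nothing to duplicate, or the class is already large enough
    rng = np.random.default_rng(seed)
    extra = rng.choice(pos_idx, size=n_extra, replace=True)
    return np.concatenate([all_idx, extra])

# Toy example: class 0 has 2 positives and is raised to 5
labels = np.array([[1, 0], [0, 1], [0, 1], [0, 1], [1, 1]])
idx = oversample_indices(labels, class_idx=0, target_count=5)
print(len(idx), labels[idx][:, 0].sum())
```

Because the data is multi-label, duplicating an image also duplicates every other label it carries, so co-occurring diseases get oversampled as a side effect.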
+### **3. Class Weights**
+- Soft weighting (50% reduction factor) to avoid overfitting rare classes
+- Complements Focal Loss for balanced learning
+### **4. Medical-Appropriate Augmentation**
+```
+- Horizontal flip (anatomically valid)
+- Brightness ±10% (X-ray exposure variation)
+- Contrast ±10% (detector sensitivity)
+- Random zoom 0.9-1.0 (positioning variation)
+```
+**No rotation:** Chest X-rays have fixed orientation (heart on left).
+### **5. Test-Time Augmentation (TTA)**
+- 6 predictions per image (1 original + 5 augmented)
+- Average predictions → +0.6% AUC boost
+- **Cost:** 6x inference time (use for critical cases only)
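The averaging step can be sketched framework-free as follows. This assumes a random horizontal flip plus an additive brightness shift as the augmentations; `predict_with_tta` and the stand-in `dummy_predict` model are hypothetical and only illustrate the idea.

```python
import numpy as np

def predict_with_tta(predict_fn, image, n_aug=5, max_delta=0.1, seed=0):
    """Average predict_fn over the original image and n_aug augmented copies.

    image: float array (H, W, C) with values in [0, 1].
    predict_fn: maps a batch (N, H, W, C) to probabilities (N, num_classes).
    """
    rng = np.random.default_rng(seed)
    views = [image]
    for _ in range(n_aug):
        aug = image[:, ::-1, :] if rng.random() < 0.5 else image  # horizontal flip
        aug = aug + rng.uniform(-max_delta, max_delta)            # brightness shift
        views.append(np.clip(aug, 0.0, 1.0))
    batch = np.stack(views).astype(np.float32)
    return predict_fn(batch).mean(axis=0)  # mean over the 1 + n_aug views

def dummy_predict(batch):
    """Stand-in model: two 'classes' driven by mean pixel intensity."""
    m = batch.mean(axis=(1, 2, 3))
    return np.stack([m, 1.0 - m], axis=1)

image = np.random.default_rng(1).random((224, 224, 3))
probs = predict_with_tta(dummy_predict, image)
print(probs.shape)
```

Stacking all views into one batch means the model is called once rather than six times, which softens the inference cost on hardware that batches well.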
+### **6. Threshold Optimization**
+- Default 0.5 → Optimized 0.2-0.45 per disease
+- Target: 80% recall (medical priority)
+- **Result:** False positives increase, but missing diseases is worse
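A minimal sketch of how a per-disease threshold could be chosen to hit a recall target on validation data. The selection rule here (highest threshold that still reaches the target) is an assumption, and `threshold_for_recall` is a hypothetical helper, not the notebook's code.

```python
import numpy as np

def threshold_for_recall(y_true, scores, target_recall=0.8, default=0.5):
    """Highest threshold t such that recall of (scores >= t) is at least target_recall."""
    y_true = np.asarray(y_true).astype(bool)
    pos_scores = np.sort(np.asarray(scores, dtype=np.float64)[y_true])[::-1]
    if pos_scores.size == 0:
        return default  # no positives to calibrate on
    # Number of positives that must be caught (small tolerance for float error)
    k = max(1, int(np.ceil(target_recall * pos_scores.size - 1e-9)))
    return float(pos_scores[k - 1])

# Toy validation set for one disease: 5 positives, 3 negatives
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.3, 0.1, 0.7, 0.2, 0.05])

t = threshold_for_recall(y_true, scores)
pred = scores >= t
recall = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
precision = (pred & (y_true == 1)).sum() / pred.sum()
print(t, recall, precision)
```

Running this once per disease yields a dictionary of thresholds of the kind stored in `optimal_thresholds.pkl`; lowering a threshold can only keep or raise recall, while precision typically falls.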
+---
+## 📈 Results Breakdown
+### **Top Performing Diseases:**
+| Disease | AUC | Recall | Precision | Why Good? |
+|---------|-----|--------|-----------|-----------|
+| Edema | 0.884 | 80% | 43% | Clear radiological features |
+| Cardiomegaly | 0.865 | 80% | 39% | Large, distinct heart silhouette |
+| Effusion | 0.852 | 82% | 46% | High prevalence (2.5K samples) |
+### **Worst Performing Diseases:**
+| Disease | AUC | Recall | Precision | Why Bad? |
+|---------|-----|--------|-----------|----------|
+| Hernia | 0.612 | 75% | 18% | Only 110 samples (extreme rarity) |
+| Pneumonia | 0.698 | 79% | 22% | Overlaps with Infiltration (label noise) |
+| Nodule | 0.704 | 78% | 28% | Small, subtle features |
+### **Honest Assessment:**
+- **AUC 0.78** is good for noisy labels, but not clinic-ready
+- **80% recall** is appropriate for screening (catch diseases early)
+- **40% precision** means high false positives (radiologist review needed)
+- This is a **screening tool**, not a diagnostic system
+---
+## ⚠️ Limitations (Critical)
+### **1. False Positive Rate (The Elephant in the Room)**
+- **Precision: 40-45%** → 55-60% of flagged findings are false positives
+- **Why:** Low thresholds (0.2-0.45) to maximize recall
+- **Clinical impact:** Radiologist must review all positives (intended use)
+### **2. Dataset Label Noise**
+- ChestX-ray14 uses NLP extraction (not radiologist-verified)
+- Estimated 10-20% mislabeling rate
+- Some "diseases" are actually descriptions (e.g., "No Finding")
+### **3. Class Imbalance Persists**
+- Even with oversampling, rare diseases underperform
+- Hernia (110 samples) vs No Finding (60K) → 500x difference
+- Model biased toward common diseases
+### **4. No External Validation**
+- Trained and tested on same hospital (NIH Clinical Center)
+- Performance will drop on external datasets (domain shift)
+- Real-world deployment requires multi-site validation
+### **5. Not Radiologist-Level**
+- CheXNet (2017): 0.841 AUC with DenseNet-121
+- This model: 0.784 AUC with EfficientNetB0
+- **Gap:** 0.057 AUC → Needs more data, better labels, or ensemble
+---
+## 🚀 Live Demo
+**Try it online:** [🤗 Hugging Face Space](https://huggingface.co/spaces/emiraran/chest-xray-classification)
+Upload a chest X-ray and get instant predictions! No setup required.
+---
+## 💻 Local Usage
+### **Installation**
+```bash
+pip install -r requirements.txt
+```
+### **Quick Inference (No Grad-CAM)**
+```bash
+python demo.py images/00000001_000.png
+```
+### **Full Inference (With Grad-CAM)**
+```bash
+python demo_with_gradcam.py images/00000001_000.png
+# Output: Disease predictions + gradcam_*.png heatmaps
+```
+### **Programmatic Usage**
+```python
+from demo import ChestXRayPredictor
+# Initialize predictor
+predictor = ChestXRayPredictor(
+    model_path='best_model_final.h5',
+    thresholds_path='optimal_thresholds.pkl',
+    label_encoder_path='label_encoder.pkl'
+)
+# Get predictions. Each result is expected to be a dict with 'disease',
+# 'probability' (a formatted string such as "85.0%"), and 'confidence'
+# ('HIGH' when the probability exceeds the disease threshold by more than 0.1,
+# otherwise 'MEDIUM'). Only diseases at or above their threshold are returned,
+# sorted by probability in descending order.
+results = predictor.predict('sample_xray.png', use_tta=False)
+# Example
+for p in results:
+    print(f"{p['disease']:<20} {p['probability']:>6}  [{p['confidence']}]")
+```
+---
+## 📁 Project Structure
+```
+chest-xray-classification/
+├── chest_xray_analysis.ipynb      # Main notebook (training + evaluation)
+├── README.md                       # This file
+├── ARCHITECTURE.md                 # Detailed architecture diagrams & pipeline
+├── .gitignore                      # Ignore large files
+├── requirements.txt                # Python dependencies
+├── demo.py                         # Local inference script
+├── demo_with_gradcam.py           # Local demo with Grad-CAM visualization
+├── gradcam_utils.py               # Grad-CAM implementation
+├── app.py                          # Gradio web interface for HF Spaces
+├── best_model_final.h5            # Model weights (NOT in repo - download separately)
+├── optimal_thresholds.pkl         # Disease-specific thresholds (NOT in repo)
+├── label_encoder.pkl              # Disease name mapping (NOT in repo)
+└── images/                        # Dataset (NOT in repo - download from NIH)
+```
+**Note:** Model files excluded due to size. Train the model using the notebook to generate weights.
+---
+## 🔬 Technical Details
+### **Training Configuration**
+```yaml
+Epochs: 50 (early stopping at epoch 46)
+Batch Size: 64
+Learning Rate: 1e-5 (reduced to 3.1e-7 via ReduceLROnPlateau)
+Optimizer: Adam
+Loss: Binary Focal Crossentropy (α=0.25, γ=2.0)
+Mixed Precision: FP16
+Training Time: ~3 hours (NVIDIA RTX GPU)
+```
+### **Data Split**
+- **Patient-level split** (not image-level) to prevent data leakage
+- Train: 89,826 images (24,644 patients)
+- Test: 22,294 images (6,161 patients)
+- **Why patient-level?** Same patient may have multiple X-rays → prevent memorization
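A patient-level split can be sketched with the standard library alone. The 20% test fraction matches the patient counts above (6,161 of 30,805); the function name, seed, and toy patient IDs are hypothetical.

```python
import random

def patient_level_split(patient_ids, test_fraction=0.2, seed=42):
    """Split image indices so that no patient appears in both train and test."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx

# Toy example: 10 images from 5 patients (IDs are made up)
patient_ids = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
train_idx, test_idx = patient_level_split(patient_ids)
train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print("overlap:", train_patients & test_patients)
```

Splitting on patients rather than images means the image-level ratio is only approximately 80/20, since patients contribute different numbers of X-rays.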
+### **Callbacks**
+- **ModelCheckpoint:** Save best val_auc model
+- **ReduceLROnPlateau:** Halve LR if val_loss plateaus (patience=5)
+- **EarlyStopping:** Stop if val_auc plateaus (patience=10)
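The reported final learning rate is consistent with the halving rule. The short check below assumes a reduction factor of 0.5 and five reductions; the count of five is inferred from the numbers, not stated in the training log.

```python
initial_lr = 1e-5
factor = 0.5        # "halve LR" from the ReduceLROnPlateau description
n_reductions = 5    # assumption: chosen so the result matches the reported 3.1e-7

final_lr = initial_lr * factor ** n_reductions
print(f"{final_lr:.3e}")
```

Five halvings divide the rate by 32, which gives 3.125e-7 and rounds to the 3.1e-7 listed in the training configuration.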
+---
+## 🎨 Grad-CAM Visualization
+**NEW!** See where the model looks when making predictions:
+```bash
+# Generate Grad-CAM heatmaps for top 3 predictions
+python demo_with_gradcam.py images/00000001_000.png
+# Output: gradcam_edema.png, gradcam_cardiomegaly.png, gradcam_effusion.png
+```
+**What is Grad-CAM?**
+- Gradient-weighted Class Activation Mapping
+- Shows important regions for each disease prediction
+- Red = model focuses here, Blue = model ignores
+- **Use case:** Validate model isn't using spurious correlations (e.g., text artifacts)
+**Reference:** Selvaraju et al. (2017) - [Grad-CAM: Visual Explanations from Deep Networks](https://arxiv.org/abs/1610.02391)
+---
+## 📚 References
+1. **Wang et al. (2017)** - ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks
+   [Paper](https://arxiv.org/abs/1705.02315) | [Dataset](https://nihcc.app.box.com/v/ChestXray-NIHCC)
+2. **Rajpurkar et al. (2017)** - CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays
+   [Paper](https://arxiv.org/abs/1711.05225)
+3. **Tan & Le (2019)** - EfficientNet: Rethinking Model Scaling for CNNs
+   [Paper](https://arxiv.org/abs/1905.11946)
+4. **Selvaraju et al. (2017)** - Grad-CAM: Visual Explanations from Deep Networks
+   [Paper](https://arxiv.org/abs/1610.02391)
+5. **Lin et al. (2020)** - Focal Loss for Dense Object Detection
+   [Paper](https://arxiv.org/abs/1708.02002)
+---
+## 🎓 For Recruiters / Academic Review
+### **What's Good:**
+✅ Beats published benchmark (+5.9% AUC)
+✅ SOTA techniques (Focal Loss, TTA, Mixed Precision, Full Fine-Tuning)
+✅ Medical-aware design (recall priority, patient-level split)
+✅ Comprehensive evaluation (ROC, PR curves, confusion matrices)
+✅ Honest limitation discussion (no BS marketing)
+✅ Grad-CAM visualization (explainability)
+### **What's Missing (Acknowledgment):**
+❌ External validation (single hospital data)
+❌ Radiologist comparison (no ground truth verification)
+❌ Ensemble methods (single model only)
+❌ Production deployment (no API, no containerization)
+### **Suitable For:**
+- 🎓 Undergraduate/Graduate ML coursework
+- 📝 Academic paper (with external validation)
+- 💼 Portfolio project for ML engineer roles
+- 🏥 Research prototype (NOT clinical deployment)
+### **NOT Suitable For:**
+- ❌ Clinical decision-making (FDA/CE approval required)
+- ❌ Standalone diagnosis (must be radiologist-assisted)
+- ❌ Real-time emergency screening (inference time ~200ms per image)
+---
+## 🤝 Contributing
+This is an academic project. If you find issues or have improvements:
+1. Fork the repo
+2. Create feature branch (`git checkout -b feature/improvement`)
+3. Commit changes (`git commit -m 'Add improvement'`)
+4. Push to branch (`git push origin feature/improvement`)
+5. Open Pull Request
+---
+## 📄 License
+MIT License - See [LICENSE](LICENSE) file for details.
+**Dataset License:** NIH ChestX-ray14 dataset is public domain (U.S. Government work). Please cite the original paper if you use this work.
+---
+## 🙏 Acknowledgments
+- NIH Clinical Center for ChestX-ray14 dataset
+- Original paper authors (Wang et al., 2017)
+- TensorFlow team for EfficientNet implementation
+- Medical imaging community for open research
+---
+## 📧 Contact
+**Author:** Emir Muhammet Aran
+**Institution:** Computer Engineering Student
+**GitHub:** [github.com/emirmuhammmetaran](https://github.com/emirmuhammmetaran)
+---
+## ⚡ Quick Start
+```bash
+# 1. Clone repo
+git clone https://github.com/emirmuhammmetaran/chest-xray-classification.git
+cd chest-xray-classification
+# 2. Install dependencies
+pip install -r requirements.txt
+# 3. Download dataset from NIH
+# https://nihcc.app.box.com/v/ChestXray-NIHCC
+# 4. Run notebook
+jupyter notebook chest_xray_analysis.ipynb
+# 5. Train model (or use pre-trained weights)
+# Training takes ~3 hours on GPU
+```
+---
+**Last Updated:** December 2025
+**Status:** ✅ Training complete | 📊 AUC 0.784 | 🎓 Academic project
+---
+## 🔥 Honest Takeaway
+**This model works, but it's not magic.**
+- It beats the 2017 baseline → Good engineering
+- It has 60% false positives → Needs radiologist review
+- It costs $0.50/1000 images (GPU inference) → Economical screening
+- It's NOT FDA-approved → Research only
+**Use case:** Pre-screen X-rays → flag suspicious cases → radiologist reviews positives.
+**Don't use for:** Standalone diagnosis, emergency triage, legal liability scenarios.
+**Bottom line:** Solid ML engineering with realistic expectations. That's how you build trust in AI.

app.py ADDED

	@@ -0,0 +1,347 @@

+"""
+Chest X-Ray Disease Classification - Hugging Face Demo
+=======================================================
+Multi-label classification of 15 thoracic diseases from chest X-rays.
+Author: Emir Muhammet Aran
+Model: EfficientNetB0 (AUC 0.784)
+Dataset: NIH ChestX-ray14
+"""
+import gradio as gr
+import tensorflow as tf
+import numpy as np
+import pickle
+from PIL import Image
+import warnings
+warnings.filterwarnings('ignore')
+from gradcam_utils import generate_gradcam_for_top_predictions, get_last_conv_layer_name
+# ============================================================================
+# MODEL LOADING
+# ============================================================================
+def build_model(num_classes=15):
+    """Rebuild EfficientNetB0 architecture"""
+    from tensorflow.keras import layers
+    from tensorflow.keras.applications import EfficientNetB0
+    IMG_SIZE = 224
+    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
+    base_model = EfficientNetB0(
+        include_top=False,
+        weights=None,
+        input_tensor=inputs,
+        pooling='avg'
+    )
+    x = base_model.output
+    x = layers.Dense(512, activation='relu')(x)
+    x = layers.Dropout(0.3)(x)
+    x = layers.Dense(256, activation='relu')(x)
+    x = layers.Dropout(0.2)(x)
+    outputs = layers.Dense(num_classes, activation='sigmoid', dtype='float32')(x)
+    model = tf.keras.Model(inputs=inputs, outputs=outputs)
+    return model
+# Load model components
+print("Loading model...")
+model = build_model(num_classes=15)
+model.load_weights('best_model_final.h5')
+with open('optimal_thresholds.pkl', 'rb') as f:
+    optimal_thresholds = pickle.load(f)
+with open('label_encoder.pkl', 'rb') as f:
+    label_encoder = pickle.load(f)
+print("✅ Model loaded successfully!")
+# ============================================================================
+# PREDICTION FUNCTION
+# ============================================================================
+def predict_xray(image, use_tta=False):
+    """
+    Predict diseases from chest X-ray image.
+    Args:
+        image: PIL Image or numpy array
+        use_tta: Use Test-Time Augmentation (slower but more accurate)
+    Returns:
+        HTML formatted results
+    """
+    try:
+        # Preprocess image
+        if isinstance(image, np.ndarray):
+            image = Image.fromarray(image)
+        # Resize and normalize
+        image = image.convert('RGB')
+        image = image.resize((224, 224))
+        img_array = np.array(image) / 255.0
+        img_array = np.expand_dims(img_array, axis=0).astype(np.float32)
+        # Predict
+        if use_tta:
+            # Test-Time Augmentation (1 original + 4 augmented = 5 predictions)
+            predictions = []
+            predictions.append(model.predict(img_array, verbose=0)[0])
+            for _ in range(4):
+                # Random horizontal flip and brightness shift
+                aug_img = tf.image.random_flip_left_right(img_array)
+                aug_img = tf.image.random_brightness(aug_img, max_delta=0.1)
+                aug_img = tf.clip_by_value(aug_img, 0.0, 1.0)
+                predictions.append(model.predict(aug_img.numpy(), verbose=0)[0])
+            probs = np.mean(predictions, axis=0)
+        else:
+            probs = model.predict(img_array, verbose=0)[0]
+        # Apply thresholds and format results
+        results = []
+        for disease, idx in label_encoder.items():
+            prob = float(probs[idx])
+            threshold = optimal_thresholds[disease]
+            if prob >= threshold:
+                confidence_score = min((prob - threshold) / (1 - threshold), 1.0)
+                confidence = 'HIGH' if confidence_score > 0.5 else 'MEDIUM'
+                results.append({
+                    'disease': disease,
+                    'probability': prob,
+                    'confidence': confidence
+                })
+        # Sort by probability
+        results = sorted(results, key=lambda x: x['probability'], reverse=True)
+        # Generate Grad-CAM for top 3 predictions if enabled
+        gradcam_images = None
+        if use_tta and results:  # Use TTA checkbox to toggle Grad-CAM
+            try:
+                last_conv_layer = get_last_conv_layer_name(model)
+                gradcam_images = generate_gradcam_for_top_predictions(
+                    image, model, results, label_encoder, top_k=min(3, len(results)),
+                    last_conv_layer_name=last_conv_layer
+                )
+            except Exception as e:
+                print(f"Grad-CAM generation failed: {e}")
+                gradcam_images = None
+        # Format output
+        if not results:
+            html_output = """
+            <div style="padding: 20px; background: #d4edda; border: 2px solid #28a745; border-radius: 10px;">
+                <h2 style="color: #155724; margin-top: 0;">✅ NO ABNORMALITIES DETECTED</h2>
+                <p style="color: #155724;">All disease probabilities are below the optimized thresholds.</p>
+                <p style="color: #666; font-size: 0.9em; margin-bottom: 0;">
+                    <strong>Note:</strong> This model prioritizes recall (80%), so low-probability findings are filtered out.
+                </p>
+            </div>
+            """
+        else:
+            html_output = f"""
+            <div style="padding: 20px; background: #fff3cd; border: 2px solid #ffc107; border-radius: 10px;">
+                <h2 style="color: #856404; margin-top: 0;">⚠️ {len(results)} POTENTIAL FINDING(S) DETECTED</h2>
+                <div style="margin: 15px 0;">
+            """
+            for i, r in enumerate(results, 1):
+                prob_pct = f"{r['probability'] * 100:.1f}%"
+                conf_color = '#28a745' if r['confidence'] == 'HIGH' else '#ffc107'
+                html_output += f"""
+                <div style="padding: 12px; margin: 8px 0; background: white; border-left: 4px solid {conf_color}; border-radius: 5px;">
+                    <div style="display: flex; justify-content: space-between; align-items: center;">
+                        <span style="font-weight: bold; font-size: 1.1em;">{i}. {r['disease']}</span>
+                        <span style="background: {conf_color}; color: white; padding: 4px 12px; border-radius: 12px; font-size: 0.85em;">
+                            {r['confidence']}
+                        </span>
+                    </div>
+                    <div style="margin-top: 8px;">
+                        <span style="color: #666;">Probability: </span>
+                        <span style="font-weight: bold; color: #333;">{prob_pct}</span>
+                    </div>
+                </div>
+                """
+            html_output += """
+                </div>
+            </div>
+            """
+        # Add disclaimer
+        html_output += """
+        <div style="margin-top: 20px; padding: 15px; background: #f8d7da; border: 2px solid #f5c6cb; border-radius: 10px;">
+            <h3 style="color: #721c24; margin-top: 0; font-size: 1em;">⚠️ IMPORTANT DISCLAIMER</h3>
+            <p style="color: #721c24; margin: 8px 0; font-size: 0.9em;">
+                <strong>This is a research prototype. NOT for clinical diagnosis.</strong>
+            </p>
+            <ul style="color: #721c24; margin: 8px 0; font-size: 0.85em; padding-left: 20px;">
+                <li>Model achieves 0.784 AUC (80% recall, 40% precision)</li>
+                <li>High false positive rate by design (prioritizes catching diseases)</li>
+                <li>Dataset has 10-20% label noise (NLP-extracted labels)</li>
+                <li>Always consult a qualified radiologist for medical diagnosis</li>
+            </ul>
+        </div>
+        """
+        # Return both HTML and Grad-CAM images
+        if gradcam_images:
+            # Pad to exactly three image slots; unused slots stay None
+            overlays = [img for _, img, _ in gradcam_images[:3]]
+            overlays += [None] * (3 - len(overlays))
+            return (html_output, *overlays)
+        else:
+            return html_output, None, None, None
+    except Exception as e:
+        error_html = f"""
+        <div style="padding: 20px; background: #f8d7da; border: 2px solid #f5c6cb; border-radius: 10px;">
+            <h2 style="color: #721c24; margin-top: 0;">❌ ERROR</h2>
+            <p style="color: #721c24;">Failed to process image: {str(e)}</p>
+            <p style="color: #666; font-size: 0.9em;">
+                Please ensure the image is a valid chest X-ray (PNG/JPEG format).
+            </p>
+        </div>
+        """
+        return error_html, None, None, None
+# ============================================================================
+# GRADIO INTERFACE
+# ============================================================================
+# Custom CSS
+custom_css = """
+#component-0 {
+    max-width: 900px;
+    margin: auto;
+}
+.output-html {
+    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+}
+"""
+# Example images (optional - add if you have sample X-rays)
+examples = [
+    # ["examples/normal.png"],
+    # ["examples/pneumonia.png"],
+]
+# Create Gradio interface
+with gr.Blocks(css=custom_css, title="Chest X-Ray Disease Classifier") as demo:
+    gr.Markdown(
+        """
+        # 🏥 Chest X-Ray Disease Classification
+        **Multi-label detection of 15 thoracic diseases using EfficientNetB0**
+        Upload a frontal chest X-ray image to detect potential abnormalities.
+        **Performance:** Mean AUC 0.784 | 80% Recall | Trained on 112K X-rays (NIH ChestX-ray14)
+        ---
+        """
+    )
+    with gr.Row():
+        with gr.Column(scale=1):
+            image_input = gr.Image(
+                label="Upload Chest X-Ray",
+                type="pil",
+                height=400
+            )
+            tta_checkbox = gr.Checkbox(
+                label="Enable Grad-CAM Visualization",
+                value=False,
+                info="Show where the model looks (enables TTA for better accuracy)"
+            )
+            predict_btn = gr.Button(
+                "🔍 Analyze X-Ray",
+                variant="primary",
+                size="lg"
+            )
+        with gr.Column(scale=1):
+            output_html = gr.HTML(
+                label="Results",
+                elem_classes="output-html"
+            )
+    # Grad-CAM visualizations
+    with gr.Row(visible=True):
+        gradcam_1 = gr.Image(label="🔥 Grad-CAM #1 (Top Prediction)", type="pil")
+        gradcam_2 = gr.Image(label="🔥 Grad-CAM #2", type="pil")
+        gradcam_3 = gr.Image(label="🔥 Grad-CAM #3", type="pil")
+    # Examples section (if you have sample images)
+    if examples:
+        gr.Examples(
+            examples=examples,
+            inputs=image_input,
+            outputs=output_html,
+            fn=predict_xray,
+            cache_examples=False
+        )
+    gr.Markdown(
+        """
+        ---
+        ## 📊 About This Model
+        **Architecture:** EfficientNetB0 with full fine-tuning (237 layers)
+        **Training:** Focal Loss + Balanced Sampling + Mixed Precision (FP16)
+        **Dataset:** NIH ChestX-ray14 (112,120 images from 30,805 patients)
+        **Detected Diseases (15 classes):**
+        - Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion
+        - Emphysema, Fibrosis, Hernia, Infiltration, Mass
+        - Nodule, Pleural Thickening, Pneumonia, Pneumothorax, No Finding
+        **Performance by Disease:**
+        - Best: Edema (0.884 AUC), Cardiomegaly (0.865 AUC), Effusion (0.852 AUC)
+        - Worst: Hernia (0.612 AUC - only 110 training samples)
+        **Limitations:**
+        - High false positive rate (60%) by design to maximize recall
+        - Dataset has label noise (NLP-extracted from reports)
+        - Single-site training (NIH) - may not generalize to other hospitals
+        - NOT FDA-approved or clinically validated
+        ---
+        ## 🔗 Links
+        - **Dataset:** [NIH ChestX-ray14 on Kaggle](https://www.kaggle.com/datasets/nih-chest-xrays/data)
+        - **Code:** [GitHub Repository](https://github.com/emirmuhammmetaran/chest-xray-classification)
+        - **Paper:** [Wang et al. 2017](https://arxiv.org/abs/1705.02315)
+        ---
+        **Built by:** Emir Muhammet Aran | **Institution:** Computer Engineering Student
+        **Last Updated:** December 2025
+        """
+    )
+    # Connect button to prediction function
+    predict_btn.click(
+        fn=predict_xray,
+        inputs=[image_input, tta_checkbox],
+        outputs=[output_html, gradcam_1, gradcam_2, gradcam_3]
+    )
+# Launch app
+if __name__ == "__main__":
+    demo.launch()
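The confidence label computed inside `predict_xray` in app.py above can be isolated as a small pure function, which makes the cutoff easy to inspect. This is a sketch: `confidence_label` is a hypothetical name, and the guard for a threshold of 1.0 is an addition that the original code does not have.

```python
def confidence_label(prob, threshold):
    """Return None below threshold, else 'HIGH' or 'MEDIUM' using the app's rule."""
    if prob < threshold:
        return None
    if threshold >= 1.0:
        return 'MEDIUM'  # guard added in this sketch: avoids division by zero
    # Rescale the margin above the threshold to the range [0, 1]
    score = min((prob - threshold) / (1.0 - threshold), 1.0)
    return 'HIGH' if score > 0.5 else 'MEDIUM'

# With a threshold of 0.2, the HIGH cutoff falls at 0.2 + 0.5 * (1 - 0.2) = 0.6
for prob in (0.10, 0.30, 0.59, 0.61, 0.95):
    print(prob, confidence_label(prob, threshold=0.2))
```

Because the margin is rescaled by `1 - threshold`, a disease with a low threshold needs a larger absolute margin to be labelled HIGH than a disease with a high threshold.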

best_model_final.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c786618c34a3bb1afa575f26d2f7b814d2ec5fd7a70d354308cd593f8c5ab913
+size 19900048

gradcam_utils.py ADDED

	@@ -0,0 +1,187 @@

+"""
+Grad-CAM Implementation for Chest X-Ray Classification
+========================================================
+Visualizes which regions of the X-ray the model focuses on when making predictions.
+Reference: Selvaraju et al. (2017) - Grad-CAM: Visual Explanations from Deep Networks
+"""
+import tensorflow as tf
+import numpy as np
+import cv2
+from PIL import Image
+def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
+    """
+    Generate Grad-CAM heatmap for a given image and prediction.
+    Args:
+        img_array: Preprocessed image (batch_size, height, width, channels)
+        model: Trained Keras model
+        last_conv_layer_name: Name of last convolutional layer
+        pred_index: Target class index (if None, uses predicted class)
+    Returns:
+        heatmap: Normalized heatmap (0-1 range)
+    """
+    # Create a model that maps the input image to the activations of the last conv layer
+    # as well as the output predictions
+    grad_model = tf.keras.models.Model(
+        [model.inputs],
+        [model.get_layer(last_conv_layer_name).output, model.output]
+    )
+    # Compute the gradient of the top predicted class for our input image
+    # with respect to the activations of the last conv layer
+    with tf.GradientTape() as tape:
+        last_conv_layer_output, preds = grad_model(img_array)
+        if pred_index is None:
+            pred_index = tf.argmax(preds[0])
+        class_channel = preds[:, pred_index]
+    # Gradient of the output neuron with regard to the output feature map of the last conv layer
+    grads = tape.gradient(class_channel, last_conv_layer_output)
+    # Vector where each entry is the mean intensity of the gradient over a specific feature map channel
+    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
+    # Multiply each channel in the feature map array by "how important this channel is"
+    last_conv_layer_output = last_conv_layer_output[0]
+    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
+    heatmap = tf.squeeze(heatmap)
+    # Normalize the heatmap between 0 & 1 for visualization
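+    heatmap = tf.maximum(heatmap, 0)
+    # Small epsilon avoids 0/0 (NaN) when no activation is positive
+    heatmap = heatmap / (tf.math.reduce_max(heatmap) + tf.keras.backend.epsilon())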
+    return heatmap.numpy()
+def overlay_heatmap_on_image(img, heatmap, alpha=0.4, colormap=cv2.COLORMAP_JET):
+    """
+    Overlay Grad-CAM heatmap on original image.
+    Args:
+        img: Original PIL Image or numpy array
+        heatmap: Grad-CAM heatmap (0-1 range)
+        alpha: Transparency of heatmap overlay (0-1)
+        colormap: OpenCV colormap (default: JET - red=hot, blue=cold)
+    Returns:
+        superimposed_img: PIL Image with heatmap overlay
+    """
+    # Convert PIL to numpy if needed
+    if isinstance(img, Image.Image):
+        img = np.array(img)
+    # Resize heatmap to match image size
+    heatmap_resized = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
+    # Convert heatmap to RGB
+    heatmap_colored = np.uint8(255 * heatmap_resized)
+    heatmap_colored = cv2.applyColorMap(heatmap_colored, colormap)
+    heatmap_colored = cv2.cvtColor(heatmap_colored, cv2.COLOR_BGR2RGB)
+    # Superimpose the heatmap on original image
+    superimposed_img = heatmap_colored * alpha + img * (1 - alpha)
+    superimposed_img = np.uint8(superimposed_img)
+    return Image.fromarray(superimposed_img)
+def generate_gradcam_for_disease(image, model, disease_name, label_encoder,
+                                  last_conv_layer_name='top_conv', img_size=224):
+    """
+    Generate Grad-CAM visualization for a specific disease prediction.
+    Args:
+        image: PIL Image
+        model: Trained model
+        disease_name: Name of disease to visualize
+        label_encoder: Disease name -> index mapping
+        last_conv_layer_name: Name of last conv layer in EfficientNetB0
+        img_size: Input image size
+    Returns:
+        overlaid_image: PIL Image with Grad-CAM overlay
+        heatmap: Raw heatmap array
+    """
+    # Preprocess image
+    img_resized = image.convert('RGB').resize((img_size, img_size))
+    img_array = np.array(img_resized) / 255.0
+    img_array = np.expand_dims(img_array, axis=0).astype(np.float32)
+    # Get disease index
+    disease_idx = label_encoder[disease_name]
+    # Generate heatmap
+    heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name, disease_idx)
+    # Overlay on original image
+    overlaid_image = overlay_heatmap_on_image(img_resized, heatmap, alpha=0.4)
+    return overlaid_image, heatmap
+def generate_gradcam_for_top_predictions(image, model, predictions, label_encoder,
+                                          top_k=3, last_conv_layer_name='top_conv'):
+    """
+    Generate Grad-CAM for top K predicted diseases.
+    Args:
+        image: PIL Image
+        model: Trained model
+        predictions: List of prediction dicts from main app
+        label_encoder: Disease name -> index mapping
+        top_k: Number of top predictions to visualize
+        last_conv_layer_name: Name of last conv layer
+    Returns:
+        gradcam_images: List of (disease_name, overlaid_image, probability) tuples
+    """
+    gradcam_images = []
+    # Sort predictions by probability
+    sorted_preds = sorted(predictions, key=lambda x: x['probability'], reverse=True)[:top_k]
+    for pred in sorted_preds:
+        disease_name = pred['disease']
+        probability = pred['probability']
+        # Generate Grad-CAM
+        overlaid_img, _ = generate_gradcam_for_disease(
+            image, model, disease_name, label_encoder, last_conv_layer_name
+        )
+        gradcam_images.append((disease_name, overlaid_img, probability))
+    return gradcam_images
+def get_last_conv_layer_name(model):
+    """
+    Automatically find the last convolutional layer in the model.
+    For EfficientNetB0, it's typically 'top_conv' or the last Conv2D layer.
+    Args:
+        model: Keras model
+    Returns:
+        layer_name: Name of last conv layer
+    """
+    # Try common names first
+    common_names = ['top_conv', 'block7a_project_conv', 'conv_head']
+    for name in common_names:
+        try:
+            model.get_layer(name)
+            return name
+        except ValueError:
+            pass
+    # Search backwards for Conv2D layer
+    for layer in reversed(model.layers):
+        if isinstance(layer, tf.keras.layers.Conv2D):
+            return layer.name
+    raise ValueError("No convolutional layer found in model!")
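The core arithmetic of `make_gradcam_heatmap` in gradcam_utils.py above can be checked without TensorFlow. The sketch below assumes the conv activations and their gradients are already available as arrays; `gradcam_from_arrays` and the toy values are hypothetical.

```python
import numpy as np

def gradcam_from_arrays(activations, grads, eps=1e-7):
    """Grad-CAM heatmap from conv activations and gradients, both shaped (H, W, C)."""
    weights = grads.mean(axis=(0, 1))    # one importance weight per channel
    heatmap = activations @ weights      # weighted sum over channels -> (H, W)
    heatmap = np.maximum(heatmap, 0.0)   # ReLU: keep positive evidence only
    return heatmap / (heatmap.max() + eps)  # scale to [0, 1]; eps avoids 0/0

# Toy feature map: 2x2 spatial grid, 2 channels (values are made up)
activations = np.array([[[1.0, 0.0], [0.0, 1.0]],
                        [[2.0, 0.0], [0.0, 0.0]]])
# Channel 0 supports the class (gradient +1), channel 1 opposes it (gradient -1)
grads = np.ones((2, 2, 2)) * np.array([1.0, -1.0])

heatmap = gradcam_from_arrays(activations, grads)
print(heatmap.round(3))
```

The location where only the opposing channel fires is clipped to zero by the ReLU, which is why Grad-CAM maps highlight evidence for a class rather than against it.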

label_encoder.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1a741eb91d4f54ad79acc4df4cb6f2ab3b91de9bae12e1639b84037a56c5d008
+size 234

optimal_thresholds.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b72ad6220549ae86dcb8800ffe62fc9362cbeeb75d32db2e484bae77aab25338
+size 514

requirements.txt ADDED

	@@ -0,0 +1,10 @@

+tensorflow>=2.10.0,<2.11.0
+numpy>=1.23.0,<1.24.0
+pandas>=2.0.0
+matplotlib>=3.7.0
+seaborn>=0.12.0
+scikit-learn>=1.3.0
+jupyter>=1.0.0
+Pillow>=9.5.0
+opencv-python>=4.7.0
+gradio>=4.0.0