import streamlit as st
import pandas as pd
import os
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.set_page_config(layout="wide")
st.title("🩺 Diabetic Retinopathy Project")

# Tabs
tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])

# =============================
# Tab 1: Dataset Information
# =============================
with tab1:
    st.markdown("""
    ### 🧾 Dataset Overview

    **Dataset Description:**

    The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
    - **No_DR**
    - **Mild**
    - **Moderate**
    - **Severe**
    - **Proliferative_DR**

    After removing poor-quality images and black backgrounds, **12,521 images** remain.

    [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
     
    ### 🧪 Data Preparation & Splitting

    - All images resized to **224x224**
    - **70% Training**, **30% Testing** (stratified by class)
    """)

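# --- Illustrative sketch (not the original training code) ---
# The 70/30 stratified split described in Tab 1 could be reproduced roughly as
# shown here. Assumptions: the helper name `stratified_split`, the use of the
# 'diagnosis' column as the class key, and the fixed seed are hypothetical
# choices; the frame is assumed to have a unique index.
import pandas as pd  # repeated so the sketch is self-contained


def stratified_split(frame, label_col="diagnosis", train_frac=0.7, seed=42):
    """Return (train, test) frames that keep each class's share roughly equal."""
    # Draw train_frac of the rows from every class independently.
    train = frame.groupby(label_col, group_keys=False).sample(
        frac=train_frac, random_state=seed
    )
    # Every row not drawn for training goes to the test set.
    test = frame.drop(train.index)
    return train, test
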
# =============================
# Tab 2: Training Visualization
# =============================
with tab2:
    st.markdown("### 📊 Training Data Class Distribution")
    
    # CSV path and image folder path (adjust as needed)
    CSV_PATH = "./dataset/DR_grading.csv"
    IMG_FOLDER = "./dataset/images"  # Folder where all images are stored

    # Load CSV
    df = pd.read_csv(CSV_PATH)

    # Map the numeric 'diagnosis' grade (0 to 4) to a readable 'label' column
    label_map = {
        0: "No_DR",
        1: "Mild",
        2: "Moderate",
        3: "Severe",
        4: "Proliferative_DR"
    }
    df['label'] = df['diagnosis'].map(label_map)

    # --- Section 1: Full Dataset Table ---
    st.subheader("1️⃣ Full Dataset Table")
    st.dataframe(df, use_container_width=True)

    # --- Section 2: Class Distribution ---
    st.subheader("2️⃣ Class Distribution")
    class_counts = df['label'].value_counts().reset_index()
    class_counts.columns = ['Class', 'Count']

    fig1, ax1 = plt.subplots()
    sns.barplot(data=class_counts, x='Class', y='Count', palette='rocket', ax=ax1)
    ax1.set_title("Class Distribution")
    st.pyplot(fig1)

    # --- Section 3: Sample Images Per Class ---
    st.subheader("3️⃣ Sample Images Per Class")

    cols = st.columns(len(class_counts))
    for i, label in enumerate(class_counts['Class']):
        sample_row = df[df['label'] == label].iloc[0]  # First row of this class
        # id_code is used as-is as the filename, so it must include the file extension
        img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])
        if os.path.exists(img_path):
            image = Image.open(img_path)
            cols[i].image(image, caption=label, use_container_width=True)
        else:
            cols[i].write(f"Image not found: {sample_row['id_code']}")
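# --- Illustrative sketch (not the original evaluation code) ---
# One common way to compute top-k accuracy figures such as those reported in
# Tab 3. Assumptions: `top_k_accuracy` is a hypothetical helper, `probs` is an
# (n_samples, n_classes) array of model scores, and `labels` holds integer
# class ids. With this definition the value cannot decrease as k grows.
import numpy as np  # repeated so the sketch is self-contained


def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    # Column indices of the k highest-scoring classes for each sample.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    # A sample is a hit when its true label appears among those k classes.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
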
# =============================
# Tab 3: Algorithm Used
# =============================
with tab3:
    st.markdown("""
    ### 🤖 Model and Algorithm

    We used **Transfer Learning** with **DenseNet121** for DR classification.

    #### 🏗️ Model Details:
    - Model: **DenseNet121** (pretrained on **ImageNet**)
    - Input Image Size: **224x224**
    - Batch Size: **32**
    - Optimizer: **AdamW** (learning rate = **1e-3**)
    - Loss Function: **Categorical Crossentropy**
    - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**

    #### 📊 Evaluation Results:
    - **Top-1 Accuracy:** 85.0%
    - **Top-2 Accuracy:** 84.9%
    - **Top-3 Accuracy:** 84.6%

    #### 🖥️ Training Environment:
    - **Operating System:** Windows  
    - **Hardware:** CPU only (no GPU)  
    - **Epochs:** 15  
    - **Training Time:** ~41 minutes per epoch  

    Because training ran on a CPU, it was slower than it would have been on a GPU,
    so we trained for only 15 epochs to save time.

    DenseNet121 was selected because it passes features directly to deeper layers,
    which helps learning and reduces overfitting. This is especially useful for medical images such as eye scans.

    [📎 Reference](https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification)
    """)