import streamlit as st
import pandas as pd
import os
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

st.set_page_config(layout="wide")
st.title("🩺 Diabetic Retinopathy Project")

# Tabs
tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])

# =============================
# Tab 1: Dataset Information
# =============================
with tab1:
    st.markdown("""
    ### 🧾 Dataset Overview

    **Dataset Description:**

    The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
    - **No_DR**
    - **Mild**
    - **Moderate**
    - **Severe**
    - **Proliferative_DR**

    After poor-quality images and black backgrounds were removed, **12,521 images** remain.

    [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)

    ### 🧪 Data Preparation & Splitting

    - All images resized to **224x224**
    - **70% Training**, **30% Testing** (stratified by class)
    """)

# =============================
# Tab 2: Training Visualization
# =============================
with tab2:
    st.markdown("### 📊 Training Data Class Distribution")
    
    # CSV path and image folder path (adjust as needed)
    CSV_PATH = "./dataset/DR_grading.csv"
    IMG_FOLDER = "./dataset/images"  # Folder where all images are stored

    # Load CSV
    df = pd.read_csv(CSV_PATH)

    # Map numeric 'diagnosis' grades (0 to 4) to readable class names; unmapped values become NaN
    label_map = {
        0: "No_DR",
        1: "Mild",
        2: "Moderate",
        3: "Severe",
        4: "Proliferative_DR"
    }
    df['label'] = df['diagnosis'].map(label_map)

    # --- Metric 1: Full Dataset Table ---
    st.subheader("1️⃣ Full Dataset Table")
    st.dataframe(df, use_container_width=True)

    # --- Metric 2: Class Distribution ---
    st.subheader("2️⃣ Class Distribution")
    class_counts = df['label'].value_counts().reset_index()
    class_counts.columns = ['Class', 'Count']

    fig1, ax1 = plt.subplots()
    sns.barplot(data=class_counts, x='Class', y='Count', palette='rocket', ax=ax1)
    ax1.set_title("Class Distribution")
    st.pyplot(fig1)
    plt.close(fig1)  # Release the figure so reruns do not accumulate open figures

    # --- Metric 3: Sample Images Per Class ---
    st.subheader("3️⃣ Sample Images Per Class")

    cols = st.columns(len(class_counts))
    for i, label in enumerate(class_counts['Class']):
        sample_row = df[df['label'] == label].iloc[0]  # Get first image of this class
        img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])  # Assumes id_code is the full image filename, extension included
        if os.path.exists(img_path):
            image = Image.open(img_path)
            cols[i].image(image, caption=label, use_container_width=True)
        else:
            cols[i].write(f"Image not found: {sample_row['id_code']}")

# =============================
# Tab 3: Algorithm Used
# =============================
with tab3:
    st.markdown("""
    ### 🤖 Model and Algorithm

    We used **Transfer Learning** with **DenseNet121** for DR classification.

    #### 🏗️ Model Details:
    - Model: **DenseNet121** (pretrained on **ImageNet**)
    - Input Image Size: **224x224**
    - Batch Size: **32**
    - Optimizer: **AdamW** (learning rate = **1e-3**)
    - Loss Function: **Categorical Crossentropy**
    - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**

    #### 📊 Evaluation Results:
    - **Top-1 Accuracy:** 85.0%
    - **Top-2 Accuracy:** 84.9%
    - **Top-3 Accuracy:** 84.6%

    #### 🖥️ Training Environment:
    - **Operating System:** Windows  
    - **Hardware:** CPU only (no GPU)  
    - **Epochs:** 15  
    - **Training Time:** ~41 minutes per epoch  

    Training ran on a CPU, which is much slower than a GPU,  
    so we limited training to 15 epochs to save time.

    DenseNet121 was selected because its dense connections pass each layer's feature maps directly to all later layers,  
    which encourages feature reuse and can reduce overfitting. This is especially useful for medical images such as fundus photographs.

    [📎 Reference](https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification)
    """)