added the dataset table
- pages/Dataset.py +32 -40
- training/training.ipynb +1 -1
pages/Dataset.py
CHANGED
@@ -4,6 +4,7 @@ import os
 from PIL import Image
 import matplotlib.pyplot as plt
 import seaborn as sns
+import numpy as np
 
 st.set_page_config(layout="wide")
 st.title("🩺 Diabetic Retinopathy Project")

@@ -21,16 +22,15 @@ with tab1:
 **Dataset Description:**
 
 The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
-
 - **No_DR**
 - **Mild**
 - **Moderate**
 - **Severe**
 - **Proliferative_DR**
 
-Poor-quality images were removed, and black backgrounds were deleted.
+Poor-quality images were removed, and black backgrounds were deleted. **12,521 images left**
 [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
-
+
 ### 🧪 Data Preparation & Splitting
 
 - All images resized to **224x224**

@@ -60,7 +60,11 @@ with tab2:
 }
 df['label'] = df['diagnosis'].map(label_map)
 
-# --- Metric 1:
+# --- Metric 1: Full Dataset Table ---
+st.subheader("3️⃣ Full Dataset Table")
+st.dataframe(df, use_container_width=True)
+
+# --- Metric 2: Class Distribution ---
 st.subheader("1️⃣ Class Distribution")
 class_counts = df['label'].value_counts().reset_index()
 class_counts.columns = ['Class', 'Count']

@@ -70,7 +74,7 @@ with tab2:
 ax1.set_title("Class Distribution")
 st.pyplot(fig1)
 
-# --- Metric
+# --- Metric 3: Sample Images Per Class ---
 st.subheader("2️⃣ Sample Images Per Class")
 
 cols = st.columns(len(class_counts))

@@ -82,36 +86,6 @@ with tab2:
 cols[i].image(image, caption=label, use_container_width=True)
 else:
 cols[i].write(f"Image not found: {sample_row['id_code']}")
-
-# --- Metric 3: Image Size Distribution ---
-st.subheader("3️⃣ Image Size Distribution")
-
-image_sizes = []
-
-# Check a few images per class for speed
-for label in class_counts['Class']:
-    sample_paths = df[df['label'] == label]['id_code'][:5]  # 5 images per class
-    for img_code in sample_paths:
-        img_path = os.path.join(IMG_FOLDER, str(img_code))  # Assuming image filenames are id_code.png
-        if os.path.exists(img_path):
-            try:
-                with Image.open(img_path) as img:
-                    image_sizes.append(img.size)
-            except Exception as e:
-                st.warning(f"Error loading image {img_code}: {e}")
-                pass
-
-if image_sizes:
-    widths, heights = zip(*image_sizes)
-    fig2, ax2 = plt.subplots()
-    sns.histplot(widths, kde=True, label="Width", color="blue")
-    sns.histplot(heights, kde=True, label="Height", color="green")
-    ax2.legend()
-    ax2.set_title("Image Size Distribution")
-    st.pyplot(fig2)
-else:
-    st.info("No image size data available. Check your paths.")
-
 # =============================
 # Tab 3: Algorithm Used
 # =============================

@@ -119,14 +93,32 @@ with tab3:
 st.markdown("""
 ### 🤖 Model and Algorithm
 
-We used **Transfer Learning** with **
+We used **Transfer Learning** with **DenseNet121** for DR classification.
 
 #### 🏗️ Model Details:
+- Model: **DenseNet121** (pretrained on **ImageNet**)
 - Input Image Size: **224x224**
--
-- Optimizer: **
+- Batch Size: **32**
+- Optimizer: **AdamW** (learning rate = **1e-3**)
 - Loss Function: **Categorical Crossentropy**
 - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
 
-
-
+#### 📊 Evaluation Results:
+- **Top-1 Accuracy:** 85.0%
+- **Top-2 Accuracy:** 84.9%
+- **Top-3 Accuracy:** 84.6%
+
+#### 🖥️ Training Environment:
+- **Operating System:** Windows
+- **Hardware:** CPU only (no GPU)
+- **Epochs:** 15
+- **Training Time:** ~41 minutes per epoch
+
+Since the training was done on a CPU, it was slower compared to using a GPU.
+Because of this, we only trained for 15 epochs to save time.
+
+DenseNet121 was selected because it passes features directly to deeper layers,
+which helps improve learning and reduces overfitting — especially useful in medical images like eye scans.
+https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification
+""")
+
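The new tab3 text reports Top-1/Top-2/Top-3 accuracy. The metric itself is not computed anywhere in this diff; a minimal numpy sketch of the standard definition (the prediction counts as correct if the true class is among the k highest-scored classes) is shown below. The `top_k_accuracy` helper and the toy scores are illustrative, not part of the commit. Note that under this definition the value can only grow with k, so Top-2/Top-3 figures below Top-1 would indicate a different computation.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scored classes."""
    # Indices of the k largest scores per row (order within the top k doesn't matter).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy scores for 4 samples over the 5 DR classes.
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true 0 -> top-1 hit
    [0.30, 0.40, 0.20, 0.05, 0.05],  # true 0 -> top-2 hit
    [0.05, 0.05, 0.10, 0.30, 0.50],  # true 3 -> top-2 hit
    [0.25, 0.25, 0.20, 0.15, 0.15],  # true 4 -> miss even at top-3
])
labels = np.array([0, 0, 3, 4])

print(top_k_accuracy(probs, labels, 1))  # 0.25
print(top_k_accuracy(probs, labels, 3))  # 0.75
```

The same helper applied to the model's softmax outputs and integer labels would reproduce the reported table.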
training/training.ipynb
CHANGED
@@ -271,7 +271,7 @@
 "id": "1e34f571",
 "metadata": {},
 "source": [
-"#### For the ESRGAN if applicable"
+"#### For the ESRGAN if applicable (Future)"
 ]
 },
 {