Aniket1409 committed on
Commit 885e185 · verified · 1 Parent(s): 50dcb3a

Update README.md

Files changed (1)
  1. README.md +39 -288
README.md CHANGED
@@ -1,288 +1,39 @@
- ---
- model_name: Plant Disease Scanner
- model_type: Image Classification
- license: cc-by-sa-4.0
- description: >-
- CNN model for classifying plant diseases from leaf images, detecting 38
- classes.
- intended_use:
- - Identify plant diseases
- - Provide treatment guides
- training_data:
- dataset_name: Plant Village Dataset
- dataset_link: https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset
- structure:
- - 87k RGB images of 38 types of leaves
- - 'Train: 70,295 images (80%)'
- - 'Valid: 17,572 images (20%)'
- evaluation_metrics:
- - Accuracy
- - Confusion Matrix
- additional_info:
- prerequisites:
- - Python 3.9
- - Anaconda3
- - NVIDIA GPU
- - TensorFlow 2.10
- installation: Follow instructions in the repository.
- model_architecture: 5 x [Conv2D + MaxPooling] with Dense layers.
- error_handling: Handles invalid images and missing data.
- language:
- - en
- metrics:
- - accuracy
- pipeline_tag: image-classification
- tags:
- - plant-disease
- - cnn
- - image-classification
- ---
-
-
- # Plant Disease Scanner
-
- # Overview
- This project uses a Convolutional Neural Network (CNN) to classify plant diseases based on leaf images from the Plant Village dataset.
- The model is trained to detect 38 different classes of plant diseases and healthy leaves.
-
- # Dataset
- - **Plant Village Dataset**:
- - [Kaggle Link](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset)
- - [Original GitHub](https://github.com/spMohanty/PlantVillage-Dataset)
-
- - **Dataset Structure**:
- - 87k RGB images of 38 types of crop leaves
- - **Train**: 70,295 images (80%)
- - **Valid**: 17,572 images (20%)
- - **Test**: 33 images for prediction
- - Subfolders named as `[plant.name_disease.name]` or `[plant.name_healthy]`
-
- # Prerequisites
- - Python 3.9
- - Anaconda3 2024.10-1 (64-bit)
- - NVIDIA GPU with latest drivers
- - Microsoft Visual C++ 2015-2022 (x64)
- - TensorFlow 2.10 (last version with GPU support)
-
- # Installation (Anaconda Prompt)
-
- nvidia-smi
- ### Shows GPU driver, current GPU usage & CUDA version
-
- conda create -n tensorflow_environment python==3.9
- ### Create new environment named 'tensorflow_environment' with Python 3.9
-
- conda activate tensorflow_environment
- ### Activate the created environment
-
- conda deactivate
- ### Deactivate the current environment (corrected from 'conda activate')
-
- conda env list
- ### List all available Conda environments
-
- conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
- ### Install CUDA Toolkit 11.2 and cuDNN 8.1.0 for GPU support
-
- python -m pip install --upgrade pip
- ### Upgrade pip to the latest version
-
- cd <folder_location>
- ### Change directory to the specified folder location
-
- pip install -r requirements.txt
- ### Install all Python libraries listed in requirements.txt
-
- ## Note:
- using pip install > CPU version gets installed
- ##
- using requirements.txt > TF detects CUDA installation [for GPU] > installs GPU version
-
- # Reference
- - *youtube tutorial*: https://www.youtube.com/playlist?list=PLvz5lCwTgdXDNcXEVwwHsb9DwjNXZGsoy
-
-
- # Model
-
- ## 1. Importing Libraries
-
- ### Python Libraries
- import matplotlib.pyplot as plt
- import seaborn as sns
- import tensorflow as tf
-
- ### Keras Components
- from tensorflow.keras.models import Sequential
- from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout
-
- ### Evaluation Metrics
- from sklearn.metrics import classification_report, confusion_matrix
-
-
- ## 2. Data Preprocessing [Image Data Loading]
- - Input dimensions: 256×256 RGB images
- - Batch size: 32 samples
-
-
- ## 3. Import CNN Model & Layers
-
- ### Sequential Model
- - A linear stack of layers which can be added one by one.
-
- ### Conv2D
- - A 2D convolutional layer to detect leaf features (edges, spots, textures).
-
- ### MaxPool2D
- - A pooling layer that reduces (halves) spatial dimensions from (126, 126, 32) to (63, 63, 32), further reducing computation.
-
- ### Flatten
- - Converts multi-dimensional data from MaxPool2D (6×6×256) into a 1D vector (9216 values) so it can be fed into a Dense (fully connected) layer.
-
- ### Dense (Fully Connected)
- - A regular fully-connected neural network layer.
-
-
- ## 4. CNN Architecture
-
- #### Sequential Model
- - Linear stack of layers added sequentially
- - Simple feed-forward architecture
-
- #### Conv2D (2D Convolutional Layer)
- - **Purpose**: Detects visual features (edges, spots, textures)
- - **Operation**: Applies learned filters across spatial dimensions
- - **Example**: Input (126, 126, 32) → Output (126, 126, 64)
-
- #### MaxPool2D (Max Pooling Layer)
- - **Purpose**: Reduces spatial dimensions while preserving important features
- - **Operation**: Takes maximum value from each window
- - **Example**: Input (126, 126, 64) → Output (63, 63, 64) [with 2×2 pooling]
-
- #### Flatten
- - **Purpose**: Converts multi-dimensional data to 1D vector
- - **Operation**: Reshapes (6, 6, 256) → (9216)
- - **Why**: Prepares data for dense layers
-
- #### Dense (Fully Connected Layer)
- - **Purpose**: Final classification layer
- - **Operation**: Each neuron connects to all inputs
- - **Typical Use**: Last layer with softmax activation for classification
-
- ## CNN Architecture Used: 5 x [Pairs of Conv2D + MaxPooling] to balance detail preservation and computational cost
-
-
- ## 5. Build CNN Layers
-
- - **feature map**: Resulting output
- - **filters**: Number of patterns (features) to learn
- - **kernel_size**: Size of the sliding window (window to scan the image)
- - **padding=same** [preserve image size]: Size of input image matches the size of feature matrix for each Conv2D layer
- - **padding=valid** [reduce flatten parameters to avoid overfitting]: Shrinks for each Conv2D layer
- - **strides**: Stepwise movement speed of the sliding window (in pixels)
- - **pool_size**: Window size (e.g., (2,2) halves dimensions)
- - **dropout()**: Regularization step which randomly drops a percentage of neurons during each training step
- - **relu**: If a leaf has a symptom - keeps the input. If no symptoms - ignores the input
- - **relu units**: Number of neurons looking for different patterns (e.g., spot, color changes). More units give more detailed detection (but slower)
- - **softmax**: Converts the detected features into probabilities for each class. Highest probability is the predicted class
- - **softmax units**: Number of possible diseases (classes)
-
- ## Gradual Filter Increase with Each Conv2D [32 → 256 filters: simple → complex features]
-
-
- ## 6. Compiling Model
-
- - **Adam(Adaptive Moment Estimation)**: optimization algorithm, adjusts learning rate adaptively to minimize prediction errors
- - **learning_rate**: adjusts model weights (patterns)
- - **categorical_crossentropy**: measures difference between model’s predictions and true labels
- - **accuracy**: % of correctly classified leaves
-
-
- ## 7. Model Summary
-
- - Each Conv2D reduces image size only slightly (e.g., 128×128 → 126×126)
- - Dense(1024) outputs 1024 high-level features for the final classification layer
- - Dense(38) outputs probabilities for 38 disease classes
-
- ## 8. Model Training
-
- - **epoch**: number of times the model will be trained, adjust till loss/accuracy becomes still
-
- ### Challenges and Fixes
-
- #### Overshooting
- - **Description**: Model updates weights too aggressively (due to high learning rate), missing the optimal solution.
- - **Signs of Overshooting**: Loss/accuracy fluctuates wildly during training.
- - **Fix**: Use a smaller learning rate (changed from default 0.001 to 0.0001).
-
- #### Overfitting
- - **Description**: Model memorizes specific leaf images but fails on new images.
- - **Signs of Overfitting**: High training accuracy, low validation accuracy.
- - **Fix**:
- - Add Dropout after dense layers.
- - Reduce the number of neurons (model size).
-
- ### Changes Made To The Model
-
- - Increased dense layer neurons from 1024 to 1500
- - Decreased learning rate size from adam default 0.001 to 0.0001
- - Added Dropouts after conv2d layers (25%) and dense layer (40%)
- - Added another conv2d layer with 512 filters to capture tiny disease signs (e.g., tiny lesions, texture changes)
- - Removed padding from second conv2d layer to boost training speed
-
- ### Before Changes
-
- - **Total params**: 10,649,414
- - **Trainable params**: 10,649,414
- - **Non-trainable params**: 0
-
- ### After Changes
-
- - **Total params**: 7,842,762
- - **Trainable params**: 7,842,762
- - **Non-trainable params**: 0
-
-
- ### Model Testing
-
- - Access class names of the dataset
- - Load the validation set for testing the model, then use it to predict classes
- - **Output**: 38 probabilities for 17572 images present in validation folder
- - Vertically calculate the maximum probability for each image
- - Iterate over test set
-
- ### Confusion Matrix
- - The confusion matrix is generated to evaluate the model's performance
-
- # CSV Data Exporter
- - Python script to export plant disease data to a CSV file
- - It generates `plant_disease_data.csv` with these columns:
- - Class Name
- - Disease
- - Symptoms
- - Treatment
-
-
- # Website Using Streamlit
-
- ## Plant Disease Scanner
- - A Streamlit app that identifies plant diseases from leaf photos using a trained TensorFlow model.
- - Camera & Upload: Snap a photo or upload leaf images (JPG/PNG)
- - AI Detection: Predicts diseases with confidence scores
- - Treatment Guide: Shows symptoms and solutions for detected diseases
-
- ## Required Files
- model.keras - Trained TensorFlow model
- combined_disease_data.csv - Disease database Class Name, Disease, Symptoms, Treatment columns
-
- ## How It Works
- Data Loading: Reads CSV into dictionary
- Prediction: Resizes images to 128x128, uses model.keras to predict disease class
-
- ## Results:
- - Disease name with confidence %
- - Expandable treatment guide
-
- ## Error Handling
- - Catches invalid/corrupt images
- - Handles missing disease data gracefully
- - Works best with clear leaf photos against neutral backgrounds.
 
+ ---
+ model_name: Plant Disease Scanner
+ model_type: Image Classification
+ license: cc-by-sa-4.0
+ description: >-
+ CNN model for classifying plant diseases from leaf images, detecting 38
+ classes.
+ intended_use:
+ - Identify plant diseases
+ - Provide treatment guides
+ training_data:
+ dataset_name: Plant Village Dataset
+ dataset_link: https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset
+ structure:
+ - 87k RGB images of 38 types of leaves
+ - 'Train: 70,295 images (80%)'
+ - 'Valid: 17,572 images (20%)'
+ evaluation_metrics:
+ - Accuracy
+ - Confusion Matrix
+ additional_info:
+ prerequisites:
+ - Python 3.9
+ - Anaconda3
+ - NVIDIA GPU
+ - TensorFlow 2.10
+ installation: Follow instructions in the repository.
+ model_architecture: 5 x [Conv2D + MaxPooling] with Dense layers.
+ error_handling: Handles invalid images and missing data.
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: image-classification
+ tags:
+ - plant-disease
+ - cnn
+ - image-classification
+ ---