charvi020 committed on
Commit
1b06c4d
·
verified ·
1 Parent(s): e8a079c

Model Card Readme

  1. README.md +256 -3
README.md CHANGED
@@ -1,3 +1,256 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ ---
6
+ # Model Cards: Driver Drowsiness Detection System
7
+
8
+ This repository contains the models developed for the Driver Drowsiness Detection System project. The goal is to improve vehicle safety by detecting signs of driver fatigue and drowsiness in real time using deep learning. The system uses two complementary approaches:
9
+ 1. **Facial Features Drowsiness Detection (Dataset 1):** Analyzes overall facial images for signs of drowsiness (e.g., yawning, general expression).
10
+ 2. **Eye Closure Drowsiness Detection (Dataset 2):** Specifically focuses on detecting whether the driver's eyes are open or closed.
11
+
12
+ The report suggests combining these approaches for a more robust system, potentially using MobileNetV2 for facial features and the tuned CNN for eye closure.
13
+
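The report does not say how the two models' outputs are combined. One simple possibility, sketched here under stated assumptions, is to turn each logit into a probability with a sigmoid and raise an alert when either probability crosses a threshold. The function name `is_drowsy`, the OR rule, the 0.5 thresholds, and the convention that a positive logit means "drowsy" or "closed" are all assumptions, not details from the report.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def is_drowsy(face_logit: float, eye_logit: float,
              face_threshold: float = 0.5, eye_threshold: float = 0.5) -> bool:
    """Hypothetical OR-fusion: alert if either model is confident enough.

    Assumes a positive logit means 'drowsy' (face model) or 'closed' (eye model).
    """
    face_prob = sigmoid(face_logit)
    eye_prob = sigmoid(eye_logit)
    return face_prob >= face_threshold or eye_prob >= eye_threshold


# The face model is confident, the eye model is not: the alert still fires.
alert = is_drowsy(face_logit=2.0, eye_logit=-3.0)
```

An OR rule favors catching drowsiness over avoiding false alerts; a real system would likely also smooth predictions over several consecutive frames.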
14
+ ---
15
+
16
+ ## Model Card: Facial Drowsiness Detection - Base CNN
17
+
18
+ * **Model File:** `trained_model_weights_BASE_DATASET1.pth`
19
+
20
+ ### Model Details
21
+ * **Description:** A custom Convolutional Neural Network (CNN) trained from scratch to classify facial images as 'Drowsy' or 'Natural' (alert). This is the initial baseline model for Dataset 1.
22
+ * **Architecture:** `Model_OurArchitecture` (4 Conv2D layers: 1->32, 32->64, 64->128, 128->128; MaxPool2D after first 3 Conv layers; 1 FC layer: 128*6*6 -> 256; Output FC layer: 256 -> 1; ReLU activations; Single Dropout(0.5) layer before final output).
23
+ * **Input:** 48x48 Grayscale images.
24
+ * **Output:** Single logit predicting drowsiness (Binary Classification).
25
+ * **Framework:** PyTorch.
26
+
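The layer list above can be written out as a PyTorch module. This is a sketch reconstructed from the card, not the project's actual `Model_OurArchitecture` source: the 3x3 kernels with padding 1 are assumptions (chosen because they reproduce the stated `128*6*6` flatten size), and the class and attribute names are hypothetical, so the released `.pth` weights will not necessarily load into it.

```python
import torch
import torch.nn as nn


class OurArchitectureSketch(nn.Module):
    """Sketch of the described CNN; kernel size 3 and padding 1 are assumptions."""

    def __init__(self, dropout: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 48 -> 24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 24 -> 12
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 6
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),                  # stays 6x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
            nn.Dropout(dropout),   # single dropout layer before the output
            nn.Linear(256, 1),     # one logit for binary classification
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = OurArchitectureSketch().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 48, 48))  # batch of two 48x48 grayscale images
print(logits.shape)  # torch.Size([2, 1])
```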
27
+ ### Intended Use
28
+ * Intended for detecting drowsiness based on static facial images. Serves as a baseline for comparison.
29
+ * **Not recommended for deployment due to significant overfitting.**
30
+
31
+ ### Training Data
32
+ * **Dataset:** Drowsy Detection Dataset ([Kaggle Link](https://www.kaggle.com/datasets/yasharjebraeily/drowsy-detection-dataset))
33
+ * **Classes:** DROWSY, NATURAL.
34
+ * **Size:** 5,859 training images.
35
+ * **Preprocessing:** Resize (48x48), Grayscale, ToTensor, Normalize (calculated mean/std from dataset), RandomHorizontalFlip.
36
+
37
+ ### Evaluation Data
38
+ * **Dataset:** Test split of the Drowsy Detection Dataset.
39
+ * **Size:** 1,483 testing images.
40
+ * **Preprocessing:** Resize (48x48), Grayscale, ToTensor, Normalize (same as training).
41
+
42
+ ### Quantitative Analyses
43
+ * **Training Performance:** Accuracy: 99.51%, Loss: 0.0148
44
+ * **Evaluation Performance:** Accuracy: 86.24%, Loss: 0.9170
45
+ * **Metrics:** Accuracy, binary cross-entropy with logits (`BCEWithLogitsLoss`).
46
+
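For reference, the two metrics can be computed by hand. The sketch below uses the standard numerically stable form of binary cross-entropy with logits and assumes that a prediction counts as positive when the logit is above 0 (probability above 0.5); the card does not state the decision threshold, and the sample logits are made up for illustration.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Stable per-sample loss: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def accuracy(logits, targets) -> float:
    """Fraction of samples where (logit > 0) matches the 0/1 label."""
    correct = sum((x > 0) == (y == 1) for x, y in zip(logits, targets))
    return correct / len(targets)


# Illustrative values only, not taken from the report.
sample_logits = [2.0, -1.0, 0.5, -3.0]
sample_targets = [1, 0, 0, 0]

acc = accuracy(sample_logits, sample_targets)
mean_loss = sum(bce_with_logits(x, y)
                for x, y in zip(sample_logits, sample_targets)) / len(sample_targets)
confident_mistake = bce_with_logits(5.0, 0)  # slightly above 5
```

Because the loss grows roughly linearly with the logit of a confident wrong prediction, a model can pair a moderate accuracy with a high loss, which is consistent with the 0.9170 evaluation loss reported above.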
47
+ ### Limitations and Ethical Considerations
48
+ * **Overfitting:** Shows significant overfitting: 99.51% training accuracy versus 86.24% test accuracy, a gap of 13.27 percentage points. Generalizes poorly to unseen data.
49
+ * **Bias:** Performance may vary across different demographics, lighting conditions, camera angles, and accessories (e.g., glasses) not equally represented in the dataset.
50
+ * **Misuse Potential:** Could be used for surveillance, though not designed for it. False negatives (missing drowsiness) could lead to accidents; false positives (incorrect alerts) could be annoying or lead to user distrust.
51
+
52
+ ---
53
+
54
+ ## Model Card: Facial Drowsiness Detection - Base CNN + Dropout
55
+
56
+ * **Model File:** `trained_model_weights_BASE_DROPOUT_DATASET1.pth`
57
+
58
+ ### Model Details
59
+ * **Description:** The same custom CNN architecture as the base model (`Model_OurArchitecture`) but explicitly trained *with* the described dropout layer active to mitigate overfitting observed in the baseline.
60
+ * **Architecture:** `Model_OurArchitecture` (As described above, including the Dropout(0.5) layer).
61
+ * **Input:** 48x48 Grayscale images.
62
+ * **Output:** Single logit predicting drowsiness.
63
+ * **Framework:** PyTorch.
64
+
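As a reminder of what an "active" dropout layer does, here is a plain-Python sketch of inverted dropout, the scheme `nn.Dropout` follows: during training each value is zeroed with probability `p` and the survivors are scaled by `1/(1-p)`, while at evaluation time the input passes through unchanged. The function name and the list-based interface are illustrative only.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout over a list of floats (illustrative, not the project's code)."""
    if not training or p == 0.0:
        return list(values)          # evaluation mode: identity
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)          # keeps the expected activation unchanged
    return [v * scale if rng.random() >= p else 0.0 for v in values]


activations = [1.0] * 1000
train_out = dropout(activations, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(activations, p=0.5, training=False)
```

In PyTorch the same switch is made with `model.train()` and `model.eval()`, so evaluation metrics should always be measured after calling `model.eval()`.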
65
+ ### Intended Use
66
+ * Intended for detecting drowsiness based on static facial images. Shows improvement over the baseline by using dropout for regularization.
67
+ * Generalizes better than the baseline, but the transfer learning models still achieve higher test accuracy.
68
+
69
+ ### Training Data
70
+ * Same as the Base CNN model (Dataset 1).
71
+
72
+ ### Evaluation Data
73
+ * Same as the Base CNN model (Dataset 1).
74
+
75
+ ### Quantitative Analyses
76
+ * **Training Performance:** Accuracy: 96.36%, Loss: 0.0960
77
+ * **Evaluation Performance:** Accuracy: 90.42%, Loss: 0.1969
78
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
79
+
80
+ ### Limitations and Ethical Considerations
81
+ * **Overfitting Reduced:** The gap between training and test accuracy shrinks from 13.27 percentage points (baseline) to 5.94 (96.36% vs. 90.42%), but a gap remains.
82
+ * **Bias:** Same potential biases as the base model regarding demographics, lighting, etc.
83
+ * **Misuse Potential:** Same as the base model.
84
+
85
+ ---
86
+
87
+ ## Model Card: Facial Drowsiness Detection - Base CNN + Dropout + Early Stopping
88
+
89
+ * **Model File:** `trained_model_weights_BASE_DROPOUT_EARLYSTOPPING_DATASET1.pth`
90
+
91
+ ### Model Details
92
+ * **Description:** The same custom CNN architecture (`Model_OurArchitecture` with dropout), trained with dropout and early stopping (patience=5) to further limit overfitting. Training stopped at epoch 9 of the 25 planned epochs.
93
+ * **Architecture:** `Model_OurArchitecture` (As described above, including the Dropout(0.5) layer).
94
+ * **Input:** 48x48 Grayscale images.
95
+ * **Output:** Single logit predicting drowsiness.
96
+ * **Framework:** PyTorch.
97
+
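Early stopping with a patience of 5 can be sketched as follows. The card does not say which quantity was monitored, so validation loss is an assumption here, as are the class name, the `min_delta` parameter, and the loss values, which are invented purely to show the mechanism.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses, for illustration only (not from the report).
losses = [0.60, 0.40, 0.30, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31]
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In practice the weights from the best epoch are usually checkpointed and restored after stopping; whether the project did so is not stated in the card.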
98
+ ### Intended Use
99
+ * Intended for detecting drowsiness based on static facial images. This is the best-performing version of the custom CNN architecture (91.64% test accuracy), owing to dropout and early stopping.
100
+ * The gap between training and test accuracy (6.23 percentage points) is less than half the baseline's (13.27) and similar to the dropout-only model's (5.94).
101
+
102
+ ### Training Data
103
+ * Same as the Base CNN model (Dataset 1).
104
+
105
+ ### Evaluation Data
106
+ * Same as the Base CNN model (Dataset 1).
107
+
108
+ ### Quantitative Analyses
109
+ * **Best Training Performance (at Epoch 9):** Accuracy: 97.87%, Loss: 0.0617
110
+ * **Evaluation Performance:** Accuracy: 91.64%, Loss: 0.1899
111
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
112
+
113
+ ### Limitations and Ethical Considerations
114
+ * **Generalization:** While improved, may not perform as well as the best transfer learning models on diverse unseen data.
115
+ * **Bias:** Same potential biases as the base model.
116
+ * **Misuse Potential:** Same as the base model.
117
+
118
+ ---
119
+
120
+ ## Model Card: Facial Drowsiness Detection - Fine-tuned VGG16
121
+
122
+ * **Model File:** `trained_model_weights_VGG16_DATASET1.pth`
123
+
124
+ ### Model Details
125
+ * **Description:** A VGG16 model, pre-trained on ImageNet, fine-tuned for binary classification of facial images ('Drowsy' vs 'Natural') on Dataset 1.
126
+ * **Architecture:** Standard VGG16 architecture with the final fully connected layer replaced by a single output unit for binary classification.
127
+ * **Input:** 224x224 RGB images (Normalized using ImageNet stats).
128
+ * **Output:** Single logit predicting drowsiness.
129
+ * **Framework:** PyTorch.
130
+
131
+ ### Intended Use
132
+ * Detecting drowsiness from facial images. Leverages transfer learning for potentially better feature extraction and generalization than the custom CNN. Reaches 97.51% accuracy on the test set.
133
+
134
+ ### Training Data
135
+ * **Dataset:** Drowsy Detection Dataset ([Kaggle Link](https://www.kaggle.com/datasets/yasharjebraeily/drowsy-detection-dataset))
136
+ * **Classes:** DROWSY, NATURAL.
137
+ * **Size:** 5,859 training images.
138
+ * **Preprocessing:** Resize (224x224), RandomHorizontalFlip, ToTensor, Normalize (ImageNet mean/std).
139
+
140
+ ### Evaluation Data
141
+ * **Dataset:** Test split of the Drowsy Detection Dataset.
142
+ * **Size:** 1,483 testing images.
143
+ * **Preprocessing:** Resize (224x224), ToTensor, Normalize (ImageNet mean/std).
144
+
145
+ ### Quantitative Analyses
146
+ * **Training Performance:** Accuracy: 96.69%, Loss: 0.1067
147
+ * **Evaluation Performance:** Accuracy: 97.51%, Loss: 0.1033
148
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
149
+
150
+ ### Limitations and Ethical Considerations
151
+ * **Model Size:** VGG16 is relatively large, potentially impacting inference speed and deployment on resource-constrained devices.
152
+ * **Bias:** Potential biases inherited from ImageNet pre-training and the fine-tuning dataset (demographics, lighting, etc.).
153
+ * **Misuse Potential:** Same as the base model.
154
+
155
+ ---
156
+
157
+ ## Model Card: Facial Drowsiness Detection - Fine-tuned ResNet18
158
+
159
+ * **Model File:** `trained_model_weights_RESNET18_DATASET1.pth`
160
+
161
+ ### Model Details
162
+ * **Description:** A ResNet18 model, pre-trained on ImageNet, fine-tuned for binary classification of facial images ('Drowsy' vs 'Natural') on Dataset 1.
163
+ * **Architecture:** Standard ResNet18 architecture with the final fully connected layer replaced by a single output unit.
164
+ * **Input:** 224x224 RGB images (Normalized using ImageNet stats).
165
+ * **Output:** Single logit predicting drowsiness.
166
+ * **Framework:** PyTorch.
167
+
168
+ ### Intended Use
169
+ * Detecting drowsiness from facial images using transfer learning. Offers a balance between performance and model size compared to VGG16.
170
+
171
+ ### Training Data
172
+ * Same as the Fine-tuned VGG16 model (Dataset 1, 224x224 RGB, ImageNet Norm).
173
+
174
+ ### Evaluation Data
175
+ * Same as the Fine-tuned VGG16 model (Dataset 1 Test Set).
176
+
177
+ ### Quantitative Analyses
178
+ * **Training Performance:** Accuracy: 99.42%, Loss: 0.0197
179
+ * **Evaluation Performance:** Accuracy: 95.28%, Loss: 0.1118
180
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
181
+
182
+ ### Limitations and Ethical Considerations
183
+ * **Overfitting:** Training accuracy exceeds test accuracy by 4.14 percentage points (99.42% vs. 95.28%), a larger gap than for VGG16 (-0.82) or MobileNetV2 (0.62) on this task, indicating some overfitting.
184
+ * **Bias:** Potential biases from ImageNet and the fine-tuning dataset.
185
+ * **Misuse Potential:** Same as the base model.
186
+
187
+ ---
188
+
189
+ ## Model Card: Facial Drowsiness Detection - Fine-tuned MobileNetV2 (**Recommended for Facial Features**)
190
+
191
+ * **Model File:** `trained_model_weights_MOBILENETV2_DATASET1.pth`
192
+
193
+ ### Model Details
194
+ * **Description:** A MobileNetV2 model, pre-trained on ImageNet, fine-tuned for binary classification of facial images ('Drowsy' vs 'Natural') on Dataset 1. Achieved the highest test accuracy among models tested on Dataset 1.
195
+ * **Architecture:** Standard MobileNetV2 architecture with the final classifier layer replaced by a single output unit. Designed for efficiency.
196
+ * **Input:** 224x224 RGB images (Normalized using ImageNet stats).
197
+ * **Output:** Single logit predicting drowsiness.
198
+ * **Framework:** PyTorch.
199
+
200
+ ### Intended Use
201
+ * **Recommended model for facial drowsiness detection.** Offers high accuracy and efficiency, suitable for real-time applications.
202
+
203
+ ### Training Data
204
+ * Same as the Fine-tuned VGG16 model (Dataset 1, 224x224 RGB, ImageNet Norm).
205
+
206
+ ### Evaluation Data
207
+ * Same as the Fine-tuned VGG16 model (Dataset 1 Test Set).
208
+
209
+ ### Quantitative Analyses
210
+ * **Training Performance:** Accuracy: 99.61%, Loss: 0.0175
211
+ * **Evaluation Performance:** Accuracy: 98.99%, Loss: 0.0317
212
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
213
+
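Collecting the accuracies reported in the Dataset 1 cards above makes the comparison explicit. The snippet below only restates those numbers and computes the gap between training and test accuracy in percentage points; the variable names are illustrative.

```python
# (training accuracy %, test accuracy %) as reported in the cards above.
results = {
    "Base CNN": (99.51, 86.24),
    "Base CNN + Dropout": (96.36, 90.42),
    "Base CNN + Dropout + Early Stopping": (97.87, 91.64),
    "VGG16": (96.69, 97.51),
    "ResNet18": (99.42, 95.28),
    "MobileNetV2": (99.61, 98.99),
}

# Positive gap: the model scores higher on training data than on test data.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}
best_test = max(results, key=lambda name: results[name][1])

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:38s} test={results[name][1]:.2f}%  gap={gap:+.2f}")
```

MobileNetV2 has the highest test accuracy and a gap under one point, which is the basis for recommending it for facial features.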
214
+ ### Limitations and Ethical Considerations
215
+ * **Efficiency vs. Complexity:** While efficient, it might be less robust to extreme variations than larger models in some scenarios.
216
+ * **Bias:** Potential biases from ImageNet and the fine-tuning dataset.
217
+ * **Misuse Potential:** Same as the base model. Performance under challenging real-world conditions (e.g., poor lighting, partial occlusion) should be carefully validated.
218
+
219
+ ---
220
+
221
+ ## Model Card: Eye Closure Detection - Tuned CNN (**Recommended for Eye Closure**)
222
+
223
+ * **Model File:** `trained_model_weights_FINAL_DATASET2.pth`
224
+
225
+ ### Model Details
226
+ * **Description:** A custom CNN (`Model_NewArchitecture`) trained to detect whether eyes are 'Open' or 'Closed'. This model is the result of hyperparameter tuning (Adam optimizer, Dropout rate 0.5) on the baseline architecture for Dataset 2.
227
+ * **Architecture:** `Model_NewArchitecture` (4 Conv2D layers: 3->64, 64->128, 128->256, 256->256; MaxPool2D after first 3 Conv layers; 1 FC layer: 256*28*28 -> 512; Output FC layer: 512 -> 1; ReLU activations; Dropout(0.5) before final output).
228
+ * **Input:** 224x224 grayscale images replicated to 3 channels (`Grayscale(num_output_channels=3)`, matching the 3-channel first convolution), normalized using dataset statistics.
229
+ * **Output:** Single logit predicting eye closure (Binary Classification).
230
+ * **Framework:** PyTorch.
231
+
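The `256*28*28` flatten size follows from the input resolution and the three pooling steps. The arithmetic can be checked with a few lines of Python, assuming 3x3 convolutions with padding 1 and 2x2 max pooling with stride 2; the card does not state these hyperparameters, but they are consistent with the sizes it lists. The helper names are illustrative.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial size after a convolution (assumed 3x3, padding 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after max pooling (assumed 2x2, stride 2)."""
    return (size - kernel) // stride + 1


def flattened_features(input_size: int, channels: list, pooled_layers: int) -> int:
    """Flatten size after the conv stack; the first `pooled_layers` convs are pooled."""
    size = input_size
    for index in range(len(channels)):
        size = conv_out(size)
        if index < pooled_layers:
            size = pool_out(size)
    return channels[-1] * size * size


eye_features = flattened_features(224, [64, 128, 256, 256], pooled_layers=3)
face_features = flattened_features(48, [32, 64, 128, 128], pooled_layers=3)
fc_weights = eye_features * 512  # weights in the 256*28*28 -> 512 layer

print(eye_features, face_features, fc_weights)  # 200704 4608 102760448
```

Under these layer sizes the first fully connected layer alone holds about 102.8 million weights, so it dominates the size of this model.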
232
+ ### Intended Use
233
+ * **Recommended model for eye closure detection.** Specifically designed to classify eye state, intended to be used alongside the facial feature model for a more robust drowsiness detection system.
234
+
235
+ ### Training Data
236
+ * **Dataset:** Openned Closed Eyes Dataset ([Kaggle Link](https://www.kaggle.com/datasets/hazemfahmy/openned-closed-eyes/data)) - UnityEyes synthetic data.
237
+ * **Classes:** Opened, Closed.
238
+ * **Size:** 5,807 training images.
239
+ * **Preprocessing:** Resize (224x224), Grayscale (num_output_channels=3), Augmentations (RandomHorizontalFlip, RandomRotation(10), ColorJitter), ToTensor, Normalize (calculated mean/std from dataset).
240
+
241
+ ### Evaluation Data
242
+ * **Dataset:** Test split of the Openned Closed Eyes Dataset.
243
+ * **Size:** 4,232 testing images.
244
+ * **Preprocessing:** Resize (224x224), Grayscale (num_output_channels=3), ToTensor, Normalize (same as training).
245
+
246
+ ### Quantitative Analyses (Hyperparameter Tuned Model: Adam, Dropout 0.5)
247
+ * **Final Training Performance:** Accuracy: 95.52%, Loss: 0.1303 (report, table on p. 23)
248
+ * **Evaluation Performance:** Accuracy: 96.79%, Loss: 0.0935 (report, table on p. 23)
249
+ * **Metrics:** Accuracy, BCEWithLogitsLoss.
250
+
251
+ ### Limitations and Ethical Considerations
252
+ * **Synthetic Data:** Trained primarily on synthetic eye images (UnityEyes). Performance on diverse real-world eyes (different ethnicities, lighting, glasses, occlusions, extreme angles) still needs validation, since a domain gap between synthetic and real images may exist.
253
+ * **Bias:** Potential biases related to the distribution of eye types/states in the synthetic dataset.
254
+ * **Misuse Potential:** Could be part of a surveillance system monitoring eye state. False negatives/positives have safety implications as described for other models.
255
+
256
+ ---