Yoel125 committed on
Commit
b07a6ec
·
verified ·
1 Parent(s): dd683de

Update README.md

Files changed (1)
  1. README.md +5 -11
README.md CHANGED
@@ -148,27 +148,21 @@ Since the groups are almost equal, our model can learn from both types of data e
148
  # Part 8: Classification Model Evaluation & Results
149
  In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
150
  ### Model Performance Analysis:
151
- Part 8: Classification Model Evaluation & Results
152
- In this final analytical stage, I evaluated three different classifiers: Logistic Regression, Decision Tree, and Random Forest. To move beyond simple accuracy, I generated Confusion Matrices for all models to understand the business impact of their prediction errors—specifically focusing on the trade-off between False Positives (false delay alarms) and False Negatives (missed delays).
153
- #### Decision Tree Classifier
154
  Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
 
155
  Drawback: It suffered from an unacceptably high rate of False Positives (2,974). This means it predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" and unreliable for a stress-free passenger experience.
156
  ### Random Forest Classifier
157
  Performance: As an ensemble model, it improved upon the single tree, successfully identifying 5,902 True Positives while maintaining 9,596 True Negatives.
158
- Drawback: While it is a very balanced model, it still generated 1,470 False Positives. In a real-world application, this still represents a significant number of unnecessary false alarms sent to passengers.
159
 
160
- 3. Logistic Regression (The Business Winner)
161
 
 
162
  Performance: This model demonstrated exceptional reliability in identifying on-time flights, achieving the highest number of True Negatives (10,380) and predicting 5,636 True Positives.
163
 
164
  Business Value: Most importantly, it produced the lowest number of False Positives (only 686). While it missed some actual delays (3,079 False Negatives), it heavily minimizes false alarms.
165
-
166
- Final Conclusion & Business Logic:
167
- From a practical and entrepreneurial perspective, the cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
168
-
169
- Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
170
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
171
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
172
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
173
-
 
174
 
 
148
  # Part 8: Classification Model Evaluation & Results
149
  In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
150
  ### Model Performance Analysis:
 
 
 
151
  Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
152
+
153
  Drawback: It suffered from an unacceptably high rate of False Positives (2,974). This means it predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" and unreliable for a stress-free passenger experience.
154
  ### Random Forest Classifier
155
  Performance: As an ensemble model, it improved upon the single tree, successfully identifying 5,902 True Positives while maintaining 9,596 True Negatives.
 
156
 
157
+ Drawback: While it is a very balanced model, it still generated 1,470 False Positives. In a real-world application, this still represents a significant number of unnecessary false alarms sent to passengers.
158
 
159
+ ### Logistic Regression (The Business Winner)
160
  Performance: This model demonstrated exceptional reliability in identifying on-time flights, achieving the highest number of True Negatives (10,380) and predicting 5,636 True Positives.
161
 
162
  Business Value: Most importantly, it produced the lowest number of False Positives (only 686). While it missed some actual delays (3,079 False Negatives), it heavily minimizes false alarms.
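The four Logistic Regression counts quoted above are enough to turn the raw numbers into rates. The following is a minimal sketch, not code from the project: the function name `rates` and the choice of metrics are assumptions made for illustration, and only the four counts come from the README.

```python
# Confusion-matrix cells for Logistic Regression, as reported in the README.
tp, tn, fp, fn = 5636, 10380, 686, 3079


def rates(tp, tn, fp, fn):
    """Return accuracy, recall, and false-positive rate for one model.

    Hypothetical helper for illustration; not part of the original project.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,          # share of all flights classified correctly
        "recall": tp / (tp + fn),               # share of actual delays that were caught
        "false_positive_rate": fp / (fp + tn),  # share of on-time flights flagged as delayed
    }


lr_rates = rates(tp, tn, fp, fn)
for name, value in lr_rates.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the trade-off described above becomes explicit: the model flags only a small share of on-time flights, at the price of missing roughly a third of the real delays.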
 
 
 
 
 
163
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
164
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
165
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
166
+ The cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than that of a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
167
+ Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
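The claim that a delay warning is "highly likely to be accurate" corresponds to precision, TP / (TP + FP), which can be checked using only the True Positive and False Positive counts reported for each model. This is a rough sketch for illustration: the dictionary layout and the `precision` helper are assumptions, and the counts are copied from the text above.

```python
# (True Positives, False Positives) per model, as reported in the README.
counts = {
    "Decision Tree": (5989, 2974),
    "Random Forest": (5902, 1470),
    "Logistic Regression": (5636, 686),
}


def precision(tp, fp):
    """Fraction of issued delay warnings that were real delays."""
    return tp / (tp + fp)


# Hypothetical comparison for illustration; not code from the original project.
precisions = {name: precision(tp, fp) for name, (tp, fp) in counts.items()}
best = max(precisions, key=precisions.get)

for name, value in precisions.items():
    print(f"{name}: precision = {value:.3f}")
print(f"Highest precision: {best}")
```

Ranking by precision is one way to encode the stated business rule that false alarms are the costlier error; a model chosen by recall or by overall accuracy could come out differently.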
168