Yoel125 committed on
Commit dd683de · verified · 1 Parent(s): ae07063

Update README.md

Files changed (1)
  1. README.md +18 -4
README.md CHANGED
@@ -148,13 +148,27 @@ Since the groups are almost equal, our model can learn from both types of data e
148
  # Part 8: Classification Model Evaluation & Results
149
  In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
150
  ### Model Performance Analysis:
151
- Logistic Regression: This model served as a strong baseline, achieving the highest number of True Negatives (10,380). It is highly reliable at identifying on-time flights but struggles with a significant number of False Negatives (3,079), meaning it often misses actual delays.
152
 
153
- Decision Tree: While it captured the highest number of True Positives (5,989), it suffered from the highest rate of False Positives (2,974). This indicates that the single tree is prone to "over-detecting" delays, leading to many false alarms.
154
 
155
- Random Forest: This model provided the most balanced performance. It maintained a high count of True Negatives (9,596) while successfully identifying 5,902 delayed flights with significantly fewer false alarms than the Decision Tree.
156
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
157
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
158
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
159
- The Random Forest model is the overall winner for this task. It offers a superior trade-off between precision and recall, making it the most robust tool for predicting
160
 
 
148
  # Part 8: Classification Model Evaluation & Results
149
  In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
150
  ### Model Performance Analysis:
151
+ Part 8: Classification Model Evaluation & Results
152
+ In this final analytical stage, I evaluated three classifiers: Logistic Regression, Decision Tree, and Random Forest. To move beyond simple accuracy, I generated Confusion Matrices for all three models to understand the business impact of their prediction errors, focusing on the trade-off between False Positives (false delay alarms) and False Negatives (missed delays).
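The four cells of a binary confusion matrix can be tallied directly from true and predicted labels. The following is a minimal sketch, not the project's actual evaluation code (which this diff does not show): the function name `confusion_counts`, the 1 = delayed / 0 = on-time encoding, and the toy labels are all assumptions made for illustration. scikit-learn's `confusion_matrix` reports the same four counts.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Tally TN, FP, FN, TP for binary labels.

    Assumed encoding (hypothetical): 1 = delayed, 0 = on time.
    """
    tally = Counter(zip(y_true, y_pred))  # keys are (actual, predicted) pairs
    return {
        "TN": tally[(0, 0)],  # on time, predicted on time
        "FP": tally[(0, 1)],  # on time, but a delay was predicted (false alarm)
        "FN": tally[(1, 0)],  # delayed, but no delay was predicted (missed delay)
        "TP": tally[(1, 1)],  # delayed, predicted delayed
    }


# Toy labels for illustration only; they are not taken from the flight dataset.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```

With the toy labels above, one on-time flight is flagged as delayed (a False Positive) and one real delay is missed (a False Negative).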
153
+ #### Decision Tree Classifier
154
+ Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
155
+ Drawback: It also produced the most False Positives of the three models (2,974). It predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" to support a stress-free passenger experience.
156
+ #### Random Forest Classifier
157
+ Performance: As an ensemble model, it improved on the single tree's false-alarm count, identifying 5,902 True Positives while maintaining 9,596 True Negatives.
158
+ Drawback: Although it is a well-balanced model, it generated 1,470 False Positives. In a real-world application, that is still a significant number of unnecessary alarms sent to passengers.
159
 
160
+ #### Logistic Regression (The Business Winner)
161
 
162
+ Performance: This model was the most reliable at identifying on-time flights, achieving the highest number of True Negatives (10,380) while correctly predicting 5,636 delays (True Positives).
163
+
164
+ Business Value: Most importantly, it produced the fewest False Positives (only 686). Although it missed some actual delays (3,079 False Negatives), it keeps false alarms to a minimum.
165
+
166
+ #### Final Conclusion & Business Logic:
167
+ From a practical and entrepreneurial perspective, the cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than that of a False Negative (an ordinary delay that was not predicted). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
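The cost argument above can be made concrete by weighting each error type. This is only a sketch under stated assumptions. The weights `FP_COST` and `FN_COST` are hypothetical placeholders, not values from the project. The Decision Tree and Random Forest False Negative counts are not quoted in the README; they are derived here on the assumption that all three models were scored on the same test set.

```python
# Logistic Regression's FP and FN counts are quoted in the README.
# ASSUMPTION: all three models were evaluated on the same test set, so the
# number of actual delays (TP + FN) is the same for each model.
ACTUAL_DELAYS = 5636 + 3079  # Logistic Regression TP + FN

models = {
    "Logistic Regression": {"FP": 686, "FN": 3079},
    # FN values below are DERIVED (actual delays minus reported TP), not reported.
    "Decision Tree": {"FP": 2974, "FN": ACTUAL_DELAYS - 5989},
    "Random Forest": {"FP": 1470, "FN": ACTUAL_DELAYS - 5902},
}


def total_cost(counts, fp_cost, fn_cost):
    """Weighted sum of the two error types."""
    return fp_cost * counts["FP"] + fn_cost * counts["FN"]


# HYPOTHETICAL weights: a false alarm is treated as three times as costly
# as a missed delay. Replace with values that reflect the real application.
FP_COST, FN_COST = 3.0, 1.0

costs = {name: total_cost(c, FP_COST, FN_COST) for name, c in models.items()}
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:,.0f}")

best = min(costs, key=costs.get)
print("Lowest weighted cost:", best)
```

Under these illustrative weights, Logistic Regression has the lowest total cost. The ranking depends on the chosen weights, so they should be varied before drawing firm conclusions.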
168
+
169
+ Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
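The claim that a delay warning is likely to be accurate corresponds to precision, TP / (TP + FP). As a quick check, the sketch below computes it from the True Positive and False Positive counts reported above. The dictionary layout and the helper name `precision` are illustrative choices, not part of the project's code.

```python
# True Positive and False Positive counts as reported in the README.
reported = {
    "Logistic Regression": {"TP": 5636, "FP": 686},
    "Decision Tree": {"TP": 5989, "FP": 2974},
    "Random Forest": {"TP": 5902, "FP": 1470},
}


def precision(tp, fp):
    """Share of predicted delays that were actual delays."""
    return tp / (tp + fp)


precisions = {
    name: precision(c["TP"], c["FP"]) for name, c in reported.items()
}
for name, value in precisions.items():
    print(f"{name}: {value:.1%}")
```

On these counts, roughly 89% of Logistic Regression's delay warnings were correct, compared with about 80% for Random Forest and 67% for the Decision Tree.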
170
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
171
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
172
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
173
+
174