Yoel125
/

Assignment_2_data_science

Yoel125 committed on Apr 30

Commit

dd683de

verified ·

1 Parent(s): ae07063

Update README.md

Files changed (1)

README.md +18 -4

@@ -148,13 +148,27 @@ Since the groups are almost equal, our model can learn from both types of data e
 # Part 8: Classification Model Evaluation & Results
 In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
 ### Model Performance Analysis:
-Logistic Regression: This model served as a strong baseline, achieving the highest number of True Negatives (10,380). It is highly reliable at identifying on-time flights but struggles with a significant number of False Negatives (3,079), meaning it often misses actual delays.
-Decision Tree: While it captured the highest number of True Positives (5,989), it suffered from the highest rate of False Positives (2,974). This indicates that the single tree is prone to "over-detecting" delays, leading to many false alarms.
-Random Forest: This model provided the most balanced performance. It maintained a high count of True Negatives (9,596) while successfully identifying 5,902 delayed flights with significantly fewer false alarms than the Decision Tree.
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
-The Random Forest model is the overall winner for this task. It offers a superior trade-off between precision and recall, making it the most robust tool for predicting

 # Part 8: Classification Model Evaluation & Results
 In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
 ### Model Performance Analysis:
+Part 8: Classification Model Evaluation & Results
+In this final analytical stage, I evaluated three classifiers: Logistic Regression, Decision Tree, and Random Forest. To move beyond simple accuracy, I generated Confusion Matrices for all models to understand the business impact of their prediction errors, focusing specifically on the trade-off between False Positives (false delay alarms) and False Negatives (missed delays).
+#### Decision Tree Classifier
+Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
+Drawback: It produced the highest number of False Positives (2,974). It predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" to support a stress-free passenger experience.
+#### Random Forest Classifier
+Performance: As an ensemble model, it improved upon the single tree, successfully identifying 5,902 True Positives while maintaining 9,596 True Negatives.
+Drawback: While it is a very balanced model, it generated 1,470 False Positives. In a real-world application, this still represents a significant number of unnecessary false alarms sent to passengers.
+#### Logistic Regression (The Business Winner)
+Performance: This model demonstrated exceptional reliability in identifying on-time flights, achieving the highest number of True Negatives (10,380) and predicting 5,636 True Positives.
+Business Value: Most importantly, it produced the lowest number of False Positives (only 686). It caught the fewest actual delays of the three models (5,636 True Positives, with 3,079 False Negatives), but it keeps false alarms to a minimum.
+Final Conclusion & Business Logic:
+From a practical and entrepreneurial perspective, the cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than that of a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
+Therefore, despite Random Forest catching more actual delays (5,902 True Positives versus 5,636), Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
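
The precision/recall trade-off discussed in the updated README can be checked directly from the confusion-matrix counts it reports. The following is a minimal sketch, not the author's code. It assumes all three models were scored on the same test set, which is consistent with the reported figures (for both Logistic Regression and Random Forest, True Negatives plus False Positives sum to 11,066). Under that assumption, the False Negative counts for Random Forest and Decision Tree and the True Negative count for Decision Tree are derived rather than quoted. The `Confusion` class, the `MODELS` dictionary, and the cost weights `FP_COST` and `FN_COST` are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts, with "delayed" as the positive class."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        # Share of delay warnings that were real delays.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Share of real delays that triggered a warning.
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def cost(self, fp_cost: float, fn_cost: float) -> float:
        # Total weighted cost of both error types.
        return fp_cost * self.fp + fn_cost * self.fn


# Class totals taken from the Logistic Regression counts in the README.
# ASSUMPTION: every model was evaluated on this same test set.
TOTAL_POS = 5636 + 3079   # actual delays
TOTAL_NEG = 10380 + 686   # actual on-time flights

MODELS = {
    "Logistic Regression": Confusion(tp=5636, fp=686, fn=3079, tn=10380),
    # fn is derived from TOTAL_POS, not quoted in the README.
    "Random Forest": Confusion(tp=5902, fp=1470, fn=TOTAL_POS - 5902, tn=9596),
    # fn and tn are derived from the class totals, not quoted in the README.
    "Decision Tree": Confusion(
        tp=5989, fp=2974, fn=TOTAL_POS - 5989, tn=TOTAL_NEG - 2974
    ),
}

# Hypothetical weights: a false alarm is treated as 3x worse than a missed delay.
FP_COST = 3.0
FN_COST = 1.0

if __name__ == "__main__":
    for name, cm in MODELS.items():
        print(
            f"{name}: precision={cm.precision:.3f} "
            f"recall={cm.recall:.3f} f1={cm.f1:.3f} "
            f"cost={cm.cost(FP_COST, FN_COST):.0f}"
        )
```

The cost weights are the part most worth adjusting: the README argues that false alarms are more expensive than missed delays but does not quantify by how much, so the ranking should be re-checked with whatever ratio reflects the real product.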