Yoel125
/

Assignment_2_data_science

Yoel125 committed on Apr 30

Commit

b07a6ec

verified ·

1 Parent(s): dd683de

Update README.md

Files changed (1)

README.md +5 -11

README.md CHANGED

@@ -148,27 +148,21 @@ Since the groups are almost equal, our model can learn from both types of data e
 # Part 8: Classification Model Evaluation & Results
 In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
 ### Model Performance Analysis:
-Part 8: Classification Model Evaluation & Results
-In this final analytical stage, I evaluated three different classifiers: Logistic Regression, Decision Tree, and Random Forest. To move beyond simple accuracy, I generated Confusion Matrices for all models to understand the business impact of their prediction errors—specifically focusing on the trade-off between False Positives (false delay alarms) and False Negatives (missed delays).
-#### Decision Tree Classifier
 Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
 Drawback: It suffered from an unacceptably high rate of False Positives (2,974). This means it predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" and unreliable for a stress-free passenger experience.
 ### Random Forest Classifier
 Performance: As an ensemble model, it improved upon the single tree, successfully identifying 5,902 True Positives while maintaining 9,596 True Negatives.
-Drawback: While it is a very balanced model, it still generated 1,470 False Positives. In a real-world application, this still represents a significant number of unnecessary false alarms sent to passengers.
-3. Logistic Regression (The Business Winner)
 Performance: This model demonstrated exceptional reliability in identifying on-time flights, achieving the highest number of True Negatives (10,380) and predicting 5,636 True Positives.
 Business Value: Most importantly, it produced the lowest number of False Positives (only 686). While it missed some actual delays (3,079 False Negatives), it heavily minimizes false alarms.
-Final Conclusion & Business Logic:
-From a practical and entrepreneurial perspective, the cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
-Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)

 # Part 8: Classification Model Evaluation & Results
 In this final analytical stage, I evaluated the three trained classifiers using Confusion Matrices to understand their prediction patterns and error types.
 ### Model Performance Analysis:
 Performance: The Decision Tree captured the highest number of actual delays (5,989 True Positives).
 Drawback: It suffered from an unacceptably high rate of False Positives (2,974). This means it predicted a delay for nearly 3,000 flights that actually arrived on time, making it too "trigger-happy" and unreliable for a stress-free passenger experience.
 ### Random Forest Classifier
 Performance: As an ensemble model, it improved upon the single tree, successfully identifying 5,902 True Positives while maintaining 9,596 True Negatives.
+Drawback: While it is a very balanced model, it still generated 1,470 False Positives. In a real-world application, this still represents a significant number of unnecessary false alarms sent to passengers.
+### Logistic Regression (The Business Winner)
 Performance: This model demonstrated exceptional reliability in identifying on-time flights, achieving the highest number of True Negatives (10,380) and predicting 5,636 True Positives.
 Business Value: Most importantly, it produced the lowest number of False Positives (only 686). While it missed some actual delays (3,079 False Negatives), it heavily minimizes false alarms.
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/TZzzpOliTb8WGjzw8W9PM.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/X2R8v3yyBl3Op-6KrC9Ny.png)
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
+The cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
+Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.
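
The false-alarm argument above can be checked numerically. The following is a minimal sketch, not the project's actual evaluation code: it turns the confusion-matrix counts quoted in the text into precision (how often a delay warning is correct) and recall (how many real delays are caught). Only Logistic Regression has all four cells stated, so the missing cells for the other two models are derived under the assumption that all three classifiers were scored on the same test set. The helper names `complete`, `precision`, and `recall` are hypothetical.

```python
# Counts quoted in the README text. Only Logistic Regression
# has all four confusion-matrix cells stated explicitly.
logreg = {"tp": 5636, "fp": 686, "tn": 10380, "fn": 3079}

# ASSUMPTION: every model was evaluated on the same test set, so the
# class totals from Logistic Regression apply to the other models too.
actual_delayed = logreg["tp"] + logreg["fn"]   # 8715 delayed flights
actual_on_time = logreg["tn"] + logreg["fp"]   # 11066 on-time flights


def complete(tp, fp):
    """Fill in TN and FN from the shared class totals (derived, not quoted)."""
    return {"tp": tp, "fp": fp,
            "tn": actual_on_time - fp,
            "fn": actual_delayed - tp}


models = {
    "Decision Tree": complete(tp=5989, fp=2974),
    "Random Forest": complete(tp=5902, fp=1470),
    "Logistic Regression": logreg,
}

# Sanity check on the assumption: the derived Random Forest TN
# should match the 9,596 True Negatives quoted in the text.
assert models["Random Forest"]["tn"] == 9596


def precision(m):
    """Share of delay warnings that were real delays."""
    return m["tp"] / (m["tp"] + m["fp"])


def recall(m):
    """Share of real delays that triggered a warning."""
    return m["tp"] / (m["tp"] + m["fn"])


for name, m in models.items():
    print(f"{name:20s} precision={precision(m):.3f} recall={recall(m):.3f}")
```

Under these assumptions, Logistic Regression has the highest precision and the lowest recall of the three, which is the trade-off the conclusion accepts when it treats false alarms as the costlier error.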