Yoel125/Assignment_2_data_science


Yoel125 committed on Apr 30

Commit 28e48e2 (verified) · 1 Parent(s): d30cf12

Update README.md

Files changed (1)

README.md +2 -1


@@ -165,4 +165,5 @@ Business Value: Most importantly, it produced the lowest number of False Positiv
 ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oYWz2NUXdC0qmirSp50tU.png)
 The cost of a False Positive (alerting a passenger that their flight is delayed when it is actually on time) is much higher than a False Negative (a regular, unpredicted delay). False alarms cause unnecessary stress, disrupt travel plans, and damage trust in the application.
 Therefore, despite Random Forest having a slightly better overall balance, Logistic Regression is the chosen model for this project. It ensures that when the system issues a delay warning, it is highly likely to be accurate, thereby protecting the user experience.

+# Final Conclusion
+The analysis used K-Means clustering to define distinct passenger service profiles and to expose structure in the flight delay data. We concluded that the problem is better framed as binary classification (delayed vs. on time) than as direct regression on delay minutes. For this application, model quality is defined by high precision: minimizing false alarms takes priority over recall in order to maintain user trust. Logistic Regression produced the fewest False Positives, only 686 cases, which is fewer than the more complex ensemble methods on this metric, and it is therefore the selected model.
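
The precision-over-recall argument in the added conclusion can be illustrated with a small sketch. The labels below are made-up toy data, not the project's results, and the helper names (`confusion_counts`, `precision`, `recall`) are hypothetical. The sketch only shows how a cautious classifier trades recall for fewer False Positives.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = delayed, 0 = on time)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # false alarm: warned, but the flight was on time
        elif predicted == 0 and actual == 1:
            counts["fn"] += 1  # missed delay: no warning was issued
        else:
            counts["tn"] += 1
    return counts


def precision(counts):
    """Share of issued delay warnings that were correct."""
    issued = counts["tp"] + counts["fp"]
    return counts["tp"] / issued if issued else 0.0


def recall(counts):
    """Share of actual delays that triggered a warning."""
    actual_delays = counts["tp"] + counts["fn"]
    return counts["tp"] / actual_delays if actual_delays else 0.0


# Toy data (illustrative assumption, not taken from the README).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # warns rarely
eager = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]     # warns often

for name, y_pred in [("cautious", cautious), ("eager", eager)]:
    c = confusion_counts(y_true, y_pred)
    print(name, c, round(precision(c), 3), round(recall(c), 3))
```

In this toy case the cautious model raises no false alarms but misses half of the delays, while the eager model catches every delay at the cost of two false alarms. Which trade-off is preferable depends on the relative cost assigned to each error type, as the README argues.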