Yoel125 committed on
Commit
36adb68
·
verified ·
1 Parent(s): 936faf1

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -49,3 +49,24 @@ I calculated the percentage of outliers for each feature to understand how many
49
  #### Data Exploration: Answering Key Research Questions through Visualization
50
  Following the detection of outliers in flight distance, how extreme is their distribution and what impact might they have on the model's scaling?
51
  ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/oIXebKJ0KqMjckAlGjYn-.png)
52
+ This boxplot displays the distribution of flight distances and identifies extreme outliers that could distort the model's data scaling.
53
+ It serves as visual evidence for the capping strategy needed to ensure data quality and better performance in future modeling.
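
The README does not say how the capping is done. As a minimal sketch, assuming Tukey fences (1.5 × IQR) are used as the caps and using made-up distances, the idea could look like this; `cap_outliers_iqr` and the sample values are hypothetical:

```python
import statistics

def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences).

    Assumption: the README only mentions "capping"; the IQR rule is one
    common choice, not necessarily the one used in the project.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are returned unchanged.
    return [min(max(v, lower), upper) for v in values]

# Hypothetical flight distances with one extreme value.
distances = [200, 350, 400, 450, 500, 550, 600, 4900]
capped = cap_outliers_iqr(distances)
print(capped)  # the 4900 entry is pulled down to the upper fence
```

Capping keeps every row in the dataset, which matters when the outliers are real flights rather than data-entry errors.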
54
+
55
+ Following the detection of outliers in flight delays, how are departure and arrival delays distributed, and what do these extreme values indicate about the data set?
56
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/wWj_FKGb-0Et3ebK54Csh.png)
57
+ Both plots show a highly right-skewed distribution with extreme outliers reaching 1,600 minutes, meaning most flights are on time while a few have massive delays.
58
+ Departure and arrival delays are strongly correlated, and the extreme outliers in both should be handled (for example by capping or a log transformation) to improve regression accuracy.
59
+ While these extreme values can skew numerical predictions in regression, they are easier to handle in classification tasks where the goal is binary status prediction.
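
To illustrate the two options mentioned above, here is a rough sketch using made-up delay values. The 15-minute cutoff for the binary "delayed" label is an assumption for illustration only; the README does not define one.

```python
import math

# Hypothetical delays in minutes, right-skewed with one extreme value.
delays = [0, 0, 5, 12, 30, 1600]

# Option 1 (regression): log1p compresses the long right tail and is
# defined at 0, unlike a plain log.
log_delays = [math.log1p(d) for d in delays]

# Option 2 (classification): a binary status ignores how extreme a delay is.
DELAY_THRESHOLD_MIN = 15  # assumed cutoff, not taken from the README
is_delayed = [d > DELAY_THRESHOLD_MIN for d in delays]

print([round(v, 2) for v in log_delays])
print(is_delayed)
```

After the transformation the 1,600-minute delay is only a few units away from the rest of the data, while in the binary label it is indistinguishable from a 30-minute delay.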
60
+
61
+ How are the satisfaction ratings distributed, and what does the presence of '0' values on a 1-5 scale indicate about data quality?
62
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/wgnKWFRPS--OgH--lLnEK.png)
63
+ Most ratings are concentrated between 4 and 5, but the plot also confirms the presence of '0' values, which fall outside the 1-5 scale, across various service categories.
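
The visual check can be backed by a simple count. The sketch below assumes the ratings are available as one list per service category; the column names and values are hypothetical:

```python
# Hypothetical rating columns on a nominal 1-5 scale.
ratings = {
    "seat_comfort": [4, 5, 0, 3, 5],
    "inflight_wifi": [0, 0, 4, 5, 2],
}

def zero_percentage(column):
    """Return the percentage of entries equal to 0 in a rating column."""
    return 100 * sum(1 for r in column if r == 0) / len(column)

zero_pct = {name: zero_percentage(col) for name, col in ratings.items()}
print(zero_pct)
```

A per-category percentage makes it easier to judge where the out-of-scale '0' values are concentrated.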
64
+
65
+ What is the correlation between departure and arrival delays, and how do extreme outliers reflect unusual flight patterns?
66
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/qJy6FKjaVxlonbAhShNcf.png)
67
+ This scatter plot shows a strong positive correlation between departure and arrival delays, while highlighting how extreme outliers deviate from the main cluster.
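
The strength of the relationship can also be quantified. This is a minimal sketch of the Pearson correlation coefficient on made-up delay pairs; the data and the `pearson` helper are illustrative, not taken from the project:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical departure and arrival delays in minutes.
departure = [0, 5, 10, 20, 60, 120]
arrival = [0, 3, 12, 18, 65, 115]

r = pearson(departure, arrival)
print(round(r, 3))
```

A value close to 1 matches the tight diagonal band seen in the scatter plot. Note that Pearson's r is itself sensitive to extreme points, so it is worth comparing the value before and after outlier handling.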
68
+
69
+ What is the correlation between departure and arrival delays, and how does cleaning the extreme outliers clarify the flight patterns?
70
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/69c79aa8f856b118f80df631/udUMbiqphRopYggAj7llT.png)
71
+ After cleaning the data, the scatter plot now displays a much clearer and more reliable linear relationship between the two types of delays.
72
+ Removing the extreme anomalies allows us to visualize the core data patterns that will be used for our predictive modeling.