Update README.md
README.md
@@ -49,3 +49,24 @@ I calculated the percentage of outliers for each feature to understand how many
#### Data Exploration: Answering Key Research Questions through Visualization
Following the detection of outliers in flight distance, how extreme is their distribution and what impact might they have on the model's scaling?

This boxplot shows the distribution of flight distances and flags the extreme outliers that could distort feature scaling for the model.
It serves as visual evidence for the capping strategy needed to ensure data quality and better performance in future modeling.
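One common way to implement capping is to clip values to the boxplot whisker fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The sketch below is only an illustration under that assumption: the README does not state which capping rule was used, and the `flight_distance` values and the helper name `iqr_cap` are made up for the example.

```python
import statistics

def iqr_cap(values, k=1.5):
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR] (assumed capping rule)."""
    # method="inclusive" interpolates linearly between the observed data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    capped = [min(max(v, lower), upper) for v in values]
    return capped, (lower, upper)

# Hypothetical flight distances (made-up values, not taken from the dataset)
flight_distance = [200, 350, 500, 620, 800, 950, 1100, 1300, 4900]

capped, (lower, upper) = iqr_cap(flight_distance)
print("fences:", lower, upper)
print("capped:", capped)
```

Capping keeps every row and only pulls the extreme values back to the fence, so the feature range that a scaler sees is no longer dominated by a handful of flights.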

Following the detection of outliers in flight delays, how are departure and arrival delays distributed, and what do these extreme values indicate about the dataset?

Both plots show a highly right-skewed distribution with extreme outliers reaching 1,600 minutes, meaning most flights are on time while a few have massive delays.
Departure and arrival delays are also strongly correlated, so both features need outlier handling (such as capping or a log transformation) to improve regression accuracy.
While these extreme values can skew numerical predictions in regression, they are easier to handle in classification tasks, where the goal is to predict a binary status.
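To illustrate why a log transformation helps with right-skewed delays, the sketch below compares a simple moment-based skewness before and after applying `log1p`. The delay values are invented for the example, and `skewness` is a small helper written here, not something taken from the project.

```python
import math

def skewness(values):
    """Population (moment-based) skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical departure delays in minutes (made-up values)
departure_delay = [0, 0, 0, 2, 5, 8, 15, 30, 120, 1600]

# log1p(x) = log(1 + x), so on-time flights with a delay of 0 stay at 0
log_delay = [math.log1p(v) for v in departure_delay]

print("skewness before:", skewness(departure_delay))
print("skewness after: ", skewness(log_delay))
```

The transformed values are still ordered the same way, but the single 1,600-minute delay no longer dominates the spread.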

How are the satisfaction ratings distributed, and what does the presence of '0' values on a 1-5 scale indicate about data quality?

The plot shows that most ratings are concentrated between 4 and 5, and it visually confirms the presence of '0' values across various service categories.
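A quick numeric check can back up what the plot shows by counting how often '0' appears in each rating column. This is a minimal sketch with invented column names and ratings. Whether a 0 means "not applicable" or a missing answer is not stated here, so the code only reports the share of zeros and does not decide how to treat them.

```python
from collections import Counter

# Hypothetical survey columns with made-up ratings; 0 is outside the 1-5 scale
ratings = {
    "seat_comfort": [4, 5, 0, 3, 4, 5],
    "inflight_wifi": [0, 0, 2, 5, 4, 1],
    "cleanliness": [5, 4, 4, 3, 5, 4],
}

def zero_share(values):
    """Fraction of entries that are exactly 0."""
    return Counter(values)[0] / len(values)

zero_report = {name: zero_share(vals) for name, vals in ratings.items()}
for name, share in zero_report.items():
    print(f"{name}: {share:.1%} zeros")
```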

What is the correlation between departure and arrival delays, and how do extreme outliers reflect unusual flight patterns?

This scatter plot shows a strong positive correlation between departure and arrival delays, while highlighting how extreme outliers deviate from the main cluster.
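The strength of the relationship seen in the scatter plot can be summarized with the Pearson correlation coefficient. The sketch below computes it by hand on made-up delay pairs. The numbers are illustrative only and are not the values from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical delay pairs in minutes (made-up values)
departure_delay = [0, 5, 10, 20, 45, 90]
arrival_delay = [0, 3, 12, 18, 50, 85]

r = pearson_r(departure_delay, arrival_delay)
print("Pearson r:", round(r, 3))
```

A value close to 1 corresponds to the tight, upward-sloping cluster in the plot. Pearson's r is itself sensitive to extreme points, which is one reason to inspect the scatter plot alongside the number.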

What is the correlation between departure and arrival delays, and how does cleaning the extreme outliers clarify the flight patterns?

After cleaning the data, the scatter plot now displays a much clearer and more reliable linear relationship between the two types of delays.
Removing the extreme anomalies allows us to visualize the core data patterns that will be used for our predictive modeling.
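As a rough illustration of how removing extreme pairs changes the fitted relationship, the sketch below drops rows above a cutoff and fits a least-squares line before and after. The cutoff of 300 minutes, the helper names, and the data are all assumptions made for the example. The actual cleaning rule used for the plot is not described here.

```python
def drop_extreme_pairs(xs, ys, cutoff):
    """Keep only (x, y) pairs where both values are at or below the cutoff."""
    kept = [(x, y) for x, y in zip(xs, ys) if x <= cutoff and y <= cutoff]
    return [x for x, _ in kept], [y for _, y in kept]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical delay pairs in minutes; the last pair is an extreme anomaly
departure_delay = [0, 10, 20, 30, 40, 1600]
arrival_delay = [2, 12, 22, 32, 42, 900]

CUTOFF_MINUTES = 300  # hypothetical threshold, chosen only for this example

raw_slope, raw_intercept = fit_line(departure_delay, arrival_delay)
clean_dep, clean_arr = drop_extreme_pairs(departure_delay, arrival_delay, CUTOFF_MINUTES)
slope, intercept = fit_line(clean_dep, clean_arr)

print("before cleaning:", raw_slope, raw_intercept)
print("after cleaning: ", slope, intercept)
```

In this toy data the single anomaly pulls the fitted line away from the pattern that the other five flights follow, and the fit on the cleaned pairs recovers it.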