Update README.md
README.md
CHANGED
|
@@ -47,7 +47,7 @@ I checked the service rating scales (like Inflight wifi service) to make sure al
|
|
| 47 |
I used the IQR (Interquartile Range) method to find outliers in columns like Age, Flight Distance, and delay times.
|
| 48 |
I calculated the percentage of outliers for each feature to understand how many extreme values exist in the data.
|
| 49 |
#### Data Exploration: Answering Key Research Questions through Visualization
|
| 50 |
-
####
|
| 51 |

|
| 52 |
This boxplot displays the distribution of flight distances and identifies extreme outliers that could distort the model's data scaling.
|
| 53 |
It serves as visual evidence for the capping strategy needed to ensure data quality and better performance in future modeling.
|
|
@@ -62,11 +62,11 @@ While these extreme values can skew numerical predictions in regression, they ar
|
|
| 62 |

|
| 63 |
The plot shows that while most ratings are concentrated between 4 and 5, it visually confirms the presence of '0' values across various service categories.
|
| 64 |
|
| 65 |
-
#####
|
| 66 |

|
| 67 |
This scatter plot shows a strong positive correlation between departure and arrival delays, while highlighting how extreme outliers deviate from the main cluster.
|
| 68 |
|
| 69 |
-
#####
|
| 70 |

|
| 71 |
After cleaning the data, the scatter plot now displays a much clearer and more reliable linear relationship between the two types of delays.
|
| 72 |
Removing the extreme anomalies allows us to visualize the core data patterns that will be used for our predictive modeling.
|
|
|
|
| 47 |
I used the IQR (Interquartile Range) method to find outliers in columns like Age, Flight Distance, and delay times.
|
| 48 |
I calculated the percentage of outliers for each feature to understand how many extreme values exist in the data.
|
| 49 |
#### Data Exploration: Answering Key Research Questions through Visualization
|
| 50 |
+
#### Following the detection of outliers in flight distance, how extreme is their distribution and what impact might they have on the model's scaling?
|
| 51 |

|
| 52 |
This boxplot displays the distribution of flight distances and identifies extreme outliers that could distort the model's data scaling.
|
| 53 |
It serves as visual evidence for the capping strategy needed to ensure data quality and better performance in future modeling.
|
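For reference, the IQR detection, outlier percentage, and capping steps described in this hunk could be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the function names, the conventional `k = 1.5` multiplier, and the sample `flight_distance` values are all assumptions made for the example.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_percentage(values, k=1.5):
    """Percentage of values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    n_outliers = sum(1 for v in values if v < lower or v > upper)
    return 100.0 * n_outliers / len(values)


def cap_outliers(values, k=1.5):
    """Clip every value into [lower, upper] instead of dropping it."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical flight distances (illustrative values, not taken from the dataset).
flight_distance = [200, 350, 400, 450, 500, 550, 600, 650, 700, 4900]

print(iqr_bounds(flight_distance))          # fences used for detection
print(outlier_percentage(flight_distance))  # share of extreme values
print(cap_outliers(flight_distance))        # same length, extremes pulled to the fence
```

Capping keeps every row, which matters when the outliers are real long-haul flights rather than data-entry errors; only their influence on scaling is reduced.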
|
|
|
| 62 |

|
| 63 |
The plot shows that while most ratings are concentrated between 4 and 5, it visually confirms the presence of '0' values across various service categories.
|
| 64 |
|
| 65 |
+
##### What is the correlation between departure and arrival delays, and how do extreme outliers reflect unusual flight patterns?
|
| 66 |

|
| 67 |
This scatter plot shows a strong positive correlation between departure and arrival delays, while highlighting how extreme outliers deviate from the main cluster.
|
| 68 |
|
| 69 |
+
##### What is the correlation between departure and arrival delays, and how does cleaning the extreme outliers change the observed flight patterns?
|
| 70 |

|
| 71 |
After cleaning the data, the scatter plot now displays a much clearer and more reliable linear relationship between the two types of delays.
|
| 72 |
Removing the extreme anomalies allows us to visualize the core data patterns that will be used for our predictive modeling.
|
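The before/after comparison described in this hunk can also be checked numerically. The sketch below computes the Pearson correlation between the two delay columns, drops pairs above an IQR-based upper fence, and recomputes it. It assumes the cleaning step uses the same IQR rule as the outlier detection, which the README does not state explicitly; the function names and the sample delay values are hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def iqr_upper(values, k=1.5):
    """Upper fence Q3 + k*IQR (delays are non-negative, so only the top matters)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def drop_extreme_pairs(dep, arr, k=1.5):
    """Keep only (departure, arrival) pairs where both delays are within their fence."""
    dep_max, arr_max = iqr_upper(dep, k), iqr_upper(arr, k)
    return [(d, a) for d, a in zip(dep, arr) if d <= dep_max and a <= arr_max]


# Hypothetical delays in minutes; the last pair is a deliberate anomaly.
departure_delay = [0, 5, 10, 15, 20, 25, 30, 600]
arrival_delay = [2, 4, 12, 14, 22, 24, 33, 20]

r_before = pearson_r(departure_delay, arrival_delay)
kept = drop_extreme_pairs(departure_delay, arrival_delay)
r_after = pearson_r([d for d, _ in kept], [a for _, a in kept])

print(f"before cleaning: r = {r_before:.3f} on {len(departure_delay)} pairs")
print(f"after cleaning:  r = {r_after:.3f} on {len(kept)} pairs")
```

Reporting the coefficient alongside each scatter plot would make the "clearer linear relationship" claim verifiable rather than purely visual.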