---
{}
---

# Model Summary
This project analyzes Uber and Lyft ride data to understand price patterns, build predictive models, and convert the problem from regression to classification.
I begin with data cleaning and exploratory analysis, move to regression modeling with engineered features, and finally reframe price prediction as a multiclass classification task.

## Model Details
This model predicts the *price category* of an Uber ride (cheap, medium, or expensive) based on engineered features such as distance, surge multiplier, weather conditions, and cluster-based location features.
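
The README does not state how numeric prices are mapped to the three categories. A minimal sketch, assuming equal-frequency (tertile) cut points computed from training prices; the function names and sample prices are hypothetical and only illustrate the idea.

```python
from statistics import quantiles

def price_cutoffs(prices):
    """Return two tertile cut points (roughly the 33rd and 67th percentiles)."""
    low, high = quantiles(prices, n=3)
    return low, high

def price_category(price, cutoffs):
    """Map a numeric price to 'cheap', 'medium', or 'expensive'."""
    low, high = cutoffs
    if price <= low:
        return "cheap"
    if price <= high:
        return "medium"
    return "expensive"

# Hypothetical training prices, used only to illustrate the binning.
train_prices = [5.0, 7.5, 9.0, 11.0, 13.5, 16.0, 22.0, 27.5, 35.0]
cutoffs = price_cutoffs(train_prices)
labels = [price_category(p, cutoffs) for p in train_prices]
```

Computing the cut points on the training split only, then reusing them for validation and test data, keeps the label definition from leaking test information. In pandas, `pd.qcut(df["price"], q=3, labels=["cheap", "medium", "expensive"])` performs similar equal-frequency binning.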

## Dataset and Model Information
The dataset contains over 500,000 rows and 57 features.
It includes detailed ride information such as price, distance, timestamps, pickup and dropoff locations, weather conditions, surge multiplier features, and engineered variables.

### Model Description

- **Developed by:** Idan Khen
- **Input type:** Numeric tabular features
- **Output type:** Class label


## Exploratory Data Analysis (EDA)

Before modeling, exploratory analysis was performed to understand the structure and behavior of the data.
This included checking distributions, identifying extreme values, and validating key relationships.
To ensure the dataset was usable, several cleaning steps were performed:

- Removed rows with missing values (price had 55,095 missing values)
- Removed duplicate columns
- Extracted hour, weekday, and month from the timestamp
- Dropped irrelevant or unused columns
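
The cleaning steps above can be sketched in plain Python. This is a minimal sketch, not the project's actual code: it assumes rides are loaded as a list of dicts with identical keys, that missing values are `None`, that `timestamp` holds Unix seconds (converted in UTC here), and it treats a column as a duplicate when its values exactly match an earlier column. The function name, the `unused_columns` argument, and the sample rows are hypothetical. In pandas, the first step would typically be `DataFrame.dropna()`.

```python
from datetime import datetime, timezone

def clean_rides(rows, unused_columns=()):
    """Apply the README's cleaning steps to a list of ride dicts (hypothetical helper).

    Assumes every row has the same keys, missing values are None, and
    'timestamp' holds Unix seconds, interpreted as UTC.
    """
    # 1. Drop rows that contain any missing value.
    rows = [r for r in rows if all(v is not None for v in r.values())]
    if not rows:
        return []

    # 2. Find columns whose values exactly duplicate an earlier column.
    seen, duplicates = set(), set()
    for col in rows[0]:
        values = tuple(r[col] for r in rows)
        if values in seen:
            duplicates.add(col)
        else:
            seen.add(values)

    cleaned = []
    for r in rows:
        # 4. Keep only non-duplicate columns that are not listed as unused.
        out = {k: v for k, v in r.items()
               if k not in duplicates and k not in unused_columns}
        # 3. Derive calendar features from the timestamp.
        ts = datetime.fromtimestamp(r["timestamp"], tz=timezone.utc)
        out["hour"] = ts.hour
        out["weekday"] = ts.weekday()  # Monday = 0
        out["month"] = ts.month
        cleaned.append(out)
    return cleaned

# Hypothetical rows: 'price_copy' duplicates 'price', 'notes' is unused.
rides = [
    {"id": "a", "price": 12.5, "price_copy": 12.5, "timestamp": 1543201200, "notes": "x"},
    {"id": "b", "price": None, "price_copy": None, "timestamp": 1543201200, "notes": "y"},
    {"id": "c", "price": 8.0, "price_copy": 8.0, "timestamp": 1543287600, "notes": "z"},
]
cleaned = clean_rides(rides, unused_columns={"notes"})
```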

### Outlier handling
Outliers were detected in several numerical features, especially in distance and surge-related columns.
These extreme values represent real but rare ride scenarios (such as very long trips or periods of heavy surge pricing), so removing them would distort the true behavior of the data.
Therefore, the outliers were kept to preserve the integrity and variability of the dataset.
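
The README does not say which rule was used to detect outliers. A minimal sketch, assuming the common 1.5 × IQR rule (Tukey's fences), that only flags extreme values and leaves the data untouched, consistent with the decision to keep them. The function name and the sample distances are hypothetical.

```python
from statistics import quantiles

def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] without modifying the data."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical trip distances in miles; 7.9 stands in for a rare long trip.
distances = [0.5, 0.9, 1.1, 1.3, 1.6, 2.0, 2.2, 2.6, 7.9]
flags = iqr_outlier_flags(distances)
```

Flagging instead of filtering lets the flagged rows be inspected or used as a feature while still being kept in the training data.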

### Visual Exploration
After cleaning the dataset, several visualizations were created to better understand feature behavior and relationships.

*Correlation heatmap*

*Distribution plots*

*Scatter plot*

# Q&A
### 1. Are certain hours of the day associated with higher ride prices?

The graph shows that average ride prices remain roughly constant throughout the day.
This indicates that the hour of the day has little to no effect on ride pricing.
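
The plotting code is not included in the README. A minimal sketch of the aggregation behind an hourly average-price graph, assuming rides are available as `(hour, price)` pairs; the helper names, the relative-spread measure, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_hour(rides):
    """Group (hour, price) pairs by hour and return {hour: average price}."""
    by_hour = defaultdict(list)
    for hour, price in rides:
        by_hour[hour].append(price)
    return {hour: mean(prices) for hour, prices in sorted(by_hour.items())}

def relative_spread(averages):
    """(max - min) of the hourly averages divided by their mean; near 0 means flat."""
    values = list(averages.values())
    return (max(values) - min(values)) / mean(values)

# Hypothetical rides at three different hours.
rides = [(8, 16.0), (8, 17.0), (13, 16.4), (13, 16.6), (22, 16.2), (22, 16.8)]
hourly = mean_price_by_hour(rides)
```

A small relative spread across all 24 hours is one simple way to quantify "roughly constant" rather than judging it from the plot alone.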


### 2. How do weather conditions affect ride prices?

Both the temperature scatter plot and the cold/warm comparison showed that prices are almost the same across cold, mild, and warm weather.
Temperature does not appear to affect ride prices.
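
A minimal sketch of the two checks described above: a Pearson correlation between temperature and price (the numeric counterpart of the scatter plot) and mean price per temperature band. The band thresholds (assumed to be degrees Fahrenheit), the helper names, and the sample values are hypothetical; the README does not state how cold, mild, and warm were defined.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def temperature_band(temp_f, cold_below=40.0, warm_above=55.0):
    """Hypothetical thresholds (assumed Fahrenheit) for cold / mild / warm."""
    if temp_f < cold_below:
        return "cold"
    if temp_f > warm_above:
        return "warm"
    return "mild"

def mean_price_by_band(temps, prices):
    """Average price for each temperature band."""
    bands = {}
    for t, p in zip(temps, prices):
        bands.setdefault(temperature_band(t), []).append(p)
    return {band: mean(ps) for band, ps in bands.items()}

# Hypothetical temperatures and prices.
temps = [33, 36, 45, 48, 58, 61]
prices = [16.0, 17.0, 16.5, 16.5, 17.0, 16.0]
r = pearson(temps, prices)
band_means = mean_price_by_band(temps, prices)
```

A correlation close to zero together with nearly equal band averages is what "prices are almost the same across cold, mild, and warm weather" would look like numerically.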


### 3. Which pickup locations tend to have higher ride prices?

Pickups from Boston University, Fenway, and the Financial District are the most expensive on average.
Haymarket Square and North End are the cheapest, showing clear price differences by pickup location.
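
A minimal sketch of the ranking behind this comparison, assuming rides are available as `(pickup_location, price)` pairs. Only the location names come from the analysis above; the function name and the prices are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def rank_pickup_locations(rides):
    """Return (location, average price) pairs sorted from most to least expensive."""
    prices = defaultdict(list)
    for location, price in rides:
        prices[location].append(price)
    averages = {loc: mean(ps) for loc, ps in prices.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample rows.
rides = [
    ("Boston University", 19.0), ("Boston University", 21.0),
    ("Fenway", 19.5), ("Financial District", 19.0),
    ("North End", 15.0), ("Haymarket Square", 14.0),
]
ranking = rank_pickup_locations(rides)
most_expensive = [loc for loc, _ in ranking[:3]]
cheapest = [loc for loc, _ in ranking[-2:]]
```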