Commit 372c427 by Dreipfelt · verified · 1 Parent(s): 52631bd

Upload 7 files

Files changed (8)
  1. .gitattributes +1 -0
  2. Dockerfile +10 -0
  3. README.md +183 -0
  4. app.py +177 -0
  5. feature_names.json +1 -0
  6. model_metrics.json +1 -0
  7. pipeline.pkl +3 -0
  8. requirements.txt +12 -0
.gitattributes ADDED
@@ -0,0 +1 @@
+ pipeline.pkl filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,10 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,183 @@
+ # 🚗 GetAround: Delay Analysis & Pricing Prediction
+ > CDSD Certification · Data Science & Deployment Project · Jedha Bootcamp
+
+ ---
+
+ ## 📌 Project Overview
+
+ GetAround is a peer-to-peer car rental platform. Late vehicle returns create friction
+ for subsequent rentals, leading to customer dissatisfaction and cancellations.
+
+ This project addresses two strategic challenges:
+
+ - **Operational optimization**: analyzing late checkouts and simulating minimum delay
+   thresholds to reduce conflicts between consecutive rentals.
+ - **Pricing optimization**: serving a Machine Learning model via a production API to
+   help owners set optimal daily rental prices.
+
+ ---
+
+ ## 🔗 Production Links
+
+ | Service | URL |
+ |---------|-----|
+ | 📊 Dashboard | https://huggingface.co/spaces/Dreipfelt/getaround-dashboard |
+ | 🔌 API | https://Dreipfelt-getaround-api.hf.space |
+ | 📄 API Docs | https://Dreipfelt-getaround-api.hf.space/docs |
+ | ⚙️ Swagger UI | https://Dreipfelt-getaround-api.hf.space/swagger |
+ | 💻 GitHub | https://github.com/Data-Science-Designer-and-Developer/Project_GetAround |
+
+ ---
+
+ ## 🎯 Business Objectives
+
+ ### Delay Management
+
+ - Measure how often drivers return cars late
+ - Quantify the impact on subsequent rentals
+ - Simulate different minimum delay thresholds (0 to 720 minutes)
+ - Help Product Management choose:
+   - an optimal delay **threshold**
+   - an appropriate **scope** (all cars vs Connect only)
+
+ ### Pricing Optimization
+
+ - Train an ML model on car characteristics
+ - Serve predictions via a REST API
+ - Allow real-time price prediction through a `/predict` endpoint
+
+ ---
+
+ ## 📊 Dashboard
+
+ The interactive dashboard allows Product Managers to:
+
+ - Visualize the distribution of late checkouts
+ - Compare Connect vs Mobile check-in types
+ - Simulate the trade-off between blocked rentals and resolved issues
+ - Filter by scope and threshold in real time
+ - Get a live price prediction from the API
+
+ 🔗 https://huggingface.co/spaces/Dreipfelt/getaround-dashboard
+
+ ---
+
+ ## 🤖 Machine Learning API
+
+ ### Model
+
+ | Property | Value |
+ |----------|-------|
+ | Algorithm | XGBoost Regressor (sklearn Pipeline) |
+ | Target | rental_price_per_day (€) |
+ | R² | ~0.74 (from `model_metrics.json`) |
+ | RMSE | ~16.60 € (from `model_metrics.json`) |
+ | Features | 28 (mileage, engine_power, fuel, color, car_type, options…) |
+
+ > **Baseline context:** a naive model predicting the dataset mean achieves R² = 0.
+ > Our model's R² of 0.74 represents a substantial improvement over this baseline,
+ > explaining about 74% of price variance from car characteristics alone.
+
+ ### Endpoint `/predict`
+
+ - **Method**: POST
+ - **Input**: JSON with key `input`, a list of lists (one per car)
+ - **Validation**: each row must contain exactly the number of features defined in
+   `feature_names.json`; the API returns a `422` error with a descriptive message
+   if the input is malformed.
+
+ ```bash
+ curl -X POST "https://Dreipfelt-getaround-api.hf.space/predict" \
+   -H "Content-Type: application/json" \
+   -d '{"input": [[150000, 120, 1, 1, 1, 0, 1, 1, 0]]}'
+ ```
+
+ **Response:**
+ ```json
+ {"prediction": [104.75]}
+ ```
+
+ 📄 Full documentation: https://Dreipfelt-getaround-api.hf.space/docs
+ ⚙️ Swagger UI: https://Dreipfelt-getaround-api.hf.space/swagger
+
+ ---
+
+ ## 🗂️ Repository Structure
+
+ ```
+ Project_GetAround/
+ ├── api/                      # FastAPI application
+ │   ├── app.py                # API endpoints
+ │   ├── Dockerfile            # Docker configuration
+ │   └── feature_names.json    # Model feature names
+ │
+ ├── dashboard/                # Streamlit dashboard
+ │   ├── app.py                # Dashboard application
+ │   └── requirements.txt
+ │
+ ├── notebooks/                # Jupyter notebooks
+ │   ├── 01_EDA_delays.ipynb   # Delay analysis
+ │   └── 02_ML_pricing.ipynb   # ML model training
+ │
+ ├── .gitignore
+ └── README.md
+ ```
+
+ ---
+
+ ## 🛠️ Tech Stack
+
+ | Category | Tools |
+ |----------|-------|
+ | Language | Python 3.10 |
+ | Dashboard | Streamlit, Plotly |
+ | API | FastAPI, Uvicorn |
+ | ML | Scikit-learn, XGBoost Regressor |
+ | Deployment | Hugging Face Spaces, Docker |
+ | Version Control | Git, GitHub |
+
+ ---
+
+ ## 🔒 Data & Privacy (RGPD / GDPR)
+
+ The datasets used in this project (`get_around_delay_analysis.xlsx` and the pricing
+ dataset) contain **no personal data**: rental IDs are anonymous identifiers, and no
+ name, email, phone number, or precise location is present.
+
+ The API processes only technical car characteristics (mileage, engine power, equipment
+ options) submitted by the user. This data is used for real-time inference only and is
+ **not stored or logged** after the response is returned.
+
+ The service is hosted on **Hugging Face Spaces** (EU infrastructure), consistent with
+ GDPR (RGPD) requirements. No third-party analytics or tracking is used.
+
+ ---
+
+ ## ⚙️ Local Setup
+
+ ```bash
+ # Clone the repo
+ git clone https://github.com/Data-Science-Designer-and-Developer/Project_GetAround.git
+ cd Project_GetAround
+
+ # Install dependencies
+ pip install -r dashboard/requirements.txt
+
+ # Run the dashboard
+ streamlit run dashboard/app.py
+
+ # Run the API
+ cd api
+ uvicorn app:app --reload
+ # API available at http://localhost:8000
+ # Swagger UI at http://localhost:8000/swagger
+ # Custom docs at http://localhost:8000/docs
+ ```
+
+ ---
+
+ ## 👤 Author
+
+ **Frédéric**
+ CDSD Candidate, Data Scientist
+ Jedha Bootcamp
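
The README's delay-management objective (trying minimum delay thresholds from 0 to 720 minutes, for all cars or Connect only, and weighing blocked rentals against resolved issues) can be sketched roughly as below. This is an illustration only, not the dashboard's code: the `Rental` record, its field names, and the rule that a threshold "blocks" any rental whose planned gap is shorter than the threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rental:
    # Hypothetical record; these field names are assumptions, not dataset columns.
    checkin_type: str                        # "connect" or "mobile"
    gap_minutes: Optional[float]             # planned gap since the previous rental
    previous_delay_minutes: Optional[float]  # how late the previous driver was


def simulate(rentals, threshold, scope="all"):
    """Count rentals a minimum gap would block, and late-return conflicts it avoids."""
    blocked = resolved = 0
    for r in rentals:
        if scope == "connect" and r.checkin_type != "connect":
            continue  # outside the chosen scope
        if r.gap_minutes is None:
            continue  # no previous rental, nothing to block
        if r.gap_minutes < threshold:
            blocked += 1
            # A conflict exists when the previous car came back after this rental started.
            late = r.previous_delay_minutes
            if late is not None and late > r.gap_minutes:
                resolved += 1
    return {"threshold": threshold, "blocked": blocked, "resolved": resolved}


# Made-up sample data for the sketch
rentals = [
    Rental("connect", 30, 45),
    Rental("mobile", 60, 10),
    Rental("mobile", 120, 150),
    Rental("connect", None, None),
]
results = [simulate(rentals, t) for t in (0, 60, 120, 720)]
connect_only = simulate(rentals, 720, scope="connect")
```

Sweeping the threshold this way gives the two curves a Product Manager would compare: every extra minute blocks more rentals, but only some of those blocked rentals were real conflicts.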
app.py ADDED
@@ -0,0 +1,177 @@
+ import os
+ import json
+ import joblib
+ import pandas as pd
+ import numpy as np
+ from fastapi import FastAPI, HTTPException
+ from fastapi.responses import HTMLResponse
+ from pydantic import BaseModel, ConfigDict
+
+ # ── Load model and feature names ────────────────────────────────────────
+ BASE_DIR = os.path.dirname(os.path.abspath(__file__))
+
+ PIPELINE_PATH = os.path.join(BASE_DIR, "pipeline.pkl")
+ FEATURES_PATH = os.path.join(BASE_DIR, "feature_names.json")
+ METRICS_PATH = os.path.join(BASE_DIR, "model_metrics.json")
+
+ pipeline = joblib.load(PIPELINE_PATH)
+
+ with open(FEATURES_PATH, "r", encoding="utf-8") as f:
+     feature_names = json.load(f)
+
+
+ # ── Initialize app ──────────────────────────────────────────────────────
+ # Swagger UI is served at /swagger so that the custom /docs route below is
+ # reachable (by default FastAPI registers its own /docs route first).
+ app = FastAPI(
+     title="GetAround Pricing API",
+     description="Predicts the optimal rental price per day for a car",
+     version="1.0.0",
+     docs_url="/swagger",
+ )
+
+ # ── Input schema ────────────────────────────────────────────────────────
+
+
+ class PredictInput(BaseModel):
+     input: list[list]
+     # NOTE: this example row has 9 values, while feature_names.json in this
+     # commit lists 13 columns; rows are validated against that file in /predict.
+     model_config = ConfigDict(
+         json_schema_extra={
+             "example": {
+                 "input": [
+                     [150000, 120, 1, 1, 1, 0, 1, 1, 0]
+                 ]
+             }
+         }
+     )
+
+ # ── Root route ──────────────────────────────────────────────────────────
+
+
+ @app.get("/", response_class=HTMLResponse)
+ def root():
+     return """
+     <html>
+         <body style="font-family: Arial; text-align: center; padding: 50px;">
+             <h1>🚗 GetAround Pricing API</h1>
+             <p>API is running!</p>
+             <a href="/docs">📄 Go to Documentation</a>
+         </body>
+     </html>
+     """
+
+ # ── /predict route ──────────────────────────────────────────────────────
+
+
+ @app.post("/predict")
+ def predict(data: PredictInput):
+     # Reject rows whose length does not match feature_names.json (422, as the README states)
+     for i, row in enumerate(data.input):
+         if len(row) != len(feature_names):
+             raise HTTPException(
+                 status_code=422,
+                 detail=f"Row {i} has {len(row)} values; expected {len(feature_names)}.",
+             )
+
+     # Convert input to DataFrame with correct column names
+     X = pd.DataFrame(data.input, columns=feature_names)
+
+     # Make predictions
+     predictions = pipeline.predict(X)
+
+     # Round to 2 decimals and return as list
+     return {"prediction": [round(float(p), 2) for p in predictions]}
+
+ # ── /docs route ─────────────────────────────────────────────────────────
+
+
+ @app.get("/docs", response_class=HTMLResponse)
+ def documentation():
+     return """
+     <!DOCTYPE html>
+     <html lang="en">
+     <head>
+         <meta charset="UTF-8">
+         <meta name="viewport" content="width=device-width, initial-scale=1.0">
+         <title>GetAround API Documentation</title>
+         <style>
+             * { margin: 0; padding: 0; box-sizing: border-box; }
+             body { font-family: 'Segoe UI', Arial, sans-serif; background: #f5f7fa; color: #333; }
+             header { background: #1a1a2e; color: white; padding: 40px; text-align: center; }
+             header h1 { font-size: 2.5em; margin-bottom: 10px; }
+             header p { color: #aaa; font-size: 1.1em; }
+             .container { max-width: 900px; margin: 40px auto; padding: 0 20px; }
+             .endpoint { background: white; border-radius: 12px; padding: 30px; margin-bottom: 30px; box-shadow: 0 2px 10px rgba(0,0,0,0.08); }
+             .endpoint h2 { font-size: 1.4em; margin-bottom: 15px; display: flex; align-items: center; gap: 12px; }
+             .badge { padding: 5px 14px; border-radius: 20px; font-size: 0.85em; font-weight: bold; }
+             .post { background: #d4edda; color: #155724; }
+             .get { background: #cce5ff; color: #004085; }
+             .url { background: #1a1a2e; color: #00d4aa; padding: 12px 18px; border-radius: 8px; font-family: monospace; margin: 15px 0; }
+             .section-title { font-weight: bold; margin: 20px 0 8px; color: #555; text-transform: uppercase; font-size: 0.85em; letter-spacing: 1px; }
+             pre { background: #f8f9fa; border: 1px solid #e9ecef; border-radius: 8px; padding: 15px; font-family: monospace; font-size: 0.9em; overflow-x: auto; }
+             .param-table { width: 100%; border-collapse: collapse; margin-top: 10px; }
+             .param-table th { background: #f1f3f5; padding: 10px; text-align: left; font-size: 0.85em; color: #555; }
+             .param-table td { padding: 10px; border-bottom: 1px solid #f1f3f5; font-size: 0.9em; }
+             .tag { background: #e9ecef; padding: 2px 8px; border-radius: 4px; font-family: monospace; font-size: 0.85em; }
+             footer { text-align: center; padding: 30px; color: #aaa; font-size: 0.9em; }
+         </style>
+     </head>
+     <body>
+
+     <header>
+         <h1>🚗 GetAround Pricing API</h1>
+         <p>Predict the optimal rental price per day for any car</p>
+     </header>
+
+     <div class="container">
+
+         <div class="endpoint">
+             <h2><span class="badge post">POST</span>/predict</h2>
+             <p>Returns a predicted rental price per day based on the car's characteristics.</p>
+             <div class="url">/predict</div>
+
+             <div class="section-title">Input</div>
+             <p>JSON body with key <span class="tag">input</span>: a list of lists (one per car).</p>
+
+             <table class="param-table">
+                 <tr><th>#</th><th>Feature</th><th>Type</th><th>Example</th></tr>
+                 <tr><td>1</td><td>mileage</td><td>float</td><td>150000.0</td></tr>
+                 <tr><td>2</td><td>engine_power</td><td>float</td><td>120.0</td></tr>
+                 <tr><td>3</td><td>private_parking_available</td><td>bool (0/1)</td><td>1.0</td></tr>
+                 <tr><td>4</td><td>has_gps</td><td>bool (0/1)</td><td>1.0</td></tr>
+                 <tr><td>5</td><td>has_air_conditioning</td><td>bool (0/1)</td><td>1.0</td></tr>
+                 <tr><td>6</td><td>automatic_car</td><td>bool (0/1)</td><td>0.0</td></tr>
+                 <tr><td>7</td><td>has_getaround_connect</td><td>bool (0/1)</td><td>1.0</td></tr>
+                 <tr><td>8</td><td>has_speed_regulator</td><td>bool (0/1)</td><td>1.0</td></tr>
+                 <tr><td>9</td><td>winter_tires</td><td>bool (0/1)</td><td>0.0</td></tr>
+             </table>
+
+             <div class="section-title">Request Example</div>
+             <pre>curl -X POST "https://your-url/predict" \
+     -H "Content-Type: application/json" \
+     -d '{"input": [[150000, 120, 1, 1, 1, 0, 1, 1, 0]]}'</pre>
+
+             <div class="section-title">Response Example</div>
+             <pre>{"prediction": [104.75]}</pre>
+         </div>
+
+         <div class="endpoint">
+             <h2><span class="badge get">GET</span>/</h2>
+             <p>Health check: confirms the API is running.</p>
+             <div class="url">/</div>
+         </div>
+
+         <div class="endpoint">
+             <h2><span class="badge get">GET</span>/docs</h2>
+             <p>This documentation page.</p>
+             <div class="url">/docs</div>
+         </div>
+
+         <div class="endpoint">
+             <h2>🤖 Model Information</h2>
+             <table class="param-table">
+                 <tr><th>Property</th><th>Value</th></tr>
+                 <tr><td>Algorithm</td><td>XGBoost Regressor (via sklearn Pipeline)</td></tr>
+                 <tr><td>Target</td><td>rental_price_per_day (€)</td></tr>
+                 <tr><td>RMSE</td><td>~16.60 €</td></tr>
+                 <tr><td>R²</td><td>~0.74</td></tr>
+             </table>
+             <p style="margin-top:12px; color:#888; font-size:0.85em;">
+                 Values taken from model_metrics.json.
+             </p>
+         </div>
+
+     </div>
+     <footer>GetAround Pricing API · Built with FastAPI 🚀</footer>
+     </body>
+     </html>
+     """
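
The Model Information table in the documentation page is written by hand into the HTML string, while the same commit ships `model_metrics.json` and `app.py` defines `METRICS_PATH` without reading it. One possible way to keep the page in sync is to render the rows from the JSON file. The helper below is a hypothetical sketch (`metrics_rows` is not part of the commit) and uses an inline copy of two entries from `model_metrics.json` so that it runs on its own.

```python
import json
from html import escape


def metrics_rows(metrics: dict) -> str:
    """Render metric name/value pairs as HTML table rows, rounded to 2 decimals."""
    return "\n".join(
        f"<tr><td>{escape(name)}</td><td>{value:.2f}</td></tr>"
        for name, value in metrics.items()
    )


# Inline sample mirroring the RMSE and R² entries of model_metrics.json
sample = json.loads('{"RMSE": 16.602761905202982, "R\\u00b2": 0.7382780909538269}')
rows_html = metrics_rows(sample)
print(rows_html)
```

In `app.py` the dictionary would presumably come from `json.load` on the file at `METRICS_PATH`, and the resulting rows could be interpolated into the documentation HTML.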
feature_names.json ADDED
@@ -0,0 +1 @@
+ ["model_key", "mileage", "engine_power", "fuel", "paint_color", "car_type", "private_parking_available", "has_gps", "has_air_conditioning", "automatic_car", "has_getaround_connect", "has_speed_regulator", "winter_tires"]
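
Because `/predict` takes positional rows and `app.py` maps them onto these 13 column names in order, a client has to send values in exactly this sequence. A small helper can build the row from a dictionary keyed by feature name, which is less error-prone than counting positions. This is a sketch: `to_row` is a hypothetical helper, and the sample values (especially the four categorical ones) are placeholders, since the categories the pipeline accepts are not shown in this commit.

```python
# Column order copied from feature_names.json in this commit
FEATURE_NAMES = [
    "model_key", "mileage", "engine_power", "fuel", "paint_color", "car_type",
    "private_parking_available", "has_gps", "has_air_conditioning",
    "automatic_car", "has_getaround_connect", "has_speed_regulator",
    "winter_tires",
]


def to_row(car, feature_names=FEATURE_NAMES):
    """Order a {feature: value} mapping into the positional row /predict expects."""
    missing = [name for name in feature_names if name not in car]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [car[name] for name in feature_names]


# Placeholder values (assumptions, not taken from the training data)
car = {
    "model_key": "example-brand", "mileage": 150000, "engine_power": 120,
    "fuel": "diesel", "paint_color": "black", "car_type": "sedan",
    "private_parking_available": 1, "has_gps": 1, "has_air_conditioning": 1,
    "automatic_car": 0, "has_getaround_connect": 1, "has_speed_regulator": 1,
    "winter_tires": 0,
}
payload = {"input": [to_row(car)]}
```

The resulting `payload` is the JSON body shape that `PredictInput` describes; a missing key fails early on the client instead of producing a misaligned row.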
model_metrics.json ADDED
@@ -0,0 +1 @@
+ {"RMSE": 16.602761905202982, "MAE": 10.496041297912598, "R\u00b2": 0.7382780909538269, "CV_RMSE_mean": 16.862179946899413, "CV_RMSE_std": 1.2674684824599929}
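
To put the R² value in this file in context, the README notes that a model predicting the dataset mean scores R² = 0. The short example below checks that property on made-up prices with a from-scratch `r_squared` function (written for illustration; the notebook presumably uses a library implementation instead).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up daily prices in euros, for illustration only
prices = [80.0, 100.0, 120.0, 140.0]
mean_price = sum(prices) / len(prices)

# Predicting the mean for every car leaves all the variance unexplained
baseline = r_squared(prices, [mean_price] * len(prices))

# Predictions that are each off by 5 euros explain most of the variance
better = r_squared(prices, [85.0, 95.0, 125.0, 135.0])
```

Here `baseline` is 0 and `better` is about 0.95, so an R² near 0.74 sits between "no better than the mean" and a near-perfect fit.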
pipeline.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39b815b2a06f3cd4c43e246dfc5af9d8178a328c2cc87bf0b4acfee2af903981
+ size 348162
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ fastapi==0.115.0
+ uvicorn==0.30.6
+
+ pandas==2.2.2
+ numpy==1.26.4
+
+ scikit-learn==1.5.1
+ joblib==1.4.2
+
+ xgboost==3.1.2
+
+ pydantic==2.9.2