HyraXuna committed on
Commit
eef5e54
·
verified ·
1 Parent(s): d423f81

Uplaod of files and app

Files changed (7)
  1. .gitattributes +2 -0
  2. .streamlit/config.toml +12 -0
  3. Aventurine_3.png +3 -0
  4. ChibiElf1.png +3 -0
  5. Dockerfile +37 -0
  6. app.py +637 -0
  7. requirements.txt +8 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Aventurine_3.png filter=lfs diff=lfs merge=lfs -text
+ ChibiElf1.png filter=lfs diff=lfs merge=lfs -text
.streamlit/config.toml ADDED
@@ -0,0 +1,12 @@
+ [browser]
+ serverAddress = '0.0.0.0'
+
+ [global]
+ dataFrameSerialization = "legacy"
+
+ [theme]
+ base="light"
+ primaryColor="#D59A6F"
+ backgroundColor="#BDDFD6"
+ secondaryBackgroundColor="#FFDEAD"
+
Aventurine_3.png ADDED

Git LFS Details

  • SHA256: e8b209c5d00a3da07c6b4fda75ef7a4fbfc9e07a67a4c091b53c603c39cbbaee
  • Pointer size: 131 Bytes
  • Size of remote file: 180 kB
ChibiElf1.png ADDED

Git LFS Details

  • SHA256: f1e518fa8058731da5799aa6c8a5a852450ba2c54eea86545f4fe9d867358220
  • Pointer size: 132 Bytes
  • Size of remote file: 2.78 MB
Dockerfile ADDED
@@ -0,0 +1,37 @@
+ # Start with a lightweight Linux Anaconda image
+ FROM continuumio/miniconda3
+
+ # Refresh the package index, then install nano, unzip and curl
+ RUN apt-get update
+ RUN apt-get install nano unzip curl -y
+
+ # THIS IS SPECIFIC TO HUGGINGFACE
+ # We create a new user named "user" with ID of 1000
+ RUN useradd -m -u 1000 user
+ # We switch from "root" (default user when creating an image) to "user"
+ USER user
+ # We set two environment variables:
+ # HOME, so that we can give ownership of all files in there afterwards,
+ # and PATH, to which we add /home/user/.local/bin.
+ # The PATH environment variable lists the directories searched for installed binaries.
+ # We update it so that Linux knows where to look for binaries installed as "user".
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ # We set working directory to $HOME/app (<=> /home/user/app)
+ WORKDIR $HOME/app
+
+ # Copy all local files to /home/user/app with "user" as owner of these files
+ # Always use --chown=user when using HUGGINGFACE to avoid permission errors
+ COPY --chown=user . $HOME/app
+
+ # Install basic dependencies
+ RUN pip install -r requirements.txt
+
+
+ # THIS IS SPECIFIC TO HUGGINGFACE AS WELL
+ # Expose port 7860, the port HuggingFace uses for web applications
+ EXPOSE 7860
+
+ # Run streamlit server
+ CMD streamlit run --server.port 7860 app.py
app.py ADDED
@@ -0,0 +1,637 @@
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import plotly.express as px
+ import plotly.graph_objects as go
+
+
+ #################################################################### PAGE CONFIGURATION ####################################################################
+ st.set_page_config(page_title="Getaround Project Dashboard", page_icon="🚦", layout="wide")
+
+
+ #################################################################### SIDEBAR MENU ####################################################################
+
+ st.sidebar.title("Navigation")
+ page = st.sidebar.radio("Go to", ["🏠 Home/Introduction", "📊 Delays Analysis", "🎉 The End & Thank You"])
+
+ e = st.sidebar.empty()
+ e.write("")
+ st.sidebar.write("Made with 💖💗❤️‍🔥 by Youenn PATAT")
+ e = st.sidebar.empty()
+ e.write("")
+ st.sidebar.image("Aventurine_3.png", use_container_width=True)
+ st.sidebar.markdown("« 🥂 Cheers, dear reader! 🍷»")
+
+
+ #################################################################### Loading data ####################################################################
+ #################################################################### & ####################################################################
+ #################################################################### Cleaning data ####################################################################
+ @st.cache_data
+ def load_data():
+     data = pd.read_excel("https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/get_around_delay_analysis.xlsx")
+     return data
+
+ @st.cache_data
+ def load_data_price():
+     data_price = pd.read_csv("https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/get_around_pricing_project.csv", index_col=0)
+     return data_price
+
+ data_load_state = st.text('Loading data...')
+ data = load_data()
+ data_price = load_data_price()
+ data_load_state.text("")
+
+ mean_rental_per_day = data_price["rental_price_per_day"].mean()
+
+ # Count the entries whose delay_at_checkout_in_minutes lies more than 3 standard deviations away from the mean
+ mean_delay_checkout = data["delay_at_checkout_in_minutes"].mean()
+ std_delay_checkout = data["delay_at_checkout_in_minutes"].std()
+ outliers = data[(data['delay_at_checkout_in_minutes'] > (mean_delay_checkout + 3* std_delay_checkout)) |
+                 (data['delay_at_checkout_in_minutes'] < (mean_delay_checkout - 3* std_delay_checkout))]
+ # Get the count of such entries
+ num_outliers = len(outliers)
+ # Filter out and remove the outliers
+ data = data[((data['delay_at_checkout_in_minutes'] <= (mean_delay_checkout + 3* std_delay_checkout)) & (data['delay_at_checkout_in_minutes'] >= (mean_delay_checkout - 3* std_delay_checkout))) | (data['delay_at_checkout_in_minutes'].isna())]
+ # We keep the NaN values to preserve the canceled rentals; otherwise every canceled rental would be removed
+ # Define a function to categorize delays
+ def categorize_delay(delay):
+     if pd.isna(delay):
+         return "Unknown"
+     elif delay <= 0:
+         return "Early or in time"
+     elif delay < 60:
+         return "< 1 hour"
+     elif delay < 120:
+         return "1 to 2 hours"
+     elif delay < 180:
+         return "2 to 3 hours"
+     elif delay < 360:
+         return "3 to 6 hours"
+     elif delay < 720:
+         return "6 to 12 hours"
+     elif delay < 1440:
+         return "12 to 24 hours"
+     else:
+         return "1 day or more"
+ # Apply function to create the new column
+ data["checkout_delay_category"] = data["delay_at_checkout_in_minutes"].apply(categorize_delay)
+
+ #################################################################### HOME PAGE ####################################################################
+
+ if page == "🏠 Home/Introduction":
+     st.title("Welcome to the Getaround Project Dashboard ⌚🚗⌚")
+     st.image("https://lever-client-logos.s3.amazonaws.com/2bd4cdf9-37f2-497f-9096-c2793296a75f-1568844229943.png", use_container_width=True)
+     st.image("https://img.freepik.com/photos-gratuite/vue-du-modele-voiture-3d_23-2151138976.jpg?t=st=1742139826~exp=1742143426~hmac=a3191c31d2068646ebad17b88c52d572c57397c4d7bff718e2efa77cfaa87d07&w=1380", use_container_width=True)
+     st.markdown("""
+     ## Introduction
+     This project analyzes the impact of a new feature: a minimum-delay threshold between two rentals, meant to handle problematic cases caused by late checkouts.
+
+     🟠 **What you'll find in this app**:
+     * 📊 Data insights on rental delays & affected revenue.
+     * 📉 Strategies to mitigate issues.
+     * 🎯 Conclusion & recommendations.
+
+     **Use the sidebar** to navigate between pages. 🚀
+
+     This first page presents the data and gives a first look at it. The **Delays Analysis** page covers the analysis of the problem and the answers.
+     The last page has a few thanks and links to my other work.
+     """)
+
+     st.subheader("📌 - Basic analysis and view of data", divider="orange")
+
+     # Display raw data for delays
+     st.write("Raw Data")
+     if st.checkbox('Show raw data'):
+         st.subheader('Raw data')
+         st.write(data)
+
+
+     # Calculate the value counts of each delay category
+     delay_counts = data['checkout_delay_category'].value_counts()
+     # Calculate the percentage of each category
+     delay_percentages = (delay_counts / delay_counts.sum()) * 100
+
+     st.markdown("""
+     First, we check the proportion of each check-in type (`mobile` or `connect`) and the proportion of each rental state (`ended` or `canceled`).
+     """)
+
+     col1, col2 = st.columns([1, 2])
+     with col1:
+         # Visualization of the share of mobile vs. connect check-ins
+         checkin_counts = data["checkin_type"].value_counts().reset_index()
+         checkin_counts.columns = ["checkin_type", "count"]
+         fig1 = px.pie(checkin_counts,
+                       names="checkin_type",
+                       values="count",
+                       title="Check-in Type Distribution",
+                       color_discrete_sequence=["#3CB371", "#FFA500"])
+         fig1.update_traces(textfont_color="black")
+         st.plotly_chart(fig1, use_container_width=True, key="1")
+
+
+     # Second column: pie chart of rental states
+     with col2:
+         # Visualization of the share of ended vs. canceled rentals
+         cancel_counts = data["state"].value_counts().reset_index()
+         cancel_counts.columns = ["state", "count"]
+         fig2 = px.pie(cancel_counts,
+                       names="state",
+                       values="count",
+                       title="Proportion of rentals' states",
+                       color_discrete_sequence=["#3CB371", "#FFA500"])
+         fig2.update_traces(textfont_color="black")
+         st.plotly_chart(fig2, use_container_width=True, key="2")
+
+     st.markdown("""
+     The majority of check-ins are made by mobile; only 20% are made through the connected car.
+     Moreover, in this dataset, 15% of rentals are canceled.
+     """)
+
+     st.markdown("""
+     Now let's check the distribution of checkout delays by delay category.
+     """)
+     # Count occurrences of each category
+     delay_counts = data["checkout_delay_category"].value_counts().reset_index()
+     delay_counts.columns = ["Category", "Count"]
+     delay_counts["Percentage"] = (delay_counts["Count"] / delay_counts["Count"].sum()) * 100
+     # Define custom colors
+     custom_colors = {
+         "Early or in time": "#FFA500",  # Orange
+     }
+     # Assign green as the default color
+     for category in delay_counts["Category"]:
+         if category not in custom_colors:
+             custom_colors[category] = "#3CB371"  # Green
+     # Create a bar chart
+     fig3 = px.bar(
+         delay_counts,
+         x="Category",
+         y="Count",
+         title="Distribution of Checkout Delays",
+         labels={"Category": "Checkout Delay Category", "Count": "Number of Rentals"},
+         color="Category",
+         text=delay_counts["Percentage"].apply(lambda x: f"{x:.1f}%"),
+         color_discrete_map=custom_colors,
+     )
+     fig3.update_traces(textfont_color="black")
+     fig3.update_xaxes(showgrid=False, tickfont=dict(color='black'))
+     fig3.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig3.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), showlegend=False, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+     st.plotly_chart(fig3, use_container_width=True, theme=None)
+     st.markdown("""
+     Only 32.6% of checkouts are early or on time, i.e. without delay.
+     For 23.4% we have no information. The majority of delays are less than 2 hours.
+     """)
+
+     # Count occurrences of each category grouped by checkin_type
+     delay_counts = data.groupby(["checkout_delay_category", "checkin_type"]).size().reset_index(name="Count")
+     delay_counts["Percentage"] = (delay_counts["Count"] / delay_counts["Count"].sum()) * 100
+     # Create a grouped bar chart
+     fig4 = px.bar(
+         delay_counts,
+         x="checkout_delay_category",
+         y="Count",
+         color="checkin_type",
+         title="Distribution of Checkout Delays by Check-in Type",
+         labels={"checkout_delay_category": "Checkout Delay Category", "Count": "Number of Rentals", "checkin_type": "Check-in Type"},
+         barmode="group",  # Groups bars side by side
+         #text="Count",
+         text=delay_counts["Percentage"].apply(lambda x: f"{x:.1f}%"),
+         color_discrete_sequence=["#FFA500", "#3CB371"]
+     )
+     # Improve layout by setting custom order for x-axis
+     fig4.update_traces(textfont_color="black")
+     fig4.update_xaxes(showgrid=False, tickfont=dict(color='black'))
+     fig4.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig4.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+     fig4.update_layout(xaxis={'categoryorder':'array', 'categoryarray': [
+         "Early or in time", "< 1 hour", "1 to 2 hours", "2 to 3 hours",
+         "3 to 6 hours", "6 to 12 hours", "12 to 24 hours", "1 day or more", "Unknown"
+     ]})
+     st.plotly_chart(fig4, use_container_width=True, theme=None)
+     st.markdown("""
+     There are many more delays with the mobile check-in type than with connect.
+     """)
+
+     st.markdown("""
+     Great! For the rest of the analysis, go to the next page, "**📊 Delays Analysis**".
+     """)
+
+ #################################################################### DELAYS ANALYSIS ####################################################################
+
+ elif page == "📊 Delays Analysis":
+     st.title("Analysis & Insights 📊")
+     st.markdown("""
+     Here, we analyze the delay problem and how to address it with a threshold and a given scope.
+
+     **Key Findings**:
+     - 🚗 A minimum delay of **X minutes** reduces scheduling conflicts.
+     - 💰 Potential revenue impact: **Y% of total revenue**.
+     - ✅ Solving **Z% of problematic cases** with the policy.
+
+     *Visuals and explanations go here.*
+
+     In the following, we focus on these questions:
+     * How often are drivers late for the next check-in? How does it impact the next driver?
+     * Which share of our owner’s revenue would potentially be affected by the feature?
+     * How many rentals would be affected by the feature depending on the threshold and scope we choose?
+     * How many problematic cases will it solve depending on the chosen threshold and scope?
+     """)
+
+     st.subheader("📌 - How often are drivers late for the next check-in? How does it impact the next driver?", divider="orange")
+
+     st.markdown("""
+     For the first question, here is a visualization of the checkouts that are `late`, `early or in time`, and `unknown`.
+     """)
+
+     # Count occurrences per category, grouping categories into "Late", "Early or in time" or "Unknown"
+     delay_drivers = data["checkout_delay_category"].apply(lambda x: "Early or in time" if x == "Early or in time"
+                                                            else "Unknown" if x == "Unknown"
+                                                            else "Late").value_counts().reset_index()
+     delay_drivers.columns = ["Category", "Count"]
+     delay_drivers["Percentage"] = (delay_drivers["Count"] / delay_drivers["Count"].sum()) * 100
+     # Create a bar chart
+     fig5 = px.bar(
+         delay_drivers,
+         x="Category",
+         y="Count",
+         labels={"Category": "Checkout Delay Category", "Count": "Number of Rentals"},
+         title="Distribution of Checkout Delays",
+         text=delay_drivers["Percentage"].apply(lambda x: f"{x:.1f}%"),
+         color_discrete_sequence=["#FFA500"],
+     )
+     fig5.update_traces(textfont_color="black")
+     fig5.update_xaxes(showgrid=False, tickfont=dict(color='black'))
+     fig5.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig5.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), showlegend=False, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+     st.plotly_chart(fig5, use_container_width=True, theme=None)
+
+     # Count occurrences of each category
+     delay_counts = data["checkout_delay_category"].value_counts().reset_index()
+     delay_counts.columns = ["Category", "Count"]
+     delay_counts["Percentage"] = (delay_counts["Count"] / delay_counts["Count"].sum()) * 100
+     # Define custom colors
+     custom_colors = {
+         "Early or in time": "#FFA500",  # Orange
+     }
+     # Assign green as the default color
+     for category in delay_counts["Category"]:
+         if category not in custom_colors:
+             custom_colors[category] = "#3CB371"  # Green
+     # Create a bar chart
+     fig6 = px.bar(
+         delay_counts,
+         x="Category",
+         y="Count",
+         title="Distribution of Checkout Delays",
+         labels={"Category": "Checkout Delay Category", "Count": "Number of Rentals"},
+         color="Category",
+         text=delay_counts["Percentage"].apply(lambda x: f"{x:.1f}%"),
+         color_discrete_map=custom_colors,
+     )
+     fig6.update_traces(textfont_color="black")
+     fig6.update_xaxes(showgrid=False, tickfont=dict(color='black'))
+     fig6.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig6.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), showlegend=False, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+     st.plotly_chart(fig6, use_container_width=True, theme=None)
+
+     st.markdown("""
+     Only 32.6% of checkouts are early or on time, whereas almost half of them (44%) are late.
+     """)
+
+     st.markdown("""
+     Now, for the second question, let's see how delays impact the next driver.
+     """)
+
+     mean_delay_impact = data["time_delta_with_previous_rental_in_minutes"].mean()
+     min_delay_impact = data["time_delta_with_previous_rental_in_minutes"].min()
+     max_delay_impact = data["time_delta_with_previous_rental_in_minutes"].max()
+
+     st.markdown("#### Information on delays impacting the next driver 🚘:")
+
+     st.write(f"▪️*Average delay impacting next driver:* {mean_delay_impact:.2f} minutes")
+     st.write(f"▪️*Minimum delay impacting next driver:* {min_delay_impact:.2f} minutes")
+     st.write(f"▪️*Maximum delay impacting next driver:* {max_delay_impact:.2f} minutes")
+
+     delay_impact = data
+
+     delay_impact["delta-late_checkout"] = delay_impact["time_delta_with_previous_rental_in_minutes"] - delay_impact["delay_at_checkout_in_minutes"]
+     # If the delta minus the checkout delay is negative, the next rental cannot check in on time
+     negative_delay_impact = delay_impact[delay_impact["delta-late_checkout"] < 0]
+     # Use .iloc[0] (position) rather than [0] (index label), which only works if "Late" happens to be the first row
+     late_checkout = delay_drivers[delay_drivers["Category"] == "Late"]["Count"].iloc[0]
+     nb_problematic_checkin_late = len(negative_delay_impact)
+     # percentage calculation
+     problematic_delays_rate = nb_problematic_checkin_late*100/late_checkout
+     st.write(f"▪️Among all the late checkouts ({late_checkout}), {problematic_delays_rate:.3f}% caused problems for the next rental because the checkout happened after the next rental's check-in.")
+
+     # Calculate the average duration of problematic delays
+     average_problematic_delay = negative_delay_impact['delay_at_checkout_in_minutes'].mean()
+     # Calculate the average duration of non-problematic delays (late checkouts that did not block the next rental)
+     average_non_problematic_delay = delay_impact[(delay_impact['delay_at_checkout_in_minutes'] > 0) & ~(delay_impact["delta-late_checkout"] < 0)]['delay_at_checkout_in_minutes'].mean()
+     # Compare the averages
+     st.write(f"▪️Average Duration of Problematic Delays: {average_problematic_delay:.0f} minutes")
+     st.write(f"▪️Average Duration of Non-Problematic Delays: {average_non_problematic_delay:.0f} minutes")
+
+     delay_impact["problematic_delay"] = delay_impact["delta-late_checkout"] < 0
+     delay_impact["problematic_delay"].value_counts()
+
+     fig7 = px.histogram(delay_impact, x="problematic_delay", color_discrete_sequence=["#FFA500"], title="Proportion of problematic delays"
+     )
+     fig7.update_xaxes(
+         categoryorder='array',
+         categoryarray=["Problematic", "Non-Problematic"],
+         showgrid=False, tickfont=dict(color='black')
+     )
+     fig7.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig7.add_annotation(x=3, y=10000,text=f"Avg Delay: {average_problematic_delay:.2f} min",showarrow=False)
+     fig7.add_annotation(x=2, y=10000,text=f"Avg Delay: {average_non_problematic_delay:.2f} min",showarrow=False)
+     fig7.update_layout(
+         xaxis=dict(
+             tickmode='array',
+             tickvals=[True, False],
+             ticktext=["Problematic Delay", "Non Problematic Delay"],
+             zeroline=True,zerolinecolor="black",zerolinewidth=2
+         ),
+         xaxis_title="",
+         yaxis_title="",
+         title_font=dict(weight="bold"),
+         showlegend=False,
+         plot_bgcolor="#BDDFD6"
+     )
+     fig7.update_traces(textfont_color="black")
+     st.plotly_chart(fig7, use_container_width=True, theme=None)
+
+     st.markdown("""
+     In the majority of cases a delay poses no problem, but in 2.857% of cases it is problematic for the following rental.
+     """)
+
+     st.subheader("📌 - Which share of our owner’s revenue would potentially be affected by the feature?", divider="orange")
+
+     # Define the thresholds: minimum time between two rentals (in minutes)
+     thresholds = [30, 60, 90, 120, 180, 360, 720, 1440]  # Candidate values, from 30 minutes to 1 day
+
+     data["mean_price_per_rental"] = mean_rental_per_day
+
+     treshold_data = data
+     percentage_revenue_impacted = []
+     percentage_revenue_impacted_displaying = {}
+
+     for threshold in thresholds:
+         treshold_data[f"affected_rentals_{threshold}"] = data["time_delta_with_previous_rental_in_minutes"] <= threshold
+         affected_rentals = data[data["time_delta_with_previous_rental_in_minutes"] <= threshold]
+         affected_revenue = affected_rentals["mean_price_per_rental"].sum()
+         total_revenue = data["mean_price_per_rental"].sum()
+         revenue_impact = (affected_revenue / total_revenue) * 100
+         percentage_revenue_impacted.append(revenue_impact)
+         percentage_revenue_impacted_displaying[threshold] = round(revenue_impact, 3)
+
+     col1, col2 = st.columns([1, 2])
+     with col1:
+         # Select a threshold
+         selected_threshold = st.selectbox("Select a threshold ⏳ (in minutes):", thresholds, key="selectbox_1")
+         # Display impacted revenue percentage
+         st.metric(label="💰 Impacted Revenue", value=f"{percentage_revenue_impacted_displaying[selected_threshold]}%")
+
+     with col2:
+         affected_counts = [treshold_data[f"affected_rentals_{threshold}"].value_counts().get(True, 0) for threshold in thresholds]
+         affected_rentals_plot = pd.DataFrame({"Threshold (min)": thresholds, "Affected rentals": affected_counts})
+
+         fig8 = px.line(affected_rentals_plot, x="Threshold (min)", y="Affected rentals", text="Affected rentals",
+                        title="Number of rentals affected by the threshold",
+                        color_discrete_sequence=["#3CB371"],)
+         fig8.update_traces(textposition='top center', textfont_color="black")
+         fig8.update_xaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'), showline=True, linewidth=2, linecolor='black')
+         fig8.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+         fig8.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), showlegend=False, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+         st.plotly_chart(fig8, use_container_width=True, theme=None)
+
+
+     st.subheader("📌 - How many rentals would be affected by the feature depending on the threshold and scope we choose?", divider="orange")
+
+     all_affected_list = []
+     all_affected_display = {}
+     connect_affected_list = []
+     connect_affected_display = {}
+     all_affected_percentage = {}
+     connect_affected_percentage = {}
+
+     for threshold in thresholds:
+         all_rentals = len(data)
+         all_affected = data[data["time_delta_with_previous_rental_in_minutes"] <= threshold].shape[0]
+         all_affected_list.append(all_affected)
+         connect_affected = data[(data["time_delta_with_previous_rental_in_minutes"] <= threshold) &
+                                 (data["checkin_type"] == "connect")].shape[0]
+         connect_affected_list.append(connect_affected)
+         all_affected_display[threshold] = all_affected
+         connect_affected_display[threshold] = connect_affected
+         all_affected_percentage[threshold] = (all_affected / all_rentals) * 100
+         connect_affected_percentage[threshold] = (connect_affected / all_rentals) * 100
+
+     # Select a threshold
+     selected_threshold = st.selectbox("Select a threshold ⏳ (in minutes):", thresholds, key="selectbox_2")
+     # Add a title before metrics
+     st.markdown(f"#### 🚗 Rentals Affected by the {selected_threshold}-Minute Threshold")
+
+     col1, col2 = st.columns(2)
+     # Display metrics side by side
+     with col1:
+         st.metric(label="📲 All check-ins affected in number ⇩", value=f"{all_affected_display[selected_threshold]}")
+         st.metric(label="📲 All check-ins affected in % ⇩", value=f"{all_affected_percentage[selected_threshold]:.3f}")
+
+     with col2:
+         st.metric(label="🛜 Connect check-ins affected in number ⇩", value=f"{connect_affected_display[selected_threshold]}")
+         st.metric(label="🛜 Connect check-ins affected in % ⇩", value=f"{connect_affected_percentage[selected_threshold]:.3f}")
+
+     data_affected = pd.DataFrame({ "thresholds" : thresholds,
+                                    "all_affected" : all_affected_list,
+                                    "connect_affected" : connect_affected_list})
+
+     fig9 = px.scatter(data_affected, x='thresholds', y='all_affected',
+                       color_discrete_sequence=["#FFA500"],
+                       labels={'all_affected': 'All Affected'},
+                       title="Rentals affected by threshold and check-in type")
+     # Add a line for 'all_affected'
+     fig9.add_trace(go.Scatter(x=data_affected['thresholds'], y=data_affected['all_affected'],
+                               mode='lines+markers+text', line=dict(color='#FFA500'), name='All Affected', text=data_affected['all_affected']))
+
+     fig9.add_trace(go.Scatter(x=data_affected['thresholds'], y=data_affected['connect_affected'],
+                               mode='lines+markers+text', marker_color='#3CB371', name='Connect Affected',
+                               text=data_affected['connect_affected'],))  # Text to display on the markers
+     fig9.update_traces(textposition='top center', textfont_color="black")
+     fig9.update_xaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'), showline=True, linewidth=2, linecolor='black')
+     fig9.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
+     fig9.update_layout(xaxis_title="", yaxis_title="", title_font=dict(weight="bold"), showlegend=True, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
+     st.plotly_chart(fig9, use_container_width=True, theme=None)
+
+     st.markdown("""
+     Fewer rentals are affected when the scope is limited to connect check-ins than when it covers all
+     (mobile + connect) check-ins. Moreover, as could be expected, more rentals are
+     impacted as the chosen threshold increases.""")
+
+     st.subheader("📌 - How many problematic cases will it solve depending on the chosen threshold and scope?", divider="orange")
+
+     solved_cases_all_list = []
+     solved_cases_connect_list = []
+
+     for threshold in thresholds:
+
+         problematic_cases = negative_delay_impact[(negative_delay_impact["delay_at_checkout_in_minutes"] <= threshold)]
+         problematic_connectec_case = negative_delay_impact[(negative_delay_impact["delay_at_checkout_in_minutes"] <= threshold) &
+                                                            (negative_delay_impact["checkin_type"] == "connect")]
+         total_problems_cases = len(negative_delay_impact)
+         total_connect_pb_cases = len(negative_delay_impact[negative_delay_impact["checkin_type"] == "connect"])
+
+         solved_cases = problematic_cases.shape[0]
+         solved_cases_all_list.append(solved_cases)
+         solved_cases_connect = problematic_connectec_case.shape[0]
+         solved_cases_connect_list.append(solved_cases_connect)
+
+         percentage_solved_all = (solved_cases / total_problems_cases) * 100
+         percentage_connect_solved = (solved_cases_connect / total_connect_pb_cases) * 100
+
+     # Convert to DataFrame
+     df_solved_cases = pd.DataFrame({
+         "Threshold (minutes)": thresholds,
+         "Solved Cases (All Check-ins)": solved_cases_all_list,
+         "Solved Cases (Connect Check-ins)": solved_cases_connect_list,
+         "Revenue Impacted (%)": percentage_revenue_impacted
+     })
+
+     # Select a threshold with a select box
+     selected_threshold = st.selectbox("Select a threshold ⏳ (in minutes):", thresholds, key="selectbox_3")
+
+     # Get values for selected threshold
+     selected_data = df_solved_cases[df_solved_cases["Threshold (minutes)"] == selected_threshold].iloc[0]
+
+     # Display metrics in three columns
+     col1, col2, col3 = st.columns(3)
+     with col1:
+         st.metric(label="📲 All Check-ins Solved", value=f"{selected_data['Solved Cases (All Check-ins)']}")
+     with col2:
+         st.metric(label="🛜 Connect Check-ins Solved", value=f"{selected_data['Solved Cases (Connect Check-ins)']}")
+     with col3:
+         st.metric(label="💰 Revenue Impacted", value=f"{selected_data['Revenue Impacted (%)']:.2f} %")
+
515
+ # Create the figure
516
+ fig10 = go.Figure()
517
+ # Add line for "All Check-ins"
518
+ fig10.add_trace(go.Scatter(
519
+ x=thresholds,
520
+ y=solved_cases_all_list,
521
+ mode="lines+markers",
522
+ name="Solved Cases (All Check-ins)",
523
+ marker=dict(color="#FFA500")
524
+ ))
525
+ # Add line for "Connect Check-ins"
526
+ fig10.add_trace(go.Scatter(
527
+ x=thresholds,
528
+ y=solved_cases_connect_list,
529
+ mode="lines+markers",
530
+ name="Solved Cases (Connect Check-ins)",
531
+ marker=dict(color="#3CB371")
532
+ ))
533
+ # Add vertical dashed lines with text annotations
534
+ for i, threshold in enumerate(thresholds):
535
+ max_y_value = solved_cases_all_list[i] # Ensure line stops at "Solved Cases (All Check-ins)"
536
+
537
+ # Add dashed line from y=0 to y=max_y_value
538
+ fig10.add_trace(go.Scatter(
539
+ x=[threshold, threshold], # Vertical line at threshold
540
+ y=[0, max_y_value], # Stop at max_y_value
541
+ mode="lines",
542
+ line=dict(color="red", width=1.5, dash="dash"),
543
+ name="Revenue Impact Annotation" if i == 0 else None, # Show legend only once
544
+ showlegend=(i == 0)
545
+ ))
546
+ # Add text annotation slightly above the dashed line
547
+ fig10.add_annotation(
548
+ x=threshold,
549
+ y=max_y_value + 20, # Position slightly above the dashed line
550
+ text=f"{percentage_revenue_impacted[i]:.2f}%", # Format percentage
551
+ showarrow=False,
552
+ font=dict(size=10, color="red"),
553
+ align="center",
554
+ )
555
+ fig10.update_traces(textposition='top center', textfont_color="black")
556
+ fig10.update_xaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'), showline=True, linewidth=2, linecolor='black')
557
+ fig10.update_yaxes(showgrid=True, gridcolor='#A9A9A9', tickfont=dict(color='black'))
558
+ fig10.update_layout(title="Number of Problematic Cases Solved by Threshold",xaxis_title="",yaxis_title="", title_font=dict(weight="bold"),showlegend=True, xaxis=dict(zeroline=True,zerolinecolor="black",zerolinewidth=2), plot_bgcolor="#BDDFD6")
559
+ st.plotly_chart(fig10, use_container_width=True, theme=None)
560
+
561
+ st.markdown("""
562
+ #### 📊 Data Table""")
563
+ st.dataframe(df_solved_cases)
564
+
565
+ st.markdown("""
566
+ We can now see how many problematic cases are solved per check-in type (Connect only, or all check-ins: mobile 📲 + Connect 🛜),
566
+ along with the percentage of revenue impacted at each threshold. In my view, the best trade-off between solving problems
567
+ and limiting the economic impact is a threshold of **180** or **360** minutes, applied to all check-in types.""")
569
+
570
+ st.markdown("""
571
+ ✨ Thanks for reading all the way through! I hope you enjoyed it and found it interesting.
572
+ Go to the last page, `The End & Thank You`, for a little surprise and links to my other work‼️
573
+ """)
574
+
575
+ #################################################################### END & THANK YOU PAGE ####################################################################
576
+
577
+ elif page == "🎉 The End & Thank You":
578
+ st.title("Thank You for Exploring! 🎉")
579
+
580
+ # Create two columns
581
+ col1, col2 = st.columns([1, 2]) # Adjust column ratio (1:2 for image & text)
582
+
583
+ # Add an image in the first column
584
+ with col1:
585
+ st.image("ChibiElf1.png", use_container_width=True)
586
+
587
+ # Add text in the second column
588
+ with col2:
589
+ st.markdown("""
590
+ **Final Thoughts**
591
+ - 🚀 This analysis helps optimize the rental platform.
592
+ - 🔎 Finding the right balance between user experience and revenue impact is key.
593
+
594
+ **🙏 Thank you for your time!**
595
+
596
+ 📩 Feel free to reach out for more insights.
597
+
598
+ Here are links to my other work on **GitHub** & **LinkedIn**:
599
+ """)
600
+
601
+ # Define the GitHub and LinkedIn URLs
602
+ github_url = "https://github.com/HyraXuna?tab=repositories"
603
+ linkedin_url = "https://www.linkedin.com/in/youenn-patat-46b59b246/"
604
+
605
+ # Display clickable images for GitHub and LinkedIn
606
+ st.markdown(
607
+ f"""
608
+ <div style="display: flex; justify-content: center; gap: 20px;">
609
+ <a href="{github_url}" target="_blank">
610
+ <img src="https://cdn-icons-png.flaticon.com/512/25/25231.png" width="40">
611
+ </a>
612
+ <a href="{linkedin_url}" target="_blank">
613
+ <img src="https://cdn-icons-png.flaticon.com/512/174/174857.png" width="40">
614
+ </a>
615
+ </div>
616
+ """,
617
+ unsafe_allow_html=True
618
+ )
619
+
620
+ st.balloons() # 🎈 Fun effect for celebration!
621
+
622
+ ### Footer
623
+ st.markdown("---")
624
+
625
+ st.markdown(
626
+ """
627
+ <div style="text-align: center;">
628
+ <p>If you want to see more, check out my <strong>GitHub</strong> 📖</p>
629
+ <a href="https://github.com/HyraXuna?tab=repositories" target="_blank">
630
+ <img src="https://cdn-icons-png.flaticon.com/512/25/25231.png" width="40">
631
+ </a>
632
+ </div>
633
+ """,
634
+ unsafe_allow_html=True
635
+ )
636
+
637
+ st.markdown("---")
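Note on the threshold analysis in `app.py`: the closing text picks 180 or 360 minutes by eye, weighing solved cases against revenue impacted. A minimal sketch of how that trade-off could be made explicit is shown below. It is not the app's method. `ThresholdRow`, `pick_threshold`, the revenue cap, and the example numbers are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class ThresholdRow(NamedTuple):
    """One row of a threshold summary table (hypothetical structure)."""
    threshold_minutes: int
    solved_cases: int
    revenue_impacted_pct: float


def pick_threshold(
    rows: Sequence[ThresholdRow], max_revenue_pct: float
) -> Optional[ThresholdRow]:
    """Return the row solving the most cases while staying within the revenue cap."""
    eligible = [r for r in rows if r.revenue_impacted_pct <= max_revenue_pct]
    if not eligible:
        return None  # No threshold satisfies the cap.
    # Prefer more solved cases; on a tie, prefer the smaller threshold.
    return max(eligible, key=lambda r: (r.solved_cases, -r.threshold_minutes))


# Made-up numbers for illustration only; they are NOT the app's results.
example_rows = [
    ThresholdRow(60, 100, 1.0),
    ThresholdRow(180, 250, 4.0),
    ThresholdRow(360, 300, 9.0),
    ThresholdRow(720, 320, 20.0),
]

best = pick_threshold(example_rows, max_revenue_pct=10.0)
```

In the app, the rows could be built from `df_solved_cases`, and the cap could come from a Streamlit widget, so the recommendation would follow directly from the displayed table.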
requirements.txt ADDED
@@ -0,0 +1,8 @@
1
+ boto3
2
+ pandas
3
+ gunicorn
4
+ streamlit
5
+ scikit-learn
6
+ matplotlib
7
+ seaborn
8
+ plotly