Yashvj123 committed on
Commit ef1891d · verified · 1 Parent(s): 9efed1e

Update app.py

Files changed (1)
  1. app.py +47 -23
app.py CHANGED
@@ -111,7 +111,7 @@ if st.session_state.current_page == "Model Pipeline":

  st.markdown(
  """
- <div style="text-align: center; margin-top: 30px;">
  <a href="https://github.com/Yashvj22/Life_Expectancy_Model" target="_blank" style="
  background-color: #007bff;
  color: white;
@@ -132,7 +132,7 @@ if st.session_state.current_page == "Model Pipeline":
  st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)

  st.markdown('''
- <h2 style="color:#5d3fd3; text-align:center;"> About Author</h2>
  <div style="background-color:#f5f5f5; border-radius:10px; padding:20px; margin-top:20px;">
  <p style="font-size:16px; text-align:center; font-family:Georgia; line-height:1.6; color:#000;">
  Hello! I’m <b>Yash Jadhav</b>, a passionate <span style="color:#FF6347;">Data Scientist</span>
@@ -312,6 +312,8 @@ elif st.session_state.current_page == "Simple EDA":
  elif st.session_state.current_page == "Data Pre-processing":
  st.markdown("<h1 class='title'>Data Preprocessing</h1>", unsafe_allow_html=True)

  st.markdown("<h2 class='subtitle' style='text-align: center;'>Handling Missing Values</h2>", unsafe_allow_html=True)

  st.markdown("<br>", unsafe_allow_html=True)
@@ -320,46 +322,68 @@ elif st.session_state.current_page == "Data Pre-processing":
  <h5 style="text-align: center;">
  <b>Using "Median" Imputation to Fill Highly Skewed Data</b>
  </h5>
- <p style="text-align: justify;">
- Median imputation is used to handle missing values in columns where data distribution is skewed.
- This method is more robust than mean imputation in such cases, as it prevents the effect of outliers
- from distorting the dataset. For example, GDP, Population, and Adult Mortality tend to have extreme values,
- making median a better choice for filling in missing data.
- </p>
  """, unsafe_allow_html=True)

- st.markdown("<br>", unsafe_allow_html=True)

  st.markdown("""
  <h5 style="text-align: center;">
  <b>Mean Imputation for Columns with Small Missing Values and Normally Distributed Data</b>
  </h5>
- <p style="text-align: justify;">
- Mean imputation is applied to columns where missing values are relatively small and the data follows a normal
- distribution. This method ensures that the overall distribution remains unchanged. Columns like BMI, Polio,
- and Schooling are typically well-suited for this approach as they do not contain extreme outliers that could
- distort the mean.
- </p>
  """, unsafe_allow_html=True)

- st.markdown("<br>", unsafe_allow_html=True)

  st.markdown("""
  <h5 style="text-align: center;">
  <b>Applying One-Hot Encoding on "Status" Column</b>
  </h5>
- <p style="text-align: justify;">
- The "Status" column contains categorical data, differentiating countries as either <b>Developed</b> or
- <b>Developing</b>. Since machine learning models work better with numerical data, we apply One-Hot Encoding,
- which converts this categorical variable into a numerical format. We use the "drop='first'" parameter to avoid
- multicollinearity by keeping only one of the binary categories.
- </p>
  """, unsafe_allow_html=True)

- st.markdown("<br>", unsafe_allow_html=True)

  if st.button("🔙 Go Back to Model Pipeline"):
  switch_page("Model Pipeline")


  elif st.session_state.current_page == "EDA":
 

  st.markdown(
  """
+ <div style="text-align: center;">
  <a href="https://github.com/Yashvj22/Life_Expectancy_Model" target="_blank" style="
  background-color: #007bff;
  color: white;
 
  st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)

  st.markdown('''
+ <h2 style="text-align:center;"> About Author</h2>
  <div style="background-color:#f5f5f5; border-radius:10px; padding:20px; margin-top:20px;">
  <p style="font-size:16px; text-align:center; font-family:Georgia; line-height:1.6; color:#000;">
  Hello! I’m <b>Yash Jadhav</b>, a passionate <span style="color:#FF6347;">Data Scientist</span>
 
  elif st.session_state.current_page == "Data Pre-processing":
  st.markdown("<h1 class='title'>Data Preprocessing</h1>", unsafe_allow_html=True)

+ st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)
+
  st.markdown("<h2 class='subtitle' style='text-align: center;'>Handling Missing Values</h2>", unsafe_allow_html=True)

  st.markdown("<br>", unsafe_allow_html=True)
 
  <h5 style="text-align: center;">
  <b>Using "Median" Imputation to Fill Highly Skewed Data</b>
  </h5>
  """, unsafe_allow_html=True)

+ st.markdown("""
+ <div style="
+ border: 1px solid #ddd;
+ border-radius: 8px;
+ padding: 15px;
+ background-color: #f9f9f9;
+ text-align: justify;">
+ Median imputation is used for columns where data distribution is highly skewed.
+ This approach ensures that extreme values do not overly influence the dataset.
+ Examples include GDP, Population, and Adult Mortality.
+ </div>
+ """, unsafe_allow_html=True)
+
+ st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)

  st.markdown("""
  <h5 style="text-align: center;">
  <b>Mean Imputation for Columns with Small Missing Values and Normally Distributed Data</b>
  </h5>
  """, unsafe_allow_html=True)

+ st.markdown("""
+ <div style="
+ border: 1px solid #ddd;
+ border-radius: 8px;
+ padding: 15px;
+ background-color: #f9f9f9;
+ text-align: justify;">
+ Mean imputation is applied when missing values are small and the data is normally distributed.
+ This helps maintain the overall dataset structure without being affected by extreme values.
+ Suitable columns include BMI, Polio, and Schooling.
+ </div>
+ """, unsafe_allow_html=True)
+
+ st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)

  st.markdown("""
  <h5 style="text-align: center;">
  <b>Applying One-Hot Encoding on "Status" Column</b>
  </h5>
  """, unsafe_allow_html=True)

+ st.markdown("""
+ <div style="
+ border: 1px solid #ddd;
+ border-radius: 8px;
+ padding: 15px;
+ background-color: #f9f9f9;
+ text-align: justify;">
+ The "Status" column categorizes countries as either Developed or Developing.
+ One-Hot Encoding is used to convert this categorical variable into a numerical format
+ suitable for machine learning models. The "drop='first'" parameter is applied to prevent
+ multicollinearity.
+ </div>
+ """, unsafe_allow_html=True)
+

  if st.button("🔙 Go Back to Model Pipeline"):
  switch_page("Model Pipeline")
+


  elif st.session_state.current_page == "EDA":
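
The page text in this diff describes three preprocessing steps (median imputation for skewed columns, mean imputation for roughly normal columns, and one-hot encoding of "Status" with the first category dropped), but the commit itself only changes how they are displayed. The following is a minimal, dependency-free sketch of those steps, not the project's actual pipeline. The toy rows, their values, and the helper names `impute` and `one_hot_drop_first` are assumptions made for illustration; only the column names come from the diff. In a pandas workflow the same ideas would typically be written with `fillna` and `pd.get_dummies(..., drop_first=True)`.

```python
from statistics import fmean, median

# Hypothetical toy rows: column names follow the diff text, values are invented.
rows = [
    {"GDP": 500.0, "BMI": 22.0, "Status": "Developing"},
    {"GDP": 700.0, "BMI": None, "Status": "Developed"},
    {"GDP": None, "BMI": 26.0, "Status": "Developing"},
    {"GDP": 90000.0, "BMI": 24.0, "Status": "Developed"},  # extreme GDP value
]


def impute(rows, column, statistic):
    """Fill missing (None) entries of `column` with `statistic` of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill_value = statistic(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill_value
    return fill_value


def one_hot_drop_first(rows, column):
    """One-hot encode `column` in place, dropping the first category in sorted order."""
    categories = sorted({r[column] for r in rows})
    kept = categories[1:]  # the dropped category is implied when all kept flags are 0
    for r in rows:
        value = r.pop(column)
        for category in kept:
            r[f"{column}_{category}"] = int(value == category)
    return kept


gdp_fill = impute(rows, "GDP", median)  # skewed column: median is not pulled by 90000
bmi_fill = impute(rows, "BMI", fmean)   # roughly symmetric column: mean
kept = one_hot_drop_first(rows, "Status")
```

With two categories, dropping the first leaves a single indicator column (`Status_Developing` here). A second column would be its exact complement, which is the redundancy that the "drop='first'" note in the diff refers to.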