Update app.py
app.py
CHANGED
@@ -111,7 +111,7 @@ if st.session_state.current_page == "Model Pipeline":
 
 st.markdown(
 """
-<div style="text-align: center;
+<div style="text-align: center;">
 <a href="https://github.com/Yashvj22/Life_Expectancy_Model" target="_blank" style="
 background-color: #007bff;
 color: white;
@@ -132,7 +132,7 @@ if st.session_state.current_page == "Model Pipeline":
 st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)
 
 st.markdown('''
-<h2 style="
+<h2 style="text-align:center;"> About Author</h2>
 <div style="background-color:#f5f5f5; border-radius:10px; padding:20px; margin-top:20px;">
 <p style="font-size:16px; text-align:center; font-family:Georgia; line-height:1.6; color:#000;">
 Hello! I’m <b>Yash Jadhav</b>, a passionate <span style="color:#FF6347;">Data Scientist</span>
@@ -312,6 +312,8 @@ elif st.session_state.current_page == "Simple EDA":
 elif st.session_state.current_page == "Data Pre-processing":
 st.markdown("<h1 class='title'>Data Preprocessing</h1>", unsafe_allow_html=True)
 
+st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)
+
 st.markdown("<h2 class='subtitle' style='text-align: center;'>Handling Missing Values</h2>", unsafe_allow_html=True)
 
 st.markdown("<br>", unsafe_allow_html=True)
@@ -320,46 +322,68 @@ elif st.session_state.current_page == "Data Pre-processing":
 <h5 style="text-align: center;">
 <b>Using "Median" Imputation to Fill Highly Skewed Data</b>
 </h5>
-<p style="text-align: justify;">
-Median imputation is used to handle missing values in columns where data distribution is skewed.
-This method is more robust than mean imputation in such cases, as it prevents the effect of outliers
-from distorting the dataset. For example, GDP, Population, and Adult Mortality tend to have extreme values,
-making median a better choice for filling in missing data.
-</p>
 """, unsafe_allow_html=True)
 
-st.markdown("
+st.markdown("""
+<div style="
+border: 1px solid #ddd;
+border-radius: 8px;
+padding: 15px;
+background-color: #f9f9f9;
+text-align: justify;">
+Median imputation is used for columns where data distribution is highly skewed.
+This approach ensures that extreme values do not overly influence the dataset.
+Examples include GDP, Population, and Adult Mortality.
+</div>
+""", unsafe_allow_html=True)
+
+st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)
 
 st.markdown("""
 <h5 style="text-align: center;">
 <b>Mean Imputation for Columns with Small Missing Values and Normally Distributed Data</b>
 </h5>
-<p style="text-align: justify;">
-Mean imputation is applied to columns where missing values are relatively small and the data follows a normal
-distribution. This method ensures that the overall distribution remains unchanged. Columns like BMI, Polio,
-and Schooling are typically well-suited for this approach as they do not contain extreme outliers that could
-distort the mean.
-</p>
 """, unsafe_allow_html=True)
 
-st.markdown("
+st.markdown("""
+<div style="
+border: 1px solid #ddd;
+border-radius: 8px;
+padding: 15px;
+background-color: #f9f9f9;
+text-align: justify;">
+Mean imputation is applied when missing values are small and the data is normally distributed.
+This helps maintain the overall dataset structure without being affected by extreme values.
+Suitable columns include BMI, Polio, and Schooling.
+</div>
+""", unsafe_allow_html=True)
+
+st.markdown("<hr style='border:1px solid #ddd;'>", unsafe_allow_html=True)
 
 st.markdown("""
 <h5 style="text-align: center;">
 <b>Applying One-Hot Encoding on "Status" Column</b>
 </h5>
-<p style="text-align: justify;">
-The "Status" column contains categorical data, differentiating countries as either <b>Developed</b> or
-<b>Developing</b>. Since machine learning models work better with numerical data, we apply One-Hot Encoding,
-which converts this categorical variable into a numerical format. We use the "drop='first'" parameter to avoid
-multicollinearity by keeping only one of the binary categories.
-</p>
 """, unsafe_allow_html=True)
 
-st.markdown("
+st.markdown("""
+<div style="
+border: 1px solid #ddd;
+border-radius: 8px;
+padding: 15px;
+background-color: #f9f9f9;
+text-align: justify;">
+The "Status" column categorizes countries as either Developed or Developing.
+One-Hot Encoding is used to convert this categorical variable into a numerical format
+suitable for machine learning models. The "drop='first'" parameter is applied to prevent
+multicollinearity.
+</div>
+""", unsafe_allow_html=True)
+
 
 if st.button("🔙 Go Back to Model Pipeline"):
 switch_page("Model Pipeline")
+
 
 
elif st.session_state.current_page == "EDA":
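The "Handling Missing Values" copy in this commit describes a rule: median imputation for highly skewed columns (GDP, Population, Adult Mortality) and mean imputation for roughly normal columns with few gaps (BMI, Polio, Schooling). The diff does not show the imputation code itself, so the following is only a minimal standard-library sketch of that rule, not the app's implementation. The helper names `impute`, `skewness` and `choose_strategy`, the |skew| > 1.0 cut-off, and the sample values are all hypothetical.

```python
from statistics import fmean, median
from typing import Optional

# A column with missing entries represented as None (hypothetical representation).
Column = list[Optional[float]]


def skewness(observed: list[float]) -> float:
    """Population Fisher-Pearson skewness g1 = m3 / m2**1.5."""
    n = len(observed)
    mu = fmean(observed)
    m2 = sum((x - mu) ** 2 for x in observed) / n
    m3 = sum((x - mu) ** 3 for x in observed) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def choose_strategy(values: Column, threshold: float = 1.0) -> str:
    """Pick 'median' for strongly skewed columns, else 'mean'.

    The threshold of 1.0 is an assumed rule of thumb, not taken from the app.
    """
    observed = [v for v in values if v is not None]
    return "median" if abs(skewness(observed)) > threshold else "mean"


def impute(values: Column, strategy: str) -> list[float]:
    """Replace None entries with the median or mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mean":
        fill = fmean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Hypothetical sample values, not taken from the dataset.
gdp = [500, 800, 1200, None, 950, 95000]  # right-skewed: one extreme value
bmi = [21.5, None, 24.0, 22.5, 23.0]      # roughly symmetric

print(choose_strategy(gdp), impute(gdp, "median")[3], impute(gdp, "mean")[3])
# median 950 19690.0
print(choose_strategy(bmi), impute(bmi, "median")[1], impute(bmi, "mean")[1])
# mean 22.75 22.75
```

On the hypothetical GDP-like column the single outlier pulls the mean fill up to 19690.0 while the median fill stays at 950, which is the distortion the app text warns about; on the symmetric BMI-like column the two fills coincide.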
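The "Status" copy mentions a `drop='first'` parameter. That name matches the `drop` option of scikit-learn's `OneHotEncoder`, but the diff does not show which library the app calls, so here is a dependency-free sketch of the same idea. `one_hot_drop_first` is a hypothetical helper; like `OneHotEncoder`'s default, it orders categories alphabetically before dropping the first one.

```python
def one_hot_drop_first(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode a categorical column and drop the first category.

    Categories are sorted alphabetically (scikit-learn's OneHotEncoder also
    sorts them by default); the first one becomes the implicit all-zeros baseline.
    """
    categories = sorted(set(values))
    kept = categories[1:]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


status = ["Developing", "Developed", "Developing", "Developed"]  # hypothetical rows
kept, rows = one_hot_drop_first(status)
print(kept)  # ['Developing']
print(rows)  # [[1], [0], [1], [0]]

# Keeping every category instead would make each row sum to exactly 1, so the
# indicator columns would be perfectly collinear with a model's intercept.
full = [[1 if v == c else 0 for c in sorted(set(status))] for v in status]
print(all(sum(row) == 1 for row in full))  # True
```

For a binary column like Status the result is a single 0/1 indicator, where a row of 0 means "Developed".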