'''
Achmad Dhani
Objective: Create an EDA page that explains the insights from the EDA
'''
import streamlit as st
import pandas as pd
from PIL import Image
def run():
    '''
    Function for EDA page
    '''
    st.title('Exploratory Data Analysis Section')
    df = pd.read_csv('water_potability.csv')  # reading CSV
    #============================= Display Data ===============================
    col1, col2 = st.columns(2)
    with col1.expander("View the top 10 entries of the original dataset"):
        st.table(df.head(10))
    with col2.expander("View the bottom 10 entries of the original dataset"):
        st.table(df.tail(10))
    #============================= Correlation =====================================
    st.subheader('Correlation Matrix Between The Chemicals')
    col3, col4 = st.columns(2)
    # 1st image
    col3.write('Pearson Correlation Matrix')
    image1 = Image.open('pearsons.png')
    col3.image(image1, caption='Figure 1 Pearson Correlation Matrix of All Chemicals')
    # 2nd image
    col4.write('Spearman Correlation Matrix')
    image2 = Image.open('spearman.png')
    col4.image(image2, caption='Figure 2 Spearman Correlation Matrix of All Chemicals')
    # explanation
    with st.expander('Explanation'):
        st.caption(
            '''
            Based on both visualizations, most of the variables show no relationship with each other, with a few exceptions.
            - `Hardness` has a very low positive correlation with `ph` in Spearman but is close to 0 in Pearson. This suggests there might be a very weak positive non-linear (monotonic) relationship.
            - `Sulfate` and `Solids` have a very low negative correlation in both Spearman and Pearson. This suggests there might be a very weak negative linear relationship.
            '''
        )
    #================================ ph ==========================================
    st.subheader('ph Values Distribution')
    image3 = Image.open('ph.png')
    st.image(image3, caption='Figure 3 ph values distribution histogram', width=600)
    # explanation
    with st.expander('Explanation'):
        st.caption(
            '''
            - Most of the water samples have a ph between `5` and `9`.
            - The visualization also suggests that a lot of the data falls within the ph range for drinkable water, but that alone does not mean the water is drinkable.
            - This could mean that most of the water samples were taken from contaminated water bodies.
            '''
        )
    #================================ Missing Values ===============================
    st.subheader('Missing Values Visualizations')
    # missing plot
    st.write('Missing Values Bar Plot')
    image4 = Image.open('missing_values.png')
    st.image(image4, caption='Figure 4 Bar plot of missing values of each column')
    # displaying explanation
    with st.expander('Explanation'):
        st.caption(
            '''
            **From Data Loading**
            - Total missing values in the dataset: `1434`
            - Columns with missing values:
            `['ph', 'Sulfate', 'Trihalomethanes']`

            Number of missing values per column:

            >ph `491`
            >
            >Sulfate `781`
            >
            >Trihalomethanes `162`

            Missing data percentage (%):

            >ph `15`
            >
            >Sulfate `24`
            >
            >Trihalomethanes `5`
            '''
        )
    # missing matrix
    st.write('Missing Values Correlation Matrix')
    image5 = Image.open('missing_corr.png')
    st.image(image5, caption='Figure 5 Correlation matrix of the missing values')
    # display explanation
    with st.expander('Explanation'):
        st.caption(
            '''
            - Based on the visualization above, the missing values show no correlation with each other, so the missingness can be considered `completely random`.
            - The random missingness could be because the person who took the water sample did not have the equipment to measure that chemical level.
            '''
        )
    #================================== PCA =============================
    st.subheader('Feature Importance')
    image6 = Image.open('PCA.png')
    st.image(image6, caption='Figure 6 Line chart of explained variance ratio with number of components')
    # displaying explanation
    with st.expander('Explanation'):
        st.caption(
            '''
            - Based on the PCA visualization, there is a linear relationship between the number of components and the cumulative explained variance ratio (EVR).
            - This suggests that each feature is important and retains unique information about the dataset.
            '''
        )