import streamlit as st
st.title(":blue[Introduction to Data Analysis]")
st.caption("***From data dust to diamond insights — analysis is the alchemy***")
st.subheader("What is Data Analysis?",divider="green")
multi = '''Data analysis is the process of inspecting, cleaning, and transforming collected data to extract meaningful insights.
It systematically applies statistical, logical, and computational techniques to describe, summarize, and evaluate the data.
'''
st.markdown(multi)
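st.markdown('''As a minimal sketch of what "describe and summarize" can look like in practice, the snippet below computes a few summary statistics with the Python standard library. The `orders` list and its values are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
orders = [12, 15, 11, 15, 20, 18, 15]

# Describe the data with a handful of summary statistics.
summary = {
    "count": len(orders),
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "stdev": round(statistics.stdev(orders), 2),
}
print(summary)
```

Real projects often use a library such as pandas for this, but the idea is the same: reduce raw values to a few numbers that are easy to evaluate.
''')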
st.subheader("Types of Data",divider="green")
multi = '''Before analyzing data, we need to know what type of data we have collected. Data is broadly divided according to whether it follows a predefined structure.
On this basis, data is classified into three types.
'''
st.markdown(multi)
multi=''':violet[1.Structured data]'''
st.markdown(multi)
multi=''':violet[2.Unstructured data]'''
st.markdown(multi)
multi=''':violet[3.Semi-Structured data]'''
st.markdown(multi)
st.subheader("1.Structured Data",divider="red")
multi = '''Structured data is well-formatted, organized data.
It is usually tabular, stored in rows and columns, for example in a relational database management system (RDBMS).
It is easy to search and is often quantitative.
Examples of structured data include Excel files (.xlsx) and SQL database tables.
'''
st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/ewYq-ld-Fr7SCE7Th0idQ.png")
st.markdown(multi)
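st.markdown('''To illustrate why structured data is easy to search, here is a small sketch using an in-memory SQLite database from the Python standard library. The `sales` table, its columns, and its values are hypothetical.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 10), ("South", 7), ("North", 5)],
)

# Because every row has the same columns, searching is a single query.
totals = conn.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
print(totals)
```

The fixed rows-and-columns layout is what lets the query group and sum without any custom parsing.
''')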
st.subheader("2.Unstructured Data",divider="red")
multi = '''Unstructured data has no predefined format or organization.
It does not fit into rows and columns; it can be any combination of text, images, audio, and so on.
It is harder to analyze and is often qualitative.
Examples of unstructured data include text documents, images, audio, and video.
'''
st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/o96nGe5pQ7EkbXTdjOkpW.png")
st.markdown(multi)
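st.markdown('''Unstructured data usually needs some structure imposed on it before it can be analyzed. The sketch below shows one simple possibility for text, counting word frequencies; the `review` string is a hypothetical example, and real text analysis typically needs far more than this.

```python
import re
from collections import Counter

# Hypothetical free-text review; there are no columns to query here.
review = "Great phone. Battery is great, camera is okay. Great value!"

# Impose a simple structure: lowercase words and their frequencies.
words = re.findall(r"[a-z]+", review.lower())
counts = Counter(words)
print(counts.most_common(2))
```
''')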
st.subheader("3.Semi-structured Data",divider="red")
multi = '''Semi-structured data is a hybrid of structured and unstructured data.
It has no rigid table schema, but it carries tags or keys that organize it, so it is harder to analyze than structured data.
Examples of semi-structured data include CSV, JSON, and XML files.
'''
st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/Gz_AZKg8M7e9K96TsVenU.png")
st.markdown(multi)
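st.markdown('''As a rough sketch of working with semi-structured data, the snippet below parses a small JSON document whose records do not all share the same keys and flattens it into uniform rows. The payload, the field names, and the chosen output columns are all hypothetical.

```python
import json

# Hypothetical JSON payload: records share some keys but not all.
raw = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "tags": ["new"]}
]
"""

records = json.loads(raw)

# Flatten each record into a fixed set of columns, using None for gaps.
rows = [
    {
        "id": r["id"],
        "name": r["name"],
        "email": r.get("contact", {}).get("email"),
        "tag_count": len(r.get("tags", [])),
    }
    for r in records
]
print(rows)
```

The keys make the data self-describing, but missing or nested fields have to be handled explicitly before the data fits into rows and columns.
''')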
st.header("**Steps in Data Analysis**",divider="green")
st.markdown('''A complete data analysis involves 7 steps.''')
st.subheader(":blue[1.Problem Statement]")
multi = '''The first step in any data analysis is the problem statement: a concise description of the problem that needs to be solved.
It serves as the blueprint for the analysis because it clearly identifies the specific issue to be addressed.
'''
st.markdown(multi)
st.subheader(":blue[2.Data Collection]")
multi = '''After identifying the issue to be addressed, we collect the data related to it.
Data needs to be collected in different formats from many sources, such as websites, so that the analysis is easier to perform.
We can gather new data or reuse previously collected data with the help of stakeholders and domain experts.
'''
st.markdown(multi)
st.subheader(":blue[3.Simple EDA(Exploratory Data Analysis)]")
multi = '''After collecting the data, we need to check whether it contains any impurities, such as missing values or duplicates.
A simple EDA pass provides this information.
If the data has no impurities, it goes directly to the EDA phase; otherwise, it goes to the pre-processing phase.
'''
st.markdown(multi)
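st.markdown('''A simple EDA check can be sketched as follows, assuming the collected data is a list of dictionaries and that "impurities" means missing values and exact duplicate rows. Both the `records` sample and that definition of impurity are assumptions made for illustration.

```python
# Hypothetical collected records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": None},
    {"id": 1, "age": 34, "city": "Pune"},  # exact duplicate of the first row
]

# Count missing values per column.
missing = {
    col: sum(1 for row in records if row[col] is None)
    for col in records[0]
}

# Count exact duplicate rows.
unique_rows = {tuple(row.items()) for row in records}
duplicates = len(records) - len(unique_rows)

# Decide which phase comes next.
needs_preprocessing = duplicates > 0 or any(missing.values())
print(missing, duplicates, needs_preprocessing)
```

If `needs_preprocessing` is true, the data goes to pre-processing; otherwise it can go straight to EDA.
''')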
st.subheader(":blue[4.Pre-processing]")
multi = '''If the collected data has impurities, this phase cleans the data and then transforms it.
Cleaning removes the impurities found during the simple EDA.
Raw data ---> Cleaned data'''
st.markdown(multi)
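st.markdown('''The raw-to-cleaned step might look like the sketch below. The `raw` records are hypothetical, and the specific choices (dropping exact duplicates, filling a missing age with the median, normalizing city names) are one possible cleaning strategy among many.

```python
import statistics

# Hypothetical raw records with a duplicate, a missing age, and untidy text.
raw = [
    {"id": 1, "age": 34, "city": " pune "},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 1, "age": 34, "city": " pune "},
    {"id": 3, "age": 30, "city": "DELHI"},
]

# Cleaning: drop exact duplicates while keeping the original order.
seen = set()
deduped = []
for row in raw:
    key = tuple(row.items())
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Cleaning: fill missing ages with the median of the known ages.
median_age = statistics.median(r["age"] for r in deduped if r["age"] is not None)

# Transforming: normalize city names to a consistent format.
cleaned = [
    {
        "id": r["id"],
        "age": r["age"] if r["age"] is not None else median_age,
        "city": r["city"].strip().title(),
    }
    for r in deduped
]
print(cleaned)
```
''')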
st.subheader(":blue[5.EDA]")
multi = '''After the pre-processing phase, the data goes through the EDA process, which unveils the hidden insights in the data.'''
st.markdown(multi)
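st.markdown('''One common way to surface insights is to group the cleaned data and compare summaries across groups. The sketch below assumes hypothetical `sales` records with made-up regions and amounts.

```python
from collections import defaultdict
import statistics

# Hypothetical cleaned records (made-up values).
sales = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 80},
    {"region": "North", "amount": 200},
    {"region": "South", "amount": 100},
]

# Group amounts by region.
by_region = defaultdict(list)
for row in sales:
    by_region[row["region"]].append(row["amount"])

# Summarize each group so the regions can be compared.
insights = {
    region: {"total": sum(vals), "mean": statistics.mean(vals)}
    for region, vals in by_region.items()
}
print(insights)
```

A gap between group summaries, such as one region selling far more than another, is the kind of pattern that EDA is meant to reveal.
''')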
st.subheader(":blue[6.Visualization]")
multi='''Once insights are found in the collected data, they are presented using visualization techniques and then assembled into a dashboard.
'''
st.markdown(multi)
st.subheader(":blue[7.Story Telling]")
multi = '''This is the final step in data analysis and the most important one, because the client may not understand the data as presented in dashboard format.
We therefore need to explain the findings so that the client can understand them. For this reason, deployment plays a major role in data analysis.'''
st.markdown(multi)