import streamlit as st


def run():
    """Render the Data Understanding lesson page."""
    st.title("Data Understanding")
    st.write("## Overview")
    st.write("""
Data Understanding is the second phase of the CRISP-DM process. It involves collecting initial data, describing and exploring it, and verifying its quality.
""")
    st.write("## Key Concepts & Explanations")
    st.markdown("""
- **Data Collection**: Gathering data from various sources.
- **Data Description**: Summarizing the main characteristics of the data.
- **Data Exploration**: Using statistical and visualization techniques to understand the data.
- **Data Quality Verification**: Ensuring the data is accurate, complete, and reliable.
""")
    st.write("## Introduction")
    st.write("""
The Data Understanding phase surfaces problems with the data early and yields insights that inform the later phases of the CRISP-DM process, starting with Data Preparation.
""")
    st.header("Objectives")
    st.write("""
- **Collect Initial Data**: Gather data from various sources to get a comprehensive dataset.
- **Describe the Data**: Summarize the main characteristics of the data, including its structure and content.
- **Explore the Data**: Use statistical and visualization techniques to identify patterns, trends, and anomalies.
- **Verify Data Quality**: Assess the quality of the data to ensure it is suitable for analysis.
""")
    st.header("Key Activities")
    st.write("""
- **Data Collection**: Gather data from internal and external sources.
- **Data Description**: Generate summary statistics and visualizations to describe the data.
- **Data Exploration**: Perform exploratory data analysis (EDA) to uncover patterns and relationships.
- **Data Quality Verification**: Check for missing values, outliers, and inconsistencies in the data.
""")
    st.write("## Detailed Steps")
    st.write("""
1. **Collect Initial Data**:
   - Identify relevant data sources.
   - Extract data from various sources and consolidate it into a single dataset.
2. **Describe the Data**:
   - Generate summary statistics (e.g., mean, median, standard deviation).
   - Create visualizations (e.g., histograms, box plots) to describe the data distribution.
3. **Explore the Data**:
   - Perform exploratory data analysis (EDA) to identify patterns, trends, and anomalies.
   - Use visualization tools (e.g., scatter plots, heatmaps) to explore relationships between variables.
4. **Verify Data Quality**:
   - Check for missing values and handle them appropriately.
   - Identify and address outliers and inconsistencies in the data.
   - Assess the overall quality of the data to ensure it is suitable for analysis.
""")
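    st.markdown("""
The quality checks in step 4 can be sketched as follows, again with the standard library only. This is one possible approach under stated assumptions: missing values are represented as `None`, outliers are flagged with the common 1.5 x IQR rule, and the `incomes` column plus the helpers `missing_count` and `iqr_outliers` are hypothetical names invented for the example.

```python
import statistics


def missing_count(values):
    # Count entries recorded as None (treated here as "missing").
    return sum(1 for v in values if v is None)


def iqr_outliers(values):
    # Flag points more than 1.5 * IQR outside the quartiles.
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in present if v < low or v > high]


# Hypothetical column with two gaps and one implausible value.
incomes = [28000, 42000, None, 51000, 36000, 640000, None, 39000]

n_missing = missing_count(incomes)
outliers = iqr_outliers(incomes)

print(n_missing, outliers)
```

Flagging is only the first half of the step: whether a flagged value is an entry error or a genuine extreme case is a judgment that depends on the data source.
    """)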
    st.write("## Quiz: Conceptual Questions")
    q1 = st.radio(
        "What is the main purpose of the Data Understanding phase?",
        ["Collect data", "Describe data", "Explore data", "All of the above"],
    )
    if q1 == "All of the above":
        st.success("✅ Correct!")
    else:
        st.error("❌ Incorrect. This phase covers collecting, describing, and exploring data, so the answer is 'All of the above'.")
    st.write("## Learning Resources")
    st.markdown("""
- 📘 [CRISP-DM Guide](https://www.sv-europe.com/crisp-dm-methodology/)
- 📘 [Data Understanding in Data Science](https://towardsdatascience.com/data-understanding-in-data-science-1a1d5e8b1c3d)
- 🎬 [Exploratory Data Analysis (EDA)](https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-a-step-by-step-guide/)
""")