Update app.py
app.py
CHANGED
```diff
@@ -187,19 +187,18 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 elif lifecycle_option == "Simple EDA":
 st.write("""
 #### π 3. Simple EDA
+#### Simple Exploratory Data Analysis (Simple EDA)
+Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
 
 #### Checking Data Balance
-Before proceeding with analysis, it's
-
+Before proceeding with the analysis, it's essential to assess whether the dataset is balanced or imbalanced by using simple EDA (Exploratory Data Analysis). This involves examining the distribution of classes or categories in the data. By calculating the count or percentage of instances in each class, we can determine if the data is evenly distributed or if certain classes are underrepresented. Addressing class imbalance is important to ensure that the analysis and modeling processes are reliable and accurate.
 **Example**: In a classification dataset:
 - Class Distribution:
 - Class A: 700 instances
 - Class B: 300 instances
 - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
-
-####
-Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
-
+
+#### Steps to Understand and Explore Your Data
 - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
 - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.
 - **Basic Visualizations**: Use histograms, boxplots, and scatterplots to explore data distributions and relationships.
```
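The balance check described in this change (counting instances per class and converting to percentages) can be sketched in plain Python. This is a minimal sketch, not code from the commit: it assumes labels arrive as a list of strings, and the helpers `class_distribution`, `imbalance_ratio`, and `random_oversample` are hypothetical names. Random oversampling (duplicating minority-class samples until every class matches the majority count) is only one of the corrections the text mentions; undersampling and synthetic generation such as SMOTE are not shown.

```python
import random
from collections import Counter


def class_distribution(labels):
    """Return {class: (count, percentage)} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, 100.0 * n / total) for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Largest class count divided by smallest (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of each smaller
    class until every class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies with replacement from this class's own samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# The 700/300 example from the text; integers stand in for real rows.
labels = ["A"] * 700 + ["B"] * 300
samples = list(range(len(labels)))

print(class_distribution(labels))        # {'A': (700, 70.0), 'B': (300, 30.0)}
print(round(imbalance_ratio(labels), 2)) # 2.33
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))          # Counter({'A': 700, 'B': 700})
```

Oversampling should be applied only to the training split, after the train/test split, so duplicated rows do not leak into evaluation.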
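The "Basic Data Inspection" and "Summary Statistics" steps can be sketched with only the standard library. This is an illustrative sketch under stated assumptions: the dataset is a list of dictionaries (one per row), the rows and the column names `text`, `n_words`, and `label` are made up, and `inspect_rows` and `summarize_column` are hypothetical helpers. `statistics.stdev` returns the sample standard deviation. In a Streamlit app, pandas (`df.head()`, `df.dtypes`, `df.describe()`) would be the more usual tool; histograms, boxplots, and scatterplots need a plotting library and are left out.

```python
import statistics


def inspect_rows(rows, n=3):
    """Print the first n rows and return the Python type(s) seen per column."""
    for row in rows[:n]:
        print(row)
    columns = rows[0].keys()
    return {col: sorted({type(row[col]).__name__ for row in rows}) for col in columns}


def summarize_column(rows, column):
    """Count, mean, median, and sample standard deviation of a numeric column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical NLP-style rows: review text, its word count, and a label.
rows = [
    {"text": "great product", "n_words": 2, "label": "pos"},
    {"text": "would not buy again", "n_words": 4, "label": "neg"},
    {"text": "works as described", "n_words": 3, "label": "pos"},
    {"text": "arrived broken and late", "n_words": 4, "label": "neg"},
    {"text": "ok", "n_words": 1, "label": "pos"},
]

print(inspect_rows(rows))
print(summarize_column(rows, "n_words"))
```

A column that reports more than one type in `inspect_rows` (for example `['int', 'str']`) is a quick signal of mixed or dirty values worth cleaning before computing statistics.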