Mpavan45 committed on
Commit c495b41 · verified · 1 Parent(s): fc5491b

Update app.py

Files changed (1)
  1. app.py +5 -6
app.py CHANGED
@@ -187,19 +187,18 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  elif lifecycle_option == "Simple EDA":
  st.write("""
  #### 📊 3. Simple EDA
+ #### Simple Exploratory Data Analysis (Simple EDA)
+ Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.

  #### Checking Data Balance
- Before proceeding with analysis, it's important to evaluate whether the dataset is **balanced or imbalanced**. This involves examining the distribution of classes or categories in the data. By calculating the count or percentage of instances in each class, we can determine if the data is evenly distributed or if certain classes are underrepresented. Addressing imbalanced datasets is crucial to ensure reliable analysis and modeling.
-
+ Before proceeding with the analysis, it's essential to assess whether the dataset is balanced or imbalanced by using simple EDA (Exploratory Data Analysis). This involves examining the distribution of classes or categories in the data. By calculating the count or percentage of instances in each class, we can determine if the data is evenly distributed or if certain classes are underrepresented. Addressing class imbalance is important to ensure that the analysis and modeling processes are reliable and accurate.
  **Example**: In a classification dataset:
  - Class Distribution:
  - Class A: 700 instances
  - Class B: 300 instances
  - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
-
- #### Simple Exploratory Data Analysis (Simple EDA)
- Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
-
+
+ #### Steps to Understand and Explore Your Data
  - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
  - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.
  - **Basic Visualizations**: Use histograms, boxplots, and scatterplots to explore data distributions and relationships.
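
The page text in this diff describes checking class balance by count and percentage, and summarizing numeric variables with mean, median, and standard deviation. A minimal sketch of both checks follows; it is not part of `app.py`. The helper names `class_distribution` and `summarize` and the sample labels are hypothetical, chosen only to mirror the 70:30 example in the text, and the sketch uses the Python standard library rather than whatever data library the app may use.

```python
from collections import Counter
from statistics import mean, median, stdev


def class_distribution(labels):
    """Return {label: (count, percentage)} for a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no data, so there is nothing to report
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


def summarize(values):
    """Mean, median, and sample standard deviation of a numeric column.

    Needs at least two values, because the sample standard deviation
    is undefined for a single point.
    """
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


# Hypothetical labels that reproduce the 70:30 example from the text.
labels = ["A"] * 700 + ["B"] * 300
dist = class_distribution(labels)
for label, (count, pct) in dist.items():
    print(f"Class {label}: {count} instances ({pct:.1f}%)")

# Hypothetical numeric column, for illustration only.
stats = summarize([1, 2, 3, 4])
print(stats)
```

Reporting percentages alongside raw counts makes the imbalance easy to read at a glance. Whether a given split calls for oversampling, undersampling, or synthetic data remains a judgment that depends on the task.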