Mpavan45 committed on
Commit 4e00702 · verified · 1 Parent(s): 7ae1ca0

Update app.py

Files changed (1)
  1. app.py +60 -62
app.py CHANGED
@@ -7,7 +7,7 @@ sidebar = st.sidebar
  sidebar.header('🌐 NLP Navigation')
 
  # Sidebar options for NLP Overview, Lifecycle, and Techniques
- sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])
 
  # Store the selected page in session state
  if 'selected_page' not in st.session_state:
@@ -21,20 +21,18 @@ if sidebar_option != st.session_state.selected_page:
  def set_title(title, color="black"):
  st.markdown(f"<h1 style='text-align: center; color: {color};'>{title}</h1>", unsafe_allow_html=True)
 
- if st.session_state.selected_page == 'What is NLP?':
- set_title('Natural Language Processing (NLP)')
 
- elif st.session_state.selected_page == 'NLP Lifecycle':
- set_title('Natural Language Processing (NLP) Lifecycle')
- if sidebar_option == 'Problem Definition':
- set_title('Steps in the Natural Language Processing (NLP) lifecycle:')
 
- elif st.session_state.selected_page == 'NLP Techniques':
- set_title('Techniques in Natural Language Processing (NLP)')
 
- # Content for "What is NLP?"
- if st.session_state.selected_page == 'What is NLP?':
- st.markdown("<h2 style='text-align: center; color: black;'>Introduction</h2>", unsafe_allow_html=True)
  st.write("""
  #### 🤖 What is NLP?
  Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.
@@ -62,15 +60,15 @@ if st.session_state.selected_page == 'What is NLP?':
  # Content for NLP Lifecycle
  elif st.session_state.selected_page == "NLP Lifecycle":
  lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:", [
- "Overview of the NLP Life Cycle",
- "Problem Definition",
- "Data Collection",
- "Simple EDA",
- "Data Preprocessing",
- "Feature Engineering",
- "Model Training",
- "Evaluation",
- "Deployment"
  ])
 
  if lifecycle_option == "Overview of the NLP Life Cycle":
@@ -101,34 +99,34 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - Ensuring solutions are fast and efficient.
 
  #### Steps in the NLP Life Cycle:
- 1. Problem Definition
- 2. Data Collection
- 3. Simple EDA
- 4. Data Preprocessing
- 5. Feature Engineering
- 6. Model Selection and Training
- 7. Model Evaluation
- 8. Model Tuning
- 9. Deployment
- 10. Monitoring and Maintenance
- """)
 
  elif lifecycle_option == "Problem Definition":
- st.write("""
- #### 🔧 1. Problem Definition
- - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
- - Based on the problem, you will need to gather the data.
- - **To better understand the problem, consider asking questions such as**:
- - 🎯 What is the main goal of this analysis?
- - πŸ“ What kind of text data are we working with (e.g., reviews, social media posts, documents)?
- - 📊 What do we want the output to be (e.g., sentiment score, summary, or classification)?
-
- **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
- """)
 
  elif lifecycle_option == "Data Collection":
  st.write("""
- #### 2. Data Collection
  Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
  - **Sources for data collection**:
  - 📚 The data should be collected based on a clear understanding of the problem statement.
@@ -182,11 +180,11 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  ```
  Using the above code structure, we can efficiently extract data from various file formats such as CSV, JSON, Excel, and XML, and load it into a structured format suitable for analysis.
  """)
-
-
  elif lifecycle_option == "Simple EDA":
  st.write("""
- #### 📊 3. Simple EDA
  #### Simple Exploratory Data Analysis (Simple EDA)
  Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
 
@@ -197,7 +195,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - Class A: 700 instances
  - Class B: 300 instances
  - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
-
  #### Steps to Understand and Explore Your Data
  - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
  - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.
@@ -216,14 +214,14 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - Histogram for sales distribution
  - Boxplot to detect outliers
  """)
-
-
-
  elif lifecycle_option == "Data Preprocessing":
  st.write("""
  #### 🧹 4. Text Preprocessing
  Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
-
  **Key Steps in Text Preprocessing:**
  - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
  - **Stop Words Removal**: Removing common words that don’t contribute much information.
@@ -242,7 +240,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - URL Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "🦊", "#awesome"]
  - Emoji Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "#awesome"]
  - Hashtag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast"]
-
  Now, let's apply the necessary text preprocessing steps to clean up the data:
 
  ```python
@@ -275,8 +273,8 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
  By following these preprocessing steps, the raw text is now ready for further analysis or machine learning tasks.
  """)
-
-
  elif lifecycle_option == "Feature Engineering":
  st.write("""
  #### πŸ“ 5. Text Representation
@@ -284,12 +282,12 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
  - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
  - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
-
  **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
  - Vocabulary: ["I", "love", "NLP"]
  - Vector: [1, 1, 1] (word frequency representation)
  """)
-
  elif lifecycle_option == "Model Training":
  st.write("""
  #### 🏋️‍♂️ 6. Model Training
@@ -297,10 +295,10 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
  - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
  - **Sentiment Analysis**: Logistic regression, Naive Bayes, or transformer-based models like BERT.
-
  **Example**: Training a Naive Bayes classifier to categorize news articles into topics such as "Sports", "Politics", etc.
  """)
-
  elif lifecycle_option == "Evaluation":
  st.write("""
  #### 🏅 7. Evaluation
@@ -309,10 +307,10 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - **Precision**: The percentage of relevant instances among the retrieved instances.
  - **Recall**: The percentage of relevant instances that were retrieved.
  - **F1-score**: The harmonic mean of precision and recall.
-
  **Example**: Evaluating a sentiment analysis model using accuracy and F1-score on a test dataset.
  """)
-
  elif lifecycle_option == "Deployment":
  st.write("""
  #### 🚀 8. Deployment
@@ -320,7 +318,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
  - **Integration** with web applications, chatbots, or other tools.
  - **API Development**: Exposing the model through an API for real-time predictions.
  - **Continuous Monitoring**: Tracking the model’s performance and retraining it as needed.
-
  **Example**: Deploying a sentiment analysis model in a customer service chatbot that analyzes customer inquiries in real time.
  """)
 
 
  sidebar.header('🌐 NLP Navigation')
 
  # Sidebar options for NLP Overview, Lifecycle, and Techniques
+ sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP? 🧠', 'NLP Lifecycle 🔄', 'NLP Techniques ⚙️'])
 
  # Store the selected page in session state
  if 'selected_page' not in st.session_state:
 
  def set_title(title, color="black"):
  st.markdown(f"<h1 style='text-align: center; color: {color};'>{title}</h1>", unsafe_allow_html=True)
 
+ if st.session_state.selected_page == 'What is NLP? 🧠':
+ set_title('🌟 Natural Language Processing (NLP) 🌟', color="purple")
 
+ elif st.session_state.selected_page == 'NLP Lifecycle 🔄':
+ set_title('🔄 Natural Language Processing (NLP) Lifecycle 🔄', color="darkblue")
 
+ elif st.session_state.selected_page == 'NLP Techniques ⚙️':
+ set_title('⚙️ Techniques in Natural Language Processing (NLP) ⚙️', color="darkgreen")
 
+ # Content for "What is NLP? 🧠"
+ if st.session_state.selected_page == 'What is NLP? 🧠':
+ st.markdown("<h2 style='text-align: center; color: orange;'>📘 Introduction to NLP</h2>", unsafe_allow_html=True)
  st.write("""
  #### 🤖 What is NLP?
  Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.
 
  # Content for NLP Lifecycle
  elif st.session_state.selected_page == "NLP Lifecycle":
  lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:", [
+ "Overview of the NLP Life Cycle 🌐",
+ "Problem Definition 🔧",
+ "Data Collection 📊",
+ "Simple EDA 📈",
+ "Data Preprocessing 🧹",
+ "Feature Engineering πŸ“",
+ "Model Training 🏋️‍♂️",
+ "Evaluation 🏅",
+ "Deployment 🚀"
  ])
 
  if lifecycle_option == "Overview of the NLP Life Cycle":
 
  - Ensuring solutions are fast and efficient.
 
  #### Steps in the NLP Life Cycle:
+ 1. **Problem Definition** 🔧
+ 2. **Data Collection** 📊
+ 3. **Simple EDA** 🔍
+ 4. **Data Preprocessing** 🧹
+ 5. **Feature Engineering** πŸ“
+ 6. **Model Selection and Training** 🏋️‍♂️
+ 7. **Model Evaluation** 🏅
+ 8. **Model Tuning** ⚙️
+ 9. **Deployment** 🚀
+ 10. **Monitoring and Maintenance** 🛠️
+ """)
 
  elif lifecycle_option == "Problem Definition":
+ st.write("""
+ #### 🔧 1. Problem Definition
+ - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
+ - Based on the problem, you will need to gather the data.
+ - **To better understand the problem, consider asking questions such as**:
+ - 🎯 What is the main goal of this analysis?
+ - πŸ“ What kind of text data are we working with (e.g., reviews, social media posts, documents)?
+ - 📊 What do we want the output to be (e.g., sentiment score, summary, or classification)?
+
+ **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
+ """)
 
  elif lifecycle_option == "Data Collection":
  st.write("""
+ #### 📚 2. Data Collection
  Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
  - **Sources for data collection**:
  - 📚 The data should be collected based on a clear understanding of the problem statement.
 
  ```
  Using the above code structure, we can efficiently extract data from various file formats such as CSV, JSON, Excel, and XML, and load it into a structured format suitable for analysis.
  """)
+
+
  elif lifecycle_option == "Simple EDA":
  st.write("""
+ #### 🔍 3. Simple EDA
  #### Simple Exploratory Data Analysis (Simple EDA)
  Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
 
 
  - Class A: 700 instances
  - Class B: 300 instances
  - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
+
  #### Steps to Understand and Explore Your Data
  - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
  - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.
 
  - Histogram for sales distribution
  - Boxplot to detect outliers
  """)
+
+
+
  elif lifecycle_option == "Data Preprocessing":
  st.write("""
  #### 🧹 4. Text Preprocessing
  Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
+
  **Key Steps in Text Preprocessing:**
  - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
  - **Stop Words Removal**: Removing common words that don’t contribute much information.
 
  - URL Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "🦊", "#awesome"]
  - Emoji Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "#awesome"]
  - Hashtag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast"]
+
  Now, let's apply the necessary text preprocessing steps to clean up the data:
 
  ```python
 
 
  By following these preprocessing steps, the raw text is now ready for further analysis or machine learning tasks.
  """)
+
+
  elif lifecycle_option == "Feature Engineering":
  st.write("""
  #### πŸ“ 5. Text Representation
 
  - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
  - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
  - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
+
  **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
  - Vocabulary: ["I", "love", "NLP"]
  - Vector: [1, 1, 1] (word frequency representation)
  """)
+
  elif lifecycle_option == "Model Training":
  st.write("""
  #### 🏋️‍♂️ 6. Model Training
 
  - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
  - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
  - **Sentiment Analysis**: Logistic regression, Naive Bayes, or transformer-based models like BERT.
+
  **Example**: Training a Naive Bayes classifier to categorize news articles into topics such as "Sports", "Politics", etc.
  """)
+
  elif lifecycle_option == "Evaluation":
  st.write("""
  #### 🏅 7. Evaluation
 
  - **Precision**: The percentage of relevant instances among the retrieved instances.
  - **Recall**: The percentage of relevant instances that were retrieved.
  - **F1-score**: The harmonic mean of precision and recall.
+
  **Example**: Evaluating a sentiment analysis model using accuracy and F1-score on a test dataset.
  """)
+
  elif lifecycle_option == "Deployment":
  st.write("""
  #### 🚀 8. Deployment
 
  - **Integration** with web applications, chatbots, or other tools.
  - **API Development**: Exposing the model through an API for real-time predictions.
  - **Continuous Monitoring**: Tracking the model’s performance and retraining it as needed.
+
  **Example**: Deploying a sentiment analysis model in a customer service chatbot that analyzes customer inquiries in real time.
  """)
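
Review note (a sketch, not part of the commit): in the hunks shown, the radio options gain emoji suffixes, while context lines such as `elif st.session_state.selected_page == "NLP Lifecycle":` and `if lifecycle_option == "Overview of the NLP Life Cycle":` appear unchanged, so those branches may no longer match the selected value. One possible way to keep plain keys for comparisons while still displaying decorated labels is the `format_func` parameter of `st.radio`. The names `PAGE_LABELS` and `page_label` are hypothetical, and the sketch assumes nothing outside the visible hunks renames these strings.

```python
# Hypothetical mapping from plain page keys (used in comparisons)
# to decorated labels (shown in the sidebar). Only the strings
# themselves come from the diff; the names are illustrative.
PAGE_LABELS = {
    "What is NLP?": "What is NLP? 🧠",
    "NLP Lifecycle": "NLP Lifecycle 🔄",
    "NLP Techniques": "NLP Techniques ⚙️",
}


def page_label(key: str) -> str:
    """Return the decorated label for a plain page key, or the key itself."""
    return PAGE_LABELS.get(key, key)


# In app.py this could be wired up roughly as follows (not executed here):
#
#   sidebar_option = sidebar.radio(
#       'Choose a section to explore:',
#       list(PAGE_LABELS),        # plain keys are the option values
#       format_func=page_label,   # decorated labels are only displayed
#   )
#
# st.radio returns the selected option (the plain key), so existing checks
# like `st.session_state.selected_page == "NLP Lifecycle"` keep matching.
```

The same pattern would apply to the lifecycle step radio, where the `elif lifecycle_option == "Problem Definition":` style checks could stay as they are.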