Mpavan45 committed on
Commit 061c131 · verified · 1 Parent(s): 396f967

Update app.py

Files changed (1):
  1. app.py +62 -46

app.py CHANGED
@@ -1,12 +1,10 @@
 import streamlit as st

-
-
 # Sidebar for navigation
 sidebar = st.sidebar

 # Sidebar header
-sidebar.header('NLP Navigation')

 # Sidebar options for NLP Overview, Lifecycle, and Techniques
 sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])
@@ -18,28 +16,27 @@ if 'selected_page' not in st.session_state:
 # Update the selected page if the user selects a different option
 if sidebar_option != st.session_state.selected_page:
     st.session_state.selected_page = sidebar_option
-
 # Dynamically update the title based on the selected option
 if st.session_state.selected_page == 'What is NLP?':
-    st.title('What is Natural Language Processing (NLP)?')
 elif st.session_state.selected_page == 'NLP Lifecycle':
-    st.title('Natural Language Processing (NLP) Lifecycle')
-    if sidebar_option == 'Problem Definition':
-        st.title('Steps in the Natural Language Processing (NLP) lifecycle:')
 elif st.session_state.selected_page == 'NLP Techniques':
-    st.title('Techniques in Natural Language Processing (NLP)')
-
-
 # Content for "What is NLP?"
 if st.session_state.selected_page == 'What is NLP?':
     st.write("""
-    ### What is NLP?
     Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.

     NLP is essential for enabling computers to process and analyze large amounts of natural language data, such as:
-    - Text from documents
-    - Speech from conversations
-    - Images with textual descriptions

     #### Key Components of NLP:
     - **Syntax**: Refers to the arrangement of words in a sentence.
@@ -71,7 +68,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":

     if lifecycle_option == "Overview of the NLP Life Cycle":
         st.write("""
-        #### Overview of the NLP Life Cycle
         The NLP life cycle is a structured process for building, using, and maintaining systems that work with human language. It turns unstructured text into meaningful insights or automated actions. This process ensures continuous improvement and adapts to real-world needs.

         - **How It Flows**:
@@ -96,7 +93,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         - Handling multiple languages and specific industries.
         - Ensuring solutions are fast and efficient.

-        #### Steps in the NLP Life Cycle
         1. Problem Definition
         2. Data Collection
         3. Data Preprocessing
@@ -107,15 +104,16 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         8. Deployment
         9. Monitoring and Maintenance
         """)
     elif lifecycle_option == "Problem Definition":
         st.write("""
-        #### 1. Problem Definition
         - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
         - Based on the problem, you will need to gather the data.
         - **To better understand the problem, consider asking questions such as**:
-            - What is the main goal of this analysis?
-            - What kind of text data are we working with (e.g., reviews, social media posts, documents)?
-            - What do we want the output to be (e.g., sentiment score, summary, or classification)?

         **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
         """)
@@ -125,20 +123,19 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         #### 2. Data Collection
         Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
         - **Sources for data collection**:
-            - The data should be collected based on a clear understanding of the problem statement.
-            - From datasets available on websites like Kaggle.
-            - Through APIs.
-            - Web scraping can also be used to gather data from websites using tools like Selenium or BeautifulSoup.
-            - Manually, when needed.
         - In most cases, data is collected from websites, APIs, or through web scraping. However, manual collection may be necessary in rare cases.

-
         **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
         """)

     elif lifecycle_option == "Text Preprocessing":
         st.write("""
-        #### 3. Text Preprocessing
         Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
         - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
         - **Stop Words Removal**: Removing common words that don't contribute much information.
@@ -154,7 +151,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":

     elif lifecycle_option == "Text Representation":
         st.write("""
-        #### 4. Text Representation
         After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
         - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
         - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
@@ -167,7 +164,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":

     elif lifecycle_option == "Model Training":
         st.write("""
-        #### 5. Model Training
         In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
         - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
         - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
@@ -178,30 +175,34 @@ elif st.session_state.selected_page == "NLP Lifecycle":

     elif lifecycle_option == "Evaluation":
         st.write("""
-        #### 6. Evaluation
-        After training the model, it's important to evaluate its performance. Common evaluation metrics include:
-        - **Accuracy**: The percentage of correctly classified samples.
-        - **Precision**: The proportion of true positive predictions among all positive predictions.
-        - **Recall**: The proportion of true positive predictions among all actual positive cases.
-        - **F1-Score**: The harmonic mean of precision and recall.
-        - **ROC and AUC**: Metrics used to evaluate classification models.
-
-        **Example**: Using a confusion matrix to evaluate the performance of a sentiment analysis model.
         """)

     elif lifecycle_option == "Deployment":
         st.write("""
-        #### 7. Deployment
-        Once the model is trained and evaluated, it is deployed to production for real-world use. This might include integration with applications like chatbots, recommendation systems, or text summarization tools.
-        - **Monitoring**: Continuous monitoring to ensure that the model performs well over time.
-        - **Retraining**: The model might need to be retrained periodically as new data becomes available.
-
-        **Example**: Deploying a chatbot powered by an NLP model to assist users on a website.
         """)

 # Content for "NLP Techniques"
 elif st.session_state.selected_page == "NLP Techniques":
     technique_option = sidebar.radio("Select NLP Technique:", [
         "Tokenization",
         "Stop Words Removal",
         "Lemmatization",
@@ -213,8 +214,23 @@ elif st.session_state.selected_page == "NLP Techniques":
         "Part-of-Speech (POS) Tagging",
         "Sentiment Analysis"
     ])

-    if technique_option == "Tokenization":
         st.write("""
         #### 1. Tokenization
         Tokenization is the process of splitting text into smaller units, such as words, sentences, or subwords. This is a key preprocessing step for many NLP tasks.
 
 import streamlit as st

 # Sidebar for navigation
 sidebar = st.sidebar

 # Sidebar header
+sidebar.header('🌐 NLP Navigation')

 # Sidebar options for NLP Overview, Lifecycle, and Techniques
 sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])

 # Update the selected page if the user selects a different option
 if sidebar_option != st.session_state.selected_page:
     st.session_state.selected_page = sidebar_option
+
 # Dynamically update the title based on the selected option
 if st.session_state.selected_page == 'What is NLP?':
+    st.title('🤖 What is Natural Language Processing (NLP)?')
 elif st.session_state.selected_page == 'NLP Lifecycle':
+    st.title('🔄 Natural Language Processing (NLP) Lifecycle')
+    if sidebar_option == 'Problem Definition':
+        st.title('🔧 Steps in the Natural Language Processing (NLP) lifecycle:')
 elif st.session_state.selected_page == 'NLP Techniques':
+    st.title('⚙️ Techniques in Natural Language Processing (NLP)')
+

 # Content for "What is NLP?"
 if st.session_state.selected_page == 'What is NLP?':
     st.write("""
+    ### 🤖 What is NLP?
     Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.

     NLP is essential for enabling computers to process and analyze large amounts of natural language data, such as:
+    - 📜 Text from documents
+    - 🗣️ Speech from conversations
+    - 🖼️ Images with textual descriptions

     #### Key Components of NLP:
     - **Syntax**: Refers to the arrangement of words in a sentence.

     if lifecycle_option == "Overview of the NLP Life Cycle":
         st.write("""
+        #### 🔄 Overview of the NLP Life Cycle
         The NLP life cycle is a structured process for building, using, and maintaining systems that work with human language. It turns unstructured text into meaningful insights or automated actions. This process ensures continuous improvement and adapts to real-world needs.

         - **How It Flows**:

         - Handling multiple languages and specific industries.
         - Ensuring solutions are fast and efficient.

+        #### Steps in the NLP Life Cycle:
         1. Problem Definition
         2. Data Collection
         3. Data Preprocessing

         8. Deployment
         9. Monitoring and Maintenance
         """)
+
     elif lifecycle_option == "Problem Definition":
         st.write("""
+        #### 🔧 1. Problem Definition
         - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
         - Based on the problem, you will need to gather the data.
         - **To better understand the problem, consider asking questions such as**:
+            - 🎯 What is the main goal of this analysis?
+            - 📝 What kind of text data are we working with (e.g., reviews, social media posts, documents)?
+            - 📊 What do we want the output to be (e.g., sentiment score, summary, or classification)?

         **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
         """)

         #### 2. Data Collection
         Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
         - **Sources for data collection**:
+            - 📚 The data should be collected based on a clear understanding of the problem statement.
+            - 🌐 From datasets available on websites like Kaggle.
+            - 🔌 Through APIs.
+            - 🕸️ Web scraping can also be used to gather data from websites using tools like Selenium or BeautifulSoup.
+            - ✋ Manually, when needed.
         - In most cases, data is collected from websites, APIs, or through web scraping. However, manual collection may be necessary in rare cases.

         **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
         """)

     elif lifecycle_option == "Text Preprocessing":
         st.write("""
+        #### 🧹 3. Text Preprocessing
         Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
         - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
         - **Stop Words Removal**: Removing common words that don't contribute much information.
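The two preprocessing steps listed above might look like this in plain Python, with no NLP library; the tiny stop-word list is illustrative, not exhaustive (real projects use a full list such as NLTK's):

```python
import re

# Illustrative stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that carry little information on their own."""
    return [t for t in tokens if t not in STOP_WORDS]

tokens = tokenize("The battery life of the phone is great")
print(remove_stop_words(tokens))  # → ['battery', 'life', 'phone', 'great']
```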
 

     elif lifecycle_option == "Text Representation":
         st.write("""
+        #### 📝 4. Text Representation
         After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
         - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
         - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
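A rough sketch of the two representations above, hand-rolled for clarity on a two-document toy corpus (in practice scikit-learn's `CountVectorizer` and `TfidfVectorizer` are the usual tools):

```python
import math
from collections import Counter

# Invented toy corpus of two preprocessed documents.
docs = [
    "good phone good battery",
    "bad battery",
]

# Bag of Words: raw term counts per document.
bow = [Counter(doc.split()) for doc in docs]
print(bow[0]["good"])  # "good" appears twice in the first document

def tf_idf(term, doc_counts, all_counts):
    """Term frequency scaled by (natural-log) inverse document frequency."""
    tf = doc_counts[term] / sum(doc_counts.values())
    df = sum(1 for c in all_counts if term in c)
    idf = math.log(len(all_counts) / df)
    return tf * idf

# "good" occurs in only one document, so it gets a positive weight there;
# "battery" occurs in every document, so its idf (and weight) is 0.
print(tf_idf("good", bow[0], bow))
print(tf_idf("battery", bow[0], bow))
```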
 

     elif lifecycle_option == "Model Training":
         st.write("""
+        #### 🏋️‍♂️ 5. Model Training
         In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
         - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
         - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
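One of the text-classification models named above, Naive Bayes, reduced to a minimal pure-Python sketch; the four training reviews are invented, and a real project would use scikit-learn's `MultinomialNB` on a vectorized corpus instead:

```python
import math
from collections import Counter, defaultdict

# Invented toy training set: (review text, label).
train = [
    ("great product love it", "pos"),
    ("excellent quality great value", "pos"),
    ("terrible waste of money", "neg"),
    ("awful broke immediately", "neg"),
]

# Count words per class and documents per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    scores = {}
    for label in class_counts:
        # Log prior plus the sum of per-word log likelihoods.
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great quality"))  # → pos
```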
 

     elif lifecycle_option == "Evaluation":
         st.write("""
+        #### 🏅 6. Evaluation
+        After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
+        - **Accuracy**: The percentage of correct predictions.
+        - **Precision**: The percentage of relevant instances among the retrieved instances.
+        - **Recall**: The percentage of relevant instances that were retrieved.
+        - **F1-score**: The harmonic mean of precision and recall.
+
+        **Example**: If a sentiment analysis model classifies 80 out of 100 reviews correctly, its accuracy is 80%.
         """)

     elif lifecycle_option == "Deployment":
         st.write("""
+        #### 🚀 7. Deployment
+        The final step is deploying the model for real-time use. This involves integrating it into a system or application where it can process live data.
+        - **Real-time Applications**: Chatbots, sentiment analysis for social media monitoring, text summarization for news.
+        - **Maintenance**: Continuously monitor the model to ensure its performance remains high. Updates might be necessary if the language evolves or new data emerges.
+
+        **Example**: Deploying a chatbot to answer customer inquiries based on historical support tickets.
         """)

+# Content for NLP Techniques
+
 # Content for "NLP Techniques"
 elif st.session_state.selected_page == "NLP Techniques":
     technique_option = sidebar.radio("Select NLP Technique:", [
+        "NLP Techniques",
         "Tokenization",
         "Stop Words Removal",
         "Lemmatization",

         "Part-of-Speech (POS) Tagging",
         "Sentiment Analysis"
     ])
+    if technique_option == "NLP Techniques":
+        st.write("""
+        ### ⚙️ Techniques in NLP
+        NLP uses a variety of techniques to process and analyze text data. Some of the most common techniques include:
+
+        1. **Tokenization**: Breaking down text into smaller units (e.g., words, sentences).
+        2. **Part-of-Speech (POS) Tagging**: Identifying the grammatical roles of words in a sentence (e.g., noun, verb, adjective).
+        3. **Named Entity Recognition (NER)**: Identifying entities such as names, dates, locations, etc.
+        4. **Dependency Parsing**: Analyzing the syntactic structure of sentences.
+        5. **Sentiment Analysis**: Analyzing the sentiment of text (positive, negative, neutral).
+        6. **Word Embeddings**: Representing words as vectors in a continuous space (e.g., Word2Vec, GloVe).
+
+        **Example**: Sentiment analysis can be used to identify whether customer reviews are positive, negative, or neutral based on the words used in the text.
+        """)
+
+    elif technique_option == "Tokenization":
         st.write("""
         #### 1. Tokenization
         Tokenization is the process of splitting text into smaller units, such as words, sentences, or subwords. This is a key preprocessing step for many NLP tasks.
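Word- and sentence-level tokenization as described above can be sketched with the standard library's `re` module; the example sentence is invented, and in practice NLTK's `word_tokenize`/`sent_tokenize` or spaCy handle the many edge cases this regex ignores:

```python
import re

text = "NLP is fun. Tokenization comes first!"

# Sentence tokenization: split on terminal punctuation followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word tokenization: pull out runs of word characters.
words = re.findall(r"\w+", text)

print(sentences)  # → ['NLP is fun.', 'Tokenization comes first!']
print(words)      # → ['NLP', 'is', 'fun', 'Tokenization', 'comes', 'first']
```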