Update app.py
app.py
CHANGED
```diff
@@ -1,12 +1,10 @@
 import streamlit as st
 
-
-
 # Sidebar for navigation
 sidebar = st.sidebar
 
 # Sidebar header
 sidebar.header('NLP Navigation')
 
 # Sidebar options for NLP Overview, Lifecycle, and Techniques
 sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])
```
```diff
@@ -18,28 +16,27 @@ if 'selected_page' not in st.session_state:
 # Update the selected page if the user selects a different option
 if sidebar_option != st.session_state.selected_page:
     st.session_state.selected_page = sidebar_option
 
 # Dynamically update the title based on the selected option
 if st.session_state.selected_page == 'What is NLP?':
     st.title('What is Natural Language Processing (NLP)?')
 elif st.session_state.selected_page == 'NLP Lifecycle':
     st.title('Natural Language Processing (NLP) Lifecycle')
-    if sidebar_option
-    st.title('Steps in the Natural Language Processing (NLP) lifecycle:')
+    if sidebar_option == 'Problem Definition':
+        st.title('Steps in the Natural Language Processing (NLP) lifecycle:')
 elif st.session_state.selected_page == 'NLP Techniques':
     st.title('Techniques in Natural Language Processing (NLP)')
-
 
 # Content for "What is NLP?"
 if st.session_state.selected_page == 'What is NLP?':
     st.write("""
     ### What is NLP?
     Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.
 
     NLP is essential for enabling computers to process and analyze large amounts of natural language data, such as:
     - Text from documents
     - Speech from conversations
     - Images with textual descriptions
 
     #### Key Components of NLP:
     - **Syntax**: Refers to the arrangement of words in a sentence.
```
```diff
@@ -71,7 +68,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
     if lifecycle_option == "Overview of the NLP Life Cycle":
         st.write("""
         #### Overview of the NLP Life Cycle
         The NLP life cycle is a structured process for building, using, and maintaining systems that work with human language. It turns unstructured text into meaningful insights or automated actions. This process ensures continuous improvement and adapts to real-world needs.
 
         - **How It Flows**:
```
```diff
@@ -96,7 +93,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         - Handling multiple languages and specific industries.
         - Ensuring solutions are fast and efficient.
 
-        #### Steps in the NLP Life Cycle
+        #### Steps in the NLP Life Cycle:
         1. Problem Definition
         2. Data Collection
         3. Data Preprocessing
```
```diff
@@ -107,15 +104,16 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         8. Deployment
         9. Monitoring and Maintenance
         """)
+
     elif lifecycle_option == "Problem Definition":
         st.write("""
         #### 1. Problem Definition
         - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
         - Based on the problem, you will need to gather the data.
         - **To better understand the problem, consider asking questions such as**:
             - What is the main goal of this analysis?
             - What kind of text data are we working with (e.g., reviews, social media posts, documents)?
             - What do we want the output to be (e.g., sentiment score, summary, or classification)?
 
         **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
         """)
```
```diff
@@ -125,20 +123,19 @@ elif st.session_state.selected_page == "NLP Lifecycle":
         #### 2. Data Collection
         Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
         - **Sources for data collection**:
             - The data should be collected based on a clear understanding of the problem statement.
             - From datasets available on websites like Kaggle.
             - Through APIs.
             - Web scraping can also be used to gather data from websites using tools like Selenium or BeautifulSoup.
             - Manually, when needed.
         - In most cases, data is collected from websites, APIs, or through web scraping. However, manual collection may be necessary in rare cases.
 
-
         **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
         """)
 
     elif lifecycle_option == "Text Preprocessing":
         st.write("""
         #### 3. Text Preprocessing
         Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
         - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
         - **Stop Words Removal**: Removing common words that don't contribute much information.
```
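The preprocessing steps named in this hunk (tokenization followed by stop-word removal) can be sketched in a few lines of plain Python. This is an illustration only, not what app.py itself runs; the token pattern and the tiny stop-word list are assumptions (real pipelines use larger lists, e.g. NLTK's):

```python
import re

# A tiny illustrative stop-word list; real pipelines use far larger ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in"}

def preprocess(text):
    """Lowercase, tokenize on word characters, then drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The battery life of this phone is amazing"))
# ['battery', 'life', 'this', 'phone', 'amazing']
```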
```diff
@@ -154,7 +151,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
     elif lifecycle_option == "Text Representation":
         st.write("""
         #### 4. Text Representation
         After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
         - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
         - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
```
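The two representations described in this hunk can be illustrated with a minimal pure-Python sketch. The two toy documents are made up, and the smoothing-free IDF formula is just one common variant; a real project would typically use scikit-learn's CountVectorizer/TfidfVectorizer instead:

```python
import math
from collections import Counter

# Two toy pre-tokenized documents (hypothetical review fragments).
docs = [["good", "phone", "good", "battery"],
        ["bad", "battery", "bad", "screen"]]

# Bag of Words: each document becomes a vector of word counts over the vocabulary.
vocab = sorted({w for d in docs for w in d})
bow = [[Counter(d)[w] for w in vocab] for d in docs]

# TF-IDF: term frequency scaled by inverse document frequency.
def tf_idf(doc, word):
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in docs if word in d)
    return tf * math.log(len(docs) / df)

print(vocab)                       # ['bad', 'battery', 'good', 'phone', 'screen']
print(bow[0])                      # [0, 1, 2, 1, 0]
print(tf_idf(docs[0], "battery"))  # 0.0 -- 'battery' appears in every document
```

Note how a word that occurs in every document ("battery") gets a TF-IDF weight of zero: it carries no discriminating information, which is exactly the intuition behind weighting by inverse document frequency.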
```diff
@@ -167,7 +164,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
     elif lifecycle_option == "Model Training":
         st.write("""
         #### 5. Model Training
         In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
         - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
         - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
```
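To make the Naive Bayes option mentioned in this hunk concrete, here is a very small from-scratch text classifier with add-one smoothing. The training sentences and labels are invented for illustration; a real app would use a library such as scikit-learn:

```python
import math
from collections import Counter, defaultdict

# Toy labelled data (hypothetical review snippets).
train = [("good great phone", "pos"),
         ("great battery good screen", "pos"),
         ("bad awful battery", "neg"),
         ("awful bad screen", "neg")]

# Pool all words per class.
class_docs = defaultdict(list)
for text, label in train:
    class_docs[label].extend(text.split())

vocab = {w for words in class_docs.values() for w in words}

def predict(text):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    scores = {}
    for label, words in class_docs.items():
        counts = Counter(words)
        prior = math.log(sum(1 for _, l in train if l == label) / len(train))
        likelihood = sum(
            math.log((counts[w] + 1) / (len(words) + len(vocab)))  # add-one smoothing
            for w in text.split() if w in vocab
        )
        scores[label] = prior + likelihood
    return max(scores, key=scores.get)

print(predict("good battery"))  # pos
```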
```diff
@@ -178,30 +175,34 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
     elif lifecycle_option == "Evaluation":
         st.write("""
         #### 6. Evaluation
-        After training the model, it's important to evaluate its performance
-        - **Accuracy**: The percentage of
-        - **Precision**: The
-        - **Recall**: The
-        - **F1-
-
-
-        **Example**: Using a confusion matrix to evaluate the performance of a sentiment analysis model.
+        After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
+        - **Accuracy**: The percentage of correct predictions.
+        - **Precision**: The percentage of relevant instances among the retrieved instances.
+        - **Recall**: The percentage of relevant instances that were retrieved.
+        - **F1-score**: The harmonic mean of precision and recall.
+
+        **Example**: If a sentiment analysis model correctly classifies 80 out of 100 reviews, its accuracy is 80%.
         """)
 
     elif lifecycle_option == "Deployment":
         st.write("""
         #### 7. Deployment
-
-        - **
-        - **
-
-        **Example**: Deploying a chatbot
+        The final step is deploying the model for real-time use. This involves integrating it into a system or application where it can process live data.
+        - **Real-time Applications**: Chatbots, sentiment analysis for social media monitoring, text summarization for news.
+        - **Maintenance**: Continuously monitor the model to ensure its performance remains high. Updates might be necessary if the language evolves or new data emerges.
+
+        **Example**: Deploying a chatbot to answer customer inquiries based on historical support tickets.
         """)
 
 # Content for "NLP Techniques"
 elif st.session_state.selected_page == "NLP Techniques":
     technique_option = sidebar.radio("Select NLP Technique:", [
+        "NLP Techniques",
         "Tokenization",
         "Stop Words Removal",
         "Lemmatization",
```
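The four metrics defined in the evaluation step follow directly from confusion-matrix counts. The counts below are invented for illustration (chosen so that accuracy matches the 80% example in the text):

```python
# Hypothetical confusion-matrix counts for a binary sentiment model:
# true positives, false positives, false negatives, true negatives.
tp, fp, fn, tn = 40, 10, 10, 40

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions / all predictions
precision = tp / (tp + fp)                   # correct positives / predicted positives
recall = tp / (tp + fn)                      # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, round(f1, 3))  # 0.8 0.8 0.8 0.8
```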
```diff
@@ -213,8 +214,23 @@ elif st.session_state.selected_page == "NLP Techniques":
         "Part-of-Speech (POS) Tagging",
         "Sentiment Analysis"
     ])
 
-
+    if technique_option == "NLP Techniques":
+        st.write("""
+        ### Techniques in NLP
+        NLP uses a variety of techniques to process and analyze text data. Some of the most common techniques include:
+
+        1. **Tokenization**: Breaking down text into smaller units (e.g., words, sentences).
+        2. **Part-of-Speech (POS) Tagging**: Identifying the grammatical roles of words in a sentence (e.g., noun, verb, adjective).
+        3. **Named Entity Recognition (NER)**: Identifying entities such as names, dates, locations, etc.
+        4. **Dependency Parsing**: Analyzing the syntactic structure of sentences.
+        5. **Sentiment Analysis**: Analyzing the sentiment of text (positive, negative, neutral).
+        6. **Word Embeddings**: Representing words as vectors in a continuous space (e.g., Word2Vec, GloVe).
+
+        **Example**: Sentiment analysis can be used to identify whether customer reviews are positive, negative, or neutral based on the words used in the text.
+        """)
+
+    elif technique_option == "Tokenization":
         st.write("""
         #### 1. Tokenization
         Tokenization is the process of splitting text into smaller units, such as words, sentences, or subwords. This is a key preprocessing step for many NLP tasks.
```
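The two most common flavours of tokenization described in the app, word-level and sentence-level, can be sketched with only the standard library. The regexes are deliberately naive assumptions; libraries such as NLTK or spaCy handle the many edge cases (abbreviations, decimals, contractions) that these patterns ignore:

```python
import re

text = "NLP is fun. It powers chatbots!"

# Word tokenization: pull out runs of word characters.
words = re.findall(r"\w+", text)

# Sentence tokenization: split after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)

print(words)      # ['NLP', 'is', 'fun', 'It', 'powers', 'chatbots']
print(sentences)  # ['NLP is fun.', 'It powers chatbots!']
```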
Updated `app.py` after this change (excerpt; `# ...` and `...` mark code not shown in the diff):

```python
import streamlit as st

# Sidebar for navigation
sidebar = st.sidebar

# Sidebar header
sidebar.header('NLP Navigation')

# Sidebar options for NLP Overview, Lifecycle, and Techniques
sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])

# ...

# Update the selected page if the user selects a different option
if sidebar_option != st.session_state.selected_page:
    st.session_state.selected_page = sidebar_option

# Dynamically update the title based on the selected option
if st.session_state.selected_page == 'What is NLP?':
    st.title('What is Natural Language Processing (NLP)?')
elif st.session_state.selected_page == 'NLP Lifecycle':
    st.title('Natural Language Processing (NLP) Lifecycle')
    if sidebar_option == 'Problem Definition':
        st.title('Steps in the Natural Language Processing (NLP) lifecycle:')
elif st.session_state.selected_page == 'NLP Techniques':
    st.title('Techniques in Natural Language Processing (NLP)')

# Content for "What is NLP?"
if st.session_state.selected_page == 'What is NLP?':
    st.write("""
    ### What is NLP?
    Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.

    NLP is essential for enabling computers to process and analyze large amounts of natural language data, such as:
    - Text from documents
    - Speech from conversations
    - Images with textual descriptions

    #### Key Components of NLP:
    - **Syntax**: Refers to the arrangement of words in a sentence.
    ...
    """)

elif st.session_state.selected_page == "NLP Lifecycle":
    # ...
    if lifecycle_option == "Overview of the NLP Life Cycle":
        st.write("""
        #### Overview of the NLP Life Cycle
        The NLP life cycle is a structured process for building, using, and maintaining systems that work with human language. It turns unstructured text into meaningful insights or automated actions. This process ensures continuous improvement and adapts to real-world needs.

        - **How It Flows**:
        ...
        - Handling multiple languages and specific industries.
        - Ensuring solutions are fast and efficient.

        #### Steps in the NLP Life Cycle:
        1. Problem Definition
        2. Data Collection
        3. Data Preprocessing
        ...
        8. Deployment
        9. Monitoring and Maintenance
        """)

    elif lifecycle_option == "Problem Definition":
        st.write("""
        #### 1. Problem Definition
        - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
        - Based on the problem, you will need to gather the data.
        - **To better understand the problem, consider asking questions such as**:
            - What is the main goal of this analysis?
            - What kind of text data are we working with (e.g., reviews, social media posts, documents)?
            - What do we want the output to be (e.g., sentiment score, summary, or classification)?

        **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
        """)

    # ...
        st.write("""
        #### 2. Data Collection
        Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
        - **Sources for data collection**:
            - The data should be collected based on a clear understanding of the problem statement.
            - From datasets available on websites like Kaggle.
            - Through APIs.
            - Web scraping can also be used to gather data from websites using tools like Selenium or BeautifulSoup.
            - Manually, when needed.
        - In most cases, data is collected from websites, APIs, or through web scraping. However, manual collection may be necessary in rare cases.

        **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
        """)

    elif lifecycle_option == "Text Preprocessing":
        st.write("""
        #### 3. Text Preprocessing
        Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
        - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
        - **Stop Words Removal**: Removing common words that don't contribute much information.
        ...
        """)

    elif lifecycle_option == "Text Representation":
        st.write("""
        #### 4. Text Representation
        After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
        - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
        - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
        ...
        """)

    elif lifecycle_option == "Model Training":
        st.write("""
        #### 5. Model Training
        In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
        - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
        - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
        ...
        """)

    elif lifecycle_option == "Evaluation":
        st.write("""
        #### 6. Evaluation
        After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
        - **Accuracy**: The percentage of correct predictions.
        - **Precision**: The percentage of relevant instances among the retrieved instances.
        - **Recall**: The percentage of relevant instances that were retrieved.
        - **F1-score**: The harmonic mean of precision and recall.

        **Example**: If a sentiment analysis model correctly classifies 80 out of 100 reviews, its accuracy is 80%.
        """)

    elif lifecycle_option == "Deployment":
        st.write("""
        #### 7. Deployment
        The final step is deploying the model for real-time use. This involves integrating it into a system or application where it can process live data.
        - **Real-time Applications**: Chatbots, sentiment analysis for social media monitoring, text summarization for news.
        - **Maintenance**: Continuously monitor the model to ensure its performance remains high. Updates might be necessary if the language evolves or new data emerges.

        **Example**: Deploying a chatbot to answer customer inquiries based on historical support tickets.
        """)

# Content for "NLP Techniques"
elif st.session_state.selected_page == "NLP Techniques":
    technique_option = sidebar.radio("Select NLP Technique:", [
        "NLP Techniques",
        "Tokenization",
        "Stop Words Removal",
        "Lemmatization",
        # ...
        "Part-of-Speech (POS) Tagging",
        "Sentiment Analysis"
    ])

    if technique_option == "NLP Techniques":
        st.write("""
        ### Techniques in NLP
        NLP uses a variety of techniques to process and analyze text data. Some of the most common techniques include:

        1. **Tokenization**: Breaking down text into smaller units (e.g., words, sentences).
        2. **Part-of-Speech (POS) Tagging**: Identifying the grammatical roles of words in a sentence (e.g., noun, verb, adjective).
        3. **Named Entity Recognition (NER)**: Identifying entities such as names, dates, locations, etc.
        4. **Dependency Parsing**: Analyzing the syntactic structure of sentences.
        5. **Sentiment Analysis**: Analyzing the sentiment of text (positive, negative, neutral).
        6. **Word Embeddings**: Representing words as vectors in a continuous space (e.g., Word2Vec, GloVe).

        **Example**: Sentiment analysis can be used to identify whether customer reviews are positive, negative, or neutral based on the words used in the text.
        """)

    elif technique_option == "Tokenization":
        st.write("""
        #### 1. Tokenization
        Tokenization is the process of splitting text into smaller units, such as words, sentences, or subwords. This is a key preprocessing step for many NLP tasks.
        """)
```
|