Update app.py
app.py (changed)
@@ -7,7 +7,7 @@ sidebar = st.sidebar
 sidebar.header('π NLP Navigation')

 # Sidebar options for NLP Overview, Lifecycle, and Techniques
-sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP?', 'NLP Lifecycle', 'NLP Techniques'])

 # Store the selected page in session state
 if 'selected_page' not in st.session_state:

@@ -21,20 +21,18 @@ if sidebar_option != st.session_state.selected_page:
 def set_title(title, color="black"):
 st.markdown(f"<h1 style='text-align: center; color: {color};'>{title}</h1>", unsafe_allow_html=True)

-if st.session_state.selected_page == 'What is NLP?':
-set_title('Natural Language Processing (NLP)')

-elif st.session_state.selected_page == 'NLP Lifecycle':
-set_title('Natural Language Processing (NLP) Lifecycle')
-if sidebar_option == 'Problem Definition':
-set_title('Steps in the Natural Language Processing (NLP) lifecycle:')

-elif st.session_state.selected_page == 'NLP Techniques':
-set_title('Techniques in Natural Language Processing (NLP)')

-# Content for "What is NLP?"
-if st.session_state.selected_page == 'What is NLP?':
-st.markdown("<h2 style='text-align: center; color:
 st.write("""
 #### π€ What is NLP?
 Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.

@@ -62,15 +60,15 @@ if st.session_state.selected_page == 'What is NLP?':
 # Content for NLP Lifecycle
 elif st.session_state.selected_page == "NLP Lifecycle":
 lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:", [
-"Overview of the NLP Life Cycle",
-"Problem Definition",
-"Data Collection",
-"Simple EDA",
-"Data Preprocessing",
-"Feature Engineering",
-"Model Training",
-"Evaluation",
-"Deployment"
 ])

 if lifecycle_option == "Overview of the NLP Life Cycle":

@@ -101,34 +99,34 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - Ensuring solutions are fast and efficient.

 #### Steps in the NLP Life Cycle:
-
-
-
-
-
-
-
-
-
-
-

 elif lifecycle_option == "Problem Definition":
-
-
-
-
-
-
-
-
-
-
-

 elif lifecycle_option == "Data Collection":
 st.write("""
-#### 2. Data Collection
 Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
 - **Sources for data collection**:
 - π The data should be collected based on a clear understanding of the problem statement.

@@ -182,11 +180,11 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 ```
 Using the above code structure, we can efficiently extract data from various file formats such as CSV, JSON, Excel, and XML, and load it into a structured format suitable for analysis.
 """)
-
-
 elif lifecycle_option == "Simple EDA":
 st.write("""
-####
 #### Simple Exploratory Data Analysis (Simple EDA)
 Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.

@@ -197,7 +195,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - Class A: 700 instances
 - Class B: 300 instances
 - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
-
 #### Steps to Understand and Explore Your Data
 - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
 - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.

@@ -216,14 +214,14 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - Histogram for sales distribution
 - Boxplot to detect outliers
 """)
-
-
-
 elif lifecycle_option == "Data Preprocessing":
 st.write("""
 #### 🧹 4. Text Preprocessing
 Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
-
 **Key Steps in Text Preprocessing:**
 - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
 - **Stop Words Removal**: Removing common words that don’t contribute much information.

@@ -242,7 +240,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - URL Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "π¦", "#awesome"]
 - Emoji Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "#awesome"]
 - Hashtag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast"]
-
 Now, let's apply the necessary text preprocessing steps to clean up the data:

 ```python

@@ -275,8 +273,8 @@ elif st.session_state.selected_page == "NLP Lifecycle":

 By following these preprocessing steps, the raw text is now ready for further analysis or machine learning tasks.
 """)
-
-
 elif lifecycle_option == "Feature Engineering":
 st.write("""
 #### π 5. Text Representation

@@ -284,12 +282,12 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
 - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
 - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
-
 **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
 - Vocabulary: ["I", "love", "NLP"]
 - Vector: [1, 1, 1] (word frequency representation)
 """)
-
 elif lifecycle_option == "Model Training":
 st.write("""
 #### ποΈββοΈ 6. Model Training

@@ -297,10 +295,10 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
 - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
 - **Sentiment Analysis**: Logistic regression, Naive Bayes, or transformer-based models like BERT.
-
 **Example**: Training a Naive Bayes classifier to categorize news articles into topics such as "Sports", "Politics", etc.
 """)
-
 elif lifecycle_option == "Evaluation":
 st.write("""
 #### π 7. Evaluation

@@ -309,10 +307,10 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - **Precision**: The percentage of relevant instances among the retrieved instances.
 - **Recall**: The percentage of relevant instances that were retrieved.
 - **F1-score**: The harmonic mean of precision and recall.
-
 **Example**: Evaluating a sentiment analysis model using accuracy and F1-score on a test dataset.
 """)
-
 elif lifecycle_option == "Deployment":
 st.write("""
 #### π 8. Deployment

@@ -320,7 +318,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 - **Integration** with web applications, chatbots, or other tools.
 - **API Development**: Exposing the model through an API for real-time predictions.
 - **Continuous Monitoring**: Tracking the model’s performance and retraining it as needed.
-
 **Example**: Deploying a sentiment analysis model in a customer service chatbot that analyzes customer inquiries in real time.
 """)

 sidebar.header('π NLP Navigation')

 # Sidebar options for NLP Overview, Lifecycle, and Techniques
+sidebar_option = sidebar.radio('Choose a section to explore:', ['What is NLP? 🧠', 'NLP Lifecycle π', 'NLP Techniques βοΈ'])

 # Store the selected page in session state
 if 'selected_page' not in st.session_state:

 def set_title(title, color="black"):
 st.markdown(f"<h1 style='text-align: center; color: {color};'>{title}</h1>", unsafe_allow_html=True)

+if st.session_state.selected_page == 'What is NLP? 🧠':
+set_title('π Natural Language Processing (NLP) π', color="purple")

+elif st.session_state.selected_page == 'NLP Lifecycle π':
+set_title('π Natural Language Processing (NLP) Lifecycle π', color="darkblue")

+elif st.session_state.selected_page == 'NLP Techniques βοΈ':
+set_title('βοΈ Techniques in Natural Language Processing (NLP) βοΈ', color="darkgreen")

+# Content for "What is NLP? 🧠"
+if st.session_state.selected_page == 'What is NLP? 🧠':
+st.markdown("<h2 style='text-align: center; color: orange;'>π Introduction to NLP</h2>", unsafe_allow_html=True)
 st.write("""
 #### π€ What is NLP?
 Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.

 # Content for NLP Lifecycle
 elif st.session_state.selected_page == "NLP Lifecycle":
 lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:", [
+"Overview of the NLP Life Cycle π",
+"Problem Definition π§",
+"Data Collection π",
+"Simple EDA π",
+"Data Preprocessing 🧹",
+"Feature Engineering π",
+"Model Training ποΈββοΈ",
+"Evaluation π",
+"Deployment π"
 ])

 if lifecycle_option == "Overview of the NLP Life Cycle":

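In the new version the sidebar labels gain emoji suffixes (for example `"Problem Definition π§"` and `'NLP Lifecycle π'`). However, unchanged context lines such as `elif st.session_state.selected_page == "NLP Lifecycle":` and `if lifecycle_option == "Overview of the NLP Life Cycle":` still compare against the plain strings. Unless code outside these hunks strips the suffix, those branches can no longer match.

One way to keep the labels and the comparisons in sync is to pass plain keys as the radio options and add the icon only for display through `format_func`, which `st.radio` accepts. Below is a minimal sketch of that display helper. It is plain Python so it runs without Streamlit. `LIFECYCLE_ICONS` and `lifecycle_label` are hypothetical names, and the icons are placeholders rather than the ones in `app.py`:

```python
# Hypothetical mapping from the plain step key (used in comparisons) to a display icon.
# The icons are placeholders; app.py's own emoji are garbled on this page.
LIFECYCLE_ICONS = {
    "Overview of the NLP Life Cycle": "📘",
    "Problem Definition": "🎯",
    "Data Collection": "📥",
    "Simple EDA": "📊",
    "Data Preprocessing": "🧹",
    "Feature Engineering": "🔧",
    "Model Training": "🏋️",
    "Evaluation": "📏",
    "Deployment": "🚀",
}


def lifecycle_label(step: str) -> str:
    """Return the sidebar text for a plain step key (key plus icon, if one is defined)."""
    icon = LIFECYCLE_ICONS.get(step, "")
    return f"{step} {icon}".rstrip()


# Sketch of the Streamlit call (not executed here):
#   lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:",
#                                    list(LIFECYCLE_ICONS),
#                                    format_func=lifecycle_label)
# lifecycle_option then holds the plain key, so `== "Problem Definition"` still matches.

print(lifecycle_label("Problem Definition"))  # Problem Definition 🎯
```

The same pattern would apply to the top-level `sidebar_option` radio, so that `selected_page` keeps holding plain page names.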
 - Ensuring solutions are fast and efficient.

 #### Steps in the NLP Life Cycle:
+1. **Problem Definition** π§
+2. **Data Collection** π
+3. **Simple EDA** π
+4. **Data Preprocessing** 🧹
+5. **Feature Engineering** π
+6. **Model Selection and Training** ποΈββοΈ
+7. **Model Evaluation** π
+8. **Model Tuning** βοΈ
+9. **Deployment** π
+10. **Monitoring and Maintenance** 🛠️
+""")

 elif lifecycle_option == "Problem Definition":
+st.write("""
+#### π§ 1. Problem Definition
+- The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
+- Based on the problem, you will need to gather the data.
+- **To better understand the problem, consider asking questions such as**:
+- 🎯 What is the main goal of this analysis?
+- π What kind of text data are we working with (e.g., reviews, social media posts, documents)?
+- π What do we want the output to be (e.g., sentiment score, summary, or classification)?
+
+**Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
+""")

 elif lifecycle_option == "Data Collection":
 st.write("""
+#### π 2. Data Collection
 Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
 - **Sources for data collection**:
 - π The data should be collected based on a clear understanding of the problem statement.

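The Data Collection text refers to code that loads CSV, JSON, Excel and XML files into a structured form; that code is not visible in this diff. The function below is a rough, standard-library-only sketch of the idea: it parses CSV, JSON or XML text into a list of dictionaries. The name `load_records` and the flat XML layout it expects are assumptions. Excel workbooks need a third-party reader such as `pandas.read_excel`, so they are left out:

```python
from __future__ import annotations

import csv
import io
import json
import xml.etree.ElementTree as ET


def load_records(text: str, fmt: str) -> list[dict]:
    """Parse CSV, JSON or XML text into a list of flat dict records (hypothetical helper)."""
    if fmt == "csv":
        # The first row is treated as the header row.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a list of objects or a single object.
        return data if isinstance(data, list) else [data]
    if fmt == "xml":
        # Assumes a flat layout: <root><record><field>value</field>...</record>...</root>
        root = ET.fromstring(text)
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")


csv_text = "text,label\nGreat product,positive\nArrived broken,negative\n"
print(load_records(csv_text, "csv"))
```

Returning one common shape (a list of dicts) regardless of the source format keeps the later EDA and preprocessing steps independent of where the data came from.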
 ```
 Using the above code structure, we can efficiently extract data from various file formats such as CSV, JSON, Excel, and XML, and load it into a structured format suitable for analysis.
 """)
+
+
 elif lifecycle_option == "Simple EDA":
 st.write("""
+#### π 3. Simple EDA
 #### Simple Exploratory Data Analysis (Simple EDA)
 Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.

 - Class A: 700 instances
 - Class B: 300 instances
 - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
+
 #### Steps to Understand and Explore Your Data
 - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
 - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.

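The class counts above (700 vs 300) make the 70:30 imbalance concrete. Here is a standard-library sketch, with hypothetical function names, that computes the class proportions and applies plain random oversampling, one of the remedies the text lists. It duplicates existing minority-class examples rather than generating synthetic ones:

```python
import random
from collections import Counter


def class_distribution(labels):
    """Share of each class label in the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until every class
    has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    new_texts, new_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        new_texts.extend(items + extra)
        new_labels.extend([label] * target)
    return new_texts, new_labels


labels = ["A"] * 700 + ["B"] * 300
texts = [f"document {i}" for i in range(len(labels))]
print(class_distribution(labels))  # {'A': 0.7, 'B': 0.3}
_, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))    # Counter({'A': 700, 'B': 700})
```

Oversampling is normally applied to the training split only, after splitting, so that duplicated examples do not leak into the evaluation data.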
 - Histogram for sales distribution
 - Boxplot to detect outliers
 """)
+
+
+
 elif lifecycle_option == "Data Preprocessing":
 st.write("""
 #### 🧹 4. Text Preprocessing
 Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
+
 **Key Steps in Text Preprocessing:**
 - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
 - **Stop Words Removal**: Removing common words that don’t contribute much information.

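The first two preprocessing steps listed above, tokenization and stop-word removal, can be sketched with a regular expression and a set lookup. The tiny stop-word list is an assumption made for illustration; libraries such as NLTK and spaCy ship much larger lists. Real tokenizers also handle punctuation, contractions and non-ASCII text more carefully than this:

```python
import re

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens that are not in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


tokens = tokenize("The quick brown fox is running fast.")
print(tokens)                     # ['the', 'quick', 'brown', 'fox', 'is', 'running', 'fast']
print(remove_stop_words(tokens))  # ['quick', 'brown', 'fox', 'running', 'fast']
```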
 - URL Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "π¦", "#awesome"]
 - Emoji Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "#awesome"]
 - Hashtag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast"]
+
 Now, let's apply the necessary text preprocessing steps to clean up the data:

 ```python


 By following these preprocessing steps, the raw text is now ready for further analysis or machine learning tasks.
 """)
+
+
 elif lifecycle_option == "Feature Engineering":
 st.write("""
 #### π 5. Text Representation

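The preprocessing example above walks a tweet-like token list through URL, emoji and hashtag removal. The sketch below reproduces that sequence on a made-up token list. The URL and emoji tokens are invented, because the originals cannot be recovered from this page. The emoji check only covers the main pictograph and symbol code-point blocks, so it is an approximation:

```python
import re

# Whole-token patterns; a token is dropped when it matches.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$")
HASHTAG_RE = re.compile(r"^#\w+$")
# Approximate emoji test: U+1F300-U+1FAFF (Misc Symbols and Pictographs through
# Symbols and Pictographs Extended-A) plus U+2600-U+27BF (Misc Symbols, Dingbats).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def remove_urls(tokens):
    return [t for t in tokens if not URL_RE.match(t)]


def remove_emoji(tokens):
    # Drops any token that contains an emoji character.
    return [t for t in tokens if not EMOJI_RE.search(t)]


def remove_hashtags(tokens):
    return [t for t in tokens if not HASHTAG_RE.match(t)]


tokens = ["The", "quick", "brown", "fox", "is", "running", "fast",
          "https://example.com/fox", "\U0001F98A", "#awesome"]
step1 = remove_urls(tokens)     # drops the URL
step2 = remove_emoji(step1)     # drops the fox emoji
step3 = remove_hashtags(step2)  # drops "#awesome"
print(step3)  # ['The', 'quick', 'brown', 'fox', 'is', 'running', 'fast']
```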
 - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
 - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
 - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
+
 **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
 - Vocabulary: ["I", "love", "NLP"]
 - Vector: [1, 1, 1] (word frequency representation)
 """)
+
 elif lifecycle_option == "Model Training":
 st.write("""
 #### ποΈββοΈ 6. Model Training

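The Bag of Words example above ("I love NLP" → [1, 1, 1]) and the TF-IDF description can be reproduced in a few lines. This sketch makes three simplifying choices:

- It uses whitespace tokenization.
- It keeps words in first-seen order, so the vocabulary matches the example.
- It uses the unsmoothed weighting tf × log(N / df).

scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers differ. The second sentence is made up:

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Unique whitespace tokens across all documents, in first-seen order."""
    return list(dict.fromkeys(tok for doc in docs for tok in doc.split()))


def bag_of_words(doc, vocab):
    """Count of each vocabulary term in one document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


def tf_idf(docs, vocab):
    """Raw count times idf = log(N / df); assumes every vocab term occurs in some doc."""
    n_docs = len(docs)
    token_sets = [set(doc.split()) for doc in docs]
    df = {term: sum(term in toks for toks in token_sets) for term in vocab}
    return [[tf * math.log(n_docs / df[term])
             for term, tf in zip(vocab, bag_of_words(doc, vocab))]
            for doc in docs]


vocab = build_vocabulary(["I love NLP"])
print(vocab, bag_of_words("I love NLP", vocab))  # ['I', 'love', 'NLP'] [1, 1, 1]

docs = ["I love NLP", "I love Python"]  # second sentence is made up
weights = tf_idf(docs, build_vocabulary(docs))
# "I" and "love" occur in both documents, so their idf is log(2/2) = 0;
# "NLP" gets weight log(2/1) in the first document.
```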
 - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
 - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
 - **Sentiment Analysis**: Logistic regression, Naive Bayes, or transformer-based models like BERT.
+
 **Example**: Training a Naive Bayes classifier to categorize news articles into topics such as "Sports", "Politics", etc.
 """)
+
 elif lifecycle_option == "Evaluation":
 st.write("""
 #### π 7. Evaluation

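The Model Training example above mentions a Naive Bayes classifier for news topics. Below is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing. It is written to show the mechanics rather than for production use; scikit-learn's `MultinomialNB` is the usual choice. The class name, the four training headlines and their labels are made up:

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over lowercase whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {word for counts in self.word_counts.values() for word in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        words = [w for w in doc.lower().split() if w in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, n_label_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(label) + sum of log P(word | label), smoothed with +1 per vocabulary word
            score = math.log(n_label_docs / self.n_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = [
    "the team won the match",
    "the striker scored a late goal",
    "parliament passed the new bill",
    "the minister announced an election",
]
train_labels = ["Sports", "Sports", "Politics", "Politics"]
model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict("a goal in the match"))         # Sports
print(model.predict("the minister passed a bill"))  # Politics
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents.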
 - **Precision**: The percentage of relevant instances among the retrieved instances.
 - **Recall**: The percentage of relevant instances that were retrieved.
 - **F1-score**: The harmonic mean of precision and recall.
+
 **Example**: Evaluating a sentiment analysis model using accuracy and F1-score on a test dataset.
 """)
+
 elif lifecycle_option == "Deployment":
 st.write("""
 #### π 8. Deployment

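The Evaluation section defines precision, recall and F1 in words. Here is a small sketch that computes them, plus accuracy, for a binary task from true and predicted labels. The function name, label names and five predictions are made up. `sklearn.metrics` provides the same measures as `precision_score`, `recall_score`, `f1_score` and `accuracy_score`:

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
# TP = 2, FP = 1, FN = 1, so precision = recall = F1 = 2/3, and accuracy = 3/5
```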
 - **Integration** with web applications, chatbots, or other tools.
 - **API Development**: Exposing the model through an API for real-time predictions.
 - **Continuous Monitoring**: Tracking the model’s performance and retraining it as needed.
+
 **Example**: Deploying a sentiment analysis model in a customer service chatbot that analyzes customer inquiries in real time.
 """)

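The Deployment section mentions exposing the model through an API for real-time predictions. Below is a minimal standard-library sketch: a WSGI application with a hypothetical `/predict` endpoint that accepts JSON such as `{"text": "..."}` and returns a label.

- `predict_sentiment` is a keyword-rule stand-in for a trained model, not a real classifier.
- `call_locally` exercises the app in-process, without opening a socket.

```python
import io
import json

POSITIVE = {"good", "great", "love", "thanks"}
NEGATIVE = {"bad", "broken", "hate", "refund"}


def predict_sentiment(text):
    """Hypothetical stand-in for a trained model: compare positive and negative keyword counts."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def app(environ, start_response):
    """WSGI app: POST {"text": ...} as JSON to /predict and receive {"label": ...}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"label": predict_sentiment(payload.get("text", ""))}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(text, path="/predict"):
    """Call the app in-process with a hand-built WSGI environ (no network)."""
    body = json.dumps({"text": text}).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))


print(call_locally("Great support, thanks"))  # ('200 OK', {'label': 'positive'})
```

For local testing, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` is enough to serve it over HTTP. A production deployment would typically run behind a dedicated WSGI server and add request validation and monitoring.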