Update app.py
app.py
CHANGED
```diff
@@ -279,9 +279,13 @@ elif st.session_state.selected_page == "🔄NLP Lifecycle":
     st.write("""
     #### 📝 5. Text Representation
     After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
+    - **One-Hot Encoding**: Represents each word as a binary vector.
     - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
     - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
-    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
+    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
+        - **Word2Vec**
+        - **GloVe**
+        - **FastText**
 
     **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
     - Vocabulary: ["I", "love", "NLP"]
```
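The BoW example in this hunk maps "I love NLP" onto the vocabulary ["I", "love", "NLP"]. That counting step can be sketched in plain Python; this is an illustrative sketch, not code from app.py:

```python
from collections import Counter

def bow_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

vocabulary = ["I", "love", "NLP"]
print(bow_vector("I love NLP", vocabulary))  # [1, 1, 1]
```

Each position in the vector is the frequency of one vocabulary word, so "love love NLP" would map to [0, 2, 1].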
```diff
@@ -331,6 +335,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
         "Stop Words Removal",
         "Lemmatization",
         "Stemming",
+        "One-Hot Encoding",
         "Bag of Words (BoW)",
         "TF-IDF",
         "Word Embeddings",
```
```diff
@@ -384,9 +389,20 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
         - **Example**: "running" → "run", "happiness" → "happi".
         """)
 
+    elif technique_option == "One-Hot Encoding":
+        st.write("""
+        #### 5. One-Hot Encoding
+        - Represents each word as a binary vector.
+        - Example:
+        - Vocabulary: ["cat", "dog", "fish"]
+        - Encoding for "cat": [1, 0, 0]
+        - Encoding for "dog": [0, 1, 0]
+        - **Pros**: Simple to implement.
+        - **Cons**: Results in sparse and high-dimensional vectors.
+        """)
     elif technique_option == "Bag of Words (BoW)":
         st.write("""
-        ####
+        #### 6. Bag of Words (BoW)
         The Bag of Words model represents text as a set of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
         - **Example**:
         - Text: "I love NLP"
```
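The One-Hot Encoding section added in this hunk can be sketched with the stdlib; `one_hot` is a hypothetical helper for illustration, not part of app.py:

```python
def one_hot(word: str, vocabulary: list[str]) -> list[int]:
    """Binary vector with a single 1 at the word's vocabulary index."""
    return [1 if word == v else 0 for v in vocabulary]

vocabulary = ["cat", "dog", "fish"]
print(one_hot("cat", vocabulary))  # [1, 0, 0]
print(one_hot("dog", vocabulary))  # [0, 1, 0]
```

The "sparse and high-dimensional" con from the hunk follows directly: every vector is as long as the vocabulary and contains exactly one non-zero entry.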
```diff
@@ -395,14 +411,14 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "TF-IDF":
         st.write("""
-        ####
+        #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
         TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
         - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
         """)
 
     elif technique_option == "Word Embeddings":
         st.write("""
-        ####
+        #### 8. Word Embeddings
         Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
         - **Word2Vec**
         - **GloVe**
```
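The TF-IDF text above can be made concrete with a sketch of the classic tf × idf formula. The toy corpus is an assumption for illustration, and real implementations (e.g. scikit-learn) use smoothed variants:

```python
import math

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Term frequency in the document times inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)          # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0  # rarer terms weigh more
    return tf * idf

corpus = [
    "data analysis uses data".split(),
    "cooking pasta at home".split(),
]
# "data" is frequent in the first document and absent from the second,
# so it scores above zero there and exactly zero in the cooking document.
print(tf_idf("data", corpus[0], corpus))
print(tf_idf("data", corpus[1], corpus))
```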
```diff
@@ -413,7 +429,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Named Entity Recognition (NER)":
         st.write("""
-        ####
+        #### 9. Named Entity Recognition (NER)
         NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
         - **Example**: "Barack Obama was born in Hawaii."
        - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
```
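The NER example above can be reproduced with a toy gazetteer lookup. This is only a sketch of the idea of span-plus-label output; real NER uses trained models (e.g. spaCy), and the entity table here is an illustrative assumption, not part of app.py:

```python
# Toy gazetteer: known entity strings mapped to labels (illustrative only).
ENTITIES = {
    "Barack Obama": "Person",
    "Hawaii": "Location",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (span, label) pairs for known entity strings found in the text."""
    return [(span, label) for span, label in ENTITIES.items() if span in text]

print(tag_entities("Barack Obama was born in Hawaii."))
# [('Barack Obama', 'Person'), ('Hawaii', 'Location')]
```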
```diff
@@ -421,7 +437,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Part-of-Speech (POS) Tagging":
         st.write("""
-        ####
+        #### 10. Part-of-Speech (POS) Tagging
         POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
         - **Example**: "The cat sat on the mat."
         - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
```
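The POS output shown above is word/tag pairs. A toy lookup tagger reproduces the example sentence's tags; real taggers (e.g. NLTK's perceptron tagger) use context, and this table is an illustrative assumption:

```python
# Word -> Penn Treebank tag table covering only the example sentence.
TAGS = {"the": "DT", "cat": "NN", "sat": "VBD", "on": "IN", "mat": "NN"}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word by lookup, defaulting unknown words to noun (NN)."""
    words = sentence.rstrip(".").split()
    return [(w, TAGS.get(w.lower(), "NN")) for w in words]

print(pos_tag("The cat sat on the mat."))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```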
```diff
@@ -429,7 +445,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Sentiment Analysis":
         st.write("""
-        ####
+        #### 11. Sentiment Analysis
         Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
         - **Example**: "I love this product!" → Positive Sentiment
         """)
```
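The positive/negative/neutral categorization described above can be sketched with a lexicon-based score. The word lists are illustrative assumptions; production systems use trained classifiers or much larger lexicons:

```python
# Tiny sentiment lexicons (illustrative only).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "terrible"}

def sentiment(text: str) -> str:
    """Label text by counting positive vs negative lexicon hits."""
    words = {w.strip("!.,").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

print(sentiment("I love this product!"))  # Positive
```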
app.py, lines 279–291 after this commit:

```python
    st.write("""
    #### 📝 5. Text Representation
    After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
    - **One-Hot Encoding**: Represents each word as a binary vector.
    - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
    - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
        - **Word2Vec**
        - **GloVe**
        - **FastText**

    **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
    - Vocabulary: ["I", "love", "NLP"]
```
app.py, lines 335–341 after this commit:

```python
        "Stop Words Removal",
        "Lemmatization",
        "Stemming",
        "One-Hot Encoding",
        "Bag of Words (BoW)",
        "TF-IDF",
        "Word Embeddings",
```
app.py, lines 389–451 after this commit:

```python
        - **Example**: "running" → "run", "happiness" → "happi".
        """)

    elif technique_option == "One-Hot Encoding":
        st.write("""
        #### 5. One-Hot Encoding
        - Represents each word as a binary vector.
        - Example:
        - Vocabulary: ["cat", "dog", "fish"]
        - Encoding for "cat": [1, 0, 0]
        - Encoding for "dog": [0, 1, 0]
        - **Pros**: Simple to implement.
        - **Cons**: Results in sparse and high-dimensional vectors.
        """)
    elif technique_option == "Bag of Words (BoW)":
        st.write("""
        #### 6. Bag of Words (BoW)
        The Bag of Words model represents text as a set of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
        - **Example**:
        - Text: "I love NLP"
    # ...
    elif technique_option == "TF-IDF":
        st.write("""
        #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
        TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
        - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
        """)

    elif technique_option == "Word Embeddings":
        st.write("""
        #### 8. Word Embeddings
        Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
        - **Word2Vec**
        - **GloVe**
    # ...
    elif technique_option == "Named Entity Recognition (NER)":
        st.write("""
        #### 9. Named Entity Recognition (NER)
        NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
        - **Example**: "Barack Obama was born in Hawaii."
        - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
    # ...
    elif technique_option == "Part-of-Speech (POS) Tagging":
        st.write("""
        #### 10. Part-of-Speech (POS) Tagging
        POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
        - **Example**: "The cat sat on the mat."
        - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
    # ...
    elif technique_option == "Sentiment Analysis":
        st.write("""
        #### 11. Sentiment Analysis
        Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
        - **Example**: "I love this product!" → Positive Sentiment
        """)
```
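The Word Embeddings sections above say that words with similar meanings have similar vectors; cosine similarity is the usual way to measure that. The 3-d vectors below are made-up toy values for illustration, not real Word2Vec/GloVe/FastText output:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Made-up 3-d "embeddings" for illustration only.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
# "cat" comes out closer to "dog" than to "car".
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))  # True
```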