Update app.py
app.py
CHANGED
```diff
@@ -279,9 +279,13 @@ elif st.session_state.selected_page == "🔄NLP Lifecycle":
     st.write("""
     #### 📝 5. Text Representation
     After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
+    - **One-Hot Encoding**: Represents each word as a binary vector.
     - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
     - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
-    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
+    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
+        - **Word2Vec**
+        - **GloVe**
+        - **FastText**
 
     **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
     - Vocabulary: ["I", "love", "NLP"]
```
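The BoW example in this hunk maps "I love NLP" onto the vocabulary ["I", "love", "NLP"]. That counting step can be sketched in plain Python; this is an illustrative sketch, not code from app.py:

```python
from collections import Counter

def bow_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

vocabulary = ["I", "love", "NLP"]
print(bow_vector("I love NLP", vocabulary))  # [1, 1, 1]
```

Each position in the vector is the frequency of one vocabulary word, so "love love NLP" would map to [0, 2, 1].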
```diff
@@ -331,6 +335,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
         "Stop Words Removal",
         "Lemmatization",
         "Stemming",
+        "One-Hot Encoding",
         "Bag of Words (BoW)",
         "TF-IDF",
         "Word Embeddings",
```
```diff
@@ -384,9 +389,20 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
         - **Example**: "running" → "run", "happiness" → "happi".
         """)
 
+    elif technique_option == "One-Hot Encoding":
+        st.write("""
+        #### 5. One-Hot Encoding
+        - Represents each word as a binary vector.
+        - Example:
+        - Vocabulary: ["cat", "dog", "fish"]
+        - Encoding for "cat": [1, 0, 0]
+        - Encoding for "dog": [0, 1, 0]
+        - **Pros**: Simple to implement.
+        - **Cons**: Results in sparse and high-dimensional vectors.
+        """)
     elif technique_option == "Bag of Words (BoW)":
         st.write("""
-        ####
+        #### 6. Bag of Words (BoW)
         The Bag of Words model represents text as a set of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
         - **Example**:
         - Text: "I love NLP"
```
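The One-Hot Encoding section added in this hunk can be sketched with the stdlib; `one_hot` is a hypothetical helper for illustration, not part of app.py:

```python
def one_hot(word: str, vocabulary: list[str]) -> list[int]:
    """Binary vector with a single 1 at the word's vocabulary index."""
    return [1 if word == v else 0 for v in vocabulary]

vocabulary = ["cat", "dog", "fish"]
print(one_hot("cat", vocabulary))  # [1, 0, 0]
print(one_hot("dog", vocabulary))  # [0, 1, 0]
```

The "sparse and high-dimensional" con from the hunk follows directly: every vector is as long as the vocabulary and contains exactly one non-zero entry.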
```diff
@@ -395,14 +411,14 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "TF-IDF":
         st.write("""
-        ####
+        #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
         TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
         - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
         """)
 
     elif technique_option == "Word Embeddings":
         st.write("""
-        ####
+        #### 8. Word Embeddings
         Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
         - **Word2Vec**
         - **GloVe**
```
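The TF-IDF text above can be made concrete with a sketch of the classic tf × idf formula. The toy corpus is an assumption for illustration, and real implementations (e.g. scikit-learn) use smoothed variants:

```python
import math

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Term frequency in the document times inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)          # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0  # rarer terms weigh more
    return tf * idf

corpus = [
    "data analysis uses data".split(),
    "cooking pasta at home".split(),
]
# "data" is frequent in the first document and absent from the second,
# so it scores above zero there and exactly zero in the cooking document.
print(tf_idf("data", corpus[0], corpus))
print(tf_idf("data", corpus[1], corpus))
```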
```diff
@@ -413,7 +429,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Named Entity Recognition (NER)":
         st.write("""
-        ####
+        #### 9. Named Entity Recognition (NER)
         NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
         - **Example**: "Barack Obama was born in Hawaii."
        - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
```
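The NER example above can be reproduced with a toy gazetteer lookup. This is only a sketch of the idea of span-plus-label output; real NER uses trained models (e.g. spaCy), and the entity table here is an illustrative assumption, not part of app.py:

```python
# Toy gazetteer: known entity strings mapped to labels (illustrative only).
ENTITIES = {
    "Barack Obama": "Person",
    "Hawaii": "Location",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (span, label) pairs for known entity strings found in the text."""
    return [(span, label) for span, label in ENTITIES.items() if span in text]

print(tag_entities("Barack Obama was born in Hawaii."))
# [('Barack Obama', 'Person'), ('Hawaii', 'Location')]
```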
```diff
@@ -421,7 +437,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Part-of-Speech (POS) Tagging":
         st.write("""
-        ####
+        #### 10. Part-of-Speech (POS) Tagging
         POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
         - **Example**: "The cat sat on the mat."
         - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
```
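The POS output shown above is word/tag pairs. A toy lookup tagger reproduces the example sentence's tags; real taggers (e.g. NLTK's perceptron tagger) use context, and this table is an illustrative assumption:

```python
# Word -> Penn Treebank tag table covering only the example sentence.
TAGS = {"the": "DT", "cat": "NN", "sat": "VBD", "on": "IN", "mat": "NN"}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word by lookup, defaulting unknown words to noun (NN)."""
    words = sentence.rstrip(".").split()
    return [(w, TAGS.get(w.lower(), "NN")) for w in words]

print(pos_tag("The cat sat on the mat."))
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```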
```diff
@@ -429,7 +445,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
 
     elif technique_option == "Sentiment Analysis":
         st.write("""
-        ####
+        #### 11. Sentiment Analysis
         Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
         - **Example**: "I love this product!" → Positive Sentiment
         """)
```
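The positive/negative/neutral categorization described above can be sketched with a lexicon-based score. The word lists are illustrative assumptions; production systems use trained classifiers or much larger lexicons:

```python
# Tiny sentiment lexicons (illustrative only).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "terrible"}

def sentiment(text: str) -> str:
    """Label text by counting positive vs negative lexicon hits."""
    words = {w.strip("!.,").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

print(sentiment("I love this product!"))  # Positive
```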
app.py, lines 279–291 after this commit:

```python
    st.write("""
    #### 📝 5. Text Representation
    After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
    - **One-Hot Encoding**: Represents each word as a binary vector.
    - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
    - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
    - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
        - **Word2Vec**
        - **GloVe**
        - **FastText**

    **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
    - Vocabulary: ["I", "love", "NLP"]
```
app.py, lines 335–341 after this commit:

```python
        "Stop Words Removal",
        "Lemmatization",
        "Stemming",
        "One-Hot Encoding",
        "Bag of Words (BoW)",
        "TF-IDF",
        "Word Embeddings",
```
app.py, lines 389–451 after this commit:

```python
        - **Example**: "running" → "run", "happiness" → "happi".
        """)

    elif technique_option == "One-Hot Encoding":
        st.write("""
        #### 5. One-Hot Encoding
        - Represents each word as a binary vector.
        - Example:
        - Vocabulary: ["cat", "dog", "fish"]
        - Encoding for "cat": [1, 0, 0]
        - Encoding for "dog": [0, 1, 0]
        - **Pros**: Simple to implement.
        - **Cons**: Results in sparse and high-dimensional vectors.
        """)
    elif technique_option == "Bag of Words (BoW)":
        st.write("""
        #### 6. Bag of Words (BoW)
        The Bag of Words model represents text as a set of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
        - **Example**:
        - Text: "I love NLP"
    # ...
    elif technique_option == "TF-IDF":
        st.write("""
        #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
        TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
        - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
        """)

    elif technique_option == "Word Embeddings":
        st.write("""
        #### 8. Word Embeddings
        Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
        - **Word2Vec**
        - **GloVe**
    # ...
    elif technique_option == "Named Entity Recognition (NER)":
        st.write("""
        #### 9. Named Entity Recognition (NER)
        NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
        - **Example**: "Barack Obama was born in Hawaii."
        - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
    # ...
    elif technique_option == "Part-of-Speech (POS) Tagging":
        st.write("""
        #### 10. Part-of-Speech (POS) Tagging
        POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
        - **Example**: "The cat sat on the mat."
        - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
    # ...
    elif technique_option == "Sentiment Analysis":
        st.write("""
        #### 11. Sentiment Analysis
        Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
        - **Example**: "I love this product!" → Positive Sentiment
        """)
```
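The Word Embeddings sections above say that words with similar meanings have similar vectors; cosine similarity is the usual way to measure that. The 3-d vectors below are made-up toy values for illustration, not real Word2Vec/GloVe/FastText output:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Made-up 3-d "embeddings" for illustration only.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
# "cat" comes out closer to "dog" than to "car".
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))  # True
```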