sree4411/NLP

sree4411 committed on Feb 5, 2025

Commit 1c1d531 (verified)

1 Parent(s): 011d79e

Update app.py

Files changed (1)

app.py +104 -107

@@ -1,50 +1,39 @@
 import streamlit as st
 from gensim.models import Word2Vec
-# Apply custom styles using Streamlit's markdown
-st.markdown("""
-    <style>
-    .main-title { color: #FF5733; font-size: 20px; font-weight: bold; text-align: center; }
-    .section-title { color: #2E86C1; font-size: 30px; font-weight: bold; margin-top: 20px; }
-    .sub-title { color: #27AE60; font-size: 24px; font-weight: bold; margin-top: 10px; }
-    .text { font-size: 18px; }
-    </style>
-""", unsafe_allow_html=True)
 # Title
-st.markdown('<p class="main-title">Introduction to NLP</p>', unsafe_allow_html=True)
 # Section: What is NLP?
-st.markdown('<p class="section-title">What is NLP?</p>', unsafe_allow_html=True)
-st.markdown("""
 Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to process, understand, and generate human language.
-""")
-# Section: Applications of NLP
-st.markdown('<p class="sub-title">Applications of NLP:</p>', unsafe_allow_html=True)
-st.markdown("""
-- ✅ Chatbots & Virtual Assistants (e.g., Siri, Alexa)
-- ✅ Sentiment Analysis (e.g., Product reviews, Social Media monitoring)
-- ✅ Machine Translation (e.g., Google Translate)
-- ✅ Text Summarization (e.g., News article summaries)
-- ✅ Speech Recognition (e.g., Voice commands)
 """)
 # Section: NLP Terminologies
-st.markdown('<p class="section-title">NLP Terminologies</p>', unsafe_allow_html=True)
-st.markdown("""
-**Corpus**: A collection of text documents used for NLP tasks.
-**Tokenization**: Splitting text into individual words or phrases.
-**Stop Words**: Common words (e.g., "the", "is") that are often removed.
-**Stemming**: Reducing words to their base form (e.g., "running" → "run").
-**Lemmatization**: More advanced than stemming; converts words to their dictionary form.
-**NER (Named Entity Recognition)**: Identifies entities like names, dates, and locations.
-**Sentiment Analysis**: Determines the sentiment (positive, negative, neutral) of a text.
-**n-grams**: Sequences of 'n' consecutive words (e.g., "New York" is a bi-gram).
 """)
 # Section: Text Representation Methods
-st.markdown('<p class="section-title">Text Representation Methods</p>', unsafe_allow_html=True)
 methods = [
     "Bag of Words",
     "TF-IDF",
@@ -53,100 +42,108 @@ methods = [
 ]
 selected_method = st.radio("Select a text representation method:", methods)
 if selected_method == "Bag of Words":
-    st.markdown('<p class="sub-title">Bag of Words (BoW)</p>', unsafe_allow_html=True)
-    st.markdown("""
     **Definition**: Represents text as a collection of word counts, ignoring grammar and word order.
     """)
-    st.markdown("""
-    **Uses:**
-    - ✅ Sentiment analysis
-    - ✅ Document classification
-    - ✅ Information retrieval
-    **Advantages:**
-    - ✅ Simple and easy to implement
-    - ✅ Works well with traditional ML models
-    **Disadvantages:**
-    - ❌ Ignores word order and context
-    - ❌ High-dimensionality for large vocabularies
-    """)
 elif selected_method == "TF-IDF":
-    st.markdown('<p class="sub-title">Term Frequency-Inverse Document Frequency (TF-IDF)</p>', unsafe_allow_html=True)
-    st.markdown("""
     **Definition**: Weighs words based on their frequency in a document and across all documents.
     """)
-    st.markdown("""
-    **Uses:**
-    - ✅ Information retrieval (e.g., search engines)
-    - ✅ Text classification
-    - ✅ Keyword extraction
-    **Advantages:**
-    - ✅ Reduces the impact of common words
-    - ✅ Highlights important words
-    **Disadvantages:**
-    - ❌ Still ignores word order
-    - ❌ Does not capture deep semantics
-    """)
 elif selected_method == "One-Hot Encoding":
-    st.markdown('<p class="sub-title">One-Hot Encoding</p>', unsafe_allow_html=True)
-    st.markdown("""
     **Definition**: Represents words as binary vectors where each word has a unique position in a vocabulary.
     """)
-    st.markdown("""
-    **Uses:**
-    - ✅ Simple NLP tasks
-    - ✅ Word-level feature engineering
-    **Advantages:**
-    - ✅ Simple to understand
-    - ✅ Works well with small vocabulary sizes
-    **Disadvantages:**
-    - ❌ Inefficient for large vocabularies
-    - ❌ No information on word meaning
-    """)
 elif selected_method == "Word Embeddings (Word2Vec)":
-    st.markdown('<p class="sub-title">Word Embeddings (Word2Vec)</p>', unsafe_allow_html=True)
-    st.markdown("""
     **Definition**: Converts words into dense numerical vectors capturing semantic relationships.
-    """)
-    st.markdown("""
-    **Uses:**
-    - ✅ Machine translation
-    - ✅ Speech recognition
-    - ✅ Sentiment analysis
-    **Advantages:**
-    - ✅ Captures semantic relationships
-    - ✅ Works well for deep learning models
-    **Disadvantages:**
-    - ❌ Requires large datasets to train
-    - ❌ Computationally expensive
-    """)
-    # Sample texts for Word2Vec model
-    texts = [
-        "Natural Language Processing is fascinating.",
-        "Natural Language Processing involves understanding human language.",
-        "The field of NLP is growing rapidly."
-    ]
     model = Word2Vec(sentences=[text.split() for text in texts], vector_size=100, window=5, min_count=1, workers=4)
     word_vectors = model.wv
     word = 'natural'
     if word in word_vectors:
-        st.markdown(f'Word2Vec Representation of "{word}":')
         st.write(word_vectors[word])
     else:
-        st.markdown(f'Word "{word}" not found in the vocabulary.')
 # Footer
-st.markdown('<hr>', unsafe_allow_html=True)
-st.markdown('<p class="text" style="text-align:center;">Developed with ❤️ using Streamlit for NLP enthusiasts.</p>', unsafe_allow_html=True)

 import streamlit as st
+from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
+import numpy as np
 from gensim.models import Word2Vec
 # Title
+st.title("Introduction to NLP")
 # Section: What is NLP?
+st.header("What is NLP?")
+st.write("""
 Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to process, understand, and generate human language.
+### Applications of NLP:
+- **Chatbots & Virtual Assistants** (e.g., Siri, Alexa)
+- **Sentiment Analysis** (e.g., Product reviews, Social Media monitoring)
+- **Machine Translation** (e.g., Google Translate)
+- **Text Summarization** (e.g., News article summaries)
+- **Speech Recognition** (e.g., Voice commands)
 """)
 # Section: NLP Terminologies
+st.header("NLP Terminologies")
+st.write("""
+- **Corpus**: A collection of text documents used for NLP tasks.
+- **Tokenization**: Splitting text into individual words or phrases.
+- **Stop Words**: Common words (e.g., "the", "is") that are often removed.
+- **Stemming**: Reducing words to their base form (e.g., "running" → "run").
+- **Lemmatization**: More advanced than stemming; it converts words to their dictionary form.
+- **Named Entity Recognition (NER)**: Identifies entities like names, dates, and locations.
+- **Sentiment Analysis**: Determines the sentiment (positive, negative, neutral) of a text.
+- **n-grams**: Sequences of 'n' consecutive words (e.g., "New York" is a bi-gram).
 """)
 # Section: Text Representation Methods
+st.header("Text Representation Methods")
 methods = [
     "Bag of Words",
     "TF-IDF",
 ]
 selected_method = st.radio("Select a text representation method:", methods)
+# Sample Texts
+texts = [
+    "Natural Language Processing is fascinating.",
+    "Natural Language Processing involves understanding human language.",
+    "The field of NLP is growing rapidly."
+]
 if selected_method == "Bag of Words":
+    st.subheader("Bag of Words (BoW)")
+    st.write("""
     **Definition**: Represents text as a collection of word counts, ignoring grammar and word order.
+    **Uses**:
+    - Sentiment analysis
+    - Document classification
+    - Information retrieval
+    **Advantages**:
+    ✅ Simple and easy to implement
+    ✅ Works well with traditional ML models
+    **Disadvantages**:
+    ❌ Ignores word order and context
+    ❌ High-dimensionality for large vocabularies
     """)
+    vectorizer = CountVectorizer()
+    X_bow = vectorizer.fit_transform(texts)
+    st.write("Feature Names:", vectorizer.get_feature_names_out())
+    st.write("Bag of Words Representation:", X_bow.toarray())
 elif selected_method == "TF-IDF":
+    st.subheader("Term Frequency-Inverse Document Frequency (TF-IDF)")
+    st.write("""
     **Definition**: Weighs words based on their frequency in a document and across all documents.
+    **Uses**:
+    - Information retrieval (e.g., search engines)
+    - Text classification
+    - Keyword extraction
+    **Advantages**:
+    ✅ Reduces the impact of common words
+    ✅ Highlights important words
+    **Disadvantages**:
+    ❌ Still ignores word order
+    ❌ Does not capture deep semantics
     """)
+    tfidf_vectorizer = TfidfVectorizer()
+    X_tfidf = tfidf_vectorizer.fit_transform(texts)
+    st.write("Feature Names:", tfidf_vectorizer.get_feature_names_out())
+    st.write("TF-IDF Representation:", X_tfidf.toarray())
 elif selected_method == "One-Hot Encoding":
+    st.subheader("One-Hot Encoding")
+    st.write("""
     **Definition**: Represents words as binary vectors where each word has a unique position in a vocabulary.
+    **Uses**:
+    - Simple NLP tasks
+    - Word-level feature engineering
+    **Advantages**:
+    ✅ Simple to understand
+    ✅ Works well with small vocabulary sizes
+    **Disadvantages**:
+    ❌ Inefficient for large vocabularies
+    ❌ No information on word meaning
     """)
+    one_hot_vectorizer = CountVectorizer(binary=True)
+    X_one_hot = one_hot_vectorizer.fit_transform(texts)
+    st.write("Feature Names:", one_hot_vectorizer.get_feature_names_out())
+    st.write("One-Hot Encoding Representation:", X_one_hot.toarray())
 elif selected_method == "Word Embeddings (Word2Vec)":
+    st.subheader("Word Embeddings (Word2Vec)")
+    st.write("""
     **Definition**: Converts words into dense numerical vectors capturing semantic relationships.
+    **Uses**:
+    - Machine translation
+    - Speech recognition
+    - Sentiment analysis
+    **Advantages**:
+    ✅ Captures semantic relationships
+    ✅ Works well for deep learning models
+    **Disadvantages**:
+    ❌ Requires large datasets to train
+    ❌ Computationally expensive
+    """)
     model = Word2Vec(sentences=[text.split() for text in texts], vector_size=100, window=5, min_count=1, workers=4)
     word_vectors = model.wv
     word = 'natural'
     if word in word_vectors:
+        st.write(f"Word2Vec Representation of '{word}':")
         st.write(word_vectors[word])
     else:
+        st.write(f"Word '{word}' not found in the vocabulary.")
 # Footer
+st.write("---")
+st.write("Developed with ❤️ using Streamlit for NLP enthusiasts.")
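
A note on the Word2Vec and One-Hot Encoding branches of the new `app.py`. The Word2Vec model is trained on `text.split()`, which keeps capitalization and trailing punctuation, so the lookup for `'natural'` most likely falls through to the "not found" branch: the sample texts only contain `Natural`. Separately, `CountVectorizer(binary=True)` yields one presence/absence vector per document, whereas the definition in that branch describes one vector per word. The sketch below illustrates both points in plain Python, without gensim or scikit-learn. `normalize`, `one_hot`, and `binary_bag` are hypothetical helpers introduced here for illustration; they are not part of the commit.

```python
import re

# Sample texts copied from app.py in the diff above.
texts = [
    "Natural Language Processing is fascinating.",
    "Natural Language Processing involves understanding human language.",
    "The field of NLP is growing rapidly.",
]


def normalize(text):
    """Hypothetical helper: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Tokens as app.py builds them for Word2Vec: case and punctuation survive.
raw_vocab = {token for text in texts for token in text.split()}
# Tokens after normalization (an assumption, not what the commit does).
clean_vocab = {token for text in texts for token in normalize(text)}

print("natural" in raw_vocab)    # False: only "Natural" is present
print("natural" in clean_vocab)  # True

# Fixed word order so every vector uses the same positions.
vocab = sorted(clean_vocab)
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """One vector per word: a single 1 at that word's vocabulary index."""
    vector = [0] * len(vocab)
    vector[index[word]] = 1
    return vector


def binary_bag(text):
    """One vector per document: 1 for every vocabulary word that occurs."""
    present = set(normalize(text))
    return [1 if word in present else 0 for word in vocab]


print(one_hot("natural"))
print(binary_bag(texts[0]))
```

If the `'natural'` lookup is meant to succeed, one option would be to train `Word2Vec` on normalized tokens instead of `text.split()`. That is a suggestion under the assumptions above, not something this commit does.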