Spaces:

DOMMETI
/

From_Zero_to_ML_Hero


DOMMETI committed on Jan 27, 2025

Commit

2f145ad

verified ·

1 Parent(s): 209b944

Create 9_natural_language_processing.py


pages/9_natural_language_processing.py +208 -0

pages/9_natural_language_processing.py ADDED

	@@ -0,0 +1,208 @@

+import streamlit as st
+# Page Configuration
+st.set_page_config(page_title="NLP Guide", layout="wide")
+# Custom CSS Styling
+st.markdown("""
+    <style>
+    body {
+        background-color: #eef2f7;
+        font-family: 'Roboto', sans-serif;
+    }
+    h1 {
+        color: #00FFFF;
+        font-family: 'Roboto', sans-serif;
+        font-weight: bold;
+        text-align: center;
+        margin-bottom: 25px;
+    }
+    h2 {
+        color: #FFFACD;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 700;
+        margin-top: 30px;
+    }
+    h3 {
+        color: #ba95b0;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 600;
+        margin-top: 20px;
+    }
+    p, ul {
+        font-family: 'Georgia', serif;
+        line-height: 1.8;
+        color: #2b2b2b;
+        margin-bottom: 20px;
+    }
+    .icon-bullet {
+        list-style-type: none;
+        padding-left: 20px;
+    }
+    .icon-bullet li {
+        font-family: 'Georgia', serif;
+        font-size: 1.1em;
+        margin-bottom: 10px;
+        color: #2b2b2b;
+    }
+    .icon-bullet li::before {
+        content: "✔️";
+        padding-right: 10px;
+        color: #00FFFF;
+    }
+    .stImage img {
+        border-radius: 10px;
+    }
+    </style>
+""", unsafe_allow_html=True)
+# Function to display the Home Page
+def show_home_page():
+    st.title("Natural Language Processing (NLP)")
+    st.markdown(
+        """
+        ### Welcome to NLP Guide 🌟
+        Natural Language Processing (NLP) bridges the gap between computers and human language. It's the core technology behind:
+        - Voice assistants and chatbots (e.g., Alexa, Siri)
+        - Machine Translation (e.g., Google Translate)
+        - Sentiment Analysis
+        - Search Engines (e.g., Google, Bing)
+
+        Dive into **Tokenization**, **Vectorization**, and more to understand how machines process text!
+        """
+    )
+    st.image(
+        "https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/wSlRj9jk4szr4yy3wTlfA.webp",
+        caption="Applications of NLP",
+        width=800,
+    )
+# Function to display specific topic pages
+def show_page(page):
+    if page == "Tokenization":
+        st.title("Tokenization")
+        st.markdown("""
+        ### Tokenization 🛠️
+        Tokenization breaks text into smaller units (tokens), such as words or sentences. This is the first step in most NLP pipelines.
+        #### Types of Tokenization:
+        1. **Word Tokenization**:
+            - Splits text into individual words.
+            - Example: *"I love NLP"* → `["I", "love", "NLP"]`
+        2. **Sentence Tokenization**:
+            - Splits text into sentences.
+            - Example: *"NLP is exciting. Let's learn it."* → `["NLP is exciting.", "Let's learn it."]`
+        #### Libraries for Tokenization:
+        - **NLTK**: Popular for academic projects.
+        - **spaCy**: Fast and production-ready.
+        - **Hugging Face Transformers**: Subword tokenization for models like BERT.
+        #### Challenges in Tokenization:
+        - Handling contractions (e.g., "I'm" → ["I", "'m"]).
+        - Handling multilingual or mixed-language text (e.g., "Bonjour NLP").
+        """)
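Note on the Tokenization page text above: the word and sentence splitting it describes can be sketched with the standard library alone. This is a minimal illustration, not how NLTK or spaCy tokenize; the function names `word_tokenize` and `sentence_tokenize` are hypothetical, and the regular expressions are assumptions that only handle simple English text. Unlike the contraction split shown above, this sketch keeps a contraction such as "Let's" as one token.

```python
import re


def word_tokenize(text):
    """Return word tokens; an apostrophe suffix (e.g. "Let's") stays attached."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when the mark is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


words = word_tokenize("I love NLP")
sentences = sentence_tokenize("NLP is exciting. Let's learn it.")
print(words)
print(sentences)
```

A rule this simple will mis-split abbreviations such as "e.g." or "Dr.", which is one reason the libraries listed on the page are usually preferred.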
+    elif page == "NLP Terminologies":
+        st.title("NLP Terminologies")
+        st.markdown("""
+        ### NLP Terminologies 📚
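+        - **Stop Words**: Very common words such as "the" or "is" that carry little meaning on their own and are often removed during preprocessing.
+        - **Stemming**: Stripping suffixes with heuristic rules to reduce words to a root form (e.g., "running" → "run"); the result is not always a dictionary word.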
+        - **Lemmatization**: Converting words to their base dictionary forms (e.g., "better" → "good").
+        - **POS Tagging**: Assigning parts of speech to words (e.g., noun, verb).
+        - **NER (Named Entity Recognition)**: Identifying entities like names or places (e.g., "New York").
+        """)
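Note on the terminology list above: stop-word removal and stemming can be illustrated with a few lines of plain Python. This is a deliberately naive sketch. `STOP_WORDS` is a tiny hypothetical list, and `crude_stem` is a made-up suffix stripper, not the Porter algorithm or any library's stemmer.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "of"}


def remove_stop_words(tokens):
    """Drop tokens whose lowercase form is in STOP_WORDS."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def crude_stem(word):
    """Strip one common suffix, then undo a doubled final consonant."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]  # "runn" -> "run"
            return stem
    return word


filtered = remove_stop_words(["The", "cat", "is", "running"])
stems = [crude_stem(w) for w in filtered]
print(filtered, stems)
```

Rules like these over-strip many words (for example, "falling" becomes "fal"), which shows why stemming output is not guaranteed to be a dictionary word, in contrast to lemmatization.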
+    elif page == "One-Hot Vectorization":
+        st.title("One-Hot Vectorization")
+        st.markdown("""
+        ### One-Hot Vectorization 🔢
+        A simple way to represent text where each word is converted into a unique binary vector.
+        #### How It Works:
+        - Each word in the vocabulary is assigned an index.
+        - The vector is all zeros except for a `1` at the word's index.
+        #### Example:
+        Vocabulary: ["cat", "dog", "bird"]
+        - "cat" → [1, 0, 0]
+        - "dog" → [0, 1, 0]
+        #### Advantages:
+        - Easy to implement.
+        #### Limitations:
+        - High dimensionality for large vocabularies.
+        - Does not capture semantic relationships (e.g., "king" and "queen").
+        """)
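Note on the One-Hot Vectorization page text above: the two steps it lists (assign each word an index, then set a single `1` at that index) can be sketched as follows. The helper name `one_hot` is hypothetical, and the vocabulary is the one from the page's example.

```python
def one_hot(word, vocabulary):
    """Return a list of zeros with a single 1 at the word's vocabulary index."""
    index = {w: i for i, w in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    vector[index[word]] = 1  # raises KeyError for out-of-vocabulary words
    return vector


vocabulary = ["cat", "dog", "bird"]
cat_vec = one_hot("cat", vocabulary)
dog_vec = one_hot("dog", vocabulary)
print(cat_vec, dog_vec)
```

Each vector is as long as the vocabulary, which makes the dimensionality limitation mentioned above concrete: a 50,000-word vocabulary yields 50,000-element vectors with a single non-zero entry.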
+    elif page == "Bag of Words":
+        st.title("Bag of Words (BoW)")
+        st.markdown("""
+        ### Bag of Words 🧳
+        Represents each document as a vector of word counts, ignoring grammar and word order.
+        #### How It Works:
+        1. Create a vocabulary of unique words.
+        2. Count the frequency of each word in a document.
+        #### Example:
+        Given two sentences:
+        - "I love NLP."
+        - "I love programming."
+
+        Vocabulary (punctuation removed): ["I", "love", "NLP", "programming"]
+        - Sentence 1: [1, 1, 1, 0]
+        - Sentence 2: [1, 1, 0, 1]
+        """)
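Note on the Bag of Words page text above: the vocabulary-then-count procedure can be sketched with `collections.Counter`. The function names are hypothetical, and the tokenizer is an assumption that keeps letters only, preserves case, and drops punctuation so that the output matches the page's example.

```python
import re
from collections import Counter


def tokenize(text):
    """Keep runs of letters only; punctuation is dropped, case is preserved."""
    return re.findall(r"[A-Za-z]+", text)


def bag_of_words(documents):
    """Return (vocabulary, count vectors), vocabulary in first-seen order."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = []
    for tokens in tokenized:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["I love NLP.", "I love programming."])
print(vocabulary)
print(vectors)
```

Because only counts are kept, "dog bites man" and "man bites dog" would receive identical vectors under this representation.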
+    elif page == "TF-IDF Vectorizer":
+        st.title("TF-IDF Vectorizer")
+        st.markdown(r"""
+        ### TF-IDF Vectorizer 📊
+        A statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).
+        #### Formula:
+        $$
+        \text{TF-IDF} = \text{TF} \times \text{IDF}
+        $$
+        - *TF* (Term Frequency): how often the term occurs in the document.
+        - *IDF* (Inverse Document Frequency): a weight that shrinks as the term appears in more documents of the corpus.
+        """)
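Note on the TF-IDF page text above: the product TF × IDF can be computed by hand for a small corpus. The sketch below assumes one common textbook variant, with TF as the term count divided by the document length and IDF as the natural log of (number of documents / number of documents containing the term). Libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` and the pre-tokenized, lowercased input are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (each a list of tokens)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [["i", "love", "nlp"], ["i", "love", "programming"]]
scores = tf_idf(docs)
print(scores)
```

Terms that occur in every document ("i", "love") score zero under this variant, while terms unique to one document ("nlp", "programming") receive a positive weight.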
+    elif page == "Word2Vec":
+        st.title("Word2Vec")
+        st.markdown("""
+        ### Word2Vec 🤖
+        A neural network-based method for creating dense vector representations of words.
+        #### Key Features:
+        - Captures semantic relationships: vector arithmetic such as "king" - "man" + "woman" lands close to "queen".
+        """)
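Note on the Word2Vec page text above: the analogy arithmetic can be illustrated with cosine similarity over word vectors. The two-dimensional vectors below are hand-made toy values chosen for illustration, not output from a trained Word2Vec model, and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-made toy vectors (NOT learned embeddings): axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king": [0.9, 0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "queen": [0.9, -0.8],
    "apple": [-0.5, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With real embeddings the vectors have hundreds of dimensions and the match is approximate, so the nearest neighbour is found the same way rather than by exact equality.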
+# Sidebar navigation
+st.sidebar.title("Explore NLP Topics")
+menu_options = [
+    "Home",
+    "Tokenization",
+    "NLP Terminologies",
+    "One-Hot Vectorization",
+    "Bag of Words",
+    "TF-IDF Vectorizer",
+    "Word2Vec",
+]
+selected_page = st.sidebar.radio("Select a topic", menu_options)
+# Display the selected page
+if selected_page == "Home":
+    show_home_page()
+else:
+    show_page(selected_page)