import streamlit as st

# Apply custom CSS styling
st.markdown("""
    <style>
    /* Streamlit renders the page inside .stApp; plain `body` rules are often
       overridden, so target the app container instead. */
    .stApp {
        background-color: #eef2f7;
    }
    h1 {
        color: #0e7490;
        font-family: 'Roboto', sans-serif;
        font-weight: 700;
        text-align: center;
        margin-bottom: 25px;
    }
    h2, h3 {
        font-family: 'Roboto', sans-serif;
        font-weight: 600;
    }
    h2 {
        color: #b45309;
    }
    h3 {
        color: #ba95b0;
    }
    p, ul, ol {
        font-family: 'Georgia', serif;
        line-height: 1.8;
        color: #495057;
    }
    ul {
        margin-left: 20px;
    }
    .icon-bullet {
        list-style-type: none;
        padding-left: 20px;
    }
    .icon-bullet li {
        font-family: 'Georgia', serif;
        font-size: 1.1em;
        margin-bottom: 10px;
        color: #495057;
    }
    .icon-bullet li::before {
        content: "✔️";
        padding-right: 10px;
        color: #00FFFF;
    }
    </style>
""", unsafe_allow_html=True)

# Page Title
st.title("Interactive NLP Guide")

# Sidebar Navigation
st.sidebar.title("Explore NLP Topics")
topics = [
    "Introduction",
    "Tokenization",
    "One-Hot Vectorization",
    "Bag of Words",
    "TF-IDF Vectorizer",
    "Word Embeddings",
]
selected_topic = st.sidebar.radio("Select a topic", topics)

# Content Based on Selection
if selected_topic == "Introduction":
    st.markdown("<h1>Natural Language Processing (NLP)</h1>", unsafe_allow_html=True)
    st.markdown("<h2>Introduction to NLP</h2>", unsafe_allow_html=True)
    st.markdown("""
    <p>Natural Language Processing (NLP) is a field at the intersection of linguistics and computer science, focusing on enabling computers to understand, interpret, and respond to human language.</p>
    <h3>Applications of NLP:</h3>
    <ul>
        <li>Chatbots and Virtual Assistants (e.g., Alexa, Siri)</li>
        <li>Machine Translation (e.g., Google Translate)</li>
        <li>Text Summarization</li>
        <li>Sentiment Analysis</li>
        <li>Speech Recognition Systems</li>
    </ul>
    """, unsafe_allow_html=True)

elif selected_topic == "Tokenization":
    st.markdown("<h1>Tokenization</h1>", unsafe_allow_html=True)
    st.markdown("<h2>What is Tokenization?</h2>", unsafe_allow_html=True)
    st.markdown("""
    <p>Tokenization is the process of breaking down a text into smaller units, such as sentences or words, called tokens. It is the first step in any NLP pipeline.</p>
    <h3>Types of Tokenization:</h3>
    <ul>
        <li><b>Word Tokenization:</b> Splits text into words (e.g., "I love NLP." → ["I", "love", "NLP"])</li>
        <li><b>Sentence Tokenization:</b> Splits text into sentences (e.g., "NLP is fascinating. It's the future." → ["NLP is fascinating.", "It's the future."])</li>
    </ul>
    <h3>Code Example:</h3>
    """, unsafe_allow_html=True)
    st.code("""
from nltk.tokenize import word_tokenize, sent_tokenize
text = "Natural Language Processing is exciting. Let's explore it!"
word_tokens = word_tokenize(text)
sentence_tokens = sent_tokenize(text)
print("Word Tokens:", word_tokens)
print("Sentence Tokens:", sentence_tokens)
    """, language="python")

elif selected_topic == "One-Hot Vectorization":
    st.markdown("<h1>One-Hot Vectorization</h1>", unsafe_allow_html=True)
    st.markdown("""
    <p>One-Hot Vectorization is a method to represent text where each unique word is converted into a unique binary vector.</p>
    <h3>How It Works:</h3>
    <ul>
        <li>Each word in the vocabulary is assigned an index.</li>
        <li>The vector is all zeros except for a <code>1</code> at the word's index.</li>
    </ul>
    <h3>Example:</h3>
    <ul>
        <li>Vocabulary: ["cat", "dog", "bird"]</li>
        <li>"cat" → [1, 0, 0]</li>
        <li>"dog" → [0, 1, 0]</li>
    </ul>
    <h3>Limitations:</h3>
    <ul>
        <li>High dimensionality for large vocabularies.</li>
        <li>Does not capture semantic relationships between words.</li>
    </ul>
    """, unsafe_allow_html=True)

elif selected_topic == "Bag of Words":
    st.markdown("<h1>Bag of Words (BoW)</h1>", unsafe_allow_html=True)
    st.markdown("""
    <p>Bag of Words represents text as word frequency counts, disregarding word order.</p>
    <h3>How It Works:</h3>
    <ul>
        <li>Create a vocabulary of unique words.</li>
        <li>Count the frequency of each word in a document.</li>
    </ul>
    <h3>Example:</h3>
    <ul>
        <li>Given Sentences:
            <ul>
                <li>"I love NLP."</li>
                <li>"I love programming."</li>
            </ul>
        </li>
        <li>Vocabulary: ["I", "love", "NLP", "programming"]</li>
        <li>Sentence 1: [1, 1, 1, 0]</li>
        <li>Sentence 2: [1, 1, 0, 1]</li>
    </ul>
    """, unsafe_allow_html=True)

elif selected_topic == "TF-IDF Vectorizer":
    st.markdown("<h1>TF-IDF Vectorizer</h1>", unsafe_allow_html=True)
    st.markdown("""
    <p>TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).</p>
    <h3>Formula:</h3>
    """, unsafe_allow_html=True)
    st.latex(r'''
    \text{TF-IDF} = \text{TF} \times \text{IDF}
    ''')
    st.markdown("""
    <ul>
        <li><b>Term Frequency (TF):</b> Frequency of a word in a document.</li>
        <li><b>Inverse Document Frequency (IDF):</b> Logarithm of the ratio of the total number of documents to the number of documents containing the word.</li>
    </ul>
    """, unsafe_allow_html=True)

elif selected_topic == "Word Embeddings":
    st.markdown("<h1>Word Embeddings</h1>", unsafe_allow_html=True)
    st.markdown("""
    <p>Word Embeddings are dense vector representations of words that capture semantic meanings and relationships.</p>
    <h3>Key Features:</h3>
    <ul>
        <li>Captures semantic relationships between words (e.g., "king" - "man" + "woman" ≈ "queen").</li>
        <li>Efficient representation for large vocabularies.</li>
    </ul>
    <h3>Popular Word Embedding Models:</h3>
    <ul>
        <li>Word2Vec</li>
        <li>GloVe</li>
        <li>FastText</li>
    </ul>
    """, unsafe_allow_html=True)

# Footer
st.sidebar.markdown("---")
st.sidebar.markdown("Explore each topic to dive deeper into NLP concepts!")