import streamlit as st


# Function to display the Home Page
def show_home_page():
    st.title("🔦 :red[Natural Language Processing (NLP)]")
    st.markdown(
        """
### :green[Welcome to the NLP Guide]

Natural Language Processing (NLP) is a fascinating branch of Artificial Intelligence
that focuses on the interaction between computers and humans using natural language.
It enables machines to read, understand, and generate human language in a meaningful way.

This guide explores key NLP concepts and techniques, from basic terminologies to
advanced vectorization methods. Use the sidebar to explore each topic in detail.

#### :green[Applications of NLP:]
- Chatbots and virtual assistants (e.g., Alexa, Siri)
- Sentiment analysis
- Language translation tools (e.g., Google Translate)
- Text summarization and more!
"""
    )
    st.image("https://cdn-uploads.huggingface.co/production/uploads/66be28cc7e8987822d129400/1zCao_p5aQZr6zgYScaOB.png")


# Function to display specific topic pages
def show_page(page):
    if page == "NLP Terminologies":
        st.title("🔍 :blue[NLP Terminologies]")
        st.markdown(
            """
### :red[Key NLP Terms:]
- **Tokenization**: Splitting text into smaller units like words or sentences.
- **Stop Words**: Commonly used words (e.g., "the", "is") often removed during preprocessing.
- **Stemming**: Reducing words to their root form (e.g., "running" → "run").
- **Lemmatization**: Converting words to their dictionary base form (e.g., "running" → "run").
- **Corpus**: A large collection of text used for NLP training and analysis.
- **Vocabulary**: The set of unique words in a corpus.
- **n-grams**: Sequences of *n* words or characters in text.
- **POS Tagging**: Assigning parts of speech (e.g., noun, verb) to words.
- **NER (Named Entity Recognition)**: Identifying names, places, organizations, etc.
- **Parsing**: Analyzing the grammatical structure of a sentence.
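
Several of these terms can be illustrated with a short, self-contained sketch in plain Python (a toy example, not a production tokenizer):

```python
text = "NLP is fun and NLP is useful"

# Tokenization: split the text into word tokens.
tokens = text.lower().split()

# Vocabulary: the set of unique words in the corpus.
vocabulary = sorted(set(tokens))

# n-grams: contiguous sequences of n tokens.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(tokens, 2)  # e.g., ("nlp", "is"), ("is", "fun"), ...
```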
""" ) elif page == "One-Hot Vectorization": st.title("🔧 :green[One-Hot Vectorization]") st.markdown( """ ### :red[One-Hot Vectorization Explained] One-Hot Vectorization is a simple representation where each word is encoded as a binary vector. #### :red[How It Works:] - Each unique word in the vocabulary is assigned an index. - The vector for a word is all zeros except for a `1` at the index of that word. #### :red[Example:] Vocabulary: ["cat", "dog", "bird"] - "cat" → [1, 0, 0] - "dog" → [0, 1, 0] - "bird" → [0, 0, 1] #### :red[Advantages:] - Simple and intuitive to implement. #### :red[Limitations:] - High dimensionality for large vocabularies. - Does not capture semantic relationships (e.g., "cat" and "kitten" have no connection). #### :red[Applications:] - Suitable for small datasets where simplicity is a priority. """ ) elif page == "Bag of Words": st.title("🔄 :green[Bag of Words (BoW)]") st.markdown( """ ### :orange[Bag of Words (BoW) Method] Bag of Words is a way of representing text by counting word occurrences while ignoring word order. #### :orange[How It Works:] 1. Create a vocabulary of all unique words in the text. 2. Count the frequency of each word in a document. #### :orange[Example:] Given two sentences: - Sentence 1: "I love NLP." - Sentence 2: "I love programming." Vocabulary: ["I", "love", "NLP", "programming"] - Sentence 1: [1, 1, 1, 0] - Sentence 2: [1, 1, 0, 1] #### :orange[Advantages:] - Simple to implement and interpret. #### :orange[Limitations:] - High dimensionality for large vocabularies. - Ignores word order and semantic meaning. - Sensitive to noisy or frequent terms. #### :orange[Applications:] - Text classification and clustering. """ ) elif page == "TF-IDF Vectorizer": st.title("🔄 :blue[TF-IDF Vectorizer]") st.markdown( """ ### :green[TF-IDF (Term Frequency-Inverse Document Frequency)] TF-IDF evaluates the importance of a word in a document relative to a collection of documents (corpus). 
#### :rainbow[Formula:]
$$\\text{TF-IDF} = \\text{TF} \\times \\text{IDF}$$

- **TF (Term Frequency)**: Frequency of a word in a document divided by the total words in the document.
- **IDF (Inverse Document Frequency)**: Logarithm of total documents divided by the number of documents containing the word.

#### :rainbow[Example:]
For the corpus:
- Document 1: "NLP is amazing."
- Document 2: "NLP is fun and amazing."

A word like "fun", which appears in only one document, will have a higher weight than words like "is" and "amazing" that occur in every document.

#### :rainbow[Advantages:]
- Highlights unique and relevant terms.
- Reduces the impact of frequent, less informative words.

#### :rainbow[Applications:]
- Information retrieval, search engines, and document classification.
"""
        )
    elif page == "Word2Vec":
        st.title("🌐 :red[Word2Vec]")
        st.markdown(
            """
### :green[Word2Vec]
Word2Vec creates dense vector representations of words, capturing semantic relationships using neural networks.

#### :green[Key Models:]
- **CBOW (Continuous Bag of Words)**: Predicts the target word from its context.
- **Skip-gram**: Predicts the context words from a target word.

#### :green[Example:]
Word2Vec can capture relationships like:
- "king" - "man" + "woman" ≈ "queen"

#### :green[Advantages:]
- Captures semantic meaning and relationships.
- Efficient for large datasets.

#### :green[Applications:]
- Sentiment analysis, recommendation systems, and machine translation.

#### :green[Limitations:]
- Computationally intensive to train on large datasets.
"""
        )
    elif page == "FastText":
        st.title("🔄 :red[FastText]")
        st.markdown(
            """
### :blue[FastText]
FastText extends Word2Vec by representing words as character n-grams, enabling it to handle rare and out-of-vocabulary words.

#### :blue[Example:]
The word "playing" might be represented by subwords like "pla", "lay", "ayi", "ing".

#### :blue[Advantages:]
- Handles rare words and misspellings.
- Captures subword information (e.g., prefixes and suffixes).
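
The subword idea can be sketched in a few lines (real FastText also keeps the full word as a token and hashes n-grams into buckets; this shows only the n-gram extraction):

```python
def char_ngrams(word, n=3):
    # FastText pads words with boundary markers before extracting character n-grams.
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

subwords = char_ngrams("playing")
# ['<pl', 'pla', 'lay', 'ayi', 'yin', 'ing', 'ng>']
```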
#### :blue[Applications:]
- Multilingual text processing.
- Working with noisy or incomplete data.

#### :blue[Limitations:]
- Higher computational cost than Word2Vec.
"""
        )
    elif page == "Tokenization":
        st.title("🔢 :blue[Tokenization]")
        st.markdown(
            """
### :red[Tokenization]
Tokenization is the process of splitting text into smaller units (tokens) such as words, phrases, or sentences.

#### :red[Types:]
- **Word Tokenization**: Splits text into words.
- **Sentence Tokenization**: Splits text into sentences.

#### :red[Example:]
Sentence: "NLP is exciting."
- Word Tokens: ["NLP", "is", "exciting", "."]

#### :red[Libraries:]
- NLTK
- spaCy
- Hugging Face Transformers

#### :red[Challenges:]
- Handling complex text (e.g., abbreviations, contractions, multilingual data).

#### :red[Applications:]
- Preprocessing for machine learning models.
"""
        )
    elif page == "Stop Words":
        st.title("🔐 :green[Stop Words]")
        st.markdown(
            """
### :rainbow[Stop Words]
Stop words are commonly used words in a language that are often removed during text preprocessing (e.g., "is", "the", "and").

#### :rainbow[Why Remove Stop Words?]
- To reduce noise and focus on meaningful terms in the text.

#### :rainbow[Example Stop Words:]
- English: "is", "the", "and".
- Spanish: "es", "el", "y".

#### :rainbow[Challenges:]
- Some stop words may carry important context in specific use cases.

#### :rainbow[Applications:]
- Sentiment analysis, text classification, and search engines.
"""
        )


# Sidebar navigation
st.sidebar.title("🔍 NLP Topics")
menu_options = [
    "Home",
    "NLP Terminologies",
    "One-Hot Vectorization",
    "Bag of Words",
    "TF-IDF Vectorizer",
    "Word2Vec",
    "FastText",
    "Tokenization",
    "Stop Words",
]
selected_page = st.sidebar.radio("Select a topic", menu_options)

# Display the selected page
if selected_page == "Home":
    show_home_page()
else:
    show_page(selected_page)