Update app.py

app.py CHANGED

@@ -15,21 +15,21 @@ def show_home_page():
     )
 
     if st.button("NLP Terminologies"):
-        st.
+        st.session_state["page"] = "terminologies"
     if st.button("One-Hot Vectorization"):
-        st.
+        st.session_state["page"] = "one_hot"
     if st.button("Bag of Words"):
-        st.
+        st.session_state["page"] = "bow"
     if st.button("TF-IDF Vectorizer"):
-        st.
+        st.session_state["page"] = "tfidf"
     if st.button("Word2Vec"):
-        st.
+        st.session_state["page"] = "word2vec"
     if st.button("FastText"):
-        st.
+        st.session_state["page"] = "fasttext"
     if st.button("Tokenization"):
-        st.
+        st.session_state["page"] = "tokenization"
     if st.button("Stop Words"):
-        st.
+        st.session_state["page"] = "stop_words"
 
 def show_page(page):
     if page == "terminologies":
@@ -60,7 +60,6 @@ def show_page(page):
         - **Named Entity Recognition (NER)**: Identifying entities like names, locations, and organizations in text.
 
         - **Parsing**: Analyzing grammatical structure and relationships between words.
-
         """
     )
     elif page == "one_hot":
@@ -139,17 +138,6 @@ def show_page(page):
         - **Term Frequency (TF)**: Number of times a term appears in a document divided by total terms in the document.
         - **Inverse Document Frequency (IDF)**: Logarithm of total documents divided by the number of documents containing the term.
 
-        #### Advantages:
-        - Reduces the weight of common words.
-        - Highlights unique and important words.
-
-        #### Example:
-        For the corpus:
-        - Doc1: "NLP is amazing."
-        - Doc2: "NLP is fun and amazing."
-
-        TF-IDF highlights words like "fun" and "amazing" over commonly occurring words like "is".
-
         #### Applications:
         - Search engines, information retrieval, and document classification.
         """
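The TF and IDF definitions kept by this hunk can be sanity-checked with a short stdlib-only sketch. The two-document toy corpus and the naive whitespace tokenization are assumptions for illustration, not part of the app:

```python
import math

# Toy corpus, lowercased and tokenized naively on whitespace.
docs = [
    "nlp is amazing".split(),
    "nlp is fun and amazing".split(),
]

def tf(term, doc):
    # Term frequency: count of the term divided by total terms in the document.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Inverse document frequency: log(total docs / docs containing the term).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "fun" appears in only one document, so it outweighs the ubiquitous "is".
print(tfidf("fun", docs[1], docs))  # ≈ 0.139 (log(2) / 5)
print(tfidf("is", docs[1], docs))   # 0.0 — "is" appears in every document
```

Real pipelines would use a library vectorizer (e.g. scikit-learn's `TfidfVectorizer`, which also applies smoothing and normalization), but the weighting idea is the same.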
@@ -166,19 +154,8 @@ def show_page(page):
         - **CBOW (Continuous Bag of Words)**: Predicts the target word from its context.
         - **Skip-gram**: Predicts the context from the target word.
 
-        #### Advantages:
-        - Captures semantic meaning (e.g., "king" - "man" + "woman" ≈ "queen").
-        - Efficient for large datasets.
-
-        #### Training Process:
-        - Uses shallow neural networks.
-        - Optimized using techniques like negative sampling.
-
         #### Applications:
         - Text classification, sentiment analysis, and recommendation systems.
-
-        #### Limitations:
-        - Requires significant computational resources.
         """
     )
     elif page == "fasttext":
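The CBOW/skip-gram distinction kept above comes down to how training pairs are built from a context window. A minimal stdlib sketch (the toy sentence and window size 2 are assumptions for illustration):

```python
tokens = "nlp is fun and amazing".split()
window = 2

skipgram_pairs = []  # (center word, one context word) per pair
cbow_pairs = []      # (all context words, center word) per pair
for i, center in enumerate(tokens):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    context = [tokens[j] for j in range(lo, hi) if j != i]
    cbow_pairs.append((context, center))            # context -> target
    skipgram_pairs.extend((center, c) for c in context)  # target -> context

print(cbow_pairs[2])  # (['nlp', 'is', 'and', 'amazing'], 'fun')
```

A trainable model (e.g. gensim's `Word2Vec`) feeds such pairs into a shallow neural network; this sketch only shows the pair construction that differs between the two architectures.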
@@ -189,19 +166,9 @@ def show_page(page):
 
         FastText is an extension of Word2Vec that represents words as a combination of character n-grams.
 
-        #### Advantages:
-        - Handles rare and out-of-vocabulary words.
-        - Captures subword information (e.g., prefixes and suffixes).
-
-        #### Example:
-        The word "playing" might be represented by n-grams like "pla", "lay", "ayi", "ing".
-
         #### Applications:
         - Multilingual text processing.
         - Handling noisy and incomplete data.
-
-        #### Limitations:
-        - Higher computational cost compared to Word2Vec.
         """
     )
     elif page == "tokenization":
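The character n-gram decomposition behind FastText, still stated in the surviving description, fits in a few lines. FastText itself pads words with `<` and `>` boundary markers and extracts a range of n-gram lengths (3–6 by default); this sketch simplifies to a single n:

```python
def char_ngrams(word, n=3):
    # FastText-style subword units: pad the word with boundary markers,
    # then slide a window of length n across it.
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("playing"))
# ['<pl', 'pla', 'lay', 'ayi', 'yin', 'ing', 'ng>']
```

A word's vector is then the sum of its n-gram vectors, which is why unseen words still get a usable representation.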
@@ -211,23 +178,6 @@ def show_page(page):
         ### Tokenization
 
         Tokenization is the process of breaking text into smaller units (tokens) such as words, phrases, or sentences.
-
-        #### Types of Tokenization:
-        - **Word Tokenization**: Splits text into words.
-        - **Sentence Tokenization**: Splits text into sentences.
-
-        #### Libraries for Tokenization:
-        - NLTK, SpaCy, and Hugging Face Transformers.
-
-        #### Example:
-        Sentence: "NLP is exciting."
-        - Word Tokens: ["NLP", "is", "exciting", "."]
-
-        #### Applications:
-        - Preprocessing for machine learning models.
-
-        #### Challenges:
-        - Handling complex text like abbreviations and multilingual data.
         """
     )
     elif page == "stop_words":
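The word-tokenization definition kept in this hunk can be illustrated with a regex tokenizer that splits off punctuation. The pattern is a deliberate simplification; production code would typically use NLTK, SpaCy, or a Hugging Face tokenizer, which handle abbreviations and multilingual text:

```python
import re

def word_tokenize(text):
    # Word tokens plus standalone punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(word_tokenize("NLP is exciting."))  # ['NLP', 'is', 'exciting', '.']
```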
@@ -237,26 +187,15 @@ def show_page(page):
         ### Stop Words
 
         Stop words are commonly used words in a language that are often removed during text preprocessing.
-
-        #### Examples of Stop Words:
-        - English: "is", "the", "and", "in".
-        - Spanish: "es", "el", "y", "en".
-
-        #### Why Remove Stop Words?
-        - To reduce noise in text data.
-
-        #### Applications:
-        - Sentiment analysis, text classification, and search engines.
-
-        #### Challenges:
-        - Some stop words might carry context-specific importance.
         """
     )
 
-
-
+# Initialize session state for page navigation
+if "page" not in st.session_state:
+    st.session_state["page"] = "home"
 
-
+# Show appropriate page
+if st.session_state["page"] == "home":
     show_home_page()
 else:
-    show_page(page)
+    show_page(st.session_state["page"])
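Stop-word removal, as defined in the surviving text, reduces to filtering tokens against a list. The four English words below are just the examples from this page, not a complete list; NLTK and SpaCy ship full stop-word lists per language:

```python
# Hand-picked illustrative stop-word list (assumption for this sketch).
STOP_WORDS = {"is", "the", "and", "in"}

def remove_stop_words(tokens):
    # Case-insensitive filter; keeps everything not in the list.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["NLP", "is", "fun", "and", "amazing"]))
# ['NLP', 'fun', 'amazing']
```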