Harika22 committed
Commit 5f0db14 · verified · 1 Parent(s): 7b57fd0

Update pages/3_Terminology.py

Files changed (1)
  1. pages/3_Terminology.py +77 -12
pages/3_Terminology.py CHANGED
@@ -26,6 +26,47 @@ st.markdown(
         color: black
         animation: fadeIn 2s ease-in-out;
     }
+
+    /* Style for headers */
+    h2 {
+        color: violet;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 600;
+        margin-top: 30px;
+    }
+
+    /* Style for subheaders */
+    h3 {
+        color: green;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 500;
+        margin-top: 20px;
+    }
+    .custom-subheader {
+        color: #00FFFF;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 600;
+        margin-bottom: 15px;
+    }
+
+    .icon-bullet {
+        list-style-type: none;
+        padding-left: 20px;
+    }
+
+    .icon-bullet li {
+        font-family: 'Georgia', serif;
+        font-size: 1.1em;
+        margin-bottom: 10px;
+        color: black;
+    }
+
+    .icon-bullet li::before {
+        content: "◆";
+        padding-right: 10px;
+        color: black;
+    }
+
     .section {
         font-size: 1.1rem;
         text-align: justify;
@@ -40,7 +81,7 @@ st.markdown(
     }
     .term {
         font-weight: bold;
-        color: black
+        color: red;
         animation: fadeIn 3s ease-in-out;
     }
     .definition {
@@ -63,7 +104,7 @@ st.markdown(
 st.markdown(
     """
     <p class="section"><span class="term">Documents</span><br>
-    It is a collection of sentence / paragraph / single word / single character
+    A document is defined as a collection of sentences / paragraphs / single words / single characters.
     </p>
     """,
     unsafe_allow_html=True,
@@ -71,8 +112,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Stemming</span><br>
-    Stemming is the process of reducing words to their base or root form. For example, "running" becomes "run." It helps in reducing the complexity of text data by grouping similar words together.
+    <p class="section"><span class="term">Paragraph</span><br>
+    A paragraph is defined as a collection of sentences.
     </p>
     """,
     unsafe_allow_html=True,
@@ -80,8 +121,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Lemmatization</span><br>
-    Lemmatization is a more advanced form of stemming that reduces words to their base form by considering the context and meaning. For example, "better" becomes "good" based on its usage in a sentence.
+    <p class="section"><span class="term">Sentence</span><br>
+    A sentence is defined as a collection of words.
     </p>
     """,
     unsafe_allow_html=True,
@@ -89,8 +130,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Named Entity Recognition (NER)</span><br>
-    NER is the task of identifying and classifying named entities in text, such as person names, locations, organizations, and dates. This technique is useful in tasks like information retrieval and summarization.
+    <p class="section"><span class="term">Words</span><br>
+    A word is defined as a collection of characters.
     </p>
     """,
     unsafe_allow_html=True,
@@ -98,8 +139,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Part-of-Speech (POS) Tagging</span><br>
-    POS tagging involves labeling each word in a sentence with its grammatical category, such as noun, verb, or adjective. It helps in understanding the syntactic structure of the text.
+    <p class="section"><span class="term">Character</span><br>
+    A character can be a number, a letter of the alphabet, or a special symbol.
     </p>
     """,
     unsafe_allow_html=True,
@@ -107,12 +148,36 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Word Embeddings</span><br>
-    Word embeddings are numerical representations of words in a continuous vector space, where similar words are closer together. Common techniques include Word2Vec, GloVe, and FastText.
+    <p class="section"><span class="term">Tokenization</span><br>
+    Tokenization is a technique for splitting a large chunk of text into smaller units, known as tokens.
     </p>
     """,
     unsafe_allow_html=True,
 )
+st.header("Types of Tokenization")
+st.markdown("""
+<ul class="icon-bullet">
+    <li>Sentence tokenization</li>
+    <li>Word tokenization</li>
+    <li>Character tokenization</li>
+</ul>
+""", unsafe_allow_html=True)
+
+st.subheader("Sentence tokenization")
+st.markdown('''
+- Sentence tokenization splits a large chunk of text into tokens, where each token is a sentence.
+''')
+
+st.subheader("Word tokenization")
+st.markdown('''
+- Word tokenization splits a large chunk of text into tokens, where each token is a word.
+''')
+
+st.subheader("Character tokenization")
+st.markdown('''
+- Character tokenization splits a large chunk of text into tokens, where each token is a character.
+''')
+
 
 st.markdown(
     """