Harika22 committed on
Commit
bcbd8c9
·
verified ·
1 Parent(s): 628ffb3

Update pages/3_Terminology.py

Browse files
Files changed (1) hide show
  1. pages/3_Terminology.py +45 -84
pages/3_Terminology.py CHANGED
@@ -100,61 +100,37 @@ st.markdown(
100
  "<p class='caption'>Explore essential terms in Natural Language Processing and their meanings!...</p>",
101
  unsafe_allow_html=True,
102
  )
 
 
 
 
103
 
104
- st.markdown(
105
- """
106
- <p class="section"><span class="term">Documents</span><br>
107
- Document is defined as collection of sentence / paragraph / single word / single character
108
- </p>
109
- """,
110
- unsafe_allow_html=True,
111
- )
112
 
113
- st.markdown(
114
- """
115
- <p class="section"><span class="term">Paragraph</span><br>
116
- Paragraph is defined as collection of sentence.
117
- </p>
118
- """,
119
- unsafe_allow_html=True,
120
- )
121
 
122
- st.markdown(
123
- """
124
- <p class="section"><span class="term">Sentence</span><br>
125
- Sentence is defined as collection of words.
126
- </p>
127
- """,
128
- unsafe_allow_html=True,
129
- )
130
 
131
- st.markdown(
132
- """
133
- <p class="section"><span class="term">Words</span><br>
134
- Words are defined as collection of characters
135
- </p>
136
- """,
137
- unsafe_allow_html=True,
138
- )
139
 
140
- st.markdown(
141
- """
142
- <p class="section"><span class="term">Character</span><br>
143
- Character can either be in number , alphabets or special symbol.
144
- </p>
145
- """,
146
- unsafe_allow_html=True,
147
- )
148
 
149
- st.markdown(
150
- """
151
- <p class="section"><span class="term">Tokenization</span><br>
152
- It is a technique by using which we can convert a huge chunk into small entity where those small entities are known as tokens.
153
- </p>
154
- """,
155
- unsafe_allow_html=True,
156
- )
157
- st.header("Types of Tokenization")
158
  st.markdown("""
159
  <ul class="icon-bullet">
160
  <li>Sentence tokenization</li>
@@ -178,40 +154,25 @@ st.markdown('''
178
  - It is a technique by using which we can convert a huge chunk into small entity where those small entities are known as tokens which are in characters.
179
  ''')
180
 
 
 
 
 
 
181
 
182
- st.markdown(
183
- """
184
- <p class="section"><span class="term">Bag-of-Words (BoW)</span><br>
185
- Bag-of-Words is a simple representation of text data where each word is treated as a feature. The order of words is ignored, and the text is represented by a frequency count of words in the document.
186
- </p>
187
- """,
188
- unsafe_allow_html=True,
189
- )
190
-
191
- st.markdown(
192
- """
193
- <p class="section"><span class="term">TF-IDF (Term Frequency - Inverse Document Frequency)</span><br>
194
- TF-IDF is a statistic used to evaluate the importance of a word in a document relative to all other documents. It balances the frequency of a word in a document with its rarity across the entire dataset.
195
- </p>
196
- """,
197
- unsafe_allow_html=True,
198
- )
199
-
200
- st.markdown(
201
- """
202
- <p class="section"><span class="term">Sentiment Analysis</span><br>
203
- Sentiment Analysis is the task of determining the sentiment or opinion expressed in text. It is often used to analyze social media posts, customer feedback, and reviews to gauge public opinion.
204
- </p>
205
- """,
206
- unsafe_allow_html=True,
207
- )
208
-
209
- st.markdown(
210
- """
211
- <p class="section"><span class="term">Language Model</span><br>
212
- A language model predicts the probability of a sequence of words occurring in a sentence. Popular models include GPT, BERT, and LSTM, which help in text generation, translation, and summarization tasks.
213
- </p>
214
- """,
215
- unsafe_allow_html=True,
216
- )
217
 
 
 
 
 
 
 
 
 
 
 
 
 
100
  "<p class='caption'>Explore essential terms in Natural Language Processing and their meanings!...</p>",
101
  unsafe_allow_html=True,
102
  )
103
+ st.header("Document")
104
+ st.markdown('''
105
+ - A document is defined as a collection of sentences / paragraphs / single words / single characters
106
+ ''')
107
 
108
+ st.header("Paragraph")
109
+ st.markdown('''
110
+ - A paragraph is defined as a collection of sentences.
111
+ ''')
 
 
 
 
112
 
113
+ st.header("Sentence")
114
+ st.markdown('''
115
+ - A sentence is defined as a collection of words.
116
+ ''')
 
 
 
 
117
 
118
+ st.header("Word")
119
+ st.markdown('''
120
+ - Words are defined as collections of characters.
121
+ ''')
 
 
 
 
122
 
123
+ st.header("Character")
124
+ st.markdown('''
125
+ - A character can be a number, an alphabet letter, or a special symbol.
126
+ ''')
 
 
 
 
127
 
128
+ st.header("Tokenization")
129
+ st.markdown('''
130
+ - It is a technique by which we can convert a large chunk of text into small entities, where those small entities are known as tokens.
131
+ ''')
 
 
 
 
132
 
133
+ st.subheader("Types of Tokenization")
 
 
 
 
 
 
 
 
134
  st.markdown("""
135
  <ul class="icon-bullet">
136
  <li>Sentence tokenization</li>
 
154
  - It is a technique by using which we can convert a huge chunk into small entity where those small entities are known as tokens which are in characters.
155
  ''')
156
 
157
+ st.header("Stop Words")
158
+ st.markdown('''
159
+ - They are a set of words that do not have an impact on the meaning of a sentence / paragraph
160
+ - Stop words are used to make the grammar very clear
161
+ ''')
162
 
163
+ st.header("Vectorization")
164
+ st.markdown('''
165
+ - It is a technique that helps us convert text into vector format
166
+ ''')
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
167
 
168
+ st.subheader("Different types of techniques")
169
+ st.markdown("""
170
+ <ul class="icon-bullet">
171
+ <li>One-Hot Vectorization </li>
172
+ <li>Bag of Words</li>
173
+ <li>TF-IDF (Term Frequency and Inverse Document Frequency)</li>
174
+ <li>Word2Vector</li>
175
+ <li>Glove</li>
176
+ <li>Fast text</li>
177
+ </ul>
178
+ """, unsafe_allow_html=True)