Update Basic_Terminologies.py
Basic_Terminologies.py +9 -9
@@ -47,8 +47,8 @@ st.markdown(
 st.markdown(
 """
 <h5>Before diving deep into the concepts of NLP we must know about the frequently used terminologies in NLP</h5>
-<h5 style="color: ##
-<ul style="color: #
 <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li>
 <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li>
 <li><b>Paragraph:</b> A collection of sentences.</li>
@@ -61,8 +61,8 @@ st.markdown(
 )
 st.markdown(
 """
-<h5 style="color: #
-<p style="color: #
 <h6>Types of Tokenization:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love ice-cream. I love chocolate." → ["I love ice-cream", "I love chocolate"]</li>
@@ -74,8 +74,8 @@ st.markdown(
 )
 st.markdown(
 """
-<h5 style="color: #
-<p style="color: #
 <h6>Example:</h6>
 <p style="color: #d4e6f1;">"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p>
 """,
@@ -84,7 +84,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">4. Vectorization</h5>
-<p style="color: #d4e6f1;">Vectorization
 <h6>Types of Vectorization:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>One-Hot Encoding:</b> Represents each word as a binary vector.</li>
@@ -100,7 +100,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">5. Stemming</h5>
-<p style="color: #d4e6f1;">Stemming
 <h6>Example:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Original Words:</b> "running", "runner", "runs"</li>
@@ -112,7 +112,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">6. Lemmatization</h5>
-<p style="color: #d4e6f1;">Lemmatization
 <h6>Example:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Original Words:</b> "studying", "better", "carrying"</li>
@@ -47,8 +47,8 @@ st.markdown(
 st.markdown(
 """
 <h5>Before diving deep into the concepts of NLP we must know about the frequently used terminologies in NLP</h5>
+<h5 style="color: #00FF00;">1. Key Terminologies in NLP</h5>
+<ul style="color: #008000; line-height: 1.8;">
 <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li>
 <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li>
 <li><b>Paragraph:</b> A collection of sentences.</li>
@@ -61,8 +61,8 @@ st.markdown(
 )
 st.markdown(
 """
+<h5 style="color: #00FFFF;">2. Tokenization</h5>
+<p style="color: #FFA500;">Tokenization is the process of breaking down a large piece of text into smaller units called tokens. These tokens can be words, sentences, or subwords, depending on the granularity required for the task.</p>
 <h6>Types of Tokenization:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love ice-cream. I love chocolate." → ["I love ice-cream", "I love chocolate"]</li>
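To make the sentence-level and word-level splitting described in this hunk concrete, here is a minimal sketch that uses only Python's standard `re` module. The helper names `sentence_tokenize` and `word_tokenize` are hypothetical and do not appear in Basic_Terminologies.py, and the regular expressions are simplifying assumptions that ignore abbreviations, decimals, and similar edge cases.

```python
import re

def sentence_tokenize(text):
    """Naively split text into sentences on '.', '!' or '?'."""
    parts = re.split(r"[.!?]+", text)
    # Drop empty pieces left over after the final punctuation mark.
    return [p.strip() for p in parts if p.strip()]

def word_tokenize(sentence):
    """Split a sentence into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)

sentences = sentence_tokenize("I love ice-cream. I love chocolate.")
words = word_tokenize(sentences[0])
print(sentences)
print(words)
```

A production tokenizer would normally come from an NLP library; this sketch only mirrors the example string shown in the diff.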
@@ -74,8 +74,8 @@ st.markdown(
 )
 st.markdown(
 """
+<h5 style="color: #008080;">3. Stop Words</h5>
+<p style="color: #000080;">Stop words are commonly used words in a language that carry little or no meaningful information for text analysis.</p>
 <h6>Example:</h6>
 <p style="color: #d4e6f1;">"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p>
 """,
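The stop-word example in this hunk can be reproduced with a short sketch. The `STOP_WORDS` set and the `split_stop_words` helper below are hypothetical, chosen only for illustration; real stop-word lists are much longer and vary by library and task.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"in", "we", "can", "the", "a", "an", "is", "of", "to"}

def split_stop_words(text):
    """Return (kept_tokens, removed_stop_words) for a piece of text."""
    # Lowercase first so that "In" matches the stop word "in".
    tokens = re.findall(r"[a-z]+", text.lower())
    removed = [t for t in tokens if t in STOP_WORDS]
    kept = [t for t in tokens if t not in STOP_WORDS]
    return kept, removed

kept, removed = split_stop_words("In Hyderabad, we can eat famous biryani.")
print(removed)
print(kept)
```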
@@ -84,7 +84,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">4. Vectorization</h5>
+<p style="color: #d4e6f1;">Vectorization is the process of converting text data into numerical representations so that machine learning models can process and analyze it.</p>
 <h6>Types of Vectorization:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>One-Hot Encoding:</b> Represents each word as a binary vector.</li>
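As a rough illustration of the one-hot encoding mentioned in this hunk, the sketch below builds a vocabulary from a token list and assigns each distinct word a binary vector with a single 1. The function name `one_hot_encode` and the sorted-vocabulary ordering are assumptions made for this example, not something taken from the changed file.

```python
def one_hot_encode(tokens):
    """Map each distinct token to a binary vector containing exactly one 1."""
    vocab = sorted(set(tokens))  # fixed, reproducible word order
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {}
    for word in vocab:
        vec = [0] * len(vocab)
        vec[index[word]] = 1
        vectors[word] = vec
    return vocab, vectors

vocab, vectors = one_hot_encode(
    ["i", "love", "ice-cream", "i", "love", "chocolate"]
)
print(vocab)
print(vectors["love"])
```

Each vector is as long as the vocabulary, which is why one-hot encoding becomes sparse and memory-hungry as the vocabulary grows.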
@@ -100,7 +100,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">5. Stemming</h5>
+<p style="color: #d4e6f1;">Stemming is a rule-based, heuristic process that reduces words to a base form by stripping affixes, usually suffixes. The resulting stem is not always a valid word.</p>
 <h6>Example:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Original Words:</b> "running", "runner", "runs"</li>
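The rule-based idea behind stemming can be sketched with a toy suffix stripper. This is not the Porter algorithm or any library implementation; the `SUFFIXES` tuple, the minimum stem length, and the `toy_stem` helper are all assumptions chosen so that the example words from this hunk reduce to a common stem.

```python
# Suffixes are tried in this order; the list is deliberately tiny.
SUFFIXES = ("ing", "er", "s")

def toy_stem(word):
    """Strip one known suffix, then collapse a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "runn" -> "run": drop a repeated trailing consonant.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

stems = [toy_stem(w) for w in ["running", "runner", "runs"]]
print(stems)
print(toy_stem("studies"))
```

The last call yields a stem that is not a dictionary word, which is the typical trade-off of heuristic suffix stripping.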
@@ -112,7 +112,7 @@ st.markdown(
 st.markdown(
 """
 <h5 style="color: #20B2AA;">6. Lemmatization</h5>
+<p style="color: #d4e6f1;">Lemmatization is the process of reducing a word to its base or root form (called a lemma) using linguistic rules and a vocabulary (dictionary). Unlike stemming, lemmatization ensures that the resulting word is a valid word in the language.</p>
 <h6>Example:</h6>
 <ul style="color: #d4e6f1; line-height: 1.8;">
 <li><b>Original Words:</b> "studying", "better", "carrying"</li>
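The dictionary-driven nature of lemmatization can be sketched as a lookup keyed on the word and its part of speech. The `LEMMA_LEXICON` table and the `toy_lemmatize` helper are hypothetical and cover only the three example words from this hunk; a real lemmatizer relies on a full lexicon and morphological rules.

```python
# Hypothetical mini-lexicon: (word, part of speech) -> lemma.
LEMMA_LEXICON = {
    ("studying", "verb"): "study",
    ("carrying", "verb"): "carry",
    ("better", "adj"): "good",
}

def toy_lemmatize(word, pos):
    """Look up the lemma; fall back to the lowercased word if unknown."""
    key = (word.lower(), pos)
    return LEMMA_LEXICON.get(key, word.lower())

lemmas = [
    toy_lemmatize("studying", "verb"),
    toy_lemmatize("better", "adj"),
    toy_lemmatize("carrying", "verb"),
]
print(lemmas)
```

Unlike suffix stripping, a lookup can map "better" to "good", because the result comes from vocabulary knowledge and not from the surface form of the word.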