Pasham123 committed on
Commit 8fe91ee · verified · 1 Parent(s): bac3726

Update Basic_Terminologies.py

Files changed (1)
  1. Basic_Terminologies.py +9 -9
Basic_Terminologies.py CHANGED
@@ -47,8 +47,8 @@ st.markdown(
47
  st.markdown(
48
  """
49
  <h5>Before diving deep into the concepts of NLP we must know about the frequently used terminologies in NLP</h5>
50
- <h5 style="color: ##20B2AA;">1.Key Terminologies in NLP</h5>
51
- <ul style="color: #d4e6f1; line-height: 1.8;">
52
  <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li>
53
  <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li>
54
  <li><b>Paragraph:</b> A collection of sentences.</li>
@@ -61,8 +61,8 @@ st.markdown(
61
  )
62
  st.markdown(
63
  """
64
- <h5 style="color: #20B2AA;">2.Tokenization</h5>
65
- <p style="color: #d4e6f1;">Tokenization is the process of splitting text into smaller units, called tokens.</p>
66
  <h6>Types of Tokenization:</h6>
67
  <ul style="color: #d4e6f1; line-height: 1.8;">
68
  <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love ice-cream. I love chocolate." → ["I love ice-cream", "I love chocolate"]</li>
@@ -74,8 +74,8 @@ st.markdown(
74
  )
75
  st.markdown(
76
  """
77
- <h5 style="color: #20B2AA;">3.Stop Words</h5>
78
- <p style="color: #d4e6f1;">Stop words are commonly used words in a language that are ignored during text processing as they contribute little to the overall meaning.</p>
79
  <h6>Example:</h6>
80
  <p style="color: #d4e6f1;">"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p>
81
  """,
@@ -84,7 +84,7 @@ st.markdown(
84
  st.markdown(
85
  """
86
  <h5 style="color: #20B2AA;">4.Vectorization</h5>
87
- <p style="color: #d4e6f1;">Vectorization converts text data into numerical format for machine learning models. It enables text processing and analysis.</p>
88
  <h6>Types of Vectorization:</h6>
89
  <ul style="color: #d4e6f1; line-height: 1.8;">
90
  <li><b>One-Hot Encoding:</b> Represents each word as a binary vector.</li>
@@ -100,7 +100,7 @@ st.markdown(
100
  st.markdown(
101
  """
102
  <h5 style="color: #20B2AA;">5. Stemming</h5>
103
- <p style="color: #d4e6f1;">Stemming reduces words to their base or root form by chopping off prefixes or suffixes. It is a rule-based heuristic process and can produce words that may not be valid in the language.</p>
104
  <h6>Example:</h6>
105
  <ul style="color: #d4e6f1; line-height: 1.8;">
106
  <li><b>Original Words:</b> "running", "runner", "runs"</li>
@@ -112,7 +112,7 @@ st.markdown(
112
  st.markdown(
113
  """
114
  <h5 style="color: #20B2AA;">6. Lemmatization</h5>
115
- <p style="color: #d4e6f1;">Lemmatization reduces words to their dictionary or base form, called a lemma, while considering the context of the word in a sentence.</p>
116
  <h6>Example:</h6>
117
  <ul style="color: #d4e6f1; line-height: 1.8;">
118
  <li><b>Original Words:</b> "studying", "better", "carrying"</li>
 
47
  st.markdown(
48
  """
49
  <h5>Before diving deep into the concepts of NLP we must know about the frequently used terminologies in NLP</h5>
50
+ <h5 style="color: #00FF00;">1.Key Terminologies in NLP</h5>
51
+ <ul style="color: #008000; line-height: 1.8;">
52
  <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li>
53
  <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li>
54
  <li><b>Paragraph:</b> A collection of sentences.</li>
 
61
  )
62
  st.markdown(
63
  """
64
+ <h5 style="color: #00FFFF;">2.Tokenization</h5>
65
+ <p style="color: #FFA500;">Tokenization is the process of breaking down a large piece of text into smaller units called tokens. These tokens can be words, sentences, or subwords, depending on the granularity required for the task.</p>
66
  <h6>Types of Tokenization:</h6>
67
  <ul style="color: #d4e6f1; line-height: 1.8;">
68
  <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love ice-cream. I love chocolate." → ["I love ice-cream", "I love chocolate"]</li>
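
The sentence-tokenization example in this hunk can be reproduced with standard-library Python. This is only a minimal sketch: `sentence_tokenize` and `word_tokenize` are hypothetical helper names that do not appear in Basic_Terminologies.py, and the regex rules are deliberately naive (for instance, they would mis-split abbreviations such as "e.g.").

```python
import re
from typing import List


def sentence_tokenize(text: str) -> List[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    # Drop the empty string left over after the final terminator.
    return [p.strip() for p in parts if p.strip()]


def word_tokenize(text: str) -> List[str]:
    """Return alphanumeric runs, keeping internal hyphens and apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


print(sentence_tokenize("I love ice-cream. I love chocolate."))
# ['I love ice-cream', 'I love chocolate']
print(word_tokenize("I love ice-cream."))
# ['I', 'love', 'ice-cream']
```

A production tokenizer would normally come from an NLP library; the sketch only shows the granularity difference between sentence-level and word-level tokens.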
 
74
  )
75
  st.markdown(
76
  """
77
+ <h5 style="color: #008080;">3.Stop Words</h5>
78
+ <p style="color: #000080;">Stop words are commonly used words in a language that carry little or no meaningful information for text analysis, so they are often filtered out before further processing.</p>
79
  <h6>Example:</h6>
80
  <p style="color: #d4e6f1;">"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p>
81
  """,
 
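The Hyderabad example in this hunk can be checked with a short filter. A minimal sketch, assuming a tiny hand-written stop-word set: `STOP_WORDS` and `split_stop_words` are hypothetical names, and real stop-word lists are much longer and language-specific.

```python
import re

# Tiny illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"in", "we", "can", "the", "a", "an", "is", "of", "to", "and"}


def split_stop_words(text):
    """Lowercase the text, split it into words, and separate stop words from the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    removed = [t for t in tokens if t in STOP_WORDS]
    return kept, removed


kept, removed = split_stop_words("In Hyderabad, we can eat famous biryani.")
print(removed)  # ['in', 'we', 'can']
print(kept)     # ['hyderabad', 'eat', 'famous', 'biryani']
```
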
84
  st.markdown(
85
  """
86
  <h5 style="color: #20B2AA;">4.Vectorization</h5>
87
+ <p style="color: #d4e6f1;">Vectorization is the process of converting text data into numerical representations so that machine learning models can process and analyze it.</p>
88
  <h6>Types of Vectorization:</h6>
89
  <ul style="color: #d4e6f1; line-height: 1.8;">
90
  <li><b>One-Hot Encoding:</b> Represents each word as a binary vector.</li>
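
To make the One-Hot Encoding bullet concrete, here is a minimal sketch. It assumes the vocabulary is built from the input tokens themselves and sorted alphabetically; `one_hot_encode` is a hypothetical name, not part of the file under review.

```python
def one_hot_encode(tokens):
    """Return (vocabulary, vectors): one binary vector per token."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for token in tokens:
        vector = [0] * len(vocab)
        vector[index[token]] = 1  # exactly one position is set per word
        vectors.append(vector)
    return vocab, vectors


vocab, vectors = one_hot_encode(["i", "love", "chocolate"])
print(vocab)    # ['chocolate', 'i', 'love']
print(vectors)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each vector is as long as the vocabulary, which is why one-hot encoding becomes sparse and expensive for large corpora.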
 
100
  st.markdown(
101
  """
102
  <h5 style="color: #20B2AA;">5. Stemming</h5>
103
+ <p style="color: #d4e6f1;">Stemming reduces words to a base or root form by stripping affixes, most often suffixes. It is a rule-based, heuristic approach, so the resulting stem may not be a valid word in the language.</p>
104
  <h6>Example:</h6>
105
  <ul style="color: #d4e6f1; line-height: 1.8;">
106
  <li><b>Original Words:</b> "running", "runner", "runs"</li>
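
The rule-based nature of stemming can be illustrated with a toy suffix stripper. This is a sketch under stated assumptions, not the Porter algorithm or any library's stemmer: `toy_stem`, its suffix list, and its consonant-undoubling rule are all hypothetical, and established stemmers may produce different stems for the same words.

```python
SUFFIXES = ("ing", "er", "es", "s")  # tried in this order (illustrative rule set)
VOWELS = "aeiou"


def toy_stem(word):
    """Strip the first matching suffix, then undo consonant doubling."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # "runn" -> "run"; doubled l, s and z are left alone.
            if stem[-1] == stem[-2] and stem[-1] not in VOWELS + "lsz":
                stem = stem[:-1]
            return stem
    return word


print([toy_stem(w) for w in ["running", "runner", "runs"]])  # ['run', 'run', 'run']
print(toy_stem("studies"))  # 'studi'
```

The last call shows the usual trade-off: the rules are cheap to apply, but the output need not be a dictionary word.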
 
112
  st.markdown(
113
  """
114
  <h5 style="color: #20B2AA;">6. Lemmatization</h5>
115
+ <p style="color: #d4e6f1;">Lemmatization reduces a word to its dictionary form (called a lemma) using linguistic rules and a vocabulary (dictionary). Unlike stemming, lemmatization returns a valid word in the language.</p>
116
  <h6>Example:</h6>
117
  <ul style="color: #d4e6f1; line-height: 1.8;">
118
  <li><b>Original Words:</b> "studying", "better", "carrying"</li>
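
Lemmatization relies on a vocabulary lookup rather than suffix rules, which a small table can illustrate. A minimal sketch, assuming a hand-written lookup keyed by word and part of speech: `LEMMA_TABLE` and `toy_lemmatize` are hypothetical, and a real lemmatizer would use a full dictionary and morphological analysis.

```python
# Hypothetical (word, part-of-speech) -> lemma table; "v" = verb, "a" = adjective.
LEMMA_TABLE = {
    ("studying", "v"): "study",
    ("carrying", "v"): "carry",
    ("better", "a"): "good",
}


def toy_lemmatize(word, pos):
    """Look up the lemma; fall back to the lowercased word when it is unknown."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)


print(toy_lemmatize("studying", "v"))  # 'study'
print(toy_lemmatize("better", "a"))    # 'good'
print(toy_lemmatize("carrying", "v"))  # 'carry'
```

The part-of-speech argument matters: "better" maps to "good" only when it is treated as an adjective, and in this sketch it is returned unchanged for any other tag.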