Harika22 committed
Commit 0391467 · verified · 1 Parent(s): d8b57db

Update pages/5_Pre-procesing_of_text.py

Files changed (1)
  1. pages/5_Pre-procesing_of_text.py +16 -16
pages/5_Pre-procesing_of_text.py CHANGED
@@ -160,10 +160,10 @@ st.markdown(
  """
  <div class='info-box'>
  <ul>
- <li>🔹 A **Rule-based Algorithm** for stemming.</li>
  <li>🔹 It takes a particular word which have some rule.</li>
  <li>🔹 For a particular rule it'll going on removing suffix till it reaches 5th stage until the inflection is removed.</li>
- <li>🔹 Works **only for the English language**.</li>
  </ul>
  </div>
  """,
@@ -175,8 +175,8 @@ st.markdown(
  """
  <div class='info-box'>
  <ul>
- <li>🔹 An **advanced version of the Porter Stemmer**.</li>
- <li>🔹 Can be applied to **multiple languages**.</li>
  </ul>
  </div>
  """,
@@ -189,9 +189,9 @@ st.markdown(
  """
  <div class='info-box'>
  <ul>
- <li>🔹 An **Iterative Algorithm** for stemming.</li>
- <li>🔹 Removes suffixes in **multiple iterations**.</li>
- <li>⚠️ **More aggressive removal**, which might result in **non-English words**.</li>
  </ul>
  </div>
  """,
@@ -203,13 +203,13 @@ st.markdown("<h1 class='header-title'>📖 Lemmatization 🔎</h1>", unsafe_allo
  st.markdown(
  """
  <div class='info-box'>
- <p>📝 <span class='highlight'>Lemmatization</span> is the process of reducing an **inflected word** to its root form, known as the <span class='highlight'>lemma</span>.</p>
  <ul>
  <li>🔹 <span class='highlight'>Inflected word ➝ Root word (Lemma)</span></li>
- <li>✅ The **lemma is always an actual English word**.</li>
  <li>🐒 <span class='highlight'>Performance is slower</span> than stemming.</li>
- <li>🔍 **Both removal & dictionary-based checking** are performed.</li>
- <li>📝 **Used when we need to preserve grammar** in text.</li>
  </ul>
  </div>
  """,
@@ -222,12 +222,12 @@ st.markdown(
  """
  <div class='info-box'>
  <ul>
- <li>🔹 Takes an **inflected word** as input.</li>
- <li>🗄️ Searches in a **huge dictionary (WordNet)** containing millions of English words.</li>
- <li>🔄 **Iteratively removes suffixes** & checks:</li>
  <ul>
- <li>✔️ If it's an **actual English word**, it continues removing more suffixes.</li>
- <li>❌ If it's **not an English word**, the last valid root word is returned as the lemma.</li>
  </ul>
  </ul>
  </div>
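
The removed lines above wrap phrases in `**…**`. Markdown emphasis is generally not processed inside a raw HTML block such as `<div class='info-box'>`, so the asterisks would likely have shown up literally on the page, which may be why this commit drops them. If the emphasis is still wanted, one option is to convert it to HTML tags instead. The sketch below is an assumption and not part of the commit; `markdown_bold_to_html` is a hypothetical helper name.

```python
import re

# Matches **text** where text contains no asterisks.
BOLD = re.compile(r"\*\*([^*]+)\*\*")


def markdown_bold_to_html(fragment: str) -> str:
    """Replace Markdown **bold** spans with <strong> tags (hypothetical helper)."""
    return BOLD.sub(r"<strong>\1</strong>", fragment)


old_line = "<li>Works **only for the English language**.</li>"
print(markdown_bold_to_html(old_line))
# prints: <li>Works <strong>only for the English language</strong>.</li>
```

Lines without `**` pass through unchanged, so the helper could be applied to a whole HTML string before it is handed to `st.markdown`.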
 
  """
  <div class='info-box'>
  <ul>
+ <li>🔹 A Rule-based Algorithm for stemming.</li>
  <li>🔹 It takes a particular word which have some rule.</li>
  <li>🔹 For a particular rule it'll going on removing suffix till it reaches 5th stage until the inflection is removed.</li>
+ <li>🔹 Works only for the English language.</li>
  </ul>
  </div>
  """,
 
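The hunk above describes a rule-based stemmer that strips suffixes stage by stage. A minimal sketch of that idea follows. It is a toy: the rule table, the three stages, and the `MIN_STEM` threshold are illustrative assumptions, not the real Porter rules, and `toy_stem` is a hypothetical name.

```python
# Toy rule table: NOT the real Porter rules, only an illustration of
# "apply at most one suffix rule per stage, in a fixed order of stages".
STAGES = [
    [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")],  # stage 1: plurals
    [("ing", ""), ("ed", "")],                                # stage 2: verb endings
    [("ational", "ate"), ("izer", "ize")],                    # stage 3: derivational suffixes
]

MIN_STEM = 3  # assumed: never leave fewer than 3 characters before the suffix


def toy_stem(word: str) -> str:
    word = word.lower()
    for rules in STAGES:
        for suffix, replacement in rules:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: len(word) - len(suffix)] + replacement
                break  # at most one rule fires per stage
    return word


for w in ["caresses", "ponies", "jumping", "relational"]:
    print(w, "->", toy_stem(w))
```

With this table, `ponies` becomes `poni`, which shows that a stem produced by suffix rules alone need not be a dictionary word.
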
  """
  <div class='info-box'>
  <ul>
+ <li>🔹 An advanced version of the Porter Stemmer.</li>
+ <li>🔹 Can be applied to multiple languages.</li>
  </ul>
  </div>
  """,
 
  """
  <div class='info-box'>
  <ul>
+ <li>🔹 An Iterative Algorithm for stemming.</li>
+ <li>🔹 Removes suffixes in multiple iterations.</li>
+ <li>⚠️ More aggressive removal, which might result in non-English words.</li>
  </ul>
  </div>
  """,
 
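The iterative behaviour described in the hunk above can be sketched as a loop that keeps applying suffix rules until none matches. The rule list and the `MIN_STEM` threshold here are made up for illustration and are not the actual rule table of any published stemmer; `toy_iterative_stem` is a hypothetical name.

```python
# Toy rules, chosen only to illustrate repeated stripping.
RULES = [("ing", ""), ("ness", ""), ("ful", ""), ("ly", ""),
         ("al", ""), ("ion", ""), ("s", "")]

MIN_STEM = 3  # assumed: keep at least 3 characters before the suffix


def toy_iterative_stem(word: str) -> str:
    word = word.lower()
    changed = True
    while changed:  # repeat until a full pass strips nothing
        changed = False
        for suffix, replacement in RULES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: len(word) - len(suffix)] + replacement
                changed = True
                break  # restart the rule scan on the shortened word
    return word


print(toy_iterative_stem("hopefully"))  # prints: hope
print(toy_iterative_stem("nationals"))  # prints: nat
```

The second call strips `s`, then `al`, then `ion`, and ends at `nat`, a small instance of the aggressive removal the warning bullet mentions.
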
  st.markdown(
  """
  <div class='info-box'>
+ <p>📝 <span class='highlight'>Lemmatization</span> is the process of reducing an inflected word to its root form, known as the <span class='highlight'>lemma</span>.</p>
  <ul>
  <li>🔹 <span class='highlight'>Inflected word ➝ Root word (Lemma)</span></li>
+ <li>✅ The lemma is always an actual English word.</li>
  <li>🐒 <span class='highlight'>Performance is slower</span> than stemming.</li>
+ <li>🔍 Both removal & dictionary-based checking are performed.</li>
+ <li>📝 Used when we need to preserve grammar in text.</li>
  </ul>
  </div>
  """,
 
  """
  <div class='info-box'>
  <ul>
+ <li>🔹 Takes an inflected word as input.</li>
+ <li>🗄️ Searches in a huge dictionary (WordNet) containing millions of English words.</li>
+ <li>🔄 Iteratively removes suffixes & checks:</li>
  <ul>
+ <li>✔️ If it's an actual English word, it continues removing more suffixes.</li>
+ <li>❌ If it's not an English word, the last valid root word is returned as the lemma.</li>
  </ul>
  </ul>
  </div>
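
The dictionary-checked idea in this last hunk can be illustrated with a tiny stand-in vocabulary. Everything in the sketch is an assumption made for illustration: `VOCAB`, `EXCEPTIONS`, `SUFFIX_RULES`, and `toy_lemmatize` are hypothetical, the vocabulary is a six-word set rather than WordNet, and the single-pass lookup is a simplification rather than the procedure any real lemmatizer follows.

```python
# Stand-in for a real dictionary such as WordNet (assumed, tiny on purpose).
VOCAB = {"run", "study", "mouse", "box", "walk", "be"}

# Irregular forms that no suffix rule can recover.
EXCEPTIONS = {"mice": "mouse", "ran": "run", "was": "be"}

# Candidate suffix rewrites, tried in order.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]


def toy_lemmatize(word: str) -> str:
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in VOCAB:  # accept only real dictionary entries
                return candidate
    return word  # nothing validated: return the input unchanged


for w in ["studies", "boxes", "walking", "mice"]:
    print(w, "->", toy_lemmatize(w))
```

Because a candidate is returned only when it is found in the vocabulary, the result is either a known word or the untouched input, which is the property the "lemma is always an actual English word" bullet relies on.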