Spaces: Harika22 / Natural_Language_Processing

Harika22 committed on Feb 1, 2025

Commit b8713da · verified · 1 Parent(s): 419554f

Update pages/5_Pre-procesing_of_text.py

Files changed (1)

pages/5_Pre-procesing_of_text.py CHANGED

@@ -59,16 +59,20 @@ st.header(":blue[Pre-processing of Text🗺️]")
 st.markdown(
     '''
     <div class='section'>
-        We will convert raw data into pre-processed data in 3 ways
-            - **Cleaning** ---> which is based on the problem statement
-            - **Simple pr-processing**
-            - **Advance pre-processing**
+        We will convert raw data into pre-processed data in 3 ways:
+            - Cleaning ---> which is based on the problem statement
+            - Simple pre-processing
+            - Advance pre-processing
     </div>
     ''',
     unsafe_allow_html=True,
 )
 st.markdown('''
 - Take a raw text and convert every character and word into single case
     - either upper case
     - or lower case
@@ -91,7 +95,8 @@ st.markdown(
     '''
     <div class='section'>
         Converts raw data into pre-processed data
-            - which has 2 benefits
+            - which has 2 benefits:
             - Reduce the dimensionality ---> to increase the performance of ML
@@ -101,8 +106,8 @@ st.markdown(
             <li><b>Removing URL's / tags/mails/mentions</b> Converting or preserving information should be based on the problem statement</li>
             <li><b>Handling Emoji's</b> Emoji's data should be preserved</li>
             <li><b>Contractions and acronyms</b>Both the contractions and acronyms should be converted into general text</li>
-            <li><b>Stop Words</b> Stop words make the grammar very clear
-            <li><b>Stemming and Lemmatization</b>Both are purely based on problm statement and if problem statement wants grammatical concept don't perform stemming</li></li>
+            <li><b>Stop Words</b> Stop words make the grammar very clear</li>
+            <li><b>Stemming and Lemmatization</b>Both are purely based on problm statement and if problem statement wants grammatical concept don't perform stemming</li>
         </ul>
     </div>
     ''',
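
The page text in this diff describes several pre-processing steps: case conversion, removing URLs, tags, mails and mentions, preserving emoji, expanding contractions, and optional stop-word removal. The following is a minimal sketch of how those steps might be combined, not code from this commit. The function name `preprocess`, the regular expressions, and the tiny `CONTRACTIONS` and `STOP_WORDS` tables are all hypothetical and chosen only for illustration; a real project would likely use larger, problem-specific resources.

```python
import re

# Hypothetical patterns and lookup tables, for illustration only.
TAG_RE = re.compile(r"<[^>]+>")                         # HTML-style tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")           # URLs
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")  # mail addresses
MENTION_RE = re.compile(r"@\w+")                        # @mentions
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and"}


def preprocess(text, remove_stop_words=False):
    """Lowercase, strip tags/URLs/mails/mentions, expand contractions."""
    # Convert everything to a single case (lower case here).
    text = text.lower()
    # Mails are removed before mentions so "@domain" is not cut out of them.
    for pattern in (TAG_RE, URL_RE, EMAIL_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    # Expand known contractions; emoji and other tokens pass through as-is.
    tokens = [CONTRACTIONS.get(tok, tok) for tok in text.split()]
    tokens = " ".join(tokens).split()
    # Stop-word removal is optional because it depends on the problem statement.
    if remove_stop_words:
        tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("Don't email ME@mail.com or visit https://example.com <b>now</b> @bob"))
```

Stop-word removal is left off by default, since the page notes that whether to drop or keep information depends on the problem statement. Stemming and lemmatization are omitted here for the same reason.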