Spaces: Harika22/Natural_Language_Processing
Harika22 committed on Feb 1, 2025

Commit 1b03fe7 (verified) · 1 Parent(s): d32aaba

Update pages/5_Pre-procesing_of_text.py

Files changed (1)

pages/5_Pre-procesing_of_text.py CHANGED

@@ -90,29 +90,7 @@ st.markdown('''
 ''')
-st.subheader(":red[Data Pre-processing]")
-st.markdown(
-    '''
-    <div class='section'>
-        Converts raw data into pre-processed data
-         which has 2 benefits:
-         Reduce the dimensionality ---> to increase the performance of ML
-         Raw data - preprocessed data ---> required by the problem statement
-        <ul>
-            <li><b>Converting into particular case</b>So that highly we can reduce the dimensionalty,if the problem statement says that grammar should be preserved then no need of conversion</li>
-            <li><b>Removing URL's / tags/mails/mentions</b>Converting or preserving information should be based on the problem statement</li>
-            <li><b>Handling Emoji's</b>Emoji's data should be preserved</li>
-            <li><b>Contractions and acronyms</b>Both the contractions and acronyms should be converted into general text</li>
-            <li><b>Stop Words</b>Stop words make the grammar very clear</li>
-            <li><b>Stemming and Lemmatization</b>Both are purely based on problem statement and if problem statement wants grammatical concept don't perform stemming</li>
-        </ul>
-    </div>
-    ''',
-    unsafe_allow_html=True,
-)
 st.markdown(
     """
@@ -121,3 +99,20 @@ st.markdown(
     unsafe_allow_html=True,
 )

+st.markdown("<div class='section'>", unsafe_allow_html=True)
+st.markdown("<h2 class='title'>🔍 NLP Data Preprocessing</h2>", unsafe_allow_html=True)
+st.markdown("<p class='subtitle'>Transforming raw text into structured data for better ML performance</p>", unsafe_allow_html=True)
+st.success("📌 **Benefits of Preprocessing:**\n\n✅ Reduces dimensionality\n\n✅ Improves ML performance\n\n✅ Converts raw text into problem-specific structured data")
+st.markdown("### ✨ **Essential Preprocessing Steps:**")
+st.markdown("✅ **Converting Text Case** – Reduces dimensionality; case conversion depends on problem statement.")
+st.markdown("✅ **Removing URLs, Tags, and Mentions** – Retain only if required by the problem statement.")
+st.markdown("✅ **Handling Emojis** – Preserve or convert emoji data based on context.")
+st.markdown("✅ **Expanding Contractions & Acronyms** – Convert abbreviations into standard text.")
+st.markdown("✅ **Stop Words Removal** – Optional, useful for text simplification.")
+st.markdown("✅ **Stemming & Lemmatization** – Perform only if grammar is **not** crucial for analysis.")
+st.markdown("</div>", unsafe_allow_html=True)
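
The preprocessing steps listed in the added lines can be sketched in plain Python. This is a minimal illustration and not code from this Space: the `preprocess` function, its flags, and the tiny `CONTRACTIONS`, `STOP_WORDS`, and `EMOJI_NAMES` tables are all hypothetical, and a real project would likely use fuller resources. Stemming and lemmatization are left out because they normally need an external library.

```python
import re

# Hypothetical, deliberately tiny lookup tables (illustration only).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOP_WORDS = {"the", "is", "a", "an", "to", "it", "and"}
EMOJI_NAMES = {"👍": "thumbs_up", "😀": "grinning_face"}

TAG_RE = re.compile(r"<[^>]+>")                        # HTML-style tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")          # URLs
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")  # e-mail addresses
MENTION_RE = re.compile(r"@\w+")                       # @mentions


def preprocess(text, lowercase=True, remove_stop_words=True):
    """Return a list of cleaned tokens for `text`."""
    # Strip tags, URLs, e-mails, then mentions. E-mails go before
    # mentions so that "me@example.com" is removed as a whole.
    for pattern in (TAG_RE, URL_RE, EMAIL_RE, MENTION_RE):
        text = pattern.sub(" ", text)

    # Convert known emojis to a text label instead of dropping them.
    for emoji, name in EMOJI_NAMES.items():
        text = text.replace(emoji, f" {name} ")

    # Case conversion is optional: skip it when casing carries meaning.
    if lowercase:
        text = text.lower()

    tokens = re.findall(r"[\w']+", text)

    # Expand contractions token by token (expansions are lowercase).
    expanded = []
    for token in tokens:
        expanded.extend(CONTRACTIONS.get(token.lower(), token).split())

    # Stop-word removal is optional as well.
    if remove_stop_words:
        expanded = [t for t in expanded if t.lower() not in STOP_WORDS]
    return expanded


if __name__ == "__main__":
    print(preprocess("<b>Don't</b> miss it: https://example.com @user 👍"))
```

Both flags default to the dimensionality-reducing behaviour; pass `lowercase=False` or `remove_stop_words=False` when the problem statement needs casing or grammar preserved.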