Spaces: Harika22/Natural_Language_Processing

Harika22 committed on Feb 1, 2025

Commit 0d400f0 · verified · 1 Parent(s): 1b03fe7

Update pages/5_Pre-procesing_of_text.py

Files changed (1)

pages/5_Pre-procesing_of_text.py CHANGED
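The preprocessing rules listed on this page (single case, URLs and tags, mentions/digits/emails, emojis, punctuation) can be sketched as one small Python function. This is a minimal sketch, not the page's own code: the function name `preprocess`, its keyword flags, and the regex patterns are hypothetical stand-ins for "based on the problem statement", "tags" is assumed to mean HTML tags, and the patterns are rough approximations rather than full URL or email validators.

```python
import re
import string

# Hypothetical, simplified patterns: approximations, not full validators.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")  # assumes "tags" means HTML tags
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(
    text: str,
    *,
    lowercase: bool = True,
    keep_urls_and_tags: bool = False,
    keep_mentions_digits_emails: bool = False,
    keep_punctuation: bool = False,
) -> str:
    """Apply the simple steps; the flags stand in for the problem statement."""
    if lowercase:
        text = text.lower()
    if not keep_urls_and_tags:
        text = URL_RE.sub(" ", text)
        text = HTML_TAG_RE.sub(" ", text)
    if not keep_mentions_digits_emails:
        # Emails go before mentions; otherwise "@mail" inside an address
        # would be stripped as a mention, leaving fragments like "a.b .com".
        text = EMAIL_RE.sub(" ", text)
        text = MENTION_RE.sub(" ", text)
        text = DIGIT_RE.sub(" ", text)
    if not keep_punctuation:
        text = text.translate(PUNCT_TABLE)
    # No pattern targets emojis, so they pass through unchanged.
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


# Case conversion shrinks the vocabulary, i.e. the number of columns.
docs = ["Apple is tasty", "apple pie", "APPLE stock"]
raw_vocab = {w for d in docs for w in d.split()}
low_vocab = {w for d in docs for w in d.lower().split()}
print(len(raw_vocab), len(low_vocab))  # 7 5

sample = "Check https://example.com NOW!! 😀 Mail me at a.b@mail.com or ping @harika <b>2 days</b>"
print(preprocess(sample))  # check now 😀 mail me at or ping days
```

The defaults mirror the removals the page describes as typical; set a flag to `True` when the problem statement says to preserve that kind of data. The flags interact: keeping URLs while still stripping punctuation mangles them (`https://example.com` becomes `httpsexamplecom`), so a real pipeline would have to protect preserved URLs before removing punctuation.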

@@ -54,43 +54,26 @@ st.markdown(
     """,
     unsafe_allow_html=True,
 )
-st.header(":blue[Pre-processing of Text🗺️]")
-st.markdown(
-    '''
-    <div class='section'>
-        We will convert raw data into pre-processed data in 3 ways:
-        Cleaning - which is based on the problem statement
-        Simple pre-processing
-        Advanced pre-processing
-    </div>
-    ''',
-    unsafe_allow_html=True,
-)
-st.markdown('''
-- Take the raw text and convert every character to a single case
-    - either upper case
-    - or lower case
-    - based on the problem statement
-    - because ML needs tabular data where every column is a dimension, and ML performance decreases as dimensionality increases
-- Handle URLs and tags based on the problem statement as well
-    - if the problem statement says to preserve the data, we shouldn't remove those URLs and tags
-- Mentions, digits, and emails can be removed
-- Emojis, however, shouldn't be removed: nowadays they carry key information, so we keep them to preserve it
-- When the problem statement says to preserve grammar, punctuation shouldn't be removed
-''')
 st.markdown(
     """

     """,
     unsafe_allow_html=True,
 )
+st.header(":blue[✨ Pre-processing of Text 🗺️]")
+st.markdown("<div class='section'>", unsafe_allow_html=True)
+st.markdown("<h2 class='title'>🔍 Transforming Raw Text</h2>", unsafe_allow_html=True)
+st.markdown("<p class='subtitle'>Convert unstructured text into a clean and structured format</p>", unsafe_allow_html=True)
+st.info("📌 **We preprocess text in three key ways:**\n\n✅ Cleaning - Problem-specific\n\n✅ Simple Pre-processing\n\n✅ Advanced Pre-processing")
+st.markdown("</div>", unsafe_allow_html=True)
+st.markdown("### ✨ **Essential Preprocessing Techniques:**")
+st.markdown("✅ **Convert Text Case** – Convert all words to **uppercase** or **lowercase** so that variants such as 'Apple' and 'apple' count as one token, keeping the text consistent and reducing dimensionality.")
+st.markdown("✅ **Handle URLs and Tags** – Depending on the problem statement, either remove or preserve them.")
+st.markdown("✅ **Mentions, Digits, Emails** – Generally removed unless required by the analysis.")
+st.markdown("✅ **Preserve Emojis** – Emojis carry sentiment and play a crucial role in NLP tasks.")
+st.markdown("✅ **Grammar Preservation** – If grammar is needed, avoid removing punctuation.")
+st.success("🚀 Clean, well-structured text generally improves ML model performance!")
 st.markdown(
     """