Spaces: Harika22/Natural_Language_Processing

Harika22 committed on Feb 1, 2025

Commit 896d8db (verified) · 1 Parent(s): 1d0bfe1

Update pages/4_Simple_EDA.py

pages/4_Simple_EDA.py +23 -20

@@ -77,27 +77,30 @@ st.markdown("""
     </style>
     """, unsafe_allow_html=True)
-st.header(":red[Simple EDA💬]")
-st.markdown('''
-    - Simple EDA is a part of life cycle in NLP where after collecting the raw data we need to perform simple eda which tells the quallty of the data
-    - Simpl EDA is not performed based on the probelm statement
-    - It checks the exploration of the data
-''')
-st.subheader(":violet[Major Simple EDA📃]")
-st.markdown('''
-- Whether all the alphabets are in
-    - lower case
-    - upper case
-    - combination of lower and upper case
-- Whether the collected text data contains any html / url tags
-- Whether the collected text data contains any urls
-- Whether the collected text data contains any mentions / hashtags
-- Whether the collected text data contains any digits
-- Whether the collected text data contains any punctuations
-- Whether the collected text data contains any emojis
-- Whether the collected text data contains any data /time
-''')
+st.header(":red[📊 Simple EDA 💬]")
+# Introduction to Simple EDA
+st.markdown("<div class='section'>", unsafe_allow_html=True)
+st.markdown("<h2 class='title'>🔍 Understanding Simple EDA</h2>", unsafe_allow_html=True)
+st.markdown("<p class='subtitle'>Evaluating raw text data quality before processing</p>", unsafe_allow_html=True)
+st.info("📌 **Simple EDA is a crucial step in the NLP lifecycle:**\n\n✅ Ensures raw data quality\n\n✅ Not dependent on problem statement\n\n✅ Helps in better data exploration")
+st.markdown("</div>", unsafe_allow_html=True)
+st.subheader(":violet[📃 Major Simple EDA Steps]")
+st.markdown("✅ **Check Text Case** – Identify if text is in **lowercase, uppercase, or mixed case**.")
+st.markdown("✅ **Detect HTML & URL Tags** – Analyze if text contains unwanted elements.")
+st.markdown("✅ **Identify URLs** – Ensure URLs are either preserved or removed based on problem statement.")
+st.markdown("✅ **Detect Mentions & Hashtags** – Find occurrences of `@mentions` or `#hashtags`.")
+st.markdown("✅ **Identify Numeric Data** – Detect if text includes **digits or numerical data**.")
+st.markdown("✅ **Analyze Punctuation Usage** – Check whether punctuation marks affect text clarity.")
+st.markdown("✅ **Detect Emojis** – Ensure **emoji-based sentiments** are not lost.")
+st.markdown("✅ **Analyze Date/Time Formats** – Identify the presence of date/time-related text.")
+st.success("🚀 Performing **Simple EDA** ensures structured and high-quality text data, leading to better NLP model performance!")
 st.code('''
             import pandas as pd
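
Review note: the checks listed in the added `st.markdown` lines (text case, HTML tags, URLs, mentions, hashtags, digits, punctuation, date/time) could be sketched roughly as below. This is a minimal illustration, not the code from this commit: the `PATTERNS` table, the `simple_eda` helper, and the sample texts are all hypothetical, and the regular expressions are deliberately loose. Emoji detection is left out because it usually needs an explicit Unicode range or a third-party package.

```python
import re

import pandas as pd

# Hypothetical, deliberately loose patterns for illustration only;
# real data will likely need stricter expressions.
PATTERNS = {
    "html_tags": r"<[^>]+>",
    "urls": r"https?://\S+|www\.\S+",
    "mentions": r"@\w+",
    "hashtags": r"#\w+",
    "digits": r"\d",
    "punctuation": r"[^\w\s]",
    "dates": r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b",
}


def simple_eda(texts: pd.Series) -> pd.Series:
    """Return the number of rows that trigger each Simple EDA check."""
    texts = texts.astype(str)
    counts = {
        # Rows whose cased characters are all lowercase / all uppercase.
        "all_lowercase": int(texts.str.islower().sum()),
        "all_uppercase": int(texts.str.isupper().sum()),
    }
    for name, pattern in PATTERNS.items():
        # Count rows where the pattern occurs at least once.
        counts[name] = int(texts.str.contains(pattern, regex=True).sum())
    return pd.Series(counts)


# Made-up sample texts, not data from the app.
sample = pd.Series([
    "check out https://example.com <b>now</b>",
    "HELLO @user #NLP",
    "Meeting on 12/05/2024 at 5pm!",
])
report = simple_eda(sample)
print(report)
```

Each entry in `report` is a row count, so a nonzero value only signals that the corresponding cleanup step may be worth considering; whether to remove or keep the matched text still depends on the problem statement.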