Harika22 committed on
Commit 1b03fe7 · verified · 1 Parent(s): d32aaba

Update pages/5_Pre-procesing_of_text.py

Files changed (1)
  1. pages/5_Pre-procesing_of_text.py +18 -23
pages/5_Pre-procesing_of_text.py CHANGED
@@ -90,29 +90,7 @@ st.markdown('''
  ''')
 
 
- st.subheader(":red[Data Pre-processing]")
- st.markdown(
- '''
- <div class='section'>
- Converts raw data into pre-processed data
-
- which has 2 benefits:
-
- Reduce the dimensionality ---> to increase the performance of ML
-
- Raw data - preprocessed data ---> required by the problem statement
- <ul>
- <li><b>Converting into particular case</b>So that highly we can reduce the dimensionalty,if the problem statement says that grammar should be preserved then no need of conversion</li>
- <li><b>Removing URL's / tags/mails/mentions</b>Converting or preserving information should be based on the problem statement</li>
- <li><b>Handling Emoji's</b>Emoji's data should be preserved</li>
- <li><b>Contractions and acronyms</b>Both the contractions and acronyms should be converted into general text</li>
- <li><b>Stop Words</b>Stop words make the grammar very clear</li>
- <li><b>Stemming and Lemmatization</b>Both are purely based on problem statement and if problem statement wants grammatical concept don't perform stemming</li>
- </ul>
- </div>
- ''',
- unsafe_allow_html=True,
- )
+
 
  st.markdown(
  """
@@ -121,3 +99,20 @@ st.markdown(
  unsafe_allow_html=True,
  )
 
+ st.markdown("<div class='section'>", unsafe_allow_html=True)
+ st.markdown("<h2 class='title'>🔍 NLP Data Preprocessing</h2>", unsafe_allow_html=True)
+ st.markdown("<p class='subtitle'>Transforming raw text into structured data for better ML performance</p>", unsafe_allow_html=True)
+
+
+ st.success("📌 **Benefits of Preprocessing:**\n\n✅ Reduces dimensionality\n\n✅ Improves ML performance\n\n✅ Converts raw text into problem-specific structured data")
+
+ st.markdown("### ✨ **Essential Preprocessing Steps:**")
+
+ st.markdown("✅ **Converting Text Case** – Reduces dimensionality; case conversion depends on problem statement.")
+ st.markdown("✅ **Removing URLs, Tags, and Mentions** – Retain only if required by the problem statement.")
+ st.markdown("✅ **Handling Emojis** – Preserve or convert emoji data based on context.")
+ st.markdown("✅ **Expanding Contractions & Acronyms** – Convert abbreviations into standard text.")
+ st.markdown("✅ **Stop Words Removal** – Optional, useful for text simplification.")
+ st.markdown("✅ **Stemming & Lemmatization** – Perform only if grammar is **not** crucial for analysis.")
+
+ st.markdown("</div>", unsafe_allow_html=True)
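The added code opens `<div class='section'>` in one `st.markdown` call and closes it with `</div>` in a later one. As far as we understand Streamlit's rendering, each call becomes its own page element, so the browser likely closes the first div at once and the `section` class does not wrap the title, subtitle and list in between. A minimal sketch, assuming the goal is a single styled block: build the whole section as one HTML string and render it with one call. `section_html` is a hypothetical helper, not part of the original page.

```python
import html


def section_html(title: str, subtitle: str, steps: list[tuple[str, str]]) -> str:
    """Return the whole section as one HTML string (hypothetical helper)."""
    # Escape visible text so characters such as "&" render literally.
    items = "".join(
        f"<li><b>{html.escape(name)}</b>: {html.escape(desc)}</li>"
        for name, desc in steps
    )
    # Opening and closing tags live in the same string, so the div wraps everything.
    return (
        "<div class='section'>"
        f"<h2 class='title'>{html.escape(title)}</h2>"
        f"<p class='subtitle'>{html.escape(subtitle)}</p>"
        f"<ul>{items}</ul>"
        "</div>"
    )


page = section_html(
    "NLP Data Preprocessing",
    "Transforming raw text into structured data for better ML performance",
    [("Stemming & Lemmatization", "Perform only if grammar is not crucial for analysis.")],
)
print(page)
# In the Streamlit page this string would be rendered with a single call:
#     st.markdown(page, unsafe_allow_html=True)
```

The `st.success` box and headings can stay as separate calls; only content that must sit inside the styled div needs to go into the one string.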
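Converting text to one case reduces dimensionality because case variants of a word collapse into a single vocabulary entry, and in a bag-of-words model each vocabulary entry is one feature column. A minimal sketch on a made-up toy corpus; `vocabulary` is a hypothetical helper.

```python
def vocabulary(texts: list[str], lowercase: bool) -> set[str]:
    """Collect the distinct whitespace-separated tokens across all texts."""
    tokens: set[str] = set()
    for text in texts:
        if lowercase:
            text = text.lower()
        tokens.update(text.split())
    return tokens


# Made-up toy corpus: the same sentence typed with different capitalisation.
corpus = ["NLP is Fun", "nlp is fun", "Nlp IS fun"]

print(len(vocabulary(corpus, lowercase=False)))  # 7 distinct tokens
print(len(vocabulary(corpus, lowercase=True)))   # 3 distinct tokens
```

When case carries meaning for the problem statement (for example "US" versus "us"), the conversion is skipped.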
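URLs, HTML tags, e-mail addresses and mentions can be removed with a sequence of regular-expression substitutions. The patterns below are simplified assumptions that catch common cases only; they are not a complete URL or e-mail grammar. If the problem statement needs this information, substituting a placeholder token instead of a space keeps it.

```python
import re

# Simplified patterns (assumptions): they catch common cases, not every valid form.
HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION_RE = re.compile(r"@\w+")


def remove_noise(text: str) -> str:
    """Delete HTML tags, URLs, e-mail addresses and @mentions from text."""
    # E-mails are removed before mentions; otherwise "@example" inside an
    # address would be stripped as a mention, leaving "bob" and ".com" behind.
    for pattern in (HTML_TAG_RE, URL_RE, EMAIL_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    # Collapse the extra whitespace left by the substitutions.
    return " ".join(text.split())


print(remove_noise("<p>Mail bob@example.com or ping @alice, see https://example.com/docs</p>"))
# → Mail or ping , see
```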
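When emoji data should be preserved, one option is converting each emoji into text so that later, text-only steps keep its meaning. This sketch uses only the standard library's `unicodedata` module and treats every character in Unicode category "So" (Symbol, other) as an emoji. That is an approximation: it also matches signs such as ©, and variation selectors or skin-tone modifiers are left untouched. The third-party `emoji` package (its `demojize` function) offers a fuller conversion.

```python
import unicodedata


def demojize(text: str) -> str:
    """Replace symbol characters (approximating emoji) with :their_unicode_name:."""
    out = []
    for ch in text:
        # Category "So" (Symbol, other) covers most emoji, but also signs like ©.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "").lower().replace(" ", "_")
            out.append(f" :{name}: " if name else " ")
        else:
            out.append(ch)
    # Collapse all whitespace runs, including those added around converted symbols.
    return " ".join("".join(out).split())


print(demojize("Loved it 😀👍"))
# → Loved it :grinning_face: :thumbs_up_sign:
```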
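Contractions and acronyms can be expanded with a lookup table and a whole-word regular expression. Both tables here are tiny, hypothetical examples; a real project needs a much larger, domain-specific list. Straight apostrophes are assumed, so curly ones (’) would need normalizing first.

```python
import re

# Hypothetical, deliberately tiny lookup tables (keys are lowercase).
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}
ACRONYMS = {"asap": "as soon as possible", "btw": "by the way", "ml": "machine learning"}
MAPPING = {**CONTRACTIONS, **ACRONYMS}

# Whole-word, case-insensitive match of any key, so "ml" inside "html" is left alone.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, MAPPING)) + r")\b", re.IGNORECASE)


def expand(text: str) -> str:
    """Replace each known contraction or acronym with its long form."""
    return PATTERN.sub(lambda m: MAPPING[m.group(0).lower()], text)


print(expand("BTW I can't finish the ML task ASAP"))
# → by the way I cannot finish the machine learning task as soon as possible
```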
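Stop-word removal is usually a set-membership filter over tokens. The stop list below is a small, hypothetical one; libraries such as NLTK ship fuller, per-language lists. The example also shows why the step is optional: dropping "on" loses the relation between "cat" and "mat", which matters when grammar is part of the problem.

```python
# Hypothetical minimal stop list; real lists are longer and language-specific.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


print(remove_stop_words("The cat is on the mat"))  # → ['cat', 'mat']
```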
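Stemming and lemmatization differ in how they reduce a word: a stemmer chops suffixes by rule and may produce non-words, while a lemmatizer maps a word to its dictionary form, which needs vocabulary knowledge. The sketch below is a toy: `crude_stem` is a hypothetical suffix stripper (not the Porter algorithm), and `LEMMAS` is a hand-made table standing in for a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
# Toy illustration only: neither function is a production algorithm.
SUFFIXES = ("ing", "ies", "ed", "es", "s")  # hypothetical rules, checked in order
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "ran": "run"}  # hypothetical


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the table; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["studies", "studying", "better", "ran"]:
    print(w, crude_stem(w), lemmatize(w))
# studies stud study
# studying study study
# better better good
# ran ran run
```

"stud" shows a stem that is not a real word, while "better" → "good" is only possible with dictionary knowledge, which is why lemmatization suits problems where grammar matters.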