Harika22 committed
Commit 0d400f0 · verified · 1 Parent(s): 1b03fe7

Update pages/5_Pre-procesing_of_text.py

Files changed (1)
  1. pages/5_Pre-procesing_of_text.py +14 -31
pages/5_Pre-procesing_of_text.py CHANGED
@@ -54,43 +54,26 @@ st.markdown(
54
  """,
55
  unsafe_allow_html=True,
56
  )
 
57
 
58
- st.header(":blue[Pre-processing of Text🗺️]")
59
- st.markdown(
60
- '''
61
- <div class='section'>
62
- We will convert raw data into pre-processed data in 3 ways:
63
-
64
- Cleaning - which is based on the problem statement
65
-
66
- Simple pre-processing
67
-
68
- Advance pre-processing
69
- </div>
70
- ''',
71
- unsafe_allow_html=True,
72
- )
73
- st.markdown('''
74
- - Take a raw text and convert every character and word into single case
75
 
76
- - either upper case
77
-
78
- - or lower case
79
-
80
- - based on the problem statement
81
-
82
- - Because as the dimensionality increases Ml performnace decreases as ML needs tabular data where every column is dimension
83
- - Same as with urls and tags based on the problem statement
84
 
85
- - if the problem statemnt says preserve the data we shouldn't remove those urls and tags
86
-
87
- - Coming to mentions , digits and mails we can remove those data
88
- - Whereas emojis can't be removed because nowadays emojis plays a key role in information , so to preserve the information we willn't remove the emojis
89
- - When the problem statement says preserve the grammar then punctuations shouldn't be removed
90
- ''')
91
 
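The removed text above ties case conversion to dimensionality: in a tabular (bag-of-words) representation every distinct token becomes a column. A minimal sketch of that effect follows, assuming plain whitespace tokenization; the `vocabulary` helper and the sample sentences are hypothetical and not part of this commit.

```python
def vocabulary(texts, lowercase=False):
    """Collect the distinct whitespace-separated tokens across all texts."""
    vocab = set()
    for text in texts:
        if lowercase:
            text = text.lower()  # fold every token to a single case
        vocab.update(text.split())
    return vocab


# Hypothetical sample data: the same two words in three casings.
docs = ["Good movie", "good Movie", "GOOD MOVIE"]

raw_vocab = vocabulary(docs)                     # case-sensitive columns
folded_vocab = vocabulary(docs, lowercase=True)  # columns after lowercasing

print(len(raw_vocab), len(folded_vocab))
```

Each casing of a word counts as its own column until the text is folded to one case, which is why the page recommends doing this early.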
92
 
93
 
94
 
95
  st.markdown(
96
  """
 
54
  """,
55
  unsafe_allow_html=True,
56
  )
57
+ st.header(":blue[✨ Pre-processing of Text 🗺️]")
58
 
59
+ st.markdown("<div class='section'>", unsafe_allow_html=True)
60
+ st.markdown("<h2 class='title'>🔍 Transforming Raw Text</h2>", unsafe_allow_html=True)
61
+ st.markdown("<p class='subtitle'>Convert unstructured text into a clean and structured format</p>", unsafe_allow_html=True)
62
+
63
+ st.info("📌 **We preprocess text in three key ways:**\n\n✅ Cleaning - Problem-specific\n\n✅ Simple Pre-processing\n\n✅ Advanced Pre-processing")
64
 
65
+ st.markdown("</div>", unsafe_allow_html=True)
66
 
67
 
68
+ st.markdown("### ✨ **Essential Preprocessing Techniques:**")
69
 
70
+ st.markdown("✅ **Convert Text Case** – Convert all words to **uppercase** or **lowercase** to maintain consistency and reduce dimensions.")
71
+ st.markdown("✅ **Handle URLs and Tags** – Based on problem statement, either remove or preserve them.")
72
+ st.markdown("✅ **Mentions, Digits, Emails** – Generally removed unless required by the analysis.")
73
+ st.markdown("✅ **Preserve Emojis** – Emojis carry sentiment and play a crucial role in NLP tasks.")
74
+ st.markdown("✅ **Grammar Preservation** – If grammar is needed, avoid removing punctuation.")
75
 
76
+ st.success("🚀 Well-structured and clean text significantly boosts ML model performance!")
77
 
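The techniques listed in the added lines could be sketched roughly as below. This is an illustration only, not code from this commit: the `clean_text` function, its flags, and the regular expressions are assumptions, and the patterns are deliberately simple rather than exhaustive.

```python
import re
import string

# Hypothetical, deliberately simple patterns; real data may need stricter ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def clean_text(text, keep_urls_and_tags=False, keep_punctuation=False):
    """Apply the cleaning steps from the page; emojis are never touched."""
    text = text.lower()                    # single case for every word
    if not keep_urls_and_tags:             # depends on the problem statement
        text = URL_RE.sub(" ", text)
        text = TAG_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)         # emails first, so "@" is still intact
    text = MENTION_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    if not keep_punctuation:               # keep it when grammar must be preserved
        text = PUNCT_RE.sub(" ", text)
    return " ".join(text.split())          # collapse the leftover whitespace


sample = "Hi @bob! Visit https://example.com <b>NOW</b> 😀 call 123 or mail a@b.com"
print(clean_text(sample))
print(clean_text("Great, isn't it?", keep_punctuation=True))
```

Emojis survive because `string.punctuation` covers ASCII punctuation only. Note that with `keep_urls_and_tags=True` the later digit and punctuation steps would still alter the preserved URLs, so a real pipeline would need to protect them, for example by swapping them for placeholders first.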
78
  st.markdown(
79
  """