Harika22 committed on
Commit 419554f · verified · 1 parent: 7b6ca16

Update pages/5_Pre-procesing_of_text.py

Files changed (1)
  1. pages/5_Pre-procesing_of_text.py +8 -1
pages/5_Pre-procesing_of_text.py CHANGED
@@ -70,11 +70,16 @@ st.markdown(
  st.markdown('''
  - Take a raw text and convert every character and word into single case
  - either upper case
+
  - or lower case
+
  - based on the problem statement
+
  - Because as the dimensionality increases Ml performnace decreases as ML needs tabular data where every column is dimension
  - Same as with urls and tags based on the problem statement
+
  - if the problem statemnt says preserve the data we shouldn't remove those urls and tags
+
  - Coming to mentions , digits and mails we can remove those data
  - Whereas emojis can't be removed because nowadays emojis plays a key role in information , so to preserve the information we willn't remove the emojis
  - When the problem statement says preserve the grammar then punctuations shouldn't be removed
@@ -86,8 +91,10 @@ st.markdown(
  '''
  <div class='section'>
  Converts raw data into pre-processed data
- - which has 2 benefits:;
+ - which has 2 benefits
+
  - Reduce the dimensionality ---> to increase the performance of ML
+
  - Raw data - preprocessed data ---> required by the problem statement
  <ul>
  <li><b>Converting into particular case</b> So that highly we can reduce the dimensionalty.If the problem statement says that grammar should be preserved then no need of conversion</li>
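The dimensionality argument in this diff (one column per distinct token, so fewer distinct tokens means fewer dimensions) can be made concrete with a small count. This is a minimal standard-library sketch, not code from pages/5_Pre-procesing_of_text.py; the helper name vocabulary and the three sample sentences are hypothetical.

```python
# Count the distinct tokens (i.e. bag-of-words columns) before and after
# converting everything to a single case.
def vocabulary(docs, lowercase=False):
    """Return the set of distinct whitespace-separated tokens in docs."""
    vocab = set()
    for doc in docs:
        text = doc.lower() if lowercase else doc
        vocab.update(text.split())
    return vocab


docs = [
    "The movie was GOOD",
    "the Movie was good",
    "THE MOVIE WAS BAD",
]

raw = vocabulary(docs)
lowered = vocabulary(docs, lowercase=True)
print(len(raw))      # → 11 ("The", "the" and "THE" are separate columns)
print(len(lowered))  # → 5 (the, movie, was, good, bad)
```

Lower-casing merges every case variant of a word into one column, which is the reduction the commit's text refers to; upper-casing would give the same count.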
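The rules listed in the first hunk (convert case unless grammar must be preserved; keep URLs and tags only when the data must be preserved; drop mentions, digits and e-mails; keep emojis; keep punctuation only when grammar matters) could be sketched as one Python function. This is an illustration under assumptions, not the implementation in this repository: the function name preprocess, the flags preserve_data and preserve_grammar, and the regular expressions are all hypothetical, and the patterns are deliberately simple.

```python
import re
import string

# Deliberately simple, hypothetical patterns; real URLs, e-mails and mentions vary more.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
# Deletes ASCII punctuation only, so emojis survive.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, preserve_data=False, preserve_grammar=False):
    """Clean raw text according to two problem-statement flags (hypothetical names)."""
    # Single-case conversion reduces dimensionality; skipped when grammar must be kept.
    if not preserve_grammar:
        text = text.lower()
    # URLs and HTML tags are removed unless the data must be preserved.
    if not preserve_data:
        text = URL_RE.sub(" ", text)
        text = HTML_TAG_RE.sub(" ", text)
    # E-mails are removed before mentions; otherwise "@example" in
    # "team@example.com" would be stripped alone, leaving "team .com".
    text = EMAIL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    # Punctuation is kept only when grammar matters. Note: with preserve_data=True
    # and preserve_grammar=False this also strips ":", "/" and "." from kept URLs.
    if not preserve_grammar:
        text = text.translate(PUNCT_TABLE)
    # Collapse the whitespace runs left behind by the substitutions.
    return " ".join(text.split())


sample = "Hey @Harika, mail ME at team@example.com or visit https://example.com <b>NOW</b> in 2 days 😀!"
print(preprocess(sample))
# → hey mail me at or visit now in days 😀
print(preprocess(sample, preserve_data=True, preserve_grammar=True))
# → Hey , mail ME at or visit https://example.com <b>NOW</b> in days 😀!
```

Deleting string.punctuation through str.translate, rather than a pattern such as [^\w\s], is what keeps the emojis: that pattern would treat every emoji as punctuation and remove it.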