Harika22 committed on
Commit 9c5b037 · verified · 1 Parent(s): dfceb96

Update pages/6_Feature_Engineering.py

Files changed (1)
  1. pages/6_Feature_Engineering.py +30 -2
pages/6_Feature_Engineering.py CHANGED
@@ -190,5 +190,33 @@ if file_type == "One-Hot Vectorization":
  - This method is useful for transforming text into a numerical format for Machine Learning tasks.
  """)
 
-
-
+ st.subheader(":green[Advantages]")
+ st.markdown('''
+ - One-Hot Vectorization is easy to implement
+ ''')
+ st.subheader(":green[Disadvantages]")
+ st.markdown('''
+ - **1. Variable document length** - Every document has a different number of words (here we're converting each word, not the whole document, into a vector)
+     - So the result can't be converted into tabular data
+     - Tabular data is possible only when the whole document is converted into a single vector (this is solved by Bag of Words (BOW))
+ - **2. Sparsity** - The vectors created using one-hot vectorization are sparse (mostly zeros)
+     - When this sparse data is given to any algorithm, the model learns mostly from zero values and becomes biased towards them
+     - This can lead to overfitting in ML
+     - This is addressed in Deep Learning
+ - **3. Curse of Dimensionality**
+     - Documents ↑ → Vocabulary ↑ → vector length ↑ → dimensionality ↑
+     - ML performance ↓, because the dimensionality depends entirely on the vocabulary, which shoots up as more (and more varied) documents are added
+ - **4. Out of Vocabulary (OOV)**
+     - The vocabulary is built only from the documents seen during training, i.e. from our own dataset
+     - A word that was not present in the dataset during training can't be converted into a vector, which results in a `KeyError`
+     - This is solved by FastText
+ - **5. Unable to preserve the semantic meaning of words**
+     - While converting text → vector format, the relationships between words should be preserved
+     - Documents should be converted into vectors in such a way that semantic relationships are preserved
+     - Similarity ⬆️ and Distance ⬇️
+     - Similarity ∝ 1 / Distance
+     - The distance between vectors of similar words should be very small
+     - If this is satisfied, the technique preserves semantic meaning
+ - **6. No sequential information**
+     - Sequential information (word order) is not preserved
+ ''')
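The disadvantages listed in the new `st.markdown` block can be reproduced with a small, self-contained sketch. It is not the app's code: it assumes simple whitespace tokenization with lowercasing, uses plain NumPy instead of a library encoder, and the helpers `build_vocab` and `one_hot_document` as well as the toy sentences are hypothetical, chosen only to make each point visible.

```python
# Hypothetical sketch of word-level one-hot vectorization.
# Assumes whitespace tokenization + lowercasing; helper names are illustrative only.
import numpy as np


def build_vocab(docs):
    """Assign a column index to every distinct lowercase word in the training documents."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def one_hot_document(doc, vocab):
    """Return a (num_words, vocab_size) matrix with a single 1 in each row.

    Raises KeyError for any word that was not seen when the vocabulary was built.
    """
    words = doc.lower().split()
    matrix = np.zeros((len(words), len(vocab)), dtype=int)
    for row, word in enumerate(words):
        matrix[row, vocab[word]] = 1  # vocab[word] fails for out-of-vocabulary words
    return matrix


train_docs = ["people watch movies", "people like good movies", "movies are fun"]
vocab = build_vocab(train_docs)

# 1. Each document becomes a matrix whose row count equals its word count,
#    so documents of different lengths don't fit one fixed-width table.
shapes = [one_hot_document(doc, vocab).shape for doc in train_docs]
print("shapes:", shapes)

# 2. Sparsity: each row has exactly one non-zero entry.
matrix = one_hot_document(train_docs[1], vocab)
zero_fraction = 1 - matrix.sum() / matrix.size
print("fraction of zeros:", round(float(zero_fraction), 2))

# 3. Curse of dimensionality: adding documents grows the vocabulary,
#    and the vector length grows with it.
bigger_vocab = build_vocab(train_docs + ["critics review new movies every week"])
print("dimensions:", len(vocab), "->", len(bigger_vocab))

# 4. Out of vocabulary: a word unseen during training can't be vectorized.
oov_word = None
try:
    one_hot_document("people love movies", vocab)
except KeyError as err:
    oov_word = err.args[0]
print("OOV word:", oov_word)

# 5. No semantic meaning: every pair of distinct words is the same
#    Euclidean distance (sqrt(2)) apart.
identity = np.eye(len(vocab), dtype=int)
dist_movies_fun = np.linalg.norm(identity[vocab["movies"]] - identity[vocab["fun"]])
dist_movies_people = np.linalg.norm(identity[vocab["movies"]] - identity[vocab["people"]])
print("distances:", dist_movies_fun, dist_movies_people)
```

With these toy documents the three matrices have shapes (3, 7), (4, 7) and (3, 7); adding one more sentence grows the vocabulary from 7 to 12 dimensions; the unseen word "love" raises a `KeyError`; and "movies" is exactly √2 away from both "fun" and "people", so one-hot vectors carry no notion of similarity. Dense NumPy arrays are used here only for readability: they make the zeros easy to count but waste memory as the vocabulary grows.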