Harika22 committed on
Commit e4f9e2d · verified · 1 Parent(s): 39864a0

Update pages/6_Feature_Engineering.py

Files changed (1)
  1. pages/6_Feature_Engineering.py +25 -17
pages/6_Feature_Engineering.py CHANGED
@@ -153,23 +153,31 @@ if file_type == "One-Hot Vectorization":
  - This technique is called One-Hot Vectorization
  ''')
 
- st.markdown('''Example for One-Hot Vectorization is :
- - There is a corpus contains 3 documents d1, d2, d3
- - d1 ➡️ Toy is good
- - d2 ➡️ Toy is not good
- - d3 ➡️ Bad toy
- - It converts d1 into v1 where (v1 is numerical representation of d1)
- - It converts d2 into v2 where (v2 is numerical representation of d2)
- - It converts d3 into v3 where (v3 is numerical representation of d3)
- - Creates a vocabulary ➡️ {toy, is, good, not, bad }
- - len(vocavulary) = 5 in 5 dimension
- - Each word is represented as 5-dim where every dimension belongs to unique word
- - toy ➡️ [1,0,0,0,0] , is ➡️ [0,1,0,0,0] , good ➡️ [0,0,1,0,0] , not ➡️ [0,0,0,1,0] , bad ➡️ [0,0,0,0,1]
- - d1 → v1 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]
- - d2 → v2 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]
- - d3 → v3 → [[0,0,0,0,1], [1,0,0,0,0]]
- - Here we're converting each and every word into vector form and combining it to form vector this technique is known as **One-Hot Vectorization**
- ''')
+ st.markdown("""
+ | **Word** | **Vector Representation** |
+ |----------|--------------------------|
+ | **toy** | [1,0,0,0,0] |
+ | **is** | [0,1,0,0,0] |
+ | **good** | [0,0,1,0,0] |
+ | **not** | [0,0,0,1,0] |
+ | **bad** | [0,0,0,0,1] |
+ """, unsafe_allow_html=True)
+
+ st.markdown("""
+ ### 📝 Document Representations:
+ - **d₁ → v₁** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
+ - **d₂ → v₂** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
+ - **d₃ → v₃** → `[[0,0,0,0,1], [1,0,0,0,0]]`
+
+ ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
+ """)
+
+ st.markdown("""
+ ### 🎯 Key Takeaways:
+ - 🔹 **Each word** is represented as a **5-dimensional** vector.
+ - 🔹 **Every dimension** corresponds to a **unique word** in the vocabulary.
+ - 🔹 This method is **useful** for transforming text into a **numerical format** for Machine Learning tasks.
+ """)
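
Note on the example in this change: the one-hot vectors quoted in the markdown can be reproduced with a few lines of plain Python. The sketch below is not part of the commit. The helper names `build_vocabulary` and `one_hot_document` are hypothetical, and it assumes lowercase whitespace tokenization with indices assigned in first-seen order, which happens to give the vocabulary order toy, is, good, not, bad used in the page.

```python
# Minimal sketch of one-hot vectorization for the three-document corpus.
# Assumptions (not from the commit): lowercase whitespace tokenization,
# vocabulary indices assigned in first-seen order.

def build_vocabulary(corpus):
    """Map each unique token to an index, in the order tokens first appear."""
    vocab = {}
    for doc in corpus:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def one_hot_document(doc, vocab):
    """Return a list with one one-hot vector per token of the document."""
    vectors = []
    for token in doc.lower().split():
        vec = [0] * len(vocab)   # one dimension per vocabulary word
        vec[vocab[token]] = 1    # set only the dimension owned by this word
        vectors.append(vec)
    return vectors


corpus = ["Toy is good", "Toy is not good", "Bad toy"]
vocab = build_vocabulary(corpus)
vectors = [one_hot_document(doc, vocab) for doc in corpus]

for doc, vecs in zip(corpus, vectors):
    print(doc, "->", vecs)
```

In this sketch each document becomes a list whose length equals its word count, so documents of different lengths produce differently shaped outputs, and a word missing from the vocabulary raises a `KeyError`.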