Update pages/6_Feature_Engineering.py
pages/6_Feature_Engineering.py
CHANGED
|
@@ -153,5 +153,23 @@ if file_type == "One-Hot Vectorization":
|
|
| 153 |
- This technique is called One-Hot Vectorization
|
| 154 |
''')
|
| 155 |
|
| 156 |
+
st.markdown('''An example of One-Hot Vectorization:
|
| 157 |
+
- Consider a corpus containing 3 documents: d1, d2, d3
|
| 158 |
+
- d1 ➡️ Toy is good
|
| 159 |
+
- d2 ➡️ Toy is not good
|
| 160 |
+
- d3 ➡️ Bad toy
|
| 161 |
+
- It converts d1 into v1, where v1 is the numerical representation of d1
|
| 162 |
+
- It converts d2 into v2, where v2 is the numerical representation of d2
|
| 163 |
+
- It converts d3 into v3, where v3 is the numerical representation of d3
|
| 164 |
+
- It creates a vocabulary of the unique words in the corpus ➡️ {toy, is, good, not, bad}
|
| 165 |
+
- len(vocabulary) = 5, so every word vector has 5 dimensions
|
| 166 |
+
- Each word is represented as a 5-dimensional vector in which every dimension corresponds to one unique word
|
| 167 |
+
- toy ➡️ [1,0,0,0,0] , is ➡️ [0,1,0,0,0] , good ➡️ [0,0,1,0,0] , not ➡️ [0,0,0,1,0] , bad ➡️ [0,0,0,0,1]
|
| 168 |
+
- d1 → v1 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]
|
| 169 |
+
- d2 → v2 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]
|
| 170 |
+
- d3 → v3 → [[0,0,0,0,1], [1,0,0,0,0]]
|
| 171 |
+
- Here we convert every word into a vector and combine the word vectors, in order, to form the document's representation. This technique is known as **One-Hot Vectorization**
|
| 172 |
+
''')
|
| 173 |
+
|
| 174 |
|
| 175 |
|
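The worked example added in this change can be checked with a few lines of plain Python. The following is a minimal sketch, not part of the committed file: the helper names `build_vocabulary` and `one_hot_encode` are hypothetical, and it assumes lowercase whitespace tokenization with the vocabulary ordered by first appearance, which reproduces the {toy, is, good, not, bad} ordering used in the text.

```python
def build_vocabulary(documents):
    """Collect unique lowercase tokens in first-seen order (assumed tokenization)."""
    vocabulary = []
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def one_hot_encode(document, vocabulary):
    """Return one one-hot vector per token, in the order the tokens appear."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for token in document.lower().split():
        vector = [0] * len(vocabulary)
        vector[index[token]] = 1  # exactly one dimension is "hot" per word
        vectors.append(vector)
    return vectors


# Corpus taken from the example in the diff
corpus = {"d1": "Toy is good", "d2": "Toy is not good", "d3": "Bad toy"}

vocabulary = build_vocabulary(corpus.values())
encoded = {name: one_hot_encode(text, vocabulary) for name, text in corpus.items()}

for name, vectors in encoded.items():
    print(name, vectors)
```

Each document becomes a list with one vector per word, so documents of different lengths produce representations of different shapes.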