Update pages/6_Feature_Engineering.py
pages/6_Feature_Engineering.py  CHANGED  +25 -17
@@ -153,23 +153,31 @@ if file_type == "One-Hot Vectorization":
 - This technique is called One-Hot Vectorization
 ''')
 
-st.markdown(
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+st.markdown("""
+| **Word** | **Vector Representation** |
+|----------|--------------------------|
+| **toy** | [1,0,0,0,0] |
+| **is** | [0,1,0,0,0] |
+| **good** | [0,0,1,0,0] |
+| **not** | [0,0,0,1,0] |
+| **bad** | [0,0,0,0,1] |
+""", unsafe_allow_html=True)
+
+st.markdown("""
+### Document Representations:
+- **d₁ → v₁** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
+- **d₂ → v₂** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
+- **d₃ → v₃** → `[[0,0,0,0,1], [1,0,0,0,0]]`
+
+✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
+""")
+
+st.markdown("""
+### 🎯 Key Takeaways:
+- 🔹 **Each word** is represented as a **5-dimensional** vector.
+- 🔹 **Every dimension** corresponds to a **unique word** in the vocabulary.
+- 🔹 This method is **useful** for transforming text into a **numerical format** for Machine Learning tasks.
+""")
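The key takeaways can be checked with a small sketch. It again assumes the same five-word vocabulary and uses hypothetical helper names (`one_hot`, `decode`, `dot`). The sketch confirms three properties:

- The vector length equals the vocabulary size.
- The position of the single 1 maps back to exactly one word.
- Any two distinct word vectors have a dot product of 0, which is one way to see that each word keeps its own dimension.

```python
# Hypothetical helpers; the vocabulary order is assumed to match the table rows.
VOCAB = ["toy", "is", "good", "not", "bad"]


def one_hot(word, vocab):
    """Vector of len(vocab) zeros with a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def decode(vec, vocab):
    """Recover the word from a one-hot vector: the position of the single 1."""
    return vocab[vec.index(1)]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    vectors = {w: one_hot(w, VOCAB) for w in VOCAB}
    # Dimensionality equals vocabulary size.
    print(len(vectors["toy"]))  # 5
    # Each dimension maps back to exactly one word.
    print(all(decode(v, VOCAB) == w for w, v in vectors.items()))  # True
    # Distinct words are orthogonal.
    print(dot(vectors["good"], vectors["bad"]))  # 0
```

The dimensionality grows with the vocabulary. A vocabulary of V words yields V-dimensional vectors, so the "5-dimensional" figure holds only for this five-word example.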