Update pages/6_Feature_Engineering.py
pages/6_Feature_Engineering.py (CHANGED, +27 -14)
```diff
@@ -141,17 +141,30 @@ file_type = st.sidebar.radio(
 
 if file_type == "One-Hot Vectorization":
 st.title(":red[One-Hot Vectorization]")
-st.markdown(
-
-
-
-
-
-
-
-
-
-
+st.markdown("""
+### What is One-Hot Vectorization?
+- It is a type of vectorization technique where text is converted into a numerical vector.
+- This technique helps in representing words as unique vectors for machine learning models.
+""")
+
+st.markdown("""
+### 🛠️ Steps in One-Hot Vectorization:
+1️⃣ Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
+2️⃣ Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
+3️⃣ Convert Each Word into a Vector:
+- Every unique word is transformed into a vector.
+- Each vector has d-dimensions, where each dimension corresponds to a unique word.
+- Words are converted individually, and then combined to form a vector.
+
+✅ This technique ensures that each word is treated uniquely and efficiently in NLP tasks.
+""")
+
+st.markdown("""
+### 🎯 Key Takeaways:
+- 🎯 Each word gets a unique vector representation.
+- 🎯 The number of dimensions = total vocabulary size.
+- 🎯 Words are vectorized separately, then combined into document vectors.
+""")
 
 st.markdown("""
 | **Word** | **Vector Representation** |
@@ -165,9 +178,9 @@ if file_type == "One-Hot Vectorization":
 
 st.markdown("""
 ### Document Representations:
--
--
--
+- d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
+- d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
+- d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`
 
 ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
 """)
```
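The Key Takeaways (one unique vector per word, number of dimensions = vocabulary size) can also be seen with NumPy: the one-hot vectors of all d vocabulary words are exactly the rows of the d×d identity matrix, so indexing `np.eye(d)` with a document's word indices yields that document's matrix. This is a sketch, not the page's implementation; it assumes a hypothetical toy corpus, whitespace tokenization and first-appearance vocabulary order.

```python
# One-hot vectors as rows of a d x d identity matrix (NumPy sketch).
# Assumptions: hypothetical corpus, whitespace tokens, first-appearance order.
import numpy as np

corpus = ["dogs chase cats", "dogs chase small cats", "big dogs"]

# Vocabulary: word -> index, assigned in order of first appearance.
vocab = {}
for document in corpus:
    for word in document.split():
        vocab.setdefault(word, len(vocab))

d = len(vocab)
identity = np.eye(d, dtype=int)  # row i is the one-hot vector of word i

# Integer-array indexing picks one identity row per word, giving a
# (number_of_words x d) matrix for each document.
doc_matrices = [identity[[vocab[w] for w in doc.split()]] for doc in corpus]
for i, matrix in enumerate(doc_matrices, start=1):
    print(f"d{i}", matrix.shape)
```

The printed shapes are (3, 5), (4, 5) and (2, 5): documents with different word counts give matrices with different numbers of rows, matching the varying lengths of the d₁, d₂, d₃ lists in the diff. A model that expects fixed-size input therefore cannot consume these per-document matrices directly.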