Update pages/6_Feature_Engineering.py
pages/6_Feature_Engineering.py CHANGED (+16 -18)
|
@@ -149,21 +149,20 @@ if file_type == "One-Hot Vectorization":
|
|
| 149 |
|
| 150 |
st.markdown("""
|
| 151 |
### 🛠️ Steps in One-Hot Vectorization:
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
-
|
| 156 |
-
-
|
| 157 |
-
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
""")
|
| 161 |
|
| 162 |
st.markdown("""
|
| 163 |
-
|
| 164 |
-
-
|
| 165 |
-
-
|
| 166 |
-
- 🎯 Words are vectorized separately, then combined into document vectors.
|
| 167 |
""")
|
| 168 |
|
| 169 |
st.markdown("""
|
|
@@ -177,19 +176,18 @@ if file_type == "One-Hot Vectorization":
|
|
| 177 |
""", unsafe_allow_html=True)
|
| 178 |
|
| 179 |
st.markdown("""
|
| 180 |
-
### Document Representations:
|
| 181 |
- d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
|
| 182 |
- d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
|
| 183 |
- d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`
|
| 184 |
|
| 185 |
-
|
| 186 |
""")
|
| 187 |
|
| 188 |
st.markdown("""
|
| 189 |
-
###
|
| 190 |
-
-
|
| 191 |
-
-
|
| 192 |
-
-
|
| 193 |
""")
|
| 194 |
|
| 195 |
|
|
|
|
| 149 |
|
| 150 |
st.markdown("""
|
| 151 |
### 🛠️ Steps in One-Hot Vectorization:
|
| 152 |
+
- Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
|
| 153 |
+
- Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
|
| 154 |
+
- Convert Each Word into a Vector:
|
| 155 |
+
- Every unique word is transformed into a vector.
|
| 156 |
+
- Each vector has d-dimensions, where each dimension corresponds to a unique word.
|
| 157 |
+
- Words are converted individually, and then combined to form a vector.
|
| 158 |
+
|
| 159 |
+
This technique ensures that each word is treated uniquely and efficiently in NLP tasks.
|
| 160 |
""")
|
| 161 |
|
| 162 |
st.markdown("""
|
| 163 |
+
- Each word gets a unique vector representation.
|
| 164 |
+
- The number of dimensions = total vocabulary size.
|
| 165 |
+
- Words are vectorized separately, then combined into document vectors.
|
|
|
|
| 166 |
""")
|
| 167 |
|
| 168 |
st.markdown("""
|
|
|
|
| 176 |
""", unsafe_allow_html=True)
|
| 177 |
|
| 178 |
st.markdown("""
|
|
|
|
| 179 |
- d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
|
| 180 |
- d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
|
| 181 |
- d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`
|
| 182 |
|
| 183 |
+
This One-Hot Vectorization technique converts words into numerical vectors while preserving their uniqueness.
|
| 184 |
""")
|
| 185 |
|
| 186 |
st.markdown("""
|
| 187 |
+
### Key Takeaways:
|
| 188 |
+
- Each word is represented as a 5-dimensional vector.
|
| 189 |
+
- Every dimension corresponds to a unique word in the vocabulary.
|
| 190 |
+
- This method is useful for transforming text into a numerical format for Machine Learning tasks.
|
| 191 |
""")
|
| 192 |
|
| 193 |
|
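
The steps described in the updated markdown (build a vocabulary, take its length as the dimension d, convert each word to a d-dimensional vector, then stack the word vectors per document) can be sketched in plain Python. This is a minimal illustration and not code from the commit: the helper names `build_vocabulary`, `one_hot`, and `vectorize_document` and the three-sentence corpus are hypothetical, chosen so that indexing words in order of first appearance gives a 5-word vocabulary and reproduces the three document representations shown in the diff.

```python
from typing import Dict, List


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Map each unique word to an index, in order of first appearance."""
    vocabulary: Dict[str, int] = {}
    for document in documents:
        for word in document.lower().split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def one_hot(word: str, vocabulary: Dict[str, int]) -> List[int]:
    """Return a d-dimensional vector with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


def vectorize_document(document: str, vocabulary: Dict[str, int]) -> List[List[int]]:
    """Convert each word separately, then stack the vectors for the document."""
    return [one_hot(word, vocabulary) for word in document.lower().split()]


# Hypothetical corpus (an assumption; the diff does not show the real documents).
corpus = [
    "we love nlp",
    "we love learning nlp",
    "hello we",
]

vocabulary = build_vocabulary(corpus)
document_vectors = [vectorize_document(doc, vocabulary) for doc in corpus]

for doc, vectors in zip(corpus, document_vectors):
    print(doc, "->", vectors)
```

Whitespace tokenization and lowercasing are simplifying assumptions here. Note that each document becomes a list of word vectors, so documents with different word counts produce representations of different lengths.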