Harika22 committed on
Commit dfceb96 · verified · 1 Parent(s): 9408831

Update pages/6_Feature_Engineering.py

Files changed (1)
  1. pages/6_Feature_Engineering.py +16 -18
pages/6_Feature_Engineering.py CHANGED
@@ -149,21 +149,20 @@ if file_type == "One-Hot Vectorization":
  st.markdown("""
  ### 🛠️ Steps in One-Hot Vectorization:
- 1️⃣ Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
- 2️⃣ Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
- 3️⃣ Convert Each Word into a Vector:
- - 📌 Every unique word is transformed into a vector.
- - 📌 Each vector has d-dimensions, where each dimension corresponds to a unique word.
- - 📌 Words are converted individually, and then combined to form a vector.
-
- ✅ This technique ensures that each word is treated uniquely and efficiently in NLP tasks.
  """)

  st.markdown("""
- ### 🎯 Key Takeaways:
- - 🎯 Each word gets a unique vector representation.
- - 🎯 The number of dimensions = total vocabulary size.
- - 🎯 Words are vectorized separately, then combined into document vectors.
  """)

  st.markdown("""
@@ -177,19 +176,18 @@ if file_type == "One-Hot Vectorization":
  """, unsafe_allow_html=True)

  st.markdown("""
- ### Document Representations:
  - d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
  - d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
  - d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`

- ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
  """)

  st.markdown("""
- ### 🎯 Key Takeaways:
- 🔹 **Each word** is represented as a **5-dimensional** vector.
- 🔹 **Every dimension** corresponds to a **unique word** in the vocabulary.
- 🔹 This method is **useful** for transforming text into a **numerical format** for Machine Learning tasks.
  """)

 
  st.markdown("""
  ### 🛠️ Steps in One-Hot Vectorization:
+ - Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
+ - Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
+ - Convert Each Word into a Vector:
+ - Every unique word is transformed into a vector.
+ - Each vector has d-dimensions, where each dimension corresponds to a unique word.
+ - Words are converted individually, and then combined to form a vector.
+
+ This technique ensures that each word is treated uniquely and efficiently in NLP tasks.
  """)

  st.markdown("""
+ - Each word gets a unique vector representation.
+ - The number of dimensions = total vocabulary size.
+ - Words are vectorized separately, then combined into document vectors.
  """)

  st.markdown("""
 
  """, unsafe_allow_html=True)

  st.markdown("""
  - d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
  - d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
  - d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`

+ This One-Hot Vectorization technique converts words into numerical vectors while preserving their uniqueness.
  """)

  st.markdown("""
+ ### Key Takeaways:
+ - Each word is represented as a 5-dimensional vector.
+ - Every dimension corresponds to a unique word in the vocabulary.
+ - This method is useful for transforming text into a numerical format for Machine Learning tasks.
  """)

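
Illustrative note, not part of the commit: the steps described in the changed strings (build a vocabulary, take its length as d, turn each word into a d-dimensional vector, then collect the word vectors per document) can be sketched in plain Python. The helper names `build_vocabulary` and `one_hot_document` are hypothetical, and so is the three-document corpus. The page's real example documents are not visible in this diff, so the corpus below was chosen only so that it reproduces the same index pattern as the d₁, d₂, d₃ vectors shown above. Whitespace tokenization and first-seen vocabulary order are also assumptions.

```python
def build_vocabulary(corpus):
    """Collect the unique words of the corpus in first-seen order (assumed ordering)."""
    vocab = []
    seen = set()
    for doc in corpus:
        for word in doc.split():  # assumption: simple whitespace tokenization
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


def one_hot_document(doc, vocab):
    """Return one d-dimensional one-hot vector per word of the document."""
    index = {word: i for i, word in enumerate(vocab)}
    d = len(vocab)  # number of dimensions = vocabulary size
    vectors = []
    for word in doc.split():
        vec = [0] * d
        vec[index[word]] = 1  # exactly one position is set per word
        vectors.append(vec)
    return vectors


# Hypothetical corpus, chosen to match the index pattern of the diff's example.
corpus = ["cats chase mice", "cats chase small mice", "dogs cats"]
vocab = build_vocabulary(corpus)

for doc in corpus:
    print(doc, "->", one_hot_document(doc, vocab))
```

Under these assumptions the vocabulary has five entries, so every word vector has five dimensions, and each document becomes a list with one vector per word rather than a single fixed-length vector.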