Harika22 committed
Commit 9408831 · verified · 1 Parent(s): e4f9e2d

Update pages/6_Feature_Engineering.py

Files changed (1)
  1. pages/6_Feature_Engineering.py +27 -14
pages/6_Feature_Engineering.py CHANGED
@@ -141,17 +141,30 @@ file_type = st.sidebar.radio(
 if file_type == "One-Hot Vectorization":
     st.title(":red[One-Hot Vectorization]")
-    st.markdown('''
-    - It is type of vectorization technique where we can convert text into vector
-    - Steps in One-Hot vectorization
-    - 1. Create a vocabulary (set of all unique words in collected corpus)
-    - 2. Find the length of the vocabulary
-    - 3. Converting every document into vector form
-    - Every unique word into vector (where vector will have → d-dimension → len(vocabulary))
-    - Every dimension belongs to unique word
-    - Here we're not converting document into vector , we're converting each and every word to vector form and then combining it to form vector
-    - This technique is called One-Hot Vectorization
-    ''')

     st.markdown("""
     | **Word** | **Vector Representation** |
@@ -165,9 +178,9 @@ if file_type == "One-Hot Vectorization":
     st.markdown("""
     ### 📝 Document Representations:
-    - **d₁ → v₁** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
-    - **d₂ → v₂** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
-    - **d₃ → v₃** → `[[0,0,0,0,1], [1,0,0,0,0]]`

     ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
     """)
 
 if file_type == "One-Hot Vectorization":
     st.title(":red[One-Hot Vectorization]")
+    st.markdown("""
+    ### 📌 What is One-Hot Vectorization?
+    - It is a type of vectorization technique where text is converted into a numerical vector.
+    - This technique helps in representing words as unique vectors for machine learning models.
+    """)
+
+    st.markdown("""
+    ### 🛠️ Steps in One-Hot Vectorization:
+    1️⃣ Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
+    2️⃣ Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
+    3️⃣ Convert Each Word into a Vector:
+    - 📌 Every unique word is transformed into a vector.
+    - 📌 Each vector has d-dimensions, where each dimension corresponds to a unique word.
+    - 📌 Words are converted individually, and then combined to form a vector.
+
+    ✅ This technique ensures that each word is treated uniquely and efficiently in NLP tasks.
+    """)
+
+    st.markdown("""
+    ### 🎯 Key Takeaways:
+    - 🎯 Each word gets a unique vector representation.
+    - 🎯 The number of dimensions = total vocabulary size.
+    - 🎯 Words are vectorized separately, then combined into document vectors.
+    """)

     st.markdown("""
     | **Word** | **Vector Representation** |
 
     st.markdown("""
     ### 📝 Document Representations:
+    - d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
+    - d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
+    - d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`

     ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
     """)
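
The steps described in the page text (build a vocabulary, take its length as d, map each word to a d-dimensional vector, then combine the word vectors per document) can be sketched in plain Python. This is a minimal illustration and not code from the commit. The corpus and the helper names `build_vocabulary`, `one_hot`, and `vectorize_document` are hypothetical. The words were chosen only so that the result reproduces the vector patterns listed under Document Representations, since the diff does not show the page's actual vocabulary.

```python
# Hypothetical three-document corpus; the real words used on the page are not shown in the diff.
corpus = [
    "dogs chase cats",
    "dogs chase small cats",
    "birds dogs",
]


def build_vocabulary(documents):
    """Step 1: map each unique word to an index, in first-seen order."""
    vocabulary = {}
    for doc in documents:
        for word in doc.lower().split():
            # Assign the next free index only if the word is new.
            vocabulary.setdefault(word, len(vocabulary))
    return vocabulary


def one_hot(word, vocabulary):
    """Step 3a: a d-dimensional vector with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)  # Step 2: d = len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


def vectorize_document(doc, vocabulary):
    """Step 3b: vectorize each word separately, then combine into a list of vectors."""
    return [one_hot(word, vocabulary) for word in doc.lower().split()]


vocabulary = build_vocabulary(corpus)
doc_vectors = [vectorize_document(doc, vocabulary) for doc in corpus]

for doc, vectors in zip(corpus, doc_vectors):
    print(f"{doc!r} -> {vectors}")
```

Under these assumptions the vocabulary has five entries, so every word vector has five dimensions, and each document becomes a list with one vector per word. Whitespace splitting and lowercasing are simplifications. A real pipeline would likely use a proper tokenizer.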