Harika22 committed on
Commit
51ac896
·
verified ·
1 Parent(s): 094e4c7

Update pages/7_Advance_vectorization_techniques.py

pages/7_Advance_vectorization_techniques.py CHANGED
@@ -147,7 +147,6 @@ if file_type == "Word2Vec":
147
  st.title(":red[Word2Vec]")
148
  st.markdown(
149
  """
150
- <div class='box'>
151
  <h3 style='color: #6A0572;'>📌 How Word2Vec Works?</h3>
152
  <ul>
153
  <li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
@@ -156,19 +155,16 @@ if file_type == "Word2Vec":
156
  <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
157
  { w1: [v1], w2: [v2], w3: [v3] }
158
  </pre>
159
- </div>
160
  """,
161
  unsafe_allow_html=True,
162
  )
163
  st.markdown(
164
  """
165
- <div class='box'>
166
  <h3 style='color: #6A0572;'>⚙️ Training vs. Test Time</h3>
167
  <ul>
168
  <li><strong>Training Time</strong>: <span class='highlight'>Corpus + Deep Learning Algorithm</span> → Generates Model</li>
169
  <li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
170
  </ul>
171
- </div>
172
  """,
173
  unsafe_allow_html=True,
174
  )
@@ -187,17 +183,62 @@ if file_type == "Word2Vec":
187
 
188
  st.markdown(
189
  """
190
- <div class='box'>
191
  <h3 style='color: #6A0572;'>📚 Why is Corpus Important?</h3>
192
  <ul>
193
  <li>The <strong>Word2Vec algorithm</strong> is completely dependent on the corpus</li>
194
  <li>Better corpus → Better word representation</li>
195
  <li>It <strong>preserves semantic meaning</strong> using neighborhood words (context)</li>
196
  </ul>
197
- </div>
198
  """,
199
  unsafe_allow_html=True,
200
  )
201
  st.markdown('''
202
- -
203
  ''')
147
  st.title(":red[Word2Vec]")
148
  st.markdown(
149
  """
 
150
  <h3 style='color: #6A0572;'>📌 How Word2Vec Works?</h3>
151
  <ul>
152
  <li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
 
155
  <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
156
  { w1: [v1], w2: [v2], w3: [v3] }
157
  </pre>
 
158
  """,
159
  unsafe_allow_html=True,
160
  )
161
  st.markdown(
162
  """
 
163
  <h3 style='color: #6A0572;'>⚙️ Training vs. Test Time</h3>
164
  <ul>
165
  <li><strong>Training Time</strong>: <span class='highlight'>Corpus + Deep Learning Algorithm</span> → Generates Model</li>
166
  <li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
167
  </ul>
 
168
  """,
169
  unsafe_allow_html=True,
170
  )
 
183
 
184
  st.markdown(
185
  """
 
186
  <h3 style='color: #6A0572;'>📚 Why is Corpus Important?</h3>
187
  <ul>
188
  <li>The <strong>Word2Vec algorithm</strong> is completely dependent on the corpus</li>
189
  <li>Better corpus → Better word representation</li>
190
  <li>It <strong>preserves semantic meaning</strong> using neighborhood words (context)</li>
191
  </ul>
 
192
  """,
193
  unsafe_allow_html=True,
194
  )
195
  st.markdown('''
196
+ - Word2Vec converts individual words into vectors, not entire documents
197
+ - Two techniques can convert an entire document into a vector
198
+ - They are:
199
+ - Average Word2Vec
200
+ - TF-IDF Word2Vec
201
  ''')
202
+
203
+ st.subheader(":blue[Average Word2Vec]")
204
+ st.markdown(
205
+ """
206
+ <h3 style='color: #6A0572;'>📌 Step-by-Step Process</h3>
207
+ <ul>
208
+ <li>Given a document <span class='highlight'>d1</span>: <strong>w1, w2, w3</strong></li>
209
+ <li>Retrieve vector representations <strong>v1, v2, v3</strong> from Word2Vec</li>
210
+ <li>Perform <span class='highlight'>element-wise addition</span> of vectors:
211
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
212
+ v_total = v1 + v2 + v3
213
+ </pre>
214
+ </li>
215
+ <li>Normalize by dividing by the total number of words (element-wise division):
216
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
217
+ v_avg = v_total / len(d1)
218
+ </pre>
219
+ </li>
220
+ <li>Final representation contains the <span class='highlight'>average meaning</span> of all words</li>
221
+ </ul>
222
+ """,
223
+ unsafe_allow_html=True,
224
+ )
225
+
226
+ st.markdown(
227
+ """
228
+ <h3 style='color: #6A0572;'>⚠️ Problem: Equal Importance to Every Word</h3>
229
+ <ul>
230
+ <li>Average Word2Vec assigns <span class='highlight'>equal weight</span> to all words</li>
231
+ <li>No emphasis on <strong>important words</strong> that carry significant meaning</li>
232
+ <li>This limits how well the document vector reflects <span class='highlight'>word importance</span></li>
233
+ </ul>
234
+ """,
235
+ unsafe_allow_html=True,
236
+ )
237
+
238
+ st.markdown(
239
+ """
240
+ <strong>Average Word2Vec averages word meanings but gives no extra weight to important words!</strong>
241
+ """,
242
+ unsafe_allow_html=True,
243
+ )
244
+
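The Average Word2Vec steps described in the added page text (look up each word's vector, add element-wise, divide by the word count) can be sketched in plain Python. This is a minimal illustration, not code from the commit: the `word_vectors` table, its values, and the `average_word2vec` helper are hypothetical stand-ins for a trained model's `{ w1: [v1], w2: [v2], w3: [v3] }` dictionary. One assumption differs slightly from `v_total / len(d1)`: the sketch divides by the number of words actually found in the table, so unknown words do not pull the average toward zero.

```python
# Hypothetical toy lookup table standing in for a trained Word2Vec model.
# Real vectors would come from training on a corpus.
word_vectors = {
    "w1": [1.0, 2.0, 3.0],
    "w2": [3.0, 2.0, 1.0],
    "w3": [2.0, 2.0, 2.0],
}


def average_word2vec(tokens, vectors, dim):
    """Return the element-wise mean of the vectors of the known tokens."""
    # Test-time step: each word is looked up in the dictionary.
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        # Assumption: a document with no known words maps to the zero vector.
        return [0.0] * dim
    # Element-wise addition: v_total = v1 + v2 + v3
    v_total = [sum(components) for components in zip(*found)]
    # Element-wise division by the number of words that were found.
    return [x / len(found) for x in v_total]


d1 = ["w1", "w2", "w3"]
v_avg = average_word2vec(d1, word_vectors, dim=3)
print(v_avg)  # [2.0, 2.0, 2.0]
```

Every word contributes equally to `v_avg` here, which is the limitation the page describes: a frequent filler word shifts the document vector just as much as a rare, informative one.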