Harika22 committed on
Commit
6f30acf
·
verified ·
1 Parent(s): 51ac896

Update pages/7_Advance_vectorization_techniques.py

pages/7_Advance_vectorization_techniques.py CHANGED
@@ -240,5 +240,67 @@ if file_type == "Word2Vec":
  <strong>Word2Vec averages word meanings, but lacks weightage for important words! </strong>
  """,
  unsafe_allow_html=True,
+ )
+
+ st.subheader(":blue[TF-IDF Word2Vec]")
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>⚠️ Issue with Word2Vec</h3>
+ <ul>
+ <li>Gives equal importance to every word</li>
+ <li>Even words that appear frequently in a document but rarely in the corpus get equal weight</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>🚀 Solution: Adding Weightage</h3>
+ <ul>
+ <li>Consider a document with 3 words: <strong>w1, w2, w3</strong></li>
+ <li>Each word has a vector representation:
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+ w1 → v1, w2 → v2, w3 → v3
+ </pre>
+ </li>
+ <li>We use <span class='highlight'>two models</span>:
+ <ul>
+ <li><strong>TF-IDF</strong> → Computes weightage for each word</li>
+ <li><strong>Word2Vec</strong> → Converts words into vectors</li>
+ </ul>
+ </li>
+ <li>For each word, multiply its TF-IDF value with its vector</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <div class='formula'>
+ <strong>Final Weighted Representation:</strong>
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+ v_final = (TF-IDF(w1) * v1 + TF-IDF(w2) * v2 + TF-IDF(w3) * v3)
+ / (TF-IDF(w1) + TF-IDF(w2) + TF-IDF(w3))
+ </pre>
+ </div>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <div class='box'>
+ <h3 style='color: #6A0572;'> Why This Works?</h3>
+ <ul>
+ <li><span class='highlight'>Instead of equal weighting (1)</span>, we use TF-IDF values</li>
+ <li>Gives <strong>more importance</strong> to words that are key in the document</li>
+ <li>Improves the <strong>semantic representation</strong> of text</li>
+ </ul>
+ </div>
+ """,
+ unsafe_allow_html=True,
  )
+
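
The weighted-average formula added in this commit can be sketched in plain Python. This is a minimal illustration and not code from the commit: the function names `tfidf_weights` and `tfidf_weighted_vector` are hypothetical, the word vectors are assumed to come from an already trained Word2Vec model (toy 2-dimensional lists stand in for them here), and the TF-IDF weights use one common smoothed IDF variant, so other TF-IDF definitions would give different numbers.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """Return {word: tf * idf} for one tokenized document.

    Assumption: idf = log((1 + N) / (1 + df)) + 1, a common smoothed variant.
    """
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    weights = {}
    for word, count in counts.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def tfidf_weighted_vector(weights, vectors):
    """Compute sum(weight_i * v_i) / sum(weight_i) over the weighted words."""
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, w in weights.items():
        if word not in vectors:
            continue  # skip words that have no vector (out of vocabulary)
        total = [t + w * x for t, x in zip(total, vectors[word])]
        weight_sum += w
    if weight_sum == 0:
        return total  # all zeros when no word could be weighted
    return [t / weight_sum for t in total]


# Toy check with hand-picked weights: (3 * [1, 0] + 1 * [0, 1]) / (3 + 1)
print(tfidf_weighted_vector({"a": 3.0, "b": 1.0},
                            {"a": [1.0, 0.0], "b": [0.0, 1.0]}))
# -> [0.75, 0.25]
```

With all weights set to 1 the function reduces to the plain average of the word vectors, which is the equal-weighting case the page text contrasts against.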