Harika22 committed on
Commit
e9ca576
·
verified ·
1 Parent(s): 2e651b3

Update pages/7_Advance_vectorization_techniques.py

pages/7_Advance_vectorization_techniques.py CHANGED
@@ -328,7 +328,7 @@ if file_type == "Word2Vec":
  </ul>
  </li>
  <li>Apply a <span class='highlight'>window size</span> of 2 (how many neighbors we consider).</li>
- <li>Slide the window over the text with <span class='highlight'>stride = 1</span>.</li>
  </ul>
  """,
  unsafe_allow_html=True,
@@ -367,5 +367,80 @@ if file_type == "Word2Vec":
  """,
  unsafe_allow_html=True,
  )
-
 
  </ul>
  </li>
  <li>Apply a <span class='highlight'>window size</span> of 2 (how many neighbors we consider).</li>
+ <li>Slide the window over the text with <span class='highlight'>slide = 1</span>.</li>
  </ul>
  """,
  unsafe_allow_html=True,
 
  """,
  unsafe_allow_html=True,
  )
+
+ st.subheader(":red[Skipgram]")
+ st.markdown(
+ """
+ <div class='box'>
+ <h3 style='color: #6A0572;'>What is Skipgram?</h3>
+ <p><strong>Skipgram</strong> is a technique where we use focus words to predict the context words.</p>
+ </div>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>📂 Example Corpus</h3>
+ <ul>
+ <li><strong>d1:</strong> w1, w2, w3, w4, w5, w4</li>
+ <li><strong>d2:</strong> w3, w4, w5, w2, w1, w2, w3, w4</li>
+ </ul>
+ <p>We first preprocess the data to extract meaningful relationships.</p>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>📌 Steps to Process the Data</h3>
+ <ul>
+ <li>Create a <span class='highlight'>vocabulary</span> from the entire corpus: <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">{w1, w2, w3, w4, w5}</pre></li>
+ <li>Generate a <strong>tabular dataset</strong> with:
+ <ul>
+ <li><strong>Feature variables (Focus Words)</strong></li>
+ <li><strong>Class variables (Context Words)</strong></li>
+ </ul>
+ </li>
+ <li>Apply a <span class='highlight'>window size</span> of 2 (how many neighbors we consider).</li>
+ <li>Slide the window over the text with <span class='highlight'>slide = 1</span>.</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'> Handling Variable Context Length</h3>
+ <ul>
+ <li>To ensure a consistent feature length, we use <strong>zero-padding</strong> when needed.</li>
+ <li>The model learns the relationships between <span class='highlight'>focus words</span> and their context words.</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <strong>Mathematical Representation:</strong>
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+ y = f(xi)
+ where,
+ y = Context Word
+ xi = Focus Words
+ </pre>
+ """,
+ unsafe_allow_html=True,
+ )
+
+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'> Training with Artificial Neural Networks</h3>
+ <p>The tabular data is passed to an <strong>Artificial Neural Network (ANN)</strong> which learns:</p>
+ <ul>
+ <li>How <span class='highlight'>focus words</span> are related to <span class='highlight'>context words</span>.</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
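
Note on the added Skipgram section: the preprocessing steps it describes (build a vocabulary, then slide a window of size 2 one token at a time to produce focus/context rows) can be sketched in a few lines of Python. This is a minimal illustration, not code from the commit. The helper names `build_vocabulary` and `skipgram_pairs` are hypothetical, and it assumes "window size 2" means up to two neighbors on each side of the focus word, which the page text does not state explicitly.

```python
# Example corpus copied from the page text in the diff above.
corpus = {
    "d1": ["w1", "w2", "w3", "w4", "w5", "w4"],
    "d2": ["w3", "w4", "w5", "w2", "w1", "w2", "w3", "w4"],
}


def build_vocabulary(docs):
    """Collect the unique tokens across all documents (sorted for stable output)."""
    return sorted({token for tokens in docs.values() for token in tokens})


def skipgram_pairs(tokens, window=2):
    """Yield (focus, context) rows, moving the focus position one token at a time.

    Assumption: `window` counts neighbors on EACH side of the focus word.
    """
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the focus word is never its own context
                yield focus, tokens[j]


vocabulary = build_vocabulary(corpus)
# Windows are built per document, so pairs never cross a document boundary.
pairs = [pair for tokens in corpus.values() for pair in skipgram_pairs(tokens)]

print(vocabulary)
print(len(pairs), pairs[:4])
```

Each row pairs one focus word (the feature) with one context word (the class), which matches the tabular dataset the section describes. Emitting one row per pair keeps every row the same length without padding; the page's zero-padding approach would apply if all context words for a focus position were kept in a single row instead.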