Harika22 committed on
Commit bc05235 · verified · 1 Parent(s): 5881f79

Update pages/7_Advance_vectorization_techniques.py

pages/7_Advance_vectorization_techniques.py CHANGED
@@ -497,8 +497,53 @@ elif file_type == "Fasttext":
  unsafe_allow_html=True,
  )



- st.markdown('''Example :
- -
- ''')
  unsafe_allow_html=True,
  )

+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>Implementing CBOW with Character N-Grams</h3>
+ <ul>
+ <li><span class='highlight'>Window Size</span>: 5 words (the focus word plus its context words)</li>
+ <li><span class='highlight'>Window</span>: 2 context words on each side of the focus word</li>
+ <li><span class='highlight'>Slide</span>: 1 word per step</li>
+ </ul>
+ <p>Sliding the window over the text produces a table that pairs each set of <strong>context words</strong> with its <strong>focus word</strong>.</p>
+ """,
+ unsafe_allow_html=True,
+ )
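Note on the window settings: the context/focus table described in this block can be sketched in a few lines of Python. This is a minimal illustration, not code from the commit. The function name `cbow_pairs` and the sample sentence are hypothetical, and it assumes "Window: 2" means two context words on each side of the focus word.

```python
def cbow_pairs(tokens, window=2):
    """Pair every focus word with up to `window` words on each side.

    Hypothetical helper for illustration; not part of the committed file.
    """
    pairs = []
    for i, focus in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # words before the focus word
        right = tokens[i + 1:i + 1 + window]      # words after the focus word
        pairs.append((left + right, focus))
    return pairs


# Assumed sample sentence, chosen only to show the table rows.
tokens = "apple is good for health".split()
pairs = cbow_pairs(tokens, window=2)
for context, focus in pairs:
    print(context, "-->", focus)
```

Words near the start or end of the sentence simply get fewer context words, which is one common way to handle the edges.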
+
+ st.markdown(
+ """
+ <h3 style='color: #1D3557;'>Context Words & Focus Words</h3>
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+ Context Words: &lt;ap, pp, pl, le&gt; --> "is"
+ Focus Words: &lt;go, oo, od&gt; --> "good"
+ </pre>
+ <p>Here, <strong>&lt;</strong> and <strong>&gt;</strong> mark the start and end of a word, so the model can tell prefixes and suffixes apart from n-grams inside the word.</p>
+ """,
+ unsafe_allow_html=True,
+ )
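Note on the n-gram example: the character n-grams of a word, including the boundary markers, can be generated as sketched here. The helper `char_ngrams` is hypothetical and the n-gram length of 2 is an assumption taken from the bigram-style example in this block. The fastText library itself defaults to lengths 3 through 6.

```python
def char_ngrams(word, n=2):
    """Return the character n-grams of `word` wrapped in < and > markers.

    Hypothetical helper for illustration; not part of the committed file.
    """
    marked = f"<{word}>"  # boundary markers flag the start and end of the word
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


print(char_ngrams("good", n=2))   # ['<g', 'go', 'oo', 'od', 'd>']
print(char_ngrams("good", n=3))   # ['<go', 'goo', 'ood', 'od>']
```

With the markers in place, the prefix `<g` and the suffix `d>` are distinct keys from any `g` or `d` n-gram that occurs inside a word.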

+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>Vocabulary</h3>
+ <p>The vocabulary consists of <span class='highlight'>unique character n-grams</span>.</p>
+ <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+ { keys: values }
+ where,
+ - Keys: Character n-grams
+ - Values: Vector representations
+ </pre>
+ """,
+ unsafe_allow_html=True,
+ )

+ st.markdown(
+ """
+ <h3 style='color: #6A0572;'>FastText Model</h3>
+ <ul>
+ <li>This dictionary of n-gram vectors is the <span class='highlight'>FastText model</span>.</li>
+ <li>Text is broken down into <strong>character n-grams</strong> to generate vector representations.</li>
+ <li>The n-gram vectors of a word are combined by <span class='highlight'>element-wise addition</span> and averaged, giving a single <strong>vector representation</strong> of the word.</li>
+ </ul>
+ """,
+ unsafe_allow_html=True,
+ )
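Note on the vocabulary and model blocks: the dictionary lookup and element-wise averaging they describe can be sketched as follows. This is an illustration under stated assumptions, not the fastText implementation. The names `build_vocab` and `word_vector` are hypothetical, the vectors are random stand-ins rather than trained values, and the vector size of 4 is arbitrary.

```python
import random


def char_ngrams(word, n=2):
    """Character n-grams of `word` wrapped in < and > boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def build_vocab(words, dim=4, n=2, seed=0):
    """Map each unique n-gram to a vector (random here, learned in a real model)."""
    rng = random.Random(seed)
    vocab = {}
    for word in words:
        for gram in char_ngrams(word, n):
            if gram not in vocab:
                vocab[gram] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return vocab


def word_vector(word, vocab, n=2):
    """Add the known n-gram vectors element-wise, then divide by their count."""
    grams = [g for g in char_ngrams(word, n) if g in vocab]
    if not grams:
        raise KeyError(f"no known n-grams for {word!r}")
    total = [0.0] * len(vocab[grams[0]])
    for gram in grams:
        total = [t + v for t, v in zip(total, vocab[gram])]
    return [t / len(grams) for t in total]


vocab = build_vocab(["apple", "is", "good"])
# "goods" was never seen as a word, but most of its n-grams are in the vocabulary.
vec = word_vector("goods", vocab)
print(len(vec))
```

Because the lookup works on n-grams rather than whole words, an unseen word still gets a vector as long as some of its n-grams are known.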