Update pages/7_Advance_vectorization_techniques.py
pages/7_Advance_vectorization_techniques.py
CHANGED
|
@@ -497,8 +497,53 @@ elif file_type == "Fasttext":
|
|
| 497 |
unsafe_allow_html=True,
|
| 498 |
)
|
| 499 |
|
| 500 |
|
| 501 |
|
| 502 |
-
st.markdown(
|
| 503 |
-
|
| 504 |
-
|
| 497 |
unsafe_allow_html=True,
|
| 498 |
)
|
| 499 |
|
| 500 |
+
st.markdown(
|
| 501 |
+
"""
|
| 502 |
+
<h3 style='color: #6A0572;'>Implementing CBOW with Character N-Grams</h3>
|
| 503 |
+
<ul>
|
| 504 |
+
<li><span class='highlight'>Window Size</span>: 5</li>
|
| 505 |
+
<li><span class='highlight'>Window</span>: 2 (context words on each side of the focus word)</li>
|
| 506 |
+
<li><span class='highlight'>Slide</span>: 1</li>
|
| 507 |
+
</ul>
|
| 508 |
+
<p>Sliding the window over the text produces a table of <strong>context words</strong> and their <strong>focus words</strong>.</p>
|
| 509 |
+
""",
|
| 510 |
+
unsafe_allow_html=True,
|
| 511 |
+
)
|
| 512 |
+
|
| 513 |
+
st.markdown(
|
| 514 |
+
"""
|
| 515 |
+
<h3 style='color: #1D3557;'>Context Words &amp; Focus Words</h3>
|
| 516 |
+
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 517 |
+
Context Words: &lt;ap, pp, pl, le&gt; --&gt; "is"
|
| 518 |
+
Focus Words: &lt;go, oo, od&gt; --&gt; "good"
|
| 519 |
+
</pre>
|
| 520 |
+
<p>Here, the angle brackets <strong>&lt; &gt;</strong> mark where a word's character n-grams begin and end, so the model can recognize word boundaries.</p>
|
| 521 |
+
""",
|
| 522 |
+
unsafe_allow_html=True,
|
| 523 |
+
)
|
| 524 |
|
| 525 |
+
st.markdown(
|
| 526 |
+
"""
|
| 527 |
+
<h3 style='color: #6A0572;'>Vocabulary</h3>
|
| 528 |
+
<p>The vocabulary consists of <span class='highlight'>unique character n-grams</span>.</p>
|
| 529 |
+
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 530 |
+
{ keys: values }
|
| 531 |
+
where,
|
| 532 |
+
- Keys: Character n-grams
|
| 533 |
+
- Values: Vector representations
|
| 534 |
+
</pre>
|
| 535 |
+
""",
|
| 536 |
+
unsafe_allow_html=True,
|
| 537 |
+
)
|
| 538 |
|
| 539 |
+
st.markdown(
|
| 540 |
+
"""
|
| 541 |
+
<h3 style='color: #6A0572;'>FastText Model</h3>
|
| 542 |
+
<ul>
|
| 543 |
+
<li>The resulting dictionary of n-gram vectors is the <span class='highlight'>FastText model</span>.</li>
|
| 544 |
+
<li>Text is broken down into <strong>character n-grams</strong> to generate vector representations.</li>
|
| 545 |
+
<li>A word's vector is computed by <span class='highlight'>element-wise addition</span> of its n-gram vectors, giving an <strong>average 2D representation</strong> of the word.</li>
|
| 546 |
+
</ul>
|
| 547 |
+
""",
|
| 548 |
+
unsafe_allow_html=True,
|
| 549 |
+
)
|
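Note on the added text: the steps it describes (sliding a window to build context/focus rows, splitting words into character n-grams, and combining n-gram vectors into a word vector) can be sketched in plain Python as below. This is only an illustrative sketch, not code from this commit. The sentence "apple is good", the helper names `char_ngrams`, `cbow_rows` and `word_vector`, and the toy 2D vectors are all assumptions made for the example. It uses plain bigrams without boundary markers to match the page's example, whereas the real FastText library adds `<` and `>` to each word and defaults to n-grams of length 3 to 6.

```python
def char_ngrams(word, n=2):
    """Return the character n-grams of `word` (no boundary markers)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def cbow_rows(tokens, window=2):
    """Build (context_words, focus_word) rows, sliding one token at a time.

    `window` is the number of context words taken on each side, so
    window=2 corresponds to a total window size of 5.
    """
    rows = []
    for i, focus in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        rows.append((left + right, focus))
    return rows


def word_vector(word, vocab, n=2):
    """Average the vectors of the word's n-grams that exist in `vocab`.

    `vocab` is a hypothetical {n-gram: vector} dictionary standing in
    for a trained model.
    """
    grams = [g for g in char_ngrams(word, n) if g in vocab]
    if not grams:
        return None  # no known n-grams, so no vector can be built
    dim = len(vocab[grams[0]])
    total = [0.0] * dim
    for g in grams:
        for k, value in enumerate(vocab[g]):
            total[k] += value  # element-wise addition
    return [t / len(grams) for t in total]


# Toy vocabulary with made-up 2D vectors (assumed values, not trained).
toy_vocab = {"go": [1.0, 0.0], "oo": [0.0, 1.0], "od": [2.0, 2.0]}

print(char_ngrams("apple"))
print(cbow_rows(["apple", "is", "good"]))
print(word_vector("good", toy_vocab))
```

Because the word vector is assembled from n-gram vectors, a word that never appeared in training can still get a representation as long as some of its n-grams are in the vocabulary.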