Update pages/7_Advance_vectorization_techniques.py
pages/7_Advance_vectorization_techniques.py
@@ -328,7 +328,7 @@ if file_type == "Word2Vec":
 </ul>
 </li>
 <li>Apply a <span class='highlight'>window size</span> of 2 (how many neighbors we consider).</li>
-<li>Slide the window over the text with <span class='highlight'>
+<li>Slide the window over the text with <span class='highlight'>stride = 1</span>.</li>
 </ul>
 """,
 unsafe_allow_html=True,
@@ -367,5 +367,80 @@ if file_type == "Word2Vec":
 """,
 unsafe_allow_html=True,
 )
-
+
+st.subheader(":red[Skipgram]")
+st.markdown(
+"""
+<div class='box'>
+<h3 style='color: #6A0572;'>What is Skipgram?</h3>
+<p><strong>Skipgram</strong> is a Word2Vec technique that uses each focus word to predict the context words around it.</p>
+</div>
+""",
+unsafe_allow_html=True,
+)
+
+st.markdown(
+"""
+<h3 style='color: #6A0572;'>Example Corpus</h3>
+<ul>
+<li><strong>d1:</strong> w1, w2, w3, w4, w5, w4</li>
+<li><strong>d2:</strong> w3, w4, w5, w2, w1, w2, w3, w4</li>
+</ul>
+<p>We first preprocess the data into (focus word, context word) pairs the model can learn from.</p>
+""",
+unsafe_allow_html=True,
+)
+
+st.markdown(
+"""
+<h3 style='color: #6A0572;'>Steps to Process the Data</h3>
+<ul>
+<li>Create a <span class='highlight'>vocabulary</span> from the entire corpus: <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">{w1, w2, w3, w4, w5}</pre></li>
+<li>Generate a <strong>tabular dataset</strong> with:
+<ul>
+<li><strong>Feature variables (Focus Words)</strong></li>
+<li><strong>Class variables (Context Words)</strong></li>
+</ul>
+</li>
+<li>Apply a <span class='highlight'>window size</span> of 2 (how many neighbors we consider).</li>
+<li>Slide the window over the text with <span class='highlight'>stride = 1</span>.</li>
+</ul>
+""",
+unsafe_allow_html=True,
+)
+
+st.markdown(
+"""
+<h3 style='color: #6A0572;'>Handling Variable Context Length</h3>
+<ul>
+<li>To ensure a consistent feature length, we use <strong>zero-padding</strong> when the window is cut short at the start or end of a document.</li>
+<li>The model learns the relationships between <span class='highlight'>focus words</span> and their context words.</li>
+</ul>
+""",
+unsafe_allow_html=True,
+)
+
+st.markdown(
+"""
+<strong>Mathematical Representation:</strong>
+<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+y = f(xi)
+where
+y = Context Word
+xi = Focus Words
+</pre>
+""",
+unsafe_allow_html=True,
+)
+
+st.markdown(
+"""
+<h3 style='color: #6A0572;'>Training with Artificial Neural Networks</h3>
+<p>The tabular data is passed to an <strong>Artificial Neural Network (ANN)</strong>, which learns:</p>
+<ul>
+<li>How <span class='highlight'>focus words</span> are related to <span class='highlight'>context words</span>.</li>
+</ul>
+""",
+unsafe_allow_html=True,
+)
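The preprocessing steps in the Skipgram section (vocabulary, window size 2, sliding one word at a time, focus/context table) can be sketched in plain Python on the example corpus d1, d2. This is a minimal sketch, not the app's code: it assumes the window of 2 counts neighbors on each side of the focus word, and `build_vocab` and `skipgram_pairs` are hypothetical helper names.

```python
# Example corpus from the Skipgram section: two documents of word tokens.
corpus = {
    "d1": ["w1", "w2", "w3", "w4", "w5", "w4"],
    "d2": ["w3", "w4", "w5", "w2", "w1", "w2", "w3", "w4"],
}


def build_vocab(docs):
    """Return the sorted set of unique tokens across all documents."""
    return sorted({token for tokens in docs.values() for token in tokens})


def skipgram_pairs(tokens, window=2, stride=1):
    """Pair each focus word with every word up to `window` positions on
    either side (assumption: symmetric window), moving `stride` words at a time."""
    pairs = []
    for i in range(0, len(tokens), stride):
        focus = tokens[i]
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the focus word is not its own context
                pairs.append((focus, tokens[j]))
    return pairs


vocab = build_vocab(corpus)
# Tabular dataset: feature = focus word, class = context word.
table = [pair for tokens in corpus.values() for pair in skipgram_pairs(tokens)]
print(vocab)      # ['w1', 'w2', 'w3', 'w4', 'w5']
print(table[:4])  # [('w1', 'w2'), ('w1', 'w3'), ('w2', 'w1'), ('w2', 'w3')]
```

Each row of `table` is one (feature, class) example, so a focus word with four neighbors contributes four rows.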
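One way to read the zero-padding bullet: near the start or end of a document a focus word has fewer than 2 × window neighbors, so its row of context ids is padded with zeros up to a fixed width. This sketch assumes words are mapped to integer ids starting at 1 (so 0 is free to mean "padding") and pads on the right; `encode` and `pad_contexts` are hypothetical helpers.

```python
def encode(tokens, vocab):
    """Map each token to an integer id; ids start at 1 so 0 can mean 'padding'."""
    index = {word: i + 1 for i, word in enumerate(vocab)}
    return [index[t] for t in tokens]


def pad_contexts(ids, window=2, pad_value=0):
    """Return (focus_id, context_ids) rows, each context padded to 2 * window."""
    width = 2 * window
    rows = []
    for i, focus in enumerate(ids):
        # Neighbors on the left and right, truncated at document boundaries.
        context = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
        context += [pad_value] * (width - len(context))  # right-pad with zeros
        rows.append((focus, context))
    return rows


vocab = ["w1", "w2", "w3", "w4", "w5"]
d1 = ["w1", "w2", "w3", "w4", "w5", "w4"]
rows = pad_contexts(encode(d1, vocab))
for focus, context in rows:
    print(focus, context)
```

Every row now has the same length, at the cost of a padding id that the model has to learn to ignore.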
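The ANN step, y = f(xi), can be sketched as a tiny skip-gram network with one hidden layer. The focus word selects a row of an input weight matrix (its embedding). Every vocabulary word is scored by a dot product with an output weight matrix, and a softmax turns the scores into probabilities for the context word. The network is trained with cross-entropy and plain SGD. This is a pure-Python illustration under those assumptions (full softmax, embedding size 3, learning rate 0.1, symmetric window of 2). Production Word2Vec implementations add tricks such as negative sampling, and all names here are hypothetical.

```python
import math
import random

VOCAB = ["w1", "w2", "w3", "w4", "w5"]
CORPUS = [
    ["w1", "w2", "w3", "w4", "w5", "w4"],
    ["w3", "w4", "w5", "w2", "w1", "w2", "w3", "w4"],
]


def skipgram_pairs(tokens, window=2):
    """(focus, context) pairs with a symmetric window and stride 1 (assumption)."""
    return [
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(max(0, i - window), min(len(tokens), i + window + 1))
        if j != i
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_skipgram(pairs, vocab, dim=3, lr=0.1, epochs=200, seed=0):
    """Train a full-softmax skip-gram network; return (embeddings, mean loss per epoch)."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # W_in[v]: hidden-layer embedding of word v; W_out[v]: weights scoring v as a context word.
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for focus, context in pairs:
            f, c = index[focus], index[context]
            h = list(W_in[f])  # copy of the focus word's embedding (hidden layer)
            probs = softmax([sum(h[k] * W_out[v][k] for k in range(dim)) for v in range(V)])
            total -= math.log(probs[c])  # cross-entropy against the true context word
            # Gradient of the loss w.r.t. each score: p_v - 1[v == c].
            err = [probs[v] - (1.0 if v == c else 0.0) for v in range(V)]
            grad_h = [sum(err[v] * W_out[v][k] for v in range(V)) for k in range(dim)]
            for v in range(V):
                for k in range(dim):
                    W_out[v][k] -= lr * err[v] * h[k]
            for k in range(dim):
                W_in[f][k] -= lr * grad_h[k]
        losses.append(total / len(pairs))
    return W_in, losses


pairs = [p for doc in CORPUS for p in skipgram_pairs(doc)]
embeddings, losses = train_skipgram(pairs, VOCAB)
print(f"mean loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

After training, `embeddings[i]` is the learned vector for `VOCAB[i]`; in Word2Vec it is these input-side weights that are usually kept as the word embeddings.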