Natural_Language_Processing / pages /7_Advance_vectorization_techniques.py
Harika22's picture
Update pages/7_Advance_vectorization_techniques.py
094e4c7 verified
raw
history blame
5.71 kB
import streamlit as st
st.markdown("""
<style>
/* Set a soft background color */
body {
background-color: #eef2f7;
}
/* Style for main title */
h1 {
color: black;
font-family: 'Roboto', sans-serif;
font-weight: 700;
text-align: center;
margin-bottom: 25px;
}
/* Style for headers */
h2 {
color: black;
font-family: 'Roboto', sans-serif;
font-weight: 600;
margin-top: 30px;
}
/* Style for subheaders */
h3 {
color: red;
font-family: 'Roboto', sans-serif;
font-weight: 500;
margin-top: 20px;
}
.custom-subheader {
color: black;
font-family: 'Roboto', sans-serif;
font-weight: 600;
margin-bottom: 15px;
}
/* Paragraph styling */
p {
font-family: 'Georgia', serif;
line-height: 1.8;
color: black;
margin-bottom: 20px;
}
/* List styling with checkmark bullets */
.icon-bullet {
list-style-type: none;
padding-left: 20px;
}
.icon-bullet li {
font-family: 'Georgia', serif;
font-size: 1.1em;
margin-bottom: 10px;
color: black;
}
.icon-bullet li::before {
content: "◆";
padding-right: 10px;
color: black;
}
/* Sidebar styling */
.sidebar .sidebar-content {
background-color: #ffffff;
border-radius: 10px;
padding: 15px;
}
.sidebar h2 {
color: #495057;
}
.step-box {
font-size: 18px;
background-color: #F0F8FF;
padding: 15px;
border-radius: 10px;
box-shadow: 2px 2px 8px #D3D3D3;
line-height: 1.6;
}
.box {
font-size: 18px;
background-color: #F0F8FF;
padding: 15px;
border-radius: 10px;
box-shadow: 2px 2px 8px #D3D3D3;
line-height: 1.6;
}
.title {
font-size: 26px;
font-weight: bold;
color: #E63946;
text-align: center;
margin-bottom: 15px;
}
.formula {
font-size: 20px;
font-weight: bold;
color: #2A9D8F;
background-color: #F7F7F7;
padding: 10px;
border-radius: 5px;
text-align: center;
margin-top: 10px;
}
/* Custom button style */
.streamlit-button {
background-color: #00FFFF;
color: #000000;
font-weight: bold;
}
</style>
""", unsafe_allow_html=True)
st.header("Vectorization🧭")
st.markdown(
"""
<div class='info-box'>
<p>Vectorization is the process of converting text into vector.</p>
<p>This allows ML models to process text data effectively.</p>
</div>
""",
unsafe_allow_html=True
)
st.markdown("""
There are advance vectorization techniques.They are :
<ul class="icon-bullet">
<li>Word Embedding </li>
<li>Word2Vec </li>
<li>Fasttext</li>
</ul>
""", unsafe_allow_html=True)
st.sidebar.title("Navigation 🧭")
file_type = st.sidebar.radio(
"Choose a Vectorization technique :",
("Word2Vec", "Fasttext"))
st.header("Word Embedding Technique")
st.markdown('''
- It is a advanced vectorization technique it converts text into vectors in such a way that it preserves semantic meaning
- All the techniques which preserves semantic meaning while converting text into vector is word embedding technique
- There are 2 word embedding techniques:
- Word2Vec
- Fasttext
''')
if file_type == "Word2Vec":
st.title(":red[Word2Vec]")
st.markdown(
"""
<div class='box'>
<h3 style='color: #6A0572;'>📌 How Word2Vec Works?</h3>
<ul>
<li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
<li>The model stores a <strong>dictionary</strong> with word-vector pairs:</li>
</ul>
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
{ w1: [v1], w2: [v2], w3: [v3] }
</pre>
</div>
""",
unsafe_allow_html=True,
)
st.markdown(
"""
<div class='box'>
<h3 style='color: #6A0572;'>⚙️ Training vs. Test Time</h3>
<ul>
<li><strong>Training Time</strong>: <span class='highlight'>Corpus + Deep Learning Algorithm</span> → Generates Model</li>
<li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
</ul>
</div>
""",
unsafe_allow_html=True,
)
st.markdown(
"""
<h3 style='color: #6A0572;'>🔍 How Does It Preserve Meaning?</h3>
<ul>
<li>It learns from the <strong>context</strong> of words in the <span class='highlight'>corpus</span></li>
<li>When given a word, it checks in the dictionary and retrieves the <strong>semantic vector</strong></li>
<li>Unlike other models, <span class='highlight'>dimensions are not words</span>, but their meanings</li>
</ul>
""",
unsafe_allow_html=True,
)
st.markdown(
"""
<div class='box'>
<h3 style='color: #6A0572;'>📚 Why is Corpus Important?</h3>
<ul>
<li>The <strong>Word2Vec algorithm</strong> is completely dependent on the corpus</li>
<li>Better corpus → Better word representation</li>
<li>It <strong>preserves semantic meaning</strong> using neighborhood words (context)</li>
</ul>
</div>
""",
unsafe_allow_html=True,
)
st.markdown('''
-
''')