Update pages/7_Advance_vectorization_techniques.py
pages/7_Advance_vectorization_techniques.py
CHANGED
|
@@ -147,7 +147,6 @@ if file_type == "Word2Vec":
|
|
| 147 |
st.title(":red[Word2Vec]")
|
| 148 |
st.markdown(
|
| 149 |
"""
|
| 150 |
-
<div class='box'>
|
| 151 |
<h3 style='color: #6A0572;'>How Word2Vec Works?</h3>
|
| 152 |
<ul>
|
| 153 |
<li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
|
|
@@ -156,19 +155,16 @@ if file_type == "Word2Vec":
|
|
| 156 |
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 157 |
{ w1: [v1], w2: [v2], w3: [v3] }
|
| 158 |
</pre>
|
| 159 |
-
</div>
|
| 160 |
""",
|
| 161 |
unsafe_allow_html=True,
|
| 162 |
)
|
| 163 |
st.markdown(
|
| 164 |
"""
|
| 165 |
-
<div class='box'>
|
| 166 |
<h3 style='color: #6A0572;'>Training vs. Test Time</h3>
|
| 167 |
<ul>
|
| 168 |
<li><strong>Training Time</strong>: <span class='highlight'>Corpus + Deep Learning Algorithm</span> → Generates Model</li>
|
| 169 |
<li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
|
| 170 |
</ul>
|
| 171 |
-
</div>
|
| 172 |
""",
|
| 173 |
unsafe_allow_html=True,
|
| 174 |
)
|
|
@@ -187,17 +183,62 @@ if file_type == "Word2Vec":
|
|
| 187 |
|
| 188 |
st.markdown(
|
| 189 |
"""
|
| 190 |
-
<div class='box'>
|
| 191 |
<h3 style='color: #6A0572;'>Why is Corpus Important?</h3>
|
| 192 |
<ul>
|
| 193 |
<li>The <strong>Word2Vec algorithm</strong> is completely dependent on the corpus</li>
|
| 194 |
<li>Better corpus → Better word representation</li>
|
| 195 |
<li>It <strong>preserves semantic meaning</strong> using neighborhood words (context)</li>
|
| 196 |
</ul>
|
| 197 |
-
</div>
|
| 198 |
""",
|
| 199 |
unsafe_allow_html=True,
|
| 200 |
)
|
| 201 |
st.markdown('''
|
| 202 |
-
-
|
| 203 |
''')
|
| 147 |
st.title(":red[Word2Vec]")
|
| 148 |
st.markdown(
|
| 149 |
"""
|
|
|
|
| 150 |
<h3 style='color: #6A0572;'>How Word2Vec Works?</h3>
|
| 151 |
<ul>
|
| 152 |
<li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
|
|
|
|
| 155 |
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 156 |
{ w1: [v1], w2: [v2], w3: [v3] }
|
| 157 |
</pre>
|
|
|
|
| 158 |
""",
|
| 159 |
unsafe_allow_html=True,
|
| 160 |
)
|
| 161 |
st.markdown(
|
| 162 |
"""
|
|
|
|
| 163 |
<h3 style='color: #6A0572;'>Training vs. Test Time</h3>
|
| 164 |
<ul>
|
| 165 |
<li><strong>Training Time</strong>: <span class='highlight'>Corpus + Deep Learning Algorithm</span> → Generates Model</li>
|
| 166 |
<li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
|
| 167 |
</ul>
|
|
|
|
| 168 |
""",
|
| 169 |
unsafe_allow_html=True,
|
| 170 |
)
|
|
|
|
| 183 |
|
| 184 |
st.markdown(
|
| 185 |
"""
|
|
|
|
| 186 |
<h3 style='color: #6A0572;'>Why is Corpus Important?</h3>
|
| 187 |
<ul>
|
| 188 |
<li>The <strong>Word2Vec algorithm</strong> is completely dependent on the corpus</li>
|
| 189 |
<li>Better corpus → Better word representation</li>
|
| 190 |
<li>It <strong>preserves semantic meaning</strong> using neighborhood words (context)</li>
|
| 191 |
</ul>
|
|
|
|
| 192 |
""",
|
| 193 |
unsafe_allow_html=True,
|
| 194 |
)
|
| 195 |
st.markdown('''
|
| 196 |
+
- Word2Vec does not convert a document into a vector; it converts each word into a vector
|
| 197 |
+
- There are two techniques for converting an entire document into a vector
|
| 198 |
+
- They are:
|
| 199 |
+
- Average Word2Vec
|
| 200 |
+
- TF-IDF Word2Vec
|
| 201 |
''')
|
| 202 |
+
|
| 203 |
+
st.subheader(":blue[Average Word2Vec]")
|
| 204 |
+
st.markdown(
|
| 205 |
+
"""
|
| 206 |
+
<h3 style='color: #6A0572;'>Step-by-Step Process</h3>
|
| 207 |
+
<ul>
|
| 208 |
+
<li>Given a document <span class='highlight'>d1</span>: <strong>w1, w2, w3</strong></li>
|
| 209 |
+
<li>Retrieve vector representations <strong>v1, v2, v3</strong> from Word2Vec</li>
|
| 210 |
+
<li>Perform <span class='highlight'>element-wise addition</span> of vectors:
|
| 211 |
+
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 212 |
+
v_total = v1 + v2 + v3
|
| 213 |
+
</pre>
|
| 214 |
+
</li>
|
| 215 |
+
<li>Normalize by dividing by the total number of words (element-wise division):
|
| 216 |
+
<pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
|
| 217 |
+
v_avg = v_total / len(d1)
|
| 218 |
+
</pre>
|
| 219 |
+
</li>
|
| 220 |
+
<li>Final representation contains the <span class='highlight'>average meaning</span> of all words</li>
|
| 221 |
+
</ul>
|
| 222 |
+
""",
|
| 223 |
+
unsafe_allow_html=True,
|
| 224 |
+
)
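Review note: the step-by-step process added in this hunk can be sketched in plain Python as follows. This is a minimal illustration, not code from the page: `toy_model`, its words, and its 3-dimensional vectors are invented stand-ins for a trained Word2Vec model, and skipping out-of-vocabulary words (so the divisor is the number of words actually found) is an assumption the page does not state.

```python
# Toy stand-in for a trained Word2Vec model: word -> vector.
# The words and 3-dimensional vectors are made up for illustration.
toy_model = {
    "good": [0.9, 0.1, 0.0],
    "movie": [0.1, 0.8, 0.3],
    "plot": [0.2, 0.6, 0.7],
}


def average_word2vec(tokens, model):
    """Return the element-wise mean of the vectors of the known tokens."""
    # Look up each word; words missing from the model are skipped (assumption).
    vectors = [model[t] for t in tokens if t in model]
    if not vectors:
        raise ValueError("no token of the document is in the vocabulary")
    dim = len(vectors[0])
    # v_total = v1 + v2 + v3 (element-wise addition)
    v_total = [sum(v[i] for v in vectors) for i in range(dim)]
    # v_avg = v_total / number of vectors (element-wise division)
    return [x / len(vectors) for x in v_total]


d1 = ["good", "movie", "plot"]
v_avg = average_word2vec(d1, toy_model)
print(v_avg)
```

The result has the same dimensionality as a single word vector, which is what lets one fixed-size vector stand for a document of any length.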
|
| 225 |
+
|
| 226 |
+
st.markdown(
|
| 227 |
+
"""
|
| 228 |
+
<h3 style='color: #6A0572;'>⚠️ Problem: Equal Importance to Every Word</h3>
|
| 229 |
+
<ul>
|
| 230 |
+
<li>Average Word2Vec assigns <span class='highlight'>equal weight</span> to all words</li>
|
| 231 |
+
<li>No emphasis on <strong>important words</strong> that carry significant meaning</li>
|
| 232 |
+
<li>As a result, the document vector cannot reflect <span class='highlight'>word importance</span></li>
|
| 233 |
+
</ul>
|
| 234 |
+
""",
|
| 235 |
+
unsafe_allow_html=True,
|
| 236 |
+
)
|
| 237 |
+
|
| 238 |
+
st.markdown(
|
| 239 |
+
"""
|
| 240 |
+
<strong>Average Word2Vec averages word meanings but gives no extra weight to important words!</strong>
|
| 241 |
+
""",
|
| 242 |
+
unsafe_allow_html=True,
|
| 243 |
+
)
|
| 244 |
+
|
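Review note: the equal-weight limitation described in the last hunk can be illustrated with a small sketch that compares a plain average with a weighted one. Everything here is hypothetical: the 2-dimensional vectors in `toy_model` and the per-word weights are invented for illustration and are not real TF-IDF scores. The sketch only shows how a per-word weight changes the document vector.

```python
# Toy word vectors, invented for illustration (not from a trained model).
toy_model = {
    "the": [0.5, 0.5],
    "film": [0.2, 0.8],
    "brilliant": [1.0, 0.0],
}


def weighted_average(tokens, model, weights):
    """Weighted element-wise mean; a word without a weight counts as 1.0."""
    dim = len(next(iter(model.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in model:
            continue  # skip out-of-vocabulary words (assumption)
        w = weights.get(token, 1.0)
        for i, x in enumerate(model[token]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no weighted token of the document is in the vocabulary")
    return [x / weight_sum for x in total]


doc = ["the", "film", "brilliant"]

# Plain averaging: every word gets the same weight of 1.0.
equal = weighted_average(doc, toy_model, {})

# Hypothetical importance weights: the common word "the" counts far less.
hypothetical_weights = {"the": 0.1, "film": 1.0, "brilliant": 2.0}
weighted = weighted_average(doc, toy_model, hypothetical_weights)

print(equal)
print(weighted)
```

With equal weights, "the" pulls the document vector as strongly as "brilliant"; with the invented weights, the result moves toward the vector of the more informative word.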