Spaces: Harika22/Natural_Language_Processing

Harika22 committed on Feb 2, 2025

Commit 51ac896 (verified) · 1 Parent(s): 094e4c7

Update pages/7_Advance_vectorization_techniques.py

Files changed (1)

pages/7_Advance_vectorization_techniques.py +48 -7

pages/7_Advance_vectorization_techniques.py CHANGED

@@ -147,7 +147,6 @@ if file_type == "Word2Vec":
     st.title(":red[Word2Vec]")
     st.markdown(
     """
-    <div class='box'>
         <h3 style='color: #6A0572;'>📌 How Does Word2Vec Work?</h3>
         <ul>
             <li>After <strong>training</strong>, we obtain the final <span class='highlight'>Word2Vec model</span></li>
@@ -156,19 +155,16 @@ if file_type == "Word2Vec":
         <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
         { w1: [v1], w2: [v2], w3: [v3] }
         </pre>
-    </div>
     """,
     unsafe_allow_html=True,
     )
     st.markdown(
     """
-    <div class='box'>
         <h3 style='color: #6A0572;'>⚙️ Training vs. Test Time</h3>
         <ul>
             <li><strong>Training Time</strong>: <span class='highlight'>Corpus + Shallow Neural Network</span> → Generates Model</li>
             <li><strong>Test Time</strong>: <span class='highlight'>Word</span> → Looked up in Dictionary → Returns <span class='highlight'>Vector Representation</span></li>
         </ul>
-    </div>
     """,
     unsafe_allow_html=True,
     )
@@ -187,17 +183,62 @@ if file_type == "Word2Vec":
     st.markdown(
     """
-    <div class='box'>
         <h3 style='color: #6A0572;'>📚 Why Is the Corpus Important?</h3>
         <ul>
             <li>The vectors learned by the <strong>Word2Vec algorithm</strong> depend entirely on the training corpus</li>
             <li>Better corpus → Better word representations</li>
             <li>It <strong>captures semantic meaning</strong> from neighboring words (context)</li>
         </ul>
-    </div>
     """,
     unsafe_allow_html=True,
     )
     st.markdown('''
-    -
+    - Word2Vec converts individual words, not entire documents, into vectors
+    - Two techniques can turn an entire document into a single vector
+    - They are:
+        - Average Word2Vec
+        - TF-IDF Word2Vec
     ''')
+    st.subheader(":blue[Average Word2Vec]")
+    st.markdown(
+    """
+        <h3 style='color: #6A0572;'>📌 Step-by-Step Process</h3>
+        <ul>
+            <li>Given a document <span class='highlight'>d1</span>: <strong>w1, w2, w3</strong></li>
+            <li>Retrieve vector representations <strong>v1, v2, v3</strong> from Word2Vec</li>
+            <li>Perform <span class='highlight'>element-wise addition</span> of vectors:
+                <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+                v_total = v1 + v2 + v3
+                </pre>
+            </li>
+            <li>Divide every component of v_total by the number of words in d1:
+                <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+                v_avg = v_total / len(d1)
+                </pre>
+            </li>
+            <li>The final document vector represents the <span class='highlight'>average meaning</span> of all its words</li>
+        </ul>
+    """,
+    unsafe_allow_html=True,
+    )
+    st.markdown(
+    """
+        <h3 style='color: #6A0572;'>⚠️ Problem: Equal Importance to Every Word</h3>
+        <ul>
+            <li>Average Word2Vec assigns <span class='highlight'>equal weight</span> to all words</li>
+            <li><strong>Important words</strong> that carry most of the meaning get no more emphasis than common words such as 'the' or 'is'</li>
+            <li>As a result, the document vector cannot reflect <span class='highlight'>word importance</span></li>
+        </ul>
+    """,
+    unsafe_allow_html=True,
+    )
+    st.markdown(
+    """
+        <strong>Average Word2Vec averages word meanings but gives no extra weight to important words!</strong>
+    """,
+    unsafe_allow_html=True,
+)
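The Word2Vec boxes in this diff describe the trained model as a dictionary `{ w1: [v1], w2: [v2], w3: [v3] }` that is simply looked up at test time. A minimal sketch of that lookup in plain Python follows, assuming made-up three-dimensional vectors and a hypothetical `lookup` helper; with a library such as gensim 4.x, `model.wv[word]` plays the same role.

```python
# Toy stand-in for a trained Word2Vec model: each word maps to its vector.
# The words and numbers are made-up placeholders, not real embeddings.
word_vectors: dict[str, list[float]] = {
    "king": [0.8, 0.1, 0.3],
    "queen": [0.7, 0.2, 0.4],
    "apple": [0.1, 0.9, 0.5],
}


def lookup(word: str) -> list[float]:
    """Test time: return the stored vector for `word` (hypothetical helper).

    Raises KeyError for words outside the model's vocabulary, because the
    model only stores vectors for words it saw during training.
    """
    return word_vectors[word]


print(lookup("queen"))  # [0.7, 0.2, 0.4]
```

No training happens at test time: all the learning was done when the dictionary was built from the corpus.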
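The corpus box says Word2Vec captures meaning from neighboring words. As an illustration only, the sketch below generates the (center word, context word) pairs that skip-gram Word2Vec trains on, using a hypothetical `context_pairs` helper and a window of one word on each side; it shows where the "context" comes from, not the neural network training itself.

```python
def context_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return (center_word, context_word) pairs within `window` positions.

    Words that keep appearing next to the same neighbours get similar
    training signals, which is why they end up with similar vectors.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window so it does not run past either end of the sentence.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the cat sat on the mat".split()
print(context_pairs(sentence, window=1)[:4])
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```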
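The Average Word2Vec steps in the diff (element-wise addition of v1, v2, v3, then division by the number of words) can be sketched in plain Python. `average_word2vec` and the toy vectors are hypothetical; skipping out-of-vocabulary words is an added assumption, since the steps in the diff take every word to have a vector.

```python
def average_word2vec(doc: list[str], word_vectors: dict[str, list[float]]) -> list[float]:
    """Average Word2Vec: element-wise mean of the vectors of the words in `doc`."""
    # Assumption: out-of-vocabulary words are skipped instead of raising an error.
    vectors = [word_vectors[w] for w in doc if w in word_vectors]
    if not vectors:
        raise ValueError("no word in the document has a vector")
    dim = len(vectors[0])
    # Step 1: element-wise addition, v_total = v1 + v2 + v3
    v_total = [sum(v[k] for v in vectors) for k in range(dim)]
    # Step 2: divide each component by the number of words, v_avg = v_total / n
    return [x / len(vectors) for x in v_total]


# Made-up 2-dimensional vectors for the words of document d1.
word_vectors = {"w1": [1.0, 2.0], "w2": [3.0, 4.0], "w3": [5.0, 0.0]}
d1 = ["w1", "w2", "w3"]
print(average_word2vec(d1, word_vectors))  # [3.0, 2.0]
```

The output has the same dimensionality as each word vector, so documents of any length map to fixed-size vectors.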
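The closing note says Average Word2Vec gives every word equal weight, which is the gap the TF-IDF Word2Vec technique in the diff's list is meant to close. One common formulation, sketched here as an assumption rather than the app's own code, weights each word vector by its TF-IDF score and divides by the sum of the weights: v_d = sum(tfidf(w) * v_w) / sum(tfidf(w)). The idf below is the plain log(N / df); libraries differ (scikit-learn's `TfidfVectorizer`, for example, smooths idf by default). `idf_table` and `tfidf_word2vec` are hypothetical helpers.

```python
import math
from collections import Counter


def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """idf(w) = log(N / df(w)): N documents, df(w) = documents containing w."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tfidf_word2vec(
    doc: list[str],
    word_vectors: dict[str, list[float]],
    idf: dict[str, float],
) -> list[float]:
    """TF-IDF weighted average: sum(tfidf(w) * v_w) / sum(tfidf(w))."""
    # Only words that have both a vector and an idf score can contribute.
    counts = Counter(w for w in doc if w in word_vectors and w in idf)
    if not counts:
        raise ValueError("no word in the document has a vector and an idf score")
    dim = len(word_vectors[next(iter(counts))])
    weighted_sum = [0.0] * dim
    total_weight = 0.0
    for word, count in counts.items():
        weight = (count / len(doc)) * idf[word]  # tf(w, d) * idf(w)
        for k in range(dim):
            weighted_sum[k] += weight * word_vectors[word][k]
        total_weight += weight
    if total_weight == 0.0:
        # e.g. every word occurs in every document, so all idf values are 0
        raise ValueError("all words have zero TF-IDF weight")
    return [x / total_weight for x in weighted_sum]


# Toy data (made up): "the" occurs in every document, so its idf is 0.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
word_vectors = {
    "the": [1.0, 1.0],
    "cat": [1.0, 0.0],
    "sat": [0.0, 1.0],
    "dog": [0.5, 0.5],
    "ran": [0.2, 0.8],
}
idf = idf_table(corpus)
print(tfidf_word2vec(["the", "cat", "sat"], word_vectors, idf))  # [0.5, 0.5]
```

Here "the" gets idf log(2/2) = 0 and contributes nothing, so the result is [0.5, 0.5]; the plain average of the same three vectors would be about [0.67, 0.67], pulled toward the vector of the uninformative word.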