
Harika22/Natural_Language_Processing


Harika22 committed on Feb 2, 2025

Commit 6f30acf · verified · 1 Parent(s): 51ac896

Update pages/7_Advance_vectorization_techniques.py


pages/7_Advance_vectorization_techniques.py +62 -0


@@ -240,5 +240,67 @@ if file_type == "Word2Vec":
         <strong>Word2Vec averages word meanings, but lacks weightage for important words! </strong>
     """,
     unsafe_allow_html=True,
+    )
+    st.subheader(":blue[TF-IDF Word2Vec]")
+    st.markdown(
+    """
+        <h3 style='color: #6A0572;'>⚠️ Issue with Word2Vec</h3>
+        <ul>
+            <li>Plain averaging gives equal importance to every word</li>
+            <li>Even words that are frequent in a document but rare across the corpus, usually the most informative ones, get no extra weight</li>
+        </ul>
+    """,
+    unsafe_allow_html=True,
+    )
+    st.markdown(
+    """
+        <h3 style='color: #6A0572;'>🚀 Solution: Adding Weightage</h3>
+        <ul>
+            <li>Consider a document with 3 words: <strong>w1, w2, w3</strong></li>
+            <li>Each word has a vector representation:
+                <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+                w1 → v1,  w2 → v2,  w3 → v3
+                </pre>
+            </li>
+            <li>We use <span class='highlight'>two models</span>:
+                <ul>
+                    <li><strong>TF-IDF</strong> → Computes weightage for each word</li>
+                    <li><strong>Word2Vec</strong> → Converts words into vectors</li>
+                </ul>
+            </li>
+            <li>For each word, multiply its vector by its TF-IDF value, then sum the results and divide by the sum of the TF-IDF values</li>
+        </ul>
+    """,
+    unsafe_allow_html=True,
+    )
+    st.markdown(
+    """
+    <div class='formula'>
+        <strong>Final Weighted Representation:</strong>
+        <pre style="background-color:#F7F7F7; padding: 10px; border-radius: 5px;">
+        v_final = (TF-IDF(w1) * v1 + TF-IDF(w2) * v2 + TF-IDF(w3) * v3)
+                 / (TF-IDF(w1) + TF-IDF(w2) + TF-IDF(w3))
+        </pre>
+    </div>
+    """,
+    unsafe_allow_html=True,
+    )
+    st.markdown(
+    """
+    <div class='box'>
+        <h3 style='color: #6A0572;'>Why Does This Work?</h3>
+        <ul>
+            <li><span class='highlight'>Instead of an equal weight of 1 for every word</span>, each word vector is weighted by its TF-IDF value</li>
+            <li>Gives <strong>more importance</strong> to words that are key in the document</li>
+            <li>Improves the <strong>semantic representation</strong> of text, because the document vector leans toward its distinctive words</li>
+        </ul>
+    </div>
+    """,
+    unsafe_allow_html=True,
 )
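
The weighted average described in the page text can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the commit: the function name `tfidf_weighted_vector`, the toy 2-dimensional vectors, and the TF-IDF values are all hypothetical. In practice the vectors would come from a trained Word2Vec model and the weights from a fitted TF-IDF model.

```python
import numpy as np


def tfidf_weighted_vector(tokens, word_vectors, tfidf_weights):
    """Return the TF-IDF-weighted average of the vectors for `tokens`.

    `word_vectors` maps word -> vector, `tfidf_weights` maps word -> TF-IDF
    value. Both are plain dicts here (an assumption made for illustration).
    """
    dim = len(next(iter(word_vectors.values())))
    weighted_sum = np.zeros(dim)
    weight_total = 0.0
    for token in tokens:
        # Skip words that either model does not know about.
        if token not in word_vectors or token not in tfidf_weights:
            continue
        weight = tfidf_weights[token]
        weighted_sum += weight * np.asarray(word_vectors[token], dtype=float)
        weight_total += weight
    if weight_total == 0.0:
        # No usable words: fall back to the zero vector.
        return weighted_sum
    return weighted_sum / weight_total


# Hypothetical toy data for a document with three words.
word_vectors = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0]}
tfidf_weights = {"w1": 0.5, "w2": 0.25, "w3": 0.25}

v_final = tfidf_weighted_vector(["w1", "w2", "w3"], word_vectors, tfidf_weights)
v_plain = np.mean([word_vectors[w] for w in ["w1", "w2", "w3"]], axis=0)
```

With these toy numbers, `v_final` is pulled toward `w1` (the word with the highest TF-IDF value) compared with the plain average `v_plain`. Dividing by the sum of the weights keeps the result on the same scale as the individual word vectors.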