Spaces: Harika22/Natural_Language_Processing

Harika22 committed on Feb 1, 2025

Commit 9408831 (verified) · 1 Parent(s): e4f9e2d

Update pages/6_Feature_Engineering.py

Files changed (1)

pages/6_Feature_Engineering.py +27 -14

@@ -141,17 +141,30 @@ file_type = st.sidebar.radio(
 if file_type == "One-Hot Vectorization":
     st.title(":red[One-Hot Vectorization]")
-    st.markdown('''
-    - It is type of vectorization technique where we can convert text into vector
-    - Steps in One-Hot vectorization
-    - 1. Create a vocabulary (set of all unique words in collected corpus)
-    - 2. Find the length of the vocabulary
-    - 3. Converting every document into vector form
-        - Every unique word into vector (where vector will have → d-dimension → len(vocabulary))
-        - Every dimension belongs to unique word
-    - Here we're not converting document into vector , we're converting each and every word to vector form and then combining it to form vector
-    - This technique is called One-Hot Vectorization
-    ''')
     st.markdown("""
         | **Word** | **Vector Representation** |
@@ -165,9 +178,9 @@ if file_type == "One-Hot Vectorization":
     st.markdown("""
         ### 📝 Document Representations:
-        - **d₁ → v₁** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
-        - **d₂ → v₂** → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
-        - **d₃ → v₃** → `[[0,0,0,0,1], [1,0,0,0,0]]`
     ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
     """)

 if file_type == "One-Hot Vectorization":
     st.title(":red[One-Hot Vectorization]")
+    st.markdown("""
+        ### 📌 What is One-Hot Vectorization?
+        -  It is a type of vectorization technique where text is converted into a numerical vector.
+        -  This technique helps in representing words as unique vectors for machine learning models.
+    """)
+    st.markdown("""
+        ### 🛠️ Steps in One-Hot Vectorization:
+        1️⃣ Create a Vocabulary ➡️ (A set of all unique words in the collected corpus).
+        2️⃣ Find the Length of Vocabulary ➡️ (Total number of unique words = d-dimensions).
+        3️⃣ Convert Each Word into a Vector:
+           - 📌 Every unique word is transformed into a vector.
+           - 📌 Each vector has d-dimensions, where each dimension corresponds to a unique word.
+           - 📌 Words are converted individually, and the word vectors are then stacked to represent the document.
+        ✅ This technique ensures that each word gets its own distinct vector in NLP tasks.
+        """)
+    st.markdown("""
+        ### 🎯 Key Takeaways:
+        - 🎯 Each word gets a unique vector representation.
+        - 🎯 The number of dimensions = total vocabulary size.
+        - 🎯 Words are vectorized separately, then combined into document vectors.
+    """)
     st.markdown("""
         | **Word** | **Vector Representation** |
     st.markdown("""
         ### 📝 Document Representations:
+        - d₁ → v₁ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]`
+        - d₂ → v₂ → `[[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]`
+        - d₃ → v₃ → `[[0,0,0,0,1], [1,0,0,0,0]]`
     ✅ This **One-Hot Vectorization** technique **converts words into numerical vectors** while preserving their uniqueness.
     """)