Space: Harika22/Natural_Language_Processing

Harika22 committed on Feb 1, 2025 · Commit 39864a0 · 1 parent: e92ee8b

Update pages/6_Feature_Engineering.py

pages/6_Feature_Engineering.py CHANGED

@@ -153,5 +153,23 @@ if file_type == "One-Hot Vectorization":
     - This technique is called One-Hot Vectorization
     ''')

+    st.markdown('''Example of One-Hot Vectorization:
+    - Consider a corpus that contains 3 documents d1, d2, d3
+    - d1 ➡️ Toy is good
+    - d2 ➡️ Toy is not good
+    - d3 ➡️ Bad toy
+    - d1 is converted into v1 (the numerical representation of d1)
+    - d2 is converted into v2 (the numerical representation of d2)
+    - d3 is converted into v3 (the numerical representation of d3)
+    - After lowercasing ("Toy" and "toy" become the same word), the vocabulary is ➡️ {toy, is, good, not, bad}
+    - len(vocabulary) = 5, so every word vector has 5 dimensions
+    - Each dimension corresponds to exactly one unique word of the vocabulary
+    - toy ➡️ [1,0,0,0,0] , is ➡️ [0,1,0,0,0] , good ➡️ [0,0,1,0,0] , not ➡️ [0,0,0,1,0] , bad ➡️ [0,0,0,0,1]
+    - d1 → v1 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,1,0,0]]
+    - d2 → v2 → [[1,0,0,0,0] , [0,1,0,0,0] , [0,0,0,1,0] , [0,0,1,0,0]]
+    - d3 → v3 → [[0,0,0,0,1], [1,0,0,0,0]]
+    - Here every word is converted into its one-hot vector, and the word vectors are kept in document order to form the document's representation; this technique is known as **One-Hot Vectorization**
+    ''')
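The d1/d2/d3 walkthrough in this hunk can be reproduced in a few lines of plain Python. This is a minimal sketch, not code from pages/6_Feature_Engineering.py. It assumes whitespace tokenization, lowercasing, and a vocabulary ordered by first appearance, which yields the same order as {toy, is, good, not, bad}. The helper names `build_vocabulary`, `one_hot`, and `vectorize` are hypothetical.

```python
# Minimal sketch of one-hot vectorization for the d1/d2/d3 example.
# Assumptions: whitespace tokenization, lowercasing, vocabulary ordered by
# first appearance. Helper names are hypothetical, not from the original page.


def build_vocabulary(corpus):
    """Return the unique lowercase words of the corpus in first-seen order."""
    vocab = []
    for doc in corpus:
        for word in doc.lower().split():
            if word not in vocab:
                vocab.append(word)
    return vocab


def one_hot(word, word_to_index):
    """Return a vector with 1 in the word's own dimension and 0 elsewhere."""
    vector = [0] * len(word_to_index)
    vector[word_to_index[word]] = 1  # raises KeyError for unknown words
    return vector


def vectorize(doc, word_to_index):
    """Convert a document into the list of one-hot vectors of its words."""
    return [one_hot(word, word_to_index) for word in doc.lower().split()]


corpus = ["Toy is good", "Toy is not good", "Bad toy"]
vocab = build_vocabulary(corpus)
word_to_index = {word: i for i, word in enumerate(vocab)}

print(vocab)  # ['toy', 'is', 'good', 'not', 'bad']
for i, doc in enumerate(corpus, start=1):
    print(f"d{i} -> v{i} -> {vectorize(doc, word_to_index)}")
```

Each document yields one 5-dimensional vector per word, so documents of different lengths produce representations of different sizes (3, 4, and 2 vectors here). A word outside the vocabulary raises a `KeyError` in this sketch; a real pipeline would need an explicit policy for unknown words.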