NeonSamurai committed · Commit 4298273 · verified · 1 Parent(s): 2b07e3a

Update stages/feature_engineering.py

Files changed (1):
  stages/feature_engineering.py +25 -4
stages/feature_engineering.py CHANGED
@@ -1,5 +1,26 @@
-import streamlit as st
-
-def main():
-    st.title("Feature Engineering")
+import streamlit as st
+
+def main():
+    st.title("Step 6: Feature Engineering")
+
+    st.markdown("""
+    ### **:mag: What is Text Vectorization?** :bar_chart:
+
+    **Feature Engineering** for text data mainly involves **Text Vectorization**: the process of transforming unstructured text into numerical form so that machine learning models can learn from it.
+
+    **:brain: Why is Vectorization Necessary?**
+    - **Machine learning models understand only numbers**: algorithms work on numerical data, so raw text must be converted into numbers before a model can process it.
+    - **It makes text usable for models**: vectorization translates words, sentences, and documents into a format models can interpret, letting them identify patterns, relationships, and insights.
+
+    **:bulb: Think of it this way**: text is a language, but models only speak numbers. Vectorization is the translator that lets them communicate.
+
+    **Common Vectorization Techniques:**
+    - **Bag of Words (BoW)**: counts word occurrences in a document, turning it into a numerical vector.
+    - **TF-IDF (Term Frequency-Inverse Document Frequency)**: weighs words by how often they appear in a document and how rare they are across all documents.
+    - **Word Embeddings**: models such as **Word2Vec**, **GloVe**, and **FastText** represent words as dense vectors that capture semantic meaning and context.
+
+    **:key: In short**: vectorization transforms raw text into a format machine learning models can process. Without it, models have no way to interpret the meaning in text data.
+    """)
+
+    st.divider()
 main()