Update pages/6_Feature_Engineering.py
pages/6_Feature_Engineering.py +114 -0
pages/6_Feature_Engineering.py
CHANGED
|
@@ -1 +1,115 @@
|
| 1 |
import streamlit as st
|
| 2 |
+
|
| 3 |
+
st.markdown("""
|
| 4 |
+
<style>
|
| 5 |
+
/* Set a soft background color */
|
| 6 |
+
body {
|
| 7 |
+
background-color: #eef2f7;
|
| 8 |
+
}
|
| 9 |
+
/* Style for main title */
|
| 10 |
+
h1 {
|
| 11 |
+
color: black;
|
| 12 |
+
font-family: 'Roboto', sans-serif;
|
| 13 |
+
font-weight: 700;
|
| 14 |
+
text-align: center;
|
| 15 |
+
margin-bottom: 25px;
|
| 16 |
+
}
|
| 17 |
+
/* Style for headers */
|
| 18 |
+
h2 {
|
| 19 |
+
color: black;
|
| 20 |
+
font-family: 'Roboto', sans-serif;
|
| 21 |
+
font-weight: 600;
|
| 22 |
+
margin-top: 30px;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
/* Style for subheaders */
|
| 26 |
+
h3 {
|
| 27 |
+
color: red;
|
| 28 |
+
font-family: 'Roboto', sans-serif;
|
| 29 |
+
font-weight: 500;
|
| 30 |
+
margin-top: 20px;
|
| 31 |
+
}
|
| 32 |
+
.custom-subheader {
|
| 33 |
+
color: black;
|
| 34 |
+
font-family: 'Roboto', sans-serif;
|
| 35 |
+
font-weight: 600;
|
| 36 |
+
margin-bottom: 15px;
|
| 37 |
+
}
|
| 38 |
+
/* Paragraph styling */
|
| 39 |
+
p {
|
| 40 |
+
font-family: 'Georgia', serif;
|
| 41 |
+
line-height: 1.8;
|
| 42 |
+
color: black;
|
| 43 |
+
margin-bottom: 20px;
|
| 44 |
+
}
|
| 45 |
+
/* List styling with checkmark bullets */
|
| 46 |
+
.icon-bullet {
|
| 47 |
+
list-style-type: none;
|
| 48 |
+
padding-left: 20px;
|
| 49 |
+
}
|
| 50 |
+
.icon-bullet li {
|
| 51 |
+
font-family: 'Georgia', serif;
|
| 52 |
+
font-size: 1.1em;
|
| 53 |
+
margin-bottom: 10px;
|
| 54 |
+
color: black;
|
| 55 |
+
}
|
| 56 |
+
.icon-bullet li::before {
|
| 57 |
+
content: "◆";
|
| 58 |
+
padding-right: 10px;
|
| 59 |
+
color: black;
|
| 60 |
+
}
|
| 61 |
+
/* Sidebar styling */
|
| 62 |
+
.sidebar .sidebar-content {
|
| 63 |
+
background-color: #ffffff;
|
| 64 |
+
border-radius: 10px;
|
| 65 |
+
padding: 15px;
|
| 66 |
+
}
|
| 67 |
+
.sidebar h2 {
|
| 68 |
+
color: #495057;
|
| 69 |
+
}
|
| 70 |
+
/* Custom button style */
|
| 71 |
+
.streamlit-button {
|
| 72 |
+
background-color: #00FFFF;
|
| 73 |
+
color: #000000;
|
| 74 |
+
font-weight: bold;
|
| 75 |
+
}
|
| 76 |
+
</style>
|
| 77 |
+
""", unsafe_allow_html=True)
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
st.header("Feature Engineering📌")
|
| 81 |
+
st.markdown('''
|
| 82 |
+
- When the collected data already has n features and we want to add an extra feature derived automatically from the existing ones, the technique of creating that new feature is called **feature engineering**
|
| 83 |
+
- Feature extraction is a subpart of feature engineering
|
| 84 |
+
''')
|
| 85 |
+
|
| 86 |
+
st.subheader(":violet[Feature Extraction]")
|
| 87 |
+
st.markdown('''
|
| 88 |
+
- Our text data is natural language, and the text is given to a machine so that it can understand that language
|
| 89 |
+
- Feature extraction techniques use algorithms to convert text into vector form
|
| 90 |
+
- While converting text into vectors, the information in the text should be preserved
|
| 91 |
+
''')
|
| 92 |
+
|
| 93 |
+
st.header("Vectorization🧭")
|
| 94 |
+
st.markdown('''
|
| 95 |
+
- Vectorization is a technique of converting text into vectors
|
| 96 |
+
''')
|
| 97 |
+
|
| 98 |
+
st.subheader("Vectorization techniques")
|
| 99 |
+
st.markdown("""
|
| 100 |
+
There are different techniques to convert text into vector format. They are:
|
| 101 |
+
<ul class="icon-bullet">
|
| 102 |
+
<li>One-Hot Vectorization </li>
|
| 103 |
+
<li>Bag of Words (BOW) </li>
|
| 104 |
+
<li>Term Frequency - Inverse Document Frequency (TF-IDF) </li>
|
| 105 |
+
</ul>
|
| 106 |
+
""", unsafe_allow_html=True)
|
| 107 |
+
|
| 108 |
+
st.markdown("""
|
| 109 |
+
There are also advanced vectorization techniques. They are:
|
| 110 |
+
<ul class="icon-bullet">
|
| 111 |
+
<li>Word Embedding </li>
|
| 112 |
+
<li>Word2Vec </li>
|
| 113 |
+
<li>fastText</li>
|
| 114 |
+
</ul>
|
| 115 |
+
""", unsafe_allow_html=True)
|
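The page names One-Hot Vectorization, Bag of Words, and TF-IDF without showing what the resulting vectors look like. The sketch below is one possible illustration, not code from this commit. The helper names (`build_vocabulary`, `one_hot`, `bag_of_words`, `tf_idf`) and the toy corpus are hypothetical, tokenization is assumed to be lowercase whitespace splitting, every document is assumed to be non-empty, and TF-IDF uses the plain `tf * log(N / df)` variant. Libraries may apply smoothing or normalization and give different numbers.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted unique lowercase tokens of the corpus (whitespace split)."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def one_hot(token, vocabulary):
    """One-hot: a vector with a single 1 at the token's vocabulary position."""
    return [1 if token == term else 0 for term in vocabulary]


def bag_of_words(document, vocabulary):
    """Bag of Words: count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


def tf_idf(documents, vocabulary):
    """Unsmoothed TF-IDF: (term count / document length) * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = {
        term: sum(1 for tokens in tokenized if term in tokens)
        for term in vocabulary
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vectors


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = build_vocabulary(corpus)  # ['cat', 'dog', 'ran', 'sat', 'the']

print(one_hot("dog", vocab))               # [0, 1, 0, 0, 0]
print(bag_of_words("the cat sat", vocab))  # [1, 0, 0, 1, 1]
print(tf_idf(corpus, vocab))
```

In this toy corpus, "the" appears in every document, so its TF-IDF weight is 0 in all three vectors, while "dog" and "ran" each appear in only one document and receive the largest weights. That is the sense in which TF-IDF tries to preserve more information than raw counts.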