import streamlit as st
# Add custom CSS styling to the app
st.markdown(
"""
<style>
/* App Background */
.stApp {
background: linear-gradient(to right, #1e3c72, #2a5298); /* Subtle gradient with cool tones */
color: #f0f0f0;
padding: 20px;
}
/* Align content to the left */
.block-container {
text-align: left;
padding: 2rem;
}
/* Header and Subheader Text */
h1 {
background: linear-gradient(to right, #ff7f50, #ffd700); /* Orange to yellow gradient */
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
font-family: 'Arial', sans-serif !important;
font-weight: bold !important;
text-align: center;
}
h2, h3, h4, h5, h6 {
background: linear-gradient(to right, #ff7f50, #ffd700); /* Orange to yellow gradient */
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
font-family: 'Arial', sans-serif !important;
font-weight: bold !important;
}
/* Paragraph Text */
p {
color: #f0f0f0 !important; /* Light gray for readability */
font-family: 'Roboto', sans-serif !important;
line-height: 1.6;
font-size: 1.1rem;
}
/* List Styling */
ul li {
color: #f0f0f0;
font-family: 'Roboto', sans-serif;
font-size: 1.1rem;
margin-bottom: 0.5rem;
}
</style>
""",
unsafe_allow_html=True
)
# Page header for vectorization section
st.title('Text Vectorization Techniques')
# Introduction to vectorization
st.subheader('Introduction to Text Vectorization')
st.write("""
Text vectorization is the process of converting text data into a numerical format that can be understood by machine learning models. Various techniques are used for this, each with its own strengths and ideal use cases. Below are the most common vectorization methods available in this app:
""")
# Section for One-Hot Encoding
st.subheader('1. One-Hot Encoding (OHE)')
st.write("""
**Description**:
One-Hot Encoding is a simple technique where each word is represented by a binary vector. The vector is 1 at the index corresponding to the word's position in the vocabulary and 0 elsewhere.
**Use Case**:
Useful for representing categorical variables or small vocabularies, but inefficient for large vocabularies because each vector is sparse and high-dimensional.
""")
# Section for Bag of Words (BoW)
st.subheader('2. Bag of Words (BoW)')
st.write("""
**Description**:
Bag of Words represents each document as an unordered collection of its words, ignoring grammar and word order but keeping word frequencies. It produces a matrix whose rows are documents and whose columns are the words of the corpus vocabulary.
**Use Case**:
Great for text classification tasks but can result in high-dimensional data with large vocabularies.
""")
# Section for TF-IDF
st.subheader('3. Term Frequency-Inverse Document Frequency (TF-IDF)')
st.write("""
**Description**:
TF-IDF weights each word's frequency in a document by how rare the word is across the whole corpus. Words that appear often in one document but in few other documents receive the highest weights.
**Use Case**:
Ideal for identifying significant words and is commonly used for text classification and information retrieval tasks.
""")
# Section for Word2Vec
st.subheader('4. Word2Vec (Word Embeddings)')
st.write("""
**Description**:
Word2Vec represents words as dense vectors in a continuous vector space, where words with similar meanings have similar representations. The technique uses either the Skip-Gram or Continuous Bag of Words (CBOW) approach.
**Use Case**:
Better suited for capturing the semantic meaning of words, especially useful for deep learning models in natural language processing (NLP).
""")
# Section for GloVe
st.subheader('5. GloVe (Global Vectors for Word Representation)')
st.write("""
**Description**:
GloVe is another word embedding technique that learns word representations by looking at word co-occurrence statistics. Unlike Word2Vec, which uses a shallow neural network, GloVe performs matrix factorization on a word co-occurrence matrix.
**Use Case**:
Effective for capturing global relationships between words in large corpora, often used for word similarity and analogy tasks.
""")
# How to use the vectorization methods
st.subheader('How to Use These Vectorization Methods')
st.write("""
To interact with the vectorization techniques, select a method from the sidebar and enter a piece of text. Depending on the chosen technique, the results appear as numerical vectors or in other visual formats.
""")