import streamlit as st
st.markdown("""
""", unsafe_allow_html=True)
st.header("Vectorization🧭")
st.markdown(
"""
Vectorization is the process of converting text into vector.
This allows ML models to process text data effectively.
""",
unsafe_allow_html=True
)
st.markdown("""
There are advance vectorization techniques.They are :
- Word Embedding
- Word2Vec
- Fasttext
""", unsafe_allow_html=True)
st.sidebar.title("Navigation 🧭")
file_type = st.sidebar.radio(
"Choose a Vectorization technique :",
("Word2Vec", "Fasttext"))
st.header("Word Embedding Technique")
st.markdown('''
- It is a advanced vectorization technique it converts text into vectors in such a way that it preserves semantic meaning
- All the techniques which preserves semantic meaning while converting text into vector is word embedding technique
- There are 2 word embedding techniques:
- Word2Vec
- Fasttext
''')
if file_type == "Word2Vec":
st.title(":red[Word2Vec]")
st.markdown(
"""
📌 How Word2Vec Works?
- After training, we obtain the final Word2Vec model
- The model stores a dictionary with word-vector pairs:
{ w1: [v1], w2: [v2], w3: [v3] }
""",
unsafe_allow_html=True,
)
st.markdown(
"""
⚙️ Training vs. Test Time
- Training Time: Corpus + Deep Learning Algorithm → Generates Model
- Test Time: Word → Looked up in Dictionary → Returns Vector Representation
""",
unsafe_allow_html=True,
)
st.markdown(
"""
🔍 How Does It Preserve Meaning?
- It learns from the context of words in the corpus
- When given a word, it checks in the dictionary and retrieves the semantic vector
- Unlike other models, dimensions are not words, but their meanings
""",
unsafe_allow_html=True,
)
st.markdown(
"""
📚 Why is Corpus Important?
- The Word2Vec algorithm is completely dependent on the corpus
- Better corpus → Better word representation
- It preserves semantic meaning using neighborhood words (context)
""",
unsafe_allow_html=True,
)
st.markdown('''
-
''')