import streamlit as st st.markdown(""" """, unsafe_allow_html=True) st.header("Vectorization🧭") st.markdown( """

Vectorization is the process of converting text into vector.

This allows ML models to process text data effectively.

""", unsafe_allow_html=True ) st.markdown(""" There are advance vectorization techniques.They are :

Word Embedding
Word2Vec
Fasttext

""", unsafe_allow_html=True) st.sidebar.title("Navigation 🧭") file_type = st.sidebar.radio( "Choose a Vectorization technique :", ("Word2Vec", "Fasttext")) st.header("Word Embedding Technique") st.markdown(''' - It is a advanced vectorization technique it converts text into vectors in such a way that it preserves semantic meaning - All the techniques which preserves semantic meaning while converting text into vector is word embedding technique - There are 2 word embedding techniques: - Word2Vec - Fasttext ''') if file_type == "Word2Vec": st.title(":red[Word2Vec]") st.markdown( """

📌 How Word2Vec Works?

After training, we obtain the final Word2Vec model
The model stores a dictionary with word-vector pairs:

        { w1: [v1], w2: [v2], w3: [v3] }

""", unsafe_allow_html=True, ) st.markdown( """

⚙️ Training vs. Test Time

Training Time: Corpus + Deep Learning Algorithm → Generates Model
Test Time: Word → Looked up in Dictionary → Returns Vector Representation

""", unsafe_allow_html=True, ) st.markdown( """

🔍 How Does It Preserve Meaning?

It learns from the context of words in the corpus
When given a word, it checks in the dictionary and retrieves the semantic vector
Unlike other models, dimensions are not words, but their meanings

""", unsafe_allow_html=True, ) st.markdown( """

📚 Why is Corpus Important?

The Word2Vec algorithm is completely dependent on the corpus
Better corpus → Better word representation
It preserves semantic meaning using neighborhood words (context)

""", unsafe_allow_html=True, ) st.markdown(''' - ''')