import streamlit as st # Apply custom CSS styling st.markdown(""" """, unsafe_allow_html=True) # Page Configuration st.title("Interactive NLP Guide") # Sidebar Navigation st.sidebar.title("Explore NLP Topics") topics = [ "Introduction", "Tokenization", "One-Hot Vectorization", "Bag of Words", "TF-IDF Vectorizer", "Word Embeddings", ] selected_topic = st.sidebar.radio("Select a topic", topics) # Content Based on Selection if selected_topic == "Introduction": st.markdown("

Natural Language Processing (NLP)

", unsafe_allow_html=True) st.markdown("

Introduction to NLP

", unsafe_allow_html=True) st.markdown("""

Natural Language Processing (NLP) is a field at the intersection of linguistics and computer science, focusing on enabling computers to understand, interpret, and respond to human language.

Applications of NLP:

""", unsafe_allow_html=True) elif selected_topic == "Tokenization": st.markdown("

Tokenization

", unsafe_allow_html=True) st.markdown("

What is Tokenization?

", unsafe_allow_html=True) st.markdown("""

Tokenization is the process of breaking down a text into smaller units, such as sentences or words, called tokens. It is the first step in any NLP pipeline.

Types of Tokenization:

Code Example:

""", unsafe_allow_html=True) st.code(""" from nltk.tokenize import word_tokenize, sent_tokenize text = "Natural Language Processing is exciting. Let's explore it!" word_tokens = word_tokenize(text) sentence_tokens = sent_tokenize(text) print("Word Tokens:", word_tokens) print("Sentence Tokens:", sentence_tokens) """, language="python") elif selected_topic == "One-Hot Vectorization": st.markdown("

One-Hot Vectorization

", unsafe_allow_html=True) st.markdown("""

One-Hot Vectorization is a method to represent text where each unique word is converted into a unique binary vector.

How It Works:

Example:

Limitations:

""", unsafe_allow_html=True) elif selected_topic == "Bag of Words": st.markdown("

Bag of Words (BoW)

", unsafe_allow_html=True) st.markdown("""

Bag of Words represents text as word frequency counts, disregarding word order.

How It Works:

Example:

""", unsafe_allow_html=True) elif selected_topic == "TF-IDF Vectorizer": st.markdown("

TF-IDF Vectorizer

", unsafe_allow_html=True) st.markdown("""

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).

Formula:

""", unsafe_allow_html=True) st.latex(r''' \text{TF-IDF} = \text{TF} \times \text{IDF} ''') st.markdown(""" """, unsafe_allow_html=True) elif selected_topic == "Word Embeddings": st.markdown("

Word Embeddings

", unsafe_allow_html=True) st.markdown("""

Word Embeddings are dense vector representations of words that capture semantic meanings and relationships.

Key Features:

Popular Word Embedding Models:

""", unsafe_allow_html=True) # Footer st.sidebar.markdown("---") st.sidebar.markdown("Explore each topic to dive deeper into NLP concepts!")