| import streamlit as st | |
| # Apply custom CSS styling | |
| st.markdown(""" | |
| <style> | |
| body { | |
| background-color: #eef2f7; | |
| } | |
| h1 { | |
| color: #00FFFF; | |
| font-family: 'Roboto', sans-serif; | |
| font-weight: 700; | |
| text-align: center; | |
| margin-bottom: 25px; | |
| } | |
| h2, h3 { | |
| font-family: 'Roboto', sans-serif; | |
| font-weight: 600; | |
| } | |
| h2 { | |
| color: #FFFACD; | |
| } | |
| h3 { | |
| color: #ba95b0; | |
| } | |
| p, ul, ol { | |
| font-family: 'Georgia', serif; | |
| line-height: 1.8; | |
| color: #495057; | |
| } | |
| ul { | |
| margin-left: 20px; | |
| } | |
| .icon-bullet { | |
| list-style-type: none; | |
| padding-left: 20px; | |
| } | |
| .icon-bullet li { | |
| font-family: 'Georgia', serif; | |
| font-size: 1.1em; | |
| margin-bottom: 10px; | |
| color: #495057; | |
| } | |
| .icon-bullet li::before { | |
| content: "✔️"; | |
| padding-right: 10px; | |
| color: #00FFFF; | |
| } | |
| </style> | |
| """, unsafe_allow_html=True) | |
| # Page title | |
| st.title("Interactive NLP Guide") | |
| # Sidebar Navigation | |
| st.sidebar.title("Explore NLP Topics") | |
| topics = [ | |
| "Introduction", | |
| "Tokenization", | |
| "One-Hot Vectorization", | |
| "Bag of Words", | |
| "TF-IDF Vectorizer", | |
| "Word Embeddings", | |
| ] | |
| selected_topic = st.sidebar.radio("Select a topic", topics) | |
| # Content Based on Selection | |
| if selected_topic == "Introduction": | |
| st.markdown("<h1>Natural Language Processing (NLP)</h1>", unsafe_allow_html=True) | |
| st.markdown("<h2>Introduction to NLP</h2>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>Natural Language Processing (NLP) is a field at the intersection of linguistics and computer science, focusing on enabling computers to understand, interpret, and respond to human language.</p> | |
| <h3>Applications of NLP:</h3> | |
| <ul> | |
| <li>Chatbots and Virtual Assistants (e.g., Alexa, Siri)</li> | |
| <li>Machine Translation (e.g., Google Translate)</li> | |
| <li>Text Summarization</li> | |
| <li>Sentiment Analysis</li> | |
| <li>Speech Recognition Systems</li> | |
| </ul> | |
| """, unsafe_allow_html=True) | |
| elif selected_topic == "Tokenization": | |
| st.markdown("<h1>Tokenization</h1>", unsafe_allow_html=True) | |
| st.markdown("<h2>What is Tokenization?</h2>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>Tokenization is the process of breaking text into smaller units, such as sentences or words, called tokens. It is usually one of the first steps in an NLP pipeline.</p> | |
| <h3>Types of Tokenization:</h3> | |
| <ul> | |
| <li><b>Word Tokenization:</b> Splits text into words, with punctuation usually kept as separate tokens (e.g., "I love NLP." → ["I", "love", "NLP", "."])</li> | |
| <li><b>Sentence Tokenization:</b> Splits text into sentences (e.g., "NLP is fascinating. It's the future." → ["NLP is fascinating.", "It's the future."])</li> | |
| </ul> | |
| <h3>Code Example:</h3> | |
| """, unsafe_allow_html=True) | |
| st.code(""" | |
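| import nltk | |
| from nltk.tokenize import word_tokenize, sent_tokenize | |
| # One-time download of the Punkt tokenizer data; recent NLTK releases look for "punkt_tab", older ones for "punkt" | |
| nltk.download("punkt") | |
| nltk.download("punkt_tab") | |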
| text = "Natural Language Processing is exciting. Let's explore it!" | |
| word_tokens = word_tokenize(text) | |
| sentence_tokens = sent_tokenize(text) | |
| print("Word Tokens:", word_tokens) | |
| print("Sentence Tokens:", sentence_tokens) | |
| """, language="python") | |
| elif selected_topic == "One-Hot Vectorization": | |
| st.markdown("<h1>One-Hot Vectorization</h1>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>One-Hot Vectorization is a method to represent text where each unique word is converted into a unique binary vector.</p> | |
| <h3>How It Works:</h3> | |
| <ul> | |
| <li>Each word in the vocabulary is assigned an index.</li> | |
| <li>The vector is all zeros except for a <code>1</code> at the word's index.</li> | |
| </ul> | |
| <h3>Example:</h3> | |
| <ul> | |
| <li>Vocabulary: ["cat", "dog", "bird"]</li> | |
| <li>"cat" → [1, 0, 0]</li> | |
| <li>"dog" → [0, 1, 0]</li> | |
| </ul> | |
| <h3>Limitations:</h3> | |
| <ul> | |
| <li>High dimensionality for large vocabularies.</li> | |
| <li>Does not capture semantic relationships between words.</li> | |
| </ul> | |
| """, unsafe_allow_html=True) | |
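| st.markdown(""" | |
| <h3>Code Example:</h3> | |
| <p>A minimal plain-Python sketch of the steps above, reusing the same three-word vocabulary. The variable names are illustrative, and a real project would more likely rely on a library encoder.</p> | |
| """, unsafe_allow_html=True) | |
| st.code(""" | |
| # Illustrative sketch: build one-hot vectors by hand for a tiny vocabulary | |
| vocabulary = ["cat", "dog", "bird"] | |
| # Assign each word its position in the vocabulary | |
| word_to_index = {word: index for index, word in enumerate(vocabulary)} | |
| # Each vector is all zeros except for a 1 at the word's own index | |
| one_hot = {word: [int(i == index) for i in range(len(vocabulary))] for word, index in word_to_index.items()} | |
| print(one_hot["cat"])  # [1, 0, 0] | |
| print(one_hot["dog"])  # [0, 1, 0] | |
| """, language="python") | |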
| elif selected_topic == "Bag of Words": | |
| st.markdown("<h1>Bag of Words (BoW)</h1>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>Bag of Words represents text as word frequency counts, disregarding word order.</p> | |
| <h3>How It Works:</h3> | |
| <ul> | |
| <li>Create a vocabulary of unique words.</li> | |
| <li>Count the frequency of each word in a document.</li> | |
| </ul> | |
| <h3>Example:</h3> | |
| <ul> | |
| <li>Given Sentences: | |
| <ul> | |
| <li>"I love NLP."</li> | |
| <li>"I love programming."</li> | |
| </ul> | |
| </li> | |
| <li>Vocabulary: ["I", "love", "NLP", "programming"]</li> | |
| <li>Sentence 1: [1, 1, 1, 0]</li> | |
| <li>Sentence 2: [1, 1, 0, 1]</li> | |
| </ul> | |
| """, unsafe_allow_html=True) | |
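| st.markdown(""" | |
| <h3>Code Example:</h3> | |
| <p>The example above can be reproduced with a short sketch. The tokenization here (strip the final period, split on spaces) is a simplifying assumption, and the variable names are illustrative.</p> | |
| """, unsafe_allow_html=True) | |
| st.code(""" | |
| from collections import Counter | |
| sentences = ["I love NLP.", "I love programming."] | |
| # Naive tokenization (an assumption of this sketch): drop the final period, split on spaces | |
| tokenized = [sentence.rstrip(".").split() for sentence in sentences] | |
| # Vocabulary of unique words, in order of first appearance | |
| vocabulary = list(dict.fromkeys(word for tokens in tokenized for word in tokens)) | |
| # Count how often each vocabulary word occurs in each sentence | |
| vectors = [[Counter(tokens)[word] for word in vocabulary] for tokens in tokenized] | |
| print(vocabulary)  # ['I', 'love', 'NLP', 'programming'] | |
| print(vectors)     # [[1, 1, 1, 0], [1, 1, 0, 1]] | |
| """, language="python") | |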
| elif selected_topic == "TF-IDF Vectorizer": | |
| st.markdown("<h1>TF-IDF Vectorizer</h1>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).</p> | |
| <h3>Formula:</h3> | |
| """, unsafe_allow_html=True) | |
| st.latex(r''' | |
| \text{TF-IDF} = \text{TF} \times \text{IDF} | |
| ''') | |
| st.markdown(""" | |
| <ul> | |
| <li><b>Term Frequency (TF):</b> Frequency of a word in a document.</li> | |
| <li><b>Inverse Document Frequency (IDF):</b> Logarithm of the ratio of the total number of documents to the number of documents containing the word.</li> | |
| </ul> | |
| """, unsafe_allow_html=True) | |
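| st.markdown(""" | |
| <h3>Code Example:</h3> | |
| <p>A small worked sketch of the formula, assuming TF is the word's share of the document's tokens and IDF uses the natural logarithm. The two toy documents are illustrative. Libraries such as scikit-learn apply smoothed variants, so their numbers will differ.</p> | |
| """, unsafe_allow_html=True) | |
| st.code(""" | |
| import math | |
| # Two tiny, pre-tokenized documents (illustrative data) | |
| documents = [["i", "love", "nlp"], ["i", "love", "programming"]] | |
| term, document = "nlp", documents[0] | |
| # TF: share of the document's tokens that are the term (one common convention) | |
| tf = document.count(term) / len(document) | |
| # IDF: log of (total documents / documents containing the term) | |
| idf = math.log(len(documents) / sum(term in doc for doc in documents)) | |
| print(round(tf * idf, 3))  # 0.231 | |
| # A word found in every document, such as "love", gets IDF = log(2 / 2) = 0 | |
| """, language="python") | |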
| elif selected_topic == "Word Embeddings": | |
| st.markdown("<h1>Word Embeddings</h1>", unsafe_allow_html=True) | |
| st.markdown(""" | |
| <p>Word Embeddings are dense vector representations of words that capture semantic meanings and relationships.</p> | |
| <h3>Key Features:</h3> | |
| <ul> | |
| <li>Captures semantic relationships between words (e.g., "king" - "man" + "woman" ≈ "queen").</li> | |
| <li>Uses far fewer dimensions than one-hot vectors, so large vocabularies stay manageable.</li> | |
| </ul> | |
| <h3>Popular Word Embedding Models:</h3> | |
| <ul> | |
| <li>Word2Vec</li> | |
| <li>GloVe</li> | |
| <li>FastText</li> | |
| </ul> | |
| """, unsafe_allow_html=True) | |
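| st.markdown(""" | |
| <h3>Code Example:</h3> | |
| <p>A toy sketch of the "king" - "man" + "woman" analogy using cosine similarity. The 3-dimensional vectors are invented for illustration; real embeddings are learned from text and typically have hundreds of dimensions.</p> | |
| """, unsafe_allow_html=True) | |
| st.code(""" | |
| import numpy as np | |
| # Toy vectors invented for this sketch, not taken from any trained model | |
| king = np.array([0.9, 0.8, 0.1]) | |
| man = np.array([0.5, 0.9, 0.0]) | |
| woman = np.array([0.5, 0.1, 0.8]) | |
| queen = np.array([0.9, 0.0, 0.9]) | |
| candidates = {"king": king, "man": man, "woman": woman, "queen": queen} | |
| # Vector arithmetic behind the analogy | |
| target = king - man + woman | |
| # Cosine similarity between the target and every candidate vector | |
| similarity = {word: float(target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec))) for word, vec in candidates.items()} | |
| print(max(similarity, key=similarity.get))  # queen | |
| """, language="python") | |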
| # Footer | |
| st.sidebar.markdown("---") | |
| st.sidebar.markdown("Explore each topic to dive deeper into NLP concepts!") | |