import streamlit as st
# Apply custom CSS styling
st.markdown("""
<style>
body {
background-color: #eef2f7;
}
h1 {
color: #00FFFF;
font-family: 'Roboto', sans-serif;
font-weight: 700;
text-align: center;
margin-bottom: 25px;
}
h2, h3 {
font-family: 'Roboto', sans-serif;
font-weight: 600;
}
h2 {
color: #FFFACD;
}
h3 {
color: #ba95b0;
}
p, ul, ol {
font-family: 'Georgia', serif;
line-height: 1.8;
color: #495057;
}
ul {
margin-left: 20px;
}
.icon-bullet {
list-style-type: none;
padding-left: 20px;
}
.icon-bullet li {
font-family: 'Georgia', serif;
font-size: 1.1em;
margin-bottom: 10px;
color: #495057;
}
.icon-bullet li::before {
content: "✔️";
padding-right: 10px;
color: #00FFFF;
}
</style>
""", unsafe_allow_html=True)
# Page title
st.title("Interactive NLP Guide")
# Sidebar Navigation
st.sidebar.title("Explore NLP Topics")
topics = [
"Introduction",
"Tokenization",
"One-Hot Vectorization",
"Bag of Words",
"TF-IDF Vectorizer",
"Word Embeddings",
]
selected_topic = st.sidebar.radio("Select a topic", topics)
# Content Based on Selection
if selected_topic == "Introduction":
    st.markdown("<h1>Natural Language Processing (NLP)</h1>", unsafe_allow_html=True)
    st.markdown("<h2>Introduction to NLP</h2>", unsafe_allow_html=True)
    st.markdown("""
<p>Natural Language Processing (NLP) is a field at the intersection of linguistics and computer science, focusing on enabling computers to understand, interpret, and respond to human language.</p>
<h3>Applications of NLP:</h3>
<ul>
<li>Chatbots and Virtual Assistants (e.g., Alexa, Siri)</li>
<li>Machine Translation (e.g., Google Translate)</li>
<li>Text Summarization</li>
<li>Sentiment Analysis</li>
<li>Speech Recognition Systems</li>
</ul>
""", unsafe_allow_html=True)
elif selected_topic == "Tokenization":
    st.markdown("<h1>Tokenization</h1>", unsafe_allow_html=True)
    st.markdown("<h2>What is Tokenization?</h2>", unsafe_allow_html=True)
    st.markdown("""
<p>Tokenization is the process of breaking text into smaller units called tokens, such as sentences or words. It is usually one of the first steps in an NLP pipeline.</p>
<h3>Types of Tokenization:</h3>
<ul>
<li><b>Word Tokenization:</b> Splits text into words and punctuation marks (e.g., "I love NLP." → ["I", "love", "NLP", "."])</li>
<li><b>Sentence Tokenization:</b> Splits text into sentences (e.g., "NLP is fascinating. It's the future." → ["NLP is fascinating.", "It's the future."])</li>
</ul>
<h3>Code Example:</h3>
""", unsafe_allow_html=True)
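    st.code("""
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time download of the tokenizer models; recent NLTK releases
# look for "punkt_tab", older ones for "punkt"
nltk.download("punkt")
nltk.download("punkt_tab")

text = "Natural Language Processing is exciting. Let's explore it!"
word_tokens = word_tokenize(text)          # words and punctuation marks
sentence_tokens = sent_tokenize(text)      # one string per sentence
print("Word Tokens:", word_tokens)
print("Sentence Tokens:", sentence_tokens)
""", language="python")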
elif selected_topic == "One-Hot Vectorization":
    st.markdown("<h1>One-Hot Vectorization</h1>", unsafe_allow_html=True)
    st.markdown("""
<p>One-Hot Vectorization is a method to represent text where each unique word is converted into a unique binary vector.</p>
<h3>How It Works:</h3>
<ul>
<li>Each word in the vocabulary is assigned an index.</li>
<li>The vector is all zeros except for a <code>1</code> at the word's index.</li>
</ul>
<h3>Example:</h3>
<ul>
<li>Vocabulary: ["cat", "dog", "bird"]</li>
<li>"cat" → [1, 0, 0]</li>
<li>"dog" → [0, 1, 0]</li>
<li>"bird" → [0, 0, 1]</li>
</ul>
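<h3>Code Example:</h3>
<p>The steps above can be sketched in plain Python. This is a minimal illustration that assumes the three-word vocabulary from the example; <code>one_hot</code> is a hypothetical helper name, not a library function.</p>
""", unsafe_allow_html=True)
    # Illustrative sketch added for this guide; it is displayed as text and not executed by the app.
    st.code("""
# Assumed vocabulary, taken from the example above
vocabulary = ["cat", "dog", "bird"]

# Assign each word an index
word_to_index = {word: index for index, word in enumerate(vocabulary)}

def one_hot(word):
    # Hypothetical helper: all zeros except for a 1 at the word's index
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("cat"))  # [1, 0, 0]
print(one_hot("dog"))  # [0, 1, 0]
""", language="python")
    st.markdown("""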
<h3>Limitations:</h3>
<ul>
<li>High dimensionality for large vocabularies.</li>
<li>Does not capture semantic relationships between words.</li>
</ul>
""", unsafe_allow_html=True)
elif selected_topic == "Bag of Words":
    st.markdown("<h1>Bag of Words (BoW)</h1>", unsafe_allow_html=True)
    st.markdown("""
<p>Bag of Words represents text as word frequency counts, disregarding word order.</p>
<h3>How It Works:</h3>
<ul>
<li>Create a vocabulary of unique words.</li>
<li>Count the frequency of each word in a document.</li>
</ul>
<h3>Example:</h3>
<ul>
<li>Given Sentences:
<ul>
<li>"I love NLP."</li>
<li>"I love programming."</li>
</ul>
</li>
<li>Vocabulary: ["I", "love", "NLP", "programming"]</li>
<li>Sentence 1: [1, 1, 1, 0]</li>
<li>Sentence 2: [1, 1, 0, 1]</li>
</ul>
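<h3>Code Example:</h3>
<p>The example above can be reproduced with a short plain-Python sketch. The <code>tokenize</code> and <code>bag_of_words</code> helpers are hypothetical names introduced here, and the tokenizer is deliberately naive (it only strips a trailing period and splits on spaces).</p>
""", unsafe_allow_html=True)
    # Illustrative sketch added for this guide; it is displayed as text and not executed by the app.
    st.code("""
from collections import Counter

sentences = ["I love NLP.", "I love programming."]

def tokenize(sentence):
    # Naive tokenizer (an assumption of this sketch): drop the final period, split on spaces
    return sentence.rstrip(".").split()

# Build the vocabulary in order of first appearance
vocabulary = []
for sentence in sentences:
    for token in tokenize(sentence):
        if token not in vocabulary:
            vocabulary.append(token)

def bag_of_words(sentence):
    # Count each vocabulary word in the sentence; word order is ignored
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]

print(vocabulary)                  # ['I', 'love', 'NLP', 'programming']
print(bag_of_words(sentences[0]))  # [1, 1, 1, 0]
print(bag_of_words(sentences[1]))  # [1, 1, 0, 1]
""", language="python")
    st.markdown("""
<p>A real pipeline would usually lowercase the text and use a proper tokenizer. Library vectorizers may also filter tokens, so their vocabulary can differ from this hand-built one.</p>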
""", unsafe_allow_html=True)
elif selected_topic == "TF-IDF Vectorizer":
    st.markdown("<h1>TF-IDF Vectorizer</h1>", unsafe_allow_html=True)
    st.markdown("""
<p>TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure that evaluates the importance of a word in a document relative to a collection of documents (corpus).</p>
<h3>Formula:</h3>
""", unsafe_allow_html=True)
    st.latex(r'''
\text{TF-IDF} = \text{TF} \times \text{IDF}
''')
    st.markdown("""
<ul>
<li><b>Term Frequency (TF):</b> Frequency of a word in a document.</li>
<li><b>Inverse Document Frequency (IDF):</b> Logarithm of the ratio of the total number of documents to the number of documents containing the word.</li>
</ul>
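<h3>Worked Example:</h3>
<p>Written out with the definitions above, where N is the total number of documents and df(t) is the number of documents containing term t (both symbols are introduced here for illustration):</p>
""", unsafe_allow_html=True)
    st.latex(r'''
\text{IDF}(t) = \log\frac{N}{\text{df}(t)}
''')
    st.markdown("""
<p>A minimal plain-Python sketch of this unsmoothed definition follows. It assumes TF is the relative frequency of the term in the document and that the term occurs in at least one document; <code>tf</code>, <code>idf</code>, and <code>tf_idf</code> are hypothetical helper names.</p>
""", unsafe_allow_html=True)
    # Illustrative sketch added for this guide; it is displayed as text and not executed by the app.
    st.code("""
import math

# Two tiny, already tokenized documents (toy data for illustration)
documents = [["i", "love", "nlp"], ["i", "love", "programming"]]

def tf(term, doc):
    # Assumption: relative frequency of the term within one document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # log(total documents / documents containing the term)
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tf_idf("love", documents[0], documents))           # 0.0 (appears in every document)
print(round(tf_idf("nlp", documents[0], documents), 3))  # 0.231
""", language="python")
    st.markdown("""
<p>Library implementations often use smoothed or normalized variants of this formula, so their scores will generally differ from the values above.</p>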
""", unsafe_allow_html=True)
elif selected_topic == "Word Embeddings":
    st.markdown("<h1>Word Embeddings</h1>", unsafe_allow_html=True)
    st.markdown("""
<p>Word Embeddings are dense vector representations of words that capture semantic meanings and relationships.</p>
<h3>Key Features:</h3>
<ul>
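<li>Captures semantic relationships between words (e.g., vector("king") - vector("man") + vector("woman") ≈ vector("queen")).</li>
<li>Compact representation: the vector length is fixed (typically tens to hundreds of dimensions) rather than growing with the vocabulary size.</li>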
</ul>
<h3>Popular Word Embedding Models:</h3>
<ul>
<li>Word2Vec</li>
<li>GloVe</li>
<li>FastText</li>
</ul>
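<h3>Code Example:</h3>
<p>The analogy above can be illustrated with a toy sketch. The three-dimensional vectors below are made up by hand so that the arithmetic works out; they do not come from Word2Vec, GloVe, FastText, or any trained model, and <code>cosine_similarity</code> is a hypothetical helper name.</p>
""", unsafe_allow_html=True)
    # Illustrative sketch added for this guide; it is displayed as text and not executed by the app.
    st.code("""
import math

# Hand-made toy vectors with dimensions (royalty, male, female); NOT from a trained model
embeddings = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# king - man + woman, computed component by component
target = [
    k - m + w
    for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])
]

# Pick the word whose vector points in the direction closest to the target
best = max(embeddings, key=lambda word: cosine_similarity(target, embeddings[word]))
print(best)  # queen
""", language="python")
    st.markdown("""
<p>With trained embeddings the match is only approximate, and the query words themselves are usually excluded from the candidate list.</p>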
""", unsafe_allow_html=True)
# Footer
st.sidebar.markdown("---")
st.sidebar.markdown("Explore each topic to dive deeper into NLP concepts!")