| import streamlit as st | |
| st.markdown( | |
| """ | |
| <style> | |
| /* App Background */ | |
| .stApp { | |
| background: linear-gradient(to right, #1e3c72, #2a5298); /* Subtle gradient with cool tones */ | |
| color: #f0f0f0; | |
| padding: 20px; | |
| } | |
| /* Align content to the left */ | |
| .block-container { | |
| text-align: left; | |
| padding: 2rem; | |
| } | |
| /* Header and Subheader Text */ | |
| h1 { | |
| background: linear-gradient(to right, #ff7f50, #ffd700); /* Orange to yellow gradient */ | |
| -webkit-background-clip: text; | |
| -webkit-text-fill-color: transparent; | |
| font-family: 'Arial', sans-serif !important; | |
| font-weight: bold !important; | |
| text-align: center; | |
| } | |
| h2, h3, h4, h5, h6 { | |
| background: linear-gradient(to right, #ff7f50, #ffd700); /* Orange to yellow gradient */ | |
| -webkit-background-clip: text; | |
| -webkit-text-fill-color: transparent; | |
| font-family: 'Arial', sans-serif !important; | |
| font-weight: bold !important; | |
| } | |
| /* Paragraph Text */ | |
| p { | |
| color: #f0f0f0 !important; /* Light gray for readability */ | |
| font-family: 'Roboto', sans-serif !important; | |
| line-height: 1.6; | |
| font-size: 1.1rem; | |
| } | |
| /* List Styling */ | |
| ul li { | |
| color: #f0f0f0; | |
| font-family: 'Roboto', sans-serif; | |
| font-size: 1.1rem; | |
| margin-bottom: 0.5rem; | |
| } | |
| </style> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| # Page Content | |
| st.markdown( | |
| """ | |
| <h1>Basic Terminology in NLP</h1> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5>Before diving into NLP concepts, it helps to know the terms that come up most often.</h5> | |
| <h5>1. Key Terminologies in NLP</h5> | |
| <ul> | |
| <li><b>Corpus:</b> A collection of text documents. Example: {d1, d2, d3, ...}</li> | |
| <li><b>Document:</b> A single unit of text (e.g., a sentence, paragraph, or article).</li> | |
| <li><b>Paragraph:</b> A collection of sentences.</li> | |
| <li><b>Sentence:</b> A collection of words forming a meaningful expression.</li> | |
| <li><b>Word:</b> A collection of characters.</li> | |
| <li><b>Character:</b> The smallest unit of text, such as a letter, digit, or special symbol.</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
| st.markdown( | |
| """ | |
| <h5>2. Tokenization</h5> | |
| <p>Tokenization is the process of splitting text into smaller units, called tokens.</p> | |
| <h6>Types of Tokenization:</h6> | |
| <ul> | |
| <li><b>Sentence Tokenization:</b> Splitting text into sentences. <br> Example: "I love biryani. I love pizza." → ["I love biryani", "I love pizza"]</li> | |
| <li><b>Word Tokenization:</b> Splitting sentences into words. <br> Example: "I love NLP" → ["I", "love", "NLP"]</li> | |
| <li><b>Character Tokenization:</b> Splitting words into characters. <br> Example: "Cat" → ["C", "a", "t"]</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
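The three tokenization levels described above can be sketched with the standard library alone. This is a minimal illustration, not what the app itself runs: the function names `sentence_tokenize`, `word_tokenize`, and `char_tokenize` are hypothetical, and the regex rules are naive assumptions (for instance, the sentence splitter keeps the trailing punctuation and would break on abbreviations such as "Dr.").

```python
import re

def sentence_tokenize(text):
    # Split on whitespace that follows ., ! or ? (naive: fails on abbreviations).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    # Keep runs of word characters; punctuation is dropped.
    return re.findall(r"\w+", sentence)

def char_tokenize(word):
    # A string is already a sequence of characters.
    return list(word)

sentences = sentence_tokenize("I love biryani. I love pizza.")
words = word_tokenize("I love NLP")
chars = char_tokenize("Cat")

print(sentences)
print(words)
print(chars)
```

Libraries such as NLTK or spaCy ship tokenizers that handle abbreviations, contractions, and other edge cases that these rules ignore.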
| st.markdown( | |
| """ | |
| <h5>3. Stop Words</h5> | |
| <p>Stop words are very common words in a language (such as "in", "we", or "the") that are often filtered out during text processing because they contribute little to the overall meaning.</p> | |
| <h6>Example:</h6> | |
| <p>"In Hyderabad, we can eat famous biryani." <br> Stop words: ["in", "we", "can"]</p> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
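Stop-word removal amounts to filtering tokens against a fixed set. The sketch below assumes a tiny hand-written set, `STOP_WORDS`, chosen only to cover the example sentence; real lists (for example the one distributed with NLTK) are much longer, and the helper name `remove_stop_words` is hypothetical.

```python
import re

# Tiny illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"in", "we", "can", "the", "a", "an", "is", "of", "to"}

def remove_stop_words(text):
    # Lowercase first so that "In" matches "in", then tokenize on word characters.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

filtered = remove_stop_words("In Hyderabad, we can eat famous biryani.")
print(filtered)
```

Whether to remove stop words depends on the task: for sentiment analysis, dropping a word like "not" can change the meaning of a sentence.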
| st.markdown( | |
| """ | |
| <h5>4. Vectorization</h5> | |
| <p>Vectorization converts text into numerical vectors so that machine learning models, which operate on numbers, can process and analyze it.</p> | |
| <h6>Types of Vectorization:</h6> | |
| <ul> | |
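| <li><b>One-Hot Encoding:</b> Represents each word as a binary vector with a single 1 at that word's position in the vocabulary.</li> | |
| <li><b>Bag of Words (BoW):</b> Represents a document by its word counts, ignoring word order.</li> | |
| <li><b>TF-IDF:</b> Weights each word's frequency in a document by how rare the word is across the corpus.</li> | |
| <li><b>Word2Vec:</b> Learns dense word embeddings with a shallow neural network trained on word-context prediction.</li> | |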
| <li><b>GloVe:</b> Uses global co-occurrence statistics for embedding.</li> | |
| <li><b>FastText:</b> Similar to Word2Vec but includes subword information.</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
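The first three schemes in the list above are simple enough to sketch in plain Python. The three-document corpus, the helper names, and the unsmoothed formula `idf = log(N / df)` are assumptions made for illustration; libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already lowercased.
docs = ["i love biryani", "i love pizza", "pizza is famous"]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

def one_hot(word):
    # A single 1 at the word's vocabulary index, 0 elsewhere.
    return [1 if w == word else 0 for w in vocab]

def bag_of_words(tokens):
    # Raw count of each vocabulary word in the document.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tf_idf(tokens):
    # Term frequency times unsmoothed inverse document frequency.
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)  # >= 1 by construction
        vector.append(tf * math.log(n_docs / df))
    return vector

print(vocab)
print(one_hot("pizza"))
print(bag_of_words(tokenized[0]))
print(tf_idf(tokenized[0]))
```

In the first document, "biryani" appears in only one of the three documents and so receives a higher TF-IDF weight than "i", which appears in two.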
| st.markdown( | |
| """ | |
| <h5>5. Stemming</h5> | |
| <p>Stemming reduces words to a base or root form by stripping affixes, usually suffixes. It is a rule-based heuristic, so the resulting stem is not always a valid word in the language.</p> | |
| <h6>Example:</h6> | |
| <ul> | |
| <li><b>Original Words:</b> "running", "runner", "runs"</li> | |
| <li><b>Stemmed Form:</b> "run"</li> | |
| </ul> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
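To make the rule-based nature of stemming concrete, here is a toy suffix stripper. It is a deliberately crude sketch with a hypothetical three-suffix rule list, not the Porter or Snowball algorithm; real stemmers apply far more careful rules and may return different stems for the same words.

```python
# Suffixes are tried in this order; the list is an illustrative assumption.
SUFFIXES = ("ing", "er", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

stems = [naive_stem(w) for w in ["running", "runner", "runs"]]
print(stems)
print(naive_stem("studies"))
```

The second call returns "studie", which is not an English word. That is the kind of output a purely rule-based approach can produce.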
| st.markdown( | |
| """ | |
| <h5>6. Lemmatization</h5> | |
| <p>Lemmatization reduces words to their dictionary form, called a lemma, taking into account the word's context in the sentence, such as its part of speech.</p> | |
| <h6>Example:</h6> | |
| <ul> | |
| <li><b>Original Words:</b> "running", "better", "went"</li> | |
| <li><b>Lemmatized Form:</b> "run", "good", "go"</li> | |
| </ul> | |
| <p>Lemmatization is generally more accurate than stemming but computationally more expensive, because it relies on a language dictionary and on knowing how each word is used.</p> | |
| """, | |
| unsafe_allow_html=True | |
| ) | |
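Unlike stemming, lemmatization is a lookup rather than a string-trimming rule, which is why irregular forms such as "went" can be handled. The sketch below assumes a hypothetical four-entry table, `LEMMA_TABLE`, keyed by word and part of speech; a real lemmatizer (for example NLTK's WordNet-based one or spaCy's) draws on a full lexicon instead.

```python
# Hypothetical mini-dictionary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("running", "verb"): "run",
    ("went", "verb"): "go",
    ("better", "adj"): "good",
    ("better", "adv"): "well",
}

def lemmatize(word, pos):
    # Fall back to the lowercased word when no entry is known.
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())

lemmas = [
    lemmatize("running", "verb"),
    lemmatize("better", "adj"),
    lemmatize("went", "verb"),
]
print(lemmas)
print(lemmatize("better", "adv"))
```

The two entries for "better" show why context matters: the same surface form maps to a different lemma depending on its part of speech.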