import streamlit as st

st.markdown(""" """, unsafe_allow_html=True)

st.markdown(
    "✨ Explore essential terms in Natural Language Processing and their meanings!...",
    unsafe_allow_html=True,
)

# Units of text, from largest to smallest
st.header("📝 Corpus")
st.markdown("- **A corpus** is a collection of documents.")

st.header("📄 Document")
st.markdown("- **A document** is a collection of sentences, paragraphs, single words, or even single characters.")

st.header("📝 Paragraph")
st.markdown("- **A paragraph** consists of multiple sentences.")

st.header("📢 Sentence")
st.markdown("- **A sentence** is a collection of words.")

st.header("🔤 Word")
st.markdown("- **Words** are made up of characters.")

st.header("🔠 Character")
st.markdown("- **A character** can be a digit, a letter, or a special symbol.")

# Tokenization
st.header("✂️ Tokenization")
st.markdown("- **Tokenization** is a technique that splits a large chunk of text into smaller units, which are known as tokens.")

st.subheader("🛠️ Types of Tokenization")
st.markdown("""
- 🔹 **Sentence Tokenization** – Splits text into sentences.
- 🔹 **Word Tokenization** – Splits sentences into words.
- 🔹 **Character Tokenization** – Splits words into individual characters.
""")

st.subheader("📝 Sentence Tokenization")
st.markdown("- **Breaks a large text into meaningful sentence units.**")

st.subheader("📖 Word Tokenization")
st.markdown("- **Splits a sentence into individual words.**")

st.subheader("🔡 Character Tokenization")
st.markdown("- **Breaks words into separate characters.**")

# Stop words
st.header("🚫 Stop Words")
st.markdown("- **Common words** (e.g., 'the', 'is', 'and') that add little meaning to the text but maintain its grammatical structure.")

# Vectorization
st.header("📊 Vectorization")
st.markdown("- **Transforms text into a numerical representation** for machine learning models.")

st.subheader("🔢 Different Types of Vectorization Techniques")
st.markdown("""
- 🎯 **One-Hot Encoding**
- 🏷️ **Bag of Words (BoW)**
- 📊 **TF-IDF (Term Frequency-Inverse Document Frequency)**
- 🧠 **Word2Vec**
- 🌍 **GloVe**
- ⚡ **FastText**
""")

st.success("🚀 Mastering these **NLP terminologies** will help you build powerful text-processing applications!")
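
# The terms above can be made concrete with a small standalone sketch. It is
# not part of the app: it uses only the standard library, the splitting rules
# are deliberately naive, and every name in it (STOP_WORDS, sentence_tokenize,
# word_tokenize, char_tokenize, remove_stop_words, bag_of_words) is a
# hypothetical helper introduced for illustration. It shows the three kinds of
# tokenization, stop-word removal, and a Bag of Words count vector.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "and", "a", "of"}


def sentence_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def char_tokenize(word):
    """Split a word into its individual characters."""
    return list(word)


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, vectors) with one word-count vector per document."""
    tokenized = [remove_stop_words(word_tokenize(d)) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors
```

# In this sketch each document becomes a list of counts aligned with a sorted
# vocabulary. TF-IDF would reweight those same counts, while Word2Vec, GloVe,
# and FastText learn dense vectors instead of counting.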