import streamlit as st

st.markdown("""
    <style>
    /* Set a soft background color */
    body {
        background-color: #eef2f7;
    }
    /* Style for main title */
    h1 {
        color: black;
        font-family: 'Roboto', sans-serif;
        font-weight: 700;
        text-align: center;
        margin-bottom: 25px;
    }
    /* Style for headers */
    h2 {
        color: red;
        font-family: 'Roboto', sans-serif;
        font-weight: 600;
        margin-top: 30px;
    }
    
    /* Style for subheaders */
    h3 {
        color: violet;
        font-family: 'Roboto', sans-serif;
        font-weight: 500;
        margin-top: 20px;
    }
    .custom-subheader {
        color: violet;
        font-family: 'Roboto', sans-serif;
        font-weight: 600;
        margin-bottom: 15px;
    }
    /* Paragraph styling */
    p {
        font-family: 'Georgia', serif;
        line-height: 1.8;
        color: black;
        margin-bottom: 20px;
    }
    /* List styling with diamond bullets */
    .icon-bullet {
        list-style-type: none;
        padding-left: 20px;
    }
    .icon-bullet li {
        font-family: 'Georgia', serif;
        font-size: 1.1em;
        margin-bottom: 10px;
        color: black;
    }
    .icon-bullet li::before {
        content: "◆";
        padding-right: 10px;
        color: black;
    }
    /* Sidebar styling */
    .sidebar .sidebar-content {
        background-color: #ffffff;
        border-radius: 10px;
        padding: 15px;
    }
    .sidebar h2 {
        color: #495057;
    }
    /* Custom button style */
    .streamlit-button {
        background-color: #00FFFF;
        color: #000000;
        font-weight: bold;
    }
    </style>
    """, unsafe_allow_html=True)


st.markdown("<h1 class='title'>📖 NLP Terminology</h1>", unsafe_allow_html=True)
st.markdown("<p class='caption'>✨ Explore essential terms in Natural Language Processing and their meanings!</p>", unsafe_allow_html=True)

st.header("📝 Corpus")
st.markdown("- **A corpus** is a collection of documents.")

st.header("📄 Document")
st.markdown("- **A document** is a single unit of text in a corpus. It can be several paragraphs long or as short as a sentence, a single word, or even a single character.")

st.header("📝 Paragraph")
st.markdown("- **A paragraph** consists of multiple sentences.")

st.header("📢 Sentence")
st.markdown("- **A sentence** is a collection of words.")

st.header("🔤 Word")
st.markdown("- **Words** are made up of characters.")

st.header("🔠 Character")
st.markdown("- **A character** can be a letter, a digit, or a special symbol.")

st.header("✂️ Tokenization")
st.markdown("- **Tokenization** is the technique of splitting a large chunk of text into smaller units, which are called tokens.")

st.subheader("🛠️ Types of Tokenization")
st.markdown("""
    - 🔹 **Sentence Tokenization** – Splits text into sentences.
    - 🔹 **Word Tokenization** – Splits sentences into words.
    - 🔹 **Character Tokenization** – Splits words into individual characters.
""")

st.subheader("📝 Sentence Tokenization")
st.markdown("- **Breaks a large text into meaningful sentence units.**")

st.subheader("📖 Word Tokenization")
st.markdown("- **Splits a sentence into individual words.**")

st.subheader("🔡 Character Tokenization")
st.markdown("- **Breaks words into separate characters.**")
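
# Illustrative sketch (not part of the original app): the three tokenization
# levels described above, using only the standard library. The demo_* names are
# hypothetical, and the regex-based sentence splitter is a simplifying
# assumption; dedicated NLP libraries handle abbreviations and other edge
# cases far better.
import re


def demo_sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def demo_word_tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"\w+", sentence)


def demo_char_tokenize(word):
    """Split a word into its individual characters."""
    return list(word)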

st.header("🚫 Stop Words")
st.markdown("- **Stop words** are common words (e.g., 'the', 'is', 'and') that carry little meaning on their own but maintain the grammatical structure of the text.")
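
# Illustrative sketch (not part of the original app): filtering stop words out
# of a list of tokens. DEMO_STOP_WORDS is a tiny hypothetical list chosen for
# this example; real projects usually rely on a curated list from an NLP
# library.
DEMO_STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}


def demo_remove_stop_words(tokens):
    """Return the tokens whose lowercase form is not in DEMO_STOP_WORDS."""
    return [t for t in tokens if t.lower() not in DEMO_STOP_WORDS]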

st.header("📊 Vectorization")
st.markdown("- **Vectorization** transforms text into a numerical representation that machine learning models can process.")

st.subheader("🔢 Different Types of Vectorization Techniques")
st.markdown("""
    - 🎯 **One-Hot Encoding**
    - 🏷️ **Bag of Words (BoW)**
    - 📊 **TF-IDF (Term Frequency-Inverse Document Frequency)**
    - 🧠 **Word2Vec**
    - 🌍 **GloVe**
    - ⚡ **FastText**
""")

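# Illustrative sketch (not part of the original app): a minimal Bag of Words
# vectorizer, one of the techniques listed above. The function name and the
# whitespace-based tokenization are assumptions made for this example; it
# simply counts how often each vocabulary word occurs in each document.
from collections import Counter


def demo_bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    # Sorted vocabulary gives every word a fixed column position.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocabulary])
    return vocabulary, vectors
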
st.success("🚀 Mastering these **NLP terminologies** will help you build powerful text-processing applications!")