Natural_Language_Processing / pages /3_Terminology.py

Harika22

Update pages/3_Terminology.py

f8cdeaa verified about 1 year ago


	import streamlit as st

	st.markdown("""
	<style>
	/* Set a soft background color */
	body {
	background-color: #eef2f7;
	}
	/* Style for main title */
	h1 {
	color: black;
	font-family: 'Roboto', sans-serif;
	font-weight: 700;
	text-align: center;
	margin-bottom: 25px;
	}
	/* Style for headers */
	h2 {
	color: red;
	font-family: 'Roboto', sans-serif;
	font-weight: 600;
	margin-top: 30px;
	}

	/* Style for subheaders */
	h3 {
	color: violet;
	font-family: 'Roboto', sans-serif;
	font-weight: 500;
	margin-top: 20px;
	}
	.custom-subheader {
	color: violet;
	font-family: 'Roboto', sans-serif;
	font-weight: 600;
	margin-bottom: 15px;
	}
	/* Paragraph styling */
	p {
	font-family: 'Georgia', serif;
	line-height: 1.8;
	color: black;
	margin-bottom: 20px;
	}
	/* List styling with diamond bullets */
	.icon-bullet {
	list-style-type: none;
	padding-left: 20px;
	}
	.icon-bullet li {
	font-family: 'Georgia', serif;
	font-size: 1.1em;
	margin-bottom: 10px;
	color: black;
	}
	.icon-bullet li::before {
	content: "◆";
	padding-right: 10px;
	color: black;
	}
	/* Sidebar styling */
	.sidebar .sidebar-content {
	background-color: #ffffff;
	border-radius: 10px;
	padding: 15px;
	}
	.sidebar h2 {
	color: #495057;
	}
	/* Custom button style */
	.streamlit-button {
	background-color: #00FFFF;
	color: #000000;
	font-weight: bold;
	}
	</style>
	""", unsafe_allow_html=True)


	st.markdown("<h1 class='title'>📖 NLP Terminology</h1>", unsafe_allow_html=True)
	st.markdown("<p class='caption'>✨ Explore essential terms in Natural Language Processing and their meanings.</p>", unsafe_allow_html=True)

	st.header("📝 Corpus")
	st.markdown("- A corpus is a collection of documents.")

	st.header("📄 Document")
	st.markdown("- A document is a single unit of text within a corpus. It can be several paragraphs, one sentence, a single word, or even a single character.")

	st.header("📝 Paragraph")
	st.markdown("- A paragraph consists of one or more sentences.")

	st.header("📢 Sentence")
	st.markdown("- A sentence is an ordered sequence of words.")

	st.header("🔤 Word")
	st.markdown("- Words are made up of characters.")

	st.header("🔠 Character")
	st.markdown("- A character can be a digit, a letter, or a special symbol.")
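
	# Illustrative sketch (not part of the original page): the hierarchy above
	# written as plain Python data. The sample texts and the `example_*` names
	# are made up for this example and are not used elsewhere on the page.
	example_corpus = [
	    "NLP is fun. It is also useful.",  # document 1: two sentences
	    "Tokens matter!",                  # document 2: one sentence
	]
	example_document = example_corpus[0]   # one document taken from the corpus
	example_word = "NLP"                   # one word from that document
	example_characters = list(example_word)  # the characters that make up the word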

	st.header("✂️ Tokenization")
	st.markdown("- Tokenization is a technique for breaking a large chunk of text into smaller units, which are known as tokens.")

	st.subheader("🛠️ Types of Tokenization")
	st.markdown("""
	- 🔹 Sentence Tokenization – Splits text into sentences.
	- 🔹 Word Tokenization – Splits sentences into words.
	- 🔹 Character Tokenization – Splits words into individual characters.
	""")

	st.subheader("📝 Sentence Tokenization")
	st.markdown("- Breaks a large text into meaningful sentence units.")

	st.subheader("📖 Word Tokenization")
	st.markdown("- Splits a sentence into individual words.")

	st.subheader("🔡 Character Tokenization")
	st.markdown("- Breaks words into separate characters.")
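
	# Illustrative sketch (not part of the original page): the three tokenization
	# levels using only the standard library. The regular expressions are a
	# simplifying assumption: they mis-split abbreviations such as "Dr." and drop
	# punctuation, so libraries like NLTK or spaCy are usually a better choice.
	# Every `sketch_*` name is hypothetical.
	import re

	def sketch_sentence_tokenize(text):
	    """Split after '.', '!' or '?' when followed by whitespace."""
	    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

	def sketch_word_tokenize(sentence):
	    """Return runs of word characters; punctuation is discarded."""
	    return re.findall(r"\w+", sentence)

	def sketch_char_tokenize(word):
	    """Return the individual characters of a word."""
	    return list(word)

	sketch_text = "NLP is fun. Tokens matter!"
	sketch_sentences = sketch_sentence_tokenize(sketch_text)  # ['NLP is fun.', 'Tokens matter!']
	sketch_words = sketch_word_tokenize(sketch_sentences[0])  # ['NLP', 'is', 'fun']
	sketch_chars = sketch_char_tokenize(sketch_words[0])      # ['N', 'L', 'P']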

	st.header("🚫 Stop Words")
	st.markdown("- Stop words are common words (e.g., 'the', 'is', 'and') that carry little meaning on their own but maintain grammatical structure.")
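
	# Illustrative sketch (not part of the original page): filtering stop words
	# out of a token list. `SKETCH_STOP_WORDS` is a tiny hand-picked set assumed
	# for this example; real projects usually take a fuller list from a library
	# such as NLTK or spaCy.
	SKETCH_STOP_WORDS = {"the", "is", "and", "a", "an", "of", "on"}

	def sketch_remove_stop_words(tokens):
	    """Keep tokens whose lowercase form is not in the stop-word set."""
	    return [t for t in tokens if t.lower() not in SKETCH_STOP_WORDS]

	sketch_filtered = sketch_remove_stop_words(["The", "cat", "is", "on", "the", "mat"])
	# sketch_filtered == ['cat', 'mat']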

	st.header("📊 Vectorization")
	st.markdown("- Vectorization transforms text into a numerical representation that machine learning models can process.")

	st.subheader("🔢 Different Types of Vectorization Techniques")
	st.markdown("""
	- 🎯 One-Hot Encoding
	- 🏷️ Bag of Words (BoW)
	- 📊 TF-IDF (Term Frequency-Inverse Document Frequency)
	- 🧠 Word2Vec
	- 🌍 GloVe
	- ⚡ FastText
	""")

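	# Illustrative sketch (not part of the original page): one-hot encoding, Bag of
	# Words, and one common TF-IDF variant on a toy corpus, standard library only.
	# The corpus, the `sketch_*` names, and the unsmoothed formula
	# tf * log(N / df) are assumptions made for this example; libraries such as
	# scikit-learn apply smoothing by default, so their numbers will differ.
	import math
	from collections import Counter

	sketch_docs = [["nlp", "is", "fun"], ["nlp", "is", "useful"]]  # pre-tokenized documents
	sketch_vocab = sorted({w for doc in sketch_docs for w in doc})  # ['fun', 'is', 'nlp', 'useful']

	def sketch_one_hot(word):
	    """Return a vector with a 1 at the word's vocabulary position."""
	    return [1 if w == word else 0 for w in sketch_vocab]

	def sketch_bow(doc):
	    """Count how often each vocabulary word occurs in one document."""
	    counts = Counter(doc)
	    return [counts[w] for w in sketch_vocab]

	def sketch_tfidf(doc):
	    """Weight each term frequency by log(N / document frequency)."""
	    n_docs = len(sketch_docs)
	    counts = Counter(doc)
	    weights = []
	    for w in sketch_vocab:
	        tf = counts[w] / len(doc)                        # term frequency in this document
	        df = sum(1 for d in sketch_docs if w in d)       # number of documents containing w
	        weights.append(tf * math.log(n_docs / df))
	    return weights

	sketch_one_hot_vector = sketch_one_hot("nlp")       # [0, 0, 1, 0]
	sketch_bow_vector = sketch_bow(sketch_docs[0])      # [1, 1, 1, 0]
	sketch_tfidf_vector = sketch_tfidf(sketch_docs[0])  # words shared by every document get weight 0.0
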
	st.success("🚀 Mastering these NLP terms will help you build powerful text-processing applications!")