Natural_Language_Processing / pages/3_Terminology.py

Harika22: Update pages/3_Terminology.py (628ffb3, verified 11 months ago)


	import streamlit as st


	st.markdown(
	"""
	<style>
	body {
	background-color: #f9f9f9; /* Light background */
	font-family: 'Arial', sans-serif;
	}
	@keyframes fadeIn {
	0% { opacity: 0; transform: translateY(-20px); }
	100% { opacity: 1; transform: translateY(0); }
	}
	.title {
	text-align: center;
	color: black;
	font-size: 3rem;
	font-weight: bold;
	animation: fadeIn 1.5s ease-in-out;
	}
	.caption {
	text-align: center;
	font-style: italic;
	font-size: 1.2rem;
	color: black;
	animation: fadeIn 2s ease-in-out;
	}

	/* Style for headers */
	h2 {
	color: violet;
	font-family: 'Roboto', sans-serif;
	font-weight: 600;
	margin-top: 30px;
	}

	/* Style for subheaders */
	h3 {
	color: green;
	font-family: 'Roboto', sans-serif;
	font-weight: 500;
	margin-top: 20px;
	}
	.custom-subheader {
	color: #00FFFF;
	font-family: 'Roboto', sans-serif;
	font-weight: 600;
	margin-bottom: 15px;
	}

	.icon-bullet {
	list-style-type: none;
	padding-left: 20px;
	}

	.icon-bullet li {
	font-family: 'Georgia', serif;
	font-size: 1.1em;
	margin-bottom: 10px;
	color: black;
	}

	.icon-bullet li::before {
	content: "◆";
	padding-right: 10px;
	color: black;
	}

	.section {
	font-size: 1.1rem;
	text-align: justify;
	line-height: 1.8;
	color: #34495e; /* Muted gray */
	background: #ffffff; /* White background */
	padding: 20px;
	border-radius: 10px;
	box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
	animation: fadeIn 2.5s ease-in-out;
	margin: 15px 0;
	}
	.term {
	font-weight: bold;
	color: red;
	animation: fadeIn 3s ease-in-out;
	}
	.definition {
	font-style: italic;
	color: #34495e;
	animation: fadeIn 3.5s ease-in-out;
	}
	</style>
	""",
	unsafe_allow_html=True,
	)

	st.markdown("<h1 class='title'>NLP Terminology</h1>", unsafe_allow_html=True)

	st.markdown(
	"<p class='caption'>Explore essential terms in Natural Language Processing and their meanings!</p>",
	unsafe_allow_html=True,
	)

	st.markdown(
	"""
	<p class="section"><span class="term">Documents</span><br>
	A document is a collection of paragraphs or sentences; it can also be as small as a single word or a single character.
	</p>
	""",
	unsafe_allow_html=True,
	)

	st.markdown(
	"""
	<p class="section"><span class="term">Paragraph</span><br>
	A paragraph is a collection of sentences.
	</p>
	""",
	unsafe_allow_html=True,
	)

	st.markdown(
	"""
	<p class="section"><span class="term">Sentence</span><br>
	A sentence is a collection of words.
	</p>
	""",
	unsafe_allow_html=True,
	)

	st.markdown(
	"""
	<p class="section"><span class="term">Words</span><br>
	A word is a collection of characters.
	</p>
	""",
	unsafe_allow_html=True,
	)

	st.markdown(
	"""
	<p class="section"><span class="term">Character</span><br>
	A character can be a digit, a letter of the alphabet, or a special symbol.
	</p>
	""",
	unsafe_allow_html=True,
	)
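The definitions above describe a nesting: a document contains paragraphs, paragraphs contain sentences, sentences contain words, and words contain characters. The following is a minimal Python sketch of that hierarchy. `split_hierarchy` is a hypothetical helper, and its splitting rules (blank lines between paragraphs, `.`, `!` or `?` ending a sentence) are simplifying assumptions, not how production NLP libraries work.

```python
import re


def split_hierarchy(document: str) -> dict[str, list[str]]:
    """Break a document into paragraphs, sentences, words and characters."""
    # Assumption: paragraphs are separated by at least one blank line.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    # Assumption: a sentence ends with ".", "!" or "?" followed by whitespace.
    sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s]
    # Words are runs of letters/digits; punctuation is dropped here for simplicity.
    words = [w for s in sentences for w in re.findall(r"\w+", s)]
    # Characters are taken from the words, so spaces and punctuation are excluded.
    characters = [c for w in words for c in w]
    return {
        "paragraphs": paragraphs,
        "sentences": sentences,
        "words": words,
        "characters": characters,
    }


doc = "NLP is fun. It has many terms!\n\nTokens are units."
parts = split_hierarchy(doc)
print({level: len(items) for level, items in parts.items()})
# {'paragraphs': 2, 'sentences': 3, 'words': 10, 'characters': 36}
```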

	st.markdown(
	"""
	<p class="section"><span class="term">Tokenization</span><br>
	Tokenization is the technique of splitting a large chunk of text into smaller units, known as tokens.
	</p>
	""",
	unsafe_allow_html=True,
	)
	st.header("Types of Tokenization")
	st.markdown("""
	<ul class="icon-bullet">
	<li>Sentence tokenization</li>
	<li>Word tokenization</li>
	<li>Character tokenization</li>
	</ul>
	""", unsafe_allow_html=True)

	st.subheader("Sentence tokenization")
	st.markdown('''
	- Splits a large chunk of text into sentences, so each token is a sentence.
	''')
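A minimal sketch of sentence tokenization, assuming the naive rule that a sentence ends at `.`, `!` or `?` followed by whitespace. `sentence_tokenize` is a hypothetical helper; libraries such as NLTK (`nltk.sent_tokenize`) rely on a trained model that also copes with abbreviations.

```python
import re


def sentence_tokenize(text: str) -> list[str]:
    """Split text into sentence tokens using a naive punctuation rule."""
    # Assumption: split wherever ".", "!" or "?" is followed by whitespace.
    # The lookbehind keeps the punctuation attached to its sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


print(sentence_tokenize("NLP is fun. Is it hard? Not really!"))
# ['NLP is fun.', 'Is it hard?', 'Not really!']
```

Because the rule is purely character-based, `sentence_tokenize("Dr. Smith arrived.")` returns `['Dr.', 'Smith arrived.']`, which is exactly the kind of mistake a trained sentence tokenizer is designed to avoid.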

	st.subheader("Word tokenization")
	st.markdown('''
	- Splits a large chunk of text into words, so each token is a word.
	''')
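A minimal word tokenization sketch, assuming a word is a run of letters, digits or underscores (the regex class `\w`) and that every punctuation mark becomes its own token. `simple_word_tokenize` is a hypothetical name, chosen to avoid confusion with NLTK's `word_tokenize`, which applies more elaborate rules.

```python
import re


def simple_word_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    # Either a run of word characters, or one non-space, non-word character.
    # Whitespace matches neither alternative, so it is discarded.
    return re.findall(r"\w+|[^\w\s]", text)


print(simple_word_tokenize("Hello, world! NLP rocks."))
# ['Hello', ',', 'world', '!', 'NLP', 'rocks', '.']
```

Splitting on whitespace alone (`"Hello, world!".split()`) would give `['Hello,', 'world!']`, gluing punctuation to the words, which is why the regex treats punctuation separately.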

	st.subheader("Character tokenization")
	st.markdown('''
	- Splits a large chunk of text into individual characters, so each token is a character.
	''')
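Character tokenization is the simplest case, since every character is a token. The sketch below assumes whitespace should be dropped by default; `char_tokenize` and its `keep_spaces` flag are hypothetical. The output mixes letters, digits and special symbols, matching the earlier definition of a character.

```python
def char_tokenize(text: str, keep_spaces: bool = False) -> list[str]:
    """Return every character of text as a token."""
    # Whitespace characters are skipped unless keep_spaces is True.
    return [c for c in text if keep_spaces or not c.isspace()]


print(char_tokenize("NLP 101!"))
# ['N', 'L', 'P', '1', '0', '1', '!']
```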


	st.markdown(
	"""
	<p class="section"><span class="term">Bag-of-Words (BoW)</span><br>
	Bag-of-Words is a simple representation of text data where each word is treated as a feature. The order of words is ignored, and the text is represented by a frequency count of words in the document.
	</p>
	""",
	unsafe_allow_html=True,
	)
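A minimal Bag-of-Words sketch in plain Python, assuming lowercase words extracted with the regex `\w+` and a vocabulary sorted alphabetically. `bag_of_words` is a hypothetical helper; scikit-learn's `CountVectorizer` provides a production version of the same idea.

```python
import re
from collections import Counter


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, one count vector per document)."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    # Every distinct word across all documents becomes one feature.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Position i holds how often vocab[i] occurs; word order is lost.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Because only counts are kept, `"dog bites man"` and `"man bites dog"` produce identical vectors, which illustrates how Bag-of-Words ignores word order.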

	st.markdown(
	"""
	<p class="section"><span class="term">TF-IDF (Term Frequency - Inverse Document Frequency)</span><br>
	TF-IDF is a statistic used to evaluate how important a word is to a document relative to a collection (corpus) of documents. It balances how often the word appears in that document against how rare it is across the entire corpus.
	</p>
	""",
	unsafe_allow_html=True,
	)
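TF-IDF has several variants. The sketch below assumes one common textbook form: term frequency is the word's count divided by the document length, and inverse document frequency is log(N / df), where N is the number of documents and df is how many of them contain the word. `tf_idf` is a hypothetical helper; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and normalizes each vector by default, so its numbers will differ.

```python
import math
import re
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {word: tf-idf score} dict per document."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # tf = count / document length; idf = log(N / df)
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


scores = tf_idf(["the cat sat", "the dog sat", "the cat ran"])
print(round(scores[1]["dog"], 3), scores[1]["the"])
# 0.366 0.0
```

Here `the` occurs in every document, so its IDF is log(1) = 0 and it scores 0, while `dog`, which appears in only one document, gets the highest weight in its document.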

	st.markdown(
	"""
	<p class="section"><span class="term">Sentiment Analysis</span><br>
	Sentiment Analysis is the task of determining the sentiment or opinion expressed in text. It is often used to analyze social media posts, customer feedback, and reviews to gauge public opinion.
	</p>
	""",
	unsafe_allow_html=True,
	)
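A toy lexicon-based sketch of sentiment analysis. The word scores in `LEXICON`, the one-word negation rule and the `sentiment` helper are all hypothetical assumptions made for illustration; practical systems use trained classifiers or curated lexicons such as VADER, which cover far more vocabulary and context.

```python
import re

# Hypothetical toy lexicon: positive words score > 0, negative words < 0.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
# Words that flip the polarity of the word right after them.
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Label text as 'positive', 'negative' or 'neutral' by summing word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        # Simple negation handling: "not good" counts as negative.
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["I love this phone, the camera is great",
               "The service was not good",
               "The box is blue"]:
    print(sentiment(review), "|", review)
```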

	st.markdown(
	"""
	<p class="section"><span class="term">Language Model</span><br>
	A language model assigns probabilities to sequences of words, for example by predicting the next word from the words that come before it. Well-known examples include GPT, BERT, and earlier LSTM-based models, which support tasks such as text generation, translation, and summarization.
	</p>
	""",
	unsafe_allow_html=True,
	)
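Modern models such as GPT are neural networks, but the core idea of assigning probabilities to word sequences can be shown with a classic count-based bigram model, a different and much simpler technique. The sketch assumes whitespace tokenization, lowercase text, `<s>`/`</s>` sentence-boundary markers and unsmoothed maximum-likelihood estimates; `train_bigram`, `sequence_probability` and `predict_next` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each previous word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def sequence_probability(counts: dict[str, Counter], sentence: str) -> float:
    """P(sentence) as the product of P(word | previous word) estimates."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        following = counts.get(prev)
        if not following:
            return 0.0  # previous word never seen in training
        prob *= following[cur] / sum(following.values())
    return prob


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if unseen."""
    following = counts.get(word.lower())
    return following.most_common(1)[0][0] if following else None


model = train_bigram(["I like NLP", "I like cats", "you like NLP"])
print(round(sequence_probability(model, "I like NLP"), 3))  # 0.444
print(predict_next(model, "like"))  # nlp
```

The probability of "I like NLP" is P(i | <s>) × P(like | i) × P(nlp | like) × P(</s> | nlp) = 2/3 × 1 × 2/3 × 1 = 4/9. Any unseen bigram makes the whole product 0, which is why practical n-gram models add smoothing.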