import streamlit as st

st.markdown(
    """ """,
    unsafe_allow_html=True
)

st.markdown(
    """

Basic Terminology in NLP

""",
    unsafe_allow_html=True
)

st.markdown(
    """
Before diving deep into NLP concepts, we need to know the terminology that is used most often in the field.

1. Key Terminologies in NLP
""",
    unsafe_allow_html=True
)

st.markdown(
    """
2. Tokenization

Tokenization is the process of breaking down a large piece of text into smaller units called tokens. These tokens can be words, sentences, or subwords, depending on the granularity required for the task.

Types of Tokenization:
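As a rough sketch of the word and sentence granularities mentioned above, the snippet below uses only the standard library. The function names `word_tokenize` and `sentence_tokenize` are illustrative choices, not taken from any particular library, and the splitting rules are deliberately naive. Subword tokenizers are usually learned from data and are not shown here.

```python
import re

def word_tokenize(text):
    # Keep runs of letters, digits and apostrophes; punctuation is dropped.
    return re.findall("[A-Za-z0-9']+", text)

def sentence_tokenize(text):
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by spaces.
    parts = re.split("(?<=[.!?]) +", text.strip())
    return [p for p in parts if p]

text = "NLP is fun. Tokenization splits text!"
print(word_tokenize(text))
# ['NLP', 'is', 'fun', 'Tokenization', 'splits', 'text']
print(sentence_tokenize(text))
# ['NLP is fun.', 'Tokenization splits text!']
```

A rule this simple breaks on abbreviations such as "Dr." or on decimal numbers, which is one reason dedicated tokenizers are normally used in practice.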
""",
    unsafe_allow_html=True
)

st.markdown(
    """
3. Stop Words

Stop words are commonly used words in a language that carry little or no meaningful information for text analysis.

Example:

"In Hyderabad, we can eat famous biryani."
Stop words: ["in", "we", "can"]
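The example above can be reproduced with a few lines of Python. This is a minimal sketch: `STOP_WORDS` is a hypothetical three-word list chosen to match the example, whereas real stop-word lists contain many more entries.

```python
# Hypothetical, deliberately tiny stop-word list; real lists are much larger.
STOP_WORDS = {"in", "we", "can"}

def remove_stop_words(text):
    # Lowercase each word, strip surrounding punctuation, then drop stop words.
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]

print(remove_stop_words("In Hyderabad, we can eat famous biryani."))
# ['hyderabad', 'eat', 'famous', 'biryani']
```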

""",
    unsafe_allow_html=True
)

st.markdown(
    """
4. Vectorization

Vectorization is the process of converting text data into numerical representations so that machine learning models can process and analyze it.

Types of Vectorization:
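To make the idea of turning text into numbers concrete, here is a minimal sketch of one common approach, a bag-of-words count vector. It is only one possible scheme, picked for illustration, and the helper names `build_vocabulary` and `bag_of_words` are assumptions rather than a library API.

```python
def build_vocabulary(documents):
    # Map each distinct lowercase word to a column index, in sorted order.
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}

def bag_of_words(document, vocabulary):
    # Count how often each vocabulary word occurs in the document.
    vector = [0] * len(vocabulary)
    for w in document.lower().split():
        if w in vocabulary:
            vector[vocabulary[w]] += 1
    return vector

docs = ["we eat biryani", "we eat famous biryani biryani"]
vocab = build_vocabulary(docs)
print(vocab)
# {'biryani': 0, 'eat': 1, 'famous': 2, 'we': 3}
print([bag_of_words(d, vocab) for d in docs])
# [[1, 1, 0, 1], [2, 1, 1, 1]]
```

Each document becomes a fixed-length list of counts, which is the kind of numerical input a machine learning model can work with. Word order is lost in this representation.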
""",
    unsafe_allow_html=True
)

st.markdown(
    """
5. Stemming

Stemming is the process of reducing words to a base or root form by stripping affixes, most often suffixes. It is a rule-based, heuristic approach to standardizing words, so the resulting stem is not always a valid word in the language.

Example:
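The following is a toy suffix-stripping stemmer, written only to illustrate the rule-based idea. The suffix list and the minimum stem length of three characters are arbitrary assumptions, and this is not the Porter algorithm or any other published stemmer.

```python
# Suffixes are checked in order, so longer ones come first.
SUFFIXES = ("ing", "ed", "es", "s")

def simple_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["playing", "played", "plays", "studies"]:
    print(w, "->", simple_stem(w))
# playing -> play
# played -> play
# plays -> play
# studies -> studi
```

The last result, "studi", shows the typical trade-off: the rules are cheap to apply, but the output is not guaranteed to be a real word.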
""",
    unsafe_allow_html=True
)

st.markdown(
    """
6. Lemmatization

Lemmatization is the process of reducing a word to its base or root form (called a lemma) using linguistic rules and a vocabulary (dictionary). Unlike stemming, lemmatization ensures that the resulting word is a valid word in the language.

Example:
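As a rough illustration of the dictionary-based idea, the sketch below looks each word up in a small table. `LEMMAS` is a hypothetical miniature dictionary made up for this example. Real lemmatizers rely on a full lexicon and usually also take the part of speech into account.

```python
# Hypothetical miniature lemma dictionary; real lemmatizers use a full lexicon.
LEMMAS = {
    "studies": "study",
    "went": "go",
    "mice": "mouse",
    "running": "run",
}

def lemmatize(word):
    # Return the dictionary form if known, otherwise the lowercase word itself.
    word = word.lower()
    return LEMMAS.get(word, word)

for w in ["studies", "went", "mice", "biryani"]:
    print(w, "->", lemmatize(w))
# studies -> study
# went -> go
# mice -> mouse
# biryani -> biryani
```

Every output here is a valid word, including irregular forms such as "went" and "mice" that suffix-stripping rules cannot handle.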

Lemmatization is generally more accurate than stemming but computationally more expensive, because it requires a language dictionary.

""",
    unsafe_allow_html=True
)