Update pages/Introduction.py

pages/Introduction.py  +14 −11  CHANGED
@@ -14,7 +14,7 @@ st.markdown("<p>NLP powers many applications that use language, such as text tra
 st.subheader("NLP Techniques")
 st.markdown("<p>NLP encompasses a wide array of techniques aimed at enabling computers to process and understand human language. These tasks can be categorized into several broad areas, each addressing different aspects of language processing. Here are some of the key NLP techniques:</p>", unsafe_allow_html=True)
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>1. Text Processing and Preprocessing In NLP</b></p>', unsafe_allow_html=True)
 st.write("Before performing any analysis or modeling, raw text data must be cleaned and prepared.")
 st.markdown('<p style="color:lightyellow;"><b>a. Tokenization</b></p>', unsafe_allow_html=True)
 st.write("Splits text into smaller units like words or sentences.")
@@ -26,34 +26,34 @@ st.write("Example: _'I love NLP'_ → [‘I’, ‘love’, ‘NLP’]")
 st.write("**(ii) Sentence Tokenization:** Breaking text into sentences.")
 st.write("Example: _'I love NLP. It’s fascinating!'_ → [‘I love NLP.’, ‘It’s fascinating!’]")
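The two tokenization styles above can be sketched in plain Python. This is a rough regex heuristic, not what the page's app uses; a real pipeline would reach for NLTK's `word_tokenize`/`sent_tokenize` or spaCy:

```python
import re

def word_tokenize(text: str) -> list[str]:
    # Grab runs of word characters (incl. apostrophes) or standalone punctuation.
    return re.findall(r"[\w']+|[.,!?;]", text)

def sent_tokenize(text: str) -> list[str]:
    # Split after ., ! or ? followed by whitespace -- a rough heuristic.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

print(word_tokenize("I love NLP"))                    # ['I', 'love', 'NLP']
print(sent_tokenize("I love NLP. It's fascinating!"))  # ['I love NLP.', "It's fascinating!"]
```

NLTK's tokenizers handle abbreviations, quotes, and contractions that this heuristic misses.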
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>b. Stopword Removal</b></p>', unsafe_allow_html=True)
 st.write("Removes common words like “the,” “and,” “is” that do not contribute much to analysis.")
 
 
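A minimal sketch of stopword filtering, assuming a hand-picked stopword list; NLTK ships a fuller one via `nltk.corpus.stopwords`:

```python
# Tiny illustrative stopword list -- real projects use a library-provided list.
STOPWORDS = {"the", "and", "is", "a", "an", "of", "to", "in"}

def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["NLP", "is", "the", "future"]))  # ['NLP', 'future']
```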
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>c. Stemming and Lemmatization</b></p>', unsafe_allow_html=True)
 st.write("Stemming: Reduces words to their base or root form by chopping off suffixes (may not produce valid words).")
 st.write("Example: _“running”_ → “run”")
 
 st.write("Lemmatization: Converts words to their base form using vocabulary and grammar.")
 st.write("Example: _“better”_ → “good”")
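The distinction can be illustrated with a toy suffix-stripper and a tiny lemma lookup; these are illustrative stand-ins for NLTK's `PorterStemmer` and `WordNetLemmatizer`:

```python
def naive_stem(word: str) -> str:
    # Crude suffix chopping in the spirit of the Porter stemmer.
    # "ning" is tried before "ing" so "running" -> "run", not "runn".
    for suffix in ("ning", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Tiny lemma lookup; a real lemmatizer consults a full vocabulary plus POS info.
LEMMAS = {"better": "good", "ran": "run", "mice": "mouse"}

def lemmatize(word: str) -> str:
    return LEMMAS.get(word, word)

print(naive_stem("running"))  # run
print(lemmatize("better"))    # good
```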
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>d. Part-of-Speech (POS) Tagging</b></p>', unsafe_allow_html=True)
 st.write("Labels words with their grammatical roles (noun, verb, adjective, etc.).")
 st.write("Example: _“The cat sleeps”_ → [“The/DET”, “cat/NOUN”, “sleeps/VERB”]")
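A dictionary-lookup tagger is enough to show the output shape; real taggers (e.g. `nltk.pos_tag` or spaCy) are statistical models, not lookups:

```python
# Toy lexicon mapping lowercased words to tags; unknown words get 'X'.
LEXICON = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}

def pos_tag(tokens):
    return [f"{t}/{LEXICON.get(t.lower(), 'X')}" for t in tokens]

print(pos_tag(["The", "cat", "sleeps"]))  # ['The/DET', 'cat/NOUN', 'sleeps/VERB']
```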
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>e. Named Entity Recognition (NER)</b></p>', unsafe_allow_html=True)
 st.write("Identifies and classifies entities in text (e.g., names, dates, locations).")
 st.write("Example: _“Barack Obama was born in Hawaii.”_ → [Barack Obama: PERSON, Hawaii: LOCATION]")
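The example output can be reproduced with a toy gazetteer lookup; production NER relies on trained models such as spaCy's pipelines rather than fixed lists:

```python
# Toy gazetteer: known entity strings and their labels.
ENTITIES = {"Barack Obama": "PERSON", "Hawaii": "LOCATION"}

def tag_entities(text):
    # Return (entity, label) pairs for every known entity found in the text.
    return [(name, label) for name, label in ENTITIES.items() if name in text]

print(tag_entities("Barack Obama was born in Hawaii."))
# [('Barack Obama', 'PERSON'), ('Hawaii', 'LOCATION')]
```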
 
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>f. Text Normalization</b></p>', unsafe_allow_html=True)
 st.write("Converts text to a standard format (lowercasing, removing punctuation, etc.).")
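A typical normalization pass, sketched with the standard library:

```python
import re
import string

def normalize(text: str) -> str:
    text = text.lower()                                                # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))   # strip punctuation
    return re.sub(r"\s+", " ", text).strip()                           # collapse whitespace

print(normalize("  I LOVE NLP!!  "))  # i love nlp
```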
 
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>2. Feature Extraction Techniques</b></p>', unsafe_allow_html=True)
 st.write("Text needs to be transformed into numerical representations for machine learning models.")
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>a. Bag of Words (BoW)</b></p>', unsafe_allow_html=True)
 st.write("Represents text as a vector of word frequencies or occurrences, ignoring grammar and order.")
 st.write("Example:")
 st.write("Text: “I love NLP” and “NLP is great”")
@@ -61,7 +61,7 @@ st.write("Vocabulary: [“I”, “love”, “NLP”, “is”, “great”]")
 st.write("Vector for “I love NLP”: [1, 1, 1, 0, 0]")
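The vocabulary and vector above fall out of a few lines of counting; this sketch builds the vocabulary in first-seen order, as the example assumes (scikit-learn's `CountVectorizer` sorts it alphabetically instead):

```python
def bag_of_words(texts):
    # Build the vocabulary in first-seen order, then count occurrences per text.
    vocab = []
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab.append(word)
    vectors = [[t.split().count(w) for w in vocab] for t in texts]
    return vocab, vectors

vocab, vecs = bag_of_words(["I love NLP", "NLP is great"])
print(vocab)    # ['I', 'love', 'NLP', 'is', 'great']
print(vecs[0])  # [1, 1, 1, 0, 0]
```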
 
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>b. Term Frequency-Inverse Document Frequency (TF-IDF)</b></p>', unsafe_allow_html=True)
 st.write("The **TF-IDF Vectorizer** is a popular technique in Natural Language Processing (NLP) used to convert text into numerical values that can be used by machine learning models. It stands for Term Frequency-Inverse Document Frequency and helps highlight the importance of words in a document relative to a collection of documents (called a corpus).")
 
 st.write('**Term Frequency (TF)** \n - Measures how often a word appears in a single document. \n - Formula: \n _TF_ = Number of times the word appears in the document / Total number of words in the document')
@@ -89,10 +89,9 @@ st.write("""
 """)
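The TF and IDF formulas above combine into a small function. This uses the plain `log(N / df)` form of IDF; scikit-learn's `TfidfVectorizer` applies smoothing and normalization, so its numbers differ:

```python
import math

def tf_idf(docs):
    # TF = count / doc length; IDF = log(N / number of docs containing the term).
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    idf = {w: math.log(n / sum(w in d.split() for d in docs)) for w in vocab}
    weights = []
    for d in docs:
        words = d.split()
        weights.append({w: (words.count(w) / len(words)) * idf[w] for w in vocab})
    return weights

w = tf_idf(["I love NLP", "NLP is great"])
# "NLP" occurs in every document, so IDF = log(2/2) = 0 and its weight vanishes,
# while a word unique to one document (e.g. "love") keeps a positive weight.
print(w[0]["NLP"], w[0]["love"] > 0)
```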
 
 
-st.markdown('<p style="color
+st.markdown('<p style="color:;"><b>c. Word Embeddings</b></p>', unsafe_allow_html=True)
 st.write("Word embeddings are a type of representation for text where words are converted into dense numerical vectors. These vectors capture the semantic meaning of words and their relationships with other words in a way that computers can understand.")
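The idea that vectors capture word relationships can be demonstrated with a tiny count-based embedding; real embeddings (word2vec, GloVe, fastText) are trained, not counted, but the similarity intuition is the same:

```python
import math
from collections import defaultdict

def cooccurrence_vectors(corpus, window=1):
    # Each word's vector = counts of its neighbors within the window.
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = defaultdict(lambda: [0.0] * len(vocab))
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    vecs[w][index[sent[j]]] += 1.0
    return dict(vecs)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

corpus = [["i", "love", "nlp"], ["i", "love", "pizza"]]
v = cooccurrence_vectors(corpus)
# "nlp" and "pizza" appear in the same context ("love"), so their vectors match.
print(cosine(v["nlp"], v["pizza"]))  # 1.0
```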
 
-import streamlit as st
 
 st.write("""
 **Word Embedding Techniques**
@@ -132,4 +131,8 @@ The future of Natural Language Processing (NLP) is exciting, with advancements t
 **5. Multimodal Learning**
 - Beyond Text: Integrating text with images, audio, and video for richer applications like understanding memes, videos, or interactive media.
 
+The future of NLP is about creating systems that communicate more naturally, inclusively, and intelligently, enabling transformative applications in every aspect of life.
+
 """)
|
| 137 |
+
|
| 138 |
+
|