import streamlit as st

# Sidebar for navigation
sidebar = st.sidebar

# Sidebar header
sidebar.header('🌐 NLP Navigation')

# Sidebar options for NLP Overview, Lifecycle, and Techniques
sidebar_option = sidebar.radio('Choose a section to explore:', ['🧠What is NLP?', '🔄NLP Lifecycle', '⚙️NLP Techniques'])

# Track the selected page in session state (always mirrors the sidebar choice)
st.session_state.selected_page = sidebar_option

# Dynamically update the title based on the selected option
def set_title(title, color="black"):
    st.markdown(f"<h1 style='text-align: center; color: {color};'>{title}</h1>", unsafe_allow_html=True)

if st.session_state.selected_page == '🧠What is NLP?':
    set_title('Natural Language Processing (NLP)', color="darkgreen")
    
elif st.session_state.selected_page == '🔄NLP Lifecycle':
    set_title('Natural Language Processing (NLP) Lifecycle', color="darkgreen")

elif st.session_state.selected_page == '⚙️NLP Techniques':
    set_title('Techniques in Natural Language Processing (NLP)', color="darkgreen")

# Content for "What is NLP? 🧠"
if st.session_state.selected_page == '🧠What is NLP?':
    st.markdown("<h2 style='text-align: center; color: darkgreen;'>📘 Introduction to NLP</h2>", unsafe_allow_html=True)
    st.write("""
    #### 🤖 What is NLP?
    Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable machines to understand, interpret, and generate human language in a way that is meaningful.
    
    NLP is essential for enabling computers to process and analyze large amounts of natural language data, such as:
    - 📜 Text from documents
    - 🗣️ Speech from conversations
    - 🖼️ Images with textual descriptions

    #### Key Components of NLP:
    - **Syntax**: Refers to the arrangement of words in a sentence.
    - **Semantics**: Focuses on the meaning of the words and sentences.
    - **Pragmatics**: Involves the context and intent behind language.
    - **Discourse**: Studies how previous sentences and context influence meaning.

    #### Example Applications of NLP:
    - **Machine Translation**: Automatic translation of text from one language to another (e.g., Google Translate).
    - **Speech Recognition**: Converting spoken language into text (e.g., Siri, Alexa).
    - **Sentiment Analysis**: Analyzing text to determine the sentiment (positive, negative, neutral) (e.g., analyzing customer reviews).
    - **Text Summarization**: Creating a short summary of a long text (e.g., summarizing articles).

    NLP is used across multiple domains like healthcare, finance, and customer service to automate and improve various tasks.
    """)

# Content for NLP Lifecycle
elif st.session_state.selected_page == "🔄NLP Lifecycle":
    lifecycle_option = sidebar.radio("Select NLP Lifecycle Step:", [
        "🌐Overview of the NLP Life Cycle",
        "🎯Problem Definition",
        "📊Data Collection",
        "📈Simple EDA",
        "🧹Data Preprocessing ",
        "📝Feature Engineering",
        "🏋️‍♂️Model Training",
        "🏅Evaluation",
        "🚀Deployment"
    ])

    if lifecycle_option == "🌐Overview of the NLP Life Cycle":
        st.write("""
        #### Overview of the NLP Life Cycle:
        The NLP life cycle is a structured process for building, using, and maintaining systems that work with human language. It turns unstructured text into meaningful insights or automated actions. This process ensures continuous improvement and adapts to real-world needs.
    
        - **How It Flows**:
            - The process starts with identifying the problem and collecting the required text data.
            - Then, the data is cleaned and prepared for analysis.
            - Models are built and tested before being deployed for use.
            - Regular checks and updates ensure the solution keeps working well.
    
        - **Flexible and Adaptive**:
            - Since languages and data change (e.g., new words, trends), the process is repeated as needed.
            - Models may need updates or retraining to stay accurate.
    
        - **Combines Different Fields**:
            - The process involves skills from language studies, programming, and data analysis to make sure language is understood effectively.
    
        - **Designed for Practical Use**:
            - The goal is to create solutions that can handle tasks like analyzing text, identifying emotions, powering chatbots, or translating languages accurately and efficiently.
    
        - **Key Challenges Solved**:
            - Managing the complexity of language (e.g., meaning, structure).
            - Working with large and messy datasets.
            - Handling multiple languages and specific industries.
            - Ensuring solutions are fast and efficient.
    
        #### Steps in the NLP Life Cycle:
            1. 🔧Problem Definition 
            2. 📊Data Collection
            3. 🔍Simple EDA
            4. 🧹Data Preprocessing
            5. 📝Feature Engineering
            6. 🏋️‍♂️Model Selection and Training
            7. 🏅Model Evaluation
            8. ⚙️Model Tuning
            9. 🚀Deployment
            10. 🛠️Monitoring and Maintenance
            """)

    elif lifecycle_option == "🎯Problem Definition":
        st.write("""
        #### 🔧 1. Problem Definition
        - The first step in the NLP lifecycle is defining the problem. This means understanding the goal and figuring out how NLP can help solve the problem.
        - Based on the problem, you will need to gather the data.
        - **To better understand the problem, consider asking questions such as**:
            - 🎯 What is the main goal of this analysis?
            - 📝 What kind of text data are we working with (e.g., reviews, social media posts, documents)?
            - 📊 What do we want the output to be (e.g., sentiment score, summary, or classification)?
    
        **Example of a problem statement**: The goal could be to classify customer reviews as either positive or negative, or to find the main topics in product reviews.
        """)

    elif lifecycle_option == "📊Data Collection":
        st.write("""
           #### 📚 2. Data Collection
           Data collection is the second step in the NLP lifecycle. It involves gathering data from various sources based on the problem statement, so it can be analyzed and processed.
           - **Sources for data collection**:
                - 📚 The data should be collected based on a clear understanding of the problem statement.
                - 🌐 From datasets available on websites like Kaggle.
                - 🔌 Through APIs.
                - 🕸️ Web scraping can also be used to gather data from websites using tools like Selenium or BeautifulSoup.
                - ✋ Manually, when needed.
                - In most cases, data is collected from websites, APIs, or through web scraping. However, manual collection may be necessary in rare cases.
    
           **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
    
           #### Data Extraction from Files
           After collecting the data, it is often stored in various file formats like JSON, CSV, Excel, or XML. Using the **Pandas** library in Python, we can extract and convert this data into a **DataFrame**, which is a structured format ideal for analysis.
           - **Steps for Data Extraction**:
               1. Identify the file format (e.g., `.json`, `.csv`, `.xlsx`, `.xml`).
               2. Use Pandas functions like `pd.read_csv()`, `pd.read_json()`, `pd.read_excel()`, or `pd.read_xml()` to load the data.
               3. Handle additional parameters based on file structure (e.g., `delimiter` for CSV, `sheet_name` for Excel).
               4. Verify and clean the data using methods like `df.head()` and `df.info()`.
    
           **Example Code for Data Extraction**:
           ```python
           import pandas as pd
    
           # 1. Extracting Data from a CSV File
           print("Extracting data from CSV file...")
           csv_file = 'example_data.csv'  # Replace with your CSV file path
           df_csv = pd.read_csv(csv_file)
           print("CSV Data:")
           print(df_csv.head())  # Display the first few rows
    
           # 2. Extracting Data from a JSON File
           print("\\nExtracting data from JSON file...")
           json_file = 'example_data.json'  # Replace with your JSON file path
           df_json = pd.read_json(json_file)
           print("JSON Data:")
           print(df_json.head())  # Display the first few rows
    
           # 3. Extracting Data from an Excel File
           print("\\nExtracting data from Excel file...")
           excel_file = 'example_data.xlsx'  # Replace with your Excel file path
           df_excel = pd.read_excel(excel_file, sheet_name='Sheet1')  # Specify the sheet name if necessary
           print("Excel Data:")
           print(df_excel.head())  # Display the first few rows
    
           # 4. Extracting Data from an XML File
           print("\\nExtracting data from XML file...")
           xml_file = 'example_data.xml'  # Replace with your XML file path
           df_xml = pd.read_xml(xml_file)
           print("XML Data:")
           print(df_xml.head())  # Display the first few rows
           ```
           Using the above code structure, we can efficiently extract data from various file formats such as CSV, JSON, Excel, and XML, and load it into a structured format suitable for analysis.
        """)
    
    
    elif lifecycle_option == "📈Simple EDA":
        st.write("""
            #### 🔍 3. Simple Exploratory Data Analysis (Simple EDA)
            Simple EDA provides a high-level understanding of the dataset and its characteristics. It focuses on summarizing key features, identifying potential issues, and visualizing distributions to inform further analysis.
    
            #### Checking Data Balance
            Before proceeding, assess whether the dataset is balanced by examining the distribution of classes or categories. Calculating the count or percentage of instances in each class shows whether the data is evenly distributed or whether certain classes are underrepresented. Addressing class imbalance early keeps the later analysis and modeling reliable.

            **Example**: In a classification dataset:
            - Class Distribution:
                - Class A: 700 instances
                - Class B: 300 instances
            - The dataset shows a 70:30 imbalance, which may require techniques like oversampling, undersampling, or synthetic data generation to correct.
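            A quick way to verify such a distribution is to count the labels directly; this is a minimal sketch using Python's `collections.Counter` on a hypothetical label list:

```python
from collections import Counter

# Hypothetical labels for a binary classification dataset
labels = ["A"] * 700 + ["B"] * 300

counts = Counter(labels)
total = len(labels)
for cls, n in sorted(counts.items()):
    print(f"Class {cls}: {n} instances ({100 * n / total:.0f}%)")
# Class A: 700 instances (70%)
# Class B: 300 instances (30%)
```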
    
            #### Steps to Understand and Explore Your Data
            - **Basic Data Inspection**: Examine data types, view the first few rows, and understand the overall structure.
            - **Summary Statistics**: Calculate key metrics like mean, median, and standard deviation to summarize numerical variables.
            - **Basic Visualizations**: Use histograms, boxplots, and scatterplots to explore data distributions and relationships.
            - **Missing Values Check**: Identify any missing data in columns and rows for potential cleaning.
            - **Outlier Detection**: Spot extreme values using visualizations and statistical methods.
    
            **Example**: In a sales dataset:
            - Data Inspection:
                - Dataset shape: (1000, 5)
                - Sample columns: [Sales, Marketing Spend, Date, etc.]
            - Summary Statistics:
                - Mean Sales: 1000
                - Median Sales: 950
            - Visualizations:
                - Histogram for sales distribution
                - Boxplot to detect outliers
        """)
    
    
    
    elif lifecycle_option == "🧹Data Preprocessing ":
        st.write("""
        #### 🧹 4. Text Preprocessing
        Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
    
        **Key Steps in Text Preprocessing:**
        - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
        - **Stop Words Removal**: Removing common words that don’t contribute much information.
        - **Lemmatization**: Converting words into their base or dictionary form.
        - **Stemming**: Cutting off prefixes or suffixes from words.
        - **Lowercasing**: Converting all characters in the text to lowercase.
        - **HTML Tag Removal**: Eliminating any HTML tags like `<p>`, `<a>`, `<b>`, etc.
        - **URL Removal**: Stripping out URLs such as `http://example.com` or `www.example.com`.
        - **Emoji Removal**: Removing emojis (e.g., 🙂, 🚀) as they are typically non-informative for analysis.
        - **Hashtag Removal**: Removing hashtags (e.g., `#data`, `#AI`) that might not be relevant for textual analysis.
        - **Special Characters Removal**: Stripping out symbols or characters that don't contribute to the meaning of the text.
    
        **Example**: For the sentence "The quick brown fox is running fast 🦊 #awesome http://example.com", after preprocessing:
        - Tokenization: ["The", "quick", "brown", "fox", "is", "running", "fast", "🦊", "#awesome", "http://example.com"]
        - HTML Tag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "🦊", "#awesome", "http://example.com"] (no tags present, so unchanged)
        - URL Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "🦊", "#awesome"]
        - Emoji Removal: ["The", "quick", "brown", "fox", "is", "running", "fast", "#awesome"]
        - Hashtag Removal: ["The", "quick", "brown", "fox", "is", "running", "fast"]
    
        Now, let's apply the necessary text preprocessing steps to clean up the data:

        ```python
        import re
        from bs4 import BeautifulSoup

        # Sample data
        data = "Check out this amazing post! 😊 #awesome #data http://example.com Visit us at www.example.com! 🚀 Let's talk about AI! #AI #machinelearning"

        # Remove HTML tags using BeautifulSoup
        cleaned_data = BeautifulSoup(data, "html.parser").get_text()

        # Remove URLs using a regular expression
        cleaned_data = re.sub(r'http\S+|www\.\S+', '', cleaned_data)

        # Remove hashtags before stripping special characters; otherwise the '#'
        # is removed first and the hashtag words would be left behind
        cleaned_data = re.sub(r'#\w+', '', cleaned_data)

        # Remove emojis and other special characters (keeps word characters, whitespace, and commas)
        cleaned_data = re.sub(r'[^\w\s,]', '', cleaned_data)

        # Collapse the runs of whitespace left behind by the removals
        cleaned_data = re.sub(r'\s+', ' ', cleaned_data).strip()

        print(f"Cleaned Data: {cleaned_data}")
        ```

        **Output**: After cleaning, the data will look like:
        ```
        Cleaned Data: Check out this amazing post Visit us at Lets talk about AI
        ```
    
        By following these preprocessing steps, the raw text is now ready for further analysis or machine learning tasks.
        """)
    
    
    elif lifecycle_option == "📝Feature Engineering":
        st.write("""
        #### 📝 5. Text Representation
        After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
        - **One-Hot Encoding**: Represents each word as a binary vector.
        - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
        - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
        - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
            - **Word2Vec**
            - **GloVe**
            - **FastText**
    
        **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
        - Vocabulary: ["I", "love", "NLP"]
        - Vector: [1, 1, 1] (word frequency representation)
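        The same idea can be sketched in a few lines of plain Python (a hypothetical `bow_vectors` helper; libraries like scikit-learn provide `CountVectorizer` for real use):

```python
def bow_vectors(corpus):
    """Convert a list of documents into Bag-of-Words count vectors."""
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in corpus:
        vec = [0] * len(vocab)
        for w in doc.lower().split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = bow_vectors(["I love NLP", "I love love Python"])
print(vocab)    # ['i', 'love', 'nlp', 'python']
print(vectors)  # [[1, 1, 1, 0], [1, 2, 0, 1]]
```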
        """)
    
    elif lifecycle_option == "🏋️‍♂️Model Training":
        st.write("""
        #### 🏋️‍♂️ 6. Model Training
        In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
        - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
        - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
        - **Sentiment Analysis**: Logistic regression, Naive Bayes, or transformer-based models like BERT.
    
        **Example**: Training a Naive Bayes classifier to categorize news articles into topics such as "Sports", "Politics", etc.
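        To make the flow concrete, here is a minimal multinomial Naive Bayes with add-one smoothing, written from scratch on an invented toy corpus (real projects would typically use a library such as scikit-learn):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    vocab = {w for d in docs for w in d.split()}
    by_class = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(d)
    priors = {y: math.log(len(ds) / len(docs)) for y, ds in by_class.items()}
    counts = {y: Counter(w for d in ds for w in d.split()) for y, ds in by_class.items()}
    totals = {y: sum(c.values()) for y, c in counts.items()}
    return vocab, priors, counts, totals

def predict_nb(model, doc):
    """Return the class with the highest log-probability for the document."""
    vocab, priors, counts, totals = model
    scores = {}
    for y in priors:
        score = priors[y]
        for w in doc.split():
            score += math.log((counts[y][w] + 1) / (totals[y] + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

# Invented toy corpus purely for illustration
docs = ["the team won the match", "great goal in the match",
        "parliament passed the new law", "voters debate the election"]
labels = ["Sports", "Sports", "Politics", "Politics"]
model = train_nb(docs, labels)
print(predict_nb(model, "who won the match"))      # Sports
print(predict_nb(model, "the election law vote"))  # Politics
```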
        """)
    
    elif lifecycle_option == "🏅Evaluation":
        st.write("""
        #### 🏅 7. Evaluation
        After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
        - **Accuracy**: The percentage of correct predictions.
        - **Precision**: The percentage of relevant instances among the retrieved instances.
        - **Recall**: The percentage of relevant instances that were retrieved.
        - **F1-score**: The harmonic mean of precision and recall.
    
        **Example**: Evaluating a sentiment analysis model using accuracy and F1-score on a test dataset.
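        These metrics are straightforward to compute by hand; a minimal sketch for a binary task, with toy labels invented for illustration:

```python
def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# accuracy=0.60 precision=0.67 recall=0.67 f1=0.67
```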
        """)
    
    elif lifecycle_option == "🚀Deployment":
        st.write("""
        #### 🚀 8. Deployment
        Once the model is evaluated and tuned, it is deployed into production where it can be used by end users. Deployment involves:
        - **Integration** with web applications, chatbots, or other tools.
        - **API Development**: Exposing the model through an API for real-time predictions.
        - **Continuous Monitoring**: Tracking the model’s performance and retraining it as needed.
    
        **Example**: Deploying a sentiment analysis model in a customer service chatbot that analyzes customer inquiries in real time.
        """)


# Content for "NLP Techniques"
elif st.session_state.selected_page == "⚙️NLP Techniques":
    technique_option = sidebar.radio("Select NLP Technique:", [
        "NLP Techniques",
        "Tokenization",
        "Stop Words Removal",
        "Lemmatization",
        "Stemming",
        "One-Hot Encoding",
        "Bag of Words (BoW)",
        "TF-IDF",
        "Word Embeddings",
        "Named Entity Recognition (NER)",
        "Part-of-Speech (POS) Tagging",
        "Sentiment Analysis"
    ])
    if technique_option == "NLP Techniques":
        st.write("""
        ### ⚙️ Techniques in NLP
        NLP uses a variety of techniques to process and analyze text data. Some of the most common techniques include:
        ### 🛠️ Common NLP Techniques
        1. **Tokenization**: Breaking down text into smaller units (e.g., words, sentences).
        2. **Part-of-Speech (POS) Tagging**: Identifying the grammatical roles of words in a sentence (e.g., noun, verb, adjective).
        3. **Named Entity Recognition (NER)**: Identifying entities such as names, dates, locations, etc.
        4. **Dependency Parsing**: Analyzing the syntactic structure of sentences.
        5. **Sentiment Analysis**: Analyzing the sentiment of text (positive, negative, neutral).
        6. **Word Embeddings**: Representing words as vectors in a continuous space (e.g., Word2Vec, GloVe).
    
        **Example**: Sentiment analysis can be used to identify whether customer reviews are positive, negative, or neutral based on the words used in the text.
        """)

    
    elif technique_option == "Tokenization":
        st.write("""
        #### 1. Tokenization
        Tokenization is the process of splitting text into smaller units, such as words, sentences, or subwords. This is a key preprocessing step for many NLP tasks.
        - **Example**: 
          - Sentence: "Natural Language Processing is awesome!"
          - Tokenized words: ["Natural", "Language", "Processing", "is", "awesome"]
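        A simple regex-based tokenizer illustrates the idea (word-level only; real tokenizers such as NLTK's handle many more cases):

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)

print(tokenize("Natural Language Processing is awesome!"))
# ['Natural', 'Language', 'Processing', 'is', 'awesome']
```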
        """)

    elif technique_option == "Stop Words Removal":
        st.write("""
        #### 2. Stop Words Removal
        Stop words are commonly used words like "the", "is", "at", etc., that do not carry much information in many NLP tasks. Removing stop words helps reduce the dimensionality and noise in the data.
        - **Example**: Removing "is" from the sentence "NLP is amazing!"
        """)

    elif technique_option == "Lemmatization":
        st.write("""
        #### 3. Lemmatization
        Lemmatization is the process of converting words into their root or base form based on context. It is more sophisticated than stemming, as it considers the meaning of words.
        - **Example**: "better" → "good", "running" → "run".
        """)

    elif technique_option == "Stemming":
        st.write("""
        #### 4. Stemming
        Stemming is the process of reducing words to their root form by removing prefixes or suffixes. This technique may result in non-dictionary words.
        - **Example**: "running" → "run", "happiness" → "happi".
        """)

    elif technique_option == "One-Hot Encoding":
        st.write("""
        #### 5. One-Hot Encoding
        One-hot encoding represents each word as a binary vector.
        - **Example**:
          - Vocabulary: ["cat", "dog", "fish"]
          - Encoding for "cat": [1, 0, 0]
          - Encoding for "dog": [0, 1, 0]
        - **Pros**: Simple to implement.
        - **Cons**: Results in sparse and high-dimensional vectors.
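        A one-hot vector is easy to build once the vocabulary is fixed; a minimal sketch:

```python
def one_hot(vocab, word):
    """Return a binary vector with a single 1 at the word's vocabulary index."""
    return [1 if w == word else 0 for w in vocab]

vocab = ["cat", "dog", "fish"]
print(one_hot(vocab, "cat"))  # [1, 0, 0]
print(one_hot(vocab, "dog"))  # [0, 1, 0]
```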
        """)
    elif technique_option == "Bag of Words (BoW)":
        st.write("""
        #### 6. Bag of Words (BoW)
        The Bag of Words model represents text as a set of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
        - **Example**:
          - Text: "I love NLP"
          - BoW: {"I": 1, "love": 1, "NLP": 1}
        """)

    elif technique_option == "TF-IDF":
        st.write("""
        #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
        TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
        - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
        """)

    elif technique_option == "Word Embeddings":
        st.write("""
        #### 8. Word Embeddings
        Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
        - **Word2Vec**
        - **GloVe**
        - **FastText**

        **Example**: The words "king" and "queen" would have similar vector representations because they share semantic relationships.
        """)

    elif technique_option == "Named Entity Recognition (NER)":
        st.write("""
        #### 9. Named Entity Recognition (NER)
        NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
        - **Example**: "Barack Obama was born in Hawaii."
          - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
        """)

    elif technique_option == "Part-of-Speech (POS) Tagging":
        st.write("""
        #### 10. Part-of-Speech (POS) Tagging
        POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
        - **Example**: "The cat sat on the mat."
          - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
        """)

    elif technique_option == "Sentiment Analysis":
        st.write("""
        #### 11. Sentiment Analysis
        Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
        - **Example**: "I love this product!" → Positive Sentiment
        """)