Mpavan45 committed on
Commit
bad919e
·
verified ·
1 Parent(s): ab01801

Update app.py

Files changed (1)
  1. app.py +23 -7
app.py CHANGED
@@ -279,9 +279,13 @@ elif st.session_state.selected_page == "🔄NLP Lifecycle":
279
  st.write("""
280
  #### 📝 5. Text Representation
281
  After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
 
282
  - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
283
  - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
284
- - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning.
 
 
 
285
 
286
  **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
287
  - Vocabulary: ["I", "love", "NLP"]
@@ -331,6 +335,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
331
  "Stop Words Removal",
332
  "Lemmatization",
333
  "Stemming",
 
334
  "Bag of Words (BoW)",
335
  "TF-IDF",
336
  "Word Embeddings",
@@ -384,9 +389,20 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
384
  - **Example**: "running" → "run", "happiness" → "happi".
385
  """)
386
 
 
 
 
 
 
 
 
 
 
 
 
387
  elif technique_option == "Bag of Words (BoW)":
388
  st.write("""
389
- #### 5. Bag of Words (BoW)
390
  The Bag of Words model represents text as a multiset (bag) of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
391
  - **Example**:
392
  - Text: "I love NLP"
@@ -395,14 +411,14 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
395
 
396
  elif technique_option == "TF-IDF":
397
  st.write("""
398
- #### 6. TF-IDF (Term Frequency-Inverse Document Frequency)
399
  TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
400
  - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
401
  """)
402
 
403
  elif technique_option == "Word Embeddings":
404
  st.write("""
405
- #### 7. Word Embeddings
406
  Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
407
  - **Word2Vec**
408
  - **GloVe**
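Review note: the Word Embeddings text in this hunk says that words with similar meanings have similar vectors. A minimal sketch of how that similarity is usually measured, using cosine similarity. The three-dimensional vectors below are invented for illustration and do not come from Word2Vec, GloVe, or any trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: values made up for this example).
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

sim_related = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_related > sim_unrelated)
```

Real embedding vectors typically have tens to hundreds of dimensions, but the comparison works the same way.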
@@ -413,7 +429,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
413
 
414
  elif technique_option == "Named Entity Recognition (NER)":
415
  st.write("""
416
- #### 8. Named Entity Recognition (NER)
417
  NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
418
  - **Example**: "Barack Obama was born in Hawaii."
419
  - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
@@ -421,7 +437,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
421
 
422
  elif technique_option == "Part-of-Speech (POS) Tagging":
423
  st.write("""
424
- #### 9. Part-of-Speech (POS) Tagging
425
  POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
426
  - **Example**: "The cat sat on the mat."
427
  - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
@@ -429,7 +445,7 @@ elif st.session_state.selected_page == "⚙️NLP Techniques":
429
 
430
  elif technique_option == "Sentiment Analysis":
431
  st.write("""
432
- #### 10. Sentiment Analysis
433
  Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
434
  - **Example**: "I love this product!" → Positive Sentiment
435
  """)
 
279
  st.write("""
280
  #### 📝 5. Text Representation
281
  After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
282
+ - **One-Hot Encoding**: Represents each word as a binary vector.
283
  - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
284
  - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
285
+ - **Word Embeddings**: Transforms words into dense vectors that capture semantic meaning. Common word embedding models include:
286
+ - **Word2Vec**
287
+ - **GloVe**
288
+ - **FastText**
289
 
290
  **Example**: Using BoW to convert the sentence "I love NLP" into a vector representation:
291
  - Vocabulary: ["I", "love", "NLP"]
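Review note: the BoW example in this hunk can be reproduced with a few lines of plain Python. This is a minimal sketch assuming whitespace tokenization and a fixed vocabulary; the helper name `bag_of_words` is hypothetical and is not part of app.py.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return the count of each vocabulary word in text (whitespace tokens)."""
    counts = Counter(text.split())
    # Words missing from the text get a count of 0.
    return [counts[word] for word in vocabulary]

vocabulary = ["I", "love", "NLP"]
vector = bag_of_words("I love NLP", vocabulary)
print(vector)  # [1, 1, 1]
```

A repeated word raises its count, so `bag_of_words("I love love NLP", vocabulary)` gives `[1, 2, 1]`, which is what separates BoW counts from a purely binary representation.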
 
335
  "Stop Words Removal",
336
  "Lemmatization",
337
  "Stemming",
338
+ "One-Hot Encoding",
339
  "Bag of Words (BoW)",
340
  "TF-IDF",
341
  "Word Embeddings",
 
389
  - **Example**: "running" → "run", "happiness" → "happi".
390
  """)
391
 
392
+ elif technique_option == "One-Hot Encoding":
393
+ st.write("""
394
+ #### 5. One-Hot Encoding
395
+ - Represents each word as a binary vector.
396
+ - **Example**:
397
+ - Vocabulary: ["cat", "dog", "fish"]
398
+ - Encoding for "cat": [1, 0, 0]
399
+ - Encoding for "dog": [0, 1, 0]
400
+ - **Pros**: Simple to implement.
401
+ - **Cons**: Results in sparse and high-dimensional vectors.
402
+ """)
403
  elif technique_option == "Bag of Words (BoW)":
404
  st.write("""
405
+ #### 6. Bag of Words (BoW)
406
  The Bag of Words model represents text as a multiset (bag) of individual words, disregarding grammar and word order but keeping multiplicity. It is a simple and widely used method for text representation.
407
  - **Example**:
408
  - Text: "I love NLP"
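Review note: the One-Hot Encoding block added in this hunk uses the vocabulary `["cat", "dog", "fish"]`. A minimal sketch of that encoding follows. The function name `one_hot` is hypothetical, and raising an error for unknown words is an assumption of this sketch, not something the commit specifies.

```python
def one_hot(word, vocabulary):
    """Return a binary vector with a single 1 at the word's vocabulary index."""
    if word not in vocabulary:
        # Assumption: out-of-vocabulary words are rejected rather than mapped to zeros.
        raise ValueError(f"{word!r} is not in the vocabulary")
    return [1 if entry == word else 0 for entry in vocabulary]

vocabulary = ["cat", "dog", "fish"]
print(one_hot("cat", vocabulary))  # [1, 0, 0]
print(one_hot("dog", vocabulary))  # [0, 1, 0]
```

Each vector is as long as the vocabulary and contains exactly one non-zero entry, which is why the added text lists sparsity and high dimensionality as the main drawback.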
 
411
 
412
  elif technique_option == "TF-IDF":
413
  st.write("""
414
+ #### 7. TF-IDF (Term Frequency-Inverse Document Frequency)
415
  TF-IDF helps determine the importance of a word in a document relative to the entire dataset. It reduces the weight of common words and increases the weight of rare but important words.
416
  - **Example**: The word "data" might have a high TF-IDF score in a document about data analysis but a low score in a document about cooking.
417
  """)
418
 
419
  elif technique_option == "Word Embeddings":
420
  st.write("""
421
+ #### 8. Word Embeddings
422
  Word embeddings are vector representations of words that capture semantic relationships. Words with similar meanings have similar vectors. Common word embedding models include:
423
  - **Word2Vec**
424
  - **GloVe**
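Review note: the TF-IDF description earlier in this hunk can be made concrete with a small computation. The sketch below assumes the plain variant, term frequency times `log(N / document frequency)`, with lowercased whitespace tokens. Libraries often use smoothed or normalized variants, so their numbers will differ. The function name `tf_idf` and the two sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = ["data analysis uses data", "cooking uses fresh herbs"]
scores = tf_idf(docs)
print(scores[0]["data"])  # positive: frequent here, absent from the other document
print(scores[0]["uses"])  # 0.0: the term appears in every document
```

The word "uses" occurs in both documents, so its inverse document frequency is `log(1) = 0`, while "data" is frequent in the first document only and scores highest there.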
 
429
 
430
  elif technique_option == "Named Entity Recognition (NER)":
431
  st.write("""
432
+ #### 9. Named Entity Recognition (NER)
433
  NER is the task of identifying named entities such as persons, organizations, locations, and dates in text. This technique is commonly used for information extraction.
434
  - **Example**: "Barack Obama was born in Hawaii."
435
  - Entities: ["Barack Obama" (Person), "Hawaii" (Location)]
 
437
 
438
  elif technique_option == "Part-of-Speech (POS) Tagging":
439
  st.write("""
440
+ #### 10. Part-of-Speech (POS) Tagging
441
  POS tagging involves assigning grammatical labels (such as noun, verb, adjective) to each word in a sentence.
442
  - **Example**: "The cat sat on the mat."
443
  - POS Tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
 
445
 
446
  elif technique_option == "Sentiment Analysis":
447
  st.write("""
448
+ #### 11. Sentiment Analysis
449
  Sentiment analysis involves determining the sentiment of a piece of text, typically categorizing it as positive, negative, or neutral. This is commonly used for customer feedback and social media monitoring.
450
  - **Example**: "I love this product!" → Positive Sentiment
451
  """)