Update pages/Introduction.py
Browse files- pages/Introduction.py +24 -2
pages/Introduction.py
CHANGED
|
@@ -67,8 +67,6 @@ st.write('**Term Frequency (TF)** \n - Measures how often a word appears in a si
|
|
| 67 |
st.write('**Inverse Document Frequency (IDF)** \n Measures how unique or rare a word is across all documents in the corpus. \n - Formula: \n _IDF_ = log(Total no. of documents / No. of documents containing the word) \n Words that appear in many documents (like "the" or "and") will have a low IDF value, while unique words (like "NLP") will have a higher IDF.')
|
| 68 |
st.write('**TF - IDF Score:** \n - Combines TF and IDF to calculate the importance of a word in a document. \n - Formula: \n _TF - IDF = TF x IDF_ \n Words that are frequent in a document but rare in the overall corpus get a higher score.')
|
| 69 |
|
| 70 |
-
st.write("Examples:")
|
| 71 |
-
|
| 72 |
st.write("""
|
| 73 |
**Example**
|
| 74 |
**Consider these two documents:**
|
|
@@ -88,3 +86,27 @@ st.write("""
|
|
| 88 |
- "NLP" gets a TF-IDF score of **1/3 × 0 = 0** (not unique).
|
| 89 |
- "love" and "amazing" get scores of **1/3 × 0.69 = 0.23** (more unique).
|
| 90 |
""")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 67 |
st.write('**Inverse Document Frequency (IDF)** \n Measures how unique or rare a word is across all documents in the corpus. \n - Formula: \n _IDF_ = log(Total no. of documents / No. of documents containing the word) \n Words that appear in many documents (like "the" or "and") will have a low IDF value, while unique words (like "NLP") will have a higher IDF.')
|
| 68 |
st.write('**TF - IDF Score:** \n - Combines TF and IDF to calculate the importance of a word in a document. \n - Formula: \n _TF - IDF = TF x IDF_ \n Words that are frequent in a document but rare in the overall corpus get a higher score.')
|
| 69 |
|
|
|
|
|
|
|
| 70 |
st.write("""
|
| 71 |
**Example**
|
| 72 |
**Consider these two documents:**
|
|
|
|
| 86 |
- "NLP" gets a TF-IDF score of **1/3 × 0 = 0** (not unique).
|
| 87 |
- "love" and "amazing" get scores of **1/3 × 0.69 = 0.23** (more unique).
|
| 88 |
""")
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
st.markdown('<p style="color:lightblue;"><b>c. Word Embeddings</b></p>', unsafe_allow_html=True)
|
| 92 |
+
st.write("Word embeddings are a type of representation for text where words are converted into dense numerical vectors. These vectors capture the semantic meaning of words and their relationships with other words in a way that computers can understand.")
|
| 93 |
+
|
| 94 |
+
import streamlit as st
|
| 95 |
+
|
| 96 |
+
st.write("""
|
| 97 |
+
**Word Embedding Techniques**
|
| 98 |
+
|
| 99 |
+
**1. Word2Vec**
|
| 100 |
+
Developed by Google, it uses two main approaches:
|
| 101 |
+
- **CBOW (Continuous Bag of Words):** Predicts a word based on its context.
|
| 102 |
+
- **Skip-Gram:** Predicts the context given a word.
|
| 103 |
+
|
| 104 |
+
**2. GloVe (Global Vectors)**
|
| 105 |
+
Developed by Stanford, it captures word relationships by analyzing co-occurrence statistics of words in a large corpus.
|
| 106 |
+
|
| 107 |
+
**3. FastText**
|
| 108 |
+
Developed by Facebook, it extends Word2Vec by considering subword information, making it better at handling rare and misspelled words.
|
| 109 |
+
|
| 110 |
+
**4. Transformers (Contextual Embeddings)**
|
| 111 |
+
Models like **BERT**, **ELMo**, and **GPT** generate embeddings based on the context in which a word appears, capturing nuanced meanings.
|
| 112 |
+
""")
|