Spaces: Rajesh6/NLP

Rajesh6 committed on Nov 23, 2024

Commit 4825ee3 (verified) · 1 parent: 5cd7146

Update pages/Introduction.py

Files changed (1)

pages/Introduction.py CHANGED

@@ -65,4 +65,26 @@ st.write("The **TF-IDF Vectorizer** is a popular technique in Natural Language P
 st.write('**Term Frequency (TF)** \n - Measures how often a word appears in a single document. \n - Formula: \n _TF_ = Number of times the word appears in the document / Total number of words in the document' )
 st.write('**Inverse Document Frequency (IDF)** \n - Measures how unique or rare a word is across all documents in the corpus. \n - Formula: \n  _IDF_ = log(Total number of documents / Number of documents containing the word) \n Words that appear in many documents (like "the" or "and") will have a low IDF value, while unique words (like "NLP") will have a higher IDF.')
-st.write('**TF - IDF Score:** \n - Combines TF and IDF to calculate the importance of a word in a document. \n - Formula: \n TF - IDF = TF x IDF \n Words that are frequent in a document but rare in the overall corpus get a higher score.')

+st.write('**TF-IDF Score:** \n - Combines TF and IDF to calculate the importance of a word in a document. \n - Formula: \n _TF-IDF = TF × IDF_ \n Words that are frequent in a document but rare in the overall corpus get a higher score.')
+st.write("Examples:")
+st.write("""
+### Example
+**Consider these two documents:**
+- "I love NLP"
+- "NLP is amazing"
+#### Step 1: Calculate TF
+- "NLP" appears once in each document, so its TF is **1/3** in both.
+- Words like "love" and "amazing" also have a TF of **1/3**.
+#### Step 2: Calculate IDF
+- "NLP" appears in both documents, so its IDF is **log(2/2) = 0**.
+- "love" and "amazing" appear in only one document each, so their IDF is **log(2/1) ≈ 0.69** (using the natural logarithm).
+#### Step 3: Compute TF-IDF
+- "NLP" gets a TF-IDF score of **1/3 × 0 = 0** (not unique).
+- "love" and "amazing" get scores of **1/3 × 0.69 ≈ 0.23** (more unique).
+""")
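The TF, IDF, and TF-IDF formulas in this change, together with the two-document example, can be reproduced with a short standard-library Python sketch. The helper names (`term_frequency`, `inverse_document_frequency`, `tf_idf`) are hypothetical and do not appear in `pages/Introduction.py`; splitting on whitespace without lowercasing is also an assumed tokenizer. The sketch uses the natural logarithm, which is what makes log(2/1) come out as roughly 0.69.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF = times the word appears in the document / total words in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def inverse_document_frequency(corpus):
    """IDF = log(total documents / documents containing the word), natural log."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word at most once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(corpus):
    """Return one {word: TF x IDF} dict per tokenized document."""
    idf = inverse_document_frequency(corpus)
    return [
        {word: tf * idf[word] for word, tf in term_frequency(tokens).items()}
        for tokens in corpus
    ]


# The two documents from the example; plain whitespace splitting is an assumed tokenizer.
docs = ["I love NLP", "NLP is amazing"]
corpus = [doc.split() for doc in docs]

for doc, scores in zip(docs, tf_idf(corpus)):
    print(doc, {word: round(score, 2) for word, score in scores.items()})
# I love NLP {'I': 0.23, 'love': 0.23, 'NLP': 0.0}
# NLP is amazing {'NLP': 0.0, 'is': 0.23, 'amazing': 0.23}
```

With a base-10 logarithm the IDF of "love" would be about 0.30 rather than 0.69, so the absolute scores depend on the base, although the ranking of words within a document does not. Library implementations such as scikit-learn's `TfidfVectorizer` smooth the IDF and L2-normalize each document vector by default, so their numbers will not match this hand calculation exactly.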