Harika22 committed on
Commit f300d27 · verified · 1 Parent(s): f9c5382

Update pages/6_Feature_Engineering.py

Files changed (1)
  1. pages/6_Feature_Engineering.py +85 -2
pages/6_Feature_Engineering.py CHANGED
@@ -565,5 +565,88 @@ elif file_type == "Term Frequency - Inverse Document Frequency(TF-IDF)":
565
  unsafe_allow_html=True,
566
  )
567
 
568
- st.markdown("<p style='text-align: center; font-size: 18px;'><strong>TF-IDF effectively balances word significance and document relevance! 🚀</strong></p>", unsafe_allow_html=True)
569
-
568
+ st.markdown("<h1 class='title'>📌 TF-IDF Key Insights</h1>", unsafe_allow_html=True)
569
+
570
+ st.markdown(
571
+ """
572
+ <div class='box'>
573
+ <h3 style='color: #6A0572;'>📈 Case 1: High TF-IDF Values</h3>
574
+ <ul>
575
+ <li>If the word appears <strong>frequently</strong> in a document but <strong>rarely</strong> across the corpus → <span class='highlight'>High TF-IDF</span></li>
576
+ </ul>
577
+ </div>
578
+ """,
579
+ unsafe_allow_html=True,
580
+ )
581
+
582
+ st.markdown(
583
+ """
584
+ <div class='box'>
585
+ <h3 style='color: #6A0572;'>📉 Case 2: Low TF-IDF Values</h3>
586
+ <ul>
587
+ <li>If the word appears <strong>rarely</strong> in a document, or appears in almost every document of the corpus → <span class='highlight'>Low TF-IDF</span></li>
588
+ <li>TF is always in the range: <strong>[0, 1]</strong></li>
589
+ <li>IDF is in the range: <strong>[0, ∞)</strong></li>
590
+ </ul>
591
+ </div>
592
+ """,
593
+ unsafe_allow_html=True,
594
+ )
595
+
596
+ st.markdown(
597
+ """
598
+ <div class='box'>
599
+ <h3 style='color: #6A0572;'>📊 Understanding TF (Term Frequency)</h3>
600
+ <ul>
601
+ <li>TF gives <strong>more importance</strong> to words that occur <strong>frequently</strong> in a document.</li>
602
+ <li>As the word frequency <span class='highlight'>increases</span> → TF <span class='highlight'>increases</span>.</li>
603
+ </ul>
604
+ </div>
605
+ """,
606
+ unsafe_allow_html=True,
607
+ )
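The TF behaviour described in the box above can be illustrated with a minimal sketch. It assumes TF is defined as a term's count divided by the document's total token count (which is what keeps it in [0, 1]); the helper name `term_frequency` and the sample sentence are hypothetical and not part of this commit.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: count / total_tokens} for one tokenized document.

    Assumption: TF is normalized by document length, so every value
    lies in [0, 1] and the values of one document sum to 1.
    """
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical example document, split on whitespace.
doc = "the cat sat on the mat".split()
tf = term_frequency(doc)

print(tf["the"])  # "the" occurs 2 times out of 6 tokens
print(tf["cat"])  # "cat" occurs 1 time out of 6 tokens
```

Because "the" occurs twice and "cat" once, `tf["the"]` is larger than `tf["cat"]`, matching the statement that TF increases with word frequency.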
608
+
609
+ st.markdown(
610
+ """
611
+ <div class='box'>
612
+ <h3 style='color: #6A0572;'>📉 Understanding IDF (Inverse Document Frequency)</h3>
613
+ <ul>
614
+ <li>IDF Formula: <span class='highlight'>IDF(wᵢ, C) = log(N/n)</span></li>
615
+ <li><strong>N:</strong> Total number of documents</li>
616
+ <li><strong>n:</strong> Number of documents containing the word</li>
617
+ </ul>
618
+ </div>
619
+ """,
620
+ unsafe_allow_html=True,
621
+ )
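As a rough illustration of the formula IDF(wᵢ, C) = log(N/n), the sketch below counts N and n directly over a tiny corpus. The function name `inverse_document_frequency`, the three-document corpus, and the choice of the natural logarithm are assumptions made for this example; the page does not state which log base it uses.

```python
import math


def inverse_document_frequency(term, corpus):
    """Compute log(N / n) for `term` over `corpus` (a list of token lists).

    N is the total number of documents and n is the number of documents
    that contain the term. The natural log is an assumption here.
    """
    N = len(corpus)
    n = sum(1 for doc in corpus if term in doc)
    if n == 0:
        # log(N / 0) is undefined; the unsmoothed formula needs n >= 1.
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(N / n)


# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

idf_the = inverse_document_frequency("the", corpus)    # n = 3, N = 3
idf_cat = inverse_document_frequency("cat", corpus)    # n = 2, N = 3
idf_bird = inverse_document_frequency("bird", corpus)  # n = 1, N = 3

print(idf_the, idf_cat, idf_bird)
```

Here "the" appears in every document, so its IDF is zero, while "bird" appears in only one document and gets the largest value.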
622
+
623
+ st.markdown(
624
+ """
625
+ <div class='formula'>
626
+ <strong>When n is small:</strong> <br>
627
+ - N/n increases → log(N/n) increases ⬆️ <br>
628
+ - Word is rare in the corpus → Higher importance in IDF <br><br>
629
+ <strong>When n is large:</strong> <br>
630
+ - N/n decreases → log(N/n) decreases ⬇️ <br>
631
+ - Word is common → Lower importance in IDF <br><br>
632
+ <strong>When N = n:</strong> log(N/n) = 0 (word appears in every document)
633
+ </div>
634
+ """,
635
+ unsafe_allow_html=True,
636
+ )
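The three cases in the formula box can be checked numerically. This is only a sketch: the corpus size N = 1000 and the sample values of n are made up for illustration, and the natural logarithm is assumed.

```python
import math

# Hypothetical corpus size and document frequencies.
N = 1000
idf_by_n = {n: math.log(N / n) for n in (1, 10, 100, 1000)}

for n, idf in idf_by_n.items():
    # Small n -> large N/n -> large IDF; n == N -> IDF of exactly 0.
    print(f"n={n:>4}  N/n={N / n:>6.0f}  idf={idf:.3f}")
```

The printed IDF shrinks as n grows and reaches zero when n equals N, which is the "word appears in every document" case.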
637
+
638
+ st.markdown(
639
+ """
640
+ <div class='box'>
641
+ <h3 style='color: #6A0572;'>πŸ“Œ TF-IDF Calculation</h3>
642
+ <ul>
643
+ <li><strong>TF</strong> focuses on words <strong>frequent</strong> in a document.</li>
644
+ <li><strong>IDF</strong> focuses on words <strong>rare</strong> in the corpus.</li>
645
+ <li><span class='highlight'>TF-IDF is high</span> for words that appear <strong>often in a document</strong> but <strong>rarely in the corpus</strong>.</li>
646
+ </ul>
647
+ </div>
648
+ """,
649
+ unsafe_allow_html=True,
650
+ )
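Putting the two pieces together, a minimal sketch of the TF-IDF product might look like the following. It assumes length-normalized TF and unsmoothed IDF = ln(N/n) as described above; the function name `tf_idf` and the corpus are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF by default, so their numbers will differ from this sketch.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: tf * idf} dict per document in `corpus`.

    Assumptions: tf = count / document length, idf = ln(N / n),
    with no smoothing or vector normalization.
    """
    N = len(corpus)
    # n for each term: the number of documents that contain it.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {
                term: (count / total) * math.log(N / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
scores = tf_idf(corpus)

print(scores[0]["the"])   # frequent in the document, but in every document
print(scores[0]["cat"])   # appears in two of the three documents
print(scores[2]["bird"])  # appears in only one document
```

"the" has the highest TF in the first document yet scores zero because it occurs in every document, while "bird" scores highest because it is concentrated in a single document.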
651
+
652
+