Harika22 committed on
Commit 747b9ef · verified · 1 Parent(s): f300d27

Update pages/6_Feature_Engineering.py

  1. pages/6_Feature_Engineering.py +53 -0
pages/6_Feature_Engineering.py CHANGED
@@ -649,4 +649,57 @@ elif file_type == "Term Frequency - Inverse Document Frequency(TF-IDF)":
  unsafe_allow_html=True,
  )
 
+ st.subheader(":red[Why log is used]")
+ st.markdown("<h1 class='title'>📌 Understanding TF-IDF Scaling</h1>", unsafe_allow_html=True)
+
+ st.markdown(
+ """
+ <div class='box'>
+ <h3 style='color: #6A0572;'>📊 Minimum and Maximum Values of N/n</h3>
+ <ul>
+ <li>When <strong>n is maximum</strong> (n = N) → <span class='highlight'>N/n = 1</span>; when <strong>n is minimum</strong> (n = 1) → <span class='highlight'>N/n = N</span></li>
+ <li>At <strong>training time</strong>: <span class='highlight'>1 ≤ n ≤ N</span></li>
+ <li>At <strong>test time</strong>: <span class='highlight'>0 ≤ n ≤ N</span> (n = 0 for Out-of-Vocabulary words)</li>
+ </ul>
+ </div>
+ """,
+ unsafe_allow_html=True,
+ )
668
+
669
+ st.markdown(
670
+ """
671
+ <div class='box'>
672
+ <h3 style='color: #6A0572;'>βš–οΈ IDF Dominance Over TF</h3>
673
+ <ul>
674
+ <li>If <strong>n decreases</strong> β†’ <span class='highlight'>N/n increases (max)</span></li>
675
+ <li>TF scale is very <span class='highlight'>small</span>, but IDF scale is very <span class='highlight'>high</span></li>
676
+ <li>IDF can <span class='highlight'>dominate</span> TF, favoring rare words over frequent ones</li>
677
+ </ul>
678
+ </div>
679
+ """,
680
+ unsafe_allow_html=True,
681
+ )
682
+
683
+ st.markdown(
684
+ """
685
+ <div class='box'>
686
+ <h3 style='color: #6A0572;'>πŸ› οΈ How Log Solves IDF Dominance?</h3>
687
+ <ul>
688
+ <li>Applying <span class='highlight'>log</span> reduces the dominance of IDF</li>
689
+ <li>Logarithm <span class='highlight'>rounds off</span> values to a balanced scale</li>
690
+ <li>It prevents bias towards rare words and maintains proportionality</li>
691
+ </ul>
692
+ </div>
693
+ """,
694
+ unsafe_allow_html=True,
695
+ )
696
+
697
+ st.markdown(
698
+ """
699
+ <div class='formula'>
700
+ <strong>TF balances frequent words, while log(IDF) prevents rare-word dominance! πŸš€</strong>
701
+ </div>
702
+ """,
703
+ unsafe_allow_html=True,
704
+ )
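
The scaling argument in the added text can be checked numerically. The following is a minimal sketch and is not part of the commit: the function names, the corpus size `N = 10_000`, and the term frequency `tf = 0.05` are hypothetical values chosen for illustration, and the natural logarithm is assumed since the page does not state a base. It also assumes `n >= 1` (the training-time case); the `n = 0` test-time case would divide by zero and needs separate handling.

```python
import math


def raw_ratio(total_docs: int, docs_with_term: int) -> float:
    """Unscaled inverse document frequency, N/n (assumes docs_with_term >= 1)."""
    return total_docs / docs_with_term


def log_idf(total_docs: int, docs_with_term: int) -> float:
    """Log-scaled inverse document frequency, log(N/n), natural log assumed."""
    return math.log(total_docs / docs_with_term)


if __name__ == "__main__":
    N = 10_000   # hypothetical corpus size
    tf = 0.05    # hypothetical term frequency of a word in one document

    print(f"{'n':>6} {'N/n':>10} {'log(N/n)':>10} {'tf*N/n':>10} {'tf*log':>10}")
    for n in (1, 10, 100, 1_000, 10_000):
        ratio = raw_ratio(N, n)
        idf = log_idf(N, n)
        # With the raw ratio, the product is driven almost entirely by N/n;
        # with the log, it stays on a scale comparable to tf itself.
        print(f"{n:>6} {ratio:>10.1f} {idf:>10.3f} {tf * ratio:>10.3f} {tf * idf:>10.3f}")
```

Running the script prints one row per value of `n`. The raw ratio spans four orders of magnitude across the rows, while the log-scaled value stays below 10, which is the compression the added text describes.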