File size: 2,290 Bytes
19b102a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
We can visualize the selected terms for a few topics by creating bar charts out of the c-TF-IDF scores 
for each topic representation. Insights can be gained from the relative c-TF-IDF scores between and within 
topics. Moreover, you can easily compare topic representations to each other. 
To visualize this hierarchy, run the following:

```python
topic_model.visualize_barchart()
```

<iframe src="bar_chart.html" style="width:1100px; height: 660px; border: 0px;""></iframe>


## **Visualize Term Score Decline**
Topics are represented by a number of words starting with the best representative word. 
Each word is represented by a c-TF-IDF score. The higher the score, the more representative a word 
to the topic is. Since the topic words are sorted by their c-TF-IDF score, the scores slowly decline 
with each word that is added. At some point adding words to the topic representation only marginally 
increases the total c-TF-IDF score and would not be beneficial for its representation. 

To visualize this effect, we can plot the c-TF-IDF scores for each topic by the term rank of each word. 
In other words, the position of the words (term rank), where the words with 
the highest c-TF-IDF score will have a rank of 1, will be put on the x-axis. Whereas the y-axis 
will be populated by the c-TF-IDF scores. The result is a visualization that shows you the decline 
of c-TF-IDF score when adding words to the topic representation. It allows you, using the elbow method, 
the select the best number of words in a topic. 

To visualize the c-TF-IDF score decline, run the following:

```python
topic_model.visualize_term_rank()
```

<iframe src="term_rank.html" style="width:1000px; height: 530px; border: 0px;""></iframe>

To enable the log scale on the y-axis for a better view of individual topics, run the following:

```python
topic_model.visualize_term_rank(log_scale=True)
```

<iframe src="term_rank_log.html" style="width:1000px; height: 530px; border: 0px;""></iframe>

This visualization was heavily inspired by the "Term Probability Decline" visualization found in an 
analysis by the amazing [tmtoolkit](https://tmtoolkit.readthedocs.io/).
Reference to that specific analysis can be found 
[here](https://wzbsocialsciencecenter.github.io/tm_corona/tm_analysis.html).