Update README.md
README.md CHANGED
@@ -1,101 +1,109 @@
-
----
-tags:
-- bertopic
-library_name: bertopic
-pipeline_tag: text-classification
----
-
-# ISSR_Dark_Web_31Topics_White_Nation
-
-This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
-BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
-
-## Usage
-
-To use this model, please install BERTopic:
-
-```
-pip install -U bertopic
-```
-
-You can use the model as follows:
-
-```python
-from bertopic import BERTopic
-topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_31Topics_White_Nation")
-
-topic_model.get_topic_info()
-```
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
|
| 48 |
-
|
|
| 49 |
-
|
|
| 50 |
-
|
|
| 51 |
-
|
|
| 52 |
-
|
|
| 53 |
-
|
|
| 54 |
-
|
|
| 55 |
-
|
|
| 56 |
-
|
|
| 57 |
-
|
|
| 58 |
-
|
|
| 59 |
-
|
|
| 60 |
-
|
|
| 61 |
-
|
|
| 62 |
-
|
|
| 63 |
-
|
|
| 64 |
-
|
|
| 65 |
-
|
|
| 66 |
-
|
|
| 67 |
-
|
|
| 68 |
-
|
|
| 69 |
-
|
|
| 70 |
-
|
|
| 71 |
-
|
|
| 72 |
-
|
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
*
|
| 87 |
-
*
|
| 88 |
-
*
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
*
|
| 93 |
-
*
|
| 94 |
-
*
|
| 95 |
-
*
|
| 96 |
-
*
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
*
|
| 101 |
-
*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+
+---
+tags:
+- bertopic
+library_name: bertopic
+pipeline_tag: text-classification
+---
+
+# ISSR_Dark_Web_31Topics_White_Nation
+
+This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+## Usage
+
+To use this model, please install BERTopic:
+
+```
+pip install -U bertopic
+```
+
+You can use the model as follows:
+
+```python
+from bertopic import BERTopic
+topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_31Topics_White_Nation")
+
+topic_model.get_topic_info()
+```
+
+You can make predictions as follows:
+
+```python
+sentence = ['climate']
+topic, _ = topic_model.transform(sentence)
+topic_model.get_topic_info(topic[0])
+```
+
+## Topic overview
+
+* Number of topics: 32
+* Number of training documents: 52310
+
+<details>
+  <summary>Click here for an overview of all topics.</summary>
+
+  | Topic ID | Topic Keywords | Topic Frequency | Label |
+  |----------|----------------|-----------------|-------|
+  | -1 | the - trump - to - of - in | 340 | outliers |
+  | 0 | socialism - lesson - applied socialism - practical - practical lesson applied | 16801 | Applied Socialism |
+  | 1 | trump - democrats - pelosi - biden - election | 10136 | 2020 Election Fraud Impeachment |
+  | 2 | border - illegal - wall - trump - mexico | 2606 | Border Wall Debate |
+  | 3 | israel - iran - syria - us - israeli | 1802 | Middle East Tensions Wars |
+  | 4 | brexit - eu - farage - europe - yellow | 1740 | EU Elections and Brexit Leaders |
+  | 5 | thread - re - you - pictures - pictures thread | 1184 | Funny Pictures Threads |
+  | 6 | climate - climate change - change - warming - global warming | 1067 | Climate Change Funding |
+  | 7 | the - fed - market - bank - banks | 915 | Global Central Banks |
+  | 8 | sgt - sgt report - report - appeared first - appeared first sgt | 1227 | SGT Report Articles |
+  | 9 | mueller - fbi - trump - clinton - obama | 832 | Trump Deep State |
+  | 10 | gun - guns - gun control - shooting - control | 3596 | Gun control and police shootings |
+  | 11 | facebook - google - tech - twitter - social media | 828 | Big Tech Censorship |
+  | 12 | china - trade - chinese - trump - tariffs | 818 | US Trade War |
+  | 13 | gold - silver - report - the post - sgt report | 642 | Gold Silver Ratio |
+  | 14 | epstein - jeffrey epstein - jeffrey - sex - maxwell | 750 | Epstein Maxwell Sex Scandal |
+  | 15 | women - men - transgender - gender - feminism | 569 | Transgender Rights and Feminism |
+  | 16 | jews - jewish - jew - holocaust - the jews | 485 | 20th Century Jewish History |
+  | 17 | kavanaugh - ford - christine - brett - brett kavanaugh | 590 | Kavanaugh Accuser |
+  | 18 | white - racist - white people - race - black | 442 | White Racism Follow |
+  | 19 | youtube - music - favorite - what favorite - what favorite music | 571 | Favorite Music Youtube |
+  | 20 | vaccine - vaccines - measles - vaccination - flu | 398 | Vaccine Lawsuit Losses |
+  | 21 | cancer - monsanto - pharma - drug - big pharma | 400 | Diabetes and Health |
+  | 22 | america - the - world - empire - globalists | 645 | Global Empire War |
+  | 23 | abortion - planned parenthood - parenthood - planned - babies | 332 | Planned Parenthood Abortion |
+  | 24 | christians - christianity - pope - christian - church | 281 | Christianity & Religion |
+  | 25 | media - news - cnn - fake news - fake | 551 | Mainstream Media and Fake News |
+  | 26 | antifa - portland - police - violence - protesters | 662 | Antifa Portland Attacks Journalist |
+  | 27 | college - school - students - schools - education | 337 | Education Politics |
+  | 28 | stormfront - stormfront sucks - re stormfront sucks - re stormfront - sucks | 374 | Stormfront Criticism |
+  | 29 | assange - julian - julian assange - wikileaks - us | 197 | Julian Assange Expulsion |
+  | 30 | coronavirus - virus - pandemic - outbreak - wuhan | 192 | Coronavirus Pandemic |
+
+</details>
+
+## Training hyperparameters
+
+* calculate_probabilities: True
+* language: None
+* low_memory: False
+* min_topic_size: 10
+* n_gram_range: (1, 3)
+* nr_topics: 32
+* seed_topic_list: None
+* top_n_words: 10
+* verbose: True
+* zeroshot_min_similarity: 0.7
+* zeroshot_topic_list: None
+
+## Framework versions
+
+* Numpy: 1.26.4
+* HDBSCAN: 0.8.36
+* UMAP: 0.5.6
+* Pandas: 2.2.1
+* Scikit-Learn: 1.4.1.post1
+* Sentence-transformers: 3.0.1
+* Transformers: 4.39.3
+* Numba: 0.60.0
+* Plotly: 5.22.0
+* Python: 3.12.2
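
The topic overview added in this change can be sanity-checked without loading the model. The sketch below is illustrative only: the frequencies are copied by hand from the table in the added README lines, and `topic_frequency` and `summarize` are hypothetical names, not part of BERTopic. It checks that the table has 32 rows (31 topics plus the `-1` outlier bucket, which would explain why the model name says 31 while the README reports 32) and that the frequencies add up to the stated number of training documents.

```python
# Topic frequencies copied from the "Topic overview" table, keyed by topic ID.
# Topic -1 is the outlier bucket.
topic_frequency = {
    -1: 340, 0: 16801, 1: 10136, 2: 2606, 3: 1802, 4: 1740, 5: 1184,
    6: 1067, 7: 915, 8: 1227, 9: 832, 10: 3596, 11: 828, 12: 818,
    13: 642, 14: 750, 15: 569, 16: 485, 17: 590, 18: 442, 19: 571,
    20: 398, 21: 400, 22: 645, 23: 332, 24: 281, 25: 551, 26: 662,
    27: 337, 28: 374, 29: 197, 30: 192,
}


def summarize(freq):
    """Return row count, topic count without outliers, and document totals."""
    total = sum(freq.values())
    return {
        "rows": len(freq),
        "topics_excluding_outliers": sum(1 for topic_id in freq if topic_id != -1),
        "documents": total,
        # Share of training documents that were not assigned to any topic.
        "outlier_share": freq.get(-1, 0) / total,
    }


summary = summarize(topic_frequency)
print(summary)
```

If the table is transcribed correctly, `rows` matches "Number of topics: 32" and `documents` matches "Number of training documents: 52310"; a mismatch would point to a copy error rather than to a problem with the model.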