Add BERTopic model
- README.md +146 -0
- config.json +16 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,146 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# MARTINI_enrich_BERTopic_wolfchannel2

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_wolfchannel2")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 77
* Number of training documents: 9755

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | ukraine - biden - truth - news - vaccine | 20 | -1_ukraine_biden_truth_news |
| 0 | israelis - hamas - gazans - canaanites - jazeera | 5872 | 0_israelis_hamas_gazans_canaanites |
| 1 | ballots - lindell - georgia - republicans - hacked | 177 | 1_ballots_lindell_georgia_republicans |
| 2 | zelenskiy - poroshenko - ukrainehumanrightsabuses - volodymyr - kyiv | 150 | 2_zelenskiy_poroshenko_ukrainehumanrightsabuses_volodymyr |
| 3 | extradited - wikileaks - guantanamo - prosecution - judges | 140 | 3_extradited_wikileaks_guantanamo_prosecution |
| 4 | thegreatclimatecon - alarmists - emissions - agentsoftruthchat - dioxide | 125 | 4_thegreatclimatecon_alarmists_emissions_agentsoftruthchat |
| 5 | londonofficialworldwiderally - merseyside - yorkshire - bbc - cardiff | 111 | 5_londonofficialworldwiderally_merseyside_yorkshire_bbc |
| 6 | vaers - pfizer - unvaccinated - fatalities - remdesivir | 105 | 6_vaers_pfizer_unvaccinated_fatalities |
| 7 | cashless - cbdc - currencies - citizens - biometric | 102 | 7_cashless_cbdc_currencies_citizens |
| 8 | donkey - dont - arguing - anyone - ignorance | 96 | 8_donkey_dont_arguing_anyone |
| 9 | totaldisclosure - bitchute - jonbenet - documentaries - alessandro | 88 | 9_totaldisclosure_bitchute_jonbenet_documentaries |
| 10 | holland - willem - vlaardingerbroek - muiswinkel - whistleblowers | 87 | 10_holland_willem_vlaardingerbroek_muiswinkel |
| 11 | telegram - chats - encrypted - deleted - cathyfox1 | 86 | 11_telegram_chats_encrypted_deleted |
| 12 | syria - hezbollah - lebanon - bashar - damascus | 85 | 12_syria_hezbollah_lebanon_bashar |
| 13 | epstein - ghislaine - justices - clinton - traffickers | 85 | 13_epstein_ghislaine_justices_clinton |
| 14 | bastille - protests - reims - aristocracy - macron | 82 | 14_bastille_protests_reims_aristocracy |
| 15 | trudeau - ontario - calgary - blockades - singh | 80 | 15_trudeau_ontario_calgary_blockades |
| 16 | ukrainehumanrightsabuses - chernivtsi - zelensky - persecution - monastery | 77 | 16_ukrainehumanrightsabuses_chernivtsi_zelensky_persecution |
| 17 | mkultra - laura - survivors - cathy - hartwell | 74 | 17_mkultra_laura_survivors_cathy |
| 18 | illuminati - freemasons - baphomet - satanic - pentagram | 71 | 18_illuminati_freemasons_baphomet_satanic |
| 19 | illegals - sinaloa - patrols - cartels - smuggler | 66 | 19_illegals_sinaloa_patrols_cartels |
| 20 | brics - dollarization - rubles - sanctions - algeria | 65 | 20_brics_dollarization_rubles_sanctions |
| 21 | bioweapons - pentagon - kharkov - laboratories - pharmbiotest | 65 | 21_bioweapons_pentagon_kharkov_laboratories |
| 22 | pakistan - bhutto - imran - rawalpindi - crackdown | 63 | 22_pakistan_bhutto_imran_rawalpindi |
| 23 | nigeriens - chadian - cameroon - mauritania - khartoum | 60 | 23_nigeriens_chadian_cameroon_mauritania |
| 24 | reichstag - moldova - protesters - manifesto - sanctions | 57 | 24_reichstag_moldova_protesters_manifesto |
| 25 | davos - oligarchs - globalist - kissinger - chairman | 52 | 25_davos_oligarchs_globalist_kissinger |
| 26 | pandemics - sovereignty - ghebreyesus - mundial - amendments | 49 | 26_pandemics_sovereignty_ghebreyesus_mundial |
| 27 | webb - george - journalist - weisendanger - bioweapons | 49 | 27_webb_george_journalist_weisendanger |
| 28 | police - arrested - tasered - robocops - victoria | 48 | 28_police_arrested_tasered_robocops |
| 29 | gmo - doritos - pepsico - apples - additives | 46 | 29_gmo_doritos_pepsico_apples |
| 30 | comey - mueller - spygate - dossier - falsified | 45 | 30_comey_mueller_spygate_dossier |
| 31 | magnetic - orgone - pyramids - rotating - hyperspace | 44 | 31_magnetic_orgone_pyramids_rotating |
| 32 | twitters - musk - banned - cnn - antivaxxers | 44 | 32_twitters_musk_banned_cnn |
| 33 | boris - chancellor - liz - british - sunak | 43 | 33_boris_chancellor_liz_british |
| 34 | bbc - unitynewsnetwork - murdoch - thelighttruthpaper - disinformation | 42 | 34_bbc_unitynewsnetwork_murdoch_thelighttruthpaper |
| 35 | paedophilies - dunblane - spycams - schoolgirls - detained | 41 | 35_paedophilies_dunblane_spycams_schoolgirls |
| 36 | maskenpflicht - worldcouncilforhealth - microplastics - wearers - inhaled | 41 | 36_maskenpflicht_worldcouncilforhealth_microplastics_wearers |
| 37 | neuralnanorobots - transhumanist - pineal - sophia - interconnects | 39 | 37_neuralnanorobots_transhumanist_pineal_sophia |
| 38 | ukraine - missiles - militarization - billions - smuggled | 39 | 38_ukraine_missiles_militarization_billions |
| 39 | 5g - wifi - antennas - fcc - protection | 38 | 39_5g_wifi_antennas_fcc |
| 40 | betterwayconference - agora - sociocracy - livestream - decentralized | 37 | 40_betterwayconference_agora_sociocracy_livestream |
| 41 | nazies - ukrainian - malorossia - расы - черным | 37 | 41_nazies_ukrainian_malorossia_расы |
| 42 | dysphoria - transqueer - perversion - molested - indoctrinating | 37 | 42_dysphoria_transqueer_perversion_molested |
| 43 | migrants - hotel - rotherham - skegness - countrywide | 37 | 43_migrants_hotel_rotherham_skegness |
| 44 | severodonetsk - infantry - ukrainian - deserters - mobilised | 37 | 44_severodonetsk_infantry_ukrainian_deserters |
| 45 | gazprom - natgas - europe - energy - shortages | 36 | 45_gazprom_natgas_europe_energy |
| 46 | eu - zakharova - russia - sanctions - slavyangrad | 35 | 46_eu_zakharova_russia_sanctions |
| 47 | pfizer - vaccinologist - deathvax - malhotra - veritastips | 32 | 47_pfizer_vaccinologist_deathvax_malhotra |
| 48 | trafficked - tracy - ali - mothers - youtube | 32 | 48_trafficked_tracy_ali_mothers |
| 49 | trump - treason - pelosi - antifa - dod | 32 | 49_trump_treason_pelosi_antifa |
| 50 | taiwan - beijing - mao - jinping - cpec | 32 | 50_taiwan_beijing_mao_jinping |
| 51 | vaccination - saveourrightsuk - employer - mandatory - exemption | 31 | 51_vaccination_saveourrightsuk_employer_mandatory |
| 52 | donetsk - journalists - rangeloninews - medvedev - maslennikov | 31 | 52_donetsk_journalists_rangeloninews_medvedev |
| 53 | britney - kardashian - ballenciaga - slaveprincess - conservatorship | 31 | 53_britney_kardashian_ballenciaga_slaveprincess |
| 54 | nordstream - sabotage - slavyangrad - explosions - сgtn | 30 | 54_nordstream_sabotage_slavyangrad_explosions |
| 55 | exposefauci - arrestfauci - sanofipasteur - denguevaxia - suppressed | 29 | 55_exposefauci_arrestfauci_sanofipasteur_denguevaxia |
| 56 | saddam - iraqis - bombing - vietnam - американские | 28 | 56_saddam_iraqis_bombing_vietnam |
| 57 | hawaii - lahaina - fema - sirens - 911 | 28 | 57_hawaii_lahaina_fema_sirens |
| 58 | kosovo - serbia - mitrovica - sarajevo - bombings | 27 | 58_kosovo_serbia_mitrovica_sarajevo |
| 59 | doctors - corona - frontlineflash_ - injustices - policemen | 27 | 59_doctors_corona_frontlineflash__injustices |
| 60 | ukraine - grains - cargill - exports - hryvnias | 25 | 60_ukraine_grains_cargill_exports |
| 61 | vaccinated - pfizer - nspcc - pediatrician - yprovisionalfiguresondeathsregisteredinenglandandwales | 25 | 61_vaccinated_pfizer_nspcc_pediatrician |
| 62 | corbyn - starmer - blairite - antisemitic - mi5 | 25 | 62_corbyn_starmer_blairite_antisemitic |
| 63 | mariupol - azovstal - nikolaevka - militants - evacuated | 25 | 63_mariupol_azovstal_nikolaevka_militants |
| 64 | bidenlaptopemails - pizzagate - bryan - sharylattkisson - whitneymeade | 24 | 64_bidenlaptopemails_pizzagate_bryan_sharylattkisson |
| 65 | lockdown - drones - qr - malaysia - yunnan | 23 | 65_lockdown_drones_qr_malaysia |
| 66 | ukraine - lukashenko - ossetia - европе - конфликта | 23 | 66_ukraine_lukashenko_ossetia_европе |
| 67 | ukrainians - органов - transplantation - corpses - orphanages | 23 | 67_ukrainians_органов_transplantation_corpses |
| 68 | haitians - assassinated - corrupcion - ecuador - villavicencio | 22 | 68_haitians_assassinated_corrupcion_ecuador |
| 69 | zaporizhzhya - chernobyl - natokraine - nuclear - 750kv | 22 | 69_zaporizhzhya_chernobyl_natokraine_nuclear |
| 70 | transwoman - heptathlete - competing - runners - girls | 21 | 70_transwoman_heptathlete_competing_runners |
| 71 | neonazis - radicalised - paramilitary - paevska - azovites | 21 | 71_neonazis_radicalised_paramilitary_paevska |
| 72 | mullingar - dublin - immigrants - fighting - cookstown | 21 | 72_mullingar_dublin_immigrants_fighting |
| 73 | biden - pedophiles - fbi - son - totaldisclosure | 20 | 73_biden_pedophiles_fbi_son |
| 74 | chemtrail - contrails - spraying - cloudbuster - atmospheric | 20 | 74_chemtrail_contrails_spraying_cloudbuster |
| 75 | uranium - munitions - depleted - contamination - zakharova | 20 | 75_uranium_munitions_depleted_contamination |

</details>
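The 77 frequencies in the topic table add up to the 9755 training documents, so each row can be read as a share of the corpus. Topic -1 is BERTopic's outlier bucket, and topic 0 alone holds roughly 60% of the documents. The following is a minimal sketch of that arithmetic, assuming the table is available as Markdown text. The `parse_topic_table` helper and the `SAMPLE` string are hypothetical names introduced for illustration, and only three of the rows are reproduced.

```python
TOTAL_DOCS = 9755  # "Number of training documents" from the overview

# Three rows copied from the topic table; the full table has 77 rows.
SAMPLE = """\
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | ukraine - biden - truth - news - vaccine | 20 | -1_ukraine_biden_truth_news |
| 0 | israelis - hamas - gazans - canaanites - jazeera | 5872 | 0_israelis_hamas_gazans_canaanites |
| 1 | ballots - lindell - georgia - republicans - hacked | 177 | 1_ballots_lindell_georgia_republicans |
"""


def parse_topic_table(markdown: str) -> list[dict]:
    """Parse the Markdown topic table into row dicts (hypothetical helper)."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip header and separator rows: only data rows start with an integer ID.
        if len(cells) != 4 or not cells[0].lstrip("-").isdigit():
            continue
        rows.append({
            "topic_id": int(cells[0]),
            "keywords": cells[1].split(" - "),
            "frequency": int(cells[2]),
            "label": cells[3],
        })
    return rows


rows = parse_topic_table(SAMPLE)
for row in rows:
    share = row["frequency"] / TOTAL_DOCS
    print(f'{row["topic_id"]:>3}  {row["frequency"]:>5}  {share:6.1%}  {row["label"]}')
```

Run over the full table, the same loop makes the skew explicit: one dominant topic, then a long tail of topics with 20 to 177 documents each.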

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

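The same values are stored in `config.json`, where JSON `true`/`false`/`null` stand in for Python `True`/`False`/`None` and `n_gram_range` is a two-element list rather than a tuple. The following is a minimal sketch of reading them back into Python. The `config_to_kwargs` helper is hypothetical, and passing the result on as `BERTopic(**kwargs)` is an assumption that the keys match the constructor parameters of the installed BERTopic version, so that call is left as a comment.

```python
import json

# Contents of config.json as committed alongside this README.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(text: str) -> dict:
    """Turn the saved JSON config into keyword arguments (hypothetical helper)."""
    kwargs = json.loads(text)
    # JSON has no tuple type, so restore the (min_n, max_n) tuple form.
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"])  # (1, 1)
print(kwargs["language"])      # None

# Assumption: these keys are accepted by the BERTopic constructor.
# from bertopic import BERTopic
# fresh_model = BERTopic(**kwargs)
```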
## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.5.2
* Sentence-transformers: 3.3.1
* Transformers: 4.46.3
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.10.12

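To reproduce this environment, the versions listed can be turned into pip pins. The following is a sketch under a few assumptions: the PyPI distribution names are written out by hand because some differ from the display names (UMAP is published as `umap-learn`), the helper names are hypothetical, and the BERTopic version is left unpinned because the list does not state it.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by PyPI name.
FRAMEWORK_PINS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.3.1",
    "transformers": "4.46.3",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def to_requirements(pins: dict) -> str:
    """Render the pins as requirements.txt lines."""
    return "\n".join(f"{name}=={version}" for name, version in pins.items())


def installed_mismatches(pins: dict) -> dict:
    """Map each package whose installed version differs to that version (None if absent)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches


print(to_requirements(FRAMEWORK_PINS))
print(installed_mismatches(FRAMEWORK_PINS))
```

An empty dictionary from `installed_mismatches` means the local packages match the versions recorded when the model was saved.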
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53e4d9c7316fccab8c6c5ef0b3008bb24bdbe63a76e657268ef8d4433994ccad
size 1688504
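These three lines are a Git LFS pointer rather than the tensor data itself: they record the pointer spec version, the SHA-256 digest of the real file, and its size in bytes. The following is a minimal sketch of parsing such a pointer and checking a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check reads the whole file into memory, which is acceptable at this size.

```python
import hashlib
from pathlib import Path

# Pointer text for ctfidf.safetensors, copied from the diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:53e4d9c7316fccab8c6c5ef0b3008bb24bdbe63a76e657268ef8d4433994ccad
size 1688504
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against the pointer."""
    data = Path(path).read_bytes()
    return (
        len(data) == pointer["size_bytes"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algorithm"], pointer["size_bytes"])  # sha256 1688504
```

Calling `matches_pointer("ctfidf.safetensors", pointer)` on a downloaded copy would confirm that the file is the one this commit refers to.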
ctfidf_config.json
ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:013b16132954fa0a2d0f2347ec759e6def3bca6f8b5279710a0adfc301ad451e
size 315480
topics.json
ADDED
The diff for this file is too large to render.