AngelPanizo committed
Commit d4f4829 · verified · 1 parent: 378890b

Add BERTopic model

README.md ADDED
@@ -0,0 +1,184 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_JustDudeChannel
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_JustDudeChannel")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 115
+ * Number of training documents: 11667
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | propaganda - fauci - putin - billion - video | 20 | -1_propaganda_fauci_putin_billion |
+ | 0 | protests - unvaccinated - liberta - genoa - corona | 6341 | 0_protests_unvaccinated_liberta_genoa |
+ | 1 | israelis - hamas - antisemitism - shulamit - prager | 148 | 1_israelis_hamas_antisemitism_shulamit |
+ | 2 | transhumanism - neuralink - implanted - microchip - robots | 142 | 2_transhumanism_neuralink_implanted_microchip |
+ | 3 | migrants - sweden - noncitizens - ceuta - naturalized | 139 | 3_migrants_sweden_noncitizens_ceuta |
+ | 4 | davos - globalism - truthstream - rockefeller - lockdowns | 120 | 4_davos_globalism_truthstream_rockefeller |
+ | 5 | pfizer - brussels - mep - parliamentary - profiteers | 102 | 5_pfizer_brussels_mep_parliamentary |
+ | 6 | racism - whites - reparations - slaves - garvey | 99 | 6_racism_whites_reparations_slaves |
+ | 7 | twitter - musk - zuckerberg - dorsey - shareholder | 90 | 7_twitter_musk_zuckerberg_dorsey |
+ | 8 | vaccines - mrna - malone - patented - misinformation | 90 | 8_vaccines_mrna_malone_patented |
+ | 9 | id2020 - passports - aadhaar - biometric - wallet | 88 | 9_id2020_passports_aadhaar_biometric |
+ | 10 | nypd - hochul - vaccinations - mandates - protesters | 88 | 10_nypd_hochul_vaccinations_mandates |
+ | 11 | trudeau - convoys - trucker - freedoms - protesters | 87 | 11_trudeau_convoys_trucker_freedoms |
+ | 12 | desantis - florida - statewide - mandates - impaneled | 83 | 12_desantis_florida_statewide_mandates |
+ | 13 | climate - cnn - greenpeace - thunberg - kerry | 79 | 13_climate_cnn_greenpeace_thunberg |
+ | 14 | epstein - ghislaine - billionaire - trafficking - accused | 77 | 14_epstein_ghislaine_billionaire_trafficking |
+ | 15 | biden - republicans - manchin - ballots - msnbc | 75 | 15_biden_republicans_manchin_ballots |
+ | 16 | abortionists - scream - womb - mccorvey - ultrasound | 74 | 16_abortionists_scream_womb_mccorvey |
+ | 17 | superintendent - suspended - masks - loudoun - kindergarten | 74 | 17_superintendent_suspended_masks_loudoun |
+ | 18 | educators - indoctrinating - taught - antifa - crt | 74 | 18_educators_indoctrinating_taught_antifa |
+ | 19 | paedophile - reagan - documentary - scully - poltergeist | 66 | 19_paedophile_reagan_documentary_scully |
+ | 20 | rumsfeld - mcgowan - jfk - dyncorp - trillion | 65 | 20_rumsfeld_mcgowan_jfk_dyncorp |
+ | 21 | sheeple - troublemakers - damned - manipulated - officialmrcharlez | 65 | 21_sheeple_troublemakers_damned_manipulated |
+ | 22 | farmers - paul - vlaardingerbroek - manure - agenda | 63 | 22_farmers_paul_vlaardingerbroek_manure |
+ | 23 | bannon - insurrection - pelosi - terrorists - mccarthyism | 62 | 23_bannon_insurrection_pelosi_terrorists |
+ | 24 | oliver - gbnews - nonsense - unelected - awaken | 62 | 24_oliver_gbnews_nonsense_unelected |
+ | 25 | taxation - monetary - greenspan - gold - monarchs | 62 | 25_taxation_monetary_greenspan_gold |
+ | 26 | coups - lanka - protesters - nazarbayev - burundi | 61 | 26_coups_lanka_protesters_nazarbayev |
+ | 27 | gates - billions - vaccines - philanthropist - bioterrorist | 59 | 27_gates_billions_vaccines_philanthropist |
+ | 28 | marxism - weimar - bolshevik - professors - holodomor | 59 | 28_marxism_weimar_bolshevik_professors |
+ | 29 | nurses - vaxx - whistleblower - wwny - refused | 58 | 29_nurses_vaxx_whistleblower_wwny |
+ | 30 | transgenderism - dysphoria - detransitioned - internalized - mastectomy | 57 | 30_transgenderism_dysphoria_detransitioned_internalized |
+ | 31 | vaxxers - demonized - lunatics - fakemediaus - fuck | 57 | 31_vaxxers_demonized_lunatics_fakemediaus |
+ | 32 | fauci - coronaviruses - wuhan - bioweapons - laboratories | 55 | 32_fauci_coronaviruses_wuhan_bioweapons |
+ | 33 | bolsonaro - globo - argentina - multipolarista - presidency | 54 | 33_bolsonaro_globo_argentina_multipolarista |
+ | 34 | zealanders - jacinda - newshub - dictator - vaccinations | 52 | 34_zealanders_jacinda_newshub_dictator |
+ | 35 | scams - messages - impostor - evans_baked_telegram - redpilldealer4833 | 51 | 35_scams_messages_impostor_evans_baked_telegram |
+ | 36 | eu - windfarms - emissions - diesel - generators | 51 | 36_eu_windfarms_emissions_diesel |
+ | 37 | mandates - vaccinated - justices - employers - joe | 50 | 37_mandates_vaccinated_justices_employers |
+ | 38 | victoria - australians - daniel - unvaccinated - tiananmen | 49 | 38_victoria_australians_daniel_unvaccinated |
+ | 39 | kanye - jews - hollywoodism - whoopi - timcast | 48 | 39_kanye_jews_hollywoodism_whoopi |
+ | 40 | shame - dragphobic - perverts - naked - children | 48 | 40_shame_dragphobic_perverts_naked |
+ | 41 | megacities - mayors - mobility - driverless - bollards | 47 | 41_megacities_mayors_mobility_driverless |
+ | 42 | lockdown - quarantined - chongqing - xinjiang - hebei | 45 | 42_lockdown_quarantined_chongqing_xinjiang |
+ | 43 | masks - jabbed - idiocy - supremacists - zarah | 43 | 43_masks_jabbed_idiocy_supremacists |
+ | 44 | glaxosmithkline - lawsuits - risperdal - medicare - fraudulent | 43 | 44_glaxosmithkline_lawsuits_risperdal_medicare |
+ | 45 | liberty - montesquieu - tyrannize - karamazov - darkest | 43 | 45_liberty_montesquieu_tyrannize_karamazov |
+ | 46 | kenosha - acquitted - rioters - homicide - baldwin | 43 | 46_kenosha_acquitted_rioters_homicide |
+ | 47 | mealworms - soylent - cannibalism - angelina - snacks | 42 | 47_mealworms_soylent_cannibalism_angelina |
+ | 48 | passports - nhs - nightclubs - swansea - newtownards | 42 | 48_passports_nhs_nightclubs_swansea |
+ | 49 | wokeism - campus - loyola - curriculum - tenured | 40 | 49_wokeism_campus_loyola_curriculum |
+ | 50 | 1984 - orwell - dystopian - totalitarianism - novelist | 40 | 50_1984_orwell_dystopian_totalitarianism |
+ | 51 | cabal - sequel - episodes - laundering - 21 | 40 | 51_cabal_sequel_episodes_laundering |
+ | 52 | fetuses - aborted - phlebotomist - morgue - specimen | 40 | 52_fetuses_aborted_phlebotomist_morgue |
+ | 53 | fednow - cashless - cryptocurrency - cbdc - dollar | 40 | 53_fednow_cashless_cryptocurrency_cbdc |
+ | 54 | lockdown - brits - christmas - brendan - hertfordshire | 40 | 54_lockdown_brits_christmas_brendan |
+ | 55 | defiance - comply - idiocracy - coerced - criminalize | 39 | 55_defiance_comply_idiocracy_coerced |
+ | 56 | soros - hungary - mahathir - societies - geoffrey | 39 | 56_soros_hungary_mahathir_societies |
+ | 57 | macron - france - protest - gendarmes - martinique | 39 | 57_macron_france_protest_gendarmes |
+ | 58 | vaccinated - israel - paxlovid - booster - 043 | 38 | 58_vaccinated_israel_paxlovid_booster |
+ | 59 | surveillance - dahua - beijing - facial - xiaoshun | 38 | 59_surveillance_dahua_beijing_facial |
+ | 60 | pontiff - catholicfactchecking - synodal - denounced - nun | 38 | 60_pontiff_catholicfactchecking_synodal_denounced |
+ | 61 | lebron - shaquille - quarterback - mvp - refused | 38 | 61_lebron_shaquille_quarterback_mvp |
+ | 62 | trudeau - justin - canvassing - polluters - authoritarianism | 38 | 62_trudeau_justin_canvassing_polluters |
+ | 63 | blackrock - vanguard - ceo - billionaire - larry | 37 | 63_blackrock_vanguard_ceo_billionaire |
+ | 64 | lions - dude - link - never - 13k | 37 | 64_lions_dude_link_never |
+ | 65 | omicron - contagious - deadlier - epidemiologist - botswana | 36 | 65_omicron_contagious_deadlier_epidemiologist |
+ | 66 | blockfi - bankruptcy - ponzi - theranos - laundering | 36 | 66_blockfi_bankruptcy_ponzi_theranos |
+ | 67 | zelenskyy - volodymyr - oleksiy - oscar - actors | 35 | 67_zelenskyy_volodymyr_oleksiy_oscar |
+ | 68 | everybody - wailers - liars - depeche - vampiros | 35 | 68_everybody_wailers_liars_depeche |
+ | 69 | pizzagate - pedophile - moloch - videos - hanx | 34 | 69_pizzagate_pedophile_moloch_videos |
+ | 70 | masks - respirators - microplastic - pathogenic - h1n1 | 34 | 70_masks_respirators_microplastic_pathogenic |
+ | 71 | mkultra - brainwashing - torture - lsd - experiments | 34 | 71_mkultra_brainwashing_torture_lsd |
+ | 72 | donation - dude - lions - link - 59k | 33 | 72_donation_dude_lions_link |
+ | 73 | pfizer - bivalent - injections - approved - preschoolers | 33 | 73_pfizer_bivalent_injections_approved |
+ | 74 | footballers - defibrillator - died - dubravka - collapses | 33 | 74_footballers_defibrillator_died_dubravka |
+ | 75 | boris - johnson - population - partygate - shropshire | 32 | 75_boris_johnson_population_partygate |
+ | 76 | donetsk - makiivka - crimea - missiles - timoshenko | 32 | 76_donetsk_makiivka_crimea_missiles |
+ | 77 | pandemic - simulated - countermeasures - mcm - epilogue | 32 | 77_pandemic_simulated_countermeasures_mcm |
+ | 78 | transgender - minors - legislature - prohibit - idaho | 31 | 78_transgender_minors_legislature_prohibit |
+ | 79 | fluoridated - poisons - bisphenols - aspartame - sodas | 31 | 79_fluoridated_poisons_bisphenols_aspartame |
+ | 80 | nantes - protester - macron - march - reform | 31 | 80_nantes_protester_macron_march |
+ | 81 | illuminati - stonehenge - pyramids - pentagram - petrifaction | 31 | 81_illuminati_stonehenge_pyramids_pentagram |
+ | 82 | biden - poroshenko - blackmailing - maxey - laptop | 30 | 82_biden_poroshenko_blackmailing_maxey |
+ | 83 | milley - taiwan - biden - nukes - invade | 30 | 83_milley_taiwan_biden_nukes |
+ | 84 | djokovic - australia - visa - kosovo - deported | 30 | 84_djokovic_australia_visa_kosovo |
+ | 85 | ivermectine - hydroxychloroquine - dazitromicin - monotherapy - dexamethasone | 29 | 85_ivermectine_hydroxychloroquine_dazitromicin_monotherapy |
+ | 86 | vaccinated - deaths - reinfection - lancet - doubly | 28 | 86_vaccinated_deaths_reinfection_lancet |
+ | 87 | vaers - deaths - poisoning - 024414 - chadox | 28 | 87_vaers_deaths_poisoning_024414 |
+ | 88 | antifa - shootings - minneapolis - felon - marshals | 28 | 88_antifa_shootings_minneapolis_felon |
+ | 89 | netanyahu - ahmadinejad - lebanon - sanctions - stephanopoulos | 27 | 89_netanyahu_ahmadinejad_lebanon_sanctions |
+ | 90 | melbourne - enforcers - lockdown - protest - terrorized | 26 | 90_melbourne_enforcers_lockdown_protest |
+ | 91 | astrazeneca - died - jab - stroke - thrombocytopenia | 26 | 91_astrazeneca_died_jab_stroke |
+ | 92 | pfizerleak - whistleblower - falsified - trials - data | 25 | 92_pfizerleak_whistleblower_falsified_trials |
+ | 93 | yanukovych - documentaries - mikhalkov - malaysia - mh17 | 25 | 93_yanukovych_documentaries_mikhalkov_malaysia |
+ | 94 | worldwidedemonstration - lockdown - london - arrest - tyranny | 25 | 94_worldwidedemonstration_lockdown_london_arrest |
+ | 95 | pedophile - busted - sodomizing - wheeler - toddler | 25 | 95_pedophile_busted_sodomizing_wheeler |
+ | 96 | communism - britishfreedom - traitors - covertly - dodd | 24 | 96_communism_britishfreedom_traitors_covertly |
+ | 97 | wikileaks - mediaocracy - julian - hegemonic - pilger | 24 | 97_wikileaks_mediaocracy_julian_hegemonic |
+ | 98 | nasa - moon - landed - destroyed - 1969 | 24 | 98_nasa_moon_landed_destroyed |
+ | 99 | fauci - booster - doses - immunodeficiency - 4th | 24 | 99_fauci_booster_doses_immunodeficiency |
+ | 100 | euthanasia - midazolam - terminally - funeraire - canada | 24 | 100_euthanasia_midazolam_terminally_funeraire |
+ | 101 | inflation - fed - monetary - paul - counterfeit | 24 | 101_inflation_fed_monetary_paul |
+ | 102 | spikevax - myopericarditis - ecg - incidence - adolescents | 23 | 102_spikevax_myopericarditis_ecg_incidence |
+ | 103 | alliedpilots - vaxxed - qantas - firefighter - rescinded | 23 | 103_alliedpilots_vaxxed_qantas_firefighter |
+ | 104 | chemtrails - rainmaker - haarp - stratosphere - hurricanes | 23 | 104_chemtrails_rainmaker_haarp_stratosphere |
+ | 105 | pelosi - democrat - demonize - dianne - insurrectionists | 23 | 105_pelosi_democrat_demonize_dianne |
+ | 106 | lgbtq - headteacher - curriculum - pederasty - safeguarding | 23 | 106_lgbtq_headteacher_curriculum_pederasty |
+ | 107 | biden - inflation - yellen - gouged - trillions | 22 | 107_biden_inflation_yellen_gouged |
+ | 108 | vaccinated - paxlovid - hochul - test - symptoms | 22 | 108_vaccinated_paxlovid_hochul_test |
+ | 109 | australians - tyranny - pilger - federations - suppressed | 22 | 109_australians_tyranny_pilger_federations |
+ | 110 | fauci - smallpox - transmitted - aidsthis - misleading | 22 | 110_fauci_smallpox_transmitted_aidsthis |
+ | 111 | bioweapons - kazakhstan - infowars - pentagon - lavrov | 21 | 111_bioweapons_kazakhstan_infowars_pentagon |
+ | 112 | tranny - kinsey - kulturkampf - patriarchal - bezmenov | 21 | 112_tranny_kinsey_kulturkampf_patriarchal |
+ | 113 | milgram - obey - zimbardo - conformity - experiments | 20 | 113_milgram_obey_zimbardo_conformity |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
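
The topic overview in the README above is heavily skewed: topic 0 holds 6341 of the 11667 training documents, and in BERTopic the ID -1 denotes the outlier topic. The following is a minimal sketch, assuming the table rows are available as Markdown text, of how the IDs and frequencies could be read back and a topic's share of the corpus computed. The helper name `parse_topic_rows` and the three-row sample are illustrative and not part of the commit.

```python
from typing import Dict

# Three rows copied from the README table; the full table has 115 rows.
SAMPLE_ROWS = """\
| -1 | propaganda - fauci - putin - billion - video | 20 | -1_propaganda_fauci_putin_billion |
| 0 | protests - unvaccinated - liberta - genoa - corona | 6341 | 0_protests_unvaccinated_liberta_genoa |
| 1 | israelis - hamas - antisemitism - shulamit - prager | 148 | 1_israelis_hamas_antisemitism_shulamit |
"""

TRAINING_DOCUMENTS = 11667  # "Number of training documents" in the README


def parse_topic_rows(markdown: str) -> Dict[int, int]:
    """Map topic ID to topic frequency (hypothetical helper)."""
    frequencies: Dict[int, int] = {}
    for line in markdown.splitlines():
        # Drop the outer pipes, then split into the four table cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        topic_id, _keywords, frequency, _label = cells
        try:
            frequencies[int(topic_id)] = int(frequency)
        except ValueError:
            # Header and separator rows are not numeric; skip them.
            continue
    return frequencies


frequencies = parse_topic_rows(SAMPLE_ROWS)
share_topic_0 = frequencies[0] / TRAINING_DOCUMENTS
print(f"Topic 0 share of training documents: {share_topic_0:.1%}")
```

With the numbers from the README, topic 0 accounts for a little over half of the corpus, so the remaining 113 regular topics are all small by comparison.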
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
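
JSON has no tuple type, so config.json stores `n_gram_range` as a two-element list, while the README lists it as `(1, 1)`. The sketch below, under the assumption that one wants the saved values back as Python keyword arguments, parses the config text and restores the tuple. The function name `config_to_kwargs` is hypothetical, and whether a given BERTopic version accepts every key in its constructor is not verified here.

```python
import json
from typing import Any, Dict

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(text: str) -> Dict[str, Any]:
    """Parse the config and convert JSON-only types back to Python ones."""
    config = json.loads(text)
    # JSON arrays load as lists; the README documents this value as a tuple.
    config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


kwargs = config_to_kwargs(CONFIG_TEXT)
print(kwargs["n_gram_range"], kwargs["min_topic_size"], kwargs["language"])
```

The JSON literals `true`, `false` and `null` already map to `True`, `False` and `None` on load, which matches the spelling used in the README's hyperparameter list.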
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee0b4fea3e82eb7de71296387dae25b0f0798c49bba432b5901fe6b719d5c440
+ size 1604188
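
The lines added for ctfidf.safetensors are a Git LFS pointer rather than the tensor data itself: the repository records only the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch, assuming the pointer text is available as a string, of how those three fields could be read. `parse_lfs_pointer` is an illustrative helper, not part of any Git LFS tooling.

```python
from typing import Dict, Union

# Pointer contents as added in this commit for ctfidf.safetensors.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ee0b4fea3e82eb7de71296387dae25b0f0798c49bba432b5901fe6b719d5c440
size 1604188
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split each 'key value' line and decode the oid and size fields."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size"])
```

After downloading the actual file, its byte length and SHA-256 digest can be compared with `pointer["size"]` and `pointer["digest"]` to confirm the download is intact.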
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4023f78f70b32a7117438ef1abd5d7c28fd72141dff7da22c4a42dc7ac3823bb
+ size 471136
topics.json ADDED
The diff for this file is too large to render. See raw diff