AngelPanizo committed on
Commit f8a9679 · verified · 1 Parent(s): 3164eab

Add BERTopic model

README.md ADDED
@@ -0,0 +1,156 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_DrTenpenny
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_DrTenpenny")
+
+ topic_model.get_topic_info()
+ ```
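A likely next step after loading is assigning topics to unseen documents. The sketch below is an illustration layered on the README, not part of the commit. `assign_topics` is a hypothetical helper that relies only on BERTopic's `transform` method returning a `(topics, probabilities)` pair. Depending on how the model was saved, `BERTopic.load` may also need an `embedding_model` argument before `transform` can embed new text; that is an assumption to check against the repository.

```python
from typing import Any, List, Tuple


def assign_topics(topic_model: Any, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by the model.

    `topic_model` is expected to expose `transform(docs)` returning
    `(topics, probabilities)`, as a loaded BERTopic model does.
    Topic -1 is the outlier bucket.
    """
    topics, _probs = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]
```

With the model loaded as in the README, `assign_topics(topic_model, ["some new post"])` would return the text alongside its predicted topic ID.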
+
+ ## Topic overview
+
+ * Number of topics: 87
+ * Number of training documents: 11925
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | vaccine - shots - 2021 - biden - anyone | 20 | -1_vaccine_shots_2021_biden |
+ | 0 | transgenders - homosexuals - lgbtq - pedophilia - florida | 7533 | 0_transgenders_homosexuals_lgbtq_pedophilia |
+ | 1 | antivaxxer - immunised - thimerosal - placebo - never | 151 | 1_antivaxxer_immunised_thimerosal_placebo |
+ | 2 | died - athletes - defibrillator - sudden - collapses | 145 | 2_died_athletes_defibrillator_sudden |
+ | 3 | morningcoffee - thisweekwithdrt - podcast - tuesday - preregister | 123 | 3_morningcoffee_thisweekwithdrt_podcast_tuesday |
+ | 4 | criticallythinking - drtanddrp - 7pm - thursday - webinar | 109 | 4_criticallythinking_drtanddrp_7pm_thursday |
+ | 5 | pfizer - miscarriages - rhogam - placenta - nirsevimab | 109 | 5_pfizer_miscarriages_rhogam_placenta |
+ | 6 | hydroxychloroquine - ivermectin - paxlovid - prescribed - midazolam | 108 | 6_hydroxychloroquine_ivermectin_paxlovid_prescribed |
+ | 7 | trudeau - quebec - protesters - convoy - provincial | 96 | 7_trudeau_quebec_protesters_convoy |
+ | 8 | livestream - join - telgram - todays - clouthub | 94 | 8_livestream_join_telgram_todays |
+ | 9 | subscribe - drtenpenny - webinars - emails - breakthroughs | 88 | 9_subscribe_drtenpenny_webinars_emails |
+ | 10 | pcr - sarscov - mullis - tested - swab | 86 | 10_pcr_sarscov_mullis_tested |
+ | 11 | plandemic - bioterror - hysteria - smokescreen - isis | 85 | 11_plandemic_bioterror_hysteria_smokescreen |
+ | 12 | vaers - adenoviruses - guillain - myocarditis - adverse | 83 | 12_vaers_adenoviruses_guillain_myocarditis |
+ | 13 | hurricanes - nexrad - texas - mudslides - geoengineering | 79 | 13_hurricanes_nexrad_texas_mudslides |
+ | 14 | detoxifier - toxins - arsenic - zeolite - antioxidants | 78 | 14_detoxifier_toxins_arsenic_zeolite |
+ | 15 | mypillows - pillowcase - promocode - lindell - towel | 78 | 15_mypillows_pillowcase_promocode_lindell |
+ | 16 | vaxteen - immunization - sanofi - meningococcal - shots | 76 | 16_vaxteen_immunization_sanofi_meningococcal |
+ | 17 | covid - vaccinated - deaths - 2020 - doses | 73 | 17_covid_vaccinated_deaths_2020 |
+ | 18 | episodes - april - clouthub - 2023 - subscribed | 72 | 18_episodes_april_clouthub_2023 |
+ | 19 | aspartame - nutrient - pesticides - cucumbers - gummy | 67 | 19_aspartame_nutrient_pesticides_cucumbers |
+ | 20 | pfizer - whistleblower - ema - 7bn - backtolife_2022 | 64 | 20_pfizer_whistleblower_ema_7bn |
+ | 21 | thisweekwithdrt - podcast - tracy - abortion - whatintheworldaretheyspraying | 61 | 21_thisweekwithdrt_podcast_tracy_abortion |
+ | 22 | prayerfully - tenpenny - sunday - bible - walking | 57 | 22_prayerfully_tenpenny_sunday_bible |
+ | 23 | oecd - biometric - passports - experian - malawi | 56 | 23_oecd_biometric_passports_experian |
+ | 24 | desantis - tallahassee - mandates - veto - deported | 55 | 24_desantis_tallahassee_mandates_veto |
+ | 25 | covidvaccinevictims - died - 2021 - astrazeneca - julieta | 55 | 25_covidvaccinevictims_died_2021_astrazeneca |
+ | 26 | diesel - fuels - vehicletax - chargetax - deere | 55 | 26_diesel_fuels_vehicletax_chargetax |
+ | 27 | brighteontv - 7pm - tina - invited - biopesticide | 55 | 27_brighteontv_7pm_tina_invited |
+ | 28 | twitter - trump - freespeech - misinformation - whistleblowers | 55 | 28_twitter_trump_freespeech_misinformation |
+ | 29 | effects - electromagnetic - cells - fatigue - melatonin | 55 | 29_effects_electromagnetic_cells_fatigue |
+ | 30 | biden - idk - bankster - isolationist - lunatics | 53 | 30_biden_idk_bankster_isolationist |
+ | 31 | hebrews - blessed - psalm - unrighteousness - forevermore | 52 | 31_hebrews_blessed_psalm_unrighteousness |
+ | 32 | eugenics - bgi - anticoronaterrorism - soros - fluvax | 50 | 32_eugenics_bgi_anticoronaterrorism_soros |
+ | 33 | lockdown - australiatimes - france - theeuropenews - tyranny | 50 | 33_lockdown_australiatimes_france_theeuropenews |
+ | 34 | flights - airplanes - jetways - pilots - stewardess | 49 | 34_flights_airplanes_jetways_pilots |
+ | 35 | dollarization - bitcoin - rbi - yuan - cashless | 49 | 35_dollarization_bitcoin_rbi_yuan |
+ | 36 | facemasks - sars - legionella - meningitis - respiratory | 48 | 36_facemasks_sars_legionella_meningitis |
+ | 37 | unvaccinated - mandate - exemptions - employers - mississippi | 47 | 37_unvaccinated_mandate_exemptions_employers |
+ | 38 | ukraine - zelensky - israel - nordstream - ww3 | 47 | 38_ukraine_zelensky_israel_nordstream |
+ | 39 | tenpennyecp - ecpstudioventura - revitalize - counterpulsation - massage | 46 | 39_tenpennyecp_ecpstudioventura_revitalize_counterpulsation |
+ | 40 | myopericarditis - immunization - incidence - 2021the - mrna | 46 | 40_myopericarditis_immunization_incidence_2021the |
+ | 41 | farms - ranchers - meat - iaf_goats - starve | 45 | 41_farms_ranchers_meat_iaf_goats |
+ | 42 | optimune - supplements - quercetin - probiotic - zinc | 45 | 42_optimune_supplements_quercetin_probiotic |
+ | 43 | faucis - anthony - lying - gitmo - depraved | 44 | 43_faucis_anthony_lying_gitmo |
+ | 44 | wonder - sideways - priceless - fools - deeper | 43 | 44_wonder_sideways_priceless_fools |
+ | 45 | vaccines - susceptibility - immunocompromised - bivalent - lentiviral | 42 | 45_vaccines_susceptibility_immunocompromised_bivalent |
+ | 46 | cardiomiracle - nattokinase - endothelium - supplement - stevia | 42 | 46_cardiomiracle_nattokinase_endothelium_supplement |
+ | 47 | tenpennyapparel - promo - sweatshirts - shopped - drt | 41 | 47_tenpennyapparel_promo_sweatshirts_shopped |
+ | 48 | spring2022bootcamp - bushcraft - enrolled - 4wks - coupon | 41 | 48_spring2022bootcamp_bushcraft_enrolled_4wks |
+ | 49 | weforum - davos - corporativism - technocrats - crises | 40 | 49_weforum_davos_corporativism_technocrats |
+ | 50 | vet - nexgard - heartworm - homeopathic - pomeranian | 40 | 50_vet_nexgard_heartworm_homeopathic |
+ | 51 | tyranny - plutocracy - democratically - socrates - mencken | 40 | 51_tyranny_plutocracy_democratically_socrates |
+ | 52 | pastor - tonight - deliverance - prayer - welcome | 40 | 52_pastor_tonight_deliverance_prayer |
+ | 53 | neuroweapons - cyborgs - transhumanist - deepmind - humanrf | 39 | 53_neuroweapons_cyborgs_transhumanist_deepmind |
+ | 54 | russiagate - brainwash - libtards - newsworthy - anon | 39 | 54_russiagate_brainwash_libtards_newsworthy |
+ | 55 | fauci - virologists - sars - funded - institute | 37 | 55_fauci_virologists_sars_funded |
+ | 56 | worldcouncilforhealth - quarantine - sovereignty - ghebreyesus - regulations | 36 | 56_worldcouncilforhealth_quarantine_sovereignty_ghebreyesus |
+ | 57 | webinar - 2021 - salk - shots - childrenshealthdefense | 34 | 57_webinar_2021_salk_shots |
+ | 58 | mealworm - maggots - dragonflies - chitins - tenebrionidae | 33 | 58_mealworm_maggots_dragonflies_chitins |
+ | 59 | mrna - plasmids - adjuvant - marburg - vials | 33 | 59_mrna_plasmids_adjuvant_marburg |
+ | 60 | enlistments - servicemen - vaccinator - compulsory - huachuca | 32 | 60_enlistments_servicemen_vaccinator_compulsory |
+ | 61 | hochul - newyork - subways - allegations - playboy | 32 | 61_hochul_newyork_subways_allegations |
+ | 62 | vaxxed - listenarchives - theliberationstation - sherri - 7pm | 32 | 62_vaxxed_listenarchives_theliberationstation_sherri |
+ | 63 | newsom - corruptifornia - gubernatorial - mandates - gavin | 32 | 63_newsom_corruptifornia_gubernatorial_mandates |
+ | 64 | healing_beyond_pharmaceuticals - allopathic - quack - holistic - gastroenterologist | 32 | 64_healing_beyond_pharmaceuticals_allopathic_quack_holistic |
+ | 65 | vaccinated - sarscov - omicron - subvariant - evolve | 31 | 65_vaccinated_sarscov_omicron_subvariant |
+ | 66 | drtenpenny - subscriiption - platinum - podcast - unlimited | 31 | 66_drtenpenny_subscriiption_platinum_podcast |
+ | 67 | abortionists - womb - born - satanic - greatawakeningchannel | 31 | 67_abortionists_womb_born_satanic |
+ | 68 | tonights - 7pm - thursday - episode - webinar | 31 | 68_tonights_7pm_thursday_episode |
+ | 69 | goevents101 - deadchiropracticsociety - branson - freedom - wellnessmyway | 29 | 69_goevents101_deadchiropracticsociety_branson_freedom |
+ | 70 | happyhourwithdrt - tonight - streaming - welcome - hour | 28 | 70_happyhourwithdrt_tonight_streaming_welcome |
+ | 71 | masked - brainwashed - plague - wears - everywhere | 28 | 71_masked_brainwashed_plague_wears |
+ | 72 | trudeau - canadiancitizens - quarantine - westjet - legault | 27 | 72_trudeau_canadiancitizens_quarantine_westjet |
+ | 73 | 5docsbc - doctors - webinars - 4you - sherri | 27 | 73_5docsbc_doctors_webinars_4you |
+ | 74 | derailments - spill - ohio - nuked - dioxins | 27 | 74_derailments_spill_ohio_nuked |
+ | 75 | webinar - doctors - critically - thursdays - 7pm | 27 | 75_webinar_doctors_critically_thursdays |
+ | 76 | doctors - pharmakeia - persecution - cults - plaintiffs | 27 | 76_doctors_pharmakeia_persecution_cults |
+ | 77 | trump - democrat - ballots - michigan - haley | 26 | 77_trump_democrat_ballots_michigan |
+ | 78 | docuseries - c0vid - whistleblowers - remedies - lifeline | 25 | 78_docuseries_c0vid_whistleblowers_remedies |
+ | 79 | monkeypox - childcovidvaccineinjuriesuk - needlestick - stigmatization - edited | 24 | 79_monkeypox_childcovidvaccineinjuriesuk_needlestick_stigmatization |
+ | 80 | tenpenny - streamed - infowars - symposium - simone | 23 | 80_tenpenny_streamed_infowars_symposium |
+ | 81 | h5n1 - tamiflu - chickens - outbreak - avian | 23 | 81_h5n1_tamiflu_chickens_outbreak |
+ | 82 | justices - overturned - bannon - appellate - prosecute | 22 | 82_justices_overturned_bannon_appellate |
+ | 83 | eschatology - passover - judah - timeline - 6pm | 22 | 83_eschatology_passover_judah_timeline |
+ | 84 | superfoods - smoothie - chlorella - powder - energize | 22 | 84_superfoods_smoothie_chlorella_powder |
+ | 85 | rapist - molestation - sentenced - sodomy - trafficking | 22 | 85_rapist_molestation_sentenced_sodomy |
+
+ </details>
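Each entry in the Label column appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that pattern and computes a topic's share of the training set from the figures in the table. `make_label` and `topic_share` are hypothetical helpers written for illustration, not BERTopic functions, and the share calculation assumes that Topic Frequency counts training documents.

```python
from typing import List

TRAINING_DOCS = 11925  # "Number of training documents" from the README


def make_label(topic_id: int, keywords: List[str], n_words: int = 4) -> str:
    """Join the topic ID and the first `n_words` keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


def topic_share(frequency: int, total: int = TRAINING_DOCS) -> float:
    """Fraction of training documents that fall into one topic."""
    return frequency / total


label = make_label(1, ["antivaxxer", "immunised", "thimerosal", "placebo", "never"])
print(label)                       # 1_antivaxxer_immunised_thimerosal_placebo
print(f"{topic_share(7533):.1%}")  # 63.2% (topic 0)
```

Under that assumption, topic 0 alone holds roughly 63% of the documents, so the remaining topics are each comparatively small.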
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
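To compare a local environment against the framework versions listed above, something like the following could be used. It is a sketch only: `find_mismatches` is a hypothetical helper, and the PyPI distribution names in `EXPECTED` (for example `umap-learn` for UMAP) are assumptions, since the README gives display names rather than package names.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions copied from the README; distribution names are assumed.
EXPECTED: Dict[str, str] = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.3.1",
    "transformers": "4.46.3",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: installed version or 'missing'} for each mismatch."""
    mismatches: Dict[str, str] = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = "missing"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dictionary when every listed package is installed at the listed version. A mismatch does not necessarily mean the model will fail to load.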
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
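The keys in this file match the values listed under "Training hyperparameters" in the README. As a rough sketch of reading them back in Python, the snippet below parses the same JSON and converts `n_gram_range` to a tuple, since JSON has no tuple type and stores it as a list. `load_params` is a hypothetical helper; how BERTopic itself consumes `config.json` is not shown in this commit.

```python
import json
from typing import Any, Dict

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def load_params(text: str) -> Dict[str, Any]:
    """Parse the config and restore n_gram_range as a tuple."""
    params = json.loads(text)
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


params = load_params(CONFIG_TEXT)
print(params["n_gram_range"])    # (1, 1)
print(params["min_topic_size"])  # 10
```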
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d1bbe08095cb3231456f151cac5bee79717f5c9cf8bc30c5e74b67d94a37b8af
+ size 1298192
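The three lines added here are a Git LFS pointer rather than the tensor data itself: a spec version, a SHA-256 object ID, and the size of the real file in bytes. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper that assumes the simple `key value` line layout shown in this diff.

```python
from typing import Dict, Union

# Pointer contents for ctfidf.safetensors, copied from this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d1bbe08095cb3231456f151cac5bee79717f5c9cf8bc30c5e74b67d94a37b8af
size 1298192
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])  # sha256 1298192
```

The `size` field gives the download size of the actual file, about 1.3 MB for `ctfidf.safetensors`.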
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6aac7f91e430417fef6c016f182235ea4042397e39b07079d5332c5366ee82e
+ size 356440
topics.json ADDED
The diff for this file is too large to render. See raw diff