Add BERTopic model
- README.md +156 -0
- config.json +16 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,156 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# MARTINI_enrich_BERTopic_DrTenpenny

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_DrTenpenny")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 87
* Number of training documents: 11925

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | vaccine - shots - 2021 - biden - anyone | 20 | -1_vaccine_shots_2021_biden |
| 0 | transgenders - homosexuals - lgbtq - pedophilia - florida | 7533 | 0_transgenders_homosexuals_lgbtq_pedophilia |
| 1 | antivaxxer - immunised - thimerosal - placebo - never | 151 | 1_antivaxxer_immunised_thimerosal_placebo |
| 2 | died - athletes - defibrillator - sudden - collapses | 145 | 2_died_athletes_defibrillator_sudden |
| 3 | morningcoffee - thisweekwithdrt - podcast - tuesday - preregister | 123 | 3_morningcoffee_thisweekwithdrt_podcast_tuesday |
| 4 | criticallythinking - drtanddrp - 7pm - thursday - webinar | 109 | 4_criticallythinking_drtanddrp_7pm_thursday |
| 5 | pfizer - miscarriages - rhogam - placenta - nirsevimab | 109 | 5_pfizer_miscarriages_rhogam_placenta |
| 6 | hydroxychloroquine - ivermectin - paxlovid - prescribed - midazolam | 108 | 6_hydroxychloroquine_ivermectin_paxlovid_prescribed |
| 7 | trudeau - quebec - protesters - convoy - provincial | 96 | 7_trudeau_quebec_protesters_convoy |
| 8 | livestream - join - telgram - todays - clouthub | 94 | 8_livestream_join_telgram_todays |
| 9 | subscribe - drtenpenny - webinars - emails - breakthroughs | 88 | 9_subscribe_drtenpenny_webinars_emails |
| 10 | pcr - sarscov - mullis - tested - swab | 86 | 10_pcr_sarscov_mullis_tested |
| 11 | plandemic - bioterror - hysteria - smokescreen - isis | 85 | 11_plandemic_bioterror_hysteria_smokescreen |
| 12 | vaers - adenoviruses - guillain - myocarditis - adverse | 83 | 12_vaers_adenoviruses_guillain_myocarditis |
| 13 | hurricanes - nexrad - texas - mudslides - geoengineering | 79 | 13_hurricanes_nexrad_texas_mudslides |
| 14 | detoxifier - toxins - arsenic - zeolite - antioxidants | 78 | 14_detoxifier_toxins_arsenic_zeolite |
| 15 | mypillows - pillowcase - promocode - lindell - towel | 78 | 15_mypillows_pillowcase_promocode_lindell |
| 16 | vaxteen - immunization - sanofi - meningococcal - shots | 76 | 16_vaxteen_immunization_sanofi_meningococcal |
| 17 | covid - vaccinated - deaths - 2020 - doses | 73 | 17_covid_vaccinated_deaths_2020 |
| 18 | episodes - april - clouthub - 2023 - subscribed | 72 | 18_episodes_april_clouthub_2023 |
| 19 | aspartame - nutrient - pesticides - cucumbers - gummy | 67 | 19_aspartame_nutrient_pesticides_cucumbers |
| 20 | pfizer - whistleblower - ema - 7bn - backtolife_2022 | 64 | 20_pfizer_whistleblower_ema_7bn |
| 21 | thisweekwithdrt - podcast - tracy - abortion - whatintheworldaretheyspraying | 61 | 21_thisweekwithdrt_podcast_tracy_abortion |
| 22 | prayerfully - tenpenny - sunday - bible - walking | 57 | 22_prayerfully_tenpenny_sunday_bible |
| 23 | oecd - biometric - passports - experian - malawi | 56 | 23_oecd_biometric_passports_experian |
| 24 | desantis - tallahassee - mandates - veto - deported | 55 | 24_desantis_tallahassee_mandates_veto |
| 25 | covidvaccinevictims - died - 2021 - astrazeneca - julieta | 55 | 25_covidvaccinevictims_died_2021_astrazeneca |
| 26 | diesel - fuels - vehicletax - chargetax - deere | 55 | 26_diesel_fuels_vehicletax_chargetax |
| 27 | brighteontv - 7pm - tina - invited - biopesticide | 55 | 27_brighteontv_7pm_tina_invited |
| 28 | twitter - trump - freespeech - misinformation - whistleblowers | 55 | 28_twitter_trump_freespeech_misinformation |
| 29 | effects - electromagnetic - cells - fatigue - melatonin | 55 | 29_effects_electromagnetic_cells_fatigue |
| 30 | biden - idk - bankster - isolationist - lunatics | 53 | 30_biden_idk_bankster_isolationist |
| 31 | hebrews - blessed - psalm - unrighteousness - forevermore | 52 | 31_hebrews_blessed_psalm_unrighteousness |
| 32 | eugenics - bgi - anticoronaterrorism - soros - fluvax | 50 | 32_eugenics_bgi_anticoronaterrorism_soros |
| 33 | lockdown - australiatimes - france - theeuropenews - tyranny | 50 | 33_lockdown_australiatimes_france_theeuropenews |
| 34 | flights - airplanes - jetways - pilots - stewardess | 49 | 34_flights_airplanes_jetways_pilots |
| 35 | dollarization - bitcoin - rbi - yuan - cashless | 49 | 35_dollarization_bitcoin_rbi_yuan |
| 36 | facemasks - sars - legionella - meningitis - respiratory | 48 | 36_facemasks_sars_legionella_meningitis |
| 37 | unvaccinated - mandate - exemptions - employers - mississippi | 47 | 37_unvaccinated_mandate_exemptions_employers |
| 38 | ukraine - zelensky - israel - nordstream - ww3 | 47 | 38_ukraine_zelensky_israel_nordstream |
| 39 | tenpennyecp - ecpstudioventura - revitalize - counterpulsation - massage | 46 | 39_tenpennyecp_ecpstudioventura_revitalize_counterpulsation |
| 40 | myopericarditis - immunization - incidence - 2021the - mrna | 46 | 40_myopericarditis_immunization_incidence_2021the |
| 41 | farms - ranchers - meat - iaf_goats - starve | 45 | 41_farms_ranchers_meat_iaf_goats |
| 42 | optimune - supplements - quercetin - probiotic - zinc | 45 | 42_optimune_supplements_quercetin_probiotic |
| 43 | faucis - anthony - lying - gitmo - depraved | 44 | 43_faucis_anthony_lying_gitmo |
| 44 | wonder - sideways - priceless - fools - deeper | 43 | 44_wonder_sideways_priceless_fools |
| 45 | vaccines - susceptibility - immunocompromised - bivalent - lentiviral | 42 | 45_vaccines_susceptibility_immunocompromised_bivalent |
| 46 | cardiomiracle - nattokinase - endothelium - supplement - stevia | 42 | 46_cardiomiracle_nattokinase_endothelium_supplement |
| 47 | tenpennyapparel - promo - sweatshirts - shopped - drt | 41 | 47_tenpennyapparel_promo_sweatshirts_shopped |
| 48 | spring2022bootcamp - bushcraft - enrolled - 4wks - coupon | 41 | 48_spring2022bootcamp_bushcraft_enrolled_4wks |
| 49 | weforum - davos - corporativism - technocrats - crises | 40 | 49_weforum_davos_corporativism_technocrats |
| 50 | vet - nexgard - heartworm - homeopathic - pomeranian | 40 | 50_vet_nexgard_heartworm_homeopathic |
| 51 | tyranny - plutocracy - democratically - socrates - mencken | 40 | 51_tyranny_plutocracy_democratically_socrates |
| 52 | pastor - tonight - deliverance - prayer - welcome | 40 | 52_pastor_tonight_deliverance_prayer |
| 53 | neuroweapons - cyborgs - transhumanist - deepmind - humanrf | 39 | 53_neuroweapons_cyborgs_transhumanist_deepmind |
| 54 | russiagate - brainwash - libtards - newsworthy - anon | 39 | 54_russiagate_brainwash_libtards_newsworthy |
| 55 | fauci - virologists - sars - funded - institute | 37 | 55_fauci_virologists_sars_funded |
| 56 | worldcouncilforhealth - quarantine - sovereignty - ghebreyesus - regulations | 36 | 56_worldcouncilforhealth_quarantine_sovereignty_ghebreyesus |
| 57 | webinar - 2021 - salk - shots - childrenshealthdefense | 34 | 57_webinar_2021_salk_shots |
| 58 | mealworm - maggots - dragonflies - chitins - tenebrionidae | 33 | 58_mealworm_maggots_dragonflies_chitins |
| 59 | mrna - plasmids - adjuvant - marburg - vials | 33 | 59_mrna_plasmids_adjuvant_marburg |
| 60 | enlistments - servicemen - vaccinator - compulsory - huachuca | 32 | 60_enlistments_servicemen_vaccinator_compulsory |
| 61 | hochul - newyork - subways - allegations - playboy | 32 | 61_hochul_newyork_subways_allegations |
| 62 | vaxxed - listenarchives - theliberationstation - sherri - 7pm | 32 | 62_vaxxed_listenarchives_theliberationstation_sherri |
| 63 | newsom - corruptifornia - gubernatorial - mandates - gavin | 32 | 63_newsom_corruptifornia_gubernatorial_mandates |
| 64 | healing_beyond_pharmaceuticals - allopathic - quack - holistic - gastroenterologist | 32 | 64_healing_beyond_pharmaceuticals_allopathic_quack_holistic |
| 65 | vaccinated - sarscov - omicron - subvariant - evolve | 31 | 65_vaccinated_sarscov_omicron_subvariant |
| 66 | drtenpenny - subscriiption - platinum - podcast - unlimited | 31 | 66_drtenpenny_subscriiption_platinum_podcast |
| 67 | abortionists - womb - born - satanic - greatawakeningchannel | 31 | 67_abortionists_womb_born_satanic |
| 68 | tonights - 7pm - thursday - episode - webinar | 31 | 68_tonights_7pm_thursday_episode |
| 69 | goevents101 - deadchiropracticsociety - branson - freedom - wellnessmyway | 29 | 69_goevents101_deadchiropracticsociety_branson_freedom |
| 70 | happyhourwithdrt - tonight - streaming - welcome - hour | 28 | 70_happyhourwithdrt_tonight_streaming_welcome |
| 71 | masked - brainwashed - plague - wears - everywhere | 28 | 71_masked_brainwashed_plague_wears |
| 72 | trudeau - canadiancitizens - quarantine - westjet - legault | 27 | 72_trudeau_canadiancitizens_quarantine_westjet |
| 73 | 5docsbc - doctors - webinars - 4you - sherri | 27 | 73_5docsbc_doctors_webinars_4you |
| 74 | derailments - spill - ohio - nuked - dioxins | 27 | 74_derailments_spill_ohio_nuked |
| 75 | webinar - doctors - critically - thursdays - 7pm | 27 | 75_webinar_doctors_critically_thursdays |
| 76 | doctors - pharmakeia - persecution - cults - plaintiffs | 27 | 76_doctors_pharmakeia_persecution_cults |
| 77 | trump - democrat - ballots - michigan - haley | 26 | 77_trump_democrat_ballots_michigan |
| 78 | docuseries - c0vid - whistleblowers - remedies - lifeline | 25 | 78_docuseries_c0vid_whistleblowers_remedies |
| 79 | monkeypox - childcovidvaccineinjuriesuk - needlestick - stigmatization - edited | 24 | 79_monkeypox_childcovidvaccineinjuriesuk_needlestick_stigmatization |
| 80 | tenpenny - streamed - infowars - symposium - simone | 23 | 80_tenpenny_streamed_infowars_symposium |
| 81 | h5n1 - tamiflu - chickens - outbreak - avian | 23 | 81_h5n1_tamiflu_chickens_outbreak |
| 82 | justices - overturned - bannon - appellate - prosecute | 22 | 82_justices_overturned_bannon_appellate |
| 83 | eschatology - passover - judah - timeline - 6pm | 22 | 83_eschatology_passover_judah_timeline |
| 84 | superfoods - smoothie - chlorella - powder - energize | 22 | 84_superfoods_smoothie_chlorella_powder |
| 85 | rapist - molestation - sentenced - sodomy - trafficking | 22 | 85_rapist_molestation_sentenced_sodomy |

</details>
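
The frequency column is easier to interpret next to the training-document count given in the overview. The sketch below is only an illustration: it parses three rows copied from the table and computes each topic's share of the 11925 training documents. The helper `parse_topic_rows` and the constant names are hypothetical and are not part of BERTopic. Topic `-1` is the id BERTopic uses for outlier documents.

```python
import re

# "Number of training documents" as stated in the model card.
TRAINING_DOCS = 11925

# A few rows copied verbatim from the topic table.
SAMPLE_ROWS = """\
| -1 | vaccine - shots - 2021 - biden - anyone | 20 | -1_vaccine_shots_2021_biden |
| 1 | antivaxxer - immunised - thimerosal - placebo - never | 151 | 1_antivaxxer_immunised_thimerosal_placebo |
| 2 | died - athletes - defibrillator - sudden - collapses | 145 | 2_died_athletes_defibrillator_sudden |
"""


def parse_topic_rows(markdown: str, total_docs: int = TRAINING_DOCS) -> list[dict]:
    """Hypothetical helper: turn table rows into dicts with a document share."""
    rows = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header, the separator row, and anything that is not a data row.
        if len(cells) != 4 or not re.fullmatch(r"-?\d+", cells[0]):
            continue
        frequency = int(cells[2])
        rows.append({
            "topic_id": int(cells[0]),
            "keywords": cells[1].split(" - "),
            "frequency": frequency,
            "label": cells[3],
            "share": frequency / total_docs,
        })
    return rows


for row in parse_topic_rows(SAMPLE_ROWS):
    print(f"{row['topic_id']:>3}  {row['frequency']:>5}  {row['share']:.2%}  {row['label']}")
```

The same function should work on the full table text, since the header and separator rows are skipped by the integer check on the first cell.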

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.5.2
* Sentence-transformers: 3.3.1
* Transformers: 4.46.3
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.10.12
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
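
The values in `config.json` mirror the "Training hyperparameters" list in the README, with one difference: JSON has no tuple type, so `n_gram_range` is stored as a two-element array. A minimal sketch of reading the file back into Python keyword arguments follows. The helper name `config_to_kwargs` is hypothetical, and passing the result to the `BERTopic` constructor assumes the installed version accepts all of these parameter names.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(raw: str) -> dict:
    """Hypothetical helper: decode the config and restore Python-native types."""
    config = json.loads(raw)
    # The README reports n_gram_range as (1, 1); JSON can only store it as a list.
    config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"], kwargs["min_topic_size"])

# Assumption: with bertopic installed, an untrained model with the same
# settings could then be created along these lines:
#   from bertopic import BERTopic
#   topic_model = BERTopic(**kwargs)
```

This only recreates the settings. The fitted topics come from the other files in the commit and are restored by `BERTopic.load`.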

ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1bbe08095cb3231456f151cac5bee79717f5c9cf8bc30c5e74b67d94a37b8af
size 1298192
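
The three added lines are a Git LFS pointer, not the tensor data itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of parsing such a pointer is shown below. `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of Git LFS or any Hugging Face library.

```python
# Pointer text copied from the ctfidf.safetensors diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d1bbe08095cb3231456f151cac5bee79717f5c9cf8bc30c5e74b67d94a37b8af
size 1298192
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a Git LFS pointer into its fields."""
    # Each non-empty line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the actual file, the digest could be compared against `hashlib.sha256` of its bytes to confirm the download matches the pointer.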

ctfidf_config.json
ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6aac7f91e430417fef6c016f182235ea4042397e39b07079d5332c5366ee82e
size 356440

topics.json
ADDED
The diff for this file is too large to render.