AngelPanizo committed · Commit 8de2d61 · verified · 1 Parent(s): f6b8a95

Add BERTopic model

README.md ADDED
@@ -0,0 +1,175 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_TakeOurCountryBack
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_TakeOurCountryBack")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 106
+ * Number of training documents: 11819
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | vaccine - everyone - freedom - news - ukraine | 20 | -1_vaccine_everyone_freedom_news |
+ | 0 | potus - impeachment - insurrection - capitol - traitors | 7411 | 0_potus_impeachment_insurrection_capitol |
+ | 1 | covidvaccinevictims - vaccinated - injections - astrazeneca - myocarditis | 178 | 1_covidvaccinevictims_vaccinated_injections_astrazeneca |
+ | 2 | desantis - floridians - governor - flgop - tyranny | 146 | 2_desantis_floridians_governor_flgop |
+ | 3 | invested - trading - opciones - miraculously - scammed | 90 | 3_invested_trading_opciones_miraculously |
+ | 4 | awakening - cabal - world - happening - prophecies | 88 | 4_awakening_cabal_world_happening |
+ | 5 | genuinely - thank - changed - respected - saying | 85 | 5_genuinely_thank_changed_respected |
+ | 6 | salvation - satan - righteous - bible - malachi | 84 | 6_salvation_satan_righteous_bible |
+ | 7 | vaticano - orderofmalta - rothschild - judas - femaleilluminati | 82 | 7_vaticano_orderofmalta_rothschild_judas |
+ | 8 | antifa - supremacist - blacks - white - doxxing | 81 | 8_antifa_supremacist_blacks_white |
+ | 9 | protests - lockdowns - austria - tyranny - unvaccinated | 77 | 9_protests_lockdowns_austria_tyranny |
+ | 10 | reinvested - traderz - profit - 7000 - achieved | 77 | 10_reinvested_traderz_profit_7000 |
+ | 11 | legit - scammed - plataforma - invertir - withdrawed | 73 | 11_legit_scammed_plataforma_invertir |
+ | 12 | telegram - facebook - wikileaks - censorship - deleted | 71 | 12_telegram_facebook_wikileaks_censorship |
+ | 13 | 1btc - invesment - 400usd - traded - profit | 71 | 13_1btc_invesment_400usd_traded |
+ | 14 | empowerthepeopleuk - debts - excise - london - zoom | 67 | 14_empowerthepeopleuk_debts_excise_london |
+ | 15 | achieve - succeed - goals - exitosas - mentalidad | 66 | 15_achieve_succeed_goals_exitosas |
+ | 16 | forex - profitable - johnson_investment - started - 2000k | 64 | 16_forex_profitable_johnson_investment_started |
+ | 17 | btcminers - 1btc - investments - trustworthy - ledger | 61 | 17_btcminers_1btc_investments_trustworthy |
+ | 18 | tyranny - patriot - constitution - liberties - jefferson | 59 | 18_tyranny_patriot_constitution_liberties |
+ | 19 | livestream - britain - golding - simon - 6pm | 59 | 19_livestream_britain_golding_simon |
+ | 20 | earn - safeandtrustedearning - wage - 5000 - phone | 59 | 20_earn_safeandtrustedearning_wage_5000 |
+ | 21 | profit_with_iqoption - promo - 5000 - invertir - trade_with_admin_williams | 56 | 21_profit_with_iqoption_promo_5000_invertir |
+ | 22 | legit - invested - agorahttps - 2400 - 5days | 54 | 22_legit_invested_agorahttps_2400 |
+ | 23 | cryptocurrency - invest - micheal_gabage_m - proоfit - trutworthy | 53 | 23_cryptocurrency_invest_micheal_gabage_m_proоfit |
+ | 24 | vaccinating - theukfreedomalliance - leaflets - children_version - open1info | 52 | 24_vaccinating_theukfreedomalliance_leaflets_children_version |
+ | 25 | financially - platform - knewi - appreciatio - sufrimiento | 51 | 25_financially_platform_knewi_appreciatio |
+ | 26 | fruitarian - poison - organic - seeds - borax | 51 | 26_fruitarian_poison_organic_seeds |
+ | 27 | investir - 2000usd - payout - success - jamesmorgan_fx | 51 | 27_investir_2000usd_payout_success |
+ | 28 | qanon - cabal - awakening - insiders - operation | 49 | 28_qanon_cabal_awakening_insiders |
+ | 29 | ʀᴇᴀʟʟʏ - ʀᴇᴄᴇɪᴠᴇᴅ - ʟɪɴᴋ - ʙᴇʟᴏᴡ - ᴡᴏʀᴋꜱ | 48 | 29_ʀᴇᴀʟʟʏ_ʀᴇᴄᴇɪᴠᴇᴅ_ʟɪɴᴋ_ʙᴇʟᴏᴡ |
+ | 30 | thanks - withdrawal - make_wealth_with_ernest_fx011 - 7500 - successful | 47 | 30_thanks_withdrawal_make_wealth_with_ernest_fx011_7500 |
+ | 31 | reignitedemocracyaustralia - canberra - protesters - tyrannised - victoria | 46 | 31_reignitedemocracyaustralia_canberra_protesters_tyrannised |
+ | 32 | pembrokeshire - migrants - barracks - southend - mirfield | 46 | 32_pembrokeshire_migrants_barracks_southend |
+ | 33 | ɪɴᴠᴇsᴛᴇᴅ - ᴘʀᴏғɪᴛ - ʀᴇᴄᴇɪᴠᴇᴅ - ʙᴇʟɪᴇᴠᴇ - sᴄᴀᴍ | 46 | 33_ɪɴᴠᴇsᴛᴇᴅ_ᴘʀᴏғɪᴛ_ʀᴇᴄᴇɪᴠᴇᴅ_ʙᴇʟɪᴇᴠᴇ |
+ | 34 | hanks - tradewithmarcus02 - thankful - trust - prosperous | 45 | 34_hanks_tradewithmarcus02_thankful_trust |
+ | 35 | youtube - liam - subscribers - singing - saturdays | 44 | 35_youtube_liam_subscribers_singing |
+ | 36 | thankful - bills - debt - worth - anymorewhat | 44 | 36_thankful_bills_debt_worth |
+ | 37 | illegals - dover - smugglers - disembarking - border | 44 | 37_illegals_dover_smugglers_disembarking |
+ | 38 | withdrawing - investment - 5000 - appreciate - jordanbelfortfx | 43 | 38_withdrawing_investment_5000_appreciate |
+ | 39 | illegals - citizenship - borders - aipac - patriotic | 42 | 39_illegals_citizenship_borders_aipac |
+ | 40 | wallet - excited - 15000 - 00usdt - profit | 42 | 40_wallet_excited_15000_00usdt |
+ | 41 | happiness - achieved - aims - dhanyavaad - qualities | 41 | 41_happiness_achieved_aims_dhanyavaad |
+ | 42 | bestprofitsignal - winner - mentor - glad - subscribing | 40 | 42_bestprofitsignal_winner_mentor_glad |
+ | 43 | rtpcr - sarscov - tested - swabs - laboratories | 40 | 43_rtpcr_sarscov_tested_swabs |
+ | 44 | covidiot - patients - intubation - nebuliser - hysteria | 39 | 44_covidiot_patients_intubation_nebuliser |
+ | 45 | thanks - 10000 - payout - trusted - usd | 39 | 45_thanks_10000_payout_trusted |
+ | 46 | transexuales - homosexualidad - lgbtqi - bisexuales - indoctrination | 38 | 46_transexuales_homosexualidad_lgbtqi_bisexuales |
+ | 47 | bezos - cash - invest - iifdefcbqvi3ngzk - winnings | 38 | 47_bezos_cash_invest_iifdefcbqvi3ngzk |
+ | 48 | vaxxed - fucks - theyve - mutating - antipafi | 38 | 48_vaxxed_fucks_theyve_mutating |
+ | 49 | thearea51blog - hangar - airspace - ramjet - nuked | 38 | 49_thearea51blog_hangar_airspace_ramjet |
+ | 50 | earn_with_frank_fx - investir - grateful - 10000 - minimum | 38 | 50_earn_with_frank_fx_investir_grateful_10000 |
+ | 51 | trader_alfred - earn - managers - 20000 - testimonies | 37 | 51_trader_alfred_earn_managers_20000 |
+ | 52 | vaccine - pfizer - mrna - janssen - marburg | 36 | 52_vaccine_pfizer_mrna_janssen |
+ | 53 | conspiracies - qaeda - mossad - plane - demolished | 36 | 53_conspiracies_qaeda_mossad_plane |
+ | 54 | impressed - payouts - goodworks - bonus - 7000 | 36 | 54_impressed_payouts_goodworks_bonus |
+ | 55 | epstein - ghislaine - billionaire - molesters - mossad | 35 | 55_epstein_ghislaine_billionaire_molesters |
+ | 56 | invested - profits - genuine - 30000 - link | 34 | 56_invested_profits_genuine_30000 |
+ | 57 | trudeau - convoy - protest - ussergeantnewsnetwork - gofundme | 33 | 57_trudeau_convoy_protest_ussergeantnewsnetwork |
+ | 58 | protests - europeansunited - ourmovement - march - bromley | 33 | 58_protests_europeansunited_ourmovement_march |
+ | 59 | parler - banned - bongino - censorship - dorsey | 32 | 59_parler_banned_bongino_censorship |
+ | 60 | moron - irrelevant - hahahaha - tommy - pissed | 32 | 60_moron_irrelevant_hahahaha_tommy |
+ | 61 | earning - online - jordanbelfortfx - fake - truthful | 32 | 61_earning_online_jordanbelfortfx_fake |
+ | 62 | vaccination - signatories - mandatory - cqc - passport | 31 | 62_vaccination_signatories_mandatory_cqc |
+ | 63 | scam - earn - started - kunalo_steve_trader - 4days | 31 | 63_scam_earn_started_kunalo_steve_trader |
+ | 64 | discussionabout5gtranshumanism - electrosmog - saveusnow - 64ghz - bees | 30 | 64_discussionabout5gtranshumanism_electrosmog_saveusnow_64ghz |
+ | 65 | make_wealth_with_ernest_fx011 - started - traderhelen2 - 5000 - 7days | 29 | 65_make_wealth_with_ernest_fx011_started_traderhelen2_5000 |
+ | 66 | ivermectin - hydroxychloroquine - fenbendazole - suramin - capsules | 29 | 66_ivermectin_hydroxychloroquine_fenbendazole_suramin |
+ | 67 | trader - funds - profits - cryptocurrencies - fake | 28 | 67_trader_funds_profits_cryptocurrencies |
+ | 68 | invested - retorno - 7days - happiness - confiabilidade | 28 | 68_invested_retorno_7days_happiness |
+ | 69 | ballots - smartmatic - reportmaricopa - auditors - subpoenaed | 28 | 69_ballots_smartmatic_reportmaricopa_auditors |
+ | 70 | queensland - kokoda - nationals - palmerston - dictator | 27 | 70_queensland_kokoda_nationals_palmerston |
+ | 71 | fortunately - losing - never - experience - smiling | 26 | 71_fortunately_losing_never_experience |
+ | 72 | vacunadas - nanovaxxines - telefonos - 5g - microantena | 26 | 72_vacunadas_nanovaxxines_telefonos_5g |
+ | 73 | amazing - credibility - invested - winning - 5000 | 26 | 73_amazing_credibility_invested_winning |
+ | 74 | trader_leachfx - fxhoptiontrade - nathaniel_stephenson_fx - withdraw - profit | 25 | 74_trader_leachfx_fxhoptiontrade_nathaniel_stephenson_fx_withdraw |
+ | 75 | deposited - 7days - retorno - test - 3000 | 25 | 75_deposited_7days_retorno_test |
+ | 76 | firearms - rights - tyranny - patriot - amendment | 24 | 76_firearms_rights_tyranny_patriot |
+ | 77 | globaltradinginvestmet - trader_matthias - dmadriana_forex - 5000usd - profit | 24 | 77_globaltradinginvestmet_trader_matthias_dmadriana_forex_5000usd |
+ | 78 | ʙɪᴛᴄᴏɪɴ - ɪɴᴠᴇsᴛᴏʀs - ᴡɪᴛʜᴅʀᴀᴡᴀʟ - ʙᴀɴᴋ - ᴀᴄᴄᴏᴜɴᴛ | 24 | 78_ʙɪᴛᴄᴏɪɴ_ɪɴᴠᴇsᴛᴏʀs_ᴡɪᴛʜᴅʀᴀᴡᴀʟ_ʙᴀɴᴋ |
+ | 79 | constables - bitchute - protest - unlawful - march | 24 | 79_constables_bitchute_protest_unlawful |
+ | 80 | helped - suffering - happier - epiphanies - posts | 24 | 80_helped_suffering_happier_epiphanies |
+ | 81 | believe - everyone - help - whatsapp - greatest | 23 | 81_believe_everyone_help_whatsapp |
+ | 82 | traded - usdt - hello - 12hours - payout | 23 | 82_traded_usdt_hello_12hours |
+ | 83 | russie - putin - armata - afganistan - stakhovsky | 23 | 83_russie_putin_armata_afganistan |
+ | 84 | invested - earn - recommendhttps - 700usd - 3days | 23 | 84_invested_earn_recommendhttps_700usd |
+ | 85 | thegreatclimatecon - alarmists - droughts - chemtrail - earthquakes | 23 | 85_thegreatclimatecon_alarmists_droughts_chemtrail |
+ | 86 | earn - online - trabajar - telefono - scam | 23 | 86_earn_online_trabajar_telefono |
+ | 87 | trading - 5days - payout - best - return | 23 | 87_trading_5days_payout_best |
+ | 88 | covidpositivenews - alarmism - nazis - awakened - horizons | 22 | 88_covidpositivenews_alarmism_nazis_awakened |
+ | 89 | slaves - jew - colonies - cromwell - mfecane | 22 | 89_slaves_jew_colonies_cromwell |
+ | 90 | jews - hitler - freikorps - bolsheviks - 1918 | 22 | 90_jews_hitler_freikorps_bolsheviks |
+ | 91 | crytotrading - earnfromhomejob808 - forex_morrison - profits - best | 22 | 91_crytotrading_earnfromhomejob808_forex_morrison_profits |
+ | 92 | huaxia - manchurian - mongolia - hebei - wang | 22 | 92_huaxia_manchurian_mongolia_hebei |
+ | 93 | smiling - glad - positively - beneficial - steps | 22 | 93_smiling_glad_positively_beneficial |
+ | 94 | common - rules - stupidity - organised - rubbish | 22 | 94_common_rules_stupidity_organised |
+ | 95 | taiwan - fujian - missiles - aeronaval - hypersonic | 21 | 95_taiwan_fujian_missiles_aeronaval |
+ | 96 | vacunadas - mascarilla - nuestra - muertes - efectos | 21 | 96_vacunadas_mascarilla_nuestra_muertes |
+ | 97 | lockdown - authorities - repeal - compulsory - whitehall | 21 | 97_lockdown_authorities_repeal_compulsory |
+ | 98 | femaleilluminati - trans - masculinisee - orderofmalta - papacy | 21 | 98_femaleilluminati_trans_masculinisee_orderofmalta |
+ | 99 | sievierodonetsk - missiles - crimea - mikhailovka - bayraktar | 21 | 99_sievierodonetsk_missiles_crimea_mikhailovka |
+ | 100 | scam - earned - referred - sharing - link | 21 | 100_scam_earned_referred_sharing |
+ | 101 | gigawatts - baseload - renewable - generators - fukushima | 21 | 101_gigawatts_baseload_renewable_generators |
+ | 102 | payment - manager - wow - myj - boost | 20 | 102_payment_manager_wow_myj |
+ | 103 | thank - debt - saved - deserves - great | 20 | 103_thank_debt_saved_deserves |
+ | 104 | republicans - impeachment - cheney - treacherous - ronna | 20 | 104_republicans_impeachment_cheney_treacherous |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
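In the README table above, each `Label` appears to be the topic ID followed by the first four keywords, joined with underscores, and the frequencies can be read against the 11819 training documents. The sketch below checks both observations on three rows copied from the table. The helper names `build_label` and `share` are hypothetical, and the label rule is inferred from the table rather than taken from BERTopic's documentation.

```python
# Rows copied from the README table: (topic ID, keywords, frequency, label).
ROWS = [
    (-1, "vaccine - everyone - freedom - news - ukraine", 20,
     "-1_vaccine_everyone_freedom_news"),
    (0, "potus - impeachment - insurrection - capitol - traitors", 7411,
     "0_potus_impeachment_insurrection_capitol"),
    (16, "forex - profitable - johnson_investment - started - 2000k", 64,
     "16_forex_profitable_johnson_investment_started"),
]
TOTAL_DOCS = 11819  # "Number of training documents" in the README


def build_label(topic_id, keywords):
    """Join the topic ID and the first four keywords with underscores."""
    words = keywords.split(" - ")
    return "_".join([str(topic_id)] + words[:4])


def share(frequency, total=TOTAL_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for topic_id, keywords, frequency, label in ROWS:
    # The inferred rule reproduces the label stored in the table.
    assert build_label(topic_id, keywords) == label
    print(f"{label}: {share(frequency):.1%} of documents")
```

Because some keywords contain underscores themselves (`johnson_investment` in topic 16), a label cannot be split back into its keywords reliably; the `Topic Keywords` column is the safer source.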
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
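The README lists `n_gram_range` as `(1, 1)`, while `config.json` stores it as the array `[1, 1]`, since JSON has no tuple type. A minimal sketch of reading the file back into Python values follows. The config text is copied from the diff above; the helper name `load_hyperparameters` is hypothetical and is not part of BERTopic.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def load_hyperparameters(text):
    """Parse the config and restore the tuple form of n_gram_range."""
    params = json.loads(text)
    # JSON arrays decode to lists; convert back to the (min, max) tuple
    # shown in the README.
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


params = load_hyperparameters(CONFIG_TEXT)
print(params["n_gram_range"], params["min_topic_size"], params["language"])
```

JSON `null`, `true`, and `false` decode to Python `None`, `True`, and `False`, which matches the values in the README's "Training hyperparameters" list.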
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd6f11f8ac361ec03de7403f193be417de861732954eeba00ec70779575ee0e3
+ size 1378788
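The three lines above are a Git LFS pointer, not the tensor data itself: they record the spec version, a SHA-256 object ID, and the size in bytes of the real file. As a rough sketch, a pointer like this can be parsed and used to check a downloaded file. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check assumes the whole file fits in memory.

```python
import hashlib

# Pointer text for ctfidf.safetensors, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:cd6f11f8ac361ec03de7403f193be417de861732954eeba00ec70779575ee0e3
size 1378788
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and hash the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algorithm"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

With the real file on disk, `matches_pointer(open(path, "rb").read(), pointer)` would confirm that the download corresponds to this commit's object.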
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9809e15c79ad877fa6b533daf0cb04c445f4edb2070e66e0cdc9a7047622f26a
+ size 434272
topics.json ADDED
The diff for this file is too large to render. See raw diff