---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# MARTINI_enrich_BERTopic_Middle_East_Spectator

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_Middle_East_Spectator")

topic_model.get_topic_info()
```

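Beyond listing topics, a loaded model can also assign topics to new text. The helper below is a minimal sketch and not part of the original instructions for this model: `assign_topics` and its `n_words` parameter are hypothetical names, and it assumes the loaded model is able to embed new documents (an embedding model must be available after loading). It relies only on BERTopic's `transform` and `get_topic` methods.

```python
def assign_topics(topic_model, docs, n_words=5):
    """Return (document, topic ID, top keywords) for each document.

    `topic_model` is expected to behave like a fitted BERTopic instance.
    """
    # transform() returns the predicted topic IDs and their probabilities.
    topics, _ = topic_model.transform(docs)

    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic() yields (word, score) pairs, or False for an unknown ID.
        words = topic_model.get_topic(topic_id) or []
        results.append((doc, topic_id, [word for word, _ in words[:n_words]]))
    return results


# Example call (needs the model loaded as shown above):
# for doc, topic_id, keywords in assign_topics(topic_model, ["Tanker seized in the strait"]):
#     print(topic_id, keywords)
```

A topic ID of `-1` in the result means the document was treated as an outlier rather than placed in one of the numbered topics.
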
## Topic overview

* Number of topics: 34 (including the outlier topic `-1`)
* Number of training documents: 3345

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | hezbollah - khamenei - gaza - jerusalem - missiles | 20 | -1_hezbollah_khamenei_gaza_jerusalem |
| 0 | iraqi - assad - airstrikes - faction - ismael | 1543 | 0_iraqi_assad_airstrikes_faction |
| 1 | palestine - nasrallah - jihad - martyrs - victory | 194 | 1_palestine_nasrallah_jihad_martyrs |
| 2 | channel - news - subscribers - censored - unbiased | 166 | 2_channel_news_subscribers_censored |
| 3 | gaza - israelis - egyptians - suez - libyan | 126 | 3_gaza_israelis_egyptians_suez |
| 4 | netanyahu - hezbollah - galilee - yoav - threatens | 103 | 4_netanyahu_hezbollah_galilee_yoav |
| 5 | karabakh - nakhchivan - azerbaijani - armenians - yerevan | 98 | 5_karabakh_nakhchivan_azerbaijani_armenians |
| 6 | yemen - warship - missiles - maersk - gulf | 92 | 6_yemen_warship_missiles_maersk |
| 7 | saudi - khamenei - bahrain - faisal - zayed | 86 | 7_saudi_khamenei_bahrain_faisal |
| 8 | gaza - civilians - bombed - younis - tunnels | 83 | 8_gaza_civilians_bombed_younis |
| 9 | iran - abdollahian - ambassador - doha - sanctioned | 74 | 9_iran_abdollahian_ambassador_doha |
| 10 | missiles - iran - hypersonic - baikonur - launched | 69 | 10_missiles_iran_hypersonic_baikonur |
| 11 | kadyrov - voronezh - gerasimov - regiment - massacre | 54 | 11_kadyrov_voronezh_gerasimov_regiment |
| 12 | hezbollah - drone - haifa - launched - zabdin | 47 | 12_hezbollah_drone_haifa_launched |
| 13 | soleimani - mousavi - terrorists - damascus - martyred | 44 | 13_soleimani_mousavi_terrorists_damascus |
| 14 | iranians - hamedan - allahu - كير - الاحرار | 43 | 14_iranians_hamedan_allahu_كير |
| 15 | hezbollah - khirbet - ramyeh - ambush - sites | 41 | 15_hezbollah_khirbet_ramyeh_ambush |
| 16 | sukhoi - drone - turbojet - squadrons - tatarstan | 35 | 16_sukhoi_drone_turbojet_squadrons |
| 17 | belgorod - zaporizhia - volodymyr - sevastopol - reinforcements | 33 | 17_belgorod_zaporizhia_volodymyr_sevastopol |
| 18 | hezbollah - mujahid - hassan - haidar - martyrs | 31 | 18_hezbollah_mujahid_hassan_haidar |
| 19 | gaza - airstrikes - ashkelon - merkava - update | 31 | 19_gaza_airstrikes_ashkelon_merkava |
| 20 | biden - netanyahu - cnn - ayatollahs - interviewed | 31 | 20_biden_netanyahu_cnn_ayatollahs |
| 21 | jew - אילן - kanye - musk - goddamn | 28 | 21_jew_אילן_kanye_musk |
| 22 | khuzestan - khorramabad - taliban - armored - irgc | 28 | 22_khuzestan_khorramabad_taliban_armored |
| 23 | intifada - qassam - hamza - spokesman - obeida | 28 | 23_intifada_qassam_hamza_spokesman |
| 24 | iran - balochistan - terrorists - chabahar - mashhad | 27 | 24_iran_balochistan_terrorists_chabahar |
| 25 | iran - nuclear - cnn - retaliate - notified | 26 | 25_iran_nuclear_cnn_retaliate |
| 26 | hezbollah - katyusha - missiles - barrage - baalbek | 26 | 26_hezbollah_katyusha_missiles_barrage |
| 27 | donald - democrats - newsom - ballot - dumbest | 25 | 27_donald_democrats_newsom_ballot |
| 28 | lebanon - gunfire - outposts - idf - tanks | 24 | 28_lebanon_gunfire_outposts_idf |
| 29 | russia - attackers - bryansk - kiev - terrorist | 24 | 29_russia_attackers_bryansk_kiev |
| 30 | hormuz - tankers - strait - submarine - seized | 23 | 30_hormuz_tankers_strait_submarine |
| 31 | gaza - donate - inshaallah - subscribers - delivered | 21 | 31_gaza_donate_inshaallah_subscribers |
| 32 | palestinian - ashkelon - gunfights - infiltrators - rohovot | 21 | 32_palestinian_ashkelon_gunfights_infiltrators |

</details>

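The labels in the topic table follow the pattern `<topic ID>_<keyword>_<keyword>_…`, and the frequencies add up to the 3345 training documents. The sketch below shows one way to split a label and to express a frequency as a share of the corpus. `parse_label` and `topic_share` are hypothetical helper names, and the parsing assumes single-word keywords without underscores.

```python
def parse_label(label):
    """Split a label such as '6_yemen_warship_missiles_maersk' into (ID, keywords)."""
    # Only the first underscore separates the ID from the keywords.
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


def topic_share(frequency, total_documents=3345):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total_documents


topic_id, keywords = parse_label("-1_hezbollah_khamenei_gaza_jerusalem")
print(topic_id, keywords)          # -1 ['hezbollah', 'khamenei', 'gaza', 'jerusalem']
print(f"{topic_share(1543):.1%}")  # 46.1%
```

Topic 0 alone therefore covers a little under half of the training documents, while the outlier topic `-1` holds only 20 of them.
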
## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

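The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, they could be passed as shown below. This card does not record the embedding model or the UMAP, HDBSCAN and vectorizer settings, so `build_topic_model` (a hypothetical helper) takes the embedding model as an argument and leaves the remaining components at their defaults. Fitting a model built this way should not be expected to reproduce the exact topics listed in this card.

```python
# Settings copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(embedding_model):
    """Create an unfitted BERTopic model with the settings listed above.

    `embedding_model` is an assumption: the card does not say which one was used.
    """
    # Imported lazily so the settings can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```
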
## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.5.2
* Sentence-transformers: 3.3.1
* Transformers: 4.46.3
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.10.12

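If loading the model fails or behaves unexpectedly, a version mismatch is one possible cause. The following sketch compares the installed packages with the versions listed above, using only the standard library. The distribution names (for example `umap-learn` for UMAP) are assumed PyPI names, `find_version_mismatches` is a hypothetical helper, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by assumed PyPI name.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.3.1",
    "transformers": "4.46.3",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS):
    """Map each differing package to (expected, installed); installed is None if missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in find_version_mismatches().items():
    print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

A different version is not necessarily a problem; the check only highlights where the local environment departs from the one used for training.
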