Add BERTopic model
- README.md +131 -0
- config.json +17 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,131 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# BERTopic_andattakstruk_2

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

# Load the model from the Hugging Face Hub repository
topic_model = BERTopic.load("GiganticLemon/BERTopic_andattakstruk_2")

# Show a summary table with one row per topic
topic_model.get_topic_info()
```

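Beyond inspecting the topic table, a loaded model can also assign topics to unseen text. The following is a minimal sketch, assuming `transform` returns topic IDs and probabilities in that order, as in current BERTopic releases. The helper names `assign_topics` and `load_model` are hypothetical and not part of BERTopic; `load_model` imports the library lazily so that the helper itself has no hard dependency.

```python
def assign_topics(topic_model, docs: list[str]) -> list[tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted BERTopic model."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def load_model(repo_id: str = "GiganticLemon/BERTopic_andattakstruk_2"):
    """Load the published model (hypothetical helper; requires `pip install -U bertopic`)."""
    from bertopic import BERTopic  # imported lazily so assign_topics works without it

    return BERTopic.load(repo_id)
```

With BERTopic installed and network access available, `assign_topics(load_model(), ["The Doctor steps out of the TARDIS."])` would return the document paired with its predicted topic ID; a result of `-1` marks an outlier.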
## Topic overview

* Number of topics: 62
* Number of training documents: 16559

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | the - and - to - of - in | 21 | -1_the_and_to_of |
| 0 | the - to - of - and - is | 8983 | 0_the_to_of_and |
| 1 | the - to - that - he - and | 1232 | 1_the_to_that_he |
| 2 | her - she - to - and - is | 605 | 2_her_she_to_and |
| 3 | and - the - of - to - in | 506 | 3_and_the_of_to |
| 4 | the - of - earth - to - and | 473 | 4_the_of_earth_to |
| 5 | the - and - to - he - his | 459 | 5_the_and_to_he |
| 6 | the - and - to - of - ship | 416 | 6_the_and_to_of |
| 7 | the - to - of - and - his | 370 | 7_the_to_of_and |
| 8 | de - his - he - to - the | 306 | 8_de_his_he_to |
| 9 | her - she - to - and - is | 192 | 9_her_she_to_and |
| 10 | chinese - the - and - of - to | 160 | 10_chinese_the_and_of |
| 11 | the - president - soviet - of - us | 150 | 11_the_president_soviet_of |
| 12 | russian - the - his - to - of | 145 | 12_russian_the_his_to |
| 13 | asterix - roman - obelix - the - rome | 141 | 13_asterix_roman_obelix_the |
| 14 | doctor - tardis - the - ace - to | 140 | 14_doctor_tardis_the_ace |
| 15 | of - that - the - in - or | 138 | 15_of_that_the_in |
| 16 | socrates - theseus - the - of - and | 130 | 16_socrates_theseus_the_of |
| 17 | vampire - vampires - darren - sookie - to | 111 | 17_vampire_vampires_darren_sookie |
| 18 | kirk - enterprise - spock - federation - klingon | 111 | 18_kirk_enterprise_spock_federation |
| 19 | reacher - hardy - frank - boys - hardys | 101 | 19_reacher_hardy_frank_boys |
| 20 | cadfael - his - the - to - of | 99 | 20_cadfael_his_the_to |
| 21 | jedi - vong - luke - leia - han | 87 | 21_jedi_vong_luke_leia |
| 22 | german - szpilman - hitler - was - the | 78 | 22_german_szpilman_hitler_was |
| 23 | jesus - judah - god - of - the | 78 | 23_jesus_judah_god_of |
| 24 | animorphs - jake - visser - ax - cassie | 67 | 24_animorphs_jake_visser_ax |
| 25 | spirou - fantasio - champignac - count - marsupilami | 66 | 25_spirou_fantasio_champignac_count |
| 26 | henson - white - black - the - slaves | 57 | 26_henson_white_black_the |
| 27 | novel - of - his - in - book | 56 | 27_novel_of_his_in |
| 28 | dawkins - of - that - science - religion | 55 | 28_dawkins_of_that_science |
| 29 | obiwan - jedi - quigon - kenobi - anakin | 52 | 29_obiwan_jedi_quigon_kenobi |
| 30 | cats - clan - thunderclan - kits - firestar | 48 | 30_cats_clan_thunderclan_kits |
| 31 | redwall - abbey - the - and - vermin | 48 | 31_redwall_abbey_the_and |
| 32 | virus - the - to - is - of | 47 | 32_virus_the_to_is |
| 33 | buffy - sunnydale - willow - slayer - giles | 46 | 33_buffy_sunnydale_willow_slayer |
| 34 | time - machine - traveller - in - the | 44 | 34_time_machine_traveller_in |
| 35 | confederate - lee - scarlett - rhett - the | 38 | 35_confederate_lee_scarlett_rhett |
| 36 | bond - bonds - to - leiter - by | 37 | 36_bond_bonds_to_leiter |
| 37 | baseball - hobbs - game - team - belichick | 37 | 37_baseball_hobbs_game_team |
| 38 | sharpe - scene - french - sharpes - harper | 36 | 38_sharpe_scene_french_sharpes |
| 39 | nancy - bess - nancys - george - mystery | 33 | 39_nancy_bess_nancys_george |
| 40 | women - of - ellador - men - in | 33 | 40_women_of_ellador_men |
| 41 | manticore - sten - haven - fleet - honor | 32 | 41_manticore_sten_haven_fleet |
| 42 | billy - john - horse - ranch - harold | 31 | 42_billy_john_horse_ranch |
| 43 | global - warming - climate - energy - carbon | 30 | 43_global_warming_climate_energy |
| 44 | christmas - claus - santa - roger - mimi | 30 | 44_christmas_claus_santa_roger |
| 45 | holmes - sherlock - watson - douglas - that | 29 | 45_holmes_sherlock_watson_douglas |
| 46 | tarzan - ape - lion - tarzans - opar | 28 | 46_tarzan_ape_lion_tarzans |
| 47 | conan - conans - dake - aquilonia - raseri | 28 | 47_conan_conans_dake_aquilonia |
| 48 | angel - angels - quillon - archangel - alleluia | 27 | 48_angel_angels_quillon_archangel |
| 49 | lone - wolf - kai - magnamund - darklords | 27 | 49_lone_wolf_kai_magnamund |
| 50 | helm - matt - helms - mac - agency | 27 | 50_helm_matt_helms_mac |
| 51 | dorothy - oz - elphaba - wizard - ozma | 27 | 51_dorothy_oz_elphaba_wizard |
| 52 | max - fang - flock - roland - victor | 26 | 52_max_fang_flock_roland |
| 53 | tom - swift - mr - airship - toms | 25 | 53_tom_swift_mr_airship |
| 54 | tintin - haddock - calculus - snowy - the | 25 | 54_tintin_haddock_calculus_snowy |
| 55 | robot - robots - derec - ariel - city | 23 | 55_robot_robots_derec_ariel |
| 56 | bertie - jeeves - emsworth - gally - freddie | 23 | 56_bertie_jeeves_emsworth_gally |
| 57 | alex - sarov - alexs - mi6 - to | 23 | 57_alex_sarov_alexs_mi6 |
| 58 | carson - rayford - tribulation - carpathia - buck | 22 | 58_carson_rayford_tribulation_carpathia |
| 59 | dresden - harry - thomas - murphy - dresdens | 22 | 59_dresden_harry_thomas_murphy |
| 60 | brigitta - major - life - novel - of | 22 | 60_brigitta_major_life_novel |

</details>

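The per-topic frequencies in the overview table can be cross-checked against the two headline numbers. The sketch below copies the frequency column by hand (the variable names are illustrative only) and confirms that the 62 rows add up to the 16559 training documents, with topic 0 alone covering a little over half of them.

```python
# Topic frequencies copied from the overview table, ordered from topic -1 to topic 60.
FREQUENCIES = [
    21, 8983, 1232, 605, 506, 473, 459, 416, 370, 306, 192, 160, 150, 145,
    141, 140, 138, 130, 111, 111, 101, 99, 87, 78, 78, 67, 66, 57, 56, 55,
    52, 48, 48, 47, 46, 44, 38, 37, 37, 36, 33, 33, 32, 31, 30, 30, 29, 28,
    28, 27, 27, 27, 27, 26, 25, 25, 23, 23, 23, 22, 22, 22,
]

# Map each topic ID (-1 is the outlier topic) to its document count.
counts = dict(zip(range(-1, len(FREQUENCIES) - 1), FREQUENCIES))

total = sum(counts.values())
share_topic_0 = counts[0] / total

print(len(counts), total)      # 62 16559
print(f"{share_topic_0:.1%}")  # 54.2%
print(counts[-1])              # 21 documents were left as outliers
```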
## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 2.0.2
* HDBSCAN: 0.8.40
* UMAP: 0.5.7
* Pandas: 2.2.2
* Scikit-Learn: 1.6.1
* Sentence-transformers: 3.4.1
* Transformers: 4.51.3
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.11.12

config.json
ADDED
@@ -0,0 +1,17 @@
{
  "calculate_probabilities": false,
  "language": "english",
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}

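The JSON values in config.json mirror the hyperparameters listed in the README, with two representation differences: `null` stands for Python's `None`, and the n-gram range is a two-element list because JSON has no tuple type. A minimal sketch of mapping the file back to Python keyword arguments follows; `config_to_kwargs` is a hypothetical helper, not a BERTopic function.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": "english",
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
"""


def config_to_kwargs(raw: str) -> dict:
    """Turn the saved JSON config into Python keyword arguments."""
    kwargs = json.loads(raw)
    # Restore the tuple form shown in the README: (1, 1).
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"])  # (1, 1)
print(kwargs["nr_topics"])     # None
```

Assuming a BERTopic release that accepts the `zeroshot_*` arguments, `BERTopic(**kwargs)` would create a new, unfitted model with the same settings. Restoring the trained topics still goes through `BERTopic.load`, as shown in the README.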
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:973a2ccf0dbaa23acd875cc5ebe8082f15d58d996a363a0283c04361c24cb0da
size 7247376

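The three lines above are a Git LFS pointer: the repository stores only the spec version, a SHA-256 digest, and the byte size, while the actual tensor file lives in LFS storage. A minimal sketch of parsing such a pointer and checking a downloaded file against it is shown below. The function names are hypothetical, and the local file path passed to `matches_pointer` is whatever location the file was downloaded to.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file against the pointer's SHA-256 digest and byte size."""
    sha256 = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha256.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and sha256.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:973a2ccf0dbaa23acd875cc5ebe8082f15d58d996a363a0283c04361c24cb0da
size 7247376
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algorithm"], pointer["size"])  # sha256 7247376
```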
ctfidf_config.json
ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:46ba0fea440cb0d9e6dde84706adc879861607286bce5c831055765cb4c74cf0
size 95320

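The recorded size of 95320 bytes is consistent with one embedding per topic. The arithmetic below is a sanity check under stated assumptions, not a description of the file's verified contents: it assumes one row for each of the 62 topics (including the outlier topic -1), 384 dimensions (the output size of all-MiniLM-L6-v2, the embedding model named in config.json), and float32 storage.

```python
N_TOPICS = 62          # from the README, including outlier topic -1
EMBEDDING_DIM = 384    # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32
FILE_SIZE = 95320      # from the Git LFS pointer

# Raw tensor payload under the assumptions above.
payload = N_TOPICS * EMBEDDING_DIM * BYTES_PER_FLOAT32

# Whatever remains would be file-format overhead.
overhead = FILE_SIZE - payload

print(payload, overhead)  # 95232 88
```

The remaining 88 bytes are plausible for the safetensors layout, which starts with an 8-byte header length followed by a short JSON header describing the tensor.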
topics.json
ADDED
The diff for this file is too large to render.