Commit d2c8044 by GiganticLemon · verified · 1 Parent(s): 120318e

Add BERTopic model

README.md ADDED
@@ -0,0 +1,131 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # BERTopic_andattakstruk_2
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("GiganticLemon/BERTopic_andattakstruk_2")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 62
+ * Number of training documents: 16559
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | the - and - to - of - in | 21 | -1_the_and_to_of |
+ | 0 | the - to - of - and - is | 8983 | 0_the_to_of_and |
+ | 1 | the - to - that - he - and | 1232 | 1_the_to_that_he |
+ | 2 | her - she - to - and - is | 605 | 2_her_she_to_and |
+ | 3 | and - the - of - to - in | 506 | 3_and_the_of_to |
+ | 4 | the - of - earth - to - and | 473 | 4_the_of_earth_to |
+ | 5 | the - and - to - he - his | 459 | 5_the_and_to_he |
+ | 6 | the - and - to - of - ship | 416 | 6_the_and_to_of |
+ | 7 | the - to - of - and - his | 370 | 7_the_to_of_and |
+ | 8 | de - his - he - to - the | 306 | 8_de_his_he_to |
+ | 9 | her - she - to - and - is | 192 | 9_her_she_to_and |
+ | 10 | chinese - the - and - of - to | 160 | 10_chinese_the_and_of |
+ | 11 | the - president - soviet - of - us | 150 | 11_the_president_soviet_of |
+ | 12 | russian - the - his - to - of | 145 | 12_russian_the_his_to |
+ | 13 | asterix - roman - obelix - the - rome | 141 | 13_asterix_roman_obelix_the |
+ | 14 | doctor - tardis - the - ace - to | 140 | 14_doctor_tardis_the_ace |
+ | 15 | of - that - the - in - or | 138 | 15_of_that_the_in |
+ | 16 | socrates - theseus - the - of - and | 130 | 16_socrates_theseus_the_of |
+ | 17 | vampire - vampires - darren - sookie - to | 111 | 17_vampire_vampires_darren_sookie |
+ | 18 | kirk - enterprise - spock - federation - klingon | 111 | 18_kirk_enterprise_spock_federation |
+ | 19 | reacher - hardy - frank - boys - hardys | 101 | 19_reacher_hardy_frank_boys |
+ | 20 | cadfael - his - the - to - of | 99 | 20_cadfael_his_the_to |
+ | 21 | jedi - vong - luke - leia - han | 87 | 21_jedi_vong_luke_leia |
+ | 22 | german - szpilman - hitler - was - the | 78 | 22_german_szpilman_hitler_was |
+ | 23 | jesus - judah - god - of - the | 78 | 23_jesus_judah_god_of |
+ | 24 | animorphs - jake - visser - ax - cassie | 67 | 24_animorphs_jake_visser_ax |
+ | 25 | spirou - fantasio - champignac - count - marsupilami | 66 | 25_spirou_fantasio_champignac_count |
+ | 26 | henson - white - black - the - slaves | 57 | 26_henson_white_black_the |
+ | 27 | novel - of - his - in - book | 56 | 27_novel_of_his_in |
+ | 28 | dawkins - of - that - science - religion | 55 | 28_dawkins_of_that_science |
+ | 29 | obiwan - jedi - quigon - kenobi - anakin | 52 | 29_obiwan_jedi_quigon_kenobi |
+ | 30 | cats - clan - thunderclan - kits - firestar | 48 | 30_cats_clan_thunderclan_kits |
+ | 31 | redwall - abbey - the - and - vermin | 48 | 31_redwall_abbey_the_and |
+ | 32 | virus - the - to - is - of | 47 | 32_virus_the_to_is |
+ | 33 | buffy - sunnydale - willow - slayer - giles | 46 | 33_buffy_sunnydale_willow_slayer |
+ | 34 | time - machine - traveller - in - the | 44 | 34_time_machine_traveller_in |
+ | 35 | confederate - lee - scarlett - rhett - the | 38 | 35_confederate_lee_scarlett_rhett |
+ | 36 | bond - bonds - to - leiter - by | 37 | 36_bond_bonds_to_leiter |
+ | 37 | baseball - hobbs - game - team - belichick | 37 | 37_baseball_hobbs_game_team |
+ | 38 | sharpe - scene - french - sharpes - harper | 36 | 38_sharpe_scene_french_sharpes |
+ | 39 | nancy - bess - nancys - george - mystery | 33 | 39_nancy_bess_nancys_george |
+ | 40 | women - of - ellador - men - in | 33 | 40_women_of_ellador_men |
+ | 41 | manticore - sten - haven - fleet - honor | 32 | 41_manticore_sten_haven_fleet |
+ | 42 | billy - john - horse - ranch - harold | 31 | 42_billy_john_horse_ranch |
+ | 43 | global - warming - climate - energy - carbon | 30 | 43_global_warming_climate_energy |
+ | 44 | christmas - claus - santa - roger - mimi | 30 | 44_christmas_claus_santa_roger |
+ | 45 | holmes - sherlock - watson - douglas - that | 29 | 45_holmes_sherlock_watson_douglas |
+ | 46 | tarzan - ape - lion - tarzans - opar | 28 | 46_tarzan_ape_lion_tarzans |
+ | 47 | conan - conans - dake - aquilonia - raseri | 28 | 47_conan_conans_dake_aquilonia |
+ | 48 | angel - angels - quillon - archangel - alleluia | 27 | 48_angel_angels_quillon_archangel |
+ | 49 | lone - wolf - kai - magnamund - darklords | 27 | 49_lone_wolf_kai_magnamund |
+ | 50 | helm - matt - helms - mac - agency | 27 | 50_helm_matt_helms_mac |
+ | 51 | dorothy - oz - elphaba - wizard - ozma | 27 | 51_dorothy_oz_elphaba_wizard |
+ | 52 | max - fang - flock - roland - victor | 26 | 52_max_fang_flock_roland |
+ | 53 | tom - swift - mr - airship - toms | 25 | 53_tom_swift_mr_airship |
+ | 54 | tintin - haddock - calculus - snowy - the | 25 | 54_tintin_haddock_calculus_snowy |
+ | 55 | robot - robots - derec - ariel - city | 23 | 55_robot_robots_derec_ariel |
+ | 56 | bertie - jeeves - emsworth - gally - freddie | 23 | 56_bertie_jeeves_emsworth_gally |
+ | 57 | alex - sarov - alexs - mi6 - to | 23 | 57_alex_sarov_alexs_mi6 |
+ | 58 | carson - rayford - tribulation - carpathia - buck | 22 | 58_carson_rayford_tribulation_carpathia |
+ | 59 | dresden - harry - thomas - murphy - dresdens | 22 | 59_dresden_harry_thomas_murphy |
+ | 60 | brigitta - major - life - novel - of | 22 | 60_brigitta_major_life_novel |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: english
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 2.0.2
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.2
+ * Scikit-Learn: 1.6.1
+ * Sentence-transformers: 3.4.1
+ * Transformers: 4.51.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.11.12
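Note on the README's topic overview: the frequency column can be turned into per-topic shares to see how concentrated the model is. The following is a minimal sketch that uses only counts quoted in the README (16559 training documents and the first few table rows); the names `TRAINING_DOCS`, `TOPIC_FREQUENCY`, and `topic_share` are illustrative and not part of BERTopic.

```python
# Counts copied from the README's topic table; only a few rows are included.
TRAINING_DOCS = 16559
TOPIC_FREQUENCY = {-1: 21, 0: 8983, 1: 1232, 2: 605}


def topic_share(topic_id: int) -> float:
    """Return the fraction of training documents assigned to one topic."""
    return TOPIC_FREQUENCY[topic_id] / TRAINING_DOCS


if __name__ == "__main__":
    # Print each listed topic's share of the training documents.
    for topic_id in sorted(TOPIC_FREQUENCY):
        print(f"topic {topic_id:>2}: {topic_share(topic_id):.1%}")
```

By this arithmetic, Topic 0 alone accounts for a little over half of the training documents, while the outlier topic (-1) holds well under one percent.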
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "calculate_probabilities": false,
+   "language": "english",
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": true,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null,
+   "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
+ }
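Note on config.json: the keys mirror the hyperparameters listed in the README, except that JSON stores `n_gram_range` as a list where the README shows a tuple. The sketch below, using only the standard library, reads the config text and restores the tuple form. The helper name `config_to_kwargs` is hypothetical. Whether passing these values to a new `BERTopic(...)` instance would reproduce the training setup is an assumption, since the UMAP and HDBSCAN settings are not recorded in this file.

```python
import json

# Text of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": "english",
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
"""


def config_to_kwargs(text: str) -> dict:
    """Convert the stored JSON config into Python-native values (hypothetical helper)."""
    kwargs = json.loads(text)
    # JSON has no tuple type, so the n-gram range is stored as a list.
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


if __name__ == "__main__":
    kwargs = config_to_kwargs(CONFIG_JSON)
    print(kwargs["n_gram_range"])     # tuple form, as listed in the README
    print(kwargs["embedding_model"])  # name of the sentence-transformers model
```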
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:973a2ccf0dbaa23acd875cc5ebe8082f15d58d996a363a0283c04361c24cb0da
+ size 7247376
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46ba0fea440cb0d9e6dde84706adc879861607286bce5c831055765cb4c74cf0
+ size 95320
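Note on the `.safetensors` entries: the diff shows Git LFS pointer files, not the tensors themselves. Each pointer records a spec version, a SHA-256 object ID, and a byte size. The following is a minimal sketch of how a downloaded file could be checked against such a pointer, using only the standard library. The function names `parse_pointer` and `matches_pointer` are hypothetical, and the pointer text is the one shown for topic_embeddings.safetensors.

```python
import hashlib

# Pointer text for topic_embeddings.safetensors as shown in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:46ba0fea440cb0d9e6dde84706adc879861607286bce5c831055765cb4c74cf0
size 95320
"""


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into named fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )
```

In use, one would read the downloaded file's bytes and pass them to `matches_pointer` together with `parse_pointer(POINTER)`; a `False` result indicates a truncated or altered download.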
topics.json ADDED
The diff for this file is too large to render. See raw diff