AngelPanizo committed on
Commit 25b2071 · verified · 1 parent: 2cc0b65

Add BERTopic model

README.md ADDED
@@ -0,0 +1,73 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_oldworld42
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_oldworld42")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 4
+ * Number of training documents: 198
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | nsdap - capitalism - marx - stalin - nietzsche | 98 | -1_nsdap_capitalism_marx_stalin |
+ | 0 | soros - oligarchs - conspiracy - antisemitic - ukraine | 40 | 0_soros_oligarchs_conspiracy_antisemitic |
+ | 1 | brahmins - hindutva - sanskriti - sindhu - aryans | 39 | 1_brahmins_hindutva_sanskriti_sindhu |
+ | 2 | enlightenment - aristotle - incarnation - beliefs - christian | 21 | 2_enlightenment_aristotle_incarnation_beliefs |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
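The Topic overview figures in this README can be cross-checked against the `topics` array in the topics.json file added in the same commit, which holds one topic ID per training document (-1 being BERTopic's outlier topic). The sketch below is a minimal, standard-library-only version of that check; `load_topics` and `topic_overview` are hypothetical helper names, and it assumes topics.json has been downloaded locally.

```python
import json
from collections import Counter
from pathlib import Path


def load_topics(path: str = "topics.json") -> dict:
    """Read topics.json; assumes the file has been downloaded to `path`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def topic_overview(topics_data: dict) -> tuple[int, int, dict[int, int]]:
    """Return (number of topics, number of documents, documents per topic).

    `topics_data["topics"]` holds one topic ID per training document. The
    outlier topic -1 is counted as a topic, as in the README's overview.
    """
    assignments = [int(topic) for topic in topics_data["topics"]]
    # Sort by topic ID so the outlier topic -1 comes first.
    frequencies = dict(sorted(Counter(assignments).items()))
    return len(frequencies), len(assignments), frequencies
```

Counting the 198 entries of that array gives 98 documents for topic -1, 40 for topic 0, 39 for topic 1 and 21 for topic 2, the same values stored under `topic_sizes` in topics.json.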
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
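config.json stores the same training hyperparameters that the README lists. Because JSON has no tuple type, `n_gram_range` appears as the list `[1, 1]` rather than the tuple `(1, 1)` shown in the README. A minimal sketch, with `config_to_kwargs` as a hypothetical helper name, that parses the file contents and restores the tuple:

```python
import json


def config_to_kwargs(config_text: str) -> dict:
    """Parse config.json text into a dict of keyword arguments.

    JSON has no tuple type, so n_gram_range is stored as a list; it is
    converted back to a tuple to match the README's "(1, 1)".
    """
    params = json.loads(config_text)
    if params.get("n_gram_range") is not None:
        params["n_gram_range"] = tuple(params["n_gram_range"])
    return params
```

The keys match parameter names of the `BERTopic` constructor, so the result could be passed as `BERTopic(**kwargs)` to create an unfitted model with the same settings. Treat this as an assumption-laden sketch: config.json in this commit records no embedding, UMAP, or HDBSCAN model, so those components are not restored this way and would have to be supplied separately to reproduce the original setup.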
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c3a0a3977c17edd7138df75ea2ba8c51957a528cb61ef1a980324ec69710898
+ size 139376
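The `.safetensors` files are committed as Git LFS pointers rather than raw tensors: each pointer is a short text file with `version`, `oid` and `size` lines, where `oid` is the SHA-256 digest of the real file and `size` is its length in bytes. A minimal sketch (the function names are hypothetical) for parsing such a pointer, taken without the diff's leading `+`, and checking that a downloaded file matches it:

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    # The oid has the form "<algorithm>:<hex digest>", e.g. "sha256:9c3a...".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and SHA-256 digest."""
    if pointer["oid_algorithm"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['oid_algorithm']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]
```

Applied to the ctfidf.safetensors pointer above, `parse_lfs_pointer` yields a size of 139376 bytes and a SHA-256 digest beginning `9c3a0a39`.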
ctfidf_config.json ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75acfc460db612df755863fb17ee88d9fffcbade32c78d4f4ac1c363fc19d9b4
+ size 16472
topics.json ADDED
@@ -0,0 +1,329 @@
+ {
+   "topic_representations": {
+     "-1": [
+       [
+         "nsdap",
+         0.6115792989730835
+       ],
+       [
+         "capitalism",
+         0.5668036341667175
+       ],
+       [
+         "marx",
+         0.5295607447624207
+       ],
+       [
+         "stalin",
+         0.5223168730735779
+       ],
+       [
+         "nietzsche",
+         0.47473692893981934
+       ]
+     ],
+     "0": [
+       [
+         "soros",
+         0.6220753788948059
+       ],
+       [
+         "oligarchs",
+         0.6183851361274719
+       ],
+       [
+         "conspiracy",
+         0.5660175085067749
+       ],
+       [
+         "antisemitic",
+         0.5395936965942383
+       ],
+       [
+         "ukraine",
+         0.5191923379898071
+       ]
+     ],
+     "1": [
+       [
+         "brahmins",
+         0.6279194355010986
+       ],
+       [
+         "hindutva",
+         0.611663281917572
+       ],
+       [
+         "sanskriti",
+         0.571912407875061
+       ],
+       [
+         "sindhu",
+         0.5646538138389587
+       ],
+       [
+         "aryans",
+         0.5604211091995239
+       ]
+     ],
+     "2": [
+       [
+         "enlightenment",
+         0.6110816597938538
+       ],
+       [
+         "aristotle",
+         0.5789486765861511
+       ],
+       [
+         "incarnation",
+         0.5667442083358765
+       ],
+       [
+         "beliefs",
+         0.5513602495193481
+       ],
+       [
+         "christian",
+         0.5380693674087524
+       ]
+     ]
+   },
+   "topics": [
+     2,
+     0,
+     0,
+     2,
+     -1,
+     0,
+     1,
+     1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     1,
+     -1,
+     -1,
+     -1,
+     2,
+     2,
+     2,
+     0,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     0,
+     2,
+     0,
+     -1,
+     -1,
+     -1,
+     1,
+     -1,
+     -1,
+     1,
+     2,
+     2,
+     1,
+     -1,
+     1,
+     -1,
+     -1,
+     1,
+     1,
+     1,
+     0,
+     -1,
+     1,
+     1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     1,
+     2,
+     1,
+     1,
+     -1,
+     -1,
+     2,
+     -1,
+     1,
+     -1,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     1,
+     1,
+     -1,
+     -1,
+     -1,
+     2,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     1,
+     -1,
+     0,
+     -1,
+     -1,
+     0,
+     -1,
+     2,
+     -1,
+     -1,
+     -1,
+     -1,
+     2,
+     -1,
+     -1,
+     -1,
+     2,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     -1,
+     0,
+     -1,
+     2,
+     -1,
+     -1,
+     0,
+     -1,
+     0,
+     -1,
+     0,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     2,
+     2,
+     2,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     2,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     -1,
+     -1,
+     -1,
+     2,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     2
+   ],
+   "topic_sizes": {
+     "2": 21,
+     "0": 40,
+     "-1": 98,
+     "1": 39
+   },
+   "topic_mapper": [
+     [
+       -1,
+       -1,
+       -1
+     ],
+     [
+       0,
+       0,
+       1
+     ],
+     [
+       1,
+       1,
+       2
+     ],
+     [
+       2,
+       2,
+       0
+     ]
+   ],
+   "topic_labels": {
+     "-1": "-1_nsdap_capitalism_marx_stalin",
+     "0": "0_soros_oligarchs_conspiracy_antisemitic",
+     "1": "1_brahmins_hindutva_sanskriti_sindhu",
+     "2": "2_enlightenment_aristotle_incarnation_beliefs"
+   },
+   "custom_labels": null,
+   "_outliers": 1,
+   "topic_aspects": {}
+ }
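Two fields in topics.json are easier to read with a little code. `topic_labels` follows BERTopic's default naming: the topic ID followed by the first four keywords of its entry in `topic_representations`, joined with underscores. `topic_mapper` records how topic IDs changed during fitting; as we read it, the first column is a topic's original ID and the last column its current ID, so the row `[0, 0, 1]` means the topic first numbered 0 is now topic 1. A minimal sketch under those assumptions (all helper names are hypothetical, and `labels_match` assumes a local copy of topics.json):

```python
import json
from pathlib import Path


def default_labels(representations: dict) -> dict:
    """Rebuild "<id>_<w1>_<w2>_<w3>_<w4>" labels from topic_representations.

    Each value is a list of [word, score] pairs; only the first four words are used.
    """
    return {
        topic_id: f"{topic_id}_" + "_".join(word for word, _score in pairs[:4])
        for topic_id, pairs in representations.items()
    }


def current_ids(topic_mapper: list) -> dict:
    """Map each original topic ID (first column) to its current ID (last column)."""
    return {row[0]: row[-1] for row in topic_mapper}


def labels_match(path: str = "topics.json") -> bool:
    """Compare the stored topic_labels with labels rebuilt from the representations."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return default_labels(data["topic_representations"]) == data["topic_labels"]
```

For the file in this commit, `current_ids` returns `{-1: -1, 0: 1, 1: 2, 2: 0}`, and rebuilding the labels reproduces the four entries stored under `topic_labels`.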