AngelPanizo committed
Commit 5d95903 · verified · 1 Parent(s): a1b89f5

Add BERTopic model

README.md ADDED
@@ -0,0 +1,74 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_RestoredPuritanism
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_RestoredPuritanism")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 5
+ * Number of training documents: 331
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | israelites - genesis - souls - christian - scripture | 215 | -1_israelites_genesis_souls_christian |
+ | 0 | population - abortion - decline - china - 2022 | 47 | 0_population_abortion_decline_china |
+ | 1 | caucasians - adamic - solomon - genesis - hyperdepigmentised | 25 | 1_caucasians_adamic_solomon_genesis |
+ | 2 | caucasians - neanderthals - haplogroups - ethnogenesis - siberian | 24 | 2_caucasians_neanderthals_haplogroups_ethnogenesis |
+ | 3 | protestantism - catechism - preterism - orthodox - puritan | 20 | 3_protestantism_catechism_preterism_orthodox |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
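The Label column in the README's topic table appears to be the topic ID followed by the first four keywords, joined with underscores. A minimal Python sketch of that pattern, assuming it holds for every row; `make_label` and `keywords_by_topic` are illustrative names, not part of BERTopic:

```python
def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Keywords copied from three rows of the README topic table.
keywords_by_topic = {
    -1: ["israelites", "genesis", "souls", "christian", "scripture"],
    0: ["population", "abortion", "decline", "china", "2022"],
    3: ["protestantism", "catechism", "preterism", "orthodox", "puritan"],
}

labels = {tid: make_label(tid, kws) for tid, kws in keywords_by_topic.items()}
for tid, label in labels.items():
    print(tid, label)
```

The fifth keyword of each topic is dropped from the label, which is why `2022` and `puritan` appear in the keyword column but not in the label column.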
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "calculate_probabilities": true,
+ "language": null,
+ "low_memory": false,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 1
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": false,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null
+ }
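The keys in `config.json` mirror the hyperparameters listed in the README, with one representational difference: JSON has no tuple type, so `n_gram_range` is stored as a two-element list. The sketch below parses the file contents and restores the tuple. The helper name `config_to_kwargs` is hypothetical, and the idea that the result could be passed as keyword arguments to the `BERTopic` constructor is an assumption that is not confirmed by this commit.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(config: dict) -> dict:
    """Return a copy of the config with n_gram_range converted to a tuple."""
    kwargs = dict(config)
    if kwargs.get("n_gram_range") is not None:
        kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(json.loads(CONFIG_JSON))
print(kwargs["n_gram_range"], kwargs["min_topic_size"])
```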
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3941ac6956bd76a275caa0699c974ef877a1986602559781d4d59bc7bdd043ce
+ size 210960
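The three lines added for `ctfidf.safetensors` are a Git LFS pointer, not the tensor data itself: each line is a key, a space, and a value, and `size` gives the byte length of the real file. A small Python sketch of reading such a pointer follows; `parse_lfs_pointer` is an illustrative helper, not a Git LFS or Hugging Face API.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the ctfidf.safetensors diff in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3941ac6956bd76a275caa0699c974ef877a1986602559781d4d59bc7bdd043ce\n"
    "size 210960\n"
)
print(pointer["hash_algorithm"], pointer["size"])
```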
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e630611f8f2509441fee88728478f45bf591a6712909be8fe80fdabaab34999f
+ size 20568
topics.json ADDED
@@ -0,0 +1,491 @@
+ {
+ "topic_representations": {
+ "-1": [
+ [
+ "israelites",
+ 0.581637442111969
+ ],
+ [
+ "genesis",
+ 0.5764267444610596
+ ],
+ [
+ "souls",
+ 0.5424519181251526
+ ],
+ [
+ "christian",
+ 0.5219671726226807
+ ],
+ [
+ "scripture",
+ 0.5208417177200317
+ ]
+ ],
+ "0": [
+ [
+ "population",
+ 0.5678433179855347
+ ],
+ [
+ "abortion",
+ 0.517209529876709
+ ],
+ [
+ "decline",
+ 0.4902135133743286
+ ],
+ [
+ "china",
+ 0.4678730368614197
+ ],
+ [
+ "2022",
+ 0.4570675492286682
+ ]
+ ],
+ "1": [
+ [
+ "caucasians",
+ 0.5870243906974792
+ ],
+ [
+ "adamic",
+ 0.5478933453559875
+ ],
+ [
+ "solomon",
+ 0.5107927322387695
+ ],
+ [
+ "genesis",
+ 0.5088626146316528
+ ],
+ [
+ "hyperdepigmentised",
+ 0.43542546033859253
+ ]
+ ],
+ "2": [
+ [
+ "caucasians",
+ 0.629845142364502
+ ],
+ [
+ "neanderthals",
+ 0.5478528738021851
+ ],
+ [
+ "haplogroups",
+ 0.5312052965164185
+ ],
+ [
+ "ethnogenesis",
+ 0.5104053020477295
+ ],
+ [
+ "siberian",
+ 0.48684149980545044
+ ]
+ ],
+ "3": [
+ [
+ "protestantism",
+ 0.6314404010772705
+ ],
+ [
+ "catechism",
+ 0.6196143627166748
+ ],
+ [
+ "preterism",
+ 0.6127315759658813
+ ],
+ [
+ "orthodox",
+ 0.602478563785553
+ ],
+ [
+ "puritan",
+ 0.5797485113143921
+ ]
+ ]
+ },
+ "topics": [
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ -1,
+ 2,
+ 0,
+ 0,
+ 2,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ -1,
+ 2,
+ 2,
+ 0,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ 3,
+ -1,
+ -1,
+ 0,
+ 0,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ 0,
+ 0,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 0,
+ -1,
+ 3,
+ 0,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ 1,
+ -1,
+ 2,
+ -1,
+ 2,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ 0,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ 2,
+ 2,
+ 2,
+ 3,
+ -1,
+ -1,
+ -1,
+ 0,
+ 1,
+ -1,
+ 0,
+ 0,
+ 0,
+ 0,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 2,
+ -1,
+ 2,
+ 2,
+ 2,
+ 2,
+ 2,
+ -1,
+ 2,
+ 2,
+ 2,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ 0,
+ 0,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ 2,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ 1,
+ 1,
+ 1,
+ -1,
+ 1,
+ 1,
+ -1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ -1,
+ -1,
+ -1,
+ 3,
+ 3,
+ -1,
+ 1,
+ -1,
+ -1,
+ 3,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ 1,
+ 3,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1
+ ],
+ "topic_sizes": {
+ "-1": 215,
+ "3": 20,
+ "0": 47,
+ "2": 24,
+ "1": 25
+ },
+ "topic_mapper": [
+ [
+ -1,
+ -1,
+ -1
+ ],
+ [
+ 0,
+ 0,
+ 2
+ ],
+ [
+ 1,
+ 1,
+ 0
+ ],
+ [
+ 2,
+ 2,
+ 3
+ ],
+ [
+ 3,
+ 3,
+ 1
+ ]
+ ],
+ "topic_labels": {
+ "-1": "-1_israelites_genesis_souls_christian",
+ "0": "0_population_abortion_decline_china",
+ "1": "1_caucasians_adamic_solomon_genesis",
+ "2": "2_caucasians_neanderthals_haplogroups_ethnogenesis",
+ "3": "3_protestantism_catechism_preterism_orthodox"
+ },
+ "custom_labels": null,
+ "_outliers": 1,
+ "topic_aspects": {}
+ }
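`topics.json` stores one topic ID per training document in `topics` and a per-topic count in `topic_sizes`, so the two can be cross-checked. The following Python sketch recounts the assignments and reports any disagreement. `check_topic_sizes` is a hypothetical helper, and `sample` is a shortened, made-up excerpt in the same shape as the file rather than the real data.

```python
from collections import Counter


def check_topic_sizes(data: dict) -> dict:
    """Return {topic_id: (stored, recounted)} for every topic whose counts differ."""
    recounted = Counter(data["topics"])
    # JSON object keys are strings, so convert them back to integer topic IDs.
    stored = {int(key): value for key, value in data["topic_sizes"].items()}
    return {
        topic: (stored.get(topic, 0), recounted.get(topic, 0))
        for topic in set(stored) | set(recounted)
        if stored.get(topic, 0) != recounted.get(topic, 0)
    }


# Hypothetical excerpt; load the real file with json.load() to check it in full.
sample = {
    "topics": [-1, 3, -1, 0, 0, 2],
    "topic_sizes": {"-1": 2, "3": 1, "0": 2, "2": 1},
}

mismatches = check_topic_sizes(sample)
print(mismatches or "topic_sizes matches the per-document assignments")
```

An empty result means the stored sizes agree with the assignments. The same counts can then be compared against the Topic Frequency column of the README table.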