AngelPanizo committed
Commit d157510 · verified · 1 Parent(s): a07e809

Add BERTopic model

README.md ADDED
@@ -0,0 +1,75 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_TheWellnessCompany
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_TheWellnessCompany")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 6
+ * Number of training documents: 450
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | fauci - vaccinated - cancers - ivermectin - makis | 21 | -1_fauci_vaccinated_cancers_ivermectin |
+ | 0 | livestream - healthcare - supplements - tucker - amazing | 228 | 0_livestream_healthcare_supplements_tucker |
+ | 1 | nattokinase - bromelain - spikesymposium - curcumin - supplement | 87 | 1_nattokinase_bromelain_spikesymposium_curcumin |
+ | 2 | myocarditis - vaccinated - deaths - c19 - causally | 69 | 2_myocarditis_vaccinated_deaths_c19 |
+ | 3 | nattokinase - protease - fibrinolytic - neutralize - japan | 23 | 3_nattokinase_protease_fibrinolytic_neutralize |
+ | 4 | cardiologists - suicide - prescribed - negligent - houston | 22 | 4_cardiologists_suicide_prescribed_negligent |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
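The Label column in the README table appears to follow a simple pattern: the topic ID followed by the first four keywords, joined with underscores. A minimal sketch of that pattern, assuming it holds in general; `default_label` is a hypothetical helper written for illustration, not a BERTopic function:

```python
def default_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Keywords for the outlier topic (-1), copied from the README table.
label = default_label(-1, ["fauci", "vaccinated", "cancers", "ivermectin", "makis"])
print(label)
```

The fifth keyword is dropped from the label, which matches the rows shown in the table.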
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "calculate_probabilities": true,
+ "language": null,
+ "low_memory": false,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 1
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": false,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null
+ }
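The keys in `config.json` correspond to the hyperparameters listed in the README, with JSON types in place of Python ones (`null` for `None`, a list for the `n_gram_range` tuple). As a rough sketch, with the file contents inlined so the example is self-contained, the values can be converted back to Python-native types; `config_to_kwargs` is a hypothetical helper, and loading the model with `BERTopic.load` should not normally require this step:

```python
import json

# Contents of config.json from this commit, inlined for self-containment.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(raw: str) -> dict:
    """Hypothetical helper: parse the JSON and restore Python-native types."""
    kwargs = json.loads(raw)
    # JSON has no tuple type, so the n-gram range is stored as a list.
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_JSON)
for name, value in sorted(kwargs.items()):
    print(f"{name}: {value!r}")
```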
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e68f04d9cb77cf7b8e47c47973d9b6fb56427cf364d254940225729ee154c2eb
+ size 106656
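The three added lines are a Git LFS pointer rather than the tensor data itself: each line is a key and a value separated by a space, giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer, assuming only the three keys shown here; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for ctfidf.safetensors, copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e68f04d9cb77cf7b8e47c47973d9b6fb56427cf364d254940225729ee154c2eb
size 106656
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```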
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:590a0ce1d0492e71328d549b2e15cc661120a338df56796ea7943686832a0e42
+ size 24664
topics.json ADDED
@@ -0,0 +1,639 @@
+ {
+ "topic_representations": {
+ "-1": [
+ [
+ "fauci",
+ 0.5840796232223511
+ ],
+ [
+ "vaccinated",
+ 0.5729077458381653
+ ],
+ [
+ "cancers",
+ 0.506521463394165
+ ],
+ [
+ "ivermectin",
+ 0.49987122416496277
+ ],
+ [
+ "makis",
+ 0.4593627452850342
+ ]
+ ],
+ "0": [
+ [
+ "livestream",
+ 0.5271531939506531
+ ],
+ [
+ "healthcare",
+ 0.42863935232162476
+ ],
+ [
+ "supplements",
+ 0.4007262587547302
+ ],
+ [
+ "tucker",
+ 0.4002035856246948
+ ],
+ [
+ "amazing",
+ 0.3899478018283844
+ ]
+ ],
+ "1": [
+ [
+ "nattokinase",
+ 0.5607596635818481
+ ],
+ [
+ "bromelain",
+ 0.5514837503433228
+ ],
+ [
+ "spikesymposium",
+ 0.47098857164382935
+ ],
+ [
+ "curcumin",
+ 0.46161046624183655
+ ],
+ [
+ "supplement",
+ 0.45820099115371704
+ ]
+ ],
+ "2": [
+ [
+ "myocarditis",
+ 0.5465197563171387
+ ],
+ [
+ "vaccinated",
+ 0.487912654876709
+ ],
+ [
+ "deaths",
+ 0.4605925679206848
+ ],
+ [
+ "c19",
+ 0.38834723830223083
+ ],
+ [
+ "causally",
+ 0.3710002899169922
+ ]
+ ],
+ "3": [
+ [
+ "nattokinase",
+ 0.6570773124694824
+ ],
+ [
+ "protease",
+ 0.49258729815483093
+ ],
+ [
+ "fibrinolytic",
+ 0.4861806631088257
+ ],
+ [
+ "neutralize",
+ 0.41933923959732056
+ ],
+ [
+ "japan",
+ 0.3859969973564148
+ ]
+ ],
+ "4": [
+ [
+ "cardiologists",
+ 0.47415268421173096
+ ],
+ [
+ "suicide",
+ 0.3840067684650421
+ ],
+ [
+ "prescribed",
+ 0.3793817162513733
+ ],
+ [
+ "negligent",
+ 0.37877094745635986
+ ],
+ [
+ "houston",
+ 0.3753000497817993
+ ]
+ ]
+ },
+ "topics": [
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ 1,
+ 1,
+ -1,
+ -1,
+ 1,
+ -1,
+ 1,
+ 1,
+ 1,
+ 1,
+ -1,
+ -1,
+ -1,
+ 1,
+ -1,
+ 1,
+ 1,
+ -1,
+ 0,
+ -1,
+ 0,
+ 1,
+ -1,
+ -1,
+ 0,
+ 1,
+ 1,
+ 0,
+ -1,
+ 1,
+ 1,
+ -1,
+ 1,
+ 4,
+ 0,
+ -1,
+ 0,
+ 1,
+ 1,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ 2,
+ 1,
+ 1,
+ 1,
+ 1,
+ 0,
+ -1,
+ 0,
+ 1,
+ -1,
+ 1,
+ -1,
+ 1,
+ 0,
+ 1,
+ 1,
+ 0,
+ 0,
+ 2,
+ 0,
+ 2,
+ 0,
+ 1,
+ 0,
+ 2,
+ 1,
+ 1,
+ 1,
+ 2,
+ 0,
+ 0,
+ -1,
+ -1,
+ 0,
+ 0,
+ 1,
+ -1,
+ -1,
+ 1,
+ 0,
+ 1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 0,
+ -1,
+ 1,
+ 0,
+ 1,
+ 1,
+ 1,
+ -1,
+ 4,
+ 1,
+ -1,
+ 0,
+ 1,
+ 0,
+ -1,
+ 0,
+ 1,
+ -1,
+ 1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ 1,
+ 0,
+ -1,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ 0,
+ -1,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ 0,
+ 0,
+ 3,
+ 2,
+ 1,
+ 1,
+ 1,
+ 0,
+ 2,
+ -1,
+ 3,
+ 1,
+ 1,
+ -1,
+ 1,
+ 1,
+ -1,
+ 4,
+ 3,
+ -1,
+ 1,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ 0,
+ 0,
+ 0,
+ -1,
+ 1,
+ 0,
+ -1,
+ 3,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ 4,
+ -1,
+ -1,
+ 0,
+ 3,
+ 0,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ 3,
+ 0,
+ 1,
+ -1,
+ 3,
+ -1,
+ 3,
+ 1,
+ -1,
+ 0,
+ 3,
+ -1,
+ -1,
+ -1,
+ -1,
+ 3,
+ -1,
+ -1,
+ 0,
+ 1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ 3,
+ 2,
+ -1,
+ 3,
+ 0,
+ 0,
+ 0,
+ 3,
+ -1,
+ 0,
+ -1,
+ -1,
+ -1,
+ -1,
+ 4,
+ -1,
+ -1,
+ 1,
+ 3,
+ -1,
+ 0,
+ -1,
+ 1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 1,
+ -1,
+ -1,
+ 3,
+ -1,
+ 4,
+ 4,
+ 4,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ 1,
+ 1,
+ 3,
+ 3,
+ 1,
+ 3,
+ 1,
+ 2,
+ -1,
+ -1,
+ 3,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ -1,
+ 3,
+ -1,
+ 2,
+ 0,
+ 0,
+ -1,
+ -1,
+ 4,
+ -1,
+ -1,
+ 0,
+ 3,
+ 1,
+ -1,
+ -1,
+ 4,
+ 4,
+ 4,
+ 4,
+ -1,
+ 0,
+ -1,
+ 2,
+ 4,
+ 0,
+ -1,
+ 4,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ 4,
+ -1,
+ 4,
+ 0,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ 0,
+ -1,
+ 4,
+ -1,
+ 2,
+ -1,
+ 0,
+ -1,
+ 0,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 4,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 0,
+ 0,
+ 4,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 0,
+ 2,
+ -1,
+ -1,
+ -1,
+ 2,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ -1,
+ 2,
+ 4,
+ -1,
+ -1,
+ 0,
+ -1,
+ -1,
+ 0,
+ -1,
+ 0,
+ -1,
+ 0,
+ 0,
+ -1,
+ -1,
+ -1,
+ 0,
+ 0,
+ -1,
+ 0,
+ 0,
+ 0,
+ -1,
+ 0,
+ -1,
+ 0,
+ 0,
+ 0,
+ 0
+ ],
+ "topic_sizes": {
+ "-1": 228,
+ "2": 23,
+ "1": 69,
+ "0": 87,
+ "4": 21,
+ "3": 22
+ },
+ "topic_mapper": [
+ [
+ -1,
+ -1,
+ -1
+ ],
+ [
+ 0,
+ 0,
+ 0
+ ],
+ [
+ 1,
+ 1,
+ 3
+ ],
+ [
+ 2,
+ 2,
+ 1
+ ],
+ [
+ 3,
+ 3,
+ 2
+ ],
+ [
+ 4,
+ 4,
+ 4
+ ]
+ ],
+ "topic_labels": {
+ "-1": "-1_fauci_vaccinated_cancers_ivermectin",
+ "0": "0_livestream_healthcare_supplements_tucker",
+ "1": "1_nattokinase_bromelain_spikesymposium_curcumin",
+ "2": "2_myocarditis_vaccinated_deaths_c19",
+ "3": "3_nattokinase_protease_fibrinolytic_neutralize",
+ "4": "4_cardiologists_suicide_prescribed_negligent"
+ },
+ "custom_labels": null,
+ "_outliers": 1,
+ "topic_aspects": {}
+ }
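In `topics.json`, the `topics` list holds one topic ID per training document and `topic_sizes` holds a document count per topic ID, so the two can be cross-checked after download. A minimal sketch of that check, run here on a small made-up sample with the same structure rather than on the full file; `sizes_match` is a hypothetical helper:

```python
import json
from collections import Counter

# Made-up sample mirroring the structure of topics.json (not the real data).
SAMPLE = json.loads("""
{
  "topics": [-1, -1, 2, 0, 0, 1],
  "topic_sizes": {"-1": 2, "2": 1, "0": 2, "1": 1}
}
""")


def sizes_match(data: dict) -> bool:
    """Return True when per-topic counts of `topics` equal `topic_sizes`."""
    counted = Counter(data["topics"])
    # JSON object keys are always strings, so convert them to int first.
    stored = {int(key): size for key, size in data["topic_sizes"].items()}
    return dict(counted) == stored


print(sizes_match(SAMPLE))
```

To run the same check on the real file, replace `SAMPLE` with the result of `json.load` on a local copy of `topics.json`.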