Commit 69e9e1b (verified), committed by AngelPanizo
1 parent: 288eae0

Add BERTopic model

README.md ADDED
@@ -0,0 +1,72 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # MARTINI_enrich_BERTopic_turkcumucadele
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("AIDA-UPM/MARTINI_enrich_BERTopic_turkcumucadele")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 3
+ * Number of training documents: 328
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | ataturk - milliyetcilerini - izmir - karsıyız - uyelerimiz | 60 | -1_ataturk_milliyetcilerini_izmir_karsıyız |
+ | 0 | sancaklılar - turancılar - kıbrıs - kacakcılıgı - istanbul | 124 | 0_sancaklılar_turancılar_kıbrıs_kacakcılıgı |
+ | 1 | milliyetcilerinin - turkculere - yaptıgınız - bayragını - sporları | 144 | 1_milliyetcilerinin_turkculere_yaptıgınız_bayragını |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.40
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.3.1
+ * Transformers: 4.46.3
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.12
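
The Label column in the README's topic table appears to follow BERTopic's default naming: the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that rule from the keywords listed in the table. The helper name `default_label` is hypothetical and is not part of BERTopic's API.

```python
def default_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Keywords copied from the README's topic table.
keywords_by_topic = {
    -1: ["ataturk", "milliyetcilerini", "izmir", "karsıyız", "uyelerimiz"],
    0: ["sancaklılar", "turancılar", "kıbrıs", "kacakcılıgı", "istanbul"],
    1: ["milliyetcilerinin", "turkculere", "yaptıgınız", "bayragını", "sporları"],
}

labels = {tid: default_label(tid, words) for tid, words in keywords_by_topic.items()}
for tid, label in labels.items():
    print(tid, label)
```

The fifth keyword of each topic is dropped from the label, which is why `uyelerimiz`, `istanbul`, and `sporları` appear in the keyword column only.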
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": false,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
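
The keys in config.json match keyword arguments of the `BERTopic` constructor, so the file can be read back into a keyword dictionary. The sketch below shows one way to do that. The helper names `load_bertopic_kwargs` and `build_untrained_model` are hypothetical, and the second helper assumes the installed bertopic version accepts all eleven keyword names.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def load_bertopic_kwargs(raw: str) -> dict:
    """Parse config.json text into keyword arguments."""
    params = json.loads(raw)
    # JSON has no tuple type, so n_gram_range arrives as a list.
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


def build_untrained_model(params: dict, embedding_model=None):
    """Assumption: the installed bertopic accepts these keyword names unchanged."""
    # Imported lazily so the parsing above runs without bertopic installed.
    from bertopic import BERTopic

    # "language" is null in this config, so an embedding model would likely
    # have to be supplied by the caller before fitting.
    return BERTopic(embedding_model=embedding_model, **params)


kwargs = load_bertopic_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"], kwargs["min_topic_size"])
```

This only rebuilds an untrained model with the same settings. The fitted topics come from `BERTopic.load`, as shown in the README.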
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:241e60889d1c1811f2b0d0f18561211d0c4171598c0685b2f6f4bf1df2e32736
+ size 125220
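
The three added lines are a Git LFS pointer, not the tensor data itself: the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses such a pointer and checks a downloaded payload against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib

# Pointer text for ctfidf.safetensors as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:241e60889d1c1811f2b0d0f18561211d0c4171598c0685b2f6f4bf1df2e32736
size 125220
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(payload: bytes, pointer: dict) -> bool:
    """Return True when the payload has the recorded size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(payload) == pointer["size"]
        and hashlib.sha256(payload).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To verify a real download, read the file as bytes and pass it to `matches_pointer` together with the parsed pointer.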
ctfidf_config.json ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07c41a03ef1a73fea841d94dc646bebdece82b5b644e01d57fceace72d57625a
+ size 12376
topics.json ADDED
@@ -0,0 +1,430 @@
+ {
+   "topic_representations": {
+     "-1": [
+       [
+         "ataturk",
+         0.5841145515441895
+       ],
+       [
+         "milliyetcilerini",
+         0.5647975206375122
+       ],
+       [
+         "izmir",
+         0.5517330169677734
+       ],
+       [
+         "kars\u0131y\u0131z",
+         0.5415565967559814
+       ],
+       [
+         "uyelerimiz",
+         0.5410891771316528
+       ]
+     ],
+     "0": [
+       [
+         "sancakl\u0131lar",
+         0.5665498971939087
+       ],
+       [
+         "turanc\u0131lar",
+         0.5619629621505737
+       ],
+       [
+         "k\u0131br\u0131s",
+         0.5536266565322876
+       ],
+       [
+         "kacakc\u0131l\u0131g\u0131",
+         0.5458101034164429
+       ],
+       [
+         "istanbul",
+         0.5324820280075073
+       ]
+     ],
+     "1": [
+       [
+         "milliyetcilerinin",
+         0.6006249189376831
+       ],
+       [
+         "turkculere",
+         0.5991430282592773
+       ],
+       [
+         "yapt\u0131g\u0131n\u0131z",
+         0.5403755903244019
+       ],
+       [
+         "bayrag\u0131n\u0131",
+         0.5379725694656372
+       ],
+       [
+         "sporlar\u0131",
+         0.51165771484375
+       ]
+     ]
+   },
+   "topics": [
+     -1,
+     -1,
+     -1,
+     -1,
+     1,
+     0,
+     1,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     1,
+     0,
+     0,
+     1,
+     -1,
+     -1,
+     1,
+     1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     -1,
+     0,
+     1,
+     0,
+     0,
+     1,
+     1,
+     1,
+     1,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     0,
+     1,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     1,
+     -1,
+     -1,
+     -1,
+     -1,
+     -1,
+     1,
+     -1,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     1,
+     -1,
+     1,
+     1,
+     0,
+     1,
+     0,
+     1,
+     0,
+     -1,
+     -1,
+     -1,
+     -1,
+     0,
+     -1,
+     1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     0,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     -1,
+     0,
+     0,
+     0,
+     1,
+     1,
+     0,
+     1,
+     -1,
+     -1,
+     0,
+     -1,
+     -1,
+     1,
+     1,
+     1,
+     0,
+     -1,
+     0,
+     0,
+     1,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     0,
+     1,
+     0,
+     0,
+     1,
+     1,
+     1,
+     1,
+     1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     1,
+     1,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     1,
+     1,
+     0,
+     -1,
+     0,
+     0,
+     1,
+     0,
+     -1,
+     -1,
+     1,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     1,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     0,
+     0,
+     1,
+     -1,
+     0,
+     0,
+     0,
+     1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     1,
+     1,
+     0,
+     0,
+     -1,
+     -1,
+     1,
+     0,
+     -1,
+     1,
+     -1,
+     1,
+     0,
+     0,
+     -1,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     1,
+     -1,
+     -1,
+     0,
+     -1,
+     0,
+     0,
+     1,
+     -1,
+     0,
+     0,
+     0,
+     0,
+     -1,
+     -1,
+     -1,
+     0,
+     1,
+     1,
+     1,
+     -1,
+     -1,
+     1,
+     0,
+     -1
+   ],
+   "topic_sizes": {
+     "-1": 124,
+     "1": 60,
+     "0": 144
+   },
+   "topic_mapper": [
+     [
+       -1,
+       -1,
+       -1
+     ],
+     [
+       0,
+       0,
+       0
+     ],
+     [
+       1,
+       1,
+       1
+     ]
+   ],
+   "topic_labels": {
+     "-1": "-1_ataturk_milliyetcilerini_izmir_kars\u0131y\u0131z",
+     "0": "0_sancakl\u0131lar_turanc\u0131lar_k\u0131br\u0131s_kacakc\u0131l\u0131g\u0131",
+     "1": "1_milliyetcilerinin_turkculere_yapt\u0131g\u0131n\u0131z_bayrag\u0131n\u0131"
+   },
+   "custom_labels": null,
+   "_outliers": 1,
+   "topic_aspects": {}
+ }
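
In topics.json, `topic_sizes` records 124, 144, and 60 documents for topics -1, 0, and 1, while the README table lists 60, 124, and 144 against the same IDs. Recounting the `topics` array is a quick way to see which one matches the per-document assignments. The sketch below runs that check on toy data rather than on the committed file. The helper name `size_mismatches` is hypothetical.

```python
from collections import Counter


def size_mismatches(state: dict) -> dict:
    """Return {topic_id: (counted, recorded)} for topics whose sizes disagree."""
    counted = Counter(state["topics"])
    # JSON object keys are strings, so convert them back to integer topic IDs.
    recorded = {int(key): value for key, value in state["topic_sizes"].items()}
    return {
        topic: (counted.get(topic, 0), recorded.get(topic, 0))
        for topic in sorted(set(counted) | set(recorded))
        if counted.get(topic, 0) != recorded.get(topic, 0)
    }


# Toy state with the same shape as topics.json (not the committed data).
toy_state = {
    "topics": [-1, 0, 0, 1],
    "topic_sizes": {"-1": 1, "0": 2, "1": 1},
}
print(size_mismatches(toy_state))
```

To run it on the real file, load topics.json with `json.load` and pass the resulting dictionary to `size_mismatches`. An empty result means the recorded sizes agree with the assignments.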