RikkaBotan committed on
Commit e31a808 · verified · 1 Parent(s): fdfa18b

Upload 7 files
.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/RikkaBotan_Logo.png filter=lfs diff=lfs merge=lfs -text
+assets/SSE_Architecture.png filter=lfs diff=lfs merge=lfs -text
+assets/SSE_Logo.png filter=lfs diff=lfs merge=lfs -text
+assets/SSE_loss.png filter=lfs diff=lfs merge=lfs -text
+assets/SSE_ndcg.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,1081 @@
 ---
+language:
+- ja
 license: apache-2.0
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:15098874
+- loss:MatryoshkaLoss
+- loss:MultipleNegativesRankingLoss
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: SSE Retrieval MRL
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoClimateFEVER
+      type: NanoClimateFEVER
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.28
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.5
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.6
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.72
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.28
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.17999999999999997
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.14
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.096
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.11566666666666667
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.259
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.309
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.38366666666666666
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.31101912464080167
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.42077777777777775
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.23472725032298258
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoDBPedia
+      type: NanoDBPedia
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.64
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.9
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.92
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.98
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.64
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.5866666666666667
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.516
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.458
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.06160341544840008
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.16190481698320675
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.2178662941767401
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.3311868598508409
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.5596322526310974
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7651904761904761
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.39996237625484127
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoFEVER
+      type: NanoFEVER
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.34
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.6
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.68
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.82
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.34
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.14
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.08599999999999998
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.33
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.5566666666666668
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.6466666666666667
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.7866666666666667
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.5611230907066518
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.5003333333333334
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.49227582231970923
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoFiQA2018
+      type: NanoFiQA2018
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.28
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.4
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.48
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.64
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.28
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.16666666666666663
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.12400000000000003
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.088
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.17600000000000002
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.2701904761904762
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.30707936507936506
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.4077460317460318
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.32472050466088326
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.37310317460317455
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.26922263832673005
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoHotpotQA
+      type: NanoHotpotQA
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.52
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.6
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.64
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.72
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.52
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.26
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.17199999999999996
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.10800000000000001
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.26
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.39
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.43
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.54
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.4795069124741789
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.5758333333333333
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.41822608557151697
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoMSMARCO
+      type: NanoMSMARCO
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.22
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.36
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.44
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.6
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.22
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.11999999999999998
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.08800000000000002
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.06000000000000001
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.22
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.36
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.44
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.6
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.3845438350481858
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.3190793650793651
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.3335241736823179
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoNFCorpus
+      type: NanoNFCorpus
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.38
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.54
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.54
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.62
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.38
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.32666666666666666
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.26
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.214
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.021377385454146837
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.07632083549405319
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.08294525764762037
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.12132329911306272
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.2736049434105412
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.4543809523809523
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.10136232337644312
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoNQ
+      type: NanoNQ
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.24
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.4
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.56
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.68
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.24
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.13333333333333333
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.11200000000000002
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.07200000000000001
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.22
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.37
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.51
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.65
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.42175811202298474
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.3658015873015873
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.35721440136770855
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoQuoraRetrieval
+      type: NanoQuoraRetrieval
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.68
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.88
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.9
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.9
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.68
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.3399999999999999
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.21999999999999997
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.11599999999999998
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.5806666666666667
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.8140000000000001
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8453333333333333
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.866
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7785953864009594
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.775
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.7428386778628464
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoSCIDOCS
+      type: NanoSCIDOCS
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.36
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.56
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.7
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.74
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.36
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.26
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.21599999999999997
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.154
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.07466666666666667
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.16266666666666665
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.22266666666666665
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.31566666666666665
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.30260137759921313
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.4850238095238095
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.2192397003809301
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoArguAna
+      type: NanoArguAna
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.12
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.38
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.46
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.62
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.12
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.12666666666666665
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.09200000000000001
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.06200000000000001
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.12
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.38
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.46
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.62
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.35208645349040213
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.26857936507936503
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.2793008453049682
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoSciFact
+      type: NanoSciFact
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.52
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.66
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.74
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.78
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.52
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.23333333333333336
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.16
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.08599999999999998
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.485
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.635
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.725
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.76
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.6372055531156149
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.6100238095238094
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.5990326492390428
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: NanoTouche2020
+      type: NanoTouche2020
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.5102040816326531
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.9183673469387755
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.9387755102040817
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.9795918367346939
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.5102040816326531
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.5306122448979591
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.5142857142857142
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.4448979591836735
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.02749427230935509
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.09510475496599126
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.15543797830995626
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.26176487754214656
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.4730551485884738
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7036281179138323
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.35719386134567377
+      name: Cosine Map@100
+  - task:
+      type: nano-beir
+      name: Nano BEIR
+    dataset:
+      name: NanoBEIR mean
+      type: NanoBEIR_mean
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.39155416012558875
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.592182103610675
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.6614442700156987
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.7538147566718995
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.39155416012558875
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.26645735217163785
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.2118681318681319
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.15729984301412875
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.20711346717014628
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.3485272474590046
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.41169196629848837
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.5110785437116987
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.4507271303684606
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.5089811616954475
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.36954775425813163
+      name: Cosine Map@100
 ---
+
+![SSE](assets/SSE_Logo.png)
+
+# 🩵 SSE: Stable Static Embedding for Retrieval MRL, Japanese Version 🩵
+### **A lightweight, fast, and strong embedding model**
+
+**Performance at a glance**
+This model achieves **NDCG@10 = 0.4507** on NanoBEIR_ja (a Japanese document-retrieval benchmark).
+This surpasses other static embedding models such as [`static-embedding-japanese`](https://huggingface.co/hotchpotch/static-embedding-japanese) (0.4487).
+Moreover, it does so with **half the dimensionality** (512 vs. 1024).
+Thanks to the reduced dimensionality and **Separable Dynamic Tanh**, retrieval is **roughly 2x faster** in some environments.
+
+| Model | NanoBEIR NDCG@10 | Dimensions | Parameters | Speed advantage | License |
+|-------|------------------|------------|------------|-----------------|---------|
+| **SSE Retrieval MRL Japanese** | **0.4507** ✨ | **512** | **~17M** 🪽 | **~2x faster retrieval** (highly efficient) | Apache 2.0 |
+| `static-embedding-japanese` | 0.4487 | 1024 | ~34M | baseline | Apache 2.0 |
+
+---
+
+## 🩵 **Why choose SSE Retrieval MRL** 🩵
+
+✅ **Strong performance (NDCG@10) among small models (<35M parameters)**
+✅ **Only ~17M parameters**: about 43% smaller than the lightweight [ruri-v3-30m](https://huggingface.co/cl-nagoya/ruri-v3-30m)
+✅ **512-dimensional output**: expressive representations at **half the size** of [`static-embedding-japanese`](https://huggingface.co/hotchpotch/static-embedding-japanese) (1024 dimensions)
+✅ **Matryoshka-ready**: easily switch to 256/128/64/32 dimensions with only a gradual drop in quality
+✅ **Apache 2.0 license**: free for both commercial and personal use
+✅ **CPU-optimized**: runs fast on edge devices and constrained hardware
+
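+The Matryoshka switching above can be sketched without any library: keep the first `k` components of a unit-normalized embedding and re-normalize. The 8-dimensional toy vector below is illustrative; a real SSE embedding has 512 dimensions.
+
+```python
+import math
+
+def truncate_embedding(vec, k):
+    """Matryoshka-style truncation: keep the first k dims, then L2-renormalize."""
+    head = vec[:k]
+    norm = math.sqrt(sum(x * x for x in head))
+    return [x / norm for x in head]
+
+# A toy 8-dim "embedding"; a real SSE vector would be 512-dim.
+full = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
+small = truncate_embedding(full, 4)
+
+print(len(small))                           # 4
+print(round(sum(x * x for x in small), 6))  # 1.0 (unit norm again)
+```
+
+With recent sentence-transformers releases the same effect is available via the `truncate_dim` argument of `SentenceTransformer`.
+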
+---
+
+## 🩵 Model details 🩵
+
+| Property | Value |
+|----------|-------|
+| **Model type** | Sentence Transformer (SSE architecture) |
+| **Max sequence length** | unlimited |
+| **Output dimensions** | 512 (reducible down to 32 via Matryoshka!) |
+| **Similarity function** | cosine similarity |
+| **Language** | Japanese |
+| **License** | Apache 2.0 |
+
+The following tokenizer is used:
+
+hotchpotch/xlm-roberta-japanese-tokenizer
+
+```python
+SentenceTransformer(
+  (0): SSE(
+    (embedding): EmbeddingBag(32768, 512, mode='mean')
+    (dyt): SeparableDyT()
+  )
+)
+```
+
+![Architecture](assets/SSE_Architecture.png)
+
+---
+
+## 🩵 Mathematical background 🩵
+
+To improve the generalization of static embedding models, this model adopts an original architecture, **SSE: Stable Static Embedding**.
+SSE consists of an EmbeddingBag and Separable Tanh Normalization.
+Dynamic Tanh Normalization (DyT) enables magnitude-adaptive gradient flow in static embeddings. For an input dimension $x$, DyT computes
+$$
+y_k = c_k \tanh(a_k x_k + b_k)
+$$
+
+where $a$, $b$, and $c$ are learnable parameters.
+The gradient with respect to $x$ is then
+
+$$
+\frac{\partial y_k}{\partial x_k} = c_k a_k \, \mathrm{sech}^2(a_k x_k + b_k).
+$$
+
+For saturated dimensions $|x| > 1$, where
+
+$$
+|a_i x_i + b_i| \gg 1,
+$$
+
+the factor decays exponentially,
+
+$$
+\mathrm{sech}^2(z) \sim 4e^{-2|z|},
+$$
+
+so the gradient is suppressed:
+
+$$
+\partial y_i / \partial x_i \to 0.
+$$
+
+In contrast, for unsaturated dimensions $|x| \ll 1$,
+
+$$
+\mathrm{sech}^2(z) \approx 1,
+$$
+
+and the gradient stays nearly constant:
+
+$$
+\partial y_j / \partial x_j \approx c_j a_j.
+$$
+
+This magnitude-dependent gate attenuates the learning signal from noisy, large-magnitude dimensions while preserving gradient flow for dimensions that carry stable information. It acts as an implicit regularizer that improves the generalization of the representation space, without any explicit hyperparameters.
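+
+As a minimal numerical sketch of this gating (parameter values here are illustrative, not the trained ones), the DyT output and its gradient factor $c_k a_k \,\mathrm{sech}^2(a_k x_k + b_k)$ can be evaluated directly:
+
+```python
+import math
+
+def dyt(x, a=1.0, b=0.0, c=1.0):
+    """Dynamic Tanh for one dimension: y = c * tanh(a * x + b)."""
+    return c * math.tanh(a * x + b)
+
+def dyt_grad(x, a=1.0, b=0.0, c=1.0):
+    """dy/dx = c * a * sech^2(a * x + b), with sech(z) = 1 / cosh(z)."""
+    z = a * x + b
+    return c * a / math.cosh(z) ** 2
+
+# Unsaturated input: gradient stays near c * a = 1.
+print(dyt_grad(0.1))
+# Saturated input: gradient is exponentially suppressed toward 0.
+print(dyt_grad(5.0))
+```
+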
+
+---
+
+## 🩵 Evaluation results (NanoBEIR_ja) 🩵
+
+| Dataset | NDCG@10 | MRR@10 | MAP@100 |
+|---------|---------|--------|---------|
+| **NanoBEIR Mean** | **0.4507**✨ | **0.5090** | **0.3695** |
+| NanoClimateFEVER | 0.3110 | 0.4208 | 0.2347 |
+| NanoDBPedia | 0.5596 | 0.7652 | 0.4000 |
+| NanoFEVER | 0.5611 | 0.5003 | 0.4923 |
+| NanoFiQA2018 | 0.3247 | 0.3731 | 0.2692 |
+| NanoHotpotQA | 0.4795 | 0.5758 | 0.4182 |
+| NanoMSMARCO | 0.3845 | 0.3191 | 0.3335 |
+| NanoNFCorpus | 0.2736 | 0.4544 | 0.1014 |
+| NanoNQ | 0.4218 | 0.3658 | 0.3572 |
+| NanoQuoraRetrieval | **0.7786**✨ | **0.7750** | **0.7428** |
+| NanoSCIDOCS | 0.3026 | 0.4850 | 0.2192 |
+| NanoArguAna | 0.3521 | 0.2686 | 0.2793 |
+| NanoSciFact | 0.6372 | 0.6100 | 0.5990 |
+| NanoTouche2020 | 0.4731 | 0.7036 | 0.3572 |
+
+---
+
+## 🩵 Usage 🩵
+
+```python
+import torch
+from sentence_transformers import SentenceTransformer
+
+# Load the model (remote code enabled)
+model = SentenceTransformer(
+    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-ja",
+    trust_remote_code=True,
+    device="cuda" if torch.cuda.is_available() else "cpu",
+)
+
+# Sentences to embed
+sentences = [
+    "大規模言語モデルは学習により、高い推論能力を獲得することが可能である。",
+    "静的埋め込みモデルは、簡素なアーキテクチャにより、表現空間を高速に生成可能である。"
+]
+
+with torch.no_grad():
+    embeddings = model.encode(
+        sentences,
+        convert_to_tensor=True,
+        normalize_embeddings=True,
+        batch_size=32
+    )
+
+# Cosine similarity
+# cosine_sim = embeddings[0] @ embeddings[1].T
+cosine_sim = model.similarity(embeddings, embeddings)
+
+print("embeddings shape:", embeddings.shape)
+print("cosine similarity matrix:")
+print(cosine_sim)
+```
+
+---
+
+## 🩵 Retrieval example 🩵
+
+```python
+import torch
+from sentence_transformers import SentenceTransformer
+
+# Load the model (remote code enabled)
+model = SentenceTransformer(
+    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-ja",
+    trust_remote_code=True,
+    device="cuda" if torch.cuda.is_available() else "cpu",
+)
+
+# Inference
+query = "安定性静的埋め込みモデルとは何ですか?"
+sentences = [
+    "安定性静的埋め込みモデルは自己注意機構を必要としません。",
+    "安定性静的埋め込みモデルは高速に高精度な埋め込み表現を生成するためのモデルです。",
+    "自己注意機構はトークン間の関係性をスコア化する仕組みのことです",
+    "昨夜はアイドルの曲を聴きながらお菓子作りをしていました。",
+    "言語モデルは一般的に、次時刻のトークンを予測するという学習が行われます。",
+    "お気に入りのヘアアクセサリーを身に着けると、とてもテンションが上がるよね。",
+]
+
+with torch.no_grad():
+    embeddings = model.encode(
+        [query] + sentences,
+        convert_to_tensor=True,
+        normalize_embeddings=True,
+        batch_size=32
+    )
+
+print("embeddings shape:", embeddings.shape)
+
+# Cosine similarity between the query and each sentence
+similarities = model.similarity(embeddings[0], embeddings[1:])
+for i, similarity in enumerate(similarities[0].tolist()):
+    print(f"{similarity:.05f}: {sentences[i]}")
+```
+
+---
+
+## 🩵 Training hyperparameters 🩵
+
+#### Non-default settings
+
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 3072
+- `gradient_accumulation_steps`: 10
+- `learning_rate`: 0.1
+- `adam_epsilon`: 1e-10
+- `num_train_epochs`: 2
+- `lr_scheduler_type`: cosine
+- `warmup_ratio`: 0.02
+- `bf16`: True
+- `dataloader_num_workers`: 4
+- `batch_sampler`: no_duplicates
+
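+As a hedged sketch (not the author's training script), these settings map onto the sentence-transformers v3 training API roughly as follows; the output directory is illustrative:
+
+```python
+from sentence_transformers.training_args import (
+    SentenceTransformerTrainingArguments,
+    BatchSamplers,
+)
+
+# Hypothetical output directory; the other values mirror the list above.
+args = SentenceTransformerTrainingArguments(
+    output_dir="sse-retrieval-mrl-ja",
+    eval_strategy="steps",
+    per_device_train_batch_size=3072,
+    gradient_accumulation_steps=10,
+    learning_rate=0.1,
+    adam_epsilon=1e-10,
+    num_train_epochs=2,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.02,
+    bf16=True,
+    dataloader_num_workers=4,
+    batch_sampler=BatchSamplers.NO_DUPLICATES,
+)
+```
+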
+---
+
+## 🩵 Training datasets 🩵
+
+The following **14 datasets** were used:
+
+| Dataset |
+|---------|
+| `hpprc_emb__auto-wiki-nli-triplet` |
+| `hpprc_emb__jqara` |
+| `hpprc_emb__jagovfaqs` |
+| `hpprc_emb__jsquad` |
+| `hpprc_emb__jaquad` |
+| `hpprc_emb__mkqa-triplet` |
+| `hpprc_llmjp-kaken` |
+| `hpprc_msmarco_ja` |
+| `hpprc_emb__auto-wiki-qa-nemotron` |
+| `mldr_ja` |
+| `mrtydi_ja` |
+| `miracl_ja` |
+| `mmarco_ja` |
+| `mmarco_ja_hard` |
+
+Training uses **MatryoshkaLoss**.
+
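+The loss setup can be sketched with the sentence-transformers API (a hedged sketch, not the exact training code; the dimension list matches the Matryoshka sizes mentioned above):
+
+```python
+from sentence_transformers import SentenceTransformer
+from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+model = SentenceTransformer(
+    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-ja",
+    trust_remote_code=True,
+)
+
+# Wrap the in-batch-negatives ranking loss so it is applied at every
+# Matryoshka dimension, from the full 512 dims down to 32.
+base_loss = MultipleNegativesRankingLoss(model)
+loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[512, 256, 128, 64, 32])
+```
+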
+## 🩵 Training results 🩵
+
+![loss](assets/SSE_loss.png)
+
+![ndcg](assets/SSE_ndcg.png)
+
+## 🩵 Author: Rikka Botan (六花牡丹) 🩵
+
+An easygoing, affectionate apprentice researcher.
+Language models are my main research area.
+For work, speaking, or writing requests, please contact me below.
+
+X (Twitter):
+https://twitter.com/peony__snow
+
+![Logo](assets/RikkaBotan_Logo.png)
+
+## 🩵 Acknowledgments 🩵
+
+Part of the compute resources for training this model were provided by Saldra, Witness, and Lumina Logic Minds. I am grateful for their valuable support.
+
+This work uses sentence-transformers, Python, and PyTorch.
+My thanks to everyone who creates and maintains them.
+
+Above all, thank you for your interest in this model.
+
+ ## 🩵 Citations 🩵
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
SSE.py ADDED
@@ -0,0 +1,208 @@
+ """
+ coding = utf-8
+ Copyright 2026 Rikka Botan. All rights reserved
+ Licensed under "MIT License"
+ Stable Static Embedding official PyTorch implementation
+ """
+
+ from __future__ import annotations
+ import os
+ from pathlib import Path
+ from safetensors.torch import save_file as save_safetensors_file
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import numpy as np
+ from typing import Dict
+ from dataclasses import dataclass
+ from tokenizers import Tokenizer
+ from transformers import PreTrainedTokenizerFast
+ from sentence_transformers.models.InputModule import InputModule
+
+
+ class SeparableDyT(nn.Module):
+     def __init__(
+         self,
+         hidden_dim: int,
+         alpha_init: float = 0.5
+     ):
+         super().__init__()
+         self.alpha = nn.Parameter(alpha_init * torch.ones(hidden_dim))
+         self.beta = nn.Parameter(torch.ones(hidden_dim))
+         self.bias = nn.Parameter(torch.zeros(hidden_dim))
+
+     def forward(
+         self,
+         x: torch.Tensor
+     ) -> torch.Tensor:
+         x = self.beta * F.tanh(self.alpha * x + self.bias)
+         return x
+
+
+ class SSE(InputModule):
+     """
+     Stable Static Embedding (SSE)
+     StaticEmbedding-compatible Sentence-Transformers module
+     """
+
+     def __init__(
+         self,
+         tokenizer: Tokenizer | PreTrainedTokenizerFast,
+         vocab_size: int,
+         hidden_dim: int = 1024,
+         **kwargs,
+     ):
+         super().__init__()
+
+         if isinstance(tokenizer, PreTrainedTokenizerFast):
+             tokenizer = tokenizer._tokenizer
+         elif not isinstance(tokenizer, Tokenizer):
+             raise ValueError("Tokenizer must be a fast (Rust) tokenizer")
+
+         self.tokenizer: Tokenizer = tokenizer
+         self.tokenizer.no_padding()
+
+         self.embedding = nn.EmbeddingBag(vocab_size, hidden_dim)
+         self.dyt = SeparableDyT(hidden_dim)
+
+         self.embedding_dim = hidden_dim
+
+         # For model card compatibility
+         self.base_model = kwargs.get("base_model", None)
+
+     # Tokenization (StaticEmbedding-compatible)
+     def tokenize(
+         self,
+         texts: list[str],
+         **kwargs
+     ) -> dict[str, torch.Tensor]:
+         encodings = self.tokenizer.encode_batch(texts, add_special_tokens=False)
+         encodings_ids = [encoding.ids for encoding in encodings]
+
+         offsets = torch.from_numpy(
+             np.cumsum(
+                 [0] + [len(token_ids) for token_ids in encodings_ids[:-1]]
+             )
+         )
+         input_ids = torch.tensor(
+             [token_id for token_ids in encodings_ids for token_id in token_ids],
+             dtype=torch.long
+         )
+         return {
+             "input_ids": input_ids,
+             "offsets": offsets
+         }
+
+     # Forward
+     def forward(
+         self,
+         features: Dict[str, torch.Tensor],
+         **kwargs,
+     ) -> Dict[str, torch.Tensor]:
+         x = self.embedding(features["input_ids"], features["offsets"])
+         x = self.dyt(x)
+         features["sentence_embedding"] = x
+         return features
+
+     # Required APIs
+     def get_sentence_embedding_dimension(self) -> int:
+         return self.embedding_dim
+
+     @property
+     def max_seq_length(self) -> float:
+         return torch.inf
+
+     def save(
+         self,
+         output_path: str,
+         *args,
+         safe_serialization: bool = True,
+         **kwargs,
+     ) -> None:
+         os.makedirs(output_path, exist_ok=True)
+
+         if safe_serialization:
+             save_safetensors_file(
+                 self.state_dict(),
+                 os.path.join(output_path, "model.safetensors"),
+             )
+         else:
+             torch.save(
+                 self.state_dict(),
+                 os.path.join(output_path, "pytorch_model.bin"),
+             )
+
+         self.tokenizer.save(
+             str(Path(output_path) / "tokenizer.json")
+         )
+
+     @classmethod
+     def load(
+         cls,
+         model_name_or_path: str,
+         **kwargs,
+     ):
+         allowed_keys = {
+             "cache_dir",
+             "local_files_only",
+             "force_download",
+         }
+         filtered_kwargs = {
+             k: v for k, v in kwargs.items() if k in allowed_keys
+         }
+
+         tokenizer_path = cls.load_file_path(
+             model_name_or_path,
+             filename="tokenizer.json",
+             **filtered_kwargs,
+         )
+         tokenizer = Tokenizer.from_file(tokenizer_path)
+
+         weights = cls.load_torch_weights(
+             model_name_or_path=model_name_or_path,
+             **filtered_kwargs,
+         )
+
+         hidden_dim = weights["embedding.weight"].size(1)
+         vocab_size = weights["embedding.weight"].size(0)
+
+         model = cls(
+             tokenizer=tokenizer,
+             vocab_size=vocab_size,
+             hidden_dim=hidden_dim,
+         )
+
+         model.load_state_dict(weights)
+         return model
+
+
+ @dataclass
+ class SSESforzandoConfig:
+     hidden_dim: int = 512
+     vocab_size: int = 30522
+
+
+ @dataclass
+ class SSEForzandoConfig:
+     hidden_dim: int = 384
+     vocab_size: int = 30522
+
+
+ @dataclass
+ class SSESforzandoBiConfig:
+     hidden_dim: int = 512
+     vocab_size: int = 96867
+
+
+ @dataclass
+ class SSEForzandoBiConfig:
+     hidden_dim: int = 384
+     vocab_size: int = 96867
+
+
+ @dataclass
+ class SSESforzandoJConfig:
+     hidden_dim: int = 512
+     vocab_size: int = 32768
+
+
+ @dataclass
+ class SSEForzandoJConfig:
+     hidden_dim: int = 384
+     vocab_size: int = 32768
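The `tokenize` method in SSE.py flattens the ragged batch into one `input_ids` vector plus an `offsets` vector of prefix sums, which is exactly the format `nn.EmbeddingBag` consumes. A minimal pure-Python sketch of that offsets logic (plain lists instead of tensors, hypothetical helper name):

```python
def flatten_with_offsets(batch):
    """Flatten a ragged batch of token-id lists into (input_ids, offsets).

    offsets[i] is where sequence i starts in the flat list -- the cumulative
    sum of the lengths of all preceding sequences.
    """
    offsets, total = [], 0
    for token_ids in batch:
        offsets.append(total)
        total += len(token_ids)
    input_ids = [tid for token_ids in batch for tid in token_ids]
    return input_ids, offsets

ids, offs = flatten_with_offsets([[5, 6, 7], [8], [9, 10]])
# ids  -> [5, 6, 7, 8, 9, 10]
# offs -> [0, 3, 4]
```

This mirrors the `np.cumsum([0] + lengths[:-1])` expression in `tokenize`; EmbeddingBag then mean-pools each `[offsets[i], offsets[i+1])` slice into one sentence vector.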
assets/RikkaBotan_Logo.png ADDED

Git LFS Details

  • SHA256: 9a041048155db5aad3010880943f95b632bb8754ff86e2c4a4f696697a4c2bce
  • Pointer size: 131 Bytes
  • Size of remote file: 724 kB
assets/SSE_Architecture.png ADDED

Git LFS Details

  • SHA256: 11482c498cbf27183f20969f900dbe7294281e018652caceab53e72fe3fa5c28
  • Pointer size: 131 Bytes
  • Size of remote file: 309 kB
assets/SSE_Logo.png ADDED

Git LFS Details

  • SHA256: e63bf0f4d9efd8f1a81065392c1f1aec196ab128a340d6cf8cedb609f6fc55ca
  • Pointer size: 132 Bytes
  • Size of remote file: 1.76 MB
assets/SSE_loss.png ADDED

Git LFS Details

  • SHA256: f532768d8a653999ebdf574a4642a742047f2dfa7121fe95cbd31099b2f15a6b
  • Pointer size: 131 Bytes
  • Size of remote file: 245 kB
assets/SSE_ndcg.png ADDED

Git LFS Details

  • SHA256: 226203e16787c0aa3971c29606b63e8f595d2888f1ced8ac67799ecb79af9446
  • Pointer size: 131 Bytes
  • Size of remote file: 258 kB