oneryalcin committed · Commit 5a677dd · verified · 1 Parent(s): b32a9d1

Training in progress, step 285

README.md CHANGED
@@ -8,39 +8,49 @@ tags:
8
  - feature-extraction
9
  - generated_from_trainer
10
  - dataset_size:5832592
 
11
  - loss:MultipleNegativesRankingLoss
12
  widget:
13
  - source_sentence: crushing middlegame sacrifice short
14
  sentences:
15
- - themes crushing endgame short moves d8e8 f7g7 a8e4 g7g4
16
- - themes endgame master mate mateIn1 oneMove moves d2c2 h5d1
17
- - themes crushing middlegame sacrifice short moves g6g4 e1e6 f7e6 d2h6
 
 
 
18
  - source_sentence: crushing endgame long
19
  sentences:
20
- - themes crushing endgame long moves e2c2 f5g5 c2g2 g5h6 g2h2 h6g7
21
- - themes advantage endgame long moves d1d5 h4c4 f5d3 c4d5 d3d5 d6d5
22
- - themes endgame master mate mateIn1 oneMove moves c7c6 d6b8
23
- - source_sentence: crushing endgame long
 
 
 
24
  sentences:
25
- - themes advancedPawn crushing endgame hangingPiece long moves e3c1 f8g7 c1b2 g7f6
26
- c2d3 g3g2
27
- - themes crushing endgame long moves f5g4 b8f8 h6h5 e2f4 e4f4 g3f4
28
- - themes backRankMate master mate mateIn2 middlegame sacrifice short moves a3b2
29
- c7d8 f6d8 e3e8
30
- - source_sentence: crushing deflection endgame pawnEndgame veryLong zugzwang
31
  sentences:
32
- - themes backRankMate deflection endgame long mate mateIn3 moves a6c6 f7f2 g1h1
33
- f2f1 c1f1 f8f1
34
- - themes crushing deflection endgame long moves g8f7 c5f8 f7g6 f8e8 c7f7 e8d8
35
- - themes crushing deflection endgame pawnEndgame veryLong zugzwang moves f2f3 d3d4
36
- h2h3 g6g5 f4g5 d4e5 f3e3 e5f5
37
- - source_sentence: capturingDefender crushing endgame long sacrifice
 
 
 
38
  sentences:
39
- - themes crushing fork middlegame short moves d4f4 h4g6 f8f7 g6f4
40
- - themes capturingDefender crushing endgame long sacrifice moves d4a4 g7g3 f3g3
41
- h5h4 b6b3 h4h2
42
- - themes attraction crushing endgame fork long sacrifice moves a6f1 d7f7 g8f7 d6d7
43
- f7f8 d7c8
44
  pipeline_tag: sentence-similarity
45
  library_name: sentence-transformers
46
  metrics:
@@ -54,7 +64,7 @@ metrics:
54
  - cosine_mrr@10
55
  - cosine_map@100
56
  model-index:
57
- - name: Static chess embedding (256d) -- themes/openings <-> positions
58
  results:
59
  - task:
60
  type: information-retrieval
@@ -64,37 +74,71 @@ model-index:
64
  type: chess-ir
65
  metrics:
66
  - type: cosine_accuracy@1
67
- value: 0.025
68
  name: Cosine Accuracy@1
69
  - type: cosine_accuracy@10
70
- value: 0.125
71
  name: Cosine Accuracy@10
72
  - type: cosine_precision@1
73
- value: 0.025
74
  name: Cosine Precision@1
75
  - type: cosine_precision@10
76
- value: 0.016
77
  name: Cosine Precision@10
78
  - type: cosine_recall@1
79
- value: 0.008333333333333333
80
  name: Cosine Recall@1
81
  - type: cosine_recall@10
82
- value: 0.05333333333333333
83
  name: Cosine Recall@10
84
  - type: cosine_ndcg@10
85
- value: 0.03923902062478621
86
  name: Cosine Ndcg@10
87
  - type: cosine_mrr@10
88
- value: 0.053103174603174604
89
  name: Cosine Mrr@10
90
  - type: cosine_map@100
91
- value: 0.03190843674305716
92
  name: Cosine Map@100
93
  ---
94
 
95
- # Static chess embedding (256d) -- themes/openings <-> positions
96
 
97
- This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for retrieval.
98
 
99
  ## Model Details
100
 
@@ -102,7 +146,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps
102
  - **Model Type:** Sentence Transformer
103
  <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
104
  - **Maximum Sequence Length:** inf tokens
105
- - **Output Dimensionality:** 256 dimensions
106
  - **Similarity Function:** Cosine Similarity
107
  - **Supported Modality:** Text
108
  <!-- - **Training Dataset:** Unknown -->
@@ -140,22 +184,22 @@ from sentence_transformers import SentenceTransformer
140
  model = SentenceTransformer("oneryalcin/static-embedding-chess")
141
  # Run inference
142
  queries = [
143
- 'capturingDefender crushing endgame long sacrifice',
144
  ]
145
  documents = [
146
- 'themes capturingDefender crushing endgame long sacrifice moves d4a4 g7g3 f3g3 h5h4 b6b3 h4h2',
147
- 'themes crushing fork middlegame short moves d4f4 h4g6 f8f7 g6f4',
148
- 'themes attraction crushing endgame fork long sacrifice moves a6f1 d7f7 g8f7 d6d7 f7f8 d7c8',
149
  ]
150
  query_embeddings = model.encode_query(queries)
151
  document_embeddings = model.encode_document(documents)
152
  print(query_embeddings.shape, document_embeddings.shape)
153
- # [1, 256] [3, 256]
154
 
155
  # Get the similarity scores for the embeddings
156
  similarities = model.similarity(query_embeddings, document_embeddings)
157
  print(similarities)
158
- # tensor([[ 0.9505, -0.1987, 0.4045]])
159
  ```
160
  <!--
161
  ### Direct Usage (Transformers)
@@ -187,20 +231,20 @@ You can finetune this model on your own dataset.
187
 
188
  #### Information Retrieval
189
 
190
- * Dataset: `chess-ir`
191
  * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)
192
 
193
- | Metric | Value |
194
- |:--------------------|:-----------|
195
- | cosine_accuracy@1 | 0.025 |
196
- | cosine_accuracy@10 | 0.125 |
197
- | cosine_precision@1 | 0.025 |
198
- | cosine_precision@10 | 0.016 |
199
- | cosine_recall@1 | 0.0083 |
200
- | cosine_recall@10 | 0.0533 |
201
- | **cosine_ndcg@10** | **0.0392** |
202
- | cosine_mrr@10 | 0.0531 |
203
- | cosine_map@100 | 0.0319 |
204
 
205
  <!--
206
  ## Bias, Risks and Limitations
@@ -223,29 +267,36 @@ You can finetune this model on your own dataset.
223
  * Size: 5,832,592 training samples
224
  * Columns: <code>anchor</code> and <code>positive</code>
225
  * Approximate statistics based on the first 100 samples:
226
- | | anchor | positive |
227
- |:---------|:------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|
228
- | type | string | string |
229
- | modality | text | text |
230
- | details | <ul><li>min: 14 characters</li><li>mean: 45.72 characters</li><li>max: 107 characters</li></ul> | <ul><li>min: 47 characters</li><li>mean: 84.03 characters</li><li>max: 151 characters</li></ul> |
231
  * Samples:
232
- | anchor | positive |
233
- |:-----------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|
234
- | <code>crushing endgame fork short</code> | <code>themes crushing endgame fork short moves f7f6 g5e6 g7h6 e6c5</code> |
235
- | <code>crushing discoveredAttack kingsideAttack middlegame short</code> | <code>themes crushing discoveredAttack kingsideAttack middlegame short moves e4g3 f3g3 f2g3 h5e2</code> |
236
- | <code>crushing middlegame short</code> | <code>themes crushing middlegame short moves d7c8 e2g4 c8c7 c3b5</code> |
237
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
238
  ```json
239
  {
240
- "scale": 20.0,
241
- "similarity_fct": "cos_sim",
242
- "gather_across_devices": false,
243
- "directions": [
244
- "query_to_doc"
 
 
245
  ],
246
- "partition_mode": "joint",
247
- "hardness_mode": null,
248
- "hardness_strength": 0.0
 
 
 
 
 
249
  }
250
  ```
251
 
@@ -254,11 +305,9 @@ You can finetune this model on your own dataset.
254
 
255
  - `per_device_train_batch_size`: 2048
256
  - `num_train_epochs`: 1
257
- - `max_steps`: 500
258
  - `learning_rate`: 0.05
259
  - `warmup_steps`: 0.1
260
  - `weight_decay`: 0.01
261
- - `bf16`: True
262
  - `per_device_eval_batch_size`: 2048
263
  - `push_to_hub`: True
264
  - `hub_model_id`: oneryalcin/static-embedding-chess
@@ -270,7 +319,7 @@ You can finetune this model on your own dataset.
270
 
271
  - `per_device_train_batch_size`: 2048
272
  - `num_train_epochs`: 1
273
- - `max_steps`: 500
274
  - `learning_rate`: 0.05
275
  - `lr_scheduler_type`: linear
276
  - `lr_scheduler_kwargs`: None
@@ -286,7 +335,7 @@ You can finetune this model on your own dataset.
286
  - `average_tokens_across_devices`: True
287
  - `max_grad_norm`: 1.0
288
  - `label_smoothing_factor`: 0.0
289
- - `bf16`: True
290
  - `fp16`: False
291
  - `bf16_full_eval`: False
292
  - `fp16_full_eval`: False
@@ -371,82 +420,32 @@ You can finetune this model on your own dataset.
371
  </details>
372
 
373
  ### Training Logs
374
- | Epoch | Step | Training Loss | chess-ir_cosine_ndcg@10 |
375
- |:------:|:----:|:-------------:|:-----------------------:|
376
- | -1 | -1 | - | 0.0140 |
377
- | 0.0004 | 1 | 2.5397 | - |
378
- | 0.0018 | 5 | 2.5232 | - |
379
- | 0.0035 | 10 | 2.4279 | - |
380
- | 0.0053 | 15 | 2.2911 | - |
381
- | 0.0070 | 20 | 2.0382 | - |
382
- | 0.0088 | 25 | 1.8031 | - |
383
- | 0.0105 | 30 | 1.6873 | - |
384
- | 0.0123 | 35 | 1.5444 | - |
385
- | 0.0140 | 40 | 1.4872 | - |
386
- | 0.0158 | 45 | 1.4555 | - |
387
- | 0.0176 | 50 | 1.4493 | 0.0277 |
388
- | 0.0193 | 55 | 1.4075 | - |
389
- | 0.0211 | 60 | 1.4012 | - |
390
- | 0.0228 | 65 | 1.4055 | - |
391
- | 0.0246 | 70 | 1.3977 | - |
392
- | 0.0263 | 75 | 1.3597 | - |
393
- | 0.0281 | 80 | 1.3765 | - |
394
- | 0.0298 | 85 | 1.3657 | - |
395
- | 0.0316 | 90 | 1.3138 | - |
396
- | 0.0334 | 95 | 1.3596 | - |
397
- | 0.0351 | 100 | 1.3428 | 0.0335 |
398
- | 0.0369 | 105 | 1.3302 | - |
399
- | 0.0386 | 110 | 1.3281 | - |
400
- | 0.0404 | 115 | 1.3520 | - |
401
- | 0.0421 | 120 | 1.3127 | - |
402
- | 0.0439 | 125 | 1.3362 | - |
403
- | 0.0456 | 130 | 1.3174 | - |
404
- | 0.0474 | 135 | 1.3103 | - |
405
- | 0.0492 | 140 | 1.3428 | - |
406
- | 0.0509 | 145 | 1.2886 | - |
407
- | 0.0527 | 150 | 1.2895 | 0.0345 |
408
- | 0.0544 | 155 | 1.3418 | - |
409
- | 0.0562 | 160 | 1.3498 | - |
410
- | 0.0579 | 165 | 1.3033 | - |
411
- | 0.0597 | 170 | 1.2958 | - |
412
- | 0.0614 | 175 | 1.3081 | - |
413
- | 0.0632 | 180 | 1.3154 | - |
414
- | 0.0650 | 185 | 1.3129 | - |
415
- | 0.0667 | 190 | 1.3124 | - |
416
- | 0.0685 | 195 | 1.3237 | - |
417
- | 0.0702 | 200 | 1.3051 | 0.0451 |
418
- | 0.0720 | 205 | 1.2801 | - |
419
- | 0.0737 | 210 | 1.3404 | - |
420
- | 0.0755 | 215 | 1.2916 | - |
421
- | 0.0772 | 220 | 1.2981 | - |
422
- | 0.0790 | 225 | 1.3321 | - |
423
- | 0.0808 | 230 | 1.3369 | - |
424
- | 0.0825 | 235 | 1.3059 | - |
425
- | 0.0843 | 240 | 1.3213 | - |
426
- | 0.0860 | 245 | 1.3127 | - |
427
- | 0.0878 | 250 | 1.2801 | 0.0374 |
428
- | 0.0895 | 255 | 1.2940 | - |
429
- | 0.0913 | 260 | 1.3423 | - |
430
- | 0.0930 | 265 | 1.2860 | - |
431
- | 0.0948 | 270 | 1.3022 | - |
432
- | 0.0966 | 275 | 1.3040 | - |
433
- | 0.0983 | 280 | 1.2921 | - |
434
- | 0.1001 | 285 | 1.2940 | - |
435
- | 0.1018 | 290 | 1.3064 | - |
436
- | 0.1036 | 295 | 1.3042 | - |
437
- | 0.1053 | 300 | 1.3058 | 0.0392 |
438
 
439
 
440
  ### Training Time
441
- - **Training**: 25.3 seconds
442
- - **Evaluation**: 0.3 seconds
443
- - **Total**: 25.5 seconds
444
 
445
  ### Framework Versions
446
  - Python: 3.12.10
447
  - Sentence Transformers: 5.5.0
448
- - Transformers: 5.8.1
449
- - PyTorch: 2.12.0+cu130
450
  - Accelerate: 1.13.0
451
  - Datasets: 4.8.5
452
  - Tokenizers: 0.22.2
@@ -468,6 +467,18 @@ You can finetune this model on your own dataset.
468
  }
469
  ```
470
 
471
  #### MultipleNegativesRankingLoss
472
  ```bibtex
473
  @misc{oord2019representationlearningcontrastivepredictive,
 
8
  - feature-extraction
9
  - generated_from_trainer
10
  - dataset_size:5832592
11
+ - loss:MatryoshkaLoss
12
  - loss:MultipleNegativesRankingLoss
13
  widget:
14
  - source_sentence: crushing middlegame sacrifice short
15
  sentences:
16
+ - themes advantage middlegame short moves f4f7 c4d5 f7d5 b3d5 f4f7+c4d5 c4d5+f7d5
17
+ f7d5+b3d5
18
+ - themes advantage fork middlegame short opening Four Knights Game Four Knights
19
+ Game Italian Variation moves c8f5 d5e7 g8h8 e7f5 c8f5+d5e7 d5e7+g8h8 g8h8+e7f5
20
+ - themes crushing middlegame sacrifice short moves g6g4 e1e6 f7e6 d2h6 g6g4+e1e6
21
+ e1e6+f7e6 f7e6+d2h6
22
  - source_sentence: crushing endgame long
23
  sentences:
24
+ - themes crushing endgame long moves e2c2 f5g5 c2g2 g5h6 g2h2 h6g7 e2c2+f5g5 f5g5+c2g2
25
+ c2g2+g5h6 g5h6+g2h2 g2h2+h6g7
26
+ - themes crushing endgame fork hangingPiece long moves c7c3 b2c3 d5f7 g5g7 f7g7
27
+ f8g7 c7c3+b2c3 b2c3+d5f7 d5f7+g5g7 g5g7+f7g7 f7g7+f8g7
28
+ - themes crushing intermezzo middlegame short moves c5b4 d1d3 f6e7 a3b4 c5b4+d1d3
29
+ d1d3+f6e7 f6e7+a3b4
30
+ - source_sentence: crushing endgame fork short
31
  sentences:
32
+ - themes crushing endgame rookEndgame short skewer moves b4b3 h7h8 f8g7 h8b8 b4b3+h7h8
33
+ h7h8+f8g7 f8g7+h8b8
34
+ - themes crushing endgame fork short moves f2f1 f3d2 f1e2 d2c4 f2f1+f3d2 f3d2+f1e2
35
+ f1e2+d2c4
36
+ - themes mate mateIn1 middlegame oneMove moves d7d6 g3g7 d7d6+g3g7
37
+ - source_sentence: crushing fork middlegame veryLong
38
  sentences:
39
+ - themes crushing endgame fork master short moves f7f5 a6g6 g5g6 h4g6 f7f5+a6g6
40
+ a6g6+g5g6 g5g6+h4g6
41
+ - themes attraction discoveredCheck doubleCheck long mate mateIn3 opening operaMate
42
+ sacrifice opening Bishops Opening Bishops Opening Ponziani Gambit moves h8g8 f6d8
43
+ e8d8 d2g5 d8e8 d1d8 h8g8+f6d8 f6d8+e8d8 e8d8+d2g5 d2g5+d8e8 d8e8+d1d8
44
+ - themes crushing fork middlegame veryLong moves h6h7 e8h5 f3g3 c5e3 h7h8q e3f4
45
+ g3g2 h5g4 g2h1 f4d2 a1g1 g4f3 h6h7+e8h5 e8h5+f3g3 f3g3+c5e3 c5e3+h7h8q h7h8q+e3f4
46
+ e3f4+g3g2 g3g2+h5g4 h5g4+g2h1 g2h1+f4d2 f4d2+a1g1 a1g1+g4f3
47
+ - source_sentence: endgame mate mateIn2 pillsburysMate short
48
  sentences:
49
+ - themes bishopEndgame crushing defensiveMove endgame master short moves g3g4 h5h4
50
+ f4g5 h6g5 g3g4+h5h4 h5h4+f4g5 f4g5+h6g5
51
+ - themes endgame mate mateIn2 pillsburysMate short moves c4e3 b5b8 f5c8 b8c8 c4e3+b5b8
52
+ b5b8+f5c8 f5c8+b8c8
53
+ - themes endgame mate mateIn1 oneMove moves e5f4 g3g1 e5f4+g3g1
54
  pipeline_tag: sentence-similarity
55
  library_name: sentence-transformers
56
  metrics:
 
64
  - cosine_mrr@10
65
  - cosine_map@100
66
  model-index:
67
+ - name: Static chess embedding (512d) -- themes/openings <-> positions
68
  results:
69
  - task:
70
  type: information-retrieval
 
74
  type: chess-ir
75
  metrics:
76
  - type: cosine_accuracy@1
77
+ value: 0.02
78
  name: Cosine Accuracy@1
79
  - type: cosine_accuracy@10
80
+ value: 0.135
81
  name: Cosine Accuracy@10
82
  - type: cosine_precision@1
83
+ value: 0.02
84
  name: Cosine Precision@1
85
  - type: cosine_precision@10
86
+ value: 0.0175
87
  name: Cosine Precision@10
88
  - type: cosine_recall@1
89
+ value: 0.006666666666666666
90
  name: Cosine Recall@1
91
  - type: cosine_recall@10
92
+ value: 0.05833333333333333
93
  name: Cosine Recall@10
94
  - type: cosine_ndcg@10
95
+ value: 0.040260232965004236
96
  name: Cosine Ndcg@10
97
  - type: cosine_mrr@10
98
+ value: 0.05090277777777777
99
  name: Cosine Mrr@10
100
  - type: cosine_map@100
101
+ value: 0.03468285594907049
102
+ name: Cosine Map@100
103
+ - task:
104
+ type: information-retrieval
105
+ name: Information Retrieval
106
+ dataset:
107
+ name: chess ir tokens
108
+ type: chess-ir-tokens
109
+ metrics:
110
+ - type: cosine_accuracy@1
111
+ value: 0.1111111111111111
112
+ name: Cosine Accuracy@1
113
+ - type: cosine_accuracy@10
114
+ value: 0.30158730158730157
115
+ name: Cosine Accuracy@10
116
+ - type: cosine_precision@1
117
+ value: 0.1111111111111111
118
+ name: Cosine Precision@1
119
+ - type: cosine_precision@10
120
+ value: 0.0835978835978836
121
+ name: Cosine Precision@10
122
+ - type: cosine_recall@1
123
+ value: 0.008191309640952804
124
+ name: Cosine Recall@1
125
+ - type: cosine_recall@10
126
+ value: 0.03797928598263959
127
+ name: Cosine Recall@10
128
+ - type: cosine_ndcg@10
129
+ value: 0.0963937043281825
130
+ name: Cosine Ndcg@10
131
+ - type: cosine_mrr@10
132
+ value: 0.16048962794994542
133
+ name: Cosine Mrr@10
134
+ - type: cosine_map@100
135
+ value: 0.05480807151213741
136
  name: Cosine Map@100
137
  ---
138
 
139
+ # Static chess embedding (512d) -- themes/openings <-> positions
140
 
141
+ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for retrieval.
142
 
143
  ## Model Details
144
 
 
146
  - **Model Type:** Sentence Transformer
147
  <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
148
  - **Maximum Sequence Length:** inf tokens
149
+ - **Output Dimensionality:** 512 dimensions
150
  - **Similarity Function:** Cosine Similarity
151
  - **Supported Modality:** Text
152
  <!-- - **Training Dataset:** Unknown -->
 
184
  model = SentenceTransformer("oneryalcin/static-embedding-chess")
185
  # Run inference
186
  queries = [
187
+ 'endgame mate mateIn2 pillsburysMate short',
188
  ]
189
  documents = [
190
+ 'themes endgame mate mateIn2 pillsburysMate short moves c4e3 b5b8 f5c8 b8c8 c4e3+b5b8 b5b8+f5c8 f5c8+b8c8',
191
+ 'themes bishopEndgame crushing defensiveMove endgame master short moves g3g4 h5h4 f4g5 h6g5 g3g4+h5h4 h5h4+f4g5 f4g5+h6g5',
192
+ 'themes endgame mate mateIn1 oneMove moves e5f4 g3g1 e5f4+g3g1',
193
  ]
194
  query_embeddings = model.encode_query(queries)
195
  document_embeddings = model.encode_document(documents)
196
  print(query_embeddings.shape, document_embeddings.shape)
197
+ # [1, 512] [3, 512]
198
 
199
  # Get the similarity scores for the embeddings
200
  similarities = model.similarity(query_embeddings, document_embeddings)
201
  print(similarities)
202
+ # tensor([[ 0.8014, -0.0485, 0.0709]])
203
  ```
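The documents in the updated example follow a visible pattern: the theme list, then the move list, then each pair of consecutive moves joined by `+`. The sketch below reproduces that layout. It is inferred only from the samples shown in this diff; the name `build_document` is hypothetical and is not part of the repository or of sentence-transformers, and the optional `opening ...` segment seen in some widget samples is not handled.

```python
def build_document(themes, moves):
    """Format a puzzle as 'themes ... moves ... <consecutive move pairs>'.

    Hypothetical helper: the layout is inferred from the README samples,
    not taken from the actual preprocessing code.
    """
    # Each move is paired with the move that follows it, e.g. 'e5f4+g3g1'.
    pairs = [f"{a}+{b}" for a, b in zip(moves, moves[1:])]
    return " ".join(["themes", *themes, "moves", *moves, *pairs])


doc = build_document(
    ["endgame", "mate", "mateIn1", "oneMove"],
    ["e5f4", "g3g1"],
)
print(doc)
# themes endgame mate mateIn1 oneMove moves e5f4 g3g1 e5f4+g3g1
```

The printed string matches the third document in the example above, which suggests the pair tokens are derived from the move list rather than stored separately.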
204
  <!--
205
  ### Direct Usage (Transformers)
 
231
 
232
  #### Information Retrieval
233
 
234
+ * Datasets: `chess-ir` and `chess-ir-tokens`
235
  * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)
236
 
237
+ | Metric | chess-ir | chess-ir-tokens |
238
+ |:--------------------|:-----------|:----------------|
239
+ | cosine_accuracy@1 | 0.02 | 0.1111 |
240
+ | cosine_accuracy@10 | 0.135 | 0.3016 |
241
+ | cosine_precision@1 | 0.02 | 0.1111 |
242
+ | cosine_precision@10 | 0.0175 | 0.0836 |
243
+ | cosine_recall@1 | 0.0067 | 0.0082 |
244
+ | cosine_recall@10 | 0.0583 | 0.038 |
245
+ | **cosine_ndcg@10** | **0.0403** | **0.0964** |
246
+ | cosine_mrr@10 | 0.0509 | 0.1605 |
247
+ | cosine_map@100 | 0.0347 | 0.0548 |
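For readers unfamiliar with the metric names in this table, here is a minimal sketch of the conventional per-query definitions of accuracy@k, precision@k, recall@k and reciprocal rank. It is not the `InformationRetrievalEvaluator` source, whose details may differ; the function name and the toy document ids are assumptions for illustration.

```python
def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Per-query accuracy@k, precision@k, recall@k and reciprocal rank@k."""
    top_k = ranked_ids[:k]
    hits = [doc_id for doc_id in top_k if doc_id in relevant_ids]

    # Reciprocal rank of the first relevant document in the top k, else 0.
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, start=1):
        if doc_id in relevant_ids:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if hits else 0.0,          # any relevant doc in top k
        "precision": len(hits) / k,                # share of top k that is relevant
        "recall": len(hits) / len(relevant_ids),   # share of relevant docs found
        "mrr": reciprocal_rank,
    }


# Toy query: three relevant documents, one of them ranked second.
metrics = retrieval_metrics(["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"}, k=3)
print(metrics)
```

The reported values are averages of such per-query numbers. In the table, recall@1 on `chess-ir` (0.0067) is one third of accuracy@1 (0.02), which would be consistent with about three relevant documents per query; the diff does not state the corpus layout, so treat that as a guess.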
248
 
249
  <!--
250
  ## Bias, Risks and Limitations
 
267
  * Size: 5,832,592 training samples
268
  * Columns: <code>anchor</code> and <code>positive</code>
269
  * Approximate statistics based on the first 100 samples:
270
+ | | anchor | positive |
271
+ |:---------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
272
+ | type | string | string |
273
+ | modality | text | text |
274
+ | details | <ul><li>min: 14 characters</li><li>mean: 45.72 characters</li><li>max: 107 characters</li></ul> | <ul><li>min: 61 characters</li><li>mean: 121.98 characters</li><li>max: 233 characters</li></ul> |
275
  * Samples:
276
+ | anchor | positive |
277
+ |:-----------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|
278
+ | <code>crushing endgame fork short</code> | <code>themes crushing endgame fork short moves f7f6 g5e6 g7h6 e6c5 f7f6+g5e6 g5e6+g7h6 g7h6+e6c5</code> |
279
+ | <code>crushing discoveredAttack kingsideAttack middlegame short</code> | <code>themes crushing discoveredAttack kingsideAttack middlegame short moves e4g3 f3g3 f2g3 h5e2 e4g3+f3g3 f3g3+f2g3 f2g3+h5e2</code> |
280
+ | <code>crushing middlegame short</code> | <code>themes crushing middlegame short moves d7c8 e2g4 c8c7 c3b5 d7c8+e2g4 e2g4+c8c7 c8c7+c3b5</code> |
281
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
282
  ```json
283
  {
284
+ "loss": "MultipleNegativesRankingLoss",
285
+ "matryoshka_dims": [
286
+ 512,
287
+ 256,
288
+ 128,
289
+ 64,
290
+ 32
291
  ],
292
+ "matryoshka_weights": [
293
+ 1,
294
+ 1,
295
+ 1,
296
+ 1,
297
+ 1
298
+ ],
299
+ "n_dims_per_step": -1
300
  }
301
  ```
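The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss` with five dimensions and unit weights. As a rough illustration of what that combination computes, the dependency-free sketch below truncates each embedding to a prefix, renormalizes it, applies an in-batch-negatives cross-entropy, and sums the weighted results. It is a simplified reading of the technique, not the library's code; all function names are hypothetical, and `scale=20.0` is borrowed from the previous configuration in this diff because the new one does not list it.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against zero vectors
    return [x / norm for x in head]


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; other rows act as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Inputs are unit length, so the dot product is the cosine similarity.
        logits = [scale * sum(a * p for a, p in zip(anchor, pos)) for pos in positives]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # positive for anchor i sits at index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss evaluated at each truncated dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        a = [truncate_and_normalize(v, dim) for v in anchors]
        p = [truncate_and_normalize(v, dim) for v in positives]
        total += weight * in_batch_ranking_loss(a, p)
    return total


# Toy 4-dimensional batch standing in for the real 512-dimensional embeddings.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned < swapped)
```

Because the total is a sum over dimensions, training-loss values logged under this configuration are not directly comparable with those from a single-dimension run. The same `truncate_and_normalize` step is also how a shorter prefix (for example 256 or 128 components) would typically be used at inference time.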
302
 
 
305
 
306
  - `per_device_train_batch_size`: 2048
307
  - `num_train_epochs`: 1
 
308
  - `learning_rate`: 0.05
309
  - `warmup_steps`: 0.1
310
  - `weight_decay`: 0.01
 
311
  - `per_device_eval_batch_size`: 2048
312
  - `push_to_hub`: True
313
  - `hub_model_id`: oneryalcin/static-embedding-chess
 
319
 
320
  - `per_device_train_batch_size`: 2048
321
  - `num_train_epochs`: 1
322
+ - `max_steps`: -1
323
  - `learning_rate`: 0.05
324
  - `lr_scheduler_type`: linear
325
  - `lr_scheduler_kwargs`: None
 
335
  - `average_tokens_across_devices`: True
336
  - `max_grad_norm`: 1.0
337
  - `label_smoothing_factor`: 0.0
338
+ - `bf16`: False
339
  - `fp16`: False
340
  - `bf16_full_eval`: False
341
  - `fp16_full_eval`: False
 
420
  </details>
421
 
422
  ### Training Logs
423
+ | Epoch | Step | Training Loss | chess-ir_cosine_ndcg@10 | chess-ir-tokens_cosine_ndcg@10 |
424
+ |:------:|:----:|:-------------:|:-----------------------:|:------------------------------:|
425
+ | -1 | -1 | - | 0.0087 | 0.0476 |
426
+ | 0.0004 | 1 | 25.5090 | - | - |
427
+ | 0.0102 | 29 | 24.7398 | - | - |
428
+ | 0.0204 | 58 | 20.8309 | - | - |
429
+ | 0.0305 | 87 | 16.5176 | - | - |
430
+ | 0.0407 | 116 | 12.8534 | - | - |
431
+ | 0.0509 | 145 | 10.2759 | - | - |
432
+ | 0.0611 | 174 | 8.7313 | - | - |
433
+ | 0.0713 | 203 | 7.8373 | - | - |
434
+ | 0.0815 | 232 | 7.3665 | - | - |
435
+ | 0.0916 | 261 | 7.0534 | - | - |
436
+ | 0.1001 | 285 | - | 0.0403 | 0.0964 |
437
 
438
 
439
  ### Training Time
440
+ - **Training**: 16.5 seconds
441
+ - **Evaluation**: 0.1 seconds
442
+ - **Total**: 16.6 seconds
443
 
444
  ### Framework Versions
445
  - Python: 3.12.10
446
  - Sentence Transformers: 5.5.0
447
+ - Transformers: 5.8.0
448
+ - PyTorch: 2.11.0
449
  - Accelerate: 1.13.0
450
  - Datasets: 4.8.5
451
  - Tokenizers: 0.22.2
 
467
  }
468
  ```
469
 
470
+ #### MatryoshkaLoss
471
+ ```bibtex
472
+ @misc{kusupati2024matryoshka,
473
+ title={Matryoshka Representation Learning},
474
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
475
+ year={2024},
476
+ eprint={2205.13147},
477
+ archivePrefix={arXiv},
478
+ primaryClass={cs.LG}
479
+ }
480
+ ```
481
+
482
  #### MultipleNegativesRankingLoss
483
  ```bibtex
484
  @misc{oord2019representationlearningcontrastivepredictive,
chess_tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
config_sentence_transformers.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "__version__": {
3
- "pytorch": "2.12.0+cu130",
4
  "sentence_transformers": "5.5.0",
5
- "transformers": "5.8.1"
6
  },
7
  "default_prompt_name": null,
8
  "model_type": "SentenceTransformer",
 
1
  {
2
  "__version__": {
3
+ "pytorch": "2.11.0",
4
  "sentence_transformers": "5.5.0",
5
+ "transformers": "5.8.0"
6
  },
7
  "default_prompt_name": null,
8
  "model_type": "SentenceTransformer",
eval/Information-Retrieval_evaluation_chess-ir-tokens_results.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
2
+ 0.10007022471910113,285,0.1111111111111111,0.30158730158730157,0.1111111111111111,0.008191309640952804,0.0835978835978836,0.03797928598263959,0.16048962794994542,0.0963937043281825,0.05480807151213741
eval/Information-Retrieval_evaluation_chess-ir_results.csv CHANGED
@@ -1,7 +1,2 @@
1
  epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
2
- 0.0175561797752809,50,0.015,0.115,0.015,0.005,0.013000000000000001,0.04333333333333333,0.03541269841269841,0.02770564804107805,0.021195015342589062
3
- 0.0351123595505618,100,0.015,0.135,0.015,0.005,0.016,0.05333333333333333,0.04136111111111111,0.03352606053277749,0.025214543549657912
4
- 0.05266853932584269,150,0.02,0.12,0.02,0.006666666666666666,0.0155,0.051666666666666666,0.04391468253968253,0.034539315152376744,0.02851338765635309
5
- 0.0702247191011236,200,0.03,0.16,0.03,0.009999999999999998,0.02,0.06666666666666667,0.05857142857142858,0.045080933582823335,0.033163497941181515
6
- 0.0877808988764045,250,0.025,0.14,0.025,0.008333333333333333,0.017,0.056666666666666664,0.049240079365079355,0.037406426241984,0.02874627448743367
7
- 0.10533707865168539,300,0.025,0.125,0.025,0.008333333333333333,0.016,0.05333333333333333,0.053103174603174604,0.03923902062478621,0.03190843674305716
 
1
  epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
2
+ 0.10007022471910113,285,0.02,0.135,0.02,0.006666666666666666,0.0175,0.05833333333333333,0.05090277777777777,0.040260232965004236,0.03468285594907049
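A small sketch of how the evaluation CSV could be read to pull out the headline metric. The header and row are copied from the diff above; in practice the text would come from `eval/Information-Retrieval_evaluation_chess-ir_results.csv` rather than an inline string, and the variable names here are illustrative only.

```python
import csv
import io

# Header and the single row added in this commit, copied from the diff.
CSV_TEXT = (
    "epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,"
    "cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,"
    "cosine-NDCG@10,cosine-MAP@100\n"
    "0.10007022471910113,285,0.02,0.135,0.02,0.006666666666666666,0.0175,"
    "0.05833333333333333,0.05090277777777777,0.040260232965004236,"
    "0.03468285594907049\n"
)

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Pick the most recent evaluation by step count.
latest = max(rows, key=lambda row: int(row["steps"]))
ndcg = float(latest["cosine-NDCG@10"])
print(latest["steps"], round(ndcg, 4))
# 285 0.0403
```

The rounded value matches the `cosine_ndcg@10` entry for `chess-ir` in the README table.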
 
 
 
 
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a9765f24fcf2a3c67e7e6402e8f6453489a27ae2b99d12929d135e4886f7605e
3
- size 4343904
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c50ae1fdd13646f6ccd8502b934c8d7f1ac91ee33935efefe06cbb8bd4c6cdd4
3
+ size 8880224
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bed99a155dbec21964e026b0647e360c1ee931c7048ba902384bf1f776fcae2c
3
  size 5713
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d1f79a123f09dc75fd3488fe5caef388a8c542815dabe7ec16811867955b17a2
3
  size 5713