Training in progress, step 285

Files changed (8)

README.md +158 -147
chess_tokenizer.json +0 -0
config_sentence_transformers.json +2 -2
eval/Information-Retrieval_evaluation_chess-ir-tokens_results.csv +2 -0
eval/Information-Retrieval_evaluation_chess-ir_results.csv +1 -6
model.safetensors +2 -2
tokenizer.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -8,39 +8,49 @@ tags:
 - feature-extraction
 - generated_from_trainer
 - dataset_size:5832592
 - loss:MultipleNegativesRankingLoss
 widget:
 - source_sentence: crushing middlegame sacrifice short
   sentences:
-  - themes crushing endgame short moves d8e8 f7g7 a8e4 g7g4
-  - themes endgame master mate mateIn1 oneMove moves d2c2 h5d1
-  - themes crushing middlegame sacrifice short moves g6g4 e1e6 f7e6 d2h6
 - source_sentence: crushing endgame long
   sentences:
-  - themes crushing endgame long moves e2c2 f5g5 c2g2 g5h6 g2h2 h6g7
-  - themes advantage endgame long moves d1d5 h4c4 f5d3 c4d5 d3d5 d6d5
-  - themes endgame master mate mateIn1 oneMove moves c7c6 d6b8
-- source_sentence: crushing endgame long
   sentences:
-  - themes advancedPawn crushing endgame hangingPiece long moves e3c1 f8g7 c1b2 g7f6
-    c2d3 g3g2
-  - themes crushing endgame long moves f5g4 b8f8 h6h5 e2f4 e4f4 g3f4
-  - themes backRankMate master mate mateIn2 middlegame sacrifice short moves a3b2
-    c7d8 f6d8 e3e8
-- source_sentence: crushing deflection endgame pawnEndgame veryLong zugzwang
   sentences:
-  - themes backRankMate deflection endgame long mate mateIn3 moves a6c6 f7f2 g1h1
-    f2f1 c1f1 f8f1
-  - themes crushing deflection endgame long moves g8f7 c5f8 f7g6 f8e8 c7f7 e8d8
-  - themes crushing deflection endgame pawnEndgame veryLong zugzwang moves f2f3 d3d4
-    h2h3 g6g5 f4g5 d4e5 f3e3 e5f5
-- source_sentence: capturingDefender crushing endgame long sacrifice
   sentences:
-  - themes crushing fork middlegame short moves d4f4 h4g6 f8f7 g6f4
-  - themes capturingDefender crushing endgame long sacrifice moves d4a4 g7g3 f3g3
-    h5h4 b6b3 h4h2
-  - themes attraction crushing endgame fork long sacrifice moves a6f1 d7f7 g8f7 d6d7
-    f7f8 d7c8
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ -54,7 +64,7 @@ metrics:
 - cosine_mrr@10
 - cosine_map@100
 model-index:
-- name: Static chess embedding (256d) -- themes/openings <-> positions
   results:
   - task:
       type: information-retrieval
@@ -64,37 +74,71 @@ model-index:
       type: chess-ir
     metrics:
     - type: cosine_accuracy@1
-      value: 0.025
       name: Cosine Accuracy@1
     - type: cosine_accuracy@10
-      value: 0.125
       name: Cosine Accuracy@10
     - type: cosine_precision@1
-      value: 0.025
       name: Cosine Precision@1
     - type: cosine_precision@10
-      value: 0.016
       name: Cosine Precision@10
     - type: cosine_recall@1
-      value: 0.008333333333333333
       name: Cosine Recall@1
     - type: cosine_recall@10
-      value: 0.05333333333333333
       name: Cosine Recall@10
     - type: cosine_ndcg@10
-      value: 0.03923902062478621
       name: Cosine Ndcg@10
     - type: cosine_mrr@10
-      value: 0.053103174603174604
       name: Cosine Mrr@10
     - type: cosine_map@100
-      value: 0.03190843674305716
       name: Cosine Map@100
 ---
-# Static chess embedding (256d) -- themes/openings <-> positions
-This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for retrieval.
 ## Model Details
@@ -102,7 +146,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps
 - **Model Type:** Sentence Transformer
 <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** inf tokens
-- **Output Dimensionality:** 256 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Supported Modality:** Text
 <!-- - **Training Dataset:** Unknown -->
@@ -140,22 +184,22 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("oneryalcin/static-embedding-chess")
 # Run inference
 queries = [
-    'capturingDefender crushing endgame long sacrifice',
 ]
 documents = [
-    'themes capturingDefender crushing endgame long sacrifice moves d4a4 g7g3 f3g3 h5h4 b6b3 h4h2',
-    'themes crushing fork middlegame short moves d4f4 h4g6 f8f7 g6f4',
-    'themes attraction crushing endgame fork long sacrifice moves a6f1 d7f7 g8f7 d6d7 f7f8 d7c8',
 ]
 query_embeddings = model.encode_query(queries)
 document_embeddings = model.encode_document(documents)
 print(query_embeddings.shape, document_embeddings.shape)
-# [1, 256] [3, 256]
 # Get the similarity scores for the embeddings
 similarities = model.similarity(query_embeddings, document_embeddings)
 print(similarities)
-# tensor([[ 0.9505, -0.1987,  0.4045]])
 ```
 <!--
 ### Direct Usage (Transformers)
@@ -187,20 +231,20 @@ You can finetune this model on your own dataset.
 #### Information Retrieval
-* Dataset: `chess-ir`
 * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)
-| Metric              | Value      |
-|:--------------------|:-----------|
-| cosine_accuracy@1   | 0.025      |
-| cosine_accuracy@10  | 0.125      |
-| cosine_precision@1  | 0.025      |
-| cosine_precision@10 | 0.016      |
-| cosine_recall@1     | 0.0083     |
-| cosine_recall@10    | 0.0533     |
-| **cosine_ndcg@10**  | **0.0392** |
-| cosine_mrr@10       | 0.0531     |
-| cosine_map@100      | 0.0319     |
 <!--
 ## Bias, Risks and Limitations
@@ -223,29 +267,36 @@ You can finetune this model on your own dataset.
 * Size: 5,832,592 training samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 100 samples:
-  |          | anchor                                                                                          | positive                                                                                        |
-  |:---------|:------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|
-  | type     | string                                                                                          | string                                                                                          |
-  | modality | text                                                                                            | text                                                                                            |
-  | details  | <ul><li>min: 14 characters</li><li>mean: 45.72 characters</li><li>max: 107 characters</li></ul> | <ul><li>min: 47 characters</li><li>mean: 84.03 characters</li><li>max: 151 characters</li></ul> |
 * Samples:
-  | anchor                                                                 | positive                                                                                                |
-  |:-----------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------|
-  | <code>crushing endgame fork short</code>                               | <code>themes crushing endgame fork short moves f7f6 g5e6 g7h6 e6c5</code>                               |
-  | <code>crushing discoveredAttack kingsideAttack middlegame short</code> | <code>themes crushing discoveredAttack kingsideAttack middlegame short moves e4g3 f3g3 f2g3 h5e2</code> |
-  | <code>crushing middlegame short</code>                                 | <code>themes crushing middlegame short moves d7c8 e2g4 c8c7 c3b5</code>                                 |
-* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
-      "scale": 20.0,
-      "similarity_fct": "cos_sim",
-      "gather_across_devices": false,
-      "directions": [
-          "query_to_doc"
       ],
-      "partition_mode": "joint",
-      "hardness_mode": null,
-      "hardness_strength": 0.0
   }
   ```
@@ -254,11 +305,9 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 2048
 - `num_train_epochs`: 1
-- `max_steps`: 500
 - `learning_rate`: 0.05
 - `warmup_steps`: 0.1
 - `weight_decay`: 0.01
-- `bf16`: True
 - `per_device_eval_batch_size`: 2048
 - `push_to_hub`: True
 - `hub_model_id`: oneryalcin/static-embedding-chess
@@ -270,7 +319,7 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 2048
 - `num_train_epochs`: 1
-- `max_steps`: 500
 - `learning_rate`: 0.05
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: None
@@ -286,7 +335,7 @@ You can finetune this model on your own dataset.
 - `average_tokens_across_devices`: True
 - `max_grad_norm`: 1.0
 - `label_smoothing_factor`: 0.0
-- `bf16`: True
 - `fp16`: False
 - `bf16_full_eval`: False
 - `fp16_full_eval`: False
@@ -371,82 +420,32 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch  | Step | Training Loss | chess-ir_cosine_ndcg@10 |
-|:------:|:----:|:-------------:|:-----------------------:|
-| -1     | -1   | -             | 0.0140                  |
-| 0.0004 | 1    | 2.5397        | -                       |
-| 0.0018 | 5    | 2.5232        | -                       |
-| 0.0035 | 10   | 2.4279        | -                       |
-| 0.0053 | 15   | 2.2911        | -                       |
-| 0.0070 | 20   | 2.0382        | -                       |
-| 0.0088 | 25   | 1.8031        | -                       |
-| 0.0105 | 30   | 1.6873        | -                       |
-| 0.0123 | 35   | 1.5444        | -                       |
-| 0.0140 | 40   | 1.4872        | -                       |
-| 0.0158 | 45   | 1.4555        | -                       |
-| 0.0176 | 50   | 1.4493        | 0.0277                  |
-| 0.0193 | 55   | 1.4075        | -                       |
-| 0.0211 | 60   | 1.4012        | -                       |
-| 0.0228 | 65   | 1.4055        | -                       |
-| 0.0246 | 70   | 1.3977        | -                       |
-| 0.0263 | 75   | 1.3597        | -                       |
-| 0.0281 | 80   | 1.3765        | -                       |
-| 0.0298 | 85   | 1.3657        | -                       |
-| 0.0316 | 90   | 1.3138        | -                       |
-| 0.0334 | 95   | 1.3596        | -                       |
-| 0.0351 | 100  | 1.3428        | 0.0335                  |
-| 0.0369 | 105  | 1.3302        | -                       |
-| 0.0386 | 110  | 1.3281        | -                       |
-| 0.0404 | 115  | 1.3520        | -                       |
-| 0.0421 | 120  | 1.3127        | -                       |
-| 0.0439 | 125  | 1.3362        | -                       |
-| 0.0456 | 130  | 1.3174        | -                       |
-| 0.0474 | 135  | 1.3103        | -                       |
-| 0.0492 | 140  | 1.3428        | -                       |
-| 0.0509 | 145  | 1.2886        | -                       |
-| 0.0527 | 150  | 1.2895        | 0.0345                  |
-| 0.0544 | 155  | 1.3418        | -                       |
-| 0.0562 | 160  | 1.3498        | -                       |
-| 0.0579 | 165  | 1.3033        | -                       |
-| 0.0597 | 170  | 1.2958        | -                       |
-| 0.0614 | 175  | 1.3081        | -                       |
-| 0.0632 | 180  | 1.3154        | -                       |
-| 0.0650 | 185  | 1.3129        | -                       |
-| 0.0667 | 190  | 1.3124        | -                       |
-| 0.0685 | 195  | 1.3237        | -                       |
-| 0.0702 | 200  | 1.3051        | 0.0451                  |
-| 0.0720 | 205  | 1.2801        | -                       |
-| 0.0737 | 210  | 1.3404        | -                       |
-| 0.0755 | 215  | 1.2916        | -                       |
-| 0.0772 | 220  | 1.2981        | -                       |
-| 0.0790 | 225  | 1.3321        | -                       |
-| 0.0808 | 230  | 1.3369        | -                       |
-| 0.0825 | 235  | 1.3059        | -                       |
-| 0.0843 | 240  | 1.3213        | -                       |
-| 0.0860 | 245  | 1.3127        | -                       |
-| 0.0878 | 250  | 1.2801        | 0.0374                  |
-| 0.0895 | 255  | 1.2940        | -                       |
-| 0.0913 | 260  | 1.3423        | -                       |
-| 0.0930 | 265  | 1.2860        | -                       |
-| 0.0948 | 270  | 1.3022        | -                       |
-| 0.0966 | 275  | 1.3040        | -                       |
-| 0.0983 | 280  | 1.2921        | -                       |
-| 0.1001 | 285  | 1.2940        | -                       |
-| 0.1018 | 290  | 1.3064        | -                       |
-| 0.1036 | 295  | 1.3042        | -                       |
-| 0.1053 | 300  | 1.3058        | 0.0392                  |
 ### Training Time
-- **Training**: 25.3 seconds
-- **Evaluation**: 0.3 seconds
-- **Total**: 25.5 seconds
 ### Framework Versions
 - Python: 3.12.10
 - Sentence Transformers: 5.5.0
-- Transformers: 5.8.1
-- PyTorch: 2.12.0+cu130
 - Accelerate: 1.13.0
 - Datasets: 4.8.5
 - Tokenizers: 0.22.2
@@ -468,6 +467,18 @@ You can finetune this model on your own dataset.
 }
 ```
 #### MultipleNegativesRankingLoss
 ```bibtex
 @misc{oord2019representationlearningcontrastivepredictive,

 - feature-extraction
 - generated_from_trainer
 - dataset_size:5832592
+- loss:MatryoshkaLoss
 - loss:MultipleNegativesRankingLoss
 widget:
 - source_sentence: crushing middlegame sacrifice short
   sentences:
+  - themes advantage middlegame short moves f4f7 c4d5 f7d5 b3d5 f4f7+c4d5 c4d5+f7d5
+    f7d5+b3d5
+  - themes advantage fork middlegame short opening Four Knights Game Four Knights
+    Game Italian Variation moves c8f5 d5e7 g8h8 e7f5 c8f5+d5e7 d5e7+g8h8 g8h8+e7f5
+  - themes crushing middlegame sacrifice short moves g6g4 e1e6 f7e6 d2h6 g6g4+e1e6
+    e1e6+f7e6 f7e6+d2h6
 - source_sentence: crushing endgame long
   sentences:
+  - themes crushing endgame long moves e2c2 f5g5 c2g2 g5h6 g2h2 h6g7 e2c2+f5g5 f5g5+c2g2
+    c2g2+g5h6 g5h6+g2h2 g2h2+h6g7
+  - themes crushing endgame fork hangingPiece long moves c7c3 b2c3 d5f7 g5g7 f7g7
+    f8g7 c7c3+b2c3 b2c3+d5f7 d5f7+g5g7 g5g7+f7g7 f7g7+f8g7
+  - themes crushing intermezzo middlegame short moves c5b4 d1d3 f6e7 a3b4 c5b4+d1d3
+    d1d3+f6e7 f6e7+a3b4
+- source_sentence: crushing endgame fork short
   sentences:
+  - themes crushing endgame rookEndgame short skewer moves b4b3 h7h8 f8g7 h8b8 b4b3+h7h8
+    h7h8+f8g7 f8g7+h8b8
+  - themes crushing endgame fork short moves f2f1 f3d2 f1e2 d2c4 f2f1+f3d2 f3d2+f1e2
+    f1e2+d2c4
+  - themes mate mateIn1 middlegame oneMove moves d7d6 g3g7 d7d6+g3g7
+- source_sentence: crushing fork middlegame veryLong
   sentences:
+  - themes crushing endgame fork master short moves f7f5 a6g6 g5g6 h4g6 f7f5+a6g6
+    a6g6+g5g6 g5g6+h4g6
+  - themes attraction discoveredCheck doubleCheck long mate mateIn3 opening operaMate
+    sacrifice opening Bishops Opening Bishops Opening Ponziani Gambit moves h8g8 f6d8
+    e8d8 d2g5 d8e8 d1d8 h8g8+f6d8 f6d8+e8d8 e8d8+d2g5 d2g5+d8e8 d8e8+d1d8
+  - themes crushing fork middlegame veryLong moves h6h7 e8h5 f3g3 c5e3 h7h8q e3f4
+    g3g2 h5g4 g2h1 f4d2 a1g1 g4f3 h6h7+e8h5 e8h5+f3g3 f3g3+c5e3 c5e3+h7h8q h7h8q+e3f4
+    e3f4+g3g2 g3g2+h5g4 h5g4+g2h1 g2h1+f4d2 f4d2+a1g1 a1g1+g4f3
+- source_sentence: endgame mate mateIn2 pillsburysMate short
   sentences:
+  - themes bishopEndgame crushing defensiveMove endgame master short moves g3g4 h5h4
+    f4g5 h6g5 g3g4+h5h4 h5h4+f4g5 f4g5+h6g5
+  - themes endgame mate mateIn2 pillsburysMate short moves c4e3 b5b8 f5c8 b8c8 c4e3+b5b8
+    b5b8+f5c8 f5c8+b8c8
+  - themes endgame mate mateIn1 oneMove moves e5f4 g3g1 e5f4+g3g1
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
 - cosine_mrr@10
 - cosine_map@100
 model-index:
+- name: Static chess embedding (512d) -- themes/openings <-> positions
   results:
   - task:
       type: information-retrieval
       type: chess-ir
     metrics:
     - type: cosine_accuracy@1
+      value: 0.02
       name: Cosine Accuracy@1
     - type: cosine_accuracy@10
+      value: 0.135
       name: Cosine Accuracy@10
     - type: cosine_precision@1
+      value: 0.02
       name: Cosine Precision@1
     - type: cosine_precision@10
+      value: 0.0175
       name: Cosine Precision@10
     - type: cosine_recall@1
+      value: 0.006666666666666666
       name: Cosine Recall@1
     - type: cosine_recall@10
+      value: 0.05833333333333333
       name: Cosine Recall@10
     - type: cosine_ndcg@10
+      value: 0.040260232965004236
       name: Cosine Ndcg@10
     - type: cosine_mrr@10
+      value: 0.05090277777777777
       name: Cosine Mrr@10
     - type: cosine_map@100
+      value: 0.03468285594907049
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: chess ir tokens
+      type: chess-ir-tokens
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.1111111111111111
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@10
+      value: 0.30158730158730157
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.1111111111111111
+      name: Cosine Precision@1
+    - type: cosine_precision@10
+      value: 0.0835978835978836
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.008191309640952804
+      name: Cosine Recall@1
+    - type: cosine_recall@10
+      value: 0.03797928598263959
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.0963937043281825
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.16048962794994542
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.05480807151213741
       name: Cosine Map@100
 ---
+# Static chess embedding (512d) -- themes/openings <-> positions
+This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for retrieval.
 ## Model Details
 - **Model Type:** Sentence Transformer
 <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** inf tokens
+- **Output Dimensionality:** 512 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Supported Modality:** Text
 <!-- - **Training Dataset:** Unknown -->
 model = SentenceTransformer("oneryalcin/static-embedding-chess")
 # Run inference
 queries = [
+    'endgame mate mateIn2 pillsburysMate short',
 ]
 documents = [
+    'themes endgame mate mateIn2 pillsburysMate short moves c4e3 b5b8 f5c8 b8c8 c4e3+b5b8 b5b8+f5c8 f5c8+b8c8',
+    'themes bishopEndgame crushing defensiveMove endgame master short moves g3g4 h5h4 f4g5 h6g5 g3g4+h5h4 h5h4+f4g5 f4g5+h6g5',
+    'themes endgame mate mateIn1 oneMove moves e5f4 g3g1 e5f4+g3g1',
 ]
 query_embeddings = model.encode_query(queries)
 document_embeddings = model.encode_document(documents)
 print(query_embeddings.shape, document_embeddings.shape)
+# [1, 512] [3, 512]
 # Get the similarity scores for the embeddings
 similarities = model.similarity(query_embeddings, document_embeddings)
 print(similarities)
+# tensor([[ 0.8014, -0.0485,  0.0709]])
 ```
 <!--
 ### Direct Usage (Transformers)
 #### Information Retrieval
+* Datasets: `chess-ir` and `chess-ir-tokens`
 * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)
+| Metric              | chess-ir   | chess-ir-tokens |
+|:--------------------|:-----------|:----------------|
+| cosine_accuracy@1   | 0.02       | 0.1111          |
+| cosine_accuracy@10  | 0.135      | 0.3016          |
+| cosine_precision@1  | 0.02       | 0.1111          |
+| cosine_precision@10 | 0.0175     | 0.0836          |
+| cosine_recall@1     | 0.0067     | 0.0082          |
+| cosine_recall@10    | 0.0583     | 0.038           |
+| **cosine_ndcg@10**  | **0.0403** | **0.0964**      |
+| cosine_mrr@10       | 0.0509     | 0.1605          |
+| cosine_map@100      | 0.0347     | 0.0548          |
 <!--
 ## Bias, Risks and Limitations
 * Size: 5,832,592 training samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 100 samples:
+  |          | anchor                                                                                          | positive                                                                                         |
+  |:---------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
+  | type     | string                                                                                          | string                                                                                           |
+  | modality | text                                                                                            | text                                                                                             |
+  | details  | <ul><li>min: 14 characters</li><li>mean: 45.72 characters</li><li>max: 107 characters</li></ul> | <ul><li>min: 61 characters</li><li>mean: 121.98 characters</li><li>max: 233 characters</li></ul> |
 * Samples:
+  | anchor                                                                 | positive                                                                                                                              |
+  |:-----------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>crushing endgame fork short</code>                               | <code>themes crushing endgame fork short moves f7f6 g5e6 g7h6 e6c5 f7f6+g5e6 g5e6+g7h6 g7h6+e6c5</code>                               |
+  | <code>crushing discoveredAttack kingsideAttack middlegame short</code> | <code>themes crushing discoveredAttack kingsideAttack middlegame short moves e4g3 f3g3 f2g3 h5e2 e4g3+f3g3 f3g3+f2g3 f2g3+h5e2</code> |
+  | <code>crushing middlegame short</code>                                 | <code>themes crushing middlegame short moves d7c8 e2g4 c8c7 c3b5 d7c8+e2g4 e2g4+c8c7 c8c7+c3b5</code>                                 |
+* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
   ```json
   {
+      "loss": "MultipleNegativesRankingLoss",
+      "matryoshka_dims": [
+          512,
+          256,
+          128,
+          64,
+          32
       ],
+      "matryoshka_weights": [
+          1,
+          1,
+          1,
+          1,
+          1
+      ],
+      "n_dims_per_step": -1
   }
   ```
 - `per_device_train_batch_size`: 2048
 - `num_train_epochs`: 1
 - `learning_rate`: 0.05
 - `warmup_steps`: 0.1
 - `weight_decay`: 0.01
 - `per_device_eval_batch_size`: 2048
 - `push_to_hub`: True
 - `hub_model_id`: oneryalcin/static-embedding-chess
 - `per_device_train_batch_size`: 2048
 - `num_train_epochs`: 1
+- `max_steps`: -1
 - `learning_rate`: 0.05
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: None
 - `average_tokens_across_devices`: True
 - `max_grad_norm`: 1.0
 - `label_smoothing_factor`: 0.0
+- `bf16`: False
 - `fp16`: False
 - `bf16_full_eval`: False
 - `fp16_full_eval`: False
 </details>
 ### Training Logs
+| Epoch  | Step | Training Loss | chess-ir_cosine_ndcg@10 | chess-ir-tokens_cosine_ndcg@10 |
+|:------:|:----:|:-------------:|:-----------------------:|:------------------------------:|
+| -1     | -1   | -             | 0.0087                  | 0.0476                         |
+| 0.0004 | 1    | 25.5090       | -                       | -                              |
+| 0.0102 | 29   | 24.7398       | -                       | -                              |
+| 0.0204 | 58   | 20.8309       | -                       | -                              |
+| 0.0305 | 87   | 16.5176       | -                       | -                              |
+| 0.0407 | 116  | 12.8534       | -                       | -                              |
+| 0.0509 | 145  | 10.2759       | -                       | -                              |
+| 0.0611 | 174  | 8.7313        | -                       | -                              |
+| 0.0713 | 203  | 7.8373        | -                       | -                              |
+| 0.0815 | 232  | 7.3665        | -                       | -                              |
+| 0.0916 | 261  | 7.0534        | -                       | -                              |
+| 0.1001 | 285  | -             | 0.0403                  | 0.0964                         |
 ### Training Time
+- **Training**: 16.5 seconds
+- **Evaluation**: 0.1 seconds
+- **Total**: 16.6 seconds
 ### Framework Versions
 - Python: 3.12.10
 - Sentence Transformers: 5.5.0
+- Transformers: 5.8.0
+- PyTorch: 2.11.0
 - Accelerate: 1.13.0
 - Datasets: 4.8.5
 - Tokenizers: 0.22.2
 }
 ```
+#### MatryoshkaLoss
+```bibtex
+@misc{kusupati2024matryoshka,
+    title={Matryoshka Representation Learning},
+    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+    year={2024},
+    eprint={2205.13147},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```
 #### MultipleNegativesRankingLoss
 ```bibtex
 @misc{oord2019representationlearningcontrastivepredictive,

chess_tokenizer.json CHANGED

The diff for this file is too large to render. See raw diff

config_sentence_transformers.json CHANGED

@@ -1,8 +1,8 @@
 {
   "__version__": {
-    "pytorch": "2.12.0+cu130",
     "sentence_transformers": "5.5.0",
-    "transformers": "5.8.1"
   },
   "default_prompt_name": null,
   "model_type": "SentenceTransformer",

 {
   "__version__": {
+    "pytorch": "2.11.0",
     "sentence_transformers": "5.5.0",
+    "transformers": "5.8.0"
   },
   "default_prompt_name": null,
   "model_type": "SentenceTransformer",

eval/Information-Retrieval_evaluation_chess-ir-tokens_results.csv ADDED

@@ -0,0 +1,2 @@
+epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
+0.10007022471910113,285,0.1111111111111111,0.30158730158730157,0.1111111111111111,0.008191309640952804,0.0835978835978836,0.03797928598263959,0.16048962794994542,0.0963937043281825,0.05480807151213741

eval/Information-Retrieval_evaluation_chess-ir_results.csv CHANGED

@@ -1,7 +1,2 @@
 epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
-0.0175561797752809,50,0.015,0.115,0.015,0.005,0.013000000000000001,0.04333333333333333,0.03541269841269841,0.02770564804107805,0.021195015342589062
-0.0351123595505618,100,0.015,0.135,0.015,0.005,0.016,0.05333333333333333,0.04136111111111111,0.03352606053277749,0.025214543549657912
-0.05266853932584269,150,0.02,0.12,0.02,0.006666666666666666,0.0155,0.051666666666666666,0.04391468253968253,0.034539315152376744,0.02851338765635309
-0.0702247191011236,200,0.03,0.16,0.03,0.009999999999999998,0.02,0.06666666666666667,0.05857142857142858,0.045080933582823335,0.033163497941181515
-0.0877808988764045,250,0.025,0.14,0.025,0.008333333333333333,0.017,0.056666666666666664,0.049240079365079355,0.037406426241984,0.02874627448743367
-0.10533707865168539,300,0.025,0.125,0.025,0.008333333333333333,0.016,0.05333333333333333,0.053103174603174604,0.03923902062478621,0.03190843674305716


 epoch,steps,cosine-Accuracy@1,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
+0.10007022471910113,285,0.02,0.135,0.02,0.006666666666666666,0.0175,0.05833333333333333,0.05090277777777777,0.040260232965004236,0.03468285594907049

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a9765f24fcf2a3c67e7e6402e8f6453489a27ae2b99d12929d135e4886f7605e
-size 4343904

 version https://git-lfs.github.com/spec/v1
+oid sha256:c50ae1fdd13646f6ccd8502b934c8d7f1ac91ee33935efefe06cbb8bd4c6cdd4
+size 8880224

tokenizer.json CHANGED

The diff for this file is too large to render. See raw diff

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bed99a155dbec21964e026b0647e360c1ee931c7048ba902384bf1f776fcae2c
 size 5713

 version https://git-lfs.github.com/spec/v1
+oid sha256:d1f79a123f09dc75fd3488fe5caef388a8c542815dabe7ec16811867955b17a2
 size 5713
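
Note on the document format in this commit: in the README.md samples, each `positive` document now lists the individual moves followed by every pair of consecutive moves joined with `+` (for example `f7f6+g5e6`). The preprocessing script is not part of this diff, so the following is only a minimal sketch of how such a string could be built. The function names `add_move_bigrams` and `build_document` are hypothetical, and the `opening ...` field that some samples carry is not handled here.

```python
def add_move_bigrams(moves: list[str]) -> list[str]:
    """Return the moves followed by consecutive-move pairs joined with '+'.

    Hypothetical helper: it mirrors the pattern visible in the README samples,
    not a confirmed implementation from the repository.
    """
    bigrams = [f"{first}+{second}" for first, second in zip(moves, moves[1:])]
    return moves + bigrams


def build_document(themes: list[str], moves: list[str]) -> str:
    """Assemble a 'themes ... moves ...' document string (assumed layout)."""
    return " ".join(["themes", *themes, "moves", *add_move_bigrams(moves)])


doc = build_document(
    ["crushing", "endgame", "fork", "short"],
    ["f7f6", "g5e6", "g7h6", "e6c5"],
)
print(doc)
```

For this input the sketch reproduces the first sample row shown in the README diff. A sequence of n moves yields n - 1 pairs, which is consistent with the mean `positive` length rising from 84.03 to 121.98 characters.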