benjamintli/modernbert-cosqa

@@ -7,7 +7,7 @@ tags:
 - generated_from_trainer
 - dataset_size:8118
 - loss:CachedMultipleNegativesRankingLoss
-base_model: answerdotai/ModernBERT-base
 widget:
 - source_sentence: python create path if doesnt exist
   sentences:
@@ -101,7 +101,7 @@ metrics:
 - cosine_mrr@10
 - cosine_map@100
 model-index:
-- name: SentenceTransformer based on answerdotai/ModernBERT-base
   results:
   - task:
       type: information-retrieval
@@ -111,61 +111,61 @@ model-index:
       type: eval
     metrics:
     - type: cosine_accuracy@1
-      value: 0.61529933481153
       name: Cosine Accuracy@1
     - type: cosine_accuracy@3
-      value: 0.8791574279379157
       name: Cosine Accuracy@3
     - type: cosine_accuracy@5
-      value: 0.9356984478935698
       name: Cosine Accuracy@5
     - type: cosine_accuracy@10
-      value: 0.9733924611973392
       name: Cosine Accuracy@10
     - type: cosine_precision@1
-      value: 0.61529933481153
       name: Cosine Precision@1
     - type: cosine_precision@3
-      value: 0.2930524759793052
       name: Cosine Precision@3
     - type: cosine_precision@5
-      value: 0.187139689578714
       name: Cosine Precision@5
     - type: cosine_precision@10
-      value: 0.09733924611973392
       name: Cosine Precision@10
     - type: cosine_recall@1
-      value: 0.61529933481153
       name: Cosine Recall@1
     - type: cosine_recall@3
-      value: 0.8791574279379157
       name: Cosine Recall@3
     - type: cosine_recall@5
-      value: 0.9356984478935698
       name: Cosine Recall@5
     - type: cosine_recall@10
-      value: 0.9733924611973392
       name: Cosine Recall@10
     - type: cosine_ndcg@10
-      value: 0.8075594888103552
       name: Cosine Ndcg@10
     - type: cosine_mrr@10
-      value: 0.7526867103086619
       name: Cosine Mrr@10
     - type: cosine_map@100
-      value: 0.7539697212778319
       name: Cosine Map@100
 ---
-# SentenceTransformer based on answerdotai/ModernBERT-base
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision 8949b909ec900327062f0ebf497f51aef5e6f0c8 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -221,7 +221,7 @@ print(query_embeddings.shape, document_embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(query_embeddings, document_embeddings)
 print(similarities)
-# tensor([[0.6000, 0.0149, 0.0027]])
 ```
 <!--
@@ -259,21 +259,21 @@ You can finetune this model on your own dataset.
 | Metric              | Value      |
 |:--------------------|:-----------|
-| cosine_accuracy@1   | 0.6153     |
-| cosine_accuracy@3   | 0.8792     |
-| cosine_accuracy@5   | 0.9357     |
-| cosine_accuracy@10  | 0.9734     |
-| cosine_precision@1  | 0.6153     |
-| cosine_precision@3  | 0.2931     |
-| cosine_precision@5  | 0.1871     |
-| cosine_precision@10 | 0.0973     |
-| cosine_recall@1     | 0.6153     |
-| cosine_recall@3     | 0.8792     |
-| cosine_recall@5     | 0.9357     |
-| cosine_recall@10    | 0.9734     |
-| **cosine_ndcg@10**  | **0.8076** |
-| cosine_mrr@10       | 0.7527     |
-| cosine_map@100      | 0.754      |
 <!--
 ## Bias, Risks and Limitations
@@ -360,7 +360,7 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 1024
 - `num_train_epochs`: 10
-- `learning_rate`: 2e-05
 - `warmup_steps`: 0.1
 - `bf16`: True
 - `eval_strategy`: epoch
@@ -377,7 +377,7 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 1024
 - `num_train_epochs`: 10
 - `max_steps`: -1
-- `learning_rate`: 2e-05
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: None
 - `warmup_steps`: 0.1
@@ -475,24 +475,24 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch    | Step   | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
-|:--------:|:------:|:-------------:|:---------------:|:-------------------:|
-| 1.0      | 8      | -             | 2.7837          | 0.3703              |
-| 1.25     | 10     | 6.1885        | -               | -                   |
-| 2.0      | 16     | -             | 1.4004          | 0.4896              |
-| 2.5      | 20     | 3.6826        | -               | -                   |
-| 3.0      | 24     | -             | 0.8114          | 0.6814              |
-| 3.75     | 30     | 2.2134        | -               | -                   |
-| 4.0      | 32     | -             | 0.5772          | 0.7412              |
-| 5.0      | 40     | 1.5999        | 0.4729          | 0.7684              |
-| 6.0      | 48     | -             | 0.4246          | 0.7873              |
-| 6.25     | 50     | 1.3357        | -               | -                   |
-| 7.0      | 56     | -             | 0.3918          | 0.7978              |
-| 7.5      | 60     | 1.1768        | -               | -                   |
-| 8.0      | 64     | -             | 0.3711          | 0.8005              |
-| 8.75     | 70     | 1.0993        | -               | -                   |
-| 9.0      | 72     | -             | 0.3602          | 0.8064              |
-| **10.0** | **80** | **1.0152**    | **0.3568**      | **0.8076**          |
 * The bold row denotes the saved checkpoint.
@@ -502,7 +502,7 @@ You can finetune this model on your own dataset.
 - Transformers: 5.3.0
 - PyTorch: 2.10.0+cu128
 - Accelerate: 1.13.0
-- Datasets: 4.7.0
 - Tokenizers: 0.22.2
 ## Citation

 - generated_from_trainer
 - dataset_size:8118
 - loss:CachedMultipleNegativesRankingLoss
+base_model: benjamintli/modernbert-cosqa
 widget:
 - source_sentence: python create path if doesnt exist
   sentences:
 - cosine_mrr@10
 - cosine_map@100
 model-index:
+- name: SentenceTransformer based on benjamintli/modernbert-cosqa
   results:
   - task:
       type: information-retrieval
       type: eval
     metrics:
     - type: cosine_accuracy@1
+      value: 0.6197339246119734
       name: Cosine Accuracy@1
     - type: cosine_accuracy@3
+      value: 0.88470066518847
       name: Cosine Accuracy@3
     - type: cosine_accuracy@5
+      value: 0.9390243902439024
       name: Cosine Accuracy@5
     - type: cosine_accuracy@10
+      value: 0.9778270509977827
       name: Cosine Accuracy@10
     - type: cosine_precision@1
+      value: 0.6197339246119734
       name: Cosine Precision@1
     - type: cosine_precision@3
+      value: 0.29490022172949004
       name: Cosine Precision@3
     - type: cosine_precision@5
+      value: 0.18780487804878046
       name: Cosine Precision@5
     - type: cosine_precision@10
+      value: 0.0977827050997783
       name: Cosine Precision@10
     - type: cosine_recall@1
+      value: 0.6197339246119734
       name: Cosine Recall@1
     - type: cosine_recall@3
+      value: 0.88470066518847
       name: Cosine Recall@3
     - type: cosine_recall@5
+      value: 0.9390243902439024
       name: Cosine Recall@5
     - type: cosine_recall@10
+      value: 0.9778270509977827
       name: Cosine Recall@10
     - type: cosine_ndcg@10
+      value: 0.8124675617500997
       name: Cosine Ndcg@10
     - type: cosine_mrr@10
+      value: 0.7577473339668463
       name: Cosine Mrr@10
     - type: cosine_map@100
+      value: 0.7588050805217604
       name: Cosine Map@100
 ---
+# SentenceTransformer based on benjamintli/modernbert-cosqa
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa) <!-- at revision c85b25617894d583fafad7eb7421b7dc0aab0ad9 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
 # Get the similarity scores for the embeddings
 similarities = model.similarity(query_embeddings, document_embeddings)
 print(similarities)
+# tensor([[ 0.5986, -0.0006, -0.0122]])
 ```
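The scores printed by `model.similarity` are cosine similarities between the query embedding and each document embedding. As a minimal sketch of that computation, the example below uses toy 3-dimensional vectors instead of the model's 768-dimensional output. The vectors and the helper name `cosine_similarity` are illustrative assumptions, not part of the sentence-transformers API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for one query embedding and three document embeddings.
query = [1.0, 2.0, 0.0]
documents = [
    [2.0, 4.0, 0.0],    # same direction as the query
    [0.0, 0.0, 3.0],    # orthogonal to the query
    [-1.0, -2.0, 0.0],  # opposite direction
]

scores = [cosine_similarity(query, doc) for doc in documents]
print(scores)
```

Because the score depends only on direction, scaling a vector leaves it unchanged, which is why near-zero or slightly negative values (as in the tensor above) simply indicate unrelated documents.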
 <!--
 | Metric              | Value      |
 |:--------------------|:-----------|
+| cosine_accuracy@1   | 0.6197     |
+| cosine_accuracy@3   | 0.8847     |
+| cosine_accuracy@5   | 0.939      |
+| cosine_accuracy@10  | 0.9778     |
+| cosine_precision@1  | 0.6197     |
+| cosine_precision@3  | 0.2949     |
+| cosine_precision@5  | 0.1878     |
+| cosine_precision@10 | 0.0978     |
+| cosine_recall@1     | 0.6197     |
+| cosine_recall@3     | 0.8847     |
+| cosine_recall@5     | 0.939      |
+| cosine_recall@10    | 0.9778     |
+| **cosine_ndcg@10**  | **0.8125** |
+| cosine_mrr@10       | 0.7577     |
+| cosine_map@100      | 0.7588     |
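In this table, recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example, 0.8847 / 3 ≈ 0.2949). That pattern is what one would expect if every query has exactly one relevant document; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows. The function name `retrieval_metrics` and the sample ranks are hypothetical.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy/precision/recall/MRR at k.

    ranks: 1-based rank of the single relevant document for each query,
           or None if it was not retrieved at all.
    Assumes exactly one relevant document per query.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    hits = len(hit_ranks)
    return {
        "accuracy": hits / n,          # fraction of queries with a hit in the top k
        "precision": hits / (n * k),   # at most 1 of the k results can be relevant
        "recall": hits / n,            # one relevant doc, so same as accuracy
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
    }


# Hypothetical ranks for five queries; the last query misses entirely.
ranks = [1, 2, 1, 4, None]
metrics_at_3 = retrieval_metrics(ranks, k=3)
print(metrics_at_3)
```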
 <!--
 ## Bias, Risks and Limitations
 - `per_device_train_batch_size`: 1024
 - `num_train_epochs`: 10
+- `learning_rate`: 2e-06
 - `warmup_steps`: 0.1
 - `bf16`: True
 - `eval_strategy`: epoch
 - `per_device_train_batch_size`: 1024
 - `num_train_epochs`: 10
 - `max_steps`: -1
+- `learning_rate`: 2e-06
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: None
 - `warmup_steps`: 0.1
 </details>
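The step counts in the training log follow from the hyperparameters: 8118 samples at a batch size of 1024 give 8 optimizer steps per epoch, so 10 epochs give 80 steps. The sketch below reproduces that arithmetic and a linear warmup/decay schedule with the new peak rate of 2e-06. It assumes a single device, no gradient accumulation, and that `warmup_steps: 0.1` is interpreted as a fraction of total steps; none of these is confirmed by the card. The function `linear_lr` is a hypothetical helper, not the Trainer's own scheduler.

```python
import math

dataset_size = 8118
batch_size = 1024
num_epochs = 10
peak_lr = 2e-06
warmup_fraction = 0.1  # assumption: a value below 1 is read as a ratio

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_fraction)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print([linear_lr(s) for s in (0, 8, 44, 80)])
```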
 ### Training Logs
+| Epoch   | Step   | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
+|:-------:|:------:|:-------------:|:---------------:|:-------------------:|
+| 1.0     | 8      | -             | 0.3550          | 0.8071              |
+| 1.25    | 10     | 1.0218        | -               | -                   |
+| 2.0     | 16     | -             | 0.3508          | 0.8110              |
+| 2.5     | 20     | 0.9890        | -               | -                   |
+| 3.0     | 24     | -             | 0.3466          | 0.8131              |
+| 3.75    | 30     | 0.9778        | -               | -                   |
+| 4.0     | 32     | -             | 0.3439          | 0.8136              |
+| **5.0** | **40** | **0.9507**    | **0.3417**      | **0.8148**          |
+| 6.0     | 48     | -             | 0.3404          | 0.8120              |
+| 6.25    | 50     | 0.9429        | -               | -                   |
+| 7.0     | 56     | -             | 0.3387          | 0.8131              |
+| 7.5     | 60     | 0.9267        | -               | -                   |
+| 8.0     | 64     | -             | 0.3378          | 0.8127              |
+| 8.75    | 70     | 0.9396        | -               | -                   |
+| 9.0     | 72     | -             | 0.3370          | 0.8106              |
+| 10.0    | 80     | 0.9099        | 0.3366          | 0.8125              |
 * The bold row denotes the saved checkpoint.
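In the updated log, the bold row (epoch 5) is the one with the highest `eval_cosine_ndcg@10`, while validation loss keeps falling through epoch 10. A minimal sketch of selecting a checkpoint by either criterion, using the evaluation rows copied from the added table; that the checkpoint was actually chosen by NDCG@10 is an assumption based on this pattern.

```python
# (epoch, step, validation_loss, eval_cosine_ndcg@10) from the updated training log
eval_rows = [
    (1.0, 8, 0.3550, 0.8071),
    (2.0, 16, 0.3508, 0.8110),
    (3.0, 24, 0.3466, 0.8131),
    (4.0, 32, 0.3439, 0.8136),
    (5.0, 40, 0.3417, 0.8148),
    (6.0, 48, 0.3404, 0.8120),
    (7.0, 56, 0.3387, 0.8131),
    (8.0, 64, 0.3378, 0.8127),
    (9.0, 72, 0.3370, 0.8106),
    (10.0, 80, 0.3366, 0.8125),
]

# Higher NDCG@10 is better; lower validation loss is better.
best_by_ndcg = max(eval_rows, key=lambda row: row[3])
best_by_loss = min(eval_rows, key=lambda row: row[2])

print("best by ndcg@10:", best_by_ndcg)
print("best by validation loss:", best_by_loss)
```

The two criteria disagree here, which is a reminder that the ranking metric and the training objective need not peak at the same step.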
 - Transformers: 5.3.0
 - PyTorch: 2.10.0+cu128
 - Accelerate: 1.13.0
+- Datasets: 4.8.2
 - Tokenizers: 0.22.2
 ## Citation