Sentence Similarity
sentence-transformers
Safetensors
bert
feature-extraction
dense
Generated from Trainer
dataset_size:111470
loss:MultipleNegativesRankingLoss
Eval Results (legacy)
text-embeddings-inference
Instructions to use redis/model-b-structured with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use redis/model-b-structured with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("redis/model-b-structured")

    sentences = [
        "when was the first elephant brought to america",
        "Old Bet The first elephant brought to the United States was in 1796, aboard the America which set sail from Calcutta for New York on December 3, 1795.[4] However, it is not certain that this was Old Bet.[2] The first references to Old Bet start in 1804 in Boston as part of a menagerie.[1] In 1808, while residing in Somers, New York, Hachaliah Bailey purchased the menagerie elephant for $1,000 and named it \"Old Bet\".[5][6]",
        "Cronus Rhea secretly gave birth to Zeus in Crete, and handed Cronus a stone wrapped in swaddling clothes, also known as the Omphalos Stone, which he promptly swallowed, thinking that it was his son.",
        "Renal artery One or two accessory renal arteries are frequently found, especially on the left side since they usually arise from the aorta, and may come off above (more common) or below the main artery. Instead of entering the kidney at the hilus, they usually pierce the upper or lower part of the organ."
    ]
    embeddings = model.encode(sentences)
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [4, 4]
- Notebooks
- Google Colab
- Kaggle
Add new SentenceTransformer model
- 1_Pooling/config.json +1 -1
- README.md +229 -80
- config_sentence_transformers.json +2 -2
1_Pooling/config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "word_embedding_dimension":
+  "word_embedding_dimension": 768,
   "pooling_mode_cls_token": true,
   "pooling_mode_mean_tokens": false,
   "pooling_mode_max_tokens": false,
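The pooling configuration above enables `pooling_mode_cls_token` and disables mean and max pooling, so the sentence embedding is taken from the first token's vector. Below is a minimal pure-Python sketch of that selection, assuming token embeddings arrive as a list of per-token vectors with the CLS token at index 0. The helpers `cls_pool` and `mean_pool` are illustrative names, not the library's API, and the toy vectors use 4 dimensions rather than 768.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return a copy of the first token's vector (CLS pooling)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Average all token vectors; shown only for contrast, since this config disables it."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / count for d in range(dim)]


# Toy sequence: 3 tokens, 4 dimensions each (hypothetical values).
tokens = [
    [1.0, 0.0, 2.0, 0.0],  # position 0, assumed to be the CLS token
    [0.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 1.0, 2.0],
]

cls_vec = cls_pool(tokens)
mean_vec = mean_pool(tokens)
print("CLS pooling: ", cls_vec)
print("mean pooling:", mean_vec)
```

The two strategies generally give different vectors for the same input, which is why the pooling flags have to match the way the model was trained.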
README.md
CHANGED
@@ -5,51 +5,123 @@ tags:
 - feature-extraction
 - dense
 - generated_from_trainer
-- dataset_size:
 - loss:MultipleNegativesRankingLoss
-base_model:
 widget:
-- source_sentence:
   sentences:
-  -
-  -
-  - What
-- source_sentence:
   sentences:
-  - How
-  -
-  -
-
-
   sentences:
-  -
-
-  -
-
   sentences:
-  - What are
-  -
-  -
-- source_sentence: What is the
   sentences:
-  -
-
-  -
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---

-# SentenceTransformer based on

-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [
 - **Maximum Sequence Length:** 128 tokens
-- **Output Dimensionality:**
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
@@ -65,8 +137,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [p

 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': '
-  (1): Pooling({'word_embedding_dimension':
 )
 ```

@@ -85,23 +157,23 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer

 # Download from the 🤗 Hub
-model = SentenceTransformer("
 # Run inference
 sentences = [
-    'What is the
-    '
-    '
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
-# [3,

 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000,
-#         [
-#         [0.
 ```

 <!--
@@ -128,6 +200,32 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->

 <!--
 ## Bias, Risks and Limitations

@@ -146,23 +244,49 @@ You can finetune this model on your own dataset.

 #### Unnamed Dataset

-* Size:
-* Columns: <code>
 * Approximate statistics based on the first 1000 samples:
-  | |
   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
   | type | string | string | string |
-  | details | <ul><li>min:
 * Samples:
-  |
-  |:-----------------------------------------------------------------|:-----------------------------------------------------------------|:----------------------------------------------------------------------------------|
-  | <code>
-  | <code>
-  | <code>
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
-      "scale":
       "similarity_fct": "cos_sim",
       "gather_across_devices": false
   }
@@ -171,36 +295,49 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters

-- `
-- `
 - `fp16`: True
-- `

 #### All Hyperparameters
 <details><summary>Click to expand</summary>

 - `overwrite_output_dir`: False
 - `do_predict`: False
-- `eval_strategy`:
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`:
-- `weight_decay`: 0.
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1
-- `num_train_epochs`: 3
-- `max_steps`:
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -228,14 +365,14 @@ You can finetune this model on your own dataset.
 - `tpu_num_cores`: None
 - `tpu_metrics_debug`: False
 - `debug`: []
-- `dataloader_drop_last`:
-- `dataloader_num_workers`:
-- `dataloader_prefetch_factor`:
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
-- `load_best_model_at_end`:
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
@@ -245,23 +382,23 @@ You can finetune this model on your own dataset.
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
-- `optim`:
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
 - `length_column_name`: length
 - `project`: huggingface
 - `trackio_space_id`: trackio
-- `ddp_find_unused_parameters`:
 - `ddp_bucket_cap_mb`: None
 - `ddp_broadcast_buffers`: False
 - `dataloader_pin_memory`: True
 - `dataloader_persistent_workers`: False
 - `skip_memory_metrics`: True
 - `use_legacy_prediction_loop`: False
-- `push_to_hub`:
 - `resume_from_checkpoint`: None
-- `hub_model_id`:
 - `hub_strategy`: every_save
 - `hub_private_repo`: None
 - `hub_always_push`: False
@@ -288,31 +425,43 @@ You can finetune this model on your own dataset.
 - `neftune_noise_alpha`: None
 - `optim_target_modules`: None
 - `batch_eval_metrics`: False
-- `eval_on_start`:
 - `use_liger_kernel`: False
 - `liger_kernel_config`: None
 - `eval_use_gather_object`: False
 - `average_tokens_across_devices`: True
 - `prompts`: None
 - `batch_sampler`: batch_sampler
-- `multi_dataset_batch_sampler`:
 - `router_mapping`: {}
 - `learning_rate_mapping`: {}

 </details>

 ### Training Logs
-| Epoch | Step | Training Loss |
-|:------:|:----:|:-------------:|
-| 0
-| 0.
-| 0.
-|
-|
-|
-|
-|
-|


 ### Framework Versions
@@ -5,51 +5,123 @@ tags:
 - feature-extraction
 - dense
 - generated_from_trainer
+- dataset_size:713743
 - loss:MultipleNegativesRankingLoss
+base_model: Alibaba-NLP/gte-modernbert-base
 widget:
+- source_sentence: 'Abraham Lincoln: Why is the Gettysburg Address so memorable?'
   sentences:
+  - 'Abraham Lincoln: Why is the Gettysburg Address so memorable?'
+  - What does the Gettysburg Address really mean?
+  - What is eatalo.com?
+- source_sentence: Has the influence of Ancient Carthage in science, math, and society
+    been underestimated?
   sentences:
+  - How does one earn money online without an investment from home?
+  - Has the influence of Ancient Carthage in science, math, and society been underestimated?
+  - Has the influence of the Ancient Etruscans in science and math been underestimated?
+- source_sentence: Is there any app that shares charging to others like share it how
+    we transfer files?
   sentences:
+  - How do you think of Chinese claims that the present Private Arbitration is illegal,
+    its verdict violates the UNCLOS and is illegal?
+  - Is there any app that shares charging to others like share it how we transfer
+    files?
+  - Are there any platforms that provides end-to-end encryption for file transfer/
+    sharing?
+- source_sentence: Why AAP’s MLA Dinesh Mohaniya has been arrested?
   sentences:
+  - What are your views on the latest sex scandal by AAP MLA Sandeep Kumar?
+  - What is a dc current? What are some examples?
+  - Why AAP’s MLA Dinesh Mohaniya has been arrested?
+- source_sentence: What is the difference between economic growth and economic development?
   sentences:
+  - How cold can the Gobi Desert get, and how do its average temperatures compare
+    to the ones in the Simpson Desert?
+  - the difference between economic growth and economic development is What?
+  - What is the difference between economic growth and economic development?
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_ndcg@10
+- cosine_mrr@1
+- cosine_mrr@5
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: val
+      type: val
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.83665
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.91045
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.9361
+      name: Cosine Accuracy@5
+    - type: cosine_precision@1
+      value: 0.83665
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.3034833333333333
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.18722000000000003
+      name: Cosine Precision@5
+    - type: cosine_recall@1
+      value: 0.83665
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.91045
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.9361
+      name: Cosine Recall@5
+    - type: cosine_ndcg@10
+      value: 0.9000254411118587
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@1
+      value: 0.83665
+      name: Cosine Mrr@1
+    - type: cosine_mrr@5
+      value: 0.8753945833333286
+      name: Cosine Mrr@5
+    - type: cosine_mrr@10
+      value: 0.8793089583333286
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.8812821493075779
+      name: Cosine Map@100
 ---

+# SentenceTransformer based on Alibaba-NLP/gte-modernbert-base

+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) <!-- at revision e7f32e3c00f91d699e8c43b53106206bcc72bb22 -->
 - **Maximum Sequence Length:** 128 tokens
+- **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
@@ -65,8 +137,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [p

 ```
 SentenceTransformer(
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
 )
 ```

@@ -85,23 +157,23 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer

 # Download from the 🤗 Hub
+model = SentenceTransformer("redis/model-b-structured")
 # Run inference
 sentences = [
+    'What is the difference between economic growth and economic development?',
+    'What is the difference between economic growth and economic development?',
+    'the difference between economic growth and economic development is What?',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
+# [3, 768]

 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
+# tensor([[ 1.0000,  1.0000, -0.0640],
+#         [ 1.0000,  1.0000, -0.0640],
+#         [-0.0640, -0.0640,  1.0000]])
 ```
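The usage snippet above scores sentence pairs with `model.similarity`, and the model card lists cosine similarity as the similarity function. As a rough illustration of what such a score matrix contains, here is a pure-Python sketch on toy 2-dimensional vectors. The helper names `cosine_similarity` and `similarity_matrix` are hypothetical and are not part of sentence-transformers, which operates on tensors instead.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Two identical toy vectors and one orthogonal to them (made-up values).
toy_embeddings = [[3.0, 4.0], [3.0, 4.0], [-4.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

As in the model card's example, where the first two input sentences are identical, identical vectors score 1 against each other and the matrix is symmetric.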

 <!--
@@ -128,6 +200,32 @@ You can finetune this model on your own dataset.
 *List how the model may foreseeably be misused and address what users ought not to do with the model.*
 -->

+## Evaluation
+
+### Metrics
+
+#### Information Retrieval
+
+* Dataset: `val`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+| Metric | Value |
+|:-------------------|:--------|
+| cosine_accuracy@1 | 0.8367 |
+| cosine_accuracy@3 | 0.9104 |
+| cosine_accuracy@5 | 0.9361 |
+| cosine_precision@1 | 0.8367 |
+| cosine_precision@3 | 0.3035 |
+| cosine_precision@5 | 0.1872 |
+| cosine_recall@1 | 0.8367 |
+| cosine_recall@3 | 0.9104 |
+| cosine_recall@5 | 0.9361 |
+| **cosine_ndcg@10** | **0.9** |
+| cosine_mrr@1 | 0.8367 |
+| cosine_mrr@5 | 0.8754 |
+| cosine_mrr@10 | 0.8793 |
+| cosine_map@100 | 0.8813 |
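The precision figures reported for `val` can be cross-checked against the accuracy figures. The sketch below assumes each query has exactly one relevant document, which the model card does not state but which is consistent with recall@k equalling accuracy@k in the reported values. Under that assumption precision@k is simply the hit rate at k divided by k. The unrounded inputs are the ones listed in the card's `model-index` metadata, and the function name is illustrative.

```python
def precision_at_k_single_relevant(accuracy_at_k: float, k: int) -> float:
    """precision@k when every query has exactly one relevant document (assumption)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return accuracy_at_k / k


# k -> (reported cosine_accuracy@k, reported cosine_precision@k)
reported = {
    1: (0.83665, 0.83665),
    3: (0.91045, 0.3034833333333333),
    5: (0.9361, 0.18722000000000003),
}

checks = {}
for k, (accuracy, precision) in reported.items():
    derived = precision_at_k_single_relevant(accuracy, k)
    checks[k] = abs(derived - precision) < 1e-9
    print(f"k={k}: derived precision {derived:.6f}, matches reported: {checks[k]}")
```

If queries could have several relevant documents, this relation would not hold in general and precision would have to be computed per query.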
+
 <!--
 ## Bias, Risks and Limitations

@@ -146,23 +244,49 @@ You can finetune this model on your own dataset.

 #### Unnamed Dataset

+* Size: 713,743 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+* Approximate statistics based on the first 1000 samples:
+  | | anchor | positive | negative |
+  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type | string | string | string |
+  | details | <ul><li>min: 6 tokens</li><li>mean: 15.96 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.93 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.72 tokens</li><li>max: 59 tokens</li></ul> |
+* Samples:
+  | anchor | positive | negative |
+  |:-------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
+  | <code>Which one is better Linux OS? Ubuntu or Mint?</code> | <code>Why do you use Linux Mint?</code> | <code>Which one is not better Linux OS ? Ubuntu or Mint ?</code> |
+  | <code>What is flow?</code> | <code>What is flow?</code> | <code>What are flow lines?</code> |
+  | <code>How is Trump planning to get Mexico to pay for his supposed wall?</code> | <code>How is it possible for Donald Trump to force Mexico to pay for the wall?</code> | <code>Why do we connect the positive terminal before the negative terminal to ground in a vehicle battery?</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 7.0,
+      "similarity_fct": "cos_sim",
+      "gather_across_devices": false
+  }
+  ```
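The loss parameters above (`scale` 7.0 with `cos_sim`) can be read as a softmax cross-entropy over scaled cosine similarities, where each anchor should rank its own positive above every other candidate in the batch. The following is a simplified pure-Python sketch of that idea for (anchor, positive, negative) triplets. It is an illustration under that reading, not the library's implementation; `cos_sim` and `mnr_loss` are hypothetical helper names and the vectors are made up.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    scale: float = 7.0,
) -> float:
    """Mean cross-entropy where anchor i must select positive i among
    all positives and all negatives in the batch."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnr_loss(anchors, positives, negatives)
shuffled = mnr_loss(anchors, positives[::-1], negatives)
print(f"loss with matching positives:   {aligned:.4f}")
print(f"loss with mismatched positives: {shuffled:.4f}")
```

A larger `scale` sharpens the softmax, so well-separated pairs contribute almost nothing to the loss while confusable pairs dominate it.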
+
+### Evaluation Dataset
+
+#### Unnamed Dataset
+
+* Size: 40,000 evaluation samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
 * Approximate statistics based on the first 1000 samples:
+  | | anchor | positive | negative |
   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
   | type | string | string | string |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 15.47 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.48 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.76 tokens</li><li>max: 67 tokens</li></ul> |
 * Samples:
+  | anchor | positive | negative |
+  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>Why are all my questions on Quora marked needing improvement?</code> | <code>Why are all my questions immediately being marked as needing improvement?</code> | <code>For a post-graduate student in IIT, is it allowed to take an external scholarship as a top-up to his/her MHRD assistantship?</code> |
+  | <code>Can blue butter fly needle with vaccum tube be reused? Is it HIV risk? . Heard the needle is too small to be reused . Had blood draw at clinic?</code> | <code>Can blue butter fly needle with vaccum tube be reused? Is it HIV risk? . Heard the needle is too small to be reused . Had blood draw at clinic?</code> | <code>Can blue butter fly needle with vaccum tube be reused not ? Is it HIV risk ? . Heard the needle is too small to be reused . Had blood draw at clinic ?</code> |
+  | <code>Why do people still believe the world is flat?</code> | <code>Why are there still people who believe the world is flat?</code> | <code>I'm not able to buy Udemy course .it is not accepting mine and my friends debit card.my card can be used for Flipkart .how to purchase now?</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
+      "scale": 7.0,
       "similarity_fct": "cos_sim",
       "gather_across_devices": false
   }
@@ -171,36 +295,49 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters

+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0001
+- `max_steps`: 5000
+- `warmup_ratio`: 0.1
 - `fp16`: True
+- `dataloader_drop_last`: True
+- `dataloader_num_workers`: 1
+- `dataloader_prefetch_factor`: 1
+- `load_best_model_at_end`: True
+- `optim`: adamw_torch
+- `ddp_find_unused_parameters`: False
+- `push_to_hub`: True
+- `hub_model_id`: redis/model-b-structured
+- `eval_on_start`: True
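The hyperparameters above determine how much of the data the run actually sees. The arithmetic below is a sketch under two assumptions that the card does not state: training ran on a single device, and warmup length is `warmup_ratio` times `max_steps`. With those assumptions the derived epoch fractions line up with the Epoch column of the training logs (0.0448 at step 250, 0.8967 at step 5000).

```python
# Values taken from the model card.
dataset_size = 713_743   # training samples
batch_size = 128         # per_device_train_batch_size
max_steps = 5000
warmup_ratio = 0.1

# Assumption: one device and gradient_accumulation_steps = 1, so one optimizer
# step consumes one batch. dataloader_drop_last=True discards the final partial batch.
steps_per_epoch = dataset_size // batch_size

# Assumption: warmup covers warmup_ratio of the total optimizer steps.
warmup_steps = round(max_steps * warmup_ratio)

epoch_at_first_log = 250 / steps_per_epoch
epoch_at_end = max_steps / steps_per_epoch

print("steps per epoch:   ", steps_per_epoch)
print("warmup steps:      ", warmup_steps)
print("epoch at step 250: ", round(epoch_at_first_log, 4))
print("epoch at step 5000:", round(epoch_at_end, 4))
```

Because `max_steps` is set, the run stops before completing one pass over the data, regardless of the `num_train_epochs` value listed among the full hyperparameters.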

 #### All Hyperparameters
 <details><summary>Click to expand</summary>

 - `overwrite_output_dir`: False
 - `do_predict`: False
+- `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0001
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 3.0
+- `max_steps`: 5000
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
 - `warmup_steps`: 0
 - `log_level`: passive
 - `log_level_replica`: warning
@@ -228,14 +365,14 @@ You can finetune this model on your own dataset.
 - `tpu_num_cores`: None
 - `tpu_metrics_debug`: False
 - `debug`: []
+- `dataloader_drop_last`: True
+- `dataloader_num_workers`: 1
+- `dataloader_prefetch_factor`: 1
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
 - `label_names`: None
+- `load_best_model_at_end`: True
 - `ignore_data_skip`: False
 - `fsdp`: []
 - `fsdp_min_num_params`: 0
@@ -245,23 +382,23 @@ You can finetune this model on your own dataset.
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
 - `length_column_name`: length
 - `project`: huggingface
 - `trackio_space_id`: trackio
+- `ddp_find_unused_parameters`: False
 - `ddp_bucket_cap_mb`: None
 - `ddp_broadcast_buffers`: False
 - `dataloader_pin_memory`: True
 - `dataloader_persistent_workers`: False
 - `skip_memory_metrics`: True
 - `use_legacy_prediction_loop`: False
+- `push_to_hub`: True
 - `resume_from_checkpoint`: None
+- `hub_model_id`: redis/model-b-structured
 - `hub_strategy`: every_save
 - `hub_private_repo`: None
 - `hub_always_push`: False
@@ -288,31 +425,43 @@ You can finetune this model on your own dataset.
 - `neftune_noise_alpha`: None
 - `optim_target_modules`: None
 - `batch_eval_metrics`: False
+- `eval_on_start`: True
 - `use_liger_kernel`: False
 - `liger_kernel_config`: None
 - `eval_use_gather_object`: False
 - `average_tokens_across_devices`: True
 - `prompts`: None
 - `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
 - `router_mapping`: {}
 - `learning_rate_mapping`: {}

 </details>

 ### Training Logs
+| Epoch | Step | Training Loss | Validation Loss | val_cosine_ndcg@10 |
+|:------:|:----:|:-------------:|:---------------:|:------------------:|
+| 0 | 0 | - | 2.2389 | 0.8638 |
+| 0.0448 | 250 | 1.0018 | 0.4153 | 0.8910 |
+| 0.0897 | 500 | 0.3879 | 0.3664 | 0.8940 |
+| 0.1345 | 750 | 0.3583 | 0.3532 | 0.8937 |
+| 0.1793 | 1000 | 0.3453 | 0.3371 | 0.8962 |
+| 0.2242 | 1250 | 0.3371 | 0.3299 | 0.8956 |
+| 0.2690 | 1500 | 0.3283 | 0.3230 | 0.8967 |
+| 0.3138 | 1750 | 0.323 | 0.3185 | 0.8974 |
+| 0.3587 | 2000 | 0.3205 | 0.3139 | 0.8978 |
+| 0.4035 | 2250 | 0.315 | 0.3123 | 0.8985 |
+| 0.4484 | 2500 | 0.3132 | 0.3095 | 0.8987 |
+| 0.4932 | 2750 | 0.3082 | 0.3071 | 0.8991 |
+| 0.5380 | 3000 | 0.3065 | 0.3045 | 0.8985 |
+| 0.5829 | 3250 | 0.3041 | 0.3029 | 0.8988 |
+| 0.6277 | 3500 | 0.3046 | 0.3015 | 0.8996 |
+| 0.6725 | 3750 | 0.3023 | 0.3002 | 0.8995 |
+| 0.7174 | 4000 | 0.3017 | 0.2991 | 0.9000 |
+| 0.7622 | 4250 | 0.3001 | 0.2985 | 0.8996 |
+| 0.8070 | 4500 | 0.3006 | 0.2975 | 0.8999 |
+| 0.8519 | 4750 | 0.2983 | 0.2970 | 0.8998 |
+| 0.8967 | 5000 | 0.2991 | 0.2966 | 0.9000 |


 ### Framework Versions
config_sentence_transformers.json
CHANGED
@@ -1,5 +1,4 @@
 {
-  "model_type": "SentenceTransformer",
   "__version__": {
     "sentence_transformers": "5.2.0",
     "transformers": "4.57.3",
@@ -10,5 +9,6 @@
     "document": ""
   },
   "default_prompt_name": null,
-  "similarity_fn_name": "cosine"
+  "similarity_fn_name": "cosine",
+  "model_type": "SentenceTransformer"
 }
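The change to `config_sentence_transformers.json` moves the `model_type` key from the top of the object to the end without altering any value shown in the diff. A small sketch of why that is not a semantic change for a JSON consumer follows. It uses only the top-level keys visible in the diff (the elided parts of the file are left out), and the fragment strings are illustrative rather than the full file contents.

```python
import json

# Fragments built from the keys visible in the diff; the rest of the file is omitted.
old_fragment = """
{
  "model_type": "SentenceTransformer",
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
"""
new_fragment = """
{
  "default_prompt_name": null,
  "similarity_fn_name": "cosine",
  "model_type": "SentenceTransformer"
}
"""

old_config = json.loads(old_fragment)
new_config = json.loads(new_fragment)

same_content = old_config == new_config            # dict equality ignores key order
same_key_order = list(old_config) == list(new_config)
print("same content:  ", same_content)
print("same key order:", same_key_order)
```

Tools that compare files byte for byte will still report a difference, which is why the reorder appears in the diff at all.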