radoslavralev committed on
Commit 280e0c4 · verified · 1 Parent(s): 6bf9252

Add new SentenceTransformer model

Files changed (1)
  1. README.md +41 -141
README.md CHANGED
@@ -12,50 +12,8 @@ tags:
  - retrieval
  - reranking
  - generated_from_trainer
- - dataset_size:49642
  - loss:ArcFaceInBatchLoss
- base_model: sentence-transformers/all-MiniLM-L6-v2
- widget:
- - source_sentence: '"How much would I need to narrate a ""Let''s Play"" video in order
-     to make money from it on YouTube?"'
-   sentences:
-   - How much money do people make from YouTube videos with 1 million views?
-   - '"How much would I need to narrate a ""Let''s Play"" video in order to make money
-     from it on YouTube?"'
-   - '"Does the sentence, ""I expect to be disappointed,"" make sense?"'
- - source_sentence: '"I appreciate that.'
-   sentences:
-   - '"How is the Mariner rewarded in ""The Rime of the Ancient Mariner"" by Samuel
-     Taylor Coleridge?"'
-   - '"I appreciate that.'
-   - I can appreciate that.
- - source_sentence: '"""It is very easy to defeat someone, but too hard to win some
-     one"". What does the previous sentence mean?"'
-   sentences:
-   - '"How can you use the word ""visceral"" in a sentence?"'
-   - '"""It is very easy to defeat someone, but too hard to win some one"". What does
-     the previous sentence mean?"'
-   - '"What does ""The loudest one in the room is the weakest one in the room."" Mean?"'
- - source_sentence: '" We condemn this raid which is in our view illegal and morally
-     and politically unjustifiable , " London-based NCRI official Ali Safavi told Reuters
-     by telephone .'
-   sentences:
-   - 'London-based NCRI official Ali Safavi told Reuters : " We condemn this raid ,
-     which is in our view illegal and morally and politically unjustifiable . "'
-   - The social awkwardness is complicated by the fact that Marianne is a white girl
-     living with a black family .
-   - art's cause, this in my opinion
- - source_sentence: '"If you click ""like"" on an old post that someone made on your
-     wall yet you''re no longer Facebook friends, will they still receive a notification?"'
-   sentences:
-   - '"Is there is any two wheeler having a gear box which has the feature ""automatic
-     neutral"" when the engine is off while it is in gear?"'
-   - '"If you click ""like"" on an old post that someone made on your wall yet you''re
-     no longer Facebook friends, will they still receive a notification?"'
-   - '"If your teenage son posted ""La commedia e finita"" on his Facebook wall, would
-     you be concerned?"'
- datasets:
- - redis/langcache-sentencepairs-v2
+ base_model: thenlper/gte-small
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  metrics:
@@ -78,45 +36,44 @@ model-index:
    type: test
    metrics:
    - type: cosine_accuracy@1
-     value: 0.5763286334056399
+     value: 0.548650317572336
      name: Cosine Accuracy@1
    - type: cosine_precision@1
-     value: 0.5763286334056399
+     value: 0.548650317572336
      name: Cosine Precision@1
    - type: cosine_recall@1
-     value: 0.5589816867630893
+     value: 0.529780177773297
      name: Cosine Recall@1
    - type: cosine_ndcg@10
-     value: 0.7619433934524245
+     value: 0.7467559051152127
      name: Cosine Ndcg@10
    - type: cosine_mrr@1
-     value: 0.5763286334056399
+     value: 0.548650317572336
      name: Cosine Mrr@1
    - type: cosine_map@100
-     value: 0.7107811578738404
+     value: 0.691192638604471
      name: Cosine Map@100
    - type: cosine_auc_precision_cache_hit_ratio
-     value: 0.3488530268041688
+     value: 0.31983377806645374
      name: Cosine Auc Precision Cache Hit Ratio
    - type: cosine_auc_similarity_distribution
-     value: 0.16348145891100385
+     value: 0.15293509382911363
      name: Cosine Auc Similarity Distribution
  ---
  
  # Redis fine-tuned BiEncoder model for semantic caching on LangCache
  
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for sentence pair similarity.
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-small](https://huggingface.co/thenlper/gte-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for sentence pair similarity.
  
  ## Model Details
  
  ### Model Description
  - **Model Type:** Sentence Transformer
- - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+ - **Base model:** [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) <!-- at revision 17e1f347d17fe144873b1201da91788898c639cd -->
- - **Maximum Sequence Length:** 256 tokens
+ - **Maximum Sequence Length:** 64 tokens
  - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
- - **Training Dataset:**
-   - [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
+ <!-- - **Training Dataset:** Unknown -->
  - **Language:** en
  - **License:** apache-2.0

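Side note on the hunk above: swapping the base model cuts the context window from 256 to 64 tokens, so longer inputs are silently truncated at encode time. Below is a minimal sketch for checking whether a query fits, assuming only the standard `SentenceTransformer` attributes (`max_seq_length`, `tokenizer`); it is not part of the model card itself.

```python
# Sketch: verify a query fits the 64-token window before encoding.
# Assumes only standard SentenceTransformer attributes (max_seq_length, tokenizer).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/langcache-embed-v3-mini")
print(model.max_seq_length)  # 64 after this commit (was 256)

query = "How much money do people make from YouTube videos with 1 million views?"
n_tokens = len(model.tokenizer(query)["input_ids"])  # count includes [CLS]/[SEP]
if n_tokens > model.max_seq_length:
    print(f"{n_tokens} tokens: input will be truncated to {model.max_seq_length}")
```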
@@ -130,7 +87,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
  
  ```
  SentenceTransformer(
-   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
+   (0): Transformer({'max_seq_length': 64, 'do_lower_case': False, 'architecture': 'BertModel'})
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  )
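Because the architecture ends in a `Normalize()` module, embeddings are unit-length and cosine similarity reduces to a dot product; the card's caching metrics (e.g. `cosine_auc_precision_cache_hit_ratio`) assume cache hits are decided by thresholding that similarity. Here is a minimal sketch of such a loop; the in-memory store and the 0.9 threshold are illustrative assumptions, not from the model card, and LangCache proper manages storage and threshold tuning server-side.

```python
# Minimal semantic-cache sketch. Illustrative assumptions: the in-memory store
# and the 0.9 threshold are mine, not from the model card.
from typing import Optional

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/langcache-embed-v3-mini")
CACHE_HIT_THRESHOLD = 0.9  # assumed value; tune on your own traffic

_responses: list[str] = []
_embeddings: list[np.ndarray] = []

def store(query: str, response: str) -> None:
    """Cache a query/response pair together with the query embedding."""
    _responses.append(response)
    _embeddings.append(model.encode(query))

def lookup(query: str) -> Optional[str]:
    """Return a cached response if a semantically similar query was seen before."""
    if not _embeddings:
        return None
    q = model.encode(query)
    # Embeddings are L2-normalized, so a dot product is exactly cosine similarity.
    sims = np.asarray(_embeddings) @ q
    best = int(np.argmax(sims))
    return _responses[best] if sims[best] >= CACHE_HIT_THRESHOLD else None

store("How do I reset my password?", "Use the 'Forgot password' link on the login page.")
print(lookup("What's the way to reset my password?"))  # likely a hit
print(lookup("What is the capital of France?"))        # likely None (miss)
```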
@@ -154,9 +111,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("redis/langcache-embed-v3-mini")
  # Run inference
  sentences = [
-     '"If you click ""like"" on an old post that someone made on your wall yet you\'re no longer Facebook friends, will they still receive a notification?"',
-     '"If you click ""like"" on an old post that someone made on your wall yet you\'re no longer Facebook friends, will they still receive a notification?"',
-     '"If your teenage son posted ""La commedia e finita"" on his Facebook wall, would you be concerned?"',
+     'The weather is lovely today.',
+     "It's so sunny outside!",
+     'He drove to the stadium.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -165,9 +122,9 @@ print(embeddings.shape)
  # Get the similarity scores for the embeddings
  similarities = model.similarity(embeddings, embeddings)
  print(similarities)
- # tensor([[1.0000, 1.0000, 0.3655],
- #         [1.0000, 1.0000, 0.3655],
- #         [0.3655, 0.3655, 1.0000]])
+ # tensor([[0.9999, 0.9036, 0.7702],
+ #         [0.9036, 1.0000, 0.7837],
+ #         [0.7702, 0.7837, 1.0000]])
  ```
  
  <!--
@@ -205,14 +162,14 @@ You can finetune this model on your own dataset.
  
  | Metric                                | Value      |
  |:--------------------------------------|:-----------|
- | cosine_accuracy@1                     | 0.5763     |
- | cosine_precision@1                    | 0.5763     |
- | cosine_recall@1                       | 0.559      |
- | **cosine_ndcg@10**                    | **0.7619** |
- | cosine_mrr@1                          | 0.5763     |
- | cosine_map@100                        | 0.7108     |
- | cosine_auc_precision_cache_hit_ratio  | 0.3489     |
- | cosine_auc_similarity_distribution    | 0.1635     |
+ | cosine_accuracy@1                     | 0.5487     |
+ | cosine_precision@1                    | 0.5487     |
+ | cosine_recall@1                       | 0.5298     |
+ | **cosine_ndcg@10**                    | **0.7468** |
+ | cosine_mrr@1                          | 0.5487     |
+ | cosine_map@100                        | 0.6912     |
+ | cosine_auc_precision_cache_hit_ratio  | 0.3198     |
+ | cosine_auc_similarity_distribution    | 0.1529     |
  
  <!--
  ## Bias, Risks and Limitations
@@ -228,76 +185,19 @@ You can finetune this model on your own dataset.
  
  ## Training Details
  
- ### Training Dataset
- 
- #### LangCache Sentence Pairs (all)
- 
- * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
- * Size: 132,354 training samples
- * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
- * Approximate statistics based on the first 1000 samples:
-   |         | anchor | positive | negative |
-   |:--------|:-------|:---------|:---------|
-   | type    | string | string   | string   |
-   | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
- * Samples:
-   | anchor | positive | negative |
-   |:-------|:---------|:---------|
-   | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-   | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-   | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
- * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-   ```json
-   {
-       "scale": 20.0,
-       "similarity_fct": "cos_sim",
-       "gather_across_devices": false
-   }
-   ```
- 
- ### Evaluation Dataset
- 
- #### LangCache Sentence Pairs (all)
- 
- * Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
- * Size: 132,354 evaluation samples
- * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
- * Approximate statistics based on the first 1000 samples:
-   |         | anchor | positive | negative |
-   |:--------|:-------|:---------|:---------|
-   | type    | string | string   | string   |
-   | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
- * Samples:
-   | anchor | positive | negative |
-   |:-------|:---------|:---------|
-   | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-   | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-   | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
- * Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-   ```json
-   {
-       "scale": 20.0,
-       "similarity_fct": "cos_sim",
-       "gather_across_devices": false
-   }
-   ```
- 
  ### Training Hyperparameters
  #### Non-Default Hyperparameters
  
  - `eval_strategy`: steps
- - `per_device_train_batch_size`: 1152
- - `per_device_eval_batch_size`: 1152
+ - `per_device_train_batch_size`: 512
+ - `per_device_eval_batch_size`: 512
  - `weight_decay`: 0.001
  - `adam_beta2`: 0.98
  - `adam_epsilon`: 1e-06
- - `num_train_epochs`: 2
+ - `max_steps`: 100000
  - `warmup_ratio`: 0.05
  - `bf16`: True
- - `dataloader_num_workers`: 4
- - `dataloader_prefetch_factor`: 2
  - `load_best_model_at_end`: True
- - `optim`: stable_adamw
  - `ddp_find_unused_parameters`: False
  - `push_to_hub`: True
  - `hub_model_id`: redis/langcache-embed-v3-mini
@@ -311,8 +211,8 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 1152
- - `per_device_eval_batch_size`: 1152
+ - `per_device_train_batch_size`: 512
+ - `per_device_eval_batch_size`: 512
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
@@ -324,8 +224,8 @@ You can finetune this model on your own dataset.
  - `adam_beta2`: 0.98
  - `adam_epsilon`: 1e-06
  - `max_grad_norm`: 1.0
- - `num_train_epochs`: 2
- - `max_steps`: -1
+ - `num_train_epochs`: 3.0
+ - `max_steps`: 100000
  - `lr_scheduler_type`: linear
  - `lr_scheduler_kwargs`: {}
  - `warmup_ratio`: 0.05
@@ -357,8 +257,8 @@ You can finetune this model on your own dataset.
  - `tpu_metrics_debug`: False
  - `debug`: []
  - `dataloader_drop_last`: False
- - `dataloader_num_workers`: 4
- - `dataloader_prefetch_factor`: 2
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
  - `past_index`: -1
  - `disable_tqdm`: False
  - `remove_unused_columns`: True
@@ -373,7 +273,7 @@ You can finetune this model on your own dataset.
  - `parallelism_config`: None
  - `deepspeed`: None
  - `label_smoothing_factor`: 0.0
- - `optim`: stable_adamw
+ - `optim`: adamw_torch_fused
  - `optim_args`: None
  - `adafactor`: False
  - `group_by_length`: False
@@ -430,9 +330,9 @@ You can finetune this model on your own dataset.
  </details>
  
  ### Training Logs
- | Epoch | Step | Validation Loss | test_cosine_ndcg@10 |
- |:-----:|:----:|:---------------:|:-------------------:|
- | 0     | 0    | 0.6981          | 0.7619              |
+ | Epoch | Step | test_cosine_ndcg@10 |
+ |:-----:|:----:|:-------------------:|
+ | 0     | 0    | 0.7468              |
  
  
  ### Framework Versions
 