Add new SentenceTransformer model
README.md
CHANGED
@@ -12,50 +12,8 @@ tags:
 - retrieval
 - reranking
 - generated_from_trainer
-- dataset_size:49642
 - loss:ArcFaceInBatchLoss
-base_model:
-widget:
-- source_sentence: '"How much would I need to narrate a ""Let''s Play"" video in order
-    to make money from it on YouTube?"'
-  sentences:
-  - How much money do people make from YouTube videos with 1 million views?
-  - '"How much would I need to narrate a ""Let''s Play"" video in order to make money
-    from it on YouTube?"'
-  - '"Does the sentence, ""I expect to be disappointed,"" make sense?"'
-- source_sentence: '"I appreciate that.'
-  sentences:
-  - '"How is the Mariner rewarded in ""The Rime of the Ancient Mariner"" by Samuel
-    Taylor Coleridge?"'
-  - '"I appreciate that.'
-  - I can appreciate that.
-- source_sentence: '"""It is very easy to defeat someone, but too hard to win some
-    one"". What does the previous sentence mean?"'
-  sentences:
-  - '"How can you use the word ""visceral"" in a sentence?"'
-  - '"""It is very easy to defeat someone, but too hard to win some one"". What does
-    the previous sentence mean?"'
-  - '"What does ""The loudest one in the room is the weakest one in the room."" Mean?"'
-- source_sentence: '" We condemn this raid which is in our view illegal and morally
-    and politically unjustifiable , " London-based NCRI official Ali Safavi told Reuters
-    by telephone .'
-  sentences:
-  - 'London-based NCRI official Ali Safavi told Reuters : " We condemn this raid ,
-    which is in our view illegal and morally and politically unjustifiable . "'
-  - The social awkwardness is complicated by the fact that Marianne is a white girl
-    living with a black family .
-  - art's cause, this in my opinion
-- source_sentence: '"If you click ""like"" on an old post that someone made on your
-    wall yet you''re no longer Facebook friends, will they still receive a notification?"'
-  sentences:
-  - '"Is there is any two wheeler having a gear box which has the feature ""automatic
-    neutral"" when the engine is off while it is in gear?"'
-  - '"If you click ""like"" on an old post that someone made on your wall yet you''re
-    no longer Facebook friends, will they still receive a notification?"'
-  - '"If your teenage son posted ""La commedia e finita"" on his Facebook wall, would
-    you be concerned?"'
-datasets:
-- redis/langcache-sentencepairs-v2
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ -78,45 +36,44 @@ model-index:
       type: test
     metrics:
     - type: cosine_accuracy@1
-      value: 0.
       name: Cosine Accuracy@1
     - type: cosine_precision@1
-      value: 0.
       name: Cosine Precision@1
     - type: cosine_recall@1
-      value: 0.
       name: Cosine Recall@1
     - type: cosine_ndcg@10
-      value: 0.
       name: Cosine Ndcg@10
     - type: cosine_mrr@1
-      value: 0.
       name: Cosine Mrr@1
     - type: cosine_map@100
-      value: 0.
       name: Cosine Map@100
     - type: cosine_auc_precision_cache_hit_ratio
-      value: 0.
       name: Cosine Auc Precision Cache Hit Ratio
    - type: cosine_auc_similarity_distribution
-      value: 0.
       name: Cosine Auc Similarity Distribution
 ---

 # Redis fine-tuned BiEncoder model for semantic caching on LangCache

-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [
-- **Maximum Sequence Length:**
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
-- **Training Dataset:**
-  - [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
 - **Language:** en
 - **License:** apache-2.0
@@ -130,7 +87,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s

 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length':
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
   (2): Normalize()
 )
@@ -154,9 +111,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("redis/langcache-embed-v3-mini")
 # Run inference
 sentences = [
-    '
-
-    '
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -165,9 +122,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[
-# [
-# [0.
 ```

 <!--
@@ -205,14 +162,14 @@ You can finetune this model on your own dataset.

 | Metric                               | Value      |
 |:-------------------------------------|:-----------|
-| cosine_accuracy@1                    | 0.
-| cosine_precision@1                   | 0.
-| cosine_recall@1                      | 0.
-| **cosine_ndcg@10**                   | **0.
-| cosine_mrr@1                         | 0.
-| cosine_map@100                       | 0.
-| cosine_auc_precision_cache_hit_ratio | 0.
-| cosine_auc_similarity_distribution   | 0.

 <!--
 ## Bias, Risks and Limitations
@@ -228,76 +185,19 @@ You can finetune this model on your own dataset.

 ## Training Details

-### Training Dataset
-
-#### LangCache Sentence Pairs (all)
-
-* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
-* Size: 132,354 training samples
-* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | anchor | positive | negative |
-  |:--------|:-------|:---------|:---------|
-  | type    | string | string   | string   |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
-* Samples:
-  | anchor | positive | negative |
-  |:-------|:---------|:---------|
-  | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-  | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-  | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
-* Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-  ```json
-  {
-      "scale": 20.0,
-      "similarity_fct": "cos_sim",
-      "gather_across_devices": false
-  }
-  ```
-
-### Evaluation Dataset
-
-#### LangCache Sentence Pairs (all)
-
-* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
-* Size: 132,354 evaluation samples
-* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | anchor | positive | negative |
-  |:--------|:-------|:---------|:---------|
-  | type    | string | string   | string   |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
-* Samples:
-  | anchor | positive | negative |
-  |:-------|:---------|:---------|
-  | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-  | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-  | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
-* Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-  ```json
-  {
-      "scale": 20.0,
-      "similarity_fct": "cos_sim",
-      "gather_across_devices": false
-  }
-  ```
-
 ### Training Hyperparameters
 #### Non-Default Hyperparameters

 - `eval_strategy`: steps
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `weight_decay`: 0.001
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
-- `
 - `warmup_ratio`: 0.05
 - `bf16`: True
-- `dataloader_num_workers`: 4
-- `dataloader_prefetch_factor`: 2
 - `load_best_model_at_end`: True
-- `optim`: stable_adamw
 - `ddp_find_unused_parameters`: False
 - `push_to_hub`: True
 - `hub_model_id`: redis/langcache-embed-v3-mini
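For intuition about the loss named in the removed section (`scale: 20.0`, `similarity_fct: cos_sim`, in-batch negatives), here is a rough NumPy sketch of an ArcFace-style in-batch loss. This is an illustrative assumption, not the `losses.ArcFaceInBatchLoss` implementation: each anchor treats its paired positive as the target class and the rest of the batch as negatives, with an angular margin added to the positive pair.

```python
import numpy as np

def arcface_in_batch_loss(anchors, positives, scale=20.0, margin=0.1):
    """Sketch (assumption, not the library code) of an ArcFace-style
    in-batch loss with cosine similarity and in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    cos = a @ p.T                                  # (batch, batch) cosine matrix
    idx = np.arange(cos.shape[0])
    # ArcFace adds an angular margin to each anchor's own positive pair,
    # which makes the task harder than plain in-batch softmax.
    theta = np.arccos(np.clip(cos[idx, idx], -1 + 1e-7, 1 - 1e-7))
    cos[idx, idx] = np.cos(theta + margin)
    # Softmax cross-entropy over in-batch candidates; the target is the diagonal.
    logits = scale * cos
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()
```

The `scale` factor sharpens the softmax over cosine similarities, which live in [-1, 1]; a larger margin strictly increases the loss by shrinking the positive logit.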
@@ -311,8 +211,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -324,8 +224,8 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
 - `max_grad_norm`: 1.0
-- `num_train_epochs`:
-- `max_steps`:
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 - `warmup_ratio`: 0.05
@@ -357,8 +257,8 @@ You can finetune this model on your own dataset.
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
-- `dataloader_num_workers`:
-- `dataloader_prefetch_factor`:
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
@@ -373,7 +273,7 @@ You can finetune this model on your own dataset.
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
-- `optim`:
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
@@ -430,9 +330,9 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
-| Epoch | Step |
-
-| 0 | 0 | 0.


 ### Framework Versions
@@ +12,8 @@ tags:
 - retrieval
 - reranking
 - generated_from_trainer
 - loss:ArcFaceInBatchLoss
+base_model: thenlper/gte-small
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ +36,44 @@ model-index:
       type: test
     metrics:
     - type: cosine_accuracy@1
+      value: 0.548650317572336
       name: Cosine Accuracy@1
     - type: cosine_precision@1
+      value: 0.548650317572336
       name: Cosine Precision@1
     - type: cosine_recall@1
+      value: 0.529780177773297
       name: Cosine Recall@1
     - type: cosine_ndcg@10
+      value: 0.7467559051152127
       name: Cosine Ndcg@10
     - type: cosine_mrr@1
+      value: 0.548650317572336
       name: Cosine Mrr@1
     - type: cosine_map@100
+      value: 0.691192638604471
       name: Cosine Map@100
     - type: cosine_auc_precision_cache_hit_ratio
+      value: 0.31983377806645374
       name: Cosine Auc Precision Cache Hit Ratio
     - type: cosine_auc_similarity_distribution
+      value: 0.15293509382911363
       name: Cosine Auc Similarity Distribution
 ---

 # Redis fine-tuned BiEncoder model for semantic caching on LangCache

+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-small](https://huggingface.co/thenlper/gte-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for sentence pair similarity.

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) <!-- at revision 17e1f347d17fe144873b1201da91788898c639cd -->
+- **Maximum Sequence Length:** 64 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
 - **Language:** en
 - **License:** apache-2.0
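For context on how such a model is used for semantic caching, the idea is to reuse a stored answer whenever a new query's embedding is close enough (by cosine similarity) to a previously cached query's embedding. A minimal NumPy sketch; the `cache_lookup` helper and the 0.9 threshold are hypothetical illustrations, not LangCache's actual API:

```python
import numpy as np

def cache_lookup(query_emb, cached_embs, cached_answers, threshold=0.9):
    """Illustrative semantic-cache lookup: return the stored answer for the
    most similar cached query if it clears the similarity threshold."""
    q = query_emb / np.linalg.norm(query_emb)
    c = cached_embs / np.linalg.norm(cached_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every cached query
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return cached_answers[best]   # cache hit: reuse the stored answer
    return None                       # cache miss: call the LLM, then store
```

The threshold trades precision for hit rate, which is what the `cosine_auc_precision_cache_hit_ratio` metric below summarizes across thresholds.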
@@ +87,7 @@

 ```
 SentenceTransformer(
+  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False, 'architecture': 'BertModel'})
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
   (2): Normalize()
 )

@@ +111,9 @@
 model = SentenceTransformer("redis/langcache-embed-v3-mini")
 # Run inference
 sentences = [
+    'The weather is lovely today.',
+    "It's so sunny outside!",
+    'He drove to the stadium.',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)

@@ +122,9 @@
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
+# tensor([[0.9999, 0.9036, 0.7702],
+#         [0.9036, 1.0000, 0.7837],
+#         [0.7702, 0.7837, 1.0000]])
 ```

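Because the architecture ends with a `Normalize()` module, the cosine scores that `model.similarity` reports reduce to plain dot products between unit-length embeddings. A minimal NumPy sketch; the toy 4-dimensional vectors are stand-ins for the real 384-dimensional embeddings:

```python
import numpy as np

# Toy embeddings (3 sentences x 4 dims), illustrative values only.
emb = np.array([
    [0.20, 0.80, 0.10, 0.40],
    [0.25, 0.75, 0.05, 0.45],
    [0.90, 0.10, 0.30, 0.20],
])

# What the model's Normalize() module does: L2-normalize each row, so
# cosine similarity becomes a dot product.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
similarities = emb @ emb.T   # diagonal is 1.0; rows 0 and 1 are most similar

print(np.round(similarities, 4))
```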
 <!--
@@ +162,14 @@ You can finetune this model on your own dataset.

 | Metric                               | Value      |
 |:-------------------------------------|:-----------|
+| cosine_accuracy@1                    | 0.5487     |
+| cosine_precision@1                   | 0.5487     |
+| cosine_recall@1                      | 0.5298     |
+| **cosine_ndcg@10**                   | **0.7468** |
+| cosine_mrr@1                         | 0.5487     |
+| cosine_map@100                       | 0.6912     |
+| cosine_auc_precision_cache_hit_ratio | 0.3198     |
+| cosine_auc_similarity_distribution   | 0.1529     |

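As a rough illustration of the rank-1 metrics in the table, a query counts as a hit when its top-ranked candidate is relevant; with a single relevant item per query, accuracy@1, precision@1, and mrr@1 coincide, which is why those rows agree. The helper below is a hypothetical illustration, not a sentence-transformers function:

```python
def accuracy_at_1(ranked_ids, relevant_ids):
    """Fraction of queries whose top-ranked candidate is relevant."""
    hits = sum(1 for ranked, relevant in zip(ranked_ids, relevant_ids)
               if ranked and ranked[0] in relevant)
    return hits / len(ranked_ids)

# Two toy queries: the first retrieves a relevant doc at rank 1, the second misses.
print(accuracy_at_1([["d1", "d2"], ["d9", "d3"]], [{"d1"}, {"d3"}]))  # 0.5
```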
 <!--
 ## Bias, Risks and Limitations
@@ +185,19 @@

 ## Training Details

 ### Training Hyperparameters
 #### Non-Default Hyperparameters

 - `eval_strategy`: steps
+- `per_device_train_batch_size`: 512
+- `per_device_eval_batch_size`: 512
 - `weight_decay`: 0.001
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
+- `max_steps`: 100000
 - `warmup_ratio`: 0.05
 - `bf16`: True
 - `load_best_model_at_end`: True
 - `ddp_find_unused_parameters`: False
 - `push_to_hub`: True
 - `hub_model_id`: redis/langcache-embed-v3-mini
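With `warmup_ratio: 0.05` and `max_steps: 100000`, the linear scheduler warms up for 5,000 optimizer steps. A small sketch of the implied schedule, assuming the usual Hugging Face `linear` scheduler semantics (linear warmup, then linear decay to zero):

```python
# Values taken from the non-default hyperparameters above.
max_steps = 100_000
warmup_ratio = 0.05
warmup_steps = int(max_steps * warmup_ratio)   # 5000 optimizer steps

def lr_at(step, peak_lr=1.0):
    """Relative learning rate at a given optimizer step."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(warmup_steps)   # 5000
```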
@@ +211,8 @@
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 512
+- `per_device_eval_batch_size`: 512
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1

@@ +224,8 @@
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
 - `max_grad_norm`: 1.0
+- `num_train_epochs`: 3.0
+- `max_steps`: 100000
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 - `warmup_ratio`: 0.05

@@ +257,8 @@
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True

@@ +273,7 @@
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
@@ +330,9 @@
 </details>

 ### Training Logs
+| Epoch | Step | test_cosine_ndcg@10 |
+|:-----:|:----:|:-------------------:|
+| 0     | 0    | 0.7468              |


 ### Framework Versions