abkimc/distilroberta-base-sentence-transformer

@@ -5,50 +5,56 @@ tags:
 - feature-extraction
 - dense
 - generated_from_trainer
-- dataset_size:50881
-- loss:TripletLoss
-- dataset_size:508
-- dataset_size:1017
-base_model: distilbert/distilroberta-base
 widget:
-- source_sentence: What time is good for gym workout? Morning or evening?
   sentences:
-  - What should I eat in the morning if I workout in the afternoon?
-  - What are your views on The Mummy trailer?
-  - Which is the best time to workout, morning or evening?
-- source_sentence: What is the best way to make money make more money?
   sentences:
-  - What's the best way to make fast cash?
-  - How can I make money from CashParking?
-  - Why can’t an airplane just fly into space?
-- source_sentence: What is the best way to learn film making on my own?
   sentences:
-  - How do I learn film making on my own?
-  - Is it healthy to eat bread every day?
-  - What does a filmmaker need to learn?
-- source_sentence: What is love? How can we find that we are in love?
   sentences:
-  - What is the exact meaning of love?
-  - What does love mean to a woman?
-  - How do you raise self confidence?
-- source_sentence: Which is your favorite hangout place in Pune?
   sentences:
-  - What are the best places to hangout in the weekend in Pune?
-  - How will you come to know that you are in love?
-  - What are the best places to hangout in the weekend in Mumbai?
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
-# SentenceTransformer based on distilbert/distilroberta-base
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [distilbert/distilroberta-base](https://huggingface.co/distilbert/distilroberta-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [distilbert/distilroberta-base](https://huggingface.co/distilbert/distilroberta-base) <!-- at revision fb53ab8802853c8e4fbdbcd0529f21fc6f459b2b -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -89,9 +95,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")
 # Run inference
 sentences = [
-    'Which is your favorite hangout place in Pune?',
-    'What are the best places to hangout in the weekend in Pune?',
-    'What are the best places to hangout in the weekend in Mumbai?',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -100,9 +106,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.9998, 0.9997],
-#         [0.9998, 1.0000, 1.0000],
-#         [0.9997, 1.0000, 1.0000]])
 ```
 <!--
@@ -147,33 +153,34 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 1,017 training samples
-* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                        | sentence_1                                                                       | sentence_2                                                                        |
-  |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
-  | type    | string                                                                            | string                                                                           | string                                                                            |
-  | details | <ul><li>min: 6 tokens</li><li>mean: 13.72 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 13.5 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.55 tokens</li><li>max: 62 tokens</li></ul> |
 * Samples:
-  | sentence_0                                                                                                    | sentence_1                                                                 | sentence_2                                                                                           |
-  |:--------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------|
-  | <code>How can I gain weight naturally?</code>                                                                 | <code>What is the best weight gain treatment for gaining weight?</code>    | <code>Which is the best weight gainer in india?</code>                                               |
-  | <code>Who won the September 26, 2016 presidential debate?</code>                                              | <code>Who won the 09/26/16 debate? Does it matter?</code>                  | <code>Who was more effective in the October 3rd 2012 presidential debate? Who won the debate?</code> |
-  | <code>What programming languages are used in video consoles like the PS4 or Xbox One to develop games?</code> | <code>What are the programming languages dev uses on Console games?</code> | <code>What language were NES games originally programmed in?</code>                                  |
-* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
   ```json
   {
-      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
-      "triplet_margin": 5
   }
   ```
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
-- `per_device_train_batch_size`: 128
-- `per_device_eval_batch_size`: 128
-- `num_train_epochs`: 80
 - `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
@@ -183,8 +190,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`: 128
-- `per_device_eval_batch_size`: 128
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -196,7 +203,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`: 80
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -300,9 +307,64 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch | Step | Training Loss |
-|:-----:|:----:|:-------------:|
-| 62.5  | 500  | 4.9905        |
 ### Framework Versions
@@ -331,15 +393,15 @@ You can finetune this model on your own dataset.
 }
 ```
-#### TripletLoss
 ```bibtex
-@misc{hermans2017defense,
-    title={In Defense of the Triplet Loss for Person Re-Identification},
-    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
     year={2017},
-    eprint={1703.07737},
     archivePrefix={arXiv},
-    primaryClass={cs.CV}
 }
 ```

 - feature-extraction
 - dense
 - generated_from_trainer
+- dataset_size:180000
+- loss:MultipleNegativesRankingLoss
+base_model: abkimc/distilroberta-base-sentence-transformer
 widget:
+- source_sentence: Two autopsy reports for heat related deaths that took place in
+    July have been released.
   sentences:
+  - President Obama declares a major disaster in North Carolina
+  - Voters reject the leash law
+  - Two autopsy reports for heat related deaths released
+- source_sentence: Steel sector is expected to grow 6-9% in 2010 on higher demand
+    from the real estate, construction and automobile sectors, the finance ministry
+    said in a report on Thursday.
   sentences:
+  - Steel sector to grow 6-9% in 2010
+  - Bomb teams called in after bank robbery
+  - 2009 was record low in crimes for Wyandotte County
+- source_sentence: A suicide bombing in a Pakistani market close to the Afghan border
+    killed 16 people Friday, officials said, a day after the US released letters seized
+    from Osama bin Laden's compound that criticized Pakistani militants for killing
+    too many civilians.
   sentences:
+  - 'Ed Miliband: voters should pass verdict on ''catastrophic'' handling of economy'
+  - Second woman files sexual harassment lawsuit against Casey Affleck
+  - Suicide bombing in Pakistani market kills 16
+- source_sentence: HARLOW residents are being urged to enter the running to become
+    Essex ambassadors for the London 2012 Olympics.
   sentences:
+  - Activision announces Ferrari Challenge Trofeo Pirelli
+  - Harlow residents urged to become Essex ambassadors at London Olympics
+  - Chicago Cubs suspend Milton Bradley for rest of season
+- source_sentence: The HTC Legend has made its official debut in India days after
+    it was informally launched .
   sentences:
+  - Britain, Bill Gates join forces
+  - '``Large group'''' of men break into Shippensburg apartment'
+  - HTC Legend makes official debut in India
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
+# SentenceTransformer based on abkimc/distilroberta-base-sentence-transformer
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [abkimc/distilroberta-base-sentence-transformer](https://huggingface.co/abkimc/distilroberta-base-sentence-transformer). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [abkimc/distilroberta-base-sentence-transformer](https://huggingface.co/abkimc/distilroberta-base-sentence-transformer) <!-- at revision 78f76adc5086e39f5c1b2f7630eb4ca58975294c -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
 model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")
 # Run inference
 sentences = [
+    'The HTC Legend has made its official debut in India days after it was informally launched .',
+    'HTC Legend makes official debut in India',
+    'Britain, Bill Gates join forces',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
+# tensor([[ 1.0000,  0.9061, -0.0382],
+#         [ 0.9061,  1.0000, -0.0170],
+#         [-0.0382, -0.0170,  1.0000]])
 ```
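The similarity matrix printed above can be read row by row: each row holds one sentence's scores against every sentence in the list. As a minimal sketch, assuming the scores have been copied into plain Python lists (the real `model.similarity` call returns a tensor), a hypothetical helper `best_match` picks the closest other sentence for a given row:

```python
# Scores copied from the example output above; rows and columns follow `sentences`.
similarities = [
    [1.0000, 0.9061, -0.0382],
    [0.9061, 1.0000, -0.0170],
    [-0.0382, -0.0170, 1.0000],
]


def best_match(sim_matrix, query_index):
    """Return the index of the highest-scoring item other than the query itself.

    `best_match` is a hypothetical helper for illustration, not part of the library.
    """
    candidates = [
        (score, idx)
        for idx, score in enumerate(sim_matrix[query_index])
        if idx != query_index  # skip the trivial self-similarity of 1.0
    ]
    return max(candidates)[1]


# Sentence 0 (the full news sentence) is closest to sentence 1 (its headline).
print(best_match(similarities, 0))
```

With these values the news sentence and its headline pair up, while the unrelated headline scores near zero against both.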
 <!--
 #### Unnamed Dataset
+* Size: 180,000 training samples
+* Columns: <code>sentence_0</code> and <code>sentence_1</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                          | sentence_1                                                                        |
+  |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type    | string                                                                              | string                                                                            |
+  | details | <ul><li>min: 12 tokens</li><li>mean: 33.68 tokens</li><li>max: 293 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.98 tokens</li><li>max: 28 tokens</li></ul> |
 * Samples:
+  | sentence_0                                                                                                                                                                                                                          | sentence_1                                                             |
+  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------|
+  | <code>Content is the king in today's world of journalism and a newspaper cannot survive if it compromises on the quality of the content, said Abhilash Khandekar, Maharashtra state head of Dainik Bhaskar Group on Tuesday.</code> | <code>'Content is king in today's journalism'</code>                   |
+  | <code>Sammons Pensions has launched its ninth annual salary survey which aims to document remuneration packages across the industry.</code>                                                                                         | <code>Sammons launches ninth salary survey</code>                      |
+  | <code>The state of Tennessee saw a major spike in foreclosure filings in 2008, according to a report by the Tennessee Housing Development Agency.</code>                                                                            | <code>Tennessee sees major spike in foreclosure filings in 2008</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim",
+      "gather_across_devices": false
   }
   ```
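To make the parameters above concrete, here is a minimal pure-Python sketch of the idea behind this loss, under stated assumptions: each `sentence_0` is scored against every `sentence_1` in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching pair as the correct class. The function names and the toy 2-d vectors are made up for illustration; this is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchors[i] should rank positives[i] first.

    Every other positive in the batch acts as a negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of scaled scores.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical values, chosen only to show the behaviour).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each positive is near its own anchor
swapped = [[0.1, 1.0], [1.0, 0.1]]   # each positive is near the wrong anchor

print(mnr_loss(anchors, aligned))
print(mnr_loss(anchors, swapped))
```

The aligned batch yields a loss near zero and the swapped batch a large one, which is the ranking behaviour the loss is meant to reward. Because the negatives come from the rest of the batch, the batch size influences how hard the task is.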
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
+- `num_train_epochs`: 10
 - `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
+- `num_train_epochs`: 10
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 </details>
 ### Training Logs
+| Epoch  | Step  | Training Loss |
+|:------:|:-----:|:-------------:|
+| 0.1777 | 500   | 2.8662        |
+| 0.3555 | 1000  | 0.0631        |
+| 0.5332 | 1500  | 0.0149        |
+| 0.7110 | 2000  | 0.0097        |
+| 0.8887 | 2500  | 0.0079        |
+| 1.0665 | 3000  | 0.0062        |
+| 1.2442 | 3500  | 0.0041        |
+| 1.4220 | 4000  | 0.0037        |
+| 1.5997 | 4500  | 0.0038        |
+| 1.7775 | 5000  | 0.0034        |
+| 1.9552 | 5500  | 0.0038        |
+| 2.1330 | 6000  | 0.0021        |
+| 2.3107 | 6500  | 0.0015        |
+| 2.4884 | 7000  | 0.0016        |
+| 2.6662 | 7500  | 0.0015        |
+| 2.8439 | 8000  | 0.0018        |
+| 3.0217 | 8500  | 0.0015        |
+| 3.1994 | 9000  | 0.0013        |
+| 3.3772 | 9500  | 0.001         |
+| 3.5549 | 10000 | 0.0011        |
+| 3.7327 | 10500 | 0.0011        |
+| 3.9104 | 11000 | 0.0014        |
+| 4.0882 | 11500 | 0.0011        |
+| 4.2659 | 12000 | 0.0007        |
+| 4.4437 | 12500 | 0.0009        |
+| 4.6214 | 13000 | 0.0009        |
+| 4.7991 | 13500 | 0.0008        |
+| 4.9769 | 14000 | 0.0008        |
+| 5.1546 | 14500 | 0.0009        |
+| 5.3324 | 15000 | 0.0007        |
+| 5.5101 | 15500 | 0.0007        |
+| 5.6879 | 16000 | 0.0007        |
+| 5.8656 | 16500 | 0.0006        |
+| 6.0434 | 17000 | 0.0007        |
+| 6.2211 | 17500 | 0.0007        |
+| 6.3989 | 18000 | 0.0005        |
+| 6.5766 | 18500 | 0.0007        |
+| 6.7544 | 19000 | 0.0005        |
+| 6.9321 | 19500 | 0.0005        |
+| 7.1098 | 20000 | 0.0005        |
+| 7.2876 | 20500 | 0.0006        |
+| 7.4653 | 21000 | 0.0005        |
+| 7.6431 | 21500 | 0.0004        |
+| 7.8208 | 22000 | 0.0004        |
+| 7.9986 | 22500 | 0.0004        |
+| 8.1763 | 23000 | 0.0004        |
+| 8.3541 | 23500 | 0.0004        |
+| 8.5318 | 24000 | 0.0005        |
+| 8.7096 | 24500 | 0.0004        |
+| 8.8873 | 25000 | 0.0004        |
+| 9.0651 | 25500 | 0.0005        |
+| 9.2428 | 26000 | 0.0004        |
+| 9.4205 | 26500 | 0.0005        |
+| 9.5983 | 27000 | 0.0004        |
+| 9.7760 | 27500 | 0.0004        |
+| 9.9538 | 28000 | 0.0004        |
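The fractional epoch values in the log follow from the dataset size and batch size listed earlier. A short sketch of the arithmetic, assuming a single device, `gradient_accumulation_steps` of 1, and a final partial batch that is kept rather than dropped (the helper names are hypothetical):

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Optimizer steps in one pass over the data, keeping the last partial batch."""
    return math.ceil(num_samples / batch_size)


def epoch_at_step(step, num_samples, batch_size):
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch(num_samples, batch_size)


# New run: 180,000 samples, batch size 64.
per_epoch = steps_per_epoch(180_000, 64)
print(per_epoch)                                      # 2813
print(round(epoch_at_step(500, 180_000, 64), 4))      # 0.1777
print(round(epoch_at_step(28_000, 180_000, 64), 4))   # 9.9538
```

Under the same assumptions, the removed log row (epoch 62.5 at step 500) matches 1,017 samples at batch size 128, which gives 8 steps per epoch.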
 ### Framework Versions
 }
 ```
+#### MultipleNegativesRankingLoss
 ```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
     year={2017},
+    eprint={1705.00652},
     archivePrefix={arXiv},
+    primaryClass={cs.CL}
 }
 ```

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bd663686f32e94076283726160b0ed24911c8fbaf3363380382203f2391728e7
 size 328485128

 version https://git-lfs.github.com/spec/v1
+oid sha256:13f4cb960b323182629b52c170ac1141db264209880af82405716db52241a638
 size 328485128

special_tokens_map.json CHANGED

@@ -1,7 +1,25 @@
 {
-  "bos_token": "<s>",
-  "cls_token": "<s>",
-  "eos_token": "</s>",
   "mask_token": {
     "content": "<mask>",
     "lstrip": true,
@@ -9,7 +27,25 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "<pad>",
-  "sep_token": "</s>",
-  "unk_token": "<unk>"
 }

 {
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
   "mask_token": {
     "content": "<mask>",
     "lstrip": true,
     "rstrip": false,
     "single_word": false
   },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
 }
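The change to this file swaps bare strings for objects with a `content` field, while the token strings themselves stay the same. As a hedged sketch, a hypothetical helper `token_content` reads either form, which makes it easy to confirm that two versions of the map name the same tokens (only two of the keys are reproduced here for brevity):

```python
import json


def token_content(entry):
    """Return the token string from a bare string or a dict with a "content" key.

    `token_content` is a hypothetical helper for illustration.
    """
    if isinstance(entry, str):
        return entry
    return entry["content"]


# Abbreviated copies of the old and new forms shown in the diff above.
old_map = json.loads('{"bos_token": "<s>", "pad_token": "<pad>"}')
new_map = json.loads(
    """
    {
      "bos_token": {"content": "<s>", "lstrip": false, "normalized": true,
                    "rstrip": false, "single_word": false},
      "pad_token": {"content": "<pad>", "lstrip": false, "normalized": true,
                    "rstrip": false, "single_word": false}
    }
    """
)

old_tokens = {name: token_content(value) for name, value in old_map.items()}
new_tokens = {name: token_content(value) for name, value in new_map.items()}
print(old_tokens == new_tokens)
```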

tokenizer_config.json CHANGED

@@ -49,10 +49,17 @@
   "errors": "replace",
   "extra_special_tokens": {},
   "mask_token": "<mask>",
   "model_max_length": 512,
   "pad_token": "<pad>",
   "sep_token": "</s>",
   "tokenizer_class": "RobertaTokenizer",
   "trim_offsets": true,
   "unk_token": "<unk>"
 }

   "errors": "replace",
   "extra_special_tokens": {},
   "mask_token": "<mask>",
+  "max_length": 512,
   "model_max_length": 512,
+  "pad_to_multiple_of": null,
   "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
   "sep_token": "</s>",
+  "stride": 0,
   "tokenizer_class": "RobertaTokenizer",
   "trim_offsets": true,
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
   "unk_token": "<unk>"
 }