deepa2810 committed
Commit 38c25ba · verified · 1 Parent(s): c31f835

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +385 -145
  2. model.safetensors +1 -1
  3. tokenizer.json +2 -4
README.md CHANGED
@@ -1,173 +1,413 @@
1
  ---
2
- language: en
3
- license: apache-2.0
4
- library_name: sentence-transformers
5
  tags:
6
  - sentence-transformers
7
- - feature-extraction
8
  - sentence-similarity
9
- - transformers
10
- datasets:
11
- - s2orc
12
- - flax-sentence-embeddings/stackexchange_xml
13
- - ms_marco
14
- - gooaq
15
- - yahoo_answers_topics
16
- - code_search_net
17
- - search_qa
18
- - eli5
19
- - snli
20
- - multi_nli
21
- - wikihow
22
- - natural_questions
23
- - trivia_qa
24
- - embedding-data/sentence-compression
25
- - embedding-data/flickr30k-captions
26
- - embedding-data/altlex
27
- - embedding-data/simple-wiki
28
- - embedding-data/QQP
29
- - embedding-data/SPECTER
30
- - embedding-data/PAQ_pairs
31
- - embedding-data/WikiAnswers
32
  pipeline_tag: sentence-similarity
33
  ---
34
 
 
 
 
35
 
36
- # all-MiniLM-L6-v2
37
- This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
38
 
39
- ## Usage (Sentence-Transformers)
40
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
41
 
42
  ```
43
  pip install -U sentence-transformers
44
  ```
45
 
46
- Then you can use the model like this:
47
  ```python
48
  from sentence_transformers import SentenceTransformer
49
- sentences = ["This is an example sentence", "Each sentence is converted"]
50
 
51
- model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
52
  embeddings = model.encode(sentences)
53
- print(embeddings)
54
  ```
55
 
56
- ## Usage (HuggingFace Transformers)
57
- Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you apply the right pooling operation on top of the contextualized word embeddings.
58
-
59
- ```python
60
- from transformers import AutoTokenizer, AutoModel
61
- import torch
62
- import torch.nn.functional as F
63
 
64
- #Mean Pooling - Take attention mask into account for correct averaging
65
- def mean_pooling(model_output, attention_mask):
66
-     token_embeddings = model_output[0] #First element of model_output contains all token embeddings
67
-     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
68
-     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
69
70
 
71
- # Sentences we want sentence embeddings for
72
- sentences = ['This is an example sentence', 'Each sentence is converted']
73
-
74
- # Load model from HuggingFace Hub
75
- tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
76
- model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
77
-
78
- # Tokenize sentences
79
- encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
80
 
81
- # Compute token embeddings
82
- with torch.no_grad():
83
- model_output = model(**encoded_input)
84
 
85
- # Perform pooling
86
- sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
87
 
88
- # Normalize embeddings
89
- sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
90
 
91
- print("Sentence embeddings:")
92
- print(sentence_embeddings)
93
- ```
94
 
95
- ------
96
-
97
- ## Background
98
-
99
- The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised
100
- contrastive learning objective. We used the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model and fine-tuned it on a
101
- 1B sentence-pair dataset. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset.
102
-
103
- We developed this model during the
104
- [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
105
- organized by Hugging Face. We developed this model as part of the project:
106
- [Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as help from Google's Flax, JAX, and Cloud team members on efficient deep learning frameworks.
107
-
108
- ## Intended uses
109
-
110
- Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures
111
- the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks.
112
-
113
- By default, input text longer than 256 word pieces is truncated.
114
-
115
-
116
- ## Training procedure
117
-
118
- ### Pre-training
119
-
120
- We use the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model. Please refer to the model card for more detailed information about the pre-training procedure.
121
-
122
- ### Fine-tuning
123
-
124
- We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch.
125
- We then apply the cross-entropy loss, comparing against the true pairs.
126
-
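In other words, each sentence's true partner is scored against every other sentence in the batch. A minimal sketch of that objective in plain PyTorch; the batch size, embedding dimension, and similarity scale are illustrative choices, not values taken from `train_script.py`:

```python
import torch
import torch.nn.functional as F

# Illustrative in-batch-negatives objective: cosine similarities + cross-entropy.
# emb_a / emb_b stand in for the encoded sentence pairs (random placeholders here).
emb_a = F.normalize(torch.randn(8, 384), dim=1)
emb_b = F.normalize(torch.randn(8, 384), dim=1)

scores = emb_a @ emb_b.T * 20.0          # cosine similarity matrix, scaled by a temperature
labels = torch.arange(scores.size(0))    # the true pair for row i sits at column i
loss = F.cross_entropy(scores, labels)
```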
127
- #### Hyperparameters
128
-
129
- We trained our model on a TPU v3-8 for 100k steps using a batch size of 1024 (128 per TPU core).
130
- We used a learning-rate warm-up of 500 steps. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
131
- a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.
132
-
133
- #### Training data
134
-
135
- We used a concatenation of multiple datasets to fine-tune our model. The total number of training pairs is above 1 billion.
136
- We sampled each dataset with a weighted probability; the configuration is detailed in the `data_config.json` file.
137
-
138
-
139
- | Dataset | Paper | Number of training tuples |
140
- |--------------------------------------------------------|:----------------------------------------:|:--------------------------:|
141
- | [Reddit comments (2015-2018)](https://github.com/PolyAI-LDN/conversational-datasets/tree/master/reddit) | [paper](https://arxiv.org/abs/1904.06472) | 726,484,430 |
142
- | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Abstracts) | [paper](https://aclanthology.org/2020.acl-main.447/) | 116,288,806 |
143
- | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs | [paper](https://doi.org/10.1145/2623330.2623677) | 77,427,422 |
144
- | [PAQ](https://github.com/facebookresearch/PAQ) (Question, Answer) pairs | [paper](https://arxiv.org/abs/2102.07033) | 64,371,441 |
145
- | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Titles) | [paper](https://aclanthology.org/2020.acl-main.447/) | 52,603,982 |
146
- | [S2ORC](https://github.com/allenai/s2orc) (Title, Abstract) | [paper](https://aclanthology.org/2020.acl-main.447/) | 41,769,185 |
147
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Body) pairs | - | 25,316,456 |
148
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title+Body, Answer) pairs | - | 21,396,559 |
149
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Answer) pairs | - | 21,396,559 |
150
- | [MS MARCO](https://microsoft.github.io/msmarco/) triplets | [paper](https://doi.org/10.1145/3404835.3462804) | 9,144,553 |
151
- | [GOOAQ: Open Question Answering with Diverse Answer Types](https://github.com/allenai/gooaq) | [paper](https://arxiv.org/pdf/2104.08727.pdf) | 3,012,496 |
152
- | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 1,198,260 |
153
- | [Code Search](https://huggingface.co/datasets/code_search_net) | - | 1,151,414 |
154
- | [COCO](https://cocodataset.org/#home) Image captions | [paper](https://link.springer.com/chapter/10.1007%2F978-3-319-10602-1_48) | 828,395|
155
- | [SPECTER](https://github.com/allenai/specter) citation triplets | [paper](https://doi.org/10.18653/v1/2020.acl-main.207) | 684,100 |
156
- | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Question, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 681,164 |
157
- | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Question) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 659,896 |
158
- | [SearchQA](https://huggingface.co/datasets/search_qa) | [paper](https://arxiv.org/abs/1704.05179) | 582,261 |
159
- | [Eli5](https://huggingface.co/datasets/eli5) | [paper](https://doi.org/10.18653/v1/p19-1346) | 325,475 |
160
- | [Flickr 30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/229/33) | 317,695 |
161
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles) | | 304,525 |
162
- | AllNLI ([SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/)) | [paper SNLI](https://doi.org/10.18653/v1/d15-1075), [paper MultiNLI](https://doi.org/10.18653/v1/n18-1101) | 277,230 |
163
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (bodies) | | 250,519 |
164
- | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles+bodies) | | 250,460 |
165
- | [Sentence Compression](https://github.com/google-research-datasets/sentence-compression) | [paper](https://www.aclweb.org/anthology/D13-1155/) | 180,000 |
166
- | [Wikihow](https://github.com/pvl/wikihow_pairs_dataset) | [paper](https://arxiv.org/abs/1810.09305) | 128,542 |
167
- | [Altlex](https://github.com/chridey/altlex/) | [paper](https://aclanthology.org/P16-1135.pdf) | 112,696 |
168
- | [Quora Question Triplets](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs) | - | 103,663 |
169
- | [Simple Wikipedia](https://cs.pomona.edu/~dkauchak/simplification/) | [paper](https://www.aclweb.org/anthology/P11-2117/) | 102,225 |
170
- | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/1455) | 100,231 |
171
- | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) | [paper](https://aclanthology.org/P18-2124.pdf) | 87,599 |
172
- | [TriviaQA](https://huggingface.co/datasets/trivia_qa) | - | 73,346 |
173
- | **Total** | | **1,170,060,424** |
 
1
  ---
2
  tags:
3
  - sentence-transformers
 
4
  - sentence-similarity
5
+ - feature-extraction
6
+ - dense
7
+ - generated_from_trainer
8
+ - dataset_size:10000
9
+ - loss:CosineSimilarityLoss
10
+ base_model: sentence-transformers/all-MiniLM-L6-v2
11
+ widget:
12
+ - source_sentence: Love, Rosie (2014)
13
+ sentences:
14
+ - All Roads Lead to Rome (2016)
15
+ - 'Under Siege 2: Dark Territory (1995)'
16
+ - Side Effects (2013)
17
+ - source_sentence: Fletch Lives (1989)
18
+ sentences:
19
+ - Catch That Kid (2004)
20
+ - Island, The (2005)
21
+ - Game Over, Man! (2018)
22
+ - source_sentence: Snow White and the Seven Dwarfs (1937)
23
+ sentences:
24
+ - Oceans (Océans) (2009)
25
+ - Last of the Dogmen (1995)
26
+ - 'Star Trek IV: The Voyage Home (1986)'
27
+ - source_sentence: He Got Game (1998)
28
+ sentences:
29
+ - Camille Claudel (1988)
30
+ - Minus Man, The (1999)
31
+ - Helvetica (2007)
32
+ - source_sentence: Leap Year (2010)
33
+ sentences:
34
+ - Gotcha! (1985)
35
+ - Independence Day (a.k.a. ID4) (1996)
36
+ - Valley Girl (1983)
37
  pipeline_tag: sentence-similarity
38
+ library_name: sentence-transformers
39
+ metrics:
40
+ - cosine_accuracy
41
+ - cosine_accuracy_threshold
42
+ - cosine_f1
43
+ - cosine_f1_threshold
44
+ - cosine_precision
45
+ - cosine_recall
46
+ - cosine_ap
47
+ - cosine_mcc
48
+ model-index:
49
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
50
+ results:
51
+ - task:
52
+ type: binary-classification
53
+ name: Binary Classification
54
+ dataset:
55
+ name: eval
56
+ type: eval
57
+ metrics:
58
+ - type: cosine_accuracy
59
+ value: 0.72
60
+ name: Cosine Accuracy
61
+ - type: cosine_accuracy_threshold
62
+ value: 0.5096579790115356
63
+ name: Cosine Accuracy Threshold
64
+ - type: cosine_f1
65
+ value: 0.7534246575342466
66
+ name: Cosine F1
67
+ - type: cosine_f1_threshold
68
+ value: 0.4517183005809784
69
+ name: Cosine F1 Threshold
70
+ - type: cosine_precision
71
+ value: 0.6586826347305389
72
+ name: Cosine Precision
73
+ - type: cosine_recall
74
+ value: 0.88
75
+ name: Cosine Recall
76
+ - type: cosine_ap
77
+ value: 0.7877022662312201
78
+ name: Cosine Ap
79
+ - type: cosine_mcc
80
+ value: 0.45017211275427993
81
+ name: Cosine Mcc
82
  ---
83
 
84
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
85
+
86
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
87
 
88
+ ## Model Details
 
89
 
90
+ ### Model Description
91
+ - **Model Type:** Sentence Transformer
92
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
93
+ - **Maximum Sequence Length:** 256 tokens
94
+ - **Output Dimensionality:** 384 dimensions
95
+ - **Similarity Function:** Cosine Similarity
96
+ <!-- - **Training Dataset:** Unknown -->
97
+ <!-- - **Language:** Unknown -->
98
+ <!-- - **License:** Unknown -->
99
+
100
+ ### Model Sources
101
+
102
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
103
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
104
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
105
+
106
+ ### Full Model Architecture
107
 
108
  ```
109
+ SentenceTransformer(
110
+ (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
111
+ (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
112
+ (2): Normalize()
113
+ )
114
+ ```
115
+
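The three modules above amount to running the BERT encoder, mean-pooling its token embeddings under the attention mask, and L2-normalizing the result. A minimal sketch of the equivalent pipeline in plain `transformers`, using the base checkpoint named in this card (the fine-tuned weights follow the same recipe):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

batch = tokenizer(["Leap Year (2010)", "Gotcha! (1985)"],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state                    # (0) Transformer
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)      # (1) mean Pooling
embeddings = F.normalize(pooled, p=2, dim=1)                                 # (2) Normalize
```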
116
+ ## Usage
117
+
118
+ ### Direct Usage (Sentence Transformers)
119
+
120
+ First install the Sentence Transformers library:
121
+
122
+ ```bash
123
  pip install -U sentence-transformers
124
  ```
125
 
126
+ Then you can load this model and run inference.
127
  ```python
128
  from sentence_transformers import SentenceTransformer
 
129
 
130
+ # Download from the 🤗 Hub
131
+ model = SentenceTransformer("sentence_transformers_model_id")
132
+ # Run inference
133
+ sentences = [
134
+ 'Leap Year (2010)',
135
+ 'Gotcha! (1985)',
136
+ 'Valley Girl (1983)',
137
+ ]
138
  embeddings = model.encode(sentences)
139
+ print(embeddings.shape)
140
+ # [3, 384]
141
+
142
+ # Get the similarity scores for the embeddings
143
+ similarities = model.similarity(embeddings, embeddings)
144
+ print(similarities)
145
+ # tensor([[1.0000, 0.7005, 0.6328],
146
+ # [0.7005, 1.0000, 0.6042],
147
+ # [0.6328, 0.6042, 1.0000]])
148
  ```
149
 
150
+ <!--
151
+ ### Direct Usage (Transformers)
152
 
153
+ <details><summary>Click to see the direct usage in Transformers</summary>
154
 
155
+ </details>
156
+ -->
157
+
158
+ <!--
159
+ ### Downstream Usage (Sentence Transformers)
160
+
161
+ You can finetune this model on your own dataset.
162
+
163
+ <details><summary>Click to expand</summary>
164
+
165
+ </details>
166
+ -->
167
+
168
+ <!--
169
+ ### Out-of-Scope Use
170
+
171
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
172
+ -->
173
+
174
+ ## Evaluation
175
+
176
+ ### Metrics
177
+
178
+ #### Binary Classification
179
+
180
+ * Dataset: `eval`
181
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
182
+
183
+ | Metric | Value |
184
+ |:--------------------------|:-----------|
185
+ | cosine_accuracy | 0.72 |
186
+ | cosine_accuracy_threshold | 0.5097 |
187
+ | cosine_f1 | 0.7534 |
188
+ | cosine_f1_threshold | 0.4517 |
189
+ | cosine_precision | 0.6587 |
190
+ | cosine_recall | 0.88 |
191
+ | **cosine_ap** | **0.7877** |
192
+ | cosine_mcc | 0.4502 |
193
+
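A minimal sketch of reproducing such numbers with the evaluator linked above; the sentence pairs and labels below are placeholders rather than the actual `eval` split, and the model id is the same placeholder used in the usage example:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

evaluator = BinaryClassificationEvaluator(
    sentences1=["Leap Year (2010)", "Fletch Lives (1989)"],
    sentences2=["Gotcha! (1985)", "Game Over, Man! (2018)"],
    labels=[1, 0],   # 1 = similar pair, 0 = dissimilar pair
    name="eval",
)
results = evaluator(model)
print(results)  # cosine accuracy, F1, precision, recall, AP, and MCC as tabulated above
```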
194
+ <!--
195
+ ## Bias, Risks and Limitations
196
+
197
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
198
+ -->
199
+
200
+ <!--
201
+ ### Recommendations
202
+
203
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
204
+ -->
205
+
206
+ ## Training Details
207
+
208
+ ### Training Dataset
209
+
210
+ #### Unnamed Dataset
211
+
212
+ * Size: 10,000 training samples
213
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
214
+ * Approximate statistics based on the first 1000 samples:
215
+ | | sentence_0 | sentence_1 | label |
216
+ |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------|
217
+ | type | string | string | float |
218
+ | details | <ul><li>min: 5 tokens</li><li>mean: 9.72 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 9.94 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.49</li><li>max: 1.0</li></ul> |
219
+ * Samples:
220
+ | sentence_0 | sentence_1 | label |
221
+ |:------------------------------------------------|:---------------------------------------|:-----------------|
222
+ | <code>Love and Other Catastrophes (1996)</code> | <code>Royal Flash (1975)</code> | <code>1.0</code> |
223
+ | <code>Matrix, The (1999)</code> | <code>Warrior's Way, The (2010)</code> | <code>1.0</code> |
224
+ | <code>Spy (2015)</code> | <code>Cassandra's Dream (2007)</code> | <code>1.0</code> |
225
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
226
+ ```json
227
+ {
228
+ "loss_fct": "torch.nn.modules.loss.MSELoss"
229
+ }
230
+ ```
231
+
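Given the columns and loss above, the fine-tuning setup looks roughly as follows. This is a minimal sketch assuming the Sentence Transformers v3+ trainer API, with the three sample rows reused as a toy dataset and only the non-default hyperparameters listed further below:

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy stand-in for the 10,000-sample dataset: (sentence_0, sentence_1, label) rows.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Love and Other Catastrophes (1996)", "Matrix, The (1999)", "Spy (2015)"],
    "sentence_1": ["Royal Flash (1975)", "Warrior's Way, The (2010)", "Cassandra's Dream (2007)"],
    "label": [1.0, 1.0, 1.0],
})

# CosineSimilarityLoss regresses cosine(sentence_0, sentence_1) toward label via MSE.
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)
trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()
```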
232
+ ### Training Hyperparameters
233
+ #### Non-Default Hyperparameters
234
+
235
+ - `per_device_train_batch_size`: 16
236
+ - `per_device_eval_batch_size`: 16
237
+ - `multi_dataset_batch_sampler`: round_robin
238
+
239
+ #### All Hyperparameters
240
+ <details><summary>Click to expand</summary>
241
+
242
+ - `overwrite_output_dir`: False
243
+ - `do_predict`: False
244
+ - `eval_strategy`: no
245
+ - `prediction_loss_only`: True
246
+ - `per_device_train_batch_size`: 16
247
+ - `per_device_eval_batch_size`: 16
248
+ - `per_gpu_train_batch_size`: None
249
+ - `per_gpu_eval_batch_size`: None
250
+ - `gradient_accumulation_steps`: 1
251
+ - `eval_accumulation_steps`: None
252
+ - `torch_empty_cache_steps`: None
253
+ - `learning_rate`: 5e-05
254
+ - `weight_decay`: 0.0
255
+ - `adam_beta1`: 0.9
256
+ - `adam_beta2`: 0.999
257
+ - `adam_epsilon`: 1e-08
258
+ - `max_grad_norm`: 1
259
+ - `num_train_epochs`: 3
260
+ - `max_steps`: -1
261
+ - `lr_scheduler_type`: linear
262
+ - `lr_scheduler_kwargs`: {}
263
+ - `warmup_ratio`: 0.0
264
+ - `warmup_steps`: 0
265
+ - `log_level`: passive
266
+ - `log_level_replica`: warning
267
+ - `log_on_each_node`: True
268
+ - `logging_nan_inf_filter`: True
269
+ - `save_safetensors`: True
270
+ - `save_on_each_node`: False
271
+ - `save_only_model`: False
272
+ - `restore_callback_states_from_checkpoint`: False
273
+ - `no_cuda`: False
274
+ - `use_cpu`: False
275
+ - `use_mps_device`: False
276
+ - `seed`: 42
277
+ - `data_seed`: None
278
+ - `jit_mode_eval`: False
279
+ - `use_ipex`: False
280
+ - `bf16`: False
281
+ - `fp16`: False
282
+ - `fp16_opt_level`: O1
283
+ - `half_precision_backend`: auto
284
+ - `bf16_full_eval`: False
285
+ - `fp16_full_eval`: False
286
+ - `tf32`: None
287
+ - `local_rank`: 0
288
+ - `ddp_backend`: None
289
+ - `tpu_num_cores`: None
290
+ - `tpu_metrics_debug`: False
291
+ - `debug`: []
292
+ - `dataloader_drop_last`: False
293
+ - `dataloader_num_workers`: 0
294
+ - `dataloader_prefetch_factor`: None
295
+ - `past_index`: -1
296
+ - `disable_tqdm`: False
297
+ - `remove_unused_columns`: True
298
+ - `label_names`: None
299
+ - `load_best_model_at_end`: False
300
+ - `ignore_data_skip`: False
301
+ - `fsdp`: []
302
+ - `fsdp_min_num_params`: 0
303
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
304
+ - `fsdp_transformer_layer_cls_to_wrap`: None
305
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
306
+ - `deepspeed`: None
307
+ - `label_smoothing_factor`: 0.0
308
+ - `optim`: adamw_torch
309
+ - `optim_args`: None
310
+ - `adafactor`: False
311
+ - `group_by_length`: False
312
+ - `length_column_name`: length
313
+ - `ddp_find_unused_parameters`: None
314
+ - `ddp_bucket_cap_mb`: None
315
+ - `ddp_broadcast_buffers`: False
316
+ - `dataloader_pin_memory`: True
317
+ - `dataloader_persistent_workers`: False
318
+ - `skip_memory_metrics`: True
319
+ - `use_legacy_prediction_loop`: False
320
+ - `push_to_hub`: False
321
+ - `resume_from_checkpoint`: None
322
+ - `hub_model_id`: None
323
+ - `hub_strategy`: every_save
324
+ - `hub_private_repo`: None
325
+ - `hub_always_push`: False
326
+ - `hub_revision`: None
327
+ - `gradient_checkpointing`: False
328
+ - `gradient_checkpointing_kwargs`: None
329
+ - `include_inputs_for_metrics`: False
330
+ - `include_for_metrics`: []
331
+ - `eval_do_concat_batches`: True
332
+ - `fp16_backend`: auto
333
+ - `push_to_hub_model_id`: None
334
+ - `push_to_hub_organization`: None
335
+ - `mp_parameters`:
336
+ - `auto_find_batch_size`: False
337
+ - `full_determinism`: False
338
+ - `torchdynamo`: None
339
+ - `ray_scope`: last
340
+ - `ddp_timeout`: 1800
341
+ - `torch_compile`: False
342
+ - `torch_compile_backend`: None
343
+ - `torch_compile_mode`: None
344
+ - `include_tokens_per_second`: False
345
+ - `include_num_input_tokens_seen`: False
346
+ - `neftune_noise_alpha`: None
347
+ - `optim_target_modules`: None
348
+ - `batch_eval_metrics`: False
349
+ - `eval_on_start`: False
350
+ - `use_liger_kernel`: False
351
+ - `liger_kernel_config`: None
352
+ - `eval_use_gather_object`: False
353
+ - `average_tokens_across_devices`: False
354
+ - `prompts`: None
355
+ - `batch_sampler`: batch_sampler
356
+ - `multi_dataset_batch_sampler`: round_robin
357
+ - `router_mapping`: {}
358
+ - `learning_rate_mapping`: {}
359
+
360
+ </details>
361
+
362
+ ### Training Logs
363
+ | Epoch | Step | Training Loss | eval_cosine_ap |
364
+ |:-----:|:----:|:-------------:|:--------------:|
365
+ | 0.8 | 500 | 0.2615 | - |
366
+ | 1.6 | 1000 | 0.2422 | - |
367
+ | 2.4 | 1500 | 0.231 | - |
368
+ | -1 | -1 | - | 0.7877 |
369
+
370
+
371
+ ### Framework Versions
372
+ - Python: 3.11.13
373
+ - Sentence Transformers: 5.0.0
374
+ - Transformers: 4.53.3
375
+ - PyTorch: 2.6.0+cu124
376
+ - Accelerate: 1.9.0
377
+ - Datasets: 2.14.4
378
+ - Tokenizers: 0.21.2
379
+
380
+ ## Citation
381
+
382
+ ### BibTeX
383
+
384
+ #### Sentence Transformers
385
+ ```bibtex
386
+ @inproceedings{reimers-2019-sentence-bert,
387
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
388
+ author = "Reimers, Nils and Gurevych, Iryna",
389
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
390
+ month = "11",
391
+ year = "2019",
392
+ publisher = "Association for Computational Linguistics",
393
+ url = "https://arxiv.org/abs/1908.10084",
394
+ }
395
+ ```
396
 
397
+ <!--
398
+ ## Glossary
399
 
400
+ *Clearly define terms in order to be accessible across audiences.*
401
+ -->
 
402
 
403
+ <!--
404
+ ## Model Card Authors
405
 
406
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
407
+ -->
408
 
409
+ <!--
410
+ ## Model Card Contact
 
411
 
412
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
413
+ -->
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1377e9af0ca0b016a9f2aa584d6fc71ab3ea6804fae21ef9fb1416e2944057ac
3
  size 90864192
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e2fafa75f6fe25b1bdad731ed0a17fa19cfa83d035073e56ac37ef8515ca5c9
3
  size 90864192
tokenizer.json CHANGED
@@ -2,14 +2,12 @@
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
- "max_length": 128,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
- "strategy": {
11
- "Fixed": 128
12
- },
13
  "direction": "Right",
14
  "pad_to_multiple_of": null,
15
  "pad_id": 0,
 
2
  "version": "1.0",
3
  "truncation": {
4
  "direction": "Right",
5
+ "max_length": 256,
6
  "strategy": "LongestFirst",
7
  "stride": 0
8
  },
9
  "padding": {
10
+ "strategy": "BatchLongest",
 
 
11
  "direction": "Right",
12
  "pad_to_multiple_of": null,
13
  "pad_id": 0,