kselight committed
Commit bd35c30 · verified
1 Parent(s): 267023d

Add new CrossEncoder model

Files changed (7)
  1. README.md +470 -0
  2. config.json +35 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +56 -0
  7. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,470 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:10000
- loss:MSELoss
datasets:
- sentence-transformers/msmarco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.0579
      name: Map
    - type: mrr@10
      value: 0.0329
      name: Mrr@10
    - type: ndcg@10
      value: 0.0479
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.2867
      name: Map
    - type: mrr@10
      value: 0.4222
      name: Mrr@10
    - type: ndcg@10
      value: 0.2546
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.0326
      name: Map
    - type: mrr@10
      value: 0.01
      name: Mrr@10
    - type: ndcg@10
      value: 0.0229
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.1257
      name: Map
    - type: mrr@10
      value: 0.155
      name: Mrr@10
    - type: ndcg@10
      value: 0.1084
      name: Ndcg@10
---

# CrossEncoder

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model trained on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("kselight/123BERT")
# Get scores for pairs of texts
pairs = [
    ['what is ivana trump', 'The need for an independent investigation. As it stands, all three men in charge of the investigations into the Trump campaign are Republicans, and two of the three are vociferous Trump allies. Burr, the third, also tied himself to Trump during his close 2016 reelection campaign.'],
    ["hogan's goat meaning", 'hogan’s goat. The phrase like Hogan’s goat refers to something that is faulty, messed up, or stinks like a goat. The phrase is a reference to R.F. Outcault’s seminal newspaper comic Hogan’s Alley, which debuted in 1895. The title of the strip changed to The Yellow Kid the following year.'],
    ['who made tokyo ghoul', "Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese manga series by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump between September 2011 and September 2014 and has been collected in fourteen tankōbon volumes as of August 2014."],
    ['neck of the scottie dog', 'Classical guitars. The classical guitar neck blank is relatively small compared to what is needed for construction. This is because a classical neck is constructed differently than most other neck designs. The heel of the neck is built up by stacking blocks of wood to achieve the necessary height.'],
    ['what does bicameral mean in government', 'Top 10 amazing movie makeup transformations. In government, bicameralism is the practice of having two legislative or parliamentary chambers. The relationship between the two chambers of a bicameral legislature can vary. In some cases, they have equal power, and in others, one chamber is clearly superior to the other. It is commonplace in most federal systems to have a bicameral legislature.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is ivana trump',
    [
        'The need for an independent investigation. As it stands, all three men in charge of the investigations into the Trump campaign are Republicans, and two of the three are vociferous Trump allies. Burr, the third, also tied himself to Trump during his close 2016 reelection campaign.',
        'hogan’s goat. The phrase like Hogan’s goat refers to something that is faulty, messed up, or stinks like a goat. The phrase is a reference to R.F. Outcault’s seminal newspaper comic Hogan’s Alley, which debuted in 1895. The title of the strip changed to The Yellow Kid the following year.',
        "Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese manga series by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump between September 2011 and September 2014 and has been collected in fourteen tankōbon volumes as of August 2014.",
        'Classical guitars. The classical guitar neck blank is relatively small compared to what is needed for construction. This is because a classical neck is constructed differently than most other neck designs. The heel of the neck is built up by stacking blocks of wood to achieve the necessary height.',
        'Top 10 amazing movie makeup transformations. In government, bicameralism is the practice of having two legislative or parliamentary chambers. The relationship between the two chambers of a bicameral legislature can vary. In some cases, they have equal power, and in others, one chamber is clearly superior to the other. It is commonplace in most federal systems to have a bicameral legislature.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.0579 (-0.4317)     | 0.2867 (+0.0257)     | 0.0326 (-0.3870)     |
| mrr@10      | 0.0329 (-0.4446)     | 0.4222 (-0.0777)     | 0.0100 (-0.4167)     |
| **ndcg@10** | **0.0479 (-0.4925)** | **0.2546 (-0.0705)** | **0.0229 (-0.4778)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.1257 (-0.2643)     |
| mrr@10      | 0.1550 (-0.3130)     |
| **ndcg@10** | **0.1084 (-0.3469)** |
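
The ndcg@10 values above follow the standard normalized discounted cumulative gain definition. As a point of reference only (this is not code from the evaluator, which additionally handles the R100 candidate setup and tie-breaking), the metric can be sketched in plain Python:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevances, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the predicted ranking, normalized by the ideal ranking's DCG."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A ranking that places the only relevant passage at rank 3:
print(ndcg_at_k([0, 0, 1, 0], k=10))  # 1 / log2(4) = 0.5
```

A perfect ranking scores 1.0, so values like 0.0479 on NanoMSMARCO indicate the relevant passages mostly land low in the reranked list.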

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### msmarco

* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
* Size: 10,000 training samples
* Columns: <code>score</code>, <code>query</code>, and <code>passage</code>
* Approximate statistics based on the first 1000 samples:
  |         | score                                                              | query                                                                                          | passage                                                                                         |
  |:--------|:-------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------|
  | type    | float                                                              | string                                                                                         | string                                                                                          |
  | details | <ul><li>min: -11.79</li><li>mean: 0.58</li><li>max: 11.1</li></ul> | <ul><li>min: 9 characters</li><li>mean: 34.21 characters</li><li>max: 140 characters</li></ul> | <ul><li>min: 70 characters</li><li>mean: 342.2 characters</li><li>max: 894 characters</li></ul> |
* Samples:
  | score                           | query                                                   | passage |
  |:--------------------------------|:--------------------------------------------------------|:--------|
  | <code>6.720487356185913</code>  | <code>modern definition of democracy</code>             | <code>Links. A Short Definition of Democracy U.S. president Abraham Lincoln (1809-1865) defined democracy as: «Government of the people, by the people, for the people» Democracy is by far the most challenging form of government-both for politicians and for the people.The term democracy comes from the Greek language and means rule by the (simple) people. The so-called democracies in classical antiquity (Athens and Rome) represent precursors of modern democracies.Like modern democracy, they were created as a reaction to a concentration and abuse of power by the rulers.he term democracy comes from the Greek language and means rule by the (simple) people. The so-called democracies in classical antiquity (Athens and Rome) represent precursors of modern democracies.</code> |
  | <code>1.6529417037963867</code> | <code>is celexa and fluoxetine same</code>              | <code>Celexa (citalopram hydrobromide) is a type of antidepressant called a selective serotonin reuptake inhibitor (SSRI) indicated for the treatment of depression. Celexa is available in generic form. Common side effects of Celexa include. constipation, nausea, diarrhea, upset stomach, decreased sexual desire,</code> |
  | <code>-9.121654828389486</code> | <code>what are 2 examples of nonpoint pollution</code>  | <code>Concept of pollution tax. All such measures are compensatory in nature and it is not called pollution tax. The concept of pollution tax is something different. It entails that instead of doing offsetting work by yourself wherever you hurt environment either willfully or without any intention you have to pay for it.</code> |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#mseloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity"
  }
  ```
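
The `score` column holds precomputed teacher scores, and training regresses the student's raw logit onto them; the identity activation means no sigmoid is applied before comparison. A minimal sketch of that objective in plain Python (illustrative only; the actual training uses sentence-transformers' `MSELoss` on batched model logits):

```python
def mse_loss(student_scores, teacher_scores):
    """Mean squared error between student logits and teacher scores,
    as in score distillation with an identity activation (raw logits)."""
    assert len(student_scores) == len(teacher_scores)
    n = len(student_scores)
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n

# Teacher scores taken from the dataset's `score` column above;
# the student logits here are hypothetical values for illustration.
teacher = [6.72, 1.65, -9.12]
student = [5.0, 2.0, -7.5]
print(mse_loss(student, teacher))
```

Because the targets span roughly -12 to 11, the initial loss (e.g. 53.6 in the training logs below) is dominated by an untrained student predicting near zero against large-magnitude teacher scores.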

### Evaluation Dataset

#### msmarco

* Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
* Size: 1,000 evaluation samples
* Columns: <code>score</code>, <code>query</code>, and <code>passage</code>
* Approximate statistics based on the first 1000 samples:
  |         | score                                                               | query                                                                                         | passage                                                                                          |
  |:--------|:--------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|
  | type    | float                                                               | string                                                                                        | string                                                                                           |
  | details | <ul><li>min: -11.85</li><li>mean: 1.11</li><li>max: 11.15</li></ul> | <ul><li>min: 7 characters</li><li>mean: 34.0 characters</li><li>max: 186 characters</li></ul> | <ul><li>min: 76 characters</li><li>mean: 343.66 characters</li><li>max: 944 characters</li></ul> |
* Samples:
  | score                            | query                             | passage |
  |:---------------------------------|:----------------------------------|:--------|
  | <code>-11.078993638356527</code> | <code>what is ivana trump</code>  | <code>The need for an independent investigation. As it stands, all three men in charge of the investigations into the Trump campaign are Republicans, and two of the three are vociferous Trump allies. Burr, the third, also tied himself to Trump during his close 2016 reelection campaign.</code> |
  | <code>8.86651055018107</code>    | <code>hogan's goat meaning</code> | <code>hogan’s goat. The phrase like Hogan’s goat refers to something that is faulty, messed up, or stinks like a goat. The phrase is a reference to R.F. Outcault’s seminal newspaper comic Hogan’s Alley, which debuted in 1895. The title of the strip changed to The Yellow Kid the following year.</code> |
  | <code>8.381712992986044</code>   | <code>who made tokyo ghoul</code> | <code>Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese manga series by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump between September 2011 and September 2014 and has been collected in fourteen tankōbon volumes as of August 2014.</code> |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#mseloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 8e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
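
For orientation: with 10,000 training samples, batch size 16, and one epoch, the linear scheduler warms up over roughly the first 10% of optimizer steps. A quick back-of-envelope check (step counts derived from the hyperparameters above; the exact rounding inside transformers may differ slightly):

```python
import math

samples, batch_size, epochs = 10_000, 16, 1
total_steps = math.ceil(samples / batch_size) * epochs  # 625 optimizer steps per epoch
warmup_steps = int(total_steps * 0.1)                   # warmup_ratio 0.1 -> ~62 steps
print(total_steps, warmup_steps)
```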

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 8e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch  | Step | Training Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------:|:----:|:-------------:|:------------------------:|:-------------------------:|:-------------------:|:--------------------------:|
| -1     | -1   | -             | 0.0479 (-0.4925)         | 0.2546 (-0.0705)          | 0.0229 (-0.4778)    | 0.1084 (-0.3469)           |
| 0.0064 | 1    | 53.6175       | -                        | -                         | -                   | -                          |


### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 5.1.1
- Transformers: 4.47.1
- PyTorch: 2.4.0+cu124
- Accelerate: 1.5.1
- Datasets: 3.3.2
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,35 @@
{
  "_name_or_path": "./models/bert_1L_256/student/untrained",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 256,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 8,
  "num_hidden_layers": 1,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid",
    "version": "5.1.1"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.47.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
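
As a sanity check, the config describes a very small student: one transformer layer with hidden size 256. A back-of-envelope parameter count (assuming the standard BERT layout for `BertForSequenceClassification`, including the pooler) lines up with the 36,259,524-byte `model.safetensors` file below, since float32 weights take 4 bytes each plus a small safetensors header:

```python
# Rough parameter count for this 1-layer, 256-dim BERT classifier,
# using the config values above (standard BERT layout assumed).
V, P, T = 30522, 512, 2        # vocab_size, max_position_embeddings, type_vocab_size
H, I, L = 256, 1536, 1         # hidden_size, intermediate_size, num_hidden_layers

embeddings = V * H + P * H + T * H + 2 * H          # word/position/type tables + LayerNorm
per_layer = (
    4 * (H * H + H)            # Q, K, V and attention output projections (weight + bias)
    + 2 * H                    # attention LayerNorm
    + (H * I + I)              # feed-forward intermediate dense
    + (I * H + H)              # feed-forward output dense
    + 2 * H                    # output LayerNorm
)
pooler = H * H + H
classifier = H * 1 + 1         # single output label
total = embeddings + L * per_layer + pooler + classifier
print(total)       # ~9.06M parameters
print(total * 4)   # float32 bytes, close to the on-disk checkpoint size
```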
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f575f481344a089068d643636537330e4844349265b5ad6b3772ad11a8079e6
size 36259524
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff