foochun committed on
Commit c08f0f3 · verified · 1 Parent(s): cd23558

finetuned with additional names

Files changed (3)
  1. README.md +78 -50
  2. config.json +1 -1
  3. model.safetensors +1 -1
README.md CHANGED
@@ -2,10 +2,9 @@
2
  tags:
3
  - sentence-transformers
4
  - cross-encoder
5
- - reranker
6
  - generated_from_trainer
7
- - dataset_size:27035
8
- - loss:BinaryCrossEntropyLoss
9
  base_model: BAAI/bge-reranker-base
10
  pipeline_tag: text-ranking
11
  library_name: sentence-transformers
@@ -51,11 +50,11 @@ from sentence_transformers import CrossEncoder
51
  model = CrossEncoder("foochun/bge-reranker-ft")
52
  # Get scores for pairs of texts
53
  pairs = [
54
- ['wendy chia pei ling', 'chia ling pei wendy'],
55
- ['tara d/o sundaram', 'tara a/l sundaram'],
56
- ['sim sin xuan', 'sin sim xuan'],
57
- ['samantha claire de silva', 'raja iskandar bin raja ahmad'],
58
- ['tai yong shen', 'shen tai yong'],
59
  ]
60
  scores = model.predict(pairs)
61
  print(scores.shape)
@@ -63,13 +62,13 @@ print(scores.shape)
63
 
64
  # Or rank different texts based on similarity to a single text
65
  ranks = model.rank(
66
- 'wendy chia pei ling',
67
  [
68
- 'chia ling pei wendy',
69
- 'tara a/l sundaram',
70
- 'sin sim xuan',
71
- 'raja iskandar bin raja ahmad',
72
- 'shen tai yong',
73
  ]
74
  )
75
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -117,41 +116,74 @@ You can finetune this model on your own dataset.
117
 
118
  #### Unnamed Dataset
119
 
120
- * Size: 27,035 training samples
121
- * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
122
  * Approximate statistics based on the first 1000 samples:
123
- | | sentence_0 | sentence_1 | label |
124
- |:--------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------|
125
- | type | string | string | float |
126
- | details | <ul><li>min: 10 characters</li><li>mean: 21.47 characters</li><li>max: 45 characters</li></ul> | <ul><li>min: 7 characters</li><li>mean: 19.7 characters</li><li>max: 40 characters</li></ul> | <ul><li>min: 0.55</li><li>mean: 0.77</li><li>max: 1.0</li></ul> |
127
  * Samples:
128
- | sentence_0 | sentence_1 | label |
129
- |:---------------------------------|:---------------------------------|:--------------------|
130
- | <code>wendy chia pei ling</code> | <code>chia ling pei wendy</code> | <code>0.55</code> |
131
- | <code>tara d/o sundaram</code> | <code>tara a/l sundaram</code> | <code>0.836</code> |
132
- | <code>sim sin xuan</code> | <code>sin sim xuan</code> | <code>0.7885</code> |
133
- * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
134
  ```json
135
  {
136
- "activation_fn": "torch.nn.modules.linear.Identity",
137
- "pos_weight": null
138
  }
139
  ```
140
 
141
  ### Training Hyperparameters
142
  #### Non-Default Hyperparameters
143
 
144
  - `per_device_train_batch_size`: 64
145
  - `per_device_eval_batch_size`: 64
146
- - `num_train_epochs`: 5
147
  - `fp16`: True
148
 
149
  #### All Hyperparameters
150
  <details><summary>Click to expand</summary>
151
 
152
  - `overwrite_output_dir`: False
153
  - `do_predict`: False
154
- - `eval_strategy`: no
155
  - `prediction_loss_only`: True
156
  - `per_device_train_batch_size`: 64
157
  - `per_device_eval_batch_size`: 64
@@ -160,17 +192,17 @@ You can finetune this model on your own dataset.
160
  - `gradient_accumulation_steps`: 1
161
  - `eval_accumulation_steps`: None
162
  - `torch_empty_cache_steps`: None
163
- - `learning_rate`: 5e-05
164
  - `weight_decay`: 0.0
165
  - `adam_beta1`: 0.9
166
  - `adam_beta2`: 0.999
167
  - `adam_epsilon`: 1e-08
168
- - `max_grad_norm`: 1
169
- - `num_train_epochs`: 5
170
  - `max_steps`: -1
171
  - `lr_scheduler_type`: linear
172
  - `lr_scheduler_kwargs`: {}
173
- - `warmup_ratio`: 0.0
174
  - `warmup_steps`: 0
175
  - `log_level`: passive
176
  - `log_level_replica`: warning
@@ -183,7 +215,7 @@ You can finetune this model on your own dataset.
183
  - `no_cuda`: False
184
  - `use_cpu`: False
185
  - `use_mps_device`: False
186
- - `seed`: 42
187
  - `data_seed`: None
188
  - `jit_mode_eval`: False
189
  - `use_ipex`: False
@@ -200,13 +232,13 @@ You can finetune this model on your own dataset.
200
  - `tpu_metrics_debug`: False
201
  - `debug`: []
202
  - `dataloader_drop_last`: False
203
- - `dataloader_num_workers`: 0
204
  - `dataloader_prefetch_factor`: None
205
  - `past_index`: -1
206
  - `disable_tqdm`: False
207
  - `remove_unused_columns`: True
208
  - `label_names`: None
209
- - `load_best_model_at_end`: False
210
  - `ignore_data_skip`: False
211
  - `fsdp`: []
212
  - `fsdp_min_num_params`: 0
@@ -233,7 +265,6 @@ You can finetune this model on your own dataset.
233
  - `hub_strategy`: every_save
234
  - `hub_private_repo`: None
235
  - `hub_always_push`: False
236
- - `hub_revision`: None
237
  - `gradient_checkpointing`: False
238
  - `gradient_checkpointing_kwargs`: None
239
  - `include_inputs_for_metrics`: False
@@ -258,34 +289,31 @@ You can finetune this model on your own dataset.
258
  - `batch_eval_metrics`: False
259
  - `eval_on_start`: False
260
  - `use_liger_kernel`: False
261
- - `liger_kernel_config`: None
262
  - `eval_use_gather_object`: False
263
  - `average_tokens_across_devices`: False
264
  - `prompts`: None
265
- - `batch_sampler`: batch_sampler
266
  - `multi_dataset_batch_sampler`: proportional
267
- - `router_mapping`: {}
268
- - `learning_rate_mapping`: {}
269
 
270
  </details>
271
 
272
  ### Training Logs
273
  | Epoch | Step | Training Loss |
274
  |:------:|:----:|:-------------:|
275
- | 1.1820 | 500 | 0.4725 |
276
- | 2.3641 | 1000 | 0.4476 |
277
- | 3.5461 | 1500 | 0.4438 |
278
- | 4.7281 | 2000 | 0.443 |
279
 
280
 
281
  ### Framework Versions
282
  - Python: 3.11.9
283
- - Sentence Transformers: 5.0.0
284
- - Transformers: 4.53.0
285
  - PyTorch: 2.6.0+cu124
286
- - Accelerate: 1.8.1
287
  - Datasets: 3.6.0
288
- - Tokenizers: 0.21.2
289
 
290
  ## Citation
291
 
 
2
  tags:
3
  - sentence-transformers
4
  - cross-encoder
 
5
  - generated_from_trainer
6
+ - dataset_size:82744
7
+ - loss:MultipleNegativesRankingLoss
8
  base_model: BAAI/bge-reranker-base
9
  pipeline_tag: text-ranking
10
  library_name: sentence-transformers
 
50
  model = CrossEncoder("foochun/bge-reranker-ft")
51
  # Get scores for pairs of texts
52
  pairs = [
53
+ ['quinn toh heng yi', 'heng yi toh quinn'],
54
+ ['mohd iskandi bin hassan', 'muhd iskandi hassan'],
55
+ ['quinn ng ee siu', 'quinn ee siu ng'],
56
+ ['malini doraisamy', 'malini doraisamy'],
57
+ ['see shan fui', 'shanfui see'],
58
  ]
59
  scores = model.predict(pairs)
60
  print(scores.shape)
 
62
 
63
  # Or rank different texts based on similarity to a single text
64
  ranks = model.rank(
65
+ 'quinn toh heng yi',
66
  [
67
+ 'heng yi toh quinn',
68
+ 'muhd iskandi hassan',
69
+ 'quinn ee siu ng',
70
+ 'malini doraisamy',
71
+ 'shanfui see',
72
  ]
73
  )
74
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 
116
 
117
  #### Unnamed Dataset
118
 
119
+ * Size: 82,744 training samples
120
+ * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
121
  * Approximate statistics based on the first 1000 samples:
122
+ | | query | pos | neg |
123
+ |:--------|:----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|
124
+ | type | string | string | string |
125
+ | details | <ul><li>min: 9 characters</li><li>mean: 19.16 characters</li><li>max: 42 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.11 characters</li><li>max: 37 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.7 characters</li><li>max: 38 characters</li></ul> |
126
  * Samples:
127
+ | query | pos | neg |
128
+ |:---------------------------------|:-------------------------------|:---------------------------------|
129
+ | <code>brandon teh min jun</code> | <code>jun teh min</code> | <code>brandon min teh jun</code> |
130
+ | <code>suling anak peroi</code> | <code>suling anak peroi</code> | <code>suling anak rahim</code> |
131
+ | <code>chin sze tian</code> | <code>szetian chin</code> | <code>chin sze tian wong</code> |
132
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss) with these parameters:
133
  ```json
134
  {
135
+ "scale": 10.0,
136
+ "num_negatives": 4,
137
+ "activation_fn": "torch.nn.modules.activation.Sigmoid"
138
+ }
139
+ ```
140
+
141
+ ### Evaluation Dataset
142
+
143
+ #### Unnamed Dataset
144
+
145
+ * Size: 11,820 evaluation samples
146
+ * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
147
+ * Approximate statistics based on the first 1000 samples:
148
+ | | query | pos | neg |
149
+ |:--------|:-----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------|
150
+ | type | string | string | string |
151
+ | details | <ul><li>min: 10 characters</li><li>mean: 19.08 characters</li><li>max: 45 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.02 characters</li><li>max: 40 characters</li></ul> | <ul><li>min: 9 characters</li><li>mean: 17.58 characters</li><li>max: 44 characters</li></ul> |
152
+ * Samples:
153
+ | query | pos | neg |
154
+ |:-------------------------------------|:---------------------------------|:------------------------------------------------|
155
+ | <code>quinn toh heng yi</code> | <code>heng yi toh quinn</code> | <code>toh yi heng</code> |
156
+ | <code>mohd iskandi bin hassan</code> | <code>muhd iskandi hassan</code> | <code>puteri balqis binti megat sulaiman</code> |
157
+ | <code>quinn ng ee siu</code> | <code>quinn ee siu ng</code> | <code>quinn ee ng siu</code> |
158
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss) with these parameters:
159
+ ```json
160
+ {
161
+ "scale": 10.0,
162
+ "num_negatives": 4,
163
+ "activation_fn": "torch.nn.modules.activation.Sigmoid"
164
  }
165
  ```
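
The loss parameters in this hunk (`scale` 10.0, `num_negatives` 4, sigmoid activation) suggest a softmax ranking objective over one positive and several negatives per query. The following is a rough sketch of that idea in plain Python, not the sentence-transformers implementation: `sigmoid` and `ranking_loss` are hypothetical helpers, and the raw scores are invented for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, standing in for torch.nn.Sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def ranking_loss(pos_score: float, neg_scores: list[float], scale: float = 10.0) -> float:
    """Cross-entropy over one positive and its negatives, positive at index 0.

    Hypothetical helper for illustration; not a sentence-transformers function.
    """
    # Squash each raw score into (0, 1), then stretch by the scale factor.
    logits = [scale * sigmoid(s) for s in [pos_score, *neg_scores]]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_norm - logits[0]


# One query with 4 negatives (num_negatives from the README); scores are made up.
undecided = ranking_loss(0.0, [0.0, 0.0, 0.0, 0.0])
confident = ranking_loss(8.0, [-8.0, -8.0, -8.0, -8.0])
print(undecided, confident)
```

Under this sketch the sigmoid keeps every logit between 0 and `scale`, so the loss can approach but never reach zero; with a scale of 10 and 4 negatives the floor is about log(1 + 4e^-10), roughly 1.8e-4.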
166
 
167
  ### Training Hyperparameters
168
  #### Non-Default Hyperparameters
169
 
170
+ - `eval_strategy`: steps
171
  - `per_device_train_batch_size`: 64
172
  - `per_device_eval_batch_size`: 64
173
+ - `learning_rate`: 1e-05
174
+ - `warmup_ratio`: 0.1
175
+ - `seed`: 12
176
  - `fp16`: True
177
+ - `dataloader_num_workers`: 4
178
+ - `load_best_model_at_end`: True
179
+ - `batch_sampler`: no_duplicates
180
 
181
  #### All Hyperparameters
182
  <details><summary>Click to expand</summary>
183
 
184
  - `overwrite_output_dir`: False
185
  - `do_predict`: False
186
+ - `eval_strategy`: steps
187
  - `prediction_loss_only`: True
188
  - `per_device_train_batch_size`: 64
189
  - `per_device_eval_batch_size`: 64
 
192
  - `gradient_accumulation_steps`: 1
193
  - `eval_accumulation_steps`: None
194
  - `torch_empty_cache_steps`: None
195
+ - `learning_rate`: 1e-05
196
  - `weight_decay`: 0.0
197
  - `adam_beta1`: 0.9
198
  - `adam_beta2`: 0.999
199
  - `adam_epsilon`: 1e-08
200
+ - `max_grad_norm`: 1.0
201
+ - `num_train_epochs`: 3
202
  - `max_steps`: -1
203
  - `lr_scheduler_type`: linear
204
  - `lr_scheduler_kwargs`: {}
205
+ - `warmup_ratio`: 0.1
206
  - `warmup_steps`: 0
207
  - `log_level`: passive
208
  - `log_level_replica`: warning
 
215
  - `no_cuda`: False
216
  - `use_cpu`: False
217
  - `use_mps_device`: False
218
+ - `seed`: 12
219
  - `data_seed`: None
220
  - `jit_mode_eval`: False
221
  - `use_ipex`: False
 
232
  - `tpu_metrics_debug`: False
233
  - `debug`: []
234
  - `dataloader_drop_last`: False
235
+ - `dataloader_num_workers`: 4
236
  - `dataloader_prefetch_factor`: None
237
  - `past_index`: -1
238
  - `disable_tqdm`: False
239
  - `remove_unused_columns`: True
240
  - `label_names`: None
241
+ - `load_best_model_at_end`: True
242
  - `ignore_data_skip`: False
243
  - `fsdp`: []
244
  - `fsdp_min_num_params`: 0
 
265
  - `hub_strategy`: every_save
266
  - `hub_private_repo`: None
267
  - `hub_always_push`: False
 
268
  - `gradient_checkpointing`: False
269
  - `gradient_checkpointing_kwargs`: None
270
  - `include_inputs_for_metrics`: False
 
289
  - `batch_eval_metrics`: False
290
  - `eval_on_start`: False
291
  - `use_liger_kernel`: False
 
292
  - `eval_use_gather_object`: False
293
  - `average_tokens_across_devices`: False
294
  - `prompts`: None
295
+ - `batch_sampler`: no_duplicates
296
  - `multi_dataset_batch_sampler`: proportional
 
 
297
 
298
  </details>
299
 
300
  ### Training Logs
301
  | Epoch | Step | Training Loss |
302
  |:------:|:----:|:-------------:|
303
+ | 0.0008 | 1 | 0.4707 |
304
+ | 0.7734 | 1000 | 0.1114 |
305
+ | 1.5468 | 2000 | 0.0051 |
306
+ | 2.3202 | 3000 | 0.0046 |
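
The Epoch column in both log tables is consistent with the stated dataset sizes and the batch size of 64, assuming a single device (the README does not say how many were used) together with the listed `gradient_accumulation_steps` of 1 and `dataloader_drop_last` of False. A quick check, using hypothetical helper names:

```python
import math


def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(dataset_size / batch_size)


def epoch_at(step: int, dataset_size: int, batch_size: int) -> float:
    """Fractional epoch reached after a given number of steps."""
    return step / steps_per_epoch(dataset_size, batch_size)


new_steps = steps_per_epoch(82744, 64)  # this commit: 82,744 samples
old_steps = steps_per_epoch(27035, 64)  # parent commit: 27,035 samples

# Reproduce the Epoch column of both tables from the Step column.
new_epochs = [round(epoch_at(s, 82744, 64), 4) for s in (1, 1000, 2000, 3000)]
old_epochs = [round(epoch_at(s, 27035, 64), 4) for s in (500, 1000, 1500, 2000)]
print(new_steps, new_epochs)
print(old_steps, old_epochs)

# Assumption: warmup length is the ceiling of warmup_ratio times total steps.
total_steps = 3 * new_steps
warmup_steps = math.ceil(0.1 * total_steps)
print(total_steps, warmup_steps)
```

The warmup figure rests on the assumption noted in the comment; the README itself only lists `warmup_ratio`: 0.1 and `num_train_epochs`: 3.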
307
 
308
 
309
  ### Framework Versions
310
  - Python: 3.11.9
311
+ - Sentence Transformers: 4.1.0
312
+ - Transformers: 4.52.4
313
  - PyTorch: 2.6.0+cu124
314
+ - Accelerate: 1.7.0
315
  - Datasets: 3.6.0
316
+ - Tokenizers: 0.21.1
317
 
318
  ## Citation
319
 
config.json CHANGED
@@ -30,7 +30,7 @@
30
  "version": "5.0.0"
31
  },
32
  "torch_dtype": "float32",
33
- "transformers_version": "4.53.0",
34
  "type_vocab_size": 1,
35
  "use_cache": true,
36
  "vocab_size": 250002
 
30
  "version": "5.0.0"
31
  },
32
  "torch_dtype": "float32",
33
+ "transformers_version": "4.53.3",
34
  "type_vocab_size": 1,
35
  "use_cache": true,
36
  "vocab_size": 250002
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c4d122284e1a31599b81749bfa07801bed98b79c73b8b146ce4ade3793501d47
3
  size 1112201932
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:590bafb40b20dad3f7206e0dd682b70c7d962305730ffde246762e9b04328fba
3
  size 1112201932
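
Because `model.safetensors` is stored as a Git LFS pointer, this hunk changes only the `oid`, which is the SHA-256 of the file contents; the size is unchanged. To confirm that a downloaded weights file corresponds to this commit, one option is to hash it locally and compare. A minimal sketch, assuming the file sits at a hypothetical local path `model.safetensors`; `sha256_of` and `matches_pointer` are illustrative names:

```python
import hashlib
from pathlib import Path

# Values copied from the new side of the LFS pointer in this commit.
EXPECTED_OID = "590bafb40b20dad3f7206e0dd682b70c7d962305730ffde246762e9b04328fba"
EXPECTED_SIZE = 1112201932


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a ~1 GB file is never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: Path, oid: str, size: int) -> bool:
    """Check the cheap size comparison first, then the full content hash."""
    return path.stat().st_size == size and sha256_of(path) == oid


if __name__ == "__main__":
    weights = Path("model.safetensors")  # hypothetical local path
    if weights.exists():
        print(matches_pointer(weights, EXPECTED_OID, EXPECTED_SIZE))
```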