noahjax committed 1bb56d2 (verified), 1 parent: 242414d

Upload fine-tuned chart reranker model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -4,16 +4,16 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:8000
+- dataset_size:3999
 - loss:BinaryCrossEntropyLoss
-base_model: cross-encoder/ms-marco-MiniLM-L6-v2
+base_model: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
 pipeline_tag: text-ranking
 library_name: sentence-transformers
 metrics:
 - pearson
 - spearman
 model-index:
-- name: CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
+- name: CrossEncoder based on cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
 results:
 - task:
 type: cross-encoder-correlation
@@ -23,22 +23,22 @@ model-index:
 type: validation
 metrics:
 - type: pearson
-value: 0.8481096700155641
+value: 0.7551794832253556
 name: Pearson
 - type: spearman
-value: 0.8528646396544212
+value: 0.8052608880870304
 name: Spearman
 ---
 
-# CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
+# CrossEncoder based on cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
 
-This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/mmarco-mMiniLMv2-L12-H384-v1](https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Cross Encoder
-- **Base model:** [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) <!-- at revision c5ee24cb16019beea0893ab7796b1df96625c6b8 -->
+- **Base model:** [cross-encoder/mmarco-mMiniLMv2-L12-H384-v1](https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1) <!-- at revision 1427fd652930e4ba29e8149678df786c240d8825 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Number of Output Labels:** 1 label
 <!-- - **Training Dataset:** Unknown -->
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-['prix blé tendre bio Indre et Loire 2025', 'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices'],
-['oil prices', 'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager'],
-['Nvidia earnings AI chip demand', 'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager'],
-['show me tesla stock performance 2020 to 2025', 'Title: "Manakoa Services Corporation Stock Performance"\n Collections: Companies\n Chart Type: company:private\n Sources: S&P Global'],
-['Samsung A56 5G mémoire', 'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance'],
+['NVIDIA stock price trend from February 2024 to February 2025', 'Title: "Nvidia Stockpile (Annual)"\nCollections: Companies\nDatasets: StandardIncomeStatement\nChart Type: timeseries:eav_v2\nCanonical forms: "Stockpile"="inventory"\nSources: S&P Global'],
+['What is the price of Costco stock? Answer in as few words as possible.', 'Title: "Costco Quarterly Price to Earnings, Costco Stock (Annual)"\nCollections: Companies\nDatasets: StandardIncomeStatement, CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Price to Earnings"="computed_ratio_last_close_price_to_earnings", "Stock"="inventory"'],
+['Who was named EY World Entrepreneur Of The Year 2024?', 'Title: "World Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global'],
+['dubbed movies streaming', 'Title: "How Brits subscribe to film service subscriptions e.g. Sky Go (United Kingdom)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
+['Virtual Reality (VR) – Meta Quest 3', 'Title: "Meta Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Meta"="Meta Platforms, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-'prix blé tendre bio Indre et Loire 2025',
+'NVIDIA stock price trend from February 2024 to February 2025',
 [
-'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices',
-'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager',
-'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager',
-'Title: "Manakoa Services Corporation Stock Performance"\n Collections: Companies\n Chart Type: company:private\n Sources: S&P Global',
-'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance',
+'Title: "Nvidia Stockpile (Annual)"\nCollections: Companies\nDatasets: StandardIncomeStatement\nChart Type: timeseries:eav_v2\nCanonical forms: "Stockpile"="inventory"\nSources: S&P Global',
+'Title: "Costco Quarterly Price to Earnings, Costco Stock (Annual)"\nCollections: Companies\nDatasets: StandardIncomeStatement, CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Price to Earnings"="computed_ratio_last_close_price_to_earnings", "Stock"="inventory"',
+'Title: "World Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global',
+'Title: "How Brits subscribe to film service subscriptions e.g. Sky Go (United Kingdom)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
+'Title: "Meta Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Meta"="Meta Platforms, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
 ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -129,8 +129,8 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------|:-----------|
-| pearson | 0.8481 |
-| **spearman** | **0.8529** |
+| pearson | 0.7552 |
+| **spearman** | **0.8053** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size: 8,000 training samples
+* Size: 3,999 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-| | sentence_0 | sentence_1 | label |
-|:--------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
-| type | string | string | float |
-| details | <ul><li>min: 3 characters</li><li>mean: 51.78 characters</li><li>max: 1024 characters</li></ul> | <ul><li>min: 49 characters</li><li>mean: 136.27 characters</li><li>max: 716 characters</li></ul> | <ul><li>min: 0.2</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
+| | sentence_0 | sentence_1 | label |
+|:--------|:----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+| type | string | string | float |
+| details | <ul><li>min: 3 characters</li><li>mean: 43.12 characters</li><li>max: 99 characters</li></ul> | <ul><li>min: 76 characters</li><li>mean: 181.15 characters</li><li>max: 393 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.46</li><li>max: 1.0</li></ul> |
 * Samples:
-| sentence_0 | sentence_1 | label |
-|:-----------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
-| <code>prix blé tendre bio Indre et Loire 2025</code> | <code>Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices</code> | <code>0.4</code> |
-| <code>oil prices</code> | <code>Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager</code> | <code>0.8</code> |
-| <code>Nvidia earnings AI chip demand</code> | <code>Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager</code> | <code>0.4</code> |
+| sentence_0 | sentence_1 | label |
+|:------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+| <code>NVIDIA stock price trend from February 2024 to February 2025</code> | <code>Title: "Nvidia Stockpile (Annual)"<br>Collections: Companies<br>Datasets: StandardIncomeStatement<br>Chart Type: timeseries:eav_v2<br>Canonical forms: "Stockpile"="inventory"<br>Sources: S&P Global</code> | <code>0.0</code> |
+| <code>What is the price of Costco stock? Answer in as few words as possible.</code> | <code>Title: "Costco Quarterly Price to Earnings, Costco Stock (Annual)"<br>Collections: Companies<br>Datasets: StandardIncomeStatement, CompanyComputedRatiosV2<br>Chart Type: timeseries:eav_v2<br>Canonical forms: "Price to Earnings"="computed_ratio_last_close_price_to_earnings", "Stock"="inventory"</code> | <code>0.5</code> |
+| <code>Who was named EY World Entrepreneur Of The Year 2024?</code> | <code>Title: "World Overview"<br>Collections: Companies<br>Chart Type: company:private<br>Sources: S&P Global</code> | <code>0.0</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
 ```json
 {
@@ -175,8 +175,9 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 
 - `eval_strategy`: steps
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 32
+- `num_train_epochs`: 5
 
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
@@ -185,8 +186,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 32
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -198,7 +199,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`: 3
+- `num_train_epochs`: 5
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -306,21 +307,15 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss | validation_spearman |
 |:-----:|:----:|:-------------:|:-------------------:|
-| 0.2 | 100 | - | 0.7038 |
-| 0.4 | 200 | - | 0.7816 |
-| 0.6 | 300 | - | 0.8134 |
-| 0.8 | 400 | - | 0.8216 |
-| 1.0 | 500 | 0.8021 | 0.8296 |
-| 1.2 | 600 | - | 0.8358 |
-| 1.4 | 700 | - | 0.8418 |
-| 1.6 | 800 | - | 0.8418 |
-| 1.8 | 900 | - | 0.8478 |
-| 2.0 | 1000 | 0.5726 | 0.8471 |
-| 2.2 | 1100 | - | 0.8487 |
-| 2.4 | 1200 | - | 0.8497 |
-| 2.6 | 1300 | - | 0.8522 |
-| 2.8 | 1400 | - | 0.8523 |
-| 3.0 | 1500 | 0.5616 | 0.8529 |
+| 0.8 | 100 | - | 0.7305 |
+| 1.0 | 125 | - | 0.7516 |
+| 1.6 | 200 | - | 0.7809 |
+| 2.0 | 250 | - | 0.7922 |
+| 2.4 | 300 | - | 0.7947 |
+| 3.0 | 375 | - | 0.8022 |
+| 3.2 | 400 | - | 0.7995 |
+| 4.0 | 500 | 0.5555 | 0.8045 |
+| 4.8 | 600 | - | 0.8053 |
 
 
 ### Framework Versions
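The step counts in the training logs follow directly from the sample counts and batch sizes in this diff. A minimal check, assuming no gradient accumulation (the card lists `gradient_accumulation_steps`: 1) and that the last partial batch is kept (`dataloader_drop_last` False, the transformers default; this setting is not visible in the diff, so it is an assumption):

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # Assumes gradient_accumulation_steps == 1 and dataloader_drop_last == False,
    # so the final partial batch still counts as one optimizer step.
    return math.ceil(num_samples / batch_size)


def epoch_at(step: int, steps_in_epoch: int) -> float:
    # Fractional epoch, as reported in the "Epoch" column of the training logs.
    return round(step / steps_in_epoch, 2)


old_spe = steps_per_epoch(8000, 16)  # previous run: 8,000 samples, batch size 16
new_spe = steps_per_epoch(3999, 32)  # this commit: 3,999 samples, batch size 32

print(old_spe, old_spe * 3)  # 500 1500
print(new_spe, new_spe * 5)  # 125 625
print([epoch_at(s, new_spe) for s in (100, 200, 300, 400, 600)])  # [0.8, 1.6, 2.4, 3.2, 4.8]
```

This reproduces the 125-step epoch boundaries in the new evaluator CSV and the fractional epochs (0.8, 1.6, 2.4, 3.2, 4.8) in the new training log, as well as the 500-step epochs of the previous run.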
config.json CHANGED
@@ -1,11 +1,12 @@
 {
 "architectures": [
-"BertForSequenceClassification"
+"XLMRobertaForSequenceClassification"
 ],
 "attention_probs_dropout_prob": 0.1,
+"bos_token_id": 0,
 "classifier_dropout": null,
 "dtype": "float32",
-"gradient_checkpointing": false,
+"eos_token_id": 2,
 "hidden_act": "gelu",
 "hidden_dropout_prob": 0.1,
 "hidden_size": 384,
@@ -17,19 +18,19 @@
 "label2id": {
 "LABEL_0": 0
 },
-"layer_norm_eps": 1e-12,
-"max_position_embeddings": 512,
-"model_type": "bert",
+"layer_norm_eps": 1e-05,
+"max_position_embeddings": 514,
+"model_type": "xlm-roberta",
 "num_attention_heads": 12,
-"num_hidden_layers": 6,
-"pad_token_id": 0,
+"num_hidden_layers": 12,
+"pad_token_id": 1,
 "position_embedding_type": "absolute",
 "sentence_transformers": {
 "activation_fn": "torch.nn.modules.linear.Identity",
 "version": "5.1.1"
 },
 "transformers_version": "4.57.1",
-"type_vocab_size": 2,
+"type_vocab_size": 1,
 "use_cache": true,
-"vocab_size": 30522
+"vocab_size": 250002
 }
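The `sentence_transformers.activation_fn` entry is `torch.nn.modules.linear.Identity`, so `model.predict` presumably returns raw logits rather than probabilities. Because the model was trained with `BinaryCrossEntropyLoss` on float labels between 0.0 and 1.0, a logistic sigmoid maps scores back onto the label scale. A small plain-Python sketch (the scores are made-up values, not real model output):

```python
import math


def sigmoid(logit: float) -> float:
    # Numerically stable logistic function: maps a raw logit onto (0, 1).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Made-up raw scores standing in for model.predict output (not real model output).
raw_scores = [2.3, -1.1, 0.0, 4.0]
probs = [sigmoid(s) for s in raw_scores]
print([round(p, 3) for p in probs])

# The sigmoid is strictly increasing, so sorting by either list gives the same ranking.
rank_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
rank_prob = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
print(rank_raw == rank_prob)  # True
```

Since the transform is monotonic, reranking order and Spearman correlation are unaffected; Pearson correlation, by contrast, can differ depending on whether it is computed on logits or on probabilities.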
eval/CrossEncoderCorrelationEvaluator_validation_results.csv CHANGED
@@ -1,4 +1,6 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,500,0.8334498280984426,0.8296374514172629
-2.0,1000,0.8444343598056561,0.8471494664684638
-3.0,1500,0.8481096700155641,0.8528646396544212
+1.0,125,0.7309011622271578,0.7516436739892058
+2.0,250,0.7588798492491784,0.7921942665271138
+3.0,375,0.7523098419884638,0.8021607473982901
+4.0,500,0.7556591422221105,0.8044702495085688
+5.0,625,0.7553359223115874,0.8050527106824349
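The epoch-end evaluations can be scanned programmatically. This sketch parses the new CSV (values copied from this diff) with the standard `csv` module and reports where each metric peaks:

```python
import csv
import io

# The new CrossEncoderCorrelationEvaluator CSV from this commit, copied verbatim.
CSV_TEXT = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,125,0.7309011622271578,0.7516436739892058
2.0,250,0.7588798492491784,0.7921942665271138
3.0,375,0.7523098419884638,0.8021607473982901
4.0,500,0.7556591422221105,0.8044702495085688
5.0,625,0.7553359223115874,0.8050527106824349
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# DictReader yields strings, so convert before comparing.
best_spearman = max(rows, key=lambda r: float(r["Spearman_Correlation"]))
best_pearson = max(rows, key=lambda r: float(r["Pearson_Correlation"]))

print("best Spearman at step", best_spearman["steps"])  # best Spearman at step 625
print("best Pearson at step", best_pearson["steps"])  # best Pearson at step 250
```

Spearman rises at every epoch end and peaks at step 625, while Pearson peaks at step 250 (epoch 2). The README's headline Spearman (0.8053) matches the step-600 row of the training logs rather than the final epoch-end row here (0.8051).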
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:93357cfe857f758d0ab0429d2076e1599cd7661ab2cc03f999bede0267e1167c
-size 90866412
+oid sha256:5f41f04568b485258127e43e4fb378afcbdb017f968f848d88687ca2ba76591e
+size 470588492
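The LFS pointer sizes give a rough parameter count, since `config.json` declares `"dtype": "float32"` (4 bytes per parameter; the safetensors JSON header makes this a slight overestimate). Comparing that against the word-embedding table (`vocab_size` × `hidden_size`, both taken from `config.json`) shows where the roughly 5.2× size increase comes from:

```python
# Sizes from the LFS pointers and fields from config.json in this commit.
BYTES_PER_PARAM = 4  # config.json: "dtype": "float32"

OLD = {"size_bytes": 90_866_412, "vocab_size": 30_522, "hidden_size": 384}
NEW = {"size_bytes": 470_588_492, "vocab_size": 250_002, "hidden_size": 384}


def approx_params(size_bytes: int) -> int:
    # Slight overestimate: the safetensors file also stores a JSON header.
    return size_bytes // BYTES_PER_PARAM


def word_embedding_params(vocab_size: int, hidden_size: int) -> int:
    # One hidden_size-dimensional vector per vocabulary entry.
    return vocab_size * hidden_size


for name, m in (("old", OLD), ("new", NEW)):
    total = approx_params(m["size_bytes"])
    emb = word_embedding_params(m["vocab_size"], m["hidden_size"])
    print(f"{name}: ~{total / 1e6:.1f}M params, word embeddings {emb / 1e6:.1f}M ({emb / total:.0%})")

growth = approx_params(NEW["size_bytes"]) - approx_params(OLD["size_bytes"])
emb_growth = word_embedding_params(NEW["vocab_size"], NEW["hidden_size"]) - word_embedding_params(
    OLD["vocab_size"], OLD["hidden_size"]
)
print(f"share of growth from word embeddings: {emb_growth / growth:.0%}")
```

By this estimate the old model spends about 52% of roughly 22.7M parameters on word embeddings and the new one about 82% of roughly 117.6M, and the embedding table accounts for about 89% of the growth. The 250,002-token multilingual vocabulary, rather than the six extra layers, is what drives most of the larger checkpoint.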
special_tokens_map.json CHANGED
@@ -1,34 +1,48 @@
 {
+"bos_token": {
+"content": "<s>",
+"lstrip": false,
+"normalized": false,
+"rstrip": false,
+"single_word": false
+},
 "cls_token": {
-"content": "[CLS]",
+"content": "<s>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false
 },
-"mask_token": {
-"content": "[MASK]",
+"eos_token": {
+"content": "</s>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false
 },
+"mask_token": {
+"content": "<mask>",
+"lstrip": true,
+"normalized": false,
+"rstrip": false,
+"single_word": false
+},
 "pad_token": {
-"content": "[PAD]",
+"content": "<pad>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false
 },
 "sep_token": {
-"content": "[SEP]",
+"content": "</s>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false
 },
 "unk_token": {
-"content": "[UNK]",
+"content": "<unk>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
tokenizer.json CHANGED
The diff for this file is too large to render.
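The tokenizer.json diff is not shown, but the special tokens change from BERT's `[CLS]`/`[SEP]` to XLM-RoBERTa's `<s>`/`</s>`, and `type_vocab_size` in `config.json` drops from 2 to 1. That changes how a (query, chart) pair is packed before scoring. As an illustration only, on whitespace-split words rather than real WordPiece or SentencePiece pieces, the standard pair templates of the two tokenizer families look like this (`bert_pair` and `xlmr_pair` are hypothetical helper names):

```python
def bert_pair(query_tokens, doc_tokens):
    # BERT-style pair: [CLS] A [SEP] B [SEP]; segment id 0 for A, 1 for B
    # (matches the old type_vocab_size of 2).
    tokens = ["[CLS]", *query_tokens, "[SEP]", *doc_tokens, "[SEP]"]
    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    return tokens, segment_ids


def xlmr_pair(query_tokens, doc_tokens):
    # XLM-RoBERTa-style pair: <s> A </s></s> B </s>; a single segment id
    # (matches the new type_vocab_size of 1).
    tokens = ["<s>", *query_tokens, "</s>", "</s>", *doc_tokens, "</s>"]
    segment_ids = [0] * len(tokens)
    return tokens, segment_ids


# Illustrative whitespace "tokens", not real subword tokenizer output.
query = "oil prices".split()
chart = ["WTI", "crude", "oil"]
print(bert_pair(query, chart)[0])
print(xlmr_pair(query, chart)[0])
```

In practice the saved tokenizer applies its own template when `CrossEncoder` tokenizes a pair, so inputs should not be assembled by hand; the sketch only shows why the BERT tokenizer files had to be replaced together with the weights.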
 
tokenizer_config.json CHANGED
@@ -1,58 +1,55 @@
 {
 "added_tokens_decoder": {
 "0": {
-"content": "[PAD]",
+"content": "<s>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false,
 "special": true
 },
-"100": {
-"content": "[UNK]",
+"1": {
+"content": "<pad>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false,
 "special": true
 },
-"101": {
-"content": "[CLS]",
+"2": {
+"content": "</s>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false,
 "special": true
 },
-"102": {
-"content": "[SEP]",
+"3": {
+"content": "<unk>",
 "lstrip": false,
 "normalized": false,
 "rstrip": false,
 "single_word": false,
 "special": true
 },
-"103": {
-"content": "[MASK]",
-"lstrip": false,
+"250001": {
+"content": "<mask>",
+"lstrip": true,
 "normalized": false,
 "rstrip": false,
 "single_word": false,
 "special": true
 }
 },
-"clean_up_tokenization_spaces": true,
-"cls_token": "[CLS]",
-"do_basic_tokenize": true,
-"do_lower_case": true,
+"bos_token": "<s>",
+"clean_up_tokenization_spaces": false,
+"cls_token": "<s>",
+"eos_token": "</s>",
 "extra_special_tokens": {},
-"mask_token": "[MASK]",
+"mask_token": "<mask>",
 "model_max_length": 512,
-"never_split": null,
-"pad_token": "[PAD]",
-"sep_token": "[SEP]",
-"strip_accents": null,
-"tokenize_chinese_chars": true,
-"tokenizer_class": "BertTokenizer",
-"unk_token": "[UNK]"
+"pad_token": "<pad>",
+"sep_token": "</s>",
+"tokenizer_class": "XLMRobertaTokenizer",
+"unk_token": "<unk>"
 }
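The ids in the new `added_tokens_decoder` should agree with `bos_token_id`, `eos_token_id` and `pad_token_id` in the new `config.json`, and every special-token id must fit inside the 250,002-entry vocabulary. A minimal consistency check over values transcribed from this diff (`check_special_token_ids` is a hypothetical helper, not part of transformers):

```python
# Values transcribed from the new config.json and tokenizer_config.json in this commit.
NEW_CONFIG = {"bos_token_id": 0, "eos_token_id": 2, "pad_token_id": 1, "vocab_size": 250002}
NEW_DECODER = {"0": "<s>", "1": "<pad>", "2": "</s>", "3": "<unk>", "250001": "<mask>"}
NEW_NAMED = {"bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>"}


def check_special_token_ids(config, decoder, named_tokens):
    """Return a list of human-readable mismatches (empty if consistent)."""
    # added_tokens_decoder maps id (as a JSON string key) -> token content.
    token_to_id = {content: int(idx) for idx, content in decoder.items()}
    problems = []
    for role in ("bos", "eos", "pad"):
        content = named_tokens[f"{role}_token"]
        expected = token_to_id.get(content)
        actual = config[f"{role}_token_id"]
        if expected != actual:
            problems.append(f"{role}_token_id is {actual}, but {content!r} has id {expected}")
    # Every special-token id must index a row of the embedding table.
    for idx in decoder:
        if not 0 <= int(idx) < config["vocab_size"]:
            problems.append(f"special token id {idx} is outside vocab_size {config['vocab_size']}")
    return problems


print(check_special_token_ids(NEW_CONFIG, NEW_DECODER, NEW_NAMED))  # []
```

A check like this would flag a half-migrated checkpoint, for example the old `pad_token_id` of 0 left in place while `<pad>` now has id 1.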
training_info.txt CHANGED
@@ -1,6 +1,9 @@
-Base Model: cross-encoder/ms-marco-MiniLM-L6-v2
-Training Samples: 8000
-Epochs: 3
-Batch Size: 16
+Base Model: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
+Training Samples: 3999
+Epochs: 5
+Batch Size: 32
 Learning Rate: 2e-05
+Weight Decay: 0.01
+Scheduler: warmuplinear
+Warmup Steps: 100
 Max Length: 512
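`training_info.txt` records a `warmuplinear` scheduler with 100 warmup steps and a 2e-05 learning rate, while the README lists `lr_scheduler_type`: linear. Assuming those warmup steps were actually applied and the run took the 625 optimizer steps implied by the evaluator CSV (125 per epoch × 5 epochs), the learning rate would follow the shape sketched below, which mirrors the formula behind `transformers.get_linear_schedule_with_warmup` (`linear_warmup_lr` itself is a hypothetical helper):

```python
def linear_warmup_lr(step: int, peak_lr: float = 2e-5, warmup_steps: int = 100, total_steps: int = 625) -> float:
    # Linear ramp from 0 to peak_lr over warmup_steps, then linear decay to 0 at
    # total_steps. warmup_steps=100 comes from training_info.txt and
    # total_steps=625 from the evaluator CSV; both are assumptions about the run.
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


for step in (0, 50, 100, 375, 625):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

Under these assumptions the peak rate is reached at step 100 (epoch 0.8, the first evaluation in the training log) and decays to zero at the final step.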