noahjax committed · Commit 358ba33 · verified · 1 Parent(s): f12de44

Upload fine-tuned chart reranker model

README.md CHANGED
@@ -4,7 +4,7 @@ tags:
  - cross-encoder
  - reranker
  - generated_from_trainer
- - dataset_size:7994
+ - dataset_size:8352
  - loss:BinaryCrossEntropyLoss
  base_model: Alibaba-NLP/gte-multilingual-reranker-base
  pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
  type: validation
  metrics:
  - type: pearson
- value: 0.8786187568919903
+ value: 0.8860059576990913
  name: Pearson
  - type: spearman
- value: 0.8799620299766784
+ value: 0.8842438421497182
  name: Spearman
  ---
 
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
  model = CrossEncoder("cross_encoder_model_id")
  # Get scores for pairs of texts
  pairs = [
- ['les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short', 'Title: "Billon et Cie SA Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"\nSources: S&P Global'],
- ['Premier League schedule', 'Title: "Champions League Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_league_v2'],
- ['nikki haley favorability', 'Title: "Nikki Haley fame & popularity tracker (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
- ['JP Morgan Goldman Sachs return on equity comparison 2015-2025', 'Title: "MORGAN Quarterly Return on Equity"\nCollections: Companies\nDatasets: CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Return on Equity"="computed_ratio_return_on_equity"\nSources: S&P Global'],
- ['Top teams in the premier league', 'Title: "Mobile Premier League Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global'],
+ ['cas similaires entrepreneurs création entreprises apports intellectuels succès échecs', 'Title: "SNPS Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
+ ['Lakers Nuggets preview', 'Title: "Los Angeles Lakers Schedule"\nCollections: NBA\nChart Type: schedule:basketball_team_v2'],
+ ['Bitcoin performance compared to Altcoin performance in 2025', 'Title: "CBTC Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
+ ['Nvidia market capitalization', 'Title: "Nvidia Market Capitalization"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Nvidia"="NVIDIA Corporation", "Market Capitalization"="Valuation Overview"\nSources: S&P Global'],
+ ['scope of the EU cyber resilience act', 'Title: "League of Legends European Championship Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "League of Legends European Championship"="LEC, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
  ]
  scores = model.predict(pairs)
  print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 
  # Or rank different texts based on similarity to a single text
  ranks = model.rank(
- 'les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short',
+ 'cas similaires entrepreneurs création entreprises apports intellectuels succès échecs',
  [
- 'Title: "Billon et Cie SA Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"\nSources: S&P Global',
- 'Title: "Champions League Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_league_v2',
- 'Title: "Nikki Haley fame & popularity tracker (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
- 'Title: "MORGAN Quarterly Return on Equity"\nCollections: Companies\nDatasets: CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Return on Equity"="computed_ratio_return_on_equity"\nSources: S&P Global',
- 'Title: "Mobile Premier League Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global',
+ 'Title: "SNPS Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
+ 'Title: "Los Angeles Lakers Schedule"\nCollections: NBA\nChart Type: schedule:basketball_team_v2',
+ 'Title: "CBTC Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
+ 'Title: "Nvidia Market Capitalization"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Nvidia"="NVIDIA Corporation", "Market Capitalization"="Valuation Overview"\nSources: S&P Global',
+ 'Title: "League of Legends European Championship Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "League of Legends European Championship"="LEC, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
  ]
  )
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -127,10 +127,10 @@ You can finetune this model on your own dataset.
  * Dataset: `validation`
  * Evaluated with [<code>CrossEncoderCorrelationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator)
 
- | Metric | Value |
- |:-------------|:---------|
- | pearson | 0.8786 |
- | **spearman** | **0.88** |
+ | Metric | Value |
+ |:-------------|:-----------|
+ | pearson | 0.886 |
+ | **spearman** | **0.8842** |
 
  <!--
  ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 
  #### Unnamed Dataset
 
- * Size: 7,994 training samples
+ * Size: 8,352 training samples
  * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
  | | sentence_0 | sentence_1 | label |
  |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
  | type | string | string | float |
- | details | <ul><li>min: 7 characters</li><li>mean: 45.12 characters</li><li>max: 144 characters</li></ul> | <ul><li>min: 75 characters</li><li>mean: 185.28 characters</li><li>max: 444 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul> |
+ | details | <ul><li>min: 9 characters</li><li>mean: 45.67 characters</li><li>max: 174 characters</li></ul> | <ul><li>min: 76 characters</li><li>mean: 186.96 characters</li><li>max: 350 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul> |
  * Samples:
- | sentence_0 | sentence_1 | label |
- |:---------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
- | <code>les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short</code> | <code>Title: "Billon et Cie SA Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"<br>Sources: S&P Global</code> | <code>0.0</code> |
- | <code>Premier League schedule</code> | <code>Title: "Champions League Schedule"<br>Collections: Soccer<br>Chart Type: schedule:soccer_league_v2</code> | <code>0.25</code> |
- | <code>nikki haley favorability</code> | <code>Title: "Nikki Haley fame & popularity tracker (United States)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code> | <code>1.0</code> |
+ | sentence_0 | sentence_1 | label |
+ |:---------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
+ | <code>cas similaires entrepreneurs création entreprises apports intellectuels succès échecs</code> | <code>Title: "SNPS Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"<br>Sources: S&P Global</code> | <code>0.5</code> |
+ | <code>Lakers Nuggets preview</code> | <code>Title: "Los Angeles Lakers Schedule"<br>Collections: NBA<br>Chart Type: schedule:basketball_team_v2</code> | <code>0.75</code> |
+ | <code>Bitcoin performance compared to Altcoin performance in 2025</code> | <code>Title: "CBTC Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"<br>Sources: S&P Global</code> | <code>0.0</code> |
  * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
  ```json
  {
@@ -306,21 +306,25 @@ You can finetune this model on your own dataset.
  </details>
 
  ### Training Logs
- | Epoch | Step | Training Loss | validation_spearman |
- |:-----:|:----:|:-------------:|:-------------------:|
- | 0.4 | 100 | - | 0.8045 |
- | 0.8 | 200 | - | 0.8455 |
- | 1.0 | 250 | - | 0.8493 |
- | 1.2 | 300 | - | 0.8547 |
- | 1.6 | 400 | - | 0.8622 |
- | 2.0 | 500 | 0.4442 | 0.8714 |
- | 2.4 | 600 | - | 0.8719 |
- | 2.8 | 700 | - | 0.8718 |
- | 3.0 | 750 | - | 0.8760 |
- | 3.2 | 800 | - | 0.8770 |
- | 3.6 | 900 | - | 0.8795 |
- | 4.0 | 1000 | 0.3612 | 0.8775 |
- | 4.4 | 1100 | - | 0.8800 |
+ | Epoch | Step | Training Loss | validation_spearman |
+ |:------:|:----:|:-------------:|:-------------------:|
+ | 0.3831 | 100 | - | 0.8141 |
+ | 0.7663 | 200 | - | 0.8486 |
+ | 1.0 | 261 | - | 0.8624 |
+ | 1.1494 | 300 | - | 0.8641 |
+ | 1.5326 | 400 | - | 0.8683 |
+ | 1.9157 | 500 | 0.4409 | 0.8728 |
+ | 2.0 | 522 | - | 0.8732 |
+ | 2.2989 | 600 | - | 0.8731 |
+ | 2.6820 | 700 | - | 0.8803 |
+ | 3.0 | 783 | - | 0.8804 |
+ | 3.0651 | 800 | - | 0.8809 |
+ | 3.4483 | 900 | - | 0.8800 |
+ | 3.8314 | 1000 | 0.3641 | 0.8825 |
+ | 4.0 | 1044 | - | 0.8836 |
+ | 4.2146 | 1100 | - | 0.8826 |
+ | 4.5977 | 1200 | - | 0.8821 |
+ | 4.9808 | 1300 | - | 0.8842 |
 
 
  ### Framework Versions
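The two metrics updated in the README diff above are Pearson and Spearman correlations between predicted scores and the graded labels. As a rough illustration of how they differ, here is a minimal pure-Python sketch. The `scores` list is hypothetical, and the actual evaluator presumably uses a statistics library rather than these hand-written helpers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear agreement between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Graded labels in the style of the README samples (0.0 to 1.0 in steps of 0.25).
labels = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical model scores, not taken from the commit.
scores = [0.02, 0.40, 0.35, 0.81, 0.97]

print(round(spearman(labels, scores), 4))  # 0.9 (one adjacent pair is swapped)
print(round(pearson(labels, scores), 4))
```

Spearman only looks at ordering, which is why it suits a reranker: swapping two neighbouring candidates costs the same regardless of how far apart their raw scores are.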
eval/CrossEncoderCorrelationEvaluator_validation_results.csv CHANGED
@@ -1,6 +1,6 @@
  epoch,steps,Pearson_Correlation,Spearman_Correlation
- 1.0,250,0.8519500032800031,0.8492531733110029
- 2.0,500,0.8742648640961522,0.8714467607534195
- 3.0,750,0.8746575213391334,0.8760325627001279
- 4.0,1000,0.8778473258529178,0.8774898613680182
- 5.0,1250,0.8791097938993816,0.879810679039627
+ 1.0,261,0.8664252648440269,0.8624316344419658
+ 2.0,522,0.876051000065144,0.8731512392307483
+ 3.0,783,0.8816187173145336,0.8804230133811932
+ 4.0,1044,0.8854817235935349,0.8835956509201133
+ 5.0,1305,0.8859959671748322,0.8842324890429593
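To read the evaluator CSV above programmatically, a small sketch along these lines may help. It embeds the new rows verbatim, picks the epoch with the highest Spearman correlation, and compares it with the final Spearman value from the previous version of the file. The variable names are illustrative only.

```python
import csv
import io

# New contents of the results CSV, copied from the diff.
NEW_RESULTS = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,261,0.8664252648440269,0.8624316344419658
2.0,522,0.876051000065144,0.8731512392307483
3.0,783,0.8816187173145336,0.8804230133811932
4.0,1044,0.8854817235935349,0.8835956509201133
5.0,1305,0.8859959671748322,0.8842324890429593
"""

# Epoch-5 Spearman from the removed side of the diff.
OLD_FINAL_SPEARMAN = 0.879810679039627

rows = list(csv.DictReader(io.StringIO(NEW_RESULTS)))
best = max(rows, key=lambda r: float(r["Spearman_Correlation"]))
gain = float(best["Spearman_Correlation"]) - OLD_FINAL_SPEARMAN

print(best["epoch"], best["steps"], round(gain, 4))  # 5.0 1305 0.0044
```

On these numbers the last epoch is also the best one, and the retrained model improves validation Spearman by roughly 0.004 over the previous upload.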
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:647a48115afe17f6ac2dd4305a0ad9a990f696b16b25b1a68bbc7670ff1749f7
+ oid sha256:eb79ac26ead9212291d08a2777f15be6061a36463e79e6fc16768d9a6756bf44
  size 1223854204
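The `model.safetensors` entry is a Git LFS pointer, so only the `oid` changes here while the byte size stays the same. A minimal sketch of parsing such a pointer and checking a payload against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo payload is a stand-in because the real weights are not part of this page.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        pointer["algo"] == "sha256"
        and len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents from the added side of the diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:eb79ac26ead9212291d08a2777f15be6061a36463e79e6fc16768d9a6756bf44
size 1223854204
"""
pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])  # sha256 1223854204

# Stand-in payload to exercise the check without the real 1.2 GB file.
demo = b"demo weights"
demo_pointer = {
    "version": pointer["version"],
    "algo": "sha256",
    "digest": hashlib.sha256(demo).hexdigest(),
    "size": len(demo),
}
print(matches_pointer(demo, demo_pointer))  # True
```

For a file of the size recorded here, hashing in fixed-size chunks read from disk would be preferable to loading all bytes into memory at once.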
training_info.txt CHANGED
@@ -1,5 +1,5 @@
  Base Model: Alibaba-NLP/gte-multilingual-reranker-base
- Training Samples: 7994
+ Training Samples: 8352
  Epochs: 5
  Batch Size: 32
  Learning Rate: 2e-05
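The step counts in the training logs and evaluator CSV follow from these settings. The sketch below reproduces them, assuming one optimizer step per batch (no gradient accumulation, single device) and that the final partial batch is kept. None of these assumptions is stated in `training_info.txt`.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 32
EPOCHS = 5

for samples in (7994, 8352):  # old and new training set sizes
    per_epoch = steps_per_epoch(samples, BATCH_SIZE)
    total = per_epoch * EPOCHS
    # Fractional epoch reached at step 100, as shown in the first log row.
    epoch_at_100 = round(100 / per_epoch, 4)
    print(samples, per_epoch, total, epoch_at_100)

# 7994 250 1250 0.4
# 8352 261 1305 0.3831
```

The extra 358 samples add 11 steps per epoch, which is why the logged epoch fractions moved from round values such as 0.4 to values such as 0.3831.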