TakoData/chart-reranker

@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:7994
 - loss:BinaryCrossEntropyLoss
 base_model: Alibaba-NLP/gte-multilingual-reranker-base
 pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
       type: validation
     metrics:
     - type: pearson
-      value: 0.8786187568919903
       name: Pearson
     - type: spearman
-      value: 0.8799620299766784
       name: Spearman
 ---
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-    ['les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short', 'Title: "Billon et Cie SA Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"\nSources: S&P Global'],
-    ['Premier League schedule', 'Title: "Champions League Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_league_v2'],
-    ['nikki haley favorability', 'Title: "Nikki Haley fame & popularity tracker (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
-    ['JP Morgan Goldman Sachs return on equity comparison 2015-2025', 'Title: "MORGAN Quarterly Return on Equity"\nCollections: Companies\nDatasets: CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Return on Equity"="computed_ratio_return_on_equity"\nSources: S&P Global'],
-    ['Top teams in the premier league', 'Title: "Mobile Premier League Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-    'les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short',
     [
-        'Title: "Billon et Cie SA Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"\nSources: S&P Global',
-        'Title: "Champions League Schedule"\nCollections: Soccer\nChart Type: schedule:soccer_league_v2',
-        'Title: "Nikki Haley fame & popularity tracker (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
-        'Title: "MORGAN Quarterly Return on Equity"\nCollections: Companies\nDatasets: CompanyComputedRatiosV2\nChart Type: timeseries:eav_v2\nCanonical forms: "Return on Equity"="computed_ratio_return_on_equity"\nSources: S&P Global',
-        'Title: "Mobile Premier League Overview"\nCollections: Companies\nChart Type: company:private\nSources: S&P Global',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -127,10 +127,10 @@ You can finetune this model on your own dataset.
 * Dataset: `validation`
 * Evaluated with [<code>CrossEncoderCorrelationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator)
-| Metric       | Value    |
-|:-------------|:---------|
-| pearson      | 0.8786   |
-| **spearman** | **0.88** |
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 7,994 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   |         | sentence_0                                                                                     | sentence_1                                                                                       | label                                                          |
   |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
   | type    | string                                                                                         | string                                                                                           | float                                                          |
-  | details | <ul><li>min: 7 characters</li><li>mean: 45.12 characters</li><li>max: 144 characters</li></ul> | <ul><li>min: 75 characters</li><li>mean: 185.28 characters</li><li>max: 444 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul> |
 * Samples:
-  | sentence_0                                                                                               | sentence_1                                                                                                                                                                                                                      | label             |
-  |:---------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
-  | <code>les différences entre les ETF Long-only, les ETF actifs, les ETF à effet de levier et Short</code> | <code>Title: "Billon et Cie SA Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "Billon et Cie SA"="Patrimoine et Commerce SA", "Overview"="Stock Overview"<br>Sources: S&P Global</code> | <code>0.0</code>  |
-  | <code>Premier League schedule</code>                                                                     | <code>Title: "Champions League Schedule"<br>Collections: Soccer<br>Chart Type: schedule:soccer_league_v2</code>                                                                                                                 | <code>0.25</code> |
-  | <code>nikki haley favorability</code>                                                                    | <code>Title: "Nikki Haley fame & popularity tracker (United States)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code>                               | <code>1.0</code>  |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
@@ -306,21 +306,25 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch | Step | Training Loss | validation_spearman |
-|:-----:|:----:|:-------------:|:-------------------:|
-| 0.4   | 100  | -             | 0.8045              |
-| 0.8   | 200  | -             | 0.8455              |
-| 1.0   | 250  | -             | 0.8493              |
-| 1.2   | 300  | -             | 0.8547              |
-| 1.6   | 400  | -             | 0.8622              |
-| 2.0   | 500  | 0.4442        | 0.8714              |
-| 2.4   | 600  | -             | 0.8719              |
-| 2.8   | 700  | -             | 0.8718              |
-| 3.0   | 750  | -             | 0.8760              |
-| 3.2   | 800  | -             | 0.8770              |
-| 3.6   | 900  | -             | 0.8795              |
-| 4.0   | 1000 | 0.3612        | 0.8775              |
-| 4.4   | 1100 | -             | 0.8800              |
 ### Framework Versions

 - cross-encoder
 - reranker
 - generated_from_trainer
+- dataset_size:8352
 - loss:BinaryCrossEntropyLoss
 base_model: Alibaba-NLP/gte-multilingual-reranker-base
 pipeline_tag: text-ranking
       type: validation
     metrics:
     - type: pearson
+      value: 0.8860059576990913
       name: Pearson
     - type: spearman
+      value: 0.8842438421497182
       name: Spearman
 ---
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
+    ['cas similaires entrepreneurs création entreprises apports intellectuels succès échecs', 'Title: "SNPS Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
+    ['Lakers Nuggets preview', 'Title: "Los Angeles Lakers Schedule"\nCollections: NBA\nChart Type: schedule:basketball_team_v2'],
+    ['Bitcoin performance compared to Altcoin performance in 2025', 'Title: "CBTC Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
+    ['Nvidia market capitalization', 'Title: "Nvidia Market Capitalization"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Nvidia"="NVIDIA Corporation", "Market Capitalization"="Valuation Overview"\nSources: S&P Global'],
+    ['scope of the EU cyber resilience act', 'Title: "League of Legends European Championship Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "League of Legends European Championship"="LEC, Inc.", "Overview"="Stock Overview"\nSources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
+    'cas similaires entrepreneurs création entreprises apports intellectuels succès échecs',
     [
+        'Title: "SNPS Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
+        'Title: "Los Angeles Lakers Schedule"\nCollections: NBA\nChart Type: schedule:basketball_team_v2',
+        'Title: "CBTC Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
+        'Title: "Nvidia Market Capitalization"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Nvidia"="NVIDIA Corporation", "Market Capitalization"="Valuation Overview"\nSources: S&P Global',
+        'Title: "League of Legends European Championship Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "League of Legends European Championship"="LEC, Inc.", "Overview"="Stock Overview"\nSources: S&P Global',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 * Dataset: `validation`
 * Evaluated with [<code>CrossEncoderCorrelationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator)
+| Metric       | Value      |
+|:-------------|:-----------|
+| pearson      | 0.886      |
+| **spearman** | **0.8842** |
 <!--
 ## Bias, Risks and Limitations
 #### Unnamed Dataset
+* Size: 8,352 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   |         | sentence_0                                                                                     | sentence_1                                                                                       | label                                                          |
   |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
   | type    | string                                                                                         | string                                                                                           | float                                                          |
+  | details | <ul><li>min: 9 characters</li><li>mean: 45.67 characters</li><li>max: 174 characters</li></ul> | <ul><li>min: 76 characters</li><li>mean: 186.96 characters</li><li>max: 350 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.47</li><li>max: 1.0</li></ul> |
 * Samples:
+  | sentence_0                                                                                         | sentence_1                                                                                                                                                                                      | label             |
+  |:---------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
+  | <code>cas similaires entrepreneurs création entreprises apports intellectuels succès échecs</code> | <code>Title: "SNPS Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "SNPS"="Synopsys, Inc.", "Overview"="Stock Overview"<br>Sources: S&P Global</code>    | <code>0.5</code>  |
+  | <code>Lakers Nuggets preview</code>                                                                | <code>Title: "Los Angeles Lakers Schedule"<br>Collections: NBA<br>Chart Type: schedule:basketball_team_v2</code>                                                                                | <code>0.75</code> |
+  | <code>Bitcoin performance compared to Altcoin performance in 2025</code>                           | <code>Title: "CBTC Overview"<br>Collections: Companies<br>Chart Type: company:finance<br>Canonical forms: "CBTC"="XTRA Bitcoin Inc.", "Overview"="Stock Overview"<br>Sources: S&P Global</code> | <code>0.0</code>  |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
 </details>
 ### Training Logs
+| Epoch  | Step | Training Loss | validation_spearman |
+|:------:|:----:|:-------------:|:-------------------:|
+| 0.3831 | 100  | -             | 0.8141              |
+| 0.7663 | 200  | -             | 0.8486              |
+| 1.0    | 261  | -             | 0.8624              |
+| 1.1494 | 300  | -             | 0.8641              |
+| 1.5326 | 400  | -             | 0.8683              |
+| 1.9157 | 500  | 0.4409        | 0.8728              |
+| 2.0    | 522  | -             | 0.8732              |
+| 2.2989 | 600  | -             | 0.8731              |
+| 2.6820 | 700  | -             | 0.8803              |
+| 3.0    | 783  | -             | 0.8804              |
+| 3.0651 | 800  | -             | 0.8809              |
+| 3.4483 | 900  | -             | 0.8800              |
+| 3.8314 | 1000 | 0.3641        | 0.8825              |
+| 4.0    | 1044 | -             | 0.8836              |
+| 4.2146 | 1100 | -             | 0.8826              |
+| 4.5977 | 1200 | -             | 0.8821              |
+| 4.9808 | 1300 | -             | 0.8842              |
 ### Framework Versions
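
The fractional epochs in the new training log are consistent with 261 optimizer steps per epoch (the `1.0 | 261` row). A minimal sketch that recomputes the logged values, assuming the trainer reports the epoch as step / steps per epoch rounded to four decimals; the old log's `0.4 | 100` row fits the same rule with 250 steps per epoch. `epoch_at` is a hypothetical helper.

```python
# Sketch: recompute the fractional "Epoch" column of the new training log,
# assuming epoch = step / steps_per_epoch, rounded to 4 decimals.
STEPS_PER_EPOCH = 261  # taken from the "1.0 | 261" row of the new log

# (step, epoch) pairs copied from the new Training Logs table
LOGGED = [
    (100, 0.3831), (200, 0.7663), (300, 1.1494), (400, 1.5326),
    (500, 1.9157), (600, 2.2989), (700, 2.6820), (800, 3.0651),
    (900, 3.4483), (1000, 3.8314), (1100, 4.2146), (1200, 4.5977),
    (1300, 4.9808),
]


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch reached after `step` optimizer steps, rounded to 4 dp."""
    return round(step / steps_per_epoch, 4)


# Collect any logged epoch that the assumed rule does not reproduce
mismatches = [(s, e, epoch_at(s)) for s, e in LOGGED if epoch_at(s) != e]
print("mismatches:", mismatches)  # mismatches: []
```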

eval/CrossEncoderCorrelationEvaluator_validation_results.csv CHANGED

@@ -1,6 +1,6 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,250,0.8519500032800031,0.8492531733110029
-2.0,500,0.8742648640961522,0.8714467607534195
-3.0,750,0.8746575213391334,0.8760325627001279
-4.0,1000,0.8778473258529178,0.8774898613680182
-5.0,1250,0.8791097938993816,0.879810679039627

 epoch,steps,Pearson_Correlation,Spearman_Correlation
+1.0,261,0.8664252648440269,0.8624316344419658
+2.0,522,0.876051000065144,0.8731512392307483
+3.0,783,0.8816187173145336,0.8804230133811932
+4.0,1044,0.8854817235935349,0.8835956509201133
+5.0,1305,0.8859959671748322,0.8842324890429593
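
To read the evaluator change epoch by epoch, the two CSV versions can be loaded and compared. A minimal sketch using the standard-library `csv` module, with the rows copied from the diff above; because the step columns differ between the runs (250 vs 261 steps per epoch), rows are aligned by epoch rather than by step. `load` is a hypothetical helper.

```python
import csv
import io

# Rows copied from the old and new versions of
# eval/CrossEncoderCorrelationEvaluator_validation_results.csv
OLD_CSV = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,250,0.8519500032800031,0.8492531733110029
2.0,500,0.8742648640961522,0.8714467607534195
3.0,750,0.8746575213391334,0.8760325627001279
4.0,1000,0.8778473258529178,0.8774898613680182
5.0,1250,0.8791097938993816,0.879810679039627
"""

NEW_CSV = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,261,0.8664252648440269,0.8624316344419658
2.0,522,0.876051000065144,0.8731512392307483
3.0,783,0.8816187173145336,0.8804230133811932
4.0,1044,0.8854817235935349,0.8835956509201133
5.0,1305,0.8859959671748322,0.8842324890429593
"""


def load(text: str) -> dict[float, dict[str, float]]:
    """Index evaluator rows by epoch, since step counts differ between runs."""
    return {
        float(row["epoch"]): {k: float(v) for k, v in row.items() if k != "epoch"}
        for row in csv.DictReader(io.StringIO(text))
    }


old, new = load(OLD_CSV), load(NEW_CSV)

# Per-epoch change (new minus old) for both correlation metrics
deltas = {
    epoch: {
        "pearson": new[epoch]["Pearson_Correlation"] - old[epoch]["Pearson_Correlation"],
        "spearman": new[epoch]["Spearman_Correlation"] - old[epoch]["Spearman_Correlation"],
    }
    for epoch in sorted(old)
}
for epoch, d in deltas.items():
    print(f"epoch {epoch:.0f}: pearson {d['pearson']:+.4f}, spearman {d['spearman']:+.4f}")
```

With these rows the new run is ahead at every epoch on both metrics; at epoch 5 the Spearman gain is roughly +0.0044.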

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:647a48115afe17f6ac2dd4305a0ad9a990f696b16b25b1a68bbc7670ff1749f7
 size 1223854204

 version https://git-lfs.github.com/spec/v1
+oid sha256:eb79ac26ead9212291d08a2777f15be6061a36463e79e6fc16768d9a6756bf44
 size 1223854204
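
Only the `oid` line of the Git LFS pointer changed: the SHA-256 of the weights is new while `size` stays at 1223854204 bytes, which is what retraining the same architecture would be expected to produce. A minimal sketch for checking a local `model.safetensors` against the new pointer, assuming the three-line `version`/`oid`/`size` layout shown above; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the demo runs on a small temporary file instead of the roughly 1.2 GB download.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer into its key/value lines (version, oid, size)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Compare a file's size and chunked SHA-256 digest with an LFS pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(pointer["size"]):
        return False  # cheap size check before hashing a large file
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer text copied from the new side of the diff
NEW_POINTER = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:eb79ac26ead9212291d08a2777f15be6061a36463e79e6fc16768d9a6756bf44
size 1223854204
""")
print(NEW_POINTER["size"])  # 1223854204

# Demo on a tiny temporary file with a pointer built from its own content
payload = b"chart-reranker"
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(payload)
    demo_pointer = {
        "oid": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "size": str(len(payload)),
    }
    demo_ok = matches_pointer(demo, demo_pointer)
print(demo_ok)  # True
```

Against a real download, `matches_pointer(Path("model.safetensors"), NEW_POINTER)` would perform the same check.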

training_info.txt CHANGED

@@ -1,5 +1,5 @@
 Base Model: Alibaba-NLP/gte-multilingual-reranker-base
-Training Samples: 7994
 Epochs: 5
 Batch Size: 32
 Learning Rate: 2e-05

 Base Model: Alibaba-NLP/gte-multilingual-reranker-base
+Training Samples: 8352
 Epochs: 5
 Batch Size: 32
 Learning Rate: 2e-05
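
The step counts in the evaluator CSV (250 and 261 per epoch) follow from the sample counts and batch size in `training_info.txt`. A short check, assuming no gradient accumulation and that the final partial batch is kept (neither is stated in the file): 7994 / 32 = 249.8, which rounds up to the logged 250, while 8352 / 32 is exactly 261. `steps_per_epoch` is a hypothetical helper.

```python
import math

BATCH_SIZE = 32  # from training_info.txt
EPOCHS = 5       # from training_info.txt


def steps_per_epoch(num_samples: int, batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps per epoch, assuming the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


for label, samples in (("old", 7994), ("new", 8352)):
    per_epoch = steps_per_epoch(samples)
    print(f"{label}: {samples} samples -> {per_epoch} steps/epoch, {per_epoch * EPOCHS} total")
# old: 7994 samples -> 250 steps/epoch, 1250 total
# new: 8352 samples -> 261 steps/epoch, 1305 total
```

The totals (1250 and 1305) match the final `steps` values in the old and new evaluator CSVs; dropping the partial batch would instead give 249 steps per epoch for the old run.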