TakoData/chart-reranker

@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:6851
 - loss:BinaryCrossEntropyLoss
 base_model: cross-encoder/ms-marco-MiniLM-L6-v2
 pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
       type: validation
     metrics:
     - type: pearson
-      value: 0.6742730018723011
       name: Pearson
     - type: spearman
-      value: 0.5158175772359095
       name: Spearman
 ---
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-    ['According to a study by the Global Sustainable Tourism Council, by what percentage can sustainable tourism practices increase visitor satisfaction?', 'Title: "Life satisfaction, measured weekly (United Kingdom)"\n  Collections: YouGov Trackers\n  Datasets: YouGovTrackerValueV2\n  Chart Type: survey:timeseries\n  Sources: YouGov'],
-    ['Scoreline for Al‑Bayraq W vs Al‑Riyadh W (WFDL)', 'Title: "Grainger Overview, CBSE:IAM Overview"\n  Collections: Companies\n  Datasets: InstrumentClosePrice1Day\n  Chart Type: timeseries'],
-    ["According to the article 'Top 3 Higher Education Trends to Watch in 2025' by Hanover Research, what percentage of prospective college students in the U.S. report feeling 'not at all familiar' or only 'slightly familiar' with the application process?", 'Title: "AirTanker Services Limited Percentage"\n  Collections: Companies\n  Chart Type: company_card\n  Company: name=ATS Corporation, aliases=[\'ATS Automation Tooling Systems Inc.\', \'Ats Corp\', \'ATS\']\n  Sources: S&P Global'],
-    ["When did RetailMeNot launch the '5 to Buy' event?", 'Title: "Art - past 3 months (United States)"\n  Collections: YouGov Trackers\n  Datasets: YouGovTrackerValueV2\n  Chart Type: survey:timeseries\n  Sources: YouGov]'],
-    ["When was the article '5 Key Trends To Shape Your Business Strategy For 2025' by IESE Business School published on Forbes?", 'Title: "Business Coach Overview"\n  Collections: Companies\n  Chart Type: company_card\n  Company: name=Business Coach Inc., aliases=[\'Business Coach\']\n  Sources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-    'According to a study by the Global Sustainable Tourism Council, by what percentage can sustainable tourism practices increase visitor satisfaction?',
     [
-        'Title: "Life satisfaction, measured weekly (United Kingdom)"\n  Collections: YouGov Trackers\n  Datasets: YouGovTrackerValueV2\n  Chart Type: survey:timeseries\n  Sources: YouGov',
-        'Title: "Grainger Overview, CBSE:IAM Overview"\n  Collections: Companies\n  Datasets: InstrumentClosePrice1Day\n  Chart Type: timeseries',
-        'Title: "AirTanker Services Limited Percentage"\n  Collections: Companies\n  Chart Type: company_card\n  Company: name=ATS Corporation, aliases=[\'ATS Automation Tooling Systems Inc.\', \'Ats Corp\', \'ATS\']\n  Sources: S&P Global',
-        'Title: "Art - past 3 months (United States)"\n  Collections: YouGov Trackers\n  Datasets: YouGovTrackerValueV2\n  Chart Type: survey:timeseries\n  Sources: YouGov]',
-        'Title: "Business Coach Overview"\n  Collections: Companies\n  Chart Type: company_card\n  Company: name=Business Coach Inc., aliases=[\'Business Coach\']\n  Sources: S&P Global',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -129,8 +129,8 @@ You can finetune this model on your own dataset.
 | Metric       | Value      |
 |:-------------|:-----------|
-| pearson      | 0.6743     |
-| **spearman** | **0.5158** |
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 6,851 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                                     | sentence_1                                                                                       | label                                                          |
-  |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
-  | type    | string                                                                                         | string                                                                                           | float                                                          |
-  | details | <ul><li>min: 7 characters</li><li>mean: 99.0 characters</li><li>max: 2253 characters</li></ul> | <ul><li>min: 79 characters</li><li>mean: 184.27 characters</li><li>max: 716 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.06</li><li>max: 1.0</li></ul> |
 * Samples:
-  | sentence_0                                                                                                                                                                                                                                                             | sentence_1                                                                                                                                                                                                                                            | label            |
-  |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
-  | <code>According to a study by the Global Sustainable Tourism Council, by what percentage can sustainable tourism practices increase visitor satisfaction?</code>                                                                                                       | <code>Title: "Life satisfaction, measured weekly (United Kingdom)"<br>  Collections: YouGov Trackers<br>  Datasets: YouGovTrackerValueV2<br>  Chart Type: survey:timeseries<br>  Sources: YouGov</code>                                               | <code>0.0</code> |
-  | <code>Scoreline for Al‑Bayraq W vs Al‑Riyadh W (WFDL)</code>                                                                                                                                                                                                           | <code>Title: "Grainger Overview, CBSE:IAM Overview"<br>  Collections: Companies<br>  Datasets: InstrumentClosePrice1Day<br>  Chart Type: timeseries</code>                                                                                            | <code>0.0</code> |
-  | <code>According to the article 'Top 3 Higher Education Trends to Watch in 2025' by Hanover Research, what percentage of prospective college students in the U.S. report feeling 'not at all familiar' or only 'slightly familiar' with the application process?</code> | <code>Title: "AirTanker Services Limited Percentage"<br>  Collections: Companies<br>  Chart Type: company_card<br>  Company: name=ATS Corporation, aliases=['ATS Automation Tooling Systems Inc.', 'Ats Corp', 'ATS']<br>  Sources: S&P Global</code> | <code>0.0</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
@@ -175,9 +175,8 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 - `eval_strategy`: steps
-- `per_device_train_batch_size`: 32
-- `per_device_eval_batch_size`: 32
-- `num_train_epochs`: 1
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
@@ -186,8 +185,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`: 32
-- `per_device_eval_batch_size`: 32
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -199,7 +198,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`: 1
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -305,12 +304,23 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch  | Step | validation_spearman |
-|:------:|:----:|:-------------------:|
-| 0.2326 | 50   | 0.3960              |
-| 0.4651 | 100  | 0.4804              |
-| 0.6977 | 150  | 0.5031              |
-| 0.9302 | 200  | 0.5158              |
 ### Framework Versions

 - cross-encoder
 - reranker
 - generated_from_trainer
+- dataset_size:8000
 - loss:BinaryCrossEntropyLoss
 base_model: cross-encoder/ms-marco-MiniLM-L6-v2
 pipeline_tag: text-ranking
       type: validation
     metrics:
     - type: pearson
+      value: 0.8481096700155641
       name: Pearson
     - type: spearman
+      value: 0.8528646396544212
       name: Spearman
 ---
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
+    ['prix blé tendre bio Indre et Loire 2025', 'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices'],
+    ['oil prices', 'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager'],
+    ['Nvidia earnings AI chip demand', 'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager'],
+    ['show me tesla stock performance 2020 to 2025', 'Title: "Manakoa Services Corporation Stock Performance"\n  Collections: Companies\n  Chart Type: company:private\n  Sources: S&P Global'],
+    ['Samsung A56 5G mémoire', 'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
+    'prix blé tendre bio Indre et Loire 2025',
     [
+        'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices',
+        'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager',
+        'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager',
+        'Title: "Manakoa Services Corporation Stock Performance"\n  Collections: Companies\n  Chart Type: company:private\n  Sources: S&P Global',
+        'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 | Metric       | Value      |
 |:-------------|:-----------|
+| pearson      | 0.8481     |
+| **spearman** | **0.8529** |
 <!--
 ## Bias, Risks and Limitations
 #### Unnamed Dataset
+* Size: 8,000 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                                      | sentence_1                                                                                       | label                                                          |
+  |:--------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+  | type    | string                                                                                          | string                                                                                           | float                                                          |
+  | details | <ul><li>min: 3 characters</li><li>mean: 51.78 characters</li><li>max: 1024 characters</li></ul> | <ul><li>min: 49 characters</li><li>mean: 136.27 characters</li><li>max: 716 characters</li></ul> | <ul><li>min: 0.2</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
 * Samples:
+  | sentence_0                                           | sentence_1                                                                                                                                               | label            |
+  |:-----------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>prix blé tendre bio Indre et Loire 2025</code> | <code>Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices</code>                                                         | <code>0.4</code> |
+  | <code>oil prices</code>                              | <code>Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager</code> | <code>0.8</code> |
+  | <code>Nvidia earnings AI chip demand</code>          | <code>Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager</code>                       | <code>0.4</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
 #### Non-Default Hyperparameters
 - `eval_strategy`: steps
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
+- `num_train_epochs`: 3
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 </details>
 ### Training Logs
+| Epoch | Step | Training Loss | validation_spearman |
+|:-----:|:----:|:-------------:|:-------------------:|
+| 0.2   | 100  | -             | 0.7038              |
+| 0.4   | 200  | -             | 0.7816              |
+| 0.6   | 300  | -             | 0.8134              |
+| 0.8   | 400  | -             | 0.8216              |
+| 1.0   | 500  | 0.8021        | 0.8296              |
+| 1.2   | 600  | -             | 0.8358              |
+| 1.4   | 700  | -             | 0.8418              |
+| 1.6   | 800  | -             | 0.8418              |
+| 1.8   | 900  | -             | 0.8478              |
+| 2.0   | 1000 | 0.5726        | 0.8471              |
+| 2.2   | 1100 | -             | 0.8487              |
+| 2.4   | 1200 | -             | 0.8497              |
+| 2.6   | 1300 | -             | 0.8522              |
+| 2.8   | 1400 | -             | 0.8523              |
+| 3.0   | 1500 | 0.5616        | 0.8529              |
 ### Framework Versions

eval/CrossEncoderCorrelationEvaluator_validation_results.csv

@@ -1,2 +1,4 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,215,0.6765057885942694,0.5160340950125839

 epoch,steps,Pearson_Correlation,Spearman_Correlation
+1.0,500,0.8334498280984426,0.8296374514172629
+2.0,1000,0.8444343598056561,0.8471494664684638
+3.0,1500,0.8481096700155641,0.8528646396544212
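
The `Pearson_Correlation` and `Spearman_Correlation` columns above measure how well predicted scores agree with the gold labels: Pearson on the raw values, Spearman on their ranks. The following is a minimal dependency-free sketch of both statistics, not the evaluator's actual implementation. The helper names and the toy `labels`/`scores` values are made up for illustration and do not come from the validation set.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gold labels and model scores (illustrative only).
labels = [0.4, 0.8, 0.4, 0.2, 1.0]
scores = [0.35, 0.9, 0.5, 0.1, 0.7]
print(pearson(labels, scores), spearman(labels, scores))
```

Because Spearman only looks at ordering, it is often the more relevant number for a reranker, which may be why the model card bolds it.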

model.safetensors

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:04a6402495d00a3b98e802a2e1ce50fada050156fa8ac9906d5c561e9fd2aec2
 size 90866412

 version https://git-lfs.github.com/spec/v1
+oid sha256:93357cfe857f758d0ab0429d2076e1599cd7661ab2cc03f999bede0267e1167c
 size 90866412

training_info.txt

@@ -1,6 +1,6 @@
 Base Model: cross-encoder/ms-marco-MiniLM-L6-v2
-Training Samples: 6851
-Epochs: 1
-Batch Size: 32
 Learning Rate: 2e-05
 Max Length: 512

 Base Model: cross-encoder/ms-marco-MiniLM-L6-v2
+Training Samples: 8000
+Epochs: 3
+Batch Size: 16
 Learning Rate: 2e-05
 Max Length: 512
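
The step counts in the training logs and the evaluator CSV appear to follow directly from the sample counts and batch sizes recorded here. A short sketch of that arithmetic, assuming a single device and that the final partial batch is kept (`gradient_accumulation_steps` is 1 in the hyperparameters); the function name is illustrative:

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps in one epoch, assuming one device, no gradient
    accumulation, and the last partial batch not being dropped."""
    return math.ceil(num_samples / batch_size)

old_steps = steps_per_epoch(6851, 32)  # previous run: 1 epoch
new_steps = steps_per_epoch(8000, 16)  # this run: 3 epochs
new_total = new_steps * 3

print(old_steps, new_steps, new_total)

# Fractional epoch shown in the logs = step / steps per epoch.
print(round(50 / old_steps, 4), round(100 / new_steps, 4))
```

Under these assumptions the results agree with the diff: 215 steps at epoch 1.0 in the old CSV, 500 steps per epoch and 1500 in total for the new run, and logged epochs of 0.2326 (step 50, old) and 0.2 (step 100, new).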