TakoData/chart-reranker

@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:24504
 - loss:BinaryCrossEntropyLoss
 base_model: Alibaba-NLP/gte-multilingual-reranker-base
 pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
       type: validation
     metrics:
     - type: pearson
-      value: 0.8721120209782917
       name: Pearson
     - type: spearman
-      value: 0.8685098375943734
       name: Spearman
 ---
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-    ['include the popular publications as well', 'Title: "Americans\' Library use - past 3 months (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
-    ['Give it a good research topic', 'Title: "The most important issues facing the country (United Kingdom)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
-    ['When and where are the Denver Broncos playing the Kansas City Chiefs?', 'Title: "Denver Broncos at Kansas City Chiefs"\nCollections: Football\nChart Type: game_score:football'],
-    ['49ers vs Seahawks', 'Title: "Seahawk Deep Ocean Technology, Inc. Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Overview"="Stock Overview"\nSources: S&P Global'],
-    ['Comparative review of JBL vs Marshall 2025 Bluetooth speakers', 'Title: "B&C Speakers Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "B&C Speakers"="B&C Speakers S.p.A.", "Overview"="Stock Overview"\nSources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-    'include the popular publications as well',
     [
-        'Title: "Americans\' Library use - past 3 months (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
-        'Title: "The most important issues facing the country (United Kingdom)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
-        'Title: "Denver Broncos at Kansas City Chiefs"\nCollections: Football\nChart Type: game_score:football',
-        'Title: "Seahawk Deep Ocean Technology, Inc. Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "Overview"="Stock Overview"\nSources: S&P Global',
-        'Title: "B&C Speakers Overview"\nCollections: Companies\nChart Type: company:finance\nCanonical forms: "B&C Speakers"="B&C Speakers S.p.A.", "Overview"="Stock Overview"\nSources: S&P Global',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -129,8 +129,8 @@ You can finetune this model on your own dataset.
 | Metric       | Value      |
 |:-------------|:-----------|
-| pearson      | 0.8721     |
-| **spearman** | **0.8685** |
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 24,504 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   |         | sentence_0                                                                                     | sentence_1                                                                                       | label                                                          |
   |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
   | type    | string                                                                                         | string                                                                                           | float                                                          |
-  | details | <ul><li>min: 2 characters</li><li>mean: 86.83 characters</li><li>max: 993 characters</li></ul> | <ul><li>min: 77 characters</li><li>mean: 169.16 characters</li><li>max: 360 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.42</li><li>max: 1.0</li></ul> |
 * Samples:
-  | sentence_0                                                                         | sentence_1                                                                                                                                                                                                | label            |
-  |:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
-  | <code>include the popular publications as well</code>                              | <code>Title: "Americans' Library use - past 3 months (United States)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code>        | <code>0.5</code> |
-  | <code>Give it a good research topic</code>                                         | <code>Title: "The most important issues facing the country (United Kingdom)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code> | <code>1.0</code> |
-  | <code>When and where are the Denver Broncos playing the Kansas City Chiefs?</code> | <code>Title: "Denver Broncos at Kansas City Chiefs"<br>Collections: Football<br>Chart Type: game_score:football</code>                                                                                    | <code>1.0</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
@@ -308,38 +308,42 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch  | Step | Training Loss | validation_spearman |
 |:------:|:----:|:-------------:|:-------------------:|
-| 0.1305 | 100  | -             | 0.7594              |
-| 0.2611 | 200  | -             | 0.7951              |
-| 0.3916 | 300  | -             | 0.8050              |
-| 0.5222 | 400  | -             | 0.8200              |
-| 0.6527 | 500  | 0.468         | 0.8290              |
-| 0.7833 | 600  | -             | 0.8331              |
-| 0.9138 | 700  | -             | 0.8347              |
-| 1.0    | 766  | -             | 0.8434              |
-| 1.0444 | 800  | -             | 0.8432              |
-| 1.1749 | 900  | -             | 0.8467              |
-| 1.3055 | 1000 | 0.4135        | 0.8473              |
-| 1.4360 | 1100 | -             | 0.8475              |
-| 1.5666 | 1200 | -             | 0.8535              |
-| 1.6971 | 1300 | -             | 0.8518              |
-| 1.8277 | 1400 | -             | 0.8571              |
-| 1.9582 | 1500 | 0.3747        | 0.8577              |
-| 2.0    | 1532 | -             | 0.8556              |
-| 2.0888 | 1600 | -             | 0.8587              |
-| 2.2193 | 1700 | -             | 0.8609              |
-| 2.3499 | 1800 | -             | 0.8612              |
-| 2.4804 | 1900 | -             | 0.8619              |
-| 2.6110 | 2000 | 0.3515        | 0.8626              |
-| 2.7415 | 2100 | -             | 0.8622              |
-| 2.8721 | 2200 | -             | 0.8653              |
-| 3.0    | 2298 | -             | 0.8656              |
-| 3.0026 | 2300 | -             | 0.8656              |
-| 3.1332 | 2400 | -             | 0.8643              |
-| 3.2637 | 2500 | 0.3421        | 0.8646              |
-| 3.3943 | 2600 | -             | 0.8654              |
-| 3.5248 | 2700 | -             | 0.8666              |
-| 3.6554 | 2800 | -             | 0.8640              |
-| 3.7859 | 2900 | -             | 0.8685              |
 ### Framework Versions

 - cross-encoder
 - reranker
 - generated_from_trainer
+- dataset_size:24588
 - loss:BinaryCrossEntropyLoss
 base_model: Alibaba-NLP/gte-multilingual-reranker-base
 pipeline_tag: text-ranking
       type: validation
     metrics:
     - type: pearson
+      value: 0.875500492479389
       name: Pearson
     - type: spearman
+      value: 0.8709281334702662
       name: Spearman
 ---
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
+    ['What is the average rent price in Canada?', 'Title: "How many hours do Americans sleep at night (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
+    ['for the topic digital foortprint and identity use "\t " to give a description on if there was an provided teaching materials for this activity.', 'Title: "Different ways Americans define gender for someone who says they are transgender (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
+    ['Which U.S. cities or counties have the highest rates of aggravated assault involving a deadly weapon per 100,000 residents?', 'Title: "U.S. Bank Overview, CITY Overview"\nCollections: Companies\nDatasets: InstrumentClosePrice1Day\nChart Type: timeseries:eav_v3\nCanonical forms: "U.S. Bancorp"="closing_price", "Club De Futbol Intercity Sad"="closing_price"'],
+    ['Black identity topics', 'Title: "Different ways Americans define gender for someone who says they are transgender (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov'],
+    ['Which company in the Interactive Media and Services category has the highest market capitalization?', 'Title: "DigiPlus Interactive. Capital Expenditure (Quarterly)"\nCollections: Companies\nDatasets: StandardIncomeStatement\nChart Type: timeseries:eav_v3\nCanonical forms: "Capital Expenditure"="capital_expenditure"\nSources: S&P Global'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
+    'What is the average rent price in Canada?',
     [
+        'Title: "How many hours do Americans sleep at night (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
+        'Title: "Different ways Americans define gender for someone who says they are transgender (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
+        'Title: "U.S. Bank Overview, CITY Overview"\nCollections: Companies\nDatasets: InstrumentClosePrice1Day\nChart Type: timeseries:eav_v3\nCanonical forms: "U.S. Bancorp"="closing_price", "Club De Futbol Intercity Sad"="closing_price"',
+        'Title: "Different ways Americans define gender for someone who says they are transgender (United States)"\nCollections: YouGov Trackers\nDatasets: YouGovTrackerValueV2\nChart Type: survey:timeseries\nSources: YouGov',
+        'Title: "DigiPlus Interactive. Capital Expenditure (Quarterly)"\nCollections: Companies\nDatasets: StandardIncomeStatement\nChart Type: timeseries:eav_v3\nCanonical forms: "Capital Expenditure"="capital_expenditure"\nSources: S&P Global',
     ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 | Metric       | Value      |
 |:-------------|:-----------|
+| pearson      | 0.8755     |
+| **spearman** | **0.8709** |
 <!--
 ## Bias, Risks and Limitations
 #### Unnamed Dataset
+* Size: 24,588 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
   |         | sentence_0                                                                                     | sentence_1                                                                                       | label                                                          |
   |:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
   | type    | string                                                                                         | string                                                                                           | float                                                          |
+  | details | <ul><li>min: 3 characters</li><li>mean: 88.65 characters</li><li>max: 998 characters</li></ul> | <ul><li>min: 73 characters</li><li>mean: 169.97 characters</li><li>max: 352 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.41</li><li>max: 1.0</li></ul> |
 * Samples:
+  | sentence_0                                                                                                                                                  | sentence_1                                                                                                                                                                                                                                                  | label             |
+  |:------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------|
+  | <code>What is the average rent price in Canada?</code>                                                                                                      | <code>Title: "How many hours do Americans sleep at night (United States)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code>                                                      | <code>0.0</code>  |
+  | <code>for the topic digital foortprint and identity use "	 " to give a description on if there was an provided teaching materials for this activity.</code> | <code>Title: "Different ways Americans define gender for someone who says they are transgender (United States)"<br>Collections: YouGov Trackers<br>Datasets: YouGovTrackerValueV2<br>Chart Type: survey:timeseries<br>Sources: YouGov</code>                | <code>0.25</code> |
+  | <code>Which U.S. cities or counties have the highest rates of aggravated assault involving a deadly weapon per 100,000 residents?</code>                    | <code>Title: "U.S. Bank Overview, CITY Overview"<br>Collections: Companies<br>Datasets: InstrumentClosePrice1Day<br>Chart Type: timeseries:eav_v3<br>Canonical forms: "U.S. Bancorp"="closing_price", "Club De Futbol Intercity Sad"="closing_price"</code> | <code>0.0</code>  |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
   ```json
   {
 ### Training Logs
 | Epoch  | Step | Training Loss | validation_spearman |
 |:------:|:----:|:-------------:|:-------------------:|
+| 0.1300 | 100  | -             | 0.7581              |
+| 0.2601 | 200  | -             | 0.7928              |
+| 0.3901 | 300  | -             | 0.8105              |
+| 0.5202 | 400  | -             | 0.8252              |
+| 0.6502 | 500  | 0.4726        | 0.8306              |
+| 0.7802 | 600  | -             | 0.8338              |
+| 0.9103 | 700  | -             | 0.8398              |
+| 1.0    | 769  | -             | 0.8406              |
+| 1.0403 | 800  | -             | 0.8412              |
+| 1.1704 | 900  | -             | 0.8479              |
+| 1.3004 | 1000 | 0.4027        | 0.8525              |
+| 1.4304 | 1100 | -             | 0.8521              |
+| 1.5605 | 1200 | -             | 0.8549              |
+| 1.6905 | 1300 | -             | 0.8591              |
+| 1.8205 | 1400 | -             | 0.8619              |
+| 1.9506 | 1500 | 0.3793        | 0.8614              |
+| 2.0    | 1538 | -             | 0.8627              |
+| 2.0806 | 1600 | -             | 0.8623              |
+| 2.2107 | 1700 | -             | 0.8641              |
+| 2.3407 | 1800 | -             | 0.8598              |
+| 2.4707 | 1900 | -             | 0.8655              |
+| 2.6008 | 2000 | 0.3534        | 0.8641              |
+| 2.7308 | 2100 | -             | 0.8651              |
+| 2.8609 | 2200 | -             | 0.8656              |
+| 2.9909 | 2300 | -             | 0.8668              |
+| 3.0    | 2307 | -             | 0.8660              |
+| 3.1209 | 2400 | -             | 0.8678              |
+| 3.2510 | 2500 | 0.3387        | 0.8654              |
+| 3.3810 | 2600 | -             | 0.8654              |
+| 3.5111 | 2700 | -             | 0.8667              |
+| 3.6411 | 2800 | -             | 0.8676              |
+| 3.7711 | 2900 | -             | 0.8674              |
+| 3.9012 | 3000 | 0.3335        | 0.8704              |
+| 4.0    | 3076 | -             | 0.8703              |
+| 4.0312 | 3100 | -             | 0.8698              |
+| 4.1612 | 3200 | -             | 0.8709              |
 ### Framework Versions

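The training samples carry fractional labels (0.0, 0.25, 0.5, 1.0), so `BinaryCrossEntropyLoss` is fit to soft targets rather than strict 0/1 relevance. Below is a minimal, dependency-free sketch of that objective. It assumes the loss is standard binary cross-entropy on the cross-encoder's raw logit, which is the form PyTorch's `BCEWithLogitsLoss` computes, and that no `pos_weight` is set. The parameter JSON is truncated in this diff, so both points are assumptions. The helper names `bce_with_logits` and `mean_bce` are hypothetical.

```python
import math


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit z with a (possibly soft) label y in [0, 1].

    Mathematically equal to -(y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))),
    rewritten as max(z, 0) - z*y + log(1 + exp(-|z|)) so large |z| cannot overflow.
    Assumption: no pos_weight, matching an unweighted BCEWithLogitsLoss.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-pair loss over a batch (hypothetical helper)."""
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)


# Labels taken from the sample rows above; the logits are made-up illustration values.
print(mean_bce([-2.0, -1.1, 0.0, 3.0], [0.0, 0.25, 0.5, 1.0]))
```

For a soft label y, the per-pair loss is smallest where sigmoid(z) = y. A pair labelled 0.25 is therefore pushed toward a logit of log(1/3) ≈ -1.10, not toward minus infinity as a hard 0 label would be.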
eval/CrossEncoderCorrelationEvaluator_validation_results.csv CHANGED

@@ -1,6 +1,6 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,766,0.8453028536443531,0.8434351098924865
-2.0,1532,0.8574271674817566,0.8556349102862147
-3.0,2298,0.8687755325286843,0.865637110569002
-4.0,3064,0.8698030506575616,0.8669249926545327
-5.0,3830,0.8701775404822807,0.8675087793394471

 epoch,steps,Pearson_Correlation,Spearman_Correlation
+1.0,769,0.8422640800344011,0.8405737087278944
+2.0,1538,0.8671876184014233,0.86267528277904
+3.0,2307,0.8685505786860791,0.8660158658561949
+4.0,3076,0.8746765785538436,0.870307593558791
+5.0,3845,0.8738481049232419,0.8697669534338517
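The new-side evaluator rows peak at epoch 4 and dip slightly at epoch 5. This short sketch finds the best end-of-epoch row with the standard library `csv` module. The CSV text is copied verbatim from the new side of the diff above. The `best_row` helper is hypothetical and not part of sentence-transformers.

```python
import csv
import io

# New-side rows of eval/CrossEncoderCorrelationEvaluator_validation_results.csv (from the diff above)
RESULTS_CSV = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,769,0.8422640800344011,0.8405737087278944
2.0,1538,0.8671876184014233,0.86267528277904
3.0,2307,0.8685505786860791,0.8660158658561949
4.0,3076,0.8746765785538436,0.870307593558791
5.0,3845,0.8738481049232419,0.8697669534338517
"""


def best_row(csv_text, metric="Spearman_Correlation"):
    """Return (epoch, step, score) for the row with the highest value of `metric`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: float(r[metric]))
    return float(best["epoch"]), int(best["steps"]), float(best[metric])


epoch, step, score = best_row(RESULTS_CSV)
print(f"best epoch={epoch} step={step} spearman={score:.4f}")
```

Passing `metric="Pearson_Correlation"` also selects epoch 4 (step 3076) for this run.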

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:06035d5c262912d8a1e0fd97e71fc51f0e84c66ed6a5f7e14862da0e88600252
 size 1223854204

 version https://git-lfs.github.com/spec/v1
+oid sha256:29639f0397bff1fa96e9ad2515d5e6c6d3ba99fd5ae7f2ce6831f63120328e1b
 size 1223854204

training_info.txt CHANGED

@@ -1,5 +1,5 @@
 Base Model: Alibaba-NLP/gte-multilingual-reranker-base
-Training Samples: 24504
 Epochs: 5
 Batch Size: 32
 Learning Rate: 2e-05

 Base Model: Alibaba-NLP/gte-multilingual-reranker-base
+Training Samples: 24588
 Epochs: 5
 Batch Size: 32
 Learning Rate: 2e-05
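The step counts in the evaluator CSV and the fractional `Epoch` column in the Training Logs both follow from the sample count and batch size above. This sketch reproduces them. It assumes the final partial batch is kept (steps per epoch = ceil(samples / batch size)) and that there is no gradient accumulation or multi-device sharding. The assumption is inferred from the logged numbers being consistent with it. The helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # Assumption: the last, partial batch still counts as one optimizer step
    return math.ceil(num_samples / batch_size)


def epoch_at_step(step: int, spe: int) -> float:
    # The Training Logs table reports step / steps_per_epoch to 4 decimals
    return round(step / spe, 4)


EPOCHS, BATCH_SIZE = 5, 32
for name, samples in (("old", 24504), ("new", 24588)):
    spe = steps_per_epoch(samples, BATCH_SIZE)
    print(name, "steps/epoch:", spe, "total steps:", EPOCHS * spe,
          "epoch at step 100:", epoch_at_step(100, spe))
```

With 24,588 samples, this gives 769 steps per epoch and 3,845 total steps, matching the new CSV. With the previous 24,504 samples, it gives 766 and 3,830, matching the old one.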