Text Ranking
sentence-transformers
Safetensors
new
cross-encoder
reranker
Generated from Trainer
dataset_size:24588
loss:BinaryCrossEntropyLoss
custom_code
Eval Results (legacy)
text-embeddings-inference
Instructions to use TakoData/chart-reranker with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use TakoData/chart-reranker with sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("TakoData/chart-reranker", trust_remote_code=True)

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
- Notebooks
- Google Colab
- Kaggle
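The snippet above prints one raw score per (query, passage) pair. A minimal sketch of turning such scores into a ranking follows; the helper name `rank_by_score` and the score values are hypothetical placeholders for illustration, not output from this model.

```python
def rank_by_score(passages, scores):
    """Return (index, passage, score) tuples ordered from highest to lowest score."""
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(i, passages[i], float(scores[i])) for i in order]


passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

# Hypothetical placeholder scores; real values would come from model.predict(...).
placeholder_scores = [0.12, 0.93, 0.35, 0.08]

ranked = rank_by_score(passages, placeholder_scores)
for index, passage, score in ranked:
    print(f"{score:.2f}  [{index}]  {passage}")
```

The README's `model.rank(...)` call returns a comparable ordering directly, as a list of `corpus_id` and `score` entries.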
Upload fine-tuned chart reranker model
- README.md +48 -38
- eval/CrossEncoderCorrelationEvaluator_validation_results.csv +3 -1
- model.safetensors +1 -1
- training_info.txt +3 -3
README.md
CHANGED
@@ -4,7 +4,7 @@ tags:
 - cross-encoder
 - reranker
 - generated_from_trainer
-- dataset_size:
+- dataset_size:8000
 - loss:BinaryCrossEntropyLoss
 base_model: cross-encoder/ms-marco-MiniLM-L6-v2
 pipeline_tag: text-ranking
@@ -23,10 +23,10 @@ model-index:
 type: validation
 metrics:
 - type: pearson
-value: 0.
+value: 0.8481096700155641
 name: Pearson
 - type: spearman
-value: 0.
+value: 0.8528646396544212
 name: Spearman
 ---
 
@@ -70,11 +70,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("cross_encoder_model_id")
 # Get scores for pairs of texts
 pairs = [
-['
-['
-[
-[
-[
+['prix blé tendre bio Indre et Loire 2025', 'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices'],
+['oil prices', 'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager'],
+['Nvidia earnings AI chip demand', 'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager'],
+['show me tesla stock performance 2020 to 2025', 'Title: "Manakoa Services Corporation Stock Performance"\n Collections: Companies\n Chart Type: company:private\n Sources: S&P Global'],
+['Samsung A56 5G mémoire', 'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
@@ -82,13 +82,13 @@ print(scores.shape)
 
 # Or rank different texts based on similarity to a single text
 ranks = model.rank(
-'
+'prix blé tendre bio Indre et Loire 2025',
 [
-'Title: "
-'Title: "
-'Title: "
-'Title: "
-'Title: "
+'Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices',
+'Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager',
+'Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager',
+'Title: "Manakoa Services Corporation Stock Performance"\n Collections: Companies\n Chart Type: company:private\n Sources: S&P Global',
+'Chart Title: "Samsung Publishing Co., Ltd Stock Prices", Info: Stock details for company Samsung Publishing Co., Ltd, Collections: Company Card, Chart Type: company:finance',
 ]
 )
 # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -129,8 +129,8 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------|:-----------|
-| pearson | 0.
-| **spearman** | **0.
+| pearson | 0.8481 |
+| **spearman** | **0.8529** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -150,19 +150,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 8,000 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
 * Approximate statistics based on the first 1000 samples:
-| | sentence_0
-|:--------|:-----------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
-| type | string
-| details | <ul><li>min:
+| | sentence_0 | sentence_1 | label |
+|:--------|:------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------------------------------------------------------|
+| type | string | string | float |
+| details | <ul><li>min: 3 characters</li><li>mean: 51.78 characters</li><li>max: 1024 characters</li></ul> | <ul><li>min: 49 characters</li><li>mean: 136.27 characters</li><li>max: 716 characters</li></ul> | <ul><li>min: 0.2</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
 * Samples:
-| sentence_0
-|:-----------------------------------------------------
-| <code>
-| <code>
-| <code>
+| sentence_0 | sentence_1 | label |
+|:-----------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+| <code>prix blé tendre bio Indre et Loire 2025</code> | <code>Chart Title: "Wheat (US Soft Red Winter) Spot Price", Collections: Commodity Prices</code> | <code>0.4</code> |
+| <code>oil prices</code> | <code>Chart Title: "West Texas Intermediate Crude Oil - Price in United States", Collections: Commodities::EIAEnergyIndicators::TimeseriesManager</code> | <code>0.8</code> |
+| <code>Nvidia earnings AI chip demand</code> | <code>Chart Title: "Nvidia Quarterly Price to Earnings", Collections: Companies::CompanyComputedRatiosV2::TimeseriesManager</code> | <code>0.4</code> |
 * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
 ```json
 {
@@ -175,9 +175,8 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 
 - `eval_strategy`: steps
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
-- `num_train_epochs`: 1
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 
 #### All Hyperparameters
 <details><summary>Click to expand</summary>
@@ -186,8 +185,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -199,7 +198,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`:
+- `num_train_epochs`: 3
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -305,12 +304,23 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-|:------:|:----:|:-------------------:|
-| 0.
-| 0.
-| 0.
-| 0.
+| Epoch | Step | Training Loss | validation_spearman |
+|:-----:|:----:|:-------------:|:-------------------:|
+| 0.2 | 100 | - | 0.7038 |
+| 0.4 | 200 | - | 0.7816 |
+| 0.6 | 300 | - | 0.8134 |
+| 0.8 | 400 | - | 0.8216 |
+| 1.0 | 500 | 0.8021 | 0.8296 |
+| 1.2 | 600 | - | 0.8358 |
+| 1.4 | 700 | - | 0.8418 |
+| 1.6 | 800 | - | 0.8418 |
+| 1.8 | 900 | - | 0.8478 |
+| 2.0 | 1000 | 0.5726 | 0.8471 |
+| 2.2 | 1100 | - | 0.8487 |
+| 2.4 | 1200 | - | 0.8497 |
+| 2.6 | 1300 | - | 0.8522 |
+| 2.8 | 1400 | - | 0.8523 |
+| 3.0 | 1500 | 0.5616 | 0.8529 |
 
 
 ### Framework Versions
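The README's evaluation table reports Pearson and Spearman correlations on the validation set. As a rough illustration of what those two numbers measure, here is a dependency-free sketch; the function names and the sample `labels` and `predictions` are made up for this example and are not taken from the evaluator or the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up labels and predictions, for illustration only.
labels = [0.4, 0.8, 0.4, 0.2, 1.0]
predictions = [0.35, 0.70, 0.50, 0.10, 0.90]
print(f"pearson={pearson(labels, predictions):.4f}")
print(f"spearman={spearman(labels, predictions):.4f}")
```

Because Spearman depends only on ordering, it is a natural headline metric for a reranker, where the relative order of candidates matters more than the absolute score.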
eval/CrossEncoderCorrelationEvaluator_validation_results.csv
CHANGED
@@ -1,2 +1,4 @@
 epoch,steps,Pearson_Correlation,Spearman_Correlation
-1.0,
+1.0,500,0.8334498280984426,0.8296374514172629
+2.0,1000,0.8444343598056561,0.8471494664684638
+3.0,1500,0.8481096700155641,0.8528646396544212
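The added rows can be read with the standard `csv` module. The sketch below embeds the three rows from this diff as a string (in practice the file would be opened from `eval/`) and picks the checkpoint with the highest Spearman correlation; the variable names are illustrative.

```python
import csv
import io

# The three rows added in this commit, copied from the diff above.
CSV_TEXT = """epoch,steps,Pearson_Correlation,Spearman_Correlation
1.0,500,0.8334498280984426,0.8296374514172629
2.0,1000,0.8444343598056561,0.8471494664684638
3.0,1500,0.8481096700155641,0.8528646396544212
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Select the row with the highest Spearman correlation.
best = max(rows, key=lambda row: float(row["Spearman_Correlation"]))

# Improvement in Spearman between the first and last logged epoch.
gain = float(rows[-1]["Spearman_Correlation"]) - float(rows[0]["Spearman_Correlation"])

print(f"best epoch: {best['epoch']} (step {best['steps']})")
print(f"spearman gain from first to last epoch: {gain:.4f}")
```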
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:93357cfe857f758d0ab0429d2076e1599cd7661ab2cc03f999bede0267e1167c
 size 90866412
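The file in the repository is a Git LFS pointer: a version line, a SHA-256 object ID, and the size in bytes. The size is unchanged while the hash differs, which is what one would expect when weights are retrained on the same architecture. A small sketch of parsing such a pointer and checking downloaded bytes against it follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib

# Pointer contents as shown on the new side of the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:93357cfe857f758d0ab0429d2076e1599cd7661ab2cc03f999bede0267e1167c
size 90866412
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True when the byte length and SHA-256 digest both match the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

For a file of this size (about 90 MB) reading it fully into memory is workable, though hashing in chunks would be the more careful choice.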
training_info.txt
CHANGED
@@ -1,6 +1,6 @@
 Base Model: cross-encoder/ms-marco-MiniLM-L6-v2
-Training Samples:
-Epochs:
-Batch Size:
+Training Samples: 8000
+Epochs: 3
+Batch Size: 16
 Learning Rate: 2e-05
 Max Length: 512
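These values line up with the step counts in the README's training log. The sketch below parses the new `training_info.txt` contents and works through the arithmetic, assuming a single device and no gradient accumulation (the README lists `gradient_accumulation_steps`: 1); the parsing helper is illustrative, not part of the training code.

```python
import math

# New contents of training_info.txt, copied from the diff above.
TRAINING_INFO = """Base Model: cross-encoder/ms-marco-MiniLM-L6-v2
Training Samples: 8000
Epochs: 3
Batch Size: 16
Learning Rate: 2e-05
Max Length: 512
"""


def parse_training_info(text):
    """Turn 'Key: value' lines into a dict of stripped strings."""
    info = {}
    for line in text.strip().splitlines():
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip()
    return info


info = parse_training_info(TRAINING_INFO)
samples = int(info["Training Samples"])
epochs = int(info["Epochs"])
batch_size = int(info["Batch Size"])

# Assumption: one device, gradient_accumulation_steps = 1, last batch kept.
steps_per_epoch = math.ceil(samples / batch_size)
total_steps = steps_per_epoch * epochs

# The training log evaluates every 100 steps.
eval_interval_epochs = 100 / steps_per_epoch

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps: {total_steps}")
print(f"evaluation every {eval_interval_epochs} epochs")
```

Under those assumptions the result is 500 steps per epoch and 1500 steps in total, consistent with the log's final row at epoch 3.0, step 1500, and with evaluations spaced 0.2 epochs apart.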