CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2

This is a Cross Encoder model fine-tuned from cross-encoder/ms-marco-MiniLM-L6-v2 using the sentence-transformers library. It computes a single relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Type: Cross Encoder
Base model: cross-encoder/ms-marco-MiniLM-L6-v2
Maximum Sequence Length: 512 tokens
Number of Output Labels: 1 label

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
    ['what is the average payment volume per transaction for american express?', '(table): company the american express of payments volume ( billions ) is 637 ; the american express of total volume ( billions ) is 647 ; the american express of total transactions ( billions ) is 5.0 ; the american express of cards ( millions ) is 86 ;'],
    ['what is the average payment volume per transaction for american express?', '(text): largest operators of open-loop and closed-loop retail electronic payments networks the largest operators of open-loop and closed-loop retail electronic payments networks are visa , mastercard , american express , discover , jcb and diners club .'],
    ['what is the average payment volume per transaction for american express?', '(text): with the exception of discover , which primarily operates in the united states , all of the other network operators can be considered multi- national or global providers of payments network services .'],
    ['what is the average payment volume per transaction for american express?', '(text): based on payments volume , total volume , number of transactions and number of cards in circulation , visa is the largest retail electronic payments network in the world .'],
    ['what is the average payment volume per transaction for american express?', '(text): the following chart compares our network with those of our major competitors for calendar year 2007 : company payments volume volume transactions cards ( billions ) ( billions ) ( billions ) ( millions ) visa inc. ( 1 ) .'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is the average payment volume per transaction for american express?',
    [
        '(table): company the american express of payments volume ( billions ) is 637 ; the american express of total volume ( billions ) is 647 ; the american express of total transactions ( billions ) is 5.0 ; the american express of cards ( millions ) is 86 ;',
        '(text): largest operators of open-loop and closed-loop retail electronic payments networks the largest operators of open-loop and closed-loop retail electronic payments networks are visa , mastercard , american express , discover , jcb and diners club .',
        '(text): with the exception of discover , which primarily operates in the united states , all of the other network operators can be considered multi- national or global providers of payments network services .',
        '(text): based on payments volume , total volume , number of transactions and number of cards in circulation , visa is the largest retail electronic payments network in the world .',
        '(text): the following chart compares our network with those of our major competitors for calendar year 2007 : company payments volume volume transactions cards ( billions ) ( billions ) ( billions ) ( millions ) visa inc. ( 1 ) .',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
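
Whether `predict` returns raw logits or sigmoid-squashed values depends on the activation function stored in the model's configuration, which this card does not state. If the scores turn out to be raw logits, the following minimal sketch shows one way to map them to the (0, 1) range. The `to_probability` helper and the `raw_scores` values are hypothetical and only stand in for real model output.

```python
import math

def to_probability(logit: float) -> float:
    """Numerically stable logistic sigmoid (hypothetical helper)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)

# Made-up logits standing in for the output of model.predict(pairs)
raw_scores = [4.2, -3.1, -5.0, -2.4, 0.0]
probabilities = [to_probability(s) for s in raw_scores]

# The sigmoid is monotonic, so the ranking is the same before and after
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i], reverse=True)
order_prob = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
```

Because the ordering is unchanged, this conversion only matters when a bounded score or a threshold is needed; for pure reranking the raw scores can be used directly.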

Evaluation

Metrics

Cross Encoder Reranking

Dataset: reranker
Evaluated with CrossEncoderRerankingEvaluator with these parameters:
```
{
    "at_k": 10
}
```

Metric	Value
map	0.8939
mrr@10	0.9405
ndcg@10	0.9272
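
For readers unfamiliar with these metrics, here is a minimal sketch of their standard textbook definitions for a single query with binary relevance labels. It is not the evaluator's own code, which may differ in details such as tie handling; the function names and the `ranked_labels` example are hypothetical. The reported values would correspond to averaging such per-query numbers over the evaluation queries.

```python
import math

def reciprocal_rank_at_k(labels, k):
    """labels: binary relevance in ranked order (highest score first)."""
    for rank, label in enumerate(labels[:k], start=1):
        if label:
            return 1.0 / rank
    return 0.0

def average_precision(labels):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def ndcg_at_k(labels, k):
    """DCG of the given order divided by the DCG of the ideal order."""
    def dcg(ordered):
        return sum(l / math.log2(rank + 1)
                   for rank, l in enumerate(ordered[:k], start=1))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0

# Hypothetical ranking: relevant passages ended up at positions 2 and 4
ranked_labels = [0, 1, 0, 1, 0]
rr = reciprocal_rank_at_k(ranked_labels, k=10)
ap = average_precision(ranked_labels)
ndcg = ndcg_at_k(ranked_labels, k=10)
```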

Training Details

Training Dataset

Unnamed Dataset

Size: 175,555 training samples
Columns: query, passage, and label

Approximate statistics based on the first 1000 samples:

	query	passage	label
type	string	string	float
details	min: 41 characters; mean: 89.26 characters; max: 186 characters	min: 11 characters; mean: 182.61 characters; max: 1853 characters	min: 0.0; mean: 0.07; max: 1.0

Samples:

query	passage	label
`what is the the interest expense in 2009?`	`(text): if libor changes by 100 basis points , our annual interest expense would change by $ 3.8 million .`	`1.0`
`what is the the interest expense in 2009?`	`(text): interest rate to a variable interest rate based on the three-month libor plus 2.05% ( 2.05 % ) ( 2.34% ( 2.34 % ) as of october 31 , 2009 ) .`	`0.0`
`what is the the interest expense in 2009?`	`(text): foreign currency exposure as more fully described in note 2i .`	`0.0`
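
Each row holds one (query, passage, label) triple, and the same query appears with several passages, most of them labelled 0.0. For reranking-style evaluation it can be convenient to gather the rows per query. The sketch below shows one way to do that; the `group_by_query` helper, the placeholder rows, and the `positive`/`negative` key names are assumptions chosen to resemble a per-query sample format, so verify them against the library documentation before relying on them.

```python
from collections import defaultdict

def group_by_query(rows):
    """Collect passages per query, split by label (hypothetical helper)."""
    grouped = defaultdict(lambda: {"positive": [], "negative": []})
    for row in rows:
        bucket = "positive" if row["label"] >= 0.5 else "negative"
        grouped[row["query"]][bucket].append(row["passage"])
    return [{"query": query, **parts} for query, parts in grouped.items()]

# Placeholder rows using the same three columns as the dataset
rows = [
    {"query": "query A", "passage": "passage A1", "label": 1.0},
    {"query": "query A", "passage": "passage A2", "label": 0.0},
    {"query": "query A", "passage": "passage A3", "label": 0.0},
    {"query": "query B", "passage": "passage B1", "label": 0.0},
    {"query": "query B", "passage": "passage B2", "label": 1.0},
]
samples = group_by_query(rows)
```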

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
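
To make the loss concrete: with an Identity activation and no `pos_weight`, the objective presumably reduces to ordinary binary cross-entropy on the raw logit, as in PyTorch's `BCEWithLogitsLoss`. The following pure-Python sketch works under that assumption; `bce_with_logits` and the example logits are hypothetical and are not taken from the library's source.

```python
import math

def bce_with_logits(logit, label, pos_weight=1.0):
    """Binary cross-entropy on a raw logit; pos_weight=1.0 mirrors 'null'."""
    # log(sigmoid(x)), computed without overflowing for large |x|
    if logit >= 0:
        log_sigmoid = -math.log1p(math.exp(-logit))
    else:
        log_sigmoid = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) = log(sigmoid(x)) - x
    log_one_minus_sigmoid = log_sigmoid - logit
    return -(pos_weight * label * log_sigmoid
             + (1.0 - label) * log_one_minus_sigmoid)

# Made-up logits for one relevant and two irrelevant passages
logits = [3.0, -4.0, 0.5]
labels = [1.0, 0.0, 0.0]
batch_loss = sum(bce_with_logits(x, y) for x, y in zip(logits, labels)) / len(logits)
```

A confident correct prediction contributes almost nothing to the loss, while a confident wrong one is penalised roughly linearly in the size of the logit.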

Evaluation Dataset

Unnamed Dataset

Size: 25,007 evaluation samples
Columns: query, passage, and label

Approximate statistics based on the first 1000 samples:

	query	passage	label
type	string	string	float
details	min: 52 characters; mean: 86.04 characters; max: 137 characters	min: 11 characters; mean: 166.61 characters; max: 717 characters	min: 0.0; mean: 0.06; max: 1.0

Samples:

query	passage	label
`what is the average payment volume per transaction for american express?`	`(table): company the american express of payments volume ( billions ) is 637 ; the american express of total volume ( billions ) is 647 ; the american express of total transactions ( billions ) is 5.0 ; the american express of cards ( millions ) is 86 ;`	`1.0`
`what is the average payment volume per transaction for american express?`	`(text): largest operators of open-loop and closed-loop retail electronic payments networks the largest operators of open-loop and closed-loop retail electronic payments networks are visa , mastercard , american express , discover , jcb and diners club .`	`0.0`
`what is the average payment volume per transaction for american express?`	`(text): with the exception of discover , which primarily operates in the united states , all of the other network operators can be considered multi- national or global providers of payments network services .`	`0.0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 0.0001
weight_decay: 0.01
num_train_epochs: 1
warmup_ratio: 0.1
fp16: True
load_best_model_at_end: True
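
As a rough illustration of what these settings imply, the sketch below derives the number of optimizer steps and the learning rate at a given step. It assumes a single device, no gradient accumulation, a linear schedule with warmup, and that the warmup step count is rounded up; the `linear_schedule` function is a hypothetical re-implementation, not the trainer's code.

```python
import math

TRAIN_SAMPLES = 175_555   # size of the training dataset
BATCH_SIZE = 64           # per_device_train_batch_size
LEARNING_RATE = 1e-4      # learning_rate
WARMUP_RATIO = 0.1        # warmup_ratio

# Assumption: one device, one epoch, gradient_accumulation_steps of 1
total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
# Assumption: warmup steps are rounded up
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_schedule(step):
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)

lr_mid_warmup = linear_schedule(warmup_steps // 2)
lr_peak = linear_schedule(warmup_steps)
lr_end = linear_schedule(total_steps)
```

Under these assumptions one epoch is 2744 steps, which agrees with the training log, where step 10 is reported as epoch 0.0036.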

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0001
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	reranker_ndcg@10
0.0036	10	0.3268	-	-
0.0073	20	0.247	-	-
0.0109	30	0.2451	-	-
0.0146	40	0.2029	-	-
0.0182	50	0.1739	-	-
0.0219	60	0.172	-	-
0.0255	70	0.1425	-	-
0.0292	80	0.138	-	-
0.0328	90	0.1304	-	-
0.0364	100	0.1561	-	-
0.0401	110	0.1627	-	-
0.0437	120	0.1974	-	-
0.0474	130	0.1339	-	-
0.0510	140	0.1137	-	-
0.0547	150	0.1333	-	-
0.0583	160	0.1296	-	-
0.0620	170	0.1723	-	-
0.0656	180	0.1099	-	-
0.0692	190	0.1105	-	-
0.0729	200	0.0917	0.1133	0.9034
0.0765	210	0.1012	-	-
0.0802	220	0.1296	-	-
0.0838	230	0.1332	-	-
0.0875	240	0.095	-	-
0.0911	250	0.1351	-	-
0.0948	260	0.1138	-	-
0.0984	270	0.1318	-	-
0.1020	280	0.1164	-	-
0.1057	290	0.1418	-	-
0.1093	300	0.1337	-	-
0.1130	310	0.1169	-	-
0.1166	320	0.1314	-	-
0.1203	330	0.1197	-	-
0.1239	340	0.1002	-	-
0.1276	350	0.1124	-	-
0.1312	360	0.0932	-	-
0.1348	370	0.1629	-	-
0.1385	380	0.1501	-	-
0.1421	390	0.1097	-	-
0.1458	400	0.0756	0.1138	0.8984
0.1494	410	0.1174	-	-
0.1531	420	0.1472	-	-
0.1567	430	0.1391	-	-
0.1603	440	0.1188	-	-
0.1640	450	0.1555	-	-
0.1676	460	0.1148	-	-
0.1713	470	0.0753	-	-
0.1749	480	0.104	-	-
0.1786	490	0.1313	-	-
0.1822	500	0.1125	-	-
0.1859	510	0.0772	-	-
0.1895	520	0.1045	-	-
0.1931	530	0.1101	-	-
0.1968	540	0.109	-	-
0.2004	550	0.124	-	-
0.2041	560	0.0934	-	-
0.2077	570	0.1305	-	-
0.2114	580	0.1163	-	-
0.2150	590	0.1004	-	-
0.2187	600	0.0917	0.1206	0.9025
0.2223	610	0.0942	-	-
0.2259	620	0.1223	-	-
0.2296	630	0.1156	-	-
0.2332	640	0.0924	-	-
0.2369	650	0.1372	-	-
0.2405	660	0.0984	-	-
0.2442	670	0.0876	-	-
0.2478	680	0.0926	-	-
0.2515	690	0.0819	-	-
0.2551	700	0.1034	-	-
0.2587	710	0.1022	-	-
0.2624	720	0.0661	-	-
0.2660	730	0.124	-	-
0.2697	740	0.1231	-	-
0.2733	750	0.1307	-	-
0.2770	760	0.0973	-	-
0.2806	770	0.0721	-	-
0.2843	780	0.0734	-	-
0.2879	790	0.0806	-	-
0.2915	800	0.0824	0.0996	0.9079
0.2952	810	0.1037	-	-
0.2988	820	0.0771	-	-
0.3025	830	0.1407	-	-
0.3061	840	0.1196	-	-
0.3098	850	0.1087	-	-
0.3134	860	0.0737	-	-
0.3171	870	0.0986	-	-
0.3207	880	0.1042	-	-
0.3243	890	0.0971	-	-
0.3280	900	0.0824	-	-
0.3316	910	0.0842	-	-
0.3353	920	0.1361	-	-
0.3389	930	0.086	-	-
0.3426	940	0.0861	-	-
0.3462	950	0.1039	-	-
0.3499	960	0.1085	-	-
0.3535	970	0.1316	-	-
0.3571	980	0.0806	-	-
0.3608	990	0.0873	-	-
0.3644	1000	0.0952	0.0981	0.9101
0.3681	1010	0.1194	-	-
0.3717	1020	0.1114	-	-
0.3754	1030	0.122	-	-
0.3790	1040	0.094	-	-
0.3827	1050	0.0971	-	-
0.3863	1060	0.1285	-	-
0.3899	1070	0.103	-	-
0.3936	1080	0.1065	-	-
0.3972	1090	0.0885	-	-
0.4009	1100	0.1022	-	-
0.4045	1110	0.1129	-	-
0.4082	1120	0.1229	-	-
0.4118	1130	0.0999	-	-
0.4155	1140	0.0879	-	-
0.4191	1150	0.0763	-	-
0.4227	1160	0.0852	-	-
0.4264	1170	0.0914	-	-
0.4300	1180	0.1004	-	-
0.4337	1190	0.1143	-	-
0.4373	1200	0.1364	0.0940	0.9246
0.4410	1210	0.1017	-	-
0.4446	1220	0.09	-	-
0.4483	1230	0.0687	-	-
0.4519	1240	0.0733	-	-
0.4555	1250	0.1049	-	-
0.4592	1260	0.0918	-	-
0.4628	1270	0.0848	-	-
0.4665	1280	0.0736	-	-
0.4701	1290	0.1129	-	-
0.4738	1300	0.0713	-	-
0.4774	1310	0.0876	-	-
0.4810	1320	0.0866	-	-
0.4847	1330	0.1016	-	-
0.4883	1340	0.1061	-	-
0.4920	1350	0.0791	-	-
0.4956	1360	0.0938	-	-
0.4993	1370	0.1235	-	-
0.5029	1380	0.0693	-	-
0.5066	1390	0.065	-	-
0.5102	1400	0.0839	0.1007	0.9214
0.5138	1410	0.0914	-	-
0.5175	1420	0.0786	-	-
0.5211	1430	0.0916	-	-
0.5248	1440	0.0606	-	-
0.5284	1450	0.1417	-	-
0.5321	1460	0.0856	-	-
0.5357	1470	0.0865	-	-
0.5394	1480	0.0917	-	-
0.5430	1490	0.0774	-	-
0.5466	1500	0.0951	-	-
0.5503	1510	0.074	-	-
0.5539	1520	0.0797	-	-
0.5576	1530	0.0817	-	-
0.5612	1540	0.1137	-	-
0.5649	1550	0.1139	-	-
0.5685	1560	0.0889	-	-
0.5722	1570	0.1075	-	-
0.5758	1580	0.1021	-	-
0.5794	1590	0.1115	-	-
0.5831	1600	0.1047	0.0952	0.9229
0.5867	1610	0.1056	-	-
0.5904	1620	0.116	-	-
0.5940	1630	0.0989	-	-
0.5977	1640	0.1102	-	-
0.6013	1650	0.1006	-	-
0.6050	1660	0.0956	-	-
0.6086	1670	0.1003	-	-
0.6122	1680	0.0984	-	-
0.6159	1690	0.0734	-	-
0.6195	1700	0.079	-	-
0.6232	1710	0.0872	-	-
0.6268	1720	0.1077	-	-
0.6305	1730	0.0833	-	-
0.6341	1740	0.0984	-	-
0.6378	1750	0.0727	-	-
0.6414	1760	0.1062	-	-
0.6450	1770	0.1013	-	-
0.6487	1780	0.0892	-	-
0.6523	1790	0.0765	-	-
0.6560	1800	0.0698	0.0962	0.9208
0.6596	1810	0.0658	-	-
0.6633	1820	0.1386	-	-
0.6669	1830	0.1094	-	-
0.6706	1840	0.103	-	-
0.6742	1850	0.1075	-	-
0.6778	1860	0.091	-	-
0.6815	1870	0.106	-	-
0.6851	1880	0.0753	-	-
0.6888	1890	0.0685	-	-
0.6924	1900	0.1045	-	-
0.6961	1910	0.087	-	-
0.6997	1920	0.0866	-	-
0.7034	1930	0.1253	-	-
0.7070	1940	0.0915	-	-
0.7106	1950	0.061	-	-
0.7143	1960	0.0744	-	-
0.7179	1970	0.0643	-	-
0.7216	1980	0.0571	-	-
0.7252	1990	0.1004	-	-
0.7289	2000	0.1075	0.0936	0.9237
0.7325	2010	0.0637	-	-
0.7362	2020	0.1167	-	-
0.7398	2030	0.1113	-	-
0.7434	2040	0.1314	-	-
0.7471	2050	0.0764	-	-
0.7507	2060	0.1297	-	-
0.7544	2070	0.0841	-	-
0.7580	2080	0.0967	-	-
0.7617	2090	0.0916	-	-
0.7653	2100	0.1196	-	-
0.7690	2110	0.1072	-	-
0.7726	2120	0.0974	-	-
0.7762	2130	0.0772	-	-
0.7799	2140	0.1147	-	-
0.7835	2150	0.1003	-	-
0.7872	2160	0.0944	-	-
0.7908	2170	0.0886	-	-
0.7945	2180	0.062	-	-
0.7981	2190	0.0817	-	-
0.8017	2200	0.1096	0.0919	0.9262
0.8054	2210	0.0821	-	-
0.8090	2220	0.0866	-	-
0.8127	2230	0.0824	-	-
0.8163	2240	0.108	-	-
0.8200	2250	0.0746	-	-
0.8236	2260	0.0708	-	-
0.8273	2270	0.0898	-	-
0.8309	2280	0.0876	-	-
0.8345	2290	0.0898	-	-
0.8382	2300	0.0935	-	-
0.8418	2310	0.0655	-	-
0.8455	2320	0.106	-	-
0.8491	2330	0.0806	-	-
0.8528	2340	0.091	-	-
0.8564	2350	0.0575	-	-
0.8601	2360	0.059	-	-
0.8637	2370	0.0889	-	-
0.8673	2380	0.0955	-	-
0.8710	2390	0.0841	-	-
0.8746	2400	0.0759	0.0896	0.9256
0.8783	2410	0.0558	-	-
0.8819	2420	0.0921	-	-
0.8856	2430	0.0865	-	-
0.8892	2440	0.0787	-	-
0.8929	2450	0.0803	-	-
0.8965	2460	0.0838	-	-
0.9001	2470	0.0837	-	-
0.9038	2480	0.097	-	-
0.9074	2490	0.0673	-	-
0.9111	2500	0.0944	-	-
0.9147	2510	0.0858	-	-
0.9184	2520	0.0761	-	-
0.9220	2530	0.0868	-	-
0.9257	2540	0.0398	-	-
0.9293	2550	0.0494	-	-
0.9329	2560	0.123	-	-
0.9366	2570	0.0956	-	-
0.9402	2580	0.065	-	-
0.9439	2590	0.0662	-	-
0.9475	2600	0.0747	0.0882	0.9272
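
With `load_best_model_at_end: True`, the trainer restores the best evaluated checkpoint when training finishes. The card does not state which metric defines "best", so this small sketch checks both candidates using the evaluation rows copied from the log above; the variable names are hypothetical.

```python
# (step, validation loss, reranker_ndcg@10), copied from the training log
eval_rows = [
    (200, 0.1133, 0.9034),
    (400, 0.1138, 0.8984),
    (600, 0.1206, 0.9025),
    (800, 0.0996, 0.9079),
    (1000, 0.0981, 0.9101),
    (1200, 0.0940, 0.9246),
    (1400, 0.1007, 0.9214),
    (1600, 0.0952, 0.9229),
    (1800, 0.0962, 0.9208),
    (2000, 0.0936, 0.9237),
    (2200, 0.0919, 0.9262),
    (2400, 0.0896, 0.9256),
    (2600, 0.0882, 0.9272),
]

best_by_loss = min(eval_rows, key=lambda row: row[1])   # lower is better
best_by_ndcg = max(eval_rows, key=lambda row: row[2])   # higher is better
```

Both criteria select step 2600, whose ndcg@10 of 0.9272 matches the value reported in the Metrics section.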

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.1.2
Transformers: 4.57.3
PyTorch: 2.9.0+cu126
Accelerate: 1.12.0
Datasets: 4.4.1
Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}