CrossEncoder based on microsoft/deberta-v3-base

This is a Cross Encoder model finetuned from microsoft/deberta-v3-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: microsoft/deberta-v3-base
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label
  • Model Size: ~0.2B parameters (F32 safetensors)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tani-at-nola/reranker-deberta-v3-base-nli")
# Get scores for pairs of texts
pairs = [
    ['The sisters are hugging goodbye while holding to go packages after just eating lunch.', 'Two women are embracing while holding to go packages.'],
    ['Two woman are holding packages.', 'Two women are embracing while holding to go packages.'],
    ['The men are fighting outside a deli.', 'Two women are embracing while holding to go packages.'],
    ['Two kids in numbered jerseys wash their hands.', 'Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.'],
    ['Two kids at a ballgame wash their hands.', 'Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'The sisters are hugging goodbye while holding to go packages after just eating lunch.',
    [
        'Two women are embracing while holding to go packages.',
        'Two women are embracing while holding to go packages.',
        'Two women are embracing while holding to go packages.',
        'Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.',
        'Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
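model.rank is a thin convenience over predict: it scores each (query, document) pair and sorts by descending score. The sort itself can be reproduced with NumPy (the scores below are illustrative stand-ins, not outputs of this model):

```python
import numpy as np

# Stand-in scores; in practice use: scores = model.predict(pairs)
scores = np.array([0.92, 0.71, 0.03, 0.15, 0.48])

# Order documents best-first, as model.rank does
order = np.argsort(-scores)
ranking = [{"corpus_id": int(i), "score": float(scores[i])} for i in order]
print([r["corpus_id"] for r in ranking])
# [0, 1, 4, 3, 2]
```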

Evaluation

Metrics

Cross Encoder Classification

Metric               AllNLI-norm-dev   AllNLI-test
accuracy             0.6807            0.6814
accuracy_threshold   0.4376            0.5600
f1                   0.5466            0.5270
f1_threshold         0.0044            0.0010
precision            0.4004            0.3655
recall               0.8610            0.9436
average_precision    0.4993            0.4819
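accuracy_threshold and f1_threshold are the score cutoffs that maximize the respective metric on each split. Applying such a cutoff to turn scores into binary predictions is a one-liner (the scores and labels below are made up for illustration, not taken from the evaluation sets):

```python
import numpy as np

scores = np.array([0.90, 0.50, 0.20, 0.70])  # illustrative model scores
labels = np.array([1, 0, 0, 1])              # illustrative gold labels

threshold = 0.4376  # accuracy-optimal cutoff reported for AllNLI-norm-dev
preds = (scores >= threshold).astype(int)
accuracy = float((preds == labels).mean())
print(accuracy)
# 0.75
```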

Training Details

Training Dataset

Unnamed Dataset

  • Size: 942,069 training samples
  • Columns: hypothesis, premise, and label
  • Approximate statistics based on the first 1000 samples:
    • hypothesis (string): min 11, mean 38.26, max 131 characters
    • premise (string): min 23, mean 69.54, max 227 characters
    • label (int): 0: ~66.60%, 1: ~33.40%
  • Samples (hypothesis / premise / label):
    • "A person is training his horse for a competition." / "A person on a horse jumps over a broken down airplane." / 0
    • "A person is at a diner, ordering an omelette." / "A person on a horse jumps over a broken down airplane." / 0
    • "A person is outdoors, on a horse." / "A person on a horse jumps over a broken down airplane." / 1
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
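Since activation_fn is Identity, the model's raw logits are fed straight into the binary cross-entropy objective (the quantity torch.nn.BCEWithLogitsLoss computes in a numerically safe form). A plain-Python sketch of the same loss, on made-up logits:

```python
import math

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy over raw (Identity-activated) logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the logit to (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# A confident positive (logit 2.0, label 1) and a mild negative (logit -1.0, label 0)
loss = bce_with_logits([2.0, -1.0], [1.0, 0.0])
print(round(loss, 4))
# 0.2201
```

With pos_weight set to null, positive and negative pairs contribute to the loss with equal weight.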
    

Evaluation Dataset

Unnamed Dataset

  • Size: 19,657 evaluation samples
  • Columns: hypothesis, premise, and label
  • Approximate statistics based on the first 1000 samples:
    • hypothesis (string): min 11, mean 37.66, max 116 characters
    • premise (string): min 16, mean 75.01, max 229 characters
    • label (int): 0: ~66.90%, 1: ~33.10%
  • Samples (hypothesis / premise / label):
    • "The sisters are hugging goodbye while holding to go packages after just eating lunch." / "Two women are embracing while holding to go packages." / 0
    • "Two woman are holding packages." / "Two women are embracing while holding to go packages." / 1
    • "The men are fighting outside a deli." / "Two women are embracing while holding to go packages." / 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
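With warmup_ratio: 0.1, the warmup length is derived from the total step count rather than set directly. Assuming single-device training over the 942,069-sample set with batch size 64, drop_last, and the nominal 5 epochs (and the ceil that transformers applies when converting the ratio, so treat the exact figure as approximate), this works out to roughly 7,360 warmup steps:

```python
import math

samples, batch_size, epochs = 942_069, 64, 5
steps_per_epoch = samples // batch_size      # dataloader_drop_last: True
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(0.1 * total_steps)  # warmup_ratio: 0.1
print(steps_per_epoch, total_steps, warmup_steps)
# 14719 73595 7360
```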

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss AllNLI-norm-dev_average_precision AllNLI-test_average_precision
-1 -1 - - 0.3614 -
0.0068 100 0.7205 - - -
0.0136 200 0.6972 - - -
0.0204 300 0.6086 - - -
0.0272 400 0.4855 - - -
0.0340 500 0.3991 - - -
0.0408 600 0.3409 - - -
0.0476 700 0.2987 - - -
0.0544 800 0.2841 - - -
0.0611 900 0.2729 - - -
0.0679 1000 0.2627 - - -
0.0747 1100 0.2517 - - -
0.0815 1200 0.2286 - - -
0.0883 1300 0.2385 - - -
0.0951 1400 0.2329 - - -
0.1019 1500 0.2213 0.1959 0.4997 -
0.1087 1600 0.22 - - -
0.1155 1700 0.2295 - - -
0.1223 1800 0.2236 - - -
0.1291 1900 0.2273 - - -
0.1359 2000 0.2071 - - -
0.1427 2100 0.2254 - - -
0.1495 2200 0.2217 - - -
0.1563 2300 0.2093 - - -
0.1631 2400 0.2112 - - -
0.1698 2500 0.2176 - - -
0.1766 2600 0.2195 - - -
0.1834 2700 0.2107 - - -
0.1902 2800 0.2164 - - -
0.1970 2900 0.213 - - -
0.2038 3000 0.2055 0.1726 0.4789 -
0.2106 3100 0.2039 - - -
0.2174 3200 0.2157 - - -
0.2242 3300 0.2155 - - -
0.2310 3400 0.2017 - - -
0.2378 3500 0.2068 - - -
0.2446 3600 0.2111 - - -
0.2514 3700 0.2062 - - -
0.2582 3800 0.2062 - - -
0.2650 3900 0.2217 - - -
0.2718 4000 0.2012 - - -
0.2786 4100 0.2127 - - -
0.2853 4200 0.212 - - -
0.2921 4300 0.2075 - - -
0.2989 4400 0.2099 - - -
0.3057 4500 0.2134 0.1644 0.4993 -
-1 -1 - - - 0.4819
  • The saved checkpoint is the step-4500 row (validation loss 0.1644, dev average precision 0.4993, matching the Metrics section above); the original table marked it in bold.
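The epoch column is simply step divided by the number of steps per epoch (942,069 // 64 = 14,719 with drop_last, assuming single-device training), which reproduces the fractional epochs in the table:

```python
steps_per_epoch = 942_069 // 64  # 14719 batches per epoch

print(round(100 / steps_per_epoch, 4))   # first logged row
# 0.0068
print(round(4500 / steps_per_epoch, 4))  # last logged row
# 0.3057
```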

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.2
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}