---
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:1548
- loss:BinaryCrossEntropyLoss
base_model: Alibaba-NLP/gte-multilingual-reranker-base
pipeline_tag: text-ranking
library_name: sentence-transformers
---
# CrossEncoder based on Alibaba-NLP/gte-multilingual-reranker-base

This is a Cross Encoder model fine-tuned from Alibaba-NLP/gte-multilingual-reranker-base on the json dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** Alibaba-NLP/gte-multilingual-reranker-base
- **Maximum Sequence Length:** 8192 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:** json
### Model Sources

- **Documentation:** Sentence Transformers Documentation
- **Documentation:** Cross Encoder Documentation
- **Repository:** Sentence Transformers on GitHub
- **Hugging Face:** Cross Encoders on Hugging Face
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")

# Get scores for pairs of texts.
# The Ukrainian query below means, roughly: "to strike a self-satisfied pose,
# displaying arrogance and conceit".
pairs = [
    ['приймати позу самовдоволеного, виявляючи пиху, зазнайство', 'having a mutual understanding or shared thoughts'],
    ['приймати позу самовдоволеного, виявляючи пиху, зазнайство', 'in trouble or state of shame'],
    ['приймати позу самовдоволеного, виявляючи пиху, зазнайство', 'someone who scrounges from others'],
    ['приймати позу самовдоволеного, виявляючи пиху, зазнайство', 'to subtly and indirectly seek praise, validation, or admiration from others'],
    ['приймати позу самовдоволеного, виявляючи пиху, зазнайство', 'to be on the verge of doing something'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on their similarity to a single text
ranks = model.rank(
    'приймати позу самовдоволеного, виявляючи пиху, зазнайство',
    [
        'having a mutual understanding or shared thoughts',
        'in trouble or state of shame',
        'someone who scrounges from others',
        'to subtly and indirectly seek praise, validation, or admiration from others',
        'to be on the verge of doing something',
    ],
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
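Since this model was trained with an Identity activation on the loss (see Training Details), the raw scores it produces are logits; if your `predict` call is configured to return them unchanged, a logistic sigmoid maps them to probabilities in (0, 1). A minimal NumPy sketch on hypothetical logit values (the array below is illustrative, not actual model output):

```python
import numpy as np

def to_probabilities(logits):
    """Map raw cross-encoder logits to (0, 1) with the logistic sigmoid."""
    logits = np.asarray(logits, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical logits for the five pairs above (illustrative values only)
logits = [-4.2, -3.1, -2.7, 5.8, -1.9]
probs = to_probabilities(logits)
print(probs.round(3))  # the fourth pair gets the highest probability
```

The ordering of the pairs is unchanged by the sigmoid (it is monotonic), so ranking can be done on either logits or probabilities.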
## Training Details

### Training Dataset

#### json

- Dataset: json
- Size: 1,548 training samples
- Columns: `text1`, `text2`, and `label`
- Approximate statistics based on the first 1000 samples:

  |         | text1                                                      | text2                                                      | label                  |
  |:--------|:-----------------------------------------------------------|:-----------------------------------------------------------|:-----------------------|
  | type    | string                                                     | string                                                     | int                    |
  | details | min: 5 characters, mean: 43.72 characters, max: 111 characters | min: 4 characters, mean: 44.34 characters, max: 139 characters | 0: ~70.00%, 1: ~30.00% |

- Samples:

  | text1                                            | text2                                                                              | label |
  |:-------------------------------------------------|:-----------------------------------------------------------------------------------|:------|
  | уживається для вираження повного заперечення; ні | уживається для повного заперечення змісту зазначеного слова; зовсім не (треба)      | 1     |
  | уживається для вираження повного заперечення; ні | уживається для вираження заперечення чогось                                         | 1     |
  | уживається для вираження повного заперечення; ні | уживається для повного заперечення змісту зазначеного слова; зовсім не (розбиратися) | 1     |

- Loss: `BinaryCrossEntropyLoss` with these parameters:

  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": null
  }
  ```
### Evaluation Dataset

#### json

- Dataset: json
- Size: 225 evaluation samples
- Columns: `text1`, `text2`, and `label`
- Approximate statistics based on the first 225 samples:

  |         | text1                                                       | text2                                                      | label                  |
  |:--------|:------------------------------------------------------------|:-----------------------------------------------------------|:-----------------------|
  | type    | string                                                      | string                                                     | int                    |
  | details | min: 10 characters, mean: 48.52 characters, max: 156 characters | min: 6 characters, mean: 45.7 characters, max: 129 characters | 0: ~81.78%, 1: ~18.22% |

- Samples:

  | text1                                                     | text2                                            | label |
  |:----------------------------------------------------------|:-------------------------------------------------|:------|
  | приймати позу самовдоволеного, виявляючи пиху, зазнайство | having a mutual understanding or shared thoughts | 0     |
  | приймати позу самовдоволеного, виявляючи пиху, зазнайство | in trouble or state of shame                     | 0     |
  | приймати позу самовдоволеного, виявляючи пиху, зазнайство | someone who scrounges from others                | 0     |

- Loss: `BinaryCrossEntropyLoss` with these parameters:

  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": null
  }
  ```
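With an Identity `activation_fn` and `pos_weight: null`, `BinaryCrossEntropyLoss` reduces to plain binary cross-entropy applied to the raw score, i.e. the same objective as PyTorch's `BCEWithLogitsLoss`. A dependency-free sketch of that objective on toy logits and labels (illustrative values, not model output):

```python
import math

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy on raw logits:
    -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(x)."""
    total = 0.0
    for x, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Toy raw scores and binary labels shaped like the dataset's `label` column
print(round(bce_with_logits([2.0, -1.5, 0.3], [1, 0, 1]), 4))
```

A confident correct prediction (large positive logit with label 1) contributes nearly zero loss; a confident wrong one contributes a large penalty, which is what drives the score separation between matching and non-matching pairs.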
## Training Hyperparameters

### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 2
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
## Training Logs

| Epoch | Step | Training Loss |
|---|---|---|
| 0.5155 | 50 | 0.4717 |
| 1.0309 | 100 | 0.3624 |
| 1.5464 | 150 | 0.2148 |
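The Epoch column is consistent with the dataset and batch size above: with 1,548 training samples and a per-device batch size of 16, one epoch is ceil(1548 / 16) = 97 optimizer steps, so logged step 50 lands at epoch 50/97 ≈ 0.5155. A quick check (assuming single-device training, as the hyperparameters suggest):

```python
import math

dataset_size = 1548   # training samples (see Training Dataset)
batch_size = 16       # per_device_train_batch_size
steps_per_epoch = math.ceil(dataset_size / batch_size)  # 97

for step in (50, 100, 150):
    print(step, round(step / steps_per_epoch, 4))
# 50 0.5155
# 100 1.0309
# 150 1.5464
```

The computed epochs match the table exactly, so no gradient accumulation or data parallelism was in play (both are at their defaults in the hyperparameters).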
## Framework Versions

- Python: 3.13.1
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.4.1
- Tokenizers: 0.22.0
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```