multilingual_e5_large Finetuned on Data

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
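The Pooling module above enables only pooling_mode_mean_tokens, so each text embedding is the average of its token vectors. The final Normalize() step then rescales that average to unit length. Because the vectors have unit length, the cosine similarity this model uses reduces to a plain dot product.

The pure-Python sketch below illustrates both steps on made-up 3-dimensional token vectors. It assumes that padding tokens are excluded through an attention mask. The helper names are hypothetical and are not the library's internals, which operate on batched tensors.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                summed[i] += value
            count += 1
    return [s / max(count, 1) for s in summed]


def l2_normalize(vec):
    """Rescale a vector to unit Euclidean length, as Normalize() does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two arbitrary (not necessarily unit) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Three made-up token vectors; the last one is padding (mask 0).
tokens = [[1.0, 2.0, 0.0], [3.0, 0.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # [2.0, 1.0, 2.0]
embedding = l2_normalize(pooled)      # unit length: [2/3, 1/3, 2/3]

# For unit-length vectors, the dot product equals the cosine similarity.
other = l2_normalize([1.0, 0.0, 1.0])
dot = sum(x * y for x, y in zip(embedding, other))
print(round(dot, 6), round(cosine(pooled, [1.0, 0.0, 1.0]), 6))
```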

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/multilingual-e5-large-new")
# Run inference
sentences = [
    'When is the time of commission of the fraud considered?',
    'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without it being necessary that the benefit actually materialize;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence that is detrimental to themselves or another; and\n\nc) Damage to another person’s property, as defined under civil law, which must be causally linked to the deceptive acts or omissions of the perpetrator. It is not required that the person deceived and the person who suffered the damage be the same individual.\n\nThe term “facts”, within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those that will occur in the future, such as mere promises or contractual obligations. However, when such promises or obligations are accompanied by false assurances and representations of other false facts referring to the present or the past, in such a manner as to create the impression of future fulfillment based on a false present situation fabricated by the perpetrator, who has already formed the decision not to fulfill their obligation, the crime of fraud is established.\n\nThe term “property” refers to the totality of a person’s economic assets that possess monetary value, while damage to property means its reduction—specifically, the difference between the monetary value the property had before the disposition caused by the fraudulent conduct and the value remaining after it. Property damage exists even if the victim possesses an active claim for restitution.\n\nThe time of commission of the fraud is considered to be the moment when the perpetrator acted and completed their fraudulent conduct, namely when they made the false representations that deceived the victim or a third party. Any subsequent moment at which the victim’s damage actually occurred—thereby completing the fraud—or the time when the victim carried out the harmful act or omission, is irrelevant.',
    'Spear phishing targets specific individuals or employees within an organization using personalized, deceptive emails. Unlike mass phishing, these emails are crafted to seem familiar and urgent.\n\nScenarios:\n- CEO Fraud: Attackers impersonate executives to extract financial or sensitive data from employees.\n- Whaling: High-ranking executives are targeted using tailored fraud emails that press for immediate action without verification.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6673, 0.4780],
#         [0.6673, 1.0000, 0.4691],
#         [0.4780, 0.4691, 1.0000]])
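For semantic search, row 0 of the similarity matrix scores the question against every text, and sorting that row gives the retrieval order. The sketch below reuses the scores printed above; they are example output, so your values may differ slightly. `rank_candidates` is a hypothetical helper, not part of Sentence Transformers.

```python
def rank_candidates(similarity_row, exclude_index=None):
    """Return (index, score) pairs sorted from most to least similar."""
    pairs = [(i, s) for i, s in enumerate(similarity_row) if i != exclude_index]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Row 0 of the similarity matrix printed above: the question vs. each sentence.
query_row = [1.0000, 0.6673, 0.4780]

# Skip index 0, which is the question compared with itself.
ranking = rank_candidates(query_row, exclude_index=0)
print(ranking)  # [(1, 0.6673), (2, 0.478)]
```

The Greek Penal Code passage on fraud (index 1) ranks above the spear-phishing text (index 2), as expected for a question about the time of commission of fraud. With the tensor returned by `model.similarity`, `torch.argsort(similarities[0], descending=True)` gives the same order, with index 0 (the question itself) first.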

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.5238
cosine_accuracy@3 0.5238
cosine_accuracy@5 0.5238
cosine_accuracy@10 0.619
cosine_precision@1 0.5238
cosine_precision@3 0.5079
cosine_precision@5 0.4667
cosine_precision@10 0.4429
cosine_recall@1 0.0822
cosine_recall@3 0.2228
cosine_recall@5 0.2959
cosine_recall@10 0.4766
cosine_ndcg@10 0.5598
cosine_mrr@10 0.5374
cosine_map@100 0.6534

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.5238
cosine_accuracy@3 0.5238
cosine_accuracy@5 0.5238
cosine_accuracy@10 0.619
cosine_precision@1 0.5238
cosine_precision@3 0.5079
cosine_precision@5 0.4667
cosine_precision@10 0.4429
cosine_recall@1 0.0822
cosine_recall@3 0.2228
cosine_recall@5 0.2959
cosine_recall@10 0.4766
cosine_ndcg@10 0.5598
cosine_mrr@10 0.5374
cosine_map@100 0.6531

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5238
cosine_accuracy@3 0.5238
cosine_accuracy@5 0.5238
cosine_accuracy@10 0.619
cosine_precision@1 0.5238
cosine_precision@3 0.5079
cosine_precision@5 0.4667
cosine_precision@10 0.4429
cosine_recall@1 0.0822
cosine_recall@3 0.2228
cosine_recall@5 0.2959
cosine_recall@10 0.4766
cosine_ndcg@10 0.5598
cosine_mrr@10 0.5374
cosine_map@100 0.6492

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.619
cosine_accuracy@3 0.619
cosine_accuracy@5 0.619
cosine_accuracy@10 0.6667
cosine_precision@1 0.619
cosine_precision@3 0.6032
cosine_precision@5 0.5619
cosine_precision@10 0.519
cosine_recall@1 0.086
cosine_recall@3 0.2342
cosine_recall@5 0.3149
cosine_recall@10 0.5029
cosine_ndcg@10 0.6421
cosine_mrr@10 0.6259
cosine_map@100 0.6976

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.5238
cosine_accuracy@3 0.5238
cosine_accuracy@5 0.5238
cosine_accuracy@10 0.619
cosine_precision@1 0.5238
cosine_precision@3 0.5079
cosine_precision@5 0.4667
cosine_precision@10 0.4429
cosine_recall@1 0.0812
cosine_recall@3 0.2198
cosine_recall@5 0.2909
cosine_recall@10 0.4667
cosine_ndcg@10 0.5598
cosine_mrr@10 0.5374
cosine_map@100 0.6479

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.4286
cosine_accuracy@3 0.4762
cosine_accuracy@5 0.4762
cosine_accuracy@10 0.5714
cosine_precision@1 0.4286
cosine_precision@3 0.4444
cosine_precision@5 0.419
cosine_precision@10 0.3952
cosine_recall@1 0.0544
cosine_recall@3 0.187
cosine_recall@5 0.276
cosine_recall@10 0.437
cosine_ndcg@10 0.4918
cosine_mrr@10 0.458
cosine_map@100 0.5872
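Each table above reports per-query retrieval metrics averaged over the evaluation queries. The sketch below computes them for a single ranked list with binary relevance, following the standard definitions:

- accuracy@k: whether any relevant document appears in the top k.
- precision@k: hits divided by k.
- recall@k: hits divided by the number of relevant documents.
- MRR@k: reciprocal rank of the first hit.
- NDCG@k: log2-discounted gain, divided by the gain of an ideal ordering.
- MAP: average precision over the top results.

It is an illustration, not the code of the Sentence Transformers evaluator, and the document IDs are made up.

In the first table, precision@1 is 0.5238 while recall@1 is only 0.0822. Since recall divides the same hits by the number of relevant documents, the gap suggests that queries typically have several relevant passages.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k=10, map_k=100):
    """Retrieval metrics for one query with binary relevance."""
    relevant = set(relevant_ids)
    hits = [1 if doc in relevant else 0 for doc in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document inside the top k.
    mrr = next((1.0 / rank for rank, hit in enumerate(hits, start=1) if hit), 0.0)

    # NDCG: log2-discounted gain, divided by the gain of an ideal ordering.
    dcg = sum(hit / math.log2(i + 2) for i, hit in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    # Average precision over the top map_k results.
    found, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_ids[:map_k], start=1):
        if doc in relevant:
            found += 1
            precision_sum += found / rank
    avg_precision = precision_sum / min(len(relevant), map_k) if relevant else 0.0

    return {
        f"accuracy@{k}": 1.0 if num_hits else 0.0,
        f"precision@{k}": num_hits / k,
        f"recall@{k}": num_hits / len(relevant) if relevant else 0.0,
        f"mrr@{k}": mrr,
        f"ndcg@{k}": ndcg,
        f"map@{map_k}": avg_precision,
    }


def mean_metrics(per_query):
    """Average each metric over all queries, as the tables above report."""
    return {key: sum(q[key] for q in per_query) / len(per_query) for key in per_query[0]}


# Two hypothetical queries with made-up document IDs.
q1 = ir_metrics(["b", "a", "d", "c"], relevant_ids=["a", "c"], k=3)
q2 = ir_metrics(["x", "y", "z"], relevant_ids=["w"], k=3)
print(mean_metrics([q1, q2]))
```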

Training Details

Training Dataset

Unnamed Dataset

  • Size: 82 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 82 samples:
    Column    Type    Min tokens  Mean tokens  Max tokens
    anchor    string  9           18.17        34
    positive  string  69          399.51       512
  • Samples:
    • anchor: What determines whether the act in question shall be punished if the offender is in the service of the legal holder of the data?
      positive: Everyone who obtains access to data recorded in a computer or in the external memory of a computer or transmitted by telecommunication systems shall be punished with imprisonment for up to six months or by a fine from 29 to 15,000 Euro, under the condition that these acts have been committed without right, especially in violation of prohibitions or of security measures taken by the legal holder. If the act concerns the international relations or the security of the State, he shall be punished according to Article 148.
      If the offender is in the service of the legal holder of the data, the act of the preceding paragraph shall be punished only if it has been explicitly prohibited by internal regulations or by a written decision of the holder or of a competent employee of his.
    • anchor: What must be causally connected to the perpetrator's deceptive acts?
      positive: According to Article 386 paragraph 1 of the Greek Penal Code,

      "Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."

      From these provisions, it follows that, for the crime of fraud to be established, the following elements are required:

      a) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit;

      b) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence detrimental to th...
    • anchor: Who can be punished with imprisonment?
      positive: 1. Anyone who, by knowingly presenting false facts as true or by unlawfully concealing or withholding true facts, damages another person's property by persuading someone to act, omission, or tolerance with the aim of obtaining, for themselves or another, an unlawful financial gain from the damage to that property shall be punished with imprisonment, "and if the damage caused is particularly great, with imprisonment of at least three (3) months and a fine." .
      If the damage caused exceeds a total of one hundred and twenty thousand (120,000) euros, imprisonment of up to ten (10) years and a fine shall be imposed.
      2. If the fraud is directed directly against the legal entity of the Greek State, legal entities governed by public law, or local government organizations, and the damage caused exceeds a total of one hundred and twenty thousand (120,000) euros, a prison sentence of at least ten (10) years and a fine of up to one thousand (1,000) daily units shall be imposed. This offense shall b...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
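With this configuration, MatryoshkaLoss evaluates MultipleNegativesRankingLoss once per listed dimension, on embeddings truncated to their first d values. It then adds the results using the weights shown (all 1); "n_dims_per_step": -1 means every dimension is used at every step. MultipleNegativesRankingLoss treats positives[i] as the correct match for anchors[i] and every other positive in the batch as a negative.

The pure-Python sketch below shows the idea on toy 4-dimensional vectors. The scale of 20 is assumed to be the library's default, since the card does not list it, and all helper names are hypothetical.

```python
import math


def truncate_normalize(vec, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)


def mnrl(anchors, positives, scale=20.0):
    """In-batch softmax cross-entropy: positives[i] is the target for anchors[i],
    and every other positive in the batch serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        # Scaled cosine similarities (inputs are already unit length).
        logits = [scale * sum(x * y for x, y in zip(a, p)) for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_mnrl(anchors, positives, dims, weights, scale=20.0):
    """Weighted sum of MNRL computed on each truncated embedding size."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        a = [truncate_normalize(v, dim) for v in anchors]
        p = [truncate_normalize(v, dim) for v in positives]
        loss += weight * mnrl(a, p, scale)
    return loss


# Toy 4-dimensional embeddings for a batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
matched = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
swapped = [matched[1], matched[0]]

dims, weights = [4, 2], [1, 1]  # the card uses [1024, 768, 512, 256, 128, 64]
print(matryoshka_mnrl(anchors, matched, dims, weights))  # close to 0
print(matryoshka_mnrl(anchors, swapped, dims, weights))  # about 40 (20 per dim)
```

Summing the loss over every size is what trains the leading dimensions to carry most of the signal, which is why the truncated sizes in the Evaluation section remain usable.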
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
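Together, these settings give an effective batch of 8 × 2 = 16 pairs per optimizer step. That assumes per_device_train_batch_size 8 with gradient_accumulation_steps 2 on a single device. The learning rate rises linearly over the first 10% of steps to 2e-05 and then decays along a half cosine.

The sketch below reproduces that schedule shape in pure Python. It mirrors the usual linear-warmup-plus-cosine formula rather than calling the transformers scheduler. The total of 50 optimizer steps is hypothetical, because the card does not state it.

```python
import math


def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: progress runs from 0 at the end of warmup to 1 at the last step.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


effective_batch_size = 8 * 2  # per-device batch size x gradient accumulation steps
total_steps = 50  # hypothetical; not stated in the card
for step in (0, 2, 5, 25, 50):
    print(step, f"{lr_at(step, total_steps):.2e}")
```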

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
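The batch_sampler setting no_duplicates matters for MultipleNegativesRankingLoss. If two pairs in one batch shared a text, that text would also be scored as a negative for an anchor it actually matches.

The sketch below shows a simplified greedy version of such a sampler, which defers any pair that would repeat a text already in the current batch. It illustrates that idea only and is not the Sentence Transformers implementation, which also shuffles indices. The function name is hypothetical.

```python
def no_duplicate_batches(pairs, batch_size):
    """Group (anchor, positive) pairs so that no text appears twice in a batch.
    Pairs that would clash with the current batch are deferred to a later one."""
    remaining = list(range(len(pairs)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)
        remaining = deferred
    return batches


# Pairs 0 and 1 share the positive "p1", so they must land in different batches.
pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p3"), ("q4", "p4")]
print(no_duplicate_batches(pairs, batch_size=2))  # [[0, 2], [1, 3]]
```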

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1818 1 18.029 - - - - - -
0.3636 2 19.4106 - - - - - -
0.5455 3 16.6201 - - - - - -
0.7273 4 15.3048 - - - - - -
0.9091 5 14.0182 - - - - - -
1.0 6 6.4771 - - - - - -
1.0909 7 6.7664 0.6167 0.5821 0.5524 0.5177 0.5278 0.4124
1.1818 8 11.8583 - - - - - -
1.3636 9 11.9216 - - - - - -
1.5455 10 13.3764 - - - - - -
1.7273 11 12.9063 - - - - - -
1.9091 12 13.5984 - - - - - -
2.0 13 7.8523 - - - - - -
2.0909 14 4.4487 0.5921 0.5921 0.5518 0.5709 0.5685 0.5113
2.1818 15 8.5374 - - - - - -
2.3636 16 9.6999 - - - - - -
2.5455 17 9.0121 - - - - - -
2.7273 18 13.5705 - - - - - -
2.9091 19 13.0195 - - - - - -
3.0 20 7.9821 - - - - - -
3.0909 21 3.2842 0.5159 0.5636 0.5468 0.5468 0.5468 0.5233
3.1818 22 4.4446 - - - - - -
3.3636 23 5.7244 - - - - - -
3.5455 24 7.1394 - - - - - -
3.7273 25 16.7583 - - - - - -
3.9091 26 11.3515 - - - - - -
4.0 27 8.813 - - - - - -
4.0909 28 6.9124 0.5159 0.5468 0.4992 0.5468 0.4992 0.4992
4.1818 29 6.1814 - - - - - -
4.3636 30 7.1606 - - - - - -
4.5455 31 5.0888 - - - - - -
4.7273 32 5.0684 - - - - - -
4.9091 33 6.7382 - - - - - -
5.0 34 7.0497 - - - - - -
5.0909 35 6.582 0.5598 0.5598 0.5598 0.6421 0.5598 0.4918
  • The bold row denotes the saved checkpoint.
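The last evaluated row (step 35) has the same NDCG@10 values as the Information Retrieval tables in the Evaluation section, so it can be read as a Matryoshka trade-off. It shows how much NDCG@10 each truncated size keeps relative to the full 1024 dimensions, and how much storage each size needs at 4 bytes per F32 value.

The short calculation below uses only the logged numbers. These are single-run scores, so small differences between dimensions should not be over-interpreted.

```python
# NDCG@10 per embedding size, copied from the step-35 row of the training log.
ndcg_at_step_35 = {1024: 0.5598, 768: 0.5598, 512: 0.5598, 256: 0.6421, 128: 0.5598, 64: 0.4918}

full = ndcg_at_step_35[1024]
for dim, score in ndcg_at_step_35.items():
    relative = score / full          # share of full-size NDCG@10 retained
    bytes_per_vector = dim * 4       # float32 storage per embedding
    print(f"dim={dim:4d}  ndcg@10={score:.4f}  relative={relative:.1%}  bytes={bytes_per_vector}")
```

At step 35, truncating to 64 dimensions keeps about 88% of the full-size NDCG@10 with one sixteenth of the storage (256 bytes instead of 4,096). The 256-dimension variant scored higher than the 1024-dimension one on this evaluation.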

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.51.3
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.6B params (Safetensors, tensor type F32)

Model repository: IoannisKat1/multilingual-e5-large-new (finetuned from intfloat/multilingual-e5-large)
