This model builds on Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084).
This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
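The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`): the token embeddings produced by the Transformer are averaged into one 768-dimensional sentence vector, with padding positions masked out. A minimal NumPy sketch of that operation on toy tensors (illustrative shapes only, not the real model's outputs):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, counting only non-padding positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid div-by-zero
    return summed / counts

# Toy example: batch of 1, sequence of 3 tokens (the last is padding), dim 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]] — the padded token is ignored
```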
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DungHugging/mpnet-finetune-full")

# Run inference (Vietnamese examples; English glosses in the comments)
sentences = [
    # "register to receive savings interest monthly instead of at maturity"
    'đăng ký nhận lãi tiết kiệm hàng tháng thay vì cuối kỳ',
    # "choose the Monthly Interest Payout Option for the Savings account"
    'lựa chọn Monthly Interest Payout Option cho tài khoản Savings',
    # "the insurer only pays out once the customer provides sufficient evidence"
    'công ty bảo hiểm chỉ thanh toán khi khách hàng cung cấp đủ bằng chứng',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9182, 0.6618],
#         [0.9182, 1.0000, 0.7091],
#         [0.6618, 0.7091, 1.0000]])
```
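`model.similarity` defaults to cosine similarity for this model. For illustration, the same pairwise scores can be computed directly from the embedding matrix; the vectors below are toy stand-ins for `model.encode` output:

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity, matching the default model.similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T

# Stand-in for model.encode(...) output: 3 vectors of dimension 4
emb = np.array([[1.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
sims = cosine_similarity_matrix(emb)
print(np.round(sims, 4))
# The diagonal is 1.0; each row ranks the other sentences by similarity.
```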
Evaluation with `BinaryClassificationEvaluator` (`mpnet_contrastive_eval`):

| Metric | Value |
|---|---|
| cosine_accuracy | 0.8209 |
| cosine_accuracy_threshold | 0.7716 |
| cosine_f1 | 0.8389 |
| cosine_f1_threshold | 0.7716 |
| cosine_precision | 0.7977 |
| cosine_recall | 0.8846 |
| cosine_ap | 0.8922 |
| cosine_mcc | 0.6429 |
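The `cosine_accuracy_threshold` and `cosine_f1_threshold` values mean the evaluator labels a pair as positive when its cosine similarity is at least about 0.7716. A sketch of that decision rule, applied to the pairwise scores from the inference example:

```python
COSINE_THRESHOLD = 0.7716  # cosine_f1_threshold from the evaluation above

def predict_paraphrase(score, threshold=COSINE_THRESHOLD):
    """Binary decision used by a cosine-threshold pair classifier."""
    return bool(score >= threshold)

# Pairwise scores from the inference example above
scores = [0.9182, 0.6618, 0.7091]
print([predict_paraphrase(s) for s in scores])  # [True, False, False]
```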
The training data consists of `sentence_0`, `sentence_1`, and `label` columns:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Examples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| miễn phí thường niên năm đầu tiên | phí thường niên năm đầu cao gấp đôi các năm sau | 0.0 |
| Tỷ lệ quy đổi là 1 lượt golf đổi được 1 set ăn cho 2 người kèm 2 đồ uống. | Mỗi lượt golf trong tài khoản có thể quy đổi thành một bữa ăn dành cho 02 người bao gồm đồ uống. | 1.0 |
| Hợp đồng kỳ hạn không chuyển giao (Non-Deliverable Forward - NDF). | Vào ngày đáo hạn, hai bên chỉ thanh toán chênh lệch tỷ giá bằng đồng tiền mạnh (thường là USD) thay vì giao nhận vốn gốc. | 1.0 |
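Labels follow the contrastive convention: 1.0 for equivalent pairs, 0.0 for non-matching ones. `OnlineContrastiveLoss` builds on contrastive loss, keeping only the hard positives and hard negatives in each batch. A simplified (non-online) NumPy sketch of the underlying loss, assuming cosine distance and the library's default margin of 0.5:

```python
import numpy as np

def contrastive_loss(cos_sim, label, margin=0.5):
    """Contrastive loss on cosine distance d = 1 - cos_sim.
    Positives (label 1) are pulled together; negatives (label 0)
    are pushed apart until they are at least `margin` away."""
    d = 1.0 - cos_sim
    pos = label * d ** 2
    neg = (1.0 - label) * np.maximum(margin - d, 0.0) ** 2
    return 0.5 * (pos + neg)

# A close positive pair costs little; a close negative pair costs a lot;
# a negative pair already beyond the margin costs nothing.
print(contrastive_loss(0.95, 1.0))  # small — positive pair already close
print(contrastive_loss(0.95, 0.0))  # large — negative pair inside the margin
print(contrastive_loss(0.10, 0.0))  # 0.0 — negative pair past the margin
```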
The model was trained with `OnlineContrastiveLoss`.

Non-default training hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | mpnet_contrastive_eval_cosine_ap |
|---|---|---|---|
| 0.5 | 42 | - | 0.5456 |
| 1.0 | 84 | - | 0.7198 |
| 1.5 | 126 | - | 0.7952 |
| 2.0 | 168 | - | 0.8277 |
| 2.5 | 210 | - | 0.8432 |
| 3.0 | 252 | - | 0.8581 |
| 3.5 | 294 | - | 0.8744 |
| 4.0 | 336 | - | 0.8748 |
| 4.5 | 378 | - | 0.8885 |
| 5.0 | 420 | - | 0.8893 |
| 5.5 | 462 | - | 0.8862 |
| 5.9524 | 500 | 0.8565 | - |
| 6.0 | 504 | - | 0.8847 |
| 6.5 | 546 | - | 0.8916 |
| 7.0 | 588 | - | 0.8942 |
| 7.5 | 630 | - | 0.8916 |
| 8.0 | 672 | - | 0.8907 |
| 8.5 | 714 | - | 0.8897 |
| 9.0 | 756 | - | 0.8918 |
| 9.5 | 798 | - | 0.8926 |
| 10.0 | 840 | - | 0.8922 |
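With `lr_scheduler_type: linear`, `warmup_steps: 0`, and 840 total optimizer steps (10 epochs × 84 steps per epoch, matching the log above), the learning rate decays linearly from 5e-05 to zero. A small sketch of that schedule:

```python
PEAK_LR = 5e-05    # learning_rate from the hyperparameters above
TOTAL_STEPS = 840  # 10 epochs x 84 steps per epoch, per the training log

def linear_lr(step, peak=PEAK_LR, total=TOTAL_STEPS, warmup=0):
    """Learning rate at a given optimizer step under a linear schedule
    with optional warmup (zero warmup steps in this training run)."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))

print(linear_lr(0))    # 5e-05 at the start (no warmup)
print(linear_lr(420))  # 2.5e-05 halfway through
print(linear_lr(840))  # 0.0 at the end
```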
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```