DungHugging/bge-m3-banking-v2

Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
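Concretely, the pipeline above feeds token embeddings from the transformer into the Pooling module, which keeps only the first ([CLS]) token vector (`pooling_mode_cls_token: True`), and then L2-normalizes the result. A minimal numpy sketch with dummy data (a stand-in array, not real transformer output):

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Sketch of the Pooling + Normalize stages above:
    pooling_mode_cls_token=True keeps the first ([CLS]) token vector,
    and Normalize() rescales it to unit L2 norm."""
    cls_vector = token_embeddings[0]                # (1024,) first-token embedding
    return cls_vector / np.linalg.norm(cls_vector)  # unit-length sentence embedding

# Dummy "transformer output": 7 tokens x 1024 dims (stand-in for the real model)
tokens = np.random.default_rng(0).normal(size=(7, 1024))
emb = cls_pool_and_normalize(tokens)
print(emb.shape, round(float(np.linalg.norm(emb)), 6))  # (1024,) 1.0
```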
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DungHugging/bge-m3-banking-v2")

# Run inference (Vietnamese banking sentences; English glosses in comments)
sentences = [
    # "no reward points are earned for cash withdrawal transactions"
    'không tích điểm thưởng cho giao dịch rút tiền mặt',
    # "ATM cash withdrawals are excluded from the promotion program"
    'giao dịch rút tiền tại ATM bị loại trừ khỏi chương trình ưu đãi',
    # "Salmonoid specializes in seafood dishes; there is no beef on the menu" (off-topic distractor)
    'Salmonoid chuyên phục vụ các món hải sản, không có món thịt bò trong menu.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8485, 0.6857],
#         [0.8485, 1.0000, 0.6427],
#         [0.6857, 0.6427, 1.0000]])
```
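Because the model ends with a `Normalize()` module, its outputs are unit vectors, so cosine similarity reduces to a plain matrix product of the embeddings. A small numpy sketch with stand-in unit vectors (not real model outputs):

```python
import numpy as np

# Stand-in for `model.encode(...)` output: 3 unit-normalized 1024-d vectors
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 1024))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# For unit vectors, cosine similarity is just the dot product, so the
# full pairwise similarity matrix is a single matrix multiplication.
similarities = embeddings @ embeddings.T
print(np.allclose(np.diag(similarities), 1.0))   # True: each vector vs itself
print(np.allclose(similarities, similarities.T)) # True: the matrix is symmetric
```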
**Semantic Similarity** (datasets `spec_sim` and `final_evaluation`, evaluated with `EmbeddingSimilarityEvaluator`)

| Metric | spec_sim | final_evaluation |
|---|---|---|
| pearson_cosine | 0.4669 | 0.4669 |
| spearman_cosine | 0.4602 | 0.4602 |
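For context, `pearson_cosine` and `spearman_cosine` are the Pearson and Spearman correlations between the model's cosine scores and the gold similarity labels. A minimal sketch with toy values (illustrative only, not the actual evaluation data); the rank-based Spearman here assumes no ties:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance of x and y over the product of their norms."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks (no-ties case)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

gold = [0.9, 0.1, 0.8, 0.4, 0.95]       # toy gold similarity labels
pred = [0.91, 0.35, 0.80, 0.55, 0.88]   # toy model cosine scores
print(round(pearson(gold, pred), 4), round(spearman(gold, pred), 4))
```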
**Binary Classification** (dataset `spec_bin`, evaluated with `BinaryClassificationEvaluator`)

| Metric | Value |
|---|---|
| cosine_accuracy | 0.7297 |
| cosine_accuracy_threshold | 0.8031 |
| cosine_f1 | 0.7561 |
| cosine_f1_threshold | 0.8026 |
| cosine_precision | 0.7209 |
| cosine_recall | 0.7949 |
| cosine_ap | 0.7603 |
| cosine_mcc | 0.4574 |
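These metrics come from thresholding cosine scores: pairs scoring at or above the selected threshold (0.8031 above for accuracy) are predicted as similar. A sketch of the computation with toy scores and labels (illustrative, not the actual `spec_bin` data):

```python
import numpy as np

def threshold_metrics(scores, labels, threshold):
    """Accuracy / precision / recall / F1 when pairs whose cosine score
    reaches `threshold` are predicted positive (i.e. similar)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pred = scores >= threshold
    tp = int(np.sum(pred & (labels == 1)))
    fp = int(np.sum(pred & (labels == 0)))
    fn = int(np.sum(~pred & (labels == 1)))
    tn = int(np.sum(~pred & (labels == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy cosine scores and gold labels (illustrative, not the eval set)
scores = [0.92, 0.85, 0.78, 0.66, 0.81, 0.60]
labels = [1, 1, 1, 0, 0, 0]
print(threshold_metrics(scores, labels, threshold=0.8031))
```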
**Training Dataset** (columns: `sentence_0`, `sentence_1`, `label`)

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| HSSV từ 20 tuổi dùng Combo Hi-Tek được miễn phí nếu số dư bình quân đạt 500.000 VND. | Điều kiện miễn phí cho sinh viên trên 20 tuổi là duy trì số dư bình quân từ 500.000 VND. | 1.0 |
| vay tín chấp dựa trên lịch sử tín dụng (CIC) | vay cầm cố dựa trên số dư sổ tiết kiệm | 0.0 |
| tất toán sổ tiết kiệm từng phần linh hoạt | rút gốc từng phần sẽ làm mất toàn bộ lãi suất của sổ | 0.0 |
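The samples above can be represented as plain (sentence_0, sentence_1, label) triples, with label 1.0 for paraphrases and 0.0 for unrelated or contradictory pairs. A small sketch; the English glosses in the comments are ours, added for readability:

```python
# Each training example: two Vietnamese banking sentences plus a float label
# (1.0 = same meaning, 0.0 = different / contradictory).
train_pairs = [
    # "Students 20+ using the Hi-Tek Combo get it free if the average
    #  balance reaches 500,000 VND" vs. an equivalent paraphrase
    ("HSSV từ 20 tuổi dùng Combo Hi-Tek được miễn phí nếu số dư bình quân đạt 500.000 VND.",
     "Điều kiện miễn phí cho sinh viên trên 20 tuổi là duy trì số dư bình quân từ 500.000 VND.",
     1.0),
    # unsecured loan based on credit history (CIC) vs. a loan pledged
    # against a savings-book balance: different products, label 0.0
    ("vay tín chấp dựa trên lịch sử tín dụng (CIC)",
     "vay cầm cố dựa trên số dư sổ tiết kiệm",
     0.0),
]

positives = [p for p in train_pairs if p[2] == 1.0]
print(len(train_pairs), len(positives))  # 2 1
```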
**Loss**: `OnlineContrastiveLoss`

**Non-default hyperparameters**:
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

**All hyperparameters**:
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

**Training Logs**

| Epoch | Step | Training Loss | spec_sim_spearman_cosine | spec_bin_cosine_ap | final_evaluation_spearman_cosine |
|---|---|---|---|---|---|
| 0.4970 | 83 | - | -0.1803 | 0.4536 | - |
| 0.9940 | 166 | - | -0.0670 | 0.4908 | - |
| 1.0 | 167 | - | -0.0659 | 0.4914 | - |
| 1.4910 | 249 | - | 0.0642 | 0.5423 | - |
| 1.9880 | 332 | - | 0.1309 | 0.5742 | - |
| 2.0 | 334 | - | 0.1322 | 0.5750 | - |
| 2.4850 | 415 | - | 0.2133 | 0.6127 | - |
| 2.9820 | 498 | - | 0.2749 | 0.6459 | - |
| 2.9940 | 500 | 0.9824 | - | - | - |
| 3.0 | 501 | - | 0.2777 | 0.6468 | - |
| 3.4790 | 581 | - | 0.3197 | 0.6726 | - |
| 3.9760 | 664 | - | 0.3543 | 0.6954 | - |
| 4.0 | 668 | - | 0.3554 | 0.6960 | - |
| 4.4731 | 747 | - | 0.3734 | 0.7081 | - |
| 4.9701 | 830 | - | 0.3955 | 0.7203 | - |
| 5.0 | 835 | - | 0.3963 | 0.7207 | - |
| 5.4671 | 913 | - | 0.4037 | 0.7250 | - |
| 5.9641 | 996 | - | 0.4241 | 0.7372 | - |
| 5.9880 | 1000 | 0.6612 | - | - | - |
| 6.0 | 1002 | - | 0.4259 | 0.7381 | - |
| 6.4611 | 1079 | - | 0.4328 | 0.7414 | - |
| 6.9581 | 1162 | - | 0.4393 | 0.7452 | - |
| 7.0 | 1169 | - | 0.4398 | 0.7454 | - |
| 7.4551 | 1245 | - | 0.4466 | 0.7530 | - |
| 7.9521 | 1328 | - | 0.4520 | 0.7562 | - |
| 8.0 | 1336 | - | 0.4527 | 0.7566 | - |
| 8.4491 | 1411 | - | 0.4573 | 0.7592 | - |
| 8.9461 | 1494 | - | 0.4590 | 0.7599 | - |
| 8.9820 | 1500 | 0.5747 | - | - | - |
| 9.0 | 1503 | - | 0.4586 | 0.7596 | - |
| 9.4431 | 1577 | - | 0.4597 | 0.7600 | - |
| 9.9401 | 1660 | - | 0.4602 | 0.7603 | - |
| 10.0 | 1670 | - | 0.4602 | 0.7603 | - |
| -1 | -1 | - | - | - | 0.4602 |
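The model is trained with `OnlineContrastiveLoss`. As a rough sketch, the underlying contrastive objective penalizes positive pairs by their squared distance and pushes negative pairs beyond a margin; the online variant additionally keeps only the hard pairs in each batch. The margin value, distance metric, and constant factors below are illustrative assumptions, not the library's exact implementation:

```python
import numpy as np

def contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss on pair distances d (e.g. 1 - cosine similarity):
    positives (label 1) contribute d^2, negatives (label 0) contribute
    max(0, margin - d)^2. Constant factors omitted for clarity."""
    d, y = np.asarray(distances, float), np.asarray(labels, float)
    per_pair = y * d**2 + (1 - y) * np.maximum(0.0, margin - d)**2
    return float(per_pair.mean())

def online_hard_pairs(distances, labels):
    """Hard-pair selection in the spirit of the online variant (assumed
    simplification): keep negatives closer than the farthest positive and
    positives farther than the closest negative."""
    d, y = np.asarray(distances, float), np.asarray(labels)
    pos, neg = d[y == 1], d[y == 0]
    return pos[pos > neg.min()], neg[neg < pos.max()]

# Toy batch: distances for 3 positive and 3 negative pairs
dists  = np.array([0.10, 0.40, 0.25, 0.30, 0.70, 0.90])
labels = np.array([1,    1,    1,    0,    0,    0])
print(round(contrastive_loss(dists, labels), 4))  # 0.0454
hard_pos, hard_neg = online_hard_pairs(dists, labels)
print(hard_pos.tolist(), hard_neg.tolist())  # [0.4] [0.3]
```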
**Citation (BibTeX)**

```
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Base model: BAAI/bge-m3