SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
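The last two modules are simple to reason about: with pooling_mode_cls_token enabled, the sentence embedding is the transformer's [CLS] token vector, and Normalize() L2-normalises it so cosine similarity reduces to a dot product. A minimal numpy sketch of what modules (1) and (2) do, using toy 4-dimensional vectors instead of the real 1024-dimensional ones:

```python
import numpy as np

# Toy stand-in for the transformer's per-token output (hidden size shrunk to 4)
token_embeddings = np.array([
    [1.0, 2.0, 2.0, 4.0],   # [CLS] token embedding
    [0.5, 0.1, 0.3, 0.9],   # token 1
    [0.2, 0.4, 0.6, 0.8],   # token 2
])

# Pooling with pooling_mode_cls_token=True: take the [CLS] vector as-is
sentence_embedding = token_embeddings[0]

# Normalize(): L2-normalise so cosine similarity becomes a plain dot product
normalized = sentence_embedding / np.linalg.norm(sentence_embedding)
print(np.linalg.norm(normalized))  # ≈ 1.0
```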

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DungHugging/bge-m3-banking-v2")
# Run inference
sentences = [
    'không tích điểm thưởng cho giao dịch rút tiền mặt',  # "no reward points for cash-withdrawal transactions"
    'giao dịch rút tiền tại ATM bị loại trừ khỏi chương trình ưu đãi',  # "ATM withdrawals are excluded from the promotion"
    'Salmonoid chuyên phục vụ các món hải sản, không có món thịt bò trong menu.',  # "Salmonoid serves seafood; there is no beef on the menu."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8485, 0.6857],
#         [0.8485, 1.0000, 0.6427],
#         [0.6857, 0.6427, 1.0000]])
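Because the final Normalize() module L2-normalises every embedding, cosine similarity (the model's similarity function) is effectively a matrix product, which is also how you would rank corpus entries against a query in semantic search. A minimal sketch with hypothetical toy vectors standing in for `model.encode()` output:

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Hypothetical toy vectors standing in for real 1024-dim embeddings
query = np.array([[1.0, 0.0]])
corpus = np.array([[0.9, 0.1], [0.0, 1.0]])

scores = cosine_similarity_matrix(query, corpus)
best = int(np.argmax(scores))
print(best)  # 0 — the first corpus entry is closest to the query
```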

Evaluation

Metrics

Semantic Similarity

Metric spec_sim final_evaluation
pearson_cosine 0.4669 0.4669
spearman_cosine 0.4602 0.4602
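pearson_cosine and spearman_cosine are the Pearson and Spearman correlations between the model's cosine similarities and the gold similarity labels. A self-contained sketch of the Spearman side, with hypothetical scores and no ties assumed:

```python
def rank(values):
    """Rank positions (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman rho via the classic formula 1 - 6*sum(d^2) / (n*(n^2-1))."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical gold similarity labels vs. predicted cosine similarities
gold = [1.0, 0.8, 0.2, 0.0]
pred = [0.95, 0.85, 0.40, 0.10]
print(spearman(gold, pred))  # 1.0 — the rankings agree perfectly
```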

Binary Classification

Metric Value
cosine_accuracy 0.7297
cosine_accuracy_threshold 0.8031
cosine_f1 0.7561
cosine_f1_threshold 0.8026
cosine_precision 0.7209
cosine_recall 0.7949
cosine_ap 0.7603
cosine_mcc 0.4574
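The thresholds in this table define the decision rule the evaluator found optimal: a pair is predicted similar when its cosine similarity reaches roughly 0.80. A minimal sketch using the reported cosine_accuracy_threshold (the embedding values below are hypothetical):

```python
import numpy as np

THRESHOLD = 0.8031  # cosine_accuracy_threshold from the table above

def predict_match(emb_a, emb_b, threshold=THRESHOLD):
    """Predict whether two already-L2-normalised embeddings form a similar pair."""
    return float(np.dot(emb_a, emb_b)) >= threshold

# Hypothetical unit-norm embeddings for illustration
a = np.array([1.0, 0.0])
b = np.array([0.9, np.sqrt(1 - 0.81)])  # cosine similarity with a is 0.9
print(predict_match(a, b))  # True — 0.9 clears the threshold
```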

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,665 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
                 sentence_0          sentence_1          label
    type         string              string              float
    details      min: 6 tokens       min: 8 tokens       min: 0.0
                 mean: 14.98 tokens  mean: 18.38 tokens  mean: 0.5
                 max: 28 tokens      max: 55 tokens      max: 1.0
  • Samples:
    • sentence_0: HSSV từ 20 tuổi dùng Combo Hi-Tek được miễn phí nếu số dư bình quân đạt 500.000 VND.
      sentence_1: Điều kiện miễn phí cho sinh viên trên 20 tuổi là duy trì số dư bình quân từ 500.000 VND.
      label: 1.0
    • sentence_0: vay tín chấp dựa trên lịch sử tín dụng (CIC)
      sentence_1: vay cầm cố dựa trên số dư sổ tiết kiệm
      label: 0.0
    • sentence_0: tất toán sổ tiết kiệm từng phần linh hoạt
      sentence_1: rút gốc từng phần sẽ làm mất toàn bộ lãi suất của sổ
      label: 0.0
  • Loss: OnlineContrastiveLoss
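OnlineContrastiveLoss is a contrastive loss computed only on the hard pairs in each batch: positive pairs that are farther apart than the closest negative pair, and negative pairs that are closer than the farthest positive pair. A simplified numpy sketch of that selection logic (the actual implementation in sentence-transformers operates on torch tensors):

```python
import numpy as np

def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss restricted to 'hard' pairs.

    distances: per-pair embedding distances; labels: 1 = similar, 0 = dissimilar.
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    pos, neg = distances[labels == 1], distances[labels == 0]

    # Hard positives: farther apart than the closest negative pair
    hard_pos = pos[pos > neg.min()] if len(neg) else pos
    # Hard negatives: closer together than the farthest positive pair
    hard_neg = neg[neg < pos.max()] if len(pos) else neg

    pos_loss = np.sum(hard_pos ** 2)                          # pull positives together
    neg_loss = np.sum(np.clip(margin - hard_neg, 0, None) ** 2)  # push negatives apart
    return pos_loss + neg_loss

# Toy batch: only the 0.4 positive and the 0.2 negative count as hard
loss = online_contrastive_loss([0.1, 0.4, 0.2, 0.6], labels=[1, 1, 0, 0])
print(loss)  # ≈ 0.25 (0.4**2 + (0.5 - 0.2)**2)
```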

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss spec_sim_spearman_cosine spec_bin_cosine_ap final_evaluation_spearman_cosine
0.4970 83 - -0.1803 0.4536 -
0.9940 166 - -0.0670 0.4908 -
1.0 167 - -0.0659 0.4914 -
1.4910 249 - 0.0642 0.5423 -
1.9880 332 - 0.1309 0.5742 -
2.0 334 - 0.1322 0.5750 -
2.4850 415 - 0.2133 0.6127 -
2.9820 498 - 0.2749 0.6459 -
2.9940 500 0.9824 - - -
3.0 501 - 0.2777 0.6468 -
3.4790 581 - 0.3197 0.6726 -
3.9760 664 - 0.3543 0.6954 -
4.0 668 - 0.3554 0.6960 -
4.4731 747 - 0.3734 0.7081 -
4.9701 830 - 0.3955 0.7203 -
5.0 835 - 0.3963 0.7207 -
5.4671 913 - 0.4037 0.7250 -
5.9641 996 - 0.4241 0.7372 -
5.9880 1000 0.6612 - - -
6.0 1002 - 0.4259 0.7381 -
6.4611 1079 - 0.4328 0.7414 -
6.9581 1162 - 0.4393 0.7452 -
7.0 1169 - 0.4398 0.7454 -
7.4551 1245 - 0.4466 0.7530 -
7.9521 1328 - 0.4520 0.7562 -
8.0 1336 - 0.4527 0.7566 -
8.4491 1411 - 0.4573 0.7592 -
8.9461 1494 - 0.4590 0.7599 -
8.9820 1500 0.5747 - - -
9.0 1503 - 0.4586 0.7596 -
9.4431 1577 - 0.4597 0.7600 -
9.9401 1660 - 0.4602 0.7603 -
10.0 1670 - 0.4602 0.7603 -
-1 -1 - - - 0.4602

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}