BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Model on Hugging Face: https://huggingface.co/ThuyNT03/bge-base-financial-matryoshka

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
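
Because the module stack ends with Normalize(), every embedding is unit length, so cosine similarity and dot product give identical scores. A quick check (a sketch, not part of the original card):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ThuyNT03/bge-base-financial-matryoshka")
# CLS pooling followed by Normalize() returns unit-length vectors
emb = model.encode(["What does 'Part IV hereof' refer to?"])
print(np.linalg.norm(emb[0]))
# ~1.0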

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Net write-off rate – principal only – Represents the amount of proprietary consumer or small business Card Member loans or receivables written off, consisting of principal (resulting from authorized transactions), less recoveries, as a percentage of the average loan or receivable balance during the period.',
    "What does the term 'net write-off rate – principal only' refer to?",
    "What operating system is used for the Company's iPhone line?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9044, 0.1858],
#         [0.9044, 1.0000, 0.1715],
#         [0.1858, 0.1715, 1.0000]])
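
Because the model was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest quality drop (see the per-dimension results under Evaluation). A sketch using the library's truncate_dim argument:

from sentence_transformers import SentenceTransformer

# Load the model with embeddings truncated to 256 dimensions
# (any of 768/512/256/128/64 works for this model)
model = SentenceTransformer("ThuyNT03/bge-base-financial-matryoshka", truncate_dim=256)
embeddings = model.encode(["What does 'net write-off rate' refer to?"])
print(embeddings.shape)
# (1, 256)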

Evaluation

Metrics

Evaluated with InformationRetrievalEvaluator at each Matryoshka dimensionality: the five tables below report results for embeddings truncated to 768, 512, 256, 128, and 64 dimensions, matching the dim_* columns in the Training Logs.

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.69
cosine_accuracy@3 0.82
cosine_accuracy@5 0.8614
cosine_accuracy@10 0.9043
cosine_precision@1 0.69
cosine_precision@3 0.2733
cosine_precision@5 0.1723
cosine_precision@10 0.0904
cosine_recall@1 0.69
cosine_recall@3 0.82
cosine_recall@5 0.8614
cosine_recall@10 0.9043
cosine_ndcg@10 0.797
cosine_mrr@10 0.7627
cosine_map@100 0.7663

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.6814
cosine_accuracy@3 0.8214
cosine_accuracy@5 0.8657
cosine_accuracy@10 0.9086
cosine_precision@1 0.6814
cosine_precision@3 0.2738
cosine_precision@5 0.1731
cosine_precision@10 0.0909
cosine_recall@1 0.6814
cosine_recall@3 0.8214
cosine_recall@5 0.8657
cosine_recall@10 0.9086
cosine_ndcg@10 0.7963
cosine_mrr@10 0.7602
cosine_map@100 0.7635

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.68
cosine_accuracy@3 0.8186
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.91
cosine_precision@1 0.68
cosine_precision@3 0.2729
cosine_precision@5 0.1726
cosine_precision@10 0.091
cosine_recall@1 0.68
cosine_recall@3 0.8186
cosine_recall@5 0.8629
cosine_recall@10 0.91
cosine_ndcg@10 0.7944
cosine_mrr@10 0.7573
cosine_map@100 0.7603

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.6614
cosine_accuracy@3 0.8014
cosine_accuracy@5 0.8471
cosine_accuracy@10 0.8971
cosine_precision@1 0.6614
cosine_precision@3 0.2671
cosine_precision@5 0.1694
cosine_precision@10 0.0897
cosine_recall@1 0.6614
cosine_recall@3 0.8014
cosine_recall@5 0.8471
cosine_recall@10 0.8971
cosine_ndcg@10 0.7792
cosine_mrr@10 0.7415
cosine_map@100 0.7452

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.6514
cosine_accuracy@3 0.7771
cosine_accuracy@5 0.8086
cosine_accuracy@10 0.87
cosine_precision@1 0.6514
cosine_precision@3 0.259
cosine_precision@5 0.1617
cosine_precision@10 0.087
cosine_recall@1 0.6514
cosine_recall@3 0.7771
cosine_recall@5 0.8086
cosine_recall@10 0.87
cosine_ndcg@10 0.7583
cosine_mrr@10 0.7229
cosine_map@100 0.7277
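
Tables like these can be reproduced with the library's InformationRetrievalEvaluator. A minimal sketch with toy placeholder data (the actual query/corpus pairs come from the evaluation split, which is not shown in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ThuyNT03/bge-base-financial-matryoshka")

# Placeholder data: query id -> text, doc id -> text,
# and query id -> set of relevant doc ids
queries = {"q1": "What does 'net write-off rate - principal only' refer to?"}
corpus = {
    "d1": "Net write-off rate - principal only - Represents the amount of ...",
    "d2": "The term 'Part IV hereof' in the document refers to ...",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
results = evaluator(model)
print(results)
# keys like 'dim_768_cosine_ndcg@10', matching the Training Logs columns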

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string, min: 4 tokens, mean: 47.53 tokens, max: 512 tokens
    • anchor: string, min: 8 tokens, mean: 20.3 tokens, max: 45 tokens
  • Samples:
    • positive: The report on the financial statements dated February 16, 2024, was conducted by PricewaterhouseCoopers LLP, a registered public accounting firm.
      anchor: Who conducted the audit of the financial statements reported on February 16, 2024?
    • positive: The term 'Part IV hereof' in the document refers to the section that directly precedes the consolidated financial statements.
      anchor: What does 'Part IV hereof' refer to in the context of a document layout?
    • positive: The Company believes that compensation should be competitive and equitable, and should enable employees to share in the Company’s success. The Company recognizes its people are most likely to thrive when they have the resources to meet their needs and the time and support to succeed in their professional and personal lives. In support of this, the Company offers a wide variety of benefits for employees around the world and invests in tools and resources that are designed to support employees’ individual growth and development.
      anchor: What does Apple believe contributes significantly to employee thriving in professional and personal lives?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
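
In code, the configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss so the same in-batch ranking objective is applied at every truncated dimensionality; a sketch, assuming the base model is loaded first:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],  # weights default to 1 per dim
)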
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
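
These non-default values map directly onto the trainer configuration; a sketch of the corresponding setup (output_dir is a placeholder, and save_strategy is added so that load_best_model_at_end has a matching save/eval cadence):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",  # must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)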

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0812 1 2.3727 - - - - -
0.1624 2 2.2415 - - - - -
0.2437 3 1.936 - - - - -
0.3249 4 1.7652 - - - - -
0.4061 5 1.7747 - - - - -
0.4873 6 1.8633 - - - - -
0.5685 7 1.2185 - - - - -
0.6497 8 1.1714 - - - - -
0.7310 9 1.2446 - - - - -
0.8122 10 0.8898 - - - - -
0.8934 11 0.7339 - - - - -
0.9746 12 0.9125 - - - - -
1.0 13 0.8029 0.7797 0.7832 0.7809 0.7678 0.7293
1.0812 14 0.8589 - - - - -
1.1624 15 0.6242 - - - - -
1.2437 16 0.5551 - - - - -
1.3249 17 0.531 - - - - -
1.4061 18 0.6005 - - - - -
1.4873 19 0.5804 - - - - -
1.5685 20 0.4858 - - - - -
1.6497 21 0.5477 - - - - -
1.7310 22 0.5264 - - - - -
1.8122 23 0.5103 - - - - -
1.8934 24 0.526 - - - - -
1.9746 25 0.5467 - - - - -
2.0 26 0.5616 0.7939 0.7914 0.7920 0.7771 0.7526
2.0812 27 0.3629 - - - - -
2.1624 28 0.5791 - - - - -
2.2437 29 0.3589 - - - - -
2.3249 30 0.4871 - - - - -
2.4061 31 0.4882 - - - - -
2.4873 32 0.4918 - - - - -
2.5685 33 0.5361 - - - - -
2.6497 34 0.3287 - - - - -
2.7310 35 0.4862 - - - - -
2.8122 36 0.4074 - - - - -
2.8934 37 0.3773 - - - - -
2.9746 38 0.5296 - - - - -
3.0 39 0.3996 0.7967 0.796 0.7939 0.78 0.7577
3.0812 40 0.4635 - - - - -
3.1624 41 0.5024 - - - - -
3.2437 42 0.2432 - - - - -
3.3249 43 0.3366 - - - - -
3.4061 44 0.3927 - - - - -
3.4873 45 0.383 - - - - -
3.5685 46 0.4662 - - - - -
3.6497 47 0.4321 - - - - -
3.7310 48 0.4524 - - - - -
3.8122 49 0.4118 - - - - -
3.8934 50 0.3569 - - - - -
3.9746 51 0.4397 - - - - -
4.0 52 0.3315 0.7970 0.7963 0.7944 0.7792 0.7583
  • The epoch 4.0 row (step 52) is the saved checkpoint; its per-dimension ndcg@10 values match the Evaluation tables above.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}