BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
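The three-stage pipeline above (Transformer, CLS-token pooling, normalization) can be sketched in plain NumPy; the token embeddings here are random stand-ins for the BertModel output, not real model activations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the Transformer output: (batch, seq_len, hidden) token embeddings
token_embeddings = rng.normal(size=(2, 512, 768))

# Pooling with pooling_mode_cls_token=True: keep only the first ([CLS]) token
sentence_embeddings = token_embeddings[:, 0, :]

# Normalize(): scale each vector to unit L2 norm, so a dot product is a cosine similarity
sentence_embeddings /= np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)

print(sentence_embeddings.shape)  # (2, 768)
```

Because of the final Normalize() module, downstream code can score pairs with a plain dot product.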

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zkdtckk/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'How much cash collateral did AT&T receive on a net basis during 2023?',
    'During 2023, we received approximately $220 of cash collateral, on a net basis.',
    'NIKE Direct revenues increased 22%, driven by digital sales growth of 23% and comparable store sales growth of 28%.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.6843
cosine_accuracy@3 0.8414
cosine_accuracy@5 0.8686
cosine_accuracy@10 0.9057
cosine_precision@1 0.6843
cosine_precision@3 0.2805
cosine_precision@5 0.1737
cosine_precision@10 0.0906
cosine_recall@1 0.6843
cosine_recall@3 0.8414
cosine_recall@5 0.8686
cosine_recall@10 0.9057
cosine_ndcg@10 0.8004
cosine_mrr@10 0.7661
cosine_map@100 0.7696

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.6914
cosine_accuracy@3 0.8371
cosine_accuracy@5 0.8671
cosine_accuracy@10 0.9043
cosine_precision@1 0.6914
cosine_precision@3 0.279
cosine_precision@5 0.1734
cosine_precision@10 0.0904
cosine_recall@1 0.6914
cosine_recall@3 0.8371
cosine_recall@5 0.8671
cosine_recall@10 0.9043
cosine_ndcg@10 0.8021
cosine_mrr@10 0.7689
cosine_map@100 0.7728

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.6786
cosine_accuracy@3 0.8243
cosine_accuracy@5 0.8657
cosine_accuracy@10 0.9071
cosine_precision@1 0.6786
cosine_precision@3 0.2748
cosine_precision@5 0.1731
cosine_precision@10 0.0907
cosine_recall@1 0.6786
cosine_recall@3 0.8243
cosine_recall@5 0.8657
cosine_recall@10 0.9071
cosine_ndcg@10 0.7957
cosine_mrr@10 0.7597
cosine_map@100 0.7632

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.6886
cosine_accuracy@3 0.8086
cosine_accuracy@5 0.8571
cosine_accuracy@10 0.9
cosine_precision@1 0.6886
cosine_precision@3 0.2695
cosine_precision@5 0.1714
cosine_precision@10 0.09
cosine_recall@1 0.6886
cosine_recall@3 0.8086
cosine_recall@5 0.8571
cosine_recall@10 0.9
cosine_ndcg@10 0.794
cosine_mrr@10 0.7601
cosine_map@100 0.7642

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.65
cosine_accuracy@3 0.8014
cosine_accuracy@5 0.8357
cosine_accuracy@10 0.8871
cosine_precision@1 0.65
cosine_precision@3 0.2671
cosine_precision@5 0.1671
cosine_precision@10 0.0887
cosine_recall@1 0.65
cosine_recall@3 0.8014
cosine_recall@5 0.8357
cosine_recall@10 0.8871
cosine_ndcg@10 0.7703
cosine_mrr@10 0.7327
cosine_map@100 0.7368
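With exactly one relevant document per query, as in this dataset's anchor/positive pairs, accuracy@k equals recall@k and precision@k is accuracy@k divided by k, which is why the tables above repeat those values. A minimal sketch of the per-query computation over a toy ranked list (hypothetical IDs, not the actual evaluation set):

```python
def ir_metrics(ranked_ids, relevant_id, k):
    """Metrics for a single query that has exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    accuracy = 1.0 if hit else 0.0          # accuracy@k: relevant doc in top k?
    precision = (1.0 / k) if hit else 0.0   # precision@k: 1 relevant out of k retrieved
    recall = accuracy                       # recall@k: the single relevant doc found or not
    rr = 1.0 / (top_k.index(relevant_id) + 1) if hit else 0.0  # reciprocal rank for MRR
    return accuracy, precision, recall, rr

# Toy example: the relevant doc "d2" is ranked second
acc, prec, rec, rr = ir_metrics(["d7", "d2", "d5"], "d2", k=3)
print(acc, prec, rec, rr)  # accuracy 1.0, precision 1/3, recall 1.0, RR 0.5
```

The reported figures are these per-query values averaged over all evaluation queries.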

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 7, mean 20.49, max 45 tokens
    • positive: string; min 7, mean 45.43, max 246 tokens
  • Samples:
    • anchor: How are changes in estimates reflected in financial statements?
      positive: Changes in estimates are reflected in our financial statements in the period of change based upon on-going actual experience trends or subsequent settlements and realizations depending on the nature and predictability of the estimates and contingencies.
    • anchor: What was the amount of water consumed per square meter at Hilton's properties in 2023?
      positive: In 2023, Hilton's properties consumed 0.536 cubic meters of water per square meter.
    • anchor: How much income tax benefit did HP receive from the US tax return filing in fiscal 2021?
      positive: HP gained $12 million of income tax benefits as a result of the fiscal 2021 U.S. tax return filing primarily from the decrease in Global Intangible Low Taxed Income.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
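These parameters mean the same MultipleNegativesRankingLoss (in-batch softmax cross-entropy over scaled cosine similarities) is applied to each truncated prefix of the embedding, with all five dimensionalities weighted equally. A NumPy sketch of that weighted sum, with random vectors standing in for model outputs and an assumed similarity scale of 20 (the sentence-transformers default for this loss):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """MultipleNegativesRankingLoss: anchor i should match positive i;
    all other in-batch positives act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy; diagonal entries are the targets

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss over truncated prefixes of the embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
print(matryoshka_loss(anchors, anchors))        # aligned pairs: low loss
print(matryoshka_loss(anchors, anchors[::-1]))  # mismatched pairs: high loss
```

Training every prefix jointly is what makes the truncated embeddings in the Evaluation section usable on their own.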
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
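These settings imply an effective batch size of 4 × 16 = 64, and with 6,300 training samples that works out to the 99 optimizer steps per epoch and 396 total steps visible in the training logs below, with roughly 40 warmup steps. The arithmetic:

```python
import math

dataset_size = 6_300                 # training samples (see Training Dataset)
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(dataset_size / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size, steps_per_epoch, total_steps, warmup_steps)
# 64 99 396 40
```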

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1016 10 4.6759 - - - - -
0.2032 20 5.2962 - - - - -
0.3048 30 3.5634 - - - - -
0.4063 40 2.4647 - - - - -
0.5079 50 2.4715 - - - - -
0.6095 60 1.8329 - - - - -
0.7111 70 1.3754 - - - - -
0.8127 80 2.2498 - - - - -
0.9143 90 1.4359 - - - - -
1.0 99 - 0.7990 0.7996 0.7931 0.7841 0.7536
1.0102 100 1.1898 - - - - -
1.1117 110 1.3344 - - - - -
1.2133 120 0.9468 - - - - -
1.3149 130 1.3376 - - - - -
1.4165 140 1.0253 - - - - -
1.5181 150 1.0209 - - - - -
1.6197 160 0.9905 - - - - -
1.7213 170 0.4743 - - - - -
1.8229 180 1.0679 - - - - -
1.9244 190 1.1084 - - - - -
2.0 198 - 0.8007 0.8026 0.7984 0.7940 0.7640
2.0203 200 0.4729 - - - - -
2.1219 210 0.956 - - - - -
2.2235 220 0.7895 - - - - -
2.3251 230 0.5689 - - - - -
2.4267 240 1.4485 - - - - -
2.5283 250 0.5067 - - - - -
2.6298 260 0.577 - - - - -
2.7314 270 1.1618 - - - - -
2.8330 280 0.7196 - - - - -
2.9346 290 0.3933 - - - - -
3.0 297 - 0.8019 0.8022 0.7945 0.7929 0.7701
3.0305 300 1.246 - - - - -
3.1321 310 0.5745 - - - - -
3.2337 320 1.0934 - - - - -
3.3352 330 0.9014 - - - - -
3.4368 340 0.2902 - - - - -
3.5384 350 0.2325 - - - - -
3.6400 360 0.5165 - - - - -
3.7416 370 1.1044 - - - - -
3.8432 380 0.583 - - - - -
3.9448 390 0.346 - - - - -
4.0 396 - 0.8004 0.8021 0.7957 0.7940 0.7703
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}