SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
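
These properties can be verified programmatically once the model is loaded; a minimal sketch using the standard Sentence Transformers accessors (the expected values in the comments mirror the list above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("atx-labs/bge-base-en-cfr")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # cosine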

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
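
The Pooling configuration above amounts to mean pooling over token embeddings (the CLS, max, weighted-mean, and last-token modes are all disabled). A minimal sketch for confirming this on the loaded modules:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("atx-labs/bge-base-en-cfr")
# Index into the module list: (0) Transformer, (1) Pooling.
pooling = model[1]
print(pooling.get_pooling_mode_str())  # mean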

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("atx-labs/bge-base-en-cfr")
# Run inference
sentences = [
    '[SECTION HEADING] § 24.6 Performance appraisal system. The members of the Service shall be subject to a performance appraisal system that is designed to encourage excellence in performance and shall provide for periodic and systematic assessment of the performance of members.',
    '[SECTION HEADING] § 24.6 Performance appraisal system. The members of the Service shall be subject to a performance appraisal system that is designed to encourage excellence in performance and shall provide for periodic and systematic assessment of the performance of members.',
    '[SUBSECTION a] General rules.. (1) An HMO or CMP that has an APCRP (as determined under § 417.590) greater than its ACR (as determined under § 417.594) must elect one of the options specified in paragraph (b) of this section. [CLAUSE 2] The dollar value of the elected option must, over the course of a contract period, be at least equal to the difference between the APCRP and the proposed ACR.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 0.6782],
#         [1.0000, 1.0000, 0.6782],
#         [0.6782, 0.6782, 1.0000]])
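
Beyond pairwise similarity, the same embeddings can drive semantic search over a document collection. A minimal sketch using util.semantic_search; the corpus and query strings are illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("atx-labs/bge-base-en-cfr")

# Illustrative corpus; in practice these would be full CFR sections.
corpus = [
    "The members of the Service shall be subject to a performance appraisal system.",
    "CMS may terminate any approved agreement after the described procedures are followed.",
    "The post-TDAPA add-on payment adjustment equals 65 percent of the calculated amount.",
]
query = "How is employee performance evaluated?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns one ranked hit list per query; each hit is a dict with corpus_id and score.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")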

Training Details

Training Dataset

Unnamed Dataset

  • Size: 24,880 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
                   sentence_0   sentence_1
    type           string       string
    min tokens     35           37
    mean tokens    105.52       112.72
    max tokens     335          335
  • Samples:
    Sample 1
      sentence_0: [ITEM i] The name and TIN of the CJR collaborator and the name, TIN, and NPI of the collaboration agent. [ITEM ii] The start date and, if applicable, end date, for the distribution arrangement between the CJR collaborator and the collaboration agent. [ENUM Downstream collaboration agents.] (3) For each physician, nonphysician practitioner, or therapist who is a downstream collaboration agent during the period of the CJR performance year specified by CMS— [ITEM i] The name and TIN of the CJR collaborator and the [MASK] TIN, and NPI of the downstream collaboration
      sentence_1: [ITEM i] The name and TIN of the CJR collaborator and the name, TIN, and NPI of the collaboration agent. [ITEM ii] The start date and, if applicable, end date, for the distribution arrangement between the CJR collaborator and the collaboration agent. [ENUM Downstream collaboration agents.] (3) For each physician, nonphysician practitioner, or therapist who is a downstream collaboration agent during the period of the CJR performance year specified by CMS— [ITEM i] The name and TIN of the CJR collaborator and the name and TIN of the collaboration agent and the name, TIN, and NPI of the downstream collaboration
    Sample 2
      sentence_0: [SUBSECTION a] Termination of agreements.. (1) CMS may terminate any approved agreement if it finds, after the procedures described in this paragraph are followed that the State system does not satisfactorily meet the requirements of section 1886(c) of the Act or the regulations in this subpart. A termination must be effective on the last day of a calendar quarter. [CLAUSE 2] CMS will give the State reasonable notice of the proposed termination of an agreement [MASK] days before the effective date of the termination. [CLAUSE 3] CMS will give the State the opportunity to present evidence to refute the finding.
      sentence_1: [SUBSECTION a] Termination of agreements.. (1) CMS may terminate any approved agreement if it finds, after the procedures described in this paragraph are followed that the State system does not satisfactorily meet the requirements of section 1886(c) of the Act or the regulations in this subpart. A termination must be effective on the last day of a calendar quarter. [CLAUSE 2] CMS will give the State reasonable notice of the proposed termination of an agreement and of the reasons for the termination at least 90 days before the effective date of the termination. [CLAUSE 3] CMS will give the State the opportunity to present evidence to refute the finding.
    Sample 3
      sentence_0: [CLAUSE 4] The amount of the post-TDAPA add-on payment adjustment is equal to 65 percent of the amount calculated in paragraph (g)(2) of this section, multiplied by the reduction factor specified in paragraph (g)(3) of this section, and multiplied by the latest available forecast of annual growth in the ESRD bundled market basket composite price proxy for pharmaceuticals. [CLAUSE 5] The post-TDAPA [MASK] ESRD PPS claim is adjsuted by any applicable patient-level case-mix adjustments under § 413.235. [CITATIONS]
      sentence_1: [CLAUSE 4] The amount of the post-TDAPA add-on payment adjustment is equal to 65 percent of the amount calculated in paragraph (g)(2) of this section, multiplied by the reduction factor specified in paragraph (g)(3) of this section, and multiplied by the latest available forecast of annual growth in the ESRD bundled market basket composite price proxy for pharmaceuticals. [CLAUSE 5] The post-TDAPA add-on payment adjustment that is applied to an ESRD PPS claim is adjsuted by any applicable patient-level case-mix adjustments under § 413.235. [CITATIONS]
  • Loss: DenoisingAutoEncoderLoss
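
DenoisingAutoEncoderLoss is the TSDAE objective: an encoder-decoder is trained to reconstruct the original sentence (sentence_1) from the embedding of its damaged counterpart (sentence_0). A minimal, hypothetical training sketch along those lines; the pair shown is abbreviated from the samples above:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import DenoisingAutoEncoderLoss

# Pairs of (damaged text, original text); abbreviated illustrative example.
train_dataset = Dataset.from_dict({
    "sentence_0": ["CMS will give the State notice of the proposed termination [MASK] days before the effective date."],
    "sentence_1": ["CMS will give the State notice of the proposed termination at least 90 days before the effective date."],
})

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Decoder weights are tied to the encoder, as in the TSDAE paper.
loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()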

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
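
In recent Sentence Transformers versions these map onto SentenceTransformerTrainingArguments; a minimal sketch reproducing the non-default values (the output directory is illustrative):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/bge-base-en-cfr",  # illustrative path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,                   # matches the full list below
    multi_dataset_batch_sampler="round_robin",
)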

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step   Training Loss
0.3215    500   5.7169
0.6431   1000   4.3196
0.9646   1500   3.8613
1.2862   2000   3.5443
1.6077   2500   3.3570
1.9293   3000   3.2075
2.2508   3500   3.0466
2.5723   4000   2.9261
2.8939   4500   2.8525

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 5.2.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu129
  • Accelerate: 1.10.1
  • Datasets: 4.4.1
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}