SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Normalize({})
)
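
Read top to bottom, the three modules run the BERT encoder, keep the hidden state of the first ([CLS]) token as the sentence vector, and scale that vector to unit length. The sketch below illustrates only the last two steps with NumPy, using random numbers in place of real encoder output. The function name `cls_pool_and_normalize` is an illustrative assumption, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(pooling_mode='cls') followed by Normalize().

    token_embeddings: (batch, seq_len, hidden) array standing in for the
    encoder's last_hidden_state.
    """
    cls_vectors = token_embeddings[:, 0, :]  # first token of each sequence
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)  # unit-length rows

# Toy input: 2 sequences, 4 tokens each, hidden size 768 (random, not real output).
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(2, 4, 768))
sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)
print(sentence_embeddings.shape)  # (2, 768)
```

Because every output row has unit length, the cosine similarity of two embeddings equals their dot product.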

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("agraharr/telecom-bge-base-hard-neg")
# Run inference
sentences = [
    'What are the minimum conformance requirements for the event-triggered reporting tests outlined in this specification?',
    'The minimum conformance requirements for the event-triggered reporting tests are specified in clause 10.4.1.0 of the relevant technical specification. These requirements ensure that the UE meets the necessary performance standards when conducting tests related to event-triggered reporting. The reference document for these requirements is TS 38.133, specifically clause A.10.4.1.3, which outlines the detailed criteria and parameters that the UE must fulfill during testing. This includes aspects such as measurement reporting delays, the rate of correct event observations, and the handling of specific measurement quantities across different test scenarios. Meeting these requirements is essential for compliance and interoperability within the network.<|im_end|>',
    'When a User Equipment (UE) is in Automatic network selection mode and is switched on or returns to coverage, it must not select a CAG cell if the CAG-ID of that cell is not present in the Allowed CAG list. This requirement ensures that the UE adheres strictly to the CAG (Cell Access Group) policies in place, which are designed to control access to specific network resources. By preventing the selection of unauthorized CAG cells, the system maintains network integrity and security, ensuring that only authorized users can access certain network services. Therefore, the UE will ignore any CAG cells that are not permitted based on its configuration.<|im_end|>',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8795, 0.0193],
#         [0.8795, 1.0000, 0.1109],
#         [0.0193, 0.1109, 1.0000]])
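
To use these scores for retrieval, treat the first sentence as the query and rank the other two by their similarity to it. The sketch below hard-codes the matrix printed above instead of calling the model, so it runs without downloading anything. The variable names are illustrative.

```python
# Similarity matrix copied from the example output above.
similarities = [
    [1.0000, 0.8795, 0.0193],
    [0.8795, 1.0000, 0.1109],
    [0.0193, 0.1109, 1.0000],
]

query_index = 0  # the question is the first sentence
candidates = [i for i in range(len(similarities)) if i != query_index]

# Sort candidate indices by similarity to the query, best match first.
ranking = sorted(candidates, key=lambda i: similarities[query_index][i], reverse=True)
for rank, i in enumerate(ranking, start=1):
    print(f"{rank}. sentence {i}  score={similarities[query_index][i]:.4f}")
# 1. sentence 1  score=0.8795
# 2. sentence 2  score=0.0193
```

The matching answer (sentence 1) ranks well above the unrelated CAG passage (sentence 2).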

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.994
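
For triplets, cosine accuracy is the fraction of (anchor, positive, negative) triplets in which the anchor is closer, by cosine similarity, to the positive than to the negative. The sketch below computes it on hand-made 2-D vectors. The data and helper names are invented for illustration; the reported 0.994 comes from the actual evaluation set, which is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_cosine_accuracy(triplets):
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(
        cosine(anchor, positive) > cosine(anchor, negative)
        for anchor, positive, negative in triplets
    )
    return correct / len(triplets)

toy_triplets = [
    ((1.0, 0.0), (0.9, 0.1), (0.0, 1.0)),   # positive closer: correct
    ((0.0, 1.0), (0.1, 0.9), (1.0, 0.0)),   # positive closer: correct
    ((1.0, 1.0), (1.0, -1.0), (1.0, 0.9)),  # negative closer: wrong
    ((1.0, 0.0), (1.0, 0.2), (-1.0, 0.0)),  # positive closer: correct
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```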

Training Details

Training Dataset

Unnamed Dataset

  • Size: 39,360 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
             sentence_0     sentence_1
    type     string         string
    min      4 tokens       3 tokens
    mean     22.58 tokens   105.42 tokens
    max      66 tokens      306 tokens
  • Samples:
    sentence_0: What are the key differences between downlink resource allocation type 0 and type 1 in terms of resource block assignment?
    sentence_1: Downlink resource allocation type 0 utilizes a bitmap to indicate which Resource Block Groups (RBGs) are allocated to the UE. This bitmap is derived from the size and configuration of the bandwidth part and the RBG size defined by higher layer parameters. The assignment is based on consecutive virtual resource blocks, where each RBG is addressable and indexed in increasing frequency order. In contrast, downlink resource allocation type 1 provides a resource indication value (RIV) that specifies a starting virtual resource block and a length in terms of contiguously allocated resource blocks. Type 1 can also involve either non-interleaved or interleaved allocations within the active bandwidth part, depending on the DCI format used. Ultimately, type 0 is more granular in how resources are identified, while type 1 focuses on contiguous resource block assignments.<|im_end|>

    sentence_0: What is the purpose of NR carrier aggregation in the context of FR1 and FR2?
    sentence_1: NR carrier aggregation is designed to enhance the overall bandwidth and improve the data throughput by combining multiple frequency bands. Specifically, it allows for simultaneous use of at least one operating band from Frequency Range 1 (FR1) and one from Frequency Range 2 (FR2). The combination of these bands can significantly increase the peak data rates and improve user experience, particularly in areas where higher frequency bands provide greater capacity but may have limited coverage. This approach helps in leveraging the benefits of both frequency ranges to optimize network performance.<|im_end|>

    sentence_0: How does the spatial exclusion zone impact the testing of base station receivers?
    sentence_1: The spatial exclusion zone is a protective measure designed to safeguard the base station receiver during testing. It allows for the establishment of a controlled environment where external electromagnetic interference is minimized. For frequencies above 690 MHz, as specified by ETSI EN 301 489-50, the EMC RF electromagnetic field immunity requirement mandates a level of 10 V/m on the non-radiating faces of BS type 1-O and BS type 2-O. However, depending on the specific implementation of the base station, applying the spatial exclusion to all radiating faces may hinder proper execution of receiver immunity testing. In such scenarios, it is advisable to consider exclusion bands to protect the receivers while allowing for effective testing. This careful balance is crucial for accurate assessment of receiver performance.<|im_end|>
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 32,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
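
With these settings, each query (sentence_0) is scored against every document (sentence_1) in the batch: its paired document is the positive and the others act as negatives. The loss is a cross-entropy over cosine similarities multiplied by `scale`. The sketch below shows only that objective on toy vectors. It omits the gradient caching and mini-batching that the "Cached" variant adds to save memory, and the function names are illustrative, not the library's.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(queries, docs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i] and
    every other doc in the batch is a negative (query_to_doc direction only)."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, doc) for doc in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

queries = [(1.0, 0.0), (0.0, 1.0)]
aligned_docs = [(0.9, 0.1), (0.1, 0.9)]  # each query sits next to its own doc
swapped_docs = [(0.1, 0.9), (0.9, 0.1)]  # positives and negatives swapped

print(mnr_loss(queries, aligned_docs) < mnr_loss(queries, swapped_docs))  # True
```

A scale of 20 sharpens the softmax, so a small gap in cosine similarity between the positive and a hard negative already produces a sizeable change in the loss.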
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 2
  • disable_tqdm: True
  • per_device_eval_batch_size: 32
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 2
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: True
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 32
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
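
Taken together, `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` describe a learning rate that starts at its peak and decays linearly to zero at the final step. The sketch below evaluates that schedule, assuming the 2460 total steps shown in the training logs. The function `linear_lr` is an illustrative helper, not a library call.

```python
def linear_lr(step, base_lr=5e-05, total_steps=2460, warmup_steps=0):
    """Linear warmup (none here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)

for step in (0, 500, 1230, 2460):
    print(step, f"{linear_lr(step):.2e}")
# 0 5.00e-05
# 500 3.98e-05
# 1230 2.50e-05
# 2460 0.00e+00
```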

Training Logs

Epoch   Step  Training Loss  telecom-eval_cosine_accuracy
-1      -1    -              0.7760
0.4065  500   0.3510         -
0.8130  1000  0.1703         -
1.0     1230  -              0.9920
1.2195  1500  0.1305         -
1.6260  2000  0.0961         -
2.0     2460  -              0.9940
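
The step counts and fractional epochs in this table follow from the dataset size and batch size. A quick check, assuming a single device (so the effective batch size equals `per_device_train_batch_size`) and no gradient accumulation:

```python
import math

train_samples = 39_360
batch_size = 32
epochs = 2

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1230 2460

# Fractional epoch at each logged training step.
for step in (500, 1000, 1500, 2000):
    print(step, round(step / steps_per_epoch, 4))
# 500 0.4065
# 1000 0.813
# 1500 1.2195
# 2000 1.626
```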

Training Time

  • Training: 1.2 hours
  • Evaluation: 10.4 seconds
  • Total: 1.2 hours

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.4.1
  • Transformers: 5.7.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}