How to use agraharr/telecom-bge-base-hard-neg with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("agraharr/telecom-bge-base-hard-neg")
sentences = [
"What does DM-RS stand for?",
"In a groupcast scenario, a TX UE restarts its SL inactivity timer for the specific destination L2 ID whenever it receives new data directed to that same destination L2 ID. This mechanism ensures that the allowable transmission time is accurately determined based on the most recent data received. The RX UE, on the other hand, maintains a SL inactivity timer for each destination L2 ID and selects the largest timer value when multiple timers associated with different QoS profiles are configured. This coordination between TX and RX UEs helps maintain effective communication and manage resource allocation efficiently in groupcast transmissions.<|im_end|>",
"For a UE to identify and report the CGI of a known NR target cell while in RRC_CONNECTED state, it must be configured for SA operation mode. The UE shall identify and report the CGI when prompted by the network through the reportCGI command. It will receive one cell indication through *cellForWhichToReportCGI* for this purpose. The UE may utilize autonomous gaps in both downlink and uplink to receive the MIB and SIB1 messages, unless *useAutonomousGaps* is set to false. The identification of the CGI must be completed within a specific time frame, denoted as T~identify_CGI_redcap~, which is derived from the time taken to acquire the MIB and SIB1 messages. Furthermore, during this identification period, the UE must meet certain interruption requirements and ensure that it can detect the necessary signals under specified conditions. This process is crucial for maintaining connectivity and ensuring accurate cell identification in the network.<|im_end|>",
"Demodulation Reference Signal"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Normalize({})
)
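The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes that `cls` pooling means taking the first token's vector and that Normalize applies L2 normalization. The toy 4-dimensional vectors and the helper names `cls_pool` and `l2_normalize` are illustrative only and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return token_embeddings[0]

def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy token embeddings for one sentence: 3 tokens, 4 dimensions each.
# (The real model produces 768-dimensional vectors.)
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)
```

Because the output has unit length, a dot product between two such embeddings equals their cosine similarity.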
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("agraharr/telecom-bge-base-hard-neg")
# Run inference
sentences = [
'What are the minimum conformance requirements for the event-triggered reporting tests outlined in this specification?',
'The minimum conformance requirements for the event-triggered reporting tests are specified in clause 10.4.1.0 of the relevant technical specification. These requirements ensure that the UE meets the necessary performance standards when conducting tests related to event-triggered reporting. The reference document for these requirements is TS 38.133, specifically clause A.10.4.1.3, which outlines the detailed criteria and parameters that the UE must fulfill during testing. This includes aspects such as measurement reporting delays, the rate of correct event observations, and the handling of specific measurement quantities across different test scenarios. Meeting these requirements is essential for compliance and interoperability within the network.<|im_end|>',
'When a User Equipment (UE) is in Automatic network selection mode and is switched on or returns to coverage, it must not select a CAG cell if the CAG-ID of that cell is not present in the Allowed CAG list. This requirement ensures that the UE adheres strictly to the CAG (Cell Access Group) policies in place, which are designed to control access to specific network resources. By preventing the selection of unauthorized CAG cells, the system maintains network integrity and security, ensuring that only authorized users can access certain network services. Therefore, the UE will ignore any CAG cells that are not permitted based on its configuration.<|im_end|>',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8795, 0.0193],
# [0.8795, 1.0000, 0.1109],
# [0.0193, 0.1109, 1.0000]])
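For retrieval, one row of the similarity matrix can be turned into a ranking. The sketch below uses the scores printed above, copied into a plain list so that it runs without downloading the model. `rank_candidates` is a hypothetical helper, not a library function.

```python
# Similarity matrix copied from the printed output above.
similarities = [
    [1.0000, 0.8795, 0.0193],
    [0.8795, 1.0000, 0.1109],
    [0.0193, 0.1109, 1.0000],
]

def rank_candidates(sim_row, query_index):
    """Return (index, score) pairs for every other sentence, best match first."""
    scored = [(i, s) for i, s in enumerate(sim_row) if i != query_index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Sentence 0 is the question; sentences 1 and 2 are candidate passages.
ranking = rank_candidates(similarities[0], query_index=0)
print(ranking)
```

Here the passage about event-triggered reporting (index 1) ranks well ahead of the unrelated CAG passage (index 2).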
Dataset: telecom-eval, evaluated with TripletEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy | 0.994 |
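The sketch below shows one way a triplet cosine accuracy could be computed. It assumes the metric is the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative. The toy 2-dimensional vectors and function names are illustrative, and the evaluator's own implementation may differ in details.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Toy triplets of (anchor, positive, negative) embeddings.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.1, 0.9], [0.8, 0.2]),  # negative is closer: counted as wrong
]

accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```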
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

| sentence_0 | sentence_1 |
|---|---|
| What are the key differences between downlink resource allocation type 0 and type 1 in terms of resource block assignment? | Downlink resource allocation type 0 utilizes a bitmap to indicate which Resource Block Groups (RBGs) are allocated to the UE. This bitmap is derived from the size and configuration of the bandwidth part and the RBG size defined by higher layer parameters. The assignment is based on consecutive virtual resource blocks, where each RBG is addressable and indexed in increasing frequency order. In contrast, downlink resource allocation type 1 provides a resource indication value (RIV) that specifies a starting virtual resource block and a length in terms of contiguously allocated resource blocks. Type 1 can also involve either non-interleaved or interleaved allocations within the active bandwidth part, depending on the DCI format used. Ultimately, type 0 is more granular in how resources are identified, while type 1 focuses on contiguous resource block assignments.<\|im_end\|> |
| What is the purpose of NR carrier aggregation in the context of FR1 and FR2? | NR carrier aggregation is designed to enhance the overall bandwidth and improve the data throughput by combining multiple frequency bands. Specifically, it allows for simultaneous use of at least one operating band from Frequency Range 1 (FR1) and one from Frequency Range 2 (FR2). The combination of these bands can significantly increase the peak data rates and improve user experience, particularly in areas where higher frequency bands provide greater capacity but may have limited coverage. This approach helps in leveraging the benefits of both frequency ranges to optimize network performance.<\|im_end\|> |
| How does the spatial exclusion zone impact the testing of base station receivers? | The spatial exclusion zone is a protective measure designed to safeguard the base station receiver during testing. It allows for the establishment of a controlled environment where external electromagnetic interference is minimized. For frequencies above 690 MHz, as specified by ETSI EN 301 489-50, the EMC RF electromagnetic field immunity requirement mandates a level of 10 V/m on the non-radiating faces of BS type 1-O and BS type 2-O. However, depending on the specific implementation of the base station, applying the spatial exclusion to all radiating faces may hinder proper execution of receiver immunity testing. In such scenarios, it is advisable to consider exclusion bands to protect the receivers while allowing for effective testing. This careful balance is crucial for accurate assessment of receiver performance.<\|im_end\|> |

CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 32,
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
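To make the `scale` and `similarity_fct` parameters concrete, here is a minimal pure-Python sketch of an in-batch-negatives ranking loss in the query-to-document direction. It assumes the objective is a cross-entropy over scaled cosine similarities, with `docs[i]` as the positive for `queries[i]` and every other document in the batch as a negative. It does not reproduce the gradient caching of the "Cached" variant (Gao et al., 2021), which is a memory optimization and is not expected to change the loss value. All function names and toy vectors are illustrative.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranking_loss(queries, docs, scale=20.0):
    """Mean cross-entropy over the batch; docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, doc) for doc in docs]
        # Numerically stable log-sum-exp over all documents in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]  # each query matches its own document
swapped_docs = [[0.0, 1.0], [1.0, 0.0]]  # each query matches the other document

print(ranking_loss(queries, aligned_docs))  # close to 0
print(ranking_loss(queries, swapped_docs))  # close to 20, the scale
```

With `mini_batch_size` and `per_device_train_batch_size` both set to 32, each training batch appears to fit in a single cached chunk.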
Non-default hyperparameters:

- per_device_train_batch_size: 32
- num_train_epochs: 2
- disable_tqdm: True
- per_device_eval_batch_size: 32
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- per_device_train_batch_size: 32
- num_train_epochs: 2
- max_steps: -1
- learning_rate: 5e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0
- optim: adamw_torch
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1
- label_smoothing_factor: 0.0
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: True
- project: huggingface
- trackio_space_id: None
- trackio_bucket_id: None
- trackio_static_space_id: None
- per_device_eval_batch_size: 32
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_static_graph: None
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | telecom-eval_cosine_accuracy |
|---|---|---|---|
| -1 | -1 | - | 0.7760 |
| 0.4065 | 500 | 0.3510 | - |
| 0.8130 | 1000 | 0.1703 | - |
| 1.0 | 1230 | - | 0.9920 |
| 1.2195 | 1500 | 0.1305 | - |
| 1.6260 | 2000 | 0.0961 | - |
| 2.0 | 2460 | - | 0.9940 |
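The Epoch column in the training log follows from the step count, and the learning rate at each logged step can be estimated from the listed hyperparameters. The sketch below assumes 1230 steps per epoch (step 1230 is logged as epoch 1.0) and that `lr_scheduler_type: linear` with `warmup_steps: 0` decays the rate linearly from 5e-05 to zero over all 2460 steps. The helper names are illustrative, and the actual scheduler may differ in edge cases.

```python
STEPS_PER_EPOCH = 1230  # from the log: step 1230 corresponds to epoch 1.0
TOTAL_STEPS = 2460      # 2 epochs
BASE_LR = 5e-05         # learning_rate
WARMUP_STEPS = 0        # warmup_steps

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / STEPS_PER_EPOCH

def linear_lr(step):
    """Linear warmup (none here) followed by linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)

for step in (500, 1000, 1230, 1500, 2000, 2460):
    print(step, round(epoch_at(step), 4), linear_lr(step))
```

At 32 samples per step, 1230 steps per epoch corresponds to at most 39,360 training pairs per epoch, assuming a single device and no gradient accumulation.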
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}