SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 64 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Normalize({})
)
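
As a rough illustration of what modules (1) and (2) above do, the sketch below takes the first ([CLS]) token vector and scales it to unit length. It is a minimal stand-in written with plain Python lists, not the library's implementation; the function name `cls_pool_and_normalize` and the 4-dimensional toy vectors are assumptions made for the example (the real model uses 384 dimensions).

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]  # pooling_mode='cls' keeps only token 0
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid dividing by zero
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional vectors (hypothetical values).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.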

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("agraharr/telecom-gte-modernbert-matryoshka")
# Run inference
sentences = [
    'What are the key steps involved in the physical-layer processing for the downlink transport channels?',
    'The physical-layer processing for downlink transport channels involves several critical steps. First',
    'The BAP MAPPING CONFIGURATION ACKNOWLEDGE message serves as a response from the gNB-DU to the gNB-CU',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8686, 0.1211],
#         [0.8686, 1.0000, 0.1872],
#         [0.1211, 0.1872, 1.0000]])
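
Since the model was trained with Matryoshka dimensions 384 and 128, the first 128 values of each embedding should also work as a smaller embedding. The sketch below shows the truncate-and-renormalize step on stand-in vectors; the random vectors and the helper names `truncate_and_normalize` and `cosine` are assumptions for illustration, not model output or library functions. With the library itself, passing `truncate_dim=128` when constructing `SentenceTransformer` is one way to get truncated embeddings directly.

```python
import math
import random

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit L2 norm."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Stand-in 384-dimensional vectors (random, NOT real model output).
rng = random.Random(0)
full = [[rng.gauss(0.0, 1.0) for _ in range(384)] for _ in range(3)]

# Truncate to the smaller Matryoshka dimension and renormalize.
small = [truncate_and_normalize(e, 128) for e in full]
print(len(small[0]))
print(cosine(small[0], small[1]))
```

Renormalizing matters mainly when similarity is computed as a dot product; cosine similarity is unaffected by vector length.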

Evaluation

Metrics

Triplet

Metric           eval  final
cosine_accuracy  0.9   0.895
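
For readers unfamiliar with the metric, cosine accuracy on triplets is commonly defined as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes it under that assumption on hand-made 2-dimensional vectors; the helper names and the toy data are hypothetical. If the final score was computed on the 200-sample evaluation set, 0.895 would correspond to 179 correct triplets.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Toy triplets (hypothetical embeddings): three ranked correctly, one not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # correct
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),   # wrong: the negative is closer
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # correct
    ([1.0, 0.0], [1.0, 0.2], [0.5, 0.5]),   # correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```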

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,800 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
             anchor        positive      negative
    type     string        string        string
    min      10 tokens     5 tokens      2 tokens
    mean     23.42 tokens  20.89 tokens  21.95 tokens
    max      59 tokens     64 tokens     63 tokens
  • Samples:
    • anchor: What is the primary purpose of collecting performance measurements in an O-RAN system?
      positive: 2. To assess the overall performance and capacity of the network.
      negative: 1. To identify and troubleshoot system failures.
    • anchor: Which organization is responsible for the development of the O-RAN specifications?
      positive: 3. O-RAN ALLIANCE
      negative: 1. 3GPP
    • anchor: What are the capabilities of a UE in Carrier Aggregation when it comes to timing advance?
      positive: In Carrier Aggregation (CA), a User Equipment (UE) can operate under different timing advance capabi
      negative: The PUSCH Pathloss Reference RS Update MAC CE includes several key fields: The Serving Cell ID, whic
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            128
        ],
        "matryoshka_weights": [
            1,
            1
        ],
        "n_dims_per_step": -1
    }
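
To make the parameters above more concrete, the following sketch shows one way the combined loss can be understood: an in-batch ranking loss (cross-entropy over scaled cosine similarities, where anchor i must pick positive i among all positives and negatives in the batch) is evaluated once per Matryoshka dimension on truncated embeddings, and the weighted results are summed. This is a simplified plain-Python illustration, not the library code; the function names, the `scale=20.0` value, and the 4-dimensional toy vectors with `dims=(4, 2)` are assumptions chosen for the example (the card uses dimensions 384 and 128 with weights 1 and 1).

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def ranking_loss(anchors, positives, negatives, scale=20.0):
    """In-batch ranking loss: anchor i should score positive i highest."""
    candidates = [normalize(c) for c in positives + negatives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        a = normalize(anchor)
        logits = [scale * sum(x * y for x, y in zip(a, c)) for c in candidates]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def truncate(batch, dim):
    """Keep the first `dim` values of every vector in the batch."""
    return [v[:dim] for v in batch]

def matryoshka_loss(anchors, positives, negatives, dims, weights):
    """Weighted sum of the ranking loss over each truncated dimension."""
    return sum(
        w * ranking_loss(truncate(anchors, d), truncate(positives, d), truncate(negatives, d))
        for d, w in zip(dims, weights)
    )

# Toy batch of two triplets (hypothetical 4-dimensional embeddings).
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[1.0, 0.0, 0.1, 0.0], [0.0, 1.0, 0.0, 0.1]]
negatives = [[-1.0, 0.0, 1.0, 0.0], [0.0, -1.0, 0.0, 1.0]]

good = matryoshka_loss(anchors, positives, negatives, dims=(4, 2), weights=(1, 1))
bad = matryoshka_loss(anchors, negatives, positives, dims=(4, 2), weights=(1, 1))
print(good < bad)  # True: swapping positives and negatives raises the loss
```

Under this formulation, a batch of 4 triplets gives each anchor 8 candidates to rank, so small batches provide relatively few in-batch negatives.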
    

Evaluation Dataset

Unnamed Dataset

  • Size: 200 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 200 samples:
             anchor        positive      negative
    type     string        string        string
    min      14 tokens     5 tokens      4 tokens
    mean     25.21 tokens  21.38 tokens  23.97 tokens
    max      47 tokens     50 tokens     62 tokens
  • Samples:
    • anchor: What is the maximum acceptable relative time error between the O-DU and O-RU for S-plane measurement signals in O-RAN, according to the provided context?
      positive: 2. 3 µs
      negative: 1. 1.5 µs
    • anchor: What are the calibration metrics calibrated for BS antenna configuration Config 2 in Table 7.8-2?
      positive: Table 7.8-2: Simulation assumptions for full calibration
      negative: The UE variable VarRA-Report includes the random-access related information. VarRA-Report UE variable -- ASN1START -- TAG-VARRA-REPORT-START VarRA-Rep
    • anchor: How does the scheduled modulation order affect the PT-RS scaling factor when transform precoding is enabled?
      positive: When transform precoding is enabled, the PT-RS scaling factor (β') is determined based on the schedu
      negative: The spurious response test for 2DL carrier aggregation (CA) is designed to assess the receiver's cap
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            128
        ],
        "matryoshka_weights": [
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 2
  • learning_rate: 3e-05
  • warmup_steps: 0.1
  • disable_tqdm: True
  • per_device_eval_batch_size: 4
  • push_to_hub: True
  • hub_model_id: agraharr/telecom-gte-modernbert-matryoshka
  • hub_strategy: end
  • load_best_model_at_end: True
  • dataloader_pin_memory: False
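
The step counts implied by these settings can be checked with a little arithmetic. The sketch below assumes a single device and no gradient accumulation, and it assumes that `warmup_steps: 0.1` is interpreted as a fraction of the total steps rather than an absolute count; the linear schedule function is a simplified illustration, not the trainer's code.

```python
import math

train_samples = 1800     # from the training dataset size
batch_size = 4           # per_device_train_batch_size
epochs = 2               # num_train_epochs
peak_lr = 3e-05          # learning_rate
warmup_fraction = 0.1    # ASSUMPTION: warmup_steps=0.1 read as a fraction of total steps

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup = round(total_steps * warmup_fraction)

def linear_schedule_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))

print(steps_per_epoch, total_steps, warmup)  # 450 900 90
for step in (0, 45, 90, 495, 900):
    print(step, linear_schedule_lr(step))
```

The 450 steps per epoch and 900 total steps agree with the step numbers reported at epochs 1.0 and 2.0 in the training logs.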

All Hyperparameters

  • per_device_train_batch_size: 4
  • num_train_epochs: 2
  • max_steps: -1
  • learning_rate: 3e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: True
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 4
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: True
  • hub_private_repo: None
  • hub_model_id: agraharr/telecom-gte-modernbert-matryoshka
  • hub_strategy: end
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  Validation Loss  eval_cosine_accuracy  final_cosine_accuracy
-1      -1    -              -                0.8000                -
0.0022  1     1.2819         -                -                     -
0.1111  50    2.1454         -                -                     -
0.2222  100   1.2912         -                -                     -
0.3333  150   1.3769         -                -                     -
0.4444  200   1.5782         -                -                     -
0.5556  250   1.0937         -                -                     -
0.6667  300   1.0673         -                -                     -
0.7778  350   1.2251         -                -                     -
0.8889  400   1.0413         -                -                     -
1.0     450   0.8361         0.8982           0.8800                -
1.1111  500   0.6237         -                -                     -
1.2222  550   0.7264         -                -                     -
1.3333  600   0.5985         -                -                     -
1.4444  650   0.7544         -                -                     -
1.5556  700   0.7694         -                -                     -
1.6667  750   0.6571         -                -                     -
1.7778  800   0.4875         -                -                     -
1.8889  850   0.5598         -                -                     -
2.0     900   0.5807         0.8917           0.9                   -
-1      -1    -              -                -                     0.8950
  • The bold row denotes the saved checkpoint.

Training Time

  • Training: 13.8 minutes
  • Evaluation: 28.7 seconds
  • Total: 14.3 minutes

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.4.1
  • Transformers: 5.6.1
  • PyTorch: 2.11.0+cu130
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
