Paper: Matryoshka Representation Learning (arXiv:2205.13147)
How to use agraharr/telecom-gte-modernbert-matryoshka with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("agraharr/telecom-gte-modernbert-matryoshka")
sentences = [
    "How does the NG-RAN node respond after successfully activating UL SRS transmission in the UE?",
    "After successfully activating the UL SRS transmission in the UE, the NG-RAN node responds with a POS",
    "Table 6.1.1.1-6: Beam layout parameters for single satellite simulation",
    "The transmit OFF power limits are set at -35 dBm for various operating bands such as n257, n258, n25"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
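As a rough illustration of the retrieval use case, the sketch below ranks passages against a query by cosine similarity. It is a toy example under stated assumptions: the 3-dimensional vectors are made up and stand in for real 384-dimensional embeddings, and `rank_passages` is a hypothetical helper, not part of sentence-transformers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_embedding, passage_embeddings, top_k=2):
    """Return (index, score) pairs for the top_k most similar passages."""
    scored = [(i, cosine(query_embedding, p)) for i, p in enumerate(passage_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up vectors; in practice these would come from model.encode(...).
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranked = rank_passages(query, passages)
print([index for index, _ in ranked])  # [1, 2]
```

In a real pipeline the same ranking would typically be done on the tensor returned by `model.similarity`, rather than in pure Python.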
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Normalize({})
)
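To make the last two modules concrete, here is a minimal sketch of what CLS pooling followed by L2 normalization does to a sequence of token embeddings. The token vectors are invented and use 4 dimensions instead of 384; `cls_pool` and `l2_normalize` are hypothetical helper names, not the library's internals.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Invented token embeddings for a 3-token input.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.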
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("agraharr/telecom-gte-modernbert-matryoshka")
# Run inference
sentences = [
    'What are the key steps involved in the physical-layer processing for the downlink transport channels?',
    'The physical-layer processing for downlink transport channels involves several critical steps. First',
    'The BAP MAPPING CONFIGURATION ACKNOWLEDGE message serves as a response from the gNB-DU to the gNB-CU',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8686, 0.1211],
# [0.8686, 1.0000, 0.1872],
# [0.1211, 0.1872, 1.0000]])
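Since the model was trained with Matryoshka dimensions 384 and 128, the embeddings can presumably be shortened to their first 128 components for cheaper storage and search. The sketch below shows the truncate-then-renormalize step on made-up vectors (6 dimensions standing in for 384, truncated to 2 standing in for 128); `truncate_and_normalize` is a hypothetical helper. sentence-transformers also accepts a `truncate_dim` argument when loading a model, which may be the more convenient route.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Made-up embeddings; real ones would come from model.encode(...).
query = [0.7, 0.5, 0.3, 0.3, 0.2, 0.2]
passage = [0.6, 0.6, 0.1, 0.4, 0.3, 0.1]

short_query = truncate_and_normalize(query, 2)
short_passage = truncate_and_normalize(passage, 2)
print(len(short_query), dot(short_query, short_passage))
```

Renormalizing after truncation matters because the first components of a unit vector do not themselves have unit length.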
Evaluated on the eval and final datasets with TripletEvaluator:

| Metric | eval | final |
|---|---|---|
| cosine_accuracy | 0.9 | 0.895 |
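For readers unfamiliar with the metric, cosine accuracy on triplets can be read as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The following is a minimal sketch of that idea on invented 2-dimensional embeddings; it is not the TripletEvaluator implementation, and `cosine_accuracy` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

# Invented embeddings: the first triplet is ranked correctly, the second is not.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),
]

accuracy = cosine_accuracy(triplets)
print(accuracy)  # 0.5
```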
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| What is the primary purpose of collecting performance measurements in an O-RAN system? | 2. To assess the overall performance and capacity of the network. | 1. To identify and troubleshoot system failures. |
| Which organization is responsible for the development of the O-RAN specifications? | 3. O-RAN ALLIANCE | 1. 3GPP |
| What are the capabilities of a UE in Carrier Aggregation when it comes to timing advance? | In Carrier Aggregation (CA), a User Equipment (UE) can operate under different timing advance capabi | The PUSCH Pathloss Reference RS Update MAC CE includes several key fields: The Serving Cell ID, whic |
MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        384,
        128
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
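The configuration above can be read as: compute the ranking loss once per Matryoshka dimension on embeddings truncated to that dimension, then add the results using the given weights (an `n_dims_per_step` of -1 means every dimension is used at each step). The sketch below illustrates this on an invented batch, with 4 and 2 dimensions standing in for 384 and 128. It is a simplified pure-Python approximation, not the library's code; the similarity scale of 20.0 and the helper names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl(anchors, positives, negatives, scale=20.0):
    """In-batch ranking loss: anchor i must pick positive i among all candidates."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def truncate(batch, dim):
    """Keep the first `dim` components of every vector in the batch."""
    return [vector[:dim] for vector in batch]

def matryoshka_loss(anchors, positives, negatives, dims, weights):
    """Weighted sum of the ranking loss over each truncated dimension."""
    return sum(
        weight * mnrl(truncate(anchors, dim), truncate(positives, dim), truncate(negatives, dim))
        for dim, weight in zip(dims, weights)
    )

# Invented batch of two triplets.
anchors = [[1.0, 0.2, 0.1, 0.0], [0.1, 1.0, 0.0, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.8, 0.1, 0.4]]
negatives = [[0.1, 0.9, 0.3, 0.0], [0.8, 0.1, 0.0, 0.2]]

loss = matryoshka_loss(anchors, positives, negatives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training on the truncated prefixes is what encourages the first 128 components to remain useful on their own.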
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| What is the maximum acceptable relative time error between the O-DU and O-RU for S-plane measurement signals in O-RAN, according to the provided context? | 2. 3 µs | 1. 1.5 µs |
| What are the calibration metrics calibrated for BS antenna configuration Config 2 in Table 7.8-2? | Table 7.8-2: Simulation assumptions for full calibration | The UE variable VarRA-Report includes the random-access related information. VarRA-Report UE variable -- ASN1START -- TAG-VARRA-REPORT-START VarRA-Rep |
| How does the scheduled modulation order affect the PT-RS scaling factor when transform precoding is enabled? | When transform precoding is enabled, the PT-RS scaling factor (β') is determined based on the schedu | The spurious response test for 2DL carrier aggregation (CA) is designed to assess the receiver's cap |
MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        384,
        128
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
Non-default hyperparameters:

- per_device_train_batch_size: 4
- num_train_epochs: 2
- learning_rate: 3e-05
- warmup_steps: 0.1
- disable_tqdm: True
- per_device_eval_batch_size: 4
- push_to_hub: True
- hub_model_id: agraharr/telecom-gte-modernbert-matryoshka
- hub_strategy: end
- load_best_model_at_end: True
- dataloader_pin_memory: False

All hyperparameters:

- per_device_train_batch_size: 4
- num_train_epochs: 2
- max_steps: -1
- learning_rate: 3e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: True
- project: huggingface
- trackio_space_id: None
- trackio_bucket_id: None
- trackio_static_space_id: None
- per_device_eval_batch_size: 4
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: True
- hub_private_repo: None
- hub_model_id: agraharr/telecom-gte-modernbert-matryoshka
- hub_strategy: end
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: False
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_static_graph: None
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss | eval_cosine_accuracy | final_cosine_accuracy |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.8000 | - |
| 0.0022 | 1 | 1.2819 | - | - | - |
| 0.1111 | 50 | 2.1454 | - | - | - |
| 0.2222 | 100 | 1.2912 | - | - | - |
| 0.3333 | 150 | 1.3769 | - | - | - |
| 0.4444 | 200 | 1.5782 | - | - | - |
| 0.5556 | 250 | 1.0937 | - | - | - |
| 0.6667 | 300 | 1.0673 | - | - | - |
| 0.7778 | 350 | 1.2251 | - | - | - |
| 0.8889 | 400 | 1.0413 | - | - | - |
| 1.0 | 450 | 0.8361 | 0.8982 | 0.8800 | - |
| 1.1111 | 500 | 0.6237 | - | - | - |
| 1.2222 | 550 | 0.7264 | - | - | - |
| 1.3333 | 600 | 0.5985 | - | - | - |
| 1.4444 | 650 | 0.7544 | - | - | - |
| 1.5556 | 700 | 0.7694 | - | - | - |
| 1.6667 | 750 | 0.6571 | - | - | - |
| 1.7778 | 800 | 0.4875 | - | - | - |
| 1.8889 | 850 | 0.5598 | - | - | - |
| 2.0 | 900 | 0.5807 | 0.8917 | 0.9 | - |
| -1 | -1 | - | - | - | 0.8950 |
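The log and the hyperparameters together suggest a linear learning-rate schedule over 900 steps with a peak of 3e-05. The sketch below reconstructs such a schedule under one loud assumption: that `warmup_steps: 0.1` is interpreted as a fraction of total steps (90 warmup steps) rather than an absolute count. `linear_schedule_lr` is a hypothetical helper, not a Trainer API.

```python
def linear_schedule_lr(step, total_steps=900, peak_lr=3e-5, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    # ASSUMPTION: warmup_steps=0.1 means 10% of the total steps.
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining

# 450 steps per epoch at batch size 4 implies about 1,800 training examples,
# assuming a single device and gradient_accumulation_steps of 1.
approx_train_examples = 450 * 4

for step in (0, 45, 90, 450, 900):
    print(step, linear_schedule_lr(step))
```

Under this reading, the learning rate is still ramping up at step 50, which could partly explain the noisy early training loss, although the log alone does not confirm it.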
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
Base model: BAAI/bge-small-en-v1.5