Matryoshka Representation Learning (paper: arXiv 2205.13147)
This is a sentence-transformers model finetuned from sentence-transformers/msmarco-distilbert-base-v4. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
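The Pooling module listed above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings. A minimal sketch of that step, assuming padding positions are excluded through an attention mask; `mean_pool` and the toy vectors are hypothetical illustrations, not the library's own code:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                sums[j] += value
    count = max(count, 1)  # avoid dividing by zero for an all-padding input
    return [s / count for s in sums]


# Hypothetical 2-dimensional token embeddings; the last position is padding
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

In the real model each token vector has 768 components, which is why the pooled output is 768-dimensional.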
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Shashwat13333/msmarco-distilbert-base-v4")
# Run inference
sentences = [
'What managed services does TechChefz provide ?',
' What we do\n\nDigital Strategy\nCreating digital frameworks that transform your digital enterprise and produce a return on investment.\n\nPlatform Selection\nHelping you select the optimal digital experience, commerce, cloud and marketing platform for your enterprise.\n\nPlatform Builds\nDeploying next-gen scalable and agile enterprise digital platforms, along with multi-platform integrations.\n\nProduct Builds\nHelp you ideate, strategize, and engineer your product with help of our enterprise frameworks \n\nTeam Augmentation\nHelp you scale up and augment your existing team to solve your hiring challenges with our easy to deploy staff augmentation offerings .\nManaged Services\nOperate and monitor your business-critical applications, data, and IT workloads, along with Application maintenance and operations\n',
'In the Introducing the world of Global Insurance Firm, we crafted Effective Solutions for Complex Problems and delieverd a comprehensive Website Development, Production Support & Managed Services, we optimized customer journeys, integrate analytics, CRM, ERP, and third-party applications, and implement cutting-edge technologies for enhanced performance and efficiency\nand achievied 200% Reduction in operational time & effort managing content & experience, 70% Reduction in Deployment Errors and Downtime, 2.5X Customer Engagement, Conversion & Retention',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
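Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, an embedding can be shortened by keeping only its leading components and re-normalizing. The sketch below shows that operation on hypothetical 8-dimensional stand-in vectors rather than real `model.encode` output; `truncate_and_normalize` and `cosine_similarity` are illustrative helper names, not library functions:

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for a query embedding and a document embedding
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
doc = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine_similarity(q, d), 4))
```

Recent sentence-transformers releases appear to accept a `truncate_dim` argument on `SentenceTransformer` for the same purpose; verify this against the installed version before relying on it.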
Evaluation on dim_768, dim_512, dim_256, dim_128 and dim_64 with InformationRetrievalEvaluator:

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.1067 | 0.1067 | 0.1467 | 0.12 | 0.16 |
| cosine_accuracy@3 | 0.4933 | 0.4667 | 0.4533 | 0.4533 | 0.3867 |
| cosine_accuracy@5 | 0.5333 | 0.5333 | 0.4933 | 0.4933 | 0.4667 |
| cosine_accuracy@10 | 0.6267 | 0.6133 | 0.6 | 0.6 | 0.5467 |
| cosine_precision@1 | 0.1067 | 0.1067 | 0.1467 | 0.12 | 0.16 |
| cosine_precision@3 | 0.1644 | 0.1556 | 0.1511 | 0.1511 | 0.1289 |
| cosine_precision@5 | 0.1067 | 0.1067 | 0.0987 | 0.0987 | 0.0933 |
| cosine_precision@10 | 0.0627 | 0.0613 | 0.06 | 0.06 | 0.0547 |
| cosine_recall@1 | 0.1067 | 0.1067 | 0.1467 | 0.12 | 0.16 |
| cosine_recall@3 | 0.4933 | 0.4667 | 0.4533 | 0.4533 | 0.3867 |
| cosine_recall@5 | 0.5333 | 0.5333 | 0.4933 | 0.4933 | 0.4667 |
| cosine_recall@10 | 0.6267 | 0.6133 | 0.6 | 0.6 | 0.5467 |
| cosine_ndcg@10 | 0.3697 | 0.3703 | 0.3732 | 0.3495 | 0.3449 |
| cosine_mrr@10 | 0.2865 | 0.2909 | 0.3006 | 0.2696 | 0.281 |
| cosine_map@100 | 0.2993 | 0.3047 | 0.3135 | 0.2815 | 0.2953 |
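In the table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k (for example 0.4933 / 3 ≈ 0.1644), which is what one would expect if every query has exactly one relevant document. The following sketch computes the per-query versions of these metrics under binary relevance; `metrics_at_k` and the document ids are hypothetical, and the reported numbers would be averages of such values over all evaluation queries:

```python
import math


def metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics; assumes `relevant_ids` is non-empty."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document inside the top k
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Discounted cumulative gain against the best achievable ordering
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }


# Hypothetical ranking in which the single relevant document sits at rank 3
ranking = ["d7", "d2", "d9", "d4"]
print(metrics_at_k(ranking, {"d9"}, k=3))
print(metrics_at_k(ranking, {"d9"}, k=2))
```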
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| How can digital transformation enhance customer interactions across multiple channels? | We offer custom software development, digital marketing strategies, and tailored solutions to drive tangible results for your business. Our expert team combines technical prowess with industry insights to propel your business forward in the digital landscape. |
| How does a CRM system improve customer retention? | Our MarTech capabilities |
| How can your recommendation engines improve our business? | How can your recommendation engines improve our business? |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
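Read together, these parameters describe applying MultipleNegativesRankingLoss to the embeddings truncated at each listed dimension and summing the results with equal weights; an n_dims_per_step of -1 is understood here to mean that every dimension is used at each step. The sketch below illustrates that idea in plain Python. It is not the library implementation: the helper names are hypothetical, the toy vectors are 4-dimensional with dims [4, 2] standing in for [768, 512, 256, 128, 64], and the similarity scale of 20 is an assumption about the inner loss's default.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        loss += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return loss


# Hypothetical embeddings for a batch of two (anchor, positive) pairs
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Because each truncation contributes its own ranking term, the leading components are pushed to remain useful on their own, which is why the shorter dimensions in the evaluation table stay close to the full 768-dimensional scores.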
Non-default hyperparameters:
eval_strategy: epoch
gradient_accumulation_steps: 4
learning_rate: 1e-05
weight_decay: 0.01
num_train_epochs: 4
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: True
load_best_model_at_end: True
optim: adamw_torch_fused
push_to_hub: True
hub_model_id: Shashwat13333/msmarco-distilbert-base-v4
push_to_hub_model_id: msmarco-distilbert-base-v4
batch_sampler: no_duplicates

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 4
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: True
resume_from_checkpoint: None
hub_model_id: Shashwat13333/msmarco-distilbert-base-v4
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: msmarco-distilbert-base-v4
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 0.2105 | 1 | 3.5757 | - | - | - | - | - |
| 0.8421 | 4 | - | 0.3563 | 0.3543 | 0.3378 | 0.3681 | 0.3077 |
| 1.2105 | 5 | 4.4031 | - | - | - | - | - |
| 1.8421 | 8 | - | 0.3652 | 0.3547 | 0.3574 | 0.3542 | 0.3579 |
| 2.4211 | 10 | 3.3423 | - | - | - | - | - |
| 2.8421 | 12 | - | 0.3783 | 0.3680 | 0.3558 | 0.3807 | 0.3408 |
| 3.6316 | 15 | 2.3695 | - | - | - | - | - |
| 3.8421 | 16 | - | 0.3697 | 0.3703 | 0.3732 | 0.3495 | 0.3449 |
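The hyperparameters give a learning rate of 1e-05, a cosine scheduler, and a warmup_ratio of 0.1, and the log above ends at step 16. The sketch below shows how a linear warmup followed by a half-cycle cosine decay would play out over those steps. It rests on stated assumptions: the total of 16 optimizer steps is read off the log table, the warmup length is taken as the ratio times the total rounded up, and `lr_at_step` is a hypothetical helper rather than the trainer's own scheduler.

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_ratio):
    """Linear warmup to `base_lr`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values taken from the hyperparameter list
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
print("examples per optimizer step, per device:",
      per_device_train_batch_size * gradient_accumulation_steps)

for step in range(0, 17, 4):
    print(step, lr_at_step(step, total_steps=16, base_lr=1e-05, warmup_ratio=0.1))
```

With only 16 optimizer steps the warmup covers about two of them, so most of the run is spent on the decaying part of the schedule.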
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}