This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
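The Pooling and Normalize stages listed above can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` and the toy 4-dimensional token vectors are assumptions made for illustration. The real model pools 768-dimensional MPNet token embeddings.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    return [s / max(count, 1) for s in summed]


def l2_normalize(vector):
    """Scale a vector to unit length, as a Normalize() stage would."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Hypothetical toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # [2.0, 1.0, 0.0, 2.0]
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final stage produces unit-length vectors, the dot product of two embeddings equals their cosine similarity.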
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jet-taekyo/mpnet_finetuned_recursive")
# Run inference
sentences = [
'What impact do automated systems have on underserved communities?',
"automated systems make on underserved communities and to institute proactive protections that support these \ncommunities. \n•\nAn automated system using nontraditional factors such as educational attainment and employment history as\npart of its loan underwriting and pricing model was found to be much more likely to charge an applicant who\nattended a Historically Black College or University (HBCU) higher loan prices for refinancing a student loan\nthan an applicant who did not attend an HBCU. This was found to be true even when controlling for\nother credit-related factors.32\n•\nA hiring tool that learned the features of a company's employees (predominantly men) rejected women appli\xad\ncants for spurious and discriminatory reasons; resumes with the word “women’s,” such as “women’s\nchess club captain,” were penalized in the candidate ranking.33\n•\nA predictive model marketed as being able to predict whether students are likely to drop out of school was",
'on a principle of local control, such that those individuals closest to the data subject have more access while \nthose who are less proximate do not (e.g., a teacher has access to their students’ daily progress data while a \nsuperintendent does not). \nReporting. In addition to the reporting on data privacy (as listed above for non-sensitive data), entities devel-\noping technologies related to a sensitive domain and those collecting, using, storing, or sharing sensitive data \nshould, whenever appropriate, regularly provide public reports describing: any data security lapses or breaches \nthat resulted in sensitive data leaks; the number, type, and outcomes of ethical pre-reviews undertaken; a \ndescription of any data sold, shared, or made public, and how that data was assessed to determine it did not pres-\nent a sensitive data risk; and ongoing risk identification and management procedures, and any mitigation added',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
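To make the similarity step concrete, here is a minimal standard-library sketch of a pairwise cosine similarity matrix. It assumes the model uses cosine similarity, which is the library default. The function name `cosine_similarity_matrix` and the 2-dimensional toy vectors are hypothetical, standing in for the 768-dimensional output of `model.encode`.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return an N x N matrix of pairwise cosine similarities."""
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    return [
        [
            sum(a * b for a, b in zip(e1, e2)) / (n1 * n2)
            for e2, n2 in zip(embeddings, norms)
        ]
        for e1, n1 in zip(embeddings, norms)
    ]


# Hypothetical toy embeddings (not produced by the model).
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = cosine_similarity_matrix(toy_embeddings)

for row in sims:
    print([round(score, 4) for score in row])
```

Each diagonal entry is 1.0 (a vector compared with itself), and the matrix is symmetric, so for a query-versus-corpus search only the row belonging to the query needs to be ranked.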
InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.8882 |
| cosine_accuracy@3 | 0.9934 |
| cosine_accuracy@5 | 0.9934 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.8882 |
| cosine_precision@3 | 0.3311 |
| cosine_precision@5 | 0.1987 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.8882 |
| cosine_recall@3 | 0.9934 |
| cosine_recall@5 | 0.9934 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.955 |
| cosine_mrr@10 | 0.9395 |
| cosine_map@100 | 0.9395 |
| dot_accuracy@1 | 0.8882 |
| dot_accuracy@3 | 0.9934 |
| dot_accuracy@5 | 0.9934 |
| dot_accuracy@10 | 1.0 |
| dot_precision@1 | 0.8882 |
| dot_precision@3 | 0.3311 |
| dot_precision@5 | 0.1987 |
| dot_precision@10 | 0.1 |
| dot_recall@1 | 0.8882 |
| dot_recall@3 | 0.9934 |
| dot_recall@5 | 0.9934 |
| dot_recall@10 | 1.0 |
| dot_ndcg@10 | 0.955 |
| dot_mrr@10 | 0.9395 |
| dot_map@100 | 0.9395 |
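The sketch below shows how metrics of this kind are commonly computed for a single cutoff k. It is an illustration under assumptions, not the evaluator's source: the function names, document IDs, and toy rankings are hypothetical. It also helps explain the table: accuracy and recall coincide and precision@k equals recall@k divided by k (for example 0.9934 / 3 ≈ 0.3311), which is consistent with each query having exactly one relevant passage. The cosine and dot rows match because the embeddings are normalized to unit length.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Accuracy, precision, recall and reciprocal rank for one query at cutoff k."""
    top_k = ranked_ids[:k]
    hits = [doc_id for doc_id in top_k if doc_id in relevant_ids]
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, start=1):
        if doc_id in relevant_ids:
            reciprocal_rank = 1.0 / rank  # rank of the first relevant hit
            break
    return {
        "accuracy": 1.0 if hits else 0.0,          # at least one relevant doc in top k
        "precision": len(hits) / k,                # relevant share of the top k
        "recall": len(hits) / len(relevant_ids),   # share of relevant docs retrieved
        "mrr": reciprocal_rank,
    }


def mean_metrics(runs, k):
    """Average the per-query metrics over all (ranking, relevant set) pairs."""
    per_query = [ir_metrics_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return {name: sum(m[name] for m in per_query) / len(per_query) for name in per_query[0]}


# Hypothetical toy run: two queries, one relevant document each.
runs = [
    (["d1", "d2", "d3"], {"d1"}),  # relevant document ranked first
    (["d5", "d4", "d6"], {"d4"}),  # relevant document ranked second
]

at_1 = mean_metrics(runs, k=1)
at_3 = mean_metrics(runs, k=3)
print(at_1)
print(at_3)
```

In this toy run accuracy@1 is 0.5, while at k=3 accuracy and recall are both 1.0, precision is 1/3, and MRR is 0.75.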
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |

Samples:

| sentence_0 | sentence_1 |
|---|---|
| What information should designers and developers provide about automated systems to ensure transparency? | You should know that an automated system is being used, |
| Why is it important for individuals impacted by automated systems to be notified of significant changes in functionality? | You should know that an automated system is being used, |
| What specific technical questions does the questionnaire for evaluating software workers cover? | questionnaire that businesses can use proactively when procuring software to evaluate workers. It covers |
MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
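As a rough illustration of what these parameters mean, the sketch below combines an in-batch-negatives ranking loss with Matryoshka-style truncation: the same loss is evaluated on the first `dim` components of every embedding for each listed dimension, and the weighted results are summed. With `n_dims_per_step` set to -1, all listed dimensions are used at every step. This is a simplified standard-library sketch and not the library's code: the function names, the toy 4-dimensional vectors, and the toy dimension list `[4, 2]` are assumptions, and the similarity scale of 20 mirrors the library's documented default for MultipleNegativesRankingLoss.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positive i is the target for anchor i and every
    other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        largest = max(logits)  # subtract the max for numerical stability
        log_sum_exp = largest + math.log(sum(math.exp(l - largest) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated copies of the embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total


# Hypothetical toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
mismatched = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(loss, mismatched)
```

Correctly paired inputs give a loss near zero, while swapping the positives gives a much larger one. The practical payoff of this training objective is that embeddings can be cut to their first 512, 256, 128, or 64 components and re-normalized at inference time; recent versions of Sentence Transformers accept a `truncate_dim` argument when loading a model for this purpose.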
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | cosine_map@100 |
|---|---|---|
| 1.0 | 36 | 0.9395 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}