This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
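The module list above amounts to three steps: token embeddings from the BERT encoder, a mean over the non-padding tokens (pooling_mode_mean_tokens: True), and L2 normalization. The following is a minimal pure-Python sketch of the last two steps, using toy token vectors in place of real encoder output. The function names `mean_pool` and `l2_normalize` are hypothetical and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length, mirroring what Normalize() does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.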
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("yinong333/finetuned_MiniLM")
# Run inference
sentences = [
'What legal action did the Federal Trade Commission take against Kochava regarding data tracking?',
'ENDNOTES\n75. See., e.g., Sam Sabin. Digital surveillance in a post-Roe world. Politico. May 5, 2022. https://\nwww.politico.com/newsletters/digital-future-daily/2022/05/05/digital-surveillance-in-a-post-roe\xad\nworld-00030459; Federal Trade Commission. FTC Sues Kochava for Selling Data that Tracks People at\nReproductive Health Clinics, Places of Worship, and Other Sensitive Locations. Aug. 29, 2022. https://\nwww.ftc.gov/news-events/news/press-releases/2022/08/ftc-sues-kochava-selling-data-tracks-people\xad\nreproductive-health-clinics-places-worship-other\n76. Todd Feathers. This Private Equity Firm Is Amassing Companies That Collect Data on America’s\nChildren. The Markup. Jan. 11, 2022.\nhttps://themarkup.org/machine-learning/2022/01/11/this-private-equity-firm-is-amassing-companies\xad\nthat-collect-data-on-americas-children\n77. Reed Albergotti. Every employee who leaves Apple becomes an ‘associate’: In job databases used by',
'DATA PRIVACY \nEXTRA PROTECTIONS FOR DATA RELATED TO SENSITIVE\nDOMAINS\n•\nContinuous positive airway pressure machines gather data for medical purposes, such as diagnosing sleep\napnea, and send usage data to a patient’s insurance company, which may subsequently deny coverage for the\ndevice based on usage data. Patients were not aware that the data would be used in this way or monitored\nby anyone other than their doctor.70 \n•\nA department store company used predictive analytics applied to collected consumer data to determine that a\nteenage girl was pregnant, and sent maternity clothing ads and other baby-related advertisements to her\nhouse, revealing to her father that she was pregnant.71\n•\nSchool audio surveillance systems monitor student conversations to detect potential "stress indicators" as\na warning of potential violence.72 Online proctoring systems claim to detect if a student is cheating on an',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
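The loss configuration in this card lists Matryoshka dimensions 384 and 128, so the first 128 components of each embedding are meant to be usable on their own. A hedged sketch of that truncation follows: keep the leading components, then rescale to unit length so cosine scores stay comparable. The helper names and the stand-in vectors are hypothetical; in practice you would pass rows of `embeddings`. Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing the model, which is worth checking against your installed version.

```python
import math

def truncate_and_normalize(embedding, dim=128):
    """Keep the first `dim` components and rescale to unit length."""
    head = [float(x) for x in embedding[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def dot(a, b):
    """Dot product; for unit-length vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Stand-in 384-dimensional vectors (hypothetical values). They agree on the
# first half and disagree on the second half.
full_a = [0.5] * 384
full_b = [0.5] * 192 + [-0.5] * 192

small_a = truncate_and_normalize(full_a)
small_b = truncate_and_normalize(full_b)
print(len(small_a), round(dot(small_a, small_b), 4))
```

The toy vectors agree on their first 128 components, so the truncated similarity is high even though the full vectors differ. Truncation trades some accuracy for smaller storage and faster search, so it is worth evaluating both sizes on your own data.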
InformationRetrievalEvaluator results:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.7214 |
| cosine_accuracy@3 | 0.8786 |
| cosine_accuracy@5 | 0.95 |
| cosine_accuracy@10 | 0.9714 |
| cosine_precision@1 | 0.7214 |
| cosine_precision@3 | 0.2929 |
| cosine_precision@5 | 0.19 |
| cosine_precision@10 | 0.0971 |
| cosine_recall@1 | 0.7214 |
| cosine_recall@3 | 0.8786 |
| cosine_recall@5 | 0.95 |
| cosine_recall@10 | 0.9714 |
| cosine_ndcg@10 | 0.8515 |
| cosine_mrr@10 | 0.8122 |
| cosine_map@100 | 0.8142 |
| dot_accuracy@1 | 0.7214 |
| dot_accuracy@3 | 0.8786 |
| dot_accuracy@5 | 0.95 |
| dot_accuracy@10 | 0.9714 |
| dot_precision@1 | 0.7214 |
| dot_precision@3 | 0.2929 |
| dot_precision@5 | 0.19 |
| dot_precision@10 | 0.0971 |
| dot_recall@1 | 0.7214 |
| dot_recall@3 | 0.8786 |
| dot_recall@5 | 0.95 |
| dot_recall@10 | 0.9714 |
| dot_ndcg@10 | 0.8515 |
| dot_mrr@10 | 0.8122 |
| dot_map@100 | 0.8142 |
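
Two patterns in the table can be checked with simple arithmetic. First, recall@k equals accuracy@k and precision@k is close to accuracy@k / k, which is what one would expect if every query has exactly one relevant document. That is an assumption; the card does not state it. Second, the dot_* rows match the cosine_* rows, which is consistent with the Normalize() module producing unit-length embeddings. A short sketch of the first check:

```python
# Values copied from the table above.
accuracy = {1: 0.7214, 3: 0.8786, 5: 0.95, 10: 0.9714}
precision = {1: 0.7214, 3: 0.2929, 5: 0.19, 10: 0.0971}

# Assumption: one relevant document per query, so a hit in the top k
# contributes exactly 1/k to precision@k.
for k in sorted(accuracy):
    expected = accuracy[k] / k
    print(f"k={k:<2} accuracy/k={expected:.4f} reported precision={precision[k]:.4f}")
    # Allow for rounding to four decimals in the reported values.
    assert abs(expected - precision[k]) < 1e-4
```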
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |

| sentence_0 | sentence_1 |
|---|---|
| What is the purpose of the AI Bill of Rights mentioned in the context? | BLUEPRINT FOR AN |
| When was the Blueprint for an AI Bill of Rights published? | BLUEPRINT FOR AN |
| What was the purpose of the Blueprint for an AI Bill of Rights published by the White House Office of Science and Technology Policy? | About this Document |

MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        384,
        128
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
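
As a rough illustration of what this configuration means, the sketch below applies an in-batch ranking loss to embeddings cut to each listed dimension and sums the results with the listed weights. It is a pure-Python approximation and not the library's implementation: the function names are hypothetical, the scale of 20 and the use of cosine similarity are assumptions about the base loss, and the toy vectors use dimensions (4, 2) in place of (384, 128).

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine scores; positives[i] is the target
    for anchors[i], and every other positive in the batch is a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)  # subtract the max for numerical stability
        log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims=(384, 128), weights=(1, 1)):
    """Weighted sum of the base loss on embeddings cut to each dimension."""
    return sum(
        w * in_batch_ranking_loss([a[:d] for a in anchors],
                                  [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (question, passage) pairs with 4-dimensional vectors.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
loss = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
print(f"{loss:.6f}")
```

Training against both sizes at once is what encourages the leading 128 components to carry most of the ranking signal.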
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 30
- per_device_eval_batch_size: 30
- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 30
- per_device_eval_batch_size: 30
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | cosine_map@100 |
|---|---|---|
| 1.0 | 26 | 0.7610 |
| 1.9231 | 50 | 0.8047 |
| 2.0 | 52 | 0.8051 |
| 3.0 | 78 | 0.8116 |
| 3.8462 | 100 | 0.8142 |
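
The fractional epochs in the log follow from the step count: epoch 1.0 is logged at step 26, so one epoch is 26 optimizer steps. The sketch below reproduces the Epoch column and derives a rough range for the training set size from per_device_train_batch_size: 30, dataloader_drop_last: False, and gradient_accumulation_steps: 1. The size estimate assumes a single training device, which the card does not state.

```python
steps_per_epoch = 26   # epoch 1.0 is logged at step 26
batch_size = 30        # per_device_train_batch_size

# Reproduce the Epoch column from the Step column.
for step in (26, 50, 52, 78, 100):
    print(step, round(step / steps_per_epoch, 4))

# Assumption: one device. With drop_last disabled, the final batch may be
# partial, so the sample count N satisfies ceil(N / batch_size) == 26.
min_samples = (steps_per_epoch - 1) * batch_size + 1
max_samples = steps_per_epoch * batch_size
print(min_samples, max_samples)
```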
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
sentence-transformers/all-MiniLM-L6-v2