Matryoshka Representation Learning
Paper: arXiv:2205.13147
This is a sentence-transformers model finetuned from microsoft/mpnet-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
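The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged, with padded positions excluded via the attention mask. A minimal numpy sketch of mask-aware mean pooling; the `mean_pool` helper and toy shapes are illustrative, not the library's implementation:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy batch: 2 sequences, 3 tokens each, hidden size 4; second sequence has 1 pad token.
tokens = np.ones((2, 3, 4), dtype=np.float32)
tokens[1, 2] = 100.0  # padded position; must be ignored by the mask
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
print(pooled[1])     # [1. 1. 1. 1.] -- the padded token's 100s are masked out
```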
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Chandar/sv-subject-based-matryoshka-mpnet-base")
# Run inference
sentences = [
'channels. Cl- ions enter the cell and hyperpolarizes the membrane, making the neuron less likely to fire\nan action potential.\nOnce neurotransmission has occurred, the neurotransmitter must be removed from the synaptic\ncleft so the postsynaptic membrane can “reset” and be ready to receive another signal. This can be\naccomplished in three ways: the neurotransmitter can diffuse away from the synaptic cleft, it can be\ndegraded by enzymes in the synaptic cleft, or it can be recycled (sometimes called reuptake) by the\npresynaptic neuron. Several drugs act at this step of neurotransmission. For example, some drugs that\nare given to Alzheimer’s patients work by inhibiting acetylcholinesterase, the enzyme that degrades\nacetylcholine. This inhibition of the enzyme essentially increases neurotransmission at synapses that\nrelease acetylcholine. Once released, the acetylcholine stays in the cleft and can continually bind and\nunbind to postsynaptic receptors.\nNeurotransmitter Function and Location\nNeurotransmitter Example Location\nAcetylcholine — CNS and/or\nPNS\nBiogenic amine Dopamine, serotonin, norepinephrine CNS and/or\nPNS\nAmino acid Glycine, glutamate, aspartate, gamma aminobutyric\nacidCNS\nNeuropeptide Substance P, endorphins CNS and/or\nPNS\nTable 35.2\nElectrical Synapse\nWhile electrical synapses are fewer in number than chemical synapses, they are found in all nervous\nsystems and play important and unique roles. The mode of neurotransmission in electrical synapses is\nquite different from that in chemical synapses. In an electrical synapse, the presynaptic and postsynaptic\nmembranes are very close together and are actually physically connected by channel proteins forming\ngap junctions. Gap junctions allow current to pass directly from one cell to the next. 
In addition to the\nions that carry this current, other molecules, such as ATP, can diffuse through the large gap junction\npores.\nThere are key differences between chemical and electrical synapses. Because chemical synapses\ndepend on the release of neurotransmitter molecules from synaptic vesicles to pass on their signal, there\nis an approximately one millisecond delay between when the axon potential reaches the presynaptic\nterminal and when the neurotransmitter leads to opening of postsynaptic ion channels. Additionally, this\nsignaling is unidirectional. Signaling in electrical synapses, in contrast, is virtually instantaneous (which\nis important for synapses involved in key reflexes), and some electrical synapses are bidirectional.\nElectrical synapses are also more reliable as they are less likely to be blocked, and they are important\nfor synchronizing the electrical activity of a group of neurons. For example, electrical synapses in the\nthalamus are thought to regulate slow-wave sleep, and disruption of these synapses can cause seizures.\nSignal Summation\nSometimes a single EPSP is strong enough to induce an action potential in the postsynaptic neuron,\nbut often multiple presynaptic inputs must create EPSPs around the same time for the postsynaptic\nneuron to be sufficiently depolarized to fire an action potential. This process is calledsummation and\noccurs at the axon hillock, as illustrated inFigure 35.16. Additionally, one neuron often has inputs\nfrom many presynaptic neurons—some excitatory and some inhibitory—so IPSPs can cancel out EPSPs\nand vice versa. It is the net change in postsynaptic membrane voltage that determines whether the\npostsynaptic cell has reached its threshold of excitation needed to fire an action potential. 
Together,\nsynaptic summation and the threshold for excitation act as a filter so that random “noise” in the system\nis not transmitted as important information.\n1004 CHAPTER 35 | THE NERVOUS SYSTEM\nThis content is available for free at http://cnx.org/content/col11448/1.9',
'WHERESELENIUMPOISONING\nIncertainportionsoftheNorthCentralgreatplains,plantsabsorbenoughsele-niumfromthesoiltoinjureanimalsthatfeeduponthem.Thepoisoningmayresultinaslowdiseaseknownas"blindstaggers"oras"alkalidisease",oritmaybequicklyfatal.Asaresultoftheselenium,thejointsoftheleg-bonesbecomebadlyeroded.Thehoofsdevelopabnormalitiesordropoff.Locomotionisimpaired.Theeffectoftheseleniumper-sists,fortheanimalsdonotusuallyrecoverevenifre-movedfromsucharegionandfedagoodrationOCCURSironisanessentialconstituentofthehemoglobin(seepage205).Coppercompoundsaregenerallypoisonoustomostkindsofprotoplasm;yetforsomespeciescopperisnecessaryinsmallamounts.Copperisanessentialelementinthebluishoxygen-carrierhemocyaninofthekingcrabandthelobster.InsomeoftheWesternstatesthesoilcontainstheelementselenium.Thiselementispresentalsoinplantsgrowinginsuchsoil,althoughitdoesnotappeartoaffecttheminanyway.Butanimalsthatfeeduponsuchplantsareoftenseriouslypoisoned(seeillustrationopposite).Inotherregionsvariationintheamountoffluorineinthesoilmaybeimportanttous.Theelementfluorine,whichisverywidelybutunevenlydistributed,seemstoplayaroleintheassimilationofcalciumandphosphorus,andsoaffectstheformationoftheteeth.Astudyof7000girlsandboysofhigh-schoolageinvariousmiddleandsouthwesternstatesbroughtoutthefactthattherewasmuchmoretoothdecay,orcaries^incommunitieswhosewatersupplieswerefreeoffluorinethanincommunitiesusingwaterwith0.5ormorepartsfluorinepermillionpartswater.Thus,thepopulationofacertainpartofTexas,DeafSmithCounty,wasfoundtohaveanexceptionallylownumberofdecayedteeth;andthisrelativefreedomfromdentalcariesisasso-ciatedwithmorethanusualamountsoffluorineinthelocalwaters.Inotherregionsunusualamountsoffluorineinthesoilandsoilwatersapparentlybringaboutthedevelopmentof"mottledteeth"amongthechildrenlivingthere.Nobodywantsblotchyteeth,butnobodywantscaries102',
"INDEX 275\nSenescence\n1\n, 27-30,46,7078\ntheoiietiof,43f>0\nSenilechangesinneivecells,27-29\nSenilityat,causeofdeath,1011\ninplants,44,71,75\nSepiK.rniia,231,282\nSeitna,252,258\nMeium,influenceontissuecultme,\n70,77\nSe\\organs,107,IOS,111,121-125,\n217219\nSexualiepioduction,3741\nShell,J, 20,27\nSkeletalsyhtem,107,108,112,127,\n128\nskin,107,H)8,no,112,331,132\nSlonakei,,T I?,212,213,218,228,\n2U7\nSloiopolhki,B, 33,2(>7\nSnow,12 C,179-383,225,267\nSofteningofthe-hiain,231,2.12\nSoma,40\nSomaticcella,ranmntalityof,5878\nSpan,174,175\nSpiegelberg,W, 87\nSpmtuahsm,18-20\nSpleen,61\nSponges,62\nStatine,174,175\nStomach,M,217-210,207\nStenuntomum,35,36\nSievenflon,THC.,206-208,2(>7\nStillbirths,205\n$ ongyloocnt')otuspitrpmnIus,55,\n56\nHummaiyofreaultH,223-227\nf-hxi'vivorfiluplinesofDrosophilo,\n188,192,195\nSyphihH,123\nTable,life,70-82\nTempeiatuic,208-217\nTethehn,70,220222\nTbeoiiesofdeath,4350\nTheoiyofpopulationgiowtli,249\nThyioulgland,01\nTissuecnltmet Ditto,5878\nTianaplantationoftumois,04,(51\nTubeicuIomH,101,204,208,230,2!1,\n238\nTuiuoiti.uihpliinlation,61, 65\nTyphoidfcvei,230,2,11,2.J5,2,'iO\nUnitedStates,giowtliol,2502,12,\n254-257\nLi'iostylayiandis,72\nVanBuien,GII, 11J,2UO\nVariation,genetic,190\nVeneiwildiseases,123,124\nVeilmlHt,PF,249,2(57\nVerwoin,M,,44,207\nVienna,245,246\nVoumoll,217\nWallei,AD,216,207\nWalwoith,BH,152,267\nWar,243\nWedekmd,33,267\nWeiimann,A, 26,43,65,207\nWlialc,longevityof,22\nWilaon,HV,62,267\nWittalom,99\nWoodruff,LL,30,33,72,73,267,\n268\nWoodn,FA, 38,3,208\nYellowfever,240,21-2\nYoung,TTO,,23-25,268",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
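Because the model was trained with a Matryoshka objective, embeddings remain usable after truncating to a prefix of the 768 dimensions, provided the truncated vectors are re-normalized before cosine similarity. A minimal sketch with random stand-in vectors in place of real `model.encode` output; the `truncate_normalize` helper is illustrative:

```python
import numpy as np

def truncate_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    # Keep only the first `dim` components, then L2-normalize each row so
    # cosine similarity on the truncated vectors stays well scaled.
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model.encode(sentences): the real model returns shape (n, 768).
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768)).astype(np.float32)

small = truncate_normalize(full, 64)
print(small.shape)                                     # (3, 64)
print(np.allclose(np.linalg.norm(small, axis=1), 1.0)) # True
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading, e.g. `SentenceTransformer(..., truncate_dim=64)`; check your installed version supports it before relying on that path.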
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 768 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7608 |
| cosine_accuracy_threshold | 0.956 |
| cosine_f1 | 0.5962 |
| cosine_f1_threshold | 0.9436 |
| cosine_precision | 0.5779 |
| cosine_recall | 0.6158 |
| cosine_ap | 0.6558 |
| cosine_mcc | 0.3892 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 512 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7512 |
| cosine_accuracy_threshold | 0.9695 |
| cosine_f1 | 0.5758 |
| cosine_f1_threshold | 0.96 |
| cosine_precision | 0.579 |
| cosine_recall | 0.5726 |
| cosine_ap | 0.6315 |
| cosine_mcc | 0.3695 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 256 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7688 |
| cosine_accuracy_threshold | 0.8354 |
| cosine_f1 | 0.6109 |
| cosine_f1_threshold | 0.7678 |
| cosine_precision | 0.5523 |
| cosine_recall | 0.6833 |
| cosine_ap | 0.6883 |
| cosine_mcc | 0.3938 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 128 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.755 |
| cosine_accuracy_threshold | 0.8873 |
| cosine_f1 | 0.5923 |
| cosine_f1_threshold | 0.8024 |
| cosine_precision | 0.5241 |
| cosine_recall | 0.6809 |
| cosine_ap | 0.6656 |
| cosine_mcc | 0.3587 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 64 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7398 |
| cosine_accuracy_threshold | 0.9206 |
| cosine_f1 | 0.5858 |
| cosine_f1_threshold | 0.8388 |
| cosine_precision | 0.5022 |
| cosine_recall | 0.7027 |
| cosine_ap | 0.6336 |
| cosine_mcc | 0.3404 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 32 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7362 |
| cosine_accuracy_threshold | 0.9587 |
| cosine_f1 | 0.5954 |
| cosine_f1_threshold | 0.8968 |
| cosine_precision | 0.5004 |
| cosine_recall | 0.735 |
| cosine_ap | 0.6264 |
| cosine_mcc | 0.3528 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 16 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.7298 |
| cosine_accuracy_threshold | 0.9745 |
| cosine_f1 | 0.5948 |
| cosine_f1_threshold | 0.929 |
| cosine_precision | 0.5095 |
| cosine_recall | 0.7143 |
| cosine_ap | 0.6023 |
| cosine_mcc | 0.3555 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 8 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.708 |
| cosine_accuracy_threshold | 0.9946 |
| cosine_f1 | 0.518 |
| cosine_f1_threshold | 0.9767 |
| cosine_precision | 0.4194 |
| cosine_recall | 0.6772 |
| cosine_ap | 0.5203 |
| cosine_mcc | 0.2049 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 4 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.6796 |
| cosine_accuracy_threshold | 0.9989 |
| cosine_f1 | 0.4953 |
| cosine_f1_threshold | -0.7454 |
| cosine_precision | 0.3291 |
| cosine_recall | 1.0 |
| cosine_ap | 0.4414 |
| cosine_mcc | 0.014 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 2 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.6728 |
| cosine_accuracy_threshold | 1.0 |
| cosine_f1 | 0.4953 |
| cosine_f1_threshold | -0.7795 |
| cosine_precision | 0.3291 |
| cosine_recall | 1.0 |
| cosine_ap | 0.3823 |
| cosine_mcc | 0.014 |
BinaryClassificationEvaluator with these parameters: { "truncate_dim": 1 }
| Metric | Value |
|---|---|
| cosine_accuracy | 0.6708 |
| cosine_accuracy_threshold | 1.0 |
| cosine_f1 | 0.4953 |
| cosine_f1_threshold | -1.0 |
| cosine_precision | 0.3293 |
| cosine_recall | 0.9994 |
| cosine_ap | 0.3352 |
| cosine_mcc | 0.0 |
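The `cosine_accuracy_threshold` and `cosine_f1_threshold` rows in the tables above are found by sweeping candidate thresholds over the cosine scores of the evaluation pairs and keeping the one that maximizes the respective metric. A small self-contained sketch of accuracy-maximizing threshold selection on toy scores; this mirrors the idea, not the evaluator's exact implementation:

```python
import numpy as np

def best_accuracy_threshold(scores: np.ndarray, labels: np.ndarray):
    # Candidate thresholds: midpoints between consecutive sorted scores.
    s = np.sort(scores)[::-1]
    best_acc, best_t = 0.0, float(s[0])
    for t in (s[:-1] + s[1:]) / 2:
        acc = float(np.mean((scores >= t).astype(int) == labels))
        if acc > best_acc:
            best_acc, best_t = acc, float(t)
    return best_t, best_acc

# Toy cosine scores for 5 pairs; positives happen to score higher than negatives.
scores = np.array([0.95, 0.90, 0.80, 0.30, 0.20])
labels = np.array([1, 1, 0, 0, 0])
t, acc = best_accuracy_threshold(scores, labels)
print(acc)  # 1.0 (threshold 0.85 separates the two classes perfectly)
```

The same sweep with an F1 objective generally picks a lower threshold, which is why `cosine_f1_threshold` sits below `cosine_accuracy_threshold` in most tables above.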
The training dataset has three columns: sentence1, sentence2, and label.

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |
| sentence1 | sentence2 | label |
|---|---|---|
| channels. Cl- ions enter the cell and hyperpolarizes the membrane, making the neuron less likely to fire | Figure 25.6 This table shows the major divisions of green plants. Green Algae: Precursors of Land Plants By the end of this section, you will be able to: • Describe the traits shared by green algae and land plants • Explain the reasons why Charales are considered the closest relative to land plants • Understand that current phylogenetic relationships are reshaped by comparative analysis of DNA sequences Streptophytes Until recently, all photosynthetic eukaryotes were considered members of the kingdom Plantae. The brown, red, and gold algae, however, have been reassigned to the Protista kingdom. This is because apart from their ability to capture light energy and fix CO2, they lack many structural and biochemical traits ... | |
| channels. Cl- ions enter the cell and hyperpolarizes the membrane, making the neuron less likely to fire | NATURALDEATH,PUBLICHEALTH229 | 1 |
| channels. Cl- ions enter the cell and hyperpolarizes the membrane, making the neuron less likely to fire | through a mass which is now formed out of sugar and is now dissolved again | 1 |
MatryoshkaLoss with these parameters:
{
    "loss": "CoSENTLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1],
    "matryoshka_weights": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
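MatryoshkaLoss wraps the inner CoSENTLoss, evaluating it at every prefix dimension and summing the (here uniformly weighted) results. A numpy sketch of that outer structure, with a simple cosine-regression stand-in for the inner loss; `matryoshka_sum` and `toy_base_loss` are illustrative, not the library code:

```python
import numpy as np

def cosine_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def toy_base_loss(a: np.ndarray, b: np.ndarray, labels: np.ndarray) -> float:
    # Stand-in for CoSENTLoss: penalize the gap between cosine score and 0/1 label.
    return float(np.mean((cosine_scores(a, b) - labels) ** 2))

def matryoshka_sum(a, b, labels, dims, weights, base_loss) -> float:
    # Evaluate the base loss on each truncated prefix and take a weighted sum.
    return sum(w * base_loss(a[:, :d], b[:, :d], labels)
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 768))
b = rng.normal(size=(4, 768))
labels = np.array([1, 0, 1, 0])
dims = [768, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
loss = matryoshka_sum(a, b, labels, dims, [1] * len(dims), toy_base_loss)
print(loss > 0)  # True
```

In actual training one passes the real losses directly, e.g. `MatryoshkaLoss(model, CoSENTLoss(model), matryoshka_dims=[768, ...])` from `sentence_transformers.losses`; the sketch only shows the prefix-and-weight scheme that config encodes.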
Non-default hyperparameters:
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- weight_decay: 0.01
- max_steps: 2000

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3.0
- max_steps: 2000
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | cosine_ap |
|---|---|---|---|
| -1 | -1 | - | 0.3352 |
| 0.0071 | 500 | 13.3939 | - |
| 0.0142 | 1000 | 3.2648 | - |
| 0.0213 | 1500 | 2.8893 | - |
| 0.0285 | 2000 | 2.9935 | - |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}
Base model
microsoft/mpnet-base