Matryoshka Representation Learning
Paper
•
2205.13147
•
Published
•
25
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("mahsaBa76/bge-base-custom-matryoshka")
# Run inference
sentences = [
'What is the difference between absolute settlement and relative settlement for transactions in a ledger?',
'paper-title: Ledger Combiners for Fast Settlement\n\nSince the above requirements are formulated independently for each $t$, it is well-defined to treat $\\mathrm{C}[\\cdot]$ as operating on ledgers rather than dynamic ledgers; we sometimes overload the notation in this sense.\n\nLooking ahead, our amplification combiner will consider $\\mathrm{t}_{\\mathrm{C}}\\left(\\mathbf{L}_{1}^{(t)}, \\ldots, \\mathbf{L}_{m}^{(t)}\\right)=\\bigcup_{i} \\mathbf{L}_{i}^{(t)}$ along with two related definitions of $\\mathrm{a}_{\\mathrm{C}}$ :\n\n$$\n\\mathrm{a}_{\\mathrm{C}}\\left(A_{1}^{(t)}, \\ldots, A_{m}^{(t)}\\right)=\\bigcup_{i} A_{i}^{(t)} \\quad \\text { and } \\quad \\mathrm{a}_{\\mathrm{C}}\\left(A_{1}^{(t)}, \\ldots, A_{m}^{(t)}\\right)=\\bigcap_{i} A_{i}^{(t)}\n$$\n\nsee Section 3. The robust combiner will adopt a more sophisticated notion of $t_{c}$; see Section 5 . In each of these cases, the important structural properties of the construction are captured by the rank function $r_{C}$.\n\n\\subsection*{2.3 Transaction Validity and Settlement}\nIn the discussion below, we assume a general notion of transaction validity that can be decided inductively: given a ledger $\\mathbf{L}$, the validity of a transaction $t x \\in \\mathbf{L}$ is determined by the transactions in the state $\\mathbf{L}\\lceil\\operatorname{tx}\\rceil$ of $\\mathbf{L}$ up to tx and their ordering. Intuitively, only valid transactions are then accounted for when interpreting the state of the ledger on the application level. The canonical example of such a validity predicate in the case of so-called UTXO transactions is formalized for completeness in Appendix B. Note that protocols such as Bitcoin allow only valid transactions to enter the ledger; as the Bitcoin ledger is represented by a simple chain it is possible to evaluate the validity predicate upon block creation for each included transaction. This may not be the case for more general ledgers, such as the result of applying one of our combiners or various DAG-based constructions.\n\nWhile we focus our analysis on persistence and liveness as given in Definition 3, our broader goal is to study settlement. Intuitively, settlement is the delay necessary to ensure that a transaction included in some $A^{(t)}$ enters the dynamic ledger and, furthermore, that its validity stabilizes for all future times.\n\nDefinition 5 (Absolute settlement). For a dynamic ledger $\\mathbf{D} \\stackrel{\\text { def }}{=} \\mathbf{L}^{(0)}, \\mathbf{L}^{(1)}, \\ldots$ we say that a transaction $t x \\in$ $A^{(\\tau)} \\cap \\mathbf{L}^{(t)}($ for $\\tau \\leq t)$ is (absolutely) settled at time $t$ iffor all $\\ell \\geq t$ we have: (i) $\\mathbf{L}^{(t)}\\lceil\\mathrm{tx}\\rceil \\subseteq \\mathbf{L}^{(\\ell)}$, (ii) the linear orders $<_{\\mathbf{L}^{(t)}}$ and $<_{\\mathbf{L}^{(t)}}$ agree on $\\mathbf{L}^{(t)}\\lceil\\mathrm{tx}\\rceil$, and (iii) for any $\\mathrm{tx}^{\\prime} \\in \\mathbf{L}^{(e)}$ such that $\\mathrm{tx}^{\\prime}{<_{\\mathbf{L}}(t)} \\mathrm{tx}$ we have $\\mathrm{tx}^{\\prime} \\in \\mathbf{L}^{(t)}\\lceil\\mathrm{tx}\\rceil$.\n\nNote that for any absolutely settled transaction, its validity is determined and it is guaranteed to remain unchanged in the future.\n\nIt will be useful to also consider a weaker notion of relative settlement of a transaction: Intuitively, tx is relatively settled at time $t$ if we have the guarantee that no (conflicting) transaction $\\mathrm{tx}^{\\prime}$ that is not part of the ledger at time $t$ can possibly eventually precede $t x$ in the ledger ordering.',
'paper-title: Casper the Friendly Finality Gadget\n\n\\documentclass[10pt]{article}\n\\usepackage[utf8]{inputenc}\n\\usepackage[T1]{fontenc}\n\\usepackage{amsmath}\n\\usepackage{amsfonts}\n\\usepackage{amssymb}\n\\usepackage[version=4]{mhchem}\n\\usepackage{stmaryrd}\n\\usepackage{graphicx}\n\\usepackage[export]{adjustbox}\n\\graphicspath{ {./images/} }\n\\usepackage{hyperref}\n\\hypersetup{colorlinks=true, linkcolor=blue, filecolor=magenta, urlcolor=cyan,}\n\\urlstyle{same}\n\n\\title{Casper the Friendly Finality Gadget }\n\n\\author{Vitalik Buterin and Virgil Griffith\\\\\nEthereum Foundation}\n\\date{}\n\n\n%New command to display footnote whose markers will always be hidden\n\\let\\svthefootnote\\thefootnote\n\\newcommand\\blfootnotetext[1]{%\n \\let\\thefootnote\\relax\\footnote{#1}%\n \\addtocounter{footnote}{-1}%\n \\let\\thefootnote\\svthefootnote%\n}\n\n%Overriding the \\footnotetext command to hide the marker if its value is `0`\n\\let\\svfootnotetext\\footnotetext\n\\renewcommand\\footnotetext[2][?]{%\n \\if\\relax#1\\relax%\n \\ifnum\\value{footnote}=0\\blfootnotetext{#2}\\else\\svfootnotetext{#2}\\fi%\n \\else%\n \\if?#1\\ifnum\\value{footnote}=0\\blfootnotetext{#2}\\else\\svfootnotetext{#2}\\fi%\n \\else\\svfootnotetext[#1]{#2}\\fi%\n \\fi\n}\n\n\\begin{document}\n\\maketitle\n\n\n\\begin{abstract}\nWe introduce Casper, a proof of stake-based finality system which overlays an existing proof of work blockchain. Casper is a partial consensus mechanism combining proof of stake algorithm research and Byzantine fault tolerant consensus theory. We introduce our system, prove some desirable features, and show defenses against long range revisions and catastrophic crashes. The Casper overlay provides almost any proof of work chain with additional protections against block reversions.\n\\end{abstract}\n\n\\section*{1. Introduction}\nOver the past few years there has been considerable research into "proof of stake" (PoS) based blockchain consensus algorithms. In a PoS system, a blockchain appends and agrees on new blocks through a process where anyone who holds coins inside of the system can participate, and the influence an agent has is proportional to the number of coins (or "stake") it holds. This is a vastly more efficient alternative to proof of work (PoW) "mining" and enables blockchains to operate without mining\'s high hardware and electricity costs.\\\\[0pt]\nThere are two major schools of thought in PoS design. The first, chain-based proof of stake[1, 2], mimics proof of work mechanics and features a chain of blocks and simulates mining by pseudorandomly assigning the right to create new blocks to stakeholders. This includes Peercoin[3], Blackcoin[4], and Iddo Bentov\'s work[5].\\\\[0pt]\nThe other school, Byzantine fault tolerant (BFT) based proof of stake, is based on a thirty-year-old body of research into BFT consensus algorithms such as PBFT[6]. BFT algorithms typically have proven mathematical properties; for example, one can usually mathematically prove that as long as $>\\frac{2}{3}$ of protocol participants are following the protocol honestly, then, regardless of network latency, the algorithm cannot finalize conflicting blocks. Repurposing BFT algorithms for proof of stake was first introduced by Tendermint[7], and has modern inspirations such as [8]. Casper follows this BFT tradition, though with some modifications.\n\n\\subsection*{1.1. Our Work}\nCasper the Friendly Finality Gadget is an overlay atop a proposal mechanism-a mechanism which proposes blocks ${ }^{1}$. Casper is responsible for finalizing these blocks, essentially selecting a unique chain which represents the canonical transactions of the ledger. Casper provides safety, but liveness depends on the chosen proposal mechanism. That is, if attackers wholly control the proposal mechanism, Casper protects against finalizing two conflicting checkpoints, but the attackers could prevent Casper from finalizing any future checkpoints.\\\\\nCasper introduces several new features that BFT algorithms do not necessarily support:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
dim_768, dim_512, dim_256, dim_128 and dim_64InformationRetrievalEvaluator| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.5 | 0.5714 | 0.5714 | 0.5 | 0.4286 |
| cosine_accuracy@3 | 0.7857 | 0.7857 | 0.7857 | 0.75 | 0.6786 |
| cosine_accuracy@5 | 0.8571 | 0.8214 | 0.8214 | 0.8214 | 0.75 |
| cosine_accuracy@10 | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.8214 |
| cosine_precision@1 | 0.5 | 0.5714 | 0.5714 | 0.5 | 0.4286 |
| cosine_precision@3 | 0.2619 | 0.2619 | 0.2619 | 0.25 | 0.2262 |
| cosine_precision@5 | 0.1714 | 0.1643 | 0.1643 | 0.1643 | 0.15 |
| cosine_precision@10 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0821 |
| cosine_recall@1 | 0.5 | 0.5714 | 0.5714 | 0.5 | 0.4286 |
| cosine_recall@3 | 0.7857 | 0.7857 | 0.7857 | 0.75 | 0.6786 |
| cosine_recall@5 | 0.8571 | 0.8214 | 0.8214 | 0.8214 | 0.75 |
| cosine_recall@10 | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.8214 |
| cosine_ndcg@10 | 0.7032 | 0.7277 | 0.7285 | 0.6935 | 0.6316 |
| cosine_mrr@10 | 0.6512 | 0.6849 | 0.6857 | 0.6396 | 0.5696 |
| cosine_map@100 | 0.6553 | 0.6886 | 0.6893 | 0.6425 | 0.5757 |
anchor and positive| anchor | positive | |
|---|---|---|
| type | string | string |
| details |
|
|
| anchor | positive |
|---|---|
How does ByzCoin ensure that microblock chains remain consistent even in the presence of keyblock conflicts? |
paper-title: Enhancing Bitcoin Security and Performance with Strong Consistency via Collective Signing |
What are the primary ways in which Bitcoin users can be deanonymized, and why is network-layer deanonymization particularly concerning? |
paper-title: Bitcoin and Cryptocurrency Technologies |
What is the main purpose of the ledger indistinguishability and transaction non-malleability properties in the Zerocash protocol? |
paper-title: Zerocash: Decentralized Anonymous Payments from Bitcoin |
MatryoshkaLoss with these parameters:{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
eval_strategy: epochper_device_train_batch_size: 32gradient_accumulation_steps: 16learning_rate: 2e-05num_train_epochs: 4overwrite_output_dir: Falsedo_predict: Falseeval_strategy: epochprediction_loss_only: Trueper_device_train_batch_size: 32per_device_eval_batch_size: 8per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 16eval_accumulation_steps: Nonelearning_rate: 2e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 4max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Falsehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseeval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: proportional| Epoch | Step | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|
| 1.0 | 1 | 0.6975 | 0.6930 | 0.6760 | 0.6960 | 0.6098 |
| 2.0 | 2 | 0.7258 | 0.7082 | 0.7062 | 0.6935 | 0.6231 |
| 3.0 | 3 | 0.7079 | 0.7270 | 0.7067 | 0.6935 | 0.6184 |
| 4.0 | 4 | 0.7032 | 0.7277 | 0.7285 | 0.6935 | 0.6316 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
BAAI/bge-base-en-v1.5