This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1536, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
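The architecture above pools with the CLS token (`pooling_mode_cls_token: True`) and then applies `Normalize()`. A minimal NumPy sketch of those two steps follows; the function name `cls_pool_and_normalize` and the random token embeddings are illustrative assumptions, not part of the library or of this model's weights.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Illustrative helper (hypothetical name): CLS pooling followed by L2 normalization.

    token_embeddings has shape (batch, seq_len, hidden); the result has shape (batch, hidden).
    """
    # CLS pooling: keep only the first token's vector for each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2-normalize so that a dot product between two rows equals their cosine similarity.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Stand-in for transformer outputs: 3 sequences, 8 tokens each, 1024 hidden units.
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(3, 8, 1024))

sentence_vectors = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_vectors.shape)
```

Because the output rows have unit length, cosine similarity and dot product give the same ranking for these embeddings.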
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/bge-m3-mrl-264")
# Run inference
sentences = [
'어떤 사람의 연금 수령액을 증가시키면 연금재정이 어려워져?',
'한편, 제19대국회에서는 소득대체율을 높이지 않는 대신, 연금급여산식의 기준이 되는 기준소득월액의 상ㆍ하한액을 인상함으로써 가입자 전체의 소득평균을 높여 보험급여를 인상하는 방안도 논의되었다. 이 방안은 소득재분배 부문에 해당하는 국민연금의 A값을 상향하여 소득재분배 기능을 강화하는 장점을 가진 반면, 보험료가 인상되는 저소득층 가입자와 영세사업장, 그리고 고소득 사업장가입자와 사업장의 연금보험료 부담이 증가하기 때문에, 경제 및 산업계의 반발로 이어질 가능성도 있다. 또한 고소득 가입자들의 연급수급액의 증가는 시간의 경과에 따라 연금재정에 추가적인 부담을 주게 된다는 것이다.',
'다. 재정<br>□ 저출산·고령화의 진전으로 세원이 되는 생산가능인구의 비중은 줄어들고, 연금급여 및 의료비 지출 등은 늘어남에 따라 재정수지 부담은 가중될 전망<br>― 출산율이 하락하면 전체 인구 중 생산가능인구의 비율이 감소하고 따라서 세수 감소로 이어질 가능성<br>― 반면, 고령화로 인해 연금수급자가 증가하면 연금 및 의료비 등의 재정지출 확대로 이어질 가능성<br>― 국민연금 가입자 중 노령연금 수급율은 인구감소 및 은퇴자 증가에 따라 2010년 13.3%, 2030년 41.9%, 2050년 88.5%로 급증할 전망<br>□ IMF에 따르면 GDP 대비 재정수지는 생산가능인구비율 1% 증가 시 0.06%p 개선되는 반면, 노인인구 1% 증가시 0.46%p 악화<br>― 또한, OECD는 고령화로 인해 노인관련 재정지출이 급증해 주요국의 2050년 재정수지가 적자를 기록할 것으로 전망',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
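Since the model name and the cited paper point to Matryoshka Representation Learning, the 1024-dimensional embeddings can presumably be shortened by keeping only their leading dimensions. The sketch below shows that truncate-and-renormalize step on random stand-in vectors. The target size of 256 and the helper name `truncate_and_renormalize` are assumptions for illustration, because this card does not list the dimensions used during training. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient in practice.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Illustrative helper (hypothetical name): keep the first `dim` columns, then L2-normalize."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model.encode(sentences): 3 unit-length vectors of size 1024.
rng = np.random.default_rng(0)
full_embeddings = rng.normal(size=(3, 1024))
full_embeddings /= np.linalg.norm(full_embeddings, axis=1, keepdims=True)

# 256 is an assumed target size, not a value documented for this model.
small_embeddings = truncate_and_renormalize(full_embeddings, 256)

# Rows are unit length again, so the dot product is the cosine similarity.
small_similarities = small_embeddings @ small_embeddings.T
print(small_embeddings.shape, small_similarities.shape)
```

Renormalizing after truncation matters here: the leading slice of a unit vector is generally shorter than unit length, so raw dot products would no longer be cosine similarities.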
Non-default hyperparameters:

- per_device_train_batch_size: 32
- gradient_accumulation_steps: 32
- learning_rate: 3e-05
- weight_decay: 0.01
- warmup_ratio: 0.05
- fp16: True
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 32
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 3e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.05
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0091 | 1 | 15.81 |
| 0.0181 | 2 | 15.9499 |
| 0.0272 | 3 | 15.3393 |
| 0.0363 | 4 | 15.4563 |
| 0.0453 | 5 | 15.5322 |
| 0.0544 | 6 | 16.0348 |
| 0.0635 | 7 | 15.3445 |
| 0.0725 | 8 | 15.7129 |
| 0.0816 | 9 | 14.4393 |
| 0.0907 | 10 | 13.4846 |
| 0.0997 | 11 | 12.5233 |
| 0.1088 | 12 | 12.1728 |
| 0.1178 | 13 | 11.9232 |
| 0.1269 | 14 | 11.5308 |
| 0.1360 | 15 | 10.7525 |
| 0.1450 | 16 | 10.393 |
| 0.1541 | 17 | 9.7346 |
| 0.1632 | 18 | 9.4875 |
| 0.1722 | 19 | 9.2608 |
| 0.1813 | 20 | 8.7966 |
| 0.1904 | 21 | 8.5579 |
| 0.1994 | 22 | 8.4993 |
| 0.2085 | 23 | 8.1505 |
| 0.2176 | 24 | 8.5027 |
| 0.2266 | 25 | 7.9795 |
| 0.2357 | 26 | 7.5782 |
| 0.2448 | 27 | 7.68 |
| 0.2538 | 28 | 7.539 |
| 0.2629 | 29 | 7.5871 |
| 0.2720 | 30 | 7.2676 |
| 0.2810 | 31 | 6.9613 |
| 0.2901 | 32 | 6.89 |
| 0.2992 | 33 | 6.7585 |
| 0.3082 | 34 | 6.7286 |
| 0.3173 | 35 | 6.754 |
| 0.3263 | 36 | 6.7466 |
| 0.3354 | 37 | 6.6096 |
| 0.3445 | 38 | 6.5864 |
| 0.3535 | 39 | 6.5235 |
| 0.3626 | 40 | 6.5429 |
| 0.3717 | 41 | 6.4971 |
| 0.3807 | 42 | 6.4463 |
| 0.3898 | 43 | 6.332 |
| 0.3989 | 44 | 6.1275 |
| 0.4079 | 45 | 6.2551 |
| 0.4170 | 46 | 6.1372 |
| 0.4261 | 47 | 6.1075 |
| 0.4351 | 48 | 6.1408 |
| 0.4442 | 49 | 6.062 |
| 0.4533 | 50 | 5.9831 |
| 0.4623 | 51 | 5.9956 |
| 0.4714 | 52 | 5.8332 |
| 0.4805 | 53 | 5.7447 |
| 0.4895 | 54 | 5.9531 |
| 0.4986 | 55 | 5.911 |
| 0.5076 | 56 | 5.8576 |
| 0.5167 | 57 | 5.8116 |
| 0.5258 | 58 | 5.6564 |
| 0.5348 | 59 | 5.7289 |
| 0.5439 | 60 | 5.7514 |
| 0.5530 | 61 | 5.5991 |
| 0.5620 | 62 | 5.553 |
| 0.5711 | 63 | 5.4728 |
| 0.5802 | 64 | 5.6212 |
| 0.5892 | 65 | 5.6554 |
| 0.5983 | 66 | 5.4389 |
| 0.6074 | 67 | 5.3669 |
| 0.6164 | 68 | 5.5667 |
| 0.6255 | 69 | 5.4106 |
| 0.6346 | 70 | 5.3122 |
| 0.6436 | 71 | 5.4145 |
| 0.6527 | 72 | 5.3794 |
| 0.6618 | 73 | 5.269 |
| 0.6708 | 74 | 5.3583 |
| 0.6799 | 75 | 5.311 |
| 0.6890 | 76 | 5.2061 |
| 0.6980 | 77 | 5.133 |
| 0.7071 | 78 | 5.4036 |
| 0.7161 | 79 | 5.2761 |
| 0.7252 | 80 | 5.0696 |
| 0.7343 | 81 | 5.3648 |
| 0.7433 | 82 | 5.0591 |
| 0.7524 | 83 | 5.074 |
| 0.7615 | 84 | 5.1789 |
| 0.7705 | 85 | 5.0147 |
| 0.7796 | 86 | 5.251 |
| 0.7887 | 87 | 5.1282 |
| 0.7977 | 88 | 5.1111 |
| 0.8068 | 89 | 5.2096 |
| 0.8159 | 90 | 5.0734 |
| 0.8249 | 91 | 4.9202 |
| 0.8340 | 92 | 5.0058 |
| 0.8431 | 93 | 5.0928 |
| 0.8521 | 94 | 4.9845 |
| 0.8612 | 95 | 5.0683 |
| 0.8703 | 96 | 5.0267 |
| 0.8793 | 97 | 5.0821 |
| 0.8884 | 98 | 4.8806 |
| 0.8975 | 99 | 5.0043 |
| 0.9065 | 100 | 4.888 |
| 0.9156 | 101 | 5.0629 |
| 0.9246 | 102 | 5.0454 |
| 0.9337 | 103 | 4.9619 |
| 0.9428 | 104 | 4.9217 |
| 0.9518 | 105 | 4.7401 |
| 0.9609 | 106 | 4.8068 |
| 0.9700 | 107 | 4.8151 |
| 0.9790 | 108 | 4.8689 |
| 0.9881 | 109 | 5.0193 |
| 0.9972 | 110 | 4.706 |
| 1.0062 | 111 | 4.8057 |
| 1.0153 | 112 | 4.7279 |
| 1.0244 | 113 | 4.7721 |
| 1.0334 | 114 | 4.7767 |
| 1.0425 | 115 | 4.669 |
| 1.0516 | 116 | 4.8533 |
| 1.0606 | 117 | 4.8634 |
| 1.0697 | 118 | 4.9135 |
| 1.0788 | 119 | 4.7629 |
| 1.0878 | 120 | 4.7479 |
| 1.0969 | 121 | 4.743 |
| 1.1059 | 122 | 4.5606 |
| 1.1150 | 123 | 4.6933 |
| 1.1241 | 124 | 4.6659 |
| 1.1331 | 125 | 4.7131 |
| 1.1422 | 126 | 4.7059 |
| 1.1513 | 127 | 4.5701 |
| 1.1603 | 128 | 4.4892 |
| 1.1694 | 129 | 4.6497 |
| 1.1785 | 130 | 4.4814 |
| 1.1875 | 131 | 4.2669 |
| 1.1966 | 132 | 4.4983 |
| 1.2057 | 133 | 4.431 |
| 1.2147 | 134 | 4.414 |
| 1.2238 | 135 | 4.3975 |
| 1.2329 | 136 | 4.3101 |
| 1.2419 | 137 | 4.3422 |
| 1.2510 | 138 | 4.476 |
| 1.2601 | 139 | 4.6629 |
| 1.2691 | 140 | 4.3559 |
| 1.2782 | 141 | 4.2049 |
| 1.2873 | 142 | 4.303 |
| 1.2963 | 143 | 4.3053 |
| 1.3054 | 144 | 4.2366 |
| 1.3144 | 145 | 4.5165 |
| 1.3235 | 146 | 4.2634 |
| 1.3326 | 147 | 4.4295 |
| 1.3416 | 148 | 4.2595 |
| 1.3507 | 149 | 4.3753 |
| 1.3598 | 150 | 4.3454 |
| 1.3688 | 151 | 4.2618 |
| 1.3779 | 152 | 4.4016 |
| 1.3870 | 153 | 4.2672 |
| 1.3960 | 154 | 4.1824 |
| 1.4051 | 155 | 4.3268 |
| 1.4142 | 156 | 4.091 |
| 1.4232 | 157 | 4.3111 |
| 1.4323 | 158 | 4.2397 |
| 1.4414 | 159 | 4.1694 |
| 1.4504 | 160 | 4.2119 |
| 1.4595 | 161 | 4.1292 |
| 1.4686 | 162 | 4.1154 |
| 1.4776 | 163 | 4.1638 |
| 1.4867 | 164 | 4.3548 |
| 1.4958 | 165 | 4.2137 |
| 1.5048 | 166 | 4.1888 |
| 1.5139 | 167 | 4.2609 |
| 1.5229 | 168 | 4.2644 |
| 1.5320 | 169 | 4.2183 |
| 1.5411 | 170 | 4.2414 |
| 1.5501 | 171 | 4.242 |
| 1.5592 | 172 | 4.0547 |
| 1.5683 | 173 | 4.1509 |
| 1.5773 | 174 | 4.247 |
| 1.5864 | 175 | 4.3103 |
| 1.5955 | 176 | 4.0845 |
| 1.6045 | 177 | 4.0918 |
| 1.6136 | 178 | 4.1582 |
| 1.6227 | 179 | 4.2982 |
| 1.6317 | 180 | 4.0515 |
| 1.6408 | 181 | 4.0738 |
| 1.6499 | 182 | 4.2416 |
| 1.6589 | 183 | 4.1212 |
| 1.6680 | 184 | 4.174 |
| 1.6771 | 185 | 4.1369 |
| 1.6861 | 186 | 3.9908 |
| 1.6952 | 187 | 4.1155 |
| 1.7042 | 188 | 3.9893 |
| 1.7133 | 189 | 4.2362 |
| 1.7224 | 190 | 4.074 |
| 1.7314 | 191 | 4.0604 |
| 1.7405 | 192 | 4.0065 |
| 1.7496 | 193 | 4.0041 |
| 1.7586 | 194 | 4.0428 |
| 1.7677 | 195 | 4.0094 |
| 1.7768 | 196 | 3.962 |
| 1.7858 | 197 | 4.1932 |
| 1.7949 | 198 | 4.133 |
| 1.8040 | 199 | 4.1344 |
| 1.8130 | 200 | 4.1004 |
| 1.8221 | 201 | 4.0633 |
| 1.8312 | 202 | 4.0545 |
| 1.8402 | 203 | 4.0434 |
| 1.8493 | 204 | 4.0576 |
| 1.8584 | 205 | 4.0892 |
| 1.8674 | 206 | 4.1945 |
| 1.8765 | 207 | 4.0809 |
| 1.8856 | 208 | 4.0655 |
| 1.8946 | 209 | 4.155 |
| 1.9037 | 210 | 4.0801 |
| 1.9127 | 211 | 4.0837 |
| 1.9218 | 212 | 4.1487 |
| 1.9309 | 213 | 4.0574 |
| 1.9399 | 214 | 4.0952 |
| 1.9490 | 215 | 4.0414 |
| 1.9581 | 216 | 3.9645 |
| 1.9671 | 217 | 4.0327 |
| 1.9762 | 218 | 3.9183 |
| 1.9853 | 219 | 4.1204 |
| 1.9943 | 220 | 4.0043 |
| 2.0034 | 221 | 3.904 |
| 2.0125 | 222 | 4.0489 |
| 2.0215 | 223 | 4.0316 |
| 2.0306 | 224 | 3.9649 |
| 2.0397 | 225 | 3.891 |
| 2.0487 | 226 | 4.0352 |
| 2.0578 | 227 | 4.1811 |
| 2.0669 | 228 | 4.1212 |
| 2.0759 | 229 | 4.2356 |
| 2.0850 | 230 | 4.1295 |
| 2.0941 | 231 | 4.0231 |
| 2.1031 | 232 | 3.914 |
| 2.1122 | 233 | 3.916 |
| 2.1212 | 234 | 3.8657 |
| 2.1303 | 235 | 4.0986 |
| 2.1394 | 236 | 3.9774 |
| 2.1484 | 237 | 3.9112 |
| 2.1575 | 238 | 3.8232 |
| 2.1666 | 239 | 3.85 |
| 2.1756 | 240 | 3.8874 |
| 2.1847 | 241 | 3.6777 |
| 2.1938 | 242 | 3.7898 |
| 2.2028 | 243 | 3.8527 |
| 2.2119 | 244 | 3.7038 |
| 2.2210 | 245 | 3.9404 |
| 2.2300 | 246 | 3.7468 |
| 2.2391 | 247 | 3.7905 |
| 2.2482 | 248 | 3.8356 |
| 2.2572 | 249 | 3.9682 |
| 2.2663 | 250 | 3.9372 |
| 2.2754 | 251 | 3.7579 |
| 2.2844 | 252 | 3.6927 |
| 2.2935 | 253 | 3.7372 |
| 2.3025 | 254 | 3.6125 |
| 2.3116 | 255 | 4.0475 |
| 2.3207 | 256 | 3.7422 |
| 2.3297 | 257 | 3.8646 |
| 2.3388 | 258 | 3.6637 |
| 2.3479 | 259 | 3.8496 |
| 2.3569 | 260 | 3.753 |
| 2.3660 | 261 | 3.7632 |
| 2.3751 | 262 | 3.7097 |
| 2.3841 | 263 | 3.8584 |
| 2.3932 | 264 | 3.6547 |
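
The hyperparameters listed earlier (`per_device_train_batch_size: 32`, `gradient_accumulation_steps: 32`, `learning_rate: 3e-05`, `warmup_ratio: 0.05`, `lr_scheduler_type: linear`) determine the effective batch size per device and the shape of the learning-rate curve behind the log above. The sketch below works through that arithmetic. `TOTAL_STEPS = 330` is an assumption inferred from the Epoch column (about 110 optimizer steps per epoch over 3 epochs), and `linear_warmup_decay_lr` is a hypothetical helper that mirrors a standard linear warmup followed by linear decay, not code taken from the training run.

```python
import math

PER_DEVICE_BATCH = 32   # per_device_train_batch_size
GRAD_ACCUM = 32         # gradient_accumulation_steps
BASE_LR = 3e-05         # learning_rate
WARMUP_RATIO = 0.05     # warmup_ratio
TOTAL_STEPS = 330       # ASSUMPTION: ~110 steps per epoch x 3 epochs, inferred from the log

def linear_warmup_decay_lr(step: int,
                           total_steps: int = TOTAL_STEPS,
                           base_lr: float = BASE_LR,
                           warmup_ratio: float = WARMUP_RATIO) -> float:
    """Hypothetical helper: linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Sequences contributing to one optimizer step on a single device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
print("effective batch per device:", effective_batch)

for step in (1, 17, 110, 264):
    print(step, linear_warmup_decay_lr(step))
```

The number of devices is not stated in this card, so the global batch size may be a multiple of the per-device figure.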
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}