This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss at 768, 512, 256, and 128 dimensions.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
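The Pooling and Normalize stages listed above can be illustrated without any model weights. The following is a minimal sketch in plain Python, not the library's implementation: `mean_pool` and `l2_normalize` are hypothetical helper names, and the toy input uses 4 dimensions in place of the model's 768.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length, as the Normalize() stage does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input (assumption): 3 token positions, 4 dimensions; the last position is padding.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean over the two real tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled)
```

Because the final stage normalizes every embedding to unit length, the dot product of two full-size embeddings equals their cosine similarity.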
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("mohanprakash462/tamil-embed-base")
# Run inference
sentences = [
'ஒரு முதியவன் பாதாளங்களைத் தாண்டும் தன் மந்திரக்கோலால் சாய்த்தபடியிருக்கிறான் நாட்சத்திரங்களை............................................................................................................................................................................... இது எத்தனையாவது [...]',
'தந்தைக்குக் கடினமான பரிசுகளைக் கொடுத்துக் கொண்டிருந்தார்.',
'பிக்பாஸைப் பிடித்த போது எந்தப் படமும் நடக்கவில்லை.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4205, 0.4317],
#         [0.4205, 1.0000, 0.3737],
#         [0.4317, 0.3737, 1.0000]])
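Since the model was trained with MatryoshkaLoss at 768, 512, 256, and 128 dimensions, a prefix of each embedding should remain usable on its own. The sketch below shows that idea on toy vectors: keep the first `dim` values and re-normalize. The helper names and the 8-dimensional toy data are assumptions for illustration, not part of the model card.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dimensional "embeddings" (assumption); real ones have 768 dimensions.
emb_a = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
emb_b = [0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0]

short_a = truncate_and_normalize(emb_a, 4)
short_b = truncate_and_normalize(emb_b, 4)
print(len(short_a), dot(short_a, short_b))
```

With the real model, Sentence Transformers can do the truncation at load time through the `truncate_dim` argument, for example `SentenceTransformer("mohanprakash462/tamil-embed-base", truncate_dim=256)`. Truncated vectors are likely no longer unit length, so compare them with cosine similarity rather than a raw dot product.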
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | | |

| anchor | positive |
|---|---|
| Jack and Jill: A Village Story by Louisa May Alcott, is a children's book originally published in 1880.It takes place in a small New England town after the Civil War.The story of two good friends named Jack and Janey, "Jack and Jill" tells of the aftermath of a serious sliding accident. | ஜாக் மற்றும் ஜானி இரு நல்ல நண்பர்கள். |
| SourceMedia ஒரு mid-size diversified business-to-business digital media company owned by Observer Capital, which acquired the company from Investcorp in August 2014.Thomson Corporation's former Thomson Media division, SourceMedia விழுந்து, Thomson 2004 இல் Investcorp க்கு விற்கப்பட்டது $ 350 மில்லியன். | SourceMedia ஒரு Digital Media நிறுவனம் |
| ஒரு முதியவன் பாதாளங்களைத் தாண்டும் தன் மந்திரக்கோலால் சாய்த்தபடியிருக்கிறான் நாட்சத்திரங்களை............................................................................................................................................................................... இது எத்தனையாவது [...] | பல்வேறு மாநிலங்களில் அரசுக்கு எச்சரிக்கை |
MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
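To make these parameters concrete, here is a rough sketch of how a Matryoshka objective combines an in-batch-negatives ranking loss across truncation sizes: the same loss is evaluated on the first 768, 512, 256, and 128 dimensions and the weighted results are summed. This is plain Python on toy data, not the Sentence Transformers implementation. The function names are hypothetical, and `scale=20.0` is assumed to match the default of MultipleNegativesRankingLoss.

```python
import math

MATRYOSHKA_DIMS = [768, 512, 256, 128]
MATRYOSHKA_WEIGHTS = [1, 1, 1, 1]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Cross-entropy where anchor i must rank positive i above the other positives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def matryoshka_loss(anchors, positives, dims=MATRYOSHKA_DIMS, weights=MATRYOSHKA_WEIGHTS):
    """Weighted sum of the base loss over each truncated embedding size."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * in_batch_negatives_loss(
            [a[:dim] for a in anchors], [p[:dim] for p in positives]
        )
    return total

# Toy batch (assumption): 4-dimensional vectors, truncated to 4 and 2 dimensions.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
aligned = matryoshka_loss(anchors, anchors, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, anchors[::-1], dims=[4, 2], weights=[1, 1])
print(aligned < swapped)
```

An `n_dims_per_step` of -1 means every listed dimension contributes to each training step, which is what the loop over all `dims` mirrors.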
Non-default hyperparameters:

- per_device_train_batch_size: 64
- learning_rate: 1e-06
- warmup_steps: 144
- fp16: True
- gradient_checkpointing: True
- batch_sampler: no_duplicates

All hyperparameters:

- per_device_train_batch_size: 64
- num_train_epochs: 3
- max_steps: -1
- learning_rate: 1e-06
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 144
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: no
- per_device_eval_batch_size: 8
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0174 | 25 | 9.5049 |
| 0.0347 | 50 | 9.2988 |
| 0.0521 | 75 | 8.7502 |
| 0.0695 | 100 | 7.9748 |
| 0.0869 | 125 | 7.1927 |
| 0.1042 | 150 | 6.1935 |
| 0.1216 | 175 | 5.3092 |
| 0.1390 | 200 | 4.6630 |
| 0.1564 | 225 | 4.1481 |
| 0.1737 | 250 | 3.5569 |
| 0.1911 | 275 | 3.5474 |
| 0.2085 | 300 | 3.5098 |
| 0.2259 | 325 | 3.2235 |
| 0.2432 | 350 | 2.9600 |
| 0.2606 | 375 | 3.0261 |
| 0.2780 | 400 | 2.8874 |
| 0.2953 | 425 | 2.9094 |
| 0.3127 | 450 | 2.9079 |
| 0.3301 | 475 | 2.6196 |
| 0.3475 | 500 | 2.6887 |
| 0.3648 | 525 | 3.0199 |
| 0.3822 | 550 | 2.8014 |
| 0.3996 | 575 | 2.8743 |
| 0.4170 | 600 | 2.7243 |
| 0.4343 | 625 | 2.7829 |
| 0.4517 | 650 | 2.7898 |
| 0.4691 | 675 | 2.7561 |
| 0.4864 | 700 | 2.6587 |
| 0.5038 | 725 | 2.6228 |
| 0.5212 | 750 | 2.5352 |
| 0.5386 | 775 | 2.6544 |
| 0.5559 | 800 | 2.6122 |
| 0.5733 | 825 | 2.6155 |
| 0.5907 | 850 | 2.4361 |
| 0.6081 | 875 | 2.6018 |
| 0.6254 | 900 | 2.5225 |
| 0.6428 | 925 | 2.5303 |
| 0.6602 | 950 | 2.7318 |
| 0.6776 | 975 | 2.5735 |
| 0.6949 | 1000 | 2.5443 |
| 0.7123 | 1025 | 2.3904 |
| 0.7297 | 1050 | 2.4995 |
| 0.7470 | 1075 | 2.5640 |
| 0.7644 | 1100 | 2.6522 |
| 0.7818 | 1125 | 2.5466 |
| 0.7992 | 1150 | 2.4968 |
| 0.8165 | 1175 | 2.3753 |
| 0.8339 | 1200 | 2.4524 |
| 0.8513 | 1225 | 2.3839 |
| 0.8687 | 1250 | 2.6322 |
| 0.8860 | 1275 | 2.5143 |
| 0.9034 | 1300 | 2.6360 |
| 0.9208 | 1325 | 2.3736 |
| 0.9382 | 1350 | 3.3474 |
| 0.9555 | 1375 | 4.2932 |
| 0.9729 | 1400 | 3.8941 |
| 0.9903 | 1425 | 4.0057 |
| 1.0076 | 1450 | 3.2783 |
| 1.0250 | 1475 | 2.6051 |
| 1.0424 | 1500 | 2.8140 |
| 1.0598 | 1525 | 2.4573 |
| 1.0771 | 1550 | 2.5487 |
| 1.0945 | 1575 | 2.5347 |
| 1.1119 | 1600 | 2.3618 |
| 1.1293 | 1625 | 2.3501 |
| 1.1466 | 1650 | 2.4186 |
| 1.1640 | 1675 | 2.3757 |
| 1.1814 | 1700 | 2.6012 |
| 1.1987 | 1725 | 2.3281 |
| 1.2161 | 1750 | 2.4444 |
| 1.2335 | 1775 | 2.5461 |
| 1.2509 | 1800 | 2.5203 |
| 1.2682 | 1825 | 2.4201 |
| 1.2856 | 1850 | 2.6096 |
| 1.3030 | 1875 | 2.4021 |
| 1.3204 | 1900 | 2.4524 |
| 1.3377 | 1925 | 2.3002 |
| 1.3551 | 1950 | 2.4063 |
| 1.3725 | 1975 | 2.1237 |
| 1.3899 | 2000 | 2.3219 |
| 1.4072 | 2025 | 2.3227 |
| 1.4246 | 2050 | 2.3646 |
| 1.4420 | 2075 | 2.4407 |
| 1.4593 | 2100 | 2.2862 |
| 1.4767 | 2125 | 2.2900 |
| 1.4941 | 2150 | 2.2512 |
| 1.5115 | 2175 | 2.3741 |
| 1.5288 | 2200 | 2.6308 |
| 1.5462 | 2225 | 2.5161 |
| 1.5636 | 2250 | 2.4871 |
| 1.5810 | 2275 | 2.5049 |
| 1.5983 | 2300 | 2.6384 |
| 1.6157 | 2325 | 2.4185 |
| 1.6331 | 2350 | 2.4573 |
| 1.6505 | 2375 | 2.2954 |
| 1.6678 | 2400 | 2.2384 |
| 1.6852 | 2425 | 2.3318 |
| 1.7026 | 2450 | 2.2915 |
| 1.7199 | 2475 | 2.2013 |
| 1.7373 | 2500 | 2.4082 |
| 1.7547 | 2525 | 2.5290 |
| 1.7721 | 2550 | 2.4825 |
| 1.7894 | 2575 | 2.4610 |
| 1.8068 | 2600 | 2.3414 |
| 1.8242 | 2625 | 2.3729 |
| 1.8416 | 2650 | 2.5862 |
| 1.8589 | 2675 | 2.4320 |
| 1.8763 | 2700 | 2.2745 |
| 1.8937 | 2725 | 2.3046 |
| 1.9110 | 2750 | 2.3621 |
| 1.9284 | 2775 | 2.3097 |
| 1.9458 | 2800 | 4.1645 |
| 1.9632 | 2825 | 4.5466 |
| 1.9805 | 2850 | 4.6750 |
| 1.9979 | 2875 | 2.8955 |
| 2.0153 | 2900 | 2.9962 |
| 2.0327 | 2925 | 2.3366 |
| 2.0500 | 2950 | 2.2591 |
| 2.0674 | 2975 | 2.3375 |
| 2.0848 | 3000 | 2.4169 |
| 2.1022 | 3025 | 2.2635 |
| 2.1195 | 3050 | 2.1642 |
| 2.1369 | 3075 | 2.4082 |
| 2.1543 | 3100 | 2.3501 |
| 2.1716 | 3125 | 2.4870 |
| 2.1890 | 3150 | 2.7393 |
| 2.2064 | 3175 | 2.3203 |
| 2.2238 | 3200 | 2.2731 |
| 2.2411 | 3225 | 2.1901 |
| 2.2585 | 3250 | 2.3000 |
| 2.2759 | 3275 | 2.3846 |
| 2.2933 | 3300 | 2.2514 |
| 2.3106 | 3325 | 2.2218 |
| 2.3280 | 3350 | 2.5800 |
| 2.3454 | 3375 | 2.4384 |
| 2.3628 | 3400 | 2.4946 |
| 2.3801 | 3425 | 2.2781 |
| 2.3975 | 3450 | 2.2777 |
| 2.4149 | 3475 | 2.2062 |
| 2.4322 | 3500 | 2.3994 |
| 2.4496 | 3525 | 2.5084 |
| 2.4670 | 3550 | 2.1158 |
| 2.4844 | 3575 | 2.0865 |
| 2.5017 | 3600 | 2.3174 |
| 2.5191 | 3625 | 2.3668 |
| 2.5365 | 3650 | 2.3439 |
| 2.5539 | 3675 | 2.4482 |
| 2.5712 | 3700 | 2.3998 |
| 2.5886 | 3725 | 2.2155 |
| 2.6060 | 3750 | 2.0207 |
| 2.6233 | 3775 | 2.2652 |
| 2.6407 | 3800 | 2.4261 |
| 2.6581 | 3825 | 2.2214 |
| 2.6755 | 3850 | 2.2244 |
| 2.6928 | 3875 | 2.2835 |
| 2.7102 | 3900 | 2.4259 |
| 2.7276 | 3925 | 2.3013 |
| 2.7450 | 3950 | 2.1069 |
| 2.7623 | 3975 | 2.4415 |
| 2.7797 | 4000 | 2.3380 |
| 2.7971 | 4025 | 2.3013 |
| 2.8145 | 4050 | 2.4202 |
| 2.8318 | 4075 | 2.2488 |
| 2.8492 | 4100 | 2.1855 |
| 2.8666 | 4125 | 2.3882 |
| 2.8839 | 4150 | 2.5306 |
| 2.9013 | 4175 | 2.3197 |
| 2.9187 | 4200 | 2.3295 |
| 2.9361 | 4225 | 3.2070 |
| 2.9534 | 4250 | 3.9697 |
| 2.9708 | 4275 | 4.2241 |
| 2.9882 | 4300 | 3.5779 |
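The hyperparameters specify a linear scheduler with 144 warmup steps and a peak learning rate of 1e-06. As a rough illustration of what that implies for the steps logged above, the sketch below computes the learning rate at a given step. The function name is hypothetical, and `total_steps=4317` is an estimate inferred from the log (step 4300 at epoch 2.9882 suggests about 1,439 steps per epoch over 3 epochs), not a value stated in the card.

```python
def linear_schedule_lr(step, base_lr=1e-06, warmup_steps=144, total_steps=4317):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at a few of the logged steps (total_steps is an estimate).
for logged_step in (25, 150, 1450, 2900, 4300):
    print(logged_step, linear_schedule_lr(logged_step))
```

Under these assumptions, the first five logged rows (steps 25 to 125) fall inside the warmup phase, and the rate is close to zero by the final logged step.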
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Base model: intfloat/multilingual-e5-base