| | --- |
| | tags: |
| | - sentence-transformers |
| | - sentence-similarity |
| | - feature-extraction |
| | - dense |
| | - generated_from_trainer |
| | - dataset_size:92081 |
| | - loss:MatryoshkaLoss |
| | - loss:MultipleNegativesRankingLoss |
| | base_model: intfloat/multilingual-e5-base |
| | widget: |
| | - source_sentence: அவர் வீட்டுக்கு திரும்பினார்.அவர் தனது குரங்குக்கு உணவு கொடுத்து |
| | சென்றார்.அவரின் குரங்கு எங்கும் காணப்படவில்லை.அவரின் குரங்கு எல்லையில் தேடி வந்தார்.அவருக்கு |
| | அடுத்த நாள் தனது குரங்கு கண்டுபிடிக்க முடிந்தது. |
| | sentences: |
| | - Here Comes Santa Claus ஒரு இடத்தில் ஒரு முதல் 10 ஹெட்டாக இருந்தது |
| | - சாம் ஒரு Pet Cat |
| | - இது ஒரு ergonomic office chair. |
| | - source_sentence: 'Topics: ஏகத்துவத்தைக் கொண்டே பிரச்சாரத்தை ஆரம்பிக்க வேண்டும் and |
| | தாயத்து கட்டுவது ஷிர்க்கை சார்ந்தது Begin propagation with Monotheism, and Using |
| | amulets is Shirk Speaker: மவ்லவி கே.எல்.எம்.' |
| | sentences: |
| | - பிரெஞ்சுக்குத் தேவையான அளவு பிரெஞ்சு தேவை. |
| | - அமெரிக்கா தான் மற்ற நாடுகள் கவனித்து வருகின்றன. |
| | - ரஜினிகாந்த் ராகுல் ஒரு ராகுலக் காட்சியை வெளியிட்டிருக்கிறார். |
| | - source_sentence: Karl & Co is a Norwegian situation comedy created by Tore Ryen, |
| | starring Nils Vogt reprising his role as Karl Reverud from the popular sitcom |
| | "Mot i brøstet".It aired on TV 2, run for three seasons from 1998 to 2001, a total |
| | of 63 episodes. |
| | sentences: |
| | - ஆங்கிலத்தில் இதை Single Orgasm, Multiple Orgasm என்றும் கூறுகிறார்கள். |
| | - Hamvention 2018 Xenia இல் நடைபெறுகிறது. |
| | - ஜூனியர் ஒப்பந்தங்கள் |
| | - source_sentence: There is only one temple in the village, no amman etc. The temple |
| | to Sri Narayanan.கீழ்தட்டு மக்களே இராமனுஜரை, இவர்களுக்கு இருக்கும் பற்று எனக்கில்லையே |
| | என நினைக்கவைத்த கதையும் உண்டு.ஒருநாள், நம்மாழ்வார் அவதரித்த ஊருக்குச் செல்லும்காலை, |
| | அவருக்கு வழிதெரியவில்லை. |
| | sentences: |
| | - Wenham Parva ஒரு ஊர் மட்டுமே அல்ல, மேலும் ஒரு குடியரசு குடியரசு. |
| | - பேச்சுவார்த்தை நிராகரிக்கப்படவில்லை. |
| | - Zazie Beetz, Vanessa on Atlanta படத்தில் நடிக்கிறார். |
| | - source_sentence: ஒரு முதியவன் பாதாளங்களைத் தாண்டும் தன் மந்திரக்கோலால் சாய்த்தபடியிருக்கிறான் |
| | நாட்சத்திரங்களை............................................................................................................................................................................... |
| | இது எத்தனையாவது [...] |
| | sentences: |
| | - விமானங்கள் போக்குவரத்துக்காக காவல்துறையில் அனுமதிக்கப்பட்டுள்ளன. |
| | - தந்தைக்குக் கடினமான பரிசுகளைக் கொடுத்துக் கொண்டிருந்தார். |
| | - பிக்பாஸைப் பிடித்த போது எந்தப் படமும் நடக்கவில்லை. |
| | pipeline_tag: sentence-similarity |
| | library_name: sentence-transformers |
| | --- |
| | |
| | # SentenceTransformer based on intfloat/multilingual-e5-base |
| |
|
| | This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| | - **Model Type:** Sentence Transformer |
| | - **Base model:** [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) <!-- at revision 835193815a3936a24a0ee7dc9e3d48c1fbb19c55 --> |
| | - **Maximum Sequence Length:** 512 tokens |
| | - **Output Dimensionality:** 768 dimensions |
| | - **Similarity Function:** Cosine Similarity |
| | <!-- - **Training Dataset:** Unknown --> |
| | <!-- - **Language:** Unknown --> |
| | <!-- - **License:** Unknown --> |
| |
|
| | ### Model Sources |
| |
|
| | - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
| | - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers) |
| | - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
| |
|
| | ### Full Model Architecture |
| |
|
| | ``` |
| | SentenceTransformer( |
| | (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'}) |
| | (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
| | (2): Normalize() |
| | ) |
| | ``` |
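The last two modules in the architecture above, mean-token pooling followed by `Normalize()`, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the library's implementation: the function name `mean_pool_and_normalize` is hypothetical, and random numbers stand in for the transformer's token outputs.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the non-padding token vectors, then scale each row to unit length."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # (batch, 1), avoids division by zero
    pooled = summed / counts                            # mean over real tokens only
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # L2 normalization

# Toy input standing in for the transformer output (assumption):
# 2 sequences, 4 token positions, 768 hidden dimensions.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding

sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

Because the output rows have unit length, cosine similarity between two embeddings reduces to a plain dot product.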
| |
|
| | ## Usage |
| |
|
| | ### Direct Usage (Sentence Transformers) |
| |
|
| | First install the Sentence Transformers library: |
| |
|
| | ```bash |
| | pip install -U sentence-transformers |
| | ``` |
| |
|
| | Then load this model and run inference: |
| | ```python |
| | from sentence_transformers import SentenceTransformer |
| | |
| | # Download from the 🤗 Hub |
| | model = SentenceTransformer("mohanprakash462/tamil-embed-base") |
| | # Run inference |
| | sentences = [ |
| | 'ஒரு முதியவன் பாதாளங்களைத் தாண்டும் தன் மந்திரக்கோலால் சாய்த்தபடியிருக்கிறான் நாட்சத்திரங்களை............................................................................................................................................................................... இது எத்தனையாவது [...]', |
| | 'தந்தைக்குக் கடினமான பரிசுகளைக் கொடுத்துக் கொண்டிருந்தார்.', |
| | 'பிக்பாஸைப் பிடித்த போது எந்தப் படமும் நடக்கவில்லை.', |
| | ] |
| | embeddings = model.encode(sentences) |
| | print(embeddings.shape) |
| | # (3, 768) |
| | |
| | # Get the similarity scores for the embeddings |
| | similarities = model.similarity(embeddings, embeddings) |
| | print(similarities) |
| | # tensor([[1.0000, 0.4205, 0.4317], |
| | # [0.4205, 1.0000, 0.3737], |
| | # [0.4317, 0.3737, 1.0000]]) |
| | ``` |
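Since the model was trained with `MatryoshkaLoss` at 768, 512, 256, and 128 dimensions, embeddings can plausibly be shortened to one of those sizes to save storage, at some cost in quality that this card does not measure. The sketch below shows the truncate-then-renormalize step on stand-in vectors; `truncate_and_normalize` is a hypothetical helper, and the random matrix only imitates the output of `model.encode`. Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which may be more convenient in practice.

```python
import numpy as np

def truncate_and_normalize(embeddings, dim):
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for model.encode(sentences) (assumption): 3 unit-length 768-dim vectors.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128):  # the matryoshka_dims listed in this card
    small = truncate_and_normalize(full, dim)
    similarities = small @ small.T  # cosine similarity, since rows are unit length
    print(dim, small.shape, np.round(similarities[0], 4))
```

Renormalizing after truncation keeps the dot product equal to cosine similarity at every size.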
| |
|
| |
|
| | ## Training Details |
| |
|
| | ### Training Dataset |
| |
|
| | #### Unnamed Dataset |
| |
|
| | * Size: 92,081 training samples |
| | * Columns: <code>anchor</code> and <code>positive</code> |
| | * Approximate statistics based on the first 1000 samples: |
| | | | anchor | positive | |
| | |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------| |
| | | type | string | string | |
| | | details | <ul><li>min: 15 tokens</li><li>mean: 57.89 tokens</li><li>max: 200 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 16.06 tokens</li><li>max: 87 tokens</li></ul> | |
| | * Samples: |
| | | anchor | positive | |
| | |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------| |
| | | <code>Jack and Jill: A Village Story by Louisa May Alcott, is a children's book originally published in 1880.It takes place in a small New England town after the Civil War.The story of two good friends named Jack and Janey, "Jack and Jill" tells of the aftermath of a serious sliding accident.</code> | <code>ஜாக் மற்றும் ஜானி இரு நல்ல நண்பர்கள்.</code> | |
| | | <code>SourceMedia ஒரு mid-size diversified business-to-business digital media company owned by Observer Capital, which acquired the company from Investcorp in August 2014.Thomson Corporation's former Thomson Media division, SourceMedia விழுந்து, Thomson 2004 இல் Investcorp க்கு விற்கப்பட்டது $ 350 மில்லியன்.</code> | <code>SourceMedia ஒரு Digital Media நிறுவனம்</code> | |
| | | <code>ஒரு முதியவன் பாதாளங்களைத் தாண்டும் தன் மந்திரக்கோலால் சாய்த்தபடியிருக்கிறான் நாட்சத்திரங்களை............................................................................................................................................................................... இது எத்தனையாவது [...]</code> | <code>பல்வேறு மாநிலங்களில் அரசுக்கு எச்சரிக்கை</code> | |
| | * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
| | ```json |
| | { |
| | "loss": "MultipleNegativesRankingLoss", |
| | "matryoshka_dims": [ |
| | 768, |
| | 512, |
| | 256, |
| | 128 |
| | ], |
| | "matryoshka_weights": [ |
| | 1, |
| | 1, |
| | 1, |
| | 1 |
| | ], |
| | "n_dims_per_step": -1 |
| | } |
| | ``` |
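The configuration above wraps `MultipleNegativesRankingLoss` so that it is evaluated on the first 768, 512, 256, and 128 embedding dimensions and the four results are summed with equal weights. A minimal NumPy sketch of that idea follows. It is not the library code: the function names are hypothetical, and the similarity scale of 20 is an assumption, since the card does not list it.

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: each anchor should score its own positive highest.

    `scale` multiplies the cosine similarities; 20.0 is an assumed value.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # cross-entropy with diagonal targets

def matryoshka_mnr_loss(anchors, positives,
                        dims=(768, 512, 256, 128), weights=(1, 1, 1, 1)):
    """Weighted sum of the ranking loss over truncated prefixes of the embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

# Random stand-ins for a batch of 64 (anchor, positive) embedding pairs (assumption).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(64, 768))
positives = anchors + 0.5 * rng.normal(size=(64, 768))  # positives lie near their anchors
print(matryoshka_mnr_loss(anchors, positives))
```

Every other positive in the batch acts as a negative for a given anchor, which is why the `no_duplicates` batch sampler matters: a duplicated positive would otherwise be penalized as a false negative.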
| |
|
| | ### Training Hyperparameters |
| | #### Non-Default Hyperparameters |
| |
|
| | - `per_device_train_batch_size`: 64 |
| | - `learning_rate`: 1e-06 |
| | - `warmup_steps`: 144 |
| | - `fp16`: True |
| | - `gradient_checkpointing`: True |
| | - `batch_sampler`: no_duplicates |
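To make the schedule implied by these settings concrete, the sketch below computes the learning rate at a given step for a linear scheduler with 144 warmup steps and a peak of 1e-06. The total step count is derived rather than stated in the card: it assumes a single device, so that 92,081 samples at batch size 64 give 1,439 steps per epoch, which matches the training log (step 4300 at epoch 2.9882). `linear_schedule_lr` is a hypothetical helper written for illustration.

```python
import math

# Derived values (assumption: single device, no gradient accumulation).
steps_per_epoch = math.ceil(92081 / 64)   # 1439
total_steps = steps_per_epoch * 3         # 3 epochs

def linear_schedule_lr(step, base_lr=1e-6, warmup_steps=144, total=total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup_steps)

for step in (0, 72, 144, 1439, 2878, total_steps):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the warmup covers roughly the first tenth of the first epoch, after which the rate decays steadily until the final step.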
| | |
| | #### All Hyperparameters |
| | <details><summary>Click to expand</summary> |
| | |
| | - `per_device_train_batch_size`: 64 |
| | - `num_train_epochs`: 3 |
| | - `max_steps`: -1 |
| | - `learning_rate`: 1e-06 |
| | - `lr_scheduler_type`: linear |
| | - `lr_scheduler_kwargs`: None |
| | - `warmup_steps`: 144 |
| | - `optim`: adamw_torch_fused |
| | - `optim_args`: None |
| | - `weight_decay`: 0.0 |
| | - `adam_beta1`: 0.9 |
| | - `adam_beta2`: 0.999 |
| | - `adam_epsilon`: 1e-08 |
| | - `optim_target_modules`: None |
| | - `gradient_accumulation_steps`: 1 |
| | - `average_tokens_across_devices`: True |
| | - `max_grad_norm`: 1.0 |
| | - `label_smoothing_factor`: 0.0 |
| | - `bf16`: False |
| | - `fp16`: True |
| | - `bf16_full_eval`: False |
| | - `fp16_full_eval`: False |
| | - `tf32`: None |
| | - `gradient_checkpointing`: True |
| | - `gradient_checkpointing_kwargs`: None |
| | - `torch_compile`: False |
| | - `torch_compile_backend`: None |
| | - `torch_compile_mode`: None |
| | - `use_liger_kernel`: False |
| | - `liger_kernel_config`: None |
| | - `use_cache`: False |
| | - `neftune_noise_alpha`: None |
| | - `torch_empty_cache_steps`: None |
| | - `auto_find_batch_size`: False |
| | - `log_on_each_node`: True |
| | - `logging_nan_inf_filter`: True |
| | - `include_num_input_tokens_seen`: no |
| | - `log_level`: passive |
| | - `log_level_replica`: warning |
| | - `disable_tqdm`: False |
| | - `project`: huggingface |
| | - `trackio_space_id`: trackio |
| | - `eval_strategy`: no |
| | - `per_device_eval_batch_size`: 8 |
| | - `prediction_loss_only`: True |
| | - `eval_on_start`: False |
| | - `eval_do_concat_batches`: True |
| | - `eval_use_gather_object`: False |
| | - `eval_accumulation_steps`: None |
| | - `include_for_metrics`: [] |
| | - `batch_eval_metrics`: False |
| | - `save_only_model`: False |
| | - `save_on_each_node`: False |
| | - `enable_jit_checkpoint`: False |
| | - `push_to_hub`: False |
| | - `hub_private_repo`: None |
| | - `hub_model_id`: None |
| | - `hub_strategy`: every_save |
| | - `hub_always_push`: False |
| | - `hub_revision`: None |
| | - `load_best_model_at_end`: False |
| | - `ignore_data_skip`: False |
| | - `restore_callback_states_from_checkpoint`: False |
| | - `full_determinism`: False |
| | - `seed`: 42 |
| | - `data_seed`: None |
| | - `use_cpu`: False |
| | - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
| | - `parallelism_config`: None |
| | - `dataloader_drop_last`: False |
| | - `dataloader_num_workers`: 0 |
| | - `dataloader_pin_memory`: True |
| | - `dataloader_persistent_workers`: False |
| | - `dataloader_prefetch_factor`: None |
| | - `remove_unused_columns`: True |
| | - `label_names`: None |
| | - `train_sampling_strategy`: random |
| | - `length_column_name`: length |
| | - `ddp_find_unused_parameters`: None |
| | - `ddp_bucket_cap_mb`: None |
| | - `ddp_broadcast_buffers`: False |
| | - `ddp_backend`: None |
| | - `ddp_timeout`: 1800 |
| | - `fsdp`: [] |
| | - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
| | - `deepspeed`: None |
| | - `debug`: [] |
| | - `skip_memory_metrics`: True |
| | - `do_predict`: False |
| | - `resume_from_checkpoint`: None |
| | - `warmup_ratio`: None |
| | - `local_rank`: -1 |
| | - `prompts`: None |
| | - `batch_sampler`: no_duplicates |
| | - `multi_dataset_batch_sampler`: proportional |
| | - `router_mapping`: {} |
| | - `learning_rate_mapping`: {} |
| |
|
| | </details> |
| |
|
| | ### Training Logs |
| | <details><summary>Click to expand</summary> |
| |
|
| | | Epoch | Step | Training Loss | |
| | |:------:|:----:|:-------------:| |
| | | 0.0174 | 25 | 9.5049 | |
| | | 0.0347 | 50 | 9.2988 | |
| | | 0.0521 | 75 | 8.7502 | |
| | | 0.0695 | 100 | 7.9748 | |
| | | 0.0869 | 125 | 7.1927 | |
| | | 0.1042 | 150 | 6.1935 | |
| | | 0.1216 | 175 | 5.3092 | |
| | | 0.1390 | 200 | 4.6630 | |
| | | 0.1564 | 225 | 4.1481 | |
| | | 0.1737 | 250 | 3.5569 | |
| | | 0.1911 | 275 | 3.5474 | |
| | | 0.2085 | 300 | 3.5098 | |
| | | 0.2259 | 325 | 3.2235 | |
| | | 0.2432 | 350 | 2.9600 | |
| | | 0.2606 | 375 | 3.0261 | |
| | | 0.2780 | 400 | 2.8874 | |
| | | 0.2953 | 425 | 2.9094 | |
| | | 0.3127 | 450 | 2.9079 | |
| | | 0.3301 | 475 | 2.6196 | |
| | | 0.3475 | 500 | 2.6887 | |
| | | 0.3648 | 525 | 3.0199 | |
| | | 0.3822 | 550 | 2.8014 | |
| | | 0.3996 | 575 | 2.8743 | |
| | | 0.4170 | 600 | 2.7243 | |
| | | 0.4343 | 625 | 2.7829 | |
| | | 0.4517 | 650 | 2.7898 | |
| | | 0.4691 | 675 | 2.7561 | |
| | | 0.4864 | 700 | 2.6587 | |
| | | 0.5038 | 725 | 2.6228 | |
| | | 0.5212 | 750 | 2.5352 | |
| | | 0.5386 | 775 | 2.6544 | |
| | | 0.5559 | 800 | 2.6122 | |
| | | 0.5733 | 825 | 2.6155 | |
| | | 0.5907 | 850 | 2.4361 | |
| | | 0.6081 | 875 | 2.6018 | |
| | | 0.6254 | 900 | 2.5225 | |
| | | 0.6428 | 925 | 2.5303 | |
| | | 0.6602 | 950 | 2.7318 | |
| | | 0.6776 | 975 | 2.5735 | |
| | | 0.6949 | 1000 | 2.5443 | |
| | | 0.7123 | 1025 | 2.3904 | |
| | | 0.7297 | 1050 | 2.4995 | |
| | | 0.7470 | 1075 | 2.5640 | |
| | | 0.7644 | 1100 | 2.6522 | |
| | | 0.7818 | 1125 | 2.5466 | |
| | | 0.7992 | 1150 | 2.4968 | |
| | | 0.8165 | 1175 | 2.3753 | |
| | | 0.8339 | 1200 | 2.4524 | |
| | | 0.8513 | 1225 | 2.3839 | |
| | | 0.8687 | 1250 | 2.6322 | |
| | | 0.8860 | 1275 | 2.5143 | |
| | | 0.9034 | 1300 | 2.6360 | |
| | | 0.9208 | 1325 | 2.3736 | |
| | | 0.9382 | 1350 | 3.3474 | |
| | | 0.9555 | 1375 | 4.2932 | |
| | | 0.9729 | 1400 | 3.8941 | |
| | | 0.9903 | 1425 | 4.0057 | |
| | | 1.0076 | 1450 | 3.2783 | |
| | | 1.0250 | 1475 | 2.6051 | |
| | | 1.0424 | 1500 | 2.8140 | |
| | | 1.0598 | 1525 | 2.4573 | |
| | | 1.0771 | 1550 | 2.5487 | |
| | | 1.0945 | 1575 | 2.5347 | |
| | | 1.1119 | 1600 | 2.3618 | |
| | | 1.1293 | 1625 | 2.3501 | |
| | | 1.1466 | 1650 | 2.4186 | |
| | | 1.1640 | 1675 | 2.3757 | |
| | | 1.1814 | 1700 | 2.6012 | |
| | | 1.1987 | 1725 | 2.3281 | |
| | | 1.2161 | 1750 | 2.4444 | |
| | | 1.2335 | 1775 | 2.5461 | |
| | | 1.2509 | 1800 | 2.5203 | |
| | | 1.2682 | 1825 | 2.4201 | |
| | | 1.2856 | 1850 | 2.6096 | |
| | | 1.3030 | 1875 | 2.4021 | |
| | | 1.3204 | 1900 | 2.4524 | |
| | | 1.3377 | 1925 | 2.3002 | |
| | | 1.3551 | 1950 | 2.4063 | |
| | | 1.3725 | 1975 | 2.1237 | |
| | | 1.3899 | 2000 | 2.3219 | |
| | | 1.4072 | 2025 | 2.3227 | |
| | | 1.4246 | 2050 | 2.3646 | |
| | | 1.4420 | 2075 | 2.4407 | |
| | | 1.4593 | 2100 | 2.2862 | |
| | | 1.4767 | 2125 | 2.2900 | |
| | | 1.4941 | 2150 | 2.2512 | |
| | | 1.5115 | 2175 | 2.3741 | |
| | | 1.5288 | 2200 | 2.6308 | |
| | | 1.5462 | 2225 | 2.5161 | |
| | | 1.5636 | 2250 | 2.4871 | |
| | | 1.5810 | 2275 | 2.5049 | |
| | | 1.5983 | 2300 | 2.6384 | |
| | | 1.6157 | 2325 | 2.4185 | |
| | | 1.6331 | 2350 | 2.4573 | |
| | | 1.6505 | 2375 | 2.2954 | |
| | | 1.6678 | 2400 | 2.2384 | |
| | | 1.6852 | 2425 | 2.3318 | |
| | | 1.7026 | 2450 | 2.2915 | |
| | | 1.7199 | 2475 | 2.2013 | |
| | | 1.7373 | 2500 | 2.4082 | |
| | | 1.7547 | 2525 | 2.5290 | |
| | | 1.7721 | 2550 | 2.4825 | |
| | | 1.7894 | 2575 | 2.4610 | |
| | | 1.8068 | 2600 | 2.3414 | |
| | | 1.8242 | 2625 | 2.3729 | |
| | | 1.8416 | 2650 | 2.5862 | |
| | | 1.8589 | 2675 | 2.4320 | |
| | | 1.8763 | 2700 | 2.2745 | |
| | | 1.8937 | 2725 | 2.3046 | |
| | | 1.9110 | 2750 | 2.3621 | |
| | | 1.9284 | 2775 | 2.3097 | |
| | | 1.9458 | 2800 | 4.1645 | |
| | | 1.9632 | 2825 | 4.5466 | |
| | | 1.9805 | 2850 | 4.6750 | |
| | | 1.9979 | 2875 | 2.8955 | |
| | | 2.0153 | 2900 | 2.9962 | |
| | | 2.0327 | 2925 | 2.3366 | |
| | | 2.0500 | 2950 | 2.2591 | |
| | | 2.0674 | 2975 | 2.3375 | |
| | | 2.0848 | 3000 | 2.4169 | |
| | | 2.1022 | 3025 | 2.2635 | |
| | | 2.1195 | 3050 | 2.1642 | |
| | | 2.1369 | 3075 | 2.4082 | |
| | | 2.1543 | 3100 | 2.3501 | |
| | | 2.1716 | 3125 | 2.4870 | |
| | | 2.1890 | 3150 | 2.7393 | |
| | | 2.2064 | 3175 | 2.3203 | |
| | | 2.2238 | 3200 | 2.2731 | |
| | | 2.2411 | 3225 | 2.1901 | |
| | | 2.2585 | 3250 | 2.3000 | |
| | | 2.2759 | 3275 | 2.3846 | |
| | | 2.2933 | 3300 | 2.2514 | |
| | | 2.3106 | 3325 | 2.2218 | |
| | | 2.3280 | 3350 | 2.5800 | |
| | | 2.3454 | 3375 | 2.4384 | |
| | | 2.3628 | 3400 | 2.4946 | |
| | | 2.3801 | 3425 | 2.2781 | |
| | | 2.3975 | 3450 | 2.2777 | |
| | | 2.4149 | 3475 | 2.2062 | |
| | | 2.4322 | 3500 | 2.3994 | |
| | | 2.4496 | 3525 | 2.5084 | |
| | | 2.4670 | 3550 | 2.1158 | |
| | | 2.4844 | 3575 | 2.0865 | |
| | | 2.5017 | 3600 | 2.3174 | |
| | | 2.5191 | 3625 | 2.3668 | |
| | | 2.5365 | 3650 | 2.3439 | |
| | | 2.5539 | 3675 | 2.4482 | |
| | | 2.5712 | 3700 | 2.3998 | |
| | | 2.5886 | 3725 | 2.2155 | |
| | | 2.6060 | 3750 | 2.0207 | |
| | | 2.6233 | 3775 | 2.2652 | |
| | | 2.6407 | 3800 | 2.4261 | |
| | | 2.6581 | 3825 | 2.2214 | |
| | | 2.6755 | 3850 | 2.2244 | |
| | | 2.6928 | 3875 | 2.2835 | |
| | | 2.7102 | 3900 | 2.4259 | |
| | | 2.7276 | 3925 | 2.3013 | |
| | | 2.7450 | 3950 | 2.1069 | |
| | | 2.7623 | 3975 | 2.4415 | |
| | | 2.7797 | 4000 | 2.3380 | |
| | | 2.7971 | 4025 | 2.3013 | |
| | | 2.8145 | 4050 | 2.4202 | |
| | | 2.8318 | 4075 | 2.2488 | |
| | | 2.8492 | 4100 | 2.1855 | |
| | | 2.8666 | 4125 | 2.3882 | |
| | | 2.8839 | 4150 | 2.5306 | |
| | | 2.9013 | 4175 | 2.3197 | |
| | | 2.9187 | 4200 | 2.3295 | |
| | | 2.9361 | 4225 | 3.2070 | |
| | | 2.9534 | 4250 | 3.9697 | |
| | | 2.9708 | 4275 | 4.2241 | |
| | | 2.9882 | 4300 | 3.5779 | |
| |
|
| | </details> |
| |
|
| | ### Framework Versions |
| | - Python: 3.12.12 |
| | - Sentence Transformers: 5.2.3 |
| | - Transformers: 5.3.0 |
| | - PyTorch: 2.9.0+cu126 |
| | - Accelerate: 1.12.0 |
| | - Datasets: 4.0.0 |
| | - Tokenizers: 0.22.2 |
| |
|
| | ## Citation |
| |
|
| | ### BibTeX |
| |
|
| | #### Sentence Transformers |
| | ```bibtex |
| | @inproceedings{reimers-2019-sentence-bert, |
| | title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
| | author = "Reimers, Nils and Gurevych, Iryna", |
| | booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
| | month = "11", |
| | year = "2019", |
| | publisher = "Association for Computational Linguistics", |
| | url = "https://arxiv.org/abs/1908.10084", |
| | } |
| | ``` |
| |
|
| | #### MatryoshkaLoss |
| | ```bibtex |
| | @misc{kusupati2024matryoshka, |
| | title={Matryoshka Representation Learning}, |
| | author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
| | year={2024}, |
| | eprint={2205.13147}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.LG} |
| | } |
| | ``` |
| |
|
| | #### MultipleNegativesRankingLoss |
| | ```bibtex |
| | @misc{henderson2017efficient, |
| | title={Efficient Natural Language Response Suggestion for Smart Reply}, |
| | author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
| | year={2017}, |
| | eprint={1705.00652}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CL} |
| | } |
| | ``` |
| |
|