How to use Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")
sentences = [
"page_content='| 私わたしが困こまっていたとき、 | 兄あにがいつも助たすけてくれ(た / ました) |\n|-------------------------------------------------------------|-------------------------------------------------------------|\n| Time Clause | Main Clause |\n| When I was in trouble, my elder brother always helped [me]. | When I was in trouble, my elder brother always helped [me]. | \nとき is grammatically a noun \"時: time\" and written in Hiragana or Kanji. By using relative clauses with とき, you can express time clauses like \"when\" in English. Thus, you can directly connect sentences to とき without using conjugations. When you connect nouns and na-adjectives, you can also use の instead of the state-of-being style, e.g. 学生のとき VS. 学生だったとき. There is a certain difference in nuance between とき and ときに. When you use とき, main clauses should indicate habitual actions, ongoing states, and constant states. \n| 学生がくせいのとき、祭まつりが好すき(だった / でした)。 When [I] was a student, [I] liked festivals. |\n|---|\n| 子こ供どもだったとき、よくポケモンをして(いた / いました)。 When [I] was a child, [I] often played Pokemon. | \nWhen it comes to ときに, the に comes from the particle に which indicates specific time. Therefore, when you use ときに, main clauses should indicate non-habitual actions or one-time events. \n| 学生がくせいのときに、自じ転車てんしゃで旅行りょこう(した / しました)。 When [I] was a student, [I] traveled by bicycle. |\n|---|\n| 地じ震しんが来きたときに、泣ないて(しまった / しまいました)。 [I] (unintentionally) cried when the earthquake came. | \nSince it is a relative clause with a noun, you can substitute other words which have similar meanings to とき. For example, you can use \" 頃 ころ : (approximate) time\" which indicates a wider range of time than とき. \n| 学生がくせいの頃ころ、祭まつりが好すき(だった / でした)。 |\n|-----------------------------------------------------------------------------|\n| 学生がくせいの頃ころに、自じ転車てんしゃで旅行りょこう(した / しました)。 | \nConsidering both of the characteristics, if time clauses indicate actions which take place within a short time, とき is more suitable than 頃 and vice versa. Take a look at the following comparison. "
"\n| 地じ震しんが起おきたときは、火ひを使つかってはいけ(ない / ません)。 => Natural (As for when earthquakes happen, you must not use fire.) |\n|---|\n| 地震が起きた頃は、火を使ってはいけ(ない / ません)。 => Wrong |\n| 学生がくせいのとき、祭まつりが好すき(だった / でした)。 => Natural |\n| 学生がくせいの頃ころ、祭まつりが好すき(だった / でした)。 => More natural |' metadata={'h3': 'とき and ときに: When'}",
"Can you give an example of how それ is used in a conversation about shoes?",
"Is there a difference between 'I'm singing' and 'I'm singing (and still singing)'?",
"What is the correct usage of とき versus 頃ころ in time expressions?"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
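The printout above describes three stages: a transformer that produces one vector per token, mean pooling over tokens (`pooling_mode_mean_tokens: True`), and L2 normalization. The following is a minimal pure-Python sketch of the last two stages on toy data. The helper names `mean_pool` and `l2_normalize` and the 4-dimensional vectors are illustrative assumptions, not the library's internals.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid dividing by zero for an all-padding input
    return [value / count for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

# Toy input: three token vectors of width 4 (the real model uses 384).
# The last token is padding and must not influence the result.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final stage produces unit-length vectors, the dot product of two embeddings equals their cosine similarity.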
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Mohamed-Gamil/multilingual-e5-small-JapaneseTeacher")
# Run inference
sentences = [
"page_content='Do native speakers correctly use counters every time? The answer is no. We often make mistakes or intentionally use wrong ones for the sake of simplicity. \nCounter: 膳 \n<!-- 🖼️❌ Image not available. Please use `PdfPipelineOptions(generate_picture_images=True)` --> \nFor example, when you count chopsticks, the counter: 膳 ぜん is right. However, a lot of people count chopsticks with 本 ほん . Although it's not right, 本 ほん is applicable because of the form of chopsticks. What we wanted to say here is that you **don't need be a perfectionist** . Of course, it's better that you can use every counter correctly, but actually, it is a fact that most native speakers can't do it themselves.' metadata={'h3': 'Practical Usages in Reality'}",
'Are there situations where using a wrong counter is acceptable?',
'Can you give an example of a casual sentence versus a polite one in Japanese?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
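For semantic search, the similarity scores are typically used to rank passages against a question. The sketch below shows that ranking step on toy 3-dimensional vectors so it runs without downloading the model. The function names `cosine` and `rank_passages` are hypothetical helpers, and with the real model the vectors would be rows of `model.encode(...)`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings (assumption: real ones have 384 values each).
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_passages(query, passages)
print([index for index, _ in ranking])
```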
Information retrieval metrics on the ir_evaluator dataset, evaluated with InformationRetrievalEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6637 |
| cosine_accuracy@3 | 0.8561 |
| cosine_accuracy@5 | 0.9103 |
| cosine_accuracy@10 | 0.9475 |
| cosine_precision@1 | 0.6637 |
| cosine_precision@3 | 0.2854 |
| cosine_precision@5 | 0.1822 |
| cosine_precision@10 | 0.0949 |
| cosine_recall@1 | 0.6633 |
| cosine_recall@3 | 0.8557 |
| cosine_recall@5 | 0.9099 |
| cosine_recall@10 | 0.9475 |
| cosine_ndcg@10 | 0.8122 |
| cosine_mrr@10 | 0.7681 |
| cosine_map@100 | 0.7702 |
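The sketch below shows how metrics of this kind are commonly defined for binary relevance, using a hypothetical two-query toy run. It follows the usual textbook definitions, and the evaluator's own implementation may differ in detail. In the table above, precision@k is close to accuracy@k divided by k (for example 0.8561 / 3 ≈ 0.2854), which suggests that most queries have a single relevant passage.

```python
import math

def accuracy_at_k(ranked, relevant, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents found in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document in the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

# Hypothetical toy run: two queries, each with exactly one relevant passage.
runs = [
    (["d3", "d1", "d7"], {"d3"}),  # relevant passage ranked 1st
    (["d2", "d9", "d4"], {"d9"}),  # relevant passage ranked 2nd
]

def mean(metric, k):
    """Average a per-query metric over all queries in the toy run."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

for name, metric, k in [("accuracy@1", accuracy_at_k, 1),
                        ("precision@3", precision_at_k, 3),
                        ("recall@3", recall_at_k, 3),
                        ("mrr@10", mrr_at_k, 10),
                        ("ndcg@10", ndcg_at_k, 10)]:
    print(name, round(mean(metric, k), 4))
```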
Columns: positive and anchor

| | positive | anchor |
|---|---|---|
| type | string | string |

Samples:

| positive | anchor |
|---|---|
| page_content='\| 私わたしが困こまっていたとき、 … | |
| page_content='\| 私わたしが困こまっていたとき、 … | |
| page_content='\| 私わたしが困こまっていたとき、 … | |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
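To illustrate what these two parameters do, here is a minimal pure-Python sketch of an in-batch-negatives ranking loss: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the anchor's own positive as the target. The function below is an illustrative reimplementation under those assumptions, not the library's code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs: first perfectly aligned, then with positives swapped.
aligned = multiple_negatives_ranking_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

A larger `scale` sharpens the softmax, so a mismatch between an anchor and its positive is penalized more heavily.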
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 32
- learning_rate: 2e-05
- num_train_epochs: 20
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- dataloader_num_workers: 2
- load_best_model_at_end: True
- optim: adamw_torch_fused
- dataloader_pin_memory: False
- gradient_checkpointing: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 32
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 20
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 2
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: False
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | ir_evaluator_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.5500 |
| 1.0 | 10 | 67.2044 | 0.6343 |
| 2.0 | 20 | 49.2443 | 0.6757 |
| 3.0 | 30 | 21.3377 | 0.7179 |
| 4.0 | 40 | 8.6437 | 0.7687 |
| 5.0 | 50 | 5.8509 | 0.7862 |
| 6.0 | 60 | 5.0683 | 0.7905 |
| 7.0 | 70 | 3.6658 | 0.8006 |
| 8.0 | 80 | 3.5062 | 0.8011 |
| 9.0 | 90 | 3.0544 | 0.8055 |
| 10.0 | 100 | 2.7832 | 0.8060 |
| 11.0 | 110 | 2.743 | 0.8090 |
| 12.0 | 120 | 2.3785 | 0.8056 |
| 13.0 | 130 | 2.3046 | 0.8069 |
| 14.0 | 140 | 2.4136 | 0.8119 |
| 15.0 | 150 | 2.3528 | 0.8119 |
| 16.0 | 160 | 2.032 | 0.8115 |
| 17.0 | 170 | 2.1875 | 0.8115 |
| 18.0 | 180 | 2.0299 | 0.8124 |
| 19.0 | 190 | 2.1747 | 0.8126 |
| 20.0 | 200 | 2.1729 | 0.8122 |
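The hyperparameters and the log table can be combined into a rough picture of the training run. The sketch below computes the samples consumed per optimizer step and the learning rate at a few steps, using the commonly used linear-warmup-then-cosine-decay formula. It assumes a single training device and takes the 200 total steps from the last row of the table. The trainer's own scheduler is the source of truth, and `lr_at` is a hypothetical helper.

```python
import math

BASE_LR = 2e-05                    # learning_rate
TOTAL_STEPS = 200                  # final step in the training log
WARMUP_STEPS = TOTAL_STEPS // 10   # warmup_ratio: 0.1 of 200 steps

def lr_at(step):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Samples per optimizer step, assuming a single device:
# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 16 * 32

print("effective batch size:", effective_batch)
for step in (0, 10, 20, 110, 200):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at step 20, the end of epoch 2, and decays to zero by step 200.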
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}