This is a sentence-transformers model fine-tuned from hkunlp/instructor-xl. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'T5EncoderModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
  (2): Dense({'in_features': 1024, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Normalize()
)
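The three modules after the Transformer can be pictured with a small, dependency-free sketch. This is only an illustration of mean pooling, a bias-free linear projection, and L2 normalization on toy sizes (4 inputs, 2 outputs); the function names and the weight matrix are hypothetical and are not the library's implementation or the model's learned weights.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (pooling_mode_mean_tokens)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def dense_no_bias(vector, weight):
    """Linear projection without bias; weight has shape (out_features, in_features)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale the vector to unit Euclidean length (the Normalize step)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: 3 token vectors of width 4; the last position is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
# Hypothetical 2x4 matrix standing in for the learned Dense weights.
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

pooled = mean_pool(tokens, mask)            # padding row is ignored
projected = dense_no_bias(pooled, weight)   # 4 values -> 2 values
embedding = l2_normalize(projected)         # unit-length output
print(embedding)
```

Because the last step normalizes every embedding to unit length, comparing two embeddings by dot product gives the same result as cosine similarity.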
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ahmedHamdi/ir-fr-en-instructor-xl")
# Run inference
sentences = [
"Represent the plot: Three young women on holiday in the Balearic Islands, relaxing by the sea, meet four boys who invite them to a party on a yacht. On board, the party is in full swing, fueled by alcohol, drugs, and sex. At the height of their climax, one of the boys kills Lisa, his sexual partner, by breaking her neck. The boys tell the girls it was an accident, but one of them saw Josh specifically hit his girlfriend. At first, they all decide to return to shore, but after some thought, the boys decide it's best to throw the girl's body overboard to avoid any trouble. The situation spirals out of control, and a desperate fight for survival begins on board the yacht, with passengers dying one after another. Only one girl, Tammy, survives and escapes in a lifeboat.",
"Represent the plot: While on a holiday in Mallorca, Lisa, Kim, and Tammi meet four young men, Bluey, Josh, Sean, and Marcus. After spending the day at the resort together, the girls are invited to the men's yacht, where they plan to party out at sea. While aboard the boat, they take drugs and the conversation turns to sex, and in particular, types of sexual acts. Bluey describes a sex act called a donkey punch which involves punching the woman in the back of the head while having doggy style sex in order to increase the sexual pleasure for the man. Marcus, Bluey, Kim, and Lisa go to the master bedrooms, where they begin having drug-fuelled sex. They are watched by Josh who, known to all involved, lingers furtively in the darkness recording the action with a camera. Bluey, who is copulating with Lisa, asks Josh to film the action and then both of them beckon Josh to have sex with Lisa. Immediately prior to ejaculation and with Bluey's encouragement, Josh donkey punches Lisa but uses excessive force, breaking her neck and killing her instantly. To cover up the incident, the men decide to throw the body overboard while the women want to report it to the authorities, and argument ensues about what to do with the tape. Bluey continually insults Tammi and in a fit of rage, she stabs him in the chest with a knife, and the women escape in the yacht's tender. However, the girls soon realise that the tender's outboard motor is missing (a cut scene shows it still attached to the yacht). In a fit of despair, Tammi fires a flare, attracting the attention of the men. They quickly locate and pick up the women. As the men attempt to get the women aboard, threatening them with a shotgun, Kim shoots a second flare directly at Marcus. The flare explodes into Marcus's torso and slowly burns him to death. Josh locks the recaptured women in one of the rooms below. Sean asks Josh to call in and request medical assistance for Bluey. 
However, knowing that Bluey still has the tape that contains footage of him dealing the earlier fatal blow, Josh instead decides to discover its whereabouts by torturing Bluey. He does this by withholding pain numbing drugs from him. Bluey reveals the location of the tape, beneath the bed in the state room. Tammi escapes the room by smashing through the glass door and cutting herself, and overhears Bluey mention where the tape is. She frantically roams the boat trying to locate it. She does this just moments before Josh attempts to retrieve it. Unable to find the tape, Josh returns to Bluey, stepping up the torture by turning his attention to the knife which still protrudes from the wound. Josh ultimately takes this too far and, following Bluey's pained protestations that he has already revealed the location, he twists the knife further into the wound before pulling it out, causing Bluey to die. Sean tries to bring the situation under control by retrieving a shotgun from Josh. He then tries to calm both Josh and Tammi from a frantic argument about the whereabouts of the tape. Kim notices Sean holding a shotgun with Josh and Tammi visibly upset from an adjacent room. She misinterprets his intentions towards Tammi and Josh, and brutally kills Sean with the propeller of an outboard motor. After realizing her mistake, she manically commits suicide by jumping overboard, leaving Josh and Tammi as the only ones left. A distraught Tammi decides that she cannot remain on the boat any longer. Josh agrees, and readies the tender. His plan is to leave the yacht, get back to shore and claim that there was an accident. As Josh pushes the tender away from the yacht, Tammi quietly takes hold of the end of the mooring rope at the stern. Once the tender has floated some distance from the yacht, Josh pulls his hunting knife from his shorts and points it at Tammi. 
He demands the incriminating tape from Tammi and, fearing for her life, she obliges, throwing it on the floor of the tender. As a distracted Josh reaches for the tape, Tammi quickly throws the looped end of the mooring rope around his neck. The rope immediately reaches the end of its tether and Josh is wrenched into the sea, snapping his neck and killing him. Tammi, now the sole survivor, fires a distress flare in the hope of being rescued. She then lies down on the tender and morbidly stares up at the night sky, as the raft drifts away into the ocean.",
"Represent the plot: Marla is a formerly free-spirited girl who has grown up to be responsible yet overprotective in order to care for her brother Charlie, who has grown lonely and disconnected from her after the death of their parents in a car accident. One night, Charlie sneaks out to visit a toy museum with a Playmobil exhibit. After Marla arrives and tells Charlie off for running away, a lighthouse illuminates them and transports them to the Playmobil world. Marla and Charlie –who has been transformed into a Viking warrior– find themselves in the middle of a viking battle, and Charlie helps them until he is kidnapped by a group of pirates. Frantic to find her brother, Marla goes to the nearest town hoping to ask for help, and runs into Del, the driver of a food truck whose client refuses to pay him over pink hay that causes the town's horses to sprout wings. As Marla tries to form a posse to find Charlie, Del gets her out of trouble when she shows Viking gold to the whole town, and agrees to help Marla find her brother in exchange for the gold. Marla and Del run into Rex Dasher, a secret agent and an old friend of Del. Rex explains that several characters have disappeared, and the group sneaks into a villainous spy headquarters to find information about the disappearances. Despite some issues, they successfully gather the data and escape, but Rex is later captured by the pirates. He is taken away to Constantinopolis and finds Charlie, who had been locked up with other characters by Emperor Maximus, who intends to have the prisoners fight to their deaths. Rex tells Charlie that Marla had been looking for him, which encourages Charlie to break away. However, he later allows himself to be recaptured so the other characters could escape. Del recognizes that a device used by the pirates belongs to Glinara, an alien crime lord. After meeting with her in exchange for information, Del offers to pay twice as much as he owes her. 
Glinara agrees and reveals that she sold the device to Maximus. However, Del is unable to uphold his end of the bargain, as Marla only had two pieces of gold left. Angered, Glinara captures them and attempts to drop them into a portal, but they are spared by Glinara's robot servant Robotitron, who hacks the portal and drops the group into a forest. Del leaves the group, upset by Marla's deception. Marla and Robotitron get lost in the forest until Marla accidentally hits a fairy godmother, who encourages her to continue her search and sends her to Constantinopolis. Arriving in the city, Marla reaches a coliseum where Charlie is about to fight a Tyrannosaurus rex. Charlie and Marla work together to fight off the T-Rex but to no avail. Del soon arrives with his food truck, and Marla uses the last of Del's pink hay to turn the T-Rex harmless. An enraged Maximus orders his guards to arrest them, but the guards reveal themselves to be Rex and the missing warriors, who then lock Maximus inside a cage. As everyone celebrates their victory, Marla and Charlie use the T-Rex to fly back to the lighthouse and return to the real world, where it is revealed that they were missing for only five minutes. On good terms, Marla promises Charlie that their relationship will be mended.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8384, 0.0192],
#         [0.8384, 1.0000, 0.0579],
#         [0.0192, 0.0579, 1.0000]])
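One way to read the similarity matrix is as a nearest-neighbor lookup: for each plot, find the other plot with the highest score. The sketch below reuses the values printed above; `nearest_neighbor` is a hypothetical helper written for this illustration, not part of sentence-transformers.

```python
# Similarity matrix copied from the printed output above.
similarities = [
    [1.0000, 0.8384, 0.0192],
    [0.8384, 1.0000, 0.0579],
    [0.0192, 0.0579, 1.0000],
]

def nearest_neighbor(matrix, query_index):
    """Return (index, score) of the most similar item other than the query itself."""
    candidates = [
        (score, j) for j, score in enumerate(matrix[query_index]) if j != query_index
    ]
    best_score, best_index = max(candidates)
    return best_index, best_score

for i in range(len(similarities)):
    j, score = nearest_neighbor(similarities, i)
    print(f"sentence {i} -> sentence {j} (score {score:.4f})")
```

The two summaries of the same film (sentences 0 and 1) pick each other with a score of 0.8384, while the unrelated third plot has no close match.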
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |
| sentence_0 | sentence_1 |
|---|---|
| Represent the plot: In 1943, near the village of Sobibor in the General Government of Poland, a train carrying Jewish deportees stopped at the entrance to the Sobibor extermination camp. This camp was one of the components of Operation Reinhard. The deportees were relieved of their belongings. Those who declared they had a profession deemed useful were separated from the others. A group of prisoners organized an uprising and a large-scale escape. Four hundred people escaped from the camp. One hundred people perished during the escape. The local population killed or handed over approximately 150 people to the Germans. As a testament to the failure of the Third Reich, the Sobibor camp was dismantled under orders from the German High Command. The camp commandant, SS-Oberscharführer Karl August Frenzel, was sentenced to life imprisonment. He died in 1943. Alexander Pechersky crossed the front lines and fought until the end of the war. He died at the age of 80. His heroic act was never reco... | Represent the plot: The film is based on the Sobibor revolt which occurred in 1943 in German-occupied Poland. The main character of the movie is the Jewish-Soviet soldier Alexander Pechersky, who was a lieutenant in the Red Army. In October 1943, he was deported to the Sobibor death camp, where Jews were being exterminated in gas chambers. In just three weeks, Pechersky planned an uprising with prisoners from Poland and other locations around Western Europe. This uprising was partly successful, allowing roughly 300 prisoners to escape, of whom roughly 60 survived the war. |
| Represent the plot: Eric Vincent, in his thirties, leads an uneventful life with his wife, Audrey. While the couple is discussing having a child, Eric is contacted by a friend of his biological father, whom he has never met. Eric learns that his father is dead and that he can come and collect his ashes if he wishes. Initially reluctant, he eventually agrees. This sets off a chain of events that will unfold, a series of intertwined stories and destinies... | Represent the plot: Eric Vincent is in his thirties and lives an uneventful life with his wife Audrey. They are talking about having a child. Eric is contacted by a friend of his biological father he never knew about the latter's death. Eric can come and collect his father's ashes. He is reluctant but then accepts. |
| Represent the plot: Marla Brenner (Anya Taylor-Joy) is an 18-year-old girl who lives with her younger brother Charlie (Gabriel Bateman) in a comfortable home in Montreal. Like Charlie, she loves Playmobil toys, and they are always inventing new stories. They especially enjoy playing knights and Vikings battling Roman soldiers. One sunny day, she is overjoyed because she has received her passport and ID card. She dreams of filling it out and exploring the world. She promises Charlie she will take him with her. But moments later, two police officers ring the Brenners' doorbell. Marla opens the door, and they announce that her parents have just been in a car accident. They did not survive… Four years later, Marla has given up on her dreams and is doing her best to take care of the house and her brother Charlie, who is now 10 years old. He hasn't lost his childlike spirit, but no longer recognizes his older sister who has "grown up" much too quickly. | Represent the plot: Marla is a formerly free-spirited girl who has grown up to be responsible yet overprotective in order to care for her brother Charlie, who has grown lonely and disconnected from her after the death of their parents in a car accident. One night, Charlie sneaks out to visit a toy museum with a Playmobil exhibit. After Marla arrives and tells Charlie off for running away, a lighthouse illuminates them and transports them to the Playmobil world. Marla and Charlie –who has been transformed into a Viking warrior– find themselves in the middle of a viking battle, and Charlie helps them until he is kidnapped by a group of pirates. Frantic to find her brother, Marla goes to the nearest town hoping to ask for help, and runs into Del, the driver of a food truck whose client refuses to pay him over pink hay that causes the town's horses to sprout wings. As Marla tries to form a posse to find Charlie, Del gets her out of trouble when she shows Viking gold to the whole town, and... |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
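The idea behind this loss can be sketched in a few lines: within a batch, each sentence_0 should score highest against its own sentence_1, and every other sentence_1 in the batch serves as a negative. The sketch below is a simplified pure-Python illustration using the `scale` and cosine similarity listed above; `mnr_loss` and the toy 2-D vectors are hypothetical, and this is not the library's implementation.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]
    return total / len(anchors)

# Toy 2-D embeddings (hypothetical values, not model output).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong anchor

print(mnr_loss(anchors, aligned))    # near zero: correct pairs already rank first
print(mnr_loss(anchors, swapped))    # large: correct pairs rank last
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.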
Non-default hyperparameters:

- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.2458 | 500 | 0.0884 |
| 0.4916 | 1000 | 0.0271 |
| 0.7375 | 1500 | 0.0235 |
| 0.9833 | 2000 | 0.0259 |
| 1.2291 | 2500 | 0.0129 |
| 1.4749 | 3000 | 0.0096 |
| 1.7207 | 3500 | 0.0062 |
| 1.9666 | 4000 | 0.0086 |
| 2.2124 | 4500 | 0.0051 |
| 2.4582 | 5000 | 0.0029 |
| 2.7040 | 5500 | 0.0038 |
| 2.9499 | 6000 | 0.0035 |
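The Epoch and Step columns allow a rough back-of-the-envelope estimate of the training set size, which the card does not state directly. The sketch below assumes training on a single device (the device count is not given here) together with the listed gradient_accumulation_steps of 1, so that one step consumes one batch of 8 pairs; the resulting figures are estimates, not reported values.

```python
# (epoch, step) pairs copied from the training log above.
log = [(0.2458, 500), (0.9833, 2000), (2.9499, 6000)]
per_device_train_batch_size = 8   # from the hyperparameter list
num_train_epochs = 3              # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in log]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Assumption: one device and no gradient accumulation,
# so one step consumes exactly one batch of 8 pairs.
approx_pairs = steps_per_epoch * per_device_train_batch_size
total_steps = steps_per_epoch * num_train_epochs

print(steps_per_epoch, approx_pairs, total_steps)
```

Under these assumptions the log is consistent with roughly 2,000 steps per epoch, that is, on the order of 16,000 training pairs. The last batch of an epoch may be smaller because dataloader_drop_last is False.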
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}