How to use Jrinky/snowflake with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Jrinky/snowflake")
sentences = [
"What aspect of human relationship to nature is omitted from the text",
"There are a few good ones, though. Here are the best WWE apps and WWE games for Android! The first five are the best games...\nGo Android Apps (blog)\nThe Best Themes for Android Free Download: Hi friend we are again back with our new top ten best free themes for android list. This article is especially dedicated for those persons who want to make their smartphone...\nParagon Software has created an app for Android that allows your device to natively read partitions in file systems that Android normally can't handle, such as Microsoft's NTFS, allowing immediate and easy use of... While the Sentio Desktop app can be used on its own, it was primarily meant to complement Sentio's Superbook, a crowdfunded laptop shell for Android smartphones and tablets that's just entering production after...\n... phone then GBWhatsapp is the app for you. GBWhatsapp is basically similar to Whatsapp+ in terms of features. The newest available version right now is GBWhatsapp 6.40 APK for Android devices.",
"A true entertainer. date city state venue 11/23/2012 West Palm Beach FL Kravis Center 11/24/2012 Sarasota FL Van Wezel Performing Arts Hall 11/25/2012 Clearwater FL Capitol Theatre 11/29/2012 Durham NC Durham Performing Arts Center 12/1/2012 Atlantic City NJ Trump Taj Mahal 12/2/2012 Staten Island NY St. George Theatre 12/4/2012 Bethlehem PA Musikfest Cafe 12/5/2012 Verona NY Turning Stone Casino 12/6/2012 Stamford CT Palace Theatre Stamford 12/8/2012 Shippensburg PA Luhrs Center 12/9/2012 Boston MA Wilbur Theatre 12/11/2012 Greensburg PA The Palace Theatre 12/12/2012 Easton MD Avalon Theatre 12/15/2012 Saint Charles IL Arcada Theater 12/16/2012 Milwaukee WI Potawatomi Bingo Casino 12/18/2012 Beaver Creek CO Vilar Performing Arts Center 12/20/2012 Chandler AZ Ovations Live!",
"The reader will gain a better understanding of the direction nature and culture is heading today by learning how connections were made in the past. It omits that which Raymond Williams called \"a working landscape\" -- the most intimate human relationship to nature which is people who live and work on it."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l-v2.0. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
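The architecture above pools with the CLS token (`pooling_mode_cls_token: True`) and then applies `Normalize()`. As a rough, dependency-free illustration of what those two steps do to the transformer's per-token outputs, the sketch below uses a hypothetical helper `cls_pool_and_normalize` and toy 4-dimensional vectors; it is not the library's implementation, and the real model works on 1024-dimensional tensors.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token vector of each sequence and scale it to unit length.

    token_embeddings: list of sequences, each a list of per-token vectors.
    """
    pooled = []
    for tokens in token_embeddings:
        cls_vector = tokens[0]  # CLS pooling: only the first token is kept
        norm = math.sqrt(sum(x * x for x in cls_vector))
        # Guard against division by zero for an all-zero vector.
        pooled.append([x / max(norm, 1e-12) for x in cls_vector])
    return pooled

# Toy input: 2 sequences, 3 tokens each, hidden size 4 (illustrative values only).
toy_batch = [
    [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]],
    [[0.0, 0.0, 6.0, 8.0], [2.0, 2.0, 2.0, 2.0], [5.0, 5.0, 5.0, 5.0]],
]
sentence_embeddings = cls_pool_and_normalize(toy_batch)
print(sentence_embeddings[0])  # [0.6, 0.8, 0.0, 0.0]
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.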
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/snowflake")
# Run inference
sentences = [
'Why is it important to keep moving over the summer',
"It's important to keep moving over the summer!",
'2008. CHENG HF, LEE YM, Chu CH, Leung WK & Mok TMY. - Journal Editor (Hong Kong Medical Journal) 2008\n- Editor-in-Chief (Hong Kong Dental Journal) 2007\n- Editor-in-Chief (Hong Kong Dental Journal) 2006\n- Deputy Editor (Hong Kong Dental Journal) 2004',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
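For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on toy unit-length vectors so it runs without downloading the model; `rank_by_similarity` is a hypothetical helper, and in practice the vectors would be rows returned by `model.encode(...)`.

```python
def rank_by_similarity(query_embedding, corpus_embeddings):
    """Return (index, score) pairs sorted from most to least similar.

    Assumes all vectors are unit length, so the dot product equals cosine similarity.
    """
    scores = [
        sum(q * d for q, d in zip(query_embedding, doc))
        for doc in corpus_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional stand-ins for real 1024-dimensional embeddings.
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [0.6, 0.8, 0.0], [1.0, 0.0, 0.0]]

ranking = rank_by_similarity(query, corpus)
print(ranking)  # [(2, 1.0), (1, 0.6), (0, 0.0)]
```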
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

| anchor | positive |
|---|---|
| What might have been unnecessary if better emergency plans had been implemented | If better emergency plans had been in place, maybe chemical dipersants wouldn't be needed. And on and on. |
| What was the year of publication for the 3rd Edition of 'Regular Polytopes' by H.S.M. Coxeter | Coxeter, Regular Polytopes, 3rd Edition, Dover New York, 1973 |
| Who is the author of the GURPS Shapeshifters supplement | GURPS Shapeshifters () is a supplement by Robert M. Schroeck for the GURPS role-playing game system, third edition. |

Loss: selfloss.Infonce with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
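The code for `selfloss.Infonce` is not shown here. The sketch below assumes it follows the usual InfoNCE formulation with in-batch negatives (the setup described in the Henderson et al. paper cited below): each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. The function names and toy vectors are illustrative only.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of 3 pairs (matching per_device_train_batch_size: 3).
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # deliberately mismatched pairs

loss_aligned = info_nce_loss(anchors, aligned)
loss_shuffled = info_nce_loss(anchors, shuffled)
print(loss_aligned, loss_shuffled)
```

The loss is close to zero when each anchor is most similar to its own positive and grows when the pairs are mismatched. With `scale` set to 20.0, small differences in cosine similarity turn into large differences in the softmax.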
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

| anchor | positive |
|---|---|
| What impressive achievements did the Warriors accomplish during their last season in Division III | The Warriors were among the most lethal offensive teams in Division III this past year, posting a team batting average of .344 and averaging nearly seven runs per game, smacking 29 home runs, and collecting nearly 600 total bases. They shared the Little East Conference regular-season championship and later knocked off the top seed in the NCAA regional tournament (Montclair State) en route to their winningest season in 14 years. |
| How many bars had nectar and capped honey on them | Eight of the bars had nectar and capped honey on them. There are eighteen bars with brood in some form on them and a mix of workers and drones. |
| What idea is being requested regarding the 'triangle' | Next up...the "triangle". Please, seriously, if anyone could float me an idea, I would really appreciate it. |

Loss: selfloss.Infonce with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- learning_rate: 5e-06
- num_train_epochs: 5
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-06
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: True
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0777 | 150 | 0.0257 | 0.0134 |
| 0.1554 | 300 | 0.0136 | 0.0082 |
| 0.2332 | 450 | 0.0079 | 0.0062 |
| 0.3109 | 600 | 0.0065 | 0.0051 |
| 0.3886 | 750 | 0.0059 | 0.0045 |
| 0.4663 | 900 | 0.0057 | 0.0040 |
| 0.5440 | 1050 | 0.0064 | 0.0037 |
| 0.6218 | 1200 | 0.0050 | 0.0034 |
| 0.6995 | 1350 | 0.0052 | 0.0034 |
| 0.7772 | 1500 | 0.0041 | 0.0032 |
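
The log and the hyperparameters together allow a rough reconstruction of the learning-rate schedule. The sketch below estimates the steps per epoch from the last logged row (step 1500 at epoch 0.7772) and applies a linear warmup followed by linear decay, using `learning_rate: 5e-06`, `warmup_ratio: 0.1`, and `lr_scheduler_type: linear`. It assumes a single device, a full 5-epoch run, and the common warmup-then-decay shape; the helper `linear_schedule_lr` is hypothetical and is not the trainer's own code.

```python
import math

def linear_schedule_lr(step, total_steps, peak_lr=5e-06, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Estimate from the log: step 1500 corresponds to epoch 0.7772.
steps_per_epoch = round(1500 / 0.7772)
total_steps = steps_per_epoch * 5  # num_train_epochs: 5 (assumes the run completes)

print(steps_per_epoch, total_steps)
for step in (0, 150, 1500, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the first logged row (step 150) would still be inside the warmup phase, so its learning rate would be well below the peak of 5e-06.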
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: Snowflake/snowflake-arctic-embed-l-v2.0