Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper
•
1908.10084
•
Published
•
9
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("jesse-adanac/sykes-embedder2")
# Run inference
sentences = [
'"One-bedroom pet-free holiday cottage with sea view near RSPB Titchwell Marsh reserve"',
'In Focus is a one-bedroom holiday cottage that fits your criteria. It is pet-free and offers glimpses of the sea from the courtyard garden. It is also conveniently located within a 10-minute walk of the RSPB Titchwell Marsh reserve.',
'Seagate House is a luxury coastal holiday cottage in Brancaster that features a sauna and offers pet-friendly accommodations. It allows two dogs with a small additional fee.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
small_propsInformationRetrievalEvaluator| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.0 |
| cosine_accuracy@3 | 0.0 |
| cosine_accuracy@5 | 0.0 |
| cosine_accuracy@10 | 0.0 |
| cosine_precision@1 | 0.0 |
| cosine_precision@3 | 0.0 |
| cosine_precision@5 | 0.0 |
| cosine_precision@10 | 0.0 |
| cosine_recall@1 | 0.0 |
| cosine_recall@3 | 0.0 |
| cosine_recall@5 | 0.0 |
| cosine_recall@10 | 0.0 |
| cosine_ndcg@10 | 0.0 |
| cosine_mrr@10 | 0.0 |
| cosine_map@100 | 0.0 |
anchor and positive| anchor | positive | |
|---|---|---|
| type | string | string |
| details |
|
|
| anchor | positive |
|---|---|
"Charming pet-friendly cottage with heated pool near Wells-next-the-Sea" |
The described property, Gable End, is not pet-friendly as it states "pet_friendly: False." Therefore, it does not match the query for a "Charming pet-friendly cottage with heated pool near Wells-next-the-Sea." |
"Luxury coastal cottage rental with balcony views and bird watching in Burnham Overy Staithe" |
Water Mill House (4) perfectly fits the description of a "luxury coastal cottage rental with balcony views and bird watching in Burnham Overy Staithe." This property offers stunning views over the mill pond, River Burn, and surrounding water meadows, which can be enjoyed from a private first-floor balcony. It is located on the coast road on the outskirts of the popular coastal village of Burnham Overy Staithe. The house is part of a superb conversion of a Grade II listed Georgian water mill, providing characterful and well-proportioned accommodation over three floors. The location is ideal for bird watching, as the area is frequented by a variety of coastal birds and wildlife. Additionally, the property offers a stylish and comfortable living experience, making it a luxurious choice for a coastal getaway. |
"luxury coastal holiday cottage in Brancaster with sauna and pet-friendly accommodations" |
Seagate House is a luxury coastal holiday cottage in Brancaster that features a sauna and offers pet-friendly accommodations. It allows two dogs with a small additional fee. |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
per_device_train_batch_size: 4learning_rate: 2e-05fp16: Trueoverwrite_output_dir: Falsedo_predict: Falseeval_strategy: noprediction_loss_only: Trueper_device_train_batch_size: 4per_device_eval_batch_size: 8per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 2e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 3max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Truefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: proportional| Epoch | Step | small_props_cosine_ndcg@10 |
|---|---|---|
| -1 | -1 | 0.0 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
BAAI/bge-base-en-v1.5