Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
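The last two modules above (mean-token pooling followed by normalization) can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function names `mean_pool` and `l2_normalize`, the toy vectors, and the attention mask are assumptions made for illustration, and the real model works on 384-dimensional tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

# Toy input (hypothetical): 3 token vectors of dimension 2, last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.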
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("AShi846/all-MiniLM-L6-v2_rag_ft_e-3")
# Run inference
sentences = [
'The data contains information about submissions to a prestigious machine learning conference called ICLR. Columns:\nyear, paper, authors, ratings, decisions, institution, csranking, categories, authors_citations, authors_publications, authors_hindex, arxiv. The data is stored in a pandas.DataFrame format. \n\nCreate two fields called has_top_company and has_top_institution. The field has_top_company equals 1 if the article contains an author in the following list of companies ["Facebook", "Google", "Microsoft", "Deepmind"], and 0 otherwise. The field has_top_institution equals 1 if the article contains an author in the top 10 institutions according to CSRankings.',
"Recall that, in the Hedge algorithm we learned in class, the total loss over time is upper bounded by $\\sum_{t = 1}^T m_i^t + \\frac{\\ln N}{\\epsilon} + \\epsilon T$. In the case of investments, we want to do almost as good as the best investment. Let $g_i^t$ be the fractional change of the value of $i$'th investment at time $t$. I.e., $g_i^t = (100 + change(i))/100$, and $p_i^{t+1} = p_i^{t} \\cdot g_i^t$. Thus, after time $T$, $p_i^{T+1} = p_i^1 \\prod_{t = 1}^T g_i^t$. To get an analogous bound to that of the Hedge algorithm, we take the logarithm. The logarithm of the total gain would be $\\sum_{t=1}^T \\ln g_i^t$. To convert this into a loss, we multiply this by $-1$, which gives a loss of $\\sum_{t=1}^T (- \\ln g_i^t)$. Hence, to do almost as good as the best investment, we make our cost vectors to be $m_i^t = - \\ln g_i^t$. Now, from the analysis of Hedge algorithm in the lecture, it follows that for all $i \\in [N]$, $$\\sum_{t = 1}^T p^{(t)}_i \\cdot m^{(t)} \\leq \\sum_{t = 1}^{T} m^{(t)}_i + \\frac{\\ln N}{\\epsilon} + \\epsilon T.$$ Taking the exponent in both sides, We have that \\begin{align*} \\exp \\left( \\sum_{t = 1}^T p^{(t)}_i \\cdot m^{(t)} \\right) &\\leq \\exp \\left( \\sum_{t = 1}^{T} m^{(t)}_i + \\frac{\\ln N}{\\epsilon} + \\epsilon T \\right)\\\\ \\prod_{t = 1}^T \\exp( p^{(t)}_i \\cdot m^{(t)} ) &\\leq \\exp( \\ln N / \\epsilon + \\epsilon T) \\prod_{t = 1}^T \\exp(m^t_i) \\\\ \\prod_{t = 1}^T \\prod_{i \\in [N]} (1 / g_i^t)^{p^{(t)}_i} &\\leq \\exp( \\ln N / \\epsilon + \\epsilon T) \\prod_{t = 1}^{T} (1/g^{(t)}_i) \\end{align*} Taking the $T$-th root on both sides, \\begin{align*} \\left(\\prod_{t = 1}^T \\prod_{i \\in [N]} (1 / g_i^t)^{p^{(t)}_i} \\right)^{(1/T)} &\\leq \\exp( \\ln N / \\epsilon T + \\epsilon ) \\left( \\prod_{t = 1}^{T} (1/g^{(t)}_i) \\right)^{(1/T)}. \\end{align*} This can be interpreted as the weighted geometric mean of the loss is not much worse than the loss of the best performing investment.",
'1',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
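To make the similarity step concrete, here is a minimal pure-Python sketch of a pairwise cosine-similarity matrix and a simple ranking of corpus vectors against a query. It assumes cosine similarity is the scoring function in use; the helper names (`cosine_similarity`, `similarity_matrix`, `rank_by_similarity`) and the 2-dimensional toy vectors are hypothetical and stand in for real 384-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Pairwise cosine similarities, shape [n, n]."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]

def rank_by_similarity(query, corpus):
    """Return corpus indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, doc), index) for index, doc in enumerate(corpus)]
    return [index for _, index in sorted(scored, reverse=True)]

# Toy embeddings (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
ranking = rank_by_similarity([1.0, 0.1], toy_embeddings)
print(len(matrix), len(matrix[0]))
print(ranking)
```

The same ranking idea underlies semantic search: encode the query and the candidates, then sort candidates by score.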
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Assume that your team is discussing the following java code: | D(cat,dog)=2 | 0.1 |
| If several elements are ready in a reservation station, which | Obama SLOP/1 Election returns document 3 Obama SLOP/2 Election returns documents 3 and T Obama SLOP/5 Election returns documents 3,1, and 2 Thus the values are X=1, x=2, and x=5 Obama = (4 : {1 - [3}, {2 - [6]}, {3 [2,17}, {4 - [1]}) Election = (4: {1 - [4)}, (2 - [1, 21), {3 - [3]}, {5 - [16,22, 51]}) | 0.1 |
| If process i fails, then eventually all processes j≠i fail | No, it is almost certain that it would not work. On a | 0.1 |
CosineSimilarityLoss with these parameters:
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
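As a rough illustration of what this loss configuration implies, the sketch below computes the mean squared error between the cosine similarity of each embedding pair and its float label. It is a simplified pure-Python stand-in, not the library's code; `cosine_mse_loss` and the toy pairs are hypothetical, and in training the vectors would come from encoding sentence_0 and sentence_1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_mse_loss(pairs, labels):
    """Mean squared error between pairwise cosine similarity and target labels."""
    squared_errors = [
        (cosine_similarity(u, v) - label) ** 2
        for (u, v), label in zip(pairs, labels)
    ]
    return sum(squared_errors) / len(squared_errors)

# Toy batch (hypothetical): one identical pair and one orthogonal pair.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # cosine 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # cosine 0.0
]
labels = [1.0, 0.1]
loss = cosine_mse_loss(pairs, labels)
print(loss)
```

Minimizing this quantity pulls the cosine similarity of each pair toward its label, so pairs labeled near 0.1 are pushed toward near-orthogonal embeddings.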
Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: sentence-transformers/all-MiniLM-L6-v2