Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper: arXiv 1908.10084
This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
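The Pooling module has `pooling_mode_mean_tokens: True`, i.e. a sentence embedding is the mask-aware mean of BERT's token embeddings. Here is a minimal numpy sketch of that pooling step, using toy shapes in place of real BERT outputs:

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                         # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                         # guard against all-padding rows
    return summed / counts

# Toy batch: 2 sequences of 3 tokens, dim 4; the second sequence has one padding token.
emb = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pooling(emb, mask)
print(pooled.shape)  # (2, 4)
```

In the real model the inputs come from `BertModel`, so `dim` is 768 and `seq_len` is capped at `max_seq_length: 512`.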
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Chandar/sv-subject-based-bert-base-uncased")

# Run inference
sentences = [
    'mason-wasp\tis\tsolitary.\nThus\tthe\tprocess\tof\torganic\tevolution\tis\tfar\tfrom\tbeing\tfully\tunderstood.\tWe\tcan\nonly\tsuppose\tthat\tas\tthere\tare\tdevised\tby\thuman\tbeings\tmany\tpuzzles\tapparently\nunanswerable\ttill\tthe\tanswer\tis\tgiven,\tand\tmany\tnecromantic\ttricks\twhich\tseem\nimpossible\ttill\tthe\tmode\tof\tperformance\tis\tshown;\tso\tthere\tare\tapparently\nincomprehensible\tresults\twhich\tare\treally\tachieved\tby\tnatural\tprocesses.\tOr,\notherwise,\twe\tmust\tconclude\tthat\tsince\tLife\titself\tproves\tto\tbe\tin\tits\tultimate\nnature\tinconceivable,\tthere\tis\tprobably\tan\tinconceivable\telement\tin\tits\tultimate\nworkings.\nEND\tOF\tVOL.\tI.',
    '11.4 The Flea on Schr ¨odinger’s Cat 455\nwhere M = Nm is the total mass of the system, for simplicity we assumed V to be\nanalytic (it will even be taken to be polynomial), and we abbreviated\nfk(ρ)=\n(\n− 1√\nN′\nN\n∑\nl=1\nρl\n)k\n+\nN\n∑\nn=1\n(\n√\nN′ρn − 1√\nN′\nN\n∑\nl=1\nρl\n)k\n. (11.22)\nNote that f1(ρ)= 0, so that to lowest order (i.e. k = 2) we have\nhAE (Q,ρ)=\n(\n1\n2 N\nN\n∑\nn=1\nρ2\nn −\nN\n∑\nk̸=l\nρkρl\n)\nV ′′(Q)+ ··· (11.23)\nWe pass to the corresponding quantum-mechanical Hamiltonians in the usual way,\nand couple a two-level quantum system to the apparatus through the Hamiltonian\nhSA = μ·σ3 ⊗P, (11.24)\nwhere the object observable s = σ3, acting on HS = C2, is to be measured. The idea\nis that hA is the Hamiltonian of a pointer that registers outcomes by localization on\nthe real line, hE is the (free) Hamiltonian of the “environment”, realized as the in-\nternal degrees of the freedom of the total apparatus that are not used in recording\nthe outcome of the measurement, and hAE describes the pointer-environment inter-\naction. The classical description of the apparatus then involves two approximations:\n• Ignoring all degrees of freedom except those of A, which classically are (P,Q);\n• Taking the classical limit of hA, here realized as N →∞ (in lieu of ¯h →0).\nThe measurement ofs is now expected to unfold according to the following scenario:\n1. The apparatus is initially in a metastable state (this is a very common assump-\ntion), whose wave-function is e.g. a Gaussian centered at the origin.\n2. If the object state is “spin up”, i.e., ψS =( 1,0), then it kicks the pointer to the\nright, where it comes to a standstill at the bottom of the double well. If spin is\ndown, likewise to the left. IfψS =( 1,1)/\n√\n2, the pointer moves to a superposition\nof these, which is close to the ground state of V displayed in Figure 11.2.\n3. In the last case, the Flea mechanism of §10.2 comes into play: tiny asymmetric\nperturbations irrelevant for small N localize the ground state as N →∞.\n4. Mere localization of the ground state of the perturbed (apparatus) Hamiltonian in\nthe classical regime is not enough: there should be a dynamical transition from\nthe ground state of the original (unperturbed) Hamiltonian (which has become\nmetastable upon perturbation) to the ground state of the perturbed one. This dy-\nnamical mechanism in question should also recover the Born rule.\nThus the classical description of the apparatus is at the same time the root of the\nmeasurement problem and the key to its solution: it creates the problem because at\nfirst sight a Schr¨odinger Cat state has the wrong classical limit (namely a mixture),\nbut it also solves it, because precisely in the classical limit Cat states are destabilized\neven by the tiniest (asymmetric) perturbations and collapse to the “right” states.',
    'tration,p.115).Thebaby\'sparentshavealreadyattainedtheirfullgrowth.Theamountoffoodthatapersonneedstomakeupfortheheatradiatedfromthesurfaceofthebodyvariesw^iththesizeandalsow^iththeshapeofthebody(seeillustrationopposite).Thesmallerachild,themoresurfacehehasinproportiontohisbodyweight,andhencehelosesrelativelymoreheat.Byactualmeasurement,aone-year-oldchildneedsapproximatelytwiceasmuchenergyperpoundofbody-weightasdoesanadult.Energyneedsareindirectlyrelatedtosex.Girlsandwomenhaveathickerlayeroffattytissuebeneaththeskinthanboysandmen.Thisfatpreventsrapidradiationofheatfromthebody.Itisinterestingtorecallthatmostlong-distanceswimmingrecordsareheldbywomenratherthanbymen.Exposurealsoaffectsthebody\'slossofheat.Thebodylosesheatfasterinacold,dry,windyclimatethaninawarm,moistclimate.Cloth-ingandshelterare,ofcourse,factorsinthelossofheat.Circulationoftheblood,breathing,andotherprocessesarecontinuallygoingonwhenthebodyisatrest."Warm-blooded"animalsmaintainaconstanttemperature.Theheatcontinuallyradiatingfromthesurfaceisconstantlybeingreplaced.Muscularmovementsarecontinuallytakingplaceinthedigestiveorgans,andenergyisusedinvariousotherwayswithinthebody.From40to50percentofthebodyismadeupofmus-culartissue.Thebulkofthistissueisattachedtotheskeletonandisusedinstandingaswellasinlocomotionandothervoluntaryactions.Atalltimes,evenwhenthesemusclesarerelaxed,energyisusedinkeepingthemsomewhatonthestretch.AbovetheBaseLineTheamountofenergythatthebodyuses,evenwhileitis"doingnothing",isconstantlyinfluencedbytwosetsoffactors.Digestingfoodinvolvesameasurableamountofenergy.Thusthebodyusesabout6percentmoreenergysoonafterameal,whenthedigestiveorgansaremostactive,thanjustbeforeameal,whendigestionispracti-callyatastandstill.Whenyouaresittingandreading,orwhenyouarestandingquietly,yourbodyusesaboutone-and-a-thirdtimesasmuchenergyasitdoeswhilesleeping.Walkingatamoderatepaceusesabouttwo-and-a-halftimesasmuch;runningusesaboutseventimesandstair-climbingaboutfifteentimesasmuch.UnitofEnergyTomeasuretheenergyexpendedbythelivingbody,weuseaunitdevelopedbyengineers.ThisistheCalorie(Cal),and,likethemorefamiliarfoot-pound(ft-lb)usedinmeasuringwork,itiscomposedoftwofactors.Wemeasureworkasifitalwaysconsistedofsomequantityofmatter(pounds)movingacertaindistance(feet).Inasimilarwaywemeasureheatasaquantityofmatter,forexample,1kilogram(kg)ofwater,beingheatedacertain"distance"(1degreeonthecentigradescale).116',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
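`model.similarity` defaults to cosine similarity for this model. As a sanity check of what that call computes, here is the same pairwise cosine matrix built by hand with numpy on stand-in embeddings (random vectors instead of real model output):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n, dim) matrix."""
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normalized @ normalized.T

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 768))  # stand-in for model.encode(sentences)
similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3)
# The matrix is symmetric with ones on the diagonal: every embedding
# has cosine similarity 1 with itself.
```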
BinaryClassificationEvaluator results:

| Metric | Value |
|---|---|
| cosine_accuracy | 0.7932 |
| cosine_accuracy_threshold | 0.8014 |
| cosine_f1 | 0.6447 |
| cosine_f1_threshold | 0.7684 |
| cosine_precision | 0.6414 |
| cosine_recall | 0.6480 |
| cosine_ap | 0.7217 |
| cosine_mcc | 0.4692 |
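`cosine_accuracy` is pair-classification accuracy with similarities thresholded at the best cutoff found on the evaluation set (0.8014 above). A sketch of that computation on made-up similarity scores and labels (the scores and labels here are illustrative, not from the evaluation set):

```python
import numpy as np

def accuracy_at_threshold(similarities, labels, threshold):
    """Fraction of pairs where (similarity >= threshold) matches the 0/1 gold label."""
    preds = (np.asarray(similarities) >= threshold).astype(int)
    return float((preds == np.asarray(labels)).mean())

# Hypothetical cosine similarities for six sentence pairs, label 1 = similar.
sims = [0.91, 0.85, 0.40, 0.78, 0.95, 0.30]
labels = [1, 1, 0, 0, 1, 0]
print(accuracy_at_threshold(sims, labels, threshold=0.8014))  # 1.0
```

The evaluator additionally sweeps candidate thresholds and reports the one maximizing each metric, which is why `cosine_accuracy_threshold` and `cosine_f1_threshold` differ.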
Columns: sentence1, sentence2, and label

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | int |

| sentence1 | sentence2 | label |
|---|---|---|
| mason-wasp is solitary. | difficulties which my own hypothesis avoids? If, as I have argued, the germ- | 1 |
| mason-wasp is solitary. | RNA Editing in Trypanosomes | 1 |
| mason-wasp is solitary. | plane subtends with the axis an angle of 89° 59′, we have an ellipse which no | 1 |
CoSENTLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
```
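CoSENT compares every positive pair against every negative pair in a batch: with per-pair cosine scores s and scale λ = 20, the loss is log(1 + Σ exp(λ·(s_neg − s_pos))) over all (negative, positive) combinations, pushing positives to score above negatives. A minimal numpy sketch of that formula (a toy illustration, not the library's implementation):

```python
import numpy as np

def cosent_loss(scores, labels, scale=20.0):
    """CoSENT loss over a batch of pair cosine scores with 0/1 labels."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)
    # diffs[i, j] = scale * (scores[i] - scores[j]); keep pairs where i is
    # a negative pair and j a positive pair (labels[i] < labels[j]).
    diffs = scale * (scores[:, None] - scores[None, :])
    keep = labels[:, None] < labels[None, :]
    return float(np.log1p(np.exp(diffs[keep]).sum()))

# Two well-scored positive pairs and one low-scored negative pair: loss near 0.
print(cosent_loss([0.9, 0.8, 0.2], [1, 1, 0]))
# A negative pair scored above a positive pair: large loss.
print(cosent_loss([0.2, 0.9], [1, 0]))
```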
Non-Default Hyperparameters:

```
per_device_train_batch_size: 16
per_device_eval_batch_size: 32
learning_rate: 2e-05
weight_decay: 0.01
max_steps: 2000
```

All Hyperparameters:

```
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3.0
max_steps: 2000
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
```

Training logs:

| Epoch | Step | Training Loss | cosine_ap |
|---|---|---|---|
| -1 | -1 | - | 0.7217 |
| 0.0071 | 500 | 0.4089 | - |
| 0.0142 | 1000 | 0.007 | - |
| 0.0213 | 1500 | 0.0 | - |
| 0.0285 | 2000 | 0.0 | - |
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@online{kexuefm-8847,
    title = {CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author = {Su, Jianlin},
    year = {2022},
    month = {Jan},
    url = {https://kexue.fm/archives/8847},
}
```
Base model: google-bert/bert-base-uncased