This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
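The module list above reads as a three-stage pipeline: the transformer produces one 384-dimensional vector per token, the pooling layer keeps only the CLS token (`pooling_mode_cls_token` is the only mode set to `True`), and `Normalize()` rescales the result to unit length. The following is a minimal NumPy sketch of the last two stages, not the library's implementation. The function name `cls_pool_and_normalize` and the random stand-in for the transformer output are assumptions made for illustration.

```python
import numpy as np


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token vectors to (batch, hidden) unit vectors.

    Hypothetical helper: mirrors CLS pooling followed by L2 normalization.
    """
    # CLS pooling: keep the vector of the first token of every sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2 normalization; the clip guards against division by zero.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)


# Random numbers stand in for the transformer output (2 sentences, 5 tokens, 384 dims).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))

sentence_vectors = cls_pool_and_normalize(tokens)
print(sentence_vectors.shape)                    # (2, 384)
print(np.linalg.norm(sentence_vectors, axis=1))  # each norm is 1 up to rounding
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.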
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'What is required to access soliton states in terms of the pump laser frequency tuning direction?',
'microcomb into a THz wave. The optical power of the auxiliary light is monitored by a third photodiode (not shown in Fig. 1(a)). The frequency of the auxiliary laser is tuned into its resonance from the blue detuned side and fixed on the blue side of the resonance. Note that, the auxiliary laser is free-running without feedback control on the laser frequency or the power during the soliton generation. By optimizing the laser detuning and optical power of the auxiliary laser, soliton states can be accessed by slowly tuning the pump laser frequency into a soliton regime from the blue detuned side. A detailed description of the soliton generation process can be found in the Ref. [21].',
'To gain further insights into the CW and CCW fields, we define modified detuning as the frequency difference between the pump laser and the XPM shifted resonance, given by \n$$\n\\Delta\\omega_{\\mathrm{mod,CW}}=\\Delta\\omega-(2-f_{R})P_{\\mathrm{CCW}}\n$$ \n$$\n\\Delta\\omega_{\\mathrm{mod,CCW}}=\\Delta\\omega-(2-f_{R})P_{\\mathrm{CW}}\n$$ \nwhere $P_{\\mathrm{CCW}}={\\overline{{|B|^{2}}}}$ and $P_{\\mathrm{CW}}={\\overline{{|A|^{2}}}}$ are the average power in the corresponding directions. By substituting the modified detunings into Eqs. (4) and (5), the equations become a form similar to the unidirectionally driven Lugiato-Lefever equation [31]. \nFor a general model with unidirectional pump, the soliton peak power is mainly determined by the cold cavity detuning [31,32]. A larger detuning corresponds to a higher soliton peak power. As illustrated in Fig. 4, if the CW direction has a larger pump power than the CCW direction, the CW intracavity power will be higher, i.e., $P_{\\mathrm{CW}}{>}P_{\\mathrm{CCW}}$ . Thus the modified detuning $\\Delta\\omega_{\\mathrm{mod,CW}}{>}\\Delta\\omega_{\\mathrm{mod,CCW}}$ . As a consequence, the soliton peak power in the CW direction becomes larger than that in the CCW direction. Therefore, by tuning the MZI to change the pump splitting ratio, different modified detunings are introduced in the CW and CCW directions through the XPM effect, leading to different soliton peak power in the two directions. The evolution of the average intracavity power and soliton peak power shown in Figs. 3(i) and 3(j) is consistent with the above analysis. To further validate the conclusion, we run simulations with different pump spliting ratios and calculate the modified detunings. Figure 5(a) illustrates the relationship between the pump splitting ratio and the modified detuning. Figure 5(b) illustrates the relationship between the soliton peak power and the modified detuning. Due to symmetry, the curves are degenerated in the CW and CCW directions. \nIt has been known that the soliton group velocity and repetition rate can be changed due to Raman induced soliton self-frequency shift [17]. Therefore, the different soliton peak power in the CW and CCW directions will cause different Raman self-frequency shifts and different soliton repetition rates. Theoretically, the normalized repetition rate difference is related to the detuning by [17] \n$$',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5185, 0.1030],
# [0.5185, 1.0000, 0.2032],
# [0.1030, 0.2032, 1.0000]])
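The similarity matrix above can be used directly for semantic search: since the model ends with `Normalize()`, ranking passages by dot product is the same as ranking them by cosine similarity. The sketch below illustrates this with toy two-dimensional unit vectors in place of real embeddings. `rank_passages` is a hypothetical helper, not part of the Sentence Transformers API.

```python
import numpy as np


def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray):
    """Return passage indices sorted from most to least similar, with their scores.

    Hypothetical helper; assumes all vectors are already L2-normalized,
    so the dot product is the cosine similarity.
    """
    scores = passage_vecs @ query_vec
    order = np.argsort(-scores)  # descending by score
    return order, scores[order]


# Toy unit vectors standing in for model.encode(...) output.
query = np.array([1.0, 0.0])
passages = np.array([
    [0.6, 0.8],  # partially aligned with the query
    [0.0, 1.0],  # orthogonal to the query
    [1.0, 0.0],  # identical direction
])

order, scores = rank_passages(query, passages)
print(order.tolist())   # [2, 0, 1]
print(scores.tolist())  # [1.0, 0.6, 0.0]
```

With real embeddings, `query` would be one row of `model.encode(...)` and `passages` the remaining rows.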
Columns: sentence_0, sentence_1, and sentence_2

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | | | |
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| How does the behavior of front solutions differ between high and low drive powers in a normal dispersion Kerr resonator without spectral filtering? | Here, we develop some insight into the difference between dark solitons in the LLE and LLE-F as well as into the formation of chirped-pulse solitons unique to the LLE-F. For fixed detuning, Fig. 7 indicates that the drive power is a suitable parameter for traversing between the different solution types. We therefore examine the variation of steady-state solutions along the dashed line in Fig. 7. For a normal dispersion Kerr resonator without spectral filtering, front solutions (also known as domain walls or switching waves) often move in the reference frame of the driving field [19,54,55,66]. To examine the moving properties of front solutions, we initialize the simulation with a two-front intensity variation in the time domain. The equation is numerically solved with this initial condition and examined as a function of propagation distance until the waveform converges. At large drive powers without a filter [Fig. 8(a) and a from Fig. 7], the front solutions move together and vanish to... | Dissipative solitons are self-localised structures resulting from the double balance of dispersion by nonlinearity and dissipation by a driving force arising in numerous systems. In Kerr-nonlinear optical resonators, temporal solitons permit the formation of light pulses in the cavity and the generation of coherent optical frequency combs. Apart from shape-invariant stationary solitons, these systems can support breathing dissipative solitons exhibiting a periodic oscillatory behaviour. Here, we generate and study single and multiple breathing solitons in coherently driven microresonators. We present a deterministic route to induce soliton breathing, allowing a detailed exploration of the breathing dynamics in two microresonator platforms. We measure the relation between the breathing frequency and two control parameters--pump laser power and effective-detuning--and observe transitions to higher periodicity, irregular oscillations and switching, in agreement with numerical predictions.... |
| What are the key advantages of microcombs that make them suitable for portable applications? | Microresonator based optical frequency comb (often termed "microcomb" or "Kerr comb) generation was first demonstrated in 2007 [1]. It quickly attracted people's great interest and evolved to a hot research area. Microcombs are very promising for portable applications because they have many unique advantages including the capability of generating ultra-broad comb spectra (even more than one octave [2,3]), chip-level integration [4,5], and low power consumption. The basic scheme of microcomb generation is shown in Fig. 1(a). The frequency of a pump laser is tuned into the resonance of one high-quality-factor $(\boldsymbol{Q})$ microresonator which is made of Kerr nonlinear material. When the pump power exceeds some threshold, new frequency lines grow due to parametric gain. More lines are generated through cascaded four-wave mixing between the pump and initial lines, forming a broad frequency comb [6]. Intense studies have been performed to investigate microcomb generation. Various mate... | We briefly review the physics of the parametric process in microresonators, discussed in detail in (30, 80). Kerr frequency combs were initially discovered in silica microtoroids, and experiments proved that the parametrically generated (11, 81) sidebands were equidistant to at least one part in $10^{-17}$ as compared with the optical carrier. In these early experiments, the combs repetition rate was in the terahertz range, and a femtosecond-laser frequency comb was used to bridge and verify the equidistant nature of the teeth spacing. It is today understood that such highly coherent combs only exist in certain regimes. |
| What is the formula for determining the number of rolls that appear in the azimuthal direction when the cavity is pumped just above the threshold of modulational instability? | Roll patterns emerge from noise after the breakdown of an unstable flat background through modulational instability, when the resonator is pumped above a certain threshold. This mechanism preferably occurs in the regime of anomalous GVD, but, however, rolls can also be sustained in the normal GVD regime, although under very marginal conditions (typically, very large detuning, see refs. [9, 18, 47]). $$ | |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
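To make these parameters concrete, here is a minimal NumPy sketch of how a loss of this kind can be computed. It is not the library's code. Each anchor (`sentence_0`) is scored against every positive (`sentence_1`) and every hard negative (`sentence_2`) in the batch by cosine similarity, the scores are multiplied by `scale`, and a cross-entropy loss rewards the anchor's own positive. The function name `mnrl_loss` and the toy one-hot inputs are assumptions made for illustration.

```python
import numpy as np


def mnrl_loss(anchors, positives, negatives=None, scale=20.0):
    """Hypothetical sketch of a multiple-negatives ranking loss with cos_sim."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = unit(np.asarray(anchors, dtype=float))
    candidates = positives if negatives is None else np.vstack([positives, negatives])
    c = unit(np.asarray(candidates, dtype=float))

    # Scaled cosine similarity of every anchor against every candidate.
    logits = scale * (a @ c.T)
    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct candidate for anchor i is positive i (the diagonal).
    idx = np.arange(a.shape[0])
    return float(-log_probs[idx, idx].mean())


eye = np.eye(3)
loss_aligned = mnrl_loss(eye, eye)                    # positives match anchors
loss_shuffled = mnrl_loss(eye, eye[[1, 2, 0]])        # positives mismatched
loss_hard_neg = mnrl_loss(eye, eye, negatives=eye)    # negatives as good as positives
print(loss_aligned, loss_shuffled, loss_hard_neg)
```

In this toy setup the aligned batch gives a loss near zero, the mismatched batch a loss near `scale`, and a hard negative that is indistinguishable from the positive a loss of about ln 2.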
Non-default hyperparameters:
- num_train_epochs: 5
- fp16: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 2.7174 | 500 | 0.2217 |
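The single log row can be combined with the hyperparameters to estimate the size of the training run. The short calculation below is a back-of-the-envelope sketch, not a reported result. It assumes training on a single device (the device count is not stated in this card) and uses the listed values `gradient_accumulation_steps: 1`, `warmup_steps: 0`, and `lr_scheduler_type: linear`, with the learning rate decaying linearly to zero at the final step. The helper `linear_lr` is hypothetical.

```python
# Values copied from the log row and the hyperparameter list.
logged_epoch = 2.7174
logged_step = 500
num_train_epochs = 5
learning_rate = 5e-05
per_device_train_batch_size = 8

# Optimizer steps per epoch implied by the log row (epoch = step / steps_per_epoch).
steps_per_epoch = round(logged_step / logged_epoch)
total_steps = steps_per_epoch * num_train_epochs

# Upper bound on training samples per epoch, ASSUMING one device;
# the last batch may be partial because dataloader_drop_last is False.
max_samples = steps_per_epoch * per_device_train_batch_size


def linear_lr(step: int, total: int, base_lr: float) -> float:
    """Hypothetical helper: linear decay to zero with no warmup."""
    return base_lr * max(0.0, (total - step) / total)


lr_at_logged_step = linear_lr(logged_step, total_steps, learning_rate)

print(steps_per_epoch, total_steps, max_samples)  # 184 920 1472
print(f"{lr_at_logged_step:.3e}")                 # 2.283e-05
```

Under these assumptions the run comprises roughly 920 optimizer steps over at most 1472 samples per epoch, and the logged loss of 0.2217 was recorded a little past the halfway point.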
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}