SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: https://github.com/UKPLab/sentence-transformers
Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
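The pipeline above reduces to two simple operations after the transformer: module (1) keeps only the first ([CLS]) token's hidden state, and module (2) rescales it to unit L2 norm. A minimal numpy sketch of those two modules, using a random tensor as a stand-in for the BertModel output:

```python
import numpy as np

# Stand-in for module (0)'s output: 3 sequences, 10 tokens, 384-dim states.
hidden = np.random.default_rng(0).normal(size=(3, 10, 384))

# Module (1): pooling_mode_cls_token=True -> keep only the first ([CLS]) token.
cls = hidden[:, 0, :]

# Module (2): Normalize() -> rescale each sentence vector to unit L2 norm.
emb = cls / np.linalg.norm(cls, axis=1, keepdims=True)

print(emb.shape)                     # (3, 384)
print(np.linalg.norm(emb, axis=1))   # all 1.0
```

Because of module (2), every embedding this model produces has length 1, which is what makes cosine similarity and dot-product similarity interchangeable downstream.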

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JALLAJ/5epo")
# Run inference
sentences = [
    'What is required to access soliton states in terms of the pump laser frequency tuning direction?',
    'microcomb into a THz wave. The optical power of the auxiliary light is monitored by a third photodiode (not shown in Fig. 1(a)). The frequency of the auxiliary laser is tuned into its resonance from the blue detuned side and fixed on the blue side of the resonance. Note that, the auxiliary laser is free-running without feedback control on the laser frequency or the power during the soliton generation. By optimizing the laser detuning and optical power of the auxiliary laser, soliton states can be accessed by slowly tuning the pump laser frequency into a soliton regime from the blue detuned side. A detailed description of the soliton generation process can be found in the Ref. [21].',
    'To gain further insights into the CW and CCW fields, we define modified detuning as the frequency difference between the pump laser and the XPM shifted resonance, given by  \n$$\n\\Delta\\omega_{\\mathrm{mod,CW}}=\\Delta\\omega-(2-f_{R})P_{\\mathrm{CCW}}\n$$  \n$$\n\\Delta\\omega_{\\mathrm{mod,CCW}}=\\Delta\\omega-(2-f_{R})P_{\\mathrm{CW}}\n$$  \nwhere $P_{\\mathrm{CCW}}={\\overline{{|B|^{2}}}}$ and $P_{\\mathrm{CW}}={\\overline{{|A|^{2}}}}$ are the average power in the corresponding directions. By substituting the modified detunings into Eqs. (4) and (5), the equations become a form similar to the unidirectionally driven Lugiato-Lefever equation [31].  \nFor a general model with unidirectional pump, the soliton peak power is mainly determined by the cold cavity detuning [31,32]. A larger detuning corresponds to a higher soliton peak power. As illustrated in Fig. 4, if the CW direction has a larger pump power than the CCW direction, the CW intracavity power will be higher, i.e., $P_{\\mathrm{CW}}{>}P_{\\mathrm{CCW}}$ . Thus the modified detuning $\\Delta\\omega_{\\mathrm{mod,CW}}{>}\\Delta\\omega_{\\mathrm{mod,CCW}}$ . As a consequence, the soliton peak power in the CW direction becomes larger than that in the CCW direction. Therefore, by tuning the MZI to change the pump splitting ratio, different modified detunings are introduced in the CW and CCW directions through the XPM effect, leading to different soliton peak power in the two directions. The evolution of the average intracavity power and soliton peak power shown in Figs. 3(i) and 3(j) is consistent with the above analysis. To further validate the conclusion, we run simulations with different pump spliting ratios and calculate the modified detunings. Figure 5(a) illustrates the relationship between the pump splitting ratio and the modified detuning. Figure 5(b) illustrates the relationship between the soliton peak power and the modified detuning. 
Due to symmetry, the curves are degenerated in the CW and CCW directions.  \nIt has been known that the soliton group velocity and repetition rate can be changed due to Raman induced soliton self-frequency shift [17]. Therefore, the different soliton peak power in the CW and CCW directions will cause different Raman self-frequency shifts and different soliton repetition rates. Theoretically, the normalized repetition rate difference is related to the detuning by [17]  \n$$',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5185, 0.1030],
#         [0.5185, 1.0000, 0.2032],
#         [0.1030, 0.2032, 1.0000]])
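Since the model's outputs are unit-normalized (the Normalize() module above), the cosine similarity matrix that model.similarity computes by default is just the dot product of the embedding matrix with itself. A sketch of that equivalence with random unit vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors, like this model's output

# For unit vectors, cosine similarity is just a dot product, so this matrix
# has the same structure as the tensor printed above.
sims = emb @ emb.T
print(np.allclose(np.diag(sims), 1.0))   # True: each vector's self-similarity is 1
print(np.allclose(sims, sims.T))         # True: the matrix is symmetric
```

This is why the printed tensor above has ones on the diagonal and mirrored off-diagonal entries.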

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,466 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

    |      | sentence_0   | sentence_1    | sentence_2    |
    |------|--------------|---------------|---------------|
    | type | string       | string        | string        |
    | min  | 13 tokens    | 90 tokens     | 25 tokens     |
    | mean | 27.84 tokens | 402.23 tokens | 394.46 tokens |
    | max  | 77 tokens    | 512 tokens    | 512 tokens    |
  • Samples:
    Sample 1:
      • sentence_0: How does the behavior of front solutions differ between high and low drive powers in a normal dispersion Kerr resonator without spectral filtering?
      • sentence_1: Here, we develop some insight into the difference between dark solitons in the LLE and LLE-F as well as into the formation of chirped-pulse solitons unique to the LLE-F. For fixed detuning, Fig. 7 indicates that the drive power is a suitable parameter for traversing between the different solution types. We therefore examine the variation of steady-state solutions along the dashed line in Fig. 7. For a normal dispersion Kerr resonator without spectral filtering, front solutions (also known as domain walls or switching waves) often move in the reference frame of the driving field [19,54,55,66]. To examine the moving properties of front solutions, we initialize the simulation with a two-front intensity variation in the time domain. The equation is numerically solved with this initial condition and examined as a function of propagation distance until the waveform converges. At large drive powers without a filter [Fig. 8(a) and a from Fig. 7], the front solutions move together and vanish to...
      • sentence_2: Dissipative solitons are self-localised structures resulting from the double balance of dispersion by nonlinearity and dissipation by a driving force arising in numerous systems. In Kerr-nonlinear optical resonators, temporal solitons permit the formation of light pulses in the cavity and the generation of coherent optical frequency combs. Apart from shape-invariant stationary solitons, these systems can support breathing dissipative solitons exhibiting a periodic oscillatory behaviour. Here, we generate and study single and multiple breathing solitons in coherently driven microresonators. We present a deterministic route to induce soliton breathing, allowing a detailed exploration of the breathing dynamics in two microresonator platforms. We measure the relation between the breathing frequency and two control parameters, pump laser power and effective detuning, and observe transitions to higher periodicity, irregular oscillations and switching, in agreement with numerical predictions....

    Sample 2:
      • sentence_0: What are the key advantages of microcombs that make them suitable for portable applications?
      • sentence_1: Microresonator based optical frequency comb (often termed "microcomb" or "Kerr comb") generation was first demonstrated in 2007 [1]. It quickly attracted great interest and evolved into a hot research area. Microcombs are very promising for portable applications because they have many unique advantages including the capability of generating ultra-broad comb spectra (even more than one octave [2,3]), chip-level integration [4,5], and low power consumption. The basic scheme of microcomb generation is shown in Fig. 1(a). The frequency of a pump laser is tuned into the resonance of a high-quality-factor $(\boldsymbol{Q})$ microresonator which is made of Kerr nonlinear material. When the pump power exceeds some threshold, new frequency lines grow due to parametric gain. More lines are generated through cascaded four-wave mixing between the pump and initial lines, forming a broad frequency comb [6]. Intense studies have been performed to investigate microcomb generation. Various mate...
      • sentence_2: We briefly review the physics of the parametric process in microresonators, discussed in detail in (30, 80). Kerr frequency combs were initially discovered in silica microtoroids, and experiments proved that the parametrically generated (11, 81) sidebands were equidistant to at least one part in $10^{-17}$ as compared with the optical carrier. In these early experiments, the comb's repetition rate was in the terahertz range, and a femtosecond-laser frequency comb was used to bridge and verify the equidistant nature of the teeth spacing. It is today understood that such highly coherent combs only exist in certain regimes.

    Sample 3:
      • sentence_0: What is the formula for determining the number of rolls that appear in the azimuthal direction when the cavity is pumped just above the threshold of modulational instability?
      • sentence_1: Roll patterns emerge from noise after the breakdown of an unstable flat background through modulational instability, when the resonator is pumped above a certain threshold. This mechanism preferably occurs in the regime of anomalous GVD, but rolls can also be sustained in the normal GVD regime, although under very marginal conditions (typically, very large detuning, see refs. [9, 18, 47]). When the pump is below the threshold, there is only one excited mode in the resonator $(l = 0)$, while all the sidemode amplitudes $\mathcal{A}_{l}$ with $l\neq0$ are null. From the spatiotemporal standpoint, the intracavity field is constant (flat background). Under certain conditions, when the pump $F$ is increased beyond a certain threshold value $F_{\mathrm{th}}$, the flat background solution becomes unstable and breaks down into a roll pattern characterized by a periodic modulation of the intracavity power as a function of the azimuthal angle (see Fig. 6). This phenome...
      • sentence_2: Note that $\mathcal{N}_{h}=S\mathcal{M}_{h}S^{-1}$ and $\mathcal{I N}_{h}=S\mathcal{I M}_{h}S^{-1}$, so the spectrum of the full linearized operator, $\mathcal{I N}_{h}$, is equivalent to that of $\mathcal{I M}_{h}$. Also, $\sigma(\mathcal{N}_{h})$ is equivalent to $\sigma(\mathcal{M}_{h})$. Since the two problems are equivalent, we note that the form (4.3) of the eigenvalue problem is more suggestive of our approach. For $h=0$, we have a two-dimensional $\mathrm{Ker}[\mathcal{M}_{0}]$, spanned by the vectors $\big(\begin{array}{c}{\varphi_{0}^{\prime}}\\{0}\end{array}\big)$ and $\big(\begin{array}{c}{0}\\{\varphi_{0}}\end{array}\big)$. We need to see what the evolution of the modulational eigenvalue is as $h:0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
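MultipleNegativesRankingLoss uses the other positives in a batch as in-batch negatives: cosine scores between each anchor and every candidate are multiplied by the scale (20.0 here) and passed to cross-entropy, with the matching pair on the diagonal as the target. A minimal numpy sketch under those parameters (the mnr_loss helper is illustrative, not the library's implementation):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine scores; row i's positive is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                    # "scale": 20.0, cos_sim
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the softmax
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))  # diagonal = matching pairs

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 384))
# Perfectly matching positives give a near-zero loss; unrelated positives
# give a loss around log(batch_size).
print(mnr_loss(a, a) < mnr_loss(a, rng.normal(size=(8, 384))))  # True
```

With the three-column dataset above, the library additionally appends each sentence_2 as an extra hard-negative candidate column, so every anchor is scored against both the in-batch positives and the explicit negatives.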
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 5
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch  | Step | Training Loss |
|--------|------|---------------|
| 2.7174 | 500  | 0.2217        |

Framework Versions

  • Python: 3.9.19
  • Sentence Transformers: 5.1.0
  • Transformers: 4.51.0
  • PyTorch: 2.5.0+cu124
  • Accelerate: 0.34.2
  • Datasets: 2.19.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model: JALLAJ/5epo (finetuned from BAAI/bge-small-en-v1.5)
Model size: 33.4M parameters
Tensor type: F32 (Safetensors)