SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
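Cosine similarity, the similarity function listed above, is the dot product of two vectors divided by the product of their norms. A minimal sketch with toy vectors (not outputs of this model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])
print(round(cosine_similarity(a, b), 4))  # 0.5
```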

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
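The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`), and Normalize rescales the result to unit length so cosine similarity reduces to a dot product. A numpy sketch of these two steps with toy per-token embeddings (illustrative numbers, not the real BertModel output):

```python
import numpy as np

# Toy token embeddings: (seq_len=4, dim=3); the last token is padding.
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],   # padding -- must not influence the result
])
attention_mask = np.array([1, 1, 1, 0])

# Mean pooling: average only over non-padding tokens.
mask = attention_mask[:, None].astype(float)
pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()

# Normalize: scale to unit L2 norm.
embedding = pooled / np.linalg.norm(pooled)

print(pooled)                                # [2. 2. 2.]
print(round(np.linalg.norm(embedding), 6))   # 1.0
```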

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("andreinsardi/SciBERT-SolarPhysics-Search")
# Run inference
sentences = [
    'data-driven ringed residual u-net scheme for full waveform inversion',
    "full waveform inversion (fwi) is a powerful means for accurately reconstructing subsurface velocity models at high resolution. yet it is nevertheless a nonlinear and ill-posed problem. physics-driven fwi methods employ gradient-based optimization algorithms to minimize the error between the observed seismic data and the synthetically generated seismic data. the solution may converge to a local rather than global minimum. the cycle-skipping problem occurs when the synthetic data exceed a half-wavelength shift relative to the observed data. fwi relies on an accurate initial velocity model to mitigate the cycle-skipping problem. moreover, due to the increasing size and desired resolution of seismic data, fwi costs a great deal of computational time. to obviate these problems, we present a data-driven fwi scheme based on a deep learning architecture called u-net. the network consists of the ringed residual unit, which integrates residual propagation and residual feedback. it beneficially achieves correspondence between the seismic data domain and the velocity model domain. the features of the shallow layers are connected with the deep layers by a skip connection to facilitate seismic data spatial information propagation and utilization. they improve inversion accuracy and make the network more generalizable and robust. we utilize the society of exploration geophysicists (segs)/european association of geoscientists and engineers (eage) overthrust and salt models to verify our proposed method's impressive performance. the experimental results clearly demonstrate that the proposed method can produce high-quality velocity models. compared with the conventional physics-informed fwi, it has advantages in both computational time and initial model dependence. © 2024 elsevier b.v., all rights reserved.",
    'amidst the increasing penetration of intermittent renewable generation and the persistent growth of load demands, voltage stability assumes a pivotal concern in smart grids. the real-time voltage stability assessment (vsa) under time-varying operating conditions becomes paramount. recent strides in real-time vsa, utilizing intelligent data-driven learning with measurements, mark significant progress. however, a critical and unresolved challenge with purely data-driven methods is their susceptibility to performance degradation, especially in out-of-sample scenarios. to this end, this article presents a physics-informed guided deep learning (pgdl) paradigm for the practical and accurate assessment of voltage stability margins (vsms), leveraging both physics-based and data-driven techniques. the pgdl architecture includes an improved temporal convolutional network (itcn) for the automatic extraction of representative temporal features necessary for vsa from measurement data. additionally, pgdl integrates physics-based features informed by domain-specific knowledge. a feature fusion scheme is then devised to merge deep-learned features with pertinent physics-based attributes. acknowledging the unique contributions of these feature modalities to vsa, a novel twin attention mechanism (tam) is proposed to adaptively adjust attention weights, prioritizing learned features and thus optimizing vsa performance. substantial experiments on power systems of different scales, coupled with comparative analyses against state-of-the-art benchmarks, illustrate the efficacy and merits of the proposed approach. © 2025 elsevier b.v., all rights reserved.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5779, 0.0253],
#         [0.5779, 1.0000, 0.0727],
#         [0.0253, 0.0727, 1.0000]])
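Because the model outputs unit-normalized embeddings, semantic search over a corpus reduces to a matrix-vector dot product followed by ranking. A self-contained sketch with toy unit vectors standing in for `model.encode` output (the helper names here are illustrative, not part of the library API):

```python
import numpy as np

def unit(v):
    """Normalize a vector to unit L2 norm."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def top_k(query_emb, corpus_embs, k=2):
    """Rank corpus rows by cosine similarity to the query (all unit vectors)."""
    scores = corpus_embs @ query_emb          # dot product == cosine here
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Stand-ins for embeddings of three corpus documents and one query.
corpus = np.stack([unit([1, 0, 0]), unit([1, 1, 0]), unit([0, 0, 1])])
query = unit([1, 0.2, 0])
print(top_k(query, corpus))  # nearest corpus row first
```

With real data, `corpus` and `query` would come from `model.encode(...)`, which already returns normalized vectors thanks to the Normalize module.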

Training Details

Training Dataset

Unnamed Dataset

  • Size: 36,416 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 4 tokens, mean: 46.47 tokens, max: 269 tokens
    • sentence_1: string; min: 90 tokens, mean: 292.29 tokens, max: 512 tokens
  • Samples:
    • sentence_0: digital twin; eddy current; electrical-mechanical response; mechanical property monitoring; multiscale modeling; plastic deformation; constitutive models; eddy current testing; electric network analysis; electric network parameters; plasticity testing; surface discharges; eddy-current; electrical-mechanical response; electromagnetics; mechanical; mechanical property monitoring; mechanical response; modelling framework; monitoring system; multiscale modeling; property; constitutive equations
      sentence_1: this study aims to develop a thermodynamic modeling framework for the electromagnetic-plastic deformation response coupled with circuit analysis. to accomplish this objective, we derived the thermodynamic balance laws for materials exposed to electromagnetic fields while undergoing plastic deformation. the balance laws serve as the foundation for refining the connection between the plastic deformation and electrical conductivity of materials. this study also modeled the relationship between dislocation density and matthiessen's rule. the constitutive equations were subsequently implemented into a crystal plasticity model, thereby calibrating and validating the model. the derived modeling framework considers the 1st and 2nd laws of thermodynamics. the model was then transformed into a circuit model for a monitoring system by formulating equations to analyze the changes in material impedance resulting from the evolution of plastic deformation. this lays the groundwork for creating a moni...
    • sentence_0: mechanism of the failed eruption of an intermediate solar filament
      sentence_1: solar filament eruptions can generate coronal mass ejections (cmes), which are huge threats to space weather. thus, we need to understand their underlying mechanisms. although many authors have studied the mechanisms for several decades, we still do not fully understand in what conditions a filament can erupt to become a cme or not. previous studies have discussed extensively why a highly twisted and already erupted filament will be interrupted and considered that a strong overlying constraint field seems to be the key factor. however, few of them study filaments in the weak field, namely, quiescent filaments, as it is too hard to reconstruct the magnetic configuration there. here we show a case study, in which we can fully reconstruct the configuration of an intermediate filament with the mhd-relaxation extrapolation model and discuss its initial eruption and eventual failure. by analyzing the magnetic configuration, we suggest that the reconnection between the erupting magnetic flux ...
    • sentence_0: long-term earth magnetosphere science orbit with earth-moon resonance orbit
      sentence_1: we introduce the long-term earth magnetosphere science orbits designed to maintain a fixed orientation relative to earth's magnetosphere over extended durations. by leveraging the earth-moon resonant orbits, the spacecraft's argument of periapsis is aligned with the orientation of earth's magnetosphere, thereby enabling continuous observations. three specific earth–moon resonant orbits, characterized by distinct values of the jacobi integral, are identified to exhibit these properties of stable, magnetosphere-aligned evolution. this approach facilitates sustained monitoring of large-scale magnetospheric dynamics and opens new opportunities for focused science objectives. these include studying the interaction between the earth and the moon in shaping magnetospheric boundaries and probing magnetospheric vortices and other transient phenomena. the resultant long-term vantage point—achieved through careful resonance and orbital design—offers a platform for future space weather research, m...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
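MultipleNegativesRankingLoss treats each anchor's paired positive as the correct "class" among all in-batch candidates: cosine similarities are scaled (here by 20.0) and pushed through softmax cross-entropy, so every other pair in the batch acts as a negative. A minimal numpy sketch under those assumptions (toy vectors, not the trained model):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates: row i's positive is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                 # (batch, batch) scaled cosine sims
    m = logits.max(axis=1, keepdims=True)      # numerically stable log-softmax
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

batch = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
print(mnr_loss(batch, batch))                      # near 0: pairs aligned
print(mnr_loss(batch, np.roll(batch, 1, axis=0)))  # larger: pairs shuffled
```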
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.8787 | 500  | 0.216         |
| 1.7575 | 1000 | 0.0434        |

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}