SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-multilingual-base on the offshore_energy_v1 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-multilingual-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: offshore_energy_v1

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'NewModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
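
The module stack above amounts to three steps: the transformer yields one vector per token, the Pooling module (with pooling_mode_cls_token set to True) keeps only the first token's vector, and Normalize() rescales that vector to unit L2 length. Below is a minimal pure-Python sketch of the last two steps. It uses a made-up 2-dimensional toy input rather than real 768-dimensional model output, and cls_pool_and_normalize is a hypothetical name, not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token vector and scale it to unit L2 norm.

    token_embeddings: list of per-token vectors (lists of floats) for one input.
    """
    cls_vector = token_embeddings[0]          # CLS pooling: first token only
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)               # avoid division by zero
    return [x / norm for x in cls_vector]

# Toy example: 3 tokens, 2 dimensions (the real model uses 768 dimensions).
toy_tokens = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
embedding = cls_pool_and_normalize(toy_tokens)
print(embedding)  # [0.6, 0.8]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.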

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
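# trust_remote_code=True is likely required: the 'NewModel' architecture is loaded from custom code on the Hub
model = SentenceTransformer("Sampath1987/EnergyEmbed-nv1", trust_remote_code=True)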
# Run inference
sentences = [
    'How does the predictive reservoir effectiveness model aid in the exploration of the Winduck Interval?',
    'The latest Silurian to Early Devonian Winduck Interval of the extensive but poorly exposed Neckarboo Sub-basin, consists of several thousands of metres of a quartzose siliciclastic sandstone succession that has been divided into three sequence divisions called (in ascending parasequence order) parasequence A (coarse-grained quartz sandstone), parasequence B (fining-upward succession of sandstone with siltstone and sandstone beds thicken upward) and parasequence C (coarse-grained quartz sandstone with siltstone and interbedded calcareous sandstones). These three geophysically defined parasequences are separated by slightly discordant erosion surfaces. The erosion surfaces are characterised by abrupt breaks at the top of parasequences A and B and the surface at the top of parasequence B represents relatively local erosion. The top of parasequence C is marked by a major unconformity with the Snake Cave Interval. Gamma ray and self-potential signatures within the parasequences can be correlated throughout the Neckarboo Sub-basin. The three sequence divisions are further subdivided into depositional parasequences, which are readily recognised from core sedimentology and electrofacies analysis. The parasequences provide the framework for a detailed sedimentological analysis, which focuses on the identification of lithofacies successions and parasequences. Petrophysical data are recorded and their relationships to the depositional parasequences are discussed. This paper presents a predictive reservoir effectiveness model that has been developed to aid exploration of the Winduck Interval. The aim is to find the distribution of parasequences (based on variations in porosity, net effective thickness and lithofacies with burial depth) and to provide a dataset for lithostratigraphic units within the Winduck Interval and parameter input for exploration prospect evaluation. Parasequence stratigraphic analyses were obtained where there is good lithofacies control.\nThe porosity and permeability results have been analyzed in a number of parasequences and poor reservoir quality may be due to the effects of structure and fluid flow. This approach provides for better and more precise stratigraphic trap analysis.',
    'In this multi-Tcf subsea gas development off the North West coast of Australia, reservoir simulation supports the key business decisions and processes. An important factor when providing production forecasts is ensuring that a range of possible outcomes (low-mid-high) are captured accurately by the models. The output from these models may then be used by decision makers for evaluating different developments and scenarios. The design of experiments (DoE) is commonly employed to aid the evaluation of subsurface uncertainties and characterise the impact and influence to key model outcomes supporting development decisions.\nField production performance is often driven by uncertainty in reservoir outcome. This paper is helpful to practitioners involved in any computer modelling of petroleum reservoirs who are interested in capturing the uncertainty inherent in a field and building an appropriate workflow for the development and sensitivity of a range of models. Both model building and using DoE to evaluate developments and Value of Information (VoI) studies for reservoir management will be shared. Integrated DoE focusing on static, dynamic and well-based uncertainties will be illustrated.\nResults will cover:\n–\nLessons learned and best practices using ED (Experimental Design) to generate low-mid-high reservoir simulation models\n–\nUnderstanding reservoir and well based uncertainties separately\n–\nEvaluating incremental field developments using ED\n–\nUtilizing ED to anticipate range of surveillance responses\nFew papers exist on the integrated application of ED to giant gas fields using reservoir simulation. Firstly, this case study will highlight some pitfalls to avoid during the workflow. Secondly, the authors will discuss the important issue of how to integrate or separate static, dynamic, well and facility based uncertainties. Thirdly, the work will show the unique application of ED in VoI and field development scoping.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6207, 0.1418],
#         [0.6207, 1.0000, 0.0860],
#         [0.1418, 0.0860, 1.0000]])
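
For semantic search, each row of the similarity matrix can be sorted to rank the candidates for one query. The small sketch below reuses the scores printed above and treats sentence 0 (the question) as the query. rank_candidates is a hypothetical helper, and plain Python lists stand in for the tensor returned by model.similarity.

```python
def rank_candidates(similarity_row, query_index):
    """Return (index, score) pairs sorted by descending score, skipping the query itself."""
    pairs = [(i, s) for i, s in enumerate(similarity_row) if i != query_index]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Scores copied from the example output above.
similarities = [
    [1.0000, 0.6207, 0.1418],
    [0.6207, 1.0000, 0.0860],
    [0.1418, 0.0860, 1.0000],
]

ranking = rank_candidates(similarities[0], query_index=0)
print(ranking)  # [(1, 0.6207), (2, 0.1418)]
```

The Winduck Interval abstract (index 1) ranks well above the unrelated reservoir-simulation abstract (index 2), which is the behavior expected for this query.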

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.98
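
In a triplet evaluation, cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor is closer, by cosine similarity, to the positive than to the negative. The sketch below shows that calculation on made-up 2-dimensional vectors. The toy triplets are illustrative only and are not drawn from offshore_energy_v1, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: wrong
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),  # positive is closer: correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```

Read on the same scale, the reported 0.98 means that about 98 of every 100 evaluation triplets were ordered correctly.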

Training Details

Training Dataset

offshore_energy_v1

  • Dataset: offshore_energy_v1 at d4682d4
  • Size: 44,838 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor          positive         negative
    type        string          string           string
    min         13 tokens       33 tokens        45 tokens
    mean        24.54 tokens    430.25 tokens    423.92 tokens
    max         46 tokens       1027 tokens      1204 tokens
  • Samples:
    • Sample 1
      anchor: What benefits were realized through the adoption of remote operations services in the North Sea?
      positive: The North Sea has always been a pioneer for the adoption of remote operations services (ROS) in offshore drilling applications. Drilling services such as Measurement While Drilling (MWD), Logging While Drilling (LWD) and/or mud logging (ML) have been performed with an element of ROS for over the last two decades. Early adoption of these remote services delivered initial benefits to operators such as reducing HSE risks related to the travel and accommodation of field service employees at offshore rig sites. Meanwhile service companies were able to explore the added efficiencies gained by having multi-skilled employees providing a higher level of support to customers while also gaining additional agility to manage their personnel through tighter market cycles. The mutual benefit of this early adoption created a solid foundation for ROS to expand the scope of influence in drilling operations to include Directional Drilling (DD).
      Despite the maturity of ROS within a select community of ope...
      negative: A new program for the development of graduate engineers has been implemented in Denmark on a stimulation vessel in the North Sea. It is designed to provide graduate engineers with a three-year period of extensive experience in offshore operations, knowledge of equipment and designing effective stimulation jobs. There are many components to the program that address training, skills, demonstration of capabilities and evidence of competence. These are essential components that ultimately lead to improved operational performance and highlights.
      The North Sea oil and gas industry requires a constant effort to maintain the engineering skills of its offshore workers so vital to continued success. Paradoxically, there are numerous factors that hinder on site development of young engineering talent in the North Sea. There is a lack of offshore accommodation that often restricts onsite time for trainees. This is exacerbated by a low frequency of many operations compared to other provinces in the...
    • Sample 2
      anchor: What is the estimated storage capacity for CO2 in the analyzed study area?
      positive: The oil and gas industry is a significant contributor to carbon dioxide (CO2) emissions, which have a major impact on climate change. Geoscientists in the industry play a crucial role in mitigating climate change by identifying and evaluating potential CO2 storage sites, monitoring CO2 behavior after injection, and exploring CO2 enhanced oil recovery (EOR) techniques. CO2 -EOR involves injecting CO2 into depleted oil reservoirs to increase oil production. Reservoir characterization using well log and seismic data analysis helps determine storage capacity, containment, and injectivity of reservoirs for CO2 sequestration and EOR. In this study, two sand reservoirs (RES 1 and RES 2) were analyzed, with RES 2 being considered more suitable for CO2 sequestration and CO2 -EOR. The estimated storage capacity of the study area was approximately 40 million metric tons (MT). Assessments of fault sealing capacity and reservoir properties were conducted to validate storage potential. Further inves...
      negative: Transported and geologically stored CO2 contains several impurities that depend on its source and associated capture technology. Impurities in anthropogenic CO2 can have damaging impacts on the different elements of a CCS system, which must be considered when developing a CO2 specification (Table 1). Thus, characterising all the impurities and determining the required purity of the CO2 mixture is critically important for the safe design and operation of CCS transport and storage systems.
      It is important to note that CO2 specifications relate to normal operations. Short-term excursions outside of the recommended maximum concentrations for each impurity may be permissible provided they do not lead to health and safety risks and / or risks to the mechanical integrity of the asset.
    • Sample 3
      anchor: What is the role of a Preventive Maintenance Program (PMP) in enhancing the reliability of Electrical Submersible Pumps (ESPs)?
      positive: The reliability of Electrical Submersible Pumps (ESPs) is a critical target for companies managing artificially lifted fields. While efforts to continuously improve the reliability in the downhole system are crucial, it is necessary to focus on the health and long-term reliability of the ESP surface equipment. One effective approach toward achieving this goal is through conducting a comprehensive Preventive Maintenance Program (PMP) for the different components of the ESP surface system.
      An ESP PMP should be managed without jeopardizing production strategy. The design of the PMP must meet the production demand while maintaining the best-in-class PMP practices. The well operating condition, frequency, weather, well location, required periodic inspection and preemptive servicing and replacement of surface equipment components must be considered, based on studied criterion. The design of the PMP considers equipment upgrades and thermal imaging surveillance to guarantee healthy electrical ...
      negative: A family of exciting new Electric Submersible Pump (ESP) technologies promises to radically improve the development economics of many oilfields and field extensions. This technology is particularly relevant to prospects in the range 5-100 million barrels reserves, which are located greater than 15 kilometres from existing platforms and often suffer uncertainties on reservoir performance (pressure, sweep, heterogeneities inflow performance etc.). Prospects in that category generally offer mediocre to inadequate economics or unacceptable risks of ‘downside’ potential. Platform development entails untenable capex exposure, whereas conventional subsea development (e.g. by gas lift) will result in very inferior production performance.
      The new technologies which ‘unlock’ the economics of such fields are:
      Viable subsea ESP technology is available now and will be field proven during 1994/95.
      Proven high reliability pump systems are now available, underwritten by performance contract.
      Bottom di...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
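
MultipleNegativesRankingLoss treats, for each anchor, its own positive as the correct answer and every other positive in the batch (plus the supplied negatives) as distractors. It then applies cross-entropy to the cosine similarities multiplied by scale. The pure-Python sketch below follows that description with scale = 20.0 and cosine similarity, as in the parameters above. It is an illustration on toy 2-dimensional vectors, not the library's implementation, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should pick positives[i] among all candidates."""
    candidates = positives + negatives          # in-batch positives, then hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        max_logit = max(logits)                 # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]        # equals -log softmax(logits)[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = mnr_loss(anchors, positives, negatives)        # positives aligned with their anchors
bad = mnr_loss(anchors, positives[::-1], negatives)   # positives swapped between anchors
print(good, bad)
```

With aligned pairs the loss is close to zero, while swapping the positives pushes it to roughly the scale value, since the correct candidate then trails the best distractor by a cosine gap of 1.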
    

Evaluation Dataset

offshore_energy_v1

  • Dataset: offshore_energy_v1 at d4682d4
  • Size: 5,604 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor          positive         negative
    type        string          string           string
    min         14 tokens       47 tokens        56 tokens
    mean        24.45 tokens    440.51 tokens    426.21 tokens
    max         41 tokens       1091 tokens      1152 tokens
  • Samples:
    • Sample 1
      anchor: What is the role of nanocrystalline cellulose (NCC) in the formulation of hydraulic fracturing fluids?
      positive: Guar gum and its derivative based-gels cross-linked with boron have been used in hydraulic fracturing for decades. In order to achieve gel strength requirements, conventional fracturing requires the use of a large amount of thickener and cross-linking agent, which results in more residue and difficulty in the recovery of permeability. At the same time, the gel can be used to achieve the best thermal stability in a high pH environment. Therefore, we proposed a highly efficient organoboron nanocellulose cross-linker for low polymer loading fracturing fluids.
      Nanocrystalline cellulose (NCC) resulted from sulfuric acid hydrolysis of cellulose microciystalline. Boron-modified nanoparticles were synthesized by one-pot reaction as nano boron cross-linker (NBC). Nanocrystalline cellulose (NCC), (3-Aminopropyl) triethoxysilane, Organic boron (OBC) was mixed at a ratio of 1:4:4 and stirred at a constant temperature of 85°C for 5 hours. The presence of surface modification was shown with FTIR spe...
      negative: The unstable wellbore created by the infiltration of drilling fluids into the reservoir formation is a great challenge in drilling operations. Reducing the fluid infiltration using nanoparticles (NPs) brings about a significant improvement in drilling operation. Herein, a mixture of iron oxide nanoparticle (IONP) and polyanionic cellulose nanoparticle (nano-PAC) additives were added to water-based mud (WBM) to determine their impact on rheological and filtration properties measured at 80 °F, 100 °F, and 250 °F. Polyanionic cellulose (PAC-R) was processed into nano-PAC by wet ball-milling process. The rheological behaviour, low-pressure low-temperature (LPLT), and high-pressure high-temperature (HPHT) filtration properties performance of IONP, nano-PAC, and IONP and nano-PAC mixtures were compared in the WBM. The results showed that IONP, nano-PAC, and synergy effect of IONP and nano-PAC in WBM at temperatures of 80 °F and 250 °F improved the density, 10-s and 10-min gel strength (10-s ...
    • Sample 2
      anchor: What is the definition of tail gas in oil and gas engineering processes?
      positive: #### T
      Tail gas
      Effluent gas at the end of a process.
      Technical Potential
      The amount by which it is possible to reduce greenhouse gas emissions by implementing a
      technology or practice that has reached the demonstration phase.
      Tectonically active area
      Area of the Earth where deformation is presently causing structural changes.
      Thermocline
      The ocean phenomenon characterized by a sharp change in temperature with depth.
      Thermohaline
      The vertical overturning of water masses due to seasonal heating, evaporation, and cooling.
      Third party
      Entity that is independent of the parties involved with the issues in question Top-down model.
      A model based on applying macro-economic theory and econometric techniques to historical
      data about consumption, prices, etc.
      Tracer
      A chemical compound or isotope added in small quantities to trace flow patterns.
      negative: 36
      SUSTAINABILITY REPORTING GUIDANCE FOR THE OIL AND GAS INDUSTRY
      Particulate matter: A complex mixture of small particles or droplets such as salts, organic
      chemicals, metals and soil particles [ENV-5].
      Petrochemicals: Chemical products derived from oil and gas.
      Pipelines: Construction and use of facilities to transport liquid or gaseous hydrocarbons
      over long distances in above-ground, below-ground or underwater pipes.
      Primary containment: The vessel, pipe, barrel, equipment or other barrier that is designed
      to keep a material within it [ENV-6, ENV-7, SHS-6].
      Primary energy: The energy content of a hydrocarbon fuel or other energy source used to
      produce power, usually in the form of electricity, heat or steam [CCE-6].
      Process safety: A systematic approach to ensuring the safe containment of hazardous
      materials or energy by applying good design, construction and operating principles [SHS-6].
      In this Guidance, this term is used synonymously with Asset i...
    • Sample 3
      anchor: How is dense phase acid gas injected back into the formation to mitigate environmental impacts?
      positive: A systematic hazard management approach was used to identify, assess and mitigate hazards at the conceptual design stage of a large onshore sour gas development in Abu Dhabi. The potential environmental impact of sulphur block production and poor prospects of a sulphur market led to a concept involving injection of dense phase acid gas back into the formation. Significant Health, Safety and Environmental (HSE) challenges were addressed relating to the scale of the sour gas development which included the gathering, processing and injection of sour/acid gas containing 33% – 80% H2S. Quantitative Risk Assessment and H2S dispersion calculations were performed to evaluate the risk reduction effectiveness of specific HSE design considerations including material selection, pipeline design, pipeline routing, well design and the location of the processing facility and sour/acid gas wells. These HSE design considerations were integrated into the concept selection. Best industry practices in desi...
      negative: Nowadays, as the deep gas reservoirs in Daqing are explored, the complex volcanic reservoirs have been the major reservoirs in deep natural gas exploration and production. The reserves of volcanic gas reservoirs take up 88% of the total gas reserves. However, the deep complex gas reservoirs may cause heavy pollution during the drilling completion, and some of the barriers between target zones of the wells are very thin, leading to a poor stability. Additionally, because of the complex water/gas relations in the formation, such as appearance of bottom water and water and gas sharing the same formation in some wells, the fracturing operations will induce water channeling. All these facts may cause the failure of the fracturing operations.
      Especially, when the fractured formation is close to the water/gas interface, the fractures will easily extend into the water layer. The existence of water in the gas wells directly leads to the reduction of production and recovery rate of the gas reser...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
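
Taken together, learning_rate, warmup_ratio, and the linear scheduler type listed under All Hyperparameters describe a learning rate that ramps up from zero and then decays linearly back to zero. The sketch below rests on two assumptions made for illustration: that training took 2,803 optimizer steps in total (44,838 training samples at batch size 16 on a single device with no gradient accumulation, rounded up), and that the warmup length is the ratio times that total, rounded up. lr_at is a hypothetical helper.

```python
import math

BASE_LR = 2e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = math.ceil(44_838 / 16)                  # assumed: 1 epoch, 1 device -> 2803
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # assumed rounding -> 281

def lr_at(step):
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)

for step in (0, 140, WARMUP_STEPS, 1000, 2000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the peak rate of 2e-05 is reached at step 281 and the rate returns to zero at the final step.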

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Validation Loss  ai-job-validation_cosine_accuracy
0.3568  1000  0.0982           0.9764
0.7135  2000  0.0870           0.9800
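
The Epoch column can be sanity-checked against the dataset size and batch size reported earlier. Assuming training ran on a single device with no gradient accumulation (gradient_accumulation_steps is listed as 1, but the device count is an assumption), one epoch is 44,838 / 16 steps rounded up, and the logged epoch is the step number divided by that count.

```python
import math

TRAIN_SAMPLES = 44_838
BATCH_SIZE = 16            # per_device_train_batch_size; a single device is assumed

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)

def epoch_at(step):
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch)   # 2803
print(epoch_at(1000))    # 0.3568
print(epoch_at(2000))    # 0.7135
```

Both values match the logged rows, which is consistent with the single-device assumption. The last logged evaluation falls at about 71% of the single training epoch.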

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 5.1.0
  • Transformers: 4.53.3
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.3B params (Safetensors, tensor type F32)