metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:34235
- loss:MultipleNegativesRankingLoss
base_model: zacbrld/MNLP_M2_document_encoder
widget:
- source_sentence: What is This?
sentences:
- >-
Bellcranks are also seen in automotive applications, such as in the
linkage connecting the throttle pedal to the carburetor or connecting
the brake pedal to the master cylinder In vehicle suspensions,
bellcranks are used in pullrod and pushrod suspensions in cars or in the
Christie suspension in tanks More vertical suspension designs such as
MacPherson struts may not be feasible in some vehicle designs due to
space, aerodynamic, or other design constraints; bellcranks translate
the vertical motion of the wheel into horizontal motion, allowing the
suspension to be mounted transversely or longitudinally within the
vehicle
- >-
DynaMo was also used as the face of the BBC's parental assistance
website This was created for parents to assist children with homework
There was also a section called "DynaMo's Den" which included
educational games for children The website was activated on 2 October
1998
- >-
The diode equation above is an example of an element constitutive
equation of the general form,
f
(
v
,
i
)
=
0
{\displaystyle f(v,i)=0}
This can be thought of as a non-linear resistor The corresponding
constitutive equations for non-linear inductors and capacitors are
respectively;
f
(
v
,
φ
)
=
0
{\displaystyle f(v,\varphi )=0}
f
(
v
,
q
)
=
0
{\displaystyle f(v,q)=0}
where f is any arbitrary function, φ is the stored magnetic flux and q
is the stored charge
- source_sentence: algorithm explanation
sentences:
- |-
Descriptive statistics
Average
Mean
Median
Mode
Measures of scale
Variance
Standard deviation
Median absolute deviation
Correlation
Polychoric correlation
Outlier
Statistical graphics
Histogram
Frequency distribution
Quantile
Survival function
Failure rate
Scatter plot
Bar chart
- >-
The various fields and topics that projects engineers are involved with
include:
Work breakdown structure: a deliverable-oriented breakdown of a project
into smaller components
Gantt chart: type of bar chart that illustrates a project schedule
Critical Path Analysis: an algorithm for scheduling a set of project
activities
Program evaluation and review technique: a statistical tool which was
designed to analyze and represent the tasks involved in completing a
given project
Graphical Evaluation and Review Technique: network analysis technique
that allows probabilistic treatment both network logic and estimation of
activity duration
Petri Nets: one of several mathematical modeling languages for the
description of distributed systems
- >-
Jessiko was marketed as a luxury decoration for businesses such as
hotels, restaurants, and museums Tiraby expressed hope that one day it
would be common to find his invention in household ponds and swimming
pools
- source_sentence: >-
The firm was founded as SECOR Ltd in 1994 by John Leeson, Alan Sheppard,
and David Richards After establishing the company in Oxford, United
Kingdom, in 1994, David oversaw the growth of the business from a small UK
operator into an environmental consultancies in the UK, with international
operations across Africa, Australasia, Canada, Europe, and the US
In 2000, the senior management team completed a management buyout and the
company's name was changed to SLR Consulting Limited In 2004 they secured
funding from Livingbridge, who invested £4 85 million as part of a £13
million investment including other partners, and took a significant
minority stake in the company In 2008, 3i invested £32 5 million in the
firm, and replaced Livingbridge with a significant minority stake In March
2018, Charterhouse Capital Partners (CCP) acquired a majority shareholding
in the business In June 2022 Charterhouse Capital Partners agreed to a
sale of SLR Consulting to Ares Management private equity partners David
Richards was Chief Executive Officer from 1994–2013 In line with the
Group's succession plans, Neil Penhall, formerly Managing Director of SLR
Consulting and an Executive Director of SLR Management, assumed the role
of CEO
sentences:
- |-
Institute for Transuranium Elements (ITU)
Institute for the Protection and the Security of the Citizen (IPSC)
Institute for Environment and Sustainability (IES)
Institute for Health and Consumer Protection (IHCP)
Institute for Energy (IE)
Institute for Prospective Technological Studies (IPTS)
- >-
Project NExT was founded by James (Jim) Leitzel (Ohio State University)
and Chris Stevens (Saint Louis University) The first fellows were
selected in 1994 Jim Leitzel died in 1998, and Aparna Higgins
(University of Dayton) and Joe Gallian (University of Minnesota Duluth)
became co-directors of Project NExT Chris Stevens stepped down as
director in 2010, and was succeeded by Aparna Higgins and Joe Gallian
Judith Covington (Louisiana State University, Shreveport) and Gavin
LaRose (University of Michigan) first served as Associate Co-Directors
and later became Co-Directors In 2007, the total number of fellows
surpassed 1000 By 2017 the total number of fellows reached 1700 In 2023
Christine Kelley became director
- >-
Quantum secure communication is a method that is expected to be 'quantum
safe' in the advent of quantum computing systems that could break
current cryptography systems using methods such as Shor's algorithm
These methods include quantum key distribution (QKD), a method of
transmitting information using entangled light in a way that makes any
interception of the transmission obvious to the user Another method is
the quantum random number generator, which is capable of producing truly
random numbers unlike non-quantum algorithms that merely imitate
randomness
- source_sentence: chemical reaction
sentences:
- >-
With suitably encoded scales (multitrack, vernier, digital code, or
pseudo-random code) an encoder can determine its position without
movement or needing to find a reference position Such absolute encoders
also communicate using serial communication protocols Many of these
protocols are proprietary (e g , Fanuc, Mitsubishi, FeeDat (Fagor
Automation), Heidenhain EnDat, DriveCliq, Panasonic, Yaskawa) but open
standards such as BiSS are now appearing, which avoid tying users to a
particular supplier
- >-
Bonneau, Pierre; Allens, Gaspard d' (2020) Cent mille ans Bure ou le
scandale enfoui des déchets nucléaires [One hundred thousand years Bure,
or the buried scandal of nuclear waste] Illustrated by Cécile Guillard
La Revue dessinée - Seuil ISBN 978-2-02-145982-1
- >-
The reason why MACE is heavily researched is that it allows completely
anisotropic etching of silicon substrates which is not possible with
other wet chemical etching methods (see figure to the right) Usually the
silicon substrate is covered with a protective layer such as photoresist
before it is immersed in an etching solution The etching solution
usually has no preferred direction of attacking the substrate, therefore
isotropic etching takes place In semiconductor engineering, however it
is often required that the sidewalls of the etched trenches are steep
This is usually realized with methods that operate in the gas-phase such
as reactive ion etching These methods require expensive equipment
compared to simple wet etching MACE, in principle allows the fabrication
of steep trenches but is still cheap compared to gas-phase etching
methods
- source_sentence: synthesis method
sentences:
- >-
STEMNET used to receive funding from the Department for Education and
Skills Since June 2007, it receives funding from the Department for
Children, Schools and Families and Department for Innovation,
Universities and Skills, since STEMNET sits on the chronological
dividing point (age 16) of both of the new departments
- >-
The Arab States of the Persian Gulf plan to start their own joint
civilian nuclear program An agreement in the final days of the Bush
administration provided for cooperation between the United Arab Emirates
and the United States of America in which the United States would sell
the UAE nuclear reactors and nuclear fuel The UAE would, in return,
renounce their right to enrich uranium for their civilian nuclear
program At the time of signing, this agreement was touted as a way to
reduce risks of nuclear proliferation in the Persian Gulf However,
Mustafa Alani of the Dubai-based Gulf Research Center stated that,
should the Nuclear Non-Proliferation Treaty collapse, nuclear reactors
such as those slated to be sold to the UAE under this agreement could
provide the UAE with a path toward a nuclear weapon, raising the specter
of further nuclear proliferation In March 2007, foreign ministers of the
six-member Gulf Cooperation Council met in Saudi Arabia to discuss
progress in plans agreed in December 2006, for a joint civilian nuclear
program
- >-
Timber framing dates back thousands of years, and has been used in many
parts of the world during various periods such as ancient Japan, Europe
and medieval England in localities where timber was in good supply and
building stone and the skills to work it were not The use of timber
framing in buildings provides their complete skeletal framing which
offers some structural benefits as the timber frame, if properly
engineered, lends itself to better seismic survivability
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on zacbrld/MNLP_M2_document_encoder
This is a sentence-transformers model fine-tuned from zacbrld/MNLP_M2_document_encoder. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: zacbrld/MNLP_M2_document_encoder
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
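The pooling and normalization stages listed above can be illustrated in a few lines. The following is a minimal pure-Python sketch, not the library's implementation: it averages the token vectors whose attention-mask entry is 1 (the `pooling_mode_mean_tokens` setting) and then rescales the result to unit length (the `Normalize()` module). The function names and the toy 3-dimensional vectors are assumptions made for illustration; the real model operates on 384-dimensional tensors.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(vector):
    """Rescale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)     # padding vector is ignored
embedding = l2_normalize(pooled)     # unit-length sentence embedding
```

Because the final embeddings have unit length, the cosine similarity between two of them reduces to a plain dot product.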
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_V1")
# Run inference
sentences = [
'synthesis method',
'STEMNET used to receive funding from the Department for Education and Skills Since June 2007, it receives funding from the Department for Children, Schools and Families and Department for Innovation, Universities and Skills, since STEMNET sits on the chronological dividing point (age 16) of both of the new departments',
'The Arab States of the Persian Gulf plan to start their own joint civilian nuclear program An agreement in the final days of the Bush administration provided for cooperation between the United Arab Emirates and the United States of America in which the United States would sell the UAE nuclear reactors and nuclear fuel The UAE would, in return, renounce their right to enrich uranium for their civilian nuclear program At the time of signing, this agreement was touted as a way to reduce risks of nuclear proliferation in the Persian Gulf However, Mustafa Alani of the Dubai-based Gulf Research Center stated that, should the Nuclear Non-Proliferation Treaty collapse, nuclear reactors such as those slated to be sold to the UAE under this agreement could provide the UAE with a path toward a nuclear weapon, raising the specter of further nuclear proliferation In March 2007, foreign ministers of the six-member Gulf Cooperation Council met in Saudi Arabia to discuss progress in plans agreed in December 2006, for a joint civilian nuclear program',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
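For semantic search, one typically compares a query embedding against document embeddings and sorts by score. The helper below is a hypothetical sketch (`rank_documents` is not part of Sentence Transformers). It takes plain lists of floats, for example rows obtained with `embeddings.tolist()` assuming the default NumPy output, and uses made-up 2-dimensional vectors so that it runs without downloading the model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
documents = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

ranking = rank_documents(query, documents)
best_index, best_score = ranking[0]
```

In the toy data the third document points in the same direction as the query, so it is ranked first regardless of its length.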
Training Details
Training Dataset
Unnamed Dataset
- Size: 34,235 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 3 tokens; mean: 21.24 tokens; max: 256 tokens | min: 34 tokens; mean: 133.62 tokens; max: 256 tokens |
- Samples:
| sentence_0 | sentence_1 |
|---|---|
| chemistry experiment | Since 1982, research has been conducted to develop technologies, commonly referred to as electronic noses, that could detect and recognize odors and flavors Application areas include food, medicine and the environment |
| quantum physics | Hydro electric - Hydro-electric turbomachinery uses potential energy stored in water to flow over an open impeller to turn a generator which creates electricity Steam turbines - Steam turbines used in power generation come in many different variations The overall principle is high pressure steam is forced over blades attached to a shaft, which turns a generator As the steam travels through the turbine, it passes through smaller blades causing the shaft to spin faster, creating more electricity Gas turbines - Gas turbines work much like steam turbines Air is forced in through a series of blades that turn a shaft Then fuel is mixed with the air and causes a combustion reaction, increasing the power This then causes the shaft to spin faster, creating more electricity Windmills - Also known as a wind turbine, windmills are increasing in popularity for their ability to efficiently use the wind to generate electricity Although they come in many shapes and sizes, the most common one is the la... |
| physics law | Backlash in gear couplings allows for slight angular misalignment There can be significant backlash in unsynchronized transmissions because of the intentional gap between the dogs in dog clutches The gap is necessary to engage dogs when input shaft (engine) speed and output shaft (driveshaft) speed are imperfectly synchronized If there was a smaller clearance, it would be nearly impossible to engage the gears because the dogs would interfere with each other in most configurations In synchronized transmissions, synchromesh solves this problem However, backlash is undesirable in precision positioning applications such as machine tool tables It can be minimized by choosing ball screws or leadscrews with preloaded nuts, and mounting them in preloaded bearings A preloaded bearing uses a spring and/or a second bearing to provide a compressive axial force that maintains bearing surfaces in contact despite reversal of the load direction |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
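To make the loss concrete: MultipleNegativesRankingLoss treats, for each `sentence_0`, its paired `sentence_1` as the positive and the other `sentence_1` entries in the same batch as negatives, then applies cross-entropy to the scaled cosine similarities. The pure-Python sketch below mirrors that idea with `scale = 20.0`. It is an illustration under these assumptions, not the library's code, and the function names and toy vectors are hypothetical.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(anchors)

# Well-separated pairs: each anchor matches only its own positive.
separated = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])

# Indistinguishable pairs: every candidate scores the same, so the
# loss equals log(batch size).
confused = mnr_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
```

The scale factor sharpens the softmax: with cosine scores confined to [-1, 1], multiplying by 20 lets a clearly better positive dominate the in-batch negatives.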
Training Hyperparameters
Non-Default Hyperparameters
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
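With `lr_scheduler_type: linear`, `warmup_steps: 0` and `learning_rate: 5e-05`, the learning rate decays in a straight line from its initial value to zero over the run. The sketch below reproduces that shape in plain Python. The total of 8,560 optimizer steps is an assumption (2 epochs of ceil(34235 / 8) steps on a single device), and `linear_lr` is a hypothetical helper, not a Transformers function.

```python
def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 8560  # assumed: 2 epochs * 4280 steps per epoch

lr_start = linear_lr(0, TOTAL_STEPS)      # full learning rate
lr_half = linear_lr(4280, TOTAL_STEPS)    # halfway through training
lr_end = linear_lr(8560, TOTAL_STEPS)     # decayed to zero
```

Under these assumptions the second epoch runs entirely at less than half the initial learning rate.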
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.1168 | 500 | 1.465 |
| 0.2336 | 1000 | 1.189 |
| 0.3505 | 1500 | 1.1209 |
| 0.4673 | 2000 | 1.0333 |
| 0.5841 | 2500 | 0.993 |
| 0.7009 | 3000 | 0.9573 |
| 0.8178 | 3500 | 0.9275 |
| 0.9346 | 4000 | 0.9177 |
| 1.0514 | 4500 | 0.8241 |
| 1.1682 | 5000 | 0.7726 |
| 1.2850 | 5500 | 0.7685 |
| 1.4019 | 6000 | 0.7623 |
| 1.5187 | 6500 | 0.7668 |
| 1.6355 | 7000 | 0.7556 |
| 1.7523 | 7500 | 0.7002 |
| 1.8692 | 8000 | 0.7363 |
| 1.9860 | 8500 | 0.7396 |
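The Epoch column follows from the dataset size and the batch size. As a quick check, assuming a single training device (the card does not state the device count), 34,235 samples at `per_device_train_batch_size: 8` with `dataloader_drop_last: False` give 4,280 steps per epoch, and dividing a step number by that value matches the logged epoch fractions.

```python
import math

DATASET_SIZE = 34235
BATCH_SIZE = 8  # per_device_train_batch_size; single device assumed

# The last partial batch is kept (dataloader_drop_last: False), hence ceil.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

first_logged = epoch_at(500)       # first row of the table
last_logged = epoch_at(8500)       # last row of the table
total_steps = 2 * steps_per_epoch  # num_train_epochs: 2
```

The run therefore ends shortly after the last logged row, since logging happens every 500 steps.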
Framework Versions
- Python: 3.12.8
- Sentence Transformers: 3.4.1
- Transformers: 4.52.2
- PyTorch: 2.7.0+cu126
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}