metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:281342
- loss:CachedMultipleNegativesRankingLoss
widget:
- source_sentence: >-
ez buy federal 556 nato xm855 62 grain green tip fmj us bulk 223 ammo for
sale cheap bulk ammunition in fmj and jhp depotus american eagle 223556
fmj rebate 400 maximum per household case upcs accepted for mailin us
identifiers is xm855lpc120rebate30 category of toolsandhomeimprovement
sentences:
- >-
ez buy federal 556 nato xm855 62 grain green tip fmj us bulk 223 ammo
for sale cheap bulk ammunition in fmj and jhp depotus american eagle
223556 fmj rebate 400 maximum per household case upcs accepted for
mailin us identifiers is xm855lpc120rebate30 category of
toolsandhomeimprovement
- >-
3m cable for dlink 10gbe cx4 module demcb300cx 3m module dlink list
retailers identifiers is demcb300cx category of otherelectronics
- >-
seat frame wiring harnessd 02082010 gb 2013 volkswagen golf china market
electrics harness 4doorright gb identifiers is 1k4971369f category of
automotive
- source_sentence: >-
epson t33xl photo black inkjet cartridge ink colours photo black
identifiers is 8715946600598 category of officeproducts key specifications
are attributes ink colours photo black volume 81ml newremanufactured new
single or multi colour cartridge compatibility compatible brand epson
printers expression premium xp530xp630xp635xp830 originalcompatible
original manufacturer general number t33614010 c13t33614012 model name
33xl physical form factor pack quantity 1 pieces recycling information can
i recycle it click here for details on how to recycle
sentences:
- >-
rear child seat support bolt 516x35 belts upholstery page 1 1994 bmw
325i base sedan seats produced by genuine bmw identifiers is
72111922499boe category of automotive
- >-
epson claria 33xl ink cartridge photo black inkjet 400 page 1 blister
pack c13t33614010 novatech inkjet 400 page 1 blister pack produced by
epson identifiers is eps101979 category of officeproducts
- >-
maglite xenon replacement lamps for 2cell aa flashlights 2packus the
maglite xenon replacement lamps for 2cell aa flashlights 2pack help keep
your mini shining mini not included are highintensity bulbs and come in
a package of 2 convenience us produced by maglite us identifiers is
100045339 category of toolsandhomeimprovement
- source_sentence: >-
control arm front right lower 1993 bmw 325i base sedan suspension shocks
springs page 8 produced by genuine bmw identifiers is 31122339996boe
category of automotive
sentences:
- >-
vehicle jump starter jumpncarry 660 note 1700 peak amps 425 cranking
clore proformer battery technology 46 2 awg welding cable leads
industrialgrade clamps builtin charger automatic charging voltmeter
provides charge status of onboard battery 12v dc outlet to power
accessories 1991 bmw 325i base sedan charging system page 2 note 1700
peak amps 425 cranking clore proformer battery technology 46 2 awg
welding cable leads industrialgrade clamps builtin charger automatic
charging voltmeter provides charge status of onboard battery 12v dc
outlet to power accessories produced by null identifiers is jnc660m1313
category of automotive
- >-
hose clamp 1628 mm range 9 width screw type 1997 bmw 318is base coupe
radiators page 2 produced by norma identifiers is 64218367179m249
category of automotive
- >-
control arm front right lower 1998 bmw 318ti base hatchback suspension
shocks springs page 6 produced by delphi identifiers is 31122339996m292
category of automotive
- source_sentence: >-
sunvisor support bracket interior right sideus chevy parts sunvisor
interior this is the interior sunvisor support bracket for right side with
screws and a template passenger sideus identifiers is 986155r4753 category
of automotive
sentences:
- >-
chevrolet sunvisor support bracket interior right sideus chevy parts
interior sunvisors chevs of the 40sus this is the interior sunvisor
support bracket for right side with screws and a template passenger side
1947 1948 1949 1950 1951 1952 1953 chevrolet trucks us identifiers is
986155r4753 category of automotive
- >-
abbey round 30 wall mirror in frameless 791888045671 guildhall 8quote 1
light sconce dutch goldantique sale home lighting fixtures lamps more
online this stylish silver wall mirror will introduce a modern feel to
any room its sleek design makes the versatile and distinct in frameless
produced by cooper classics identifiers is 4567upc791888045671 category
of toolsandhomeimprovement
- >-
10700 series return shell 30w x 20d 2912h henna cherry hon107270xjj home
office desks page 601 furniture town 10700 series return shell 30w x 20d
29 12h henna cherry produced by mydirectadvantage identifiers is
hon107270xjj107270xjj category of officeproducts
- source_sentence: >-
paint sealant sonax profiline polymer net shield 75 ml aerosol can 1994
bmw 318is base coupe miscellaneous page 24 note innovative surface
protection based on hybrid polymers protects the paintwork by means of a
resistant network made from organic and inorganic components can be
applied quickly easily intensively freshens up paint color produces silky
smooth with an outstanding drip off effect one 75 ml should complete
average size car produced by sonax identifiers is 223000m941 category of
automotive
sentences:
- >-
paint sealant sonax profiline polymer net shield 75 ml aerosol can 1991
bmw 325i base convertible miscellaneous page 23 note innovative surface
protection based on hybrid polymers protects the paintwork by means of a
resistant network made from organic and inorganic components can be
applied quickly easily intensively freshens up paint color produces
silky smooth with an outstanding drip off effect one 75 ml should
complete average size car produced by sonax identifiers is 223000m941
category of automotive
- >-
honeywell accessories for terminal cod99exmb12 honeywell cod871238012
honeywell dolphin 99ex mobile base vehicle kit charging cradle rs232
universal mounting bracket and 12v cigarette lighter power adapter
produced by honeywell metrologic identifiers is 99exmb12 category of
computersandaccessories
- >-
usb flash drives hard quillcom null identifiers is 901507043 category of
computersandaccessories
SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
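The architecture listing describes three stages: token embeddings from the transformer, mean pooling over tokens (`pooling_mode_mean_tokens: True`), and normalization. The following is a minimal NumPy sketch of the last two stages only, not the library's implementation. The random `token_embeddings`, the toy `attention_mask`, and the helper name `mean_pool_and_normalize` are illustrative assumptions.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # guard against empty sequences
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input (assumption): 2 sequences, 4 token positions, 768-dimensional vectors.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding
sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

Because padding positions are masked out before averaging, the values stored at those positions do not influence the sentence embedding.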
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'paint sealant sonax profiline polymer net shield 75 ml aerosol can 1994 bmw 318is base coupe miscellaneous page 24 note innovative surface protection based on hybrid polymers protects the paintwork by means of a resistant network made from organic and inorganic components can be applied quickly easily intensively freshens up paint color produces silky smooth with an outstanding drip off effect one 75 ml should complete average size car produced by sonax identifiers is 223000m941 category of automotive',
'paint sealant sonax profiline polymer net shield 75 ml aerosol can 1991 bmw 325i base convertible miscellaneous page 23 note innovative surface protection based on hybrid polymers protects the paintwork by means of a resistant network made from organic and inorganic components can be applied quickly easily intensively freshens up paint color produces silky smooth with an outstanding drip off effect one 75 ml should complete average size car produced by sonax identifiers is 223000m941 category of automotive',
'honeywell accessories for terminal cod99exmb12 honeywell cod871238012 honeywell dolphin 99ex mobile base vehicle kit charging cradle rs232 universal mounting bracket and 12v cigarette lighter power adapter produced by honeywell metrologic identifiers is 99exmb12 category of computersandaccessories',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
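The similarity scores above are cosine similarities. As a rough illustration of how such a matrix can be used to pick the closest match for a query, here is a NumPy sketch on made-up 3-dimensional vectors. The vectors and the helper `cosine_similarity_matrix` are assumptions for illustration and stand in for real model output.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy embeddings (assumption): rows 0 and 1 point in similar directions, row 2 does not.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
similarities = cosine_similarity_matrix(embeddings, embeddings)

# Find the best match for row 0, excluding row 0 itself.
scores = similarities[0].copy()
scores[0] = -np.inf
best_match = int(np.argmax(scores))
print(best_match, similarities[0, best_match])
```

Since the model ends with a `Normalize()` module, its embeddings should already be unit length, in which case the cosine similarity reduces to a plain dot product.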
Training Details
Training Dataset
Unnamed Dataset
- Size: 281,342 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| min | 28 tokens | 28 tokens |
| mean | 81.11 tokens | 82.01 tokens |
| max | 384 tokens | 384 tokens |
- Samples:
| anchor | positive |
|---|---|
| honeywell hand held products dolphin 99509951 series mobile computer usb cable 6 ft 18m 80000355e usb cable 6 ft 18m identifiers is 80000355e category of computersandaccessories | hand held usb cable 6 ft hand ft 80000355e scanner accessories cdwcom hand held products is the leading provider of imagebased data collection solutions for mobile wireless and transaction processing applications to end users throughout world by investing in hhp products its customers are able reduce costs improve service position their companies future growth identifiers is 26121604 category of computersandaccessories |
| intake boot air mass sensor to throttle housing 1995 bmw 318i base convertible intake system page 2 note from 0994 produced by oem identifiers is 13711247829m58 category of automotive | intake boot air mass sensor to throttle housing 1995 bmw 318i base convertible intake system page 2 produced by crp identifiers is 13711247829int category of automotive |
| blue sky panorama with transparent clouds vector image sky images over 150 000 vector blue sky panorama with transparent clouds vector background image identifiers is 15266707 category of officeproducts | blue sky panorama with transparent clouds vector image images within landscapes nature over 55 000 vector blue sky panorama with transparent clouds vector background image identifiers is 15266707 category of officeproducts |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
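To make the loss parameters above concrete, the following NumPy sketch computes a multiple negatives ranking loss value: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. This is a simplified illustration under assumed toy inputs, not the library's code. The cached variant named in the card additionally uses gradient caching (Gao et al., 2021) to allow larger effective batches, which is not shown here.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct positive for anchor i sits at column i.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch (assumption): 4 mutually orthogonal anchor/positive pairs.
anchors = np.eye(4)
aligned_loss = multiple_negatives_ranking_loss(anchors, np.eye(4))
# Shift the positives by one row so every anchor is paired with the wrong text.
shuffled_loss = multiple_negatives_ranking_loss(anchors, np.roll(np.eye(4), 1, axis=0))
print(aligned_loss, shuffled_loss)
```

With a scale of 20, a correctly aligned batch yields a loss close to zero, while a mismatched batch is penalized heavily.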
Evaluation Dataset
Unnamed Dataset
- Size: 70,336 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| min | 28 tokens | 25 tokens |
| mean | 84.74 tokens | 87.69 tokens |
| max | 384 tokens | 384 tokens |
- Samples:
| anchor | positive |
|---|---|
| heater hose inlet from cylinder head to water valve 1997 bmw 318i base sedan heater system page 3 produced by genuine bmw identifiers is 64211394295boe category of automotive | heater hose inlet from cylinder head to water valve 1996 bmw 318i base convertible heater system page 3 produced by genuine bmw identifiers is 64211394295boe category of automotive |
| harris harris group inc group 1 full quote netdaniacom produced by source nasdaq identifiers is isinus4138331040 category of toolsandhomeimprovement | harris harris group inc 1 statistics netdaniacom group produced by source nasdaq identifiers is isinus4138331040 category of toolsandhomeimprovement |
| swiffer dusters with extendable handledusters plastic handle extends to 3 ft 1 per kit handledusters ft kitpag82074 buy online at janeice products identifiers is pag82074 category of toolsandhomeimprovement key specifications are weight per case std pkg quantity package one handle and three dusters description includes item cube 008276 upc code 037000447504 pack 00037000820741 length 092 width 022 height 042 04766 pack value bundle | pag82074 dusters plastic handle extends to 3 ft 1 dusters per kitus feather page 5 the janitorial marketus now its easier than ever to get those hardtoreach places pivoting head can be adjusted and locked into place for cleaning angled surfaces such as ceiling fans cabinet corners baseboards refill dusters sold separately one handle three per box bristle material fiber color white plastic greenus produced by pag82074us identifiers is pag82074 category of toolsandhomeimprovement |
- Loss: CachedMultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- learning_rate: 1e-05
- num_train_epochs: 2
- warmup_ratio: 0.1
- fp16: True
- auto_find_batch_size: True
- batch_sampler: no_duplicates
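The combination of `learning_rate: 1e-05`, `warmup_ratio: 0.1`, and the linear scheduler listed among the full hyperparameters implies a learning rate that rises linearly during the first 10% of steps and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The function name `linear_schedule_lr` and the `total_steps = 1000` value are hypothetical, chosen only to keep the numbers easy to read.

```python
import math

def linear_schedule_lr(step, total_steps, peak_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total_steps = 1000  # hypothetical run length, not the card's actual step count
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

In this toy run the peak of 1e-05 is reached at step 100, and the rate is back at half the peak by step 550.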
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: True
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.1990 | 7000 | 0.0087 | 0.0027 |
| 0.3981 | 14000 | 0.0026 | 0.0020 |
| 0.5971 | 21000 | 0.0014 | 0.0018 |
| 0.7962 | 28000 | 0.0014 | 0.0014 |
| 0.9952 | 35000 | 0.0013 | 0.0010 |
| 1.1943 | 42000 | 0.0008 | 0.0010 |
| 1.3933 | 49000 | 0.0005 | 0.0010 |
| 1.5924 | 56000 | 0.0003 | 0.0009 |
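The Epoch column in the log can be cross-checked against the dataset size and batch size reported elsewhere in this card. The sketch below assumes a single device, `gradient_accumulation_steps: 1`, and that `auto_find_batch_size` left the per-device batch size at 8. Under those assumptions, the fractional epoch is simply the step count divided by the number of steps per epoch.

```python
import math

train_samples = 281_342  # training set size from the card
batch_size = 8           # per_device_train_batch_size; assumed unchanged at runtime
steps_per_epoch = math.ceil(train_samples / batch_size)

# Step -> epoch pairs copied from the training log table.
logged = {
    7000: 0.1990, 14000: 0.3981, 21000: 0.5971, 28000: 0.7962,
    35000: 0.9952, 42000: 1.1943, 49000: 1.3933, 56000: 1.5924,
}
derived = {step: round(step / steps_per_epoch, 4) for step in logged}
for step, epoch in logged.items():
    print(step, derived[step], epoch)
```

The derived values agree with the logged ones to four decimal places, which is consistent with roughly 35,168 optimizer steps per epoch.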
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.2.1
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}