mpnet_ISB / README.md
metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:281362
  - loss:CachedMultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      steel lock washer 8 x 144 2 mm zinc plated split 2004 bmw 325ci base
      convertible miscellaneous hardware page 3 auveco 17397m769 automotive
    sentences:
      - >-
        steel lock washer 8 x 144 2 mm zinc plated split 1993 bmw 318i base
        sedan miscellaneous hardware page 3 auveco 17397m769 automotive
      - >-
        generac protector rg03624ansx standby generators liquidcooled reviews
        ratings product discontinued discontinued electric directcom
        696471617450 toolsandhomeimprovement
      - >-
        drive belt tensioner water pumpalternator 1994 bmw 325i base convertible
        charging system battery page 6 note shock type hydraulic ina
        11281717188m40 automotive
  - source_sentence: >-
      nokya hyper white front turn signal light bulbs 2010 toyota camry please
      double check your bulbs to make sure we have the right replacement bulb
      listed so there is arguably even an added benefit of increased safety
      tooplease note then you almost have change corner lights too avoid ruining
      benefits new headlights give look car also signal in style with these
      nokya hyper white front turn signal bulbs instead ugly stock orange 1010
      camry came while try be as accurate possible our listings custom front
      definitely stand out more compared other could whole assembly a set in
      case where are already changing headlight color nok52022pcs automotive
    sentences:
      - >-
        datalogic accessories for readers codbc9180433 datalogic codstdp090
        datalogic base stationcharger ethernet datalogic bc9180433
        computersandaccessories
      - >-
        nokya hyper white front turn signal light bulbs 2010 toyota camry please
        double check your bulbs to make sure we have the right replacement bulb
        listed so there is arguably even an added benefit of increased safety
        tooplease note then you almost have change corner lights too avoid
        ruining benefits new headlights give look car also signal in style with
        these nokya hyper white front turn signal bulbs instead ugly stock
        orange 1010 camry came while try be as accurate possible our listings
        custom front definitely stand out more compared other could whole
        assembly a set in case where are already changing headlight color
        nok52022pcs automotive
      - >-
        39400001 axor citterio wall mounted bath tub filler faucetnohtin
        39034821 bathroom faucet tall and handle brushed sale appliance specials
        and replacement parts axor citterio revives the opulence of water and
        redefines the purity of space each arch angle and line weds clarity and
        harmony evoking timeless classics that are mysterious yet somehow
        familiar discover a new form of luxury with axor citterio axor
        232848id39400001 toolsandhomeimprovement
  - source_sentence: >-
      canon pixus 865r cartridges for ink jet printers quillcom null
      901tgbci6bkclo officeproducts
    sentences:
      - >-
        smart racing products smartcamber digital camber gauge 2003 bmw 325ci
        base convertible suspension upgrades performance page 7 pel1850070smrt
        automotive valving option street comfort front spring 180mm 8kg rear
        spring 135mm 10kg front pillowball pillowball w camber plates rear
        pillowball n1 basic w top plates no camber plates valving option street
        sport front spring 180mm 8kg rear spring 135mm 10kg front pillowball
        pillowball w camber plates rear pillowball basic w top plates no camber
        plates valving option track race front spring 180mm 10kg rear spring
        140mm 10kg front pillowball pillowball w camber plates rear pillowball
        basic w top plates no camber plates
      - >-
        datalogic cable for readers cod90a051903 datalogic cod90a051330
        datalogic cable cab413 usb straight ibm pos mode datalogic 90a051903
        computersandaccessories
      - >-
        canon pixus 865r cartridges for ink jet printers quillcom null
        901tgbci6bkclo officeproducts
  - source_sentence: >-
      headlamp restoration kit sonax 2002 bmw 325i base wagon lights and lenses
      page 7 note removes yellowing and haze of plastic headlight lenses
      restoring likenew clarity one kit restores four headlights simple three
      step process requires no polishing machine step one use the circular
      sanding pad to gently remove stubborn headlight hazing step two use the
      abrasive polish and application pad to gently remove sanding marks step
      three use the towelette to apply a uv protective coating to maintain
      headlight clarity contains qty 1 75 ml polish 4 sanding discs 5000 grit 2
      application sponges 4 polishing cloths 2 moist cloths with sealant sonax
      405941m941 automotive
    sentences:
      - >-
        headlamp restoration kit sonax 1976 bmw 30si base sedan lights and
        lenses page 2 note removes yellowing and haze of plastic headlight
        lenses restoring likenew clarity one kit restores four headlights simple
        three step process requires no polishing machine step one use the
        circular sanding pad to gently remove stubborn headlight hazing step two
        use the abrasive polish and application pad to gently remove sanding
        marks step three use the towelette to apply a uv protective coating to
        maintain headlight clarity contains qty 1 75 ml polish 4 sanding discs
        5000 grit 2 application sponges 4 polishing cloths 2 moist cloths with
        sealant sonax 405941m941 automotive
      - >-
        philips ultinon led lighting 2122w 43mm festoon white 1 piece 1996 bmw
        318i base convertible lights and lenses page 3 phi2122ulwx11 automotive
      - >-
        canon pixma mx850 cartridges for ink jet printers quillcom trust genuine
        canon cli8bk ink cartridges to provide outstanding print quality for all
        your important photos and documentsunlike bargain replacement inks
        original canon cli8bk ink cartridges are designed specifically to work
        with canon printers for exceptional reliability and performancehave full
        photolithography inkjet nozzle engineering 901cli8bk officeproducts
  - source_sentence: >-
      phone cable flat 4 wire solid silver 1000ft 26awg wire solid 1000ft phone
      cable flat 4 wire solid silver 1000ft 26awg allows you to connect your
      telephones faxes answering machines and most modems perfect for all your
      custom installation projects 1000ft roll bulk phone cable flat cable
      silver color 4 conductor 26 awg solid copper ul listed 815239013642
      otherelectronics
    sentences:
      - >-
        phone cable flat 4 wire solid silver 1000ft 26awg wire solid 1000ft
        phone cable flat 4 wire solid silver 1000ft 26awg allows you to connect
        your telephones faxes answering machines and most modems perfect for all
        your custom installation projects 1000ft roll bulk phone cable flat
        cable silver color 4 conductor 26 awg solid copper ul listed
        815239013642 otherelectronics
      - >-
        soul black gb 2013 audi a4 allroad quattro canada market body middle
        armrest front pr6e3gb fz period 1111 gb 8k0864207jtq8 automotive
      - >-
        flashlight streamlight stinger led 1970 bmw 1602 base coupe tools page 8
        note compact and extremely powerful with 3 microprocessor controlled
        intensity modes strobe mode and the latest in power led technology 6000
        series machined aircraft aluminum with nonslip rubberized comfort grip
        with antiroll rubber ring unbreakable polycarbonate lens with
        scratchresistant coating oring sealed c4 led technology impervious to
        shock with a 50000 hour lifetime includes qty 2 3cell 36 volt nicd subc
        battery rechargeable upto 1000 times 1 piggy back chargerholder 1 120v
        ac charge cord 1 12v dc charge cord 841 inch length 162 inch major
        diameter 117 inch body diameter light output 350 lumens on high 175
        lumens on medium 85 lumens on low streamlight blue 552480010m1272
        toolsandhomeimprovement

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
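The Pooling and Normalize stages listed above can be illustrated with a small, dependency-free sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration indicates; the helper names `mean_pool` and `l2_normalize` and the 2-dimensional toy vectors are made up for illustration and are not part of the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / max(count, 1) for s in summed]


def l2_normalize(vector):
    """Scale a vector to unit length, which is the job of a Normalize() stage."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Toy input: three token vectors of dimension 2 (the real model uses 768).
# The last position is padding and must not influence the result.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled, sentence_embedding)
```

Because the final vector has unit length, a dot product between two such embeddings gives the same value as their cosine similarity.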

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'phone cable flat 4 wire solid silver 1000ft 26awg wire solid 1000ft phone cable flat 4 wire solid silver 1000ft 26awg allows you to connect your telephones faxes answering machines and most modems perfect for all your custom installation projects 1000ft roll bulk phone cable flat cable silver color 4 conductor 26 awg solid copper ul listed 815239013642 otherelectronics',
    'phone cable flat 4 wire solid silver 1000ft 26awg wire solid 1000ft phone cable flat 4 wire solid silver 1000ft 26awg allows you to connect your telephones faxes answering machines and most modems perfect for all your custom installation projects 1000ft roll bulk phone cable flat cable silver color 4 conductor 26 awg solid copper ul listed 815239013642 otherelectronics',
    'soul black gb 2013 audi a4 allroad quattro canada market body middle armrest front pr6e3gb fz period 1111 gb 8k0864207jtq8 automotive',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
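To show how such similarity scores might be used for ranking, here is a minimal sketch that does not need the model. The toy 2-dimensional unit vectors and the candidate labels are invented stand-ins for real `model.encode(...)` output, and the helpers `dot` and `cosine` are illustrative only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy unit-length vectors standing in for embeddings of product listings.
query = [0.6, 0.8]
candidates = {
    "near-duplicate listing": [0.6, 0.8],
    "related listing": [0.8, 0.6],
    "unrelated listing": [-0.8, 0.6],
}

# Rank candidates from most to least similar to the query.
ranked = sorted(candidates, key=lambda name: dot(query, candidates[name]), reverse=True)
for name in ranked:
    vec = candidates[name]
    # For unit vectors the dot product and cosine agree up to rounding error.
    assert abs(dot(query, vec) - cosine(query, vec)) < 1e-9
    print(f"{dot(query, vec):+.2f}  {name}")
```

With real embeddings the same ranking step would typically be done on the matrix returned by `model.similarity`, one row per query.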

Training Details

Training Dataset

Unnamed Dataset

  • Size: 281,362 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min tokens  mean tokens  max tokens
    anchor    string  14          77.68        384
    positive  string  20          79.97        384
  • Samples:
    anchor: glue tamiya cement 40ml 12 johnn johnny herbert gb shunko models marking livery 120 scale lotus ford type 102d camel 11 tam20033 and tam20034 ref shkd310 decals markings f1 cars 90 years spotmodel derek warwick japan grand prix 1992 water slide decals assembly instructions for references tam20030 tamiya tam87003 automotive
    positive: glue tamiya cement 40ml shunko models marking livery 120 scale benetton ford b192 camel 19 20 michael schumacher de martin brundle gb fia formula 1 world championship 1992 water slide decals and assembly instructions for reference tam20036 ref shkd281 decals markings f1 cars 90 years spotmodel tamiya tam87003 automotive

    anchor: hose clamp 29325 mm range 12 width spring type 1995 bmw 325i base sedan radiators page 3 mubea sc2932512m219 automotive
    positive: hose clamp 29325 mm range 12 width spring type bmw 7series e65 20022008 cooling system miscellaneous page 1 mubea sc2932512m219 automotive part 07129952131boe more info 760i 200406 760li 200308 part 11151726339m395 more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 16121180240m395 more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 16121180240boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 16121180242boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 32411156956m395 more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 32411156956boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 32411712735boe more info 745i and 745li 200205 760i 200406 760li 200308 alpina b7 200708 part 32416751127m9 more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 64218367179boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 07129952102boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 07129952123boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 12511309471boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 16121176918boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308 alpina b7 200708 part 11631716970boe more info 745i and 745li 200205 750i and 750li 200608 760i 200406 760li 200308

    anchor: serial rj45 interlocking cable codak17463008 zebra europe qlrwp4t series lithium ion fast charger codat187373 zebra serial rj45 interlocking cable zebra ak17463008 computersandaccessories
    positive: zebra universal accessories other by totalbarcodecom zebra ak17463008 kit mod plug to 9pin db pc cable ak17463008 computersandaccessories
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
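How these two parameters enter the objective can be sketched in plain Python. The sketch assumes the standard multiple-negatives ranking formulation: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. The cached variant named above is designed to reach the same loss value while processing the batch in smaller chunks to save memory; that chunking is not modelled here. The function name `mnr_loss` and the toy vectors are illustrative.

```python
import math


def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over this anchor's row of logits.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

A larger `scale` sharpens the softmax, so a positive that is only slightly closer than the in-batch negatives already yields a low loss.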
    

Evaluation Dataset

Unnamed Dataset

  • Size: 70,341 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min tokens  mean tokens  max tokens
    anchor    string  21          83.4         384
    positive  string  19          83.0         384
  • Samples:
    anchor: coolant antifreeze blue 1 liter 1996 bmw 318is base coupe radiators page 1 note approved for all bmw and mini engines concentrate for distilled water see part 55 7864 010 fuchs maintain fricofin 82142209769m865 automotive
    positive: coolant antifreeze blue 1 liter 1996 bmw 318is base coupe radiators page 1 note approved for all bmw and mini engines concentrate for distilled water see part 55 7864 010 genuine bmw 82142209769m9 automotive

    anchor: sealing compound loctite rtv 5699 gray silicone gasket maker 80 ml tube and supplies page 2 1991 bmw 318i base convertible engine rebuilding kits tools note high performance and noncorrosive designed for high torque applications loctite 37464m258 automotive
    positive: sealing compound loctite rtv 5699 gray silicone gasket maker 80 ml tube and supplies page 2 1991 bmw 318i base convertible engine rebuilding kits tools note high performance and noncorrosive designed for high torque applications loctite 37464m258 automotive

    anchor: lexmark remanufactured 18c2090 14 black ink cartridge lexmark x2630 cartridges 4inkjets remanlx14 officeproducts
    positive: remanufactured lexmark inkjet cartridge 18c2090 14 black ink lexmark z2320 ink cartridges and printer supplies inkcartridges remanlx14 officeproducts
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • auto_find_batch_size: True
  • batch_sampler: no_duplicates
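To make the effect of `learning_rate`, `warmup_ratio`, and `num_train_epochs` concrete, the sketch below computes a learning rate per step under a linear warmup followed by linear decay to zero. It assumes the `linear` scheduler type listed in the full hyperparameters, a single device with batch size 8, and a total step count derived from the 281,362 training samples. The function `linear_schedule_lr` is a hypothetical helper, not a library call.

```python
import math


def linear_schedule_lr(step, base_lr, total_steps, warmup_ratio):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed step count: ceil(281362 / 8) batches per epoch, times 2 epochs.
total_steps = math.ceil(281_362 / 8) * 2
warmup_steps = math.ceil(total_steps * 0.1)

for step in (0, warmup_steps // 2, warmup_steps, 35_000, total_steps):
    lr = linear_schedule_lr(step, base_lr=1e-05, total_steps=total_steps, warmup_ratio=0.1)
    print(f"step {step:>6}: lr = {lr:.2e}")
```

Under these assumptions the rate climbs for roughly the first tenth of training, peaks at 1e-05, and then falls steadily until the final step.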

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: True
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
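The `no_duplicates` batch sampler matters for an in-batch-negatives loss: if the same text appeared twice in a batch, a true match would be treated as a negative. The following is a simplified sketch of that idea, not the library's sampler. The function `no_duplicate_batches` and the single-letter texts are hypothetical, and the greedy strategy shown is an assumption.

```python
def no_duplicate_batches(pairs, batch_size):
    """Greedily group (anchor, positive) pairs so no text repeats within a batch."""
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for anchor, positive in remaining:
            fits = len(batch) < batch_size
            if fits and anchor not in seen and positive not in seen:
                batch.append((anchor, positive))
                seen.update((anchor, positive))
            else:
                # Postpone pairs that would repeat a text or overflow the batch.
                deferred.append((anchor, positive))
        batches.append(batch)
        remaining = deferred
    return batches


pairs = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
batches = no_duplicate_batches(pairs, batch_size=2)
for batch in batches:
    print(batch)
```

Each pass always places at least the first remaining pair, so the loop terminates even when many pairs share texts.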

Training Logs

Epoch   Step   Training Loss  Evaluation Loss
0.1990  7000   0.0113         0.0031
0.3981  14000  0.0022         0.0019
0.5971  21000  0.0019         0.0012
0.7961  28000  0.0017         0.0012
0.9951  35000  0.0013         0.0011
1.1942  42000  0.0012         0.0008
1.3932  49000  0.0005         0.0008
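The Epoch column of the training logs can be cross-checked against the dataset size and batch size reported earlier. The sketch below assumes a single device, so that one optimizer step consumes `per_device_train_batch_size` = 8 samples, and that the last partial batch is kept (`dataloader_drop_last: False`).

```python
import math

train_samples = 281_362  # size of the training dataset
batch_size = 8           # per_device_train_batch_size, single device assumed
steps_per_epoch = math.ceil(train_samples / batch_size)

# (epoch, step) pairs copied from the training logs.
logged = [
    (0.1990, 7000), (0.3981, 14000), (0.5971, 21000), (0.7961, 28000),
    (0.9951, 35000), (1.1942, 42000), (1.3932, 49000),
]

estimates = [round(step / steps_per_epoch, 4) for _, step in logged]
for (epoch, step), estimate in zip(logged, estimates):
    print(f"step {step:>5}: logged epoch {epoch:.4f}, estimated {estimate:.4f}")
```

Under these assumptions the estimated epoch fractions agree with the logged ones, which suggests that the automatic batch-size search kept the batch size at 8.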

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.2.1
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, 
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}