metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:68
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: >-
The Atlantic spotted dolphin is a dolphin found in warm temperate and
tropical waters of the Atlantic Ocean. Older members of the species have a
very distinctive spotted coloration all over their bodies.
sentences:
- baikal_seal
- southern_right_whale
- atlantic_spotted_dolphin
- source_sentence: >-
The burmeisters porpoise is a marine mammal belonging to the cetaceans
group. It inhabits ocean and coastal habitats worldwide and plays an
important role in marine ecosystems.
sentences:
- false_killer_whale
- burmeisters_porpoise
- south_asian_river_dolphin
- source_sentence: >-
Dall's porpoise is a species of porpoise endemic to the North Pacific. It
is the largest of porpoises and the only member of the genus Phocoenoides.
The species is named after American naturalist W. H. Dall.
sentences:
- dalls_porpoise
- burrunan_dolphin
- bolivian_river_dolphin
- source_sentence: >-
The hourglass dolphin is a small dolphin in the family Delphinidae that
inhabits offshore Antarctic and sub-Antarctic waters. It is commonly seen
from ships crossing the Drake Passage but has a circumpolar distribution.
sentences:
- common_dolphin
- hourglass_dolphin
- harbour_porpoise
- source_sentence: >-
The harp seal, also known as the saddleback seal or Greenland seal, is a
species of earless seal, or true seal, native to the northernmost Atlantic
Ocean and Arctic Ocean. Originally in the genus Phoca with a number of
other species, it was reclassified into the monotypic genus Pagophilus in
1844. In Greek, its scientific name translates to "Greenlandic ice-lover",
and its taxonomic synonym, Phoca groenlandica translates to "Greenlandic
seal". This is the only species in the genus Pagophilus.
sentences:
- harp_seal
- amazon_river_dolphin
- ringed_seal
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
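The Pooling and Normalize stages listed above can be illustrated with a small sketch. It is not the library's implementation: the function names `mean_pool` and `l2_normalize` are hypothetical, and the toy input uses 4 dimensions instead of 768. It only shows what mean pooling over non-padding tokens followed by L2 normalization does to the token vectors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask value is 0 (padding)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[i] for vec in kept) / count for i in range(dim)]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

# Toy input: three token positions (the last one is padding), 4 dims instead of 768.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final stage normalizes every embedding to unit length, cosine similarity between two outputs reduces to a plain dot product.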
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'The harp seal, also known as the saddleback seal or Greenland seal, is a species of earless seal, or true seal, native to the northernmost Atlantic Ocean and Arctic Ocean. Originally in the genus Phoca with a number of other species, it was reclassified into the monotypic genus Pagophilus in 1844. In Greek, its scientific name translates to "Greenlandic ice-lover", and its taxonomic synonym, Phoca groenlandica translates to "Greenlandic seal". This is the only species in the genus Pagophilus.',
    'harp_seal',
    'ringed_seal',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7737, 0.2011],
# [0.7737, 1.0000, 0.4141],
# [0.2011, 0.4141, 1.0000]])
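Since the model pairs a species description with a label string, one natural use of the similarity matrix is to pick the best-matching label for a description. The sketch below assumes that use case; `rank_labels` is a hypothetical helper, and the scores are copied from the first row of the matrix printed above rather than computed here.

```python
def rank_labels(scores, labels):
    """Return (label, score) pairs sorted from most to least similar."""
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Similarity of the harp seal description to each candidate label,
# taken from the first row of the example output above.
# With a loaded model, such a row could come from
# model.similarity(embeddings[:1], embeddings[1:]).
labels = ["harp_seal", "ringed_seal"]
scores = [0.7737, 0.2011]

ranking = rank_labels(scores, labels)
best_label, best_score = ranking[0]
print(best_label, best_score)
```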
Training Details
Training Dataset
Unnamed Dataset
- Size: 68 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 68 samples:

|         | sentence_0 | sentence_1 |
|---------|------------|------------|
| type    | string     | string     |
| details | min: 11 tokens, mean: 101.24 tokens, max: 226 tokens | min: 4 tokens, mean: 6.79 tokens, max: 12 tokens |
- Samples:

| sentence_0 | sentence_1 |
|------------|------------|
| Dall's porpoise is a species of porpoise endemic to the North Pacific. It is the largest of porpoises and the only member of the genus Phocoenoides. The species is named after American naturalist W. H. Dall. | dalls_porpoise |
| The Caspian seal is one of the smallest members of the earless seal family and unique in that it is found exclusively in the brackish Caspian Sea. It lives along the shorelines, but also on the many rocky islands and floating blocks of ice that dot the Caspian Sea. In winter and cooler parts of the spring and autumn season, it populates the northern Caspian coastline. As the ice melts in the summer and warmer parts of the spring and autumn season, it also occurs in the deltas of the Volga and Ural Rivers, as well as the southern latitudes of the Caspian where the water is cooler due to greater depth. | caspian_seal |
| The Weddell seal is a relatively large and abundant true seal with a circumpolar distribution surrounding Antarctica. The Weddell seal was discovered and named in the 1820s during expeditions led by British sealing captain James Weddell to the area of the Southern Ocean now known as the Weddell Sea. The life history of this species is well documented since it occupies fast ice environments close to the Antarctic continent and often adjacent to Antarctic bases. It is the only species in the genus Leptonychotes. | weddell_seal |

- Loss: MultipleNegativesRankingLoss with these parameters:
  {
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
  }
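To make the loss parameters concrete, here is a minimal sketch of the in-batch ranking objective that MultipleNegativesRankingLoss is based on, using cosine similarity and a scale of 20.0 as listed above. It is a plain-Python illustration, not the library code: `cos_sim` and `mnr_loss` are hypothetical names, and the 2-dimensional vectors are toy data. Each anchor (sentence_0) is scored against every positive (sentence_1) in the batch, and cross-entropy pushes the matching pair to the top.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # each anchor matches its own positive
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # each anchor matches the wrong positive
print(aligned, swapped)
```

The aligned batch yields a loss near zero, while the swapped batch yields a loss near the scale value, which shows how the scale controls the penalty for ranking a wrong label first.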
Training Hyperparameters
Non-Default Hyperparameters
- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
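As a rough worked example of what these settings imply, the sketch below estimates the number of optimizer steps and the learning rate under a linear schedule with no warmup. It assumes training on a single device, which the card does not state; `total_steps` and `linear_lr` are hypothetical helpers and not part of any library.

```python
import math

def total_steps(num_samples, batch_size, epochs, grad_accum=1):
    """Optimizer steps when the last partial batch is kept (dataloader_drop_last: False)."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs

def linear_lr(step, total, base_lr=5e-5, warmup=0):
    """Linear warmup to base_lr, then linear decay to zero at the final step."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

# Values from this card: 68 samples, batch size 8, 5 epochs, learning rate 5e-05.
steps = total_steps(num_samples=68, batch_size=8, epochs=5)
print(steps)
print(linear_lr(0, steps), linear_lr(steps, steps))
```

Under that single-device assumption, each epoch has 9 batches (the last one holding 4 samples), giving 45 optimizer steps in total.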
Framework Versions
- Python: 3.10.11
- Sentence Transformers: 5.2.3
- Transformers: 4.56.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}