metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:989216
- loss:MultipleNegativesRankingLoss
base_model: ibm-granite/granite-embedding-97m-multilingual-r2
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: >-
    SentenceTransformer based on
    ibm-granite/granite-embedding-97m-multilingual-r2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts b
      type: sts-b
    metrics:
    - type: pearson_cosine
      value: 0.8441982122790919
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8552538368687038
      name: Spearman Cosine
SentenceTransformer based on ibm-granite/granite-embedding-97m-multilingual-r2
This is a sentence-transformers model fine-tuned from ibm-granite/granite-embedding-97m-multilingual-r2 on 12 datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: ibm-granite/granite-embedding-97m-multilingual-r2
- Maximum Sequence Length: 32768 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
- Training Datasets:
- standard_mnrl
- multi_lingual
- STS
- translation
- cross_lingual
- entailment_logic
- information_extraction
- summaryzation
- keyword_semantic_search
- anchor_type_and_intent_symm
- anchor_type_and_intent_asymm
- topic_clustering
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'cls', 'include_prompt': False})
  (2): Normalize({})
)
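The last two modules listed above (CLS pooling followed by normalization) can be illustrated with a small standalone sketch. This is not the library's implementation. It is a pure-Python toy that assumes 4-dimensional token embeddings instead of 384, and the helper names `cls_pool`, `l2_normalize`, and `dot` are hypothetical. It shows why the dot product of the final embeddings matches their cosine similarity once they are scaled to unit length.

```python
import math

def cls_pool(token_embeddings):
    """Return the embedding of the first ([CLS]) token of one sequence."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy token embeddings: 3 tokens, 4 dimensions (the model uses 384).
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
tokens_b = [[0.0, 5.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0], [0.0, 1.0, 0.0, 0.0]]

# Only the first token survives CLS pooling; the other tokens are ignored.
emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# For unit-length vectors the dot product is the cosine similarity.
cosine = dot(emb_a, emb_b)
print(round(cosine, 4))  # 0.8
```

Because the embeddings are already unit length in this sketch, a dot-product index would rank results the same way as cosine similarity.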
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bobox/synt-dataset-multi-task")
# Run inference
sentences = [
'attachment styles, neurobiological mechanisms, risk-taking behaviors, adolescents, prefrontal cortex, amygdala, oxytocin, dopamine pathways, cortisol regulation, longitudinal correlations, limbic system, executive function',
'Empirical investigations demonstrate that teenagers with secure caregiver bonds generally display controlled engagement in perilous activities, attributable to mature prefrontal inhibitory control. Conversely, anxious-ambivalent attachment correlates with amygdalar hyperactivation precipitating rash actions, while avoidant attachment links to diminished oxytocin reception fostering sensation-seeking. Longitudinal neuroimaging confirms insecure attachments remodel mesolimbic dopamine circuits throughout adolescence, elevating vulnerability to substance use and hazardous conduct. Additionally, glucocorticoid imbalance from persistent stress reactions in insecure dyads compromises risk evaluation capacities. These findings illustrate how early caregiving dynamics shape the maturation of emotional processing and cognitive control systems.',
'Longitudinal studies correlate anxious-ambivalent attachment with increased adolescent anxiety disorders, manifesting as social withdrawal and academic underachievement due to altered HPA axis functioning.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9043, 0.8639],
# [0.9043, 1.0000, 0.8681],
# [0.8639, 0.8681, 1.0000]])
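For retrieval, one row of the similarity matrix can be read as the scores of a query against every other text. The sketch below is a minimal illustration that assumes the first sentence (the keyword list) is the query. It copies the scores from the example output above into a plain list so that it runs without downloading the model, and `rank_candidates` is a hypothetical helper, not part of sentence-transformers.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.9043, 0.8639],
    [0.9043, 1.0000, 0.8681],
    [0.8639, 0.8681, 1.0000],
]

def rank_candidates(similarity_row, query_index):
    """Return (index, score) pairs sorted by descending score, skipping the query itself."""
    scored = [(i, s) for i, s in enumerate(similarity_row) if i != query_index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Treat sentence 0 as the query and rank the remaining sentences.
ranking = rank_candidates(similarities[0], query_index=0)
print(ranking)  # [(1, 0.9043), (2, 0.8639)]
```

With these scores, the passage on attachment and risk-taking (index 1) ranks above the passage on anxiety disorders (index 2) for the keyword query.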
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-b
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.8442 |
| spearman_cosine | 0.8553 |
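The two metrics in the table compare the model's cosine similarities with human similarity ratings: Pearson measures linear agreement, while Spearman measures agreement in rank order only. The following pure-Python sketch shows the difference on hypothetical toy data. It is not the evaluator's code, and the `gold_scores` and `cosine_scores` values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: human ratings (0 to 5) and model cosine scores.
gold_scores = [0.0, 1.0, 2.5, 4.0, 5.0]
cosine_scores = [0.10, 0.20, 0.30, 0.45, 0.95]

pearson_cosine = pearson(gold_scores, cosine_scores)
spearman_cosine = spearman(gold_scores, cosine_scores)
print(round(spearman_cosine, 4))  # 1.0
```

In this toy case the ordering is perfectly preserved, so Spearman is 1.0, while Pearson is lower because the relationship is not linear.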
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}