metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:989216
  - loss:MultipleNegativesRankingLoss
base_model: ibm-granite/granite-embedding-97m-multilingual-r2
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: >-
      SentenceTransformer based on
      ibm-granite/granite-embedding-97m-multilingual-r2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts b
          type: sts-b
        metrics:
          - type: pearson_cosine
            value: 0.8441982122790919
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8552538368687038
            name: Spearman Cosine

SentenceTransformer based on ibm-granite/granite-embedding-97m-multilingual-r2

This is a sentence-transformers model fine-tuned from ibm-granite/granite-embedding-97m-multilingual-r2 on 12 datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ibm-granite/granite-embedding-97m-multilingual-r2
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
  • Training Datasets:
    • standard_mnrl
    • multi_lingual
    • STS
    • translation
    • cross_lingual
    • entailment_logic
    • information_extraction
    • summaryzation
    • keyword_semantic_search
    • anchor_type_and_intent_symm
    • anchor_type_and_intent_asymm
    • topic_clustering

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'cls', 'include_prompt': False})
  (2): Normalize({})
)
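
The Pooling and Normalize modules listed above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name cls_pool_and_normalize is hypothetical, a random array stands in for the transformer's token embeddings, and the CLS vector is assumed to sit at the first token position.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first-token vector of each sequence and scale it to unit length.

    token_embeddings has shape (batch, sequence_length, hidden_size).
    """
    # 'cls' pooling: assume the CLS vector is at position 0 of every sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2 normalization; the clip guards against division by zero.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Random stand-in for real transformer output: 2 sequences, 5 tokens, 384 dims.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 384))

sentence_embeddings = cls_pool_and_normalize(token_embeddings)
print(sentence_embeddings.shape)
# (2, 384)
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.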

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/synt-dataset-multi-task")
# Run inference
sentences = [
    'attachment styles, neurobiological mechanisms, risk-taking behaviors, adolescents, prefrontal cortex, amygdala, oxytocin, dopamine pathways, cortisol regulation, longitudinal correlations, limbic system, executive function',
    'Empirical investigations demonstrate that teenagers with secure caregiver bonds generally display controlled engagement in perilous activities, attributable to mature prefrontal inhibitory control. Conversely, anxious-ambivalent attachment correlates with amygdalar hyperactivation precipitating rash actions, while avoidant attachment links to diminished oxytocin reception fostering sensation-seeking. Longitudinal neuroimaging confirms insecure attachments remodel mesolimbic dopamine circuits throughout adolescence, elevating vulnerability to substance use and hazardous conduct. Additionally, glucocorticoid imbalance from persistent stress reactions in insecure dyads compromises risk evaluation capacities. These findings illustrate how early caregiving dynamics shape the maturation of emotional processing and cognitive control systems.',
    'Longitudinal studies correlate anxious-ambivalent attachment with increased adolescent anxiety disorders, manifesting as social withdrawal and academic underachievement due to altered HPA axis functioning.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9043, 0.8639],
#         [0.9043, 1.0000, 0.8681],
#         [0.8639, 0.8681, 1.0000]])
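
For retrieval, the same similarity scores can be used to rank documents against a query. The following is a minimal sketch under stated assumptions: rank_by_cosine is a hypothetical helper, not part of Sentence Transformers, and small toy vectors stand in for the 384-dimensional embeddings so the example runs without downloading the model.

```python
import numpy as np

def rank_by_cosine(query_embedding, document_embeddings, top_k=3):
    """Return (document_index, cosine_score) pairs, best match first."""
    query = np.asarray(query_embedding, dtype=float)
    docs = np.asarray(document_embeddings, dtype=float)
    # Normalize both sides so the dot product is the cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    # Sort by descending score and keep the top_k entries.
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy 3-dimensional vectors standing in for model embeddings.
query_vec = np.array([1.0, 0.0, 0.0])
doc_vecs = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
])

ranking = rank_by_cosine(query_vec, doc_vecs)
print(ranking)
```

With the real model, query_vec and doc_vecs would presumably come from model.encode(query) and model.encode(documents). Since the model already normalizes its outputs, the normalization step in the helper then changes nothing and is kept only so the function also works on unnormalized input.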

Evaluation

Metrics

Semantic Similarity

Metric             Value
pearson_cosine     0.8442
spearman_cosine    0.8553
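
As a rough illustration of what these two metrics measure, the sketch below computes the Pearson and Spearman correlation between cosine similarities of sentence-pair embeddings and gold similarity labels. It is not the script that produced the values above: the helper names, the toy vectors, and the gold scores are all made up for the example, and the Spearman helper assumes there are no tied values.

```python
import numpy as np

def cosine_scores(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, dim) arrays."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def pearson(x, y) -> float:
    """Pearson correlation coefficient of two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y) -> float:
    """Spearman correlation: Pearson on ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

# Toy 2-dimensional "embeddings" for four sentence pairs (illustrative only).
emb_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.1], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
# Made-up gold similarity labels for the same four pairs.
gold = np.array([5.0, 0.5, 3.0, 2.0])

predicted = cosine_scores(emb_a, emb_b)
pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(pearson_cosine, spearman_cosine)
```

Pearson rewards a linear relationship between predicted and gold scores, while Spearman only checks that both orderings agree, which is why the two values can differ on the same predictions.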

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}