Update README.md
README.md (CHANGED)
```diff
@@ -6,7 +6,8 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
-
+language:
+- he
 ---
 
 # {MODEL_NAME}
```
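Applied, this hunk yields the following front matter for the region it touches (the `tags:` line is the hunk's context line; anything above it is unchanged and omitted here):

```yaml
tags:
- feature-extraction
- sentence-similarity
- transformers
language:
- he
---
```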
```diff
@@ -52,7 +53,7 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = [
+sentences = ["讗诪讗 讛诇讻讛 诇讙谉", "讗讘讗 讛诇讱 诇讙谉", "讬专拽讜谞讬 拽讜谞讛 诇谞讜 驻讬爪讜转"]
 
 # Load model from HuggingFace Hub
 tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
```
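This hunk edits the usage snippet from the standard sentence-transformers model-card template. A minimal sketch of the full inference flow it belongs to, assuming that template (`AutoModel`, the `mean_pooling` body, and the final `print` are filled in from the template rather than from this diff; `'{MODEL_NAME}'` is the card's placeholder for the real repository id):

```python
import torch
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    token_embeddings = model_output[0]  # first element holds all token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)


# The Hebrew examples added by this commit (roughly: "Mom went to kindergarten",
# "Dad went to kindergarten", "Yarkoni buys us pizzas")
sentences = ["讗诪讗 讛诇讻讛 诇讙谉", "讗讘讗 讛诇讱 诇讙谉", "讬专拽讜谞讬 拽讜谞讛 诇谞讜 驻讬爪讜转"]

# Load model from HuggingFace Hub (substitute the actual repository id)
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings.shape)  # (3, hidden_size)
```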
```diff
@@ -82,6 +83,9 @@ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*
 
 
 ## Training
+This model was trained in two stages:
+1. Unsupervised: ~2M paragraphs with 'MultipleNegativesRankingLoss', pooling on the CLS token
+2. Supervised: ~70k paragraphs with 'CosineSimilarityLoss'
 The model was trained with the parameters:
 
 **DataLoader**:
```
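The two stages described above map onto the classic sentence-transformers `fit` API. A hedged sketch of what such a setup could look like; `unsup_pairs`, `scored_pairs`, batch sizes, and epoch counts are hypothetical stand-ins (the card publishes neither the data nor these hyperparameters), and CLS pooling follows the card's "on cls-token" note:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Hypothetical stand-ins for the unpublished training data:
unsup_pairs = [("驻住拽讛 诇讚讜讙诪讛", "驻住拽讛 拽砖讜专讛")]       # ~2M paragraph pairs in practice
scored_pairs = [("诪砖驻讟 专讗砖讜谉", "诪砖驻讟 砖谞讬", 0.8)]     # ~70k scored pairs in practice

# CLS-token pooling, matching the card's "on cls-token" note
word_emb = models.Transformer('{MODEL_NAME}')
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), pooling_mode='cls')
model = SentenceTransformer(modules=[word_emb, pooling])

# Stage 1: unsupervised contrastive training with in-batch negatives
stage1 = [InputExample(texts=[a, b]) for a, b in unsup_pairs]
loader1 = DataLoader(stage1, shuffle=True, batch_size=64)
model.fit(train_objectives=[(loader1, losses.MultipleNegativesRankingLoss(model))],
          epochs=1)

# Stage 2: supervised fine-tuning on similarity-scored pairs
stage2 = [InputExample(texts=[a, b], label=score) for a, b, score in scored_pairs]
loader2 = DataLoader(stage2, shuffle=True, batch_size=32)
model.fit(train_objectives=[(loader2, losses.CosineSimilarityLoss(model))],
          epochs=1)
```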
```diff
@@ -124,4 +128,14 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
+<!--- Describe where people can find more information -->
+Based on
+
+@misc{gueta2022large,
+      title={Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All},
+      author={Eylon Gueta and Avi Shmidman and Shaltiel Shmidman and Cheyn Shmuel Shmidman and Joshua Guedalia and Moshe Koppel and Dan Bareket and Amit Seker and Reut Tsarfaty},
+      year={2022},
+      eprint={2211.15199},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
```
|