MPA committed
Commit 127abd8 (verified)
Parent: 7892c23

Update README.md

Files changed (1)
  1. README.md +17 -3
README.md CHANGED
@@ -6,7 +6,8 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
-
+language:
+- he
 ---
 
 # {MODEL_NAME}
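Applied to the file, the visible slice of the YAML front matter picks up a `language` field. A sketch of just the region this hunk shows (the opening `---` and any other metadata fields sit above the shown context):

```yaml
tags:
- feature-extraction
- sentence-similarity
- transformers
language:
- he
---
```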
 
@@ -52,7 +53,7 @@ def mean_pooling(model_output, attention_mask):
 
 
 # Sentences we want sentence embeddings for
-sentences = ['This is an example sentence', 'Each sentence is converted']
+sentences = ["אמא הלכה לגן", "אבא הלך לגן", "ירקוני קונה לנו פיצות"]
 
 # Load model from HuggingFace Hub
 tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
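Pieced together from this hunk's context lines, the snippet follows the stock sentence-transformers "usage with HuggingFace Transformers" template. The `mean_pooling` body is not visible in the diff, so the version below is the standard template implementation rather than code taken from this README; the new Hebrew sentences translate roughly to "Mom went to kindergarten", "Dad went to kindergarten", and "Yarkoni buys us pizzas".

```python
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, weighting by the attention mask
    token_embeddings = model_output[0]  # first element holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for (the new Hebrew examples)
sentences = ["אמא הלכה לגן", "אבא הלך לגן", "ירקוני קונה לנו פיצות"]

# Load model from HuggingFace Hub ({MODEL_NAME} is a placeholder, as in the README)
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize, run the model, and mean-pool to get sentence embeddings
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings.shape)  # (3, hidden_size)
```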
 
@@ -82,6 +83,9 @@ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*
 
 
 ## Training
+This model was trained in two stages:
+1. Unsupervised: ~2M paragraphs with `MultipleNegativesRankingLoss` on the CLS token
+2. Supervised: ~70k paragraphs with `CosineSimilarityLoss`
 The model was trained with the parameters:
 
 **DataLoader**:
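The commit does not show how either stage's data was paired, so the following is only a hypothetical sketch of the recipe those three added lines describe, written against the classic sentence-transformers `fit` API: CLS-token pooling for stage 1's `MultipleNegativesRankingLoss`, then labeled pairs for stage 2's `CosineSimilarityLoss`. The corpora are placeholders, and the SimCSE-style self-pairing in stage 1 is an assumption.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample

# CLS-token pooling, as stage 1 calls for
word_embedding_model = models.Transformer('{MODEL_NAME}')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode='cls')
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Placeholder data; the real runs used ~2M and ~70k paragraphs respectively
unsup_paragraphs = ["פסקה ראשונה לדוגמה", "פסקה שנייה לדוגמה"]
labeled_pairs = [("אמא הלכה לגן", "אבא הלך לגן", 0.8)]  # (text_a, text_b, similarity)

# Stage 1, unsupervised: in-batch negatives; the self-pairing scheme is an assumption
stage1_data = [InputExample(texts=[p, p]) for p in unsup_paragraphs]
stage1_loader = DataLoader(stage1_data, shuffle=True, batch_size=64)
model.fit(train_objectives=[(stage1_loader, losses.MultipleNegativesRankingLoss(model))], epochs=1)

# Stage 2, supervised: regress cosine similarity onto the labels
stage2_data = [InputExample(texts=[a, b], label=s) for a, b, s in labeled_pairs]
stage2_loader = DataLoader(stage2_data, shuffle=True, batch_size=16)
model.fit(train_objectives=[(stage2_loader, losses.CosineSimilarityLoss(model))], epochs=1)
```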
 
@@ -124,4 +128,14 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
+<!--- Describe where people can find more information -->
+Based on:
+
+@misc{gueta2022large,
+      title={Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All},
+      author={Eylon Gueta and Avi Shmidman and Shaltiel Shmidman and Cheyn Shmuel Shmidman and Joshua Guedalia and Moshe Koppel and Dan Bareket and Amit Seker and Reut Tsarfaty},
+      year={2022},
+      eprint={2211.15199},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}