Ericu950 committed
Commit 08ab0e7 · verified · 1 parent: 9548e4e

Initial release: v3-sentences

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "word_embedding_dimension": 1024,
3
+ "pooling_mode_cls_token": true,
4
+ "pooling_mode_mean_tokens": false,
5
+ "pooling_mode_max_tokens": false,
6
+ "pooling_mode_mean_sqrt_len_tokens": false,
7
+ "pooling_mode_weightedmean_tokens": false,
8
+ "pooling_mode_lasttoken": false,
9
+ "include_prompt": true
10
+ }
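As a rough illustration of what this pooling configuration selects, here is a minimal sketch, assuming the transformer returns one vector per token with the `<s>` (CLS) token first. With `pooling_mode_cls_token` set to true, only that first vector is kept, and the `Normalize` module listed in `modules.json` then scales it to unit length. The names `cls_pool` and `l2_normalize` are hypothetical and are not part of sentence-transformers.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the CLS / <s> position)."""
    return list(token_embeddings[0])

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]

# Toy input: 3 tokens with 4-dimensional embeddings (the real model uses 1024).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # <s> (CLS) token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to a plain dot product.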
README.md ADDED
@@ -0,0 +1,543 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - dense
7
+ - generated_from_trainer
8
+ - dataset_size:33406
9
+ - loss:MultipleNegativesRankingLoss
10
+ - loss:OnlineContrastiveLoss
11
+ widget:
12
+ - source_sentence: παρεκελεύσατο οὖν λαβοῦσαν τὸν ἐξ αὑτοῦ πεσόντα γόνον, καὶ τούτῳ
13
+ προσμίξασαν ἔλαιον καὶ τὸ ἀπὸ τῆς ἀκίδος ἀποστάζον αἷμα, χρῖσαι τὸν χιτῶνα τοῦ
14
+ Ἡρακλέους.
15
+ sentences:
16
+ - 'Då föll hans medtjänare till hans fötter och bad honom och sade: hav tålamod
17
+ med mig, så skall jag betala dig allt.'
18
+ - Vore de blott til en del eller i allmänhet bestämda, och således til en del obestämda,
19
+ så ägde de blott en half existents, om detta uttryck kan nyttjas, eller rättare
20
+ ingen.
21
+ - När hon försökte vinna tillbaka Herakles med en blodindränkt mantel, frätte den
22
+ dock bort hans hud och kött, varpå han brände sig själv på berget Oite.
23
+ - source_sentence: ὃς γὰρ ἂν ἐπαισχυνθῇ με ἢ τοὺς ἐμοὺς λόγους ἐν τῇ γενεᾷ ταύτῃ τῇ
24
+ μοιχαλίδι καὶ ἁμαρτωλῷ, καὶ ὁ υἱὸς τοῦ ἀνθρώπου ἐπαισχυνθήσεται αὐτόν, ὅταν ἔλθῃ
25
+ ἐν τῇ δόξῃ τοῦ πατρὸς αὐτοῦ μετὰ τῶν ἀγγέλων αὐτοῦ.
26
+ sentences:
27
+ - De som förtjäna lite vill ju alltid hindra dem som förtjäna mera att kapa åt sig
28
+ ännu mera — och överhuvud förneka deras rätt därtill — medan de själva däremot
29
+ med full rätt anse sig kunna ställa anspråk.
30
+ - 'Alltså: Hela Kullens höjd = K.h = Pyramidens bas (P.b); och Högkullens höjd HK.h
31
+ = Pyramidens (P.h) höjd.'
32
+ - Var och en som bättrar sig på detta sätt skall jag genast gå till mötes som en
33
+ god fader sin felande son.
34
+ - source_sentence: τοῦτον οὖν τὸν τίτλον πολλοὶ ἀνέγνωσαν τῶν Ἰουδαίων, ὅτι ἐγγὺς
35
+ ἦν ὁ τόπος τῆς πόλεως ὅπου ἐσταυρώθη ὁ Ἰησοῦς· καὶ ἦν γεγραμμένον Ἐβραϊστί, Ῥωμαϊστί,
36
+ Ἑλληνιστί.
37
+ sentences:
38
+ - Dessa ingredienser stamma till stor del ur Logiakällan, och det har ju förmodats,
39
+ att denna källa har byggt på samtida anteckningar av »publikanen Matteus», en
40
+ förmodad omedelbar lärjunge till Jesus.
41
+ - Den överskriften läste många av judarna, ty det ställe, där Jesus var korsfäst,
42
+ låg nära staden; och den var avfattad på hebreiska, på latin och på grekiska.
43
+ - Men om någon på denna grund bygger med guld, silver, dyrbara stenar, trä, hö,
44
+ eller strå, så skall vars och ens verk bliva uppenbart, ty dagen skall göra det
45
+ klart, emedan han uppenbaras med eld, och elden skall pröva, hurudant vars och
46
+ ens verk är.
47
+ - source_sentence: λέγει αὐτῷ [ὁ] Πέτρος Κύριε, διὰ τί οὐ δύναμαί σοι ἀκολουθεῖν ἄρτι;
48
+ sentences:
49
+ - 'Jesus svarade honom: »Om jag icke tvår dig, så har du ingen del med mig.»'
50
+ - '9. Tå sade Petrus til henne: Hwarföre woren J öfwerens/ at J skullen fresta Herrans
51
+ Anda?'
52
+ - 'Petrus sade till honom: »Herre, varför kan jag icke följa dig nu?'
53
+ - source_sentence: Οἱ οὖν στρατιῶται ὅτε ἐσταύρωσαν τὸν Ἰησοῦν ἔλαβον τὰ ἱμάτια αὐτοῦ
54
+ καὶ ἐποίησαν τέσσερα μέρη, ἑκάστῳ στρατιώτῃ μέρος, καὶ τὸν χιτῶνα.
55
+ sentences:
56
+ - Då nu krigsmännen hade korsfäst Jesus, togo de hans kläder och delade dem i fyra
57
+ delar, en del åt var krigsman.
58
+ - »Vi veta ju detta, att vår gamla människa har blivit korsfäst med honom, för att
59
+ syndakroppen skall göras om intet, så att vi icke mer tjäna synden.
60
+ - Derefter fatta presterna på hans vink katafalken, lyfta den på sina skuldror och
61
+ bära den med Simon till ett vid tvärskeppet beläget, för presterskapet afskildt
62
+ rum.
63
+ pipeline_tag: sentence-similarity
64
+ library_name: sentence-transformers
65
+ metrics:
66
+ - cosine_accuracy@1
67
+ - cosine_accuracy@5
68
+ - cosine_accuracy@10
69
+ - cosine_accuracy@25
70
+ - cosine_precision@1
71
+ - cosine_precision@5
72
+ - cosine_precision@10
73
+ - cosine_precision@25
74
+ - cosine_precision@100
75
+ - cosine_recall@1
76
+ - cosine_recall@5
77
+ - cosine_recall@10
78
+ - cosine_recall@25
79
+ - cosine_recall@100
80
+ - cosine_ndcg@10
81
+ - cosine_mrr@10
82
+ - cosine_map@100
83
+ model-index:
84
+ - name: SentenceTransformer
85
+ results:
86
+ - task:
87
+ type: information-retrieval
88
+ name: Information Retrieval
89
+ dataset:
90
+ name: labeled src to swe
91
+ type: labeled_src_to_swe
92
+ metrics:
93
+ - type: cosine_accuracy@1
94
+ value: 0.7843866171003717
95
+ name: Cosine Accuracy@1
96
+ - type: cosine_accuracy@5
97
+ value: 0.9368029739776952
98
+ name: Cosine Accuracy@5
99
+ - type: cosine_accuracy@10
100
+ value: 0.9553903345724907
101
+ name: Cosine Accuracy@10
102
+ - type: cosine_accuracy@25
103
+ value: 1.0
104
+ name: Cosine Accuracy@25
105
+ - type: cosine_precision@1
106
+ value: 0.7843866171003717
107
+ name: Cosine Precision@1
108
+ - type: cosine_precision@5
109
+ value: 0.18736059479553904
110
+ name: Cosine Precision@5
111
+ - type: cosine_precision@10
112
+ value: 0.09553903345724908
113
+ name: Cosine Precision@10
114
+ - type: cosine_precision@25
115
+ value: 0.04000000000000001
116
+ name: Cosine Precision@25
117
+ - type: cosine_precision@100
118
+ value: 0.010000000000000002
119
+ name: Cosine Precision@100
120
+ - type: cosine_recall@1
121
+ value: 0.7843866171003717
122
+ name: Cosine Recall@1
123
+ - type: cosine_recall@5
124
+ value: 0.9368029739776952
125
+ name: Cosine Recall@5
126
+ - type: cosine_recall@10
127
+ value: 0.9553903345724907
128
+ name: Cosine Recall@10
129
+ - type: cosine_recall@25
130
+ value: 1.0
131
+ name: Cosine Recall@25
132
+ - type: cosine_recall@100
133
+ value: 1.0
134
+ name: Cosine Recall@100
135
+ - type: cosine_ndcg@10
136
+ value: 0.8804480378872143
137
+ name: Cosine Ndcg@10
138
+ - type: cosine_mrr@10
139
+ value: 0.8553741075116541
140
+ name: Cosine Mrr@10
141
+ - type: cosine_map@100
142
+ value: 0.8582062450441782
143
+ name: Cosine Map@100
144
+ ---
145
+
146
+ # SentenceTransformer
147
+
148
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on the triplets and pairs datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
149
+
150
+ ## Model Details
151
+
152
+ ### Model Description
153
+ - **Model Type:** Sentence Transformer
154
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
155
+ - **Maximum Sequence Length:** 192 tokens
156
+ - **Output Dimensionality:** 1024 dimensions
157
+ - **Similarity Function:** Cosine Similarity
158
+ - **Training Datasets:**
159
+ - triplets
160
+ - pairs
161
+ <!-- - **Language:** Unknown -->
162
+ <!-- - **License:** Unknown -->
163
+
164
+ ### Model Sources
165
+
166
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
167
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
168
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
169
+
170
+ ### Full Model Architecture
171
+
172
+ ```
173
+ SentenceTransformer(
174
+ (0): Transformer({'max_seq_length': 192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
175
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
176
+ (2): Normalize()
177
+ )
178
+ ```
179
+
180
+ ## Usage
181
+
182
+ ### Direct Usage (Sentence Transformers)
183
+
184
+ First install the Sentence Transformers library:
185
+
186
+ ```bash
187
+ pip install -U sentence-transformers
188
+ ```
189
+
190
+ Then you can load this model and run inference.
191
+ ```python
192
+ from sentence_transformers import SentenceTransformer
193
+
194
+ # Download from the 🤗 Hub
195
+ model = SentenceTransformer("sentence_transformers_model_id")
196
+ # Run inference
197
+ sentences = [
198
+ 'Οἱ οὖν στρατιῶται ὅτε ἐσταύρωσαν τὸν Ἰησοῦν ἔλαβον τὰ ἱμάτια αὐτοῦ καὶ ἐποίησαν τέσσερα μέρη, ἑκάστῳ στρατιώτῃ μέρος, καὶ τὸν χιτῶνα.',
199
+ 'Då nu krigsmännen hade korsfäst Jesus, togo de hans kläder och delade dem i fyra delar, en del åt var krigsman.',
200
+ 'Derefter fatta presterna på hans vink katafalken, lyfta den på sina skuldror och bära den med Simon till ett vid tvärskeppet beläget, för presterskapet afskildt rum.',
201
+ ]
202
+ embeddings = model.encode(sentences)
203
+ print(embeddings.shape)
204
+ # (3, 1024)
205
+
206
+ # Get the similarity scores for the embeddings
207
+ similarities = model.similarity(embeddings, embeddings)
208
+ print(similarities)
209
+ # tensor([[1.0000, 0.8955, 0.3442],
210
+ # [0.8955, 1.0000, 0.3027],
211
+ # [0.3442, 0.3027, 1.0000]])
212
+ ```
213
+
214
+ <!--
215
+ ### Direct Usage (Transformers)
216
+
217
+ <details><summary>Click to see the direct usage in Transformers</summary>
218
+
219
+ </details>
220
+ -->
221
+
222
+ <!--
223
+ ### Downstream Usage (Sentence Transformers)
224
+
225
+ You can finetune this model on your own dataset.
226
+
227
+ <details><summary>Click to expand</summary>
228
+
229
+ </details>
230
+ -->
231
+
232
+ <!--
233
+ ### Out-of-Scope Use
234
+
235
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
236
+ -->
237
+
238
+ ## Evaluation
239
+
240
+ ### Metrics
241
+
242
+ #### Information Retrieval
243
+
244
+ * Dataset: `labeled_src_to_swe`
245
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
246
+
247
+ | Metric | Value |
248
+ |:---------------------|:-----------|
249
+ | cosine_accuracy@1 | 0.7844 |
250
+ | cosine_accuracy@5 | 0.9368 |
251
+ | cosine_accuracy@10 | 0.9554 |
252
+ | cosine_accuracy@25 | 1.0 |
253
+ | cosine_precision@1 | 0.7844 |
254
+ | cosine_precision@5 | 0.1874 |
255
+ | cosine_precision@10 | 0.0955 |
256
+ | cosine_precision@25 | 0.04 |
257
+ | cosine_precision@100 | 0.01 |
258
+ | cosine_recall@1 | 0.7844 |
259
+ | cosine_recall@5 | 0.9368 |
260
+ | cosine_recall@10 | 0.9554 |
261
+ | cosine_recall@25 | 1.0 |
262
+ | cosine_recall@100 | 1.0 |
263
+ | **cosine_ndcg@10** | **0.8804** |
264
+ | cosine_mrr@10 | 0.8554 |
265
+ | cosine_map@100 | 0.8582 |
266
+
267
+ <!--
268
+ ## Bias, Risks and Limitations
269
+
270
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
271
+ -->
272
+
273
+ <!--
274
+ ### Recommendations
275
+
276
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
277
+ -->
278
+
279
+ ## Training Details
280
+
281
+ ### Training Datasets
282
+
283
+ #### triplets
284
+
285
+ * Dataset: triplets
286
+ * Size: 16,035 training samples
287
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
288
+ * Approximate statistics based on the first 1000 samples:
289
+ | | anchor | positive | negative |
290
+ |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
291
+ | type | string | string | string |
292
+ | details | <ul><li>min: 28 tokens</li><li>mean: 68.48 tokens</li><li>max: 145 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 39.62 tokens</li><li>max: 93 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 34.04 tokens</li><li>max: 192 tokens</li></ul> |
293
+ * Samples:
294
+ | anchor | positive | negative |
295
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
296
+ | <code>Οἱ οὖν στρατιῶται ὅτε ἐσταύρωσαν τὸν Ἰησοῦν ἔλαβον τὰ ἱμάτια αὐτοῦ καὶ ἐποίησαν τέσσερα μέρη, ἑκάστῳ στρατιώτῃ μέρος, καὶ τὸν χιτῶνα.</code> | <code>Då nu krigsmännen hade korsfäst Jesus, togo de hans kläder och delade dem i fyra delar, en del åt var krigsman.</code> | <code>Franciskanermunkarna kommo nu med sin bår och när de hade klädt honom i kåpa och rep, drogo de ned hufvan öfver hans anlete.</code> |
297
+ | <code>Οἱ οὖν στρατιῶται ὅτε ἐσταύρωσαν τὸν Ἰησοῦν ἔλαβον τὰ ἱμάτια αὐτοῦ καὶ ἐποίησαν τέσσερα μέρη, ἑκάστῳ στρατιώτῃ μέρος, καὶ τὸν χιτῶνα.</code> | <code>Då nu krigsmännen hade korsfäst Jesus, togo de hans kläder och delade dem i fyra delar, en del åt var krigsman.</code> | <code>De togo hans krus.</code> |
298
+ | <code>Οἱ οὖν στρατιῶται ὅτε ἐσταύρωσαν τὸν Ἰησοῦν ἔλαβον τὰ ἱμάτια αὐτοῦ καὶ ἐποίησαν τέσσερα μέρη, ἑκάστῳ στρατιώτῃ μέρος, καὶ τὸν χιτῶνα.</code> | <code>Då nu krigsmännen hade korsfäst Jesus, togo de hans kläder och delade dem i fyra delar, en del åt var krigsman.</code> | <code>Derefter fatta presterna på hans vink katafalken, lyfta den på sina skuldror och bära den med Simon till ett vid tvärskeppet beläget, för presterskapet afskildt rum.</code> |
299
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
300
+ ```json
301
+ {
302
+ "scale": 20.0,
303
+ "similarity_fct": "cos_sim",
304
+ "gather_across_devices": false,
305
+ "directions": [
306
+ "query_to_doc"
307
+ ],
308
+ "partition_mode": "joint",
309
+ "hardness_mode": null,
310
+ "hardness_strength": 0.0
311
+ }
312
+ ```
313
+
314
+ #### pairs
315
+
316
+ * Dataset: pairs
317
+ * Size: 17,371 training samples
318
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
319
+ * Approximate statistics based on the first 1000 samples:
320
+ | | sentence1 | sentence2 | label |
321
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
322
+ | type | string | string | int |
323
+ | details | <ul><li>min: 7 tokens</li><li>mean: 93.08 tokens</li><li>max: 192 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 45.67 tokens</li><li>max: 192 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
324
+ * Samples:
325
+ | sentence1 | sentence2 | label |
326
+ |:-----------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
327
+ | <code>λέγω γὰρ ὑμῖν, οὐ μὴ πίω ἀπὸ τοῦ νῦν ἀπὸ τοῦ γενήματος τῆς ἀμπέλου ἕως οὗ ἡ βασιλεία τοῦ θεοῦ ἔλθῃ.</code> | <code>ty jag säger eder att jag icke mer skall fira denna högtid, förrän den kommer till fullbordan i Guds rike.»</code> | <code>0</code> |
328
+ | <code>ἐξῆλθεν οὖν ὁ μαθητὴς ὁ ἄλλος ὁ γνωστὸς τοῦ ἀρχιερέως καὶ εἶπεν τῇ θυρωρῷ καὶ εἰσήγαγεν τὸν Πέτρον.</code> | <code>Den lärjungen var bekant med översteprästen och gick med Jesus in på översteprästens gård; men Petrus stod utanför vid porten.</code> | <code>0</code> |
329
+ | <code>ὥστε οἱ ἐκ πίστεως εὐλογοῦνται σὺν τῷ πιστῷ Ἀβραάμ.</code> | <code>Veten därför, att de som äro av tron, de äro Abrahams barn.</code> | <code>0</code> |
330
+ * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
331
+
332
+ ### Training Hyperparameters
333
+ #### Non-Default Hyperparameters
334
+
335
+ - `per_device_train_batch_size`: 32
336
+ - `num_train_epochs`: 1.0
337
+ - `learning_rate`: 2e-06
338
+ - `warmup_steps`: 0.05
339
+ - `bf16`: True
340
+ - `eval_strategy`: steps
341
+ - `per_device_eval_batch_size`: 128
342
+ - `load_best_model_at_end`: True
343
+ - `dataloader_drop_last`: True
344
+ - `dataloader_num_workers`: 4
345
+ - `ddp_find_unused_parameters`: True
346
+ - `batch_sampler`: no_duplicates
347
+ - `multi_dataset_batch_sampler`: round_robin
348
+
349
+ #### All Hyperparameters
350
+ <details><summary>Click to expand</summary>
351
+
352
+ - `per_device_train_batch_size`: 32
353
+ - `num_train_epochs`: 1.0
354
+ - `max_steps`: -1
355
+ - `learning_rate`: 2e-06
356
+ - `lr_scheduler_type`: linear
357
+ - `lr_scheduler_kwargs`: None
358
+ - `warmup_steps`: 0.05
359
+ - `optim`: adamw_torch_fused
360
+ - `optim_args`: None
361
+ - `weight_decay`: 0.0
362
+ - `adam_beta1`: 0.9
363
+ - `adam_beta2`: 0.999
364
+ - `adam_epsilon`: 1e-08
365
+ - `optim_target_modules`: None
366
+ - `gradient_accumulation_steps`: 1
367
+ - `average_tokens_across_devices`: True
368
+ - `max_grad_norm`: 1.0
369
+ - `label_smoothing_factor`: 0.0
370
+ - `bf16`: True
371
+ - `fp16`: False
372
+ - `bf16_full_eval`: False
373
+ - `fp16_full_eval`: False
374
+ - `tf32`: None
375
+ - `gradient_checkpointing`: False
376
+ - `gradient_checkpointing_kwargs`: None
377
+ - `torch_compile`: False
378
+ - `torch_compile_backend`: None
379
+ - `torch_compile_mode`: None
380
+ - `use_liger_kernel`: False
381
+ - `liger_kernel_config`: None
382
+ - `use_cache`: False
383
+ - `neftune_noise_alpha`: None
384
+ - `torch_empty_cache_steps`: None
385
+ - `auto_find_batch_size`: False
386
+ - `log_on_each_node`: True
387
+ - `logging_nan_inf_filter`: True
388
+ - `include_num_input_tokens_seen`: no
389
+ - `log_level`: passive
390
+ - `log_level_replica`: warning
391
+ - `disable_tqdm`: False
392
+ - `project`: huggingface
393
+ - `trackio_space_id`: trackio
394
+ - `eval_strategy`: steps
395
+ - `per_device_eval_batch_size`: 128
396
+ - `prediction_loss_only`: True
397
+ - `eval_on_start`: False
398
+ - `eval_do_concat_batches`: True
399
+ - `eval_use_gather_object`: False
400
+ - `eval_accumulation_steps`: None
401
+ - `include_for_metrics`: []
402
+ - `batch_eval_metrics`: False
403
+ - `save_only_model`: False
404
+ - `save_on_each_node`: False
405
+ - `enable_jit_checkpoint`: False
406
+ - `push_to_hub`: False
407
+ - `hub_private_repo`: None
408
+ - `hub_model_id`: None
409
+ - `hub_strategy`: every_save
410
+ - `hub_always_push`: False
411
+ - `hub_revision`: None
412
+ - `load_best_model_at_end`: True
413
+ - `ignore_data_skip`: False
414
+ - `restore_callback_states_from_checkpoint`: False
415
+ - `full_determinism`: False
416
+ - `seed`: 42
417
+ - `data_seed`: None
418
+ - `use_cpu`: False
419
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
420
+ - `parallelism_config`: None
421
+ - `dataloader_drop_last`: True
422
+ - `dataloader_num_workers`: 4
423
+ - `dataloader_pin_memory`: True
424
+ - `dataloader_persistent_workers`: False
425
+ - `dataloader_prefetch_factor`: None
426
+ - `remove_unused_columns`: True
427
+ - `label_names`: None
428
+ - `train_sampling_strategy`: random
429
+ - `length_column_name`: length
430
+ - `ddp_find_unused_parameters`: True
431
+ - `ddp_bucket_cap_mb`: None
432
+ - `ddp_broadcast_buffers`: False
433
+ - `ddp_backend`: None
434
+ - `ddp_timeout`: 1800
435
+ - `fsdp`: []
436
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
437
+ - `deepspeed`: None
438
+ - `debug`: []
439
+ - `skip_memory_metrics`: True
440
+ - `do_predict`: False
441
+ - `resume_from_checkpoint`: None
442
+ - `warmup_ratio`: None
443
+ - `local_rank`: -1
444
+ - `prompts`: None
445
+ - `batch_sampler`: no_duplicates
446
+ - `multi_dataset_batch_sampler`: round_robin
447
+ - `router_mapping`: {}
448
+ - `learning_rate_mapping`: {}
449
+
450
+ </details>
451
+
452
+ ### Training Logs
453
+ | Epoch | Step | Training Loss | labeled_src_to_swe_cosine_ndcg@10 |
454
+ |:-------:|:-------:|:-------------:|:---------------------------------:|
455
+ | -1 | -1 | - | 0.8786 |
456
+ | 0.04 | 10 | 0.0745 | - |
457
+ | 0.08 | 20 | 0.0752 | - |
458
+ | 0.1 | 25 | - | 0.8772 |
459
+ | 0.12 | 30 | 0.0674 | - |
460
+ | 0.16 | 40 | 0.0577 | - |
461
+ | 0.2 | 50 | 0.0564 | 0.8777 |
462
+ | 0.24 | 60 | 0.0498 | - |
463
+ | 0.28 | 70 | 0.0479 | - |
464
+ | 0.3 | 75 | - | 0.8796 |
465
+ | 0.32 | 80 | 0.0439 | - |
466
+ | 0.36 | 90 | 0.0432 | - |
467
+ | 0.4 | 100 | 0.0406 | 0.8796 |
468
+ | 0.44 | 110 | 0.0400 | - |
469
+ | 0.48 | 120 | 0.0356 | - |
470
+ | 0.5 | 125 | - | 0.8796 |
471
+ | 0.52 | 130 | 0.0417 | - |
472
+ | 0.56 | 140 | 0.0421 | - |
473
+ | 0.6 | 150 | 0.0534 | 0.8796 |
474
+ | 0.64 | 160 | 0.0335 | - |
475
+ | 0.68 | 170 | 0.0370 | - |
476
+ | **0.7** | **175** | **-** | **0.8809** |
477
+ | 0.72 | 180 | 0.0385 | - |
478
+ | 0.76 | 190 | 0.0326 | - |
479
+ | 0.8 | 200 | 0.0344 | 0.8804 |
480
+ | 0.84 | 210 | 0.0337 | - |
481
+ | 0.88 | 220 | 0.0313 | - |
482
+ | 0.9 | 225 | - | 0.8804 |
483
+ | 0.92 | 230 | 0.0335 | - |
484
+ | 0.96 | 240 | 0.0328 | - |
485
+
486
+ * The bold row denotes the saved checkpoint.
487
+
488
+ ### Framework Versions
489
+ - Python: 3.12.3
490
+ - Sentence Transformers: 5.3.0
491
+ - Transformers: 5.3.0
492
+ - PyTorch: 2.9.0+cu128
493
+ - Accelerate: 1.13.0
494
+ - Datasets: 4.8.5
495
+ - Tokenizers: 0.22.2
496
+
497
+ ## Citation
498
+
499
+ ### BibTeX
500
+
501
+ #### Sentence Transformers
502
+ ```bibtex
503
+ @inproceedings{reimers-2019-sentence-bert,
504
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
505
+ author = "Reimers, Nils and Gurevych, Iryna",
506
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
507
+ month = "11",
508
+ year = "2019",
509
+ publisher = "Association for Computational Linguistics",
510
+ url = "https://arxiv.org/abs/1908.10084",
511
+ }
512
+ ```
513
+
514
+ #### MultipleNegativesRankingLoss
515
+ ```bibtex
516
+ @misc{oord2019representationlearningcontrastivepredictive,
517
+ title={Representation Learning with Contrastive Predictive Coding},
518
+ author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
519
+ year={2019},
520
+ eprint={1807.03748},
521
+ archivePrefix={arXiv},
522
+ primaryClass={cs.LG},
523
+ url={https://arxiv.org/abs/1807.03748},
524
+ }
525
+ ```
526
+
527
+ <!--
528
+ ## Glossary
529
+
530
+ *Clearly define terms in order to be accessible across audiences.*
531
+ -->
532
+
533
+ <!--
534
+ ## Model Card Authors
535
+
536
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
537
+ -->
538
+
539
+ <!--
540
+ ## Model Card Contact
541
+
542
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
543
+ -->
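The evaluation numbers in the README above can be cross-checked with a little arithmetic. In the table, `cosine_recall@k` equals `cosine_accuracy@k` for every k, which would be consistent with each query having exactly one relevant document. That is an inference, since the README does not state it. Under that assumption, precision@k is just accuracy@k divided by k. The sketch below uses a hypothetical helper, `precision_from_accuracy`, to check this against the reported values.

```python
# Values copied from the model-index metrics in the README above.
accuracy_at_k = {
    1: 0.7843866171003717,
    5: 0.9368029739776952,
    10: 0.9553903345724907,
    25: 1.0,
}
precision_at_k = {
    1: 0.7843866171003717,
    5: 0.18736059479553904,
    10: 0.09553903345724908,
    25: 0.04000000000000001,
}

def precision_from_accuracy(accuracy, k):
    """Assuming one relevant document per query: precision@k = accuracy@k / k."""
    return accuracy / k

for k in sorted(accuracy_at_k):
    derived = precision_from_accuracy(accuracy_at_k[k], k)
    matches = abs(derived - precision_at_k[k]) < 1e-12
    print(f"k={k}: derived precision {derived:.6f}, matches reported value: {matches}")
```

Read this way, the low precision@25 and precision@100 values are not a sign of poor retrieval. They are the ceiling (1/25 and 1/100) for a task with a single correct target per query.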
config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "add_cross_attention": false,
3
+ "architectures": [
4
+ "XLMRobertaModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 0,
8
+ "classifier_dropout": null,
9
+ "dtype": "float32",
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 1024,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 4096,
16
+ "is_decoder": false,
17
+ "layer_norm_eps": 1e-05,
18
+ "max_position_embeddings": 8194,
19
+ "model_type": "xlm-roberta",
20
+ "num_attention_heads": 16,
21
+ "num_hidden_layers": 24,
22
+ "output_past": true,
23
+ "pad_token_id": 1,
24
+ "position_embedding_type": "absolute",
25
+ "tie_word_embeddings": true,
26
+ "transformers_version": "5.3.0",
27
+ "type_vocab_size": 1,
28
+ "use_cache": true,
29
+ "vocab_size": 250002
30
+ }
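One detail in this config is that `max_position_embeddings` is 8194 rather than a round 8192. A plausible reading, assuming the usual RoBERTa convention in which position ids for real tokens start at `pad_token_id + 1`, is that the first two slots are reserved by that offset, leaving 8192 usable positions. The sketch below is a pure-Python re-implementation for illustration only. `position_ids` is a hypothetical name and the two middle token ids are arbitrary.

```python
def position_ids(input_ids, pad_token_id=1):
    """RoBERTa-style position ids: padding keeps pad_token_id,
    real tokens count up from pad_token_id + 1."""
    ids = []
    seen = 0
    for token in input_ids:
        if token == pad_token_id:
            ids.append(pad_token_id)
        else:
            seen += 1
            ids.append(seen + pad_token_id)
    return ids

# <s>=0 and </s>=2 come from config.json; 581 and 3299 are arbitrary token ids.
example = [0, 581, 3299, 2, 1, 1]
print(position_ids(example))

max_position_embeddings = 8194
pad_token_id = 1
usable_positions = max_position_embeddings - (pad_token_id + 1)
print(usable_positions)
```

Either way, the configured `max_seq_length` of 192 is far below this limit, so the position table is not the binding constraint for this release.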
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "5.3.0",
4
+ "transformers": "5.3.0",
5
+ "pytorch": "2.9.0+cu128"
6
+ },
7
+ "model_type": "SentenceTransformer",
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "cosine"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afaeaeab7a564b3714920717ae627a4d01639df919707e86d2ca6be3eee0f2b7
3
+ size 2271064408
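The LFS pointer size can be sanity-checked against `config.json`. The sketch below counts parameters under two assumptions that the commit does not confirm: the standard BERT/RoBERTa encoder layout (biases on every linear layer, two LayerNorms per block) and a pooler head stored in the checkpoint. It then multiplies by 4 bytes for `float32`.

```python
# Values taken from config.json in this commit.
vocab_size = 250002
hidden = 1024
intermediate = 4096
num_layers = 24
max_positions = 8194
type_vocab = 1

def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden  # weight + bias

# Word, position and token-type embeddings, followed by one LayerNorm.
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

per_layer = (
    4 * linear(hidden, hidden)       # query, key, value, attention output
    + layer_norm                     # post-attention LayerNorm
    + linear(hidden, intermediate)   # feed-forward up-projection
    + linear(intermediate, hidden)   # feed-forward down-projection
    + layer_norm                     # post-feed-forward LayerNorm
)

pooler = linear(hidden, hidden)  # assumption: the pooler is saved in the checkpoint

total_params = embeddings + num_layers * per_layer + pooler
weight_bytes = total_params * 4  # float32
file_size = 2271064408           # from the LFS pointer above

print(total_params, weight_bytes, file_size - weight_bytes)
```

Under these assumptions the weights account for all but a few tens of kilobytes of the file. That remainder would be consistent with the safetensors metadata header, though that is an inference rather than something the diff shows.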
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
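To make the module order explicit, here is a small sketch that parses the same JSON and lists the pipeline stages by `idx`. It uses only the standard library and does not load the model. Treating the empty `path` as the repository root follows the file layout in this commit, where `config.json` and `model.safetensors` sit at the top level.

```python
import json

# Content of modules.json as added in this commit.
modules_json = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]
"""

modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])

pipeline = []
for module in modules:
    class_name = module["type"].rsplit(".", 1)[-1]
    location = module["path"] or "(repository root)"
    pipeline.append(class_name)
    print(f"{module['idx']}: {class_name:<12} loaded from {location}")

print(" -> ".join(pipeline))
```

The resulting order, Transformer then Pooling then Normalize, matches the "Full Model Architecture" listing in the README.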
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 192,
3
+ "do_lower_case": false
4
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:826dde65dfa2758657bbb0543df764e99dd9bb616f972dac0240af0a226b0746
3
+ size 17098338
tokenizer_config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "add_prefix_space": true,
3
+ "backend": "tokenizers",
4
+ "bos_token": "<s>",
5
+ "clean_up_tokenization_spaces": true,
6
+ "cls_token": "<s>",
7
+ "eos_token": "</s>",
8
+ "is_local": true,
9
+ "mask_token": "<mask>",
10
+ "max_length": 256,
11
+ "model_max_length": 192,
12
+ "pad_to_multiple_of": null,
13
+ "pad_token": "<pad>",
14
+ "pad_token_type_id": 0,
15
+ "padding_side": "right",
16
+ "sep_token": "</s>",
17
+ "sp_model_kwargs": {},
18
+ "stride": 0,
19
+ "tokenizer_class": "XLMRobertaTokenizer",
20
+ "truncation_side": "right",
21
+ "truncation_strategy": "longest_first",
22
+ "unk_token": "<unk>"
23
+ }
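The settings above (`model_max_length` of 192, right-side truncation and padding) together with the special-token ids in `config.json` (`<s>`=0, `<pad>`=1, `</s>`=2) suggest how a single input sequence is framed. The following is a minimal sketch under those assumptions, not the tokenizer's actual implementation. `frame_sequence` is a hypothetical name and the content token ids are arbitrary.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # from config.json in this commit

def frame_sequence(token_ids, max_length=192, pad_to=None):
    """Wrap content tokens as <s> ... </s>, truncating on the right so the
    total length (including both special tokens) stays within max_length.
    Optionally right-pad with <pad> up to pad_to."""
    content = token_ids[: max_length - 2]  # reserve two slots for <s> and </s>
    framed = [BOS_ID] + content + [EOS_ID]
    attention_mask = [1] * len(framed)
    if pad_to is not None and len(framed) < pad_to:
        padding = pad_to - len(framed)
        framed += [PAD_ID] * padding
        attention_mask += [0] * padding
    return framed, attention_mask

# Short input, padded to length 8.
ids, mask = frame_sequence([581, 3299, 7], pad_to=8)
print(ids, mask)

# Over-long input: 500 arbitrary content tokens are cut to fit 192.
long_ids, long_mask = frame_sequence(list(range(10, 510)))
print(len(long_ids), long_ids[0], long_ids[-1])
```

Note that `max_length` in this file is 256 while `model_max_length` and the `max_seq_length` in `sentence_bert_config.json` are both 192. The README reports 192 as the maximum sequence length, so the sketch uses that value.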