ponteineptique committed
Commit 2f67c47 · verified · 1 Parent(s): 74d0290

Upload 10 files

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
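
This pooling head turns the ModernBERT token embeddings into a single 768-dimensional sentence vector by averaging over real (non-padding) tokens; `pooling_mode_mean_tokens` is the only mode enabled. A minimal sketch of masked mean pooling, assuming PyTorch tensors (an illustration, not the library's internal code):

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average each sequence's token embeddings, ignoring padding positions."""
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len) of 0/1
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sequence
    return summed / counts                          # (batch, 768) sentence embeddings
```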
README.md CHANGED
@@ -1,3 +1,457 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:99840
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: na. es. sui .s. siqs aut̃. It nrͣ hodie ꝙ ea demũ sit ma. ⁊ a.
+     g̃iadħ de ni ma. ⁊ a. siadħ de nolib ubis ⁊ młrib e Uia ex ꝯcubinis filu nascũt᷑
+     uales ⁊ te nalib faluꝰ usdeimꝰ ⁊ de mr̃ib eoꝵ .s. qui dicãt᷑ nales. ⁊ ꝙͣtũ
+     pocut eiꝰ relinqͥ. ł int̾ iuiuoꝵ. ⁊ ĩ ucima nol̃tate quoqͣ.tdari ⁊ postea ꝓseqũ
+     teꝰ denorẽ ice. dicemꝰ quib ex cai lb̾i uales fi ant sus .i. redigãt᷑ in potatẽ
+     ꝑentũ. ⁊ de h tͣctatu
+   sentences:
+   - na. es. sui .s. siqs aut̃. It nrͣ hodie ꝙ ea demũ sit ma. ⁊ a. g̃iadħ de ni ma.
+     ⁊ a. siadħ de nolib ubis ⁊ młrib e Uia ex ꝯcubinis filu nascũt᷑ uales ⁊ te
+     nalib faluꝰ usdeimꝰ ⁊ de mr̃ib eoꝵ .s. qui dicãt᷑ nales. ⁊ ꝙͣtũ pocut eiꝰ
+     relinqͥ. ł int̾ iuiuoꝵ. ⁊ ĩ ucima nol̃tate quoqͣ.tdari ⁊ postea ꝓseqũ teꝰ denorẽ
+     ice. dicemꝰ quib ex cai lb̾i uales fi ant sus .i. redigãt᷑ in potatẽ ꝑentũ.
+     ⁊ de h tͣctatu
+   - 'illius excubaret: ibidem ꝓ fide xp̃i aꝑsecutorib tradita est qi cum digna &
+     eumenia: & eupe Ciuitate falare: passio scõtu graciliani. & felicissime iurg
+     nis. Quoꝵ ora ꝓxp̃o contusi lapidib. dehinc gladio ꝑcusi optatam martytii suscepert̃
+     palmam. idus augusti.'
+   - 'Et nos Poncius Ugo, Dei gracia Impuriarum comes predictus, promitimus vobis Raimundo
+     Xetmario, nomine dicte domne Marchisie, predictam forciam deffendere ab omni homine
+     qui a te directum accipere noluerit vel facere. Assigno etiam vobis et dono in
+     feudum, in esmendam dicti careu, dictos V squillatas milii, annuatim accipiendas
+     in festo Omnium Sanctorum, in omnibus nostris directis et taschis quas accipimus
+     in stagno de Cils. Actum est hoc VII kalendas novembris anno Domini MºCCºLXXº
+     octavo. Sig(+)num Raimundi Xetmarii predicti, qui hoc firmo et laudo. Sig(+)num
+     Ponci Ugonis, Dei gracia comis Impuriarum predicti, qui hoc firmamus et laudamus.
+     Testes huius rei sunt: Bernardus de Palaciolo de Villanova, et Berengarius de
+     Lanciano, et Guilelmus Alferici et Simon de Trilia, milites.'
+ - source_sentence: co segnito se dicitcis. Quid qͣntis. Qiditeit ti. nabiꝙ s int̾prctatum
+     magr̃ubhtͥtał. di ca cis docnite r itdete uencrunt ⁊ intertubi ibi manceto ⁊ mancrunt
+     sbidit illo. Hora aut̃ trar q̃i deciina. Eint auł anoicas tẽ srmo ni petͥ unis
+     ce ouioui qui aud itrant a ioilt ⁊ situti fucrant guns. Inicrut hu trem suilss
+     mont̃ ⁊ dicit ci. quutumui inesilq intͥpꝰ fͤ Et arduxi cum ad ihͥm. Intuit uit᷑issctil
+     ois.
+   sentences:
+   - 'Si qua Deos tangit pietas, Astraeaque vivit, Castigatque Reum torto Rhamnusia
+     plumbo: Et te poena manet Physignathe, nec Rhadamanthi Effugies uncum.'
+   - Sic absentibus ducibus praedictis et caeteris regni primatibus, reconciliatus
+     est rex Saxonibus simulatorie, et cum eis ad usque Goslariam pervenit, non multum
+     tamen confidens in illis. Roudolfus dux et caeteri rebelles reconciliantur regi.
+   - co segnito se dicitcis. Quid qͣntis. Qiditeit ti. nabiꝙ s int̾prctatum magr̃ubhtͥtał.
+     di ca cis docnite r itdete uencrunt ⁊ intertubi ibi manceto ⁊ mancrunt sbidit
+     illo. Hora aut̃ trar q̃i deciina. Eint auł anoicas tẽ srmo ni petͥ unis ce ouioui
+     qui aud itrant a ioilt ⁊ situti fucrant guns. Inicrut hu trem suilss mont̃ ⁊ dicit
+     ci. quutumui inesilq intͥpꝰ fͤ Et arduxi cum ad ihͥm. Intuit uit᷑issctil ois.
+ - source_sentence: 'israhelitieo popło inͦ deserto p̃cessit: ipse in euan geliis deserta
+     gentiũ uisitauit: Et ꝑ quẽ tunc in srna huic generationi manna defluxit: ipse
+     xp̃ianȩ genti. corporis sanguinis sui manna in xecclesia subministrat.'
+   sentences:
+   - Sed ne quid videretur omissum, aut nostro potuisset dubium cordi remanere, ad
+     beati Petri sacratissimum corpus districta eum ex abundanti fecimus sacramenta
+     praebere. Quibus praestitis, magna sumus exsultatione gavisi, quod hujuscemodi
+     experimento innocentia ejus evidenter enituit. Pro qua re gloria vestra praedictum
+     virum cum omni charitate suscipiat, et reverentiam ei, qualem sacerdoti decet,
+     exhibeat, nec quaedam cordibus remaneat de iis quae sunt jam purgata dubietas.
+     Sed ita suprascripto vos episcopo devotissime oportet in omnibus adhaerere, ut
+     congrue decenterque Deum in ejus persona cujus minister est videamini honorare.
+     EPISTOLA XXXIV.
+   - La char d'une joe de beuf tranchee par lesches et mise en pasté, et puis, quant
+     elle est cuicte, gecter la saulse d'un halebran dedens. En la haste menue d'un
+     pourcel n'a aucun appareil a faire fors le laver et embrocher et enveloper de
+     sa taye, et cuire longuement. Poules farcies coulourees ou dorees.
+   - 'israhelitieo popło inͦ deserto p̃cessit: ipse in euan geliis deserta gentiũ
+     uisitauit: Et ꝑ quẽ tunc in srna huic generationi manna defluxit: ipse xp̃ianȩ
+     genti. corporis sanguinis sui manna in xecclesia subministrat.'
+ - source_sentence: 'ad renonationem ietłm ꝑueniret. Ee xagesim oita scdo anno regno
+     rogis chasdeoꝵ subũso. cui ad huc octo anti: ad regnandͥ: restabant. oꝵ etiam
+     sexta insio damelis ostendit pp̃ cãm : ettinctim. datiꝰ qui medis imꝑabat u septima
+     uisio danieł ostendit memoratori tegno fucces sit. Cmꝰ pͥmo anno regtui: ut decimauisio
+     ba ncł ostendit. supputatis unis uidens aꝓpĩ. qͣre tempꝰ reũsionis. que ꝑ leremiã
+     fũat ꝓtais.'
+   sentences:
+   - 'Nos igitur attendentes, quòd ad Religionem conversi, si fuerint in suis locis
+     laudabiliter conversati, illegitimitatis macula non obstante, juris permissione
+     licenter possunt ad Ordines promoveri b , discretioni tuæ præsentium auctoritate
+     committimus, quatenus cum eodem Presbytero, ejus ad hoc suffragantibus meritis,
+     super quibus tuam intendimus conscientiam onerare, quòd hujusmodi non obstante
+     defectu in susceptis Ordinibus ministrare, & ad regulares administrationes dicti
+     Ordinis dumtaxat assumi valeat, auctoritate nostra dispenses, prout secundùm Deum
+     animæ suæ saluti videris expedire: Ita tamen quòd dictus Conradus nullatenus præficiatur
+     in eodem Ordine in Ministrum . Datum Romæ apud Sanctam Mariam Majorem sexto Nonas
+     Maii, Pontificatus nostri Anno Secundo.'
+   - Nam intellectum proferentis in eo significare uox dicitur, quod ipsum auditori
+     manifestat, dum consimilem in auditore generat. Unde Priscianus articulatam, id
+     est significatiuam, uocem esse dicit, quae coartatur cum sensu proferentis, id
+     est quam ipse proferens intendit proferre ad manifestandum intellectum suum. In
+     quo quidem uocem articulatam, id est significatiuam, eum accipere dicunt quantum
+     ad intellectum proferentis quem manifestat, non ad intellectum auditoris, quem
+     generat.
+   - 'ad renonationem ietłm ꝑueniret. Ee xagesim oita scdo anno regno rogis chasdeoꝵ
+     subũso. cui ad huc octo anti: ad regnandͥ: restabant. oꝵ etiam sexta insio damelis
+     ostendit pp̃ cãm : ettinctim. datiꝰ qui medis imꝑabat u septima uisio danieł
+     ostendit memoratori tegno fucces sit. Cmꝰ pͥmo anno regtui: ut decimauisio ba
+     ncł ostendit. supputatis unis uidens aꝓpĩ. qͣre tempꝰ reũsionis. que ꝑ leremiã
+     fũat ꝓtais.'
+ - source_sentence: 'cta test̾i sur: u̾bi qd̾ madauit in mil egones. Q uod disposunt
+     ad abrahã. mũti sui ad p̃saaci Et statuit il acob ĩ p̾ceptũ: ⁊ isrł mn testiñ
+     etꝰ Dices tibi dabo t̾ram chanaan: fu ncdũ heditatis ur̃e. Dũ e̾e̾nt nũo ocui.
+     paucissimi ⁊ ĩcole ouis. Et ꝑtͣni eẽt de gnͣte ĩ gentẽ: ⁊ de regno ad ulũ
+     alterũ. Non reliquit hoĩem'
+   sentences:
+   - p̾mioꝵ. p̃s. b̾ildixit finis tuus inte. Et ĩminitas apee cato. Qua xp̃c donatus
+     e̾ ps. p̾ucinsti eũ i bñ. dicidis ¶Infernans quo dupiex .s. adinacio. ps. laudat᷑ͣ
+     ptc̃ce indesidus aĩe sue ⁊ ñquis bñdi. et cũ quis sibi tribuit bona que ht̃
+     atco. Iob. timebat enĩ ne forte peccau̾int fuii eius. ⁊ bñdix̾int deo incordib
+     suis. Corꝑans ẽ ad carnis delecta tr̃em us. or̃s caro feñ. ⁊ oĩs gła euis qiͣ
+     d̾r ꝑ ysaiam. ue qͥ niungitis domũ addom̃. ⁊ agr̃ ago copłatis us ad t̾minũ
+     ioci. Nñquid ħ̾itabitis uos so
+   - 'cta test̾i sur: u̾bi qd̾ madauit in mil egones. Q uod disposunt ad abrahã. mũti
+     sui ad p̃saaci Et statuit il acob ĩ p̾ceptũ: ⁊ isrł mn testiñ etꝰ Dices tibi
+     dabo t̾ram chanaan: fu ncdũ heditatis ur̃e. Dũ e̾e̾nt nũo ocui. paucissimi
+     ⁊ ĩcole ouis. Et ꝑtͣni eẽt de gnͣte ĩ gentẽ: ⁊ de regno ad ulũ alterũ. Non
+     reliquit hoĩem'
+   - 'uintia orientalium anglorũque fuerint gesta eccdesi astica pastim & strap̃tis
+     ul tra diqone pranũ partim re uerentis simi abbatis ̃ relatione conperimus: Atiuero
+     in prouintia lindissi que sint gesta eo fidemxp̃i queue suc cessia sacerdotalis
+     eꝓterit uollitteris reueren tissii antisteris ciniberti uł aliorũ fidebũ uirorum
+     anagubee alidigimus: D auterinnoa danhrsibrorum prouintia eo quo rem pu fueiu
+     api ꝑceperunt us quea sp̃ sens ꝑduersas regiones: ineccle siitio aecta non uno
+     quolibet auqtere sed fidelimnumerta umtestiũ nulie csar uel neminis sepoterant
+     adsertione cagnoui: eceptis his'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+ 
+ # SentenceTransformer
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on 99,840 sentence pairs. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'cta test̾i sur: u̾bi qd̾ madauit in mil egones. Q uod disposunt ad abrahã. mũti sui ad p̃saaci Et statuit il acob ĩ p̾ceptũ: ⁊ isrł mn testiñ etꝰ Dices tibi dabo t̾ram chanaan: fu ncdũ heditatis ur̃e. Dũ e̾e̾nt nũo ocui. paucissimi ⁊ ĩcole ouis. Et ꝑtͣni eẽt de gnͣte ĩ gentẽ: ⁊ de regno ad ulũ alterũ. Non reliquit hoĩem',
+     'cta test̾i sur: u̾bi qd̾ madauit in mil egones. Q uod disposunt ad abrahã. mũti sui ad p̃saaci Et statuit il acob ĩ p̾ceptũ: ⁊ isrł mn testiñ etꝰ Dices tibi dabo t̾ram chanaan: fu ncdũ heditatis ur̃e. Dũ e̾e̾nt nũo ocui. paucissimi ⁊ ĩcole ouis. Et ꝑtͣni eẽt de gnͣte ĩ gentẽ: ⁊ de regno ad ulũ alterũ. Non reliquit hoĩem',
+     'p̾mioꝵ. p̃s. b̾ildixit finis tuus inte. Et ĩminitas apee cato. Qua xp̃c donatus e̾ ps. p̾ucinsti eũ i bñ. dicidis ¶Infernans quo\uf1ac dupiex .s. adinacio. ps. laudat᷑ͣ ptc̃ce indesidus aĩe sue ⁊ ñquis bñdi. et cũ quis sibi tribuit bona que ht̃ atco. Iob. timebat enĩ ne forte peccau̾int fuii eius. ⁊ bñdix̾int deo incordib\uf1ac suis. Corꝑans ẽ ad carnis delecta tr̃em us. or̃s caro feñ. ⁊ oĩs gła euis qiͣ d̾r ꝑ ysaiam. ue qͥ niungitis domũ addom̃. ⁊ agr̃ ago copłatis us\uf1ac ad t̾minũ ioci. Nñquid ħ̾itabitis uos so',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 1.0000, 0.2812],
+ #         [1.0000, 1.0000, 0.2812],
+ #         [0.2812, 0.2812, 1.0000]])
+ ```
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 99,840 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 |
+   |:--------|:-----------|:-----------|
+   | type    | string | string |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 85.65 tokens</li><li>max: 473 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 85.65 tokens</li><li>max: 473 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>Per totum namque mundum est mundus; et mundum persequitur mundus, coinquinatus mundum, perditus redemptum, damnatus salvatum.</code> | <code>Per totum namque mundum est mundus; et mundum persequitur mundus, coinquinatus mundum, perditus redemptum, damnatus salvatum.</code> |
+   | <code>motꝰ siait supͣ sepe dixmꝰ gꝰ anteon nem generanonem est motus ge eti am aute generaitionem primi mobilis est mo tus go etiam motus est. inte p̾mum mo tum᷑ ꝙ est impossibile go fint hec caisa ꝙ motus non eet̾ sꝑ momĩ p̾tito iprẽ ꝙ primum mobile oportet᷑ prius generari mẽe et postea moneri qr absq dubio se queret᷑ ꝙ quedam mutatio eet̃ anteil</code> | <code>motꝰ siait supͣ sepe dixmꝰ gꝰ anteon nem generanonem est motus ge eti am aute generaitionem primi mobilis est mo tus go etiam motus est. inte p̾mum mo tum᷑ ꝙ est impossibile go fint hec caisa ꝙ motus non eet̾ sꝑ momĩ p̾tito iprẽ ꝙ primum mobile oportet᷑ prius generari mẽe et postea moneri qr absq dubio se queret᷑ ꝙ quedam mutatio eet̃ anteil</code> |
+   | <code>Dictum est, id quod in nomine confuse significaretur, in definitione quae fit enumeratione partium, aperiri atque explicari. Quod fieri non potest, nisi per quarumdam partium nuncupationem; nihil enim dum explicatur oratione, totum simul dici potest. Quae cum ita sint, cumque omnis hujusmodi definitio quaedam sit partium distributio, quatuor his modis fieri potest. Aut enim substantiales partes explicantur, aut proprietatis partes dicuntur, aut quasi totius membra enumerantur, aut tanquam species dividuntur. Substantiales partes explicantur, cum ex genere ac differentiis definitio constituitur. Genus enim quod singulariter praedicatur, speciei totum est. Id genus sumptum in definitione, pars quaedam fit. Non enim solum speciem complet, nisi adjiciantur etiam differentiae, in quibus eadem ratio quae in genere est. Nam cum ipsae singulariter dictae totam speciem claudant, in definitione sumptae, partes speciei fiunt, quia non solum speciem quidem esse designant, sed etiam genus.</code> | <code>Dictum est, id quod in nomine confuse significaretur, in definitione quae fit enumeratione partium, aperiri atque explicari. Quod fieri non potest, nisi per quarumdam partium nuncupationem; nihil enim dum explicatur oratione, totum simul dici potest. Quae cum ita sint, cumque omnis hujusmodi definitio quaedam sit partium distributio, quatuor his modis fieri potest. Aut enim substantiales partes explicantur, aut proprietatis partes dicuntur, aut quasi totius membra enumerantur, aut tanquam species dividuntur. Substantiales partes explicantur, cum ex genere ac differentiis definitio constituitur. Genus enim quod singulariter praedicatur, speciei totum est. Id genus sumptum in definitione, pars quaedam fit. Non enim solum speciem complet, nisi adjiciantur etiam differentiae, in quibus eadem ratio quae in genere est. Nam cum ipsae singulariter dictae totam speciem claudant, in definitione sumptae, partes speciei fiunt, quia non solum speciem quidem esse designant, sed etiam genus.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim",
+       "gather_across_devices": false
+   }
+   ```
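+ 
+   For intuition, this loss treats each <code>(sentence_0, sentence_1)</code> pair as a positive and uses every other in-batch <code>sentence_1</code> as a negative. A minimal sketch of the objective under those assumptions (not the library's exact implementation; `scale` is the 20.0 factor above):
+ 
+   ```python
+   import torch
+   import torch.nn.functional as F
+ 
+   def mnr_loss(a: torch.Tensor, b: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
+       """a, b: (batch, dim) embeddings where (a[i], b[i]) is a positive pair."""
+       a = F.normalize(a, dim=-1)
+       b = F.normalize(b, dim=-1)
+       scores = a @ b.T * scale  # (batch, batch) scaled cosine similarities
+       labels = torch.arange(scores.size(0), device=scores.device)
+       return F.cross_entropy(scores, labels)  # diagonal entries are the positives
+   ```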
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `per_device_train_batch_size`: 128
+ - `per_device_eval_batch_size`: 128
+ - `num_train_epochs`: 1
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 128
+ - `per_device_eval_batch_size`: 128
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch  | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 0.6410 | 500  | 0.1311        |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.12.11
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.56.0
+ - PyTorch: 2.8.0+cu128
+ - Accelerate: 1.10.1
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "architectures": [
+     "ModernBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 30001,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "cls",
+   "cls_token_id": 2,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 30000,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "local_attention": 128,
+   "local_rope_theta": 10000.0,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 1,
+   "repad_logits_with_grad": false,
+   "sep_token_id": 3,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "transformers_version": "4.56.0",
+   "vocab_size": 30002
+ }
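
The backbone is a ModernBERT encoder: 22 hidden layers, hidden size 768, a 30,002-token vocabulary, and 8,192 position embeddings with global attention every third layer. If you only need raw token embeddings, the backbone can also be loaded directly with Transformers; a sketch, with the Hub repo id left as a placeholder:

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "sentence_transformers_model_id"  # placeholder; use this model's actual Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)  # instantiated as ModernBertModel

inputs = tokenizer("Per totum namque mundum est mundus;", return_tensors="pt")
token_embeddings = model(**inputs).last_hidden_state  # (1, seq_len, 768)
```

Mean-pooling these token embeddings (as in the pooling sketch above) reproduces the SentenceTransformer output.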
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.56.0",
+     "pytorch": "2.8.0+cu128"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7084310b8e8efc0d1ec51a6dd4947bbf9c556ecaec00025309c0b5f82fb49477
+ size 533505776
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
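
modules.json declares the two-stage pipeline: module 0 is the Transformer at the repository root, module 1 the pooling head stored in `1_Pooling/`. `SentenceTransformer(...)` assembles this automatically when loading from the Hub; purely as an illustration, building the same pipeline by hand would look roughly like:

```python
from sentence_transformers import SentenceTransformer, models

# Placeholder repo id; substitute this model's actual Hub id.
transformer = models.Transformer("sentence_transformers_model_id", max_seq_length=8192)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 768
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[transformer, pooling])
```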
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<cls>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "<sep>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,71 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "<cls>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<sep>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30000": {
+       "content": "<eos>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30001": {
+       "content": "<bos>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<bos>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<cls>",
+   "eos_token": "<eos>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "model_max_length": 8192,
+   "pad_token": "<pad>",
+   "sep_token": "<sep>",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "<unk>"
+ }