lbourdois committed on
Commit f7f4e2f · verified · 1 Parent(s): 9e5e405

Improve language tag

Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1)
  1. README.md +679 -665
README.md CHANGED
@@ -1,666 +1,680 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2859594
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Qwen/Qwen2.5-0.5B-Instruct
widget:
- source_sentence: How old is Garry Marshall?
  sentences:
  - 'Garry Marshall

    On the morning of July 19, 2016, Marshall died at a hospital in Burbank, California
    at the age of 81 due to complications of pneumonia after suffering a stroke.[20][21]'
  - 'Gregg Marshall

    Michael Gregg Marshall (born February 27, 1963) is an American college basketball
    coach who currently leads the Shockers team at Wichita State University. Marshall
    has coached his teams to appearances in the NCAA Men''s Division I Basketball
    Tournament in twelve of his eighteen years as a head coach. He is the most successful
    head coach in Wichita State University history (261 wins), and is also the most
    successful head coach in Winthrop University history (194 wins).'
  - 'Guillotine

    For a period of time after its invention, the guillotine was called a louisette.
    However, it was later named after Guillotin who had proposed that a less painful
    method of execution should be found in place of the breaking wheel, though he
    opposed the death penalty and bemoaned the association of the guillotine with
    his name.'
- source_sentence: Are there cherry trees in Cherry Springs State Park?
  sentences:
  - 'Cherry Springs State Park

    Awards and press recognition have come to Cherry Springs and its staff. Thom Bemus,
    who initiated and coordinates the Stars-n-Parks program, was named DCNR''s 2002
    Volunteer of the Year.[66] In 2007 the park''s Dark Sky Programming and staff
    received the Environmental Education Excellence in Programming award from the
    Pennsylvania Recreation and Parks Society.[67] Operations manager Chip Harrison
    and his wife Maxine, who directs the Dark Sky Fund, received a 2008 award from
    the Pennsylvania Outdoor Lighting Council for "steadfast adherence and active
    promotion of the principles of responsible outdoor lighting at Cherry Springs
    State Park".[68] The DCNR has named Cherry Springs one of "25 Must-See Pennsylvania
    State Parks", specifically for having the "darkest night skies on the east coast".[69]
    Cherry Springs State Park was featured in the national press in 2003 when USA
    Today named it one of "10 Great Places to get some stars in your eyes",[70] in
    2006 when National Geographic Adventure featured it in "Pennsylvania: The Wild,
    Wild East",[71] and in The New York Times in 2007.[53] All these were before it
    was named an International Dark Sky Park by the International Dark-Sky Association
    in 2008.[38]'
  - 'Cantonese

    Although Cantonese shares a lot of vocabulary with Mandarin, the two varieties
    are mutually unintelligible because of differences in pronunciation, grammar and
    lexicon. Sentence structure, in particular the placement of verbs, sometimes differs
    between the two varieties. A notable difference between Cantonese and Mandarin
    is how the spoken word is written; both can be recorded verbatim, but very few
    Cantonese speakers are knowledgeable in the full Cantonese written vocabulary,
    so a non-verbatim formalized written form is adopted, which is more akin to the
    Mandarin written form.[4][5] This results in the situation in which a Cantonese
    and a Mandarin text may look similar but are pronounced differently.'
  - 'Cherry Springs State Park

    Cherry Springs State Park is an 82-acre (33 ha)[a] Pennsylvania state park in
    Potter County, Pennsylvania, United States. The park was created from land within
    the Susquehannock State Forest, and is on Pennsylvania Route 44 in West Branch
    Township. Cherry Springs, named for a large stand of Black Cherry trees in the
    park, is atop the dissected Allegheny Plateau at an elevation of 2,300 feet (701 m).
    It is popular with astronomers and stargazers for having "some of the darkest
    night skies on the east coast" of the United States, and was chosen by the Pennsylvania
    Department of Conservation and Natural Resources (DCNR) and its Bureau of Parks
    as one of "25 Must-See Pennsylvania State Parks".[4]'
- source_sentence: How many regions are in Belgium?
  sentences:
  - 'Pine City, Minnesota

    Pine City is a city in Pine County, Minnesota, in East Central Minnesota. Pine
    City is the county seat of, and the largest city in, Pine County.[7] A portion
    of the city is located on the Mille Lacs Indian Reservation. Founded as a railway
    town, it quickly became a logging community and the surrounding lakes made it
    a resort town. Today, it is an arts town and commuter town to jobs in the Minneapolis–Saint
    Paul metropolitan area.[8] It is also a green city.[9] The population was 3,127
    at the 2010 census.'
  - 'Provinces of Belgium

    The country of Belgium is divided into three regions. Two of these regions, the
    Flemish Region or Flanders, and Walloon Region, or Wallonia, are each subdivided
    into five provinces. The third region, the Brussels-Capital Region, is not divided
    into provinces, as it was originally only a small part of a province itself.'
  - 'United Belgian States

    The United Belgian States was a confederal republic of eight provinces which had
    their own governments, were sovereign and independent, and were governed directly
    by the Sovereign Congress (; ), the confederal government. The Sovereign Congress
    was seated in Brussels and consisted of representatives of each of the eight provinces.
    The provinces of the republic were divided into 11 smaller separate territories,
    each with their own regional identities: In 1789, a church-inspired popular revolt
    broke out in reaction to the emperor''s centralizing and anticlerical policies.
    Two factions appeared: the "Statists" who opposed the reforms, and the "Vonckists"
    named for Jan Frans Vonck who initially supported the reforms but then joined
    the opposition, due to the clumsy way in which the reforms were carried out.'
- source_sentence: Are there black holes near the galactic nucleus?
  sentences:
  - 'Supermassive black hole

    In September 2014, data from different X-ray telescopes has shown that the extremely
    small, dense, ultracompact dwarf galaxy M60-UCD1 hosts a 20 million solar mass
    black hole at its center, accounting for more than 10% of the total mass of the
    galaxy. The discovery is quite surprising, since the black hole is five times
    more massive than the Milky Way''s black hole despite the galaxy being less than
    five-thousandths the mass of the Milky Way.'
  - 'Aquarela do Brasil

    "Aquarela do Brasil" (Portuguese: [akwaˈɾɛlɐ du bɾaˈziw], Watercolor of Brazil),
    written by Ary Barroso in 1939 and known in the English-speaking world simply
    as "Brazil", is one of the most famous Brazilian songs.'
  - 'Supermassive black hole

    The difficulty in forming a supermassive black hole resides in the need for enough
    matter to be in a small enough volume. This matter needs to have very little angular
    momentum in order for this to happen. Normally, the process of accretion involves
    transporting a large initial endowment of angular momentum outwards, and this
    appears to be the limiting factor in black hole growth. This is a major component
    of the theory of accretion disks. Gas accretion is the most efficient and also
    the most conspicuous way in which black holes grow. The majority of the mass growth
    of supermassive black holes is thought to occur through episodes of rapid gas
    accretion, which are observable as active galactic nuclei or quasars. Observations
    reveal that quasars were much more frequent when the Universe was younger, indicating
    that supermassive black holes formed and grew early. A major constraining factor
    for theories of supermassive black hole formation is the observation of distant
    luminous quasars, which indicate that supermassive black holes of billions of
    solar masses had already formed when the Universe was less than one billion years
    old. This suggests that supermassive black holes arose very early in the Universe,
    inside the first massive galaxies.'
- source_sentence: When did the July Monarchy end?
  sentences:
  - 'July Monarchy

    Despite the return of the House of Bourbon to power, France was much changed from
    the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries
    remained an important force and the autocracy and hierarchy of the earlier era
    could not be fully restored. Economic changes, which had been underway long before
    the revolution, had progressed further during the years of turmoil and were firmly
    entrenched by 1815. These changes had seen power shift from the noble landowners
    to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic
    Code and efficient bureaucracy, also remained in place. These changes produced
    a unified central government that was fiscally sound and had much control over
    all areas of French life, a sharp difference from the complicated mix of feudal
    and absolutist traditions and institutions of pre-Revolutionary Bourbons.'
  - 'Wachovia

    Wachovia Corporation began on June 16, 1879 in Winston-Salem, North Carolina as
    the Wachovia National Bank. The bank was co-founded by James Alexander Gray and
    William Lemly.[9] In 1911, the bank merged with Wachovia Loan and Trust Company,
    "the largest trust company between Baltimore and New Orleans",[10] which had been
    founded on June 15, 1893. Wachovia grew to become one of the largest banks in
    the Southeast partly on the strength of its accounts from the R.J. Reynolds Tobacco
    Company, which was also headquartered in Winston-Salem.[11] On December 12, 1986,
    Wachovia purchased First Atlanta. Founded as Atlanta National Bank on September
    14, 1865, and later renamed to First National Bank of Atlanta, this institution
    was the oldest national bank in Atlanta. This purchase made Wachovia one of the
    few companies with dual headquarters: one in Winston-Salem and one in Atlanta.
    In 1991, Wachovia entered the South Carolina market by acquiring South Carolina
    National Corporation,[12] founded as the Bank of Charleston in 1834. In 1998,
    Wachovia acquired two Virginia-based banks, Jefferson National Bank and Central
    Fidelity Bank. In 1997, Wachovia acquired both 1st United Bancorp and American
    Bankshares Inc, giving its first entry into Florida. In 2000, Wachovia made its
    final purchase, which was Republic Security Bank.'
  - 'July Monarchy

    The July Monarchy (French: Monarchie de Juillet) was a liberal constitutional
    monarchy in France under Louis Philippe I, starting with the July Revolution of
    1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration
    (1814–1830). It began with the overthrow of the conservative government of Charles
    X, the last king of the House of Bourbon.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 896
      type: sts-dev-896
    metrics:
    - type: pearson_cosine
      value: 0.45729692013517886
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.49645340246652353
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 768
      type: sts-dev-768
    metrics:
    - type: pearson_cosine
      value: 0.4455125981991164
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.4896539219726307
      name: Spearman Cosine
---

# SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). It maps sentences & paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) <!-- at revision 7ae557604adf67be50417f59c2c2f167def9a775 -->
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 896 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
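The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged, skipping padding positions, to produce one sentence vector. As a rough illustration of what that step computes, here is a minimal numpy sketch with toy shapes (the real model uses 896 dimensions and up to 1024 tokens); this is not the library's actual implementation.

```python
import numpy as np

# Toy token embeddings: batch of 2 sequences, 4 token slots, dim 6.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 6))
# 1 = real token, 0 = padding.
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (B, T, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (B, D)
    counts = mask.sum(axis=1).clip(min=1e-9)                         # (B, 1)
    return summed / counts

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)  # (2, 6)
```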

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen1k")
# Run inference
sentences = [
    'When did the July Monarchy end?',
    'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
    'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity

* Datasets: `sts-dev-896` and `sts-dev-768`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | sts-dev-896 | sts-dev-768 |
|:--------------------|:------------|:------------|
| pearson_cosine      | 0.4573      | 0.4455      |
| **spearman_cosine** | **0.4965**  | **0.4897**  |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
325
-
326
- ## Training Details
327
-
328
- ### Training Dataset
329
-
330
- #### Unnamed Dataset
331
-
332
-
333
- * Size: 2,859,594 training samples
334
- * Columns: <code>query</code>, <code>response</code>, and <code>negative</code>
335
- * Approximate statistics based on the first 1000 samples:
336
- | | query | response | negative |
337
- |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
338
- | type | string | string | string |
339
- | details | <ul><li>min: 4 tokens</li><li>mean: 8.76 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 23 tokens</li><li>mean: 141.88 tokens</li><li>max: 532 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 134.02 tokens</li><li>max: 472 tokens</li></ul> |
340
- * Samples:
341
- | query | response | negative |
342
- |:--------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
343
- | <code>Was there a year 0?</code> | <code>Year zero<br>Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.</code> | <code>504<br>Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.</code> |
344
- | <code>When is the dialectical method used?</code> | <code>Dialectic<br>Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.</code> | <code>Derek Bentley case<br>Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.</code> |
345
- | <code>What do Grasshoppers eat?</code> | <code>Grasshopper<br>Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.</code> | <code>Groundhog<br>Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.</code> |
346
- * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
347
- ```json
348
- {
349
- "loss": "MultipleNegativesRankingLoss",
350
- "matryoshka_dims": [
351
- 896,
352
- 768
353
- ],
354
- "matryoshka_weights": [
355
- 1,
356
- 1
357
- ],
358
- "n_dims_per_step": -1
359
- }
360
- ```
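The configuration above wraps MultipleNegativesRankingLoss (in-batch negatives: each query should rank its own response above every other response in the batch) in MatryoshkaLoss, which evaluates that loss on truncated copies of the embeddings at 896 and 768 dimensions and sums the results with weights 1 and 1. To make the arithmetic concrete, here is a simplified numpy sketch of what that combination computes; it is an illustration, not the library code (the scale of 20 mirrors the sentence-transformers default for this loss).

```python
import numpy as np

def mnrl(queries: np.ndarray, responses: np.ndarray) -> float:
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    r = responses / np.linalg.norm(responses, axis=1, keepdims=True)
    scores = q @ r.T * 20.0                    # (B, B) similarity matrix
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.diag(log_softmax).mean()) # correct pair sits on the diagonal

def matryoshka_mnrl(queries, responses, dims=(896, 768), weights=(1, 1)) -> float:
    """Sum the ranking loss computed on truncated copies of the embeddings."""
    return sum(w * mnrl(queries[:, :d], responses[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 896))
r = q + 0.1 * rng.normal(size=(4, 896))        # matched query/response pairs
print(matryoshka_mnrl(q, r))
```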

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 12
- `per_device_eval_batch_size`: 12
- `gradient_accumulation_steps`: 4
- `num_train_epochs`: 1
- `warmup_ratio`: 0.3
- `bf16`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 12
- `per_device_eval_batch_size`: 12
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.3
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
|:------:|:----:|:-------------:|:---------------------------:|:---------------------------:|
| 0.0002 | 10   | 4.4351        | -                           | -                           |
| 0.0003 | 20   | 4.6508        | -                           | -                           |
| 0.0005 | 30   | 4.7455        | -                           | -                           |
| 0.0007 | 40   | 4.5427        | -                           | -                           |
| 0.0008 | 50   | 4.3982        | -                           | -                           |
| 0.0010 | 60   | 4.3755        | -                           | -                           |
| 0.0012 | 70   | 4.4105        | -                           | -                           |
| 0.0013 | 80   | 5.2227        | -                           | -                           |
| 0.0015 | 90   | 5.8062        | -                           | -                           |
| 0.0017 | 100  | 5.7645        | -                           | -                           |
| 0.0018 | 110  | 5.9261        | -                           | -                           |
| 0.0020 | 120  | 5.8301        | -                           | -                           |
| 0.0022 | 130  | 5.7602        | -                           | -                           |
| 0.0023 | 140  | 5.9392        | -                           | -                           |
| 0.0025 | 150  | 5.7523        | -                           | -                           |
| 0.0027 | 160  | 5.8585        | -                           | -                           |
| 0.0029 | 170  | 5.7916        | -                           | -                           |
| 0.0030 | 180  | 5.8157        | -                           | -                           |
| 0.0032 | 190  | 5.7102        | -                           | -                           |
| 0.0034 | 200  | 5.5844        | -                           | -                           |
| 0.0035 | 210  | 5.5463        | -                           | -                           |
| 0.0037 | 220  | 5.5823        | -                           | -                           |
| 0.0039 | 230  | 5.5514        | -                           | -                           |
| 0.0040 | 240  | 5.5646        | -                           | -                           |
| 0.0042 | 250  | 5.5783        | -                           | -                           |
| 0.0044 | 260  | 5.5344        | -                           | -                           |
| 0.0045 | 270  | 5.523         | -                           | -                           |
| 0.0047 | 280  | 5.4969        | -                           | -                           |
| 0.0049 | 290  | 5.5407        | -                           | -                           |
| 0.0050 | 300  | 5.6171        | -                           | -                           |
| 0.0052 | 310  | 5.5581        | -                           | -                           |
| 0.0054 | 320  | 5.8903        | -                           | -                           |
| 0.0055 | 330  | 5.8675        | -                           | -                           |
| 0.0057 | 340  | 5.745         | -                           | -                           |
| 0.0059 | 350  | 5.6041        | -                           | -                           |
| 0.0060 | 360  | 5.5476        | -                           | -                           |
| 0.0062 | 370  | 5.3964        | -                           | -                           |
| 0.0064 | 380  | 5.3564        | -                           | -                           |
| 0.0065 | 390  | 5.3054        | -                           | -                           |
| 0.0067 | 400  | 5.2779        | -                           | -                           |
| 0.0069 | 410  | 5.206         | -                           | -                           |
| 0.0070 | 420  | 5.2168        | -                           | -                           |
| 0.0072 | 430  | 5.1645        | -                           | -                           |
| 0.0074 | 440  | 5.1797        | -                           | -                           |
| 0.0076 | 450  | 5.2526        | -                           | -                           |
| 0.0077 | 460  | 5.1768        | -                           | -                           |
| 0.0079 | 470  | 5.3519        | -                           | -                           |
| 0.0081 | 480  | 5.2982        | -                           | -                           |
| 0.0082 | 490  | 5.3229        | -                           | -                           |
| 0.0084 | 500  | 5.3758        | -                           | -                           |
| 0.0086 | 510  | 5.2478        | -                           | -                           |
| 0.0087 | 520  | 5.1799        | -                           | -                           |
| 0.0089 | 530  | 5.1088        | -                           | -                           |
| 0.0091 | 540  | 4.977         | -                           | -                           |
| 0.0092 | 550  | 4.9108        | -                           | -                           |
| 0.0094 | 560  | 4.811         | -                           | -                           |
| 0.0096 | 570  | 4.7203        | -                           | -                           |
| 0.0097 | 580  | 4.6499        | -                           | -                           |
| 0.0099 | 590  | 4.4548        | -                           | -                           |
| 0.0101 | 600  | 4.2891        | -                           | -                           |
| 0.0102 | 610  | 4.1881        | -                           | -                           |
| 0.0104 | 620  | 4.6           | -                           | -                           |
| 0.0106 | 630  | 4.5365        | -                           | -                           |
| 0.0107 | 640  | 4.3086        | -                           | -                           |
| 0.0109 | 650  | 4.0452        | -                           | -                           |
| 0.0111 | 660  | 3.9041        | -                           | -                           |
| 0.0112 | 670  | 4.3938        | -                           | -                           |
| 0.0114 | 680  | 4.3198        | -                           | -                           |
| 0.0116 | 690  | 4.1294        | -                           | -                           |
| 0.0117 | 700  | 4.077         | -                           | -                           |
| 0.0119 | 710  | 3.9174        | -                           | -                           |
| 0.0121 | 720  | 4.1629        | -                           | -                           |
| 0.0123 | 730  | 3.9611        | -                           | -                           |
| 0.0124 | 740  | 3.7768        | -                           | -                           |
| 0.0126 | 750  | 3.5842        | -                           | -                           |
| 0.0128 | 760  | 3.1196        | -                           | -                           |
| 0.0129 | 770  | 3.6288        | -                           | -                           |
| 0.0131 | 780  | 3.273         | -                           | -                           |
| 0.0133 | 790  | 2.7889        | -                           | -                           |
| 0.0134 | 800  | 2.5096        | -                           | -                           |
| 0.0136 | 810  | 1.8878        | -                           | -                           |
| 0.0138 | 820  | 2.3423        | -                           | -                           |
| 0.0139 | 830  | 1.7687        | -                           | -                           |
| 0.0141 | 840  | 2.0781        | -                           | -                           |
| 0.0143 | 850  | 2.4598        | -                           | -                           |
| 0.0144 | 860  | 1.7667        | -                           | -                           |
| 0.0146 | 870  | 2.6247        | -                           | -                           |
| 0.0148 | 880  | 1.916         | -                           | -                           |
| 0.0149 | 890  | 2.0817        | -                           | -                           |
| 0.0151 | 900  | 2.3679        | -                           | -                           |
| 0.0153 | 910  | 1.418         | -                           | -                           |
| 0.0154 | 920  | 2.7353        | -                           | -                           |
| 0.0156 | 930  | 1.992         | -                           | -                           |
| 0.0158 | 940  | 1.4564        | -                           | -                           |
| 0.0159 | 950  | 1.4154        | -                           | -                           |
| 0.0161 | 960  | 0.9499        | -                           | -                           |
| 0.0163 | 970  | 1.6304        | -                           | -                           |
| 0.0164 | 980  | 0.9264        | -                           | -                           |
| 0.0166 | 990  | 1.3278        | -                           | -                           |
| 0.0168 | 1000 | 1.686         | 0.4965                      | 0.4897                      |
598
- 
- 
- ### Framework Versions
- - Python: 3.10.12
- - Sentence Transformers: 3.3.0
- - Transformers: 4.46.2
- - PyTorch: 2.1.0+cu118
- - Accelerate: 1.1.1
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3
- 
- ## Citation
- 
- ### BibTeX
- 
- #### Sentence Transformers
- ```bibtex
- @inproceedings{reimers-2019-sentence-bert,
-     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-     author = "Reimers, Nils and Gurevych, Iryna",
-     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-     month = "11",
-     year = "2019",
-     publisher = "Association for Computational Linguistics",
-     url = "https://arxiv.org/abs/1908.10084",
- }
- ```
- 
- #### MatryoshkaLoss
- ```bibtex
- @misc{kusupati2024matryoshka,
-     title={Matryoshka Representation Learning},
-     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
-     year={2024},
-     eprint={2205.13147},
-     archivePrefix={arXiv},
-     primaryClass={cs.LG}
- }
- ```
- 
- #### MultipleNegativesRankingLoss
- ```bibtex
- @misc{henderson2017efficient,
-     title={Efficient Natural Language Response Suggestion for Smart Reply},
-     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
-     year={2017},
-     eprint={1705.00652},
-     archivePrefix={arXiv},
-     primaryClass={cs.CL}
- }
- ```
- 
- <!--
- ## Glossary
- 
- *Clearly define terms in order to be accessible across audiences.*
- -->
- 
- <!--
- ## Model Card Authors
- 
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
- 
- <!--
- ## Model Card Contact
- 
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:2859594
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Qwen/Qwen2.5-0.5B-Instruct
+ widget:
+ - source_sentence: How old is Garry Marshall?
+   sentences:
+   - 'Garry Marshall
+ 
+     On the morning of July 19, 2016, Marshall died at a hospital in Burbank, California
+     at the age of 81 due to complications of pneumonia after suffering a stroke.[20][21]'
+   - 'Gregg Marshall
+ 
+     Michael Gregg Marshall (born February 27, 1963) is an American college basketball
+     coach who currently leads the Shockers team at Wichita State University. Marshall
+     has coached his teams to appearances in the NCAA Men''s Division I Basketball
+     Tournament in twelve of his eighteen years as a head coach. He is the most successful
+     head coach in Wichita State University history (261 wins), and is also the most
+     successful head coach in Winthrop University history (194 wins).'
+   - 'Guillotine
+ 
+     For a period of time after its invention, the guillotine was called a louisette.
+     However, it was later named after Guillotin who had proposed that a less painful
+     method of execution should be found in place of the breaking wheel, though he
+     opposed the death penalty and bemoaned the association of the guillotine with
+     his name.'
+ - source_sentence: Are there cherry trees in Cherry Springs State Park?
+   sentences:
+   - 'Cherry Springs State Park
+ 
+     Awards and press recognition have come to Cherry Springs and its staff. Thom Bemus,
+     who initiated and coordinates the Stars-n-Parks program, was named DCNR''s 2002Volunteer
+     of the Year.[66] In 2007the park''s Dark Sky Programming and staff received the
+     Environmental Education Excellence in Programming award from the Pennsylvania
+     Recreation and Parks Society.[67] Operations manager Chip Harrison and his wife
+     Maxine, who directs the Dark Sky Fund, received a 2008award from the Pennsylvania
+     Outdoor Lighting Council for "steadfast adherence and active promotion of the
+     principles of responsible outdoor lighting at Cherry Springs State Park".[68]
+     The DCNR has named Cherry Springs one of "25 Must-See Pennsylvania State Parks",
+     specifically for having the "darkest night skies on the east coast".[69] Cherry
+     Springs State Park was featured in the national press in 2003when USA Today named
+     it one of "10Great Places to get some stars in your eyes",[70] in 2006when National
+     Geographic Adventure featured it in "Pennsylvania: The Wild, Wild East",[71] and
+     in The New York Times in 2007.[53] All these were before it was named an International
+     Dark Sky Park by the International Dark-Sky Association in 2008.[38]'
+   - 'Cantonese
+ 
+     Although Cantonese shares a lot of vocabulary with Mandarin, the two varieties
+     are mutually unintelligible because of differences in pronunciation, grammar and
+     lexicon. Sentence structure, in particular the placement of verbs, sometimes differs
+     between the two varieties. A notable difference between Cantonese and Mandarin
+     is how the spoken word is written; both can be recorded verbatim, but very few
+     Cantonese speakers are knowledgeable in the full Cantonese written vocabulary,
+     so a non-verbatim formalized written form is adopted, which is more akin to the
+     Mandarin written form.[4][5] This results in the situation in which a Cantonese
+     and a Mandarin text may look similar but are pronounced differently.'
+   - 'Cherry Springs State Park
+ 
+     Cherry Springs State Park is an 82-acre (33ha)[a] Pennsylvania state park in Potter
+     County, Pennsylvania, United States. The park was created from land within the
+     Susquehannock State Forest, and is on Pennsylvania Route 44 in West Branch Township.
+     Cherry Springs, named for a large stand of Black Cherry trees in the park, is
+     atop the dissected Allegheny Plateau at an elevation of 2,300 feet (701m). It
+     is popular with astronomers and stargazers for having "some of the darkest night
+     skies on the east coast" of the United States, and was chosen by the Pennsylvania
+     Department of Conservation and Natural Resources (DCNR) and its Bureau of Parks
+     as one of "25 Must-See Pennsylvania State Parks".[4]'
+ - source_sentence: How many regions are in Belgium?
+   sentences:
+   - 'Pine City, Minnesota
+ 
+     Pine City is a city in Pine County, Minnesota, in East Central Minnesota. Pine
+     City is the county seat of, and the largest city in, Pine County.[7] A portion
+     of the city is located on the Mille Lacs Indian Reservation. Founded as a railway
+     town, it quickly became a logging community and the surrounding lakes made it
+     a resort town. Today, it is an arts town and commuter town to jobs in the Minneapolis–Saint
+     Paul metropolitan area.[8] It is also a green city.[9] The population was 3,127
+     at the 2010 census.'
+   - 'Provinces of Belgium
+ 
+     The country of Belgium is divided into three regions. Two of these regions, the
+     Flemish Region or Flanders, and Walloon Region, or Wallonia, are each subdivided
+     into five provinces. The third region, the Brussels-Capital Region, is not divided
+     into provinces, as it was originally only a small part of a province itself.'
+   - 'United Belgian States
+ 
+     The United Belgian States was a confederal republic of eight provinces which had
+     their own governments, were sovereign and independent, and were governed directly
+     by the Sovereign Congress (; ), the confederal government. The Sovereign Congress
+     was seated in Brussels and consisted of representatives of each of the eight provinces.
+     The provinces of the republic were divided into 11 smaller separate territories,
+     each with their own regional identities:In 1789, a church-inspired popular revolt
+     broke out in reaction to the emperor''s centralizing and anticlerical policies.
+     Two factions appeared: the "Statists" who opposed the reforms, and the "Vonckists"
+     named for Jan Frans Vonck who initially supported the reforms but then joined
+     the opposition, due to the clumsy way in which the reforms were carried out.'
+ - source_sentence: Are there black holes near the galactic nucleus?
+   sentences:
+   - 'Supermassive black hole
+ 
+     In September 2014, data from different X-ray telescopes has shown that the extremely
+     small, dense, ultracompact dwarf galaxy M60-UCD1 hosts a 20 million solar mass
+     black hole at its center, accounting for more than 10% of the total mass of the
+     galaxy. The discovery is quite surprising, since the black hole is five times
+     more massive than the Milky Way''s black hole despite the galaxy being less than
+     five-thousandths the mass of the Milky Way.'
+   - 'Aquarela do Brasil
+ 
+     "Aquarela do Brasil" (Portuguese:[akwaˈɾɛlɐ du bɾaˈziw], Watercolor of Brazil),
+     written by Ary Barroso in 1939 and known in the English-speaking world simply
+     as "Brazil", is one of the most famous Brazilian songs.'
+   - 'Supermassive black hole
+ 
+     The difficulty in forming a supermassive black hole resides in the need for enough
+     matter to be in a small enough volume. This matter needs to have very little angular
+     momentum in order for this to happen. Normally, the process of accretion involves
+     transporting a large initial endowment of angular momentum outwards, and this
+     appears to be the limiting factor in black hole growth. This is a major component
+     of the theory of accretion disks. Gas accretion is the most efficient and also
+     the most conspicuous way in which black holes grow. The majority of the mass growth
+     of supermassive black holes is thought to occur through episodes of rapid gas
+     accretion, which are observable as active galactic nuclei or quasars. Observations
+     reveal that quasars were much more frequent when the Universe was younger, indicating
+     that supermassive black holes formed and grew early. A major constraining factor
+     for theories of supermassive black hole formation is the observation of distant
+     luminous quasars, which indicate that supermassive black holes of billions of
+     solar masses had already formed when the Universe was less than one billion years
+     old. This suggests that supermassive black holes arose very early in the Universe,
+     inside the first massive galaxies.'
+ - source_sentence: When did the July Monarchy end?
+   sentences:
+   - 'July Monarchy
+ 
+     Despite the return of the House of Bourbon to power, France was much changed from
+     the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries
+     remained an important force and the autocracy and hierarchy of the earlier era
+     could not be fully restored. Economic changes, which had been underway long before
+     the revolution, had progressed further during the years of turmoil and were firmly
+     entrenched by 1815. These changes had seen power shift from the noble landowners
+     to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic
+     Code and efficient bureaucracy, also remained in place. These changes produced
+     a unified central government that was fiscally sound and had much control over
+     all areas of French life, a sharp difference from the complicated mix of feudal
+     and absolutist traditions and institutions of pre-Revolutionary Bourbons.'
+   - 'Wachovia
+ 
+     Wachovia Corporation began on June 16, 1879 in Winston-Salem, North Carolina as
+     the Wachovia National Bank. The bank was co-founded by James Alexander Gray and
+     William Lemly.[9] In 1911, the bank merged with Wachovia Loan and Trust Company,
+     "the largest trust company between Baltimore and New Orleans",[10] which had been
+     founded on June 15, 1893. Wachovia grew to become one of the largest banks in
+     the Southeast partly on the strength of its accounts from the R.J. Reynolds Tobacco
+     Company, which was also headquartered in Winston-Salem.[11] On December 12, 1986,
+     Wachovia purchased First Atlanta. Founded as Atlanta National Bank on September
+     14, 1865, and later renamed to First National Bank of Atlanta, this institution
+     was the oldest national bank in Atlanta. This purchase made Wachovia one of the
+     few companies with dual headquarters: one in Winston-Salem and one in Atlanta.
+     In 1991, Wachovia entered the South Carolina market by acquiring South Carolina
+     National Corporation,[12] founded as the Bank of Charleston in 1834. In 1998,
+     Wachovia acquired two Virginia-based banks, Jefferson National Bank and Central
+     Fidelity Bank. In 1997, Wachovia acquired both 1st United Bancorp and American
+     Bankshares Inc, giving its first entry into Florida. In 2000, Wachovia made its
+     final purchase, which was Republic Security Bank.'
+   - 'July Monarchy
+ 
+     The July Monarchy (French: Monarchie de Juillet) was a liberal constitutional
+     monarchy in France under Louis Philippe I, starting with the July Revolution of
+     1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration
+     (1814–1830). It began with the overthrow of the conservative government of Charles
+     X, the last king of the House of Bourbon.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 896
+       type: sts-dev-896
+     metrics:
+     - type: pearson_cosine
+       value: 0.45729692013517886
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.49645340246652353
+       name: Spearman Cosine
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev 768
+       type: sts-dev-768
+     metrics:
+     - type: pearson_cosine
+       value: 0.4455125981991164
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.4896539219726307
+       name: Spearman Cosine
+ ---
+ 
+ # SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). It maps sentences & paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) <!-- at revision 7ae557604adf67be50417f59c2c2f167def9a775 -->
+ - **Maximum Sequence Length:** 1024 tokens
+ - **Output Dimensionality:** 896 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
+   (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
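The `Pooling` block above uses masked mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are summed under the attention mask and divided by the number of real tokens, so padding positions do not contribute. A minimal numpy sketch of that step (toy shapes for illustration, not the model's real 896-dim outputs):

```python
import numpy as np

# Toy batch: 2 sequences, 4 token positions, 3-dim token embeddings
token_embeddings = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
# Second sequence has 2 padding positions (mask 0)
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 0]])

# Mean pooling: sum the unmasked token vectors, divide by the token count
mask = attention_mask[:, :, None]               # broadcast to (2, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)  # (2, 3)
counts = mask.sum(axis=1)                       # (2, 1)
sentence_embeddings = summed / counts

print(sentence_embeddings.shape)  # (2, 3)
```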
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("AlexWortega/qwen1k")
+ # Run inference
+ sentences = [
+     'When did the July Monarchy end?',
+     'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
+     'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 896]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
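Since the model was trained with `MatryoshkaLoss` over dimensions 896 and 768, the 896-dim embeddings can also be truncated to their leading 768 entries and renormalized, trading a little quality for smaller vectors. A sketch with random vectors standing in for `model.encode(...)` output:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 896))  # stand-in for model.encode(...) output

# Keep the first 768 dimensions, then L2-normalize so that a dot
# product is again a cosine similarity
truncated = embeddings[:, :768]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

similarities = truncated @ truncated.T
print(similarities.shape)  # (3, 3)
```

Recent Sentence Transformers releases can also do this for you via the `truncate_dim` argument, e.g. `SentenceTransformer("AlexWortega/qwen1k", truncate_dim=768)`.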
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Semantic Similarity
+ 
+ * Datasets: `sts-dev-896` and `sts-dev-768`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | sts-dev-896 | sts-dev-768 |
+ |:--------------------|:------------|:------------|
+ | pearson_cosine      | 0.4573      | 0.4455      |
+ | **spearman_cosine** | **0.4965**  | **0.4897**  |
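The evaluator scores each sentence pair by cosine similarity and reports Pearson and Spearman correlation against the gold similarity labels. For intuition, Spearman correlation is just Pearson correlation computed on ranks; a minimal sketch with made-up scores (assumes no tied values; the library itself relies on `scipy.stats`):

```python
import numpy as np

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the rank values
    # (simple version that assumes there are no ties)
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

gold = np.array([0.1, 0.5, 0.9, 0.3])  # human similarity scores (hypothetical)
pred = np.array([0.2, 0.6, 0.8, 0.4])  # model cosine similarities (hypothetical)
print(spearman(gold, pred))  # 1.0 — same ordering, perfect rank correlation
```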
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 2,859,594 training samples
+ * Columns: <code>query</code>, <code>response</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query                                                                             | response                                                                              | negative                                                                             |
+   |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type    | string                                                                            | string                                                                                | string                                                                               |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 8.76 tokens</li><li>max: 26 tokens</li></ul>  | <ul><li>min: 23 tokens</li><li>mean: 141.88 tokens</li><li>max: 532 tokens</li></ul>  | <ul><li>min: 4 tokens</li><li>mean: 134.02 tokens</li><li>max: 472 tokens</li></ul>  |
+ * Samples:
+   | query | response | negative |
+   |:--------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | <code>Was there a year 0?</code> | <code>Year zero<br>Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.</code> | <code>504<br>Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.</code> |
+   | <code>When is the dialectical method used?</code> | <code>Dialectic<br>Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.</code> | <code>Derek Bentley case<br>Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.</code> |
+   | <code>What do Grasshoppers eat?</code> | <code>Grasshopper<br>Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.</code> | <code>Groundhog<br>Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           896,
+           768
+       ],
+       "matryoshka_weights": [
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
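In rough terms, `MultipleNegativesRankingLoss` treats each `(query, response)` pair in a batch as the positive and every other in-batch response as a negative, and `MatryoshkaLoss` re-applies that loss to the leading 896 and 768 embedding dimensions, summing the terms with weight 1 each. A simplified numpy illustration of the forward computation (real training uses the library's PyTorch implementation; `scale=20` mirrors the library default, and the random vectors are placeholders):

```python
import numpy as np

def mnrl(queries, docs, scale=20.0):
    # In-batch MultipleNegativesRankingLoss: for row i, docs[i] is the
    # positive and every other row is a negative (cross-entropy over
    # scaled cosine similarities)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = scale * (q @ d.T)
    # log-softmax per row; the diagonal entry is the positive pair
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 896))
docs = rng.normal(size=(4, 896))

# MatryoshkaLoss with dims [896, 768] and weights [1, 1]:
loss = sum(mnrl(queries[:, :d], docs[:, :d]) for d in (896, 768))
print(float(loss))
```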
+
376
+ ### Training Hyperparameters
377
+ #### Non-Default Hyperparameters
378
+
379
+ - `eval_strategy`: steps
380
+ - `per_device_train_batch_size`: 12
381
+ - `per_device_eval_batch_size`: 12
382
+ - `gradient_accumulation_steps`: 4
383
+ - `num_train_epochs`: 1
384
+ - `warmup_ratio`: 0.3
385
+ - `bf16`: True
386
+ - `batch_sampler`: no_duplicates
387
+
388
+ #### All Hyperparameters
389
+ <details><summary>Click to expand</summary>
390
+
391
+ - `overwrite_output_dir`: False
392
+ - `do_predict`: False
393
+ - `eval_strategy`: steps
394
+ - `prediction_loss_only`: True
395
+ - `per_device_train_batch_size`: 12
396
+ - `per_device_eval_batch_size`: 12
397
+ - `per_gpu_train_batch_size`: None
398
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.3
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
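+ 
+ Since `warmup_steps` is 0 and `warmup_ratio` is 0.3, the linear warmup length is derived from the total optimizer steps of the single epoch. A quick pure-Python sanity check of the schedule, using the 2,859,594 training pairs from this card's metadata; the per-device batch size of 12 is an assumption inferred from the logged epoch fractions, not a value confirmed in this section:
+ 
+ ```python
+ import math
+ 
+ DATASET_SIZE = 2_859_594   # training pairs reported in this card's metadata
+ PER_DEVICE_BATCH = 12      # assumption, inferred from the logged epoch fractions
+ GRAD_ACCUM = 4             # `gradient_accumulation_steps` above
+ WARMUP_RATIO = 0.3         # `warmup_ratio` above
+ 
+ # Samples consumed per optimizer update.
+ effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
+ # Optimizer steps in one epoch (dataloader_drop_last is False, so round up).
+ steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch)
+ # Linear warmup length implied by the ratio.
+ warmup_steps = int(WARMUP_RATIO * steps_per_epoch)
+ 
+ print(effective_batch, steps_per_epoch, warmup_steps)
+ # Under these assumptions, step 1000 sits at epoch ≈ 1000 / steps_per_epoch.
+ print(round(1000 / steps_per_epoch, 4))
+ ```
+ 
+ Under these assumptions the epoch fraction at step 1000 comes out to 0.0168, which matches the training log, though the batch size itself remains an inference.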
+ 
+ ### Training Logs
+ | Epoch | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
+ |:------:|:----:|:-------------:|:---------------------------:|:---------------------------:|
+ | 0.0002 | 10 | 4.4351 | - | - |
+ | 0.0003 | 20 | 4.6508 | - | - |
+ | 0.0005 | 30 | 4.7455 | - | - |
+ | 0.0007 | 40 | 4.5427 | - | - |
+ | 0.0008 | 50 | 4.3982 | - | - |
+ | 0.0010 | 60 | 4.3755 | - | - |
+ | 0.0012 | 70 | 4.4105 | - | - |
+ | 0.0013 | 80 | 5.2227 | - | - |
+ | 0.0015 | 90 | 5.8062 | - | - |
+ | 0.0017 | 100 | 5.7645 | - | - |
+ | 0.0018 | 110 | 5.9261 | - | - |
+ | 0.0020 | 120 | 5.8301 | - | - |
+ | 0.0022 | 130 | 5.7602 | - | - |
+ | 0.0023 | 140 | 5.9392 | - | - |
+ | 0.0025 | 150 | 5.7523 | - | - |
+ | 0.0027 | 160 | 5.8585 | - | - |
+ | 0.0029 | 170 | 5.7916 | - | - |
+ | 0.0030 | 180 | 5.8157 | - | - |
+ | 0.0032 | 190 | 5.7102 | - | - |
+ | 0.0034 | 200 | 5.5844 | - | - |
+ | 0.0035 | 210 | 5.5463 | - | - |
+ | 0.0037 | 220 | 5.5823 | - | - |
+ | 0.0039 | 230 | 5.5514 | - | - |
+ | 0.0040 | 240 | 5.5646 | - | - |
+ | 0.0042 | 250 | 5.5783 | - | - |
+ | 0.0044 | 260 | 5.5344 | - | - |
+ | 0.0045 | 270 | 5.523 | - | - |
+ | 0.0047 | 280 | 5.4969 | - | - |
+ | 0.0049 | 290 | 5.5407 | - | - |
+ | 0.0050 | 300 | 5.6171 | - | - |
+ | 0.0052 | 310 | 5.5581 | - | - |
+ | 0.0054 | 320 | 5.8903 | - | - |
+ | 0.0055 | 330 | 5.8675 | - | - |
+ | 0.0057 | 340 | 5.745 | - | - |
+ | 0.0059 | 350 | 5.6041 | - | - |
+ | 0.0060 | 360 | 5.5476 | - | - |
+ | 0.0062 | 370 | 5.3964 | - | - |
+ | 0.0064 | 380 | 5.3564 | - | - |
+ | 0.0065 | 390 | 5.3054 | - | - |
+ | 0.0067 | 400 | 5.2779 | - | - |
+ | 0.0069 | 410 | 5.206 | - | - |
+ | 0.0070 | 420 | 5.2168 | - | - |
+ | 0.0072 | 430 | 5.1645 | - | - |
+ | 0.0074 | 440 | 5.1797 | - | - |
+ | 0.0076 | 450 | 5.2526 | - | - |
+ | 0.0077 | 460 | 5.1768 | - | - |
+ | 0.0079 | 470 | 5.3519 | - | - |
+ | 0.0081 | 480 | 5.2982 | - | - |
+ | 0.0082 | 490 | 5.3229 | - | - |
+ | 0.0084 | 500 | 5.3758 | - | - |
+ | 0.0086 | 510 | 5.2478 | - | - |
+ | 0.0087 | 520 | 5.1799 | - | - |
+ | 0.0089 | 530 | 5.1088 | - | - |
+ | 0.0091 | 540 | 4.977 | - | - |
+ | 0.0092 | 550 | 4.9108 | - | - |
+ | 0.0094 | 560 | 4.811 | - | - |
+ | 0.0096 | 570 | 4.7203 | - | - |
+ | 0.0097 | 580 | 4.6499 | - | - |
+ | 0.0099 | 590 | 4.4548 | - | - |
+ | 0.0101 | 600 | 4.2891 | - | - |
+ | 0.0102 | 610 | 4.1881 | - | - |
+ | 0.0104 | 620 | 4.6 | - | - |
+ | 0.0106 | 630 | 4.5365 | - | - |
+ | 0.0107 | 640 | 4.3086 | - | - |
+ | 0.0109 | 650 | 4.0452 | - | - |
+ | 0.0111 | 660 | 3.9041 | - | - |
+ | 0.0112 | 670 | 4.3938 | - | - |
+ | 0.0114 | 680 | 4.3198 | - | - |
+ | 0.0116 | 690 | 4.1294 | - | - |
+ | 0.0117 | 700 | 4.077 | - | - |
+ | 0.0119 | 710 | 3.9174 | - | - |
+ | 0.0121 | 720 | 4.1629 | - | - |
+ | 0.0123 | 730 | 3.9611 | - | - |
+ | 0.0124 | 740 | 3.7768 | - | - |
+ | 0.0126 | 750 | 3.5842 | - | - |
+ | 0.0128 | 760 | 3.1196 | - | - |
+ | 0.0129 | 770 | 3.6288 | - | - |
+ | 0.0131 | 780 | 3.273 | - | - |
+ | 0.0133 | 790 | 2.7889 | - | - |
+ | 0.0134 | 800 | 2.5096 | - | - |
+ | 0.0136 | 810 | 1.8878 | - | - |
+ | 0.0138 | 820 | 2.3423 | - | - |
+ | 0.0139 | 830 | 1.7687 | - | - |
+ | 0.0141 | 840 | 2.0781 | - | - |
+ | 0.0143 | 850 | 2.4598 | - | - |
+ | 0.0144 | 860 | 1.7667 | - | - |
+ | 0.0146 | 870 | 2.6247 | - | - |
+ | 0.0148 | 880 | 1.916 | - | - |
+ | 0.0149 | 890 | 2.0817 | - | - |
+ | 0.0151 | 900 | 2.3679 | - | - |
+ | 0.0153 | 910 | 1.418 | - | - |
+ | 0.0154 | 920 | 2.7353 | - | - |
+ | 0.0156 | 930 | 1.992 | - | - |
+ | 0.0158 | 940 | 1.4564 | - | - |
+ | 0.0159 | 950 | 1.4154 | - | - |
+ | 0.0161 | 960 | 0.9499 | - | - |
+ | 0.0163 | 970 | 1.6304 | - | - |
+ | 0.0164 | 980 | 0.9264 | - | - |
+ | 0.0166 | 990 | 1.3278 | - | - |
+ | 0.0168 | 1000 | 1.686 | 0.4965 | 0.4897 |
+
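+ The `spearman_cosine` columns above rank the model's cosine similarities against the gold STS scores and report the Spearman rank correlation. A minimal sketch of that statistic, ignoring tie handling (which a library routine such as `scipy.stats.spearmanr` does properly):
+ 
+ ```python
+ def rank(values):
+     # Assign each value its position in sorted order (no tie handling).
+     order = sorted(range(len(values)), key=lambda i: values[i])
+     ranks = [0.0] * len(values)
+     for r, i in enumerate(order):
+         ranks[i] = float(r)
+     return ranks
+ 
+ def spearman(a, b):
+     # Spearman correlation = Pearson correlation of the two rank vectors.
+     ra, rb = rank(a), rank(b)
+     n = len(a)
+     ma, mb = sum(ra) / n, sum(rb) / n
+     cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
+     sa = sum((x - ma) ** 2 for x in ra) ** 0.5
+     sb = sum((y - mb) ** 2 for y in rb) ** 0.5
+     return cov / (sa * sb)
+ ```
+ 
+ Perfectly monotone agreement gives 1.0 and perfect inversion gives -1.0; the 0.4965 / 0.4897 values above sit between those extremes.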
613
+
614
+ ### Framework Versions
615
+ - Python: 3.10.12
616
+ - Sentence Transformers: 3.3.0
617
+ - Transformers: 4.46.2
618
+ - PyTorch: 2.1.0+cu118
619
+ - Accelerate: 1.1.1
620
+ - Datasets: 3.1.0
621
+ - Tokenizers: 0.20.3
622
+
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
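+ 
+ Because training used MatryoshkaLoss, embeddings can be truncated to their leading dimensions (as in the 896- and 768-dim evaluations above) and re-normalized before computing cosine similarity. A NumPy sketch of that truncation; the random array stands in for real `model.encode(...)` output:
+ 
+ ```python
+ import numpy as np
+ 
+ def truncate_normalize(embeddings, dim):
+     """Keep the first `dim` coordinates, then re-normalize each row
+     so cosine similarity remains a plain dot product."""
+     truncated = embeddings[:, :dim]
+     return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
+ 
+ rng = np.random.default_rng(0)
+ full = rng.normal(size=(4, 896))      # stand-in for full-dimension embeddings
+ 
+ for dim in (896, 768):
+     emb = truncate_normalize(full, dim)
+     sims = emb @ emb.T                # cosine similarities: rows are unit-norm
+     print(dim, sims.shape)
+ ```
+ 
+ This is why the same checkpoint can be evaluated at several `sts-dev-*` dimensions without retraining.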
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
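+ 
+ MultipleNegativesRankingLoss treats, for each (query, positive) pair in a batch, every other in-batch positive as a negative, and applies softmax cross-entropy over the resulting similarity matrix (hence the `no_duplicates` batch sampler above, which keeps accidental duplicates from becoming false negatives). A rough NumPy sketch of that objective; the scale of 20 is the library's common default, not necessarily this run's value:
+ 
+ ```python
+ import numpy as np
+ 
+ def mnr_loss(query_emb, passage_emb, scale=20.0):
+     # Normalize rows so the score matrix holds scaled cosine similarities.
+     q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
+     p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
+     scores = scale * (q @ p.T)                       # (batch, batch)
+     # Row i's correct "class" is column i (its own positive);
+     # every other column is an in-batch negative.
+     log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
+     return float(-np.mean(np.diag(log_softmax)))
+ 
+ rng = np.random.default_rng(0)
+ queries = rng.normal(size=(8, 64))
+ positives = queries + 0.05 * rng.normal(size=(8, 64))   # near-duplicates: easy batch
+ print(mnr_loss(queries, positives))                      # small for easy batches
+ ```
+ 
+ The loss is always positive and shrinks as each query's positive outscores the in-batch negatives, which is the behavior the falling training loss above reflects.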
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
  -->