---
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:129
- loss:CoSENTLoss
widget:
- source_sentence: 'traces historical and scientific advancement of our understanding
    of earths cosmic context  introduces basic physical principles by which planets
    form and create their associated features of rings satellites diverse landscapes
    atmospheres and climates  includes the physics of asteroids and comets and their
    orbital characteristics and links to meteorites  considers one of the most fundamental
    questions  whether or not we are alone  by detailing the scientific exploration
    goals to be achieved at the moon mars and beyond '
  sentences:
  - 'this is an introduction to the study of the solar system with emphasis on the
    latest spacecraft results  the subject covers basic principles rather than detailed
    mathematical and physical models topics include an overview of the solar system  planetary
    orbits  rings planetary formation meteorites asteroids comets  planetary surfaces
    and cratering planetary interiors  planetary atmospheres and life in the solar
    system '
  - 'in this course describes the largescale circulation systems of the tropical atmosphere
    are used to infer the nalyses the dynamics of such systems  the course includes
    phase equilibria of homogeneous and heterogeneous systems and thermodynamic modeling
    of nonideal crystalline solutions  it also surveys the processes that lead to
    the formation of metamorphic and igneous rocks in the major tectonic environments
    in the earths crust and mantle '
  - this introductory course presents a basic study in oceanography and the utilization
    of seismic waves for the study of ocean it introduces techniques necessary for
    understanding of elastic wave propagation in layered media
- source_sentence: introduction to the physics of atmospheric radiation remote sensing
    and convection  including use of computer codes  risotopic contents occurrence
    in modern organisms and environments diagenetic pathways analytical techniques  physics
    of dry and moist convection including moist thermodynamics  radiativeconvective
    equilibrium  solution of inverse problems in remote sensing of atmospheric temperature
    and composition  students taking the graduate version complete additional assignments
  sentences:
  - the aim of this course is to introduce the principles of geostatistics and to
    demonstrate its application to various aspects of earth sciences  the specific
    content of the course depends each year on the interests of the students in the
    class in some cases the class interests are towards the spatial sampling for statistical
    analysis and we concentrate on sample augmentation in other cases the interests
    have been more toward engineering applications of kinematic positioning with gps
    in which case the concentration is on positioning with slightly less accuracy
    but being able to do so for a moving object  in all cases we concentrate on the
    fundamental issues so that students should gain an understanding of the basic
    limitations of the system and how to extend its application to areas not yet fully
    explored
  - 'this is an introduction to the principles of thermodynamics including use of
    computer codes  subjects covered include physical conditions of formation and
    modification of igneous and metamorphic rocks including emission and scattering
    spectroscopy mie theory and numerical solutions  we examine the solution of inverse
    problems in remote sensing of atmospheric temperature and composition '
  - 'this course presents the phenomena theory and modeling of turbulence in the earths
    oceans and atmosphere  the scope ranges from centimeter to planetary scale motions  the
    regimes of turbulence include homogeneous isotropic threedimensional turbulence  convection  quasigeostrophic
    turbulence  shallow water turbulence  baroclinic turbulence  and macroturbulence
    in the ocean and atmosphere '
- source_sentence: 'introduction on the interactive earth system  biology in geologic
    environmental and climate change throughout earths history introduces the concept
    of life as a geological agent and examines the interaction between biology and
    the earth system during the roughly 4 billion years since life first appeared
    topics include the origin of the solar system and the early earth atmosphere  the
    origin and evolution of life and its influence on climate up through and including
    the modern age and the problem of global warming  the global carbon cycle  and
    astrobiology '
  sentences:
  - this course introduces the parallel evolution of life and the environment  life
    processes are influenced by chemical and physical processes in the atmosphere
    hydrosphere cryosphere and the solid earth  in turn life can influence chemical
    and physical processes on our planet this course explores the concept of life
    as a geological agent and examines the interaction between biology and the earth
    system during the roughly 4 billion years since life first appeared
  - this undergraduate class is designed to introduce students to the physics that
    govern the earthquakes  the focus of the course is on the processes that control
    the earthquake intensity of the planet the course demonstrates underlying mechanisms
    through computare simulations and modeling of atmospheric and oceanic data
  - 'the electron microprobe provides a complete micrometerscale emission of electromagnetic
    radiation by atoms solids  the method is nondestructive and utilizes characteristic
    xrays excited by an electron beam incident on a flat surface of the sample this
    course provides an introduction to the sensors and digital imagery through wavelength
    and energy dispersive spectrometry wds and eds  zaf matrix correction procedures
    and scanning electron imaging with backscattered electron bse  secondary electron
    se  xray using wds or eds elemental mapping  and cathodoluminescence cl  lab sessions
    involve handson use of the jeol jxa8200 superprobe '
- source_sentence: classical mechanics in a computational framework  lagrangian formulation  action
    variational principles and hamiltons principle  conserved quantities hamiltonian
    formulation surfaces of section chaos and liouvilles theorem  poincaré integral
    invariants poincarébirkhoff and kam theorems  invariant curves and cantori  nonlinear
    resonances resonance overlap and transition to chaos  symplectic integration  adiabatic
    invariants  applications to simple physical systems and solar system dynamics  extensive
    use of computation to capture methods for simulation and for symbolic analysis  programming
    experience required level of difficulty
  sentences:
  - 'we will study the fundamental principles of classical mechanics  with a modern
    emphasis on the qualitative structure of phase space  we will use computational
    ideas to formulate the principles of mechanics precisely expression in a computational
    framework encourages clear thinking and active exploration we will consider the
    following topics lagrangian formulation action variational principles and equations
    of motion  hamiltons principle conserved quantities rigid bodies and tops  hamiltonian
    formulation and canonical equations  surfaces of section chaos canonical transformations
    and generating functions  liouvilles theorem and poincaré integral invariants  poincarébirkhoff
    and kam theorems  invariant curves and cantori  nonlinear resonances  resonance
    overlap and transition to chaos properties of chaotic motion  ideas will be illustrated
    and supported with physical examples  we will make extensive use of computing
    to capture methods for simulation and for symbolic analysis '
  - 'this course covers the basic principles of planet atmospheres and interiors applied
    to the study of extrasolar planets exoplanets  we focus on fundamental physical
    processes related to observable exoplanet properties  we also provide a quantitative
    overview of detection techniques and an introduction to the feasibility of the
    search for earthlike planets biosignatures and habitable conditions on exoplanets '
  - this course introduces the parallel evolution of life and the environment  life
    processes are influenced by volcano magnitude in the atmosphere hydrosphere cryosphere
    and the solid earth  in turn life can influence volcano occurrences on our planet
    this course explores the concept of volcano predictions and examines the interaction
    between biology and the earth system during the roughly 4 billion years since
    life first appeared
- source_sentence: examines the fundamentals of sedimentary deposits and geological
    reasoning through first hand fieldwork students practice methods of modern geological
    field study offcampus during a required trip over spring break making field observations
    measuring stratigraphic sections and making a sedimentological map relevant topics
    introduced are map and figure making in arcgis and adobe illustrator and sedimentary
    petrology  culminates in an oral and written report built around data gathered
    in the field field sites and ice core isotope data studied rotate annually and
    include atmospheric composition volcanic eruptions dust storms even wind patterns
    satisfies 6 units of institute laboratory credit may be taken multiple times for
    credit students taking graduate version complete additional assignments
  sentences:
  - 'this class examines tools data and ideas related to past climate changes as seen
    in flood maps  the most recent climate changes mainly the past 500000 years ranging
    up to about 2 million years ago will be emphasized numerical models for the examination
    of rainfall data will be introduced eg statistics factor analysis time series
    analysis simple climatology  '
  - this introductory course presents a basic study in seismology and the utilization
    of seismic waves for the study of earths interior it introduces techniques necessary
    for understanding of elastic wave propagation in layered media
  - this course covers sediments in the rock cycle production of sediments at the
    earths surface physics and chemistry of sedimentary materials and scale and geometry
    of nearsurface sedimentary bodies including aquifers we will also explore topics
    like sediment transport and deposition in modern sedimentary environments burial
    and lithification survey of major sedimentary rock types stratigraphic relationships
    of sedimentary basins and evolution of sedimentary processes through geologic
    time this course satisfies 6 units of highschool laboratory credit and may be
    taken multiple times for credit students will be introduced to python and qgis
    as part of their studies
model-index:
- name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: fair oer dev
      type: fair-oer-dev
    metrics:
    - type: pearson_cosine
      value: 0.6766633081596867
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7004537271955967
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6766701961023414
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7118775018619872
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6774930713812672
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7004537271955967
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6766633663251878
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7004537271955967
      name: Spearman Dot
    - type: pearson_max
      value: 0.6774930713812672
      name: Pearson Max
    - type: spearman_max
      value: 0.7118775018619872
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: fair oer test
      type: fair-oer-test
    metrics:
    - type: pearson_cosine
      value: 0.7409764421917553
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7473025735565767
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7363301285462346
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7390870824057955
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7413213451539604
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7473025735565767
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7409764734754448
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7473025735565767
      name: Spearman Dot
    - type: pearson_max
      value: 0.7413213451539604
      name: Pearson Max
    - type: spearman_max
      value: 0.7473025735565767
      name: Spearman Max
---

# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 84f2bcc00d77236f9e89c8a360a00fb1139bf47d -->
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
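
As a rough illustration of what the `Pooling` module (in mean-token mode) and the `Normalize` module listed above do to the transformer's per-token outputs, here is a minimal pure-Python sketch. The function name `mean_pool_and_normalize` and the toy inputs are hypothetical; the library's own modules work on batched tensors rather than Python lists.

```python
import math

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens, then scale to unit L2 norm."""
    dim = len(token_embeddings[0])
    # Keep only the tokens the attention mask marks as real (mask value 1).
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    mean = [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

# Toy example: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool_and_normalize(tokens, mask)
print(sentence_embedding)
```

Because the final vector has unit length, dot product and cosine similarity give the same score, which is consistent with the near-identical `pearson_dot` and `pearson_cosine` values reported under Evaluation.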

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder ID below with this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'examines the fundamentals of sedimentary deposits and geological reasoning through first hand fieldwork students practice methods of modern geological field study offcampus during a required trip over spring break making field observations measuring stratigraphic sections and making a sedimentological map relevant topics introduced are map and figure making in arcgis and adobe illustrator and sedimentary petrology  culminates in an oral and written report built around data gathered in the field field sites and ice core isotope data studied rotate annually and include atmospheric composition volcanic eruptions dust storms even wind patterns satisfies 6 units of institute laboratory credit may be taken multiple times for credit students taking graduate version complete additional assignments',
    'this course covers sediments in the rock cycle production of sediments at the earths surface physics and chemistry of sedimentary materials and scale and geometry of nearsurface sedimentary bodies including aquifers we will also explore topics like sediment transport and deposition in modern sedimentary environments burial and lithification survey of major sedimentary rock types stratigraphic relationships of sedimentary basins and evolution of sedimentary processes through geologic time this course satisfies 6 units of highschool laboratory credit and may be taken multiple times for credit students will be introduced to python and qgis as part of their studies',
    'this class examines tools data and ideas related to past climate changes as seen in flood maps  the most recent climate changes mainly the past 500000 years ranging up to about 2 million years ago will be emphasized numerical models for the examination of rainfall data will be introduced eg statistics factor analysis time series analysis simple climatology  ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
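
The similarity matrix returned above is square: entry `[i][j]` scores sentence `i` against sentence `j`. A minimal sketch of picking the closest other sentence for each row follows. The helper `nearest_neighbours` and the numbers in `toy_similarities` are made up for illustration and are not outputs of this model.

```python
def nearest_neighbours(similarities):
    """For each row, return the index of the highest-scoring *other* item."""
    best = []
    for i, row in enumerate(similarities):
        # Skip the diagonal, where every item is compared with itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best.append(max(candidates)[1])
    return best

# Hypothetical 3x3 similarity matrix (symmetric, ones on the diagonal).
toy_similarities = [
    [1.00, 0.62, 0.18],
    [0.62, 1.00, 0.25],
    [0.18, 0.25, 1.00],
]
neighbours = nearest_neighbours(toy_similarities)
print(neighbours)
```

With real model output, the tensor can be converted to nested lists first, for example with `similarities.tolist()`.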

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `fair-oer-dev`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.6767     |
| **spearman_cosine** | **0.7005** |
| pearson_manhattan   | 0.6767     |
| spearman_manhattan  | 0.7119     |
| pearson_euclidean   | 0.6775     |
| spearman_euclidean  | 0.7005     |
| pearson_dot         | 0.6767     |
| spearman_dot        | 0.7005     |
| pearson_max         | 0.6775     |
| spearman_max        | 0.7119     |

#### Semantic Similarity
* Dataset: `fair-oer-test`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.741      |
| **spearman_cosine** | **0.7473** |
| pearson_manhattan   | 0.7363     |
| spearman_manhattan  | 0.7391     |
| pearson_euclidean   | 0.7413     |
| spearman_euclidean  | 0.7473     |
| pearson_dot         | 0.741      |
| spearman_dot        | 0.7473     |
| pearson_max         | 0.7413     |
| spearman_max        | 0.7473     |
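
To make the metric names concrete: each `spearman_*` row is a rank correlation between the gold labels and the scores that one similarity function assigns to the sentence pairs. In the tables above, each `*_max` row matches the largest value among the four similarity functions. The sketch below computes a Spearman correlation for cosine scores in plain Python. The helper names and toy vectors are hypothetical, and the evaluator's own implementation may differ in details such as tie handling.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy embedding pairs with hypothetical gold similarity labels.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # nearly the same direction
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
gold = [0.95, 0.5, 0.05]
predicted = [cosine(u, v) for u, v in pairs]
score = spearman(predicted, gold)
print(score)
```

Spearman only looks at ordering, so a model can score well here even if its raw cosine values sit on a different scale from the labels.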

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset


* Size: 129 training samples
* Columns: <code>description-mit</code>, <code>description-ocw</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples (this dataset has only 129, so all are covered):
  |         | description-mit                                                                      | description-ocw                                                                     | label                                                            |
  |:--------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------|
  | type    | string                                                                               | string                                                                              | float                                                            |
  | details | <ul><li>min: 28 tokens</li><li>mean: 104.74 tokens</li><li>max: 164 tokens</li></ul> | <ul><li>min: 36 tokens</li><li>mean: 90.01 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>min: 0.05</li><li>mean: 0.53</li><li>max: 0.95</li></ul> |
* Samples:
  | description-mit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | description-ocw                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | label                          |
  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------|
  | <code>covers the basic concepts of sedimentation from the properties of individual grains to largescale basin analysis  lectures cover sediment textures and composition fluid flow and sediment transport  and formation of sedimentary structures  depositional models  for both modern and ancient environments are a major component and are studied in detail with an eye toward interpretation of depositional processes and reconstructing ecological dynamics from the rock record satisfies 6 units of institute laboratory credit level of difficulty students taking graduate version complete additional assignments students will explore siliciclastic and carbonate diagenesis and paleontology with a focus on fossils in sedimentary rocks</code> | <code>survey of the basic aspects of modern sediments and ancient sedimentary rocks  emphasis is on fundamental materials features and processes textures of ice fraction and ice rocks  size shape and packing mechanics of ice transport  survey of siliciclastic sedimentary rocks  sandstones conglomerates and shales carbonate sediments and sedimentary rocks  cherts evaporites siliciclastic and carbonate diagenesis  paleontology  with special reference to fossils in sedimentary rocks modern and ancient depositional environments  sedimentary basins  fossil fuels  coal petroleumcovers 6 institute laboratory credit units</code>                      | <code>0.5</code>               |
  | <code>provides a comprehensive introduction to crystalline structure crystal chemistry and bonding in rockforming minerals  introduces the theory relating crystal structure and crystal symmetry to physical properties such as refractive index elastic modulus and seismic velocity  surveys the distribution of silicate oxide and metallic minerals in the interiors and on the surfaces of planets  and discusses the processes that led to their formation </code>                                                                                                                                                                                                                                                                                          | <code>this course provides a comprehensive introduction to crystalline structure crystal chemistry and bonding in rockforming minerals  it introduces the theory relating crystal structure and crystal symmetry to physical properties such as refractive index elastic modulus and seismic velocity  it surveys the distribution of silicate oxide and metallic minerals in the interiors and on the surfaces of planets  and discusses the processes that led to their formation  it also addresses why diamonds are hard and why micas split into thin sheets </code>                                                                                                 | <code>0.949999988079071</code> |
  | <code>introduction to the theory of xray microanalysis through the electron microprobe including zaf matrix corrections  techniques to be discussed are wavelength and energy dispersive spectrometry  scanning backscattered electron  secondary electron  cathodoluminescence  and xray imaging  lab sessions involve the use of the electron microprobe  the method is nondestructive and utilizes characteristic xrays excited by an electron beam incident on a flat surface of the sample lab sessions provide handson experience with the jeol jxa8200 superprobe</code>                                                                                                                                                                                    | <code>the electron microprobe provides a complete micrometerscale quantitative chemical analysis of inorganic solids  the method is nondestructive and utilizes characteristic xrays excited by an electron beam incident on a flat surface of the sample this course provides an introduction to the theory of xray microanalysis through wavelength and energy dispersive spectrometry wds and eds  zaf matrix correction procedures and scanning electron imaging with backscattered electron bse  secondary electron se  xray using wds or eds elemental mapping  and cathodoluminescence cl  lab sessions involve handson use of the jeol jxa8200 superprobe </code> | <code>0.949999988079071</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```
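
For readers unfamiliar with CoSENT, the two parameters above can be read against a small pure-Python sketch of the loss. This is an illustration of the published formula, not the library's implementation: `cosine` and `cosent_loss` are hypothetical helper names, and the inputs are toy values. `scale` multiplies the cosine similarities, and every pair of examples whose labels disagree with the ordering of their similarities adds a penalty term.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosent_loss(cos_sims, labels, scale=20.0):
    """log(1 + sum(exp(scale * (sim_lo - sim_hi)))) over label_lo < label_hi.

    cos_sims[i] is the cosine similarity of sentence pair i and labels[i]
    is its gold score. Hypothetical helper, written for illustration only.
    """
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the logarithm
    for sim_lo, lab_lo in zip(cos_sims, labels):
        for sim_hi, lab_hi in zip(cos_sims, labels):
            if lab_lo < lab_hi:
                # Penalised when the lower-labelled pair is scored as more similar.
                terms.append(scale * (sim_lo - sim_hi))
    peak = max(terms)  # subtract the maximum for a numerically stable log-sum-exp
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


well_ordered = cosent_loss([0.9, 0.1], [0.95, 0.05])
mis_ordered = cosent_loss([0.1, 0.9], [0.95, 0.05])
```

With these toy inputs, `well_ordered` is close to zero and `mis_ordered` is close to `scale * 0.8`, which shows that the loss depends only on the relative ordering of the similarities, not on matching the label values themselves.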

### Evaluation Dataset

#### Unnamed Dataset


* Size: 43 evaluation samples
* Columns: <code>description-mit</code>, <code>description-ocw</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples (here, all 43):
  |         | description-mit                                                                     | description-ocw                                                                     | label                                                            |
  |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------|
  | type    | string                                                                              | string                                                                              | float                                                            |
  | details | <ul><li>min: 51 tokens</li><li>mean: 95.84 tokens</li><li>max: 150 tokens</li></ul> | <ul><li>min: 36 tokens</li><li>mean: 83.28 tokens</li><li>max: 175 tokens</li></ul> | <ul><li>min: 0.05</li><li>mean: 0.53</li><li>max: 0.95</li></ul> |
* Samples:
  | description-mit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | description-ocw                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | label                           |
  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------|
  | <code>presents basic principles of planet atmospheres and interiors applied to the study of extrasolar planets  focuses on fundamental physical processes related to observable extrasolar planet properties  provides a quantitative overview of detection techniques  introduction to the feasibility of the search for earthlike planets biosignatures and habitable conditions on extrasolar planets  students taking graduate version complete additional assignments level of difficulty</code>                                                                                                                                                                                                                                                              | <code>this course covers the basic principles of planet atmospheres and interiors applied to the study of extrasolar planets exoplanets  we focus on fundamental physical processes related to observable exoplanet properties  we also provide a quantitative overview of detection techniques and an introduction to the feasibility of the search for earthlike planets biosignatures and habitable conditions on exoplanets </code>                                                                                                                                                                                                                                                   | <code>0.6499999761581421</code> |
  | <code>presents basic principles of planet atmospheres and interiors applied to the study of extrasolar planets  focuses on fundamental physical processes related to observable extrasolar planet properties  provides a quantitative overview of detection techniques  introduction to the feasibility of the search for earthlike planets biosignatures and habitable conditions on extrasolar planets  students taking graduate version complete additional assignments level of difficulty</code>                                                                                                                                                                                                                                                              | <code>this course covers the survey of the various subdisciplines of geophysics applied to the study of geodesy gravity geomagnetism seismology and geodynamics exoplanets  we focus on fundamental physical processes related to observable exoplanet properties  we also provide a quantitative overview of detection techniques and an introduction to the feasibility of the search for earthlike planets biosignatures and habitable conditions on exoplanets </code>                                                                                                                                                                                                                | <code>0.6499999761581421</code> |
  | <code>covers the basic concepts of sedimentation from the properties of individual grains to largescale basin analysis  lectures cover sediment textures and composition fluid flow and sediment transport  and formation of sedimentary structures  depositional models  for both modern and ancient environments are a major component and are studied in detail with an eye toward interpretation of depositional processes and reconstructing ecological dynamics from the rock record satisfies 6 units of institute laboratory credit level of difficulty students taking graduate version complete additional assignments students will explore siliciclastic and carbonate diagenesis and paleontology with a focus on fossils in sedimentary rocks</code> | <code>survey of the basic aspects of wave motion  flow instability  and turbulence  emphasis is on fundamental materials features and processes textures of siliciclastic sediments and sedimentary rocks  particle size particle shape and particle packing mechanics of sediment transport  survey of the dynamics of surface and internal gravity waves poincare waves kelvin waves and topographic waves  siliciclastic and carbonate diagenesis  paleontology  with special reference to fossils in sedimentary rocks modern and ancient depositional environments  stratigraphy  sedimentary basins  fossil fuels  coal petroleum covers 6 institute laboratory credit units</code> | <code>0.5</code>                |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 256
- `per_device_eval_batch_size`: 256
- `num_train_epochs`: 107
- `warmup_ratio`: 0.1
- `fp16`: True
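
As a rough sketch, the non-default values above could be passed to `SentenceTransformerTrainingArguments` as shown here. The helper name `build_training_args` and its `output_dir` argument are assumptions made for illustration; the card does not say how the original training script was organised. The import is deferred so that the settings can be inspected without the library installed.

```python
# Non-default values copied from the list above.
NON_DEFAULT_ARGS = {
    "eval_strategy": "epoch",
    "per_device_train_batch_size": 256,
    "per_device_eval_batch_size": 256,
    "num_train_epochs": 107,
    "warmup_ratio": 0.1,
    "fp16": True,
}


def build_training_args(output_dir):
    """Build the training arguments (hypothetical helper).

    Requires sentence-transformers >= 3.0; fp16 typically also requires
    a CUDA-capable GPU. `output_dir` is a placeholder chosen by the caller.
    """
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir, **NON_DEFAULT_ARGS
    )
```

All other arguments keep their defaults, which are the values listed in the expanded section that follows.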

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 256
- `per_device_eval_batch_size`: 256
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 107
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
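
To make the interaction of `learning_rate`, `lr_scheduler_type` and `warmup_ratio` concrete, here is a minimal sketch of a linear schedule with warmup. It assumes 107 optimizer steps in total (the training logs in this card record one step per epoch) and assumes the warmup length is the ratio times the total, rounded up. `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
import math


def linear_warmup_lr(step, base_lr=5e-05, total_steps=107, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 11 under these assumptions
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


schedule = [linear_warmup_lr(s) for s in range(108)]
```

Under these assumptions the rate peaks at `5e-05` on step 11 and falls to zero by step 107, so roughly the first tenth of training runs below the nominal learning rate.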

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | loss    | fair-oer-dev_spearman_cosine | fair-oer-test_spearman_cosine |
|:-----:|:----:|:-------:|:----------------------------:|:-----------------------------:|
| 1.0   | 1    | 9.7759  | 0.6292                       | -                             |
| 2.0   | 2    | 9.6581  | 0.6341                       | -                             |
| 3.0   | 3    | 9.4181  | 0.6271                       | -                             |
| 4.0   | 4    | 9.0745  | 0.6420                       | -                             |
| 5.0   | 5    | 8.6646  | 0.6524                       | -                             |
| 6.0   | 6    | 8.2165  | 0.6679                       | -                             |
| 7.0   | 7    | 7.8114  | 0.6680                       | -                             |
| 8.0   | 8    | 7.5601  | 0.6633                       | -                             |
| 9.0   | 9    | 7.3990  | 0.6423                       | -                             |
| 10.0  | 10   | 7.2400  | 0.6330                       | -                             |
| 11.0  | 11   | 7.1190  | 0.6068                       | -                             |
| 12.0  | 12   | 7.0208  | 0.5861                       | -                             |
| 13.0  | 13   | 6.9463  | 0.6038                       | -                             |
| 14.0  | 14   | 6.8670  | 0.6043                       | -                             |
| 15.0  | 15   | 6.7977  | 0.5943                       | -                             |
| 16.0  | 16   | 6.7435  | 0.6127                       | -                             |
| 17.0  | 17   | 6.7221  | 0.6160                       | -                             |
| 18.0  | 18   | 6.7073  | 0.6420                       | -                             |
| 19.0  | 19   | 6.7120  | 0.6702                       | -                             |
| 20.0  | 20   | 6.7506  | 0.6674                       | -                             |
| 21.0  | 21   | 6.7998  | 0.6736                       | -                             |
| 22.0  | 22   | 6.9053  | 0.6776                       | -                             |
| 23.0  | 23   | 7.0869  | 0.6684                       | -                             |
| 24.0  | 24   | 7.3077  | 0.6663                       | -                             |
| 25.0  | 25   | 7.5744  | 0.6385                       | -                             |
| 26.0  | 26   | 7.8442  | 0.6467                       | -                             |
| 27.0  | 27   | 8.0424  | 0.6428                       | -                             |
| 28.0  | 28   | 8.1636  | 0.6482                       | -                             |
| 29.0  | 29   | 8.2419  | 0.6555                       | -                             |
| 30.0  | 30   | 8.2826  | 0.6661                       | -                             |
| 31.0  | 31   | 8.3410  | 0.6719                       | -                             |
| 32.0  | 32   | 8.3956  | 0.6678                       | -                             |
| 33.0  | 33   | 8.4566  | 0.6667                       | -                             |
| 34.0  | 34   | 8.4874  | 0.6653                       | -                             |
| 35.0  | 35   | 8.4888  | 0.6727                       | -                             |
| 36.0  | 36   | 8.4657  | 0.6617                       | -                             |
| 37.0  | 37   | 8.4654  | 0.6733                       | -                             |
| 38.0  | 38   | 8.4697  | 0.6830                       | -                             |
| 39.0  | 39   | 8.4993  | 0.6788                       | -                             |
| 40.0  | 40   | 8.5351  | 0.6775                       | -                             |
| 41.0  | 41   | 8.5518  | 0.6907                       | -                             |
| 42.0  | 42   | 8.5360  | 0.6983                       | -                             |
| 43.0  | 43   | 8.5675  | 0.7085                       | -                             |
| 44.0  | 44   | 8.5537  | 0.7194                       | -                             |
| 45.0  | 45   | 8.5644  | 0.7187                       | -                             |
| 46.0  | 46   | 8.6108  | 0.7181                       | -                             |
| 47.0  | 47   | 8.6788  | 0.6951                       | -                             |
| 48.0  | 48   | 8.7507  | 0.6833                       | -                             |
| 49.0  | 49   | 8.8212  | 0.6667                       | -                             |
| 50.0  | 50   | 8.8551  | 0.6639                       | -                             |
| 51.0  | 51   | 8.8956  | 0.6649                       | -                             |
| 52.0  | 52   | 8.9308  | 0.6818                       | -                             |
| 53.0  | 53   | 8.9567  | 0.6888                       | -                             |
| 54.0  | 54   | 9.0068  | 0.6854                       | -                             |
| 55.0  | 55   | 9.0578  | 0.6905                       | -                             |
| 56.0  | 56   | 9.1408  | 0.6831                       | -                             |
| 57.0  | 57   | 9.2814  | 0.6954                       | -                             |
| 58.0  | 58   | 9.4346  | 0.6988                       | -                             |
| 59.0  | 59   | 9.5225  | 0.6913                       | -                             |
| 60.0  | 60   | 9.6025  | 0.6883                       | -                             |
| 61.0  | 61   | 9.7100  | 0.6832                       | -                             |
| 62.0  | 62   | 9.8010  | 0.6810                       | -                             |
| 63.0  | 63   | 9.8612  | 0.6851                       | -                             |
| 64.0  | 64   | 9.9173  | 0.6817                       | -                             |
| 65.0  | 65   | 9.9991  | 0.6784                       | -                             |
| 66.0  | 66   | 10.1267 | 0.6738                       | -                             |
| 67.0  | 67   | 10.2853 | 0.6740                       | -                             |
| 68.0  | 68   | 10.4325 | 0.6806                       | -                             |
| 69.0  | 69   | 10.5536 | 0.6760                       | -                             |
| 70.0  | 70   | 10.6870 | 0.6732                       | -                             |
| 71.0  | 71   | 10.7818 | 0.6726                       | -                             |
| 72.0  | 72   | 10.8700 | 0.6755                       | -                             |
| 73.0  | 73   | 10.9502 | 0.6771                       | -                             |
| 74.0  | 74   | 11.0337 | 0.6783                       | -                             |
| 75.0  | 75   | 11.0625 | 0.6857                       | -                             |
| 76.0  | 76   | 11.0907 | 0.6844                       | -                             |
| 77.0  | 77   | 11.1157 | 0.6844                       | -                             |
| 78.0  | 78   | 11.1711 | 0.6844                       | -                             |
| 79.0  | 79   | 11.2116 | 0.6846                       | -                             |
| 80.0  | 80   | 11.2587 | 0.6849                       | -                             |
| 81.0  | 81   | 11.3408 | 0.6801                       | -                             |
| 82.0  | 82   | 11.3927 | 0.6782                       | -                             |
| 83.0  | 83   | 11.4829 | 0.6779                       | -                             |
| 84.0  | 84   | 11.5753 | 0.6811                       | -                             |
| 85.0  | 85   | 11.6758 | 0.6821                       | -                             |
| 86.0  | 86   | 11.7435 | 0.6851                       | -                             |
| 87.0  | 87   | 11.8001 | 0.6920                       | -                             |
| 88.0  | 88   | 11.8933 | 0.6953                       | -                             |
| 89.0  | 89   | 11.9564 | 0.6966                       | -                             |
| 90.0  | 90   | 12.0058 | 0.6985                       | -                             |
| 91.0  | 91   | 12.0442 | 0.7018                       | -                             |
| 92.0  | 92   | 12.0632 | 0.7032                       | -                             |
| 93.0  | 93   | 12.1156 | 0.7024                       | -                             |
| 94.0  | 94   | 12.1354 | 0.7005                       | -                             |
| 95.0  | 95   | 12.1454 | 0.7027                       | -                             |
| 96.0  | 96   | 12.1282 | 0.6999                       | -                             |
| 97.0  | 97   | 12.1065 | 0.6999                       | -                             |
| 98.0  | 98   | 12.0973 | 0.7039                       | -                             |
| 99.0  | 99   | 12.0881 | 0.7051                       | -                             |
| 100.0 | 100  | 12.0714 | 0.7051                       | -                             |
| 101.0 | 101  | 12.0595 | 0.7051                       | -                             |
| 102.0 | 102  | 12.0560 | 0.7038                       | -                             |
| 103.0 | 103  | 12.0585 | 0.7038                       | -                             |
| 104.0 | 104  | 12.0569 | 0.7038                       | -                             |
| 105.0 | 105  | 12.0600 | 0.7038                       | -                             |
| 106.0 | 106  | 12.0623 | 0.7005                       | -                             |
| 107.0 | 107  | 12.0643 | 0.7005                       | 0.7473                        |

</details>
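
The `*_spearman_cosine` columns most likely report the Spearman rank correlation between the cosine similarity of each embedded pair and its gold label. The sketch below shows that computation in plain Python under that assumption; `ranks`, `pearson` and `spearman` are hypothetical helpers, and the inputs are toy values rather than data from this model.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(predicted, gold):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(predicted), ranks(gold))


agreeing = spearman([0.2, 0.6, 0.8], [0.05, 0.5, 0.95])
reversed_order = spearman([0.8, 0.6, 0.2], [0.05, 0.5, 0.95])
```

Because only ranks matter, a model can reach a high score without its cosine similarities matching the label values numerically, which is consistent with the ranking-based CoSENT objective used for training.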

### Framework Versions
- Python: 3.11.9
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu118
- Accelerate: 0.30.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```
