Add new SentenceTransformer model
Browse files- 1_Pooling/config.json +10 -0
- README.md +1137 -0
- config.json +24 -0
- config_sentence_transformers.json +14 -0
- model.safetensors +3 -0
- modules.json +20 -0
- sentence_bert_config.json +4 -0
- special_tokens_map.json +37 -0
- tokenizer.json +0 -0
- tokenizer_config.json +56 -0
- vocab.txt +0 -0
1_Pooling/config.json
ADDED
|
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"word_embedding_dimension": 384,
|
| 3 |
+
"pooling_mode_cls_token": false,
|
| 4 |
+
"pooling_mode_mean_tokens": true,
|
| 5 |
+
"pooling_mode_max_tokens": false,
|
| 6 |
+
"pooling_mode_mean_sqrt_len_tokens": false,
|
| 7 |
+
"pooling_mode_weightedmean_tokens": false,
|
| 8 |
+
"pooling_mode_lasttoken": false,
|
| 9 |
+
"include_prompt": true
|
| 10 |
+
}
|
README.md
ADDED
|
@@ -0,0 +1,1137 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- sentence-transformers
|
| 4 |
+
- sentence-similarity
|
| 5 |
+
- feature-extraction
|
| 6 |
+
- dense
|
| 7 |
+
- generated_from_trainer
|
| 8 |
+
- dataset_size:2778
|
| 9 |
+
- loss:MultipleNegativesRankingLoss
|
| 10 |
+
base_model: intfloat/e5-small-v2
|
| 11 |
+
widget:
|
| 12 |
+
- source_sentence: In the induction example for the sum $1+2+\dots+n$, why do we add
|
| 13 |
+
$n+1$ to both sides of (1.16)?
|
| 14 |
+
sentences:
|
| 15 |
+
- "Subsection 1.6.3: Subsets\n\n\n\nIf $A$ and $B$ are sets, then <footnote>$A\\\
|
| 16 |
+
subseteq B$->At times, the symbol $\\subset$ is used instead of $\\subseteq$.\
|
| 17 |
+
\ In our context these two symbols mean the same. However, the notation $A\\subsetneq\
|
| 18 |
+
\ B$ means that $A\\subseteq B$ and\n $A\\neq B$. For example,\n $\\{1, 2, 3\\\
|
| 19 |
+
} \\subseteq \\{1, 2, 3\\}$ and\n $\\{1, 2, 3\\} \\subset \\{1, 2, 3\\}$.</footnote>\
|
| 20 |
+
\ means that\nevery element of $A$ is an element of $B$. So $A\\subseteq B$ is\
|
| 21 |
+
\ a placeholder for the proposition\n$$\n\\forall x\\in A : x\\in B\n$$\n\nIn\
|
| 22 |
+
\ this case we say\nthat *$A$ is a subset of $B$*. We also use the notation $A\\\
|
| 23 |
+
subsetneq B$ to\nindicate that $A\\subseteq B$ and $A\\neq B$. In this case we\
|
| 24 |
+
\ say that\n$A$ is a *strict* subset of $B$.\n\n\nExercise 1.47:\n\nList the subsets\
|
| 25 |
+
\ of $\\{1, 2\\}$. How many are there?\n\n/Exercise\n\n\nExercise 1.48:\n\nIt\
|
| 26 |
+
\ turns out that the empty set $\\emptyset$ is a subset of any set.\n\n\n\nExplain\
|
| 27 |
+
\ why this is so using the definition of $\\subseteq$.\n\n\\begin{prompting}\n\
|
| 28 |
+
Explain precisely in terms of propositions and logic why the empty set is a subset\
|
| 29 |
+
\ of any given set.\n\\end{prompting}\n\n/Exercise\n\n\nExercise 1.49:\n\nBelow\
|
| 30 |
+
\ Sage (not python) will list all subsets of the set $\\{1, 2, 3\\}$. Before pressing\n\
|
| 31 |
+
the Compute button, try to write them down on your own.\n\n\n\nList all the subsets\
|
| 32 |
+
\ of a set with five elements. In general, how many subsets does a set with $n$\
|
| 33 |
+
\ elements have?\n\n/Exercise\n\n\nQuizexercise 1.50:\n\n\\begin{paragraphquiz}\n\
|
| 34 |
+
\ \\question\n The set \\box is not a subset of $A=$\\box, simply because \\\
|
| 35 |
+
box does not belong to $A$.\n This exercise actually has \\box possible correct\
|
| 36 |
+
\ solutions.\n\n\\answer\n $\\{1, 2, 3\\}$\n \\answer\n $\\{-1, 1, 2, 3, 4\\\
|
| 37 |
+
}$\n \\answer\n $\\{-1, 0, 1, 2, 4\\}$\n \\answer\n $3$\n \\answer\n $-1$\n\
|
| 38 |
+
\ \\answer\n $5$\n \\answer\n $6$\n \\answer\n $0$\n \\case{(is 1347)}{T}\
|
| 39 |
+
\ Correct!\n \\case{(is 2157)}{T} Correct!\n \\case{(is 2347)}{T} Correct!\n\
|
| 40 |
+
\ \\case{(is 3287)}{T} Correct!\n \\case{(is 3157)}{T} Correct!\n \\case{(is\
|
| 41 |
+
\ 3187)}{T} Correct!\n \\default\n Nope. Try again!\n\\end{paragraphquiz}\n\
|
| 42 |
+
\n/Quizexercise\n\n\nQuizexercise 1.51:\n\n\\begin{paragraphquiz}\n \\question\n\
|
| 43 |
+
\ The empty set has \\box elements. A set with \\box elements has \\box subsets.\
|
| 44 |
+
\ In general a set with\n $n$ elements has \\box subsets.\n \\answer\n $1$\n\
|
| 45 |
+
\ \\answer\n $0$\n \\answer\n $5$\n \\answer\n $25$\n \\answer\n $32$\n\
|
| 46 |
+
\ \\answer\n $n^2$\n \\answer\n $2^n$\n \\case{(is 2357)}{T}\n Correct!\n\
|
| 47 |
+
\ \\default\n Nope. Try again!\n\\end{paragraphquiz}\n\n/Quizexercise"
|
| 48 |
+
- "Section 1.8: Proof by induction\n\n\n\nA <footnote>precocious Gauss->See the\
|
| 49 |
+
\ article Gauss's Day of Reckoning: https://www.americanscientist.org/article/gausss-day-of-reckoning\
|
| 50 |
+
\ for some history of this anecdote.</footnote> proved the formula\n$$\n1 + 2\
|
| 51 |
+
\ + \\cdots + n = \\frac{n(n+1)}{2}\n\\tag{1.15}$$\nat the age of seven displaying\
|
| 52 |
+
\ remarkable ingenuity for his age. Lesser\nmortals usually use induction to prove\
|
| 53 |
+
\ this formula. Gauss was asked\nalong with his classmates to compute the sum\
|
| 54 |
+
\ of all natural numbers\n$1, 2, \\dots, 100$. Using his formula he quickly came\
|
| 55 |
+
\ up with the correct\nanswer $5050$. His classmates had to work for the entire\
|
| 56 |
+
\ lesson. \n\nSuppose that the formula in (1.15) is viewed as a\nproposition $p(n)$.\
|
| 57 |
+
\ To prove the formula we need to prove it for all\nnatural numbers (you can easily\
|
| 58 |
+
\ see that $p(1)$ and $p(2)$ are true) i.e.,\nwe need to prove\n$$\n\\forall n\\\
|
| 59 |
+
in \\mathbb{N}: p(n).\n$$\nAn induction proof is a way of proving this statement\
|
| 60 |
+
\ by showing two things:\n\\begin{enumerate}\\item (i)\n $p(1)$\n\\item (ii)\n\
|
| 61 |
+
\ $\\forall n\\in \\mathbb{N}: p(n)\\implies p(n+1)$\n\\end{enumerate}\nThese\
|
| 62 |
+
\ two statements ensure that $p(1) \\implies p(2)$. Therefore\n$p(2)$ must be\
|
| 63 |
+
\ true, since we assumed $p(1)$ true from the\nbeginning. Similarly $p(2)\\implies\
|
| 64 |
+
\ p(3)$ ensures that $p(3)$\nis true and so on. In fact we have proved $p(n)$\
|
| 65 |
+
\ for every $n\\in \\mathbb{N}$\nusing this technique. One can prove this using\
|
| 66 |
+
\ proof by\ncontradiction and that every non-empty subset\nof $\\mathbb{N}$ has\
|
| 67 |
+
\ a first element. In general if $S$ is a subset of set with an order $\\leq$,\
|
| 68 |
+
\ then\n$s\\in S$ is called a first element if\n$$\n\\forall x\\in S: s \\leq\
|
| 69 |
+
\ x.\n$$\nA crucial rule (or axiom) is that every non-empty subset of $\\mathbb{N}$\
|
| 70 |
+
\ has\n a first element! Notice that this is false for $\\mathbb{Z}$.\n\n\nTheorem\
|
| 71 |
+
\ 1.82:\n\n Suppose that $p(n)$ are infinitely many propositions given by $n\\\
|
| 72 |
+
in \\mathbb{N}$. Then\n $$\n \\forall n\\in \\mathbb{N}: p(n)\n $$\n is true\
|
| 73 |
+
\ if\n\\begin{enumerate}\\item (i)\n $p(1)$ is true.\n\\item (ii)\n $\\left(\\\
|
| 74 |
+
forall n\\in \\mathbb{N}: p(n)\\implies p(n+1)\\right)$ is true.\n\\end{enumerate}\n\
|
| 75 |
+
\n/Theorem\n\n\\begin{proof}\nSuppose by contradiction that there exists $n\\\
|
| 76 |
+
in \\mathbb{N}$, such that\n$p(n)$ is false. Then the subset\n$$\nS = \\{n\\in\
|
| 77 |
+
\ \\mathbb{N} \\mid \\neg p(n)\\}\\subseteq \\mathbb{N}\n$$\nis non-empty. Therefore\
|
| 78 |
+
\ it has a first element $n_0\\in S$. \nHere $n_0 > 1$, since $p(1)$ is assumed\
|
| 79 |
+
\ to be true. So we\nknow that $p(n_0-1)$ is true and that\n$p(n_0-1)\\implies\
|
| 80 |
+
\ p(n_0)$ is true. But the latter\nimplication is a contradiction, since true\
|
| 81 |
+
\ implies\nfalse is false.\n\\end{proof}\n\n\n\nLet us see how an induction proof\
|
| 82 |
+
\ plays out in the above example\nwith the statement $p(n)$ that\n$$\n1 + 2 +\
|
| 83 |
+
\ \\cdots + n = \\frac{n(n+1)}{2}.\n\\tag{1.16}$$\nClearly $p(1)$ is true. We\
|
| 84 |
+
\ need to prove $p(n)\\implies p(n+1)$, so\nwe assume that $p(n)$ holds i.e.,\
|
| 85 |
+
\ that (1.16) is true.\nThen we may add $n+1$ to both sides of (1.16) to get\n\
|
| 86 |
+
$$\n1 + 2 + \\cdots + n + (n+1) = \\frac{n(n+1)}{2} + (n+1).\n$$\nHere the right\
|
| 87 |
+
\ hand side can be rewritten as\n$$\n\\frac{n(n+1) + 2(n+1)}{2} = \\frac{(n+1)(n+2)}{2},\n\
|
| 88 |
+
$$\nwhich is exactly what we want. This is the conjectured formula for\nthe sum\
|
| 89 |
+
\ of the numbers $1, 2, \\dots, n, n+1$. Therefore\nwe have proved that $p(n)\\\
|
| 90 |
+
implies p(n+1)$ and the induction\nproof is complete.\n\n\nExample 1.83:\n\n \
|
| 91 |
+
\ For a real number $r\\neq 1$, the extremely useful formula\n $$\n 1 + r +\
|
| 92 |
+
\ \\cdots + r^n = \\frac{1 - r^{n+1}}{1-r}\n \\tag{1.17}$$\n holds. Let us prove\
|
| 93 |
+
\ this formula by induction. For $n=1$ this amounts to the identity\n $$\n 1\
|
| 94 |
+
\ + r = \\frac{1-r^2}{1-r},\n $$\n which is true since $1-r^2 = (1+r)(1-r)$.\
|
| 95 |
+
\ We let $p(n)$ denote\n the identity in (1.17). We have seen that $p(1)$ is\
|
| 96 |
+
\ true. The induction step\n consists in proving $p(n)\\implies p(n+1)$. We can\
|
| 97 |
+
\ prove this\n by adding $r^{n+1}$ to the right hand side in (1.17):\n $$\n\
|
| 98 |
+
\ \\frac{1 - r^{n+1}}{1-r} + r^{n+1} = \\frac{1 - r^{n+1} + (1-r) r^{n+1}}{1-r}\
|
| 99 |
+
\ = \\frac{1 - r^{n+2}}{1-r}.\n \\tag{1.18}$$\nReal life application:\n In\
|
| 100 |
+
\ order to pay for a house you borrow $P$ DKK at an interest of\n $r$ per year.\
|
| 101 |
+
\ You want to pay off your debt over $N$ years by\n paying a fixed amount each\
|
| 102 |
+
\ year. How much is the fixed yearly\n amount you need to pay?\n\nLet us analyze\
|
| 103 |
+
\ the setup: suppose that the fixed yearly amount\n is $Y$. We will find an\
|
| 104 |
+
\ equation giving us $Y$ in terms of\n $P, N$ and $r$. Put $q = 1+ r$.\n\n\
|
| 105 |
+
After one year you owe\n $$\n q P - Y.\n $$\n After two years you\
|
| 106 |
+
\ owe\n $$\n q(q P - Y) - Y.\n $$\n After three years you owe\n \
|
| 107 |
+
\ $$\n q ( q ( q P - Y) - Y) - Y.\n $$\n In general after $n$ years\
|
| 108 |
+
\ you owe\n $$\n q^n P - Y (1 + q + \\cdots + q^{n-1}).\n $$\n Since\
|
| 109 |
+
\ we want to be debt free after $N$ years, the yearly payment will have to satisfy\n\
|
| 110 |
+
\ $$\n q^N P = Y ( 1 + q + \\cdots + q^{N-1}).\n $$\n By the formula\
|
| 111 |
+
\ (1.17), we get\n $$\n q^N P = Y \\frac{1-q^N}{1-q}.\n $$\n Here\
|
| 112 |
+
\ $Y$ can be isolated giving the formula\n $$\n Y = \\frac{r P}{1 - \\left(\\\
|
| 113 |
+
frac{1}{1+r}\\right)^N}.\n $$ \n With the current (August 2024) interest\
|
| 114 |
+
\ rate around four percent, you pay a fixed monthly\n amount of around 4770\
|
| 115 |
+
\ DKK (down from 5420 DKK in 2023, when the interest rate was five percent) for\
|
| 116 |
+
\ borrowing one million DKK over $30$ years.\n \n \n\n/Example"
|
| 117 |
+
- "Subsection 1.9.5: Injective and surjective functions\n\n\n\nWe now define three\
|
| 118 |
+
\ very important notions related to functions.\n\n\nDefinition 1.101:\n\n Let\
|
| 119 |
+
\ $f: S\\rightarrow T$ be a function. Then $f$ is called\n \\begin{enumerate}\\\
|
| 120 |
+
item (i)\n *injective*, if $f(x) = f(y) \\implies x = y$ for every $x, y\\\
|
| 121 |
+
in S$.\n \\item (ii)\n *surjective*, if for every $y\\in T$, there exists\
|
| 122 |
+
\ $x\\in S$, such that $f(x) = y$.\n \\item (iii)\n *bijective*, if it is\
|
| 123 |
+
\ both injective and surjective.\n \\end{enumerate}\n\n/Definition\n\n\nExercise\
|
| 124 |
+
\ 1.102:\n\nIs a cryptographic hash-function as defined in Example (1.92) injective?\n\
|
| 125 |
+
\n/Exercise\n\n\nExercise 1.103:\n\nSuppose that\n$$\nS = \\{1, 2, 3\\}\\qquad\\\
|
| 126 |
+
text{and}\\qquad T = \\{1, 2, 3, 4\\}\n$$\nand that the function $f: S\\rightarrow\
|
| 127 |
+
\ T$ is defined by the table\n$$\n\\def\\arraystretch{1.5}\n\\begin{array}{c|ccccccc}\n\
|
| 128 |
+
x & 1 & 2 & 3\\\\ \\hline\nf(x) & 1 & 2 & 4\n\\end{array}\n$$\nIs $f$ injective?\
|
| 129 |
+
\ Is it surjective? Is it possible to adjust the table so that\n$f$ becomes injective?\n\
|
| 130 |
+
Is it possible to adjust the table so that\n$f$ becomes surjective?\n\n/Exercise\n\
|
| 131 |
+
\n\nExercise 1.104:\n\nConsider the function $f:S \\rightarrow T$ given by\n$$\n\
|
| 132 |
+
f(x) = x^2,\n$$\nwhere $S = T = \\mathbb{R}$.\nIs $f$ injective? Is $f$ surjective?\
|
| 133 |
+
\ Suggest how to change $S$ and $T$ so that $f:S\\rightarrow T$ becomes\nbijective.\n\
|
| 134 |
+
\n/Exercise\n\n\nExercise 1.105:\n\nConsider the function $f:\\mathbb{Z} \\rightarrow\
|
| 135 |
+
\ \\mathbb{Z}$ given by\n$$\nf(x) = x + 1\n$$\nShow that $f$ is bijective.\n\n\
|
| 136 |
+
/Exercise\n\n\nExercise 1.106:\n\nWrite down precisely how the truth table for\
|
| 137 |
+
\ $p\\implies q$ may\nbe expressed in terms of a function $f: S\\rightarrow T$.\
|
| 138 |
+
\ What are the sets $S$ and $T$ in this case?\n\n/Exercise\n\nSubsection 1.9.6:\
|
| 139 |
+
\ The inverse function\n\n\n\nIf $f:S\\rightarrow T$ is bijective, then we may\
|
| 140 |
+
\ define a function $g: T\\rightarrow S$, so\nthat $(f\\circ g)(y) = y$ for every\
|
| 141 |
+
\ $y\\in T$ and $(g\\circ f)(x)$ for every $x\\in S$. This\nfunction is denoted\
|
| 142 |
+
\ $f^{-1}$.\n\nHow do we define $f^{-1}(y)$ for $y\\in T$? Well, since $f$ is\
|
| 143 |
+
\ surjective, we may find\n$x\\in S$ so that $y = f(x)$. Now, we simply define\n\
|
| 144 |
+
$$\nf^{-1}(y) = x.\n\\tag{1.20}$$\nWe cannot have $x_1 \\neq x_2$ in $S$ with\
|
| 145 |
+
\ $f(x_1) = f(x_2) = y$, since $f$ is injective. We only have one choice for\n\
|
| 146 |
+
$x$ in (1.20). Therefore (1.20) really is a good and sound definition.\n\n\nExample\
|
| 147 |
+
\ 1.107:\n\nLet $f: S\\rightarrow S$, where $S = \\{1, 2, 3\\}$ be given by\n\
|
| 148 |
+
the table\n$$\n\\def\\arraystretch{1.5}\n\\begin{array}{c|ccccccc}\nx & 1 & 2\
|
| 149 |
+
\ & 3\\\\ \\hline\nf(x) & 3 & 1 & 2\n\\end{array}.\n$$\nThen $f^{-1}$is given\
|
| 150 |
+
\ by the table\n$$\n\\def\\arraystretch{1.5}\n\\begin{array}{c|ccccccc}\nx & 1\
|
| 151 |
+
\ & 2 & 3\\\\ \\hline\nf^{-1}(x) & 2 & 3 & 1\n\\end{array}.\n$$\n\n/Example\n\n\
|
| 152 |
+
\nExercise 1.108:\n\nWhat if the definition of $f$ in Example (1.107) is changed\
|
| 153 |
+
\ to\n$$\n\\def\\arraystretch{1.5}\n\\begin{array}{c|ccccccc}\nx & 1 & 2 & 3\\\
|
| 154 |
+
\\ \\hline\nf(x) & 3 & 2 & 2\n\\end{array}.\n$$\nDoes $f^{-1}$ make sense here?\n\
|
| 155 |
+
\n/Exercise\n\n\nExercise 1.109:\n\nWhat is the inverse function of $f:\\mathbb{Z}\\\
|
| 156 |
+
rightarrow \\mathbb{Z}$ given by $f(x) = x + 1$?\nWhat is the inverse function\
|
| 157 |
+
\ of $g: S \\rightarrow S$, where $g(x) = \\sqrt{x}$ and\n$S = \\{x\\in \\mathbb{R}\\\
|
| 158 |
+
mid x\\geq 0\\}$?\n\n/Exercise"
|
| 159 |
+
- source_sentence: Why do we need truth tables for these logical connectives—can’t
|
| 160 |
+
we just rely on intuition?
|
| 161 |
+
sentences:
|
| 162 |
+
- "Section 1.5: Propositional logic\n\n\n\nA proposition is a (mathematical) statement\
|
| 163 |
+
\ that is\ntrue ($t$) or false ($f$). This could be a boolean\nexpression in a\
|
| 164 |
+
\ computer program, like $1 < 2$.\n\nSage:\n\n\n\nLater we will see propositions\
|
| 165 |
+
\ with\nvariables in them like $x < 2$. These are called predicates.\n\nPropositions\
|
| 166 |
+
\ can be combined into\nnew (compound) propositions. Take for example the propositions\n\
|
| 167 |
+
\n$$\\begin{aligned}\n&p: \\text{it rains}\\\\\n&q: \\text{it is cloudy}.\n\\\
|
| 168 |
+
end{aligned}$$\n\nThen ($p$ and $q$) is a perfectly good\n new proposition reading\
|
| 169 |
+
\ *it rains and it is cloudy*. The same goes for (if $p$ then $q$), which reads\n\
|
| 170 |
+
\ *if it rains then it is cloudy*. The proposition (if $q$ then $p$) reads *if\
|
| 171 |
+
\ it is cloudy then\n it rains*. This proposition is (clearly) false.\n\nWe\
|
| 172 |
+
\ need some notation to describe these compound propositions:\n\n$$\n\\begin{array}{ll}\n\
|
| 173 |
+
p \\land q\\qquad\\qquad & \\qquad\\qquad p \\text{ and } q\\\\\n\\\\\np \\lor\
|
| 174 |
+
\ q\\qquad\\qquad & \\qquad\\qquad p \\text{ or } q\\\\\n\\\\\np\\implies q\\\
|
| 175 |
+
qquad\\qquad & \\qquad\\qquad \\text{if } p \\text{ then } q\\\\\n\\\\\n\\neg\
|
| 176 |
+
\ p\\qquad\\qquad & \\qquad\\qquad \\text{not } p\n\\end{array}\n$$\n\nThe compound\
|
| 177 |
+
\ propositions are either true($t$) or false ($f$) depending on\n$p$ and $q$.\
|
| 178 |
+
\ The dependencies are displayed in the *truth tables* below.\n\n\nDefinition\
|
| 179 |
+
\ 1.18:\n\n$$\n\\def\\arraystretch{1.2}\n \\begin{array}{c|c|c}\n \
|
| 180 |
+
\ p & q & p\\land q \\\\\n \\hline \n t & t & t \\\\\n \
|
| 181 |
+
\ t & f & f\\\\\n f & t & f\\\\\n f & f & f\n \\end{array}\\\
|
| 182 |
+
qquad\n \\begin{array}{c|c|c}\n p & q & p\\lor q \\\\\n \\\
|
| 183 |
+
hline\n t & t & t \\\\\n t & f & t\\\\\n f & t & t\\\\\
|
| 184 |
+
\n f & f & f\n \\end{array}\n \\qquad\n \\begin{array}{c|c|c}\n\
|
| 185 |
+
\ p & q & p\\implies q \\\\\n \\hline\n t & t & t \\\
|
| 186 |
+
\\\n t & f & f\\\\\n f & t & t\\\\\n f & f & t\n \\\
|
| 187 |
+
end{array}\\qquad\n \\begin{array}{c|c}\n p & \\neg p \\\\\n \
|
| 188 |
+
\ \\hline\n t & f\\\\\n f & t\n \\end{array}\n $$\n\n/Definition\n\
|
| 189 |
+
\nThe tables for the compound propositions $p\\land q, p\\lor q$ and also\n$\\\
|
| 190 |
+
neg p$ are not too hard to grasp. The table for $p\\implies q$ \nraises a few\
|
| 191 |
+
\ more questions. Why is $f\\implies t$ true?\nI will not go into this at this\
|
| 192 |
+
\ point (see Example 1.31), but just point out that there are\nmany explanations\
|
| 193 |
+
\ available online and, \nperhaps more importantly, refer you to Exercise 1.19."
|
| 194 |
+
- 'Subsection 1.7.2: Ordering $\mathbb{Q}$
|
| 195 |
+
|
| 196 |
+
|
| 197 |
+
|
| 198 |
+
|
| 199 |
+
We define the positive rational numbers as
|
| 200 |
+
|
| 201 |
+
$$
|
| 202 |
+
|
| 203 |
+
\mathbb{Q}_+ = \left\{\frac{m}{n} \in \mathbb{Q} \middle| m > 0\right\} = \left\{1,
|
| 204 |
+
\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{3}{4}, \dots \right\}.
|
| 205 |
+
|
| 206 |
+
$$
|
| 207 |
+
|
| 208 |
+
One can check that $\mathbb{Q}_+$ satisfies the conditions in
|
| 209 |
+
|
| 210 |
+
Definition (1.69). So formally we
|
| 211 |
+
|
| 212 |
+
get
|
| 213 |
+
|
| 214 |
+
|
| 215 |
+
|
| 216 |
+
Proposition 1.74:
|
| 217 |
+
|
| 218 |
+
|
| 219 |
+
For $\cfrac{a}{b},\,\,\, \cfrac{c}{d}\in \mathbb{Q}$,
|
| 220 |
+
|
| 221 |
+
$$
|
| 222 |
+
|
| 223 |
+
\frac{a}{b}\, < \frac{c}{d}\qquad \iff\qquad a d < b c\qquad (\text{in }\mathbb{Z}).
|
| 224 |
+
|
| 225 |
+
$$
|
| 226 |
+
|
| 227 |
+
|
| 228 |
+
/Proposition
|
| 229 |
+
|
| 230 |
+
\begin{proof}
|
| 231 |
+
|
| 232 |
+
We must check when
|
| 233 |
+
|
| 234 |
+
$$
|
| 235 |
+
|
| 236 |
+
\frac{c}{d} - \frac{a}{b} = \frac{b c - a d}{b d} \in \mathbb{Q}_+.
|
| 237 |
+
|
| 238 |
+
$$
|
| 239 |
+
|
| 240 |
+
This happens precisely when the numerator $b c - a d\in \mathbb{N}$ or $b c -
|
| 241 |
+
a d > 0$. Therefore
|
| 242 |
+
|
| 243 |
+
the condition in the proposition is satisfied.
|
| 244 |
+
|
| 245 |
+
\end{proof}'
|
| 246 |
+
- 'Exercise 1.36:
|
| 247 |
+
|
| 248 |
+
|
| 249 |
+
Consider the proposition $q(n) = n \text{ is even}$. Prove that
|
| 250 |
+
|
| 251 |
+
$$
|
| 252 |
+
|
| 253 |
+
\forall n\in \mathbb{Z}: q(n^2)\implies q(n).
|
| 254 |
+
|
| 255 |
+
$$
|
| 256 |
+
|
| 257 |
+
|
| 258 |
+
\begin{hint}
|
| 259 |
+
|
| 260 |
+
Use that $q(n) = \neg p(n)$, where $p(n)$ is defined in Example (1.35).
|
| 261 |
+
|
| 262 |
+
\end{hint}
|
| 263 |
+
|
| 264 |
+
|
| 265 |
+
/Exercise'
|
| 266 |
+
- source_sentence: In Exercise 1.72, how should we correctly rewrite the chain $0
|
| 267 |
+
< 1 < 2$ so that each comparison involves only two integers?
|
| 268 |
+
sentences:
|
| 269 |
+
- "Subsection 1.7.1: Ordering $\\mathbb{Z}$\n\n\n\nAs we saw in Remark (1.70), the\
|
| 270 |
+
\ natural order on $\\mathbb{Z}$ is\ndefined by $\\mathbb{Z}_+ = \\mathbb{N}$,\
|
| 271 |
+
\ so that $x < y$ if $y-x\\in \\mathbb{N}$ for $x, y\\in \\mathbb{Z}$.\nThis completely\
|
| 272 |
+
\ agrees with our preconception that\n$$\n\\cdots < -3 < -2 < -1 < 0 < 1 < 2 <\
|
| 273 |
+
\ \\cdots\n\\tag{1.14}$$\n\nTo be precise, writing $\\cdots < -3 < -2 < -1 < 0\
|
| 274 |
+
\ < 1 < 2 < \\cdots$ is nonsense, since $<$ is only defined for two integers.\n\
|
| 275 |
+
\n\nExercise 1.72:\n\n How is one supposed to interpret $0 < 1 < 2$ for example?\
|
| 276 |
+
\ Go ahead and formulate (1.14) correctly comparing only two integers at a time.\n\
|
| 277 |
+
\ How does Python/Sage interpret $-3 < -2 < -1< 0 < 1 < 2$? Find out using the\
|
| 278 |
+
\ Sage snippet below.\n\n\n\nWhat about $1 < 5 > 3 < 4$? What about $0 < 1 > 2$?\n\
|
| 279 |
+
\n/Exercise\n\n\nQuizexercise 1.73:\n\n\\begin{orderquiz}\n \\question\n Assume\
|
| 280 |
+
\ that $x, y, z\\in \\mathbb{Z}$ and that $x \\leq y$. Then drag and drop the\n\
|
| 281 |
+
\ elements from the left to the right below to explain that\n $x + z \\leq y\
|
| 282 |
+
\ + z$.\n \\answer By assumption $x\\leq y$.\n \\answer This means that\
|
| 283 |
+
\ $z - x + y\\in \\mathbb{N}$\n \\answer This means that $y - x\\in \\mathbb{N}$\n\
|
| 284 |
+
\ \\answer To show that $x + z \\leq y + z$, we need to show that\n $(y +\
|
| 285 |
+
\ z) - (x + z) \\in \\mathbb{N}$.\n \\answer But $(y + z) - (x + z) = y + z\
|
| 286 |
+
\ - x + z$. Therefore,\n \\answer But $(y + z) - (x + z) = y + z - x - z =\
|
| 287 |
+
\ y - x$. Therefore,\n \\answer $(y + z) - (x + z)\\in \\mathbb{N}$, since\n\
|
| 288 |
+
\ \\answer $y - x \\in \\mathbb{N}$\n \\expected{6}\n\n\\case{(is 134678)}{T}\n\
|
| 289 |
+
\ Spot on, my friend.\n\n\\case{(is 467813)}{T}\n This is right!\n\n\\case{(is\
|
| 290 |
+
\ 413678)}{T}\n This is right!\n\n\\default\n Wrong order. Check the definition\
|
| 291 |
+
\ of $\\leq$ in UNDEFINED: ordZ once more!\n\\end{orderquiz}\n\n/Quizexercise"
|
| 292 |
+
- 'Exercise 1.75:
|
| 293 |
+
|
| 294 |
+
|
| 295 |
+
Use proof by contradiction (see section (1.5.8))
|
| 296 |
+
|
| 297 |
+
to show precisely that there does not
|
| 298 |
+
|
| 299 |
+
exist a smallest positive rational number.
|
| 300 |
+
|
| 301 |
+
|
| 302 |
+
/Exercise'
|
| 303 |
+
- "Section 1.9: The concept of a function\n\n\n\nA function is a crucial concept\
|
| 304 |
+
\ in mathematics. In Sage (actually python here) a simple function can be\nprogrammed\
|
| 305 |
+
\ like\n\n\n\nThe code above seems to take a number and returns the number plus\
|
| 306 |
+
\ one. This (f) is in fact a function \ntaking as *input* a number and returning\
|
| 307 |
+
\ as *output* the number plus one. Notice that\nwe do not even know which numbers\
|
| 308 |
+
\ we are talking about here. In mathematics we need to have\na more precise notion\
|
| 309 |
+
\ of a function. \n\nThe above python function could more formally be denoted\
|
| 310 |
+
\ as $f: \\mathbb{Z}\\rightarrow \\mathbb{Z}$ with\n$f(n) = n+1$ if we are dealing\
|
| 311 |
+
\ with the integers, but we cannot tell from the code.\n\nWell, to be fair ...:\n\
|
| 312 |
+
To be completely fair, it is possible from Python 3.5 to add type annotations\
|
| 313 |
+
\ to functions, so that we could write\n<code>def f(n: int) -> int: return(n+1)\n\
|
| 314 |
+
</code>\n\n\nin the Python code to state that the function should take values\
|
| 315 |
+
\ in the integers and return integers.\n\n\nThe precise mathematical definition\
|
| 316 |
+
\ of a function in terms of sets is\nthe following. A function $f: S\\rightarrow\
|
| 317 |
+
\ T$ is a subset\n$f\\subseteq S\\times T$, such that\n$(s, t_1)\\in f \\land\
|
| 318 |
+
\ (s, t_2)\\in f \\implies t_1 = t_2$. In words it states that a\nfunction $f:\
|
| 319 |
+
\ S\\rightarrow T$ is a subset $f$ of $S\\times T$, containing pairs\nhaving only\
|
| 320 |
+
\ one second coordinate for every first coordinate.\n\nThe everyday working definition\
|
| 321 |
+
\ of a\nfunction is more intuitive: a machine taking input from some set\n$S$\
|
| 322 |
+
\ and giving output in some set $T$. The uniqueness of the output\nis encoded\
|
| 323 |
+
\ in the mathematical definition of a function.\n\n\nDefinition 1.90:\n\nMathematically\
|
| 324 |
+
\ a function $f$ takes values from a set $S$ and returns values in a set $T$.\
|
| 325 |
+
\ In details,\nit is denoted $f: S\\rightarrow T$ and the value associated with\
|
| 326 |
+
\ $s\\in S$ is denoted $f(s)\\in T$.\nHere $S$ is called *the domain* of $f$ and\
|
| 327 |
+
\ $T$ is called *the codomain* of $f$. Less,\nformally $S$ is called the input\
|
| 328 |
+
\ set and $T$ the output set for $f$.\n\n/Definition\n\n\nRemark 1.91:\n\n Please\
|
| 329 |
+
\ notice that a function is a very, very general concept. It is not just something\n\
|
| 330 |
+
\ that you draw as a graph on a piece of paper. Of course, you can draw a function\n\
|
| 331 |
+
\ $f:\\mathbb{R}\\rightarrow \\mathbb{R}$ like $f(x) = x^2$:\n \n Generally,\
|
| 332 |
+
\ a function $f: S\\rightarrow T$ is given by a machine, formula or algorithm\
|
| 333 |
+
\ that\n computes $f(x)\\in T$ for every $x\\in S$. Nothing more, nothing less.\
|
| 334 |
+
\ It really has nothing to\n do with a graph (even though graphs can sometimes\
|
| 335 |
+
\ be useful for visualizing certain functions like $f(x) = x^2$).\n\n/Remark\n\
|
| 336 |
+
\n\nExample 1.92:\n\n Good examples of functions can be found in the cryptographic\
|
| 337 |
+
\ hash functions: https://en.wikipedia.org/wiki/Cryptographic_hash_function. They\
|
| 338 |
+
\ are examples of complicated functions $f:S \\rightarrow T$, where\n $S$ is\
|
| 339 |
+
\ infinite and $T$ finite. Here $S$ could be data like plain text files and $T$\
|
| 340 |
+
\ could be\n a $256$ bit number. This is the setup for the widely used sha-256\
|
| 341 |
+
\ cryptographic hash function.\n The whole point of a cryptographic hash function\
|
| 342 |
+
\ is that it must be humanly impossible to\n <footnote>compute $y$ with $f(y)\
|
| 343 |
+
\ = f(x)$ given $f(x)$->A pair $x\\neq y$ with $f(x) = f(y)$ is called a collision</footnote>.\
|
| 344 |
+
\ \n In fact, sha-256 is used in the Bitcoin block chain. The precise definition\
|
| 345 |
+
\ of\n sha-256 can be found in FIPS PUB 180-4: http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf\
|
| 346 |
+
\ approved by the Secretary of Commerce.\n\nOther interesting functions output\
|
| 347 |
+
\ a bounded size digital footprint (checksum) of a file (like md5: https://en.wikipedia.org/wiki/MD5).\
|
| 348 |
+
\ This is very useful\nfor checking data integrity of downloads over the internet.\
|
| 349 |
+
\ The md5 hash is a $128$ bit number.\n\nInstead of listing $256$ or $128$ bits\
|
| 350 |
+
\ for the hash value one uses hexadecimal notation with digits\nin 0, 1, 2, 3,\
|
| 351 |
+
\ 4, 5, 6, 7, 8, 9 , a, b, c, d, e, f. A pair of hexadecimal digits then represents\n\
|
| 352 |
+
a byte or $8$ bits. Output from sha-256 and md5 consist of $64$ and $32$ hexadecimal\n\
|
| 353 |
+
digits respectively. You are welcome to experiment with these two hash functions\
|
| 354 |
+
\ in the\nSage window below.\n\n\n\n\n/Example\n\n\nExercise 1.93:\n\nWhat is\
|
| 355 |
+
\ the sha-256 hash of your name? Change a\nfew letters and recompute. Do you see\
|
| 356 |
+
\ any system? What about the md5 hash function?\nCan you find two different strings\
|
| 357 |
+
\ with the same md5 hash using your computer?\n\n\\begin{hint}\n I have not answered\
|
| 358 |
+
\ the last question myself, but I am told that it is possible to find\n a collision\
|
| 359 |
+
\ for md5 using a garden variety home computer. Browsing the internet, it\n seems\
|
| 360 |
+
\ that the two strings $s_1$ and $s_2$ given in <footnote>hexadecimal notation->This\
|
| 361 |
+
\ notation represents a sequence of bytes given by pairs of hexadecimal digits</footnote>\
|
| 362 |
+
\ by\n<code>d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89 \n\
|
| 363 |
+
55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b \nd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0\
|
| 364 |
+
\ \ne99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70\n</code>\n\
|
| 365 |
+
\n\nand\n<code>d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89\
|
| 366 |
+
\ \n55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b \nd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0\
|
| 367 |
+
\ \ne99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70 \n</code>\n\
|
| 368 |
+
\n\ngive a collision for md5. Verify that $s_1\\neq s_2$ and\nthat they give the\
|
| 369 |
+
\ same md5 hash. If you find a collision\nfor sha-256 you will\nbecome world famous.\n\
|
| 370 |
+
\n\\begin{hint}\n\n\\end{hint}\n\\end{hint}\n\n/Exercise"
|
| 371 |
+
- source_sentence: Why does the textbook emphasize that the ordering in the listing
|
| 372 |
+
of elements is unimportant?
|
| 373 |
+
sentences:
|
| 374 |
+
- 'Exercise 1.14:
|
| 375 |
+
|
| 376 |
+
|
| 377 |
+
We know that zero times any number is zero. Deduce this from the rules in
|
| 378 |
+
|
| 379 |
+
Proposition (1.12) starting with $0 + 0 = 0$.
|
| 380 |
+
|
| 381 |
+
|
| 382 |
+
/Exercise'
|
| 383 |
+
- 'A set is a collection of (mathematical) objects or elements. When defining a
|
| 384 |
+
set we use the symbols $\{$
|
| 385 |
+
|
| 386 |
+
and $\}$ to denote the beginning and end of its definition. For example, $\{\text{N,
|
| 387 |
+
i, e, l, s}\}$
|
| 388 |
+
|
| 389 |
+
is the set of characters in my first name and $\{8, 0\}$ are the digits in the
|
| 390 |
+
postal code
|
| 391 |
+
|
| 392 |
+
for Aarhus C. The ordering in the listing of the elements is unimportant so that
|
| 393 |
+
|
| 394 |
+
$$\begin{aligned}
|
| 395 |
+
|
| 396 |
+
\{\text{N, i, e, l, s}\} &= \{\text{l, e, i, s, N}\}\\
|
| 397 |
+
|
| 398 |
+
\{8, 0\} &= \{0, 8\}
|
| 399 |
+
|
| 400 |
+
\end{aligned}$$
|
| 401 |
+
|
| 402 |
+
are identical sets. If $S$ is a set, we will use the notation $x\in S$ to denote
|
| 403 |
+
|
| 404 |
+
that $x$ is an element in $S$. For example, e$\in \{\text{N, i, e, l, s}\}$.'
|
| 405 |
+
- "Exercise 1.84:\n\nVerify the computation (induction step) in (1.18) i.e., explain\n\
|
| 406 |
+
the operations used to go from the left to the right of the two equalities.\n\n\
|
| 407 |
+
/Exercise\n\n\nExercise 1.85:\n\nLocate the mistake in the following fake induction\
|
| 408 |
+
\ proof of the curious fact that\n$2^n = 2$ \nfor every $n\\in \\mathbb{N}$.\n\
|
| 409 |
+
\nLet $p(n)$ be\nthe proposition $2^n = 2$. Then $p(1)$ is true.\n\nWe wish to\
|
| 410 |
+
\ prove that $p(n) \\implies p(n+1)$ assuming that $p(1), \\dots, p(n)$ are true:\n\
|
| 411 |
+
$$\\begin{aligned}\n 2^{n+1} &= 2^n \\cdot 2\\\\\n &= 2^n \\cdot \\\
|
| 412 |
+
frac{2^n}{2^{n-1}}\\\\\n &= 2 \\cdot \\frac{2}{2}\\,\\,\\text{(by }p(n)\\\
|
| 413 |
+
text{ and }p(n-1)\\text{)}\\\\\n &= 2.\n\\end{aligned}$$\nThis shows\
|
| 414 |
+
\ that $p(n) \\implies p(n+1)$ and therefore that $2^n = 2$ for every\n$n\\in\
|
| 415 |
+
\ \\mathbb{N}\\setminus\\{0\\}$.\n\n/Exercise\n\n\nExercise 1.86:\n\nProve by\
|
| 416 |
+
\ induction that the sum of the first $n$ odd numbers is\ngiven by the formula\n\
|
| 417 |
+
$$\n1 + 3 + \\cdots + (2 n - 1) = n^2,\n$$\ni.e., for $n=5$ we have\n$$\n1 + 3\
|
| 418 |
+
\ + 5 + 7 + 9 = 25.\n$$\n\n/Exercise\n\n\nExercise 1.87:\n\nProve by induction\
|
| 419 |
+
\ that\n$$\n1^2 + 2^2 + 3^2 + \\cdots + n^2 = \\frac{n(n+1)(2n + 1)}{6}.\n$$\n\
|
| 420 |
+
\n/Exercise\n\n\nExercise 1.88:\n\nProve using the idea of induction that\n$$\n\
|
| 421 |
+
2^n < n!\n$$\nfor $n\\geq 4$.\n\n\n/Exercise\n\nThe last exercise related to induction\
|
| 422 |
+
\ concerns the famous pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle.\
|
| 423 |
+
\ The statement itself looks innocent, well almost ridiculous, but it is very\
|
| 424 |
+
\ powerful: https://mindyourdecisions.com/blog/2008/11/25/16-fun-applications-of-the-pigeonhole-principle/.\
|
| 425 |
+
\ Even the go-to website \nmathoverflow: https://mathoverflow.net/ for research\
|
| 426 |
+
\ mathematicians has \na quite nice thread: https://mathoverflow.net/questions/4279/interesting-applications-of-the-pigeonhole-principle\
|
| 427 |
+
\ \nabout this.\n\n\nExercise 1.89:\n\nProve the following by induction on $m$:\
|
| 428 |
+
\ if $n$ items are put into $m$ containers and \n$n > m$, then at least one container\
|
| 429 |
+
\ must contain more than one item.\n\n/Exercise"
|
| 430 |
+
- source_sentence: In Exercise 1.41 we are asked to find identities for \((a+b)^3\)
|
| 431 |
+
and \((a+b)^4\). What are the correct expanded forms, and how do they relate to
|
| 432 |
+
the binomial theorem?
|
| 433 |
+
sentences:
|
| 434 |
+
- 'Chapter 1 on the language of mathematics is an introduction to the fundamental
|
| 435 |
+
mathematics used in the notes.
|
| 436 |
+
|
| 437 |
+
Without understanding the basic concepts in it, you do not have the background
|
| 438 |
+
to understand
|
| 439 |
+
|
| 440 |
+
the rest of the notes. Important highlights from the chapter are
|
| 441 |
+
|
| 442 |
+
|
| 443 |
+
- Introduction to prompting. This is your ticket to using large language models
|
| 444 |
+
effectively
|
| 445 |
+
|
| 446 |
+
- How to use computer algebra (Sage). Sage can be very helpful in understanding
|
| 447 |
+
the mathematics
|
| 448 |
+
|
| 449 |
+
- Introduction of the numbers we use. Here the natural numbers, integers, rationals
|
| 450 |
+
and real numbers are defined. Also the arithmetic rules for using them are given
|
| 451 |
+
|
| 452 |
+
- Logic is the framework for reasoning in mathematics. Study this! First comes
|
| 453 |
+
propositional logic. This is basic logic involving true and false statements with
|
| 454 |
+
and, or etc as seen in truth tables. Then comes predicate logic, where variables
|
| 455 |
+
are used. Here you must learn the meaning of "for every" and "there exists"
|
| 456 |
+
|
| 457 |
+
- Proofs are described. Proof by contradiction is a must here! Do not skip it
|
| 458 |
+
|
| 459 |
+
- The language of sets. Learn the operations on sets. Especially focus on the
|
| 460 |
+
set builder notation and products of sets
|
| 461 |
+
|
| 462 |
+
- Ordering of numbers. This is the formal definition of comparing numbers
|
| 463 |
+
|
| 464 |
+
- Proof by induction. How to prove infinitely many propositions involving the
|
| 465 |
+
natural numbers with one hack
|
| 466 |
+
|
| 467 |
+
- The concept of a function. This is extremely important. Notice that a function
|
| 468 |
+
is defined not by a rule. Also, in its definition enters crucially where it is
|
| 469 |
+
defined
|
| 470 |
+
|
| 471 |
+
- Functions from and into products
|
| 472 |
+
|
| 473 |
+
- The preimage. This will become very important working with continuous functions'
|
| 474 |
+
- "Exercise 1.84:\n\nVerify the computation (induction step) in (1.18) i.e., explain\n\
|
| 475 |
+
the operations used to go from the left to the right of the two equalities.\n\n\
|
| 476 |
+
/Exercise\n\n\nExercise 1.85:\n\nLocate the mistake in the following fake induction\
|
| 477 |
+
\ proof of the curious fact that\n$2^n = 2$ \nfor every $n\\in \\mathbb{N}$.\n\
|
| 478 |
+
\nLet $p(n)$ be\nthe proposition $2^n = 2$. Then $p(1)$ is true.\n\nWe wish to\
|
| 479 |
+
\ prove that $p(n) \\implies p(n+1)$ assuming that $p(1), \\dots, p(n)$ are true:\n\
|
| 480 |
+
$$\\begin{aligned}\n 2^{n+1} &= 2^n \\cdot 2\\\\\n &= 2^n \\cdot \\\
|
| 481 |
+
frac{2^n}{2^{n-1}}\\\\\n &= 2 \\cdot \\frac{2}{2}\\,\\,\\text{(by }p(n)\\\
|
| 482 |
+
text{ and }p(n-1)\\text{)}\\\\\n &= 2.\n\\end{aligned}$$\nThis shows\
|
| 483 |
+
\ that $p(n) \\implies p(n+1)$ and therefore that $2^n = 2$ for every\n$n\\in\
|
| 484 |
+
\ \\mathbb{N}\\setminus\\{0\\}$.\n\n/Exercise\n\n\nExercise 1.86:\n\nProve by\
|
| 485 |
+
\ induction that the sum of the first $n$ odd numbers is\ngiven by the formula\n\
|
| 486 |
+
$$\n1 + 3 + \\cdots + (2 n - 1) = n^2,\n$$\ni.e., for $n=5$ we have\n$$\n1 + 3\
|
| 487 |
+
\ + 5 + 7 + 9 = 25.\n$$\n\n/Exercise\n\n\nExercise 1.87:\n\nProve by induction\
|
| 488 |
+
\ that\n$$\n1^2 + 2^2 + 3^2 + \\cdots + n^2 = \\frac{n(n+1)(2n + 1)}{6}.\n$$\n\
|
| 489 |
+
\n/Exercise\n\n\nExercise 1.88:\n\nProve using the idea of induction that\n$$\n\
|
| 490 |
+
2^n < n!\n$$\nfor $n\\geq 4$.\n\n\n/Exercise\n\nThe last exercise related to induction\
|
| 491 |
+
\ concerns the famous pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle.\
|
| 492 |
+
\ The statement itself looks innocent, well almost ridiculous, but it is very\
|
| 493 |
+
\ powerful: https://mindyourdecisions.com/blog/2008/11/25/16-fun-applications-of-the-pigeonhole-principle/.\
|
| 494 |
+
\ Even the go-to website \nmathoverflow: https://mathoverflow.net/ for research\
|
| 495 |
+
\ mathematicians has \na quite nice thread: https://mathoverflow.net/questions/4279/interesting-applications-of-the-pigeonhole-principle\
|
| 496 |
+
\ \nabout this.\n\n\nExercise 1.89:\n\nProve the following by induction on $m$:\
|
| 497 |
+
\ if $n$ items are put into $m$ containers and \n$n > m$, then at least one container\
|
| 498 |
+
\ must contain more than one item.\n\n/Exercise"
|
| 499 |
+
- "Section 1.6: More on sets\n\n\n\nPropositions are important, but are confined\
|
| 500 |
+
\ by the binary values\nof true and false. We would like to work mathematically\
|
| 501 |
+
\ with \nobjects like integers, floating point numbers, neural networks,\ncomputer\
|
| 502 |
+
\ programs and so on.\n\nSubsection 1.6.1: Objects and equality\n\n\n\nOne of\
|
| 503 |
+
\ the cornerstones of modern mathematics is\ndeciding when two objects are the\
|
| 504 |
+
\ same i.e.,\ngiven two objects $A$ and $B$, deciding whether\nthe proposition\
|
| 505 |
+
\ $A=B$ is true of false. Oftentimes\nan algorithm for evaluating $A=B$ is needed.\n\
|
| 506 |
+
\nYou may laugh here, but this is\nnot always that easy. Even though objects appear\
|
| 507 |
+
\ different they are the same as\nin, for example the propositions\n$$\n\\frac{105}{189}\
|
| 508 |
+
\ = \\frac{35}{63}\\qquad\\text{and}\\qquad \\sin\\left(\\frac{\\pi}{2}\\right)\
|
| 509 |
+
\ = 1.\n$$\nThe first proposition above is an identity of fractions (rational\
|
| 510 |
+
\ numbers). The second is\nan identity, which calls for knowledge of the sine\
|
| 511 |
+
\ function and real numbers. Each of these\nidentities calls for some rather advanced\
|
| 512 |
+
\ mathematics. The first proposition is true in\na very precise way, since $105\\\
|
| 513 |
+
cdot 63 = 189 \\cdot 35$.\n\n\nExercise 1.40:\n\n\n\n\nUse the Sage window above\
|
| 514 |
+
\ to reason \nabout equality in the quiz below. In each case describe the objects\
|
| 515 |
+
\ i.e.,\nare they numbers, symbols, etc.? Also, please check your computations\n\
|
| 516 |
+
by hand with the old fashioned paper and pencil, especially $(a+b)(a-b)$.\n\n\\\
|
| 517 |
+
begin{quiz}\n\\question\nClick on the right equalities below.\n\\answer{T}\n$$a\
|
| 518 |
+
\ + b - 2 b = a - b$$\n\\answer{F}\n$$(a+b)^2 = a^2 + b^2$$\n\\answer{T}\n$$(a\
|
| 519 |
+
\ + b)(a - b) = a^2 - b^2$$\n\\answer{T}\n$$(a + b)^2 = a^2 + 2 a b + b^2$$\n\
|
| 520 |
+
\\answer{F}\n$$(a+b)^3 = a^3 + 2 a^2 b + 2 a b^2 + b^3$$\n\\answer{F}\n$$\\frac{3}{8}\
|
| 521 |
+
\ = \\frac{5}{13}$$ \n\\answer{F}\n$$\n\\pi = \\frac{22}{7}\n$$\n\\answer{T}\n\
|
| 522 |
+
$$\n\\cos^2(\\pi) + \\sin^2(\\pi) = 1\n$$\n\\end{quiz}\n\n/Exercise\n\n\nExercise\
|
| 523 |
+
\ 1.41:\n\nYou know that $(a+ b)^2 = a^2 + 2 a b + b^2$. Use Sage to find a similar\
|
| 524 |
+
\ identities\nfor $(a + b)^3$ and $(a + b)^4$.\n\n\\begin{hint}\n Go back and\
|
| 525 |
+
\ look at (the beginning of) Exercise (1.40).\n\\end{hint}\n\n/Exercise\n\nFor\
|
| 526 |
+
\ two objects $A$ and $B$ we will use the notation $A \\neq B$ for the proposition\
|
| 527 |
+
\ $\\neg (A = B)$.\n\nWe have already defined a set (informally) as a collection\
|
| 528 |
+
\ of distinct objects or *elements*.\nWe introduce some more set theory here.\n\
|
| 529 |
+
A set\nis also an object as described in section (1.6.1) and it makes sense to\n\
|
| 530 |
+
ask when two sets are equal.\n\n\nDefinition 1.42:\n\nTwo sets $A$ and $B$ are\
|
| 531 |
+
\ equal i.e., $A = B$ if they contain the same elements.\n\n/Definition\n\nAn\
|
| 532 |
+
\ example of a set could be \nthe set $\\{1,2,3\\}$ of natural numbers between\
|
| 533 |
+
\ $0$ and $4$. Notice again that we use the symbol\n\"$\\{$\" to start the listing\
|
| 534 |
+
\ of elements in a set and the symbol \"$\\}$\" to denote the end of the listing.\n\
|
| 535 |
+
Notice also that (by our definition of equality between sets), the order of the\
|
| 536 |
+
\ elements in the listing does not matter i.e.,\n$$\n\\{1, 2, 3\\} = \\{2, 3,\
|
| 537 |
+
\ 1\\}.\n$$\nWe are also not allowing duplicates like for\nexample in the listing\
|
| 538 |
+
\ $\\{1, 2, 2, 3, 3, 3\\}$ (such a thing is called a multiset: https://en.m.wikipedia.org/wiki/Multiset).\n\
|
| 539 |
+
\nAn example of a set not involving numbers could be the set of letters \n$$\n\
|
| 540 |
+
S=\\{A, n, e, x, a, m, p, l, c, o, u, d, b, t, h, s, r, i\\}\n$$ \nused in this\
|
| 541 |
+
\ sentence. The number of elements in a set $S$ is called the *cardinality* of\
|
| 542 |
+
\ the set.\nWe will denote it by $|S|$.\n\nTo convince someone beyond a doubt\
|
| 543 |
+
\ (we will talk about this formally later in this chapter) that two sets $A$ and\
|
| 544 |
+
\ $B$ are equal, one needs to argue that if $x$ is an element of $A$, then $x$\
|
| 545 |
+
\ is an element of $B$ and the other way round, if $y$ is an element of $B$, then\
|
| 546 |
+
\ $y$ is an element of $A$. If this is true, then\n$A$ and $B$ must contain the\
|
| 547 |
+
\ same elements.\n\n\nExercise 1.43:\n\nGive a precise reason as to why the two\
|
| 548 |
+
\ sets $\\{1, 2, 3\\}$ and $\\{1, 2, 4\\}$ are not equal.\nIs it possible for\
|
| 549 |
+
\ a set with $5$ elements to be equal to a set with $7$ elements?\n\n/Exercise\
|
| 550 |
+
\ \n\nSets may be explored using (only) python. This is illustrated in the snippet\
|
| 551 |
+
\ below. \n\n<a href=\"#a314f450-54ad-4acd-bbf0-475e00ac5949\" class =\"btn btn-default\
|
| 552 |
+
\ Sagebutton\" data-toggle=\"collapse\"></a><div id=a314f450-54ad-4acd-bbf0-475e00ac5949\
|
| 553 |
+
\ class = \"collapse Sage envbuttons\"><div class=sagepython><script type=\"text/x-sage\"\
|
| 554 |
+
>\nX = {1, 2, 3}\nY = {2, 3, 1}\nprint(\"X=Y is \", X==Y)\n\nS = {'A','n','e','x','a','m','p','l','c','o','u','d','b','t','h','s','r','i'}\n\
|
| 555 |
+
print(\"S = \", S) \nprint(\"The number of elements in S is |S|=\", len(S))\n\
|
| 556 |
+
</script></div></div>\n\n\n\nExercise 1.44:\n\nCome up with three lines of Sage\
|
| 557 |
+
\ code that verifies $\\{1, 2, 3\\} \\neq \\{1, 2, 4\\}$. Try it out.\n\n/Exercise"
|
| 558 |
+
pipeline_tag: sentence-similarity
|
| 559 |
+
library_name: sentence-transformers
|
| 560 |
+
metrics:
|
| 561 |
+
- cosine_accuracy@1
|
| 562 |
+
- cosine_accuracy@3
|
| 563 |
+
- cosine_accuracy@5
|
| 564 |
+
- cosine_accuracy@10
|
| 565 |
+
- cosine_precision@1
|
| 566 |
+
- cosine_precision@3
|
| 567 |
+
- cosine_precision@5
|
| 568 |
+
- cosine_precision@10
|
| 569 |
+
- cosine_recall@1
|
| 570 |
+
- cosine_recall@3
|
| 571 |
+
- cosine_recall@5
|
| 572 |
+
- cosine_recall@10
|
| 573 |
+
- cosine_ndcg@3
|
| 574 |
+
- cosine_ndcg@5
|
| 575 |
+
- cosine_ndcg@10
|
| 576 |
+
- cosine_mrr@3
|
| 577 |
+
- cosine_mrr@5
|
| 578 |
+
- cosine_mrr@10
|
| 579 |
+
- cosine_map@100
|
| 580 |
+
model-index:
|
| 581 |
+
- name: SentenceTransformer based on intfloat/e5-small-v2
|
| 582 |
+
results:
|
| 583 |
+
- task:
|
| 584 |
+
type: information-retrieval
|
| 585 |
+
name: Information Retrieval
|
| 586 |
+
dataset:
|
| 587 |
+
name: Unknown
|
| 588 |
+
type: unknown
|
| 589 |
+
metrics:
|
| 590 |
+
- type: cosine_accuracy@1
|
| 591 |
+
value: 0.6908315565031983
|
| 592 |
+
name: Cosine Accuracy@1
|
| 593 |
+
- type: cosine_accuracy@3
|
| 594 |
+
value: 0.8347547974413646
|
| 595 |
+
name: Cosine Accuracy@3
|
| 596 |
+
- type: cosine_accuracy@5
|
| 597 |
+
value: 0.8880597014925373
|
| 598 |
+
name: Cosine Accuracy@5
|
| 599 |
+
- type: cosine_accuracy@10
|
| 600 |
+
value: 0.9253731343283582
|
| 601 |
+
name: Cosine Accuracy@10
|
| 602 |
+
- type: cosine_precision@1
|
| 603 |
+
value: 0.6908315565031983
|
| 604 |
+
name: Cosine Precision@1
|
| 605 |
+
- type: cosine_precision@3
|
| 606 |
+
value: 0.27825159914712155
|
| 607 |
+
name: Cosine Precision@3
|
| 608 |
+
- type: cosine_precision@5
|
| 609 |
+
value: 0.17761194029850746
|
| 610 |
+
name: Cosine Precision@5
|
| 611 |
+
- type: cosine_precision@10
|
| 612 |
+
value: 0.09253731343283582
|
| 613 |
+
name: Cosine Precision@10
|
| 614 |
+
- type: cosine_recall@1
|
| 615 |
+
value: 0.6908315565031983
|
| 616 |
+
name: Cosine Recall@1
|
| 617 |
+
- type: cosine_recall@3
|
| 618 |
+
value: 0.8347547974413646
|
| 619 |
+
name: Cosine Recall@3
|
| 620 |
+
- type: cosine_recall@5
|
| 621 |
+
value: 0.8880597014925373
|
| 622 |
+
name: Cosine Recall@5
|
| 623 |
+
- type: cosine_recall@10
|
| 624 |
+
value: 0.9253731343283582
|
| 625 |
+
name: Cosine Recall@10
|
| 626 |
+
- type: cosine_ndcg@3
|
| 627 |
+
value: 0.7763328209983278
|
| 628 |
+
name: Cosine Ndcg@3
|
| 629 |
+
- type: cosine_ndcg@5
|
| 630 |
+
value: 0.7980285423533605
|
| 631 |
+
name: Cosine Ndcg@5
|
| 632 |
+
- type: cosine_ndcg@10
|
| 633 |
+
value: 0.8099677414320194
|
| 634 |
+
name: Cosine Ndcg@10
|
| 635 |
+
- type: cosine_mrr@3
|
| 636 |
+
value: 0.7560412224591327
|
| 637 |
+
name: Cosine Mrr@3
|
| 638 |
+
- type: cosine_mrr@5
|
| 639 |
+
value: 0.7679282160625444
|
| 640 |
+
name: Cosine Mrr@5
|
| 641 |
+
- type: cosine_mrr@10
|
| 642 |
+
value: 0.7727815006599654
|
| 643 |
+
name: Cosine Mrr@10
|
| 644 |
+
- type: cosine_map@100
|
| 645 |
+
value: 0.7764057538430271
|
| 646 |
+
name: Cosine Map@100
|
| 647 |
+
- type: cosine_accuracy@1
|
| 648 |
+
value: 0.6663078579117331
|
| 649 |
+
name: Cosine Accuracy@1
|
| 650 |
+
- type: cosine_accuracy@3
|
| 651 |
+
value: 0.8116254036598493
|
| 652 |
+
name: Cosine Accuracy@3
|
| 653 |
+
- type: cosine_accuracy@5
|
| 654 |
+
value: 0.8697524219590959
|
| 655 |
+
name: Cosine Accuracy@5
|
| 656 |
+
- type: cosine_accuracy@10
|
| 657 |
+
value: 0.9117330462863293
|
| 658 |
+
name: Cosine Accuracy@10
|
| 659 |
+
- type: cosine_precision@1
|
| 660 |
+
value: 0.6663078579117331
|
| 661 |
+
name: Cosine Precision@1
|
| 662 |
+
- type: cosine_precision@3
|
| 663 |
+
value: 0.27054180121994975
|
| 664 |
+
name: Cosine Precision@3
|
| 665 |
+
- type: cosine_precision@5
|
| 666 |
+
value: 0.17395048439181915
|
| 667 |
+
name: Cosine Precision@5
|
| 668 |
+
- type: cosine_precision@10
|
| 669 |
+
value: 0.09117330462863295
|
| 670 |
+
name: Cosine Precision@10
|
| 671 |
+
- type: cosine_recall@1
|
| 672 |
+
value: 0.6663078579117331
|
| 673 |
+
name: Cosine Recall@1
|
| 674 |
+
- type: cosine_recall@3
|
| 675 |
+
value: 0.8116254036598493
|
| 676 |
+
name: Cosine Recall@3
|
| 677 |
+
- type: cosine_recall@5
|
| 678 |
+
value: 0.8697524219590959
|
| 679 |
+
name: Cosine Recall@5
|
| 680 |
+
- type: cosine_recall@10
|
| 681 |
+
value: 0.9117330462863293
|
| 682 |
+
name: Cosine Recall@10
|
| 683 |
+
- type: cosine_ndcg@3
|
| 684 |
+
value: 0.7519327635399075
|
| 685 |
+
name: Cosine Ndcg@3
|
| 686 |
+
- type: cosine_ndcg@5
|
| 687 |
+
value: 0.7761647660928707
|
| 688 |
+
name: Cosine Ndcg@5
|
| 689 |
+
- type: cosine_ndcg@10
|
| 690 |
+
value: 0.7896982949533798
|
| 691 |
+
name: Cosine Ndcg@10
|
| 692 |
+
- type: cosine_mrr@3
|
| 693 |
+
value: 0.7312522425547185
|
| 694 |
+
name: Cosine Mrr@3
|
| 695 |
+
- type: cosine_mrr@5
|
| 696 |
+
value: 0.7448690348044498
|
| 697 |
+
name: Cosine Mrr@5
|
| 698 |
+
- type: cosine_mrr@10
|
| 699 |
+
value: 0.7504190373673694
|
| 700 |
+
name: Cosine Mrr@10
|
| 701 |
+
- type: cosine_map@100
|
| 702 |
+
value: 0.7541642286378775
|
| 703 |
+
name: Cosine Map@100
|
| 704 |
+
---
|
| 705 |
+
|
| 706 |
+
# SentenceTransformer based on intfloat/e5-small-v2
|
| 707 |
+
|
| 708 |
+
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
| 709 |
+
|
| 710 |
+
## Model Details
|
| 711 |
+
|
| 712 |
+
### Model Description
|
| 713 |
+
- **Model Type:** Sentence Transformer
|
| 714 |
+
- **Base model:** [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) <!-- at revision ffb93f3bd4047442299a41ebb6fa998a38507c52 -->
|
| 715 |
+
- **Maximum Sequence Length:** 512 tokens
|
| 716 |
+
- **Output Dimensionality:** 384 dimensions
|
| 717 |
+
- **Similarity Function:** Cosine Similarity
|
| 718 |
+
<!-- - **Training Dataset:** Unknown -->
|
| 719 |
+
<!-- - **Language:** Unknown -->
|
| 720 |
+
<!-- - **License:** Unknown -->
|
| 721 |
+
|
| 722 |
+
### Model Sources
|
| 723 |
+
|
| 724 |
+
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
|
| 725 |
+
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
|
| 726 |
+
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
|
| 727 |
+
|
| 728 |
+
### Full Model Architecture
|
| 729 |
+
|
| 730 |
+
```
|
| 731 |
+
SentenceTransformer(
|
| 732 |
+
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
|
| 733 |
+
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
|
| 734 |
+
(2): Normalize()
|
| 735 |
+
)
|
| 736 |
+
```
|
| 737 |
+
|
| 738 |
+
## Usage
|
| 739 |
+
|
| 740 |
+
### Direct Usage (Sentence Transformers)
|
| 741 |
+
|
| 742 |
+
First install the Sentence Transformers library:
|
| 743 |
+
|
| 744 |
+
```bash
|
| 745 |
+
pip install -U sentence-transformers
|
| 746 |
+
```
|
| 747 |
+
|
| 748 |
+
Then you can load this model and run inference.
|
| 749 |
+
```python
|
| 750 |
+
from sentence_transformers import SentenceTransformer
|
| 751 |
+
|
| 752 |
+
# Download from the 🤗 Hub
|
| 753 |
+
model = SentenceTransformer("Krelle/e5-small-v2-imo-pairs")
|
| 754 |
+
# Run inference
|
| 755 |
+
sentences = [
|
| 756 |
+
'In Exercise\u202f1.41 we are asked to find identities for \\((a+b)^3\\) and \\((a+b)^4\\). What are the correct expanded forms, and how do they relate to the binomial theorem?',
|
| 757 |
+
'Section 1.6: More on sets\n\n\n\nPropositions are important, but are confined by the binary values\nof true and false. We would like to work mathematically with \nobjects like integers, floating point numbers, neural networks,\ncomputer programs and so on.\n\nSubsection 1.6.1: Objects and equality\n\n\n\nOne of the cornerstones of modern mathematics is\ndeciding when two objects are the same i.e.,\ngiven two objects $A$ and $B$, deciding whether\nthe proposition $A=B$ is true of false. Oftentimes\nan algorithm for evaluating $A=B$ is needed.\n\nYou may laugh here, but this is\nnot always that easy. Even though objects appear different they are the same as\nin, for example the propositions\n$$\n\\frac{105}{189} = \\frac{35}{63}\\qquad\\text{and}\\qquad \\sin\\left(\\frac{\\pi}{2}\\right) = 1.\n$$\nThe first proposition above is an identity of fractions (rational numbers). The second is\nan identity, which calls for knowledge of the sine function and real numbers. Each of these\nidentities calls for some rather advanced mathematics. The first proposition is true in\na very precise way, since $105\\cdot 63 = 189 \\cdot 35$.\n\n\nExercise 1.40:\n\n\n\n\nUse the Sage window above to reason \nabout equality in the quiz below. In each case describe the objects i.e.,\nare they numbers, symbols, etc.? Also, please check your computations\nby hand with the old fashioned paper and pencil, especially $(a+b)(a-b)$.\n\n\\begin{quiz}\n\\question\nClick on the right equalities below.\n\\answer{T}\n$$a + b - 2 b = a - b$$\n\\answer{F}\n$$(a+b)^2 = a^2 + b^2$$\n\\answer{T}\n$$(a + b)(a - b) = a^2 - b^2$$\n\\answer{T}\n$$(a + b)^2 = a^2 + 2 a b + b^2$$\n\\answer{F}\n$$(a+b)^3 = a^3 + 2 a^2 b + 2 a b^2 + b^3$$\n\\answer{F}\n$$\\frac{3}{8} = \\frac{5}{13}$$ \n\\answer{F}\n$$\n\\pi = \\frac{22}{7}\n$$\n\\answer{T}\n$$\n\\cos^2(\\pi) + \\sin^2(\\pi) = 1\n$$\n\\end{quiz}\n\n/Exercise\n\n\nExercise 1.41:\n\nYou know that $(a+ b)^2 = a^2 + 2 a b + b^2$. 
Use Sage to find a similar identities\nfor $(a + b)^3$ and $(a + b)^4$.\n\n\\begin{hint}\n Go back and look at (the beginning of) Exercise (1.40).\n\\end{hint}\n\n/Exercise\n\nFor two objects $A$ and $B$ we will use the notation $A \\neq B$ for the proposition $\\neg (A = B)$.\n\nWe have already defined a set (informally) as a collection of distinct objects or *elements*.\nWe introduce some more set theory here.\nA set\nis also an object as described in section (1.6.1) and it makes sense to\nask when two sets are equal.\n\n\nDefinition 1.42:\n\nTwo sets $A$ and $B$ are equal i.e., $A = B$ if they contain the same elements.\n\n/Definition\n\nAn example of a set could be \nthe set $\\{1,2,3\\}$ of natural numbers between $0$ and $4$. Notice again that we use the symbol\n"$\\{$" to start the listing of elements in a set and the symbol "$\\}$" to denote the end of the listing.\nNotice also that (by our definition of equality between sets), the order of the elements in the listing does not matter i.e.,\n$$\n\\{1, 2, 3\\} = \\{2, 3, 1\\}.\n$$\nWe are also not allowing duplicates like for\nexample in the listing $\\{1, 2, 2, 3, 3, 3\\}$ (such a thing is called a multiset: https://en.m.wikipedia.org/wiki/Multiset).\n\nAn example of a set not involving numbers could be the set of letters \n$$\nS=\\{A, n, e, x, a, m, p, l, c, o, u, d, b, t, h, s, r, i\\}\n$$ \nused in this sentence. The number of elements in a set $S$ is called the *cardinality* of the set.\nWe will denote it by $|S|$.\n\nTo convince someone beyond a doubt (we will talk about this formally later in this chapter) that two sets $A$ and $B$ are equal, one needs to argue that if $x$ is an element of $A$, then $x$ is an element of $B$ and the other way round, if $y$ is an element of $B$, then $y$ is an element of $A$. 
If this is true, then\n$A$ and $B$ must contain the same elements.\n\n\nExercise 1.43:\n\nGive a precise reason as to why the two sets $\\{1, 2, 3\\}$ and $\\{1, 2, 4\\}$ are not equal.\nIs it possible for a set with $5$ elements to be equal to a set with $7$ elements?\n\n/Exercise \n\nSets may be explored using (only) python. This is illustrated in the snippet below. \n\n<a href="#a314f450-54ad-4acd-bbf0-475e00ac5949" class ="btn btn-default Sagebutton" data-toggle="collapse"></a><div id=a314f450-54ad-4acd-bbf0-475e00ac5949 class = "collapse Sage envbuttons"><div class=sagepython><script type="text/x-sage">\nX = {1, 2, 3}\nY = {2, 3, 1}\nprint("X=Y is ", X==Y)\n\nS = {\'A\',\'n\',\'e\',\'x\',\'a\',\'m\',\'p\',\'l\',\'c\',\'o\',\'u\',\'d\',\'b\',\'t\',\'h\',\'s\',\'r\',\'i\'}\nprint("S = ", S) \nprint("The number of elements in S is |S|=", len(S))\n</script></div></div>\n\n\n\nExercise 1.44:\n\nCome up with three lines of Sage code that verifies $\\{1, 2, 3\\} \\neq \\{1, 2, 4\\}$. Try it out.\n\n/Exercise',
|
| 758 |
+
'Chapter 1 on the language of mathematics is an introduction to the fundamental mathematics used in the notes.\nWithout understanding the basic concepts in it, you do not have the background to understand\nthe rest of the notes. Important highlights from the chapter are\n\n- Introduction to prompting. This is your ticket to using large language models effectively\n- How to use computer algebra (Sage). Sage can be very helpful in understanding the mathematics\n- Introduction of the numbers we use. Here the natural numbers, integers, rationals and real numbers are defined. Also the arithmetic rules for using them are given\n- Logic is the framework for reasoning in mathematics. Study this! First comes propositional logic. This is basic logic involving true and false statements with and, or etc as seen in truth tables. Then comes predicate logic, where variables are used. Here you must learn the meaning of "for every" and "there exists"\n- Proofs are described. Proof by contradiction is a must here! Do not skip it\n- The language of sets. Learn the operations on sets. Especially focus on the set builder notation and products of sets\n- Ordering of numbers. This is the formal definition of comparing numbers\n- Proof by induction. How to prove infinitely many propositions involving the natural numbers with one hack\n- The concept of a function. This is extremely important. Notice that a function is defined not by a rule. Also, in its definition enters crucially where it is defined\n- Functions from and into products\n- The preimage. This will become very important working with continuous functions',
|
| 759 |
+
]
|
| 760 |
+
embeddings = model.encode(sentences)
|
| 761 |
+
print(embeddings.shape)
|
| 762 |
+
# [3, 384]
|
| 763 |
+
|
| 764 |
+
# Get the similarity scores for the embeddings
|
| 765 |
+
similarities = model.similarity(embeddings, embeddings)
|
| 766 |
+
print(similarities)
|
| 767 |
+
# tensor([[1.0000, 0.3078, 0.0796],
|
| 768 |
+
# [0.3078, 1.0000, 0.2794],
|
| 769 |
+
# [0.0796, 0.2794, 1.0000]])
|
| 770 |
+
```
|
| 771 |
+
|
| 772 |
+
<!--
|
| 773 |
+
### Direct Usage (Transformers)
|
| 774 |
+
|
| 775 |
+
<details><summary>Click to see the direct usage in Transformers</summary>
|
| 776 |
+
|
| 777 |
+
</details>
|
| 778 |
+
-->
|
| 779 |
+
|
| 780 |
+
<!--
|
| 781 |
+
### Downstream Usage (Sentence Transformers)
|
| 782 |
+
|
| 783 |
+
You can finetune this model on your own dataset.
|
| 784 |
+
|
| 785 |
+
<details><summary>Click to expand</summary>
|
| 786 |
+
|
| 787 |
+
</details>
|
| 788 |
+
-->
|
| 789 |
+
|
| 790 |
+
<!--
|
| 791 |
+
### Out-of-Scope Use
|
| 792 |
+
|
| 793 |
+
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
|
| 794 |
+
-->
|
| 795 |
+
|
| 796 |
+
## Evaluation
|
| 797 |
+
|
| 798 |
+
### Metrics
|
| 799 |
+
|
| 800 |
+
#### Information Retrieval
|
| 801 |
+
|
| 802 |
+
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
|
| 803 |
+
```json
|
| 804 |
+
{
|
| 805 |
+
"query_prompt": "query:",
|
| 806 |
+
"corpus_prompt": "passage:"
|
| 807 |
+
}
|
| 808 |
+
```
|
| 809 |
+
|
| 810 |
+
| Metric | Value |
|
| 811 |
+
|:--------------------|:---------|
|
| 812 |
+
| cosine_accuracy@1 | 0.6908 |
|
| 813 |
+
| cosine_accuracy@3 | 0.8348 |
|
| 814 |
+
| cosine_accuracy@5 | 0.8881 |
|
| 815 |
+
| cosine_accuracy@10 | 0.9254 |
|
| 816 |
+
| cosine_precision@1 | 0.6908 |
|
| 817 |
+
| cosine_precision@3 | 0.2783 |
|
| 818 |
+
| cosine_precision@5 | 0.1776 |
|
| 819 |
+
| cosine_precision@10 | 0.0925 |
|
| 820 |
+
| cosine_recall@1 | 0.6908 |
|
| 821 |
+
| cosine_recall@3 | 0.8348 |
|
| 822 |
+
| cosine_recall@5 | 0.8881 |
|
| 823 |
+
| cosine_recall@10 | 0.9254 |
|
| 824 |
+
| cosine_ndcg@3 | 0.7763 |
|
| 825 |
+
| cosine_ndcg@5 | 0.798 |
|
| 826 |
+
| **cosine_ndcg@10** | **0.81** |
|
| 827 |
+
| cosine_mrr@3 | 0.756 |
|
| 828 |
+
| cosine_mrr@5 | 0.7679 |
|
| 829 |
+
| cosine_mrr@10 | 0.7728 |
|
| 830 |
+
| cosine_map@100 | 0.7764 |
|
| 831 |
+
|
| 832 |
+
#### Information Retrieval
|
| 833 |
+
|
| 834 |
+
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
|
| 835 |
+
```json
|
| 836 |
+
{
|
| 837 |
+
"query_prompt": "query:",
|
| 838 |
+
"corpus_prompt": "passage:"
|
| 839 |
+
}
|
| 840 |
+
```
|
| 841 |
+
|
| 842 |
+
| Metric | Value |
|
| 843 |
+
|:--------------------|:-----------|
|
| 844 |
+
| cosine_accuracy@1 | 0.6663 |
|
| 845 |
+
| cosine_accuracy@3 | 0.8116 |
|
| 846 |
+
| cosine_accuracy@5 | 0.8698 |
|
| 847 |
+
| cosine_accuracy@10 | 0.9117 |
|
| 848 |
+
| cosine_precision@1 | 0.6663 |
|
| 849 |
+
| cosine_precision@3 | 0.2705 |
|
| 850 |
+
| cosine_precision@5 | 0.174 |
|
| 851 |
+
| cosine_precision@10 | 0.0912 |
|
| 852 |
+
| cosine_recall@1 | 0.6663 |
|
| 853 |
+
| cosine_recall@3 | 0.8116 |
|
| 854 |
+
| cosine_recall@5 | 0.8698 |
|
| 855 |
+
| cosine_recall@10 | 0.9117 |
|
| 856 |
+
| cosine_ndcg@3 | 0.7519 |
|
| 857 |
+
| cosine_ndcg@5 | 0.7762 |
|
| 858 |
+
| **cosine_ndcg@10** | **0.7897** |
|
| 859 |
+
| cosine_mrr@3 | 0.7313 |
|
| 860 |
+
| cosine_mrr@5 | 0.7449 |
|
| 861 |
+
| cosine_mrr@10 | 0.7504 |
|
| 862 |
+
| cosine_map@100 | 0.7542 |
|
| 863 |
+
|
| 864 |
+
<!--
|
| 865 |
+
## Bias, Risks and Limitations
|
| 866 |
+
|
| 867 |
+
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
|
| 868 |
+
-->
|
| 869 |
+
|
| 870 |
+
<!--
|
| 871 |
+
### Recommendations
|
| 872 |
+
|
| 873 |
+
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
|
| 874 |
+
-->
|
| 875 |
+
|
| 876 |
+
## Training Details
|
| 877 |
+
|
| 878 |
+
### Training Dataset
|
| 879 |
+
|
| 880 |
+
#### Unnamed Dataset
|
| 881 |
+
|
| 882 |
+
* Size: 2,778 training samples
|
| 883 |
+
* Columns: <code>anchor</code> and <code>positive</code>
|
| 884 |
+
* Approximate statistics based on the first 1000 samples:
|
| 885 |
+
| | anchor | positive |
|
| 886 |
+
|:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
|
| 887 |
+
| type | string | string |
|
| 888 |
+
| details | <ul><li>min: 14 tokens</li><li>mean: 41.25 tokens</li><li>max: 125 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 351.42 tokens</li><li>max: 512 tokens</li></ul> |
|
| 889 |
+
* Samples:
|
| 890 |
+
| anchor | positive |
|
| 891 |
+
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
| 892 |
+
| <code>In Definition 8.2, why is the Hessian matrix defined with second partial derivatives evaluated at the point \(v\)?</code> | <code>Definition 8.2:<br><br><br>The *Hessian matrix* of $F$ at the point<br>$v\in \mathbb{R}^n$ is defined by<br><br>$$<br> \nabla^2 F(v) :=<br> \begin{pmatrix}<br> \dfrac{ \partial^2 F}{ \partial x_1 \partial x_1}(v) &<br> \cdots & \dfrac{ \partial^2 F}{ \partial x_1 \partial<br> x_n}(v)<br> \\<br> \vdots & \ddots & \vdots<br> \\<br> \dfrac{ \partial^2 F}{ \partial x_n \partial x_1}(v) &<br> \cdots & \dfrac{\partial^2 F}{ \partial x_n\partial<br> x_n}(v)<br> \end{pmatrix}<br> .<br><br>$$<br><br>/Definition<br><br>A very important observation is that $\nabla^2 F(v)$ above is a<br>symmetric matrix if $F$ satisfies the condition in the last part of Theorem 7.13.</code> |
|
| 893 |
+
| <code>The definition shows the entry \(\frac{\partial^2 F}{\partial x_i \partial x_j}(v)\). Does the order of differentiation matter for the Hessian?</code> | <code>Definition 8.2:<br><br><br>The *Hessian matrix* of $F$ at the point<br>$v\in \mathbb{R}^n$ is defined by<br><br>$$<br> \nabla^2 F(v) :=<br> \begin{pmatrix}<br> \dfrac{ \partial^2 F}{ \partial x_1 \partial x_1}(v) &<br> \cdots & \dfrac{ \partial^2 F}{ \partial x_1 \partial<br> x_n}(v)<br> \\<br> \vdots & \ddots & \vdots<br> \\<br> \dfrac{ \partial^2 F}{ \partial x_n \partial x_1}(v) &<br> \cdots & \dfrac{\partial^2 F}{ \partial x_n\partial<br> x_n}(v)<br> \end{pmatrix}<br> .<br><br>$$<br><br>/Definition<br><br>A very important observation is that $\nabla^2 F(v)$ above is a<br>symmetric matrix if $F$ satisfies the condition in the last part of Theorem 7.13.</code> |
|
| 894 |
+
| <code>The text says the Hessian is symmetric if \(F\) satisfies the condition in the last part of Theorem 7.13. What is that condition exactly?</code> | <code>Definition 8.2:<br><br><br>The *Hessian matrix* of $F$ at the point<br>$v\in \mathbb{R}^n$ is defined by<br><br>$$<br> \nabla^2 F(v) :=<br> \begin{pmatrix}<br> \dfrac{ \partial^2 F}{ \partial x_1 \partial x_1}(v) &<br> \cdots & \dfrac{ \partial^2 F}{ \partial x_1 \partial<br> x_n}(v)<br> \\<br> \vdots & \ddots & \vdots<br> \\<br> \dfrac{ \partial^2 F}{ \partial x_n \partial x_1}(v) &<br> \cdots & \dfrac{\partial^2 F}{ \partial x_n\partial<br> x_n}(v)<br> \end{pmatrix}<br> .<br><br>$$<br><br>/Definition<br><br>A very important observation is that $\nabla^2 F(v)$ above is a<br>symmetric matrix if $F$ satisfies the condition in the last part of Theorem 7.13.</code> |
|
| 895 |
+
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 896 |
+
```json
|
| 897 |
+
{
|
| 898 |
+
"scale": 20.0,
|
| 899 |
+
"similarity_fct": "cos_sim",
|
| 900 |
+
"gather_across_devices": false
|
| 901 |
+
}
|
| 902 |
+
```
|
| 903 |
+
|
| 904 |
+
### Evaluation Dataset
|
| 905 |
+
|
| 906 |
+
#### Unnamed Dataset
|
| 907 |
+
|
| 908 |
+
* Size: 929 evaluation samples
|
| 909 |
+
* Columns: <code>anchor</code> and <code>positive</code>
|
| 910 |
+
* Approximate statistics based on the first 929 samples:
|
| 911 |
+
| | anchor | positive |
|
| 912 |
+
|:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
|
| 913 |
+
| type | string | string |
|
| 914 |
+
| details | <ul><li>min: 6 tokens</li><li>mean: 29.95 tokens</li><li>max: 96 tokens</li></ul> | <ul><li>min: 36 tokens</li><li>mean: 383.98 tokens</li><li>max: 512 tokens</li></ul> |
|
| 915 |
+
* Samples:
|
| 916 |
+
| anchor | positive |
|
| 917 |
+
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
| 918 |
+
| <code>In Section 1.1, why does the author warn that prompting without any knowledge of the mathematics can be disastrous?</code> | <code>Chapter 1: The language of mathematics and prompting<br><br><br><br>Section 1.1: The art of prompting<br><br><br><br>As of August 2024, there is a multitude of chatbots available on the internet. Some of them, like<br>ChatGPT: https://chatgpt.com, Claude: https://claude.ai and Gemini: https://gemini.google.com (and Llama 3.1, Mistral, ... the list goes on)<br><br><br><br>have quite impressive reasoning capabilities. <br>These models are now multimodal i.e., they <br>even accept non-textual input, such as images, sound and video. In principle you can upload a picture of a math exercise and<br>the chatbot will provide a solution. Well, that is, on a good day and for a not too difficult exercise.<br><br>The use of chatbots is encouraged throughout this course. In fact,<br>they are even allowed during the exam. It is my hope that you will<br>learn mathematics on a deeper level by communicating with the machine<br>using carefully designed prompts - see <br>the OpenAI guide: https://platform.openai.com/docs/guides/prompt-engineering on prompt engineering.<br>...</code> |
|
| 919 |
+
| <code>The first prompting block asks for "two examples of good prompts"—how should I include LaTeX code in such a prompt according to the example?</code> | <code>Chapter 1: The language of mathematics and prompting<br><br><br><br>Section 1.1: The art of prompting<br><br><br><br>As of August 2024, there is a multitude of chatbots available on the internet. Some of them, like<br>ChatGPT: https://chatgpt.com, Claude: https://claude.ai and Gemini: https://gemini.google.com (and Llama 3.1, Mistral, ... the list goes on)<br><br><br><br>have quite impressive reasoning capabilities. <br>These models are now multimodal i.e., they <br>even accept non-textual input, such as images, sound and video. In principle you can upload a picture of a math exercise and<br>the chatbot will provide a solution. Well, that is, on a good day and for a not too difficult exercise.<br><br>The use of chatbots is encouraged throughout this course. In fact,<br>they are even allowed during the exam. It is my hope that you will<br>learn mathematics on a deeper level by communicating with the machine<br>using carefully designed prompts - see <br>the OpenAI guide: https://platform.openai.com/docs/guides/prompt-engineering on prompt engineering.<br>...</code> |
|
| 920 |
+
| <code>In the second prompting block, the equation $x^2 - x - 1 = 0$ is given; what level of detail does "Guide me through the steps" expect from the chatbot?</code> | <code>Chapter 1: The language of mathematics and prompting<br><br><br><br>Section 1.1: The art of prompting<br><br><br><br>As of August 2024, there is a multitude of chatbots available on the internet. Some of them, like<br>ChatGPT: https://chatgpt.com, Claude: https://claude.ai and Gemini: https://gemini.google.com (and Llama 3.1, Mistral, ... the list goes on)<br><br><br><br>have quite impressive reasoning capabilities. <br>These models are now multimodal i.e., they <br>even accept non-textual input, such as images, sound and video. In principle you can upload a picture of a math exercise and<br>the chatbot will provide a solution. Well, that is, on a good day and for a not too difficult exercise.<br><br>The use of chatbots is encouraged throughout this course. In fact,<br>they are even allowed during the exam. It is my hope that you will<br>learn mathematics on a deeper level by communicating with the machine<br>using carefully designed prompts - see <br>the OpenAI guide: https://platform.openai.com/docs/guides/prompt-engineering on prompt engineering.<br>...</code> |
|
| 921 |
+
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
| 922 |
+
```json
|
| 923 |
+
{
|
| 924 |
+
"scale": 20.0,
|
| 925 |
+
"similarity_fct": "cos_sim",
|
| 926 |
+
"gather_across_devices": false
|
| 927 |
+
}
|
| 928 |
+
```
|
| 929 |
+
|
| 930 |
+
### Training Hyperparameters
|
| 931 |
+
#### Non-Default Hyperparameters
|
| 932 |
+
|
| 933 |
+
- `eval_strategy`: steps
|
| 934 |
+
- `per_device_train_batch_size`: 32
|
| 935 |
+
- `per_device_eval_batch_size`: 32
|
| 936 |
+
- `learning_rate`: 2e-05
|
| 937 |
+
- `num_train_epochs`: 8
|
| 938 |
+
- `warmup_ratio`: 0.1
|
| 939 |
+
- `fp16`: True
|
| 940 |
+
- `load_best_model_at_end`: True
|
| 941 |
+
- `prompts`: {'anchor': 'query:', 'positive': 'passage:', 'negative': 'passage:'}
|
| 942 |
+
- `batch_sampler`: no_duplicates
|
| 943 |
+
|
| 944 |
+
#### All Hyperparameters
|
| 945 |
+
<details><summary>Click to expand</summary>
|
| 946 |
+
|
| 947 |
+
- `overwrite_output_dir`: False
|
| 948 |
+
- `do_predict`: False
|
| 949 |
+
- `eval_strategy`: steps
|
| 950 |
+
- `prediction_loss_only`: True
|
| 951 |
+
- `per_device_train_batch_size`: 32
|
| 952 |
+
- `per_device_eval_batch_size`: 32
|
| 953 |
+
- `per_gpu_train_batch_size`: None
|
| 954 |
+
- `per_gpu_eval_batch_size`: None
|
| 955 |
+
- `gradient_accumulation_steps`: 1
|
| 956 |
+
- `eval_accumulation_steps`: None
|
| 957 |
+
- `torch_empty_cache_steps`: None
|
| 958 |
+
- `learning_rate`: 2e-05
|
| 959 |
+
- `weight_decay`: 0.0
|
| 960 |
+
- `adam_beta1`: 0.9
|
| 961 |
+
- `adam_beta2`: 0.999
|
| 962 |
+
- `adam_epsilon`: 1e-08
|
| 963 |
+
- `max_grad_norm`: 1.0
|
| 964 |
+
- `num_train_epochs`: 8
|
| 965 |
+
- `max_steps`: -1
|
| 966 |
+
- `lr_scheduler_type`: linear
|
| 967 |
+
- `lr_scheduler_kwargs`: {}
|
| 968 |
+
- `warmup_ratio`: 0.1
|
| 969 |
+
- `warmup_steps`: 0
|
| 970 |
+
- `log_level`: passive
|
| 971 |
+
- `log_level_replica`: warning
|
| 972 |
+
- `log_on_each_node`: True
|
| 973 |
+
- `logging_nan_inf_filter`: True
|
| 974 |
+
- `save_safetensors`: True
|
| 975 |
+
- `save_on_each_node`: False
|
| 976 |
+
- `save_only_model`: False
|
| 977 |
+
- `restore_callback_states_from_checkpoint`: False
|
| 978 |
+
- `no_cuda`: False
|
| 979 |
+
- `use_cpu`: False
|
| 980 |
+
- `use_mps_device`: False
|
| 981 |
+
- `seed`: 42
|
| 982 |
+
- `data_seed`: None
|
| 983 |
+
- `jit_mode_eval`: False
|
| 984 |
+
- `bf16`: False
|
| 985 |
+
- `fp16`: True
|
| 986 |
+
- `fp16_opt_level`: O1
|
| 987 |
+
- `half_precision_backend`: auto
|
| 988 |
+
- `bf16_full_eval`: False
|
| 989 |
+
- `fp16_full_eval`: False
|
| 990 |
+
- `tf32`: None
|
| 991 |
+
- `local_rank`: 0
|
| 992 |
+
- `ddp_backend`: None
|
| 993 |
+
- `tpu_num_cores`: None
|
| 994 |
+
- `tpu_metrics_debug`: False
|
| 995 |
+
- `debug`: []
|
| 996 |
+
- `dataloader_drop_last`: False
|
| 997 |
+
- `dataloader_num_workers`: 0
|
| 998 |
+
- `dataloader_prefetch_factor`: None
|
| 999 |
+
- `past_index`: -1
|
| 1000 |
+
- `disable_tqdm`: False
|
| 1001 |
+
- `remove_unused_columns`: True
|
| 1002 |
+
- `label_names`: None
|
| 1003 |
+
- `load_best_model_at_end`: True
|
| 1004 |
+
- `ignore_data_skip`: False
|
| 1005 |
+
- `fsdp`: []
|
| 1006 |
+
- `fsdp_min_num_params`: 0
|
| 1007 |
+
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
|
| 1008 |
+
- `fsdp_transformer_layer_cls_to_wrap`: None
|
| 1009 |
+
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
|
| 1010 |
+
- `parallelism_config`: None
|
| 1011 |
+
- `deepspeed`: None
|
| 1012 |
+
- `label_smoothing_factor`: 0.0
|
| 1013 |
+
- `optim`: adamw_torch_fused
|
| 1014 |
+
- `optim_args`: None
|
| 1015 |
+
- `adafactor`: False
|
| 1016 |
+
- `group_by_length`: False
|
| 1017 |
+
- `length_column_name`: length
|
| 1018 |
+
- `project`: huggingface
|
| 1019 |
+
- `trackio_space_id`: trackio
|
| 1020 |
+
- `ddp_find_unused_parameters`: None
|
| 1021 |
+
- `ddp_bucket_cap_mb`: None
|
| 1022 |
+
- `ddp_broadcast_buffers`: False
|
| 1023 |
+
- `dataloader_pin_memory`: True
|
| 1024 |
+
- `dataloader_persistent_workers`: False
|
| 1025 |
+
- `skip_memory_metrics`: True
|
| 1026 |
+
- `use_legacy_prediction_loop`: False
|
| 1027 |
+
- `push_to_hub`: False
|
| 1028 |
+
- `resume_from_checkpoint`: None
|
| 1029 |
+
- `hub_model_id`: None
|
| 1030 |
+
- `hub_strategy`: every_save
|
| 1031 |
+
- `hub_private_repo`: None
|
| 1032 |
+
- `hub_always_push`: False
|
| 1033 |
+
- `hub_revision`: None
|
| 1034 |
+
- `gradient_checkpointing`: False
|
| 1035 |
+
- `gradient_checkpointing_kwargs`: None
|
| 1036 |
+
- `include_inputs_for_metrics`: False
|
| 1037 |
+
- `include_for_metrics`: []
|
| 1038 |
+
- `eval_do_concat_batches`: True
|
| 1039 |
+
- `fp16_backend`: auto
|
| 1040 |
+
- `push_to_hub_model_id`: None
|
| 1041 |
+
- `push_to_hub_organization`: None
|
| 1042 |
+
- `mp_parameters`:
|
| 1043 |
+
- `auto_find_batch_size`: False
|
| 1044 |
+
- `full_determinism`: False
|
| 1045 |
+
- `torchdynamo`: None
|
| 1046 |
+
- `ray_scope`: last
|
| 1047 |
+
- `ddp_timeout`: 1800
|
| 1048 |
+
- `torch_compile`: False
|
| 1049 |
+
- `torch_compile_backend`: None
|
| 1050 |
+
- `torch_compile_mode`: None
|
| 1051 |
+
- `include_tokens_per_second`: False
|
| 1052 |
+
- `include_num_input_tokens_seen`: no
|
| 1053 |
+
- `neftune_noise_alpha`: None
|
| 1054 |
+
- `optim_target_modules`: None
|
| 1055 |
+
- `batch_eval_metrics`: False
|
| 1056 |
+
- `eval_on_start`: False
|
| 1057 |
+
- `use_liger_kernel`: False
|
| 1058 |
+
- `liger_kernel_config`: None
|
| 1059 |
+
- `eval_use_gather_object`: False
|
| 1060 |
+
- `average_tokens_across_devices`: True
|
| 1061 |
+
- `prompts`: {'anchor': 'query:', 'positive': 'passage:', 'negative': 'passage:'}
|
| 1062 |
+
- `batch_sampler`: no_duplicates
|
| 1063 |
+
- `multi_dataset_batch_sampler`: proportional
|
| 1064 |
+
- `router_mapping`: {}
|
| 1065 |
+
- `learning_rate_mapping`: {}
|
| 1066 |
+
|
| 1067 |
+
</details>
|
| 1068 |
+
|
| 1069 |
+
### Training Logs
|
| 1070 |
+
| Epoch | Step | Training Loss | Validation Loss | cosine_ndcg@10 |
|
| 1071 |
+
|:----------:|:-------:|:-------------:|:---------------:|:--------------:|
|
| 1072 |
+
| -1 | -1 | - | - | 0.4709 |
|
| 1073 |
+
| 1.1494 | 100 | 1.2817 | 0.7786 | 0.7818 |
|
| 1074 |
+
| 2.2989 | 200 | 0.3207 | 0.7569 | 0.7762 |
|
| 1075 |
+
| 3.4483 | 300 | 0.2454 | 0.7324 | 0.7823 |
|
| 1076 |
+
| **4.5977** | **400** | **0.1875** | **0.7012** | **0.7948** |
|
| 1077 |
+
| 5.7471 | 500 | 0.1479 | 0.7016 | 0.7897 |
|
| 1078 |
+
| 6.8966 | 600 | 0.1325 | 0.6992 | 0.7897 |
|
| 1079 |
+
| -1 | -1 | - | - | 0.8100 |
|
| 1080 |
+
|
| 1081 |
+
* The bold row denotes the saved checkpoint.
|
| 1082 |
+
|
| 1083 |
+
### Framework Versions
|
| 1084 |
+
- Python: 3.12.12
|
| 1085 |
+
- Sentence Transformers: 5.1.2
|
| 1086 |
+
- Transformers: 4.57.1
|
| 1087 |
+
- PyTorch: 2.8.0+cu126
|
| 1088 |
+
- Accelerate: 1.11.0
|
| 1089 |
+
- Datasets: 4.0.0
|
| 1090 |
+
- Tokenizers: 0.22.1
|
| 1091 |
+
|
| 1092 |
+
## Citation
|
| 1093 |
+
|
| 1094 |
+
### BibTeX
|
| 1095 |
+
|
| 1096 |
+
#### Sentence Transformers
|
| 1097 |
+
```bibtex
|
| 1098 |
+
@inproceedings{reimers-2019-sentence-bert,
|
| 1099 |
+
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
|
| 1100 |
+
author = "Reimers, Nils and Gurevych, Iryna",
|
| 1101 |
+
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
|
| 1102 |
+
month = "11",
|
| 1103 |
+
year = "2019",
|
| 1104 |
+
publisher = "Association for Computational Linguistics",
|
| 1105 |
+
url = "https://arxiv.org/abs/1908.10084",
|
| 1106 |
+
}
|
| 1107 |
+
```
|
| 1108 |
+
|
| 1109 |
+
#### MultipleNegativesRankingLoss
|
| 1110 |
+
```bibtex
|
| 1111 |
+
@misc{henderson2017efficient,
|
| 1112 |
+
title={Efficient Natural Language Response Suggestion for Smart Reply},
|
| 1113 |
+
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
|
| 1114 |
+
year={2017},
|
| 1115 |
+
eprint={1705.00652},
|
| 1116 |
+
archivePrefix={arXiv},
|
| 1117 |
+
primaryClass={cs.CL}
|
| 1118 |
+
}
|
| 1119 |
+
```
|
| 1120 |
+
|
| 1121 |
+
<!--
|
| 1122 |
+
## Glossary
|
| 1123 |
+
|
| 1124 |
+
*Clearly define terms in order to be accessible across audiences.*
|
| 1125 |
+
-->
|
| 1126 |
+
|
| 1127 |
+
<!--
|
| 1128 |
+
## Model Card Authors
|
| 1129 |
+
|
| 1130 |
+
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
|
| 1131 |
+
-->
|
| 1132 |
+
|
| 1133 |
+
<!--
|
| 1134 |
+
## Model Card Contact
|
| 1135 |
+
|
| 1136 |
+
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
|
| 1137 |
+
-->
|
config.json
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"architectures": [
|
| 3 |
+
"BertModel"
|
| 4 |
+
],
|
| 5 |
+
"attention_probs_dropout_prob": 0.1,
|
| 6 |
+
"classifier_dropout": null,
|
| 7 |
+
"dtype": "float32",
|
| 8 |
+
"hidden_act": "gelu",
|
| 9 |
+
"hidden_dropout_prob": 0.1,
|
| 10 |
+
"hidden_size": 384,
|
| 11 |
+
"initializer_range": 0.02,
|
| 12 |
+
"intermediate_size": 1536,
|
| 13 |
+
"layer_norm_eps": 1e-12,
|
| 14 |
+
"max_position_embeddings": 512,
|
| 15 |
+
"model_type": "bert",
|
| 16 |
+
"num_attention_heads": 12,
|
| 17 |
+
"num_hidden_layers": 12,
|
| 18 |
+
"pad_token_id": 0,
|
| 19 |
+
"position_embedding_type": "absolute",
|
| 20 |
+
"transformers_version": "4.57.1",
|
| 21 |
+
"type_vocab_size": 2,
|
| 22 |
+
"use_cache": true,
|
| 23 |
+
"vocab_size": 30522
|
| 24 |
+
}
|
config_sentence_transformers.json
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"model_type": "SentenceTransformer",
|
| 3 |
+
"__version__": {
|
| 4 |
+
"sentence_transformers": "5.1.2",
|
| 5 |
+
"transformers": "4.57.1",
|
| 6 |
+
"pytorch": "2.8.0+cu126"
|
| 7 |
+
},
|
| 8 |
+
"prompts": {
|
| 9 |
+
"query": "",
|
| 10 |
+
"document": ""
|
| 11 |
+
},
|
| 12 |
+
"default_prompt_name": null,
|
| 13 |
+
"similarity_fn_name": "cosine"
|
| 14 |
+
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:33349852bb867909c7611815a0fca713f0cb10b20516eb1164474ae519b30fd3
|
| 3 |
+
size 133462128
|
modules.json
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"idx": 0,
|
| 4 |
+
"name": "0",
|
| 5 |
+
"path": "",
|
| 6 |
+
"type": "sentence_transformers.models.Transformer"
|
| 7 |
+
},
|
| 8 |
+
{
|
| 9 |
+
"idx": 1,
|
| 10 |
+
"name": "1",
|
| 11 |
+
"path": "1_Pooling",
|
| 12 |
+
"type": "sentence_transformers.models.Pooling"
|
| 13 |
+
},
|
| 14 |
+
{
|
| 15 |
+
"idx": 2,
|
| 16 |
+
"name": "2",
|
| 17 |
+
"path": "2_Normalize",
|
| 18 |
+
"type": "sentence_transformers.models.Normalize"
|
| 19 |
+
}
|
| 20 |
+
]
|
sentence_bert_config.json
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"max_seq_length": 512,
|
| 3 |
+
"do_lower_case": false
|
| 4 |
+
}
|
special_tokens_map.json
ADDED
|
@@ -0,0 +1,37 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"cls_token": {
|
| 3 |
+
"content": "[CLS]",
|
| 4 |
+
"lstrip": false,
|
| 5 |
+
"normalized": false,
|
| 6 |
+
"rstrip": false,
|
| 7 |
+
"single_word": false
|
| 8 |
+
},
|
| 9 |
+
"mask_token": {
|
| 10 |
+
"content": "[MASK]",
|
| 11 |
+
"lstrip": false,
|
| 12 |
+
"normalized": false,
|
| 13 |
+
"rstrip": false,
|
| 14 |
+
"single_word": false
|
| 15 |
+
},
|
| 16 |
+
"pad_token": {
|
| 17 |
+
"content": "[PAD]",
|
| 18 |
+
"lstrip": false,
|
| 19 |
+
"normalized": false,
|
| 20 |
+
"rstrip": false,
|
| 21 |
+
"single_word": false
|
| 22 |
+
},
|
| 23 |
+
"sep_token": {
|
| 24 |
+
"content": "[SEP]",
|
| 25 |
+
"lstrip": false,
|
| 26 |
+
"normalized": false,
|
| 27 |
+
"rstrip": false,
|
| 28 |
+
"single_word": false
|
| 29 |
+
},
|
| 30 |
+
"unk_token": {
|
| 31 |
+
"content": "[UNK]",
|
| 32 |
+
"lstrip": false,
|
| 33 |
+
"normalized": false,
|
| 34 |
+
"rstrip": false,
|
| 35 |
+
"single_word": false
|
| 36 |
+
}
|
| 37 |
+
}
|
tokenizer.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"added_tokens_decoder": {
|
| 3 |
+
"0": {
|
| 4 |
+
"content": "[PAD]",
|
| 5 |
+
"lstrip": false,
|
| 6 |
+
"normalized": false,
|
| 7 |
+
"rstrip": false,
|
| 8 |
+
"single_word": false,
|
| 9 |
+
"special": true
|
| 10 |
+
},
|
| 11 |
+
"100": {
|
| 12 |
+
"content": "[UNK]",
|
| 13 |
+
"lstrip": false,
|
| 14 |
+
"normalized": false,
|
| 15 |
+
"rstrip": false,
|
| 16 |
+
"single_word": false,
|
| 17 |
+
"special": true
|
| 18 |
+
},
|
| 19 |
+
"101": {
|
| 20 |
+
"content": "[CLS]",
|
| 21 |
+
"lstrip": false,
|
| 22 |
+
"normalized": false,
|
| 23 |
+
"rstrip": false,
|
| 24 |
+
"single_word": false,
|
| 25 |
+
"special": true
|
| 26 |
+
},
|
| 27 |
+
"102": {
|
| 28 |
+
"content": "[SEP]",
|
| 29 |
+
"lstrip": false,
|
| 30 |
+
"normalized": false,
|
| 31 |
+
"rstrip": false,
|
| 32 |
+
"single_word": false,
|
| 33 |
+
"special": true
|
| 34 |
+
},
|
| 35 |
+
"103": {
|
| 36 |
+
"content": "[MASK]",
|
| 37 |
+
"lstrip": false,
|
| 38 |
+
"normalized": false,
|
| 39 |
+
"rstrip": false,
|
| 40 |
+
"single_word": false,
|
| 41 |
+
"special": true
|
| 42 |
+
}
|
| 43 |
+
},
|
| 44 |
+
"clean_up_tokenization_spaces": true,
|
| 45 |
+
"cls_token": "[CLS]",
|
| 46 |
+
"do_lower_case": true,
|
| 47 |
+
"extra_special_tokens": {},
|
| 48 |
+
"mask_token": "[MASK]",
|
| 49 |
+
"model_max_length": 512,
|
| 50 |
+
"pad_token": "[PAD]",
|
| 51 |
+
"sep_token": "[SEP]",
|
| 52 |
+
"strip_accents": null,
|
| 53 |
+
"tokenize_chinese_chars": true,
|
| 54 |
+
"tokenizer_class": "BertTokenizer",
|
| 55 |
+
"unk_token": "[UNK]"
|
| 56 |
+
}
|
vocab.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|