---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:79876
- loss:TripletLoss
base_model: Master-thesis-NAP/ModernBert-DAPT-math
widget:
- source_sentence: What is the error estimate for the difference between the exact
    solution and the local oscillation decomposition (LOD) solution in terms of the
    $L_0$ norm?
  sentences:
  - '\label{RL1}

    The system \eqref{R3} has the following positive fixed points if $0 <\alpha\leq1$
    and $b>d$

    $$E^*=\left(\dfrac{d}{b}, \dfrac{(b-d) r}{b^2}\right)$$'
  - "\\label{theo1d}\nWith the assumptions and setting is this section,  the finite\
    \ difference solution  computed using the improved harmonic average method applied\
    \ to \\eqn{eq1d} or \\eqn{eq1dB}  has second order convergence in the infinity\
    \ norm, that is,\n\\eqm\n  \\|\\mathbf{E} \\|_{\\infty}\\le C h^2,\n\\enm\nassuming\
    \ that the true solution of \\eqn{eq1d} is piecewise $C^4$ excluding the interface\
    \ $\\alf$, that is, \n$u(x) \\in C^4(0,\\alf)  \\cup C^4(\\alf,1)$. \n%where $C$\
    \ is a generic error constant."
  - "\\label{Corollary}\n     Let Assumptions~\\ref{assum_1} and~\\ref{assump2} be\
    \ satisfied. Let $u$ be the solution of~\\eqref{WeakForm} and let $u_{H,k}$ be\
    \ the LOD solution of~\\eqref{local_probelm }. Then we have \n     \\begin{equation}\\\
    label{L2Estimate}\n         \\|u-I_Hu_{H,k}\\|_0\\lesssim  \\|u-I_Hu\\|_0+\\|u-u_{H,k}\\\
    |_0 +H|u-u_{H,k}|_1.\n     \\end{equation}\n     %\\[\\|u-I_Hu_{H,k}\\|_0\\lesssim\
    \ H |u|_1 +|u-u_{H,k}|_1.\\]"
- source_sentence: What is the expected value of the number of individuals in a Markov
    branching process with non-homogeneous Poisson immigration (MBPNPI) at time $t=0$,
    given that the immigration rate is $\lambda$?
  sentences:
  - '\label{lemma-sampling}

    Fix an integer~$n\geq 1$.

    Consider the initial configuration with one active particle on each

    site of~$V_n$ and let the system evolve, with particles being killed

    when they jump out of~$V_n$, until no active particle remains

    in~$V_n$.

    Then the distribution of the resulting stable configuration is exactly

    the stationary distribution of the driven-dissipative Markov chain

    on~$V_n$.

    In particular, the number of sleeping particles remaining in~$V_n$ is

    distributed as~$S_n$.'
  - "The process $Y(t)$, $t\\geq 0,$ is called Markov branching process with\r\nnon-homogeneous\
    \ Poisson immigration (MBPNPI)."
  - "For any $\\lambda \\in(0,1)$ and $s \\in\\mathbb N$,\n  \\begin{equation*}\n\\\
    sum_{k=s}^{\\infty}\\binom {k}{s}\n(1-\\lambda)^{k-s}=    \\lambda^{-s-1}.\n\\\
    end{equation*}"
- source_sentence: Does the theorem imply that the rate of convergence of the sequence
    $T_{m,j}(E)$ to $T_{m+k_n,j+k_n}(E)$ is exponential in the distance between $m$
    and $j$, and that this rate is bounded by a constant $C$ times an exponential
    decay factor involving the parameter $\gamma$?
  sentences:
  - "\\label{lem1}\n\t\tFor all $m,j\\in\\Z$,  we have\n\t\t\\begin{equation*}\n\t\
    \t|| T_{m,j} (E)-T_{m+k_n,j+k_n}(E)||\\leq C e^{-\\gamma  k_n}  e^{(\\mathcal\
    \ L(E)+\\varepsilon) |m-j|}. \n\t\t\\end{equation*}"
  - "[Divergence Theorem or Gauss-Green Theorem for Surfaces in $\\R^3$]\n\t\\label{thm:surface_int}\n\
    \t        Let $\\Sigma \\subset \\Omega\\subseteq\\R^3$ be a bounded smooth surface.\n\
    \t        Further, $\\bb a:\\Sigma\\to\\R^3$ is a continuously differentiable\
    \ vector field that is either defined on the\n\t\t\t\t\tboundary $\\partial\\\
    Sigma$ or has a bounded continuous extension to this boundary.\n\t        Like\
    \ in \\eqref{eq:decomp} it may be decomposed into tangential and normal components\n\
    \t\t\t\t\tas follows $\\bb a = \\bb a^\\shortparallel + a_\\nu\\bs\\nu_\\Sigma$.\
    \ By $\\dd l$ we denote the line element on \n\t\t\t\t\tthe curve $\\partial \\\
    Sigma$. We assume that the curve is continuous and consists of finitely many\n\
    \t\t\t\t\tsmooth pieces.\n\t        Then the following divergence formula for\
    \ surface integrals holds\n\t        %\n\t        \\begin{align}\n\t         \
    \   %\n\t            \\int\\limits_\\Sigma \\left[\\nabla_\\Sigma\\cdot\\bb a^\\\
    shortparallel\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\partial\\\
    Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\,\\dd l .\n\
    \t            \\label{eq:surface_div}\n\t            %\n\t        \\end{align}\n\
    \t\t\t\t\t%\n\t\t\t\t\tFrom this we obtain the formula\n\t\t\t\t\t%\n\t      \
    \  \\begin{align}\n\t            %\n\t            \\int\\limits_\\Sigma \\left[\\\
    nabla_\\Sigma\\cdot\\bb a\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\\
    partial\\Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\\
    ,\\dd l \n\t\t\t\t\t\t\t-\\int\\limits_\\Sigma\\left[ 2\\kappa_Ma_\\nu\\right](\\\
    x)\\;\\dd S.\n\t            \\label{eq:surface_div_2}\n\t            %\n\t   \
    \     \\end{align}\n\t    %"
  - '\label{theo:helper3}

    Assume that $\{\PP_N\}_{N\ge 1}$ is a sequence of probability measures that is
    HT-appropriate in the sense of \cref{def:appropriate} and satisfies the LLN in
    the sense of \cref{def:LLN}.

    Let $(\kappa_n)_{n\ge 1}$ and $(m_n)_{n\ge 1}$ be the sequences that arise from
    these definitions.

    Moreover, assume that there exists a constant $C>0$ such that $|\kappa_n|\leq
    C^n$, for all $n \geq 1$.

    Then $(m_n)_{n\ge 1}$ is the sequence of moments of a unique probability measure
    on $\R$.'
- source_sentence: What is the error estimate for the eigenfunction approximation
    in terms of the weak eigenvalue and the norm of the difference between the exact
    and approximate eigenfunctions?
  sentences:
  - "Consider dynamics \\eqref{avg} and define the corresponding average dynamics\
    \ as $\\label{T-avg}\n\\mathring{\\chi} = \\epsilon h_{av}(\\chi)$, with the average\
    \ function defined as\n\\begin{equation*} \nh_{av}(\\chi):=\\lim_{T \\to \\infty}\
    \ \\frac{1}{T}\\int_{t}^{t+T} h(\\mu, \\chi, 0) d \\mu, \\ T>0,\n\\end{equation*}\n\
    both \\eqref{avg} and \\eqref{T-avg} twice differentiable and bounded in every\
    \ compact set of the $\\chi$-domain $\\mathcal{D} \\subset \\mathbb{R}^{3}$. \n\
    %\nLet $\\chi(\\tau,\\epsilon)$ and $\\chi_{av}(\\epsilon\\tau)$ denote the solutions\
    \ of \\eqref{avg} and \\eqref{T-avg}, respectively. If $\\chi_{av}(\\epsilon\\\
    tau)\\in \\mathcal{D}$ for all $\\tau\\in[0,\\zeta/\\epsilon]$, $\\zeta\\geq 0$,\
    \ and $\\chi(0,\\epsilon) - \\chi_{av}(0)=\\mathcal{O}(\\nu(\\epsilon))$, then\
    \ there exists an $\\epsilon^{*}>0$ such that for all $0<\\epsilon<\\epsilon^{*}$,\
    \ $\\chi(\\tau,\\epsilon)$ is well defined and\n$$\n\\chi(\\tau,\\epsilon) - \\\
    chi_{av}(\\epsilon\\tau) = \\mathcal{O}(\\nu(\\epsilon)) \\ \\textnormal{on} \\\
    \ \\tau \\in [0, \\zeta/\\epsilon],\n$$\nfor some function $\\nu\\in \\mathcal{K}$."
  - "(\\cite{DangWangXieZhou})\\label{Theorem_Error_Estimate_k}\nLet us define the\
    \ spectral projection $F_{k,h}^{(\\ell)}: V\\mapsto {\\rm span}\\{u_{1,h}^{(\\\
    ell)}, \\cdots, u_{k,h}^{(\\ell)}\\}$ for any integer $\\ell \\geq 1$ as follows:\n\
    \\begin{eqnarray*}\na(F_{k,h}^{(\\ell)}w, u_{i,h}^{(\\ell)}) = a(w, u_{i,h}^{(\\\
    ell)}), \\ \\ \\ i=1, \\cdots, k\\ \\ {\\rm for}\\ w\\in V.\n\\end{eqnarray*}\n\
    Then the exact eigenfunctions $\\bar u_{1,h},\\cdots, \\bar u_{k,h}$ of (\\ref{Weak_Eigenvalue_Discrete})\
    \ and the eigenfunction approximations $u_{1,h}^{(\\ell+1)}$, $\\cdots$,  $u_{k,h}^{(\\\
    ell+1)}$ from Algorithm \\ref{Algorithm_k} with the integer $\\ell > 1$ have the\
    \ following error estimate:\n\\begin{eqnarray*}\\label{Error_Estimate_Inverse}\n\
    \ \\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_a \\leq\n\
    \ \\bar\\lambda_{i,h} \\sqrt{1+\\frac{\\eta_a^2(V_H)}{\\bar\\lambda_{1,h}\\big(\\\
    delta_{k,i,h}^{(\\ell+1)}\\big)^2}}\n\\left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\\
    ell)}}\\right)\\eta_a^2(V_H)\\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell)}\\bar u_{i,h}\
    \ \\right\\|_a,\n\\end{eqnarray*}\nwhere $\\delta_{k,i,h}^{(\\ell)} $ is defined\
    \ as follows:\n\\begin{eqnarray*}\n\\delta_{k,i,h}^{(\\ell)} = \\min_{j\\not\\\
    in \\{1, \\cdots, k\\}}\\left|\\frac{1}{\\lambda_{j,h}^{(\\ell)}}-\\frac{1}{\\\
    bar\\lambda_{i,h}}\\right|,\\ \\ \\ i=1, \\cdots, k.\n\\end{eqnarray*}\nFurthermore,\
    \ the following $\\left\\|\\cdot\\right\\|_b$-norm error estimate holds:\n\\begin{eqnarray*}\n\
    \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_b\\leq \n\\\
    left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\ell+1)}}\\right)\\eta_a(V_H)\
    \ \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h}\\right\\|_a.\n\\end{eqnarray*}"
  - "\\big[{\\bf Condition $SD1(h)$}\\big]\\label{DefnSD1(h)}\n\nIn \\cite{MDL} an\
    \ approximation order $O(h^s)$, as $h\\to 0$, is proved, where $h$ is the sampling\
    \ distance. The achievable order $s$ is of course limited by the smoothness order\
    \ of the boundaries of $Graph(F)$. Then, the order $s$ depends upon the degree\
    \ of the polynomials used to approximate the boundary near the neighborhood of\
    \ points of topology change and upon the degree of splines used at regular regions.\
    \ \n\nFor example, let us view Step C of the approximation algorithm described\
    \ in Section 5.2 of \\cite{MDL}. \nIt is assumed that the boundary curves are\
    \ $C^{2k}$ smooth, and it is implicitly assumed that $h$ is small enough so that\
    \ there are $2k$ sample points close to the point of topology change, for computing\
    \ the polynomial $p_{2k-1}$ therein.\nThis condition is related to the more general\
    \ condition $SD(h)$ and it can serve as a practical way of checking it for the\
    \ case $d=1$. That is, near a point of topology change, we check whether there\
    \ are enough sample points for applying the approximation algorithm in \\cite{MDL}.\
    \ We denote this condition as the $SD1(h)$ condition."
- source_sentence: Does Werner-Young's inequality imply that the convolution of two
    $L^p$ spaces is always $L^r$ for $1 < r < \infty$?
  sentences:
  - "$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion.  If $1\
    \ < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.\
    \  \n %"
  - "A \\emph{bond system} is a tuple $(B,C,s,t,1,\\cdot)$, where $B$ is a set of\
    \ \\emph{bonds}, $C$ is a set of \\emph{content} relations, and $s,t:C\\to B$\
    \ are \\emph{source} and \\emph{target} functions. For $c\\in C$ with $s(c)=x$\
    \ and $t(c)=y$, we write $x\\xrightarrow{c}y$ or $c:x\\to y$, indicating that\
    \ $x$ \\emph{contains} $y$. Each bond $x\\in B$ has an \\emph{identity} containment\
    \ $1_x:x\\to x$, meaning every bond trivially contains itself. For $c:x\\to y$\
    \ and $c':y\\to z$, their composition is $cc':x\\to z$. These data must satisfy:\n\
    \    \\begin{enumerate}\n        \\item Identity laws: For each $c:x\\to y$, $1_x\
    \ c= c=c1_y$\n        \\item Associativity: For $c:x\\to y$, $c':y\\to z$, $c'':z\\\
    to w$, $c(c'c'')=(cc')c''$\n        \\item Anti-symmetry: For $c:x\\to y$ and\
    \ $c':y\\to x$, $x=y$\n        \\item Left cancellation: For $c,c':x\\to y$ and\
    \ $c'':y\\to z$, if $cc''=c'c''$, then $c=c'$\n    \\end{enumerate}"
  - "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\\
    in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$\
    \ and\n\\begin{align*}\n    \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\\
    |_{\\cS^q}.\n\\end{align*}"
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT DAPT Embed DAPT Math
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: TESTING
      type: TESTING
    metrics:
    - type: cosine_accuracy@1
      value: 0.5679510844485464
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6324411628980157
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6586294416243654
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6938163359483156
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5679510844485464
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.36494385479157054
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.27741116751269035
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.18192201199815417
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.026541702012005317
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.048742014322369596
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.0598887341486898
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.07516536747041261
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.25320633940615317
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6070309695944213
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.07416668442975916
      name: Cosine Map@100
---

# ModernBERT DAPT Embed DAPT Math

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math) <!-- at revision a30384f91d764c272e6b740c256d5581325ea4bb -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
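
The pooling and normalization stages listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the toy shapes are assumptions made for illustration, and the real model produces 768-dimensional token vectors from `ModernBertModel`.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)       # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid division by zero
    pooled = summed / counts                             # mean over real tokens only
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)          # unit-length sentence vectors

# Toy input: 2 sequences, 3 token positions, 4 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sequence ends with one padding token
sentence_embeddings = mean_pool_and_normalize(tokens, mask)
print(sentence_embeddings.shape)  # (2, 4)
```

Because the output vectors have unit length, the cosine similarity between two embeddings equals their dot product.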

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")
# Run inference
sentences = [
    "Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?",
    "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align*}\n    \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align*}",
    '$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion.  If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.  \n %',
]
embeddings = model.encode(sentences)  # NumPy array, one row per input sentence
print(embeddings.shape)
# (3, 768)

# Get the pairwise cosine similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `TESTING`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.568      |
| cosine_accuracy@3   | 0.6324     |
| cosine_accuracy@5   | 0.6586     |
| cosine_accuracy@10  | 0.6938     |
| cosine_precision@1  | 0.568      |
| cosine_precision@3  | 0.3649     |
| cosine_precision@5  | 0.2774     |
| cosine_precision@10 | 0.1819     |
| cosine_recall@1     | 0.0265     |
| cosine_recall@3     | 0.0487     |
| cosine_recall@5     | 0.0599     |
| cosine_recall@10    | 0.0752     |
| **cosine_ndcg@10**  | **0.2532** |
| cosine_mrr@10       | 0.607      |
| cosine_map@100      | 0.0742     |
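
As a reading aid for the table, the following sketch shows how accuracy@k, precision@k, and recall@k are commonly defined for a single query. The helper `metrics_at_k` and the document IDs are hypothetical and are not taken from the evaluator's source. The example uses a query with ten relevant documents to show how recall@k can stay low while accuracy@k is high, which may be what happens in the `TESTING` set if each query has many relevant passages.

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics over the top-k ranked document IDs."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return {
        "accuracy": 1.0 if hits > 0 else 0.0,   # at least one relevant document in the top k
        "precision": hits / k,                  # share of the top k that is relevant
        "recall": hits / len(relevant_ids),     # share of all relevant documents retrieved
    }

# Hypothetical ranking for one query, best match first.
ranked = ["d3", "d7", "d1", "d9", "d4"]
# Ten relevant documents in total; only "d1" and "d3" appear in the ranking.
relevant = {"d1", "d3"} | {f"x{i}" for i in range(8)}

m = metrics_at_k(ranked, relevant, k=3)
print(m)
```

Averaging these per-query values over all queries gives corpus-level figures of the kind reported in the table.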

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 79,876 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                             | positive                                                                            | negative                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                              | string                                                                              |
  | details | <ul><li>min: 9 tokens</li><li>mean: 38.48 tokens</li><li>max: 142 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 210.43 tokens</li><li>max: 924 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 91.02 tokens</li><li>max: 481 tokens</li></ul> |
* Samples:
  | anchor                                                                                                                                                                                                                                             | positive                                                                                                                                                                                                                                                                                                                                    | negative                                                                                                                                                                                                                                                                                                                                                                     |
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$?</code>                                                                                                     | <code>Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$.<br>Then <br>\begin{equation}<br>0 \leq 3g_n -2n \leq 4<br>\label{star}<br>\end{equation}<br>for all $n$, and hence<br>$\lim_{n \rightarrow \infty} g_n/n = 2/3$.<br>\label{thm1}</code>                                                                        | <code>\label{thm:bounds_initial}<br>                Let $\seqq{s}$ be a sequence of rank $r$ for which the roots of the characteristic polynomial are all different. Then, for any positive integer $M$, the rank of $\seq{s^M}$ is at most<br>                \begin{align*}<br>                    \rank s^M \leq \binom{M+r-1}{M}.<br>                \end{align*}</code> |
  | <code>Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$?</code> | <code>\label{ThmConjAreTrue}<br>Conjectures \ref{Conj1} and \ref{Conj2} are true.<br>As a consequence, <br>if either $d=s \geq 1$ or $d \geq 2s+1 \geq 3$, <br>the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is equal to $g(d,s)$.</code> | <code>[{\cite[Corollary 2.2.2 with $p=3$]{BSY}}]<br>    Let $S$ be a non-trivial Severi-Brauer surface over a perfect field $\textbf{k}$. Then $S$ does not contain points of degree $d$, where $d$ is not divisible by $3$. On the other hand $S$ contains a point of degree $3$.</code>                                                                                    |
  | <code>\\emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?}</code>                                    | <code>}<br>\newcommand{\ep}{</code>                                                                                                                                                                                                                                                                                                         | <code>\label{prop:coherence}<br>	If $X$ is a qcqs scheme, then $RX$ is coherent in the sense that the set of quasi-compact open subsets of $RX$ is closed under finite intersections and forms a basis for the topology of $RX$.</code>                                                                                                                                      |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
  ```json
  {
      "distance_metric": "TripletDistanceMetric.COSINE",
      "triplet_margin": 0.1
  }
  ```
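
The objective with these parameters can be sketched in NumPy as follows. This is an illustration rather than the library code: it assumes cosine distance is defined as one minus cosine similarity, and the function names and 2-dimensional toy vectors are hypothetical.

```python
import numpy as np

def cosine_distance(x, y):
    """1 - cosine similarity, computed row by row."""
    dot = np.sum(x * y, axis=-1)
    return 1.0 - dot / (np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1))

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over the batch."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.0]])   # same direction as the anchor
negative = np.array([[0.0, 1.0]])   # orthogonal to the anchor

easy = triplet_loss(anchor, positive, negative)  # negative is already far enough away
hard = triplet_loss(anchor, negative, positive)  # roles swapped: the margin is violated
print(easy, hard)
```

A triplet contributes zero loss once its negative is at least `triplet_margin` farther from the anchor than its positive, so only triplets that violate the margin produce gradients.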

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
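
The arithmetic implied by these settings can be worked through as below. The sketch assumes training on a single device, which the card does not state, and the warmup-plus-cosine function `lr_at` is a simplified illustration of that schedule shape rather than the trainer's exact code. Exact step counts can also vary with the library version and the batch sampler.

```python
import math

num_samples = 79_876          # training set size from the card
per_device_batch = 16
grad_accum = 8
num_devices = 1               # assumption: not stated in the card
epochs = 4
warmup_ratio = 0.1
peak_lr = 2e-5

effective_batch = per_device_batch * grad_accum * num_devices   # 128 samples per optimizer step
steps_per_epoch = math.ceil(num_samples / effective_batch)      # 625
total_steps = steps_per_epoch * epochs                          # 2500
warmup_steps = round(total_steps * warmup_ratio)                # 250

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero (illustrative)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions one epoch takes 625 optimizer steps, which agrees with the training log, where the epoch 1.0 evaluation row appears at step 625.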

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch   | Step    | Training Loss | TESTING_cosine_ndcg@10 |
|:-------:|:-------:|:-------------:|:----------------------:|
| 0.0160  | 10      | 1.1162        | -                      |
| 0.0320  | 20      | 1.0465        | -                      |
| 0.0481  | 30      | 0.9663        | -                      |
| 0.0641  | 40      | 0.8758        | -                      |
| 0.0801  | 50      | 0.8215        | -                      |
| 0.0961  | 60      | 0.7492        | -                      |
| 0.1122  | 70      | 0.6356        | -                      |
| 0.1282  | 80      | 0.3573        | -                      |
| 0.1442  | 90      | 0.166         | -                      |
| 0.1602  | 100     | 0.0797        | -                      |
| 0.1762  | 110     | 0.046         | -                      |
| 0.1923  | 120     | 0.0419        | -                      |
| 0.2083  | 130     | 0.025         | -                      |
| 0.2243  | 140     | 0.0233        | -                      |
| 0.2403  | 150     | 0.0205        | -                      |
| 0.2564  | 160     | 0.0142        | -                      |
| 0.2724  | 170     | 0.017         | -                      |
| 0.2884  | 180     | 0.0157        | -                      |
| 0.3044  | 190     | 0.0104        | -                      |
| 0.3204  | 200     | 0.0126        | -                      |
| 0.3365  | 210     | 0.019         | -                      |
| 0.3525  | 220     | 0.0153        | -                      |
| 0.3685  | 230     | 0.0171        | -                      |
| 0.3845  | 240     | 0.0124        | -                      |
| 0.4006  | 250     | 0.01          | -                      |
| 0.4166  | 260     | 0.0071        | -                      |
| 0.4326  | 270     | 0.0125        | -                      |
| 0.4486  | 280     | 0.0096        | -                      |
| 0.4647  | 290     | 0.0092        | -                      |
| 0.4807  | 300     | 0.0067        | -                      |
| 0.4967  | 310     | 0.0069        | -                      |
| 0.5127  | 320     | 0.0054        | -                      |
| 0.5287  | 330     | 0.0107        | -                      |
| 0.5448  | 340     | 0.0115        | -                      |
| 0.5608  | 350     | 0.0083        | -                      |
| 0.5768  | 360     | 0.0175        | -                      |
| 0.5928  | 370     | 0.0162        | -                      |
| 0.6089  | 380     | 0.0094        | -                      |
| 0.6249  | 390     | 0.0124        | -                      |
| 0.6409  | 400     | 0.0078        | -                      |
| 0.6569  | 410     | 0.014         | -                      |
| 0.6729  | 420     | 0.0117        | -                      |
| 0.6890  | 430     | 0.0097        | -                      |
| 0.7050  | 440     | 0.0094        | -                      |
| 0.7210  | 450     | 0.0077        | -                      |
| 0.7370  | 460     | 0.0103        | -                      |
| 0.7531  | 470     | 0.0099        | -                      |
| 0.7691  | 480     | 0.0123        | -                      |
| 0.7851  | 490     | 0.0103        | -                      |
| 0.8011  | 500     | 0.0098        | -                      |
| 0.8171  | 510     | 0.0059        | -                      |
| 0.8332  | 520     | 0.0031        | -                      |
| 0.8492  | 530     | 0.0075        | -                      |
| 0.8652  | 540     | 0.0101        | -                      |
| 0.8812  | 550     | 0.0099        | -                      |
| 0.8973  | 560     | 0.0098        | -                      |
| 0.9133  | 570     | 0.0072        | -                      |
| 0.9293  | 580     | 0.0057        | -                      |
| 0.9453  | 590     | 0.0074        | -                      |
| 0.9613  | 600     | 0.0038        | -                      |
| 0.9774  | 610     | 0.0127        | -                      |
| 0.9934  | 620     | 0.0098        | -                      |
| **1.0** | **625** | **-**         | **0.2532**             |
| 1.0080  | 630     | 0.0064        | -                      |
| 1.0240  | 640     | 0.0066        | -                      |
| 1.0401  | 650     | 0.0056        | -                      |
| 1.0561  | 660     | 0.0031        | -                      |
| 1.0721  | 670     | 0.0023        | -                      |
| 1.0881  | 680     | 0.0032        | -                      |
| 1.1041  | 690     | 0.0021        | -                      |
| 1.1202  | 700     | 0.0011        | -                      |
| 1.1362  | 710     | 0.006         | -                      |
| 1.1522  | 720     | 0.0045        | -                      |
| 1.1682  | 730     | 0.0041        | -                      |
| 1.1843  | 740     | 0.0026        | -                      |
| 1.2003  | 750     | 0.0019        | -                      |
| 1.2163  | 760     | 0.0058        | -                      |
| 1.2323  | 770     | 0.0054        | -                      |
| 1.2483  | 780     | 0.0066        | -                      |
| 1.2644  | 790     | 0.0033        | -                      |
| 1.2804  | 800     | 0.004         | -                      |
| 1.2964  | 810     | 0.0028        | -                      |
| 1.3124  | 820     | 0.0027        | -                      |
| 1.3285  | 830     | 0.0017        | -                      |
| 1.3445  | 840     | 0.0009        | -                      |
| 1.3605  | 850     | 0.0048        | -                      |
| 1.3765  | 860     | 0.0037        | -                      |
| 1.3925  | 870     | 0.0045        | -                      |
| 1.4086  | 880     | 0.0043        | -                      |
| 1.4246  | 890     | 0.0046        | -                      |
| 1.4406  | 900     | 0.0023        | -                      |
| 1.4566  | 910     | 0.0031        | -                      |
| 1.4727  | 920     | 0.0027        | -                      |
| 1.4887  | 930     | 0.0022        | -                      |
| 1.5047  | 940     | 0.0042        | -                      |
| 1.5207  | 950     | 0.0026        | -                      |
| 1.5368  | 960     | 0.0049        | -                      |
| 1.5528  | 970     | 0.0024        | -                      |
| 1.5688  | 980     | 0.0019        | -                      |
| 1.5848  | 990     | 0.0038        | -                      |
| 1.6008  | 1000    | 0.0036        | -                      |
| 1.6169  | 1010    | 0.0023        | -                      |
| 1.6329  | 1020    | 0.0021        | -                      |
| 1.6489  | 1030    | 0.0011        | -                      |
| 1.6649  | 1040    | 0.0025        | -                      |
| 1.6810  | 1050    | 0.0026        | -                      |
| 1.6970  | 1060    | 0.0034        | -                      |
| 1.7130  | 1070    | 0.0024        | -                      |
| 1.7290  | 1080    | 0.0038        | -                      |
| 1.7450  | 1090    | 0.002         | -                      |
| 1.7611  | 1100    | 0.0046        | -                      |
| 1.7771  | 1110    | 0.0003        | -                      |
| 1.7931  | 1120    | 0.0062        | -                      |
| 1.8091  | 1130    | 0.0057        | -                      |
| 1.8252  | 1140    | 0.0012        | -                      |
| 1.8412  | 1150    | 0.0021        | -                      |
| 1.8572  | 1160    | 0.0038        | -                      |
| 1.8732  | 1170    | 0.0024        | -                      |
| 1.8892  | 1180    | 0.0026        | -                      |
| 1.9053  | 1190    | 0.0034        | -                      |
| 1.9213  | 1200    | 0.0064        | -                      |
| 1.9373  | 1210    | 0.0041        | -                      |
| 1.9533  | 1220    | 0.0032        | -                      |
| 1.9694  | 1230    | 0.0028        | -                      |
| 1.9854  | 1240    | 0.0009        | -                      |
| 2.0     | 1250    | 0.0042        | 0.2488                 |
| 2.0160  | 1260    | 0.0005        | -                      |
| 2.0320  | 1270    | 0.0018        | -                      |
| 2.0481  | 1280    | 0.0009        | -                      |
| 2.0641  | 1290    | 0.001         | -                      |
| 2.0801  | 1300    | 0.0024        | -                      |
| 2.0961  | 1310    | 0.0011        | -                      |
| 2.1122  | 1320    | 0.0008        | -                      |
| 2.1282  | 1330    | 0.0001        | -                      |
| 2.1442  | 1340    | 0.0006        | -                      |
| 2.1602  | 1350    | 0.0005        | -                      |
| 2.1762  | 1360    | 0.0003        | -                      |
| 2.1923  | 1370    | 0.0           | -                      |
| 2.2083  | 1380    | 0.0           | -                      |
| 2.2243  | 1390    | 0.0001        | -                      |
| 2.2403  | 1400    | 0.0001        | -                      |
| 2.2564  | 1410    | 0.0027        | -                      |
| 2.2724  | 1420    | 0.0005        | -                      |
| 2.2884  | 1430    | 0.0007        | -                      |
| 2.3044  | 1440    | 0.0001        | -                      |
| 2.3204  | 1450    | 0.0002        | -                      |
| 2.3365  | 1460    | 0.001         | -                      |
| 2.3525  | 1470    | 0.0003        | -                      |
| 2.3685  | 1480    | 0.001         | -                      |
| 2.3845  | 1490    | 0.0           | -                      |
| 2.4006  | 1500    | 0.0006        | -                      |
| 2.4166  | 1510    | 0.0007        | -                      |
| 2.4326  | 1520    | 0.0007        | -                      |
| 2.4486  | 1530    | 0.0004        | -                      |
| 2.4647  | 1540    | 0.0007        | -                      |
| 2.4807  | 1550    | 0.0012        | -                      |
| 2.4967  | 1560    | 0.0015        | -                      |
| 2.5127  | 1570    | 0.0014        | -                      |
| 2.5287  | 1580    | 0.0005        | -                      |
| 2.5448  | 1590    | 0.0005        | -                      |
| 2.5608  | 1600    | 0.0014        | -                      |
| 2.5768  | 1610    | 0.0016        | -                      |
| 2.5928  | 1620    | 0.0           | -                      |
| 2.6089  | 1630    | 0.0002        | -                      |
| 2.6249  | 1640    | 0.0006        | -                      |
| 2.6409  | 1650    | 0.0002        | -                      |
| 2.6569  | 1660    | 0.0003        | -                      |
| 2.6729  | 1670    | 0.0007        | -                      |
| 2.6890  | 1680    | 0.0005        | -                      |
| 2.7050  | 1690    | 0.0007        | -                      |
| 2.7210  | 1700    | 0.0           | -                      |
| 2.7370  | 1710    | 0.0008        | -                      |
| 2.7531  | 1720    | 0.0019        | -                      |
| 2.7691  | 1730    | 0.0017        | -                      |
| 2.7851  | 1740    | 0.0002        | -                      |
| 2.8011  | 1750    | 0.0002        | -                      |
| 2.8171  | 1760    | 0.0002        | -                      |
| 2.8332  | 1770    | 0.0014        | -                      |
| 2.8492  | 1780    | 0.0005        | -                      |
| 2.8652  | 1790    | 0.0021        | -                      |
| 2.8812  | 1800    | 0.002         | -                      |
| 2.8973  | 1810    | 0.0021        | -                      |
| 2.9133  | 1820    | 0.0007        | -                      |
| 2.9293  | 1830    | 0.0           | -                      |
| 2.9453  | 1840    | 0.0011        | -                      |
| 2.9613  | 1850    | 0.0006        | -                      |
| 2.9774  | 1860    | 0.0008        | -                      |
| 2.9934  | 1870    | 0.0001        | -                      |
| 3.0     | 1875    | -             | 0.2516                 |
| 3.0080  | 1880    | 0.0033        | -                      |
| 3.0240  | 1890    | 0.0           | -                      |
| 3.0401  | 1900    | 0.0           | -                      |
| 3.0561  | 1910    | 0.0009        | -                      |
| 3.0721  | 1920    | 0.0001        | -                      |
| 3.0881  | 1930    | 0.001         | -                      |
| 3.1041  | 1940    | 0.0001        | -                      |
| 3.1202  | 1950    | 0.0001        | -                      |
| 3.1362  | 1960    | 0.0           | -                      |
| 3.1522  | 1970    | 0.0003        | -                      |
| 3.1682  | 1980    | 0.0001        | -                      |
| 3.1843  | 1990    | 0.0005        | -                      |
| 3.2003  | 2000    | 0.0           | -                      |
| 3.2163  | 2010    | 0.0           | -                      |
| 3.2323  | 2020    | 0.0           | -                      |
| 3.2483  | 2030    | 0.0           | -                      |
| 3.2644  | 2040    | 0.0           | -                      |
| 3.2804  | 2050    | 0.0           | -                      |
| 3.2964  | 2060    | 0.0001        | -                      |
| 3.3124  | 2070    | 0.0001        | -                      |
| 3.3285  | 2080    | 0.0           | -                      |
| 3.3445  | 2090    | 0.0001        | -                      |
| 3.3605  | 2100    | 0.0           | -                      |
| 3.3765  | 2110    | 0.0005        | -                      |
| 3.3925  | 2120    | 0.0001        | -                      |
| 3.4086  | 2130    | 0.0           | -                      |
| 3.4246  | 2140    | 0.0           | -                      |
| 3.4406  | 2150    | 0.0004        | -                      |
| 3.4566  | 2160    | 0.0005        | -                      |
| 3.4727  | 2170    | 0.0           | -                      |
| 3.4887  | 2180    | 0.0006        | -                      |
| 3.5047  | 2190    | 0.0002        | -                      |
| 3.5207  | 2200    | 0.0007        | -                      |
| 3.5368  | 2210    | 0.0           | -                      |
| 3.5528  | 2220    | 0.0           | -                      |
| 3.5688  | 2230    | 0.0008        | -                      |
| 3.5848  | 2240    | 0.0001        | -                      |
| 3.6008  | 2250    | 0.0013        | -                      |
| 3.6169  | 2260    | 0.0004        | -                      |
| 3.6329  | 2270    | 0.0006        | -                      |
| 3.6489  | 2280    | 0.0001        | -                      |
| 3.6649  | 2290    | 0.0           | -                      |
| 3.6810  | 2300    | 0.0011        | -                      |
| 3.6970  | 2310    | 0.0005        | -                      |
| 3.7130  | 2320    | 0.0           | -                      |
| 3.7290  | 2330    | 0.0           | -                      |
| 3.7450  | 2340    | 0.0006        | -                      |
| 3.7611  | 2350    | 0.0           | -                      |
| 3.7771  | 2360    | 0.0002        | -                      |
| 3.7931  | 2370    | 0.0006        | -                      |
| 3.8091  | 2380    | 0.0002        | -                      |
| 3.8252  | 2390    | 0.0004        | -                      |
| 3.8412  | 2400    | 0.0           | -                      |
| 3.8572  | 2410    | 0.0007        | -                      |
| 3.8732  | 2420    | 0.0006        | -                      |
| 3.8892  | 2430    | 0.0002        | -                      |
| 3.9053  | 2440    | 0.0009        | -                      |
| 3.9213  | 2450    | 0.0009        | -                      |
| 3.9373  | 2460    | 0.0           | -                      |
| 3.9533  | 2470    | 0.0001        | -                      |
| 3.9694  | 2480    | 0.0012        | -                      |
| 3.9854  | 2490    | 0.0003        | -                      |
| 3.9950  | 2496    | -             | 0.2524                 |
| -1      | -1      | -             | 0.2532                 |

* The bold row denotes the saved checkpoint.
</details>
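
The log above can also be read programmatically. The following is a minimal sketch, not part of the original training code, that parses a few rows copied from the table and picks the evaluation step with the highest score. It assumes that a higher value in the last column is better; the helper names (`LogRow`, `parse_log_rows`) and the field name `eval_metric` are illustrative only.

```python
from typing import NamedTuple, Optional


class LogRow(NamedTuple):
    epoch: float
    step: int
    training_loss: Optional[float]
    eval_metric: Optional[float]  # last table column; assumed higher-is-better


# A handful of data rows copied verbatim from the training log table.
SAMPLE = """
| 0.9934  | 620     | 0.0098        | -                      |
| **1.0** | **625** | **-**         | **0.2532**             |
| 2.0     | 1250    | 0.0042        | 0.2488                 |
| 3.0     | 1875    | -             | 0.2516                 |
| 3.9950  | 2496    | -             | 0.2524                 |
| -1      | -1      | -             | 0.2532                 |
"""


def _to_float(cell: str) -> Optional[float]:
    """Convert a table cell to float; a lone '-' means nothing was logged."""
    cell = cell.replace("*", "").strip()  # drop Markdown bold markers
    return None if cell == "-" else float(cell)


def parse_log_rows(markdown: str) -> list[LogRow]:
    """Parse data rows only (no header or separator rows) into LogRow tuples."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = line.strip().strip("|").split("|")
        if len(cells) != 4:
            continue  # skip anything that is not a four-column row
        epoch, step, loss, metric = (_to_float(c) for c in cells)
        rows.append(LogRow(epoch, int(step), loss, metric))
    return rows


rows = parse_log_rows(SAMPLE)
# Keep rows that carry an evaluation score and a real step number;
# the trailing row with step -1 is excluded here.
evaluated = [r for r in rows if r.eval_metric is not None and r.step > 0]
best = max(evaluated, key=lambda r: r.eval_metric)
print(f"best step: {best.step}, score: {best.eval_metric}")
```

Under that assumption the selected row is the bold one (step 625), and its score equals the value reported in the final row with step -1.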

### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
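
To check whether a local environment matches these versions, a small script along the following lines may help. It is a sketch using only the standard library; the function names are illustrative, and the package names are the PyPI distribution names (PyTorch is published as `torch`), not identifiers taken from this card.

```python
from importlib import metadata
from typing import Callable, Optional

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "accelerate": "1.6.0",
    "datasets": "2.14.4",
    "tokenizers": "0.21.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(
    expected: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str], str]]:
    """Map each package to (expected, found, status)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "match"
        else:
            status = "differs"
        report[package] = (wanted, found, status)
    return report


for package, (wanted, found, status) in version_report(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found} ({status})")
```

The comparison is an exact string match, so a PyTorch build without the `+cu124` suffix (for example a CPU-only wheel) is reported as `differs` even when the base version is the same.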

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### TripletLoss
```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
