---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:79876
- loss:TripletLoss
base_model: Master-thesis-NAP/ModernBert-DAPT-math
widget:
- source_sentence: What is the error estimate for the difference between the exact
solution and the local oscillation decomposition (LOD) solution in terms of the
$L_0$ norm?
sentences:
- '\label{RL1}
The system \eqref{R3} has the following positive fixed points if $0 <\alpha\leq1$
and $b>d$
$$E^*=\left(\dfrac{d}{b}, \dfrac{(b-d) r}{b^2}\right)$$'
- "\\label{theo1d}\nWith the assumptions and setting is this section, the finite\
\ difference solution computed using the improved harmonic average method applied\
\ to \\eqn{eq1d} or \\eqn{eq1dB} has second order convergence in the infinity\
\ norm, that is,\n\\eqm\n \\|\\mathbf{E} \\|_{\\infty}\\le C h^2,\n\\enm\nassuming\
\ that the true solution of \\eqn{eq1d} is piecewise $C^4$ excluding the interface\
\ $\\alf$, that is, \n$u(x) \\in C^4(0,\\alf) \\cup C^4(\\alf,1)$. \n%where $C$\
\ is a generic error constant."
- "\\label{Corollary}\n Let Assumptions~\\ref{assum_1} and~\\ref{assump2} be\
\ satisfied. Let $u$ be the solution of~\\eqref{WeakForm} and let $u_{H,k}$ be\
\ the LOD solution of~\\eqref{local_probelm }. Then we have \n \\begin{equation}\\\
label{L2Estimate}\n \\|u-I_Hu_{H,k}\\|_0\\lesssim \\|u-I_Hu\\|_0+\\|u-u_{H,k}\\\
|_0 +H|u-u_{H,k}|_1.\n \\end{equation}\n %\\[\\|u-I_Hu_{H,k}\\|_0\\lesssim\
\ H |u|_1 +|u-u_{H,k}|_1.\\]"
- source_sentence: What is the expected value of the number of individuals in a Markov
branching process with non-homogeneous Poisson immigration (MBPNPI) at time $t=0$,
given that the immigration rate is $\lambda$?
sentences:
- '\label{lemma-sampling}
Fix an integer~$n\geq 1$.
Consider the initial configuration with one active particle on each
site of~$V_n$ and let the system evolve, with particles being killed
when they jump out of~$V_n$, until no active particle remains
in~$V_n$.
Then the distribution of the resulting stable configuration is exactly
the stationary distribution of the driven-dissipative Markov chain
on~$V_n$.
In particular, the number of sleeping particles remaining in~$V_n$ is
distributed as~$S_n$.'
- "The process $Y(t)$, $t\\geq 0,$ is called Markov branching process with\r\nnon-homogeneous\
\ Poisson immigration (MBPNPI)."
- "For any $\\lambda \\in(0,1)$ and $s \\in\\mathbb N$,\n \\begin{equation*}\n\\\
sum_{k=s}^{\\infty}\\binom {k}{s}\n(1-\\lambda)^{k-s}= \\lambda^{-s-1}.\n\\\
end{equation*}"
- source_sentence: Does the theorem imply that the rate of convergence of the sequence
$T_{m,j}(E)$ to $T_{m+k_n,j+k_n}(E)$ is exponential in the distance between $m$
and $j$, and that this rate is bounded by a constant $C$ times an exponential
decay factor involving the parameter $\gamma$?
sentences:
- "\\label{lem1}\n\t\tFor all $m,j\\in\\Z$,  we have\n\t\t\\begin{equation*}\n\t\
\t|| T_{m,j} (E)-T_{m+k_n,j+k_n}(E)||\\leq C e^{-\\gamma k_n} e^{(\\mathcal\
\ L(E)+\\varepsilon) |m-j|}. \n\t\t\\end{equation*}"
- "[Divergence Theorem or Gauss-Green Theorem for Surfaces in $\\R^3$]\n\t\\label{thm:surface_int}\n\
\t Let $\\Sigma \\subset \\Omega\\subseteq\\R^3$ be a bounded smooth surface.\n\
\t Further, $\\bb a:\\Sigma\\to\\R^3$ is a continuously differentiable\
\ vector field that is either defined on the\n\t\t\t\t\tboundary $\\partial\\\
Sigma$ or has a bounded continuous extension to this boundary.\n\t Like\
\ in \\eqref{eq:decomp} it may be decomposed into tangential and normal components\n\
\t\t\t\t\tas follows $\\bb a = \\bb a^\\shortparallel + a_\\nu\\bs\\nu_\\Sigma$.\
\ By $\\dd l$ we denote the line element on \n\t\t\t\t\tthe curve $\\partial \\\
Sigma$. We assume that the curve is continuous and consists of finitely many\n\
\t\t\t\t\tsmooth pieces.\n\t Then the following divergence formula for\
\ surface integrals holds\n\t %\n\t \\begin{align}\n\t \
\ %\n\t \\int\\limits_\\Sigma \\left[\\nabla_\\Sigma\\cdot\\bb a^\\\
shortparallel\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\partial\\\
Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\,\\dd l .\n\
\t \\label{eq:surface_div}\n\t %\n\t \\end{align}\n\
\t\t\t\t\t%\n\t\t\t\t\tFrom this we obtain the formula\n\t\t\t\t\t%\n\t \
\ \\begin{align}\n\t %\n\t \\int\\limits_\\Sigma \\left[\\\
nabla_\\Sigma\\cdot\\bb a\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\\
partial\\Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\\
,\\dd l \n\t\t\t\t\t\t\t-\\int\\limits_\\Sigma\\left[ 2\\kappa_Ma_\\nu\\right](\\\
x)\\;\\dd S.\n\t \\label{eq:surface_div_2}\n\t %\n\t \
\ \\end{align}\n\t %"
- '\label{theo:helper3}
Assume that $\{\PP_N\}_{N\ge 1}$ is a sequence of probability measures that is
HT-appropriate in the sense of \cref{def:appropriate} and satisfies the LLN in
the sense of \cref{def:LLN}.
Let $(\kappa_n)_{n\ge 1}$ and $(m_n)_{n\ge 1}$ be the sequences that arise from
these definitions.
Moreover, assume that there exists a constant $C>0$ such that $|\kappa_n|\leq
C^n$, for all $n \geq 1$.
Then $(m_n)_{n\ge 1}$ is the sequence of moments of a unique probability measure
on $\R$.'
- source_sentence: What is the error estimate for the eigenfunction approximation
in terms of the weak eigenvalue and the norm of the difference between the exact
and approximate eigenfunctions?
sentences:
- "Consider dynamics \\eqref{avg} and define the corresponding average dynamics\
\ as $\\label{T-avg}\n\\mathring{\\chi} = \\epsilon h_{av}(\\chi)$, with the average\
\ function defined as\n\\begin{equation*} \nh_{av}(\\chi):=\\lim_{T \\to \\infty}\
\ \\frac{1}{T}\\int_{t}^{t+T} h(\\mu, \\chi, 0) d \\mu, \\ T>0,\n\\end{equation*}\n\
both \\eqref{avg} and \\eqref{T-avg} twice differentiable and bounded in every\
\ compact set of the $\\chi$-domain $\\mathcal{D} \\subset \\mathbb{R}^{3}$. \n\
%\nLet $\\chi(\\tau,\\epsilon)$ and $\\chi_{av}(\\epsilon\\tau)$ denote the solutions\
\ of \\eqref{avg} and \\eqref{T-avg}, respectively. If $\\chi_{av}(\\epsilon\\\
tau)\\in \\mathcal{D}$ for all $\\tau\\in[0,\\zeta/\\epsilon]$, $\\zeta\\geq 0$,\
\ and $\\chi(0,\\epsilon) - \\chi_{av}(0)=\\mathcal{O}(\\nu(\\epsilon))$, then\
\ there exists an $\\epsilon^{*}>0$ such that for all $0<\\epsilon<\\epsilon^{*}$,\
\ $\\chi(\\tau,\\epsilon)$ is well defined and\n$$\n\\chi(\\tau,\\epsilon) - \\\
chi_{av}(\\epsilon\\tau) = \\mathcal{O}(\\nu(\\epsilon)) \\ \\textnormal{on} \\\
\ \\tau \\in [0, \\zeta/\\epsilon],\n$$\nfor some function $\\nu\\in \\mathcal{K}$."
- "(\\cite{DangWangXieZhou})\\label{Theorem_Error_Estimate_k}\nLet us define the\
\ spectral projection $F_{k,h}^{(\\ell)}: V\\mapsto {\\rm span}\\{u_{1,h}^{(\\\
ell)}, \\cdots, u_{k,h}^{(\\ell)}\\}$ for any integer $\\ell \\geq 1$ as follows:\n\
\\begin{eqnarray*}\na(F_{k,h}^{(\\ell)}w, u_{i,h}^{(\\ell)}) = a(w, u_{i,h}^{(\\\
ell)}), \\ \\ \\ i=1, \\cdots, k\\ \\ {\\rm for}\\ w\\in V.\n\\end{eqnarray*}\n\
Then the exact eigenfunctions $\\bar u_{1,h},\\cdots, \\bar u_{k,h}$ of (\\ref{Weak_Eigenvalue_Discrete})\
\ and the eigenfunction approximations $u_{1,h}^{(\\ell+1)}$, $\\cdots$, $u_{k,h}^{(\\\
ell+1)}$ from Algorithm \\ref{Algorithm_k} with the integer $\\ell > 1$ have the\
\ following error estimate:\n\\begin{eqnarray*}\\label{Error_Estimate_Inverse}\n\
\ \\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_a \\leq\n\
\ \\bar\\lambda_{i,h} \\sqrt{1+\\frac{\\eta_a^2(V_H)}{\\bar\\lambda_{1,h}\\big(\\\
delta_{k,i,h}^{(\\ell+1)}\\big)^2}}\n\\left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\\
ell)}}\\right)\\eta_a^2(V_H)\\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell)}\\bar u_{i,h}\
\ \\right\\|_a,\n\\end{eqnarray*}\nwhere $\\delta_{k,i,h}^{(\\ell)} $ is defined\
\ as follows:\n\\begin{eqnarray*}\n\\delta_{k,i,h}^{(\\ell)} = \\min_{j\\not\\\
in \\{1, \\cdots, k\\}}\\left|\\frac{1}{\\lambda_{j,h}^{(\\ell)}}-\\frac{1}{\\\
bar\\lambda_{i,h}}\\right|,\\ \\ \\ i=1, \\cdots, k.\n\\end{eqnarray*}\nFurthermore,\
\ the following $\\left\\|\\cdot\\right\\|_b$-norm error estimate holds:\n\\begin{eqnarray*}\n\
\\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_b\\leq \n\\\
left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\ell+1)}}\\right)\\eta_a(V_H)\
\ \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h}\\right\\|_a.\n\\end{eqnarray*}"
- "\\big[{\\bf Condition $SD1(h)$}\\big]\\label{DefnSD1(h)}\n\nIn \\cite{MDL} an\
\ approximation order $O(h^s)$, as $h\\to 0$, is proved, where $h$ is the sampling\
\ distance. The achievable order $s$ is of course limited by the smoothness order\
\ of the boundaries of $Graph(F)$. Then, the order $s$ depends upon the degree\
\ of the polynomials used to approximate the boundary near the neighborhood of\
\ points of topology change and upon the degree of splines used at regular regions.\
\ \n\nFor example, let us view Step C of the approximation algorithm described\
\ in Section 5.2 of \\cite{MDL}. \nIt is assumed that the boundary curves are\
\ $C^{2k}$ smooth, and it is implicitly assumed that $h$ is small enough so that\
\ there are $2k$ sample points close to the point of topology change, for computing\
\ the polynomial $p_{2k-1}$ therein.\nThis condition is related to the more general\
\ condition $SD(h)$ and it can serve as a practical way of checking it for the\
\ case $d=1$. That is, near a point of topology change, we check whether there\
\ are enough sample points for applying the approximation algorithm in \\cite{MDL}.\
\ We denote this condition as the $SD1(h)$ condition."
- source_sentence: Does Werner-Young's inequality imply that the convolution of two
$L^p$ spaces is always $L^r$ for $1 < r < \infty$?
sentences:
- "$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1\
\ < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.\
\ \n %"
- "A \\emph{bond system} is a tuple $(B,C,s,t,1,\\cdot)$, where $B$ is a set of\
\ \\emph{bonds}, $C$ is a set of \\emph{content} relations, and $s,t:C\\to B$\
\ are \\emph{source} and \\emph{target} functions. For $c\\in C$ with $s(c)=x$\
\ and $t(c)=y$, we write $x\\xrightarrow{c}y$ or $c:x\\to y$, indicating that\
\ $x$ \\emph{contains} $y$. Each bond $x\\in B$ has an \\emph{identity} containment\
\ $1_x:x\\to x$, meaning every bond trivially contains itself. For $c:x\\to y$\
\ and $c':y\\to z$, their composition is $cc':x\\to z$. These data must satisfy:\n\
\ \\begin{enumerate}\n \\item Identity laws: For each $c:x\\to y$, $1_x\
\ c= c=c1_y$\n \\item Associativity: For $c:x\\to y$, $c':y\\to z$, $c'':z\\\
to w$, $c(c'c'')=(cc')c''$\n \\item Anti-symmetry: For $c:x\\to y$ and\
\ $c':y\\to x$, $x=y$\n \\item Left cancellation: For $c,c':x\\to y$ and\
\ $c'':y\\to z$, if $cc''=c'c''$, then $c=c'$\n \\end{enumerate}"
- "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\\
in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$\
\ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\\
|_{\\cS^q}.\n\\end{align*}"
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT DAPT Embed DAPT Math
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: TESTING
type: TESTING
metrics:
- type: cosine_accuracy@1
value: 0.5679510844485464
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.6324411628980157
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.6586294416243654
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.6938163359483156
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5679510844485464
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.36494385479157054
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.27741116751269035
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.18192201199815417
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.026541702012005317
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.048742014322369596
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.0598887341486898
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.07516536747041261
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.25320633940615317
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.6070309695944213
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.07416668442975916
name: Cosine Map@100
---
# ModernBERT DAPT Embed DAPT Math
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math) <!-- at revision a30384f91d764c272e6b740c256d5581325ea4bb -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
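The last two modules amount to a mean over the non-padding token embeddings followed by L2 normalization. The following is a minimal pure-Python sketch of that computation on toy 3-dimensional vectors rather than the model's 768 dimensions. The function names `mean_pool` and `l2_normalize` are illustrative assumptions and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-zero mask to avoid dividing by zero.
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else list(vector)

# Toy input: three token vectors, the last one is padding and is masked out.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.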
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")
# Run inference
sentences = [
"Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?",
"[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align*}",
'$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition. \n %',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `TESTING`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.568 |
| cosine_accuracy@3 | 0.6324 |
| cosine_accuracy@5 | 0.6586 |
| cosine_accuracy@10 | 0.6938 |
| cosine_precision@1 | 0.568 |
| cosine_precision@3 | 0.3649 |
| cosine_precision@5 | 0.2774 |
| cosine_precision@10 | 0.1819 |
| cosine_recall@1 | 0.0265 |
| cosine_recall@3 | 0.0487 |
| cosine_recall@5 | 0.0599 |
| cosine_recall@10 | 0.0752 |
| **cosine_ndcg@10** | **0.2532** |
| cosine_mrr@10 | 0.607 |
| cosine_map@100 | 0.0742 |
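
The metrics in this table count different things. Accuracy@k asks whether at least one relevant document appears in the top k, precision@k is the fraction of the top k that is relevant, and recall@k is the fraction of all relevant documents that appear in the top k. The sketch below computes them for a single query; the evaluator reports averages over all queries. The document IDs and the helper name `ir_metrics` are hypothetical and only serve the illustration.

```python
def ir_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k and reciprocal rank@k for one query."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document within the top k.
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": reciprocal_rank,
    }

# Hypothetical query with 20 relevant documents (d1..d20); two are in the top 5.
ranked = ["d7", "d50", "d42", "d2", "d99"]
relevant = {f"d{i}" for i in range(1, 21)}
metrics = ir_metrics(ranked, relevant, k=5)
```

In this toy case accuracy is 1.0 while recall is only 0.1, because the query has far more relevant documents than fit in the top 5. A similar effect may explain why the recall values in the table are much lower than the accuracy values.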
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 79,876 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string | string |
| details | <ul><li>min: 9 tokens</li><li>mean: 38.48 tokens</li><li>max: 142 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 210.43 tokens</li><li>max: 924 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 91.02 tokens</li><li>max: 481 tokens</li></ul> |
* Samples:
| anchor | positive | negative |
|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$?</code> | <code>Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$.<br>Then <br>\begin{equation}<br>0 \leq 3g_n -2n \leq 4<br>\label{star}<br>\end{equation}<br>for all $n$, and hence<br>$\lim_{n \rightarrow \infty} g_n/n = 2/3$.<br>\label{thm1}</code> | <code>\label{thm:bounds_initial}<br> Let $\seqq{s}$ be a sequence of rank $r$ for which the roots of the characteristic polynomial are all different. Then, for any positive integer $M$, the rank of $\seq{s^M}$ is at most<br> \begin{align*}<br> \rank s^M \leq \binom{M+r-1}{M}.<br> \end{align*}</code> |
| <code>Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$?</code> | <code>\label{ThmConjAreTrue}<br>Conjectures \ref{Conj1} and \ref{Conj2} are true.<br>As a consequence, <br>if either $d=s \geq 1$ or $d \geq 2s+1 \geq 3$, <br>the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is equal to $g(d,s)$.</code> | <code>[{\cite[Corollary 2.2.2 with $p=3$]{BSY}}]<br> Let $S$ be a non-trivial Severi-Brauer surface over a perfect field $\textbf{k}$. Then $S$ does not contain points of degree $d$, where $d$ is not divisible by $3$. On the other hand $S$ contains a point of degree $3$.</code> |
| <code>\\emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?}</code> | <code>}<br>\newcommand{\ep}{</code> | <code>\label{prop:coherence}<br> If $X$ is a qcqs scheme, then $RX$ is coherent in the sense that the set of quasi-compact open subsets of $RX$ is closed under finite intersections and forms a basis for the topology of $RX$.</code> |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.COSINE",
"triplet_margin": 0.1
}
```
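With these parameters, the loss for one triplet is `max(d(anchor, positive) - d(anchor, negative) + 0.1, 0)`, where `d` is the cosine distance `1 - cosine_similarity`. The following pure-Python sketch illustrates the formula on toy 2-dimensional vectors. It is not the library's implementation, which works on batched tensors and averages over the batch.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge on the gap between the positive and the negative distance."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [1.0, 0.0]
close = [1.0, 0.0]   # same direction as the anchor, distance 0
far = [0.0, 1.0]     # orthogonal to the anchor, distance 1

satisfied = triplet_loss(anchor, positive=close, negative=far)
violated = triplet_loss(anchor, positive=far, negative=close)
```

A triplet contributes zero loss once the negative is at least `triplet_margin` farther from the anchor than the positive, so near-zero values in the training logs indicate batches in which almost every triplet already satisfies the margin.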
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
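To make the interaction of these settings concrete, the sketch below derives the effective batch size and an approximate learning-rate curve (linear warmup followed by cosine decay). It assumes a single training device and takes the total of 2,496 optimizer steps from the training logs; the exact scheduler in Transformers may differ in detail, and `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 2e-5             # learning_rate
WARMUP_RATIO = 0.1         # warmup_ratio
TOTAL_STEPS = 2496         # assumption: last optimizer step shown in the training logs
EFFECTIVE_BATCH = 16 * 8   # per_device_train_batch_size * gradient_accumulation_steps (one device assumed)

WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step):
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

Under these assumptions, each optimizer step sees 128 triplets, so 79,876 samples give roughly 624 steps per epoch, in line with the 2,496 steps logged over 4 epochs.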
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
</details>
### Training Logs
<details><summary>Click to expand</summary>
| Epoch | Step | Training Loss | TESTING_cosine_ndcg@10 |
|:-------:|:-------:|:-------------:|:----------------------:|
| 0.0160 | 10 | 1.1162 | - |
| 0.0320 | 20 | 1.0465 | - |
| 0.0481 | 30 | 0.9663 | - |
| 0.0641 | 40 | 0.8758 | - |
| 0.0801 | 50 | 0.8215 | - |
| 0.0961 | 60 | 0.7492 | - |
| 0.1122 | 70 | 0.6356 | - |
| 0.1282 | 80 | 0.3573 | - |
| 0.1442 | 90 | 0.166 | - |
| 0.1602 | 100 | 0.0797 | - |
| 0.1762 | 110 | 0.046 | - |
| 0.1923 | 120 | 0.0419 | - |
| 0.2083 | 130 | 0.025 | - |
| 0.2243 | 140 | 0.0233 | - |
| 0.2403 | 150 | 0.0205 | - |
| 0.2564 | 160 | 0.0142 | - |
| 0.2724 | 170 | 0.017 | - |
| 0.2884 | 180 | 0.0157 | - |
| 0.3044 | 190 | 0.0104 | - |
| 0.3204 | 200 | 0.0126 | - |
| 0.3365 | 210 | 0.019 | - |
| 0.3525 | 220 | 0.0153 | - |
| 0.3685 | 230 | 0.0171 | - |
| 0.3845 | 240 | 0.0124 | - |
| 0.4006 | 250 | 0.01 | - |
| 0.4166 | 260 | 0.0071 | - |
| 0.4326 | 270 | 0.0125 | - |
| 0.4486 | 280 | 0.0096 | - |
| 0.4647 | 290 | 0.0092 | - |
| 0.4807 | 300 | 0.0067 | - |
| 0.4967 | 310 | 0.0069 | - |
| 0.5127 | 320 | 0.0054 | - |
| 0.5287 | 330 | 0.0107 | - |
| 0.5448 | 340 | 0.0115 | - |
| 0.5608 | 350 | 0.0083 | - |
| 0.5768 | 360 | 0.0175 | - |
| 0.5928 | 370 | 0.0162 | - |
| 0.6089 | 380 | 0.0094 | - |
| 0.6249 | 390 | 0.0124 | - |
| 0.6409 | 400 | 0.0078 | - |
| 0.6569 | 410 | 0.014 | - |
| 0.6729 | 420 | 0.0117 | - |
| 0.6890 | 430 | 0.0097 | - |
| 0.7050 | 440 | 0.0094 | - |
| 0.7210 | 450 | 0.0077 | - |
| 0.7370 | 460 | 0.0103 | - |
| 0.7531 | 470 | 0.0099 | - |
| 0.7691 | 480 | 0.0123 | - |
| 0.7851 | 490 | 0.0103 | - |
| 0.8011 | 500 | 0.0098 | - |
| 0.8171 | 510 | 0.0059 | - |
| 0.8332 | 520 | 0.0031 | - |
| 0.8492 | 530 | 0.0075 | - |
| 0.8652 | 540 | 0.0101 | - |
| 0.8812 | 550 | 0.0099 | - |
| 0.8973 | 560 | 0.0098 | - |
| 0.9133 | 570 | 0.0072 | - |
| 0.9293 | 580 | 0.0057 | - |
| 0.9453 | 590 | 0.0074 | - |
| 0.9613 | 600 | 0.0038 | - |
| 0.9774 | 610 | 0.0127 | - |
| 0.9934 | 620 | 0.0098 | - |
| **1.0** | **625** | **-** | **0.2532** |
| 1.0080 | 630 | 0.0064 | - |
| 1.0240 | 640 | 0.0066 | - |
| 1.0401 | 650 | 0.0056 | - |
| 1.0561 | 660 | 0.0031 | - |
| 1.0721 | 670 | 0.0023 | - |
| 1.0881 | 680 | 0.0032 | - |
| 1.1041 | 690 | 0.0021 | - |
| 1.1202 | 700 | 0.0011 | - |
| 1.1362 | 710 | 0.006 | - |
| 1.1522 | 720 | 0.0045 | - |
| 1.1682 | 730 | 0.0041 | - |
| 1.1843 | 740 | 0.0026 | - |
| 1.2003 | 750 | 0.0019 | - |
| 1.2163 | 760 | 0.0058 | - |
| 1.2323 | 770 | 0.0054 | - |
| 1.2483 | 780 | 0.0066 | - |
| 1.2644 | 790 | 0.0033 | - |
| 1.2804 | 800 | 0.004 | - |
| 1.2964 | 810 | 0.0028 | - |
| 1.3124 | 820 | 0.0027 | - |
| 1.3285 | 830 | 0.0017 | - |
| 1.3445 | 840 | 0.0009 | - |
| 1.3605 | 850 | 0.0048 | - |
| 1.3765 | 860 | 0.0037 | - |
| 1.3925 | 870 | 0.0045 | - |
| 1.4086 | 880 | 0.0043 | - |
| 1.4246 | 890 | 0.0046 | - |
| 1.4406 | 900 | 0.0023 | - |
| 1.4566 | 910 | 0.0031 | - |
| 1.4727 | 920 | 0.0027 | - |
| 1.4887 | 930 | 0.0022 | - |
| 1.5047 | 940 | 0.0042 | - |
| 1.5207 | 950 | 0.0026 | - |
| 1.5368 | 960 | 0.0049 | - |
| 1.5528 | 970 | 0.0024 | - |
| 1.5688 | 980 | 0.0019 | - |
| 1.5848 | 990 | 0.0038 | - |
| 1.6008 | 1000 | 0.0036 | - |
| 1.6169 | 1010 | 0.0023 | - |
| 1.6329 | 1020 | 0.0021 | - |
| 1.6489 | 1030 | 0.0011 | - |
| 1.6649 | 1040 | 0.0025 | - |
| 1.6810 | 1050 | 0.0026 | - |
| 1.6970 | 1060 | 0.0034 | - |
| 1.7130 | 1070 | 0.0024 | - |
| 1.7290 | 1080 | 0.0038 | - |
| 1.7450 | 1090 | 0.002 | - |
| 1.7611 | 1100 | 0.0046 | - |
| 1.7771 | 1110 | 0.0003 | - |
| 1.7931 | 1120 | 0.0062 | - |
| 1.8091 | 1130 | 0.0057 | - |
| 1.8252 | 1140 | 0.0012 | - |
| 1.8412 | 1150 | 0.0021 | - |
| 1.8572 | 1160 | 0.0038 | - |
| 1.8732 | 1170 | 0.0024 | - |
| 1.8892 | 1180 | 0.0026 | - |
| 1.9053 | 1190 | 0.0034 | - |
| 1.9213 | 1200 | 0.0064 | - |
| 1.9373 | 1210 | 0.0041 | - |
| 1.9533 | 1220 | 0.0032 | - |
| 1.9694 | 1230 | 0.0028 | - |
| 1.9854 | 1240 | 0.0009 | - |
| 2.0 | 1250 | 0.0042 | 0.2488 |
| 2.0160 | 1260 | 0.0005 | - |
| 2.0320 | 1270 | 0.0018 | - |
| 2.0481 | 1280 | 0.0009 | - |
| 2.0641 | 1290 | 0.001 | - |
| 2.0801 | 1300 | 0.0024 | - |
| 2.0961 | 1310 | 0.0011 | - |
| 2.1122 | 1320 | 0.0008 | - |
| 2.1282 | 1330 | 0.0001 | - |
| 2.1442 | 1340 | 0.0006 | - |
| 2.1602 | 1350 | 0.0005 | - |
| 2.1762 | 1360 | 0.0003 | - |
| 2.1923 | 1370 | 0.0 | - |
| 2.2083 | 1380 | 0.0 | - |
| 2.2243 | 1390 | 0.0001 | - |
| 2.2403 | 1400 | 0.0001 | - |
| 2.2564 | 1410 | 0.0027 | - |
| 2.2724 | 1420 | 0.0005 | - |
| 2.2884 | 1430 | 0.0007 | - |
| 2.3044 | 1440 | 0.0001 | - |
| 2.3204 | 1450 | 0.0002 | - |
| 2.3365 | 1460 | 0.001 | - |
| 2.3525 | 1470 | 0.0003 | - |
| 2.3685 | 1480 | 0.001 | - |
| 2.3845 | 1490 | 0.0 | - |
| 2.4006 | 1500 | 0.0006 | - |
| 2.4166 | 1510 | 0.0007 | - |
| 2.4326 | 1520 | 0.0007 | - |
| 2.4486 | 1530 | 0.0004 | - |
| 2.4647 | 1540 | 0.0007 | - |
| 2.4807 | 1550 | 0.0012 | - |
| 2.4967 | 1560 | 0.0015 | - |
| 2.5127 | 1570 | 0.0014 | - |
| 2.5287 | 1580 | 0.0005 | - |
| 2.5448 | 1590 | 0.0005 | - |
| 2.5608 | 1600 | 0.0014 | - |
| 2.5768 | 1610 | 0.0016 | - |
| 2.5928 | 1620 | 0.0 | - |
| 2.6089 | 1630 | 0.0002 | - |
| 2.6249 | 1640 | 0.0006 | - |
| 2.6409 | 1650 | 0.0002 | - |
| 2.6569 | 1660 | 0.0003 | - |
| 2.6729 | 1670 | 0.0007 | - |
| 2.6890 | 1680 | 0.0005 | - |
| 2.7050 | 1690 | 0.0007 | - |
| 2.7210 | 1700 | 0.0 | - |
| 2.7370 | 1710 | 0.0008 | - |
| 2.7531 | 1720 | 0.0019 | - |
| 2.7691 | 1730 | 0.0017 | - |
| 2.7851 | 1740 | 0.0002 | - |
| 2.8011 | 1750 | 0.0002 | - |
| 2.8171 | 1760 | 0.0002 | - |
| 2.8332 | 1770 | 0.0014 | - |
| 2.8492 | 1780 | 0.0005 | - |
| 2.8652 | 1790 | 0.0021 | - |
| 2.8812 | 1800 | 0.002 | - |
| 2.8973 | 1810 | 0.0021 | - |
| 2.9133 | 1820 | 0.0007 | - |
| 2.9293 | 1830 | 0.0 | - |
| 2.9453 | 1840 | 0.0011 | - |
| 2.9613 | 1850 | 0.0006 | - |
| 2.9774 | 1860 | 0.0008 | - |
| 2.9934 | 1870 | 0.0001 | - |
| 3.0 | 1875 | - | 0.2516 |
| 3.0080 | 1880 | 0.0033 | - |
| 3.0240 | 1890 | 0.0 | - |
| 3.0401 | 1900 | 0.0 | - |
| 3.0561 | 1910 | 0.0009 | - |
| 3.0721 | 1920 | 0.0001 | - |
| 3.0881 | 1930 | 0.001 | - |
| 3.1041 | 1940 | 0.0001 | - |
| 3.1202 | 1950 | 0.0001 | - |
| 3.1362 | 1960 | 0.0 | - |
| 3.1522 | 1970 | 0.0003 | - |
| 3.1682 | 1980 | 0.0001 | - |
| 3.1843 | 1990 | 0.0005 | - |
| 3.2003 | 2000 | 0.0 | - |
| 3.2163 | 2010 | 0.0 | - |
| 3.2323 | 2020 | 0.0 | - |
| 3.2483 | 2030 | 0.0 | - |
| 3.2644 | 2040 | 0.0 | - |
| 3.2804 | 2050 | 0.0 | - |
| 3.2964 | 2060 | 0.0001 | - |
| 3.3124 | 2070 | 0.0001 | - |
| 3.3285 | 2080 | 0.0 | - |
| 3.3445 | 2090 | 0.0001 | - |
| 3.3605 | 2100 | 0.0 | - |
| 3.3765 | 2110 | 0.0005 | - |
| 3.3925 | 2120 | 0.0001 | - |
| 3.4086 | 2130 | 0.0 | - |
| 3.4246 | 2140 | 0.0 | - |
| 3.4406 | 2150 | 0.0004 | - |
| 3.4566 | 2160 | 0.0005 | - |
| 3.4727 | 2170 | 0.0 | - |
| 3.4887 | 2180 | 0.0006 | - |
| 3.5047 | 2190 | 0.0002 | - |
| 3.5207 | 2200 | 0.0007 | - |
| 3.5368 | 2210 | 0.0 | - |
| 3.5528 | 2220 | 0.0 | - |
| 3.5688 | 2230 | 0.0008 | - |
| 3.5848 | 2240 | 0.0001 | - |
| 3.6008 | 2250 | 0.0013 | - |
| 3.6169 | 2260 | 0.0004 | - |
| 3.6329 | 2270 | 0.0006 | - |
| 3.6489 | 2280 | 0.0001 | - |
| 3.6649 | 2290 | 0.0 | - |
| 3.6810 | 2300 | 0.0011 | - |
| 3.6970 | 2310 | 0.0005 | - |
| 3.7130 | 2320 | 0.0 | - |
| 3.7290 | 2330 | 0.0 | - |
| 3.7450 | 2340 | 0.0006 | - |
| 3.7611 | 2350 | 0.0 | - |
| 3.7771 | 2360 | 0.0002 | - |
| 3.7931 | 2370 | 0.0006 | - |
| 3.8091 | 2380 | 0.0002 | - |
| 3.8252 | 2390 | 0.0004 | - |
| 3.8412 | 2400 | 0.0 | - |
| 3.8572 | 2410 | 0.0007 | - |
| 3.8732 | 2420 | 0.0006 | - |
| 3.8892 | 2430 | 0.0002 | - |
| 3.9053 | 2440 | 0.0009 | - |
| 3.9213 | 2450 | 0.0009 | - |
| 3.9373 | 2460 | 0.0 | - |
| 3.9533 | 2470 | 0.0001 | - |
| 3.9694 | 2480 | 0.0012 | - |
| 3.9854 | 2490 | 0.0003 | - |
| 3.9950 | 2496 | - | 0.2524 |
| -1 | -1 | - | 0.2532 |
* The bold row denotes the saved checkpoint.
</details>
### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### TripletLoss
```bibtex
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```