Add new SentenceTransformer model

c806aec verified 10 months ago


	---
	language:
	- en
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:79876
	- loss:TripletLoss
	base_model: Master-thesis-NAP/ModernBert-DAPT-math
	widget:
	- source_sentence: What is the error estimate for the difference between the exact
	solution and the local oscillation decomposition (LOD) solution in terms of the
	$L_0$ norm?
	sentences:
	- '\label{RL1}

	The system \eqref{R3} has the following positive fixed points if $0 <\alpha\leq1$
	and $b>d$

	$$E^*=\left(\dfrac{d}{b}, \dfrac{(b-d) r}{b^2}\right)$$'
	- "\\label{theo1d}\nWith the assumptions and setting is this section, the finite\
	\ difference solution computed using the improved harmonic average method applied\
	\ to \\eqn{eq1d} or \\eqn{eq1dB} has second order convergence in the infinity\
	\ norm, that is,\n\\eqm\n \\\|\\mathbf{E} \\\|_{\\infty}\\le C h^2,\n\\enm\nassuming\
	\ that the true solution of \\eqn{eq1d} is piecewise $C^4$ excluding the interface\
	\ $\\alf$, that is, \n$u(x) \\in C^4(0,\\alf) \\cup C^4(\\alf,1)$. \n%where $C$\
	\ is a generic error constant."
	- "\\label{Corollary}\n Let Assumptions~\\ref{assum_1} and~\\ref{assump2} be\
	\ satisfied. Let $u$ be the solution of~\\eqref{WeakForm} and let $u_{H,k}$ be\
	\ the LOD solution of~\\eqref{local_probelm }. Then we have \n \\begin{equation}\\\
	label{L2Estimate}\n \\\|u-I_Hu_{H,k}\\\|_0\\lesssim \\\|u-I_Hu\\\|_0+\\\|u-u_{H,k}\\\
	\|_0 +H\|u-u_{H,k}\|_1.\n \\end{equation}\n %\\[\\\|u-I_Hu_{H,k}\\\|_0\\lesssim\
	\ H \|u\|_1 +\|u-u_{H,k}\|_1.\\]"
	- source_sentence: What is the expected value of the number of individuals in a Markov
	branching process with non-homogeneous Poisson immigration (MBPNPI) at time $t=0$,
	given that the immigration rate is $\lambda$?
	sentences:
	- '\label{lemma-sampling}

	Fix an integer~$n\geq 1$.

	Consider the initial configuration with one active particle on each

	site of~$V_n$ and let the system evolve, with particles being killed

	when they jump out of~$V_n$, until no active particle remains

	in~$V_n$.

	Then the distribution of the resulting stable configuration is exactly

	the stationary distribution of the driven-dissipative Markov chain

	on~$V_n$.

	In particular, the number of sleeping particles remaining in~$V_n$ is

	distributed as~$S_n$.'
	- "The process $Y(t)$, $t\\geq 0,$ is called Markov branching process with\r\nnon-homogeneous\
	\ Poisson immigration (MBPNPI)."
	- "For any $\\lambda \\in(0,1)$ and $s \\in\\mathbb N$,\n \\begin{equation*}\n\\\
	sum_{k=s}^{\\infty}\\binom {k}{s}\n(1-\\lambda)^{k-s}= \\lambda^{-s-1}.\n\\\
	end{equation*}"
	- source_sentence: Does the theorem imply that the rate of convergence of the sequence
	$T_{m,j}(E)$ to $T_{m+k_n,j+k_n}(E)$ is exponential in the distance between $m$
	and $j$, and that this rate is bounded by a constant $C$ times an exponential
	decay factor involving the parameter $\gamma$?
	sentences:
	- "\\label{lem1}\n\t\tFor all $m,j\\in\\Z$, we have\n\t\t\\begin{equation*}\n\t\
	\t\|\| T_{m,j} (E)-T_{m+k_n,j+k_n}(E)\|\|\\leq C e^{-\\gamma k_n} e^{(\\mathcal\
	\ L(E)+\\varepsilon) \|m-j\|}. \n\t\t\\end{equation*}"
	- "[Divergence Theorem or Gauss-Green Theorem for Surfaces in $\\R^3$]\n\t\\label{thm:surface_int}\n\
	\t Let $\\Sigma \\subset \\Omega\\subseteq\\R^3$ be a bounded smooth surface.\n\
	\t Further, $\\bb a:\\Sigma\\to\\R^3$ is a continuously differentiable\
	\ vector field that is either defined on the\n\t\t\t\t\tboundary $\\partial\\\
	Sigma$ or has a bounded continuous extension to this boundary.\n\t Like\
	\ in \\eqref{eq:decomp} it may be decomposed into tangential and normal components\n\
	\t\t\t\t\tas follows $\\bb a = \\bb a^\\shortparallel + a_\\nu\\bs\\nu_\\Sigma$.\
	\ By $\\dd l$ we denote the line element on \n\t\t\t\t\tthe curve $\\partial \\\
	Sigma$. We assume that the curve is continuous and consists of finitely many\n\
	\t\t\t\t\tsmooth pieces.\n\t Then the following divergence formula for\
	\ surface integrals holds\n\t %\n\t \\begin{align}\n\t \
	\ %\n\t \\int\\limits_\\Sigma \\left[\\nabla_\\Sigma\\cdot\\bb a^\\\
	shortparallel\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\partial\\\
	Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\,\\dd l .\n\
	\t \\label{eq:surface_div}\n\t %\n\t \\end{align}\n\
	\t\t\t\t\t%\n\t\t\t\t\tFrom this we obtain the formula\n\t\t\t\t\t%\n\t \
	\ \\begin{align}\n\t %\n\t \\int\\limits_\\Sigma \\left[\\\
	nabla_\\Sigma\\cdot\\bb a\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\\
	partial\\Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\\
	,\\dd l \n\t\t\t\t\t\t\t-\\int\\limits_\\Sigma\\left[ 2\\kappa_Ma_\\nu\\right](\\\
	x)\\;\\dd S.\n\t \\label{eq:surface_div_2}\n\t %\n\t \
	\ \\end{align}\n\t %"
	- '\label{theo:helper3}

	Assume that $\{\PP_N\}_{N\ge 1}$ is a sequence of probability measures that is
	HT-appropriate in the sense of \cref{def:appropriate} and satisfies the LLN in
	the sense of \cref{def:LLN}.

	Let $(\kappa_n)_{n\ge 1}$ and $(m_n)_{n\ge 1}$ be the sequences that arise from
	these definitions.

	Moreover, assume that there exists a constant $C>0$ such that $\|\kappa_n\|\leq
	C^n$, for all $n \geq 1$.

	Then $(m_n)_{n\ge 1}$ is the sequence of moments of a unique probability measure
	on $\R$.'
	- source_sentence: What is the error estimate for the eigenfunction approximation
	in terms of the weak eigenvalue and the norm of the difference between the exact
	and approximate eigenfunctions?
	sentences:
	- "Consider dynamics \\eqref{avg} and define the corresponding average dynamics\
	\ as $\\label{T-avg}\n\\mathring{\\chi} = \\epsilon h_{av}(\\chi)$, with the average\
	\ function defined as\n\\begin{equation*} \nh_{av}(\\chi):=\\lim_{T \\to \\infty}\
	\ \\frac{1}{T}\\int_{t}^{t+T} h(\\mu, \\chi, 0) d \\mu, \\ T>0,\n\\end{equation*}\n\
	both \\eqref{avg} and \\eqref{T-avg} twice differentiable and bounded in every\
	\ compact set of the $\\chi$-domain $\\mathcal{D} \\subset \\mathbb{R}^{3}$. \n\
	%\nLet $\\chi(\\tau,\\epsilon)$ and $\\chi_{av}(\\epsilon\\tau)$ denote the solutions\
	\ of \\eqref{avg} and \\eqref{T-avg}, respectively. If $\\chi_{av}(\\epsilon\\\
	tau)\\in \\mathcal{D}$ for all $\\tau\\in[0,\\zeta/\\epsilon]$, $\\zeta\\geq 0$,\
	\ and $\\chi(0,\\epsilon) - \\chi_{av}(0)=\\mathcal{O}(\\nu(\\epsilon))$, then\
	\ there exists an $\\epsilon^{}>0$ such that for all $0<\\epsilon<\\epsilon^{}$,\
	\ $\\chi(\\tau,\\epsilon)$ is well defined and\n$$\n\\chi(\\tau,\\epsilon) - \\\
	chi_{av}(\\epsilon\\tau) = \\mathcal{O}(\\nu(\\epsilon)) \\ \\textnormal{on} \\\
	\ \\tau \\in [0, \\zeta/\\epsilon],\n$$\nfor some function $\\nu\\in \\mathcal{K}$."
	- "(\\cite{DangWangXieZhou})\\label{Theorem_Error_Estimate_k}\nLet us define the\
	\ spectral projection $F_{k,h}^{(\\ell)}: V\\mapsto {\\rm span}\\{u_{1,h}^{(\\\
	ell)}, \\cdots, u_{k,h}^{(\\ell)}\\}$ for any integer $\\ell \\geq 1$ as follows:\n\
	\\begin{eqnarray*}\na(F_{k,h}^{(\\ell)}w, u_{i,h}^{(\\ell)}) = a(w, u_{i,h}^{(\\\
	ell)}), \\ \\ \\ i=1, \\cdots, k\\ \\ {\\rm for}\\ w\\in V.\n\\end{eqnarray*}\n\
	Then the exact eigenfunctions $\\bar u_{1,h},\\cdots, \\bar u_{k,h}$ of (\\ref{Weak_Eigenvalue_Discrete})\
	\ and the eigenfunction approximations $u_{1,h}^{(\\ell+1)}$, $\\cdots$, $u_{k,h}^{(\\\
	ell+1)}$ from Algorithm \\ref{Algorithm_k} with the integer $\\ell > 1$ have the\
	\ following error estimate:\n\\begin{eqnarray*}\\label{Error_Estimate_Inverse}\n\
	\ \\left\\\|\\bar u_{i,h} - F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\\|_a \\leq\n\
	\ \\bar\\lambda_{i,h} \\sqrt{1+\\frac{\\eta_a^2(V_H)}{\\bar\\lambda_{1,h}\\big(\\\
	delta_{k,i,h}^{(\\ell+1)}\\big)^2}}\n\\left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\\
	ell)}}\\right)\\eta_a^2(V_H)\\left\\\|\\bar u_{i,h} - F_{k,h}^{(\\ell)}\\bar u_{i,h}\
	\ \\right\\\|_a,\n\\end{eqnarray*}\nwhere $\\delta_{k,i,h}^{(\\ell)} $ is defined\
	\ as follows:\n\\begin{eqnarray*}\n\\delta_{k,i,h}^{(\\ell)} = \\min_{j\\not\\\
	in \\{1, \\cdots, k\\}}\\left\|\\frac{1}{\\lambda_{j,h}^{(\\ell)}}-\\frac{1}{\\\
	bar\\lambda_{i,h}}\\right\|,\\ \\ \\ i=1, \\cdots, k.\n\\end{eqnarray*}\nFurthermore,\
	\ the following $\\left\\\|\\cdot\\right\\\|_b$-norm error estimate holds:\n\\begin{eqnarray*}\n\
	\\left\\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\\|_b\\leq \n\\\
	left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\ell+1)}}\\right)\\eta_a(V_H)\
	\ \\left\\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h}\\right\\\|_a.\n\\end{eqnarray*}"
	- "\\big[{\\bf Condition $SD1(h)$}\\big]\\label{DefnSD1(h)}\n\nIn \\cite{MDL} an\
	\ approximation order $O(h^s)$, as $h\\to 0$, is proved, where $h$ is the sampling\
	\ distance. The achievable order $s$ is of course limited by the smoothness order\
	\ of the boundaries of $Graph(F)$. Then, the order $s$ depends upon the degree\
	\ of the polynomials used to approximate the boundary near the neighborhood of\
	\ points of topology change and upon the degree of splines used at regular regions.\
	\ \n\nFor example, let us view Step C of the approximation algorithm described\
	\ in Section 5.2 of \\cite{MDL}. \nIt is assumed that the boundary curves are\
	\ $C^{2k}$ smooth, and it is implicitly assumed that $h$ is small enough so that\
	\ there are $2k$ sample points close to the point of topology change, for computing\
	\ the polynomial $p_{2k-1}$ therein.\nThis condition is related to the more general\
	\ condition $SD(h)$ and it can serve as a practical way of checking it for the\
	\ case $d=1$. That is, near a point of topology change, we check whether there\
	\ are enough sample points for applying the approximation algorithm in \\cite{MDL}.\
	\ We denote this condition as the $SD1(h)$ condition."
	- source_sentence: Does Werner-Young's inequality imply that the convolution of two
	$L^p$ spaces is always $L^r$ for $1 < r < \infty$?
	sentences:
	- "$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1\
	\ < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.\
	\ \n %"
	- "A \\emph{bond system} is a tuple $(B,C,s,t,1,\\cdot)$, where $B$ is a set of\
	\ \\emph{bonds}, $C$ is a set of \\emph{content} relations, and $s,t:C\\to B$\
	\ are \\emph{source} and \\emph{target} functions. For $c\\in C$ with $s(c)=x$\
	\ and $t(c)=y$, we write $x\\xrightarrow{c}y$ or $c:x\\to y$, indicating that\
	\ $x$ \\emph{contains} $y$. Each bond $x\\in B$ has an \\emph{identity} containment\
	\ $1_x:x\\to x$, meaning every bond trivially contains itself. For $c:x\\to y$\
	\ and $c':y\\to z$, their composition is $cc':x\\to z$. These data must satisfy:\n\
	\ \\begin{enumerate}\n \\item Identity laws: For each $c:x\\to y$, $1_x\
	\ c= c=c1_y$\n \\item Associativity: For $c:x\\to y$, $c':y\\to z$, $c'':z\\\
	to w$, $c(c'c'')=(cc')c''$\n \\item Anti-symmetry: For $c:x\\to y$ and\
	\ $c':y\\to x$, $x=y$\n \\item Left cancellation: For $c,c':x\\to y$ and\
	\ $c'':y\\to z$, if $cc''=c'c''$, then $c=c'$\n \\end{enumerate}"
	- "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\\
	in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$\
	\ and\n\\begin{align*}\n \\\|S\\star T\\\|_{L^{r}}\\leq \\\|S\\\|_{\\cS^p}\\\|T\\\
	\|_{\\cS^q}.\n\\end{align*}"
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: ModernBERT DAPT Embed DAPT Math
	  results:
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: TESTING
	      type: TESTING
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.5679510844485464
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.6324411628980157
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 0.6586294416243654
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 0.6938163359483156
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.5679510844485464
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.36494385479157054
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.27741116751269035
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.18192201199815417
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.026541702012005317
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.048742014322369596
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 0.0598887341486898
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 0.07516536747041261
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.25320633940615317
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.6070309695944213
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.07416668442975916
	      name: Cosine Map@100
	---

	# ModernBERT DAPT Embed DAPT Math

	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math) <!-- at revision a30384f91d764c272e6b740c256d5581325ea4bb -->
	- Maximum Sequence Length: 8192 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	<!-- - Training Dataset: Unknown -->
	- Language: en
	- License: apache-2.0

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
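The `Pooling` module above averages token embeddings over non-padding positions (`pooling_mode_mean_tokens: True`), and `Normalize()` rescales each sentence vector to unit length, so the cosine similarity between two embeddings reduces to a plain dot product. The NumPy sketch below mirrors those two steps on toy arrays; it is an illustration under those assumptions rather than the library's own implementation, and the 4-dimensional vectors stand in for the real 768 dimensions.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over tokens, then L2 normalization.

    token_embeddings: (batch, seq_len, dim) hidden states from the transformer.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard against all-padding rows
    pooled = summed / counts                                         # masked mean, (batch, dim)
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                      # unit-length rows


# Toy input: 2 texts, 3 token positions, 4-dim vectors (the real model uses 768 dims).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second text has one padding position
emb = mean_pool_and_normalize(tokens, mask)

# With unit-length rows, the dot product is the cosine similarity.
cosine = emb @ emb.T
print(cosine.round(4))
```

The padding position of the second text is excluded from its mean, which is why the mask is applied before summing rather than after.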

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")
	# Run inference
	sentences = [
	"Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?",
	"[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align}",
	'$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition. \n %',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
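Because the embeddings come out unit-normalized, semantic search over a corpus can be sketched as one matrix product followed by a sort. The hand-made 3-dimensional vectors below are hypothetical stand-ins for `model.encode(...)` output; Sentence Transformers also provides `util.semantic_search` for the same task on larger corpora.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (the model's Normalize() layer already does this)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int) -> list:
    """For each query, return the k (corpus_index, score) pairs with the highest cosine score.

    Assumes both inputs are L2-normalized, so the dot product equals cosine similarity.
    """
    scores = query_emb @ corpus_emb.T                 # (n_queries, n_corpus)
    order = np.argsort(-scores, axis=1)[:, :k]        # indices sorted best first
    return [[(int(j), float(scores[i, j])) for j in row] for i, row in enumerate(order)]


# Stand-ins for model.encode(corpus) and model.encode(queries).
corpus = l2_normalize(np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]))
queries = l2_normalize(np.array([[0.9, 0.1, 0.0]]))

for hits in top_k(queries, corpus, k=2):
    print(hits)
```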

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Dataset: `TESTING`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value      |
	|:--------------------|:-----------|
	| cosine_accuracy@1   | 0.568      |
	| cosine_accuracy@3   | 0.6324     |
	| cosine_accuracy@5   | 0.6586     |
	| cosine_accuracy@10  | 0.6938     |
	| cosine_precision@1  | 0.568      |
	| cosine_precision@3  | 0.3649     |
	| cosine_precision@5  | 0.2774     |
	| cosine_precision@10 | 0.1819     |
	| cosine_recall@1     | 0.0265     |
	| cosine_recall@3     | 0.0487     |
	| cosine_recall@5     | 0.0599     |
	| cosine_recall@10    | 0.0752     |
	| cosine_ndcg@10      | 0.2532     |
	| cosine_mrr@10       | 0.607      |
	| cosine_map@100      | 0.0742     |
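The gap between accuracy@10 (0.6938) and recall@10 (0.0752) follows from the metric definitions: accuracy@k only asks whether any relevant document appears in the top k, while recall@k divides the hits by the total number of relevant documents for the query. The gap is therefore consistent with `TESTING` queries having many relevant documents each, although the card does not state this directly. The sketch below uses the standard binary-relevance definitions, which are believed to match `InformationRetrievalEvaluator` but are written independently here; the query and document ids are hypothetical.

```python
import math


def ir_metrics_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Binary-relevance retrieval metrics for a single query.

    ranked_ids: corpus ids sorted by descending similarity.
    relevant:   ids judged relevant for this query.
    """
    top = ranked_ids[:k]
    hits = [1 if doc in relevant else 0 for doc in top]
    first_hit = next((i for i, h in enumerate(hits) if h), None)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))  # ideal ranking
    return {
        "accuracy": float(any(hits)),           # at least one relevant doc in the top k
        "precision": sum(hits) / k,             # fraction of the top k that is relevant
        "recall": sum(hits) / len(relevant),    # fraction of all relevant docs retrieved
        "mrr": 0.0 if first_hit is None else 1 / (first_hit + 1),
        "ndcg": dcg / idcg,
    }


# Hypothetical query with 20 relevant documents; only the top-ranked result is relevant.
relevant = {f"doc{i}" for i in range(20)}
ranking = ["doc0"] + [f"other{i}" for i in range(9)]
m = ir_metrics_at_k(ranking, relevant, k=10)
print(m)
```

In this toy case accuracy and MRR are both 1.0 while recall@10 is only 0.05 and nDCG@10 stays low, the same qualitative pattern as the table above.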

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### Unnamed Dataset

	* Size: 79,876 training samples
	* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
	* Approximate statistics based on the first 1000 samples:
	|         | anchor | positive | negative |
	|:--------|:-------|:---------|:---------|
	| type    | string | string   | string   |
	| details | <ul><li>min: 9 tokens</li><li>mean: 38.48 tokens</li><li>max: 142 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 210.43 tokens</li><li>max: 924 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 91.02 tokens</li><li>max: 481 tokens</li></ul> |
	* Samples:
	| anchor | positive | negative |
	|:-------|:---------|:---------|
	| <code>What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$?</code> | <code>Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$.<br>Then <br>\begin{equation}<br>0 \leq 3g_n -2n \leq 4<br>\label{star}<br>\end{equation}<br>for all $n$, and hence<br>$\lim_{n \rightarrow \infty} g_n/n = 2/3$.<br>\label{thm1}</code> | <code>\label{thm:bounds_initial}<br> Let $\seqq{s}$ be a sequence of rank $r$ for which the roots of the characteristic polynomial are all different. Then, for any positive integer $M$, the rank of $\seq{s^M}$ is at most<br> \begin{align}<br> \rank s^M \leq \binom{M+r-1}{M}.<br> \end{align}</code> |
	| <code>Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$?</code> | <code>\label{ThmConjAreTrue}<br>Conjectures \ref{Conj1} and \ref{Conj2} are true.<br>As a consequence, <br>if either $d=s \geq 1$ or $d \geq 2s+1 \geq 3$, <br>the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is equal to $g(d,s)$.</code> | <code>[{\cite[Corollary 2.2.2 with $p=3$]{BSY}}]<br> Let $S$ be a non-trivial Severi-Brauer surface over a perfect field $\textbf{k}$. Then $S$ does not contain points of degree $d$, where $d$ is not divisible by $3$. On the other hand $S$ contains a point of degree $3$.</code> |
	| <code>\\emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?}</code> | <code>}<br>\newcommand{\ep}{</code> | <code>\label{prop:coherence}<br> If $X$ is a qcqs scheme, then $RX$ is coherent in the sense that the set of quasi-compact open subsets of $RX$ is closed under finite intersections and forms a basis for the topology of $RX$.</code> |
	* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
	```json
	{
	"distance_metric": "TripletDistanceMetric.COSINE",
	"triplet_margin": 0.1
	}
	```
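With `TripletDistanceMetric.COSINE` the distance is one minus cosine similarity, and `TripletLoss` penalizes any triplet whose negative is not at least `triplet_margin` (0.1 here) further from the anchor than the positive: loss = max(d(a, p) - d(a, n) + margin, 0), averaged over the batch. A NumPy sketch with hypothetical 2-dimensional embeddings:

```python
import numpy as np


def cosine_distance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Row-wise 1 - cosine similarity."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - np.sum(x * y, axis=1)


def triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                 margin: float = 0.1) -> float:
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) over the batch."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 1.0]])  # row 0: identical to anchor; row 1: orthogonal
negative = np.array([[0.0, 1.0], [1.0, 0.0]])  # row 0: orthogonal; row 1: identical to anchor

print(triplet_loss(anchor, positive, negative))
```

The first triplet already satisfies the margin and contributes zero; the second is fully inverted and contributes 1.1, so the batch mean is 0.55.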

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: epoch
	- `per_device_train_batch_size`: 16
	- `per_device_eval_batch_size`: 16
	- `gradient_accumulation_steps`: 8
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 4
	- `lr_scheduler_type`: cosine
	- `warmup_ratio`: 0.1
	- `bf16`: True
	- `tf32`: True
	- `load_best_model_at_end`: True
	- `optim`: adamw_torch_fused
	- `batch_sampler`: no_duplicates
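These settings determine the optimizer step count and the learning-rate curve. The sketch below assumes a single training device and that steps per epoch equal ceil(samples / effective batch size); under those assumptions it reproduces the 625 steps per epoch seen in the training logs (epoch 1.0 at step 625). The learning-rate function is written by hand in the usual linear-warmup-then-cosine shape (the same shape as `transformers.get_cosine_schedule_with_warmup`), not taken from the training code.

```python
import math

# Values from the non-default hyperparameters above; one training device is assumed.
num_samples = 79_876
per_device_batch = 16
grad_accum = 8
epochs = 4
warmup_ratio = 0.1
peak_lr = 2e-5

effective_batch = per_device_batch * grad_accum              # samples per optimizer step
steps_per_epoch = math.ceil(num_samples / effective_batch)   # assumption: partial last step counted
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half a cosine cycle down to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
print(lr_at(warmup_steps), lr_at(total_steps))
```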

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: epoch
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 16
	- `per_device_eval_batch_size`: 16
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 8
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 4
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: True
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: True
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `tp_size`: 0
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	<details><summary>Click to expand</summary>

	| Epoch | Step | Training Loss | TESTING_cosine_ndcg@10 |
	|:-------:|:-------:|:-------------:|:----------------------:|
	| 0.0160 | 10 | 1.1162 | - |
	| 0.0320 | 20 | 1.0465 | - |
	| 0.0481 | 30 | 0.9663 | - |
	| 0.0641 | 40 | 0.8758 | - |
	| 0.0801 | 50 | 0.8215 | - |
	| 0.0961 | 60 | 0.7492 | - |
	| 0.1122 | 70 | 0.6356 | - |
	| 0.1282 | 80 | 0.3573 | - |
	| 0.1442 | 90 | 0.166 | - |
	| 0.1602 | 100 | 0.0797 | - |
	| 0.1762 | 110 | 0.046 | - |
	| 0.1923 | 120 | 0.0419 | - |
	| 0.2083 | 130 | 0.025 | - |
	| 0.2243 | 140 | 0.0233 | - |
	| 0.2403 | 150 | 0.0205 | - |
	| 0.2564 | 160 | 0.0142 | - |
	| 0.2724 | 170 | 0.017 | - |
	| 0.2884 | 180 | 0.0157 | - |
	| 0.3044 | 190 | 0.0104 | - |
	| 0.3204 | 200 | 0.0126 | - |
	| 0.3365 | 210 | 0.019 | - |
	| 0.3525 | 220 | 0.0153 | - |
	| 0.3685 | 230 | 0.0171 | - |
	| 0.3845 | 240 | 0.0124 | - |
	| 0.4006 | 250 | 0.01 | - |
	| 0.4166 | 260 | 0.0071 | - |
	| 0.4326 | 270 | 0.0125 | - |
	| 0.4486 | 280 | 0.0096 | - |
	| 0.4647 | 290 | 0.0092 | - |
	| 0.4807 | 300 | 0.0067 | - |
	| 0.4967 | 310 | 0.0069 | - |
	| 0.5127 | 320 | 0.0054 | - |
	| 0.5287 | 330 | 0.0107 | - |
	| 0.5448 | 340 | 0.0115 | - |
	| 0.5608 | 350 | 0.0083 | - |
	| 0.5768 | 360 | 0.0175 | - |
	| 0.5928 | 370 | 0.0162 | - |
	| 0.6089 | 380 | 0.0094 | - |
	| 0.6249 | 390 | 0.0124 | - |
	| 0.6409 | 400 | 0.0078 | - |
	| 0.6569 | 410 | 0.014 | - |
	| 0.6729 | 420 | 0.0117 | - |
	| 0.6890 | 430 | 0.0097 | - |
	| 0.7050 | 440 | 0.0094 | - |
	| 0.7210 | 450 | 0.0077 | - |
	| 0.7370 | 460 | 0.0103 | - |
	| 0.7531 | 470 | 0.0099 | - |
	| 0.7691 | 480 | 0.0123 | - |
	| 0.7851 | 490 | 0.0103 | - |
	| 0.8011 | 500 | 0.0098 | - |
	| 0.8171 | 510 | 0.0059 | - |
	| 0.8332 | 520 | 0.0031 | - |
	| 0.8492 | 530 | 0.0075 | - |
	| 0.8652 | 540 | 0.0101 | - |
	| 0.8812 | 550 | 0.0099 | - |
	| 0.8973 | 560 | 0.0098 | - |
	| 0.9133 | 570 | 0.0072 | - |
	| 0.9293 | 580 | 0.0057 | - |
	| 0.9453 | 590 | 0.0074 | - |
	| 0.9613 | 600 | 0.0038 | - |
	| 0.9774 | 610 | 0.0127 | - |
	| 0.9934 | 620 | 0.0098 | - |
	| 1.0 | 625 | - | 0.2532 |
	| 1.0080 | 630 | 0.0064 | - |
	| 1.0240 | 640 | 0.0066 | - |
	| 1.0401 | 650 | 0.0056 | - |
	| 1.0561 | 660 | 0.0031 | - |
	| 1.0721 | 670 | 0.0023 | - |
	| 1.0881 | 680 | 0.0032 | - |
	| 1.1041 | 690 | 0.0021 | - |
	| 1.1202 | 700 | 0.0011 | - |
	| 1.1362 | 710 | 0.006 | - |
	| 1.1522 | 720 | 0.0045 | - |
	| 1.1682 | 730 | 0.0041 | - |
	| 1.1843 | 740 | 0.0026 | - |
	| 1.2003 | 750 | 0.0019 | - |
	| 1.2163 | 760 | 0.0058 | - |
	| 1.2323 | 770 | 0.0054 | - |
	| 1.2483 | 780 | 0.0066 | - |
	| 1.2644 | 790 | 0.0033 | - |
	| 1.2804 | 800 | 0.004 | - |
	| 1.2964 | 810 | 0.0028 | - |
	| 1.3124 | 820 | 0.0027 | - |
	| 1.3285 | 830 | 0.0017 | - |
	| 1.3445 | 840 | 0.0009 | - |
	| 1.3605 | 850 | 0.0048 | - |
	| 1.3765 | 860 | 0.0037 | - |
	| 1.3925 | 870 | 0.0045 | - |
	| 1.4086 | 880 | 0.0043 | - |
	| 1.4246 | 890 | 0.0046 | - |
	| 1.4406 | 900 | 0.0023 | - |
	| 1.4566 | 910 | 0.0031 | - |
	| 1.4727 | 920 | 0.0027 | - |
	| 1.4887 | 930 | 0.0022 | - |
	| 1.5047 | 940 | 0.0042 | - |
	| 1.5207 | 950 | 0.0026 | - |
	| 1.5368 | 960 | 0.0049 | - |
	| 1.5528 | 970 | 0.0024 | - |
	| 1.5688 | 980 | 0.0019 | - |
	| 1.5848 | 990 | 0.0038 | - |
	| 1.6008 | 1000 | 0.0036 | - |
	| 1.6169 | 1010 | 0.0023 | - |
	| 1.6329 | 1020 | 0.0021 | - |
	| 1.6489 | 1030 | 0.0011 | - |
	| 1.6649 | 1040 | 0.0025 | - |
	| 1.6810 | 1050 | 0.0026 | - |
	| 1.6970 | 1060 | 0.0034 | - |
	| 1.7130 | 1070 | 0.0024 | - |
	| 1.7290 | 1080 | 0.0038 | - |
	| 1.7450 | 1090 | 0.002 | - |
	| 1.7611 | 1100 | 0.0046 | - |
	| 1.7771 | 1110 | 0.0003 | - |
	| 1.7931 | 1120 | 0.0062 | - |
	| 1.8091 | 1130 | 0.0057 | - |
	| 1.8252 | 1140 | 0.0012 | - |
	| 1.8412 | 1150 | 0.0021 | - |
	| 1.8572 | 1160 | 0.0038 | - |
	| 1.8732 | 1170 | 0.0024 | - |
	| 1.8892 | 1180 | 0.0026 | - |
	| 1.9053 | 1190 | 0.0034 | - |
	| 1.9213 | 1200 | 0.0064 | - |
	| 1.9373 | 1210 | 0.0041 | - |
	| 1.9533 | 1220 | 0.0032 | - |
	| 1.9694 | 1230 | 0.0028 | - |
	| 1.9854 | 1240 | 0.0009 | - |
	| 2.0 | 1250 | 0.0042 | 0.2488 |
	| 2.0160 | 1260 | 0.0005 | - |
	| 2.0320 | 1270 | 0.0018 | - |
	| 2.0481 | 1280 | 0.0009 | - |
	| 2.0641 | 1290 | 0.001 | - |
	| 2.0801 | 1300 | 0.0024 | - |
	| 2.0961 | 1310 | 0.0011 | - |
	| 2.1122 | 1320 | 0.0008 | - |
	| 2.1282 | 1330 | 0.0001 | - |
	| 2.1442 | 1340 | 0.0006 | - |
	| 2.1602 | 1350 | 0.0005 | - |
	| 2.1762 | 1360 | 0.0003 | - |
	| 2.1923 | 1370 | 0.0 | - |
	| 2.2083 | 1380 | 0.0 | - |
	| 2.2243 | 1390 | 0.0001 | - |
	| 2.2403 | 1400 | 0.0001 | - |
	| 2.2564 | 1410 | 0.0027 | - |
	| 2.2724 | 1420 | 0.0005 | - |
	| 2.2884 | 1430 | 0.0007 | - |
	| 2.3044 | 1440 | 0.0001 | - |
	| 2.3204 | 1450 | 0.0002 | - |
	| 2.3365 | 1460 | 0.001 | - |
	| 2.3525 | 1470 | 0.0003 | - |
	| 2.3685 | 1480 | 0.001 | - |
	| 2.3845 | 1490 | 0.0 | - |
	| 2.4006 | 1500 | 0.0006 | - |
	| 2.4166 | 1510 | 0.0007 | - |
	| 2.4326 | 1520 | 0.0007 | - |
	| 2.4486 | 1530 | 0.0004 | - |
	| 2.4647 | 1540 | 0.0007 | - |
	| 2.4807 | 1550 | 0.0012 | - |
	| 2.4967 | 1560 | 0.0015 | - |
	| 2.5127 | 1570 | 0.0014 | - |
	| 2.5287 | 1580 | 0.0005 | - |
	| 2.5448 | 1590 | 0.0005 | - |
	| 2.5608 | 1600 | 0.0014 | - |
	| 2.5768 | 1610 | 0.0016 | - |
	| 2.5928 | 1620 | 0.0 | - |
	| 2.6089 | 1630 | 0.0002 | - |
	| 2.6249 | 1640 | 0.0006 | - |
	| 2.6409 | 1650 | 0.0002 | - |
	| 2.6569 | 1660 | 0.0003 | - |
	| 2.6729 | 1670 | 0.0007 | - |
	| 2.6890 | 1680 | 0.0005 | - |
	| 2.7050 | 1690 | 0.0007 | - |
	| 2.7210 | 1700 | 0.0 | - |
	| 2.7370 | 1710 | 0.0008 | - |
	| 2.7531 | 1720 | 0.0019 | - |
	| 2.7691 | 1730 | 0.0017 | - |
	| 2.7851 | 1740 | 0.0002 | - |
	| 2.8011 | 1750 | 0.0002 | - |
	| 2.8171 | 1760 | 0.0002 | - |
	| 2.8332 | 1770 | 0.0014 | - |
	| 2.8492 | 1780 | 0.0005 | - |
	| 2.8652 | 1790 | 0.0021 | - |
	| 2.8812 | 1800 | 0.002 | - |
	| 2.8973 | 1810 | 0.0021 | - |
	| 2.9133 | 1820 | 0.0007 | - |
	| 2.9293 | 1830 | 0.0 | - |
	| 2.9453 | 1840 | 0.0011 | - |
	| 2.9613 | 1850 | 0.0006 | - |
	| 2.9774 | 1860 | 0.0008 | - |
	| 2.9934 | 1870 | 0.0001 | - |
	| 3.0 | 1875 | - | 0.2516 |
	| 3.0080 | 1880 | 0.0033 | - |
	| 3.0240 | 1890 | 0.0 | - |
	| 3.0401 | 1900 | 0.0 | - |
	| 3.0561 | 1910 | 0.0009 | - |
	| 3.0721 | 1920 | 0.0001 | - |
	| 3.0881 | 1930 | 0.001 | - |
	| 3.1041 | 1940 | 0.0001 | - |
	| 3.1202 | 1950 | 0.0001 | - |
	| 3.1362 | 1960 | 0.0 | - |
	| 3.1522 | 1970 | 0.0003 | - |
	| 3.1682 | 1980 | 0.0001 | - |
	| 3.1843 | 1990 | 0.0005 | - |
	| 3.2003 | 2000 | 0.0 | - |
	| 3.2163 | 2010 | 0.0 | - |
	| 3.2323 | 2020 | 0.0 | - |
	| 3.2483 | 2030 | 0.0 | - |
	| 3.2644 | 2040 | 0.0 | - |
	| 3.2804 | 2050 | 0.0 | - |
	| 3.2964 | 2060 | 0.0001 | - |
	| 3.3124 | 2070 | 0.0001 | - |
	| 3.3285 | 2080 | 0.0 | - |
	| 3.3445 | 2090 | 0.0001 | - |
	| 3.3605 | 2100 | 0.0 | - |
	| 3.3765 | 2110 | 0.0005 | - |
	| 3.3925 | 2120 | 0.0001 | - |
	| 3.4086 | 2130 | 0.0 | - |
	| 3.4246 | 2140 | 0.0 | - |
	| 3.4406 | 2150 | 0.0004 | - |
	| 3.4566 | 2160 | 0.0005 | - |
	| 3.4727 | 2170 | 0.0 | - |
	| 3.4887 | 2180 | 0.0006 | - |
	| 3.5047 | 2190 | 0.0002 | - |
	\| 3.5207 \| 2200 \| 0.0007 \| - \|
	\| 3.5368 \| 2210 \| 0.0 \| - \|
	\| 3.5528 \| 2220 \| 0.0 \| - \|
	\| 3.5688 \| 2230 \| 0.0008 \| - \|
	\| 3.5848 \| 2240 \| 0.0001 \| - \|
	\| 3.6008 \| 2250 \| 0.0013 \| - \|
	\| 3.6169 \| 2260 \| 0.0004 \| - \|
	\| 3.6329 \| 2270 \| 0.0006 \| - \|
	\| 3.6489 \| 2280 \| 0.0001 \| - \|
	\| 3.6649 \| 2290 \| 0.0 \| - \|
	\| 3.6810 \| 2300 \| 0.0011 \| - \|
	\| 3.6970 \| 2310 \| 0.0005 \| - \|
	\| 3.7130 \| 2320 \| 0.0 \| - \|
	\| 3.7290 \| 2330 \| 0.0 \| - \|
	\| 3.7450 \| 2340 \| 0.0006 \| - \|
	\| 3.7611 \| 2350 \| 0.0 \| - \|
	\| 3.7771 \| 2360 \| 0.0002 \| - \|
	\| 3.7931 \| 2370 \| 0.0006 \| - \|
	\| 3.8091 \| 2380 \| 0.0002 \| - \|
	\| 3.8252 \| 2390 \| 0.0004 \| - \|
	\| 3.8412 \| 2400 \| 0.0 \| - \|
	\| 3.8572 \| 2410 \| 0.0007 \| - \|
	\| 3.8732 \| 2420 \| 0.0006 \| - \|
	\| 3.8892 \| 2430 \| 0.0002 \| - \|
	\| 3.9053 \| 2440 \| 0.0009 \| - \|
	\| 3.9213 \| 2450 \| 0.0009 \| - \|
	\| 3.9373 \| 2460 \| 0.0 \| - \|
	\| 3.9533 \| 2470 \| 0.0001 \| - \|
	\| 3.9694 \| 2480 \| 0.0012 \| - \|
	\| 3.9854 \| 2490 \| 0.0003 \| - \|
	\| 3.9950 \| 2496 \| - \| 0.2524 \|
	\| -1 \| -1 \| - \| 0.2532 \|

	* The bold row denotes the saved checkpoint.
	</details>

	### Framework Versions
	- Python: 3.11.12
	- Sentence Transformers: 4.1.0
	- Transformers: 4.51.3
	- PyTorch: 2.6.0+cu124
	- Accelerate: 1.6.0
	- Datasets: 2.14.4
	- Tokenizers: 0.21.1
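Before trying to reproduce results, it can help to compare a local environment against the versions listed above. The following is a minimal sketch and is not part of Sentence Transformers. `check_versions` and `EXPECTED` are hypothetical names, and the mapping assumes the usual PyPI distribution names (`sentence-transformers`, `torch`, and so on).

```python
from __future__ import annotations

import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "accelerate": "1.6.0",
    "datasets": "2.14.4",
    "tokenizers": "0.21.1",
}


def check_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {distribution: (expected, installed)} for every mismatch.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("Python", platform.python_version(), "(card lists 3.11.12)")
    for pkg, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{pkg}: expected {wanted}, found {found}")
```

A `torch` mismatch that differs only in the local version suffix (for example `+cu124` versus a CPU-only build) may simply reflect a different CUDA build rather than a different PyTorch release.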

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### TripletLoss
	```bibtex
	@misc{hermans2017defense,
	title={In Defense of the Triplet Loss for Person Re-Identification},
	author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
	year={2017},
	eprint={1703.07737},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}
	```
