| --- |
| language: |
| - en |
| license: apache-2.0 |
| tags: |
| - sentence-transformers |
| - sentence-similarity |
| - feature-extraction |
| - generated_from_trainer |
| - dataset_size:79876 |
| - loss:TripletLoss |
| base_model: Master-thesis-NAP/ModernBert-DAPT-math |
| widget: |
| - source_sentence: What is the error estimate for the difference between the exact |
| solution and the local oscillation decomposition (LOD) solution in terms of the |
| $L_0$ norm? |
| sentences: |
| - '\label{RL1} |
| |
| The system \eqref{R3} has the following positive fixed points if $0 <\alpha\leq1$ |
| and $b>d$ |
| |
| $$E^*=\left(\dfrac{d}{b}, \dfrac{(b-d) r}{b^2}\right)$$' |
| - "\\label{theo1d}\nWith the assumptions and setting is this section, the finite\ |
| \ difference solution computed using the improved harmonic average method applied\ |
| \ to \\eqn{eq1d} or \\eqn{eq1dB} has second order convergence in the infinity\ |
| \ norm, that is,\n\\eqm\n \\|\\mathbf{E} \\|_{\\infty}\\le C h^2,\n\\enm\nassuming\ |
| \ that the true solution of \\eqn{eq1d} is piecewise $C^4$ excluding the interface\ |
| \ $\\alf$, that is, \n$u(x) \\in C^4(0,\\alf) \\cup C^4(\\alf,1)$. \n%where $C$\ |
| \ is a generic error constant." |
| - "\\label{Corollary}\n Let Assumptions~\\ref{assum_1} and~\\ref{assump2} be\ |
| \ satisfied. Let $u$ be the solution of~\\eqref{WeakForm} and let $u_{H,k}$ be\ |
| \ the LOD solution of~\\eqref{local_probelm }. Then we have \n \\begin{equation}\\\ |
| label{L2Estimate}\n \\|u-I_Hu_{H,k}\\|_0\\lesssim \\|u-I_Hu\\|_0+\\|u-u_{H,k}\\\ |
| |_0 +H|u-u_{H,k}|_1.\n \\end{equation}\n %\\[\\|u-I_Hu_{H,k}\\|_0\\lesssim\ |
| \ H |u|_1 +|u-u_{H,k}|_1.\\]" |
| - source_sentence: What is the expected value of the number of individuals in a Markov |
| branching process with non-homogeneous Poisson immigration (MBPNPI) at time $t=0$, |
| given that the immigration rate is $\lambda$? |
| sentences: |
| - '\label{lemma-sampling} |
| |
| Fix an integer~$n\geq 1$. |
| |
| Consider the initial configuration with one active particle on each |
| |
| site of~$V_n$ and let the system evolve, with particles being killed |
| |
| when they jump out of~$V_n$, until no active particle remains |
| |
| in~$V_n$. |
| |
| Then the distribution of the resulting stable configuration is exactly |
| |
| the stationary distribution of the driven-dissipative Markov chain |
| |
| on~$V_n$. |
| |
| In particular, the number of sleeping particles remaining in~$V_n$ is |
| |
| distributed as~$S_n$.' |
| - "The process $Y(t)$, $t\\geq 0,$ is called Markov branching process with\r\nnon-homogeneous\ |
| \ Poisson immigration (MBPNPI)." |
| - "For any $\\lambda \\in(0,1)$ and $s \\in\\mathbb N$,\n \\begin{equation*}\n\\\ |
| sum_{k=s}^{\\infty}\\binom {k}{s}\n(1-\\lambda)^{k-s}= \\lambda^{-s-1}.\n\\\ |
| end{equation*}" |
| - source_sentence: Does the theorem imply that the rate of convergence of the sequence |
| $T_{m,j}(E)$ to $T_{m+k_n,j+k_n}(E)$ is exponential in the distance between $m$ |
| and $j$, and that this rate is bounded by a constant $C$ times an exponential |
| decay factor involving the parameter $\gamma$? |
| sentences: |
| - "\\label{lem1}\n\t\tFor all $m,j\\in\\Z$, we have\n\t\t\\begin{equation*}\n\t\ |
| \t|| T_{m,j} (E)-T_{m+k_n,j+k_n}(E)||\\leq C e^{-\\gamma k_n} e^{(\\mathcal\ |
| \ L(E)+\\varepsilon) |m-j|}. \n\t\t\\end{equation*}" |
| - "[Divergence Theorem or Gauss-Green Theorem for Surfaces in $\\R^3$]\n\t\\label{thm:surface_int}\n\ |
| \t Let $\\Sigma \\subset \\Omega\\subseteq\\R^3$ be a bounded smooth surface.\n\ |
| \t Further, $\\bb a:\\Sigma\\to\\R^3$ is a continuously differentiable\ |
| \ vector field that is either defined on the\n\t\t\t\t\tboundary $\\partial\\\ |
| Sigma$ or has a bounded continuous extension to this boundary.\n\t Like\ |
| \ in \\eqref{eq:decomp} it may be decomposed into tangential and normal components\n\ |
| \t\t\t\t\tas follows $\\bb a = \\bb a^\\shortparallel + a_\\nu\\bs\\nu_\\Sigma$.\ |
| \ By $\\dd l$ we denote the line element on \n\t\t\t\t\tthe curve $\\partial \\\ |
| Sigma$. We assume that the curve is continuous and consists of finitely many\n\ |
| \t\t\t\t\tsmooth pieces.\n\t Then the following divergence formula for\ |
| \ surface integrals holds\n\t %\n\t \\begin{align}\n\t \ |
| \ %\n\t \\int\\limits_\\Sigma \\left[\\nabla_\\Sigma\\cdot\\bb a^\\\ |
| shortparallel\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\partial\\\ |
| Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\,\\dd l .\n\ |
| \t \\label{eq:surface_div}\n\t %\n\t \\end{align}\n\ |
| \t\t\t\t\t%\n\t\t\t\t\tFrom this we obtain the formula\n\t\t\t\t\t%\n\t \ |
| \ \\begin{align}\n\t %\n\t \\int\\limits_\\Sigma \\left[\\\ |
| nabla_\\Sigma\\cdot\\bb a\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\\ |
| partial\\Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\\ |
| ,\\dd l \n\t\t\t\t\t\t\t-\\int\\limits_\\Sigma\\left[ 2\\kappa_Ma_\\nu\\right](\\\ |
| x)\\;\\dd S.\n\t \\label{eq:surface_div_2}\n\t %\n\t \ |
| \ \\end{align}\n\t %" |
| - '\label{theo:helper3} |
| |
| Assume that $\{\PP_N\}_{N\ge 1}$ is a sequence of probability measures that is |
| HT-appropriate in the sense of \cref{def:appropriate} and satisfies the LLN in |
| the sense of \cref{def:LLN}. |
| |
| Let $(\kappa_n)_{n\ge 1}$ and $(m_n)_{n\ge 1}$ be the sequences that arise from |
| these definitions. |
| |
| Moreover, assume that there exists a constant $C>0$ such that $|\kappa_n|\leq |
| C^n$, for all $n \geq 1$. |
| |
| Then $(m_n)_{n\ge 1}$ is the sequence of moments of a unique probability measure |
| on $\R$.' |
| - source_sentence: What is the error estimate for the eigenfunction approximation |
| in terms of the weak eigenvalue and the norm of the difference between the exact |
| and approximate eigenfunctions? |
| sentences: |
| - "Consider dynamics \\eqref{avg} and define the corresponding average dynamics\ |
| \ as $\\label{T-avg}\n\\mathring{\\chi} = \\epsilon h_{av}(\\chi)$, with the average\ |
| \ function defined as\n\\begin{equation*} \nh_{av}(\\chi):=\\lim_{T \\to \\infty}\ |
| \ \\frac{1}{T}\\int_{t}^{t+T} h(\\mu, \\chi, 0) d \\mu, \\ T>0,\n\\end{equation*}\n\ |
| both \\eqref{avg} and \\eqref{T-avg} twice differentiable and bounded in every\ |
| \ compact set of the $\\chi$-domain $\\mathcal{D} \\subset \\mathbb{R}^{3}$. \n\ |
| %\nLet $\\chi(\\tau,\\epsilon)$ and $\\chi_{av}(\\epsilon\\tau)$ denote the solutions\ |
| \ of \\eqref{avg} and \\eqref{T-avg}, respectively. If $\\chi_{av}(\\epsilon\\\ |
| tau)\\in \\mathcal{D}$ for all $\\tau\\in[0,\\zeta/\\epsilon]$, $\\zeta\\geq 0$,\ |
| \ and $\\chi(0,\\epsilon) - \\chi_{av}(0)=\\mathcal{O}(\\nu(\\epsilon))$, then\ |
| \ there exists an $\\epsilon^{*}>0$ such that for all $0<\\epsilon<\\epsilon^{*}$,\ |
| \ $\\chi(\\tau,\\epsilon)$ is well defined and\n$$\n\\chi(\\tau,\\epsilon) - \\\ |
| chi_{av}(\\epsilon\\tau) = \\mathcal{O}(\\nu(\\epsilon)) \\ \\textnormal{on} \\\ |
| \ \\tau \\in [0, \\zeta/\\epsilon],\n$$\nfor some function $\\nu\\in \\mathcal{K}$." |
| - "(\\cite{DangWangXieZhou})\\label{Theorem_Error_Estimate_k}\nLet us define the\ |
| \ spectral projection $F_{k,h}^{(\\ell)}: V\\mapsto {\\rm span}\\{u_{1,h}^{(\\\ |
| ell)}, \\cdots, u_{k,h}^{(\\ell)}\\}$ for any integer $\\ell \\geq 1$ as follows:\n\ |
| \\begin{eqnarray*}\na(F_{k,h}^{(\\ell)}w, u_{i,h}^{(\\ell)}) = a(w, u_{i,h}^{(\\\ |
| ell)}), \\ \\ \\ i=1, \\cdots, k\\ \\ {\\rm for}\\ w\\in V.\n\\end{eqnarray*}\n\ |
| Then the exact eigenfunctions $\\bar u_{1,h},\\cdots, \\bar u_{k,h}$ of (\\ref{Weak_Eigenvalue_Discrete})\ |
| \ and the eigenfunction approximations $u_{1,h}^{(\\ell+1)}$, $\\cdots$, $u_{k,h}^{(\\\ |
| ell+1)}$ from Algorithm \\ref{Algorithm_k} with the integer $\\ell > 1$ have the\ |
| \ following error estimate:\n\\begin{eqnarray*}\\label{Error_Estimate_Inverse}\n\ |
| \ \\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_a \\leq\n\ |
| \ \\bar\\lambda_{i,h} \\sqrt{1+\\frac{\\eta_a^2(V_H)}{\\bar\\lambda_{1,h}\\big(\\\ |
| delta_{k,i,h}^{(\\ell+1)}\\big)^2}}\n\\left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\\ |
| ell)}}\\right)\\eta_a^2(V_H)\\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell)}\\bar u_{i,h}\ |
| \ \\right\\|_a,\n\\end{eqnarray*}\nwhere $\\delta_{k,i,h}^{(\\ell)} $ is defined\ |
| \ as follows:\n\\begin{eqnarray*}\n\\delta_{k,i,h}^{(\\ell)} = \\min_{j\\not\\\ |
| in \\{1, \\cdots, k\\}}\\left|\\frac{1}{\\lambda_{j,h}^{(\\ell)}}-\\frac{1}{\\\ |
| bar\\lambda_{i,h}}\\right|,\\ \\ \\ i=1, \\cdots, k.\n\\end{eqnarray*}\nFurthermore,\ |
| \ the following $\\left\\|\\cdot\\right\\|_b$-norm error estimate holds:\n\\begin{eqnarray*}\n\ |
| \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_b\\leq \n\\\ |
| left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\ell+1)}}\\right)\\eta_a(V_H)\ |
| \ \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h}\\right\\|_a.\n\\end{eqnarray*}" |
| - "\\big[{\\bf Condition $SD1(h)$}\\big]\\label{DefnSD1(h)}\n\nIn \\cite{MDL} an\ |
| \ approximation order $O(h^s)$, as $h\\to 0$, is proved, where $h$ is the sampling\ |
| \ distance. The achievable order $s$ is of course limited by the smoothness order\ |
| \ of the boundaries of $Graph(F)$. Then, the order $s$ depends upon the degree\ |
| \ of the polynomials used to approximate the boundary near the neighborhood of\ |
| \ points of topology change and upon the degree of splines used at regular regions.\ |
| \ \n\nFor example, let us view Step C of the approximation algorithm described\ |
| \ in Section 5.2 of \\cite{MDL}. \nIt is assumed that the boundary curves are\ |
| \ $C^{2k}$ smooth, and it is implicitly assumed that $h$ is small enough so that\ |
| \ there are $2k$ sample points close to the point of topology change, for computing\ |
| \ the polynomial $p_{2k-1}$ therein.\nThis condition is related to the more general\ |
| \ condition $SD(h)$ and it can serve as a practical way of checking it for the\ |
| \ case $d=1$. That is, near a point of topology change, we check whether there\ |
| \ are enough sample points for applying the approximation algorithm in \\cite{MDL}.\ |
| \ We denote this condition as the $SD1(h)$ condition." |
| - source_sentence: Does Werner-Young's inequality imply that the convolution of two |
| $L^p$ spaces is always $L^r$ for $1 < r < \infty$? |
| sentences: |
| - "$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1\ |
| \ < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.\ |
| \ \n %" |
| - "A \\emph{bond system} is a tuple $(B,C,s,t,1,\\cdot)$, where $B$ is a set of\ |
| \ \\emph{bonds}, $C$ is a set of \\emph{content} relations, and $s,t:C\\to B$\ |
| \ are \\emph{source} and \\emph{target} functions. For $c\\in C$ with $s(c)=x$\ |
| \ and $t(c)=y$, we write $x\\xrightarrow{c}y$ or $c:x\\to y$, indicating that\ |
| \ $x$ \\emph{contains} $y$. Each bond $x\\in B$ has an \\emph{identity} containment\ |
| \ $1_x:x\\to x$, meaning every bond trivially contains itself. For $c:x\\to y$\ |
| \ and $c':y\\to z$, their composition is $cc':x\\to z$. These data must satisfy:\n\ |
| \ \\begin{enumerate}\n \\item Identity laws: For each $c:x\\to y$, $1_x\ |
| \ c= c=c1_y$\n \\item Associativity: For $c:x\\to y$, $c':y\\to z$, $c'':z\\\ |
| to w$, $c(c'c'')=(cc')c''$\n \\item Anti-symmetry: For $c:x\\to y$ and\ |
| \ $c':y\\to x$, $x=y$\n \\item Left cancellation: For $c,c':x\\to y$ and\ |
| \ $c'':y\\to z$, if $cc''=c'c''$, then $c=c'$\n \\end{enumerate}" |
| - "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\\ |
| in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$\ |
| \ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\\ |
| |_{\\cS^q}.\n\\end{align*}" |
| pipeline_tag: sentence-similarity |
| library_name: sentence-transformers |
| metrics: |
| - cosine_accuracy@1 |
| - cosine_accuracy@3 |
| - cosine_accuracy@5 |
| - cosine_accuracy@10 |
| - cosine_precision@1 |
| - cosine_precision@3 |
| - cosine_precision@5 |
| - cosine_precision@10 |
| - cosine_recall@1 |
| - cosine_recall@3 |
| - cosine_recall@5 |
| - cosine_recall@10 |
| - cosine_ndcg@10 |
| - cosine_mrr@10 |
| - cosine_map@100 |
| model-index: |
| - name: ModernBERT DAPT Embed DAPT Math |
| results: |
| - task: |
| type: information-retrieval |
| name: Information Retrieval |
| dataset: |
| name: TESTING |
| type: TESTING |
| metrics: |
| - type: cosine_accuracy@1 |
| value: 0.5679510844485464 |
| name: Cosine Accuracy@1 |
| - type: cosine_accuracy@3 |
| value: 0.6324411628980157 |
| name: Cosine Accuracy@3 |
| - type: cosine_accuracy@5 |
| value: 0.6586294416243654 |
| name: Cosine Accuracy@5 |
| - type: cosine_accuracy@10 |
| value: 0.6938163359483156 |
| name: Cosine Accuracy@10 |
| - type: cosine_precision@1 |
| value: 0.5679510844485464 |
| name: Cosine Precision@1 |
| - type: cosine_precision@3 |
| value: 0.36494385479157054 |
| name: Cosine Precision@3 |
| - type: cosine_precision@5 |
| value: 0.27741116751269035 |
| name: Cosine Precision@5 |
| - type: cosine_precision@10 |
| value: 0.18192201199815417 |
| name: Cosine Precision@10 |
| - type: cosine_recall@1 |
| value: 0.026541702012005317 |
| name: Cosine Recall@1 |
| - type: cosine_recall@3 |
| value: 0.048742014322369596 |
| name: Cosine Recall@3 |
| - type: cosine_recall@5 |
| value: 0.0598887341486898 |
| name: Cosine Recall@5 |
| - type: cosine_recall@10 |
| value: 0.07516536747041261 |
| name: Cosine Recall@10 |
| - type: cosine_ndcg@10 |
| value: 0.25320633940615317 |
| name: Cosine Ndcg@10 |
| - type: cosine_mrr@10 |
| value: 0.6070309695944213 |
| name: Cosine Mrr@10 |
| - type: cosine_map@100 |
| value: 0.07416668442975916 |
| name: Cosine Map@100 |
| --- |
| |
| # ModernBERT DAPT Embed DAPT Math |
|
|
| This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math) on 79,876 (anchor, positive, negative) triplets that pair natural-language questions with mathematical statements written in LaTeX. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
| ## Model Details |
|
|
| ### Model Description |
| - **Model Type:** Sentence Transformer |
| - **Base model:** [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math) <!-- at revision a30384f91d764c272e6b740c256d5581325ea4bb --> |
| - **Maximum Sequence Length:** 8192 tokens |
| - **Output Dimensionality:** 768 dimensions |
| - **Similarity Function:** Cosine Similarity |
| <!-- - **Training Dataset:** Unknown --> |
| - **Language:** en |
| - **License:** apache-2.0 |
|
|
| ### Model Sources |
|
|
| - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
| - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
| - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
| ### Full Model Architecture |
|
|
| ``` |
| SentenceTransformer( |
| (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel |
| (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
| (2): Normalize() |
| ) |
| ``` |
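The three modules above perform mean pooling over the non-padding token vectors, followed by L2 normalization. The NumPy sketch below illustrates that pipeline on random stand-in token embeddings. It does not load ModernBERT, and `mean_pool` and `l2_normalize` are hypothetical helper names. Every output vector has unit length, so a plain dot product between two embeddings gives the same value as their cosine similarity.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per input, skipping padding positions (mask == 0).

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (batch, 1), avoids 0-division
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768))  # random stand-in for ModernBertModel token outputs
mask = np.array([[1, 1, 1, 0, 0],      # first input: 3 real tokens, 2 padding
                 [1, 1, 1, 1, 1]])     # second input: 5 real tokens
emb = l2_normalize(mean_pool(tokens, mask))

# For unit-length rows, the dot product equals cosine similarity.
dot = emb @ emb.T
norms = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(norms, norms)
print(emb.shape)  # (2, 768)
```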
|
|
| ## Usage |
|
|
| ### Direct Usage (Sentence Transformers) |
|
|
| First install the Sentence Transformers library: |
|
|
| ```bash |
| pip install -U sentence-transformers |
| ``` |
|
|
| Then you can load this model and run inference. |
| ```python |
| from sentence_transformers import SentenceTransformer |
| |
| # Download from the 🤗 Hub |
| model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math") |
| # Run inference |
| sentences = [ |
| "Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?", |
| "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align*}", |
| '$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition. \n %', |
| ] |
| embeddings = model.encode(sentences) |
| print(embeddings.shape) |
| # (3, 768) |
| |
| # Get the similarity scores for the embeddings |
| similarities = model.similarity(embeddings, embeddings) |
| print(similarities.shape) |
| # torch.Size([3, 3]) |
| ``` |
|
|
|
|
|
|
|
|
| ## Evaluation |
|
|
| ### Metrics |
|
|
| #### Information Retrieval |
|
|
| * Dataset: `TESTING` |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
| | Metric | Value | |
| |:--------------------|:-----------| |
| | cosine_accuracy@1 | 0.568 | |
| | cosine_accuracy@3 | 0.6324 | |
| | cosine_accuracy@5 | 0.6586 | |
| | cosine_accuracy@10 | 0.6938 | |
| | cosine_precision@1 | 0.568 | |
| | cosine_precision@3 | 0.3649 | |
| | cosine_precision@5 | 0.2774 | |
| | cosine_precision@10 | 0.1819 | |
| | cosine_recall@1 | 0.0265 | |
| | cosine_recall@3 | 0.0487 | |
| | cosine_recall@5 | 0.0599 | |
| | cosine_recall@10 | 0.0752 | |
| | **cosine_ndcg@10** | **0.2532** | |
| | cosine_mrr@10 | 0.607 | |
| | cosine_map@100 | 0.0742 | |
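The table reports averages over queries of per-query scores. The sketch below computes those per-query quantities. It assumes binary relevance judgments and a ranking already sorted by descending cosine score. It reimplements the standard definitions for illustration and is not the evaluator's own code. `ir_metrics` is a hypothetical name.

```python
import math

def ir_metrics(ranked_ids, relevant, k=10, map_k=100):
    """Per-query retrieval metrics under binary relevance.

    ranked_ids: corpus ids sorted by descending similarity score.
    relevant: non-empty set of ids judged relevant for this query.
    """
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0      # at least one relevant doc in the top k
    precision = sum(hits) / k                 # share of the k slots that are relevant
    recall = sum(hits) / len(relevant)        # share of all relevant docs that were found

    # Reciprocal rank of the first relevant hit within the top k (0 if none).
    mrr = next((1.0 / (rank + 1) for rank, h in enumerate(hits) if h), 0.0)

    # nDCG@k: discounted gain of this ranking divided by that of an ideal ranking.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    ndcg = dcg / ideal

    # Average precision over the top map_k results.
    num_correct, sum_prec = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:map_k]):
        if doc_id in relevant:
            num_correct += 1
            sum_prec += num_correct / (rank + 1)
    ap = sum_prec / min(map_k, len(relevant))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg, "map": ap}

print(ir_metrics(["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"}, k=3))
```

Two consequences follow directly from these definitions. First, precision@1 always equals accuracy@1, and both are 0.568 above. Second, recall@k divides by the total number of relevant documents. Because recall@10 (0.0752) is below precision@10 (0.1819), at least some `TESTING` queries must have more than ten relevant documents: if every query had ten or fewer, each query's recall@10 would be at least its precision@10.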
| |
| |
| |
| ## Training Details |
| |
| ### Training Dataset |
| |
| #### Unnamed Dataset |
| |
| * Size: 79,876 training samples |
| * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code> |
| * Approximate statistics based on the first 1000 samples: |
| | | anchor | positive | negative | |
| |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| |
| | type | string | string | string | |
| | details | <ul><li>min: 9 tokens</li><li>mean: 38.48 tokens</li><li>max: 142 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 210.43 tokens</li><li>max: 924 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 91.02 tokens</li><li>max: 481 tokens</li></ul> | |
| * Samples: |
| | anchor | positive | negative | |
| |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| | <code>What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$?</code> | <code>Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$.<br>Then <br>\begin{equation}<br>0 \leq 3g_n -2n \leq 4<br>\label{star}<br>\end{equation}<br>for all $n$, and hence<br>$\lim_{n \rightarrow \infty} g_n/n = 2/3$.<br>\label{thm1}</code> | <code>\label{thm:bounds_initial}<br> Let $\seqq{s}$ be a sequence of rank $r$ for which the roots of the characteristic polynomial are all different. Then, for any positive integer $M$, the rank of $\seq{s^M}$ is at most<br> \begin{align*}<br> \rank s^M \leq \binom{M+r-1}{M}.<br> \end{align*}</code> | |
| | <code>Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$?</code> | <code>\label{ThmConjAreTrue}<br>Conjectures \ref{Conj1} and \ref{Conj2} are true.<br>As a consequence, <br>if either $d=s \geq 1$ or $d \geq 2s+1 \geq 3$, <br>the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is equal to $g(d,s)$.</code> | <code>[{\cite[Corollary 2.2.2 with $p=3$]{BSY}}]<br> Let $S$ be a non-trivial Severi-Brauer surface over a perfect field $\textbf{k}$. Then $S$ does not contain points of degree $d$, where $d$ is not divisible by $3$. On the other hand $S$ contains a point of degree $3$.</code> | |
| | <code>\\emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?}</code> | <code>}<br>\newcommand{\ep}{</code> | <code>\label{prop:coherence}<br> If $X$ is a qcqs scheme, then $RX$ is coherent in the sense that the set of quasi-compact open subsets of $RX$ is closed under finite intersections and forms a basis for the topology of $RX$.</code> | |
| * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters: |
| ```json |
| { |
| "distance_metric": "TripletDistanceMetric.COSINE", |
| "triplet_margin": 0.1 |
| } |
| ``` |
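With these parameters, the loss for a triplet of embeddings $(a, p, n)$ is $\max\big(d(a,p) - d(a,n) + 0.1,\ 0\big)$, where $d(x,y) = 1 - \cos(x,y)$ is the cosine distance. The per-triplet losses are averaged over the batch. In sentence-transformers this configuration corresponds to `losses.TripletLoss(model, distance_metric=losses.TripletDistanceMetric.COSINE, triplet_margin=0.1)`.

The pure-Python sketch below spells out the arithmetic on toy 2-D vectors; the helper names are hypothetical. A triplet contributes zero loss only once the negative is at least 0.1 farther from the anchor than the positive.

```python
import math

def cosine_distance(x, y):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss: zero once d(anchor, negative) >= d(anchor, positive) + margin."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def batch_triplet_loss(triplets, margin=0.1):
    """Mean loss over (anchor, positive, negative) embedding triples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)

easy = ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # positive identical, negative orthogonal
hard = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # positive and negative swapped
print(triplet_loss(*easy), triplet_loss(*hard), batch_triplet_loss([easy, hard]))
```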
| |
| ### Training Hyperparameters |
| #### Non-Default Hyperparameters |
| |
| - `eval_strategy`: epoch |
| - `per_device_train_batch_size`: 16 |
| - `per_device_eval_batch_size`: 16 |
| - `gradient_accumulation_steps`: 8 |
| - `learning_rate`: 2e-05 |
| - `num_train_epochs`: 4 |
| - `lr_scheduler_type`: cosine |
| - `warmup_ratio`: 0.1 |
| - `bf16`: True |
| - `tf32`: True |
| - `load_best_model_at_end`: True |
| - `optim`: adamw_torch_fused |
| - `batch_sampler`: no_duplicates |
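The non-default values above determine the optimization schedule. The sketch below derives the effective batch size, the number of optimizer steps, and the warmup length. It then evaluates a linear-warmup/cosine-decay learning rate in the form used, as we understand it, by Hugging Face's `get_cosine_schedule_with_warmup`.

The sketch rests on two assumptions: training ran on a single device (the card does not state the device count), and a trailing partial gradient-accumulation window still counts as an optimizer step. Under those assumptions it reproduces the 625 steps per epoch seen in the training logs. `lr_at` is a hypothetical helper.

```python
import math

# Values copied from the hyperparameter list above.
num_samples = 79_876
per_device_batch = 16
grad_accum = 8
num_epochs = 4
peak_lr = 2e-5
warmup_ratio = 0.1

# Assumption: a single training device.
effective_batch = per_device_batch * grad_accum            # triplets per optimizer step
micro_batches = math.ceil(num_samples / per_device_batch)  # last partial batch is kept (drop_last=False)
steps_per_epoch = math.ceil(micro_batches / grad_accum)    # partial accumulation window still steps
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup to peak_lr, then half a cosine period down to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 128 625 2500 250
```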
| |
| #### All Hyperparameters |
| <details><summary>Click to expand</summary> |
| |
| - `overwrite_output_dir`: False |
| - `do_predict`: False |
| - `eval_strategy`: epoch |
| - `prediction_loss_only`: True |
| - `per_device_train_batch_size`: 16 |
| - `per_device_eval_batch_size`: 16 |
| - `per_gpu_train_batch_size`: None |
| - `per_gpu_eval_batch_size`: None |
| - `gradient_accumulation_steps`: 8 |
| - `eval_accumulation_steps`: None |
| - `torch_empty_cache_steps`: None |
| - `learning_rate`: 2e-05 |
| - `weight_decay`: 0.0 |
| - `adam_beta1`: 0.9 |
| - `adam_beta2`: 0.999 |
| - `adam_epsilon`: 1e-08 |
| - `max_grad_norm`: 1.0 |
| - `num_train_epochs`: 4 |
| - `max_steps`: -1 |
| - `lr_scheduler_type`: cosine |
| - `lr_scheduler_kwargs`: {} |
| - `warmup_ratio`: 0.1 |
| - `warmup_steps`: 0 |
| - `log_level`: passive |
| - `log_level_replica`: warning |
| - `log_on_each_node`: True |
| - `logging_nan_inf_filter`: True |
| - `save_safetensors`: True |
| - `save_on_each_node`: False |
| - `save_only_model`: False |
| - `restore_callback_states_from_checkpoint`: False |
| - `no_cuda`: False |
| - `use_cpu`: False |
| - `use_mps_device`: False |
| - `seed`: 42 |
| - `data_seed`: None |
| - `jit_mode_eval`: False |
| - `use_ipex`: False |
| - `bf16`: True |
| - `fp16`: False |
| - `fp16_opt_level`: O1 |
| - `half_precision_backend`: auto |
| - `bf16_full_eval`: False |
| - `fp16_full_eval`: False |
| - `tf32`: True |
| - `local_rank`: 0 |
| - `ddp_backend`: None |
| - `tpu_num_cores`: None |
| - `tpu_metrics_debug`: False |
| - `debug`: [] |
| - `dataloader_drop_last`: False |
| - `dataloader_num_workers`: 0 |
| - `dataloader_prefetch_factor`: None |
| - `past_index`: -1 |
| - `disable_tqdm`: False |
| - `remove_unused_columns`: True |
| - `label_names`: None |
| - `load_best_model_at_end`: True |
| - `ignore_data_skip`: False |
| - `fsdp`: [] |
| - `fsdp_min_num_params`: 0 |
| - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
| - `tp_size`: 0 |
| - `fsdp_transformer_layer_cls_to_wrap`: None |
| - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
| - `deepspeed`: None |
| - `label_smoothing_factor`: 0.0 |
| - `optim`: adamw_torch_fused |
| - `optim_args`: None |
| - `adafactor`: False |
| - `group_by_length`: False |
| - `length_column_name`: length |
| - `ddp_find_unused_parameters`: None |
| - `ddp_bucket_cap_mb`: None |
| - `ddp_broadcast_buffers`: False |
| - `dataloader_pin_memory`: True |
| - `dataloader_persistent_workers`: False |
| - `skip_memory_metrics`: True |
| - `use_legacy_prediction_loop`: False |
| - `push_to_hub`: False |
| - `resume_from_checkpoint`: None |
| - `hub_model_id`: None |
| - `hub_strategy`: every_save |
| - `hub_private_repo`: None |
| - `hub_always_push`: False |
| - `gradient_checkpointing`: False |
| - `gradient_checkpointing_kwargs`: None |
| - `include_inputs_for_metrics`: False |
| - `include_for_metrics`: [] |
| - `eval_do_concat_batches`: True |
| - `fp16_backend`: auto |
| - `push_to_hub_model_id`: None |
| - `push_to_hub_organization`: None |
| - `mp_parameters`: |
| - `auto_find_batch_size`: False |
| - `full_determinism`: False |
| - `torchdynamo`: None |
| - `ray_scope`: last |
| - `ddp_timeout`: 1800 |
| - `torch_compile`: False |
| - `torch_compile_backend`: None |
| - `torch_compile_mode`: None |
| - `include_tokens_per_second`: False |
| - `include_num_input_tokens_seen`: False |
| - `neftune_noise_alpha`: None |
| - `optim_target_modules`: None |
| - `batch_eval_metrics`: False |
| - `eval_on_start`: False |
| - `use_liger_kernel`: False |
| - `eval_use_gather_object`: False |
| - `average_tokens_across_devices`: False |
| - `prompts`: None |
| - `batch_sampler`: no_duplicates |
| - `multi_dataset_batch_sampler`: proportional |
| |
| </details> |
| |
| ### Training Logs |
| <details><summary>Click to expand</summary> |
| |
| | Epoch | Step | Training Loss | TESTING_cosine_ndcg@10 | |
| |:-------:|:-------:|:-------------:|:----------------------:| |
| | 0.0160 | 10 | 1.1162 | - | |
| | 0.0320 | 20 | 1.0465 | - | |
| | 0.0481 | 30 | 0.9663 | - | |
| | 0.0641 | 40 | 0.8758 | - | |
| | 0.0801 | 50 | 0.8215 | - | |
| | 0.0961 | 60 | 0.7492 | - | |
| | 0.1122 | 70 | 0.6356 | - | |
| | 0.1282 | 80 | 0.3573 | - | |
| | 0.1442 | 90 | 0.166 | - | |
| | 0.1602 | 100 | 0.0797 | - | |
| | 0.1762 | 110 | 0.046 | - | |
| | 0.1923 | 120 | 0.0419 | - | |
| | 0.2083 | 130 | 0.025 | - | |
| | 0.2243 | 140 | 0.0233 | - | |
| | 0.2403 | 150 | 0.0205 | - | |
| | 0.2564 | 160 | 0.0142 | - | |
| | 0.2724 | 170 | 0.017 | - | |
| | 0.2884 | 180 | 0.0157 | - | |
| | 0.3044 | 190 | 0.0104 | - | |
| | 0.3204 | 200 | 0.0126 | - | |
| | 0.3365 | 210 | 0.019 | - | |
| | 0.3525 | 220 | 0.0153 | - | |
| | 0.3685 | 230 | 0.0171 | - | |
| | 0.3845 | 240 | 0.0124 | - | |
| | 0.4006 | 250 | 0.01 | - | |
| | 0.4166 | 260 | 0.0071 | - | |
| | 0.4326 | 270 | 0.0125 | - | |
| | 0.4486 | 280 | 0.0096 | - | |
| | 0.4647 | 290 | 0.0092 | - | |
| | 0.4807 | 300 | 0.0067 | - | |
| | 0.4967 | 310 | 0.0069 | - | |
| | 0.5127 | 320 | 0.0054 | - | |
| | 0.5287 | 330 | 0.0107 | - | |
| | 0.5448 | 340 | 0.0115 | - | |
| | 0.5608 | 350 | 0.0083 | - | |
| | 0.5768 | 360 | 0.0175 | - | |
| | 0.5928 | 370 | 0.0162 | - | |
| | 0.6089 | 380 | 0.0094 | - | |
| | 0.6249 | 390 | 0.0124 | - | |
| | 0.6409 | 400 | 0.0078 | - | |
| | 0.6569 | 410 | 0.014 | - | |
| | 0.6729 | 420 | 0.0117 | - | |
| | 0.6890 | 430 | 0.0097 | - | |
| | 0.7050 | 440 | 0.0094 | - | |
| | 0.7210 | 450 | 0.0077 | - | |
| | 0.7370 | 460 | 0.0103 | - | |
| | 0.7531 | 470 | 0.0099 | - | |
| | 0.7691 | 480 | 0.0123 | - | |
| | 0.7851 | 490 | 0.0103 | - | |
| | 0.8011 | 500 | 0.0098 | - | |
| | 0.8171 | 510 | 0.0059 | - | |
| | 0.8332 | 520 | 0.0031 | - | |
| | 0.8492 | 530 | 0.0075 | - | |
| | 0.8652 | 540 | 0.0101 | - | |
| | 0.8812 | 550 | 0.0099 | - | |
| | 0.8973 | 560 | 0.0098 | - | |
| | 0.9133 | 570 | 0.0072 | - | |
| | 0.9293 | 580 | 0.0057 | - | |
| | 0.9453 | 590 | 0.0074 | - | |
| | 0.9613 | 600 | 0.0038 | - | |
| | 0.9774 | 610 | 0.0127 | - | |
| | 0.9934 | 620 | 0.0098 | - | |
| | **1.0** | **625** | **-** | **0.2532** | |
| | 1.0080 | 630 | 0.0064 | - | |
| | 1.0240 | 640 | 0.0066 | - | |
| | 1.0401 | 650 | 0.0056 | - | |
| | 1.0561 | 660 | 0.0031 | - | |
| | 1.0721 | 670 | 0.0023 | - | |
| | 1.0881 | 680 | 0.0032 | - | |
| | 1.1041 | 690 | 0.0021 | - | |
| | 1.1202 | 700 | 0.0011 | - | |
| | 1.1362 | 710 | 0.006 | - | |
| | 1.1522 | 720 | 0.0045 | - | |
| | 1.1682 | 730 | 0.0041 | - | |
| | 1.1843 | 740 | 0.0026 | - | |
| | 1.2003 | 750 | 0.0019 | - | |
| | 1.2163 | 760 | 0.0058 | - | |
| | 1.2323 | 770 | 0.0054 | - | |
| | 1.2483 | 780 | 0.0066 | - | |
| | 1.2644 | 790 | 0.0033 | - | |
| | 1.2804 | 800 | 0.004 | - | |
| | 1.2964 | 810 | 0.0028 | - | |
| | 1.3124 | 820 | 0.0027 | - | |
| | 1.3285 | 830 | 0.0017 | - | |
| | 1.3445 | 840 | 0.0009 | - | |
| | 1.3605 | 850 | 0.0048 | - | |
| | 1.3765 | 860 | 0.0037 | - | |
| | 1.3925 | 870 | 0.0045 | - | |
| | 1.4086 | 880 | 0.0043 | - | |
| | 1.4246 | 890 | 0.0046 | - | |
| | 1.4406 | 900 | 0.0023 | - | |
| | 1.4566 | 910 | 0.0031 | - | |
| | 1.4727 | 920 | 0.0027 | - | |
| | 1.4887 | 930 | 0.0022 | - | |
| | 1.5047 | 940 | 0.0042 | - | |
| | 1.5207 | 950 | 0.0026 | - | |
| | 1.5368 | 960 | 0.0049 | - | |
| | 1.5528 | 970 | 0.0024 | - | |
| | 1.5688 | 980 | 0.0019 | - | |
| | 1.5848 | 990 | 0.0038 | - | |
| | 1.6008 | 1000 | 0.0036 | - | |
| | 1.6169 | 1010 | 0.0023 | - | |
| | 1.6329 | 1020 | 0.0021 | - | |
| | 1.6489 | 1030 | 0.0011 | - | |
| | 1.6649 | 1040 | 0.0025 | - | |
| | 1.6810 | 1050 | 0.0026 | - | |
| | 1.6970 | 1060 | 0.0034 | - | |
| | 1.7130 | 1070 | 0.0024 | - | |
| | 1.7290 | 1080 | 0.0038 | - | |
| | 1.7450 | 1090 | 0.002 | - | |
| | 1.7611 | 1100 | 0.0046 | - | |
| | 1.7771 | 1110 | 0.0003 | - | |
| | 1.7931 | 1120 | 0.0062 | - | |
| | 1.8091 | 1130 | 0.0057 | - | |
| | 1.8252 | 1140 | 0.0012 | - | |
| | 1.8412 | 1150 | 0.0021 | - | |
| | 1.8572 | 1160 | 0.0038 | - | |
| | 1.8732 | 1170 | 0.0024 | - | |
| | 1.8892 | 1180 | 0.0026 | - | |
| | 1.9053 | 1190 | 0.0034 | - | |
| | 1.9213 | 1200 | 0.0064 | - | |
| | 1.9373 | 1210 | 0.0041 | - | |
| | 1.9533 | 1220 | 0.0032 | - | |
| | 1.9694 | 1230 | 0.0028 | - | |
| | 1.9854 | 1240 | 0.0009 | - | |
| | 2.0 | 1250 | 0.0042 | 0.2488 | |
| | 2.0160 | 1260 | 0.0005 | - | |
| | 2.0320 | 1270 | 0.0018 | - | |
| | 2.0481 | 1280 | 0.0009 | - | |
| | 2.0641 | 1290 | 0.001 | - | |
| | 2.0801 | 1300 | 0.0024 | - | |
| | 2.0961 | 1310 | 0.0011 | - | |
| | 2.1122 | 1320 | 0.0008 | - | |
| | 2.1282 | 1330 | 0.0001 | - | |
| | 2.1442 | 1340 | 0.0006 | - | |
| | 2.1602 | 1350 | 0.0005 | - | |
| | 2.1762 | 1360 | 0.0003 | - | |
| | 2.1923 | 1370 | 0.0 | - | |
| | 2.2083 | 1380 | 0.0 | - | |
| | 2.2243 | 1390 | 0.0001 | - | |
| | 2.2403 | 1400 | 0.0001 | - | |
| | 2.2564 | 1410 | 0.0027 | - | |
| | 2.2724 | 1420 | 0.0005 | - | |
| | 2.2884 | 1430 | 0.0007 | - | |
| | 2.3044 | 1440 | 0.0001 | - | |
| | 2.3204 | 1450 | 0.0002 | - | |
| | 2.3365 | 1460 | 0.001 | - | |
| | 2.3525 | 1470 | 0.0003 | - | |
| | 2.3685 | 1480 | 0.001 | - | |
| | 2.3845 | 1490 | 0.0 | - | |
| | 2.4006 | 1500 | 0.0006 | - | |
| | 2.4166 | 1510 | 0.0007 | - | |
| | 2.4326 | 1520 | 0.0007 | - | |
| | 2.4486 | 1530 | 0.0004 | - | |
| | 2.4647 | 1540 | 0.0007 | - | |
| | 2.4807 | 1550 | 0.0012 | - | |
| | 2.4967 | 1560 | 0.0015 | - | |
| | 2.5127 | 1570 | 0.0014 | - | |
| | 2.5287 | 1580 | 0.0005 | - | |
| | 2.5448 | 1590 | 0.0005 | - | |
| | 2.5608 | 1600 | 0.0014 | - | |
| | 2.5768 | 1610 | 0.0016 | - | |
| | 2.5928 | 1620 | 0.0 | - | |
| | 2.6089 | 1630 | 0.0002 | - | |
| | 2.6249 | 1640 | 0.0006 | - | |
| | 2.6409 | 1650 | 0.0002 | - | |
| | 2.6569 | 1660 | 0.0003 | - | |
| | 2.6729 | 1670 | 0.0007 | - | |
| | 2.6890 | 1680 | 0.0005 | - | |
| | 2.7050 | 1690 | 0.0007 | - | |
| | 2.7210 | 1700 | 0.0 | - | |
| | 2.7370 | 1710 | 0.0008 | - | |
| | 2.7531 | 1720 | 0.0019 | - | |
| | 2.7691 | 1730 | 0.0017 | - | |
| | 2.7851 | 1740 | 0.0002 | - | |
| | 2.8011 | 1750 | 0.0002 | - | |
| | 2.8171 | 1760 | 0.0002 | - | |
| | 2.8332 | 1770 | 0.0014 | - | |
| | 2.8492 | 1780 | 0.0005 | - | |
| | 2.8652 | 1790 | 0.0021 | - | |
| | 2.8812 | 1800 | 0.002 | - | |
| | 2.8973 | 1810 | 0.0021 | - | |
| | 2.9133 | 1820 | 0.0007 | - | |
| | 2.9293 | 1830 | 0.0 | - | |
| | 2.9453 | 1840 | 0.0011 | - | |
| | 2.9613 | 1850 | 0.0006 | - | |
| | 2.9774 | 1860 | 0.0008 | - | |
| | 2.9934 | 1870 | 0.0001 | - | |
| | 3.0 | 1875 | - | 0.2516 | |
| | 3.0080 | 1880 | 0.0033 | - | |
| | 3.0240 | 1890 | 0.0 | - | |
| | 3.0401 | 1900 | 0.0 | - | |
| | 3.0561 | 1910 | 0.0009 | - | |
| | 3.0721 | 1920 | 0.0001 | - | |
| | 3.0881 | 1930 | 0.001 | - | |
| | 3.1041 | 1940 | 0.0001 | - | |
| | 3.1202 | 1950 | 0.0001 | - | |
| | 3.1362 | 1960 | 0.0 | - | |
| | 3.1522 | 1970 | 0.0003 | - | |
| | 3.1682 | 1980 | 0.0001 | - | |
| | 3.1843 | 1990 | 0.0005 | - | |
| | 3.2003 | 2000 | 0.0 | - | |
| | 3.2163 | 2010 | 0.0 | - | |
| | 3.2323 | 2020 | 0.0 | - | |
| | 3.2483 | 2030 | 0.0 | - | |
| | 3.2644 | 2040 | 0.0 | - | |
| | 3.2804 | 2050 | 0.0 | - | |
| | 3.2964 | 2060 | 0.0001 | - | |
| | 3.3124 | 2070 | 0.0001 | - | |
| | 3.3285 | 2080 | 0.0 | - | |
| | 3.3445 | 2090 | 0.0001 | - | |
| | 3.3605 | 2100 | 0.0 | - | |
| | 3.3765 | 2110 | 0.0005 | - | |
| | 3.3925 | 2120 | 0.0001 | - | |
| | 3.4086 | 2130 | 0.0 | - | |
| | 3.4246 | 2140 | 0.0 | - | |
| | 3.4406 | 2150 | 0.0004 | - | |
| | 3.4566 | 2160 | 0.0005 | - | |
| | 3.4727 | 2170 | 0.0 | - | |
| | 3.4887 | 2180 | 0.0006 | - | |
| | 3.5047 | 2190 | 0.0002 | - | |
| | 3.5207 | 2200 | 0.0007 | - | |
| | 3.5368 | 2210 | 0.0 | - | |
| | 3.5528 | 2220 | 0.0 | - | |
| | 3.5688 | 2230 | 0.0008 | - | |
| | 3.5848 | 2240 | 0.0001 | - | |
| | 3.6008 | 2250 | 0.0013 | - | |
| | 3.6169 | 2260 | 0.0004 | - | |
| | 3.6329 | 2270 | 0.0006 | - | |
| | 3.6489 | 2280 | 0.0001 | - | |
| | 3.6649 | 2290 | 0.0 | - | |
| | 3.6810 | 2300 | 0.0011 | - | |
| | 3.6970 | 2310 | 0.0005 | - | |
| | 3.7130 | 2320 | 0.0 | - | |
| | 3.7290 | 2330 | 0.0 | - | |
| | 3.7450 | 2340 | 0.0006 | - | |
| | 3.7611 | 2350 | 0.0 | - | |
| | 3.7771 | 2360 | 0.0002 | - | |
| | 3.7931 | 2370 | 0.0006 | - | |
| | 3.8091 | 2380 | 0.0002 | - | |
| | 3.8252 | 2390 | 0.0004 | - | |
| | 3.8412 | 2400 | 0.0 | - | |
| | 3.8572 | 2410 | 0.0007 | - | |
| | 3.8732 | 2420 | 0.0006 | - | |
| | 3.8892 | 2430 | 0.0002 | - | |
| | 3.9053 | 2440 | 0.0009 | - | |
| | 3.9213 | 2450 | 0.0009 | - | |
| | 3.9373 | 2460 | 0.0 | - | |
| | 3.9533 | 2470 | 0.0001 | - | |
| | 3.9694 | 2480 | 0.0012 | - | |
| | 3.9854 | 2490 | 0.0003 | - | |
| | 3.9950 | 2496 | - | 0.2524 | |
| | -1 | -1 | - | 0.2532 | |
|
|
| * The bold row denotes the saved checkpoint. |
| </details> |
|
|
| ### Framework Versions |
| - Python: 3.11.12 |
| - Sentence Transformers: 4.1.0 |
| - Transformers: 4.51.3 |
| - PyTorch: 2.6.0+cu124 |
| - Accelerate: 1.6.0 |
| - Datasets: 2.14.4 |
| - Tokenizers: 0.21.1 |
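To reproduce these results, it may help to compare your local environment against the versions listed above. The following is a minimal sketch that uses only the standard library (`importlib.metadata` and `platform`). It assumes that the PyPI distribution names are `sentence-transformers`, `transformers`, `torch`, `accelerate`, `datasets` and `tokenizers`. A package that is not installed is reported as `None` instead of raising an error.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Framework Versions list in this card.
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "accelerate": "1.6.0",
    "datasets": "2.14.4",
    "tokenizers": "0.21.1",
}
EXPECTED_PYTHON = "3.11.12"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_environment():
    """Map each component to an (expected, installed) pair of version strings."""
    report = {"python": (EXPECTED_PYTHON, platform.python_version())}
    for name, expected in EXPECTED.items():
        report[name] = (expected, installed_version(name))
    return report


if __name__ == "__main__":
    for name, (expected, found) in compare_environment().items():
        status = "ok" if found == expected else "differs"
        print(f"{name:22} expected {expected:12} found {found!s:12} {status}")
```

The comparison uses exact string equality, so it is deliberately strict. For example, a PyTorch build without CUDA 12.4 reports a different local suffix (for instance `2.6.0` rather than `2.6.0+cu124`), and the script will flag it as "differs" even though the release is the same.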
|
|
| ## Citation |
|
|
| ### BibTeX |
|
|
| #### Sentence Transformers |
| ```bibtex |
| @inproceedings{reimers-2019-sentence-bert, |
| title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
| author = "Reimers, Nils and Gurevych, Iryna", |
| booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
| month = "11", |
| year = "2019", |
| publisher = "Association for Computational Linguistics", |
| url = "https://arxiv.org/abs/1908.10084", |
| } |
| ``` |
|
|
| #### TripletLoss |
| ```bibtex |
| @misc{hermans2017defense, |
| title={In Defense of the Triplet Loss for Person Re-Identification}, |
| author={Alexander Hermans and Lucas Beyer and Bastian Leibe}, |
| year={2017}, |
| eprint={1703.07737}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV} |
| } |
| ``` |
|
|