---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:79876
- loss:TripletLoss
base_model: Master-thesis-NAP/ModernBert-DAPT-math
widget:
- source_sentence: What is the error estimate for the difference between the exact
solution and the local oscillation decomposition (LOD) solution in terms of the
$L_0$ norm?
sentences:
- '\label{RL1}
The system \eqref{R3} has the following positive fixed points if $0 <\alpha\leq1$
and $b>d$
$$E^*=\left(\dfrac{d}{b}, \dfrac{(b-d) r}{b^2}\right)$$'
- "\\label{theo1d}\nWith the assumptions and setting is this section, the finite\
\ difference solution computed using the improved harmonic average method applied\
\ to \\eqn{eq1d} or \\eqn{eq1dB} has second order convergence in the infinity\
\ norm, that is,\n\\eqm\n \\|\\mathbf{E} \\|_{\\infty}\\le C h^2,\n\\enm\nassuming\
\ that the true solution of \\eqn{eq1d} is piecewise $C^4$ excluding the interface\
\ $\\alf$, that is, \n$u(x) \\in C^4(0,\\alf) \\cup C^4(\\alf,1)$. \n%where $C$\
\ is a generic error constant."
- "\\label{Corollary}\n Let Assumptions~\\ref{assum_1} and~\\ref{assump2} be\
\ satisfied. Let $u$ be the solution of~\\eqref{WeakForm} and let $u_{H,k}$ be\
\ the LOD solution of~\\eqref{local_probelm }. Then we have \n \\begin{equation}\\\
label{L2Estimate}\n \\|u-I_Hu_{H,k}\\|_0\\lesssim \\|u-I_Hu\\|_0+\\|u-u_{H,k}\\\
|_0 +H|u-u_{H,k}|_1.\n \\end{equation}\n %\\[\\|u-I_Hu_{H,k}\\|_0\\lesssim\
\ H |u|_1 +|u-u_{H,k}|_1.\\]"
- source_sentence: What is the expected value of the number of individuals in a Markov
branching process with non-homogeneous Poisson immigration (MBPNPI) at time $t=0$,
given that the immigration rate is $\lambda$?
sentences:
- '\label{lemma-sampling}
Fix an integer~$n\geq 1$.
Consider the initial configuration with one active particle on each
site of~$V_n$ and let the system evolve, with particles being killed
when they jump out of~$V_n$, until no active particle remains
in~$V_n$.
Then the distribution of the resulting stable configuration is exactly
the stationary distribution of the driven-dissipative Markov chain
on~$V_n$.
In particular, the number of sleeping particles remaining in~$V_n$ is
distributed as~$S_n$.'
- "The process $Y(t)$, $t\\geq 0,$ is called Markov branching process with\r\nnon-homogeneous\
\ Poisson immigration (MBPNPI)."
- "For any $\\lambda \\in(0,1)$ and $s \\in\\mathbb N$,\n \\begin{equation*}\n\\\
sum_{k=s}^{\\infty}\\binom {k}{s}\n(1-\\lambda)^{k-s}= \\lambda^{-s-1}.\n\\\
end{equation*}"
- source_sentence: Does the theorem imply that the rate of convergence of the sequence
$T_{m,j}(E)$ to $T_{m+k_n,j+k_n}(E)$ is exponential in the distance between $m$
and $j$, and that this rate is bounded by a constant $C$ times an exponential
decay factor involving the parameter $\gamma$?
sentences:
- "\\label{lem1}\n\t\tFor all $m,j\\in\\Z$, we have\n\t\t\\begin{equation*}\n\t\
\t|| T_{m,j} (E)-T_{m+k_n,j+k_n}(E)||\\leq C e^{-\\gamma k_n} e^{(\\mathcal\
\ L(E)+\\varepsilon) |m-j|}. \n\t\t\\end{equation*}"
- "[Divergence Theorem or Gauss-Green Theorem for Surfaces in $\\R^3$]\n\t\\label{thm:surface_int}\n\
\t Let $\\Sigma \\subset \\Omega\\subseteq\\R^3$ be a bounded smooth surface.\n\
\t Further, $\\bb a:\\Sigma\\to\\R^3$ is a continuously differentiable\
\ vector field that is either defined on the\n\t\t\t\t\tboundary $\\partial\\\
Sigma$ or has a bounded continuous extension to this boundary.\n\t Like\
\ in \\eqref{eq:decomp} it may be decomposed into tangential and normal components\n\
\t\t\t\t\tas follows $\\bb a = \\bb a^\\shortparallel + a_\\nu\\bs\\nu_\\Sigma$.\
\ By $\\dd l$ we denote the line element on \n\t\t\t\t\tthe curve $\\partial \\\
Sigma$. We assume that the curve is continuous and consists of finitely many\n\
\t\t\t\t\tsmooth pieces.\n\t Then the following divergence formula for\
\ surface integrals holds\n\t %\n\t \\begin{align}\n\t \
\ %\n\t \\int\\limits_\\Sigma \\left[\\nabla_\\Sigma\\cdot\\bb a^\\\
shortparallel\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\partial\\\
Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\,\\dd l .\n\
\t \\label{eq:surface_div}\n\t %\n\t \\end{align}\n\
\t\t\t\t\t%\n\t\t\t\t\tFrom this we obtain the formula\n\t\t\t\t\t%\n\t \
\ \\begin{align}\n\t %\n\t \\int\\limits_\\Sigma \\left[\\\
nabla_\\Sigma\\cdot\\bb a\\right](\\x)\\;\\dd S\n\t\t\t\t\t\t\t= \\int\\limits_{\\\
partial\\Sigma} \\left[\\bb a\\cdot\\bs\\nu_{\\partial\\Sigma}\\right](\\x)\\\
,\\dd l \n\t\t\t\t\t\t\t-\\int\\limits_\\Sigma\\left[ 2\\kappa_Ma_\\nu\\right](\\\
x)\\;\\dd S.\n\t \\label{eq:surface_div_2}\n\t %\n\t \
\ \\end{align}\n\t %"
- '\label{theo:helper3}
Assume that $\{\PP_N\}_{N\ge 1}$ is a sequence of probability measures that is
HT-appropriate in the sense of \cref{def:appropriate} and satisfies the LLN in
the sense of \cref{def:LLN}.
Let $(\kappa_n)_{n\ge 1}$ and $(m_n)_{n\ge 1}$ be the sequences that arise from
these definitions.
Moreover, assume that there exists a constant $C>0$ such that $|\kappa_n|\leq
C^n$, for all $n \geq 1$.
Then $(m_n)_{n\ge 1}$ is the sequence of moments of a unique probability measure
on $\R$.'
- source_sentence: What is the error estimate for the eigenfunction approximation
in terms of the weak eigenvalue and the norm of the difference between the exact
and approximate eigenfunctions?
sentences:
- "Consider dynamics \\eqref{avg} and define the corresponding average dynamics\
\ as $\\label{T-avg}\n\\mathring{\\chi} = \\epsilon h_{av}(\\chi)$, with the average\
\ function defined as\n\\begin{equation*} \nh_{av}(\\chi):=\\lim_{T \\to \\infty}\
\ \\frac{1}{T}\\int_{t}^{t+T} h(\\mu, \\chi, 0) d \\mu, \\ T>0,\n\\end{equation*}\n\
both \\eqref{avg} and \\eqref{T-avg} twice differentiable and bounded in every\
\ compact set of the $\\chi$-domain $\\mathcal{D} \\subset \\mathbb{R}^{3}$. \n\
%\nLet $\\chi(\\tau,\\epsilon)$ and $\\chi_{av}(\\epsilon\\tau)$ denote the solutions\
\ of \\eqref{avg} and \\eqref{T-avg}, respectively. If $\\chi_{av}(\\epsilon\\\
tau)\\in \\mathcal{D}$ for all $\\tau\\in[0,\\zeta/\\epsilon]$, $\\zeta\\geq 0$,\
\ and $\\chi(0,\\epsilon) - \\chi_{av}(0)=\\mathcal{O}(\\nu(\\epsilon))$, then\
\ there exists an $\\epsilon^{*}>0$ such that for all $0<\\epsilon<\\epsilon^{*}$,\
\ $\\chi(\\tau,\\epsilon)$ is well defined and\n$$\n\\chi(\\tau,\\epsilon) - \\\
chi_{av}(\\epsilon\\tau) = \\mathcal{O}(\\nu(\\epsilon)) \\ \\textnormal{on} \\\
\ \\tau \\in [0, \\zeta/\\epsilon],\n$$\nfor some function $\\nu\\in \\mathcal{K}$."
- "(\\cite{DangWangXieZhou})\\label{Theorem_Error_Estimate_k}\nLet us define the\
\ spectral projection $F_{k,h}^{(\\ell)}: V\\mapsto {\\rm span}\\{u_{1,h}^{(\\\
ell)}, \\cdots, u_{k,h}^{(\\ell)}\\}$ for any integer $\\ell \\geq 1$ as follows:\n\
\\begin{eqnarray*}\na(F_{k,h}^{(\\ell)}w, u_{i,h}^{(\\ell)}) = a(w, u_{i,h}^{(\\\
ell)}), \\ \\ \\ i=1, \\cdots, k\\ \\ {\\rm for}\\ w\\in V.\n\\end{eqnarray*}\n\
Then the exact eigenfunctions $\\bar u_{1,h},\\cdots, \\bar u_{k,h}$ of (\\ref{Weak_Eigenvalue_Discrete})\
\ and the eigenfunction approximations $u_{1,h}^{(\\ell+1)}$, $\\cdots$, $u_{k,h}^{(\\\
ell+1)}$ from Algorithm \\ref{Algorithm_k} with the integer $\\ell > 1$ have the\
\ following error estimate:\n\\begin{eqnarray*}\\label{Error_Estimate_Inverse}\n\
\ \\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_a \\leq\n\
\ \\bar\\lambda_{i,h} \\sqrt{1+\\frac{\\eta_a^2(V_H)}{\\bar\\lambda_{1,h}\\big(\\\
delta_{k,i,h}^{(\\ell+1)}\\big)^2}}\n\\left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\\
ell)}}\\right)\\eta_a^2(V_H)\\left\\|\\bar u_{i,h} - F_{k,h}^{(\\ell)}\\bar u_{i,h}\
\ \\right\\|_a,\n\\end{eqnarray*}\nwhere $\\delta_{k,i,h}^{(\\ell)} $ is defined\
\ as follows:\n\\begin{eqnarray*}\n\\delta_{k,i,h}^{(\\ell)} = \\min_{j\\not\\\
in \\{1, \\cdots, k\\}}\\left|\\frac{1}{\\lambda_{j,h}^{(\\ell)}}-\\frac{1}{\\\
bar\\lambda_{i,h}}\\right|,\\ \\ \\ i=1, \\cdots, k.\n\\end{eqnarray*}\nFurthermore,\
\ the following $\\left\\|\\cdot\\right\\|_b$-norm error estimate holds:\n\\begin{eqnarray*}\n\
\\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h} \\right\\|_b\\leq \n\\\
left(1+\\frac{\\bar\\mu_{1,h}}{\\delta_{k,i,h}^{(\\ell+1)}}\\right)\\eta_a(V_H)\
\ \\left\\|\\bar u_{i,h} -F_{k,h}^{(\\ell+1)}\\bar u_{i,h}\\right\\|_a.\n\\end{eqnarray*}"
- "\\big[{\\bf Condition $SD1(h)$}\\big]\\label{DefnSD1(h)}\n\nIn \\cite{MDL} an\
\ approximation order $O(h^s)$, as $h\\to 0$, is proved, where $h$ is the sampling\
\ distance. The achievable order $s$ is of course limited by the smoothness order\
\ of the boundaries of $Graph(F)$. Then, the order $s$ depends upon the degree\
\ of the polynomials used to approximate the boundary near the neighborhood of\
\ points of topology change and upon the degree of splines used at regular regions.\
\ \n\nFor example, let us view Step C of the approximation algorithm described\
\ in Section 5.2 of \\cite{MDL}. \nIt is assumed that the boundary curves are\
\ $C^{2k}$ smooth, and it is implicitly assumed that $h$ is small enough so that\
\ there are $2k$ sample points close to the point of topology change, for computing\
\ the polynomial $p_{2k-1}$ therein.\nThis condition is related to the more general\
\ condition $SD(h)$ and it can serve as a practical way of checking it for the\
\ case $d=1$. That is, near a point of topology change, we check whether there\
\ are enough sample points for applying the approximation algorithm in \\cite{MDL}.\
\ We denote this condition as the $SD1(h)$ condition."
- source_sentence: Does Werner-Young's inequality imply that the convolution of two
$L^p$ spaces is always $L^r$ for $1 < r < \infty$?
sentences:
- "$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1\
\ < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition.\
\ \n %"
- "A \\emph{bond system} is a tuple $(B,C,s,t,1,\\cdot)$, where $B$ is a set of\
\ \\emph{bonds}, $C$ is a set of \\emph{content} relations, and $s,t:C\\to B$\
\ are \\emph{source} and \\emph{target} functions. For $c\\in C$ with $s(c)=x$\
\ and $t(c)=y$, we write $x\\xrightarrow{c}y$ or $c:x\\to y$, indicating that\
\ $x$ \\emph{contains} $y$. Each bond $x\\in B$ has an \\emph{identity} containment\
\ $1_x:x\\to x$, meaning every bond trivially contains itself. For $c:x\\to y$\
\ and $c':y\\to z$, their composition is $cc':x\\to z$. These data must satisfy:\n\
\ \\begin{enumerate}\n \\item Identity laws: For each $c:x\\to y$, $1_x\
\ c= c=c1_y$\n \\item Associativity: For $c:x\\to y$, $c':y\\to z$, $c'':z\\\
to w$, $c(c'c'')=(cc')c''$\n \\item Anti-symmetry: For $c:x\\to y$ and\
\ $c':y\\to x$, $x=y$\n \\item Left cancellation: For $c,c':x\\to y$ and\
\ $c'':y\\to z$, if $cc''=c'c''$, then $c=c'$\n \\end{enumerate}"
- "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\\
in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$\
\ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\\
|_{\\cS^q}.\n\\end{align*}"
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT DAPT Embed DAPT Math
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: TESTING
type: TESTING
metrics:
- type: cosine_accuracy@1
value: 0.5679510844485464
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.6324411628980157
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.6586294416243654
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.6938163359483156
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.5679510844485464
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.36494385479157054
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.27741116751269035
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.18192201199815417
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.026541702012005317
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.048742014322369596
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.0598887341486898
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.07516536747041261
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.25320633940615317
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.6070309695944213
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.07416668442975916
name: Cosine Map@100
---
# ModernBERT DAPT Embed DAPT Math
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Master-thesis-NAP/ModernBert-DAPT-math](https://huggingface.co/Master-thesis-NAP/ModernBert-DAPT-math)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
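The modules above amount to mean pooling over token embeddings followed by L2 normalization. For illustration only, here is a minimal sketch that reproduces that pipeline with the plain `transformers` API; it assumes the checkpoint exposes standard ModernBERT weights, and loading through Sentence Transformers (below) remains the supported path.
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

def embed(texts: list[str]) -> torch.Tensor:
    batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
    # Pooling module: mean over non-padding tokens
    mask = batch["attention_mask"].unsqueeze(-1).type_as(token_embeddings)
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # Normalize module: unit-length vectors, so dot product equals cosine similarity
    return F.normalize(pooled, p=2, dim=1)
```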
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")
# Run inference
sentences = [
"Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?",
"[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align*}",
'$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition. \n %',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
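Because the output embeddings are unit-normalized, cosine similarity doubles as a retrieval score. A minimal semantic-search sketch follows; the corpus strings are hypothetical placeholders, not taken from the training data.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")

# Hypothetical corpus of theorem statements
corpus = [
    "\\label{thm:contraction}\nEvery contraction on a complete metric space has a unique fixed point.",
    "\\label{thm:heine_borel}\nA subset of $\\mathbb{R}^n$ is compact iff it is closed and bounded.",
]
query = "Does a contraction mapping on a complete metric space always have a fixed point?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.3f}", corpus[hit["corpus_id"]][:60])
```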
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `TESTING`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.568 |
| cosine_accuracy@3 | 0.6324 |
| cosine_accuracy@5 | 0.6586 |
| cosine_accuracy@10 | 0.6938 |
| cosine_precision@1 | 0.568 |
| cosine_precision@3 | 0.3649 |
| cosine_precision@5 | 0.2774 |
| cosine_precision@10 | 0.1819 |
| cosine_recall@1 | 0.0265 |
| cosine_recall@3 | 0.0487 |
| cosine_recall@5 | 0.0599 |
| cosine_recall@10 | 0.0752 |
| **cosine_ndcg@10** | **0.2532** |
| cosine_mrr@10 | 0.607 |
| cosine_map@100 | 0.0742 |
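The same evaluator can be re-run against any query/corpus split. A toy sketch under stated assumptions: the queries, corpus, and relevance judgments below are hypothetical stand-ins, since the actual `TESTING` split is not included in this card.
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")

# Hypothetical evaluation data: query ids -> text, doc ids -> text, query ids -> relevant doc ids
queries = {"q1": "Does a contraction on a complete metric space have a unique fixed point?"}
corpus = {
    "d1": "Every contraction on a complete metric space has a unique fixed point.",
    "d2": "A bounded monotone sequence of real numbers converges.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="TESTING",
)
results = evaluator(model)  # dict keyed like "TESTING_cosine_ndcg@10"
print(results)
```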
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 79,876 training samples
* Columns: anchor, positive, and negative
* Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string | string |
* Samples:
| anchor | positive | negative |
|:-------|:---------|:---------|
| <code>What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$?</code> | <code>Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$.<br>Then<br>\begin{equation}<br>0 \leq 3g_n -2n \leq 4<br>\label{star}<br>\end{equation}<br>for all $n$, and hence<br>$\lim_{n \rightarrow \infty} g_n/n = 2/3$.<br>\label{thm1}</code> | <code>\label{thm:bounds_initial}<br>Let $\seqq{s}$ be a sequence of rank $r$ for which the roots of the characteristic polynomial are all different. Then, for any positive integer $M$, the rank of $\seq{s^M}$ is at most<br>\begin{align*}<br>\rank s^M \leq \binom{M+r-1}{M}.<br>\end{align*}</code> |
| <code>Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$?</code> | <code>\label{ThmConjAreTrue}<br>Conjectures \ref{Conj1} and \ref{Conj2} are true.<br>As a consequence,<br>if either $d=s \geq 1$ or $d \geq 2s+1 \geq 3$,<br>the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is equal to $g(d,s)$.</code> | <code>[{\cite[Corollary 2.2.2 with $p=3$]{BSY}}]<br>Let $S$ be a non-trivial Severi-Brauer surface over a perfect field $\textbf{k}$. Then $S$ does not contain points of degree $d$, where $d$ is not divisible by $3$. On the other hand $S$ contains a point of degree $3$.</code> |
| <code>\\emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?}</code> | <code>}<br>\newcommand{\ep}{</code> | <code>\label{prop:coherence}<br>If $X$ is a qcqs scheme, then $RX$ is coherent in the sense that the set of quasi-compact open subsets of $RX$ is closed under finite intersections and forms a basis for the topology of $RX$.</code> |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.COSINE",
"triplet_margin": 0.1
}
```
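For reference, a sketch of how this loss would be constructed in Sentence Transformers, matching the parameters above:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("Master-thesis-NAP/ModernBert-DAPT-math")

# Cosine distance with a 0.1 margin: each negative must end up at least
# 0.1 further from the anchor than the corresponding positive
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.COSINE,
    triplet_margin=0.1,
)
```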
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
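Expressed as `SentenceTransformerTrainingArguments`, these settings would look roughly as follows; the `output_dir` is a hypothetical placeholder.
```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="ModernBERT-DAPT-Embed-DAPT-Math",  # hypothetical placeholder
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,  # effective train batch size of 16 * 8 = 128 per device
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate anchors within a batch
)
```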
#### All Hyperparameters