tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:369762
- loss:CachedMultipleNegativesRankingLoss
base_model: benjamintli/modernbert-cosqa
widget:
- source_sentence: Return a Python AST node for `recur` occurring inside a `loop`.
sentences:
- |-
def _reset(self, name=None):
"""Revert specified property to default value
If no property is specified, all properties are returned to default.
"""
if name is None:
for key in self._props:
if isinstance(self._props[key], basic.Property):
self._reset(key)
return
if name not in self._props:
raise AttributeError("Input name '{}' is not a known "
"property or attribute".format(name))
if not isinstance(self._props[name], basic.Property):
raise AttributeError("Cannot reset GettableProperty "
"'{}'".format(name))
if name in self._defaults:
val = self._defaults[name]
else:
val = self._props[name].default
if callable(val):
val = val()
setattr(self, name, val)
- |-
def cancel(self):
'''
Cancel a running workflow.
Args:
None
Returns:
None
'''
if not self.id:
raise WorkflowError('Workflow is not running. Cannot cancel.')
if self.batch_values:
self.workflow.batch_workflow_cancel(self.id)
else:
self.workflow.cancel(self.id)
- >-
def __loop_recur_to_py_ast(ctx: GeneratorContext, node: Recur) ->
GeneratedPyAST:
"""Return a Python AST node for `recur` occurring inside a `loop`."""
assert node.op == NodeOp.RECUR
recur_deps: List[ast.AST] = []
recur_targets: List[ast.Name] = []
recur_exprs: List[ast.AST] = []
for name, expr in zip(ctx.recur_point.binding_names, node.exprs):
expr_ast = gen_py_ast(ctx, expr)
recur_deps.extend(expr_ast.dependencies)
recur_targets.append(ast.Name(id=name, ctx=ast.Store()))
recur_exprs.append(expr_ast.node)
if len(recur_targets) == 1:
assert len(recur_exprs) == 1
recur_deps.append(ast.Assign(targets=recur_targets, value=recur_exprs[0]))
else:
recur_deps.append(
ast.Assign(
targets=[ast.Tuple(elts=recur_targets, ctx=ast.Store())],
value=ast.Tuple(elts=recur_exprs, ctx=ast.Load()),
)
)
recur_deps.append(ast.Continue())
return GeneratedPyAST(node=ast.NameConstant(None), dependencies=recur_deps)
- source_sentence: |-
Create a :class:`~turicreate.linear_regression.LinearRegression` to
predict a scalar target variable as a linear function of one or more
features. In addition to standard numeric and categorical types, features
can also be extracted automatically from list- or dictionary-type SFrame
columns.
The linear regression module can be used for ridge regression, Lasso, and
elastic net regression (see References for more detail on these methods). By
default, this model has an l2 regularization weight of 0.01.
Parameters
----------
dataset : SFrame
The dataset to use for training the model.
target : string
Name of the column containing the target variable.
features : list[string], optional
Names of the columns containing features. 'None' (the default) indicates
that all columns except the target variable should be used as features.
The features are columns in the input SFrame that can be of the
following types:
- *Numeric*: values of numeric type integer or float.
- *Categorical*: values of type string.
- *Array*: list of numeric (integer or float) values. Each list element
is treated as a separate feature in the model.
- *Dictionary*: key-value pairs with numeric (integer or float) values
Each key of a dictionary is treated as a separate feature and the
value in the dictionary corresponds to the value of the feature.
Dictionaries are ideal for representing sparse data.
Columns of type *list* are not supported. Convert such feature
columns to type array if all entries in the list are of numeric
types. If the lists contain data of mixed types, separate
them out into different columns.
l2_penalty : float, optional
Weight on the l2-regularizer of the model. The larger this weight, the
more the model coefficients shrink toward 0. This introduces bias into
the model but decreases variance, potentially leading to better
predictions. The default value is 0.01; setting this parameter to 0
corresponds to unregularized linear regression. See the ridge
regression reference for more detail.
l1_penalty : float, optional
Weight on l1 regularization of the model. Like the l2 penalty, the
higher the l1 penalty, the more the estimated coefficients shrink toward
0. The l1 penalty, however, completely zeros out sufficiently small
coefficients, automatically indicating features that are not useful for
the model. The default weight of 0 prevents any features from being
discarded. See the LASSO regression reference for more detail.
solver : string, optional
Solver to use for training the model. See the references for more detail
on each solver.
- *auto (default)*: automatically chooses the best solver for the data
and model parameters.
- *newton*: Newton-Raphson
- *lbfgs*: limited memory BFGS
- *fista*: accelerated gradient descent
The model is trained using a carefully engineered collection of methods
that are automatically picked based on the input data. The ``newton``
method works best for datasets with plenty of examples and few features
(long datasets). Limited memory BFGS (``lbfgs``) is a robust solver for
wide datasets (i.e datasets with many coefficients). ``fista`` is the
default solver for l1-regularized linear regression. The solvers are
all automatically tuned and the default options should function well.
See the solver options guide for setting additional parameters for each
of the solvers.
See the user guide for additional details on how the solver is chosen.
feature_rescaling : boolean, optional
Feature rescaling is an important pre-processing step that ensures that
all features are on the same scale. An l2-norm rescaling is performed
to make sure that all features are of the same norm. Categorical
features are also rescaled by rescaling the dummy variables that are
used to represent them. The coefficients are returned in original scale
of the problem. This process is particularly useful when features
vary widely in their ranges.
validation_set : SFrame, optional
A dataset for monitoring the model's generalization performance.
For each row of the progress table, the chosen metrics are computed
for both the provided training dataset and the validation_set. The
format of this SFrame must be the same as the training set.
By default this argument is set to 'auto' and a validation set is
automatically sampled and used for progress printing. If
validation_set is set to None, then no additional metrics
are computed. The default value is 'auto'.
convergence_threshold : float, optional
Convergence is tested using variation in the training objective. The
variation in the training objective is calculated using the difference
between the objective values between two steps. Consider reducing this
below the default value (0.01) for a more accurately trained model.
Beware of overfitting (i.e a model that works well only on the training
data) if this parameter is set to a very low value.
lbfgs_memory_level : int, optional
The L-BFGS algorithm keeps track of gradient information from the
previous ``lbfgs_memory_level`` iterations. The storage requirement for
each of these gradients is the ``num_coefficients`` in the problem.
Increasing the ``lbfgs_memory_level`` can help improve the quality of
the model trained. Setting this to more than ``max_iterations`` has the
same effect as setting it to ``max_iterations``.
max_iterations : int, optional
The maximum number of allowed passes through the data. More passes over
the data can result in a more accurately trained model. Consider
increasing this (the default value is 10) if the training accuracy is
low and the *Grad-Norm* in the display is large.
step_size : float, optional (fista only)
The starting step size to use for the ``fista`` and ``gd`` solvers. The
default is set to 1.0, this is an aggressive setting. If the first
iteration takes a considerable amount of time, reducing this parameter
may speed up model training.
verbose : bool, optional
If True, print progress updates.
Returns
-------
out : LinearRegression
A trained model of type
:class:`~turicreate.linear_regression.LinearRegression`.
See Also
--------
LinearRegression, turicreate.boosted_trees_regression.BoostedTreesRegression, turicreate.regression.create
Notes
-----
- Categorical variables are encoded by creating dummy variables. For a
variable with :math:`K` categories, the encoding creates :math:`K-1` dummy
variables, while the first category encountered in the data is used as the
baseline.
- For prediction and evaluation of linear regression models with sparse
dictionary inputs, new keys/columns that were not seen during training
are silently ignored.
- Any 'None' values in the data will result in an error being thrown.
- A constant term is automatically added for the model intercept. This term
is not regularized.
- Standard errors on coefficients are only available when `solver=newton`
or when the default `auto` solver option chooses the newton method and if
the number of examples in the training data is more than the number of
coefficients. If standard errors cannot be estimated, a column of `None`
values are returned.
References
----------
- Hoerl, A.E. and Kennard, R.W. (1970) `Ridge regression: Biased Estimation
for Nonorthogonal Problems
<http://amstat.tandfonline.com/doi/abs/10.1080/00401706.1970.10488634>`_.
Technometrics 12(1) pp.55-67
- Tibshirani, R. (1996) `Regression Shrinkage and Selection via the Lasso <h
ttp://www.jstor.org/discover/10.2307/2346178?uid=3739256&uid=2&uid=4&sid=2
1104169934983>`_. Journal of the Royal Statistical Society. Series B
(Methodological) 58(1) pp.267-288.
- Zhu, C., et al. (1997) `Algorithm 778: L-BFGS-B: Fortran subroutines for
large-scale bound-constrained optimization
<https://dl.acm.org/citation.cfm?id=279236>`_. ACM Transactions on
Mathematical Software 23(4) pp.550-560.
- Barzilai, J. and Borwein, J. `Two-Point Step Size Gradient Methods
<http://imajna.oxfordjournals.org/content/8/1/141.short>`_. IMA Journal of
Numerical Analysis 8(1) pp.141-148.
- Beck, A. and Teboulle, M. (2009) `A Fast Iterative Shrinkage-Thresholding
Algorithm for Linear Inverse Problems
<http://epubs.siam.org/doi/abs/10.1137/080716542>`_. SIAM Journal on
Imaging Sciences 2(1) pp.183-202.
- Zhang, T. (2004) `Solving large scale linear prediction problems using
stochastic gradient descent algorithms
<https://dl.acm.org/citation.cfm?id=1015332>`_. ICML '04: Proceedings of
the twenty-first international conference on Machine learning p.116.
Examples
--------
Given an :class:`~turicreate.SFrame` ``sf`` with a list of columns
[``feature_1`` ... ``feature_K``] denoting features and a target column
``target``, we can create a
:class:`~turicreate.linear_regression.LinearRegression` as follows:
>>> data = turicreate.SFrame('https://static.turi.com/datasets/regression/houses.csv')
>>> model = turicreate.linear_regression.create(data, target='price',
... features=['bath', 'bedroom', 'size'])
For ridge regression, we can set the ``l2_penalty`` parameter higher (the
default is 0.01). For Lasso regression, we set the l1_penalty higher, and
for elastic net, we set both to be higher.
.. sourcecode:: python
# Ridge regression
>>> model_ridge = turicreate.linear_regression.create(data, 'price', l2_penalty=0.1)
# Lasso
>>> model_lasso = turicreate.linear_regression.create(data, 'price', l2_penalty=0.,
l1_penalty=1.0)
# Elastic net regression
>>> model_enet = turicreate.linear_regression.create(data, 'price', l2_penalty=0.5,
l1_penalty=0.5)
sentences:
- >-
def create(dataset, target, features=None, l2_penalty=1e-2,
l1_penalty=0.0,
solver='auto', feature_rescaling=True,
convergence_threshold = _DEFAULT_SOLVER_OPTIONS['convergence_threshold'],
step_size = _DEFAULT_SOLVER_OPTIONS['step_size'],
lbfgs_memory_level = _DEFAULT_SOLVER_OPTIONS['lbfgs_memory_level'],
max_iterations = _DEFAULT_SOLVER_OPTIONS['max_iterations'],
validation_set = "auto",
verbose=True):
"""
Create a :class:`~turicreate.linear_regression.LinearRegression` to
predict a scalar target variable as a linear function of one or more
features. In addition to standard numeric and categorical types, features
can also be extracted automatically from list- or dictionary-type SFrame
columns.
The linear regression module can be used for ridge regression, Lasso, and
elastic net regression (see References for more detail on these methods). By
default, this model has an l2 regularization weight of 0.01.
Parameters
----------
dataset : SFrame
The dataset to use for training the model.
target : string
Name of the column containing the target variable.
features : list[string], optional
Names of the columns containing features. 'None' (the default) indicates
that all columns except the target variable should be used as features.
The features are columns in the input SFrame that can be of the
following types:
- *Numeric*: values of numeric type integer or float.
- *Categorical*: values of type string.
- *Array*: list of numeric (integer or float) values. Each list element
is treated as a separate feature in the model.
- *Dictionary*: key-value pairs with numeric (integer or float) values
Each key of a dictionary is treated as a separate feature and the
value in the dictionary corresponds to the value of the feature.
Dictionaries are ideal for representing sparse data.
Columns of type *list* are not supported. Convert such feature
columns to type array if all entries in the list are of numeric
types. If the lists contain data of mixed types, separate
them out into different columns.
l2_penalty : float, optional
Weight on the l2-regularizer of the model. The larger this weight, the
more the model coefficients shrink toward 0. This introduces bias into
the model but decreases variance, potentially leading to better
predictions. The default value is 0.01; setting this parameter to 0
corresponds to unregularized linear regression. See the ridge
regression reference for more detail.
l1_penalty : float, optional
Weight on l1 regularization of the model. Like the l2 penalty, the
higher the l1 penalty, the more the estimated coefficients shrink toward
0. The l1 penalty, however, completely zeros out sufficiently small
coefficients, automatically indicating features that are not useful for
the model. The default weight of 0 prevents any features from being
discarded. See the LASSO regression reference for more detail.
solver : string, optional
Solver to use for training the model. See the references for more detail
on each solver.
- *auto (default)*: automatically chooses the best solver for the data
and model parameters.
- *newton*: Newton-Raphson
- *lbfgs*: limited memory BFGS
- *fista*: accelerated gradient descent
The model is trained using a carefully engineered collection of methods
that are automatically picked based on the input data. The ``newton``
method works best for datasets with plenty of examples and few features
(long datasets). Limited memory BFGS (``lbfgs``) is a robust solver for
wide datasets (i.e datasets with many coefficients). ``fista`` is the
default solver for l1-regularized linear regression. The solvers are
all automatically tuned and the default options should function well.
See the solver options guide for setting additional parameters for each
of the solvers.
See the user guide for additional details on how the solver is chosen.
feature_rescaling : boolean, optional
Feature rescaling is an important pre-processing step that ensures that
all features are on the same scale. An l2-norm rescaling is performed
to make sure that all features are of the same norm. Categorical
features are also rescaled by rescaling the dummy variables that are
used to represent them. The coefficients are returned in original scale
of the problem. This process is particularly useful when features
vary widely in their ranges.
validation_set : SFrame, optional
A dataset for monitoring the model's generalization performance.
For each row of the progress table, the chosen metrics are computed
for both the provided training dataset and the validation_set. The
format of this SFrame must be the same as the training set.
By default this argument is set to 'auto' and a validation set is
automatically sampled and used for progress printing. If
validation_set is set to None, then no additional metrics
are computed. The default value is 'auto'.
convergence_threshold : float, optional
Convergence is tested using variation in the training objective. The
variation in the training objective is calculated using the difference
between the objective values between two steps. Consider reducing this
below the default value (0.01) for a more accurately trained model.
Beware of overfitting (i.e a model that works well only on the training
data) if this parameter is set to a very low value.
lbfgs_memory_level : int, optional
The L-BFGS algorithm keeps track of gradient information from the
previous ``lbfgs_memory_level`` iterations. The storage requirement for
each of these gradients is the ``num_coefficients`` in the problem.
Increasing the ``lbfgs_memory_level`` can help improve the quality of
the model trained. Setting this to more than ``max_iterations`` has the
same effect as setting it to ``max_iterations``.
max_iterations : int, optional
The maximum number of allowed passes through the data. More passes over
the data can result in a more accurately trained model. Consider
increasing this (the default value is 10) if the training accuracy is
low and the *Grad-Norm* in the display is large.
step_size : float, optional (fista only)
The starting step size to use for the ``fista`` and ``gd`` solvers. The
default is set to 1.0, this is an aggressive setting. If the first
iteration takes a considerable amount of time, reducing this parameter
may speed up model training.
verbose : bool, optional
If True, print progress updates.
Returns
-------
out : LinearRegression
A trained model of type
:class:`~turicreate.linear_regression.LinearRegression`.
See Also
--------
LinearRegression, turicreate.boosted_trees_regression.BoostedTreesRegression, turicreate.regression.create
Notes
-----
- Categorical variables are encoded by creating dummy variables. For a
variable with :math:`K` categories, the encoding creates :math:`K-1` dummy
variables, while the first category encountered in the data is used as the
baseline.
- For prediction and evaluation of linear regression models with sparse
dictionary inputs, new keys/columns that were not seen during training
are silently ignored.
- Any 'None' values in the data will result in an error being thrown.
- A constant term is automatically added for the model intercept. This term
is not regularized.
- Standard errors on coefficients are only available when `solver=newton`
or when the default `auto` solver option chooses the newton method and if
the number of examples in the training data is more than the number of
coefficients. If standard errors cannot be estimated, a column of `None`
values are returned.
References
----------
- Hoerl, A.E. and Kennard, R.W. (1970) `Ridge regression: Biased Estimation
for Nonorthogonal Problems
<http://amstat.tandfonline.com/doi/abs/10.1080/00401706.1970.10488634>`_.
Technometrics 12(1) pp.55-67
- Tibshirani, R. (1996) `Regression Shrinkage and Selection via the Lasso <h
ttp://www.jstor.org/discover/10.2307/2346178?uid=3739256&uid=2&uid=4&sid=2
1104169934983>`_. Journal of the Royal Statistical Society. Series B
(Methodological) 58(1) pp.267-288.
- Zhu, C., et al. (1997) `Algorithm 778: L-BFGS-B: Fortran subroutines for
large-scale bound-constrained optimization
<https://dl.acm.org/citation.cfm?id=279236>`_. ACM Transactions on
Mathematical Software 23(4) pp.550-560.
- Barzilai, J. and Borwein, J. `Two-Point Step Size Gradient Methods
<http://imajna.oxfordjournals.org/content/8/1/141.short>`_. IMA Journal of
Numerical Analysis 8(1) pp.141-148.
- Beck, A. and Teboulle, M. (2009) `A Fast Iterative Shrinkage-Thresholding
Algorithm for Linear Inverse Problems
<http://epubs.siam.org/doi/abs/10.1137/080716542>`_. SIAM Journal on
Imaging Sciences 2(1) pp.183-202.
- Zhang, T. (2004) `Solving large scale linear prediction problems using
stochastic gradient descent algorithms
<https://dl.acm.org/citation.cfm?id=1015332>`_. ICML '04: Proceedings of
the twenty-first international conference on Machine learning p.116.
Examples
--------
Given an :class:`~turicreate.SFrame` ``sf`` with a list of columns
[``feature_1`` ... ``feature_K``] denoting features and a target column
``target``, we can create a
:class:`~turicreate.linear_regression.LinearRegression` as follows:
>>> data = turicreate.SFrame('https://static.turi.com/datasets/regression/houses.csv')
>>> model = turicreate.linear_regression.create(data, target='price',
... features=['bath', 'bedroom', 'size'])
For ridge regression, we can set the ``l2_penalty`` parameter higher (the
default is 0.01). For Lasso regression, we set the l1_penalty higher, and
for elastic net, we set both to be higher.
.. sourcecode:: python
# Ridge regression
>>> model_ridge = turicreate.linear_regression.create(data, 'price', l2_penalty=0.1)
# Lasso
>>> model_lasso = turicreate.linear_regression.create(data, 'price', l2_penalty=0.,
l1_penalty=1.0)
# Elastic net regression
>>> model_enet = turicreate.linear_regression.create(data, 'price', l2_penalty=0.5,
l1_penalty=0.5)
"""
# Regression model names.
model_name = "regression_linear_regression"
solver = solver.lower()
model = _sl.create(dataset, target, model_name, features=features,
validation_set = validation_set,
solver = solver, verbose = verbose,
l2_penalty=l2_penalty, l1_penalty = l1_penalty,
feature_rescaling = feature_rescaling,
convergence_threshold = convergence_threshold,
step_size = step_size,
lbfgs_memory_level = lbfgs_memory_level,
max_iterations = max_iterations)
return LinearRegression(model.__proxy__)
- |-
def restore(self) -> None:
"""
Restore the backed-up (non-average) parameter values.
"""
for name, parameter in self._parameters:
parameter.data.copy_(self._backups[name])
- |-
def _get_sdict(self, env):
"""
Returns a dictionary mapping all of the source suffixes of all
src_builders of this Builder to the underlying Builder that
should be called first.
This dictionary is used for each target specified, so we save a
lot of extra computation by memoizing it for each construction
environment.
Note that this is re-computed each time, not cached, because there
might be changes to one of our source Builders (or one of their
source Builders, and so on, and so on...) that we can't "see."
The underlying methods we call cache their computed values,
though, so we hope repeatedly aggregating them into a dictionary
like this won't be too big a hit. We may need to look for a
better way to do this if performance data show this has turned
into a significant bottleneck.
"""
sdict = {}
for bld in self.get_src_builders(env):
for suf in bld.src_suffixes(env):
sdict[suf] = bld
return sdict
- source_sentence: Traverse the tree below node looking for 'yield [expr]'.
sentences:
- |-
def retrieve_sources():
"""Retrieve sources using spectool
"""
spectool = find_executable('spectool')
if not spectool:
log.warn('spectool is not installed')
return
try:
specfile = spec_fn()
except Exception:
return
cmd = [spectool, "-g", specfile]
output = subprocess.check_output(' '.join(cmd), shell=True)
log.warn(output)
- "def check_subscription(self, request):\n\t\t\"\"\"Redirect to the subscribe page if the user lacks an active subscription.\"\"\"\n\t\tsubscriber = subscriber_request_callback(request)\n\n\t\tif not subscriber_has_active_subscription(subscriber):\n\t\t\tif not SUBSCRIPTION_REDIRECT:\n\t\t\t\traise ImproperlyConfigured(\"DJSTRIPE_SUBSCRIPTION_REDIRECT is not set.\")\n\t\t\treturn redirect(SUBSCRIPTION_REDIRECT)"
- |-
def is_generator(self, node):
"""Traverse the tree below node looking for 'yield [expr]'."""
results = {}
if self.yield_expr.match(node, results):
return True
for child in node.children:
if child.type not in (syms.funcdef, syms.classdef):
if self.is_generator(child):
return True
return False
- source_sentence: >-
Retrieves the content of an input given a DataSource. The input acts like
a filter over the outputs of the DataSource.
Args:
name (str): The name of the input.
ds (openflow.DataSource): The DataSource that will feed the data.
Returns:
pandas.DataFrame: The content of the input.
sentences:
- |-
def valid_state(state: str) -> bool:
"""Validate State Argument
Checks that either 'on' or 'off' was entered as an argument to the
CLI and make it lower case.
:param state: state to validate.
:returns: True if state is valid.
.. versionchanged:: 0.0.12
This method was renamed from validateState to valid_state to conform
to PEP-8. Also removed "magic" text for state and instead reference the
_VALID_STATES constant.
"""
lower_case_state = state.lower()
if lower_case_state in _VALID_STATES:
return True
return False
- |-
def get_input(self, name, ds):
"""
Retrieves the content of an input given a DataSource. The input acts like a filter over the outputs of the DataSource.
Args:
name (str): The name of the input.
ds (openflow.DataSource): The DataSource that will feed the data.
Returns:
pandas.DataFrame: The content of the input.
"""
columns = self.inputs.get(name)
df = ds.get_dataframe()
# set defaults
for column in columns:
if column not in df.columns:
df[column] = self.defaults.get(column)
return df[columns]
- |-
def get_scenario_data(scenario_id,**kwargs):
"""
Get all the datasets from the group with the specified name
@returns a list of dictionaries
"""
user_id = kwargs.get('user_id')
scenario_data = db.DBSession.query(Dataset).filter(Dataset.id==ResourceScenario.dataset_id, ResourceScenario.scenario_id==scenario_id).options(joinedload_all('metadata')).distinct().all()
for sd in scenario_data:
if sd.hidden == 'Y':
try:
sd.check_read_permission(user_id)
except:
sd.value = None
sd.metadata = []
db.DBSession.expunge_all()
log.info("Retrieved %s datasets", len(scenario_data))
return scenario_data
- source_sentence: |-
Split the data object along a given expression, in units.
Parameters
----------
expression : int or str
The expression to split along. If given as an integer, the axis at that index
is used.
positions : number-type or 1D array-type
The position(s) to split at, in units.
units : str (optional)
The units of the given positions. Default is same, which assumes
input units are identical to first variable units.
parent : WrightTools.Collection (optional)
The parent collection in which to place the 'split' collection.
Default is a new Collection.
verbose : bool (optional)
Toggle talkback. Default is True.
Returns
-------
WrightTools.collection.Collection
A Collection of data objects.
The order of the objects is such that the axis points retain their original order.
See Also
--------
chop
Divide the dataset into its lower-dimensionality components.
collapse
Collapse the dataset along one axis.
sentences:
- >-
def add_item(self, title, key, synonyms=None, description=None,
img_url=None):
"""Adds item to a list or carousel card.
A list must contain at least 2 items, each requiring a title and object key.
Arguments:
title {str} -- Name of the item object
key {str} -- Key referring to the item.
This string will be used to send a query to your app if selected
Keyword Arguments:
synonyms {list} -- Words and phrases the user may send to select the item
(default: {None})
description {str} -- A description of the item (default: {None})
img_url {str} -- URL of the image to represent the item (default: {None})
"""
item = build_item(title, key, synonyms, description, img_url)
self._items.append(item)
return self
- |-
def compare(a, b):
"""Compares two timestamps.
``a`` and ``b`` must be the same type, in addition to normal
representations of timestamps that order naturally, they can be rfc3339
formatted strings.
Args:
a (string|object): a timestamp
b (string|object): another timestamp
Returns:
int: -1 if a < b, 0 if a == b or 1 if a > b
Raises:
ValueError: if a or b are not the same type
ValueError: if a or b strings but not in valid rfc3339 format
"""
a_is_text = isinstance(a, basestring)
b_is_text = isinstance(b, basestring)
if type(a) != type(b) and not (a_is_text and b_is_text):
_logger.error(u'Cannot compare %s to %s, types differ %s!=%s',
a, b, type(a), type(b))
raise ValueError(u'cannot compare inputs of differing types')
if a_is_text:
a = from_rfc3339(a, with_nanos=True)
b = from_rfc3339(b, with_nanos=True)
if a < b:
return -1
elif a > b:
return 1
else:
return 0
- |-
def split(
self, expression, positions, *, units=None, parent=None, verbose=True
) -> wt_collection.Collection:
"""
Split the data object along a given expression, in units.
Parameters
----------
expression : int or str
The expression to split along. If given as an integer, the axis at that index
is used.
positions : number-type or 1D array-type
The position(s) to split at, in units.
units : str (optional)
The units of the given positions. Default is same, which assumes
input units are identical to first variable units.
parent : WrightTools.Collection (optional)
The parent collection in which to place the 'split' collection.
Default is a new Collection.
verbose : bool (optional)
Toggle talkback. Default is True.
Returns
-------
WrightTools.collection.Collection
A Collection of data objects.
The order of the objects is such that the axis points retain their original order.
See Also
--------
chop
Divide the dataset into its lower-dimensionality components.
collapse
Collapse the dataset along one axis.
"""
# axis ------------------------------------------------------------------------------------
old_expr = self.axis_expressions
old_units = self.units
out = wt_collection.Collection(name="split", parent=parent)
if isinstance(expression, int):
if units is None:
units = self._axes[expression].units
expression = self._axes[expression].expression
elif isinstance(expression, str):
pass
else:
raise TypeError("expression: expected {int, str}, got %s" % type(expression))
self.transform(expression)
if units:
self.convert(units)
try:
positions = [-np.inf] + sorted(list(positions)) + [np.inf]
except TypeError:
positions = [-np.inf, positions, np.inf]
values = self._axes[0].full
masks = [(values >= lo) & (values < hi) for lo, hi in wt_kit.pairwise(positions)]
omasks = []
cuts = []
for mask in masks:
try:
omasks.append(wt_kit.mask_reduce(mask))
cuts.append([i == 1 for i in omasks[-1].shape])
# Ensure at least one axis is kept
if np.all(cuts[-1]):
cuts[-1][0] = False
except ValueError:
omasks.append(None)
cuts.append(None)
for i in range(len(positions) - 1):
out.create_data("split%03i" % i)
for var in self.variables:
for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):
if omask is None:
# Zero length split
continue
omask = wt_kit.enforce_mask_shape(omask, var.shape)
omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])
out_arr = np.full(omask.shape, np.nan)
imask = wt_kit.enforce_mask_shape(imask, var.shape)
out_arr[omask] = var[:][imask]
out[i].create_variable(values=out_arr, **var.attrs)
for ch in self.channels:
for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):
if omask is None:
# Zero length split
continue
omask = wt_kit.enforce_mask_shape(omask, ch.shape)
omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])
out_arr = np.full(omask.shape, np.nan)
imask = wt_kit.enforce_mask_shape(imask, ch.shape)
out_arr[omask] = ch[:][imask]
out[i].create_channel(values=out_arr, **ch.attrs)
if verbose:
for d in out.values():
try:
d.transform(expression)
except IndexError:
continue
print("split data into {0} pieces along <{1}>:".format(len(positions) - 1, expression))
for i, (lo, hi) in enumerate(wt_kit.pairwise(positions)):
new_data = out[i]
if new_data.shape == ():
print(" {0} : None".format(i))
else:
new_axis = new_data.axes[0]
print(
" {0} : {1:0.2f} to {2:0.2f} {3} {4}".format(
i, lo, hi, new_axis.units, new_axis.shape
)
)
for d in out.values():
try:
d.transform(*old_expr)
keep = []
keep_units = []
for ax in d.axes:
if ax.size > 1:
keep.append(ax.expression)
keep_units.append(ax.units)
else:
d.create_constant(ax.expression, verbose=False)
d.transform(*keep)
for ax, u in zip(d.axes, keep_units):
ax.convert(u)
except IndexError:
continue
tempax = Axis(d, expression)
if all(
np.all(
np.sum(~np.isnan(tempax.masked), axis=tuple(set(range(tempax.ndim)) - {j}))
<= 1
)
for j in range(tempax.ndim)
):
d.create_constant(expression, verbose=False)
self.transform(*old_expr)
for ax, u in zip(self.axes, old_units):
ax.convert(u)
return out
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on benjamintli/modernbert-cosqa
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: eval
type: eval
metrics:
- type: cosine_accuracy@1
value: 0.9480526153529956
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.9703010995786662
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9751824067413422
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9806803000719351
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9480526153529956
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.32343369985955533
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19503648134826843
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09806803000719352
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9480526153529956
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9703010995786662
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9751824067413422
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9806803000719351
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9652143122800294
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9601788099886978
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9606213024321194
name: Cosine Map@100
SentenceTransformer based on benjamintli/modernbert-cosqa
This is a sentence-transformers model finetuned from benjamintli/modernbert-cosqa. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: benjamintli/modernbert-cosqa
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
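The Pooling module above sets `pooling_mode_mean_tokens` to True, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, assuming padding positions are excluded through an attention mask. The function name `mean_pool` and the toy values are illustrative and not part of the library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    # Keep only the vectors whose mask entry marks a real token.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy example: 3 tokens of dimension 2; the last token is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

In the real model the same averaging runs over 768-dimensional token vectors, which is why the output dimensionality matches the word embedding dimension.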
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("modernbert-codesearchnet")
# Run inference
queries = [
"Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \u0027split\u0027 collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.",
]
documents = [
'def split(\n self, expression, positions, *, units=None, parent=None, verbose=True\n ) -> wt_collection.Collection:\n """\n Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \'split\' collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.\n """\n # axis ------------------------------------------------------------------------------------\n old_expr = self.axis_expressions\n old_units = self.units\n out = wt_collection.Collection(name="split", parent=parent)\n if isinstance(expression, int):\n if units is None:\n units = self._axes[expression].units\n expression = self._axes[expression].expression\n elif isinstance(expression, str):\n pass\n else:\n raise TypeError("expression: expected {int, str}, got %s" % type(expression))\n\n self.transform(expression)\n if units:\n self.convert(units)\n\n try:\n positions = [-np.inf] + sorted(list(positions)) + [np.inf]\n except TypeError:\n positions = [-np.inf, positions, np.inf]\n\n values = self._axes[0].full\n masks = [(values >= lo) & (values < hi) for lo, hi in wt_kit.pairwise(positions)]\n omasks = []\n cuts = []\n for mask in masks:\n try:\n omasks.append(wt_kit.mask_reduce(mask))\n 
cuts.append([i == 1 for i in omasks[-1].shape])\n # Ensure at least one axis is kept\n if np.all(cuts[-1]):\n cuts[-1][0] = False\n except ValueError:\n omasks.append(None)\n cuts.append(None)\n for i in range(len(positions) - 1):\n out.create_data("split%03i" % i)\n\n for var in self.variables:\n for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, var.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, var.shape)\n out_arr[omask] = var[:][imask]\n out[i].create_variable(values=out_arr, **var.attrs)\n\n for ch in self.channels:\n for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, ch.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, ch.shape)\n out_arr[omask] = ch[:][imask]\n out[i].create_channel(values=out_arr, **ch.attrs)\n\n if verbose:\n for d in out.values():\n try:\n d.transform(expression)\n except IndexError:\n continue\n\n print("split data into {0} pieces along <{1}>:".format(len(positions) - 1, expression))\n for i, (lo, hi) in enumerate(wt_kit.pairwise(positions)):\n new_data = out[i]\n if new_data.shape == ():\n print(" {0} : None".format(i))\n else:\n new_axis = new_data.axes[0]\n print(\n " {0} : {1:0.2f} to {2:0.2f} {3} {4}".format(\n i, lo, hi, new_axis.units, new_axis.shape\n )\n )\n\n for d in out.values():\n try:\n d.transform(*old_expr)\n keep = []\n keep_units = []\n for ax in d.axes:\n if ax.size > 1:\n keep.append(ax.expression)\n keep_units.append(ax.units)\n else:\n d.create_constant(ax.expression, verbose=False)\n d.transform(*keep)\n for ax, u in zip(d.axes, keep_units):\n ax.convert(u)\n except 
IndexError:\n continue\n tempax = Axis(d, expression)\n if all(\n np.all(\n np.sum(~np.isnan(tempax.masked), axis=tuple(set(range(tempax.ndim)) - {j}))\n <= 1\n )\n for j in range(tempax.ndim)\n ):\n d.create_constant(expression, verbose=False)\n self.transform(*old_expr)\n for ax, u in zip(self.axes, old_units):\n ax.convert(u)\n\n return out',
'def add_item(self, title, key, synonyms=None, description=None, img_url=None):\n """Adds item to a list or carousel card.\n\n A list must contain at least 2 items, each requiring a title and object key.\n\n Arguments:\n title {str} -- Name of the item object\n key {str} -- Key refering to the item.\n This string will be used to send a query to your app if selected\n\n Keyword Arguments:\n synonyms {list} -- Words and phrases the user may send to select the item\n (default: {None})\n description {str} -- A description of the item (default: {None})\n img_url {str} -- URL of the image to represent the item (default: {None})\n """\n item = build_item(title, key, synonyms, description, img_url)\n self._items.append(item)\n return self',
'def compare(a, b):\n """Compares two timestamps.\n\n ``a`` and ``b`` must be the same type, in addition to normal\n representations of timestamps that order naturally, they can be rfc3339\n formatted strings.\n\n Args:\n a (string|object): a timestamp\n b (string|object): another timestamp\n\n Returns:\n int: -1 if a < b, 0 if a == b or 1 if a > b\n\n Raises:\n ValueError: if a or b are not the same type\n ValueError: if a or b strings but not in valid rfc3339 format\n\n """\n a_is_text = isinstance(a, basestring)\n b_is_text = isinstance(b, basestring)\n if type(a) != type(b) and not (a_is_text and b_is_text):\n _logger.error(u\'Cannot compare %s to %s, types differ %s!=%s\',\n a, b, type(a), type(b))\n raise ValueError(u\'cannot compare inputs of differing types\')\n\n if a_is_text:\n a = from_rfc3339(a, with_nanos=True)\n b = from_rfc3339(b, with_nanos=True)\n\n if a < b:\n return -1\n elif a > b:\n return 1\n else:\n return 0',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.9188, 0.1817, 0.1583]])
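Each row of the similarity matrix holds one query's cosine similarity to every document, so retrieval reduces to sorting a row. The sketch below shows, in plain Python, how a cosine score is computed and how a row of scores becomes a ranking. The helper names `cosine_similarity` and `rank_documents` are hypothetical, and the scores are copied from the example output above.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(scores: Sequence[float]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Scores for the single query in the example above.
scores = [0.9188, 0.1817, 0.1583]
ranking = rank_documents(scores)
print(ranking)  # [0, 1, 2]
```

The first document, the `split` function whose docstring was used as the query, ranks first by a wide margin.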
Evaluation
Metrics
Information Retrieval
- Dataset: eval
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.9481 |
| cosine_accuracy@3 | 0.9703 |
| cosine_accuracy@5 | 0.9752 |
| cosine_accuracy@10 | 0.9807 |
| cosine_precision@1 | 0.9481 |
| cosine_precision@3 | 0.3234 |
| cosine_precision@5 | 0.195 |
| cosine_precision@10 | 0.0981 |
| cosine_recall@1 | 0.9481 |
| cosine_recall@3 | 0.9703 |
| cosine_recall@5 | 0.9752 |
| cosine_recall@10 | 0.9807 |
| cosine_ndcg@10 | 0.9652 |
| cosine_mrr@10 | 0.9602 |
| cosine_map@100 | 0.9606 |
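The table shows a regular pattern: accuracy@k equals recall@k, and precision@k is accuracy@k divided by k (for example 0.9703 / 3 ≈ 0.3234). That is what one would expect when every query has exactly one relevant document. This is an inference from the numbers and from the query/positive pair format, not something the card states. A small sketch under that assumption, with a hypothetical helper name:

```python
from typing import Dict, List


def single_positive_metrics(ranks: List[int], k: int) -> Dict[str, float]:
    """Metrics at cutoff k, where ranks[i] is the 1-based rank of the
    only relevant document for query i."""
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k)
    return {
        "accuracy": hits / n,         # relevant document appears in the top k
        "recall": hits / n,           # found relevant / total relevant (total is 1)
        "precision": hits / (n * k),  # found relevant / k retrieved
        "mrr": sum(1.0 / r for r in ranks if r <= k) / n,
    }


# Four toy queries whose relevant documents were ranked 1st, 1st, 2nd and 5th.
metrics = single_positive_metrics([1, 1, 2, 5], k=3)
print(metrics)
# {'accuracy': 0.75, 'recall': 0.75, 'precision': 0.25, 'mrr': 0.625}
```

With one relevant document per query, precision@k can never exceed 1/k, so the low precision@10 of 0.0981 is close to its ceiling of 0.1 rather than a sign of poor retrieval.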
Training Details
Training Dataset
Unnamed Dataset
- Size: 369,762 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:

| | query | positive |
|---|---|---|
| type | string | string |
| details | min: 3 tokens; mean: 71.9 tokens; max: 512 tokens | min: 37 tokens; mean: 236.1 tokens; max: 512 tokens |
- Samples:

Sample 1
query:
Returns group object for datacenter root group.
>>> clc.v2.Datacenter().RootGroup()
>>> print _
WA1 Hardware
positive:
def RootGroup(self):
    """Returns group object for datacenter root group.
    >>> clc.v2.Datacenter().RootGroup()
    >>> print _
    WA1 Hardware
    """
    return(clc.v2.Group(id=self.root_group_id,alias=self.alias,session=self.session))

Sample 2
query:
Calculate the euclidean distance of all array positions in "matchArr".
:param matchArr: a dictionary of numpy.arrays containing at least two
entries that are treated as cartesian coordinates.
:param tKey: #TODO: docstring
:param mKey: #TODO: docstring
:returns: #TODO: docstring
{'eucDist': numpy.array([eucDistance, eucDistance, ...]),
'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])
}
positive:
def calcDistMatchArr(matchArr, tKey, mKey):
    """Calculate the euclidean distance of all array positions in "matchArr".
    :param matchArr: a dictionary of numpy.arrays containing at least two
    entries that are treated as cartesian coordinates.
    :param tKey: #TODO: docstring
    :param mKey: #TODO: docstring
    :returns: #TODO: docstring
    {'eucDist': numpy.array([eucDistance, eucDistance, ...]),
    'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])
    }
    """
    #Calculate all sorted list of all eucledian feature distances
    matchArrSize = listvalues(matchArr)[0].size
    distInfo = {'posPairs': list(), 'eucDist': list()}
    _matrix = numpy.swapaxes(numpy.array([matchArr[tKey], matchArr[mKey]]), 0, 1)
    for pos1 in range(matchArrSize-1):
        for pos2 in range(pos1+1, matchArrSize):
            distInfo['posPairs'].append((pos1, pos2))
    distInfo['posPairs'] = numpy.array(distInfo['posPairs'])
    distInfo['eucD...

Sample 3
query:
Format this verifier
Returns:
string: A formatted string
positive:
def format(self, indent_level, indent_size=4):
    """Format this verifier
    Returns:
    string: A formatted string
    """
    name = self.format_name('Literal', indent_size)
    if self.long_desc is not None:
        name += '\n'
    name += self.wrap_lines('value: %s\n' % str(self._literal), 1, indent_size)
    return self.wrap_lines(name, indent_level, indent_size)

- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 64,
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
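The parameters above (scale 20.0, `cos_sim`, direction `query_to_doc`) describe an in-batch softmax cross-entropy: for each query, its paired positive is the target class and the other positives in the batch act as negatives. Below is a minimal pure-Python sketch of the plain, un-cached form of this loss. The Cached variant is designed to compute the same quantity in mini-batches (64 here) to limit memory use. The function names are illustrative and this is not the library's implementation.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(queries: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for queries[i]."""
    losses = []
    for i, query in enumerate(queries):
        # Scaled similarities of this query to every positive in the batch.
        logits = [scale * cos_sim(query, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Negative log-probability assigned to the true pair.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two perfectly aligned pairs: the loss is close to zero.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
loss = mnrl_loss(queries, positives)
print(loss)
```

Because every other item in the batch serves as a negative, a larger batch gives each query more negatives to discriminate against, which is the motivation for the memory-saving cached formulation.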
Evaluation Dataset
Unnamed Dataset
- Size: 19,462 evaluation samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:

| | query | positive |
|---|---|---|
| type | string | string |
| details | min: 3 tokens; mean: 71.05 tokens; max: 512 tokens | min: 40 tokens; mean: 236.22 tokens; max: 512 tokens |
- Samples:

Sample 1
query:
Create a new ParticipantInstance
:param unicode attributes: An optional string metadata field you can use to store any data you wish.
:param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.
:param datetime date_created: The date that this resource was created.
:param datetime date_updated: The date that this resource was last updated.
:param unicode identity: A unique string identifier for the session participant as Chat User.
:param unicode user_address: The address of the participant's device.
:returns: Newly created ParticipantInstance
:rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance
positive:
def create(self, attributes=values.unset, twilio_address=values.unset,
           date_created=values.unset, date_updated=values.unset,
           identity=values.unset, user_address=values.unset):
    """
    Create a new ParticipantInstance
    :param unicode attributes: An optional string metadata field you can use to store any data you wish.
    :param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.
    :param datetime date_created: The date that this resource was created.
    :param datetime date_updated: The date that this resource was last updated.
    :param unicode identity: A unique string identifier for the session participant as Chat User.
    :param unicode user_address: The address of the participant's device.
    :returns: Newly created ParticipantInstance
    :rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance
    """
    data = values.o...

Sample 2
query:
It returns absolute url defined by node related to this page
positive:
def get_absolute_url(self):
    """
    It returns absolute url defined by node related to this page
    """
    try:
        node = Node.objects.select_related().filter(page=self)[0]
        return node.get_absolute_url()
    except Exception, e:
        raise ValueError(u"Error in {0}.{1}: {2}".format(self.__module__, self.__class__.__name__, e))
    return u""

Sample 3
query:
Return the current scaled font.
:return:
A new :class:`ScaledFont` object,
wrapping an existing cairo object.
positive:
def get_scaled_font(self):
    """Return the current scaled font.
    :return:
    A new :class:`ScaledFont` object,
    wrapping an existing cairo object.
    """
    return ScaledFont._from_pointer(
        cairo.cairo_get_scaled_font(self._pointer), incref=True)

- Loss: CachedMultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 64,
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 8192
- num_train_epochs: 1
- learning_rate: 2e-06
- warmup_steps: 0.1
- bf16: True
- eval_strategy: epoch
- per_device_eval_batch_size: 8192
- push_to_hub: True
- hub_model_id: modernbert-codesearchnet
- load_best_model_at_end: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
All Hyperparameters
- per_device_train_batch_size: 8192
- num_train_epochs: 1
- max_steps: -1
- learning_rate: 2e-06
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: epoch
- per_device_eval_batch_size: 8192
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: True
- hub_private_repo: None
- hub_model_id: modernbert-codesearchnet
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
|---|---|---|---|---|
| 0.2174 | 10 | 0.9210 | - | - |
| 0.4348 | 20 | 0.6679 | - | - |
| 0.6522 | 30 | 0.5007 | - | - |
| 0.8696 | 40 | 0.4181 | - | - |
| **1.0** | **46** | **-** | **0.0328** | **0.9652** |
- The bold row denotes the saved checkpoint.
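The step counts in the log are consistent with the dataset size and batch size reported earlier. A quick arithmetic check, assuming a single device and no gradient accumulation (the card lists gradient_accumulation_steps as 1 but does not state the device count):

```python
import math

train_samples = 369_762  # training set size from the card
batch_size = 8192        # per_device_train_batch_size

# The final partial batch still counts as a step (dataloader_drop_last is False).
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 46

# Epoch fraction reached at each logged step.
for step in (10, 20, 30, 40):
    print(step, round(step / steps_per_epoch, 4))
# 10 0.2174
# 20 0.4348
# 30 0.6522
# 40 0.8696
```

The whole run is therefore only 46 optimizer steps, each contrasting a query against up to 8191 in-batch negatives.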
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.3.0
- Transformers: 5.3.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.8.2
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}