---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:369762
- loss:CachedMultipleNegativesRankingLoss
base_model: benjamintli/modernbert-cosqa
widget:
- source_sentence: Return a Python AST node for `recur` occurring inside a `loop`.
  sentences:
  - "def _reset(self, name=None):\n        \"\"\"Revert specified property to default\
    \ value\n\n        If no property is specified, all properties are returned to\
    \ default.\n        \"\"\"\n        if name is None:\n            for key in self._props:\n\
    \                if isinstance(self._props[key], basic.Property):\n            \
    \    self._reset(key)\n            return\n        if name not in self._props:\n\
    \            raise AttributeError(\"Input name '{}' is not a known \"\n        \
    \                     \"property or attribute\".format(name))\n        if\
    \ not isinstance(self._props[name], basic.Property):\n            raise AttributeError(\"\
    Cannot reset GettableProperty \"\n                                 \"'{}'\".format(name))\n\
    \        if name in self._defaults:\n            val = self._defaults[name]\n\
    \        else:\n            val = self._props[name].default\n        if callable(val):\n\
    \            val = val()\n        setattr(self, name, val)"
  - "def cancel(self):\n        '''\n        Cancel a running workflow.\n\n        \
    \ Args:\n            None\n\n        Returns:\n            None\n        '''\n\
    \        if not self.id:\n            raise WorkflowError('Workflow is not running.\
    \ Cannot cancel.')\n\n        if self.batch_values:\n            self.workflow.batch_workflow_cancel(self.id)\n\
    \        else:\n            self.workflow.cancel(self.id)"
  - "def __loop_recur_to_py_ast(ctx: GeneratorContext, node: Recur) -> GeneratedPyAST:\n\
    \    \"\"\"Return a Python AST node for `recur` occurring inside a `loop`.\"\"\
    \"\n    assert node.op == NodeOp.RECUR\n\n    recur_deps: List[ast.AST] = []\n\
    \    recur_targets: List[ast.Name] = []\n    recur_exprs: List[ast.AST] = []\n\
    \    for name, expr in zip(ctx.recur_point.binding_names, node.exprs):\n    \
    \    expr_ast = gen_py_ast(ctx, expr)\n        recur_deps.extend(expr_ast.dependencies)\n\
    \        recur_targets.append(ast.Name(id=name, ctx=ast.Store()))\n        recur_exprs.append(expr_ast.node)\n\
    \n    if len(recur_targets) == 1:\n        assert 
len(recur_exprs) == 1\n \ \ recur_deps.append(ast.Assign(targets=recur_targets, value=recur_exprs[0]))\n\ \ else:\n recur_deps.append(\n ast.Assign(\n \ \ targets=[ast.Tuple(elts=recur_targets, ctx=ast.Store())],\n \ \ value=ast.Tuple(elts=recur_exprs, ctx=ast.Load()),\n )\n \ \ )\n recur_deps.append(ast.Continue())\n\n return GeneratedPyAST(node=ast.NameConstant(None),\ \ dependencies=recur_deps)" - source_sentence: "Create a :class:`~turicreate.linear_regression.LinearRegression`\ \ to\n predict a scalar target variable as a linear function of one or more\n\ \ features. In addition to standard numeric and categorical types, features\n\ \ can also be extracted automatically from list- or dictionary-type SFrame\n\ \ columns.\n\n The linear regression module can be used for ridge regression,\ \ Lasso, and\n elastic net regression (see References for more detail on these\ \ methods). By\n default, this model has an l2 regularization weight of 0.01.\n\ \n Parameters\n ----------\n dataset : SFrame\n The dataset to\ \ use for training the model.\n\n target : string\n Name of the column\ \ containing the target variable.\n\n features : list[string], optional\n \ \ Names of the columns containing features. 'None' (the default) indicates\n\ \ that all columns except the target variable should be used as features.\n\ \n The features are columns in the input SFrame that can be of the\n \ \ following types:\n\n - *Numeric*: values of numeric type integer\ \ or float.\n\n - *Categorical*: values of type string.\n\n - *Array*:\ \ list of numeric (integer or float) values. Each list element\n is treated\ \ as a separate feature in the model.\n\n - *Dictionary*: key-value pairs\ \ with numeric (integer or float) values\n Each key of a dictionary is\ \ treated as a separate feature and the\n value in the dictionary corresponds\ \ to the value of the feature.\n Dictionaries are ideal for representing\ \ sparse data.\n\n Columns of type *list* are not supported. 
Convert such\ \ feature\n columns to type array if all entries in the list are of numeric\n\ \ types. If the lists contain data of mixed types, separate\n them\ \ out into different columns.\n\n l2_penalty : float, optional\n Weight\ \ on the l2-regularizer of the model. The larger this weight, the\n more\ \ the model coefficients shrink toward 0. This introduces bias into\n the\ \ model but decreases variance, potentially leading to better\n predictions.\ \ The default value is 0.01; setting this parameter to 0\n corresponds\ \ to unregularized linear regression. See the ridge\n regression reference\ \ for more detail.\n\n l1_penalty : float, optional\n Weight on l1 regularization\ \ of the model. Like the l2 penalty, the\n higher the l1 penalty, the more\ \ the estimated coefficients shrink toward\n 0. The l1 penalty, however,\ \ completely zeros out sufficiently small\n coefficients, automatically\ \ indicating features that are not useful for\n the model. The default\ \ weight of 0 prevents any features from being\n discarded. See the LASSO\ \ regression reference for more detail.\n\n solver : string, optional\n \ \ Solver to use for training the model. See the references for more detail\n\ \ on each solver.\n\n - *auto (default)*: automatically chooses\ \ the best solver for the data\n and model parameters.\n - *newton*:\ \ Newton-Raphson\n - *lbfgs*: limited memory BFGS\n - *fista*: accelerated\ \ gradient descent\n\n The model is trained using a carefully engineered\ \ collection of methods\n that are automatically picked based on the input\ \ data. The ``newton``\n method works best for datasets with plenty of\ \ examples and few features\n (long datasets). 
Limited memory BFGS (``lbfgs``)\ \ is a robust solver for\n wide datasets (i.e datasets with many coefficients).\ \ ``fista`` is the\n default solver for l1-regularized linear regression.\ \ The solvers are\n all automatically tuned and the default options should\ \ function well.\n See the solver options guide for setting additional\ \ parameters for each\n of the solvers.\n\n See the user guide for\ \ additional details on how the solver is chosen.\n\n feature_rescaling : boolean,\ \ optional\n Feature rescaling is an important pre-processing step that\ \ ensures that\n all features are on the same scale. An l2-norm rescaling\ \ is performed\n to make sure that all features are of the same norm. Categorical\n\ \ features are also rescaled by rescaling the dummy variables that are\n\ \ used to represent them. The coefficients are returned in original scale\n\ \ of the problem. This process is particularly useful when features\n \ \ vary widely in their ranges.\n\n validation_set : SFrame, optional\n\ \n A dataset for monitoring the model's generalization performance.\n \ \ For each row of the progress table, the chosen metrics are computed\n\ \ for both the provided training dataset and the validation_set. The\n\ \ format of this SFrame must be the same as the training set.\n \ \ By default this argument is set to 'auto' and a validation set is\n automatically\ \ sampled and used for progress printing. If\n validation_set is set to\ \ None, then no additional metrics\n are computed. The default value is\ \ 'auto'.\n\n convergence_threshold : float, optional\n\n Convergence\ \ is tested using variation in the training objective. The\n variation in\ \ the training objective is calculated using the difference\n between the\ \ objective values between two steps. 
Consider reducing this\n below the\ \ default value (0.01) for a more accurately trained model.\n Beware of overfitting\ \ (i.e a model that works well only on the training\n data) if this parameter\ \ is set to a very low value.\n\n lbfgs_memory_level : int, optional\n\n \ \ The L-BFGS algorithm keeps track of gradient information from the\n \ \ previous ``lbfgs_memory_level`` iterations. The storage requirement for\n \ \ each of these gradients is the ``num_coefficients`` in the problem.\n \ \ Increasing the ``lbfgs_memory_level`` can help improve the quality of\n \ \ the model trained. Setting this to more than ``max_iterations`` has the\n\ \ same effect as setting it to ``max_iterations``.\n\n max_iterations\ \ : int, optional\n\n The maximum number of allowed passes through the data.\ \ More passes over\n the data can result in a more accurately trained model.\ \ Consider\n increasing this (the default value is 10) if the training accuracy\ \ is\n low and the *Grad-Norm* in the display is large.\n\n step_size\ \ : float, optional (fista only)\n\n The starting step size to use for the\ \ ``fista`` and ``gd`` solvers. The\n default is set to 1.0, this is an aggressive\ \ setting. If the first\n iteration takes a considerable amount of time,\ \ reducing this parameter\n may speed up model training.\n\n verbose :\ \ bool, optional\n If True, print progress updates.\n\n Returns\n \ \ -------\n out : LinearRegression\n A trained model of type\n \ \ :class:`~turicreate.linear_regression.LinearRegression`.\n\n See Also\n\ \ --------\n LinearRegression, turicreate.boosted_trees_regression.BoostedTreesRegression,\ \ turicreate.regression.create\n\n Notes\n -----\n - Categorical variables\ \ are encoded by creating dummy variables. 
For a\n variable with :math:`K`\ \ categories, the encoding creates :math:`K-1` dummy\n variables, while the\ \ first category encountered in the data is used as the\n baseline.\n\n \ \ - For prediction and evaluation of linear regression models with sparse\n\ \ dictionary inputs, new keys/columns that were not seen during training\n\ \ are silently ignored.\n\n - Any 'None' values in the data will result\ \ in an error being thrown.\n\n - A constant term is automatically added for\ \ the model intercept. This term\n is not regularized.\n\n - Standard\ \ errors on coefficients are only available when `solver=newton`\n or when\ \ the default `auto` solver option chooses the newton method and if\n the\ \ number of examples in the training data is more than the number of\n coefficients.\ \ If standard errors cannot be estimated, a column of `None`\n values are\ \ returned.\n\n\n References\n ----------\n - Hoerl, A.E. and Kennard,\ \ R.W. (1970) `Ridge regression: Biased Estimation\n for Nonorthogonal Problems\n\ \ `_.\n\ \ Technometrics 12(1) pp.55-67\n\n - Tibshirani, R. (1996) `Regression\ \ Shrinkage and Selection via the Lasso `_. Journal of the Royal Statistical Society. Series B\n\ \ (Methodological) 58(1) pp.267-288.\n\n - Zhu, C., et al. (1997) `Algorithm\ \ 778: L-BFGS-B: Fortran subroutines for\n large-scale bound-constrained\ \ optimization\n `_. ACM Transactions\ \ on\n Mathematical Software 23(4) pp.550-560.\n\n - Barzilai, J. and\ \ Borwein, J. `Two-Point Step Size Gradient Methods\n `_.\ \ IMA Journal of\n Numerical Analysis 8(1) pp.141-148.\n\n - Beck, A.\ \ and Teboulle, M. 
(2009) `A Fast Iterative Shrinkage-Thresholding\n Algorithm\ \ for Linear Inverse Problems\n `_.\ \ SIAM Journal on\n Imaging Sciences 2(1) pp.183-202.\n\n - Zhang, T.\ \ (2004) `Solving large scale linear prediction problems using\n stochastic\ \ gradient descent algorithms\n `_.\ \ ICML '04: Proceedings of\n the twenty-first international conference on\ \ Machine learning p.116.\n\n\n Examples\n --------\n\n Given an :class:`~turicreate.SFrame`\ \ ``sf`` with a list of columns\n [``feature_1`` ... ``feature_K``] denoting\ \ features and a target column\n ``target``, we can create a\n :class:`~turicreate.linear_regression.LinearRegression`\ \ as follows:\n\n >>> data = turicreate.SFrame('https://static.turi.com/datasets/regression/houses.csv')\n\ \n >>> model = turicreate.linear_regression.create(data, target='price',\n\ \ ... features=['bath', 'bedroom', 'size'])\n\ \n\n For ridge regression, we can set the ``l2_penalty`` parameter higher (the\n\ \ default is 0.01). For Lasso regression, we set the l1_penalty higher, and\n\ \ for elastic net, we set both to be higher.\n\n .. 
sourcecode:: python\n\ \n # Ridge regression\n >>> model_ridge = turicreate.linear_regression.create(data,\ \ 'price', l2_penalty=0.1)\n\n # Lasso\n >>> model_lasso = turicreate.linear_regression.create(data,\ \ 'price', l2_penalty=0.,\n \ \ l1_penalty=1.0)\n\n # Elastic net regression\n >>>\ \ model_enet = turicreate.linear_regression.create(data, 'price', l2_penalty=0.5,\n\ \ l1_penalty=0.5)" sentences: - "def create(dataset, target, features=None, l2_penalty=1e-2, l1_penalty=0.0,\n\ \ solver='auto', feature_rescaling=True,\n convergence_threshold = _DEFAULT_SOLVER_OPTIONS['convergence_threshold'],\n\ \ step_size = _DEFAULT_SOLVER_OPTIONS['step_size'],\n lbfgs_memory_level\ \ = _DEFAULT_SOLVER_OPTIONS['lbfgs_memory_level'],\n max_iterations = _DEFAULT_SOLVER_OPTIONS['max_iterations'],\n\ \ validation_set = \"auto\",\n verbose=True):\n\n \"\"\"\n Create\ \ a :class:`~turicreate.linear_regression.LinearRegression` to\n predict a\ \ scalar target variable as a linear function of one or more\n features. In\ \ addition to standard numeric and categorical types, features\n can also be\ \ extracted automatically from list- or dictionary-type SFrame\n columns.\n\ \n The linear regression module can be used for ridge regression, Lasso, and\n\ \ elastic net regression (see References for more detail on these methods).\ \ By\n default, this model has an l2 regularization weight of 0.01.\n\n \ \ Parameters\n ----------\n dataset : SFrame\n The dataset to use\ \ for training the model.\n\n target : string\n Name of the column containing\ \ the target variable.\n\n features : list[string], optional\n Names\ \ of the columns containing features. 
'None' (the default) indicates\n \ \ that all columns except the target variable should be used as features.\n\n\ \ The features are columns in the input SFrame that can be of the\n \ \ following types:\n\n - *Numeric*: values of numeric type integer\ \ or float.\n\n - *Categorical*: values of type string.\n\n - *Array*:\ \ list of numeric (integer or float) values. Each list element\n is treated\ \ as a separate feature in the model.\n\n - *Dictionary*: key-value pairs\ \ with numeric (integer or float) values\n Each key of a dictionary is\ \ treated as a separate feature and the\n value in the dictionary corresponds\ \ to the value of the feature.\n Dictionaries are ideal for representing\ \ sparse data.\n\n Columns of type *list* are not supported. Convert such\ \ feature\n columns to type array if all entries in the list are of numeric\n\ \ types. If the lists contain data of mixed types, separate\n them\ \ out into different columns.\n\n l2_penalty : float, optional\n Weight\ \ on the l2-regularizer of the model. The larger this weight, the\n more\ \ the model coefficients shrink toward 0. This introduces bias into\n the\ \ model but decreases variance, potentially leading to better\n predictions.\ \ The default value is 0.01; setting this parameter to 0\n corresponds\ \ to unregularized linear regression. See the ridge\n regression reference\ \ for more detail.\n\n l1_penalty : float, optional\n Weight on l1 regularization\ \ of the model. Like the l2 penalty, the\n higher the l1 penalty, the more\ \ the estimated coefficients shrink toward\n 0. The l1 penalty, however,\ \ completely zeros out sufficiently small\n coefficients, automatically\ \ indicating features that are not useful for\n the model. The default\ \ weight of 0 prevents any features from being\n discarded. See the LASSO\ \ regression reference for more detail.\n\n solver : string, optional\n \ \ Solver to use for training the model. 
See the references for more detail\n\ \ on each solver.\n\n - *auto (default)*: automatically chooses\ \ the best solver for the data\n and model parameters.\n - *newton*:\ \ Newton-Raphson\n - *lbfgs*: limited memory BFGS\n - *fista*: accelerated\ \ gradient descent\n\n The model is trained using a carefully engineered\ \ collection of methods\n that are automatically picked based on the input\ \ data. The ``newton``\n method works best for datasets with plenty of\ \ examples and few features\n (long datasets). Limited memory BFGS (``lbfgs``)\ \ is a robust solver for\n wide datasets (i.e datasets with many coefficients).\ \ ``fista`` is the\n default solver for l1-regularized linear regression.\ \ The solvers are\n all automatically tuned and the default options should\ \ function well.\n See the solver options guide for setting additional\ \ parameters for each\n of the solvers.\n\n See the user guide for\ \ additional details on how the solver is chosen.\n\n feature_rescaling : boolean,\ \ optional\n Feature rescaling is an important pre-processing step that\ \ ensures that\n all features are on the same scale. An l2-norm rescaling\ \ is performed\n to make sure that all features are of the same norm. Categorical\n\ \ features are also rescaled by rescaling the dummy variables that are\n\ \ used to represent them. The coefficients are returned in original scale\n\ \ of the problem. This process is particularly useful when features\n \ \ vary widely in their ranges.\n\n validation_set : SFrame, optional\n\ \n A dataset for monitoring the model's generalization performance.\n \ \ For each row of the progress table, the chosen metrics are computed\n\ \ for both the provided training dataset and the validation_set. The\n\ \ format of this SFrame must be the same as the training set.\n \ \ By default this argument is set to 'auto' and a validation set is\n automatically\ \ sampled and used for progress printing. 
If\n validation_set is set to\ \ None, then no additional metrics\n are computed. The default value is\ \ 'auto'.\n\n convergence_threshold : float, optional\n\n Convergence\ \ is tested using variation in the training objective. The\n variation in\ \ the training objective is calculated using the difference\n between the\ \ objective values between two steps. Consider reducing this\n below the\ \ default value (0.01) for a more accurately trained model.\n Beware of overfitting\ \ (i.e a model that works well only on the training\n data) if this parameter\ \ is set to a very low value.\n\n lbfgs_memory_level : int, optional\n\n \ \ The L-BFGS algorithm keeps track of gradient information from the\n \ \ previous ``lbfgs_memory_level`` iterations. The storage requirement for\n \ \ each of these gradients is the ``num_coefficients`` in the problem.\n \ \ Increasing the ``lbfgs_memory_level`` can help improve the quality of\n \ \ the model trained. Setting this to more than ``max_iterations`` has the\n\ \ same effect as setting it to ``max_iterations``.\n\n max_iterations\ \ : int, optional\n\n The maximum number of allowed passes through the data.\ \ More passes over\n the data can result in a more accurately trained model.\ \ Consider\n increasing this (the default value is 10) if the training accuracy\ \ is\n low and the *Grad-Norm* in the display is large.\n\n step_size\ \ : float, optional (fista only)\n\n The starting step size to use for the\ \ ``fista`` and ``gd`` solvers. The\n default is set to 1.0, this is an aggressive\ \ setting. 
If the first\n iteration takes a considerable amount of time,\ \ reducing this parameter\n may speed up model training.\n\n verbose :\ \ bool, optional\n If True, print progress updates.\n\n Returns\n \ \ -------\n out : LinearRegression\n A trained model of type\n \ \ :class:`~turicreate.linear_regression.LinearRegression`.\n\n See Also\n\ \ --------\n LinearRegression, turicreate.boosted_trees_regression.BoostedTreesRegression,\ \ turicreate.regression.create\n\n Notes\n -----\n - Categorical variables\ \ are encoded by creating dummy variables. For a\n variable with :math:`K`\ \ categories, the encoding creates :math:`K-1` dummy\n variables, while the\ \ first category encountered in the data is used as the\n baseline.\n\n \ \ - For prediction and evaluation of linear regression models with sparse\n\ \ dictionary inputs, new keys/columns that were not seen during training\n\ \ are silently ignored.\n\n - Any 'None' values in the data will result\ \ in an error being thrown.\n\n - A constant term is automatically added for\ \ the model intercept. This term\n is not regularized.\n\n - Standard\ \ errors on coefficients are only available when `solver=newton`\n or when\ \ the default `auto` solver option chooses the newton method and if\n the\ \ number of examples in the training data is more than the number of\n coefficients.\ \ If standard errors cannot be estimated, a column of `None`\n values are\ \ returned.\n\n\n References\n ----------\n - Hoerl, A.E. and Kennard,\ \ R.W. (1970) `Ridge regression: Biased Estimation\n for Nonorthogonal Problems\n\ \ `_.\n\ \ Technometrics 12(1) pp.55-67\n\n - Tibshirani, R. (1996) `Regression\ \ Shrinkage and Selection via the Lasso `_. Journal of the Royal Statistical Society. Series B\n\ \ (Methodological) 58(1) pp.267-288.\n\n - Zhu, C., et al. (1997) `Algorithm\ \ 778: L-BFGS-B: Fortran subroutines for\n large-scale bound-constrained\ \ optimization\n `_. 
ACM Transactions\ \ on\n Mathematical Software 23(4) pp.550-560.\n\n - Barzilai, J. and\ \ Borwein, J. `Two-Point Step Size Gradient Methods\n `_.\ \ IMA Journal of\n Numerical Analysis 8(1) pp.141-148.\n\n - Beck, A.\ \ and Teboulle, M. (2009) `A Fast Iterative Shrinkage-Thresholding\n Algorithm\ \ for Linear Inverse Problems\n `_.\ \ SIAM Journal on\n Imaging Sciences 2(1) pp.183-202.\n\n - Zhang, T.\ \ (2004) `Solving large scale linear prediction problems using\n stochastic\ \ gradient descent algorithms\n `_.\ \ ICML '04: Proceedings of\n the twenty-first international conference on\ \ Machine learning p.116.\n\n\n Examples\n --------\n\n Given an :class:`~turicreate.SFrame`\ \ ``sf`` with a list of columns\n [``feature_1`` ... ``feature_K``] denoting\ \ features and a target column\n ``target``, we can create a\n :class:`~turicreate.linear_regression.LinearRegression`\ \ as follows:\n\n >>> data = turicreate.SFrame('https://static.turi.com/datasets/regression/houses.csv')\n\ \n >>> model = turicreate.linear_regression.create(data, target='price',\n\ \ ... features=['bath', 'bedroom', 'size'])\n\ \n\n For ridge regression, we can set the ``l2_penalty`` parameter higher (the\n\ \ default is 0.01). For Lasso regression, we set the l1_penalty higher, and\n\ \ for elastic net, we set both to be higher.\n\n .. 
sourcecode:: python\n\ \n # Ridge regression\n >>> model_ridge = turicreate.linear_regression.create(data,\ \ 'price', l2_penalty=0.1)\n\n # Lasso\n >>> model_lasso = turicreate.linear_regression.create(data,\ \ 'price', l2_penalty=0.,\n \ \ l1_penalty=1.0)\n\n # Elastic net regression\n >>>\ \ model_enet = turicreate.linear_regression.create(data, 'price', l2_penalty=0.5,\n\ \ l1_penalty=0.5)\n\ \n \"\"\"\n\n # Regression model names.\n model_name = \"regression_linear_regression\"\ \n solver = solver.lower()\n\n model = _sl.create(dataset, target, model_name,\ \ features=features,\n validation_set = validation_set,\n\ \ solver = solver, verbose = verbose,\n \ \ l2_penalty=l2_penalty, l1_penalty = l1_penalty,\n \ \ feature_rescaling = feature_rescaling,\n convergence_threshold\ \ = convergence_threshold,\n step_size = step_size,\n \ \ lbfgs_memory_level = lbfgs_memory_level,\n \ \ max_iterations = max_iterations)\n\n return LinearRegression(model.__proxy__)" - "def restore(self) -> None:\n \"\"\"\n Restore the backed-up (non-average)\ \ parameter values.\n \"\"\"\n for name, parameter in self._parameters:\n\ \ parameter.data.copy_(self._backups[name])" - "def _get_sdict(self, env):\n \"\"\"\n Returns a dictionary mapping\ \ all of the source suffixes of all\n src_builders of this Builder to the\ \ underlying Builder that\n should be called first.\n\n This dictionary\ \ is used for each target specified, so we save a\n lot of extra computation\ \ by memoizing it for each construction\n environment.\n\n Note\ \ that this is re-computed each time, not cached, because there\n might\ \ be changes to one of our source Builders (or one of their\n source Builders,\ \ and so on, and so on...) 
that we can't \"see.\"\n\n The underlying methods\ \ we call cache their computed values,\n though, so we hope repeatedly\ \ aggregating them into a dictionary\n like this won't be too big a hit.\ \ We may need to look for a\n better way to do this if performance data\ \ show this has turned\n into a significant bottleneck.\n \"\"\"\ \n sdict = {}\n for bld in self.get_src_builders(env):\n \ \ for suf in bld.src_suffixes(env):\n sdict[suf] = bld\n \ \ return sdict" - source_sentence: Traverse the tree below node looking for 'yield [expr]'. sentences: - "def retrieve_sources():\n \"\"\"Retrieve sources using spectool\n \"\"\"\ \n spectool = find_executable('spectool')\n if not spectool:\n log.warn('spectool\ \ is not installed')\n return\n try:\n specfile = spec_fn()\n\ \ except Exception:\n return\n\n cmd = [spectool, \"-g\", specfile]\n\ \ output = subprocess.check_output(' '.join(cmd), shell=True)\n log.warn(output)" - "def check_subscription(self, request):\n\t\t\"\"\"Redirect to the subscribe page\ \ if the user lacks an active subscription.\"\"\"\n\t\tsubscriber = subscriber_request_callback(request)\n\ \n\t\tif not subscriber_has_active_subscription(subscriber):\n\t\t\tif not SUBSCRIPTION_REDIRECT:\n\ \t\t\t\traise ImproperlyConfigured(\"DJSTRIPE_SUBSCRIPTION_REDIRECT is not set.\"\ )\n\t\t\treturn redirect(SUBSCRIPTION_REDIRECT)" - "def is_generator(self, node):\n \"\"\"Traverse the tree below node looking\ \ for 'yield [expr]'.\"\"\"\n results = {}\n if self.yield_expr.match(node,\ \ results):\n return True\n for child in node.children:\n \ \ if child.type not in (syms.funcdef, syms.classdef):\n \ \ if self.is_generator(child):\n return True\n return\ \ False" - source_sentence: "Retrieves the content of an input given a DataSource. 
The input\ \ acts like a filter over the outputs of the DataSource.\n\n Args:\n \ \ name (str): The name of the input.\n ds (openflow.DataSource):\ \ The DataSource that will feed the data.\n\n Returns:\n pandas.DataFrame:\ \ The content of the input." sentences: - "def valid_state(state: str) -> bool:\n \"\"\"Validate State Argument\n\n \ \ Checks that either 'on' or 'off' was entered as an argument to the\n CLI\ \ and make it lower case.\n\n :param state: state to validate.\n\n :returns:\ \ True if state is valid.\n\n .. versionchanged:: 0.0.12\n This moethod\ \ was renamed from validateState to valid_state to conform\n to PEP-8.\ \ Also removed \"magic\" text for state and instead reference the\n _VALID_STATES\ \ constant.\n \"\"\"\n lower_case_state = state.lower()\n\n if lower_case_state\ \ in _VALID_STATES:\n return True\n return False" - "def get_input(self, name, ds):\n \"\"\"\n Retrieves the content\ \ of an input given a DataSource. The input acts like a filter over the outputs\ \ of the DataSource.\n\n Args:\n name (str): The name of the\ \ input.\n ds (openflow.DataSource): The DataSource that will feed\ \ the data.\n\n Returns:\n pandas.DataFrame: The content of\ \ the input.\n \"\"\"\n columns = self.inputs.get(name)\n \ \ df = ds.get_dataframe()\n\n # set defaults\n for column in columns:\n\ \ if column not in df.columns:\n df[column] = self.defaults.get(column)\n\ \n return df[columns]" - "def get_scenario_data(scenario_id,**kwargs):\n \"\"\"\n Get all the\ \ datasets from the group with the specified name\n @returns a list of\ \ dictionaries\n \"\"\"\n user_id = kwargs.get('user_id')\n\n scenario_data\ \ = db.DBSession.query(Dataset).filter(Dataset.id==ResourceScenario.dataset_id,\ \ ResourceScenario.scenario_id==scenario_id).options(joinedload_all('metadata')).distinct().all()\n\ \n for sd in scenario_data:\n if sd.hidden == 'Y':\n try:\n\ \ sd.check_read_permission(user_id)\n except:\n \ \ sd.value = None\n sd.metadata = []\n\n 
db.DBSession.expunge_all()\n\ \n log.info(\"Retrieved %s datasets\", len(scenario_data))\n return scenario_data" - source_sentence: "Split the data object along a given expression, in units.\n\n\ \ Parameters\n ----------\n expression : int or str\n \ \ The expression to split along. If given as an integer, the axis at that\ \ index\n is used.\n positions : number-type or 1D array-type\n\ \ The position(s) to split at, in units.\n units : str (optional)\n\ \ The units of the given positions. Default is same, which assumes\n\ \ input units are identical to first variable units.\n parent\ \ : WrightTools.Collection (optional)\n The parent collection in which\ \ to place the 'split' collection.\n Default is a new Collection.\n\ \ verbose : bool (optional)\n Toggle talkback. Default is True.\n\ \n Returns\n -------\n WrightTools.collection.Collection\n\ \ A Collection of data objects.\n The order of the objects\ \ is such that the axis points retain their original order.\n\n See Also\n\ \ --------\n chop\n Divide the dataset into its lower-dimensionality\ \ components.\n collapse\n Collapse the dataset along one axis." 
sentences: - "def add_item(self, title, key, synonyms=None, description=None, img_url=None):\n\ \ \"\"\"Adds item to a list or carousel card.\n\n A list must contain\ \ at least 2 items, each requiring a title and object key.\n\n Arguments:\n\ \ title {str} -- Name of the item object\n key {str} --\ \ Key refering to the item.\n This string will be used\ \ to send a query to your app if selected\n\n Keyword Arguments:\n \ \ synonyms {list} -- Words and phrases the user may send to select the\ \ item\n (default: {None})\n description\ \ {str} -- A description of the item (default: {None})\n img_url {str}\ \ -- URL of the image to represent the item (default: {None})\n \"\"\"\n\ \ item = build_item(title, key, synonyms, description, img_url)\n \ \ self._items.append(item)\n return self" - "def compare(a, b):\n \"\"\"Compares two timestamps.\n\n ``a`` and ``b``\ \ must be the same type, in addition to normal\n representations of timestamps\ \ that order naturally, they can be rfc3339\n formatted strings.\n\n Args:\n\ \ a (string|object): a timestamp\n b (string|object): another timestamp\n\ \n Returns:\n int: -1 if a < b, 0 if a == b or 1 if a > b\n\n Raises:\n\ \ ValueError: if a or b are not the same type\n ValueError: if a or\ \ b strings but not in valid rfc3339 format\n\n \"\"\"\n a_is_text = isinstance(a,\ \ basestring)\n b_is_text = isinstance(b, basestring)\n if type(a) != type(b)\ \ and not (a_is_text and b_is_text):\n _logger.error(u'Cannot compare %s\ \ to %s, types differ %s!=%s',\n a, b, type(a), type(b))\n\ \ raise ValueError(u'cannot compare inputs of differing types')\n\n \ \ if a_is_text:\n a = from_rfc3339(a, with_nanos=True)\n b = from_rfc3339(b,\ \ with_nanos=True)\n\n if a < b:\n return -1\n elif a > b:\n \ \ return 1\n else:\n return 0" - "def split(\n self, expression, positions, *, units=None, parent=None,\ \ verbose=True\n ) -> wt_collection.Collection:\n \"\"\"\n Split\ \ the data object along a given expression, in units.\n\n Parameters\n\ \ 
----------\n expression : int or str\n The expression\ \ to split along. If given as an integer, the axis at that index\n \ \ is used.\n positions : number-type or 1D array-type\n The\ \ position(s) to split at, in units.\n units : str (optional)\n \ \ The units of the given positions. Default is same, which assumes\n \ \ input units are identical to first variable units.\n parent : WrightTools.Collection\ \ (optional)\n The parent collection in which to place the 'split'\ \ collection.\n Default is a new Collection.\n verbose : bool\ \ (optional)\n Toggle talkback. Default is True.\n\n Returns\n\ \ -------\n WrightTools.collection.Collection\n A Collection\ \ of data objects.\n The order of the objects is such that the axis\ \ points retain their original order.\n\n See Also\n --------\n\ \ chop\n Divide the dataset into its lower-dimensionality components.\n\ \ collapse\n Collapse the dataset along one axis.\n \"\ \"\"\n # axis ------------------------------------------------------------------------------------\n\ \ old_expr = self.axis_expressions\n old_units = self.units\n \ \ out = wt_collection.Collection(name=\"split\", parent=parent)\n \ \ if isinstance(expression, int):\n if units is None:\n \ \ units = self._axes[expression].units\n expression = self._axes[expression].expression\n\ \ elif isinstance(expression, str):\n pass\n else:\n\ \ raise TypeError(\"expression: expected {int, str}, got %s\" % type(expression))\n\ \n self.transform(expression)\n if units:\n self.convert(units)\n\ \n try:\n positions = [-np.inf] + sorted(list(positions)) +\ \ [np.inf]\n except TypeError:\n positions = [-np.inf, positions,\ \ np.inf]\n\n values = self._axes[0].full\n masks = [(values >=\ \ lo) & (values < hi) for lo, hi in wt_kit.pairwise(positions)]\n omasks\ \ = []\n cuts = []\n for mask in masks:\n try:\n \ \ omasks.append(wt_kit.mask_reduce(mask))\n cuts.append([i\ \ == 1 for i in omasks[-1].shape])\n # Ensure at least one axis\ \ is kept\n if np.all(cuts[-1]):\n 
cuts[-1][0]\ \ = False\n except ValueError:\n omasks.append(None)\n\ \ cuts.append(None)\n for i in range(len(positions) - 1):\n\ \ out.create_data(\"split%03i\" % i)\n\n for var in self.variables:\n\ \ for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n\ \ if omask is None:\n # Zero length split\n\ \ continue\n omask = wt_kit.enforce_mask_shape(omask,\ \ var.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape,\ \ cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n \ \ imask = wt_kit.enforce_mask_shape(imask, var.shape)\n \ \ out_arr[omask] = var[:][imask]\n out[i].create_variable(values=out_arr,\ \ **var.attrs)\n\n for ch in self.channels:\n for i, (imask,\ \ omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask\ \ is None:\n # Zero length split\n continue\n\ \ omask = wt_kit.enforce_mask_shape(omask, ch.shape)\n \ \ omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n\ \ out_arr = np.full(omask.shape, np.nan)\n imask\ \ = wt_kit.enforce_mask_shape(imask, ch.shape)\n out_arr[omask]\ \ = ch[:][imask]\n out[i].create_channel(values=out_arr, **ch.attrs)\n\ \n if verbose:\n for d in out.values():\n try:\n\ \ d.transform(expression)\n except IndexError:\n\ \ continue\n\n print(\"split data into {0} pieces\ \ along <{1}>:\".format(len(positions) - 1, expression))\n for i, (lo,\ \ hi) in enumerate(wt_kit.pairwise(positions)):\n new_data = out[i]\n\ \ if new_data.shape == ():\n print(\" {0} :\ \ None\".format(i))\n else:\n new_axis = new_data.axes[0]\n\ \ print(\n \" {0} : {1:0.2f} to {2:0.2f}\ \ {3} {4}\".format(\n i, lo, hi, new_axis.units, new_axis.shape\n\ \ )\n )\n\n for d in out.values():\n\ \ try:\n d.transform(*old_expr)\n keep\ \ = []\n keep_units = []\n for ax in d.axes:\n \ \ if ax.size > 1:\n keep.append(ax.expression)\n\ \ keep_units.append(ax.units)\n else:\n\ \ d.create_constant(ax.expression, verbose=False)\n \ \ d.transform(*keep)\n for ax, u in zip(d.axes, keep_units):\n\ \ ax.convert(u)\n except 
IndexError:\n \ \ continue\n tempax = Axis(d, expression)\n if all(\n\ \ np.all(\n np.sum(~np.isnan(tempax.masked),\ \ axis=tuple(set(range(tempax.ndim)) - {j}))\n <= 1\n \ \ )\n for j in range(tempax.ndim)\n ):\n \ \ d.create_constant(expression, verbose=False)\n self.transform(*old_expr)\n\ \ for ax, u in zip(self.axes, old_units):\n ax.convert(u)\n\n\ \ return out" pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: SentenceTransformer based on benjamintli/modernbert-cosqa results: - task: type: information-retrieval name: Information Retrieval dataset: name: eval type: eval metrics: - type: cosine_accuracy@1 value: 0.9480526153529956 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.9703010995786662 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9751824067413422 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9806803000719351 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9480526153529956 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.32343369985955533 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19503648134826843 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09806803000719352 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9480526153529956 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.9703010995786662 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9751824067413422 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9806803000719351 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9652143122800294 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9601788099886978 name: Cosine Mrr@10 - type: 
cosine_map@100 value: 0.9606213024321194 name: Cosine Map@100 --- # SentenceTransformer based on benjamintli/modernbert-cosqa This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. ## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa) - **Maximum Sequence Length:** 512 tokens - **Output Dimensionality:** 768 dimensions - **Similarity Function:** Cosine Similarity ### Model Sources - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers) - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) ### Full Model Architecture ``` SentenceTransformer( (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'OptimizedModule'}) (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) ) ``` ## Usage ### Direct Usage (Sentence Transformers) First install the Sentence Transformers library: ```bash pip install -U sentence-transformers ``` Then you can load this model and run inference. 
```python from sentence_transformers import SentenceTransformer # Download from the 🤗 Hub model = SentenceTransformer("modernbert-codesearchnet") # Run inference queries = [ "Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \u0027split\u0027 collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.", ] documents = [ 'def split(\n self, expression, positions, *, units=None, parent=None, verbose=True\n ) -> wt_collection.Collection:\n """\n Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \'split\' collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. 
Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.\n """\n # axis ------------------------------------------------------------------------------------\n old_expr = self.axis_expressions\n old_units = self.units\n out = wt_collection.Collection(name="split", parent=parent)\n if isinstance(expression, int):\n if units is None:\n units = self._axes[expression].units\n expression = self._axes[expression].expression\n elif isinstance(expression, str):\n pass\n else:\n raise TypeError("expression: expected {int, str}, got %s" % type(expression))\n\n self.transform(expression)\n if units:\n self.convert(units)\n\n try:\n positions = [-np.inf] + sorted(list(positions)) + [np.inf]\n except TypeError:\n positions = [-np.inf, positions, np.inf]\n\n values = self._axes[0].full\n masks = [(values >= lo) & (values < hi) for lo, hi in wt_kit.pairwise(positions)]\n omasks = []\n cuts = []\n for mask in masks:\n try:\n omasks.append(wt_kit.mask_reduce(mask))\n cuts.append([i == 1 for i in omasks[-1].shape])\n # Ensure at least one axis is kept\n if np.all(cuts[-1]):\n cuts[-1][0] = False\n except ValueError:\n omasks.append(None)\n cuts.append(None)\n for i in range(len(positions) - 1):\n out.create_data("split%03i" % i)\n\n for var in self.variables:\n for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, var.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, var.shape)\n out_arr[omask] = var[:][imask]\n out[i].create_variable(values=out_arr, **var.attrs)\n\n for ch in self.channels:\n 
for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, ch.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, ch.shape)\n out_arr[omask] = ch[:][imask]\n out[i].create_channel(values=out_arr, **ch.attrs)\n\n if verbose:\n for d in out.values():\n try:\n d.transform(expression)\n except IndexError:\n continue\n\n print("split data into {0} pieces along <{1}>:".format(len(positions) - 1, expression))\n for i, (lo, hi) in enumerate(wt_kit.pairwise(positions)):\n new_data = out[i]\n if new_data.shape == ():\n print(" {0} : None".format(i))\n else:\n new_axis = new_data.axes[0]\n print(\n " {0} : {1:0.2f} to {2:0.2f} {3} {4}".format(\n i, lo, hi, new_axis.units, new_axis.shape\n )\n )\n\n for d in out.values():\n try:\n d.transform(*old_expr)\n keep = []\n keep_units = []\n for ax in d.axes:\n if ax.size > 1:\n keep.append(ax.expression)\n keep_units.append(ax.units)\n else:\n d.create_constant(ax.expression, verbose=False)\n d.transform(*keep)\n for ax, u in zip(d.axes, keep_units):\n ax.convert(u)\n except IndexError:\n continue\n tempax = Axis(d, expression)\n if all(\n np.all(\n np.sum(~np.isnan(tempax.masked), axis=tuple(set(range(tempax.ndim)) - {j}))\n <= 1\n )\n for j in range(tempax.ndim)\n ):\n d.create_constant(expression, verbose=False)\n self.transform(*old_expr)\n for ax, u in zip(self.axes, old_units):\n ax.convert(u)\n\n return out', 'def add_item(self, title, key, synonyms=None, description=None, img_url=None):\n """Adds item to a list or carousel card.\n\n A list must contain at least 2 items, each requiring a title and object key.\n\n Arguments:\n title {str} -- Name of the item object\n key {str} -- Key refering to the item.\n This string will be used to send a query to your app if selected\n\n Keyword Arguments:\n synonyms {list} -- 
Words and phrases the user may send to select the item\n (default: {None})\n description {str} -- A description of the item (default: {None})\n img_url {str} -- URL of the image to represent the item (default: {None})\n """\n item = build_item(title, key, synonyms, description, img_url)\n self._items.append(item)\n return self', 'def compare(a, b):\n """Compares two timestamps.\n\n ``a`` and ``b`` must be the same type, in addition to normal\n representations of timestamps that order naturally, they can be rfc3339\n formatted strings.\n\n Args:\n a (string|object): a timestamp\n b (string|object): another timestamp\n\n Returns:\n int: -1 if a < b, 0 if a == b or 1 if a > b\n\n Raises:\n ValueError: if a or b are not the same type\n ValueError: if a or b strings but not in valid rfc3339 format\n\n """\n a_is_text = isinstance(a, basestring)\n b_is_text = isinstance(b, basestring)\n if type(a) != type(b) and not (a_is_text and b_is_text):\n _logger.error(u\'Cannot compare %s to %s, types differ %s!=%s\',\n a, b, type(a), type(b))\n raise ValueError(u\'cannot compare inputs of differing types\')\n\n if a_is_text:\n a = from_rfc3339(a, with_nanos=True)\n b = from_rfc3339(b, with_nanos=True)\n\n if a < b:\n return -1\n elif a > b:\n return 1\n else:\n return 0', ] query_embeddings = model.encode_query(queries) document_embeddings = model.encode_document(documents) print(query_embeddings.shape, document_embeddings.shape) # [1, 768] [3, 768] # Get the similarity scores for the embeddings similarities = model.similarity(query_embeddings, document_embeddings) print(similarities) # tensor([[0.9188, 0.1817, 0.1583]]) ``` ## Evaluation ### Metrics #### Information Retrieval * Dataset: `eval` * Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) | Metric | Value | |:--------------------|:-----------| | cosine_accuracy@1 | 0.9481 | | 
cosine_accuracy@3 | 0.9703 | | cosine_accuracy@5 | 0.9752 | | cosine_accuracy@10 | 0.9807 | | cosine_precision@1 | 0.9481 | | cosine_precision@3 | 0.3234 | | cosine_precision@5 | 0.195 | | cosine_precision@10 | 0.0981 | | cosine_recall@1 | 0.9481 | | cosine_recall@3 | 0.9703 | | cosine_recall@5 | 0.9752 | | cosine_recall@10 | 0.9807 | | **cosine_ndcg@10** | **0.9652** | | cosine_mrr@10 | 0.9602 | | cosine_map@100 | 0.9606 | ## Training Details ### Training Dataset #### Unnamed Dataset * Size: 369,762 training samples * Columns: query and positive * Approximate statistics based on the first 1000 samples: | | query | positive | |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | type | string | string | | details |
min: 3 tokens • mean: 71.9 tokens • max: 512 tokens | min: 37 tokens • mean: 236.1 tokens • max: 512 tokens
| * Samples: | query | positive | |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Returns group object for datacenter root group.

>>> clc.v2.Datacenter().RootGroup()

>>> print _
WA1 Hardware
| def RootGroup(self):
"""Returns group object for datacenter root group.

>>> clc.v2.Datacenter().RootGroup()

>>> print _
WA1 Hardware

"""

return(clc.v2.Group(id=self.root_group_id,alias=self.alias,session=self.session))
| | Calculate the euclidean distance of all array positions in "matchArr".

:param matchArr: a dictionary of ``numpy.arrays`` containing at least two
entries that are treated as cartesian coordinates.
:param tKey: #TODO: docstring
:param mKey: #TODO: docstring

:returns: #TODO: docstring

{'eucDist': numpy.array([eucDistance, eucDistance, ...]),
'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])
}
| def calcDistMatchArr(matchArr, tKey, mKey):
"""Calculate the euclidean distance of all array positions in "matchArr".

:param matchArr: a dictionary of ``numpy.arrays`` containing at least two
entries that are treated as cartesian coordinates.
:param tKey: #TODO: docstring
:param mKey: #TODO: docstring

:returns: #TODO: docstring

{'eucDist': numpy.array([eucDistance, eucDistance, ...]),
'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])
}
"""
#Calculate all sorted list of all eucledian feature distances
matchArrSize = listvalues(matchArr)[0].size

distInfo = {'posPairs': list(), 'eucDist': list()}
_matrix = numpy.swapaxes(numpy.array([matchArr[tKey], matchArr[mKey]]), 0, 1)

for pos1 in range(matchArrSize-1):
for pos2 in range(pos1+1, matchArrSize):
distInfo['posPairs'].append((pos1, pos2))
distInfo['posPairs'] = numpy.array(distInfo['posPairs'])
distInfo['eucD...
| | Format this verifier

Returns:
string: A formatted string
| def format(self, indent_level, indent_size=4):
"""Format this verifier

Returns:
string: A formatted string
"""

name = self.format_name('Literal', indent_size)

if self.long_desc is not None:
name += '\n'

name += self.wrap_lines('value: %s\n' % str(self._literal), 1, indent_size)

return self.wrap_lines(name, indent_level, indent_size)
| * Loss: [CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 64, "gather_across_devices": false, "directions": [ "query_to_doc" ], "partition_mode": "joint", "hardness_mode": null, "hardness_strength": 0.0 } ``` ### Evaluation Dataset #### Unnamed Dataset * Size: 19,462 evaluation samples * Columns: query and positive * Approximate statistics based on the first 1000 samples: | | query | positive | |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| | type | string | string | | details |
min: 3 tokens • mean: 71.05 tokens • max: 512 tokens | min: 40 tokens • mean: 236.22 tokens • max: 512 tokens
| * Samples: | query | positive | |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Create a new ParticipantInstance

:param unicode attributes: An optional string metadata field you can use to store any data you wish.
:param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.
:param datetime date_created: The date that this resource was created.
:param datetime date_updated: The date that this resource was last updated.
:param unicode identity: A unique string identifier for the session participant as Chat User.
:param unicode user_address: The address of the participant's device.

:returns: Newly created ParticipantInstance
:rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance
| def create(self, attributes=values.unset, twilio_address=values.unset,
date_created=values.unset, date_updated=values.unset,
identity=values.unset, user_address=values.unset):
"""
Create a new ParticipantInstance

:param unicode attributes: An optional string metadata field you can use to store any data you wish.
:param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.
:param datetime date_created: The date that this resource was created.
:param datetime date_updated: The date that this resource was last updated.
:param unicode identity: A unique string identifier for the session participant as Chat User.
:param unicode user_address: The address of the participant's device.

:returns: Newly created ParticipantInstance
:rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance
"""
data = values.o...
| | It returns absolute url defined by node related to this page | def get_absolute_url(self):
"""
It returns absolute url defined by node related to this page
"""
try:
node = Node.objects.select_related().filter(page=self)[0]
return node.get_absolute_url()
except Exception, e:
raise ValueError(u"Error in {0}.{1}: {2}".format(self.__module__, self.__class__.__name__, e))
return u""
| | Return the current scaled font.

:return:
A new :class:`ScaledFont` object,
wrapping an existing cairo object.
| def get_scaled_font(self):
"""Return the current scaled font.

:return:
A new :class:`ScaledFont` object,
wrapping an existing cairo object.

"""
return ScaledFont._from_pointer(
cairo.cairo_get_scaled_font(self._pointer), incref=True)
| * Loss: [CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters: ```json { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 64, "gather_across_devices": false, "directions": [ "query_to_doc" ], "partition_mode": "joint", "hardness_mode": null, "hardness_strength": 0.0 } ``` ### Training Hyperparameters #### Non-Default Hyperparameters - `per_device_train_batch_size`: 8192 - `num_train_epochs`: 1 - `learning_rate`: 2e-06 - `warmup_steps`: 0.1 - `bf16`: True - `eval_strategy`: epoch - `per_device_eval_batch_size`: 8192 - `push_to_hub`: True - `hub_model_id`: modernbert-codesearchnet - `load_best_model_at_end`: True - `dataloader_num_workers`: 4 - `batch_sampler`: no_duplicates #### All Hyperparameters
- `per_device_train_batch_size`: 8192
- `num_train_epochs`: 1
- `max_steps`: -1
- `learning_rate`: 2e-06
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 1
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 8192
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: True
- `hub_private_repo`: None
- `hub_model_id`: modernbert-codesearchnet
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
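The loss configured above, `CachedMultipleNegativesRankingLoss`, is an in-batch-negatives objective: for each query, its paired positive is the label and every other positive in the batch serves as a negative, with scores given by scaled cosine similarities (`scale: 20.0`, `query_to_doc` direction only); the "cached" part uses gradient caching so the 8192-sample batch can be processed in mini-batches of 64. A minimal NumPy sketch of the underlying objective (the caching machinery and the actual library implementation are omitted; `queries` and `docs` here are toy embeddings, not outputs of this model):

```python
import numpy as np

def in_batch_mnrl(queries: np.ndarray, docs: np.ndarray, scale: float = 20.0) -> float:
    """Multiple-negatives ranking loss: cross-entropy over scaled cosine
    similarities, where queries[i] should match docs[i] (the diagonal)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sim = scale * (q @ d.T)                    # (batch, batch) similarity matrix
    sim -= sim.max(axis=1, keepdims=True)      # stabilize the softmax
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.diag(log_softmax).mean()) # diagonal entries are the true pairs

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
# Correctly matched pairs give a near-zero loss; misaligned pairs give a large one.
print(in_batch_mnrl(emb, emb), in_batch_mnrl(emb, np.roll(emb, 1, axis=0)))
```

Larger batches make this objective harder (more in-batch negatives per query), which is why the cached variant is used to reach a batch size of 8192 under fixed memory.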
### Training Logs | Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 | |:-------:|:------:|:-------------:|:---------------:|:-------------------:| | 0.2174 | 10 | 0.9210 | - | - | | 0.4348 | 20 | 0.6679 | - | - | | 0.6522 | 30 | 0.5007 | - | - | | 0.8696 | 40 | 0.4181 | - | - | | **1.0** | **46** | **-** | **0.0328** | **0.9652** | * The bold row denotes the saved checkpoint. ### Framework Versions - Python: 3.12.12 - Sentence Transformers: 5.3.0 - Transformers: 5.3.0 - PyTorch: 2.10.0+cu128 - Accelerate: 1.13.0 - Datasets: 4.8.2 - Tokenizers: 0.22.2 ## Citation ### BibTeX #### Sentence Transformers ```bibtex @inproceedings{reimers-2019-sentence-bert, title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", author = "Reimers, Nils and Gurevych, Iryna", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", month = "11", year = "2019", publisher = "Association for Computational Linguistics", url = "https://arxiv.org/abs/1908.10084", } ``` #### CachedMultipleNegativesRankingLoss ```bibtex @misc{gao2021scaling, title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan}, year={2021}, eprint={2101.06983}, archivePrefix={arXiv}, primaryClass={cs.LG} } ```