benjamintli committed on
Commit 181cf68 · verified · 1 Parent(s): 38dc3d1

End of training
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
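Of the pooling modes above, only mean pooling is enabled: sentence embeddings are the attention-mask-weighted average of the token embeddings. For reference, a minimal sketch of what that computation does (plain PyTorch, not the Pooling module's actual source):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean pooling over the token axis, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, 768)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # real tokens per sequence
    return summed / counts                                          # (batch, 768)
```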
README.md ADDED
@@ -0,0 +1,972 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:369762
+ - loss:CachedMultipleNegativesRankingLoss
+ base_model: benjamintli/modernbert-cosqa
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on benjamintli/modernbert-cosqa
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: eval
+       type: eval
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9480526153529956
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9703010995786662
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9751824067413422
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9806803000719351
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9480526153529956
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.32343369985955533
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19503648134826843
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09806803000719352
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9480526153529956
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9703010995786662
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9751824067413422
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9806803000719351
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9652143122800294
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9601788099886978
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9606213024321194
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on benjamintli/modernbert-cosqa
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [benjamintli/modernbert-cosqa](https://huggingface.co/benjamintli/modernbert-cosqa) <!-- at revision 8d7c40aabc62d4956cab19aec28165b206d86790 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
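A quick sanity check of the numbers above (this assumes the same `modernbert-codesearchnet` model id used in the usage example below; substitute your local path or Hub repo as needed):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("modernbert-codesearchnet")
print(model.get_sentence_embedding_dimension())  # 768
print(model.max_seq_length)                      # 512
```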
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("modernbert-codesearchnet")
+ # Run inference
+ queries = [
+ "Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \u0027split\u0027 collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.",
+ ]
+ documents = [
+ 'def split(\n self, expression, positions, *, units=None, parent=None, verbose=True\n ) -> wt_collection.Collection:\n """\n Split the data object along a given expression, in units.\n\n Parameters\n ----------\n expression : int or str\n The expression to split along. If given as an integer, the axis at that index\n is used.\n positions : number-type or 1D array-type\n The position(s) to split at, in units.\n units : str (optional)\n The units of the given positions. Default is same, which assumes\n input units are identical to first variable units.\n parent : WrightTools.Collection (optional)\n The parent collection in which to place the \'split\' collection.\n Default is a new Collection.\n verbose : bool (optional)\n Toggle talkback. Default is True.\n\n Returns\n -------\n WrightTools.collection.Collection\n A Collection of data objects.\n The order of the objects is such that the axis points retain their original order.\n\n See Also\n --------\n chop\n Divide the dataset into its lower-dimensionality components.\n collapse\n Collapse the dataset along one axis.\n """\n # axis ------------------------------------------------------------------------------------\n old_expr = self.axis_expressions\n old_units = self.units\n out = wt_collection.Collection(name="split", parent=parent)\n if isinstance(expression, int):\n if units is None:\n units = self._axes[expression].units\n expression = self._axes[expression].expression\n elif isinstance(expression, str):\n pass\n else:\n raise TypeError("expression: expected {int, str}, got %s" % type(expression))\n\n self.transform(expression)\n if units:\n self.convert(units)\n\n try:\n positions = [-np.inf] + sorted(list(positions)) + [np.inf]\n except TypeError:\n positions = [-np.inf, positions, np.inf]\n\n values = self._axes[0].full\n masks = [(values >= lo) & (values < hi) for lo, hi in wt_kit.pairwise(positions)]\n omasks = []\n cuts = []\n for mask in masks:\n try:\n omasks.append(wt_kit.mask_reduce(mask))\n cuts.append([i == 1 for i in omasks[-1].shape])\n # Ensure at least one axis is kept\n if np.all(cuts[-1]):\n cuts[-1][0] = False\n except ValueError:\n omasks.append(None)\n cuts.append(None)\n for i in range(len(positions) - 1):\n out.create_data("split%03i" % i)\n\n for var in self.variables:\n for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, var.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, var.shape)\n out_arr[omask] = var[:][imask]\n out[i].create_variable(values=out_arr, **var.attrs)\n\n for ch in self.channels:\n for i, (imask, omask, cut) in enumerate(zip(masks, omasks, cuts)):\n if omask is None:\n # Zero length split\n continue\n omask = wt_kit.enforce_mask_shape(omask, ch.shape)\n omask.shape = tuple([s for s, c in zip(omask.shape, cut) if not c])\n out_arr = np.full(omask.shape, np.nan)\n imask = wt_kit.enforce_mask_shape(imask, ch.shape)\n out_arr[omask] = ch[:][imask]\n out[i].create_channel(values=out_arr, **ch.attrs)\n\n if verbose:\n for d in out.values():\n try:\n d.transform(expression)\n except IndexError:\n continue\n\n print("split data into {0} pieces along <{1}>:".format(len(positions) - 1, expression))\n for i, (lo, hi) in enumerate(wt_kit.pairwise(positions)):\n new_data = out[i]\n if new_data.shape == ():\n print(" {0} : None".format(i))\n else:\n new_axis = new_data.axes[0]\n 
print(\n " {0} : {1:0.2f} to {2:0.2f} {3} {4}".format(\n i, lo, hi, new_axis.units, new_axis.shape\n )\n )\n\n for d in out.values():\n try:\n d.transform(*old_expr)\n keep = []\n keep_units = []\n for ax in d.axes:\n if ax.size > 1:\n keep.append(ax.expression)\n keep_units.append(ax.units)\n else:\n d.create_constant(ax.expression, verbose=False)\n d.transform(*keep)\n for ax, u in zip(d.axes, keep_units):\n ax.convert(u)\n except IndexError:\n continue\n tempax = Axis(d, expression)\n if all(\n np.all(\n np.sum(~np.isnan(tempax.masked), axis=tuple(set(range(tempax.ndim)) - {j}))\n <= 1\n )\n for j in range(tempax.ndim)\n ):\n d.create_constant(expression, verbose=False)\n self.transform(*old_expr)\n for ax, u in zip(self.axes, old_units):\n ax.convert(u)\n\n return out',
+ 'def add_item(self, title, key, synonyms=None, description=None, img_url=None):\n """Adds item to a list or carousel card.\n\n A list must contain at least 2 items, each requiring a title and object key.\n\n Arguments:\n title {str} -- Name of the item object\n key {str} -- Key refering to the item.\n This string will be used to send a query to your app if selected\n\n Keyword Arguments:\n synonyms {list} -- Words and phrases the user may send to select the item\n (default: {None})\n description {str} -- A description of the item (default: {None})\n img_url {str} -- URL of the image to represent the item (default: {None})\n """\n item = build_item(title, key, synonyms, description, img_url)\n self._items.append(item)\n return self',
+ 'def compare(a, b):\n """Compares two timestamps.\n\n ``a`` and ``b`` must be the same type, in addition to normal\n representations of timestamps that order naturally, they can be rfc3339\n formatted strings.\n\n Args:\n a (string|object): a timestamp\n b (string|object): another timestamp\n\n Returns:\n int: -1 if a < b, 0 if a == b or 1 if a > b\n\n Raises:\n ValueError: if a or b are not the same type\n ValueError: if a or b strings but not in valid rfc3339 format\n\n """\n a_is_text = isinstance(a, basestring)\n b_is_text = isinstance(b, basestring)\n if type(a) != type(b) and not (a_is_text and b_is_text):\n _logger.error(u\'Cannot compare %s to %s, types differ %s!=%s\',\n a, b, type(a), type(b))\n raise ValueError(u\'cannot compare inputs of differing types\')\n\n if a_is_text:\n a = from_rfc3339(a, with_nanos=True)\n b = from_rfc3339(b, with_nanos=True)\n\n if a < b:\n return -1\n elif a > b:\n return 1\n else:\n return 0',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 768] [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[0.9188, 0.1817, 0.1583]])
+ ```
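In a code-search setting you typically rank many candidate functions against one natural-language query. A minimal sketch reusing the embeddings from the example above (not part of the committed card):

```python
import torch

# Rank the candidate code snippets for the single query.
scores = model.similarity(query_embeddings, document_embeddings)[0]  # shape: (num_docs,)
order = torch.argsort(scores, descending=True)
for rank, idx in enumerate(order.tolist(), start=1):
    print(f"{rank}. document {idx} (score={scores[idx]:.4f})")
```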
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Dataset: `eval`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9481     |
+ | cosine_accuracy@3   | 0.9703     |
+ | cosine_accuracy@5   | 0.9752     |
+ | cosine_accuracy@10  | 0.9807     |
+ | cosine_precision@1  | 0.9481     |
+ | cosine_precision@3  | 0.3234     |
+ | cosine_precision@5  | 0.195      |
+ | cosine_precision@10 | 0.0981     |
+ | cosine_recall@1     | 0.9481     |
+ | cosine_recall@3     | 0.9703     |
+ | cosine_recall@5     | 0.9752     |
+ | cosine_recall@10    | 0.9807     |
+ | **cosine_ndcg@10**  | **0.9652** |
+ | cosine_mrr@10       | 0.9602     |
+ | cosine_map@100      | 0.9606     |
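These numbers are internally consistent with exactly one relevant document per query: in that case recall@k equals accuracy@k, and precision@k = accuracy@k / k. For example, cosine_precision@3 = 0.9703 / 3 ≈ 0.3234 and cosine_precision@10 = 0.9807 / 10 ≈ 0.0981, matching the table.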
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 369,762 training samples
+ * Columns: <code>query</code> and <code>positive</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query                                                                              | positive                                                                             |
+   |:--------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                               |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 71.9 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 37 tokens</li><li>mean: 236.1 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | query | positive |
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
736
+ | <code>Returns group object for datacenter root group.<br><br> >>> clc.v2.Datacenter().RootGroup()<br> <clc.APIv2.group.Group object at 0x105feacd0><br> >>> print _<br> WA1 Hardware</code> | <code>def RootGroup(self):<br> """Returns group object for datacenter root group.<br><br> >>> clc.v2.Datacenter().RootGroup()<br> <clc.APIv2.group.Group object at 0x105feacd0><br> >>> print _<br> WA1 Hardware<br><br> """<br><br> return(clc.v2.Group(id=self.root_group_id,alias=self.alias,session=self.session))</code> |
737
+ | <code>Calculate the euclidean distance of all array positions in "matchArr".<br><br> :param matchArr: a dictionary of ``numpy.arrays`` containing at least two<br> entries that are treated as cartesian coordinates.<br> :param tKey: #TODO: docstring<br> :param mKey: #TODO: docstring<br><br> :returns: #TODO: docstring<br><br> {'eucDist': numpy.array([eucDistance, eucDistance, ...]),<br> 'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])<br> }</code> | <code>def calcDistMatchArr(matchArr, tKey, mKey):<br> """Calculate the euclidean distance of all array positions in "matchArr".<br><br> :param matchArr: a dictionary of ``numpy.arrays`` containing at least two<br> entries that are treated as cartesian coordinates.<br> :param tKey: #TODO: docstring<br> :param mKey: #TODO: docstring<br><br> :returns: #TODO: docstring<br><br> {'eucDist': numpy.array([eucDistance, eucDistance, ...]),<br> 'posPairs': numpy.array([[pos1, pos2], [pos1, pos2], ...])<br> }<br> """<br> #Calculate all sorted list of all eucledian feature distances<br> matchArrSize = listvalues(matchArr)[0].size<br><br> distInfo = {'posPairs': list(), 'eucDist': list()}<br> _matrix = numpy.swapaxes(numpy.array([matchArr[tKey], matchArr[mKey]]), 0, 1)<br><br> for pos1 in range(matchArrSize-1):<br> for pos2 in range(pos1+1, matchArrSize):<br> distInfo['posPairs'].append((pos1, pos2))<br> distInfo['posPairs'] = numpy.array(distInfo['posPairs'])<br> distInfo['eucD...</code> |
738
+ | <code>Format this verifier<br><br> Returns:<br> string: A formatted string</code> | <code>def format(self, indent_level, indent_size=4):<br> """Format this verifier<br><br> Returns:<br> string: A formatted string<br> """<br><br> name = self.format_name('Literal', indent_size)<br><br> if self.long_desc is not None:<br> name += '\n'<br><br> name += self.wrap_lines('value: %s\n' % str(self._literal), 1, indent_size)<br><br> return self.wrap_lines(name, indent_level, indent_size)</code> |
739
+ * Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
740
+ ```json
741
+ {
742
+ "scale": 20.0,
743
+ "similarity_fct": "cos_sim",
744
+ "mini_batch_size": 64,
745
+ "gather_across_devices": false,
746
+ "directions": [
747
+ "query_to_doc"
748
+ ],
749
+ "partition_mode": "joint",
750
+ "hardness_mode": null,
751
+ "hardness_strength": 0.0
752
+ }
753
+ ```
754
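+
+ A minimal sketch of wiring this loss into a trainer (the base checkpoint name is a placeholder, and only the parameters documented in the released `sentence-transformers` API are passed; the remaining logged keys look specific to a newer or customized build):
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
+
+ # Two-column (query, positive) pairs; in-batch positives from other rows act as
+ # negatives, which is why the large 8192 batch logged below pairs well with the
+ # cached loss, which only pushes 64-sample mini-batches through the GPU at a time.
+ train_dataset = Dataset.from_dict({
+     "query": ["It returns absolute url defined by node related to this page"],
+     "positive": ["def get_absolute_url(self): ..."],
+ })
+
+ model = SentenceTransformer("base-checkpoint")  # placeholder: the base model this card fine-tunes
+ loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, mini_batch_size=64)
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ ```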
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 19,462 evaluation samples
+ * Columns: <code>query</code> and <code>positive</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | query | positive |
+ |:--------|:---|:---|
+ | type | string | string |
+ | details | <ul><li>min: 3 tokens</li><li>mean: 71.05 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 40 tokens</li><li>mean: 236.22 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | query | positive |
+ |:---|:---|
+ | <code>Create a new ParticipantInstance<br><br> :param unicode attributes: An optional string metadata field you can use to store any data you wish.<br> :param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.<br> :param datetime date_created: The date that this resource was created.<br> :param datetime date_updated: The date that this resource was last updated.<br> :param unicode identity: A unique string identifier for the session participant as Chat User.<br> :param unicode user_address: The address of the participant's device.<br><br> :returns: Newly created ParticipantInstance<br> :rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance</code> | <code>def create(self, attributes=values.unset, twilio_address=values.unset,<br> date_created=values.unset, date_updated=values.unset,<br> identity=values.unset, user_address=values.unset):<br> """<br> Create a new ParticipantInstance<br><br> :param unicode attributes: An optional string metadata field you can use to store any data you wish.<br> :param unicode twilio_address: The address of the Twilio phone number that the participant is in contact with.<br> :param datetime date_created: The date that this resource was created.<br> :param datetime date_updated: The date that this resource was last updated.<br> :param unicode identity: A unique string identifier for the session participant as Chat User.<br> :param unicode user_address: The address of the participant's device.<br><br> :returns: Newly created ParticipantInstance<br> :rtype: twilio.rest.messaging.v1.session.participant.ParticipantInstance<br> """<br> data = values.o...</code> |
+ | <code>It returns absolute url defined by node related to this page</code> | <code>def get_absolute_url(self):<br> """<br> It returns absolute url defined by node related to this page<br> """<br> try:<br> node = Node.objects.select_related().filter(page=self)[0]<br> return node.get_absolute_url()<br> except Exception, e:<br> raise ValueError(u"Error in {0}.{1}: {2}".format(self.__module__, self.__class__.__name__, e))<br> return u""</code> |
+ | <code>Return the current scaled font.<br><br> :return:<br> A new :class:`ScaledFont` object,<br> wrapping an existing cairo object.</code> | <code>def get_scaled_font(self):<br> """Return the current scaled font.<br><br> :return:<br> A new :class:`ScaledFont` object,<br> wrapping an existing cairo object.<br><br> """<br> return ScaledFont._from_pointer(<br> cairo.cairo_get_scaled_font(self._pointer), incref=True)</code> |
+ * Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim",
+     "mini_batch_size": 64,
+     "gather_across_devices": false,
+     "directions": [
+         "query_to_doc"
+     ],
+     "partition_mode": "joint",
+     "hardness_mode": null,
+     "hardness_strength": 0.0
+ }
+ ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 8192
+ - `num_train_epochs`: 1
+ - `learning_rate`: 2e-06
+ - `warmup_steps`: 0.1
+ - `bf16`: True
+ - `eval_strategy`: epoch
+ - `per_device_eval_batch_size`: 8192
+ - `push_to_hub`: True
+ - `hub_model_id`: modernbert-codesearchnet
+ - `load_best_model_at_end`: True
+ - `dataloader_num_workers`: 4
+ - `batch_sampler`: no_duplicates
+
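+ The non-default values above map directly onto `SentenceTransformerTrainingArguments`. A minimal sketch, assuming a placeholder `output_dir` and adding `save_strategy="epoch"`, which `load_best_model_at_end` requires to match the eval strategy:
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="modernbert-codesearchnet",  # placeholder output path
+     per_device_train_batch_size=8192,
+     per_device_eval_batch_size=8192,
+     num_train_epochs=1,
+     learning_rate=2e-6,
+     warmup_steps=0.1,  # logged verbatim above; this reads like a warmup ratio rather than a step count
+     bf16=True,
+     eval_strategy="epoch",
+     save_strategy="epoch",  # assumption, see note above
+     load_best_model_at_end=True,
+     push_to_hub=True,
+     hub_model_id="modernbert-codesearchnet",
+     dataloader_num_workers=4,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+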
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `per_device_train_batch_size`: 8192
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `learning_rate`: 2e-06
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: None
+ - `warmup_steps`: 0.1
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `optim_target_modules`: None
+ - `gradient_accumulation_steps`: 1
+ - `average_tokens_across_devices`: True
+ - `max_grad_norm`: 1.0
+ - `label_smoothing_factor`: 0.0
+ - `bf16`: True
+ - `fp16`: False
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `use_cache`: False
+ - `neftune_noise_alpha`: None
+ - `torch_empty_cache_steps`: None
+ - `auto_find_batch_size`: False
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `include_num_input_tokens_seen`: no
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `disable_tqdm`: False
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `eval_strategy`: epoch
+ - `per_device_eval_batch_size`: 8192
+ - `prediction_loss_only`: True
+ - `eval_on_start`: False
+ - `eval_do_concat_batches`: True
+ - `eval_use_gather_object`: False
+ - `eval_accumulation_steps`: None
+ - `include_for_metrics`: []
+ - `batch_eval_metrics`: False
+ - `save_only_model`: False
+ - `save_on_each_node`: False
+ - `enable_jit_checkpoint`: False
+ - `push_to_hub`: True
+ - `hub_private_repo`: None
+ - `hub_model_id`: modernbert-codesearchnet
+ - `hub_strategy`: every_save
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `full_determinism`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `use_cpu`: False
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 4
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `dataloader_prefetch_factor`: None
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `train_sampling_strategy`: random
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `ddp_backend`: None
+ - `ddp_timeout`: 1800
+ - `fsdp`: []
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `deepspeed`: None
+ - `debug`: []
+ - `skip_memory_metrics`: True
+ - `do_predict`: False
+ - `resume_from_checkpoint`: None
+ - `warmup_ratio`: None
+ - `local_rank`: -1
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
+ |:-------:|:------:|:-------------:|:---------------:|:-------------------:|
+ | 0.2174 | 10 | 0.9210 | - | - |
+ | 0.4348 | 20 | 0.6679 | - | - |
+ | 0.6522 | 30 | 0.5007 | - | - |
+ | 0.8696 | 40 | 0.4181 | - | - |
+ | **1.0** | **46** | **-** | **0.0328** | **0.9652** |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.12.12
+ - Sentence Transformers: 5.3.0
+ - Transformers: 5.3.0
+ - PyTorch: 2.10.0+cu128
+ - Accelerate: 1.13.0
+ - Datasets: 4.8.2
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### CachedMultipleNegativesRankingLoss
+ ```bibtex
+ @misc{gao2021scaling,
+     title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
+     author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
+     year={2021},
+     eprint={2101.06983},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+     "model_type": "SentenceTransformer",
+     "__version__": {
+         "sentence_transformers": "5.3.0",
+         "transformers": "5.3.0",
+         "pytorch": "2.10.0+cu128"
+     },
+     "prompts": {
+         "query": "",
+         "document": ""
+     },
+     "default_prompt_name": null,
+     "similarity_fn_name": "cosine"
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.models.Transformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_Pooling",
+         "type": "sentence_transformers.models.Pooling"
+     }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 512,
+     "do_lower_case": false
+ }