---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:9020
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: python multiprocessing show cpu count
sentences:
- "def unique(seq):\n \"\"\"Return the unique elements of a collection even if\
\ those elements are\n unhashable and unsortable, like dicts and sets\"\"\
\"\n cleaned = []\n for each in seq:\n if each not in cleaned:\n\
\ cleaned.append(each)\n return cleaned"
- "def is_in(self, point_x, point_y):\n \"\"\" Test if a point is within\
\ this polygonal region \"\"\"\n\n point_array = array(((point_x, point_y),))\n\
\ vertices = array(self.points)\n winding = self.inside_rule ==\
\ \"winding\"\n result = points_in_polygon(point_array, vertices, winding)\n\
\ return result[0]"
- "def machine_info():\n \"\"\"Retrieve core and memory information for the current\
\ machine.\n \"\"\"\n import psutil\n BYTES_IN_GIG = 1073741824.0\n \
\ free_bytes = psutil.virtual_memory().total\n return [{\"memory\": float(\"\
%.1f\" % (free_bytes / BYTES_IN_GIG)), \"cores\": multiprocessing.cpu_count(),\n\
\ \"name\": socket.gethostname()}]"
- source_sentence: python subplot set the whole title
sentences:
- "def set_title(self, title, **kwargs):\n \"\"\"Sets the title on the underlying\
\ matplotlib AxesSubplot.\"\"\"\n ax = self.get_axes()\n ax.set_title(title,\
\ **kwargs)"
- "def moving_average(array, n=3):\n \"\"\"\n Calculates the moving average\
\ of an array.\n\n Parameters\n ----------\n array : array\n The\
\ array to have the moving average taken of\n n : int\n The number of\
\ points of moving average to take\n \n Returns\n -------\n MovingAverageArray\
\ : array\n The n-point moving average of the input array\n \"\"\"\n\
\ ret = _np.cumsum(array, dtype=float)\n ret[n:] = ret[n:] - ret[:-n]\n\
\ return ret[n - 1:] / n"
- "def to_query_parameters(parameters):\n \"\"\"Converts DB-API parameter values\
\ into query parameters.\n\n :type parameters: Mapping[str, Any] or Sequence[Any]\n\
\ :param parameters: A dictionary or sequence of query parameter values.\n\n\
\ :rtype: List[google.cloud.bigquery.query._AbstractQueryParameter]\n :returns:\
\ A list of query parameters.\n \"\"\"\n if parameters is None:\n \
\ return []\n\n if isinstance(parameters, collections_abc.Mapping):\n \
\ return to_query_parameters_dict(parameters)\n\n return to_query_parameters_list(parameters)"
- source_sentence: python merge two set to dict
sentences:
- "def make_regex(separator):\n \"\"\"Utility function to create regexp for matching\
\ escaped separators\n in strings.\n\n \"\"\"\n return re.compile(r'(?:'\
\ + re.escape(separator) + r')?((?:[^' +\n re.escape(separator)\
\ + r'\\\\]|\\\\.)+)')"
- "def csvtolist(inputstr):\n \"\"\" converts a csv string into a list \"\"\"\
\n reader = csv.reader([inputstr], skipinitialspace=True)\n output = []\n\
\ for r in reader:\n output += r\n return output"
- "def dict_merge(set1, set2):\n \"\"\"Joins two dictionaries.\"\"\"\n return\
\ dict(list(set1.items()) + list(set2.items()))"
- source_sentence: python string % substitution float
sentences:
- "def _configure_logger():\n \"\"\"Configure the logging module.\"\"\"\n \
\ if not app.debug:\n _configure_logger_for_production(logging.getLogger())\n\
\ elif not app.testing:\n _configure_logger_for_debugging(logging.getLogger())"
- "def __set__(self, instance, value):\n \"\"\" Set a related object for\
\ an instance. \"\"\"\n\n self.map[id(instance)] = (weakref.ref(instance),\
\ value)"
- "def format_float(value): # not used\n \"\"\"Modified form of the 'g' format\
\ specifier.\n \"\"\"\n string = \"{:g}\".format(value).replace(\"e+\",\
\ \"e\")\n string = re.sub(\"e(-?)0*(\\d+)\", r\"e\\1\\2\", string)\n return\
\ string"
- source_sentence: bottom 5 rows in python
sentences:
- "def refresh(self, document):\n\t\t\"\"\" Load a new copy of a document from the\
\ database. does not\n\t\t\treplace the old one \"\"\"\n\t\ttry:\n\t\t\told_cache_size\
\ = self.cache_size\n\t\t\tself.cache_size = 0\n\t\t\tobj = self.query(type(document)).filter_by(mongo_id=document.mongo_id).one()\n\
\t\tfinally:\n\t\t\tself.cache_size = old_cache_size\n\t\tself.cache_write(obj)\n\
\t\treturn obj"
- "def table_top_abs(self):\n \"\"\"Returns the absolute position of table\
\ top\"\"\"\n table_height = np.array([0, 0, self.table_full_size[2]])\n\
\ return string_to_array(self.floor.get(\"pos\")) + table_height"
- "def get_dimension_array(array):\n \"\"\"\n Get dimension of an array getting\
\ the number of rows and the max num of\n columns.\n \"\"\"\n if all(isinstance(el,\
\ list) for el in array):\n result = [len(array), len(max([x for x in array],\
\ key=len,))]\n\n # elif array and isinstance(array, list):\n else:\n \
\ result = [len(array), 1]\n\n return result"
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
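The Pooling and Normalize modules above can be approximated outside the library: mean-pool the token embeddings over the attention mask, then L2-normalize. A minimal numpy sketch of that idea (the token embeddings below are random placeholders, not real model outputs):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean pooling over valid tokens followed by L2 normalization,
    mirroring the Pooling(mean) + Normalize() modules above."""
    mask = attention_mask[..., None].astype(float)       # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)       # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid divide-by-zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: batch of 2 sequences, 4 tokens each, 384-dim token embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # padding tokens masked out
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)  # (2, 384); each row has unit L2 norm
```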
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Devy1/MiniLM-cosqa-128")
# Run inference
sentences = [
'bottom 5 rows in python',
'def table_top_abs(self):\n """Returns the absolute position of table top"""\n table_height = np.array([0, 0, self.table_full_size[2]])\n return string_to_array(self.floor.get("pos")) + table_height',
'def refresh(self, document):\n\t\t""" Load a new copy of a document from the database. does not\n\t\t\treplace the old one """\n\t\ttry:\n\t\t\told_cache_size = self.cache_size\n\t\t\tself.cache_size = 0\n\t\t\tobj = self.query(type(document)).filter_by(mongo_id=document.mongo_id).one()\n\t\tfinally:\n\t\t\tself.cache_size = old_cache_size\n\t\tself.cache_write(obj)\n\t\treturn obj',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.4828, -0.0626],
# [ 0.4828, 1.0000, -0.0528],
# [-0.0626, -0.0528, 1.0000]])
```
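Because the model's final module is `Normalize()`, its embeddings are unit-length, so cosine similarity reduces to a plain dot product. A small sketch with placeholder unit vectors standing in for `model.encode(...)` output:

```python
import numpy as np

# Hand-picked unit vectors standing in for real 384-dim embeddings
emb = np.array([
    [1.0, 0.0],
    [0.6, 0.8],
    [0.0, 1.0],
])
# For unit vectors, the matrix product equals the cosine-similarity matrix
similarities = emb @ emb.T
print(similarities)
# diagonal is 1.0; off-diagonals are the pairwise cosine similarities
```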
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 9,020 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | <code>1d array in char datatype in python</code> | <code>def _convert_to_array(array_like, dtype):<br>    """<br>    Convert Matrix attributes which are array-like or buffer to array.<br>    """<br>    if isinstance(array_like, bytes):<br>        return np.frombuffer(array_like, dtype=dtype)<br>    return np.asarray(array_like, dtype=dtype)</code> |
  | <code>python condition non none</code> | <code>def _not(condition=None, **kwargs):<br>    """<br>    Return the opposite of input condition.<br>    :param condition: condition to process.<br>    :result: not condition.<br>    :rtype: bool<br>    """<br>    result = True<br>    if condition is not None:<br>        result = not run(condition, **kwargs)<br>    return result</code> |
  | <code>accessing a column from a matrix in python</code> | <code>def get_column(self, X, column):<br>    """Return a column of the given matrix.<br>    Args:<br>        X: `numpy.ndarray` or `pandas.DataFrame`.<br>        column: `int` or `str`.<br>    Returns:<br>        np.ndarray: Selected column.<br>    """<br>    if isinstance(X, pd.DataFrame):<br>        return X[column].values<br>    return X[:, column]</code> |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
```
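MultipleNegativesRankingLoss treats every other positive in the batch as a negative: the scaled similarity matrix between anchors and positives goes through softmax cross-entropy with the diagonal (matching pairs) as the targets. A minimal numpy sketch of that computation, not the library's implementation (`mnr_loss` is a hypothetical helper name; inputs are assumed L2-normalized, as this model's embeddings are):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy: for each anchor, the positive in
    the same row is the target class among all positives in the batch."""
    sims = scale * (anchors @ positives.T)        # (batch, batch) cosine sims
    row_max = sims.max(axis=1, keepdims=True)     # subtract max for stability
    logsumexp = np.log(np.exp(sims - row_max).sum(axis=1)) + row_max[:, 0]
    return float(np.mean(logsumexp - np.diag(sims)))  # -mean log-probability

# Toy batch: each anchor exactly matches its own positive
a = np.eye(3)
good = mnr_loss(a, a)                      # near 0: diagonal dominates
bad = mnr_loss(a, np.roll(a, 1, axis=0))   # mismatched pairs: large loss
print(good < bad)  # True
```

The `scale: 20.0` parameter above is the multiplier applied to the cosine similarities before the softmax; larger values sharpen the distribution over in-batch candidates.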
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 128
- `fp16`: True
#### All Hyperparameters