---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:42459
- loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: policy for how can i verify if a tekton task version is still supported
by checking for the build.appstudio.redhat.com/expires-on annotation?
sentences:
- 'Helper: lib.to_array
Signature: to_array(s)
Description: '
- 'Helper: lib.pipelinerun_attestations
Signature: pipelinerun_attestations
Description: '
- 'Helper: lib.k8s.name
Signature: name(resource)
Description: '
- source_sentence: how to check attestation is missing statement field.
sentences:
- 'Helper: lib.k8s.name
Signature: name(resource)
Description: '
- 'Helper: lib.tekton.untrusted_task_refs
Signature: untrusted_task_refs(tasks)
Description: '
- 'Helper: lib.k8s.version
Signature: version(resource)
Description: '
- source_sentence: I need to ensure the operators.openshift.io/valid-subscription
annotation in the ClusterServiceVersion manifest contains a valid JSON encoded
non-empty array of strings.
sentences:
- 'Helper: lib.to_array
Signature: to_array(s)
Description: '
- 'Helper: lib.image.equal_ref
Signature: equal_ref(ref1, ref2)
Description: '
- 'Helper: lib.result_helper
Signature: result_helper(chain, failure_sprintf_params)
Description: '
- source_sentence: write a rule to deny approval for an container image with non-unique
RPM names
sentences:
- 'Helper: lib.result_helper
Signature: result_helper(chain, failure_sprintf_params)
Description: '
- 'Helper: lib.to_set
Signature: to_set(arr)
Description: '
- 'Helper: lib.rule_data_defaults
Signature: rule_data_defaults
Description: '
- source_sentence: check if i need to validate that spdx package is an operating system
component.
sentences:
- 'Helper: lib.to_set
Signature: to_set(arr)
Description: '
- 'Helper: lib.rule_data_defaults
Signature: rule_data_defaults
Description: '
- 'Helper: lib.result_helper
Signature: result_helper(chain, failure_sprintf_params)
Description: '
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
results:
- task:
type: triplet
name: Triplet
dataset:
name: retrieval eval
type: retrieval-eval
metrics:
- type: cosine_accuracy
value: 0.9834675788879395
name: Cosine Accuracy
---
# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
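The Pooling and Normalize modules above can be illustrated numerically: mean pooling averages the token embeddings that are not padding, and Normalize() scales each sentence vector to unit length, so cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch with toy dimensions (not the real 384), independent of the model weights:

```python
import numpy as np

# Toy token embeddings for one sentence: 4 tokens x 6 dims
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(4, 6))
attention_mask = np.array([1, 1, 1, 0])  # last token is padding

# Mean pooling: average only the non-padding token embeddings
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()

# Normalize(): scale the sentence vector to unit length
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)

print(round(float(np.linalg.norm(sentence_embedding)), 6))  # 1.0
```

Because the final embeddings are unit-normalized, `model.similarity` with the default cosine metric is equivalent to a dot product between embeddings.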
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'check if i need to validate that spdx package is an operating system component.',
    'Helper: lib.result_helper\nSignature: result_helper(chain, failure_sprintf_params)\nDescription: ',
    'Helper: lib.to_set\nSignature: to_set(arr)\nDescription: ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.4979, -0.4443],
# [ 0.4979, 1.0000, -0.4918],
# [-0.4443, -0.4918, 1.0000]])
```
## Evaluation
### Metrics
#### Triplet
* Dataset: `retrieval-eval`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9835** |
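The cosine_accuracy reported above is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. A minimal sketch of that computation on precomputed, unit-normalized embeddings (illustrative only, not the evaluator's actual code):

```python
import numpy as np

def cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative).

    All inputs are (n, dim) arrays of unit-normalized embeddings, so the
    row-wise dot product equals the cosine similarity.
    """
    pos_sim = (anchors * positives).sum(axis=1)
    neg_sim = (anchors * negatives).sum(axis=1)
    return float((pos_sim > neg_sim).mean())

# Toy example: 2 triplets, the first ranked correctly, the second not
a = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [1.0, 0.0]])
n = np.array([[0.0, 1.0], [0.0, 1.0]])
p = p / np.linalg.norm(p, axis=1, keepdims=True)
print(cosine_accuracy(a, p, n))  # 0.5
```

At 0.9835, the model ranks the correct helper above a distractor for roughly 98 out of every 100 evaluation triplets.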
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 42,459 training samples
* Columns: sentence_0, sentence_1, and sentence_2
* Approximate statistics based on the first 1000 samples:
  | | sentence_0 | sentence_1 | sentence_2 |
  |:--------|:-----------|:-----------|:-----------|
  | type | string | string | string |
* Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:-----------|:-----------|:-----------|
  | <code>I need to ensure that only images from specific registries are used in our policy</code> | <code>Helper: lib.image.str<br>Signature: str(d)<br>Description: </code> | <code>Helper: lib.konflux.is_validating_image_index<br>Signature: is_validating_image_index<br>Description: </code> |
  | <code>check if check warn</code> | <code>Helper: lib.tekton.expiry_of<br>Signature: expiry_of(task)<br>Description: </code> | <code>Helper: lib.tekton.untagged_task_references<br>Signature: untagged_task_references(tasks)<br>Description: </code> |
  | <code>verify that task has an expiry date set.</code> | <code>Helper: lib.tekton.task_param<br>Signature: task_param(task, name)<br>Description: </code> | <code>Helper: lib.tekton.untagged_task_references<br>Signature: untagged_task_references(tasks)<br>Description: </code> |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.5
}
```
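With the cosine distance metric and a margin of 0.5, TripletLoss penalizes a triplet unless the positive is at least 0.5 closer to the anchor than the negative, where distance is d(x, y) = 1 - cos(x, y). A small self-contained sketch of the per-triplet term (the actual implementation operates on batched torch tensors):

```python
import numpy as np

def cosine_distance(x, y):
    # d(x, y) = 1 - cosine similarity
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Hinge on the distance gap: zero loss once the positive is at least
    # `margin` closer to the anchor than the negative is.
    return max(0.0, cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])   # close to the anchor
n = np.array([0.0, 1.0])   # orthogonal to the anchor
print(triplet_loss(a, p, n))  # 0.0: positive is well separated from the negative
```

Swapping the positive and negative in this example yields a large loss, which is what drives the embeddings of matching question/helper pairs together during training.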
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `num_train_epochs`: 5
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters