Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from BAAI/bge-base-en. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
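The module list above indicates that the sentence embedding is taken from the [CLS] token (`pooling_mode_cls_token` is True) and then L2-normalized. The following is a minimal pure-Python sketch of those two steps on toy 4-dimensional vectors rather than real 768-dimensional BERT outputs. The helper names `cls_pool` and `l2_normalize` are illustrative assumptions, not part of the sentence-transformers API.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector, which is the [CLS] position in BERT."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length, as a Normalize() step would."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]

# Toy stand-in for transformer output: 3 tokens, 4 dimensions each.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.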
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-base-en")
# Run inference
sentences = [
'Occurence highlighting highlights wrong part of the code <!-- Please search existing issues to avoid creating duplicates. -->\r\n\r\n## Environment data\r\n\r\n- VS Code version: 1.58.0-insider 062e6519f8973fede2ca736e80682bd19007460a \r\n- Jupyter Extension version (available under the Extensions sidebar): v2021.8.1000539794\r\n- Python Extension version (available under the Extensions sidebar): v2021.6.944021595\r\n- OS (Windows | Mac | Linux distro) and version: Ubuntu 18.04\r\n- Python and/or Anaconda version: 3.9.2 Anaconda\r\n- Type of virtual environment used (N/A | venv | virtualenv | conda | ...): conda\r\n- Jupyter server running: Remote \r\n\r\nIt seems that issues https://github.com/microsoft/vscode/issues/120148 and https://github.com/microsoft/vscode-jupyter/issues/5451 have been closed but the problem still exists in the last versions. I have not seen any similar issues on the repo',
'File explorer is expanding all root folders in a MR workspace Steps to Reproduce:\r\n\r\n1. Create a MR workspace file with more than one folder\r\n2. Open the MR workspace\r\n\r\n🐛 All top level folders are expanded. This is very slow if there are lot of root folders and also if the MR workspace is in remote\r\n',
'Quick input reset scroll position * use latest from master\r\n* f1 > insert snippet\r\n* scroll down to an extension snippet and hide it (press 👁️ icon)\r\n* :bug: the scroll position resets\r\n\r\nThis is happening when reassigning the items (since the press changed the label) here: https://github.com/microsoft/vscode/blob/92314d61a55f466c125fa9d1f9fe8da633a82423/src/vs/workbench/contrib/snippets/browser/insertSnippet.ts#L213',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5572, 0.5031],
#         [0.5572, 1.0000, 0.5477],
#         [0.5031, 0.5477, 1.0000]])
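To show how the similarity matrix can be read, here is a small pure-Python sketch. It assumes the scores are cosine similarities (the usual default for `model.similarity`) and copies the printed values into a plain list. `cosine_similarity` and `nearest_neighbors` are hypothetical helpers written for this illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(similarities):
    """For each row, return the index of the most similar *other* row."""
    result = []
    for i, row in enumerate(similarities):
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        result.append(max(candidates)[1])
    return result

# Scores copied from the output printed above.
similarities = [
    [1.0000, 0.5572, 0.5031],
    [0.5572, 1.0000, 0.5477],
    [0.5031, 0.5477, 1.0000],
]
print(nearest_neighbors(similarities))  # [1, 0, 1]

# A toy pair of 2-dimensional vectors, unrelated to the model's output.
u, v = [0.6, 0.8], [0.8, 0.6]
print(round(cosine_similarity(u, v), 4))  # 0.96
```

The diagonal is 1.0 because every sentence is identical to itself, so it is skipped when looking for neighbors.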
bge-base-en-train (TripletEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy | 0.9479 |

bge-base-en-train (TripletEvaluator)

| Metric | Value |
|---|---|
| cosine_accuracy | 0.9933 |
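As a rough guide to what `cosine_accuracy` measures, the sketch below assumes it is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative under cosine similarity. The toy vectors and the function name `triplet_cosine_accuracy` are made up for illustration; the actual evaluator may differ in details such as margins or tie handling.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Toy (anchor, positive, negative) triplets in 2 dimensions.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # ranked correctly
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # ranked correctly
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # wrong: the negative is closer
    ([1.0, 0.0], [0.8, 0.2], [0.2, 0.8]),   # ranked correctly
]
print(triplet_cosine_accuracy(triplets))  # 0.75
```

Under this reading, a value of 0.9933 would mean that roughly 99 of every 100 triplets are ranked correctly.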
Columns: sentence and label

| | sentence | label |
|---|---|---|
| type | string | float |
| details | | |
Loss: BatchSemiHardTripletLoss

Columns: sentence and label

| | sentence | label |
|---|---|---|
| type | string | float |
| details | | |
| sentence | label |
|---|---|
| VS Code does not delete old extension versions even after restart | |
| Does this issue occur when all extensions are disabled?: Yes | |
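The card names BatchSemiHardTripletLoss as the training loss, which builds triplets from the (sentence, label) pairs within a batch. The sketch below is a simplified pure-Python version of one common formulation, offered under stated assumptions rather than as the library's implementation: Euclidean distance is used, each anchor-positive pair is matched with the closest negative that is still farther away than the positive, and the farthest negative is the fallback when no such negative exists. The `margin` value and the 1-dimensional toy embeddings are illustrative only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_semi_hard_triplet_loss(embeddings, labels, margin):
    """Mean hinge loss over all anchor-positive pairs in the batch."""
    n = len(embeddings)
    losses = []
    for a in range(n):
        # Distances from the anchor to every example with a different label.
        negatives = [
            euclidean(embeddings[a], embeddings[k])
            for k in range(n)
            if labels[k] != labels[a]
        ]
        if not negatives:
            continue  # no triplet can be formed for this anchor
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            # Semi-hard: the closest negative that is farther than the positive.
            farther = [d for d in negatives if d > d_ap]
            d_an = min(farther) if farther else max(negatives)
            losses.append(max(d_ap - d_an + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: two classes, 1-dimensional embeddings, float labels as in the card.
embeddings = [[0.0], [1.0], [3.0], [4.0]]
labels = [0.0, 0.0, 1.0, 1.0]
loss = batch_semi_hard_triplet_loss(embeddings, labels, margin=2.5)  # margin is illustrative
print(loss)  # 1.0
```

In this toy batch the four anchor-positive pairs contribute 0.5, 1.5, 1.5 and 0.5, which average to 1.0. The loss falls to zero once every selected negative is at least `margin` farther from the anchor than the positive.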
Base model: BAAI/bge-base-en