Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
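The pooling config above selects the CLS (first) token of the transformer output, and the final `Normalize()` module L2-normalizes it. A minimal numpy sketch of those two stages, using random arrays in place of the XLM-RoBERTa token embeddings:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Pooling stage: take the [CLS] (first) token, then L2-normalize.

    token_embeddings: (batch, seq_len, hidden) array, hidden = 1024 here.
    Returns (batch, hidden) unit-length sentence embeddings.
    """
    cls = token_embeddings[:, 0, :]                      # pooling_mode_cls_token=True
    norms = np.linalg.norm(cls, axis=1, keepdims=True)   # Normalize() module
    return cls / norms

# Placeholder token embeddings standing in for the XLM-RoBERTa output
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 8, 1024))
emb = cls_pool_and_normalize(tokens)
print(emb.shape)                      # (3, 1024)
print(np.linalg.norm(emb, axis=1))   # each row has unit length
```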
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-m3-base-en")

# Run inference
sentences = [
    'Shell integration: bash and zsh don\'t serialize \\n and ; characters Part of https://github.com/microsoft/vscode/issues/155639\r\n\r\nRepro:\r\n\r\n1. Open a bash or zsh session\r\n2. Run:\r\n ```sh\r\n echo "a\r\n … b"\r\n ```\r\n \r\n3. ctrl+alt+r to run recent command, select the last command, 🐛 it\'s run without the new line\r\n \r\n',
    'TreeView state out of sync Testing #117304\r\n\r\nRepro: Not Sure\r\n\r\nTest state shows passed in file but still running in tree view.\r\n\r\n\r\n',
    'Setting icon and color in createTerminal API no longer works correctly See https://github.com/fabiospampinato/vscode-terminals/issues/77\r\n\r\nLooks like the default tab color/icon change probably regressed this.\r\n\r\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4264, 0.4315],
#         [0.4264, 1.0000, 0.4278],
#         [0.4315, 0.4278, 1.0000]])
```
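Because the model ends with a `Normalize()` module, the embeddings are unit-length, so the cosine similarity computed by `model.similarity` reduces to a plain dot product. A small numpy sketch with toy unit vectors (not real model output):

```python
import numpy as np

# Toy unit-normalized embeddings standing in for model.encode(...) output
rng = np.random.default_rng(42)
emb = rng.normal(size=(3, 1024))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Cosine similarity of unit vectors is just the Gram matrix emb @ emb.T
similarities = emb @ emb.T
print(np.round(similarities, 4))
# The diagonal is 1.0: each vector compared with itself
```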
Dataset `bge-base-en-train`, evaluated with `TripletEvaluator`:

| Metric | Value |
|---|---|
| cosine_accuracy | 1.0 |

Dataset `bge-base-en-train`, evaluated with `TripletEvaluator`:

| Metric | Value |
|---|---|
| cosine_accuracy | 0.9524 |
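The `cosine_accuracy` reported by `TripletEvaluator` is the fraction of (anchor, positive, negative) triplets where the anchor is more cosine-similar to the positive than to the negative. A self-contained numpy sketch of that metric on toy embeddings (not the model's actual output):

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative).

    Each argument: (n, dim) array of embeddings (not necessarily normalized).
    """
    def cos(a, b):
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        return num / den

    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Toy check: positives near the anchors, negatives pointing elsewhere
rng = np.random.default_rng(0)
a = rng.normal(size=(100, 32))
p = a + 0.1 * rng.normal(size=(100, 32))   # small perturbation -> similar
n = rng.normal(size=(100, 32))             # unrelated -> dissimilar
acc = triplet_cosine_accuracy(a, p, n)
print(acc)                                 # close to 1.0
```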
Training dataset columns:

| | texts | label |
|---|---|---|
| type | string | int |

Sample rows (GitHub issue reports paired with integer labels):

| texts | label |
|---|---|
| Branch list is sometimes out of order<br>Type: Bug<br>1. Open a workspace<br>2. Quickly open the branch picker and type main<br>Bug<br>The first time you do this, sometimes you end up with an unordered list.<br>The correct order shows up when you keep start typing or try doing this again.<br>VS Code version: Code - Insiders 1.91.0-insider (Universal) (0354163c1c66b950b0762364f5b4cd37937b624a, 2024-06-26T10:12:33.304Z)<br>OS version: Darwin arm64 23.5.0<br>Modes:<br>CPUs: Apple M2 Max (12 x 2400)<br>GPU Status: 2d_canvas: unavailable_software canvas_oop_rasterization: disabled_off direct_rendering_display_compositor: disabled_off_ok gpu_compositing: disabled_software multiple_raster_threads: enabled_on ope... | 218 |
| Git Branch Picker Race Condition<br>If I paste the branch too quickly and then press enter, it does not switch to it, but creates a new branch.<br>This breaks muscle memory, as it works when you do it slowly.<br>Once loading completes, it should select the branch again. | 218 |
| links aren't discoverable to screen reader users in markdown documents<br>They're only discoverable via visual distinction and the action that can be taken (IE opening them) is only indicated in the tooltip AFAICT.<br>https://github.com/microsoft/vscode/assets/29464607/09d28b81-c2cc-4477-b1fc-7b1de1baae74 | 177 |

Loss: `BatchSemiHardTripletLoss`

The evaluation dataset uses the same `texts` (string) and `label` (int) columns.
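Training uses `BatchSemiHardTripletLoss`, which mines negatives inside each batch. A minimal numpy sketch of the semi-hard mining idea (simplified relative to the sentence-transformers implementation; distance metric, fallback rules, and batching details differ):

```python
import numpy as np

def semi_hard_triplet_loss(embeddings, labels, margin=0.5):
    """Sketch of batch semi-hard triplet mining (Euclidean distances).

    For each anchor/positive pair (same label), pick the hardest *semi-hard*
    negative: a different-label sample with d(a, n) > d(a, p) but
    d(a, n) < d(a, p) + margin. Falls back to the easiest negative if none
    is semi-hard. Returns the mean hinge loss over all anchor/positive pairs.
    """
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=2)          # (n, n) pairwise distances
    same = labels[:, None] == labels[None, :]    # (n, n) same-label mask

    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if a == p or not same[a, p]:
                continue
            d_ap = dist[a, p]
            neg_d = dist[a][~same[a]]            # distances to other labels
            semi = neg_d[(neg_d > d_ap) & (neg_d < d_ap + margin)]
            d_an = semi.min() if semi.size else neg_d.max()
            losses.append(max(0.0, d_ap - d_an + margin))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = semi_hard_triplet_loss(emb, labels)
print(loss)   # non-negative scalar
```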
Base model: BAAI/bge-m3