Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper: arXiv 1908.10084
How to use aaa961/finetuned-bge-base-en with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("aaa961/finetuned-bge-base-en")
sentences = [
"There's a gap in issue hovers where the hover disappears Verifying: https://github.com/microsoft/vscode/issues/101495\r\n\r\n```\r\nVersion: 1.47.0-insider (user setup)\r\nCommit: 04545fa88043fd10d1f3edefd26be1b8245b516f\r\nDate: 2020-07-02T05:48:37.715Z\r\nElectron: 7.3.2\r\nChrome: 78.0.3904.130\r\nNode.js: 12.8.1\r\nV8: 7.8.279.23-electron.0\r\nOS: Windows_NT x64 10.0.18363\r\n```\r\n\r\nIf I move the mouse quickly I can get onto the issue hover, but it seems it can sometimes disappear if I move it more slowly.\r\n\r\n",
"Error Maximum call stack size exceeded <!-- ⚠️⚠️ Do Not Delete This! bug_report_template ⚠️⚠️ -->\r\n<!-- Please read our Rules of Conduct: https://opensource.microsoft.com/codeofconduct/ -->\r\n<!-- 🕮 Read our guide about submitting issues: https://github.com/microsoft/vscode/wiki/Submitting-Bugs-and-Suggestions -->\r\n<!-- 🔎 Search existing issues to avoid creating duplicates. -->\r\n<!-- 🧪 Test using the latest Insiders build to see if your issue has already been fixed: https://code.visualstudio.com/insiders/ -->\r\n<!-- 💡 Instead of creating your report here, use 'Report Issue' from the 'Help' menu in VS Code to pre-fill useful information. -->\r\n<!-- 🔧 Launch with `code --disable-extensions` to check. -->\r\nDoes this issue occur when all extensions are disabled?: Yes/No\r\n\r\n<!-- 🪓 If you answered No above, use 'Help: Start Extension Bisect' from Command Palette to try to identify the cause. -->\r\n<!-- 📣 Issues caused by an extension need to be reported directly to the extension publisher. The 'Help > Report Issue' dialog can assist with this. -->\r\n- VS Code Version: 1.60.0-insider (system setup)\r\n- OS Version: Windows_NT x64 10.0.19043\r\n\r\nSteps to Reproduce:\r\n\r\n1. Open file vector from STL C++.\r\n2. Push the button automatic detection\r\n\r\nExpected Behavior:\r\nVsCode will expose the C ++ language.\r\n\r\nActual Behavior: \r\nError Maximum call stack size exceeded\r\n\r\n\r\n",
"Pasting (or sending text) in terminal can scramble the input \r\nIssue Type: <b>Bug</b>\r\n\r\nText copied to an integrated terminal tab configured to use Cygwin bash is sometimes scrambled. I have observed this both when launching a debug task that copies a command line to a shell and manually pasting from the clipboard.\r\n\r\nI can reproduce this problem as follows.\r\n\r\nCopy the 60 character string \"echo 56789b123456789c123456789d123456789e123456789f123456789\" then paste repeatedly into a terminal tab running Cygwin bash.\r\n\r\nFirst attempt:\r\n```\r\n192.168.3.220:~> echo 56789b123456789c123456789d123456789e123456789f123456789\r\n56789b123456789c123456789d123456789e123456789f123456789\r\n```\r\nThat worked. Second attempt:\r\n```\r\n192.168.3.220:~> f123456789echo 56789b123456789c123456789d123456789e123456789\r\n```\r\n\r\nThat failed. The failure frequency I experience is about 1 in 10.\r\n\r\nNotice that the pasted text in the second case was reordered, with the first 50 characters rotated to the end. That 50 character granularity is apparent in every failure I have examined, including the longer strings generated by debug task launches. 
Checking the source, I see `MAX_WRITE_CHECK_SIZE = 50` in terminalProcess.ts in code addressing a similar issue, likely related to my 50 character observation.\r\n\r\nIn case it matters, the version of bash I'm running is \"4.4.12(3)-release\" and cygwin.dll version is \"3.2.0(0.340/5/3)\".\r\n\r\n\r\nVS Code version: Code 1.57.1 (507ce72a4466fbb27b715c3722558bb15afa9f48, 2021-06-17T13:28:07.755Z)\r\nOS version: Windows_NT x64 10.0.19042\r\nRestricted Mode: No\r\n\r\n<details>\r\n<summary>System Info</summary>\r\n\r\n|Item|Value|\r\n|---|---|\r\n|CPUs|Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (8 x 3408)|\r\n|GPU Status|2d_canvas: enabled<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>oop_rasterization: enabled<br>opengl: enabled_on<br>rasterization: enabled<br>skia_renderer: enabled_on<br>video_decode: enabled<br>vulkan: disabled_off<br>webgl: enabled<br>webgl2: enabled|\r\n|Load (avg)|undefined|\r\n|Memory (System)|31.90GB (26.28GB free)|\r\n|Process Argv|--disable-extensions|\r\n|Screen Reader|no|\r\n|VM|0%|\r\n</details>Extensions disabled\r\n<!-- generated by issue reporter -->",
"SCM - switching branch from the terminal causes focus loss Copied from https://github.com/microsoft/vscode/issues/35307#issuecomment-2071044810\r\n\r\n> works pretty well so far for me. I notice that if I switch branches from the terminal, it loses focus after the editor state is restored. It would be nice if the focus stayed in the terminal. Otherwise, I like it!"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from BAAI/bge-base-en. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
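The architecture printed above uses CLS-token pooling (`pooling_mode_cls_token: True`) followed by a `Normalize()` step. The following is a minimal pure-Python sketch of what those two stages do to the token vectors, not the library's actual implementation. The helper names `cls_pool` and `l2_normalize` and the 4-dimensional toy vectors are assumptions made for illustration; the real model works with 768 dimensions.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy token embeddings: 3 tokens x 4 dimensions (hypothetical values).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.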
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-base-en")
# Run inference
sentences = [
'Occurence highlighting highlights wrong part of the code <!-- Please search existing issues to avoid creating duplicates. -->\r\n\r\n## Environment data\r\n\r\n- VS Code version: 1.58.0-insider 062e6519f8973fede2ca736e80682bd19007460a \r\n- Jupyter Extension version (available under the Extensions sidebar): v2021.8.1000539794\r\n- Python Extension version (available under the Extensions sidebar): v2021.6.944021595\r\n- OS (Windows | Mac | Linux distro) and version: Ubuntu 18.04\r\n- Python and/or Anaconda version: 3.9.2 Anaconda\r\n- Type of virtual environment used (N/A | venv | virtualenv | conda | ...): conda\r\n- Jupyter server running: Remote \r\n\r\nIt seems that issues https://github.com/microsoft/vscode/issues/120148 and https://github.com/microsoft/vscode-jupyter/issues/5451 have been closed but the problem still exists in the last versions. I have not seen any similar issues on the repo',
'File explorer is expanding all root folders in a MR workspace Steps to Reproduce:\r\n\r\n1. Create a MR workspace file with more than one folder\r\n2. Open the MR workspace\r\n\r\n🐛 All top level folders are expanded. This is very slow if there are lot of root folders and also if the MR workspace is in remote\r\n',
'Quick input reset scroll position * use latest from master\r\n* f1 > insert snippet\r\n* scroll down to an extension snippet and hide it (press 👁️ icon)\r\n* :bug: the scroll position resets\r\n\r\nThis is happening when reassigning the items (since the press changed the label) here: https://github.com/microsoft/vscode/blob/92314d61a55f466c125fa9d1f9fe8da633a82423/src/vs/workbench/contrib/snippets/browser/insertSnippet.ts#L213',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5572, 0.5031],
#         [0.5572, 1.0000, 0.5477],
#         [0.5031, 0.5477, 1.0000]])
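One way to read the similarity matrix above is to rank the distinct sentence pairs by score. The sketch below copies the printed values into a plain list of lists so that it runs without downloading the model. `rank_pairs` is a hypothetical helper, not part of sentence-transformers.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.5572, 0.5031],
    [0.5572, 1.0000, 0.5477],
    [0.5031, 0.5477, 1.0000],
]


def rank_pairs(matrix):
    """Return (score, i, j) for every pair i < j, highest score first."""
    pairs = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # skip the diagonal and mirrored entries
            pairs.append((matrix[i][j], i, j))
    return sorted(pairs, reverse=True)


ranked = rank_pairs(similarities)
for score, i, j in ranked:
    print(f"sentences {i} and {j}: {score:.4f}")
```

With these values the first two sentences form the closest pair (0.5572), although all three scores fall in a fairly narrow band.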
Dataset: bge-base-en-train, evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.9479 |
Dataset: bge-base-en-train, evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.9933 |
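As a rough guide to the `cosine_accuracy` metric reported above: it is commonly understood as the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under cosine similarity. The sketch below illustrates that definition on made-up 2-dimensional vectors. It is not the evaluator's own code, and `triplet_cosine_accuracy` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets (hypothetical values): three are ranked correctly, one is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # positive is orthogonal to the anchor
    ([1.0, 0.0], [0.5, 0.5], [-1.0, 0.0]),
]

accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```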
Columns: sentence and label
| | sentence | label |
|---|---|---|
| type | string | float |
| details | | |
Loss: BatchSemiHardTripletLoss
Columns: sentence and label
| | sentence | label |
|---|---|---|
| type | string | float |
| details | | |
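The card names BatchSemiHardTripletLoss as the training objective over (sentence, label) pairs. The sketch below shows the general semi-hard mining idea in pure Python, under stated assumptions: Euclidean distance, a margin of 1.0, and a fallback to the farthest negative when no negative lies beyond the positive. It is a simplified illustration rather than the sentence-transformers implementation, and the function name, margin, and toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_semi_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Average triplet loss over all anchor-positive pairs in one batch.

    For each pair, the negative is the closest example with a different label
    that is still farther from the anchor than the positive. If no such
    negative exists, the farthest negative is used instead (an assumption).
    """
    losses = []
    n = len(embeddings)
    for a in range(n):
        neg_dists = [
            euclidean(embeddings[a], embeddings[k])
            for k in range(n)
            if labels[k] != labels[a]
        ]
        if not neg_dists:
            continue  # no negatives for this anchor in the batch
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            farther = [d for d in neg_dists if d > d_ap]
            d_an = min(farther) if farther else max(neg_dists)
            losses.append(max(d_ap - d_an + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: 1-D embeddings with float labels, matching the label type in the table.
embeddings = [[0.0], [1.0], [1.5], [4.0]]
labels = [0.0, 0.0, 1.0, 1.0]

loss = batch_semi_hard_triplet_loss(embeddings, labels, margin=1.0)
print(loss)  # 0.75
```

In this toy batch the four anchor-positive pairs contribute 0.5, 0.0, 2.0, and 0.5, so the mean is 0.75. The largest term comes from the anchor whose negatives are all closer than its positive.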
Base model: BAAI/bge-base-en