SentenceTransformer based on BAAI/bge-base-en

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-base-en
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

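Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)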

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
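
The Pooling and Normalize stages listed above can be illustrated with a small dependency-free sketch. It assumes the transformer returns one vector per token with the [CLS] token first; the function name `cls_pool_and_normalize` and the toy 4-dimensional vectors are hypothetical and only stand in for the real 768-dimensional outputs.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]  # corresponds to pooling_mode_cls_token: True
    norm = math.sqrt(sum(x * x for x in cls_vector))
    return [x / norm for x in cls_vector]  # corresponds to Normalize()

# Toy "transformer output": 3 tokens, 4 dimensions (illustrative values only).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]
embedding = cls_pool_and_normalize(tokens)
print(embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the cosine similarity of two such embeddings is simply their dot product.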

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-base-en-firefox-bugs-bugs")
# Run inference
sentences = [
    "Some bookmark icons disappear over time User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0\n\nSteps to reproduce:\n\nI have two bookmarks in my Bookmarks Menu for which sometimes Firefox resets the icon to the default (globe) icon from time to time for no reason. I don't visit them or do anything to those boookmarks. It's these two sites that have this problem, all other bookmarks' icons stay in tact. \n\nTo fix it, I have to click on the respective bookmark and wait for the page load - after that the proper site icon is shown again.",
    'After closing or opening a tab without leaving the tab strip, there is no delay in displaying the tab preview **Found in**\n* 126.0a1 (2024-04-04)\n\n\n\n**Affected versions**\n*  126.0a1 (2024-04-04)\n\n\n\n**Tested platforms**\n* Affected platforms: Windows 10x64, Ubuntu 23, macOS 12\n* Unaffected platforms: none\n\n**Preconditions**\n* browser.tabs.cardPreview.enabled: true\n\n**Steps to reproduce**\n1. Open some random tabs and hover over them until the tab preview is displayed.\n2. Without leaving the tab strip close a tab located in the middle of the other tabs.\n\n**Expected result** \n* The tab preview is displayed after 500ms.\n\n\n\n**Actual result**\n* The tab preview is displayed instantly. \n\n\n**Regression range**\n*  Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=7b609d9f295fce7ab954f09492fea414b72843e6&tochange=a387d4331dd332c954d2689a4a8b64c2181690b1 \nPossible regressor: Bug 1876522 \n\n**Additional notes**\n* Attached a screen recording.\n* This also happens when opening a new tab.',
    'Adding Opensearch search engine does not work for .onion addresses User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0\n\nSteps to reproduce:\n\n1. Navigate to a search engine website hosted on the TOR network (like http://yra4tke2pwcnatxjkufpw6kvebu3h3ti2jca2lcdpgx3mpwol326lzid.onion/ ).\nThe website must provide an Opensearch descriptor.\n\n2. Click "Page Actions" -> "Add Search Engine".\n\n\nActual results:\n\nFirefox cannot add the search engine despite the website providing a valid Opensearch descriptor. Following message is shown:\n```\nFirefox could not download the search plugin from: http://yra4tke2pwcnatxjkufpw6kvebu3h3ti2jca2lcdpgx3mpwol326lzid.onion/opensearch.xml\n```\n\n\nExpected results:\n\nWhen fetching the Opensearch descriptor Firefox should use the proxy settings provided by the user. In the case of TOR requests to .onion websites will fail when not using the SOCKS5 proxy.\n\nWorkaround: Downloading the index.html and the opensearch.xml descriptor file manually and serving them locally. After navigating to the local address, the search engine can be added normally.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9140, 0.9162],
#         [0.9140, 1.0000, 0.9958],
#         [0.9162, 0.9958, 1.0000]])
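
One possible way to use such a similarity matrix for duplicate-candidate lookup is to rank the other reports for each query instead of applying a fixed threshold. The sketch below reuses the scores printed above; `most_similar` is a hypothetical helper, not part of the Sentence Transformers API.

```python
# Similarity scores copied from the printed output above.
similarities = [
    [1.0000, 0.9140, 0.9162],
    [0.9140, 1.0000, 0.9958],
    [0.9162, 0.9958, 1.0000],
]

def most_similar(sim_matrix, query_index):
    """Return (index, score) of the best-scoring row other than the query itself."""
    candidates = [
        (score, idx)
        for idx, score in enumerate(sim_matrix[query_index])
        if idx != query_index  # skip the trivial self-match on the diagonal
    ]
    best_score, best_idx = max(candidates)
    return best_idx, best_score

for i in range(len(similarities)):
    print(i, most_similar(similarities, i))
# 0 (2, 0.9162)
# 1 (2, 0.9958)
# 2 (1, 0.9958)
```

All three example reports score above 0.91 against each other, so relative ranking may be more informative here than any absolute cutoff.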

Evaluation

Metrics

Triplet

Datasets: bge-base-en-eval and bge-base-en-train
Evaluated with TripletEvaluator

Metric	bge-base-en-eval	bge-base-en-train
cosine_accuracy	0.5159	0.6812
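
As a rough guide to reading this metric: cosine accuracy is taken here to mean the fraction of triplets in which the anchor is more similar to the positive than to the negative. The sketch below is an assumption about that definition, not the TripletEvaluator source, and the toy 2-dimensional triplets are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Illustrative embeddings only; real ones have 768 dimensions.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),   # negative is closer: wrong
    ([0.0, 1.0], [0.1, 1.0], [1.0, 0.0]),   # correct
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # correct
]
print(cosine_accuracy(toy_triplets))  # 0.75
```

Under this definition a model that ranks positives and negatives at random would score about 0.5, which is a useful baseline when reading the table above.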

Training Details

Training Dataset

Unnamed Dataset

Size: 5,424 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 23 tokens mean: 201.03 tokens max: 508 tokens	min: 23 tokens mean: 197.28 tokens max: 505 tokens	min: 20 tokens mean: 232.13 tokens max: 510 tokens

Samples:

anchor	positive	negative
The "X" button of the opt-in modal is only read as "button" using a screen reader [Affected Versions]: - Firefox Beta 97.0b2 (Build ID: 20220111185943) - Firefox Nightly 98.a1 (Build ID: 20220111093827) [Affected Platforms]: - Windows 10 x64 - Ubuntu 20.04 x64 - macOS 10.15.7 [Prerequisites]: - Have Firefox Beta 97.0b2 downloaded on your computer. - Have the "browser.search.region" set to "US". - Have one of the treatment user.js on your computer. - Make sure there is no other modal displayed when starting the browser (browser default window, onboarding for new users etc). - Have a screen reader application opened. [Steps to reproduce]: 1. Open Firefox Beta 97.0b2. 2. Navigate to the “about:support” page and paste the user.js file into the Profile folder. 3. Restart the browser. 4. Focus on the "X" button of the modal. 5. Listen to what the screen reader application reads. **[Exp...	normalize.scss breaking some stuff on Firefox View Our sheet aboutwelcome.css is only loaded into firefox view when a feature callout is shown, so it doesn't always apply to about:firefoxview. When it is loaded, it affects the layout of the page and of some elements (see attachment) because of normalize.scss. For example, check out the .last-active-badge - it's inheriting box-sizing: border-box from its parent. (That rule is a bit odd, since box-sizing is not supposed to be an inherited property anyway. Maybe revert would make more sense)	`Typing "gm" in the address-bar does not highlight "gmail.com" since a few days In my usual firefox profile, type "gm" in the address-bar. AR: gmail.com is not highlighted / auto-selected ER: IT should. It used to till a couple of days back. :mak, ni? you as you have looked at similar issue in the past. Thanks.`
`Text and radio buttons are overlapping on PDF Tested with: Nightly 91.0a1 (2021-06-23) Tested on: Win 10 Preconditions: In about:config, set pdfjs.enableXfa = true Steps: 1. Launch Firefox 2. Open the attached pdf. 3. Go to "Taille de l'entreprise" Actual result: Radio buttons and text are overlapping. Expected result: Text and radio buttons should be properly displayed`	The Firefox icon from the task bar is no longer displayed if Firefox is pinned again after it is already pinned [Affected Versions]: - Firefox Nightly 88.0a1 (Build ID: 20210304092248) - Firefox Beta 87.0b7 (Build ID: 20210302185821) - Firefox Release 86.0 (Build ID: 20210222142601) [Affected Platforms]: - Windows 10 Version: 2004 x64 - Windows 10 Version: 2H20 x64 [Prerequisites]: - Have a new Firefox profile created. - Have the user.js saved in the profile folder before starting it. [Steps to reproduce]: 1. Start the Firefox Browser using the “--first-startup” arg. 2. Click the “Pin to Taskbar” button. 3. Click the “Back” browser button. 4. Click the “Pin to Taskbar” button and observe the Windows TaskBar. [Expected results]: - The Firefox Browser is pinned and the Firefox icon is correctly displayed. [Actual results]: - The Firefox Browser is pinned and the Firefox...	'send yourself a download link' link is marked up as a button ## Prerequisites: Found in Nightly 134.0a1 (2024-10-29)(64-bit) ## STR: 1. Open about:welcome from URL bar 2. Click the blue 'Save and Continue' button 3. Click the gray 'Skip this step' button 4. The multiple device screen will appear. Right click the 'send yourself a download link' link and observe the code in Dev Tools ## Expected/Actual: <a role="button" tabindex="0" data-l10n-name="download-label">send yourself a download link.</a> Since this is a link and not a button, role="button" should be removed. This should be marked up as a link with <a href="">
`Editing a PDF results in unexpected content change User Agent: Mozilla/5.0 (Android 14; Mobile; rv:128.0) Gecko/128.0 Firefox/128.0 Steps to reproduce: Open this pdf: https://www.vodafone.de/media/downloads/pdf/2060-sepa-basis-lastschrift-mandat-festnetz.pdf Make a edit in a field of the pdf Leave the field The barcode on the left turns into rubbish Actual results: The barcode on the left turns into rubbish Expected results: Only the edited field should have changed`	PDF editor resets active field when switching windows User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0 Steps to reproduce: I opened a PDF in Firefox, edited a large text box, switched to another Firefox window, and switched back to the PDF window. Actual results: The edit I had made to the large text box disappeared, reverting the text box to its state before I edited it. Expected results: The large text box should have preserved my edit and remained the active text box when I switched back to the PDF window.	PDF Form: Only first line of two line input field gets printed I noticed an odd behavior with a PDF form from my bank. This is the PDF: https://dok.dkb.de/pdf/scheck_bundesbank.pdf (I'll attach the PDF in case the link becomes unavailable.) It contains a two-line input field for the address. I can enter the address without problems and it's shown, but when I printed it only the first line of the address was printed. This can also be seen in the print preview. I'll attach screenshots from both the filled form and the print preview.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 5
}
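
To make these parameters concrete, here is a minimal sketch of a triplet loss with cosine distance, assuming the usual hinge form max(d(a, p) - d(a, n) + margin, 0) with d = 1 - cosine similarity. The function names are hypothetical and the vectors are toy values; this is not the library implementation.

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)  # ranges from 0 to 2

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge loss on the gap between anchor-positive and anchor-negative distance."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)

# Best case: positive identical to the anchor, negative pointing the opposite way.
best = triplet_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0])
# Worst case: the roles of positive and negative are swapped.
worst = triplet_loss([1.0, 0.0], [-1.0, 0.0], [1.0, 0.0])
print(best, worst)  # 3.0 7.0
```

Since cosine distance cannot exceed 2, a margin of 5 keeps the per-triplet loss between 3 and 7 under this formulation, so the hinge never reaches zero. The logged training and validation losses of roughly 4.4 to 4.9 fall inside that range.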

Evaluation Dataset

Unnamed Dataset

Size: 1,162 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 16 tokens mean: 204.97 tokens max: 508 tokens	min: 16 tokens mean: 206.68 tokens max: 506 tokens	min: 18 tokens mean: 203.39 tokens max: 508 tokens

Samples:

anchor	positive	negative
The Not Now button from the Fakespot Onboarding sidebar is missing the Clicked State Found in * Nightly 118.0a1 (2023-08-18) Affected versions * Nightly 118.0a1 (2023-08-18) Affected platforms * ALL Preconditions: Set the browser.shopping.experience2023.enabled - TRUE Set the toolkit.shopping.useOHTTP - TRUE Steps to reproduce 1. Reach about:preferences and turn off feature recommendations. 2. Reach the Amazon https://www.amazon.com/dp/B09B6ZXD2V/ref=sbl_dpx_office-desks_B0B4CYW8FB_0 link 3. Click and Hold the Not Now button from the Onboarding Shopping sidebar. Expected result * The Not now button from the Onboarding Shopping sidebar should change its state when Clicked. Actual result * The Not now button from the Onboarding Shopping sidebar is missing the Clicked state. Regression range Not Applicable	[Experiment] The “Survey” screen from the “about:welcome” page has layout issues on Firefox locales with long strings [Affected versions]: - Firefox Beta 123.0b7 - Build ID: 20240205091725 (Release channel) [Affected Platforms]: - Windows 10x64. - Windows 11 x64. [Prerequisites]: - Have a Firefox locale with longer strings installed and opened (e.g.: de, it, es-ES). - Have the “nimbus.debug” pref from the “about:config” page set to true. [Steps to reproduce]: 1. Navigate to the “about:studies?optin_slug=new-profile-survey-new-vs-existing-vs-returning-all-locales-fx123&optin_branch=treatment-b&optin_collection=nimbus-preview” link to enroll in the Treatment B branch of the “NP Survey: New vs. Existing vs. Returning (Multiple Locales) - Fx123” experiment. 2. Navigate to the “about:welcome” page. 3. Observe the elements from the first screen displayed. [Expected result]: - The “Survey” screen is successfully displayed with no layout issues. *[Actual result]:...	Resume/Retry doesn’t work in case of deleted inprogress downloads Note * Not sure if we want to have a retry available for a deleted download, but in case we don’t then fixing bug 1755728, automatically removes the scenario causing this one. Affected versions * Firefox 98 beta 4 * Nightly 99.0a1 Affected platforms * all Steps to reproduce 1. Download a big file link 2. While downloading, right click and delete. 3. Download is set into a pause state, proceed to delete again or just cancel. 4. Retry download. Expected result 2. Download is deleted, the download panel states that successfully - see enh. File deleted from Downloads Panel contextual menu should have a different UI from the one deleted from disk 3. This case is not hit. .4 Download is restarted. Actual result 2. Download is paused 3. Download is d...
`Missing data from table Tested with: Nightly 91.0a1 (2021-06-22) Tested on: Win 10 Preconditions: In about:config, set pdfjs.enableXfa = true Steps: 1. Launch firefox 2. Open the attached PDF Actual result: No data in table is displayed Expected result: A table with data should be displayed`	`Failure in toolkit/components/extensions/test/browser/browser_ext_themes_autocomplete_popup.js with proton urlbar There's a failure in toolkit/components/extensions/test/browser/browser_ext_themes_autocomplete_popup.js when graduating the proton urlbar`	Insecure connection icon is barely visible on http website login form with dark theme Affected versions * 97.0a1 (20211213093143) * 96.0b4 (20211212185725) * 95.0 (20211129150630) * 91.4.0esr (20211126104708) Affected platforms * macOS 11.6 Preconditions * dark theme enabled Steps to reproduce 1. Open Firefox and http://jsbin.testim.io/soviruvalo/1/edit?html,output. 2. Click on the username/ password form and observe the insecure icon. Expected result * The icon is visible. Actual result * The lock icon is barely visible. Regression range * Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b8b54a4990d7e778b82909a871e92031a4bc649d&tochange=4b3932f9c4f5d9572da2f0232375474133191500 Potential regressor: bug 1715619. Notes * Screenshot attached.
`Change the return type in case of JEXL evaluation error Now the default return in case of failures (when eval throws) is false, it should be null to tell apart successful evaluation from errors.`	`LightweightThemeConsumer.jsm:_sanitizeCSSColor slows down first-time about:welcome theme enables https://searchfox.org/mozilla-central/source/toolkit/modules/LightweightThemeConsumer.jsm#458 has shown up to be a good candidate for speeding up to support making the theme switches in about:welcome meaningfully faster. Emilio and I chatted about this for a while in Slack, and he has a plan for what looks to be a straightforward fix that should be upliftable to 81 beta.`	If "Open tabs" suggestion is disabled, top sites in address bar doesn't show open top sites From: https://www.reddit.com/r/firefox/comments/g42asi/when_disabling_switch_to_tab_feature_on_address/ Steps to reproduce: 1. Ensure YouTube is in top sites 2. Open YouTube 3. Ensure "Open tabs" suggestion is disabled for address bar 4. open new tab, navigate to any other site 5. Click on address bar What happens: YouTube doesn't appear in the top sites suggestions in the address bar. Expected result: YouTube appears; clicking the suggestion opens it without switching to existing tab.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 5
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 32
gradient_accumulation_steps: 8
learning_rate: 2e-05
num_train_epochs: 7
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates
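
The batch-related settings above combine into an effective batch size and a step count. The arithmetic below is a sketch under stated assumptions: a single training device, no batches dropped by the no_duplicates sampler, a final partial accumulation group counted as one optimizer step, and warmup rounded up. The variable names and `epoch_at_step` are hypothetical.

```python
import math

train_samples = 5424   # "Size: 5,424 training samples"
per_device_batch = 16  # per_device_train_batch_size
grad_accum = 8         # gradient_accumulation_steps
epochs = 7             # num_train_epochs
warmup_ratio = 0.1     # warmup_ratio
num_devices = 1        # assumption: the card does not state the device count

effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(warmup_ratio * total_updates)

def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer updates."""
    full_epochs, extra_updates = divmod(step, updates_per_epoch)
    return full_epochs + extra_updates * grad_accum / batches_per_epoch

print(effective_batch, batches_per_epoch, updates_per_epoch, total_updates, warmup_updates)
# 128 339 43 301 31
print(round(epoch_at_step(100), 4), round(epoch_at_step(200), 4), round(epoch_at_step(300), 4))
# 2.3304 4.6608 6.9912
```

The fractional epochs computed this way match the Epoch column of the Training Logs table at steps 100, 200, and 300, which suggests (but does not prove) that the assumptions hold for this run.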

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 8
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 7
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	bge-base-en-eval_cosine_accuracy	bge-base-en-train_cosine_accuracy
-1	-1	-	-	0.5073	-
2.3304	100	4.8424	4.9209	-	0.6825
4.6608	200	4.488	4.8791	-	0.6829
6.9912	300	4.4078	4.9009	-	0.6812
-1	-1	-	-	0.5159	-

Framework Versions

Python: 3.10.10
Sentence Transformers: 5.1.0
Transformers: 4.56.0
PyTorch: 2.7.1+cu128
Accelerate: 1.10.1
Datasets: 4.0.0
Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

Model size: 0.1B params
Tensor type: F32
