SentenceTransformer based on BAAI/bge-base-en

This is a sentence-transformers model finetuned from BAAI/bge-base-en. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
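
The module list above can be read as three steps: encode the tokens, keep the vector of the first ([CLS]) token, and scale it to unit length. Below is a minimal NumPy sketch of the last two steps. It uses random numbers as a stand-in for the BertModel output, and the function name and toy array are illustrative assumptions, not part of the library.

```python
import numpy as np


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token vectors to (batch, hidden) unit vectors."""
    # Pooling with pooling_mode_cls_token=True: keep only the first token's vector.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): divide each vector by its L2 norm (clipped to avoid division by zero).
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)


# Made-up stand-in for the transformer output: 3 inputs, 16 tokens, 768 dimensions.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(3, 16, 768))

sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)
print(sentence_embeddings.shape)  # (3, 768)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.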

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-base-en-firefox-bugs-bugs")
# Run inference
sentences = [
    "Some bookmark icons disappear over time User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0\n\nSteps to reproduce:\n\nI have two bookmarks in my Bookmarks Menu for which sometimes Firefox resets the icon to the default (globe) icon from time to time for no reason. I don't visit them or do anything to those boookmarks. It's these two sites that have this problem, all other bookmarks' icons stay in tact. \n\nTo fix it, I have to click on the respective bookmark and wait for the page load - after that the proper site icon is shown again.",
    'After closing or opening a tab without leaving the tab strip, there is no delay in displaying the tab preview **Found in**\n* 126.0a1 (2024-04-04)\n\n\n\n**Affected versions**\n*  126.0a1 (2024-04-04)\n\n\n\n**Tested platforms**\n* Affected platforms: Windows 10x64, Ubuntu 23, macOS 12\n* Unaffected platforms: none\n\n**Preconditions**\n* browser.tabs.cardPreview.enabled: true\n\n**Steps to reproduce**\n1. Open some random tabs and hover over them until the tab preview is displayed.\n2. Without leaving the tab strip close a tab located in the middle of the other tabs.\n\n**Expected result** \n* The tab preview is displayed after 500ms.\n\n\n\n**Actual result**\n* The tab preview is displayed instantly. \n\n\n**Regression range**\n*  Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=7b609d9f295fce7ab954f09492fea414b72843e6&tochange=a387d4331dd332c954d2689a4a8b64c2181690b1 \nPossible regressor: Bug 1876522 \n\n**Additional notes**\n* Attached a screen recording.\n* This also happens when opening a new tab.',
    'Adding Opensearch search engine does not work for .onion addresses User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0\n\nSteps to reproduce:\n\n1. Navigate to a search engine website hosted on the TOR network (like http://yra4tke2pwcnatxjkufpw6kvebu3h3ti2jca2lcdpgx3mpwol326lzid.onion/ ).\nThe website must provide an Opensearch descriptor.\n\n2. Click "Page Actions" -> "Add Search Engine".\n\n\nActual results:\n\nFirefox cannot add the search engine despite the website providing a valid Opensearch descriptor. Following message is shown:\n```\nFirefox could not download the search plugin from: http://yra4tke2pwcnatxjkufpw6kvebu3h3ti2jca2lcdpgx3mpwol326lzid.onion/opensearch.xml\n```\n\n\nExpected results:\n\nWhen fetching the Opensearch descriptor Firefox should use the proxy settings provided by the user. In the case of TOR requests to .onion websites will fail when not using the SOCKS5 proxy.\n\nWorkaround: Downloading the index.html and the opensearch.xml descriptor file manually and serving them locally. After navigating to the local address, the search engine can be added normally.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9140, 0.9162],
#         [0.9140, 1.0000, 0.9958],
#         [0.9162, 0.9958, 1.0000]])
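
For semantic search, the same similarity scores can be used to rank a corpus against a query. The sketch below does this with plain NumPy on toy 3-dimensional vectors. In practice the vectors would come from model.encode; top_k_similar is a hypothetical helper, not a library function.

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 2):
    """Return (index, cosine similarity) for the k corpus rows closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity of every row with the query
    order = np.argsort(-scores)[:k]     # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for embeddings of three bug reports and one query.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.6, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([0.9, 0.1, 0.0])

hits = top_k_similar(query, corpus, k=2)
for index, score in hits:
    print(index, round(score, 4))
```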

Evaluation

Metrics

Triplet

  • Datasets: bge-base-en-eval and bge-base-en-train
  • Evaluated with TripletEvaluator

    Metric            bge-base-en-eval   bge-base-en-train
    cosine_accuracy   0.5159             0.6812
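
Cosine accuracy is understood here as the fraction of triplets in which the anchor is more similar to the positive than to the negative, so random rankings would score about 0.5. The following is a minimal sketch of that computation on toy vectors, assuming this definition. The helper names are illustrative, and TripletEvaluator's exact handling of ties or margins may differ.

```python
import numpy as np


def cosine_sim_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    pos = cosine_sim_rows(anchors, positives)
    neg = cosine_sim_rows(anchors, negatives)
    return float(np.mean(pos > neg))


# Four toy triplets: the first two are ranked correctly, the last two are not.
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.0], [0.0, 1.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.1]])

accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)  # 0.5
```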

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,424 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column     type     min         mean            max
    anchor     string   23 tokens   201.03 tokens   508 tokens
    positive   string   23 tokens   197.28 tokens   505 tokens
    negative   string   20 tokens   232.13 tokens   510 tokens
  • Samples:
    anchor positive negative
    Sample 1 (anchor):
    The "X" button of the opt-in modal is only read as "button" using a screen reader [Affected Versions]:
    - Firefox Beta 97.0b2 (Build ID: 20220111185943)
    - Firefox Nightly 98.a1 (Build ID: 20220111093827)

    [Affected Platforms]:
    - Windows 10 x64
    - Ubuntu 20.04 x64
    - macOS 10.15.7

    [Prerequisites]:
    - Have Firefox Beta 97.0b2 downloaded on your computer.
    - Have the "browser.search.region" set to "US".
    - Have one of the treatment user.js on your computer.
    - Make sure there is no other modal displayed when starting the browser (browser default window, onboarding for new users etc).
    - Have a screen reader application opened.

    [Steps to reproduce]:
    1. Open Firefox Beta 97.0b2.
    2. Navigate to the “about:support” page and paste the user.js file into the Profile folder.
    3. Restart the browser.
    4. Focus on the "X" button of the modal.
    5. Listen to what the screen reader application reads.

    **[Exp...

    Sample 1 (positive):
    normalize.scss breaking some stuff on Firefox View Our sheet aboutwelcome.css is only loaded into firefox view when a feature callout is shown, so it doesn't always apply to about:firefoxview. When it is loaded, it affects the layout of the page and of some elements (see attachment) because of normalize.scss. For example, check out the .last-active-badge - it's inheriting box-sizing: border-box from its parent. (That rule is a bit odd, since box-sizing is not supposed to be an inherited property anyway. Maybe revert would make more sense)

    Sample 1 (negative):
    Typing "gm" in the address-bar does not highlight "gmail.com" since a few days In my usual firefox profile, type "gm" in the address-bar.

    AR: gmail.com is not highlighted / auto-selected
    ER: IT should. It used to till a couple of days back.

    :mak, ni? you as you have looked at similar issue in the past. Thanks.

    Sample 2 (anchor):
    Text and radio buttons are overlapping on PDF Tested with:
    Nightly 91.0a1 (2021-06-23)

    Tested on:
    Win 10

    Preconditions:
    In about:config, set pdfjs.enableXfa = true

    Steps:

    1. Launch Firefox
    2. Open the attached pdf.
    3. Go to "Taille de l'entreprise"

    Actual result:
    Radio buttons and text are overlapping.

    Expected result:
    Text and radio buttons should be properly displayed

    Sample 2 (positive):
    The Firefox icon from the task bar is no longer displayed if Firefox is pinned again after it is already pinned [Affected Versions]:
    - Firefox Nightly 88.0a1 (Build ID: 20210304092248)
    - Firefox Beta 87.0b7 (Build ID: 20210302185821)
    - Firefox Release 86.0 (Build ID: 20210222142601)

    [Affected Platforms]:
    - Windows 10 Version: 2004 x64
    - Windows 10 Version: 2H20 x64

    [Prerequisites]:
    - Have a new Firefox profile created.
    - Have the user.js saved in the profile folder before starting it.

    [Steps to reproduce]:

    1. Start the Firefox Browser using the “--first-startup” arg.
    2. Click the “Pin to Taskbar” button.
    3. Click the “Back” browser button.
    4. Click the “Pin to Taskbar” button and observe the Windows TaskBar.

    [Expected results]:
    - The Firefox Browser is pinned and the Firefox icon is correctly displayed.

    [Actual results]:
    - The Firefox Browser is pinned and the Firefox...

    Sample 2 (negative):
    'send yourself a download link' link is marked up as a button ## Prerequisites:

    Found in Nightly 134.0a1 (2024-10-29)(64-bit)

    ## STR:

    1. Open about:welcome from URL bar
    2. Click the blue 'Save and Continue' button
    3. Click the gray 'Skip this step' button
    4. The multiple device screen will appear. Right click the 'send yourself a download link' link and observe the code in Dev Tools

    ## Expected/Actual:

    <a role="button" tabindex="0" data-l10n-name="download-label">send yourself a download link.</a>

    Since this is a link and not a button, role="button" should be removed. This should be marked up as a link with <a href="">

    Sample 3 (anchor):
    Editing a PDF results in unexpected content change User Agent: Mozilla/5.0 (Android 14; Mobile; rv:128.0) Gecko/128.0 Firefox/128.0

    Steps to reproduce:

    Open this pdf:

    https://www.vodafone.de/media/downloads/pdf/2060-sepa-basis-lastschrift-mandat-festnetz.pdf

    Make a edit in a field of the pdf
    Leave the field
    The barcode on the left turns into rubbish


    Actual results:

    The barcode on the left turns into rubbish


    Expected results:

    Only the edited field should have changed

    Sample 3 (positive):
    PDF editor resets active field when switching windows User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/116.0

    Steps to reproduce:

    I opened a PDF in Firefox, edited a large text box, switched to another Firefox window, and switched back to the PDF window.


    Actual results:

    The edit I had made to the large text box disappeared, reverting the text box to its state before I edited it.


    Expected results:

    The large text box should have preserved my edit and remained the active text box when I switched back to the PDF window.

    Sample 3 (negative):
    PDF Form: Only first line of two line input field gets printed I noticed an odd behavior with a PDF form from my bank. This is the PDF:
    https://dok.dkb.de/pdf/scheck_bundesbank.pdf
    (I'll attach the PDF in case the link becomes unavailable.)

    It contains a two-line input field for the address. I can enter the address without problems and it's shown, but when I printed it only the first line of the address was printed. This can also be seen in the print preview. I'll attach screenshots from both the filled form and the print preview.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 5
    }
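
To make these parameters concrete: cosine distance (1 minus cosine similarity) lies between 0 and 2, so if the loss takes the usual triplet hinge form max(d(a, p) - d(a, n) + margin, 0), a margin of 5 keeps every per-triplet loss between 3 and 7 and the hinge never reaches zero. The logged losses of roughly 4.4 to 4.9 fall inside that range. The sketch below is a NumPy illustration under that assumption, not the library's implementation; the function names are hypothetical.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance, 1 - cos(a, b), which ranges from 0 to 2."""
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return 1.0 - cos


def triplet_loss(anchor, positive, negative, margin: float = 5.0) -> float:
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) over a batch of triplets."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


anchor = np.array([[1.0, 0.0]])
orthogonal = np.array([[0.0, 1.0]])

# Best case: positive equals the anchor (distance 0), negative points the opposite way (distance 2).
best_case = triplet_loss(anchor, anchor, -anchor)
# No separation: positive and negative are equally far from the anchor.
no_separation = triplet_loss(anchor, orthogonal, orthogonal)
# Worst case: positive points the opposite way, negative equals the anchor.
worst_case = triplet_loss(anchor, -anchor, anchor)

print(best_case, no_separation, worst_case)  # 3.0 5.0 7.0
```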
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,162 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column     type     min         mean            max
    anchor     string   16 tokens   204.97 tokens   508 tokens
    positive   string   16 tokens   206.68 tokens   506 tokens
    negative   string   18 tokens   203.39 tokens   508 tokens
  • Samples:
    anchor positive negative
    Sample 1 (anchor):
    The Not Now button from the Fakespot Onboarding sidebar is missing the Clicked State Found in
    * Nightly 118.0a1 (2023-08-18)

    Affected versions
    * Nightly 118.0a1 (2023-08-18)

    Affected platforms
    * ALL

    Preconditions:
    Set the browser.shopping.experience2023.enabled - TRUE
    Set the toolkit.shopping.useOHTTP - TRUE

    Steps to reproduce
    1. Reach about:preferences and turn off feature recommendations.
    2. Reach the Amazon https://www.amazon.com/dp/B09B6ZXD2V/ref=sbl_dpx_office-desks_B0B4CYW8FB_0 link
    3. Click and Hold the Not Now button from the Onboarding Shopping sidebar.

    Expected result
    * The Not now button from the Onboarding Shopping sidebar should change its state when Clicked.

    Actual result
    * The Not now button from the Onboarding Shopping sidebar is missing the Clicked state.

    Regression range
    Not Applicable

    Sample 1 (positive):
    [Experiment] The “Survey” screen from the “about:welcome” page has layout issues on Firefox locales with long strings [Affected versions]:
    - Firefox Beta 123.0b7 - Build ID: 20240205091725 (Release channel)

    [Affected Platforms]:
    - Windows 10x64.
    - Windows 11 x64.

    [Prerequisites]:
    - Have a Firefox locale with longer strings installed and opened (e.g.: de, it, es-ES).
    - Have the “nimbus.debug” pref from the “about:config” page set to true.

    [Steps to reproduce]:
    1. Navigate to the “about:studies?optin_slug=new-profile-survey-new-vs-existing-vs-returning-all-locales-fx123&optin_branch=treatment-b&optin_collection=nimbus-preview” link to enroll in the Treatment B branch of the “NP Survey: New vs. Existing vs. Returning (Multiple Locales) - Fx123” experiment.
    2. Navigate to the “about:welcome” page.
    3. Observe the elements from the first screen displayed.

    [Expected result]:
    - The “Survey” screen is successfully displayed with no layout issues.

    **[Actual result]:*...

    Sample 1 (negative):
    Resume/Retry doesn’t work in case of deleted inprogress downloads Note
    * Not sure if we want to have a retry available for a deleted download, but in case we don’t then fixing bug 1755728, automatically removes the scenario causing this one.

    Affected versions
    * Firefox 98 beta 4
    * Nightly 99.0a1

    Affected platforms
    * all

    Steps to reproduce
    1. Download a big file link
    2. While downloading, right click and delete.
    3. Download is set into a pause state, proceed to delete again or just cancel.
    4. Retry download.


    Expected result

    2. Download is deleted, the download panel states that successfully - see enh. File deleted from Downloads Panel contextual menu should have a different UI from the one deleted from disk
    3. This case is not hit.
    .4 Download is restarted.

    Actual result
    2. Download is paused
    3. Download is d...

    Sample 2 (anchor):
    Missing data from table Tested with:
    Nightly 91.0a1 (2021-06-22)

    Tested on:
    Win 10

    Preconditions:
    In about:config, set pdfjs.enableXfa = true

    Steps:

    1. Launch firefox
    2. Open the attached PDF

    Actual result:
    No data in table is displayed

    Expected result:
    A table with data should be displayed

    Sample 2 (positive):
    Failure in toolkit/components/extensions/test/browser/browser_ext_themes_autocomplete_popup.js with proton urlbar There's a failure in toolkit/components/extensions/test/browser/browser_ext_themes_autocomplete_popup.js when graduating the proton urlbar

    Sample 2 (negative):
    Insecure connection icon is barely visible on http website login form with dark theme Affected versions
    * 97.0a1 (20211213093143)
    * 96.0b4 (20211212185725)
    * 95.0 (20211129150630)
    * 91.4.0esr (20211126104708)

    Affected platforms

    * macOS 11.6

    Preconditions
    * dark theme enabled

    Steps to reproduce
    1. Open Firefox and http://jsbin.testim.io/soviruvalo/1/edit?html,output.
    2. Click on the username/ password form and observe the insecure icon.

    Expected result
    * The icon is visible.


    Actual result
    * The lock icon is barely visible.

    Regression range
    * Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b8b54a4990d7e778b82909a871e92031a4bc649d&tochange=4b3932f9c4f5d9572da2f0232375474133191500
    Potential regressor: bug 1715619.

    Notes
    * Screenshot attached.

    Sample 3 (anchor):
    Change the return type in case of JEXL evaluation error Now the default return in case of failures (when eval throws) is false, it should be null to tell apart successful evaluation from errors.

    Sample 3 (positive):
    LightweightThemeConsumer.jsm:_sanitizeCSSColor slows down first-time about:welcome theme enables https://searchfox.org/mozilla-central/source/toolkit/modules/LightweightThemeConsumer.jsm#458 has shown up to be a good candidate for speeding up to support making the theme switches in about:welcome meaningfully faster.

    Emilio and I chatted about this for a while in Slack, and he has a plan for what looks to be a straightforward fix that should be upliftable to 81 beta.

    Sample 3 (negative):
    If "Open tabs" suggestion is disabled, top sites in address bar doesn't show open top sites From: https://www.reddit.com/r/firefox/comments/g42asi/when_disabling_switch_to_tab_feature_on_address/

    Steps to reproduce:

    1. Ensure YouTube is in top sites
    2. Open YouTube
    3. Ensure "Open tabs" suggestion is disabled for address bar
    4. open new tab, navigate to any other site
    5. Click on address bar

    What happens:

    YouTube doesn't appear in the top sites suggestions in the address bar.

    Expected result:

    YouTube appears; clicking the suggestion opens it without switching to existing tab.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 5
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 8
  • learning_rate: 2e-05
  • num_train_epochs: 7
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
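
These settings imply an effective batch size and step count that can be worked out by hand. The arithmetic below is a sketch under stated assumptions: a single device, no batches dropped by the no_duplicates sampler, a final partial accumulation group counted as one optimizer step, and the warmup ratio rounded up to whole steps. The resulting epoch values agree with those in the training log (2.3304, 4.6608, 6.9912). The variable and function names are illustrative.

```python
import math

train_samples = 5424   # size of the training dataset
per_device_batch = 16  # per_device_train_batch_size
grad_accum = 8         # gradient_accumulation_steps
epochs = 7             # num_train_epochs
warmup_ratio = 0.1

# Triplets contributing to each optimizer update (single device assumed).
effective_batch = per_device_batch * grad_accum
# Mini-batches per epoch, then optimizer steps per epoch (last partial group counts as a step).
batches_per_epoch = math.ceil(train_samples / per_device_batch)
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs
# Assumption: the warmup ratio is rounded up to a whole number of steps.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    full_epochs, steps_into_epoch = divmod(step, steps_per_epoch)
    batches_seen = min(steps_into_epoch * grad_accum, batches_per_epoch)
    return full_epochs + batches_seen / batches_per_epoch


print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
# 128 339 43 301 31
print([round(epoch_at_step(s), 4) for s in (100, 200, 300)])
# [2.3304, 4.6608, 6.9912]
```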

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step   Training Loss   Validation Loss   bge-base-en-eval_cosine_accuracy   bge-base-en-train_cosine_accuracy
-1       -1     -               -                 0.5073                             -
2.3304   100    4.8424          4.9209            -                                  0.6825
4.6608   200    4.488           4.8791            -                                  0.6829
6.9912   300    4.4078          4.9009            -                                  0.6812
-1       -1     -               -                 0.5159                             -

Framework Versions

  • Python: 3.10.10
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.0
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 0.1B params (Safetensors, tensor type F32)