SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Finetuned bge-m3 (dense retrieval) on a QA dataset created from a WRI corpus file (chunk length 250, overlap 40, no titles). Questions were generated with the QGEN method (https://github.com/UKPLab/gpl/tree/main), two questions per chunk. The loss function was MultipleNegativesRankingLoss (MNRL) without hard negatives; see the training details below.

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
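The Pooling and Normalize modules above are simple operations: CLS-token pooling keeps only the first token's embedding, and Normalize scales it to unit length. A minimal numpy sketch with toy values (the real hidden dimension is 1024; 4 is used here only to keep the example small):

```python
import numpy as np

# Toy stand-in for the transformer output: (seq_len, hidden_dim) token embeddings.
token_embeddings = np.array([
    [1.0, 2.0, 2.0, 0.0],   # [CLS] token
    [0.5, 0.5, 0.5, 0.5],
    [3.0, 0.0, 0.0, 0.0],
])

# (1) Pooling with pooling_mode_cls_token=True keeps only the first token.
cls_embedding = token_embeddings[0]

# (2) Normalize() divides by the L2 norm, so cosine similarity between
#     sentence embeddings reduces to a plain dot product.
sentence_embedding = cls_embedding / np.linalg.norm(cls_embedding)

print(sentence_embedding)                  # unit-length vector
print(np.linalg.norm(sentence_embedding))  # 1.0
```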

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("collaborativeearth/bge-m3_wri_notitles")
# Run inference
sentences = [
    'what to do about climate change in the meat industry',
    '1.  Calculate the scope 3 GHG emissions baseline of food purchases, including meat. Establishing a scope 3 GHG emissions baseline for food purchases will allow companies to understand how much of an impact meat has on their food-related carbon footprint and enable them to pinpoint emissions hot spots.\n\n2.  Shift from high-emissions products like beef and lamb toward lower-emissions products like plant-based foods and alternative proteins. This type of shift is a triple win for climate, nature, and animal welfare.\n\n3.  Define priorities around improved meat sourcing by product type. For example, around beef, the goal might be to reduce climate and land impacts—both through sourcing less of it, and through encouraging lower-emissions production methods. For chicken and eggs, the goal might be to improve animal welfare, promote responsible antibiotic use, and minimize water pollution.',
    'We also conducted t-tests to determine the statistical significance of the above findings. For these t-tests, our null hypothesis was that there would be no difference between the conventional and alternative production systems, while the alternative hypothesis was that the alternative production systems would have mostly higher environmental impacts than the conventional systems. We conducted these tests using the paired data points for beef, lamb, dairy, pork, poultry, and eggs, for both GHG emissions and land use. (There were not enough data for water pollution and water use to conduct t-tests.) The GHG emissions results were statistically significant for beef, poultry, and eggs, with a p value <0.05. The land use results were statistically significant for beef, dairy, pork, poultry, and eggs, with a p value <0.05. Overall, the fact that the majority of these results, for GHG emissions and land use, were statistically significant reinforces the findings that alternative production systems generally have higher environmental impacts than conventional systems. There were not enough data for water pollution and water use, so the statistical significance of the water-related results could not be determined.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
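For retrieval, the typical pattern is to score one query embedding against many document embeddings and rank by similarity. Because the model L2-normalizes its outputs, cosine similarity is just a dot product. A sketch with hypothetical toy vectors standing in for `model.encode()` output (real vectors are 1024-dimensional):

```python
import numpy as np

# Hypothetical, already-normalized embeddings: one query, three documents.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.8, 0.6, 0.0],   # doc 0
    [0.0, 1.0, 0.0],   # doc 1
    [1.0, 0.0, 0.0],   # doc 2
])

# With unit-length embeddings, cosine similarity is the dot product;
# model.similarity() computes the same scores.
scores = docs @ query

# Rank documents by score, best first.
ranking = np.argsort(-scores)
print(ranking)  # [2 0 1]
```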

Evaluation

Metrics

Information Retrieval

Metric                 Value
cosine_accuracy@1      0.3449
cosine_accuracy@3      0.5416
cosine_accuracy@5      0.6198
cosine_accuracy@10     0.7198
cosine_precision@1     0.3449
cosine_precision@3     0.1805
cosine_precision@5     0.124
cosine_precision@10    0.072
cosine_recall@1        0.3449
cosine_recall@3        0.5416
cosine_recall@5        0.6198
cosine_recall@10       0.7198
cosine_ndcg@10         0.5246
cosine_mrr@10          0.463
cosine_map@100         0.4721
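In this evaluation each query has a single relevant document, which is why recall@k equals accuracy@k in the table. These metrics can be computed directly from the rank at which each query's correct answer was retrieved; a sketch with toy ranks (not the actual evaluation data):

```python
# 0-based rank of each query's single relevant document (toy values).
ranks = [0, 2, 1, 15]  # the fourth answer falls outside the top 10

def accuracy_at_k(ranks, k):
    # Fraction of queries whose relevant document appears in the top k.
    # With one relevant document per query this equals recall@k.
    return sum(r < k for r in ranks) / len(ranks)

def mrr_at_k(ranks, k):
    # Mean reciprocal rank, counting 0 when the answer is outside the top k.
    return sum(1.0 / (r + 1) if r < k else 0.0 for r in ranks) / len(ranks)

print(accuracy_at_k(ranks, 1))   # 0.25
print(accuracy_at_k(ranks, 10))  # 0.75
print(mrr_at_k(ranks, 10))       # (1 + 1/3 + 1/2 + 0) / 4 ≈ 0.458
```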

Training Details

Training Dataset

Unnamed Dataset

  • Size: 82,191 training samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    • question: string; min: 5 tokens, mean: 10.69 tokens, max: 36 tokens
    • answer: string; min: 40 tokens, mean: 217.15 tokens, max: 334 tokens
  • Samples:

    Question: what countries are affected by landscape restoration?
    Answer: The Economic Case for Landscape Restoration in Latin America

    THE ECONOMIC CASE FOR LANDSCAPE RESTORATION IN LATIN AMERICA

    WALTER VERGARA, LUCIANA GALLARDO LOMELI, ANA R. RIOS, PAUL ISBELL, STEVEN PRAGER, RONNIE DE CAMINO

    Land use and land-use change are central to the economic and social fabric of Latin America and the Caribbean, and essential to the region’s prospects for sustainable development. Countries are realizing that now, more than ever, is the time for action. Eleven countries, three Brazilian states and several regional programs have already committed to restoring more than 27 million hectares of degraded land in Latin America—but can these ambitions become a reality while supporting good living standards and economic development?

    Question: how many countries in latin america are trying to restore landscapes
    Answer: The Economic Case for Landscape Restoration in Latin America

    THE ECONOMIC CASE FOR LANDSCAPE RESTORATION IN LATIN AMERICA

    WALTER VERGARA, LUCIANA GALLARDO LOMELI, ANA R. RIOS, PAUL ISBELL, STEVEN PRAGER, RONNIE DE CAMINO

    Land use and land-use change are central to the economic and social fabric of Latin America and the Caribbean, and essential to the region’s prospects for sustainable development. Countries are realizing that now, more than ever, is the time for action. Eleven countries, three Brazilian states and several regional programs have already committed to restoring more than 27 million hectares of degraded land in Latin America—but can these ambitions become a reality while supporting good living standards and economic development?

    Question: what percent of land is deforested
    Answer: Agriculture and forestry exports from Latin America represent about 13 percent of the global trade of food, feed, and fiber and account for a majority of employment outside large urban areas—numbers only expected to grow as Latin America is called upon to meet an increasing global demand for food. Yet, since the turn of the century, about 37 million hectares of natural forests, savannas and wetlands have been transformed to expand agriculture. Cumulative, unsustainable land-use practices have led to the degradation of about 300 million hectares, resulting in a reduction in yields and quality of production, and in losses in biomass content, soil quality, surface water hydrology, and biodiversity. Deforestation, land-use change, and unsustainable agricultural activities are also currently the largest drivers of climate change in the region, accounting for 56 percent of all greenhouse gas emissions. Today, while some progress has been achieved, the rate of deforestation remains high at an...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
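MNRL treats each question's paired answer as the positive and every other answer in the batch as an in-batch negative, applying softmax cross-entropy over scaled cosine similarities. A minimal numpy sketch of the objective (not the library implementation), assuming unit-normalized embeddings and toy identity-matrix inputs:

```python
import numpy as np

def mnrl_loss(q_emb, a_emb, scale=20.0):
    """MultipleNegativesRankingLoss on a batch of unit-normalized
    question/answer embeddings: row i's positive is answer i; all
    other answers in the batch serve as in-batch negatives."""
    # Cosine similarity matrix (embeddings are unit length), scaled by 20.0
    # as in the parameters above.
    logits = scale * (q_emb @ a_emb.T)
    # Softmax cross-entropy with the diagonal (matched pairs) as labels.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch of 3 orthogonal unit vectors: each question exactly matches
# its own answer, so the loss is close to zero.
print(mnrl_loss(np.eye(3), np.eye(3)))
```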
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • gradient_checkpointing: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss ir-eval_cosine_ndcg@10
-1 -1 - 0.4718
0.0389 100 0.5021 -
0.0779 200 0.2574 -
0.1168 300 0.2008 -
0.1557 400 0.182 -
0.1946 500 0.1673 0.5134
0.2336 600 0.1488 -
0.2725 700 0.1582 -
0.3114 800 0.1662 -
0.3503 900 0.1642 -
0.3893 1000 0.1522 0.5107
0.4282 1100 0.1448 -
0.4671 1200 0.1525 -
0.5060 1300 0.1354 -
0.5450 1400 0.1437 -
0.5839 1500 0.1403 0.5172
0.6228 1600 0.1355 -
0.6617 1700 0.1459 -
0.7007 1800 0.1498 -
0.7396 1900 0.1221 -
0.7785 2000 0.1311 0.5201
0.8174 2100 0.1263 -
0.8564 2200 0.126 -
0.8953 2300 0.1111 -
0.9342 2400 0.1394 -
0.9731 2500 0.1188 0.5228
1.0121 2600 0.1267 -
1.0510 2700 0.0999 -
1.0899 2800 0.0911 -
1.1288 2900 0.0803 -
1.1678 3000 0.095 0.5255
1.2067 3100 0.0933 -
1.2456 3200 0.0909 -
1.2845 3300 0.093 -
1.3235 3400 0.0895 -
1.3624 3500 0.0872 0.5191
1.4013 3600 0.0914 -
1.4402 3700 0.0901 -
1.4792 3800 0.0832 -
1.5181 3900 0.0867 -
1.5570 4000 0.078 0.5250
1.5960 4100 0.0799 -
1.6349 4200 0.0871 -
1.6738 4300 0.0837 -
1.7127 4400 0.0911 -
1.7517 4500 0.0783 0.5248
1.7906 4600 0.0749 -
1.8295 4700 0.097 -
1.8684 4800 0.0865 -
1.9074 4900 0.0849 -
1.9463 5000 0.0937 0.5246
1.9852 5100 0.0839 -

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}