Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
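The three modules run in order: the text goes through XLM-R, only the first (CLS) token's vector is kept (`pooling_mode_cls_token: True`), and that vector is scaled to unit L2 norm. A minimal NumPy sketch of the pooling and normalization steps, on made-up tensors (hidden size 4 here instead of the real 1024, for brevity):

```python
import numpy as np

# Made-up token embeddings standing in for XLM-R output:
# shape (batch, seq_len, hidden).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 4))

# (1) Pooling with pooling_mode_cls_token=True: keep the first token's vector.
cls = token_embeddings[:, 0, :]

# (2) Normalize(): scale each vector to unit L2 norm.
sentence_embeddings = cls / np.linalg.norm(cls, axis=1, keepdims=True)

print(sentence_embeddings.shape)                      # (2, 4)
print(np.linalg.norm(sentence_embeddings, axis=1))    # each norm is 1.0
```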
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("collaborativeearth/bge-m3_wri")

# Run inference
sentences = [
    'what is the wri meat initiative?',
    'Toward "Better" Meat? Aligning Meat Sourcing Strategies with Corporate Climate and Sustainability Goals Toward “Better” Meat? Aligning meat sourcing strategies with corporate climate and sustainability goals\n\nWOR L D WOR L D R E S O U R C E S R E S O U R C E S I NS T I T U T E I NS T I T U T E\n\nRICHARD WAITE is the Acting Director for Agriculture Initiatives at WRI.\n\nis a doctoral student with Oxford University’s Environmental Change Institute and a former Research Analyst for WRI’s Food and Climate Programs.\n\nCLARA CHO is the Data Analyst for the Coolfood initiative at WRI. Contact: clara.cho@wri.org.\n\nWe are pleased to acknowledge our institutional strategic partners that provide core funding to WRI: the Netherlands Ministry of Foreign Affairs, Royal Danish Ministry of Foreign Affairs, and Swedish International Development Cooperation Agency.\n\nThe authors acknowledge the following individuals for their valuable guidance and critical reviews:',
    'Pilot analysis of global ecosystems: Grassland ecosystems Although GLASOD was by necessity a somewhat subjective assessment it was extremely carefully prepared by leading experts in the field. It remains the only global database on the status of human-induced soil degradation, and no other data set comes as close to defining the extent of desertification at the global scale (UNEP 1997: V).',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
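Because the model ends with a `Normalize()` module, the embeddings are unit-length, so the cosine similarity that `model.similarity` computes by default reduces to a plain dot product. A toy illustration with made-up 2-d unit vectors in place of real 1024-d embeddings:

```python
import numpy as np

# Made-up unit vectors standing in for model.encode() output.
emb = np.array([
    [1.0, 0.0],
    [0.6, 0.8],
    [0.0, 1.0],
])

# For L2-normalized rows, the pairwise cosine matrix is just emb @ emb.T.
cosine = emb @ emb.T
print(cosine)
# Diagonal is 1.0 (each vector vs itself); off-diagonals are the
# pairwise cosines: 0.6, 0.0, and 0.8.
```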
Information retrieval performance, measured on the `ir-eval` dataset with `InformationRetrievalEvaluator`:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3403 |
| cosine_accuracy@3 | 0.5389 |
| cosine_accuracy@5 | 0.6212 |
| cosine_accuracy@10 | 0.7122 |
| cosine_precision@1 | 0.3403 |
| cosine_precision@3 | 0.1796 |
| cosine_precision@5 | 0.1242 |
| cosine_precision@10 | 0.0712 |
| cosine_recall@1 | 0.3403 |
| cosine_recall@3 | 0.5389 |
| cosine_recall@5 | 0.6212 |
| cosine_recall@10 | 0.7122 |
| cosine_ndcg@10 | 0.5191 |
| cosine_mrr@10 | 0.4580 |
| cosine_map@100 | 0.4673 |
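Since each query here has a single relevant document, recall@k equals accuracy@k, which is why those rows of the table match. A small sketch, on made-up rankings, of how `accuracy@k` and `MRR@k` are computed:

```python
# Hypothetical ranked retrieval results: query -> ranked list of doc ids.
ranked = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d5", "d2", "d9"],
}
# One gold (relevant) document per query, as in this evaluation.
relevant = {"q1": "d1", "q2": "d9"}

def accuracy_at_k(k):
    """Fraction of queries whose gold doc appears in the top k."""
    hits = sum(relevant[q] in docs[:k] for q, docs in ranked.items())
    return hits / len(ranked)

def mrr_at_k(k):
    """Mean reciprocal rank of the gold doc within the top k."""
    total = 0.0
    for q, docs in ranked.items():
        for rank, d in enumerate(docs[:k], start=1):
            if d == relevant[q]:
                total += 1.0 / rank
                break
    return total / len(ranked)

print(accuracy_at_k(1))  # 0.0  (no gold doc ranked first)
print(accuracy_at_k(3))  # 1.0  (both gold docs in the top 3)
print(mrr_at_k(10))      # (1/2 + 1/3) / 2 ≈ 0.4167
```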
The model was trained on question-answer pairs with two string columns, `question` and `answer`. Examples:

| question | answer |
|---|---|
| what is the economic case of restoration | The Economic Case for Landscape Restoration in Latin America The Economic Case for Landscape Restoration in Latin America |
| economic case of landscape restoration in latin america | The Economic Case for Landscape Restoration in Latin America The Economic Case for Landscape Restoration in Latin America |
| what is lata-american landscape | The Economic Case for Landscape Restoration in Latin America Agriculture and forestry exports from Latin America represent about 13 percent of the global trade of food, feed, and fiber and account for a majority of employment outside large urban areas—numbers only expected to grow as Latin America is called upon to meet an increasing global demand for food. Yet, since the turn of the century, about 37 million hectares of natural forests, savannas and wetlands have been transformed to expand agriculture. Cumulative, unsustainable land-use practices have led to the degradation of about 300 million hectares, resulting in a reduction in yields and quality of production, and in losses in biomass content, soil quality, surface water hydrology, and biodiversity. Deforestation, land-use change, and unsustainable agricultural activities are also currently the largest drivers of climate change in the region, accounting for 56 percent of all greenhouse gas emissions. Today, while some progress ha... |
The model was trained with `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
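`MultipleNegativesRankingLoss` uses in-batch negatives: for a batch of (question, answer) pairs, each question should score highest against its own answer, and all other answers in the batch act as negatives. A NumPy sketch of the loss on made-up, already-normalized embeddings, using the `scale` and `cos_sim` settings from the config:

```python
import numpy as np

# Made-up unit-norm embeddings for a toy batch of 3 (question, answer) pairs.
# Row i of `a` is the positive for row i of `q`.
q = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
a = np.array([[0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])

scale = 20.0                # "scale": 20.0
scores = scale * (q @ a.T)  # "cos_sim" on unit vectors is a dot product

# Cross-entropy with the diagonal as labels: answer i is correct for query i,
# the other answers in the batch are the in-batch negatives.
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(loss)  # small positive number: each positive already scores highest
```

The `scale` factor sharpens the softmax over the batch, so near-misses among the in-batch negatives are penalized strongly.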
Non-default training hyperparameters:

```
eval_strategy: steps
per_device_train_batch_size: 32
learning_rate: 1e-06
num_train_epochs: 2
warmup_ratio: 0.1
fp16: True
gradient_checkpointing: True
batch_sampler: no_duplicates
```

All hyperparameters (defaults included):

```
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-06
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: True
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
```

Training logs:

| Epoch | Step | Training Loss | ir-eval_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.4718 |
| 0.0389 | 100 | 0.7439 | - |
| 0.0779 | 200 | 0.6208 | - |
| 0.1168 | 300 | 0.4568 | - |
| 0.1558 | 400 | 0.3713 | - |
| 0.1947 | 500 | 0.3263 | 0.5004 |
| 0.2336 | 600 | 0.2722 | - |
| 0.2726 | 700 | 0.2521 | - |
| 0.3115 | 800 | 0.2541 | - |
| 0.3505 | 900 | 0.2348 | - |
| 0.3894 | 1000 | 0.2321 | 0.5090 |
| 0.4283 | 1100 | 0.2313 | - |
| 0.4673 | 1200 | 0.2195 | - |
| 0.5062 | 1300 | 0.2286 | - |
| 0.5452 | 1400 | 0.2188 | - |
| 0.5841 | 1500 | 0.2166 | 0.5115 |
| 0.6231 | 1600 | 0.2194 | - |
| 0.6620 | 1700 | 0.2006 | - |
| 0.7009 | 1800 | 0.1954 | - |
| 0.7399 | 1900 | 0.2157 | - |
| 0.7788 | 2000 | 0.2059 | 0.5154 |
| 0.8178 | 2100 | 0.2030 | - |
| 0.8567 | 2200 | 0.1949 | - |
| 0.8956 | 2300 | 0.1943 | - |
| 0.9346 | 2400 | 0.2060 | - |
| 0.9735 | 2500 | 0.2015 | 0.5175 |
| 1.0125 | 2600 | 0.1801 | - |
| 1.0514 | 2700 | 0.1867 | - |
| 1.0903 | 2800 | 0.1914 | - |
| 1.1293 | 2900 | 0.1827 | - |
| 1.1682 | 3000 | 0.1899 | 0.5165 |
| 1.2072 | 3100 | 0.1707 | - |
| 1.2461 | 3200 | 0.1872 | - |
| 1.2850 | 3300 | 0.1943 | - |
| 1.3240 | 3400 | 0.1854 | - |
| 1.3629 | 3500 | 0.1747 | 0.5182 |
| 1.4019 | 3600 | 0.1764 | - |
| 1.4408 | 3700 | 0.1866 | - |
| 1.4798 | 3800 | 0.1855 | - |
| 1.5187 | 3900 | 0.1782 | - |
| 1.5576 | 4000 | 0.1744 | 0.5181 |
| 1.5966 | 4100 | 0.1793 | - |
| 1.6355 | 4200 | 0.1870 | - |
| 1.6745 | 4300 | 0.1907 | - |
| 1.7134 | 4400 | 0.1781 | - |
| 1.7523 | 4500 | 0.1825 | 0.5185 |
| 1.7913 | 4600 | 0.1981 | - |
| 1.8302 | 4700 | 0.1751 | - |
| 1.8692 | 4800 | 0.1824 | - |
| 1.9081 | 4900 | 0.1866 | - |
| 1.9470 | 5000 | 0.1880 | 0.5191 |
| 1.9860 | 5100 | 0.1838 | - |
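The `linear` scheduler with `warmup_ratio: 0.1` ramps the learning rate from 0 up to the peak of 1e-06 over the first 10% of steps, then decays it linearly back to 0. A sketch of that schedule (the total step count is illustrative, read off the log above: roughly 5,100 steps for 2 epochs):

```python
peak_lr = 1e-06       # learning_rate: 1e-06
total_steps = 5136    # illustrative: ~2 epochs at this batch size
warmup_steps = int(0.1 * total_steps)  # warmup_ratio: 0.1

def lr_at(step):
    """Linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(lr_at(0))             # 0.0
print(lr_at(warmup_steps))  # 1e-06 (peak)
print(lr_at(total_steps))   # 0.0
```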
Sentence-BERT:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
Base model: BAAI/bge-m3