SentenceTransformer based on BAAI/bge-m3
This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
This model is BAAI/bge-m3 fine-tuned for dense retrieval on a question-answer dataset built from a WRI corpus file (chunk length 250, overlap 40, no titles). Questions were generated with the QGen method from GPL (https://github.com/UKPLab/gpl/tree/main), two questions per chunk. The loss function was MultipleNegativesRankingLoss (MNRL) without hard negatives; see the training details below.
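The chunking parameters above (length 250, overlap 40) describe a sliding window over the corpus. The card does not state the unit (tokens, words, or characters) or the splitter that was used, so the following is only a rough sketch that assumes whitespace-separated words; `chunk_words` is a hypothetical helper, not part of the original pipeline.

```python
def chunk_words(text, chunk_len=250, overlap=40):
    """Split text into overlapping word windows (hypothetical helper).

    Assumption: lengths are counted in whitespace-separated words; the
    original pipeline may have counted tokens or characters instead.
    """
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be non-negative and smaller than chunk_len")
    words = text.split()
    step = chunk_len - overlap  # with the defaults, windows start 210 words apart
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_len]))
        if start + chunk_len >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(600))
chunks = chunk_words(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

With these settings, consecutive chunks share 40 words, so a sentence that straddles a boundary still appears whole in at least one chunk in most cases.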
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
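The three modules above can be read as a pipeline: the transformer produces one vector per token, the pooling layer keeps the CLS (first) token's vector, and the normalization layer scales it to unit length. The following pure-Python sketch illustrates that flow on toy 4-dimensional vectors; the function names are hypothetical and this is not the library's implementation.

```python
import math


def cls_pool(token_vectors):
    """Return the first token's vector, mirroring pooling_mode_cls_token=True."""
    return list(token_vectors[0])


def l2_normalize(vector):
    """Scale a vector to unit length, as a Normalize() step would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy "transformer outputs": one row per token, 4 dimensions instead of 1024.
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
tokens_b = [[4.0, 3.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# For unit-length vectors the dot product equals the cosine similarity.
print(emb_a, emb_b, round(dot(emb_a, emb_b), 4))
```

Because the output is already normalized, cosine similarity and dot product give the same ranking for this model's embeddings.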
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("collaborativeearth/bge-m3_wri_notitles")
# Run inference
sentences = [
'what to do about climate change in the meat industry',
'1. Calculate the scope 3 GHG emissions baseline of food purchases, including meat. Establishing a scope 3 GHG emissions baseline for food purchases will allow companies to understand how much of an impact meat has on their food-related carbon footprint and enable them to pinpoint emissions hot spots.\n\n2. Shift from high-emissions products like beef and lamb toward lower-emissions products like plant-based foods and alternative proteins. This type of shift is a triple win for climate, nature, and animal welfare.\n\n3. Define priorities around improved meat sourcing by product type. For example, around beef, the goal might be to reduce climate and land impacts—both through sourcing less of it, and through encouraging lower-emissions production methods. For chicken and eggs, the goal might be to improve animal welfare, promote responsible antibiotic use, and minimize water pollution.',
'We also conducted t-tests to determine the statistical significance of the above findings. For these t-tests, our null hypothesis was that there would be no difference between the conventional and alternative production systems, while the alternative hypothesis was that the alternative production systems would have mostly higher environmental impacts than the conventional systems. We conducted these tests using the paired data points for beef, lamb, dairy, pork, poultry, and eggs, for both GHG emissions and land use. (There were not enough data for water pollution and water use to conduct t-tests.) The GHG emissions results were statistically significant for beef, poultry, and eggs, with a p value <0.05. The land use results were statistically significant for beef, dairy, pork, poultry, and eggs, with a p value <0.05. Overall, the fact that the majority of these results, for GHG emissions and land use, were statistically significant reinforces the findings that alternative production systems generally have higher environmental impacts than conventional systems. There were not enough data for water pollution and water use, so the statistical significance of the water-related results could not be determined.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
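For retrieval, a common pattern is to compare one query embedding against the passage embeddings and sort the passages by score. The sketch below shows only the ranking step on a row of similarity scores. The scores are made-up placeholders (in practice they might come from `model.similarity(query_embedding, passage_embeddings)`), and `top_k_passages` is a hypothetical helper.

```python
def top_k_passages(scores, passages, k=3):
    """Return the k highest-scoring (score, passage) pairs, best first."""
    if len(scores) != len(passages):
        raise ValueError("scores and passages must have the same length")
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


# Placeholder scores, NOT real model output.
passages = ["meat sourcing guidance", "t-test methodology", "restoration scenario"]
scores = [0.62, 0.31, 0.18]

for score, passage in top_k_passages(scores, passages, k=2):
    print(f"{score:.2f}  {passage}")
```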
Evaluation
Metrics
Information Retrieval
- Dataset: ir-eval
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3449 |
| cosine_accuracy@3 | 0.5416 |
| cosine_accuracy@5 | 0.6198 |
| cosine_accuracy@10 | 0.7198 |
| cosine_precision@1 | 0.3449 |
| cosine_precision@3 | 0.1805 |
| cosine_precision@5 | 0.124 |
| cosine_precision@10 | 0.072 |
| cosine_recall@1 | 0.3449 |
| cosine_recall@3 | 0.5416 |
| cosine_recall@5 | 0.6198 |
| cosine_recall@10 | 0.7198 |
| cosine_ndcg@10 | 0.5246 |
| cosine_mrr@10 | 0.463 |
| cosine_map@100 | 0.4721 |
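In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.5416 / 3 ≈ 0.1805). That pattern is what one would expect if each query has exactly one relevant passage; this is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows. The function name is hypothetical and this is not the InformationRetrievalEvaluator code.

```python
import math


def single_relevant_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: for each query, the 1-based rank of its single relevant
    passage, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results is relevant
        "recall": accuracy,         # one relevant passage, so recall is hit-or-miss
        "mrr": sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n,
        # With one relevant passage the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n,
    }


# Four toy queries: relevant passage found at ranks 1, 3, 12, and never.
toy_ranks = [1, 3, 12, None]
print(single_relevant_metrics(toy_ranks, k=10))
```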
Training Details
Training Dataset
Unnamed Dataset
- Size: 82,191 training samples
- Columns: question and answer
- Approximate statistics based on the first 1000 samples:

| | question | answer |
|---|---|---|
| type | string | string |
| details | min: 5 tokens, mean: 10.69 tokens, max: 36 tokens | min: 40 tokens, mean: 217.15 tokens, max: 334 tokens |
- Samples:

| question | answer |
|---|---|
| what countries are affected by landscape restoration? | The Economic Case for Landscape Restoration in Latin America<br>THE ECONOMIC CASE FOR LANDSCAPE RESTORATION IN LATIN AMERICA<br>WALTER VERGARA, LUCIANA GALLARDO LOMELI, ANA R. RIOS, PAUL ISBELL, STEVEN PRAGER, RONNIE DE CAMINO<br>Land use and land-use change are central to the economic and social fabric of Latin America and the Caribbean, and essential to the region’s prospects for sustainable development. Countries are realizing that now, more than ever, is the time for action. Eleven countries, three Brazilian states and several regional programs have already committed to restoring more than 27 million hectares of degraded land in Latin America—but can these ambitions become a reality while supporting good living standards and economic development? |
| how many countries in latin america are trying to restore landscapes | The Economic Case for Landscape Restoration in Latin America<br>THE ECONOMIC CASE FOR LANDSCAPE RESTORATION IN LATIN AMERICA<br>WALTER VERGARA, LUCIANA GALLARDO LOMELI, ANA R. RIOS, PAUL ISBELL, STEVEN PRAGER, RONNIE DE CAMINO<br>Land use and land-use change are central to the economic and social fabric of Latin America and the Caribbean, and essential to the region’s prospects for sustainable development. Countries are realizing that now, more than ever, is the time for action. Eleven countries, three Brazilian states and several regional programs have already committed to restoring more than 27 million hectares of degraded land in Latin America—but can these ambitions become a reality while supporting good living standards and economic development? |
| what percent of land is deforested | Agriculture and forestry exports from Latin America represent about 13 percent of the global trade of food, feed, and fiber and account for a majority of employment outside large urban areas—numbers only expected to grow as Latin America is called upon to meet an increasing global demand for food. Yet, since the turn of the century, about 37 million hectares of natural forests, savannas and wetlands have been transformed to expand agriculture. Cumulative, unsustainable land-use practices have led to the degradation of about 300 million hectares, resulting in a reduction in yields and quality of production, and in losses in biomass content, soil quality, surface water hydrology, and biodiversity. Deforestation, land-use change, and unsustainable agricultural activities are also currently the largest drivers of climate change in the region, accounting for 56 percent of all greenhouse gas emissions. Today, while some progress has been achieved, the rate of deforestation remains high at an... |

- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
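MultipleNegativesRankingLoss treats, for each question, its paired answer as the positive and every other answer in the batch as a negative. With `scale` 20.0 and cosine similarity, the loss amounts to a cross-entropy over each row of scaled similarities, where the correct class is the diagonal entry. The following pure-Python sketch illustrates that idea on toy 2-D vectors; it is not the library's implementation, and the function names are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnrl(question_embs, answer_embs, scale=20.0):
    """Mean cross-entropy where answer i is the positive for question i
    and every other answer in the batch serves as a negative."""
    losses = []
    for i, q in enumerate(question_embs):
        logits = [scale * cos_sim(q, a) for a in answer_embs]
        # Numerically stable log-sum-exp over the row of scaled similarities.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: aligned pairs give a low loss, swapped pairs a high one.
questions = [[1.0, 0.0], [0.0, 1.0]]
answers = [[0.9, 0.1], [0.1, 0.9]]
print(mnrl(questions, answers), mnrl(questions, answers[::-1]))
```

This also suggests why the `no_duplicates` batch sampler matters here: if the same answer chunk appeared twice in a batch, one copy would be scored as a negative for a question it actually answers.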
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- learning_rate: 1e-05
- num_train_epochs: 2
- warmup_ratio: 0.1
- fp16: True
- gradient_checkpointing: True
- batch_sampler: no_duplicates
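Together with the linear scheduler listed under All Hyperparameters, these settings imply a learning rate that warms up linearly over the first 10% of steps and then decays linearly to zero. The sketch below assumes a single device and about 2,569 optimizer steps per epoch (82,191 samples / batch size 32, rounded up), which is consistent with the training log showing epoch 0.0389 at step 100; the `no_duplicates` sampler could shift the exact count slightly. `linear_schedule_lr` is a hypothetical helper, not the Trainer's scheduler code.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed step count: ceil(82191 / 32) batches per epoch, 2 epochs.
steps_per_epoch = math.ceil(82191 / 32)
total_steps = steps_per_epoch * 2

for step in (0, 257, 514, 2826, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the peak rate of 1e-05 is reached around step 514, partway through the first epoch.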
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | ir-eval_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.4718 |
| 0.0389 | 100 | 0.5021 | - |
| 0.0779 | 200 | 0.2574 | - |
| 0.1168 | 300 | 0.2008 | - |
| 0.1557 | 400 | 0.182 | - |
| 0.1946 | 500 | 0.1673 | 0.5134 |
| 0.2336 | 600 | 0.1488 | - |
| 0.2725 | 700 | 0.1582 | - |
| 0.3114 | 800 | 0.1662 | - |
| 0.3503 | 900 | 0.1642 | - |
| 0.3893 | 1000 | 0.1522 | 0.5107 |
| 0.4282 | 1100 | 0.1448 | - |
| 0.4671 | 1200 | 0.1525 | - |
| 0.5060 | 1300 | 0.1354 | - |
| 0.5450 | 1400 | 0.1437 | - |
| 0.5839 | 1500 | 0.1403 | 0.5172 |
| 0.6228 | 1600 | 0.1355 | - |
| 0.6617 | 1700 | 0.1459 | - |
| 0.7007 | 1800 | 0.1498 | - |
| 0.7396 | 1900 | 0.1221 | - |
| 0.7785 | 2000 | 0.1311 | 0.5201 |
| 0.8174 | 2100 | 0.1263 | - |
| 0.8564 | 2200 | 0.126 | - |
| 0.8953 | 2300 | 0.1111 | - |
| 0.9342 | 2400 | 0.1394 | - |
| 0.9731 | 2500 | 0.1188 | 0.5228 |
| 1.0121 | 2600 | 0.1267 | - |
| 1.0510 | 2700 | 0.0999 | - |
| 1.0899 | 2800 | 0.0911 | - |
| 1.1288 | 2900 | 0.0803 | - |
| 1.1678 | 3000 | 0.095 | 0.5255 |
| 1.2067 | 3100 | 0.0933 | - |
| 1.2456 | 3200 | 0.0909 | - |
| 1.2845 | 3300 | 0.093 | - |
| 1.3235 | 3400 | 0.0895 | - |
| 1.3624 | 3500 | 0.0872 | 0.5191 |
| 1.4013 | 3600 | 0.0914 | - |
| 1.4402 | 3700 | 0.0901 | - |
| 1.4792 | 3800 | 0.0832 | - |
| 1.5181 | 3900 | 0.0867 | - |
| 1.5570 | 4000 | 0.078 | 0.5250 |
| 1.5960 | 4100 | 0.0799 | - |
| 1.6349 | 4200 | 0.0871 | - |
| 1.6738 | 4300 | 0.0837 | - |
| 1.7127 | 4400 | 0.0911 | - |
| 1.7517 | 4500 | 0.0783 | 0.5248 |
| 1.7906 | 4600 | 0.0749 | - |
| 1.8295 | 4700 | 0.097 | - |
| 1.8684 | 4800 | 0.0865 | - |
| 1.9074 | 4900 | 0.0849 | - |
| 1.9463 | 5000 | 0.0937 | 0.5246 |
| 1.9852 | 5100 | 0.0839 | - |
Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.6.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}