metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:41432
- loss:MultipleNegativesRankingLoss
base_model: google/embeddinggemma-300m
widget:
- source_sentence: >-
How does precipitation influence the water use efficiency and carbon
isotopes of Picea meyeri, and what are the implications for climate change
studies?
sentences:
- >-
In the study of starry flounders (Platichthys stellatus), cortisol
levels increased with increasing water temperature and then gradually
decreased. This suggests that cortisol, a stress hormone, is elevated as
a response to higher water temperatures, indicating that the fish
experience stress under these conditions. The increase in cortisol
levels is part of the fish's physiological response to environmental
stressors, such as temperature changes, which can affect their survival
and overall health.
- >-
The FY-4A/AGRI LST products effectively capture surface temperatures in
Hunan Province, with a correlation coefficient (R) of 0.893. However,
they exhibit a relatively high error level, with a bias of -6.295 °C and
a root mean square error (RMSE) of 8.58 °C, particularly in capturing
high LST values. The performance of this product is superior in the
eastern flat terrain area of Hunan Province compared to the western
mountainous region. Environmental conditions in the mountainous areas
cause systematic errors that contribute to instability in detection
deviation. Surface heat resources are more abundant in eastern Hunan
Province than in the mountainous areas located to the west and south,
and their detailed distribution at finer scales is mainly influenced by
terrain and climate conditions. There is no obvious seasonal difference
in the distribution of heat resources except in winter, and rapid
urbanization within the Chang–Zhu–Tan urban agglomeration over two years
has significantly altered the spatial distribution pattern of surface
heat resources across Hunan Province.
- >-
The water use efficiency (WUE) of Picea meyeri is significantly
influenced by precipitation, along with temperature. The study found
that there is a significant positive correlation between the WUE
sequence and temperature. However, due to the combined effects of
precipitation and temperature, Picea meyeri is subject to drought stress
to some extent. This indicates that while temperature is the main
climatic factor affecting the δ13C and WUE of Picea meyeri,
precipitation also plays a crucial role in the plant's response to
climate change. These findings are important for understanding the
impacts of climate change on tree species and their ability to adapt to
changing environmental conditions.
- source_sentence: >-
How does the warming of the Southern Indian Ocean (SIO) compare to its
impact on cyclone destruction potential in the recent period versus the
earlier period?
sentences:
- >-
Green roofing systems are adopted as part of Nature-Based Solutions
(NBS) to control urban stormwater runoff and mitigate urban flood risks.
Unlike traditional roofing methods, green roofs help manage stormwater
by absorbing and retaining rainfall, reducing the volume and rate of
runoff. However, there is currently no specific widely recognized
standard or code dedicated to determining the hydrological performance
of green roofs as a whole system, and no test protocols to regulate
their design. This highlights the need for a standardized test method to
evaluate the hydrological performance of green roofing systems, making
them a more reliable solution for flood resilience in cities affected by
climate change.
- >-
In the monitoring project conducted in Chengdu, Shuangliu (SL) was one
of the three urban sites studied. The key findings regarding the sources
and contributions of VOCs to ozone formation in Shuangliu included the
identification of five dominant VOC sources: vehicular exhaust and fuel
evaporation, solvent utilization, biogenic background, secondary
formation, and industrial emissions. Before the control measures were
implemented, vehicular exhaust and fuel evaporation were the highest
contributors. During the control period, the contribution from vehicular
exhaust was reduced the most at Shuangliu. VOC species such as xylenes,
toluene, and propene, which are primarily from vehicular and industrial
emissions and solvent utilization, were found to be the dominant
precursors for ozone formation potential (OFP). These results suggest
that effective control of photochemical pollutants, particularly from
vehicular and industrial sources, is crucial for reducing ozone
formation in Chengdu.
- >-
The warming of the Southern Indian Ocean (SIO) has led to a doubling of
the Power Dissipation Index (PDI) during 1999–2016 compared to
1980–1998. This increase is primarily due to an increase in the
intensity and duration of cyclones, associated with higher sea surface
temperatures and upper ocean heat content.
- source_sentence: >-
How do the findings of the study on Azotobacter paspali bacteria in Iraq
relate to the impact of nitrogen on air pollution, and what implications
does this have for future research and applications in both environmental
and agricultural contexts?
sentences:
- >-
Quartz is one of the minerals present in the limonite ore sample from
the Wolo mine area. The ore sample contains various minerals including
chlorite, goethite, lizardite, maghemite, and quartz. The chemical
composition of the ore indicates that it is mainly composed of Fe2O3
(53.59%), followed by SiO2 (12.16%).
- >-
The study on Azotobacter paspali bacteria in Iraq found that these
bacteria have a significant effect on fixing atmospheric nitrogen and
dissolving phosphorus. This is important in the context of biological
fertilization of plants and soil, which can reduce the need for
synthetic fertilizers and potentially lower nitrogen emissions that
contribute to air pollution. In the environmental context, the research
on nitrogen dioxide air pollution in Madrid highlights the importance of
nitrogen compounds in air quality. The findings suggest that by
promoting the use of nitrogen-fixing bacteria in agriculture, we can
reduce the reliance on synthetic nitrogen fertilizers, which are a major
source of nitrogen dioxide emissions. This could lead to improved air
quality and better human health protection. Future research could focus
on integrating these biological solutions with advanced air pollution
forecasting models to create a more holistic approach to managing
nitrogen in both agricultural and urban environments.
- >-
In the Tigris River Batman-Hasankeyf region, intensive agricultural
activities are carried out, and irrigation is generally obtained from
groundwater just as it moves away from the riverfront. This region is a
valuable basin for both Turkey and the Middle East. A study using the
Geographic Information System (GIS)-based multicriteria decision-making
(MCDM) analytic hierarchy process (AHP) was conducted to explore the
groundwater potential of the drainage area. The study considered eight
hydrological and hydrogeological criteria, including geomorphology,
geology, rainfall, drainage density, slope, lineament density, land use,
and soil properties. The major findings indicated that the
groundwater-potential index values of the basin were derived, and the
groundwater potential zones were evaluated as very poor (19%), poor
(17%), moderate (34%), good (17%), and very good (13%).
- source_sentence: >-
How does the optical approach compare to the thermal approach in mapping
irrigated landcover, and what are the implications of this method?
sentences:
- >-
The analysis of Land Use and Land Cover changes in Lagos State suggests
that areas with low flood hazard levels are less affected by the
conversion of wetland areas into developed areas and unplanned
development. While wetland areas have significantly decreased and
developed areas have increased, the changes primarily impact very high
to moderate flood hazard zones.
- >-
Managers and planners should focus on people’s perceptions and
preferences of park landscape characteristics to enhance the spatial
vitality and services of urban parks, ensuring they meet the needs of
urban residents and visitors.
- >-
The optical approach, which uses SWIR-transformed reflectance (STR), has
been found to be comparable to the thermal approach in mapping irrigated
landcover. Specifically, the classification accuracy of the optical
approach was 97.6%, which is slightly better than the 93.9% accuracy of
the thermal approach. This confirms the feasibility of using STR to map
irrigated landcover, with broader implications for the use of satellite
imagery in these applications, potentially reducing the reliance on
microwave or thermal sensors.
- source_sentence: >-
Based on the Brine Shrimp Lethality Test (BSLT), what are the toxicity
levels of liquid smoke from cocoa pod skin at various pyrolysis
temperatures and water contents?
sentences:
- >-
The estimated annual flood damage for agriculture and built-up areas in
the Tajan watershed, northern Iran, is projected to surge from USD 162
million to USD 376 million and USD 91 million to USD 220 million,
respectively, by 2040, considering the land use change scenarios from
2021 to 2040.
- >-
The Brine Shrimp Lethality Test (BSLT) was used to determine the
toxicity levels of liquid smoke from cocoa pod skin at various pyrolysis
temperatures and water contents. The results showed that the LC50 values
(the concentration required to kill 50% of the test organisms) were as
follows: at 200°C and 10% water content, 11,858.58 ppm; at 200°C and 15%
water content, 13,094.23 ppm; at 200°C and 20% water content, 13,373.94
ppm; at 200°C and 25% water content, 15,703.52 ppm. At 300°C and 10%
water content, 11,604.26 ppm; at 300°C and 15% water content, 11,673.05
ppm; at 300°C and 20% water content, 13,373.94 ppm; at 300°C and 25%
water content, 13,373.94 ppm. At 400°C and 10% water content, 9,213.73
ppm; at 400°C and 15% water content, 13,094.237 ppm; at 400°C and 20%
water content, 13,373.94 ppm; at 400°C and 25% water content, 12,493.63
ppm. All the results indicate that the liquid smoke from cocoa pod skin
at different pyrolysis temperatures and water contents is classified as
non-toxic.
- >-
The distribution of PM2.5 in Santa Ana, CA, tends to be higher in
socioeconomically disadvantaged communities compared to other areas,
highlighting environmental health inequities that persist in urban
areas. This can inform policy decisions related to health equity and
community access to resources.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
datasets:
- GeoGPT-Research-Project/GeoGPT-QA
SentenceTransformer based on google/embeddinggemma-300m
This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: google/embeddinggemma-300m
- Maximum Sequence Length: 2048 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
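The model's final module is `Normalize()`, so its output vectors have unit length, and cosine similarity on them reduces to a plain dot product. The sketch below illustrates that identity with toy 3-dimensional vectors standing in for the 768-dimensional embeddings. The vectors and helper names are made up for illustration and are not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical toy vectors (real embeddings have 768 dimensions).
u = [1.0, 2.0, 2.0]
v = [2.0, 0.0, 1.0]

cos = cosine_similarity(u, v)
dot_of_normalized = sum(x * y for x, y in zip(l2_normalize(u), l2_normalize(v)))
print(cos, dot_of_normalized)  # the two values agree up to floating-point error
```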
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(4): Normalize()
)
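To make the module stack above more concrete, here is a minimal pure-Python sketch of the steps that follow the transformer: mean pooling over non-padded tokens, two bias-free linear projections with identity activation, and L2 normalization. The sizes are shrunk (2 and 4 instead of 768 and 3072) and the weights are invented for illustration. This is not the library's implementation.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def dense_no_bias(x, weight):
    """Linear projection y = W x with no bias and identity activation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def l2_normalize(x):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


# Hypothetical transformer output for three tokens; the last one is padding.
token_embeddings = [[1.0, 3.0], [3.0, 1.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

# Hypothetical weights: 2 -> 4 (stands in for 768 -> 3072), then 4 -> 2.
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

pooled = mean_pool(token_embeddings, attention_mask)  # module (1): Pooling
hidden = dense_no_bias(pooled, w_up)                  # module (2): Dense
projected = dense_no_bias(hidden, w_down)             # module (3): Dense
embedding = l2_normalize(projected)                   # module (4): Normalize
print(embedding)
```

Because both dense layers use an identity activation and no bias, their composition is itself a single linear map. The sketch keeps them separate only to mirror the listed architecture.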
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("yasserrmd/geo-gemma-300m-emb")
# Run inference
queries = [
"Based on the Brine Shrimp Lethality Test (BSLT), what are the toxicity levels of liquid smoke from cocoa pod skin at various pyrolysis temperatures and water contents?",
]
documents = [
'The Brine Shrimp Lethality Test (BSLT) was used to determine the toxicity levels of liquid smoke from cocoa pod skin at various pyrolysis temperatures and water contents. The results showed that the LC50 values (the concentration required to kill 50% of the test organisms) were as follows: at 200°C and 10% water content, 11,858.58 ppm; at 200°C and 15% water content, 13,094.23 ppm; at 200°C and 20% water content, 13,373.94 ppm; at 200°C and 25% water content, 15,703.52 ppm. At 300°C and 10% water content, 11,604.26 ppm; at 300°C and 15% water content, 11,673.05 ppm; at 300°C and 20% water content, 13,373.94 ppm; at 300°C and 25% water content, 13,373.94 ppm. At 400°C and 10% water content, 9,213.73 ppm; at 400°C and 15% water content, 13,094.237 ppm; at 400°C and 20% water content, 13,373.94 ppm; at 400°C and 25% water content, 12,493.63 ppm. All the results indicate that the liquid smoke from cocoa pod skin at different pyrolysis temperatures and water contents is classified as non-toxic.',
'The estimated annual flood damage for agriculture and built-up areas in the Tajan watershed, northern Iran, is projected to surge from USD 162 million to USD 376 million and USD 91 million to USD 220 million, respectively, by 2040, considering the land use change scenarios from 2021 to 2040.',
'The distribution of PM2.5 in Santa Ana, CA, tends to be higher in socioeconomically disadvantaged communities compared to other areas, highlighting environmental health inequities that persist in urban areas. This can inform policy decisions related to health equity and community access to resources.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.5805, 0.0253, 0.0709]])
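For retrieval, the similarity scores are usually turned into a ranking. The sketch below does this in plain Python, using the three scores from the example output above. The `rank_documents` helper is a hypothetical name introduced here and is not part of Sentence Transformers.

```python
# Scores copied from the example output: one query against three documents.
scores = [0.5805, 0.0253, 0.0709]


def rank_documents(scores, top_k=None):
    """Return (document_index, score) pairs sorted from most to least similar."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


ranking = rank_documents(scores)
print(ranking)
# [(0, 0.5805), (2, 0.0709), (1, 0.0253)]
```

Document 0, the passage that answers the BSLT query, ranks first by a wide margin.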
Training Details
Training Dataset
Unnamed Dataset
- Size: 41,432 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 12 tokens, mean: 27.1 tokens, max: 71 tokens | min: 17 tokens, mean: 119.32 tokens, max: 413 tokens |

- Samples:

| sentence_0 | sentence_1 |
|---|---|
| How does plastic debris from land-based sources impact the ocean, particularly in the context of First Long Beach, China? | Plastic debris from land-based sources can significantly impact the ocean, as seen in the study conducted at First Long Beach (FLB), China. The study found that plastic debris amounts ranged from 2 to 82 particles per square meter on this marine sand beach. The most common size of plastics was 0.5–2.5 cm (44.4%), and the most common color was white (60.9%). The most abundant shape of plastic debris was fragments (76.2%). The amount of plastic debris varied significantly between different transects along the land-based source input zone due to the impacts of wind, ocean currents, and waves. Land-based wastewater discharge was identified as a major source of plastic debris on FLB, influenced by coastal water tide variations. Reduction strategies should focus on tracing and managing these land-based sources to mitigate the impact of plastic debris on the ocean. |
| How does the concentration of SO2 in urban areas of Nanjing correlate with the normalized difference vegetation index (NDVI), and what does this imply for public health? | The concentration of SO2 in urban areas of Nanjing exhibits a strong correlation (coefficient of determination, R2 > 0.5) with the normalized difference vegetation index (NDVI) within a radial distance of 2 km from the air pollutant monitoring sites. This indicates that NDVI can be an effective indicator for assessing the distribution and concentrations of air pollutants such as SO2. Negative correlations between NDVI and socio-economic indicators are observed under relatively consistent natural conditions, including climate and terrain. Therefore, the spatiotemporal distribution patterns of NDVI can provide valuable insights not only into socio-economic growth but also into the levels and locations of air pollution concentrations, which is crucial for public health interventions and policies. |
| How has the rise of user-generated geodata impacted the role of traditional map producers? | The rise of user-generated geodata has transformed ordinary citizens into neogeographers, blurring the boundaries between traditional map producers, such as national mapping agencies and local authorities, and citizens as consumers of this information. Citizens now actively participate in mapping different types of features on the Earth’s surface as volunteers, either by providing observations on the ground or tracing data from other sources, such as aerial photographs or satellite imagery. This has resulted in a significant increase in the availability of rich spatial datasets, which are often openly accessible through platforms like OpenStreetMap (OSM) and Ushahidi. |

- Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
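As a rough illustration of what this loss computes, the sketch below treats each `sentence_1` in a batch as the positive for its own `sentence_0` and as a negative for every other one. It scales the cosine similarities by 20.0 and applies cross-entropy. It is a simplified pure-Python rendering with hypothetical 2-dimensional embeddings, not the code from `sentence_transformers.losses.MultipleNegativesRankingLoss`.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings for a batch of two (question, answer) pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each answer is close to its own question
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each answer is close to the wrong question

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)  # near zero vs. large
```

Since every other item in the batch serves as a negative, the batch size (8 here) determines how many negatives each pair is contrasted against.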
Training Hyperparameters
Non-Default Hyperparameters
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
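With `lr_scheduler_type: linear`, `warmup_steps: 0`, and `learning_rate: 5e-05`, the learning rate should decay linearly from 5e-05 to zero over the run. The sketch below shows that shape. It assumes training on a single device, so that one epoch of 41,432 samples at batch size 8 takes 5,179 optimizer steps. The function is an illustrative re-derivation, not the scheduler code from Transformers.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: single device, batch size 8, one epoch over 41,432 samples.
total_steps = 41432 // 8  # 5179

for step in (0, 2500, 5000, total_steps):
    print(step, linear_lr(step, total_steps))
```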
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0965 | 500 | 0.012 |
| 0.1931 | 1000 | 0.006 |
| 0.2896 | 1500 | 0.0057 |
| 0.3862 | 2000 | 0.0045 |
| 0.4827 | 2500 | 0.0024 |
| 0.5793 | 3000 | 0.0013 |
| 0.6758 | 3500 | 0.0025 |
| 0.7723 | 4000 | 0.0029 |
| 0.8689 | 4500 | 0.0012 |
| 0.9654 | 5000 | 0.0004 |
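The Epoch column can be cross-checked against the dataset size and batch size reported above. Assuming a single training device and `gradient_accumulation_steps: 1`, one epoch is 41,432 / 8 = 5,179 steps, and each logged step divided by that count reproduces the listed epoch fractions:

```python
import math

num_samples = 41432  # training samples reported above
batch_size = 8       # per_device_train_batch_size; single device assumed

steps_per_epoch = math.ceil(num_samples / batch_size)

# Steps at which the training loss was logged (every 500 steps).
logged_steps = range(500, 5001, 500)
epoch_fractions = [round(step / steps_per_epoch, 4) for step in logged_steps]

print(steps_per_epoch)
# 5179
print(epoch_fractions)
# [0.0965, 0.1931, 0.2896, 0.3862, 0.4827, 0.5793, 0.6758, 0.7723, 0.8689, 0.9654]
```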
Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}