metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4338
- loss:CosineSimilarityLoss
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: >-
What are the main climatic factors influencing water level fluctuations in
lakes, particularly in semi-arid regions?
sentences:
- >-
The main climatic factors influencing water level fluctuations in lakes
in semi-arid regions include potential evapotranspiration,
precipitation, temperature, and vapor pressure.
- >-
Bias correction improves the accuracy of satellite precipitation data,
enhancing its effectiveness in streamflow simulation.
- >-
Climate change is associated with an increase in the frequency and
intensity of extreme rainfall events, although regional variations can
complicate the detection of consistent trends.
- source_sentence: What is the purpose of the WATYIELD model in hydrology?
sentences:
- >-
Different precipitation datasets can lead to significant variations in
the simulation of blue and green water resources, impacting water
resource assessment and management.
- >-
The WATYIELD model quantifies the impact of land use changes on stream
discharge, facilitating predictions based on alterations in vegetation
cover.
- >-
Antecedent wetness conditions influence the timing and magnitude of DOC
mobilization, with wetter conditions leading to faster and higher DOC
export compared to drier conditions, which cause delays and reduced
export.
- source_sentence: >-
How does deep groundwater discharge influence solute budgets in
mountainous watersheds?
sentences:
- >-
Deep groundwater discharge contributes significant solute loads to
streams, affecting water quality and ecological health.
- >-
Strategies include adaptive cooperation, information sharing, water
conservation, development of alternative water sources, and flexible
water allocation policies.
- >-
Groundwater storage depletion can be influenced by land use changes,
groundwater abstraction, and decreases in precipitation due to climate
change.
- source_sentence: >-
How can uncertainty in predictive modeling of seawater intrusion be
effectively quantified and managed in coastal aquifers?
sentences:
- >-
By employing optimized sampling strategies and methods like Null Space
Monte Carlo to explore parameter spaces while integrating diverse
measurement data.
- >-
Factors include operational costs, potential losses from dam breaches,
benefits provided by the dam, and social impacts on local communities.
- >-
The relative permeability is influenced by phase saturation, wettability
conditions, capillary number, and the interfacial area between the two
fluids.
- source_sentence: What is the relationship between groundwater and streamflow?
sentences:
- >-
Long-chain alkanes and their stable hydrogen isotopes reflect variations
in vegetation types and moisture sources, providing insights into
historical precipitation patterns and climatic conditions.
- >-
A floating vegetation canopy alters flow dynamics and increases near-bed
turbulent kinetic energy, which can lead to sediment resuspension and
reduced deposition beneath the canopy.
- >-
Groundwater can sustain streamflow during dry periods, while streams can
also contribute water back to groundwater through infiltration.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
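The Pooling and Normalize stages listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the random token embeddings are assumptions for illustration, standing in for the output of the Transformer stage.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token vectors over non-padding positions, then L2-normalize.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # padding contributes nothing
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts                            # mean over real tokens only
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length sentence vectors

# Hypothetical stand-in for the Transformer output: 2 sentences, 5 tokens, 384 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
sentence_embeddings = mean_pool_and_normalize(tokens, mask)
print(sentence_embeddings.shape)
```

Because the outputs have unit length, the dot product of two sentence embeddings equals their cosine similarity.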
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("HydroEmbed/HydroEmbed-OpenQA-MiniLM-DualLoss")
# Run inference
sentences = [
'What is the relationship between groundwater and streamflow?',
'Groundwater can sustain streamflow during dry periods, while streams can also contribute water back to groundwater through infiltration.',
'A floating vegetation canopy alters flow dynamics and increases near-bed turbulent kinetic energy, which can lead to sediment resuspension and reduced deposition beneath the canopy.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
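For semantic search, the embeddings are typically used to rank candidate passages against a query. The following is a minimal sketch using NumPy on toy 3-dimensional vectors, so that it runs without downloading the model. The helper name `rank_by_cosine` and the toy vectors are assumptions for illustration, not part of the library.

```python
import numpy as np

def rank_by_cosine(query_embedding, candidate_embeddings):
    """Return candidate indices sorted from most to least similar, with their scores."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity of each candidate to the query
    order = np.argsort(-scores)         # descending similarity
    return order, scores[order]

# Toy vectors standing in for real 384-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.5, 0.5, 0.0],   # in between
])
order, scores = rank_by_cosine(query, candidates)
print(order)  # [1 2 0]
```

With the model loaded as in the snippet above, the first row of `model.encode(sentences)` could serve as the query and the remaining rows as candidates.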
Training Details
Training Datasets
Unnamed Dataset
- Size: 2,169 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 11 tokens; mean: 23.44 tokens; max: 45 tokens | min: 16 tokens; mean: 33.55 tokens; max: 71 tokens | min: 1.0; mean: 1.0; max: 1.0 |

- Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| How can deep learning technologies improve the identification and management of unregulated private pumping wells in groundwater systems? | Deep learning technologies can accurately detect and map private pumping wells using image data, enhancing groundwater management by providing spatial distribution insights and reducing the labor-intensive nature of traditional investigations. | 1.0 |
| How does solar-induced chlorophyll fluorescence relate to vegetation transpiration across different land cover types and environmental conditions? | Solar-induced chlorophyll fluorescence exhibits a robust linear correlation with vegetation transpiration, which is influenced by land cover types and various environmental factors, showing higher sensitivity in C4 compared to C3 vegetation. | 1.0 |
| How does soil salinity affect the accuracy of soil moisture measurements from different sensing technologies and satellite products? | Soil salinity introduces significant errors in dielectric-based soil moisture measurements, with L-band products being more affected than C-band products. | 1.0 |

- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
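With `MSELoss` as the loss function, CosineSimilarityLoss penalizes the squared difference between the cosine similarity of each sentence pair and its label. The following NumPy sketch illustrates that computation under stated assumptions: the function name `cosine_similarity_mse` and the toy vectors are hypothetical, and NumPy is swapped in for the PyTorch modules used in training.

```python
import numpy as np

def cosine_similarity_mse(emb_a, emb_b, labels):
    """Mean squared error between pairwise cosine similarity and the target label."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)                 # one similarity per (sentence_0, sentence_1) pair
    return float(np.mean((cos - labels) ** 2))

# Toy pairs: the first is identical (cos = 1), the second is orthogonal (cos = 0).
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[1.0, 0.0], [1.0, 0.0]])
labels = np.array([1.0, 1.0])                 # all sampled labels in this dataset are 1.0
loss = cosine_similarity_mse(emb_a, emb_b, labels)
print(loss)  # 0.5
```

Since the statistics above report a label of 1.0 throughout the sampled rows, this objective mainly pulls each question toward its paired answer.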
Unnamed Dataset
- Size: 2,169 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 11 tokens; mean: 23.58 tokens; max: 47 tokens | min: 15 tokens; mean: 33.32 tokens; max: 63 tokens | min: 1.0; mean: 1.0; max: 1.0 |

- Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| How does climate change impact agricultural water supply and demand in arid and semi-arid regions? | Climate change exacerbates agricultural water scarcity by increasing evaporation rates and altering precipitation patterns, leading to a higher agricultural water demand while potentially reducing the available water supply. | 1.0 |
| How do changes in land use and climate affect river discharge dynamics in Mediterranean catchments? | Changes in land use and climate primarily influence river discharge dynamics by altering vegetation cover and its associated water consumption, leading to significant reductions in discharge despite minor changes in precipitation. | 1.0 |
| Why is it important to regularly update rating curves in hydrological studies? | Regular updates ensure that changes in river bed profiles or other environmental factors are accurately reflected in discharge estimations. | 1.0 |

- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
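MultipleNegativesRankingLoss treats every other answer in the batch as a negative for a given question: the scaled cosine similarities form a logit matrix, and cross-entropy pushes the diagonal (true pairs) above the off-diagonal entries. The sketch below illustrates this with the `scale` of 20.0 and cosine similarity listed above. The function name `mnr_loss` and the toy vectors are assumptions, and NumPy stands in for the PyTorch implementation.

```python
import numpy as np

def mnr_loss(anchor_emb, positive_emb, scale=20.0):
    """Cross-entropy over in-batch candidates; row i's correct answer is column i."""
    a = anchor_emb / np.linalg.norm(anchor_emb, axis=1, keepdims=True)
    p = positive_emb / np.linalg.norm(positive_emb, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                              # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch of two question/answer pairs.
questions = np.array([[1.0, 0.0], [0.0, 1.0]])
answers_aligned = np.array([[1.0, 0.0], [0.0, 1.0]])   # each answer matches its question
answers_swapped = np.array([[0.0, 1.0], [1.0, 0.0]])   # each answer matches the other question

loss_aligned = mnr_loss(questions, answers_aligned)
loss_swapped = mnr_loss(questions, answers_swapped)
print(loss_aligned < loss_swapped)  # True
```

A larger batch supplies more in-batch negatives per question, which is one reason batch size matters for this loss.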
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 20
- fp16: True
- multi_dataset_batch_sampler: round_robin
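The `round_robin` setting means batches are drawn from the two training datasets in alternation, so each optimizer step uses one dataset and its associated loss. The generator below is a simplified sketch of that idea, assuming sampling stops as soon as one dataset runs out of batches. The name `round_robin_batches` and the placeholder batch labels are hypothetical, and this is not the library's sampler.

```python
def round_robin_batches(*batch_lists):
    """Yield one batch from each dataset in turn; stop when any dataset is exhausted."""
    iterators = [iter(batches) for batches in batch_lists]
    while True:
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                return

# Placeholder batches: "cos" for the CosineSimilarityLoss dataset,
# "mnr" for the MultipleNegativesRankingLoss dataset.
cosine_batches = ["cos-0", "cos-1"]
ranking_batches = ["mnr-0", "mnr-1"]
schedule = list(round_robin_batches(cosine_batches, ranking_batches))
print(schedule)  # ['cos-0', 'mnr-0', 'cos-1', 'mnr-1']
```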
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 20
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 7.3529 | 500 | 0.094 |
| 14.7059 | 1000 | 0.0339 |
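The fractional epoch values in the log can be cross-checked against the dataset sizes and batch size reported above. The arithmetic below assumes a single training device (not stated in this card), no dropped final batch (`dataloader_drop_last: False`), and no gradient accumulation (`gradient_accumulation_steps: 1`). Under those assumptions it reproduces the logged values, which suggests, but does not confirm, 68 optimizer steps per epoch.

```python
import math

samples_per_dataset = 2169   # size of each training dataset
batch_size = 64              # per_device_train_batch_size
num_datasets = 2             # one per loss
num_epochs = 20              # num_train_epochs

batches_per_dataset = math.ceil(samples_per_dataset / batch_size)  # 34
steps_per_epoch = batches_per_dataset * num_datasets               # 68
total_steps = steps_per_epoch * num_epochs                         # 1360

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(epoch_at(500), epoch_at(1000))  # 7.3529 14.7059
```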
Framework Versions
- Python: 3.11.1
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.7.0+cu118
- Accelerate: 1.6.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}