metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:254
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: How long should I let my potatoes 'cork over' in storage to prevent rot?
sentences:
- >-
If fruit sets on the vine but begins to show small, light brown spots on
the blossom end of the fruit that turn leathery, the problem may be
"blossom end rot."
- >-
Deterioration is rapid since young fruits dry out quickly in storage and
are quite sensitive to chilling injury.
- >-
Properly suberize potatoes by initial storage at high humidity with good
ventilation (no wet surfaces) at 50-55°F for 10-14 days. [cite: 1873]
- source_sentence: >-
What's the difference between all these types of sweet corn like 'sugary'
and 'supersweet'?
sentences:
- >-
If only a few bees are present in the area, partial pollination may
occur, resulting in misshapen fruit and low yield.
- >-
Sweet corn varieties are categorized by their genotypes. The most common
varieties are: Normal or sugary (su)... Sugar enhanced (se)...
Supersweet or shrunken (sh2)
- >-
Sprinkler irrigation is not recommended when growing squash, as it won't
provide deep water for the plants and may even encourage some diseases.
- source_sentence: I'm thinking of using a row cover on my corn. What are the perks?
sentences:
- >-
Floating row covers allow the use of standard row spacing, pose less
danger of plant injury from high temperatures, are easier to use, and
allow for the reuse of row covers for several seasons.
- >-
Row cover cloth can be laid directly on the plants and left on during
establishment
- >-
The use of copper-based fungicides with or without mancozeb is
recommended after hail events. [cite: 1838]
- source_sentence: >-
I see powdery pustules on the underside of my groundnut leaves, what is
it?
sentences:
- >-
The leaflets exhibit large number of small powdery pustules on the lower
surface. Correspondingly the upper surface shows yellow discolouration
which later turns brown.
- >-
In order to get sour rot, you need a wounded grape, a yeast to ferment
the sugars and generate ethanol, acetic acid bacteria to convert that
ethanol into vinegar, and fruit flies.
- >-
Other plants that late blight may infect include petunia, nightshades,
and tomatillos. [cite: 2071]
- source_sentence: How can I tell the difference between a squash bug and a stink bug?
sentences:
- >-
Squash bugs and stink bugs are similar in shape, and both have
disagreeable odors when crushed or disturbed. Generally, stink bugs are
wider and rounder than squash bugs.
- >-
Symptoms: Purpling of older leaves, usually on young plants. Causes:
Acid and cold soils.
- >-
When bees are absent, fruit set on garden plants in the cucurbit family
is very poor and often nonexistent.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
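The architecture above reads as three steps: the BERT encoder produces one vector per token, the pooling layer keeps only the first ([CLS]) token vector (pooling_mode_cls_token is True), and Normalize() rescales that vector to unit length. Below is a minimal pure-Python sketch of the last two steps. It uses made-up 4-dimensional token vectors rather than real 768-dimensional model outputs, and the function names are illustrative, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Keep only the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy token embeddings: 3 tokens, 4 dimensions each (hypothetical values).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
# [0.6, 0.0, 0.8, 0.0]
```

Because every output vector has unit length, the cosine similarity between two embeddings equals their dot product.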
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I tell the difference between a squash bug and a stink bug?',
    'Squash bugs and stink bugs are similar in shape, and both have disagreeable odors when crushed or disturbed. Generally, stink bugs are wider and rounder than squash bugs.',
    'Symptoms: Purpling of older leaves, usually on young plants. Causes: Acid and cold soils.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8290, 0.2432],
# [0.8290, 1.0000, 0.2091],
# [0.2432, 0.2091, 1.0000]])
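For retrieval-style use, the first sentence can be treated as a query and the remaining ones as candidate passages. The sketch below ranks the candidates using the query row of the similarity matrix printed above. The scores are copied from that output, the passage labels are shorthand, and `rank_candidates` is an illustrative helper rather than a library function.

```python
def rank_candidates(query_scores, candidates):
    """Return (candidate, score) pairs sorted from most to least similar."""
    paired = list(zip(candidates, query_scores))
    return sorted(paired, key=lambda pair: pair[1], reverse=True)

# Similarity of the query (sentence 0) to sentences 1 and 2, taken from the
# first row of the matrix printed above.
query_scores = [0.8290, 0.2432]
candidates = ["squash bug vs. stink bug passage", "leaf purpling passage"]

ranking = rank_candidates(query_scores, candidates)
for passage, score in ranking:
    print(f"{score:.4f}  {passage}")
# 0.8290  squash bug vs. stink bug passage
# 0.2432  leaf purpling passage
```

The passage that actually answers the question scores far higher than the unrelated one, which is the behaviour the fine-tuning aims for.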
Training Details
Training Dataset
Unnamed Dataset
- Size: 254 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 254 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 8 tokens, mean: 17.12 tokens, max: 31 tokens | min: 13 tokens, mean: 34.21 tokens, max: 91 tokens |

- Samples:

| sentence_0 | sentence_1 |
|---|---|
| Why are my melons misshapen and the yield is low? | If only a few bees are present in the area, partial pollination may occur, resulting in misshapen fruit and low yield. |
| What's the best way to keep these bugs out of my garden next year? | The best cultural strategy for squash bug control is prevention through sanitation. Remove old cucurbit plants after harvest. Keep the garden free from rubbish and debris that can provide overwintering sites for squash bugs. |
| Why is it so hard to kill these bugs with sprays? | Squash bugs are difficult to kill using insecticides because egg masses, nymphs, and bugs are often hidden near the crown of the plant and difficult to reach with sprays. |

- Loss: MultipleNegativesRankingLoss with these parameters:
{
  "scale": 20.0,
  "similarity_fct": "cos_sim",
  "gather_across_devices": false
}
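With these parameters, MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative. Cosine similarities are multiplied by the scale (20.0) and passed through a cross-entropy loss whose target is the matching pair. Below is a minimal pure-Python sketch of that computation on toy 2-dimensional vectors. The vectors and function names are illustrative, and the library's own implementation works on batched tensors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct 'class' for anchors[i] is positives[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy batch of two pairs (hypothetical embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is near zero when each anchor is closest to its own positive and large when the pairs are mismatched. With a batch size of 16, each anchor sees up to 15 in-batch negatives per step.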
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
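The settings above imply a very short training run. Assuming a single device and the 254 training samples reported earlier, a batch size of 16 with dataloader_drop_last: False gives 16 optimizer steps per epoch (the last batch holds 14 samples), or 48 steps over 3 epochs. With lr_scheduler_type: linear and no warmup, the learning rate decays linearly from 5e-05 to 0 over those steps. The arithmetic can be sketched as follows. The helper name is illustrative, and the step counts are derived from the listed values, not read from a training log.

```python
import math

num_samples = 254   # training set size reported in this card
batch_size = 16     # per_device_train_batch_size (single device assumed)
num_epochs = 3      # num_train_epochs
base_lr = 5e-05     # learning_rate

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

def linear_lr(step, total=total_steps, lr=base_lr):
    """Learning rate after `step` optimizer steps: linear decay, no warmup."""
    return lr * max(0.0, (total - step) / total)

print(steps_per_epoch, total_steps, last_batch_size)
# 16 48 14
print(linear_lr(0), linear_lr(24), linear_lr(48))
```

At the halfway point (step 24) the learning rate is half its initial value, and it reaches zero at step 48.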
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.2
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Accelerate: 1.11.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}