agri-bot-model-v2 / README.md (uploaded by goldenevil with huggingface_hub, commit 3d08350)
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:254
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: How long should I let my potatoes 'cork over' in storage to prevent rot?
    sentences:
      - >-
        If fruit sets on the vine but begins to show small, light brown spots on
        the blossom end of the fruit that turn leathery, the problem may be
        "blossom end rot."
      - >-
        Deterioration is rapid since young fruits dry out quickly in storage and
        are quite sensitive to chilling injury.
      - >-
        Properly suberize potatoes by initial storage at high humidity with good
        ventilation (no wet surfaces) at 50-55°F for 10-14 days.
  - source_sentence: >-
      What's the difference between all these types of sweet corn like 'sugary'
      and 'supersweet'?
    sentences:
      - >-
        If only a few bees are present in the area, partial pollination may
        occur, resulting in misshapen fruit and low yield.
      - >-
        Sweet corn varieties are categorized by their genotypes. The most common
        varieties are: Normal or sugary (su)... Sugar enhanced (se)...
        Supersweet or shrunken (sh2)
      - >-
        Sprinkler irrigation is not recommended when growing squash, as it won't
        provide deep water for the plants and may even encourage some diseases.
  - source_sentence: I'm thinking of using a row cover on my corn. What are the perks?
    sentences:
      - >-
        Floating row covers allow the use of standard row spacing, pose less
        danger of plant injury from high temperatures, are easier to use, and
        allow for the reuse of row covers for several seasons.
      - >-
        Row cover cloth can be laid directly on the plants and left on during
        establishment
      - >-
        The use of copper-based fungicides with or without mancozeb is
        recommended after hail events.
  - source_sentence: >-
      I see powdery pustules on the underside of my groundnut leaves, what is
      it?
    sentences:
      - >-
        The leaflets exhibit large number of small powdery pustules on the lower
        surface. Correspondingly the upper surface shows yellow discolouration
        which later turns brown.
      - >-
        In order to get sour rot, you need a wounded grape, a yeast to ferment
        the sugars and generate ethanol, acetic acid bacteria to convert that
        ethanol into vinegar, and fruit flies.
      - >-
        Other plants that late blight may infect include petunia, nightshades,
        and tomatillos.
  - source_sentence: How can I tell the difference between a squash bug and a stink bug?
    sentences:
      - >-
        Squash bugs and stink bugs are similar in shape, and both have
        disagreeable odors when crushed or disturbed. Generally, stink bugs are
        wider and rounder than squash bugs.
      - >-
        Symptoms: Purpling of older leaves, usually on young plants. Causes:
        Acid and cold soils.
      - >-
        When bees are absent, fruit set on garden plants in the cucurbit family
        is very poor and often nonexistent.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on 254 agricultural question-answer pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
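The similarity function listed above is cosine similarity. As a rough illustration in plain NumPy (not the library's own implementation), the pairwise score matrix that model.similarity returns can be computed as follows; the toy 3-dimensional vectors are made up and stand in for real 768-dimensional embeddings.

```python
import numpy as np


def cos_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    # Scale every row to unit length; the epsilon guards against zero vectors.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # Entry [i, j] is the cosine of the angle between a[i] and b[j].
    return a_unit @ b_unit.T


# Hypothetical toy embeddings, not produced by this model.
emb = np.array([[1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 2.0]])
scores = cos_sim_matrix(emb, emb)
print(np.round(scores, 4))  # diagonal is 1.0, as in the usage example's output
```

Like the tensor printed in the usage example, the matrix is symmetric with ones on the diagonal, because every vector has cosine similarity 1 with itself.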

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
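The architecture printout shows that the sentence embedding is the output vector of the [CLS] token (pooling_mode_cls_token: True), which Normalize() then scales to unit length. A NumPy sketch of those two steps follows; the random array is an assumption standing in for real BertModel hidden states.

```python
import numpy as np

HIDDEN_SIZE = 768  # word_embedding_dimension from the Pooling module


def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic modules (1) and (2): take the [CLS] vector, then L2-normalize it.

    token_embeddings has shape (batch, seq_len, hidden); position 0 holds [CLS].
    """
    cls_vectors = token_embeddings[:, 0, :]          # Pooling with CLS-token mode
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.maximum(norms, 1e-12)    # Normalize()


# Random stand-in for the transformer output of 2 sentences, 5 tokens each.
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(2, 5, HIDDEN_SIZE))
sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)

print(sentence_embeddings.shape)                    # (2, 768)
print(np.linalg.norm(sentence_embeddings, axis=1))  # both approximately 1.0
# With unit-length vectors, a plain dot product equals cosine similarity.
print(float(sentence_embeddings[0] @ sentence_embeddings[1]))
```

Because of the Normalize() step, downstream code can use a dot product instead of an explicit cosine computation and obtain the same scores.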

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder with this model's repository id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I tell the difference between a squash bug and a stink bug?',
    'Squash bugs and stink bugs are similar in shape, and both have disagreeable odors when crushed or disturbed. Generally, stink bugs are wider and rounder than squash bugs.',
    'Symptoms: Purpling of older leaves, usually on young plants. Causes: Acid and cold soils.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8290, 0.2432],
#         [0.8290, 1.0000, 0.2091],
#         [0.2432, 0.2091, 1.0000]])
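
For semantic search, such as matching a user's gardening question to a stored answer passage, the usual pattern is to encode the question and the candidate passages, then sort the passages by similarity. A minimal NumPy sketch of that ranking step follows, assuming the embeddings were already produced by model.encode (the 2-dimensional unit vectors here are hypothetical). The sentence_transformers.util.semantic_search function offers a library version of the same idea.

```python
import numpy as np


def top_k_passages(query_emb: np.ndarray, passage_embs: np.ndarray, k: int = 2):
    """Return (index, score) pairs for the k passages most similar to the query.

    Assumes the embeddings are already unit-length, as this model's Normalize()
    module produces, so a dot product equals cosine similarity.
    """
    scores = passage_embs @ query_emb          # shape: (num_passages,)
    k = min(k, len(scores))
    best = np.argsort(-scores)[:k]             # highest score first
    return [(int(i), float(scores[i])) for i in best]


# Hypothetical unit vectors standing in for model.encode(...) output.
query = np.array([1.0, 0.0])
passages = np.array([[0.0, 1.0],
                     [0.8, 0.6],
                     [0.6, 0.8]])
print(top_k_passages(query, passages, k=2))    # [(1, 0.8), (2, 0.6)]
```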

Training Details

Training Dataset

Unnamed Dataset

  • Size: 254 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 254 samples:
    column      type    min        mean          max
    sentence_0  string  8 tokens   17.12 tokens  31 tokens
    sentence_1  string  13 tokens  34.21 tokens  91 tokens
  • Samples:
    (1) sentence_0: Why are my melons misshapen and the yield is low?
        sentence_1: If only a few bees are present in the area, partial pollination may occur, resulting in misshapen fruit and low yield.
    (2) sentence_0: What's the best way to keep these bugs out of my garden next year?
        sentence_1: The best cultural strategy for squash bug control is prevention through sanitation. Remove old cucurbit plants after harvest. Keep the garden free from rubbish and debris that can provide overwintering sites for squash bugs.
    (3) sentence_0: Why is it so hard to kill these bugs with sprays?
        sentence_1: Squash bugs are difficult to kill using insecticides because egg masses, nymphs, and bugs are often hidden near the crown of the plant and difficult to reach with sprays.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
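
MultipleNegativesRankingLoss treats sentence_1 of each pair as the positive for its sentence_0 and every other sentence_1 in the same batch as a negative, so with per_device_train_batch_size 16 each anchor sees up to 15 in-batch negatives. The NumPy sketch below shows the forward computation with the listed scale of 20.0 and cosine similarity. It is a simplified illustration (no gradients, no gather_across_devices), not the library's source, and the toy data is random.

```python
import numpy as np

SCALE = 20.0  # "scale" from the loss parameters above


def multiple_negatives_ranking_loss(anchors: np.ndarray, positives: np.ndarray,
                                    scale: float = SCALE) -> float:
    """In-batch-negatives loss for (sentence_0, sentence_1) embedding pairs.

    Row i of `anchors` should match row i of `positives`; every other
    positive in the batch acts as a negative for anchor i.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                 # scaled cosine similarities, (n, n)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i.
    n = len(anchors)
    return float(-log_probs[np.arange(n), np.arange(n)].mean())


rng = np.random.default_rng(0)
questions = rng.normal(size=(4, 8))
matched_answers = questions + 0.05 * rng.normal(size=(4, 8))  # close to their question
shuffled_answers = matched_answers[[1, 2, 3, 0]]               # pairs deliberately broken

print(multiple_negatives_ranking_loss(questions, matched_answers))   # small
print(multiple_negatives_ranking_loss(questions, shuffled_answers))  # much larger
```

The loss is low when each question is closest to its own answer and rises when pairs are mismatched, which is what pushes matching question and answer embeddings together during fine-tuning.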
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
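
With 254 training pairs, per_device_train_batch_size 16, gradient_accumulation_steps 1, num_train_epochs 3, learning_rate 5e-05, warmup_steps 0 and lr_scheduler_type linear, the run should take 16 optimizer steps per epoch (the last batch holds 14 pairs because dataloader_drop_last is False), 48 in total, with the learning rate decaying linearly from 5e-05 to 0. The arithmetic is sketched below, assuming a single device and the linear-decay formula used by transformers' get_linear_schedule_with_warmup; the card itself does not report the step count.

```python
import math

# Values taken from the hyperparameter lists above.
NUM_SAMPLES = 254
BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 1
NUM_EPOCHS = 3
PEAK_LR = 5e-05
WARMUP_STEPS = 0


def total_optimizer_steps() -> int:
    # dataloader_drop_last is False, so the final partial batch is kept.
    batches_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
    updates_per_epoch = max(batches_per_epoch // GRAD_ACCUM_STEPS, 1)
    return updates_per_epoch * NUM_EPOCHS


def linear_lr(step: int, total: int) -> float:
    """Linear warmup (none in this run) followed by linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (total - step) / max(1, total - WARMUP_STEPS))


total = total_optimizer_steps()
print(total)  # 48
for step in (0, total // 2, total):
    print(step, linear_lr(step, total))
```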

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
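
To see whether a local environment matches the one this model was trained with, a small standard-library sketch can compare installed package versions against the list above using importlib.metadata. A difference does not necessarily prevent the model from loading; this is only a diagnostic, and it compares release numbers only, since local build tags such as "+cu126" depend on which PyTorch wheel was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework Versions" (PyPI distribution names).
TRAINED_WITH = {
    "sentence-transformers": "5.1.2",
    "transformers": "4.57.1",
    "torch": "2.8.0+cu126",
    "accelerate": "1.11.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(TRAINED_WITH).items():
    # Compare release numbers only, ignoring any "+local" build tag.
    same = installed is not None and installed.split("+")[0] == wanted.split("+")[0]
    status = "match" if same else "differs"
    print(f"{package}: trained with {wanted}, installed {installed} ({status})")
```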

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}