SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/final_golden_rag_tuned_minilm_100")
# Run inference
sentences = [
    "How does RapidFire AI's shard-based adaptive execution engine enable online aggregation of eval metrics with confidence intervals, and what specific mathematical strategies are available for computing those intervals?",
    '**online_strategy_kwargs** : dict[str, Any], optional\n\tParameters for evals online aggregation strategy. The dictionary must include the following keys:\n\t\n\t* :code:`"strategy_name"` (str) - Must be :code:`"normal"`, :code:`"wilson"`, or :code:`"hoeffding"`.\n\t* :code:`"confidence_level"` (float) - Confidence level for confidence intervals on metrics. Must be in [0,1]. Default is 0.95 (95%).\n\t* :code:`"use_fpc"` (bool) - Whether to apply finite population correction. Default is :code:`True`.',
    'Confidence Intervals\n--------------------\n\nThe data points in the evals dataset are **assigned to shards uniformly randomly**, i.e., \nRapidFire AI performs sampling without replacement. \nBased on that, it supports 3 strategies to calculate confidence intervals for projected estimates of metrics. \nYou can indicate the confidence level (we recommend 95%) and whether to perform "finite population correction" (FPC) or not. \nThese values can be specified under the key :code:`"online_strategy_kwargs"` in your config dictionary as illustrated below.\n\n.. code-block:: python\n\n    # Based on FiQA RAG tutorial use case\n    "online_strategy_kwargs": {\n        "strategy_name": "normal",\n        "confidence_level": 0.95,\n        "use_fpc": True,\n    },',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9646, 0.3418],
#         [0.9646, 1.0000, 0.4140],
#         [0.3418, 0.4140, 1.0000]])

Training Details

Training Dataset

Unnamed Dataset

  • Size: 208 training samples

  • Columns: sentence_0, sentence_1, and label

  • Approximate statistics based on the first 208 samples:

    sentence_0 sentence_1 label
    type string string float
    details
    • min: 25 tokens
    • mean: 38.85 tokens
    • max: 70 tokens
    • min: 58 tokens
    • mean: 223.85 tokens
    • max: 256 tokens
    • min: 0.0
    • mean: 0.25
    • max: 1.0
  • Samples: | sentence_0 | sentence_1 | label | |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------| | What are all the Experiment class methods (experiment ops) provided by RapidFire AI, and what does each one do? | API: User-Provided Functions for Run Evals
    ===============

    Users can provide the following custom functions as part of their eval config to be used in :func:run_evals().
    Note that each leaf config can have its own set of functions for all of these.


    Preprocess Function
    -------------------

    Mandatory user-provided function to prepare the inputs to be given to the generator model.
    It is invoked for each batch during the evaluation process before generation.
    Pass it directly to the :code:preprocess_fn key in your eval config dictionary.

    The system injects into this function the batch data, as well as the RAG spec and
    the prompt manager of an individual leaf config.


    .. py:function:: preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list]

    :param batch: Dictionary with a batch of examples with dataset field names as keys and lists as values
    | 0.0 | | What are all the Experiment class methods (experiment ops) provided by RapidFire AI, and what does each one do? | Preprocess Function

    Mandatory user-provided function to prepare the inputs to be given to the generator model. It is invoked for each batch during the evaluation process before generation. Pass it directly to the :code:preprocess_fn key in your eval config dictionary.

    The system injects into this function the batch data, as well as the RAG spec and the prompt manager of an individual leaf config.

    .. py:function:: preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list]

    :param batch: Dictionary with a batch of examples with dataset field names as keys and lists as values :type batch: dict[str, list]

    :param rag: RAG specification object for document chunk retrieval and context serialization :type rag: RFLangChainRagSpec

    :param prompt_manager: Prompt manager object for handling instructions and few-shot examples :type prompt_manager: RFPromptManager

    :return: Dictionary with the p... | 0.0 | | How does the Wilson Score confidence interval strategy work for online aggregation in RapidFire AI evals, and when should it be preferred over the Normal Approximation? | Run Evals

    The main function to launch LLM evaluation (evals), including with optional RAG, for a given config group in one go. See :doc:the Multi-Config Specification page</configs> for more details on how to construct a config group.

    .. py:function:: run_evals(self, config_group: Any, dataset: Dataset, num_shards: int=4, num_actors: int, seed: int=42) -> dict[int, tuple[dict, dict]]:

    :param config_group: Single evals config knob dictionary, a generated config group, or a :code:`list` of configs or config groups
    :type config_group: Evals config-group or list as described in :doc:`the Multi-Config Specification page</configs>`
    
    :param dataset: Evaluation dataset to measure eval metrics
    :type dataset: Dataset
    
    :param num_shards: Number of logical splits of data to control degree of concurrency for multi-config execution (recommended: at least 4)
    :type num_shards: int
    
    :param num_actors: Number of parallel worker processes per machine to control degree of concurrency...</code> | <code>0.0</code> |
    
  • Loss: ContrastiveLoss with these parameters:

    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 100
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

Click to expand
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
38.4615 500 0.0037
76.9231 1000 0.0003

Training Time

  • Training: 3.8 minutes

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}
Downloads last month
6
Safetensors
Model size
22.7M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ronit01/final_golden_rag_tuned_minilm_100

Paper for ronit01/final_golden_rag_tuned_minilm_100