SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/golden_rag_tuned_minilm_10")
# Run inference
sentences = [
    "How do you set up and use Pinecone as an external vector store in RapidFire AI's RAG pipeline, including configuring create, read, and update modes?",
    'just start rapidfireai again with the above command.\n\nIf the start command fails for whatever reason, wait for half a minute and rerun it.\nFor diagnostics and common fixes (including Linux/macOS and Windows steps), see :doc:`Troubleshooting </troubleshooting>`.\n',
    'Confidence Intervals\n--------------------\n\nThe data points in the evals dataset are **assigned to shards uniformly randomly**, i.e., \nRapidFire AI performs sampling without replacement. \nBased on that, it supports 3 strategies to calculate confidence intervals for projected estimates of metrics. \nYou can indicate the confidence level (we recommend 95%) and whether to perform "finite population correction" (FPC) or not. \nThese values can be specified under the key :code:`"online_strategy_kwargs"` in your config dictionary as illustrated below.\n\n.. code-block:: python\n\n    # Based on FiQA RAG tutorial use case\n    "online_strategy_kwargs": {\n        "strategy_name": "normal",\n        "confidence_level": 0.95,\n        "use_fpc": True,\n    },\n\nNotation \n^^^^^^^\n\n* :math:`N` = Total population size (total number of queries in eval set)\n* :math:`n` = Sample size (number of queries processed so far)\n* :math:`\\hat{p}` = Observed sample proportion or average for an algebraic metric\n* :math:`\\bar{X}` = Sample mean for a distributive metric\n* :math:`\\widehat{T}` = Estimated population total for a distributive metric\n* :math:`\\text{Var}(\\widehat{T})` = Variance of the above estimated population total\n* :math:`\\text{SE}` = Standard error (measure of estimate uncertainty)\n* :math:`\\text{CI}` = Confidence interval\n* :math:`z` = Z-score for confidence level (1.96 for 95% confidence; used in Normal and Wilson)\n* :math:`\\alpha` = Significance level (0.05 for 95% confidence)\n* :math:`n_{\\text{eff}}` = Effective sample size (adjusted for FPC in Wilson)\n* :math:`a, b` = Lower and upper bounds of metric value range\n* :math:`R` = Range width, :math:`R = b - a`\n* :math:`\\varepsilon` = Margin of error (half-width of confidence interval for Hoeffding)\n* :math:`\\varepsilon_{\\bar{X}}` = Margin of error for sample mean (Hoeffding distributive)\n* :math:`\\text{FPC}` = Finite population correction factor\n\n\nFinite Population Correction (FPC)\n^^^^^^^^^^^^^^^^^^^^^^\n\nWhen sampling without replacement from finite populations, enabling FPC \nmultiplies the standard error (SE) by :math:`\\text{FPC} = \\sqrt{(N-n)/(N-1)}` \nwhere :math:`N` is population size and :math:`n` is sample size.\n\n\nNormal Approximation\n^^^^^^^^^^^^^^^^^^^\n\nThis is the default strategy, and it uses the Central Limit Theorem. \nIt is suitable for most cases with non-trivial sample sizes (n > 30). \nIt provides tight intervals when the statistical assumptions hold.\n\n* For algebraic metrics:\n\n.. math::\n\n   \\text{SE}_{\\hat{p}} = \\sqrt{\\frac{\\hat{p}(1-\\hat{p})}{n}} \\times \\text{FPC}\n\n   \\text{CI} = \\hat{p} \\pm 1.96 \\cdot \\text{SE}_{\\hat{p}}\n\n\n* For distributive metrics: \n\nEstimate population total :math:`\\widehat{T} = N\\bar{X}` with \nvariance :math:`\\text{Var}(\\widehat{T}) = N^2 \\cdot \\bar{X}(1-\\bar{X})/n` (FPC-adjusted).\n\n\nWilson Score\n^^^^^^^^^^^\n\nThis strategy is better for small sample sizes or metrics near 0/1 boundaries. \nIt is more robust than Normal Approximation for extreme proportions. \n\n* For algebraic metrics:\n\n.. math::\n\n   \\text{center} = \\frac{\\hat{p} + z^2/(2n_{\\text{eff}})}{1 + z^2/n_{\\text{eff}}}\n\n   \\text{margin} = \\frac{z\\sqrt{\\hat{p}(1-\\hat{p})/n_{\\text{eff}} + z^2/(4n_{\\text{eff}}^2)}}{1 + z^2/n_{\\text{eff}}}\n\nwhere :math:`n_{\\text{eff}} = n/\\text{FPC}^2` when using FPC. \nThe Wilson confidence interval is then :math:`[\\text{center} - \\text{margin}, \\text{center} + \\text{margin}]`,\nclamped to [0, 1].\n\n* For distributive metrics, this falls back to Normal Approximation. \n\n\n\nHoeffding Bounds\n^^^^^^^^^^^\n\nThis strategy is best for maximum safety (guaranteed coverage). It makes no distributional assumptions, \nbut that also means its intervals are typically quite loose.\n\n.. math::\n\n   \\varepsilon = (b-a)\\sqrt{\\frac{\\ln(2/\\alpha)}{2n}} \\times \\text{FPC}\n\n   \\text{CI} = [\\hat{p} - \\varepsilon, \\hat{p} + \\varepsilon]\n\nFor distributive metrics with range :math:`R=b-a`, it computes :math:`\\varepsilon_{\\bar{X}} = R\\sqrt{\\ln(2/\\alpha)/(2n)}` \nand then scales to population total.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5341, 0.3671],
#         [0.5341, 1.0000, 0.3282],
#         [0.3671, 0.3282, 1.0000]])

Training Details

Training Dataset

Unnamed Dataset

  • Size: 444 training samples

  • Columns: sentence_0, sentence_1, and label

  • Approximate statistics based on the first 444 samples:

    sentence_0 sentence_1 label
    type string string float
    details
    • min: 15 tokens
    • mean: 41.97 tokens
    • max: 70 tokens
    • min: 36 tokens
    • mean: 225.67 tokens
    • max: 256 tokens
    • min: 0.0
    • mean: 0.25
    • max: 1.0
  • Samples: | sentence_0 | sentence_1 | label | |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------| | How do you set up and run an SFT fine-tuning experiment from scratch using RapidFire AI's full installation, from installing the package through launching training and monitoring results? | online_strategy_kwargs : dict[str, Any], optional
    Parameters for evals online aggregation strategy. The dictionary must include the following keys:

    * :code:"strategy_name" (str) - Must be :code:"normal", :code:"wilson", or :code:"hoeffding".
    * :code:"confidence_level" (float) - Confidence level for confidence intervals on metrics. Must be in [0,1]. Default is 0.95 (95%).
    * :code:"use_fpc" (bool) - Whether to apply finite population correction. Default is :code:True.
    | 0.0 | | How do FAISS, Pinecone, and PGVector compare as vector store options in RapidFire AI in terms of supported modes, persistence, configuration requirements, and GPU support? | Run Fit

    The main function to launch training (including LLM fine-tuning and post-training) and evaluation for a given config group in one go. See :doc:the Multi-Config Specification page</configs> for more details on how to construct a config group.

    .. py:function:: run_fit(self, param_config: Any, create_model_fn: Callable, train_dataset: Dataset, eval_dataset: Dataset, num_chunks: int, seed: int=42, num_gpus: int) -> None:

    :param param_config: A train config knob dictionary, a generated config group, or a :code:`list` of configs or config groups
    :type param_config: Train config-group or list as described in :doc:`the Multi-Config Specification page</configs>`
    
    :param create_model_fn: User-given function to create a model instance; a single cfg is passed as input by the system
    :type create_model_fn: Callable
    
    :param train_dataset: Training dataset
    :type train_dataset: Dataset
    
    :param eval_dataset: Evaluation dataset to measure eval metrics
    :type eval_dataset: Dat...</code> | <code>0.0</code> |
    

    | How do you configure and launch a multi-config RAG evaluation experiment using run_evals(), including defining all required user-provided functions? | What Makes RapidFire AI Different?

    The crux of RapidFire AI's difference is in its adaptive execution engine: it enables "interruptible" execution of configurations across GPUs/CPUs. To do so, it first shards the training and/or evaluation dataset randomly into "chunks" (also called "shards"). Then instead of waiting for a run to see the whole dataset for all epochs (for SFT/RFT) or for full eval metrics calculation (for RAG evals), RapidFire AI schedules all runs on one shard at a time, and then cycles through all shards.

    Suppose you have only 1 GPU, say an A100 or H100, and you want to run SFT on a Llama model. Current tools force you to run one config after another sequentially as shown in the (simplified) illustration below. In contrast, by operating on shards, RapidFire AI offers a far more concurrent learning experience by automatically swapping adapters (and base models, if needed) across GPU(s) and DRAM. It does this via efficient shared memory-b... | 0.0 |

  • Loss: ContrastiveLoss with these parameters:

    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

Click to expand
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Time

  • Training: 53.3 seconds

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}
Downloads last month
5
Safetensors
Model size
22.7M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ronit01/golden_rag_tuned_minilm_10

Paper for ronit01/golden_rag_tuned_minilm_10