SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
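
The three modules above amount to: take per-token embeddings, average them over the non-padding tokens, then scale the result to unit length. The following is a minimal NumPy sketch of the last two steps, assuming token embeddings and an attention mask are already available. The function name `mean_pool_and_normalize` and the random toy inputs are illustrative and not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token embeddings over real tokens, then L2-normalize.

    token_embeddings: array of shape (batch, seq_len, dim)
    attention_mask:   array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)        # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)        # avoid division by zero
    pooled = summed / counts                              # mean pooling
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)           # unit-length embeddings

# Toy input: 2 sequences, 5 token positions, 384 dimensions (random stand-ins).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)  # (2, 384)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.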

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/golden_rag_tuned_minilm")
# Run inference
sentences = [
    'How do you configure and launch a multi-config RAG evaluation experiment using run_evals(), including defining all required user-provided functions?',
    'Other Eval Config Knobs\n------\n\nFinally, apart from the Generator, the following knobs can also be included in your eval config dictionary. Each of \nthese can also be a knob set generator, viz., :func:`List()` for a discrete and :func:`Range()` for continuous knobs.\n\nFor more details on the four user-given functions listed below, see :doc:`the API: User-Provided Functions for Run Evals page</evalsfunctions>`.\n\nFor more details on the semantics of the online aggregation strategy arguments listed below, see :doc:`the Online Aggregation for Evals page</onlineagg>`.\n\n\n**batch_size** : int\n\tNumber of examples to process in one batch for GPU efficiency (if applicable)\n\n**preprocess_fn** : Callable\n\tUser-given function to preprocess a batch of examples; an eval config\'s RagSpec and PromptManager are input by the system\n\n**postprocess_fn** : Callable, optional\n\tUser-given function to postprocess a batch of examples and generations; a single cfg is passed as input by the system\n\n**compute_metrics_fn** : Callable\n\tUser-given evaluation function to compute eval metrics per batch\n\n**accumulate_metrics_fn** : Callable, optional\n\tUser-given evaluation function to aggregate algebraic eval metrics across batches. If this is not given, all metrics provided in :code:`eval_compute_metrics_fn` will be assumed to be distributive by default.\n\n**online_strategy_kwargs** : dict[str, Any], optional\n\tParameters for evals online aggregation strategy. The dictionary must include the following keys:\n\t\n\t* :code:`"strategy_name"` (str) - Must be :code:`"normal"`, :code:`"wilson"`, or :code:`"hoeffding"`.\n\t* :code:`"confidence_level"` (float) - Confidence level for confidence intervals on metrics. Must be in [0,1]. Default is 0.95 (95%).\n\t* :code:`"use_fpc"` (bool) - Whether to apply finite population correction. Default is :code:`True`.',
    '.. code-block:: python\n\n    # Based on the FiQA Pinecone tutorial notebook\n    spec = ServerlessSpec(cloud="gcp", region="us-central1")\n\n    # Create mode\n    vector_store_cfg_create={\n        "type": "pinecone",\n        "pinecone_api_key": PINECONE_API_KEY, # Or set the PINECONE_API_KEY environment variable\n        "spec": spec,\n        "metric": "cosine",\n        "batch_size": 1024, # documents are embedded in batches of 1024. Defaults to 128.\n    }\n\n    # Read and Update mode\n    vector_store_cfg_read_update={\n        "type": "pinecone", # Required\n        "pinecone_api_key": PINECONE_API_KEY, # Or set the PINECONE_API_KEY environment variable\n        "index_namespace": List([("fiqa", "chunk64"), ("fiqa", "chunk256")]), # Names of *pre-existing* pinecone indexes paired with respective namespaces\n        "embedding_cfg": {\n            "class": HuggingFaceEmbeddings,\n            "model_name": "sentence-transformers/all-MiniLM-L6-v2",\n            "model_kwargs": {"device": "cuda:0"},\n            "encode_kwargs": {"normalize_embeddings": True, "batch_size": 128}\n        },\n        "text_key": "original_doctext", # Metadata field name for raw text in Pinecone; defaults to "text"\n    }\n\n    rag_gpu = RFLangChainRagSpec(\n        document_loader=DirectoryLoader(\n            ...\n        ),\n        ...\n        vector_store_cfg=vector_store_cfg_create,  # Using Pinecone in create mode\n    )',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8252, 0.5741],
#         [0.8252, 1.0000, 0.6055],
#         [0.5741, 0.6055, 1.0000]])
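
For retrieval, similarity scores like these are typically used to rank candidate passages against a query. The following is a minimal sketch of that ranking step, assuming the embeddings have already been computed (for example with `model.encode`). The helper name `rank_by_cosine` and the toy 3-dimensional vectors are illustrative and not part of the library.

```python
import numpy as np

def rank_by_cosine(query_embedding, doc_embeddings, top_k=2):
    """Return (doc_index, cosine_score) pairs for the top_k most similar documents."""
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity per document
    order = np.argsort(-scores)[:top_k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real 384-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 0.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
ranked = rank_by_cosine(query, docs, top_k=2)
print([i for i, _ in ranked])  # [1, 2]
```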

Training Details

Training Dataset

Unnamed Dataset

  • Size: 444 training samples

  • Columns: sentence_0, sentence_1, and label

  • Approximate statistics based on the first 444 samples:

    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string     | string     | float |
    | details | min: 15 tokens; mean: 41.97 tokens; max: 70 tokens | min: 36 tokens; mean: 225.67 tokens; max: 256 tokens | min: 0.0; mean: 0.25; max: 1.0 |
  • Samples:

    Sample 1 (label: 1.0)

      sentence_0: How do the Stop and Delete IC Ops compare in terms of their effects on a run's state, visibility on the dashboard, resource usage, artifact preservation, and what further IC Ops can be performed on the run afterward?

      sentence_1:

        Delete
        ----

        This IC Op earmarks the run to be deleted from the next chunk onward.
        On the chart, you will see its curves vanish almost immediately.
        You cannot do any further IC Ops on a deleted run because it will not be visible.
        Note that although a deleted run vanishes from the plots, its model checkpoints are still part of
        the artifacts of that experiment so that you have post-hoc audibility.

    Sample 2 (label: 0.0)

      sentence_0: How does RapidFire AI's approach to multi-config experimentation unify training (run_fit) and evaluation (run_evals) workflows under a common adaptive execution model, and what are the key differences in how each workflow exposes parallelism controls, return values, and user-provided functions?

      sentence_1:

        RFOpenAIAPIModelConfig

        This is a wrapper around OpenAI's API client config and chat completion parameters. The full list of their arguments are available on `this page <https://platform.openai.com/docs/api-reference/chat/create>`__.

        The difference here is that the individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in an :class:`RFOpenAIAPIModelConfig`. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read :doc:`the Multi-Config Specification page</configs>`.

        .. py:class:: RFOpenAIAPIModelConfig

        :param client_config: A dictionary necessary for initializing the AsyncOpenAI client. All knobs given in this dictionary are simply passed to the AsyncOpenAI client as is. We recommend listing at least the following knobs.

        * :code:`"api_key"`: Your OpenAI API key for authentication. Note that we are NOT able to provide a publicly visible API key.
        * :code:`"max_retries"`: Maximum ...

    Sample 3 (label: 0.0)

      sentence_0: How do RFvLLMModelConfig and RFOpenAIAPIModelConfig compare in terms of their configuration parameters, underlying systems, rate limiting capabilities, and typical use cases?

      sentence_1:

        RapidFire AI's execution pipeline for RAG pipelines engineering is split into 2 main stages as illustrated in the figure below:

        • Document Preprocessing: Workers operate in parallel on the base data and produce preprocessed data that is stored in a vector store.

        • Query Processing: Workers operate in parallel on the eval set examples to embed them, retrieve relevant chunks from the vector store, rerank them, construct the full context, and then generate the outputs.

        .. image:: /images/ragspec-2.png
           :width: 800px

        Depending on the state of your use case's data, you can invoke only the Query Processing stage or both stages in one go via the same :class:`RFLangChainRagSpec` depending on what arguments are provided:

        • With Preprocessing: This creates both Document Preprocessing workers and Query Processing workers. Provide :code:`document_loader`, optional :code:`text_splitter`, :code:`embedding_cfg`, and optional :code:`vector_store_cfg`. The document preprocessing w...

  • Loss: ContrastiveLoss with these parameters:

    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
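
To illustrate what these parameters control, here is a minimal NumPy sketch of a contrastive objective in the form usually given following Hadsell et al. (2006), using cosine distance (1 minus cosine similarity), a margin of 0.5, and averaging over pairs. It is an illustration under those assumptions, not the library's own implementation, which may differ in detail. The function name `contrastive_loss` and the toy vectors are hypothetical.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, labels, margin=0.5, size_average=True):
    """Contrastive loss over pairs, with cosine distance as the distance metric.

    labels: 1.0 for pairs that should be close, 0.0 for pairs that should be apart.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    dist = 1.0 - np.sum(a * b, axis=1)                  # cosine distance per pair
    positive_term = labels * dist ** 2                  # pull matching pairs together
    negative_term = (1.0 - labels) * np.maximum(margin - dist, 0.0) ** 2  # push others past the margin
    losses = 0.5 * (positive_term + negative_term)
    return losses.mean() if size_average else losses.sum()

# Two toy pairs of identical vectors: one labeled as a match, one as a non-match.
emb_a = np.array([[1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.0], [1.0, 0.0]])
labels = np.array([1.0, 0.0])
loss = contrastive_loss(emb_a, emb_b, labels)
print(loss)  # 0.0625
```

In this toy case the matching pair contributes no loss, while the non-matching pair at distance 0 is penalized by 0.5 * 0.5^2 = 0.125. Non-matching pairs whose distance already exceeds the margin contribute nothing.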
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
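
As a quick sanity check on these settings, the number of optimizer steps follows from the dataset size and batch size. This is a small arithmetic sketch, assuming a single device (the device count is not stated in this card) and relying on the gradient_accumulation_steps: 1 and dataloader_drop_last: False values listed under All Hyperparameters.

```python
import math

num_samples = 444             # training set size from this card
per_device_batch_size = 16    # per_device_train_batch_size
num_devices = 1               # assumption: device count is not stated in the card
num_epochs = 1                # num_train_epochs

effective_batch = per_device_batch_size * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch)   # last partial batch is kept
total_steps = steps_per_epoch * num_epochs
last_batch_size = num_samples - (steps_per_epoch - 1) * effective_batch

print(steps_per_epoch, total_steps, last_batch_size)  # 28 28 12
```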

All Hyperparameters

  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Time

  • Training: 5.2 seconds

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}

Safetensors

  • Model size: 22.7M params
  • Tensor type: F32