SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
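
The three modules above can be read as a pipeline: per-token vectors, then a mean over the non-padding tokens, then L2 normalization. The following is a minimal pure-Python sketch of the last two steps on toy data, not the library's implementation; the helper names `mean_pool` and `l2_normalize` are hypothetical and exist only for illustration.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: 3 token vectors of dimension 2; the last position is padding.
tokens = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # [2.0, 2.0]
sentence_embedding = l2_normalize(pooled)   # unit-length vector
print(sentence_embedding)
```

In the real model the token vectors have 384 dimensions, matching `embedding_dimension` in the Pooling module.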

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_100epoch")
# Run inference
sentences = [
    'What RAG components and configuration knobs does the SciFact tutorial use for scientific claim verification?',
    'This use case notebook features an all-closed model API workflow, with Open AI calls used for both embedding for generation. So, you do not need a GPU to run this notebook.\n\n\nTask, Dataset, and Prompt\n-------\n\nThis tutorial shows Retrieval-Augmented Generation (RAG) for verifying scientific claims against evidence.\n\nIt uses the "SciFact" dataset from the BEIR benchmark; \n`see its details here <https://github.com/allenai/scifact>`__. \nThe dataset contains scientific claims that must be labeled as SUPPORT, CONTRADICT, or NOINFO based on retrieved evidence.\n\nThe prompt format includes system instructions defining the verification task with an example, \nretrieved evidence documents with titles, and the scientific claim to verify.\n\n\nModel, RAG Components, and Configuration Knobs\n-------\n\nWe compare 2 generator models via OpenAI API: gpt-5-mini and gpt-4o.\n\nThere are 2 different retrieval/search strategies: similarity search and maximum marginal relevance (MMR).\n\nThe RAG pipeline uses:\n\n- **Embeddings**: OpenAI text-embedding-3-small.\n- **Vector Store**: FAISS with CPU-based exact search, i.e., no ANN approximation.\n- **Chunking**: 512-token chunks with 32-token overlap using recursive character splitting with tiktoken encoding.\n- **Retrieval**: Top-15 initial retrieval.\n- **Reranking**: cross-encoder/ms-marco-MiniLM-L6-v2 with top-5 final documents.\n- **Document Template**: Custom template including document titles with content.\n\nAll other knobs are fixed across all configs. Thus, there are a total of 4 combinations launched \nwith a simple grid search: 2 generator models x 2 search strategies.',
    'Port conflicts (services already running)\n----------------------------------------\n\nIf you encounter port conflicts, you can kill existing processes.\n\n.. code-block:: bash\n\n   lsof -t -i:8852 | xargs kill -9  # mlflow\n   lsof -t -i:8851 | xargs kill -9  # dispatcher\n   lsof -t -i:8853 | xargs kill -9  # frontend server\n\nSelect specific GPU(s) to use\n-----------------------------\n\nSet the ``CUDA_VISIBLE_DEVICES`` environment variable BEFORE running ``rapidfireai start`` to control which GPU(s) RapidFire can see and use.\n\n.. code-block:: bash\n\n   export CUDA_VISIBLE_DEVICES=2   # use GPU index 2 only\n   rapidfireai start\n\nMultiple GPUs (example: GPUs 0 and 2):\n\n.. code-block:: bash\n\n   export CUDA_VISIBLE_DEVICES=0,2\n   rapidfireai start\n\nFrom a Python script (set before importing/starting RapidFire):\n\n.. code-block:: python\n\n   import os\n   os.environ["CUDA_VISIBLE_DEVICES"] = "2"\n   # then start your RapidFire workflow\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5354, -0.1518],
#         [ 0.5354,  1.0000,  0.0280],
#         [-0.1518,  0.0280,  1.0000]])
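
For retrieval, the similarity scores are typically used to rank candidate passages against a query. Because the model ends with a Normalize module, its embeddings should be unit length, so a plain dot product equals cosine similarity. The sketch below illustrates that ranking step with toy vectors standing in for `model.encode(...)` output; `rank_documents` is a hypothetical helper, not part of the library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_documents(query_embedding, document_embeddings, top_k=3):
    """Return (index, score) pairs sorted by descending score."""
    scores = [(i, dot(query_embedding, d)) for i, d in enumerate(document_embeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Toy unit-length vectors; real embeddings from this model have 384 dimensions.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_documents(query, docs, top_k=2)
print(ranking)
# [(2, 1.0), (1, 0.6)]
```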

Training Details

Training Dataset

Unnamed Dataset

  • Size: 46 training samples

  • Columns: sentence_0 and sentence_1

  • Approximate statistics based on the first 46 samples:

                 sentence_0             sentence_1
    type         string                 string
    details      min: 11 tokens         min: 64 tokens
                 mean: 30.57 tokens     mean: 225.52 tokens
                 max: 48 tokens         max: 256 tokens
  • Samples:

    Sample 1

    sentence_0: What are the four tabs on the main Experiments page of RapidFire AI's MLflow-fork dashboard?

    sentence_1:

      The main "Experiments" page on the dashboard has 4 main tabs:

      • Table
      • Chart
      • Experiment Log
      • Interactive Control (IC) Log

      The screenshot below shows the "Table" view of an experiment with all its runs. Each run represents one model with one set of config knob values, which is standard dashboard semantics.

      .. raw:: html

      <img src="_static/mlflow-1-table.png" alt="Table view of runs metadata" 
           style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">

      Metrics Plots

      The screenshot below shows the "Chart" view of an experiment with all its runs. Each plot corresponds to a metric, spanning :code:`loss` on the training set and evaluation set, as well all named metrics returned in your :func:`compute_metrics()` function in the trainer config.

      We call attention to 3 key aspects of the visualizations here:

      • The x-axis "Step" for the mini batch-level plots represents absolute number of minibatches seen by that run. So, if the :code:`bat...

    Sample 2

    sentence_0: What is the difference between RFGridSearch and RFRandomSearch in terms of how they handle knob values?

    sentence_1:

      We currently support two common config group generators: :func:`RFGridSearch()` for grid search and :func:`RFRandomSearch()` for random search.

      More support for AutoML heuristics such as SHA, HyperOpt, as well as an integration with the popular AutoML library Optuna are coming soon. Likewise for RAG/context engineering, we also plan to support the AutoML heuristic syftr.

      .. py:function:: RFGridSearch(configs: Dict[str, Any] | List[Dict[str, Any]], trainer_type: str = "SFT" | "DPO" | "GRPO" | None)

      :param configs: A config dictionary with :func:`List()` for at least one knob; can be a list of such config dictionaries too.
      :type configs: Dict[str, Any] | List[Dict[str, Any]]

      :param trainer_type: The fine-tuning/post-training control flow to use: "SFT", "DPO", or "GRPO". Skip this argument for :func:`run_evals()`.
      :type trainer_type: str, optional

      .. py:function:: RFRandomSearch(configs: Dict[str, Any], trainer_type: str = "SFT" | "DPO" | "GRPO" | None, num_runs: int, seed...

    Sample 3

    sentence_0: How does the compute_metrics function differ between the run_fit() (training) pipeline and the run_evals() pipeline in terms of its signature, invocation timing, and what it receives as input?

    sentence_1:

      Eval Accumulate Metrics Function

      Optional user-provided function to aggregate algebraic eval metrics across all batches of the data. If this function is not provided, all metrics returned by :func:`eval.compute_metrics_fn()` will be assumed to be distributive (i.e., summed across batches) by default. Use this function when metrics require (weighted) averaging or other custom dataset-wide aggregation logic.

      It is invoked once at the very end of the evaluation process after all batches have been processed. Pass it directly to the :code:`accumulate_metrics_fn` key in your eval config dictionary.

      .. py:function:: eval.accumulate_metrics_fn(aggregated_metrics: dict[str, list[dict[str, Any]]]) -> dict[str, dict[str, Any]]

      :param aggregated_metrics: Dictionary with a metric's name as key and a list of per-batch metric dictionaries as values from across all data batches. Inside each value dictionary, at least the reserved key :code:`"value"` will exist t...

  • Loss: MultipleNegativesRankingLoss with these parameters:

    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
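
    To illustrate what these parameters control, here is a minimal pure-Python sketch of an in-batch-negatives ranking loss in the query-to-document direction: cosine similarities are multiplied by `scale`, and each query is scored with cross-entropy against all documents in the batch, with its own paired document as the target. This is a simplified illustration, not the library's implementation; it ignores `partition_mode`, `hardness_mode`, and multi-device gathering, and the function name `mnr_loss` is hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(queries, docs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other document in the batch acts as a negative."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * cos_sim(q, d) for d in docs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Queries aligned with their paired documents give a loss near zero;
# swapping the pairs gives a loss near `scale`.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```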
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 100
  • multi_dataset_batch_sampler: round_robin
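
As a rough worked example of what these settings imply, the sketch below estimates the number of optimizer steps and the learning rate under a linear schedule. It assumes a single training process and uses values listed in this card: 46 samples, `gradient_accumulation_steps` of 1, `dataloader_drop_last` False, `learning_rate` 5e-05, and `warmup_steps` 0. The helper `linear_lr` is hypothetical and only approximates the scheduler.

```python
import math

num_samples = 46      # training set size from this card
batch_size = 16       # per_device_train_batch_size
num_epochs = 100      # num_train_epochs
base_lr = 5e-05       # learning_rate

# With drop_last disabled, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def linear_lr(step):
    """Linear decay from base_lr to 0 with no warmup (assumed schedule shape)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(steps_per_epoch, total_steps)
# 3 300
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions each epoch has two full batches of 16 and one batch of 14, so a query sees at most 15 in-batch negatives.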

All Hyperparameters

  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Time

  • Training: 50.0 seconds

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}

Model size: 22.7M params (Safetensors, tensor type F32)