
Libraries

How to use ronit01/rag_tuned_minilm_mnr_10epoch with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_10epoch")

sentences = [
    "How does RapidFire AI's shard-based adaptive execution engine enable online aggregation of eval metrics with confidence intervals, and what specific mathematical strategies are available for computing those intervals?",
    "RapidFire AI is a new AI experiment execution framework that transforms your LLM pipeline customization \nfrom slow, sequential processes into rapid, intelligent workflows with hyperparallelized execution, \ndynamic real-time experiment control, and automatic backend optimization.\n\nFor *RAG and context engineering evals*, start here: :doc:`Install and Get Started: RAG and Context Engineering</walkthroughrag>`.\n\nFor *SFT and RFT/post-training workflows*, start here: :doc:`Install and Get Started: SFT/RFT</walkthroughft>`.\n\n\nRapidFire AI is the first system of its kind to establish live three-way communication between the IDE\nwhere the experiment is launched, a metrics display/control dashboard, and a multi-core/multi-GPU execution backend.\n\n.. image:: /images/rf-usage.png\n   :width: 800px\n\nJust pip install the :code:`rapidfireai` OSS package. It works on a CPU-only machine, a single-GPU machine, \nor a multi-GPU machine. Note that for RAG/context engineering with only closed model APIs, GPUs are not needed. ",
    "\nRapidFire AI transforms the status quo by adapting the powerful idea of **online aggregation** \nfrom database systems research to LLM evals. \nOur adaptive execution engine, :doc:`as described on this page</difference>`, automatically \nshards the data and processes multiple configs in parallel, one shard at a time, with \nefficient swapping techniques.\n\nThis means you get **running metric estimates with confidence intervals** in real time. \nSo, you can confidently stop poor configs earlier, clone better configs on the fly, and \nperform more informed exploration to reach much better eval metrics in much less time.\n\n\nExample: Traditional Batch Evals vs. RapidFire AI\n-------\n\nFor instance, suppose you have an evals set with 400 queries. You decide to compare, say, \n4 RAG configs in one go with RapidFire AI with number of shards set to 8. The illustration\nbelow contrasts traditional batch evals vs. RapidFire AI's approach for a simple eval metric.\n\n.. list-table::\n   :widths: 50 50\n   :class: side-by-side\n\n   * - .. figure:: /images/rag-eval-online1.png\n          :width: 100%\n          :alt: Online aggregation for evals with RapidFire AI and IC Ops.\n\n     - .. figure:: /images/rag-eval-online2.png\n          :width: 100%\n          :alt: Online aggregation for evals with RapidFire AI and IC Ops.\n\n\nAll configs are executed on the first 1/8th of the data (50 examples), with \ntheir **incrementally computed** eval metrics shown in real time with confidence intervals. \nIn the figure, the 3 worst configs are stopped, while the best is cloned to add 2 new variants. \nThe 3 running configs now continue on the second 1/8th of the data (cumulatively, \n100 examples), and so on.\nOne clone is then stopped halfway through the aggregation, while the other two run to completion. 
\nUltimately, the other clone ends up being the best config overall.\n\nNote that the confidence intervals shown will keep narrowing as configs see more shards, converging \nto zero when 100% of the data is seen, i.e., the metrics become exact point estimates.\nOverall, compared to sequential batch evals in which the original 4 configs all run to completion, \nRapidFire AI enables you to explore more configs in less time, while reaching better eval metrics.\n\n\n\nTypes of Metrics\n-----------------------\n\nWe support 2 types of metrics based on their aggregation semantics: \n\n* **Distributive Metrics:** These are purely additive over a given set of data points. \n  \n  Examples: *count* of number of correct predictions; *sum* of output token lengths across queries.\n\n* **Algebraic Metrics:** These are averages or proportions over a given set of data points. They can be decomposed into components that are individually distributive. \n  \n  Examples: *precision*, which counts number of correct predictions and total number of data points separately and then divides them; *mean rouge-1*, which averages per-example rouge-1 values that assesses overlap of tokens between generated text and ground truth text.\n\nWhen you define an eval metric via :func:`evals.compute_metrics_fn()` and :func:`evals.accumulate_metrics_fn()`, \nyou must specify their type (algebraic or distributive) and value range as illustrated below. \nFor metrics without a type defined, they will be displayed *as is*, i.e., without projected \nestimates or confidence intervals.\n\n.. 
code-block:: python\n\n    # Based on GSM8K tutorial use case\n    metrics = {\n        \"Total\": {\"value\": total},\n        \"Correct\": {\n            \"value\": correct,\n            \"is_distributive\": True,\n            \"value_range\": (0, 1),\n        },\n        \"Accuracy\": {\n            \"value\": accuracy,\n            \"is_algebraic\": True,\n            \"value_range\": (0, 1),\n        },\n    }\n\nConfidence Intervals\n--------------------\n\nThe data points in the evals dataset are **assigned to shards uniformly randomly**, i.e., \nRapidFire AI performs sampling without replacement. \nBased on that, it supports 3 strategies to calculate confidence intervals for projected estimates of metrics. \nYou can indicate the confidence level (we recommend 95%) and whether to perform \"finite population correction\" (FPC) or not. \nThese values can be specified under the key :code:`\"online_strategy_kwargs\"` in your config dictionary as illustrated below.\n\n.. code-block:: python\n\n    # Based on FiQA RAG tutorial use case\n    \"online_strategy_kwargs\": {\n        \"strategy_name\": \"normal\",\n        \"confidence_level\": 0.95,\n        \"use_fpc\": True,\n    },\n\nNotation \n^^^^^^^\n\n* :math:`N` = Total population size (total number of queries in eval set)\n* :math:`n` = Sample size (number of queries processed so far)\n* :math:`\\hat{p}` = Observed sample proportion or average for an algebraic metric\n* :math:`\\bar{X}` = Sample mean for a distributive metric\n* :math:`\\widehat{T}` = Estimated population total for a distributive metric\n* :math:`\\text{Var}(\\widehat{T})` = Variance of the above estimated population total\n* :math:`\\text{SE}` = Standard error (measure of estimate uncertainty)\n* :math:`\\text{CI}` = Confidence interval\n* :math:`z` = Z-score for confidence level (1.96 for 95% confidence; used in Normal and Wilson)\n* :math:`\\alpha` = Significance level (0.05 for 95% confidence)\n* :math:`n_{\\text{eff}}` = Effective 
sample size (adjusted for FPC in Wilson)\n* :math:`a, b` = Lower and upper bounds of metric value range\n* :math:`R` = Range width, :math:`R = b - a`\n* :math:`\\varepsilon` = Margin of error (half-width of confidence interval for Hoeffding)\n* :math:`\\varepsilon_{\\bar{X}}` = Margin of error for sample mean (Hoeffding distributive)\n* :math:`\\text{FPC}` = Finite population correction factor\n\n\nFinite Population Correction (FPC)\n^^^^^^^^^^^^^^^^^^^^^^\n\nWhen sampling without replacement from finite populations, enabling FPC \nmultiplies the standard error (SE) by :math:`\\text{FPC} = \\sqrt{(N-n)/(N-1)}` \nwhere :math:`N` is population size and :math:`n` is sample size.\n\n\nNormal Approximation\n^^^^^^^^^^^^^^^^^^^\n\nThis is the default strategy, and it uses the Central Limit Theorem. \nIt is suitable for most cases with non-trivial sample sizes (n > 30). \nIt provides tight intervals when the statistical assumptions hold.\n\n* For algebraic metrics:\n\n.. math::\n\n   \\text{SE}_{\\hat{p}} = \\sqrt{\\frac{\\hat{p}(1-\\hat{p})}{n}} \\times \\text{FPC}\n\n   \\text{CI} = \\hat{p} \\pm 1.96 \\cdot \\text{SE}_{\\hat{p}}\n\n\n* For distributive metrics: \n\nEstimate population total :math:`\\widehat{T} = N\\bar{X}` with \nvariance :math:`\\text{Var}(\\widehat{T}) = N^2 \\cdot \\bar{X}(1-\\bar{X})/n` (FPC-adjusted).\n\n\nWilson Score\n^^^^^^^^^^^\n\nThis strategy is better for small sample sizes or metrics near 0/1 boundaries. \nIt is more robust than Normal Approximation for extreme proportions. \n\n* For algebraic metrics:\n\n.. math::\n\n   \\text{center} = \\frac{\\hat{p} + z^2/(2n_{\\text{eff}})}{1 + z^2/n_{\\text{eff}}}\n\n   \\text{margin} = \\frac{z\\sqrt{\\hat{p}(1-\\hat{p})/n_{\\text{eff}} + z^2/(4n_{\\text{eff}}^2)}}{1 + z^2/n_{\\text{eff}}}\n\nwhere :math:`n_{\\text{eff}} = n/\\text{FPC}^2` when using FPC. 
\nThe Wilson confidence interval is then :math:`[\\text{center} - \\text{margin}, \\text{center} + \\text{margin}]`,\nclamped to [0, 1].\n\n* For distributive metrics, this falls back to Normal Approximation. \n\n\n\nHoeffding Bounds\n^^^^^^^^^^^\n\nThis strategy is best for maximum safety (guaranteed coverage). It makes no distributional assumptions, \nbut that also means its intervals are typically quite loose.\n\n.. math::\n\n   \\varepsilon = (b-a)\\sqrt{\\frac{\\ln(2/\\alpha)}{2n}} \\times \\text{FPC}\n\n   \\text{CI} = [\\hat{p} - \\varepsilon, \\hat{p} + \\varepsilon]\n\nFor distributive metrics with range :math:`R=b-a`, it computes :math:`\\varepsilon_{\\bar{X}} = R\\sqrt{\\ln(2/\\alpha)/(2n)}` \nand then scales to population total.",
    "This class wraps around some LangChain APIs to manage dynamic few-shot example selection. It provides semantic \nsimilarity-based example selection to construct prompts with the most relevant examples for each input query.\n\nThe individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in an :class:`RFPromptManager`. \nThat is how you can specify a base set of knob combinations from which a config group can be produced. \nAlso read :doc:`the Multi-Config Specification page</configs>`.\n\n.. py:class:: RFPromptManager\n\n  :param instructions: The main instructions for the prompt that guide the generator's behavior. This sets the overall task description and role for the assistant. Either this or :code:`instructions_file_path` must be provided.\n  :type instructions: str, optional\n\n  :param instructions_file_path: Path to a file containing the instructions. Use this as an alternative to the :code:`instructions` parameter for loading instructions from a file, say, if they are very long.\n  :type instructions_file_path: str, optional\n\n  :param examples: A list of example dictionaries for few-shot learning. Each example should be a dictionary with keys matching the expected input-output format (e.g., \"question\" and \"answer\").\n  :type examples: list[dict[str, str]], optional\n\n\n  :param embedding_cfg: The embedding class and its kwargs to use for computing semantic similarity between examples and queries, provided as a single dictionary. Must include a key :code:`\"class\"` with the class itself as value, not an instance. Options for the class include :class:`HuggingFaceEmbeddings` and :class:`OpenAIEmbeddings`. The kwargs that follow must contain all parameters needed to initialize the embedding class; required parameters vary by embedding class. 
For example, :class:`HuggingFaceEmbeddings` needs :code:`model_name`, :code:`model_kwargs` and :code:`device`, while :class:`OpenAIEmbeddings` needs :code:`\"model\"` and :code:`\"api_key\"`.\n  :type embedding_cfg: dict[str, Any], optional\n\n\n  :param example_selector_cls: The example selector class that determines how to choose relevant examples based on the input query. Must be either :code:`SemanticSimilarityExampleSelector` or :code:`MaxMarginalRelevanceExampleSelector` (for diversity) from LangChain.\n  :type example_selector_cls: type[MaxMarginalRelevanceExampleSelector | SemanticSimilarityExampleSelector], optional\n\n  :param example_prompt_template: A LangChain :code:`PromptTemplate` that defines how to format each example. Should specify :code:`input_variables` and a :code:`template` string with placeholders matching the keys in the examples dictionaries.\n  :type example_prompt_template: PromptTemplate, optional\n\n  :param k: Number of most similar or diverse examples to retrieve and include in the prompt for each query. Default is 3.\n  :type k: int, optional"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]


SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Supported Modality: Text

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
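
The three modules above form a pipeline: the transformer produces one vector per token, the pooling layer averages them, and the final layer scales the result to unit length. The following is a minimal pure-Python sketch of the last two steps, assuming made-up 3-dimensional token vectors instead of the model's real 384-dimensional ones. The function names are illustrative and are not part of the sentence-transformers API.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy input (hypothetical values): 3 token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)

print(pooled)                               # [2.0, 1.0, 3.0]
print(round(dot(embedding, embedding), 6))  # 1.0
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.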

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_10epoch")
# Run inference
sentences = [
    "How do the Stop and Delete IC Ops compare in terms of their effects on a run's state, visibility on the dashboard, resource usage, artifact preservation, and what further IC Ops can be performed on the run afterward?",
    'Stop\n----\n\nThis IC Op earmarks a run to be stopped at the end of its current chunk. \nIt will still be alive but it will not use any GPU resources from the next chunk. \nYou will still see its minibatch-level plots advancing for the current chunk. \nYou cannot stop an already stopped or deleted run. \n\n\n.. raw:: html\n\n    <img src="/ronit01/rag_tuned_minilm_mnr_10epoch/resolve/main/_static/icop-stop2.png" alt="IC Op Stop" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n    <img src="/ronit01/rag_tuned_minilm_mnr_10epoch/resolve/main/_static/icop-stop.png" alt="IC Op Stop" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n',
    'RapidFire AI offers a browser-based dashboard to automatically visualize all ML metrics and lets \nyou control runs on the fly from there. \nOur current default dashboard is a fork of the popular OSS tool `MLflow <https://mlflow.org/>`__, \nand it inherits much of MLflow\'s native features.\nThe dashboard URI is printed when the rapidfireai server is started; open it in a browser. \n\nAs of this writing, apart from MLflow, RapidFire AI also supports \n`TensorBoard  <https://www.tensorflow.org/tensorboard>`__\nand `Trackio <https://huggingface.co/docs/trackio/en/index>`__\nfor logging metrics plots. \nSpecify any one, two, or all three dashboards to use with the following server start argument. \n\n.. code-block:: bash\n\n   rapidfireai start --tracking-backends [mlflow | tensorboard | trackio]\n\nAlternatively, set the dashboard using its environment variable as below in your python code/notebook:\n\n.. code-block:: python\n\n   os.environ["RF_MLFLOW_ENABLED"] = "true"\n   os.environ["RF_TENSORBOARD_ENABLED"] = "true"\n   os.environ["RF_TRACKIO_ENABLED"] = "true"\n\nSupport for other popular dashboards such as Weights & Biases and CometML is coming soon. \nThe rest of this section explains the new features of our MLflow-fork dashboard.\nNote that these new features are not yet available on the other dashboards.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5636, 0.2072],
#         [0.5636, 1.0000, 0.2361],
#         [0.2072, 0.2361, 1.0000]])
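
For retrieval, only the query's row of the similarity matrix matters. The sketch below ranks the two passages for the query using the scores printed above. It is plain Python on the copied numbers, so it runs without downloading the model. The helper name `rank_passages` is hypothetical.

```python
# Similarity matrix copied from the output above (row/column 0 is the query).
similarities = [
    [1.0000, 0.5636, 0.2072],
    [0.5636, 1.0000, 0.2361],
    [0.2072, 0.2361, 1.0000],
]

def rank_passages(query_scores, query_index=0):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    candidates = [(i, s) for i, s in enumerate(query_scores) if i != query_index]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

ranking = rank_passages(similarities[0])
print(ranking)  # [(1, 0.5636), (2, 0.2072)]
```

The passage describing the Stop IC Op (index 1) ranks above the dashboard passage (index 2), which matches the topic of the query.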

Training Details

Training Dataset

Unnamed Dataset

Size: 46 training samples
Columns: sentence_0 and sentence_1
Approximate statistics based on the first 46 samples:
	sentence_0	sentence_1
type	string	string
details	min: 11 tokens, mean: 30.57 tokens, max: 48 tokens	min: 64 tokens, mean: 225.52 tokens, max: 256 tokens

Samples:

sentence_0	sentence_1
`What user-provided functions can be included in an eval config for run_evals(), and which are mandatory vs. optional?`	API: User-Provided Functions for Run Evals =============== Users can provide the following custom functions as part of their eval config to be used in :func:run_evals(). Note that each leaf config can have its own set of functions for all of these. Preprocess Function ------------------- Mandatory user-provided function to prepare the inputs to be given to the generator model. It is invoked for each batch during the evaluation process before generation. Pass it directly to the :code:preprocess_fn key in your eval config dictionary. The system injects into this function the batch data, as well as the RAG spec and the prompt manager of an individual leaf config. .. py:function:: preprocess_fn(batch: dict[str, list], rag: RFLangChainRagSpec, prompt_manager: RFPromptManager) -> dict[str, list] :param batch: Dictionary with a batch of examples with dataset field names as keys and lists as values
`How do I set up RapidFire AI for RAG evaluation on a machine without GPUs?`	`Note that if you plan to use only OpenAI APIs and not self-hosted models (for embedding or generation), you do NOT need GPUs on your machine.`
But you must provide a valid OpenAI API key via a config argument as shown in the GSM8K and SciFact tutorial notebooks.

Step 1: Install dependencies and package

Obtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly.

.. important::

Requires Python 3.12+. Ensure that python3 resolves to Python 3.12 before creating the venv.

.. code-block:: bash

python3 --version # must be 3.12.x
python3 -m venv .venv
source .venv/bin/activate

pip install rapidfireai

rapidfireai --version

Verify it prints the following:

RapidFire AI 0..14.0

Due to current issue: https://github.com/huggingface/xet-core/issues/527

pip uninstall -y hf-xet

The tutorial notebooks for RAG evals do not use any gated models from Hugging Face. If you want to a...

What are the four Interactive Control (IC) Operations supported by RapidFire AI?	As of this writing, we support 4 IC Ops: Stop, Resume, Clone-Modify, and Delete. We explain each shortly below.


All IC Ops on a run are queued by the system and executed at a chunk boundary for that run. 
This avoids potentially non-deterministic or other inconsistent behaviors during concurrent run execution.
Note that different runs might reach their chunk boundary at different points in time. 
To control the number of chunks, set :code:num_chunks during :func:run_fit(); 
more details :doc:on the Experiment docs page </experiments>.
IC ops can be invoked as intermittently as you want during a long-running :func:run_fit(). 
So, you can launch, say, 16 configs in one go (even on a 4-GPU machine), check in after a few chunks,
and stop bottom 80% of the runs. You can let the top performers continue for longer. Then you can 
clone and modify some to add new finer grained runs and warm start their parameters. And so on.

Under the hood, RapidFire AI automat...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
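
To make the `scale` and `similarity_fct` settings concrete, here is a minimal pure-Python sketch of the core idea behind a multiple-negatives ranking loss in the query-to-document direction: each query's paired passage is the positive, and every other passage in the batch acts as a negative. This is an illustration under simplifying assumptions, not the library's implementation. It ignores the other parameters listed above (`gather_across_devices`, `partition_mode`, and the hardness settings), and the toy embeddings are made up.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mnr_loss(queries, docs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other doc in the batch serves as a negative."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * cos_sim(q, d) for d in docs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D embeddings (hypothetical values, not model outputs).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]   # each doc points toward its own query
swapped = [[0.1, 1.0], [1.0, 0.1]]   # each doc points toward the other query

print(mnr_loss(queries, aligned) < mnr_loss(queries, swapped))  # True
```

A larger `scale` sharpens the softmax, so a positive that is only slightly more similar than the negatives still yields a low loss.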

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 10
multi_dataset_batch_sampler: round_robin

All Hyperparameters

do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: None
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}
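
The listed values imply a very short training run. The following sketch works out the number of optimizer steps and the learning rate under a linear schedule, assuming a single device (the card lists `local_rank: -1` but does not state the device count) and the standard linear-decay formula. The variable and function names are illustrative.

```python
import math

num_samples = 46          # training set size
batch_size = 16           # per_device_train_batch_size
num_epochs = 10           # num_train_epochs
base_lr = 5e-05           # learning_rate
warmup_steps = 0          # warmup_steps

# dataloader_drop_last is False, so the final partial batch still counts.
# gradient_accumulation_steps is 1, so each batch is one optimizer step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def linear_lr(step):
    """Learning rate at a given optimizer step under linear decay with warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

print(steps_per_epoch, total_steps)   # 3 30
print(linear_lr(0), linear_lr(15), linear_lr(30))
```

With 46 samples and a batch size of 16, the last batch of each epoch holds only 14 pairs, so a query sees at most 15 in-batch negatives.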

Training Time

Training: 4.9 seconds

Framework Versions

Python: 3.12.13
Sentence Transformers: 5.4.1
Transformers: 5.0.0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}