


SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Supported Modality: Text

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
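The Pooling (mean) and Normalize modules above average the token embeddings over the non-padding positions and then scale each resulting vector to unit L2 length. The NumPy sketch below illustrates that computation on toy arrays. `mean_pool` and `l2_normalize` are hypothetical helpers, not part of the Sentence Transformers API, and the small 4-dimensional shapes stand in for the real (batch, tokens, 384) output of the BertModel.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as the Normalize module does."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Toy batch: 2 sequences, 3 token positions, 4-dim vectors (the real model uses 384).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[0.0, 2.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 6.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)       # padding vector is excluded from the average
embeddings = l2_normalize(pooled)

# With unit-length vectors, the dot product equals cosine similarity.
dot = embeddings @ embeddings.T
```

Because of the final Normalize step, a plain dot product between this model's embeddings gives the same value as the cosine similarity listed under Model Description.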

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_100epoch")
# Run inference
sentences = [
    'What RAG components and configuration knobs does the SciFact tutorial use for scientific claim verification?',
    'This use case notebook features an all-closed model API workflow, with Open AI calls used for both embedding for generation. So, you do not need a GPU to run this notebook.\n\n\nTask, Dataset, and Prompt\n-------\n\nThis tutorial shows Retrieval-Augmented Generation (RAG) for verifying scientific claims against evidence.\n\nIt uses the "SciFact" dataset from the BEIR benchmark; \n`see its details here <https://github.com/allenai/scifact>`__. \nThe dataset contains scientific claims that must be labeled as SUPPORT, CONTRADICT, or NOINFO based on retrieved evidence.\n\nThe prompt format includes system instructions defining the verification task with an example, \nretrieved evidence documents with titles, and the scientific claim to verify.\n\n\nModel, RAG Components, and Configuration Knobs\n-------\n\nWe compare 2 generator models via OpenAI API: gpt-5-mini and gpt-4o.\n\nThere are 2 different retrieval/search strategies: similarity search and maximum marginal relevance (MMR).\n\nThe RAG pipeline uses:\n\n- **Embeddings**: OpenAI text-embedding-3-small.\n- **Vector Store**: FAISS with CPU-based exact search, i.e., no ANN approximation.\n- **Chunking**: 512-token chunks with 32-token overlap using recursive character splitting with tiktoken encoding.\n- **Retrieval**: Top-15 initial retrieval.\n- **Reranking**: cross-encoder/ms-marco-MiniLM-L6-v2 with top-5 final documents.\n- **Document Template**: Custom template including document titles with content.\n\nAll other knobs are fixed across all configs. Thus, there are a total of 4 combinations launched \nwith a simple grid search: 2 generator models x 2 search strategies.',
    'Port conflicts (services already running)\n----------------------------------------\n\nIf you encounter port conflicts, you can kill existing processes.\n\n.. code-block:: bash\n\n   lsof -t -i:8852 | xargs kill -9  # mlflow\n   lsof -t -i:8851 | xargs kill -9  # dispatcher\n   lsof -t -i:8853 | xargs kill -9  # frontend server\n\nSelect specific GPU(s) to use\n-----------------------------\n\nSet the ``CUDA_VISIBLE_DEVICES`` environment variable BEFORE running ``rapidfireai start`` to control which GPU(s) RapidFire can see and use.\n\n.. code-block:: bash\n\n   export CUDA_VISIBLE_DEVICES=2   # use GPU index 2 only\n   rapidfireai start\n\nMultiple GPUs (example: GPUs 0 and 2):\n\n.. code-block:: bash\n\n   export CUDA_VISIBLE_DEVICES=0,2\n   rapidfireai start\n\nFrom a Python script (set before importing/starting RapidFire):\n\n.. code-block:: python\n\n   import os\n   os.environ["CUDA_VISIBLE_DEVICES"] = "2"\n   # then start your RapidFire workflow\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5354, -0.1518],
#         [ 0.5354,  1.0000,  0.0280],
#         [-0.1518,  0.0280,  1.0000]])
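Because the model is intended for retrieval, a typical pattern is to embed one query and a set of passages, then rank the passages by cosine similarity. The sketch below performs only the ranking step in NumPy on precomputed, already-normalized embeddings. `top_k_passages` is a hypothetical helper, and the toy 3-dimensional vectors stand in for the 384-dimensional arrays that `model.encode(...)` returns in the example above.

```python
import numpy as np

def top_k_passages(query_emb: np.ndarray, passage_embs: np.ndarray, k: int = 2):
    """Return (indices, scores) of the k passages most similar to the query.

    Assumes every vector is already L2-normalized (this model's Normalize
    module does that), so the dot product is the cosine similarity.
    """
    scores = passage_embs @ query_emb          # (num_passages,)
    order = np.argsort(-scores)[:k]            # highest score first
    return order, scores[order]

# Toy unit vectors standing in for 384-dim model outputs.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
    [1.0, 0.0, 0.0],
])
idx, sims = top_k_passages(query, passages, k=2)
```

Sentence Transformers also ships `util.semantic_search` for this task, which may be preferable for larger corpora.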

Training Details

Training Dataset

Unnamed Dataset

Size: 46 training samples
Columns: sentence_0 and sentence_1
Approximate statistics based on the first 46 samples:

Statistic   sentence_0      sentence_1
type        string          string
min         11 tokens       64 tokens
mean        30.57 tokens    225.52 tokens
max         48 tokens       256 tokens
Samples:

Sample 1
sentence_0: What are the four tabs on the main Experiments page of RapidFire AI's MLflow-fork dashboard?
sentence_1: The main "Experiments" page on the dashboard has 4 main tabs:
- Table
- Chart
- Experiment Log
- Interactive Control (IC) Log
The screenshot below shows the "Table" view of an experiment with all its runs. Each run represents one model with one set of config knob values, which is standard dashboard semantics.
.. raw:: html
<img src="_static/mlflow-1-table.png" alt="Table view of runs metadata" style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">
Metrics Plots
The screenshot below shows the "Chart" view of an experiment with all its runs. Each plot corresponds to a metric, spanning :code:`loss` on the training set and evaluation set, as well all named metrics returned in your :func:`compute_metrics()` function in the trainer config.
We call attention to 3 key aspects of the visualizations here:
- The x-axis "Step" for the mini batch-level plots represents absolute number of minibatches seen by that run. So, if the :code:`bat...

Sample 2
sentence_0: What is the difference between RFGridSearch and RFRandomSearch in terms of how they handle knob values?
sentence_1: We currently support two common config group generators: :func:`RFGridSearch()` for grid search and :func:`RFRandomSearch()` for random search.
More support for AutoML heuristics such as SHA, HyperOpt, as well as an integration with the popular AutoML library Optuna are coming soon. Likewise for RAG/context engineering, we also plan to support the AutoML heuristic syftr.
.. py:function:: RFGridSearch(configs: Dict[str, Any] | List[Dict[str, Any]], trainer_type: str = "SFT" | "DPO" | "GRPO" | None)
   :param configs: A config dictionary with :func:`List()` for at least one knob; can be a list of such config dictionaries too.
   :type configs: Dict[str, Any] | List[Dict[str, Any]]
   :param trainer_type: The fine-tuning/post-training control flow to use: "SFT", "DPO", or "GRPO". Skip this argument for :func:`run_evals()`.
   :type trainer_type: str, optional
.. py:function:: RFRandomSearch(configs: Dict[str, Any], trainer_type: str = "SFT" | "DPO" | "GRPO" | None, num_runs: int, seed...

Sample 3
sentence_0: How does the compute_metrics function differ between the run_fit() (training) pipeline and the run_evals() pipeline in terms of its signature, invocation timing, and what it receives as input?
sentence_1: Eval Accumulate Metrics Function
Optional user-provided function to aggregate algebraic eval metrics across all batches of the data. If this function is not provided, all metrics returned by :func:`eval.compute_metrics_fn()` will be assumed to be distributive (i.e., summed across batches) by default. Use this function when metrics require (weighted) averaging or other custom dataset-wide aggregation logic. It is invoked once at the very end of the evaluation process after all batches have been processed. Pass it directly to the :code:`accumulate_metrics_fn` key in your eval config dictionary.
.. py:function:: eval.accumulate_metrics_fn(aggregated_metrics: dict[str, list[dict[str, Any]]]) -> dict[str, dict[str, Any]]
   :param aggregated_metrics: Dictionary with a metric's name as key and a list of per-batch metric dictionaries as values from across all data batches. Inside each value dictionary, at least the reserved key :code:`"value"` will exist t...


Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
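With a single `query_to_doc` direction, this loss treats each (sentence_0, sentence_1) pair in a batch as the positive and every other sentence_1 in the same batch as a negative. Scores are cosine similarities multiplied by `scale`, and the loss is cross-entropy with each query's own passage as the target. The NumPy sketch below is a minimal re-implementation under that reading, for illustration only. It is not the library's code, and it ignores the `gather_across_devices` and `hardness_*` options, which are disabled in the configuration above.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: anchor i should rank positive i above all others."""
    logits = scale * cos_sim(anchors, positives)           # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Target for row i is column i (its own positive); average the negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))

queries = np.eye(4)
perfect = mnr_loss(queries, np.eye(4))                 # each query aligned with its own passage
swapped = mnr_loss(queries, np.eye(4)[[1, 0, 3, 2]])   # each query aligned with another passage
```

Perfectly aligned pairs drive the loss toward zero, while misaligned pairs cost roughly `scale` per example. In the library, the corresponding object is built with `losses.MultipleNegativesRankingLoss(model, scale=20.0)`.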

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 100
multi_dataset_batch_sampler: round_robin

All Hyperparameters


do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 100
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: None
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}
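As a rough check on what these settings imply: 46 pairs at a per-device batch size of 16 with `dataloader_drop_last: False` give ceil(46 / 16) = 3 optimizer steps per epoch on a single device. Over 100 epochs that comes to about 300 steps. With `lr_scheduler_type: linear` and `warmup_steps: 0`, the learning rate decays linearly from 5e-05 to zero over those steps. The sketch below reproduces that arithmetic. It assumes single-device training (the card does not state the device count) and uses the standard linear-decay formula rather than calling the Transformers scheduler.

```python
import math

NUM_SAMPLES = 46          # training pairs
BATCH_SIZE = 16           # per_device_train_batch_size (single device assumed)
EPOCHS = 100              # num_train_epochs
BASE_LR = 5e-05           # learning_rate
WARMUP_STEPS = 0          # warmup_steps

# drop_last=False keeps the final partial batch (46 = 16 + 16 + 14).
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS   # gradient_accumulation_steps = 1

def linear_lr(step: int) -> float:
    """Linear warmup (none here), then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)
```

At the halfway point (step 150), this formula gives 2.5e-05.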

Training Time

Training: 50.0 seconds

Framework Versions

Python: 3.12.13
Sentence Transformers: 5.4.1
Transformers: 5.0.0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}


Safetensors

Model size: 22.7M params
Tensor type: F32

Model tree for ronit01/rag_tuned_minilm_mnr_100epoch

Base model: nreimers/MiniLM-L6-H384-uncased
Fine-tuned from the base model: sentence-transformers/all-MiniLM-L6-v2
Fine-tuned from sentence-transformers/all-MiniLM-L6-v2: this model
