SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Supported Modality: Text

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
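
The Pooling and Normalize modules listed above can be illustrated with a small pure-Python sketch. This is a simplified illustration rather than the library's implementation: the real modules operate on batched tensors, and the helper names `mean_pool` and `l2_normalize` and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)        # [2.0, 1.0, 2.0]
sentence_vec = l2_normalize(pooled)     # unit-length sentence embedding
print(pooled, sentence_vec)
```

Because the final module normalizes every embedding to unit length, the cosine similarity between two embeddings equals their dot product.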

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/final_golden_rag_tuned_minilm_10")
# Run inference
sentences = [
    'How does the compute_metrics function differ between the run_fit() (training) pipeline and the run_evals() pipeline in terms of its signature, invocation timing, and what it receives as input?',
    'Step 5: Monitor training behaviors on ML metrics dashboard\n--------\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/step7.png" alt="Monitor training behaviors on ML metrics dashboard" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\nStep 6: Interactive Control (IC) Ops: Stop, Clone-Modify; check their results \n-----\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/icop-stop.png" alt="IC Op: Stop" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/icop-clone.png" alt="IC Op: Clone-Modify" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/step10.png" alt="IC Op results on dashboard" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n',
    'Step 1: Install dependencies and package\n-----------------------\n\nObtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly.\n\n.. important::\n\n  Requires Python 3.12+. Ensure that ``python3`` resolves to Python 3.12 before creating the venv.\n\n.. code-block:: bash\n\n   python3 --version  # must be 3.12.x\n   python3 -m venv .venv\n   source .venv/bin/activate\n\n   pip install rapidfireai\n\n   rapidfireai --version\n   # Verify it prints the following:\n   # RapidFire AI 0.14.0\n\nProvide your Hugging Face account token to access the gated Llama and Mistral models \nshowcased in the tutorial notebooks. \nIf you do not have such a token, you have two options:\n\n* Switch the :code:`model_name` in the tutorial notebook to a non-gated model from Hugging Face. Then proceed to Step 2.\n\n* Create a Hugging Face token `as explained here <https://huggingface.co/docs/hub/en/security-tokens>`_. Then request access on the following gated models\' Hugging Face pages:\n\n  * `mistralai/Mistral-7B-Instruct-v0.3 <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3>`_\n  * `meta-llama/Llama-3.1-8B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_\n  * `meta-llama/Llama-3.2-1B-Instruct <https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct>`_\n  \n  Headsup: the approval for the Llama models may take a few hours. Then provide your HF token in the same venv.\n\n.. code-block:: bash\n\n   source .venv/bin/activate\n   pip install "huggingface-hub[cli]"\n\n   # Replace YOUR_TOKEN with your actual HF token\n   # https://huggingface.co/docs/hub/en/security-tokens\n   hf auth login --token YOUR_TOKEN\n\n   # Due to current issue: https://github.com/huggingface/xet-core/issues/527\n   pip uninstall -y hf-xet\n\n\nFeel free to ask us on Discord if you need any help with accessing gated Hugging Face models. \nUnfortunately, we are not allowed to provide a publicly visible token here for your use due to Hugging Face\'s policies.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4620, 0.3238],
#         [0.4620, 1.0000, 0.5698],
#         [0.3238, 0.5698, 1.0000]])
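
For retrieval, one row of the similarity matrix is usually turned into a ranking of candidate passages. The sketch below does this for the first row printed above, treating the first sentence as the query. The helper `rank_passages` is a hypothetical name introduced for illustration, and the scores are copied from the printed output instead of being recomputed.

```python
def rank_passages(similarity_row, query_index, top_k=None):
    """Return (passage_index, score) pairs sorted by descending similarity.

    The query's own position is skipped so it does not rank itself first.
    """
    candidates = [
        (score, idx)
        for idx, score in enumerate(similarity_row)
        if idx != query_index
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [(idx, score) for score, idx in candidates]
    return ranked if top_k is None else ranked[:top_k]

# First row of the similarity matrix shown above (query vs. all sentences).
row = [1.0000, 0.4620, 0.3238]
ranking = rank_passages(row, query_index=0)
print(ranking)  # [(1, 0.462), (2, 0.3238)]
```

In a real pipeline the row would come from `model.similarity(query_embedding, passage_embeddings)`, with the query and the passages encoded separately.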

Training Details

Training Dataset

Unnamed Dataset

Size: 208 training samples
Columns: sentence_0, sentence_1, and label
Approximate statistics based on the first 208 samples:

| | sentence_0 | sentence_1 | label |
|:--|:--|:--|:--|
| type | string | string | float |
| details | min: 25 tokens, mean: 38.85 tokens, max: 70 tokens | min: 58 tokens, mean: 223.85 tokens, max: 256 tokens | min: 0.0, mean: 0.25, max: 1.0 |

Samples:

| sentence_0 | sentence_1 | label |
|:--|:--|:--|
| How do the Stop and Delete IC Ops compare in terms of their effects on a run's state, visibility on the dashboard, resource usage, artifact preservation, and what further IC Ops can be performed on the run afterward? | online_strategy_kwargs : dict[str, Any], optional Parameters for evals online aggregation strategy. The dictionary must include the following keys: * :code:"strategy_name" (str) - Must be :code:"normal", :code:"wilson", or :code:"hoeffding". * :code:"confidence_level" (float) - Confidence level for confidence intervals on metrics. Must be in [0,1]. Default is 0.95 (95%). * :code:"use_fpc" (bool) - Whether to apply finite population correction. Default is :code:True. | 0.0 |
| How does RapidFire AI's shard-based adaptive execution engine enable online aggregation of eval metrics with confidence intervals, and what specific mathematical strategies are available for computing those intervals? | Port conflicts (services already running) ---------------------------------------- If you encounter port conflicts, you can kill existing processes. .. code-block:: bash lsof -t -i:8852 \| xargs kill -9 # mlflow lsof -t -i:8851 \| xargs kill -9 # dispatcher lsof -t -i:8853 \| xargs kill -9 # frontend server Select specific GPU(s) to use ----------------------------- Set the CUDA_VISIBLE_DEVICES environment variable BEFORE running rapidfireai start to control which GPU(s) RapidFire can see and use. .. code-block:: bash export CUDA_VISIBLE_DEVICES=2 # use GPU index 2 only rapidfireai start Multiple GPUs (example: GPUs 0 and 2): .. code-block:: bash export CUDA_VISIBLE_DEVICES=0,2 rapidfireai start From a Python script (set before importing/starting RapidFire): .. code-block:: python import os os.environ["CUDA_VISIBLE_DEVICES"] = "2" # then start your RapidFire workflow | 0.0 |
| How do you set up and run an SFT fine-tuning experiment from scratch using RapidFire AI's full installation, from installing the package through launching training and monitoring results? | Step 4: Run the notebook cells Run the cells one by one as shown in the above videos. Wait for a cell to finish before running the next. Imports Load dataset and specify train and eval partitions If you want to run the notebook faster for demo purposes, downsample the data further as per your wish. Here are some suggested reductions. You can also reduce effective batch size by reducing either or both of :code:per_device_train_batch_size and :code:gradient_accumulation_steps in the trainer configs. SFT notebook: .. code-block:: python train_dataset=dataset["train"].select(range(128)) # 128 instead of 5000 eval_dataset=dataset["train"].select(range(5000,5032)) # 5032 instead of 5200 DPO notebook: .. code-block:: python select(range(128)) # 128 instead of 500 GRPO notebook: .. code-block:: python train_dataset = get_gsm8k_questions(split="train").select(range(128)) # 128 instead of 5000 eval... | 1.0 |



Loss: ContrastiveLoss with these parameters:
{
    "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
    "margin": 0.5,
    "size_average": true
}
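
To make the loss parameters concrete, here is a minimal pure-Python sketch of a contrastive loss with cosine distance, a margin of 0.5, and averaging over pairs. It follows the Hadsell et al. formulation as I understand the Sentence Transformers implementation to apply it; the function name `contrastive_loss` and the example similarity values are assumptions for illustration, not values from this training run.

```python
def contrastive_loss(cosine_sims, labels, margin=0.5, size_average=True):
    """Contrastive loss over pairs, using cosine distance d = 1 - cos_sim.

    Positive pairs (label 1) are penalized by d^2.
    Negative pairs (label 0) are penalized only while d < margin.
    """
    losses = []
    for sim, label in zip(cosine_sims, labels):
        distance = 1.0 - sim
        positive_term = label * distance ** 2
        negative_term = (1.0 - label) * max(margin - distance, 0.0) ** 2
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total

# Hypothetical pairs: one relevant pair and two irrelevant ones.
sims = [0.9, 0.8, 0.3]
labels = [1.0, 0.0, 0.0]
loss = contrastive_loss(sims, labels)
print(round(loss, 6))
```

With these values, the second pair (a negative with similarity 0.8) contributes most of the loss because its distance of 0.2 is inside the margin, while the third pair (distance 0.7) contributes nothing.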



	
		
	
	
Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 10
multi_dataset_batch_sampler: round_robin


	
		
	
	
All Hyperparameters

do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: None
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}
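
The hyperparameters above imply a short training run. The sketch below works out the number of optimizer steps from the 208 training samples and reproduces a linear learning-rate decay with no warmup. It assumes a single device (so the effective batch size is 16) and a scheduler that decays linearly to zero at the final step; the helper names are hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches per epoch for a given dataset and batch size."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Values taken from the card: 208 samples, batch size 16, 10 epochs, lr 5e-05.
per_epoch = steps_per_epoch(208, 16)        # 13 batches per epoch
total = per_epoch * 10                      # 130 optimizer steps overall
mid_lr = linear_lr(65, total, 5e-05)        # learning rate halfway through
print(per_epoch, total, mid_lr)
```

Under these assumptions the model sees 130 optimizer updates in total, which is worth keeping in mind when judging how far the weights can move from the base model.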




	
		
	
	
Training Time

Training: 24.5 seconds


	
		
	
	
Framework Versions

Python: 3.12.13
Sentence Transformers: 5.4.1
Transformers: 5.0.0
PyTorch: 2.10.0+cu128
Accelerate: 1.13.0
Datasets: 4.0.0
Tokenizers: 0.22.2


	
		
	
	
Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}


	
		
	
	
ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}