SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
(1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
(2): Normalize({})
)
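The three modules above can be read as a pipeline: token embeddings from the transformer, mean pooling over tokens, then L2 normalization. The following is a minimal sketch of the last two stages in plain Python, assuming a toy dimension of 4 instead of 384 and an attention mask that marks padding with 0. The helper names `mean_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask value 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length, as the Normalize module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy input: three token vectors of dimension 4 (the real model uses 384).
# The last token is padding and must not influence the result.
token_embeddings = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.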
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm_mnr_10")
# Run inference
sentences = [
"What is the difference between distributive and algebraic metrics in RapidFire AI's online aggregation for evals?",
'Types of Metrics\n-----------------------\n\nWe support 2 types of metrics based on their aggregation semantics: \n\n* **Distributive Metrics:** These are purely additive over a given set of data points. \n \n Examples: *count* of number of correct predictions; *sum* of output token lengths across queries.\n\n* **Algebraic Metrics:** These are averages or proportions over a given set of data points. They can be decomposed into components that are individually distributive. \n \n Examples: *precision*, which counts number of correct predictions and total number of data points separately and then divides them; *mean rouge-1*, which averages per-example rouge-1 values that assesses overlap of tokens between generated text and ground truth text.\n\nWhen you define an eval metric via :func:`evals.compute_metrics_fn()` and :func:`evals.accumulate_metrics_fn()`, \nyou must specify their type (algebraic or distributive) and value range as illustrated below. \nFor metrics without a type defined, they will be displayed *as is*, i.e., without projected \nestimates or confidence intervals.',
'RapidFire AI offers a browser-based dashboard to automatically visualize all ML metrics and lets \nyou control runs on the fly from there. \nOur current default dashboard is a fork of the popular OSS tool `MLflow <https://mlflow.org/>`__, \nand it inherits much of MLflow\'s native features.\nThe dashboard URI is printed when the rapidfireai server is started; open it in a browser. \n\nAs of this writing, apart from MLflow, RapidFire AI also supports \n`TensorBoard <https://www.tensorflow.org/tensorboard>`__\nand `Trackio <https://huggingface.co/docs/trackio/en/index>`__\nfor logging metrics plots. \nSpecify any one, two, or all three dashboards to use with the following server start argument. \n\n.. code-block:: bash\n\n rapidfireai start --tracking-backends [mlflow | tensorboard | trackio]\n\nAlternatively, set the dashboard using its environment variable as below in your python code/notebook:\n\n.. code-block:: python\n\n os.environ["RF_MLFLOW_ENABLED"] = "true"\n os.environ["RF_TENSORBOARD_ENABLED"] = "true"\n os.environ["RF_TRACKIO_ENABLED"] = "true"\n\nSupport for other popular dashboards such as Weights & Biases and CometML is coming soon. \nThe rest of this section explains the new features of our MLflow-fork dashboard.\nNote that these new features are not yet available on the other dashboards.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6788, 0.1914],
# [0.6788, 1.0000, 0.2678],
# [0.1914, 0.2678, 1.0000]])
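For retrieval, the first row of the similarity matrix is the part that matters: it scores the query (index 0) against each passage. The sketch below ranks the passages using the values printed above, copied into a plain list so it runs without downloading the model. The helper `rank_passages` is a hypothetical name used only for illustration.

```python
# Similarity values copied from the printed tensor above.
similarities = [
    [1.0000, 0.6788, 0.1914],
    [0.6788, 1.0000, 0.2678],
    [0.1914, 0.2678, 1.0000],
]


def rank_passages(similarity_row, query_index):
    """Return (index, score) pairs for every other item, best match first."""
    candidates = [
        (index, score)
        for index, score in enumerate(similarity_row)
        if index != query_index
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


ranking = rank_passages(similarities[0], query_index=0)
for index, score in ranking:
    print(f"passage {index}: cosine similarity {score:.4f}")
```

Here the passage on metric types (index 1) ranks above the dashboard passage (index 2), which matches the intent of the query.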
Training Details
Training Dataset
Unnamed Dataset
Size: 52 training samples
Columns: sentence_0 and sentence_1
Approximate statistics based on the first 52 samples:
- sentence_0: type string; min: 11 tokens, mean: 24.87 tokens, max: 34 tokens
- sentence_1: type string; min: 31 tokens, mean: 216.15 tokens, max: 256 tokens
Samples:
Sample 1
sentence_0: Why does RapidFire AI argue that simply downsampling the evaluation data is insufficient compared to its online aggregation approach?
sentence_1:
Why Not Just Downsample?
------------------------
One might wonder why downsampling the eval set does not suffice here.
:doc:As also explained on this page</difference>, downsampling alone has
significant disadvantages compared to the approach offered by RapidFire AI.
First, you have to decide a downsample size upfront, which is not trivial if your
eval metrics have high variance across examples. Point estimates without confidence
intervals can give false confidence in a sample. You can resample manually
over and over, but that adds manual grunt work of juggling separate samples/files.
Finally, downsampling alone does not offer you the power of IC Ops and automated
parallelization to try new configs on the fly--you'd have reimplement those manually.
RapidFire AI's online aggregation approach with IC Ops avoids all the above issues,
while also being complementary to downsampling, i.e., you can use both in
conjunction for even lower runtimes/costs.
Sample 2
sentence_0: What is online aggregation in the context of RapidFire AI evals, and what three confidence interval strategies does it support?
sentence_1:
RapidFire AI transforms the status quo by adapting the powerful idea of online aggregation from database systems research to LLM evals. Our adaptive execution engine, :doc: as described on this page</difference>, automatically shards the data and processes multiple configs in parallel, one shard at a time, with efficient swapping techniques. This means you get running metric estimates with confidence intervals in real time. So, you can confidently stop poor configs earlier, clone better configs on the fly, and perform more informed exploration to reach much better eval metrics in much less time.
Example: Traditional Batch Evals vs. RapidFire AI
For instance, suppose you have an evals set with 400 queries. You decide to compare, say, 4 RAG configs in one go with RapidFire AI with number of shards set to 8. The illustration below contrasts traditional batch evals vs. RapidFire AI's approach for a simple eval metric.
.. list-table:: :widths: 50 50 :class...
Sample 3
sentence_0: What arguments does the RFModelConfig class accept for defining a model configuration in RapidFire AI?
sentence_1:
RFModelConfig
This is a core class in the RapidFire AI API that abstracts multiple Hugging Face APIs under the hood to simplify and unify all model-related specifications. In particular, it unifies model loading, training configurations, and LoRA settings into one class.
It gives you flexibility to try out variations of LoRA adapter structures, training arguments for multiple control flows (SFT, DPO, and GRPO), formatting and metrics functions, and generation specifics.
Some of the arguments (knobs) here can also be :class:List valued or :class:Range valued depending on its data type, as explained below. All this helps form the base set of knob combinations from which a config group can be produced. Also read :doc:the Multi-Config Specification page</configs>.
.. py:class:: RFModelConfig
:param model_name: Model identifier for use with Hugging Face's :code:AutoModel.from_pretrained(). Can be a Hugging Face model hub name (e.g., ``"Qwen/Qwen2.5-7B-Instruct...
Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": ["query_to_doc"],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
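To make the loss parameters above concrete, here is a minimal plain-Python sketch of the idea behind a multiple-negatives ranking objective in the query_to_doc direction with scale 20.0 and cosine similarity: for each query, its paired passage is the positive and every other passage in the batch serves as a negative. This is an illustration under simplifying assumptions, not the library's implementation; it ignores the partition_mode and hardness settings, and `cos_sim` and `mnr_loss` are local helper names.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(query_embeddings, doc_embeddings, scale=20.0):
    """Mean cross-entropy where doc i is the positive for query i and
    all other docs in the batch act as negatives."""
    losses = []
    for i, query in enumerate(query_embeddings):
        logits = [scale * cos_sim(query, doc) for doc in doc_embeddings]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional embeddings (assumed values, for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[0.9, 0.1], [0.1, 0.9]]   # each query matches its own doc
swapped_docs = [[0.1, 0.9], [0.9, 0.1]]   # each query matches the wrong doc

aligned_loss = mnr_loss(queries, aligned_docs)
swapped_loss = mnr_loss(queries, swapped_docs)
print(aligned_loss, swapped_loss)
```

The loss is near zero when each query is closest to its own passage and large when the pairing is reversed, which is the signal that pulls matching (sentence_0, sentence_1) pairs together during fine-tuning.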
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin
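With 52 training samples, these settings imply a very short run. The arithmetic below is a rough sketch, assuming a single device, no gradient accumulation, and that the final partial batch is kept rather than dropped (the full hyperparameter list reports gradient_accumulation_steps: 1 and dataloader_drop_last: False).

```python
import math

num_samples = 52                   # from "Size: 52 training samples"
per_device_train_batch_size = 16
num_train_epochs = 10
gradient_accumulation_steps = 1
num_devices = 1                    # assumption: single-device training

effective_batch = per_device_train_batch_size * num_devices
batches_per_epoch = math.ceil(num_samples / effective_batch)
steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

# Size of the final, partial batch in each epoch.
last_batch_size = num_samples - (batches_per_epoch - 1) * effective_batch

print(f"batches per epoch: {batches_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"last batch size: {last_batch_size}")
```

Under these assumptions, each epoch has three full batches and one batch of 4 samples. Since the loss draws negatives from the batch, queries in that last batch see only 3 in-batch negatives instead of 15.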
All Hyperparameters
- do_predict: False
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: None
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
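The combination of learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0 describes a learning rate that starts at its peak and decays linearly to zero. The sketch below shows that shape in plain Python. It assumes 40 total optimizer steps (52 samples, batch size 16, 10 epochs, one device), and `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 40  # assumption: ceil(52 / 16) batches per epoch * 10 epochs

for step in (0, 10, 20, 30, 40):
    print(f"step {step:2d}: lr = {linear_lr(step, TOTAL_STEPS):.2e}")
```

With so few steps, the learning rate has already fallen to half its peak by the end of the fifth epoch.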
Training Time
- Training: 6.1 seconds
Framework Versions
- Python: 3.12.13
- Sentence Transformers: 5.4.1
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
Model tree for ronit01/rag_tuned_minilm_mnr_10
Base model
nreimers/MiniLM-L6-H384-uncased