SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
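
The three modules above run in sequence: the Transformer produces one vector per token, Pooling averages them, and Normalize rescales the result to unit length. The following is a minimal pure-Python sketch of the last two stages, assuming padded positions are excluded from the average via an attention mask. The helper names `mean_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


# Toy input: three token vectors of dimension 2 (the real model uses 384);
# the last token is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the output is unit length, the cosine similarity of two embeddings equals their dot product.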

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/final_golden_rag_tuned_minilm_10")
# Run inference
sentences = [
    'How does the compute_metrics function differ between the run_fit() (training) pipeline and the run_evals() pipeline in terms of its signature, invocation timing, and what it receives as input?',
    'Step 5: Monitor training behaviors on ML metrics dashboard\n--------\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/step7.png" alt="Monitor training behaviors on ML metrics dashboard" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\nStep 6: Interactive Control (IC) Ops: Stop, Clone-Modify; check their results \n-----\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/icop-stop.png" alt="IC Op: Stop" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/icop-clone.png" alt="IC Op: Clone-Modify" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n\n\n.. raw:: html\n\n    <img src="/ronit01/final_golden_rag_tuned_minilm_10/resolve/main/_static/step10.png" alt="IC Op results on dashboard" \n         style="cursor: zoom-in; max-width: 100%;" onclick="this.requestFullscreen()">\n',
    'Step 1: Install dependencies and package\n-----------------------\n\nObtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly.\n\n.. important::\n\n  Requires Python 3.12+. Ensure that ``python3`` resolves to Python 3.12 before creating the venv.\n\n.. code-block:: bash\n\n   python3 --version  # must be 3.12.x\n   python3 -m venv .venv\n   source .venv/bin/activate\n\n   pip install rapidfireai\n\n   rapidfireai --version\n   # Verify it prints the following:\n   # RapidFire AI 0.14.0\n\nProvide your Hugging Face account token to access the gated Llama and Mistral models \nshowcased in the tutorial notebooks. \nIf you do not have such a token, you have two options:\n\n* Switch the :code:`model_name` in the tutorial notebook to a non-gated model from Hugging Face. Then proceed to Step 2.\n\n* Create a Hugging Face token `as explained here <https://huggingface.co/docs/hub/en/security-tokens>`_. Then request access on the following gated models\' Hugging Face pages:\n\n  * `mistralai/Mistral-7B-Instruct-v0.3 <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3>`_\n  * `meta-llama/Llama-3.1-8B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_\n  * `meta-llama/Llama-3.2-1B-Instruct <https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct>`_\n  \n  Headsup: the approval for the Llama models may take a few hours. Then provide your HF token in the same venv.\n\n.. code-block:: bash\n\n   source .venv/bin/activate\n   pip install "huggingface-hub[cli]"\n\n   # Replace YOUR_TOKEN with your actual HF token\n   # https://huggingface.co/docs/hub/en/security-tokens\n   hf auth login --token YOUR_TOKEN\n\n   # Due to current issue: https://github.com/huggingface/xet-core/issues/527\n   pip uninstall -y hf-xet\n\n\nFeel free to ask us on Discord if you need any help with accessing gated Hugging Face models. \nUnfortunately, we are not allowed to provide a publicly visible token here for your use due to Hugging Face\'s policies.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4620, 0.3238],
#         [0.4620, 1.0000, 0.5698],
#         [0.3238, 0.5698, 1.0000]])
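
For retrieval, the similarity matrix above is typically reduced to a ranking of passages for one query. The following is a minimal sketch of that step in plain Python, using toy 3-dimensional vectors in place of real 384-dimensional embeddings. The function names `cosine_similarity` and `rank_passages` are hypothetical helpers, not library calls.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage_index, score) pairs, best match first."""
    scored = [
        (idx, cosine_similarity(query_vec, vec))
        for idx, vec in enumerate(passage_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model.encode(...) output.
query = [1.0, 0.0, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]

ranking = rank_passages(query, passages, top_k=2)
print(ranking)
```

With the real model, the rows returned by `model.encode` could be passed in after conversion to lists, or `model.similarity` could be used directly as shown above.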

Training Details

Training Dataset

Unnamed Dataset

  • Size: 208 training samples

  • Columns: sentence_0, sentence_1, and label

  • Approximate statistics based on the first 208 samples:

    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string     | string     | float |
    | details | min: 25 tokens, mean: 38.85 tokens, max: 70 tokens | min: 58 tokens, mean: 223.85 tokens, max: 256 tokens | min: 0.0, mean: 0.25, max: 1.0 |

  • Samples:

    Sample 1 (label: 0.0)

    sentence_0: How do the Stop and Delete IC Ops compare in terms of their effects on a run's state, visibility on the dashboard, resource usage, artifact preservation, and what further IC Ops can be performed on the run afterward?

    sentence_1:

      online_strategy_kwargs : dict[str, Any], optional
      Parameters for evals online aggregation strategy. The dictionary must include the following keys:

      * :code:"strategy_name" (str) - Must be :code:"normal", :code:"wilson", or :code:"hoeffding".
      * :code:"confidence_level" (float) - Confidence level for confidence intervals on metrics. Must be in [0,1]. Default is 0.95 (95%).
      * :code:"use_fpc" (bool) - Whether to apply finite population correction. Default is :code:True.

    Sample 2 (label: 0.0)

    sentence_0: How does RapidFire AI's shard-based adaptive execution engine enable online aggregation of eval metrics with confidence intervals, and what specific mathematical strategies are available for computing those intervals?

    sentence_1:

      Port conflicts (services already running)
      ----------------------------------------

      If you encounter port conflicts, you can kill existing processes.

      .. code-block:: bash

      lsof -t -i:8852 | xargs kill -9 # mlflow
      lsof -t -i:8851 | xargs kill -9 # dispatcher
      lsof -t -i:8853 | xargs kill -9 # frontend server

      Select specific GPU(s) to use
      -----------------------------

      Set the CUDA_VISIBLE_DEVICES environment variable BEFORE running rapidfireai start to control which GPU(s) RapidFire can see and use.

      .. code-block:: bash

      export CUDA_VISIBLE_DEVICES=2 # use GPU index 2 only
      rapidfireai start

      Multiple GPUs (example: GPUs 0 and 2):

      .. code-block:: bash

      export CUDA_VISIBLE_DEVICES=0,2
      rapidfireai start

      From a Python script (set before importing/starting RapidFire):

      .. code-block:: python

      import os
      os.environ["CUDA_VISIBLE_DEVICES"] = "2"
      # then start your RapidFire workflow

    Sample 3 (label: 1.0)

    sentence_0: How do you set up and run an SFT fine-tuning experiment from scratch using RapidFire AI's full installation, from installing the package through launching training and monitoring results?

    sentence_1:

      Step 4: Run the notebook cells

      Run the cells one by one as shown in the above videos. Wait for a cell to finish before running the next.

      • Imports

      • Load dataset and specify train and eval partitions

        If you want to run the notebook faster for demo purposes, downsample the data further as per your wish. Here are some suggested reductions. You can also reduce effective batch size by reducing either or both of :code:per_device_train_batch_size and :code:gradient_accumulation_steps in the trainer configs.

        • SFT notebook: .. code-block:: python

          train_dataset=dataset["train"].select(range(128)) # 128 instead of 5000
          eval_dataset=dataset["train"].select(range(5000,5032)) # 5032 instead of 5200

        • DPO notebook: .. code-block:: python

          select(range(128)) # 128 instead of 500

        • GRPO notebook: .. code-block:: python

          train_dataset = get_gsm8k_questions(split="train").select(range(128)) # 128 instead of 5000
          eval...
        
  • Loss: ContrastiveLoss with these parameters:

    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
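
The parameters above can be read against the contrastive loss of Hadsell et al.: pairs labeled 1 are pulled together, and pairs labeled 0 are pushed apart until their distance exceeds the margin. The following is a minimal pure-Python sketch, assuming cosine distance is defined as 1 minus cosine similarity and that `size_average` selects the mean over the batch rather than the sum. The function names are illustrative only.

```python
import math

MARGIN = 0.5  # value from the parameter block above


def cosine_distance(a, b):
    """1 - cosine similarity, so identical directions give 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def contrastive_loss(pairs, labels, margin=MARGIN, size_average=True):
    """Contrastive loss over (embedding_a, embedding_b) pairs."""
    losses = []
    for (a, b), label in zip(pairs, labels):
        d = cosine_distance(a, b)
        # Positive pairs are penalized for any distance at all.
        positive_term = label * d ** 2
        # Negative pairs are penalized only while closer than the margin.
        negative_term = (1.0 - label) * max(0.0, margin - d) ** 2
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total


# Toy pairs: one matching pair and one non-matching pair.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical vectors, label 1
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors, label 0
]
labels = [1.0, 0.0]
print(contrastive_loss(pairs, labels))
```

In this toy batch both pairs are already well placed: the positive pair has distance 0 and the negative pair has distance 1, which is beyond the 0.5 margin, so neither contributes to the loss.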
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Time

  • Training: 24.5 seconds

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2
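
To reproduce results, it can help to compare a local environment against the versions listed above. The following is a small sketch using only the standard library. The distribution names in `EXPECTED` are the usual pip package names and are an assumption about how the packages were installed, and `compare_versions` is an illustrative helper.

```python
from importlib import metadata

# Versions copied from the list above, keyed by pip distribution name.
EXPECTED = {
    "sentence-transformers": "5.4.1",
    "transformers": "5.0.0",
    "torch": "2.10.0+cu128",
    "accelerate": "1.13.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (wanted, installed or None, versions_match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.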

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}

Safetensors

  • Model size: 22.7M params
  • Tensor type: F32