SentenceTransformer based on nomic-ai/nomic-embed-text-v1.5

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/nomic-embed-text-v1.5
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'NomicBertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})
)
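
The Pooling module above uses `pooling_mode: mean`, which averages the token embeddings into a single sentence vector. The following is a minimal pure-Python sketch of that idea, not the library's implementation. It assumes padded positions are excluded through an attention mask, and the function name `mean_pool` and the toy inputs are illustrative only.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding sequence
    return [value / count for value in summed]


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model the same averaging would run over 768-dimensional token embeddings.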

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("justOneMoreTestCase/insurance-rag-embeddings")
# Run inference
sentences = [
    'How is the Initial Daily Benefit (the Applicable Daily Benefit for the first policy year) determined and stated in the policy schedule?',
    'provided any such part\nexceeds a connuous period of 4 hours (aer having\nstay\ncompleted the 24 hours as above) in a non-ICU ward/room of a hospital, an\namount equal to the Applicable Daily Benefit (ADB) available under the policy\nduring that policy year shall be payable subject to benefit limits and condions\nmenonedinPara11A)andexclusionsmenonedinPara15below.\nDuring the first\nof cover commencement in respect of each insured, the\nyear\nApplicableDailyBenefitshallbetheInialDailyBenefitamountchosenbyyouand\nmenonedinthepolicySchedule.\nTheamountof DBforeachpolicyyear,aerthefirstpolicyyear,shallconsistof2parts:\nA\n\nAn arithmec addion of an amount equal to 5% (five percent) of the Inial Daily',
    'Periodwithoutanymaximumlimit.\nFor members\nsubsequently under the policy, the benefit in the first year\nincluded\nshall be equal to Inial Daily Benefit amount and thereaer the Applicable Daily\nBenefitshallincreaseasabove.\nIfanyofthememberinsuredisrequiredtostayinanIntensiveCareUnitofahospital,\nt\nsubject\nbenefit limits and\nwo mes the\nDaily\nwill be payable\nto\nApplicable\nBenefit\ncondionsmenonedinPara11A)andexclusionsmenonedinPara15below.\nDuring one period of 24 connuous hours (i.e. one day) of Hospitalisaon (aer\nhaving completed the 24 hours as above), if the said Hospitalisaon included stay\ninanIntensiveCareUnitaswellasinanyotherin-paent(non-IntensiveCareUnit)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6203, 0.6283],
#         [0.6203, 1.0000, 0.8679],
#         [0.6283, 0.8679, 1.0000]])
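
Since the similarity function for this model is cosine similarity, the matrix printed above can be understood as pairwise cosine scores between the embeddings. The sketch below reproduces that computation on toy 2-dimensional vectors in plain Python. The helper names are illustrative and not part of the Sentence Transformers API.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine scores; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for real embeddings (which would have 768 dimensions).
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(score, 4) for score in row])
```

As in the output above, the matrix is symmetric and its diagonal is 1 up to floating-point rounding, because every vector is compared with itself.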

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.5455
cosine_accuracy@3 0.7727
cosine_accuracy@5 0.9091
cosine_accuracy@10 1.0
cosine_precision@1 0.5455
cosine_precision@3 0.2576
cosine_precision@5 0.1818
cosine_precision@10 0.1
cosine_recall@1 0.5455
cosine_recall@3 0.7727
cosine_recall@5 0.9091
cosine_recall@10 1.0
cosine_ndcg@10 0.7731
cosine_mrr@10 0.7011
cosine_map@100 0.7011
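
To make the metric names concrete, here is a small sketch of how accuracy@k, precision@k, and MRR@k can be derived from the rank at which the relevant passage is retrieved. It assumes exactly one relevant passage per query. The reported values are consistent with that assumption (for example, 0.7727 / 3 ≈ 0.2576), but the card does not state it. The ranks below are hypothetical and are not the evaluation data.

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for rank in ranks if rank is not None and rank <= k) / len(ranks)


def precision_at_k(ranks, k):
    """With one relevant passage per query, at most 1 of the k results is a hit."""
    return accuracy_at_k(ranks, k) / k


def mrr_at_k(ranks, k):
    """Mean of 1/rank, counting only hits within the top k."""
    total = sum(1.0 / rank for rank in ranks if rank is not None and rank <= k)
    return total / len(ranks)


# Hypothetical ranks of the relevant passage for four queries (1 = top result).
ranks = [1, 2, 1, 4]
print(accuracy_at_k(ranks, 1))   # 0.5
print(accuracy_at_k(ranks, 3))   # 0.75
print(precision_at_k(ranks, 3))  # 0.25
print(mrr_at_k(ranks, 10))       # 0.6875
```

Under the same single-relevant-passage assumption, recall@k equals accuracy@k, which matches the table above.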

Training Details

Training Dataset

Unnamed Dataset

  • Size: 20 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 20 samples:
    • sentence_0 (string): min 19 tokens, mean 29.65 tokens, max 56 tokens
    • sentence_1 (string): min 44 tokens, mean 173.0 tokens, max 226 tokens
  • Samples:
    Sample 1
    sentence_0: Which specific benefits (e.g., Hospital Cash Benefit, Major Surgical Benefit, Day Care Procedure Benefit, etc.) are available to the insured if they are hospitalized for a continuous period of 24 hours or more?
    sentence_1:
      65 years (last birthday)
      75 (last birthday)
      17 years (last birthday)
      Howlongareeachinsuredunderthispolicy?
      Each of the insured are covered for
      risks up to age (80). Children are insured up
      Health
      toage25years.
      Hospitalcashbenefit(HCB)
      MajorSurgicalBenefit(MSB)
      DayCareProcedureBenefit
      OtherSurgicalBenefit
      AmbulanceBenefit
      PremiumwaiverBenefit(PWB)
      A) HospitalCashBenefit:
      due to
      If you or any of the insured lives covered under the policy is hospitalised
      Accidental Body Injury or Sickness and the stay in hospital exceeds a connuous
      periodof24hours,thenforanyconnuousperiodof24hoursorpartthereof,
      1. Benefits offered under the plan are

    Sample 2
    sentence_0: What are the four daily Hospital Cash Benefit options available when choosing the initial Daily Benefit for the LIC Jeevan Arogya policy?
    sentence_1:
      emergenciessha eryourpeaceofmind.
      LIC'sJeevanArogyagivesyou:
      Valuablefinancialproteconincaseofhospitalisaon,surgeryetc
      IncreasingHealthcovereveryyear
      Lumpsumbenefitirrespecveofactualmedicalcosts
      Noclaimbenefit
      Flexiblebenefitlimittochoosefrom
      Flexiblepremiumpaymentopons
      Veryeasytochooseyourplan
      Step 1
      2
      Step
      Choose the level of Health cover you need
      Work out the premium payable along with our Representave
      Step 1: Choose the level of Health cover you need:
      You can choose the amount of Inial Daily Benefit (i.e. the daily Hospital Cash Benefit
      applicableinthefirstyearofthepolicy)asperyourneedfromoutofthefollowingchoices:
      1000 per day
      2000 per day
      3000 per day
      4000 per day

    Sample 3
    sentence_0: If a policyholder selects a daily Hospital Cash Benefit of 3000 per day, what will be the Initial Major Surgical Benefit sum assured?
    sentence_1:
      2000 per day
      3000 per day
      4000 per day
      This is the amount that will be payable to you in the event of hospitalisaon in the first
      year on a per day basis. The Major Surgical Benefit that you will be covered for will be
      100 mes the Inial Daily Benefit you have chosen. Thus the inial Major Surgical
      Benefit Sum Assured will be
      1 lakh, 2 lakh, 3 lakh, 4 lakh respecvely. Other benefits
      such as Day Care Procedure Benefit, Other Surgical Benefit and Premium waiver
      Benefit (PWB) menoned below shall also be payable depending upon the daily
      HospitalCashBenefitchosen.
      Step 2: Work out the premium payable along with our representave
      Your premium will depend on your age, gender, the Health cover opon you have
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
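
The loss configuration above can be read as: apply MultipleNegativesRankingLoss to the embeddings truncated to each Matryoshka dimension, then sum the results using the given weights. The sketch below illustrates that structure in plain Python on toy 4-dimensional vectors. It is a simplified illustration, not the Sentence Transformers implementation. The `scale=20.0` value and the use of cosine similarity are assumptions about the inner loss defaults, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score positive i above all other positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, positive) for positive in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over embeddings truncated to each dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_anchors = [vector[:dim] for vector in anchors]
        short_positives = [vector[:dim] for vector in positives]
        total += weight * mnr_loss(short_anchors, short_positives)
    return total


# Toy batch of two (question, passage) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

For this model the dimensions would be 768, 512, 256, 128, and 64 with equal weights, and `n_dims_per_step: -1` indicates that all of them are used at every step.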
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step cosine_ndcg@10
1.0 2 0.7731
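
The logged step count follows from the dataset size and batch size listed above: 20 samples at a batch size of 10 give 2 optimizer steps per epoch, which matches step 2 at epoch 1.0. The sketch below works through that arithmetic and the linear learning-rate decay implied by `lr_scheduler_type: linear` with `warmup_steps: 0`. It assumes a single device, and `linear_lr` is an illustrative helper rather than a library function.

```python
import math

train_samples = 20
batch_size = 10        # per_device_train_batch_size, assuming one device
epochs = 5
base_lr = 5e-05

# dataloader_drop_last is False, so a partial final batch would still count.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def linear_lr(step, total, peak):
    """Linear decay from the peak rate to zero, with no warmup."""
    return peak * max(0.0, (total - step) / total)


print(steps_per_epoch, total_steps)
for step in range(0, total_steps + 1, steps_per_epoch):
    print(step, linear_lr(step, total_steps, base_lr))
```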

Training Time

  • Training: 1.8 minutes

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.13.0
  • Datasets: 4.8.5
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}

Model size: 0.1B params (Safetensors), tensor type F32