SentenceTransformer based on VinitT/Embeddings-Trivia

This is a sentence-transformers model fine-tuned from VinitT/Embeddings-Trivia on the all-nli dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: VinitT/Embeddings-Trivia
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
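
The Pooling and Normalize stages listed above can be illustrated with a minimal NumPy sketch. It assumes the Transformer stage has already produced per-token vectors; the function name `mean_pool_and_normalize` and the random toy input are illustrative and not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mask-aware mean pooling followed by L2 normalization.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 0/1 (0 marks padding)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts                            # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length sentence vectors

# Toy stand-in for the Transformer output: 2 sequences, 4 positions, 384 dims.
# The second sequence has two padding positions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 384))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])

sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)                    # (2, 384)
print(np.linalg.norm(sentence_vectors, axis=1))  # both norms are approximately 1
```

Because the outputs are unit length, a dot product between two sentence vectors equals their cosine similarity.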

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("VinitT/Embeddings-NLI-ContradictionMargin")
# Run inference
sentences = [
    'so he has overcome alcoholism at this point',
    "He's gotten stronger and has overcome alcoholism.",
    "He still is a heavy drinker and can't control it.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7603, 0.0849],
#         [0.7603, 1.0000, 0.0794],
#         [0.0849, 0.0794, 1.0000]])
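
For semantic search, the same embeddings can be ranked against a query by cosine similarity. The sketch below uses small hand-written stand-in vectors so that it runs without downloading the model; the helper `rank_by_cosine` is a hypothetical name, and in practice the inputs would be arrays returned by `model.encode(...)`.

```python
import numpy as np

def rank_by_cosine(query_vec, corpus_vecs, top_k=2):
    """Return (index, score) pairs for the top_k corpus rows most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                          # cosine similarity per corpus row
    order = np.argsort(-scores)[:top_k]     # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Stand-in vectors (assumption: real usage would pass model.encode(...) outputs)
corpus = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([1.0, 0.1])

hits = rank_by_cosine(query, corpus, top_k=2)
for index, score in hits:
    print(index, round(score, 4))
```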

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.95
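
As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts the fraction of triplets whose anchor is closer (by cosine similarity) to the positive than to the negative. This is a simplified reading of the metric; the function name and toy vectors are assumptions, and the evaluator used for this model may apply additional options such as a margin.

```python
import numpy as np

def cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets with cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        return (a * b).sum(axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        )
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Four toy triplets; the third one is deliberately ranked the wrong way round.
anchors = np.array([[1.0, 0.0]] * 4)
positives = np.array([[1.0, 0.1], [1.0, 0.2], [0.0, 1.0], [1.0, 0.0]])
negatives = np.array([[0.0, 1.0], [0.5, 1.0], [1.0, 0.0], [-1.0, 0.0]])

print(cosine_accuracy(anchors, positives, negatives))  # 0.75
```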

Training Details

Training Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 556,850 training samples
  • Columns: anchor, positive, negative, and label
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 5 tokens, mean 19.16 tokens, max 194 tokens
    • positive (string): min 5 tokens, mean 11.86 tokens, max 32 tokens
    • negative (string): min 5 tokens, mean 12.23 tokens, max 37 tokens
    • label (int): 1: 100.00%
  • Samples:
    anchor | positive | negative | label
    a young girl wearing blue smiles. | A little girl wears blue. | A little girl frowns as she wears an ugly burlap sack. | 1
    An old man wearing a tan jacket and blue pants standing on a sidewalk with a small suitcase. | A man wearing a jacket and jeans holds a suitcase. | A young woman sits on a bench holding her purse. | 1
    The people are inside. | Two people are dancing by a red couch. | People walk up and down the steps in front of a church. | 1
  • Loss: custom_loss.ContradictionMarginLoss with these parameters:
    {
        "margin_neutral": 0.2,
        "margin_contradiction": 0.4
    }
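
The source of custom_loss.ContradictionMarginLoss is not included in this card, so its exact formulation is unknown. Purely as a hypothetical illustration of how two margins of this kind could enter a triplet-style objective, the sketch below applies a hinge on the cosine-similarity gap with a label-dependent margin. The mapping of label 1 to the contradiction margin, the function names, and the hinge form are all assumptions, not the author's implementation.

```python
import numpy as np

def cosine(a, b):
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def label_margin_triplet_loss(anchor, positive, negative, label,
                              margin_neutral=0.2, margin_contradiction=0.4):
    """HYPOTHETICAL hinge loss on cosine similarities with a per-sample margin."""
    # ASSUMPTION: label 1 marks a contradiction negative, any other label a neutral one.
    margin = np.where(label == 1, margin_contradiction, margin_neutral)
    gap = cosine(anchor, positive) - cosine(anchor, negative)
    # Penalize triplets whose similarity gap is smaller than the required margin.
    return float(np.mean(np.maximum(0.0, margin - gap)))

anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[1.0, 0.0], [1.0, 0.0]])
negative = np.array([[0.0, 1.0], [0.8, 0.6]])   # second negative is too close
label = np.array([1, 1])

loss = label_margin_triplet_loss(anchor, positive, negative, label)
print(round(loss, 4))
```

Under these assumptions, the first triplet already exceeds the 0.4 margin and contributes nothing, while the second has a gap of 0.2 and is penalized for the remaining 0.2.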
    

Evaluation Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 1,000 evaluation samples
  • Columns: anchor, positive, negative, and label
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 5 tokens, mean 18.67 tokens, max 86 tokens
    • positive (string): min 4 tokens, mean 11.92 tokens, max 41 tokens
    • negative (string): min 4 tokens, mean 12.13 tokens, max 40 tokens
    • label (int): 1: 100.00%
  • Samples:
    anchor | positive | negative | label
    An older man riding a bike. | An elderly man is biking | an old man is sleeping | 1
    The man is on a skateboard. | A shirtless man is doing a skateboard trick over a bike rail. | A man performs a bike trick on a ramp. | 1
    The Episcopalians are all going to hell. | The Episcopalians will not be going to heaven. | All Episcopalians will go to heaven. | 1
  • Loss: custom_loss.ContradictionMarginLoss with these parameters:
    {
        "margin_neutral": 0.2,
        "margin_contradiction": 0.4
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss contra_eval_cosine_accuracy
0.0001 1 0.2363 - -
0.0057 50 0.1877 - -
0.0115 100 0.1786 - -
0.0172 150 0.1672 - -
0.0230 200 0.1529 - -
0.0287 250 0.1392 - -
0.0345 300 0.1278 - -
0.0402 350 0.1233 - -
0.0460 400 0.1157 - -
0.0517 450 0.1116 - -
0.0575 500 0.1063 0.0983 0.9260
0.0632 550 0.1087 - -
0.0690 600 0.1016 - -
0.0747 650 0.1026 - -
0.0805 700 0.0967 - -
0.0862 750 0.0990 - -
0.0919 800 0.0925 - -
0.0977 850 0.0965 - -
0.1034 900 0.0981 - -
0.1092 950 0.0881 - -
0.1149 1000 0.0920 0.0829 0.9410
0.1207 1050 0.0882 - -
0.1264 1100 0.0839 - -
0.1322 1150 0.0896 - -
0.1379 1200 0.0858 - -
0.1437 1250 0.0878 - -
0.1494 1300 0.0857 - -
0.1552 1350 0.0902 - -
0.1609 1400 0.0793 - -
0.1666 1450 0.0830 - -
0.1724 1500 0.0827 0.0788 0.9380
0.1781 1550 0.0789 - -
0.1839 1600 0.0834 - -
0.1896 1650 0.0805 - -
0.1954 1700 0.0795 - -
0.2011 1750 0.0846 - -
0.2069 1800 0.0822 - -
0.2126 1850 0.0858 - -
0.2184 1900 0.0785 - -
0.2241 1950 0.0777 - -
0.2299 2000 0.0746 0.0721 0.9460
0.2356 2050 0.0798 - -
0.2414 2100 0.0798 - -
0.2471 2150 0.0794 - -
0.2528 2200 0.0769 - -
0.2586 2250 0.0805 - -
0.2643 2300 0.0782 - -
0.2701 2350 0.0776 - -
0.2758 2400 0.0776 - -
0.2816 2450 0.0733 - -
0.2873 2500 0.0750 0.0718 0.9440
0.2931 2550 0.0764 - -
0.2988 2600 0.0775 - -
0.3046 2650 0.0767 - -
0.3103 2700 0.0766 - -
0.3161 2750 0.0755 - -
0.3218 2800 0.0752 - -
0.3275 2850 0.0717 - -
0.3333 2900 0.0714 - -
0.3390 2950 0.0726 - -
0.3448 3000 0.0751 0.0695 0.9470
0.3505 3050 0.0730 - -
0.3563 3100 0.0733 - -
0.3620 3150 0.0738 - -
0.3678 3200 0.0701 - -
0.3735 3250 0.0723 - -
0.3793 3300 0.0759 - -
0.3850 3350 0.0675 - -
0.3908 3400 0.0696 - -
0.3965 3450 0.0707 - -
0.4023 3500 0.0705 0.0669 0.9440
0.4080 3550 0.0702 - -
0.4137 3600 0.0716 - -
0.4195 3650 0.0697 - -
0.4252 3700 0.0721 - -
0.4310 3750 0.0723 - -
0.4367 3800 0.0741 - -
0.4425 3850 0.0702 - -
0.4482 3900 0.0653 - -
0.4540 3950 0.0704 - -
0.4597 4000 0.0718 0.0652 0.9450
0.4655 4050 0.0683 - -
0.4712 4100 0.0719 - -
0.4770 4150 0.0674 - -
0.4827 4200 0.0659 - -
0.4884 4250 0.0735 - -
0.4942 4300 0.0737 - -
0.4999 4350 0.0707 - -
0.5057 4400 0.0690 - -
0.5114 4450 0.0707 - -
0.5172 4500 0.0696 0.0637 0.9470
0.5229 4550 0.0686 - -
0.5287 4600 0.0710 - -
0.5344 4650 0.0681 - -
0.5402 4700 0.0667 - -
0.5459 4750 0.0673 - -
0.5517 4800 0.0618 - -
0.5574 4850 0.0715 - -
0.5632 4900 0.0703 - -
0.5689 4950 0.0675 - -
0.5746 5000 0.0715 0.0638 0.9500
0.5804 5050 0.0681 - -
0.5861 5100 0.0628 - -
0.5919 5150 0.0654 - -
0.5976 5200 0.0662 - -
0.6034 5250 0.0626 - -
0.6091 5300 0.0660 - -
0.6149 5350 0.0652 - -
0.6206 5400 0.0687 - -
0.6264 5450 0.0677 - -
0.6321 5500 0.0683 0.0631 0.9530
0.6379 5550 0.0666 - -
0.6436 5600 0.0663 - -
0.6494 5650 0.0637 - -
0.6551 5700 0.0687 - -
0.6608 5750 0.0620 - -
0.6666 5800 0.0664 - -
0.6723 5850 0.0666 - -
0.6781 5900 0.0632 - -
0.6838 5950 0.0676 - -
0.6896 6000 0.0638 0.0634 0.9530
0.6953 6050 0.0655 - -
0.7011 6100 0.0651 - -
0.7068 6150 0.0675 - -
0.7126 6200 0.0685 - -
0.7183 6250 0.0647 - -
0.7241 6300 0.0609 - -
0.7298 6350 0.0643 - -
0.7355 6400 0.0628 - -
0.7413 6450 0.0627 - -
0.747 6500 0.0639 0.0621 0.954
0.7528 6550 0.0658 - -
0.7585 6600 0.0667 - -
0.7643 6650 0.0632 - -
0.7700 6700 0.0616 - -
0.7758 6750 0.0666 - -
0.7815 6800 0.0634 - -
0.7873 6850 0.0647 - -
0.7930 6900 0.0644 - -
0.7988 6950 0.0617 - -
0.8045 7000 0.0677 0.0626 0.9510
0.8103 7050 0.0616 - -
0.8160 7100 0.0633 - -
0.8217 7150 0.0645 - -
0.8275 7200 0.0656 - -
0.8332 7250 0.0597 - -
0.8390 7300 0.0670 - -
0.8447 7350 0.0638 - -
0.8505 7400 0.0641 - -
0.8562 7450 0.0660 - -
0.8620 7500 0.0687 0.0618 0.9490
0.8677 7550 0.0654 - -
0.8735 7600 0.0633 - -
0.8792 7650 0.0660 - -
0.8850 7700 0.0674 - -
0.8907 7750 0.0681 - -
0.8964 7800 0.0601 - -
0.9022 7850 0.0612 - -
0.9079 7900 0.0626 - -
0.9137 7950 0.0641 - -
0.9194 8000 0.0633 0.0619 0.9470
0.9252 8050 0.0637 - -
0.9309 8100 0.0630 - -
0.9367 8150 0.0646 - -
0.9424 8200 0.0648 - -
0.9482 8250 0.0647 - -
0.9539 8300 0.0601 - -
0.9597 8350 0.0600 - -
0.9654 8400 0.0668 - -
0.9712 8450 0.0640 - -
0.9769 8500 0.0579 0.0618 0.9500
0.9826 8550 0.0645 - -
0.9884 8600 0.0614 - -
0.9941 8650 0.0642 - -
0.9999 8700 0.0652 - -
  • The row at step 6500 denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.2
  • Transformers: 5.0.0
  • PyTorch: 2.9.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}