SentenceTransformer based on VinitT/Embeddings-Trivia
This is a sentence-transformers model finetuned from VinitT/Embeddings-Trivia on the all-nli dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: VinitT/Embeddings-Trivia
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: all-nli
- Language: en
Model Sources
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
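The Pooling module averages the token embeddings into a single 384-dimensional sentence vector, which the Normalize module then scales to unit length. A minimal stdlib sketch of that mean-pooling-plus-normalize step (toy 3-dimensional vectors for readability; the real module also respects the attention mask to skip padding tokens):

```python
import math

def mean_pool_and_normalize(token_embeddings):
    """Average the token vectors, then L2-normalize the result
    (what the Pooling + Normalize modules do, ignoring masking)."""
    dim = len(token_embeddings[0])
    pooled = [sum(tok[d] for tok in token_embeddings) / len(token_embeddings)
              for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled]

# Toy "token embeddings" for a 2-token sentence (hypothetical values;
# the real model produces 384-dimensional vectors)
tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
sentence_vec = mean_pool_and_normalize(tokens)
print(sentence_vec)  # a unit-length vector
```

Because the final vector has unit length, cosine similarity between two sentence embeddings reduces to a plain dot product.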
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace with the actual model ID)
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    'so he has overcome alcoholism at this point',
    "He's gotten stronger and has overcome alcoholism.",
    "He still is a heavy drinker and can't control it.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores between all pairs of embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor of shape (3, 3)
```
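The similarity call above uses cosine similarity (the similarity function listed in the model description). A minimal stdlib sketch of the same computation, with toy 2-dimensional vectors standing in for real embeddings:

```python
def cosine_similarity(a, b):
    """Cosine similarity; for unit-length vectors this is just the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for model.encode output (hypothetical values)
emb = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
similarities = [[cosine_similarity(a, b) for b in emb] for a in emb]
print(similarities[0])  # [1.0, 0.8, 0.0]
```

Each row of the resulting matrix scores one sentence against all others; the diagonal is always 1.0 since every sentence is identical to itself.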
Evaluation
Metrics
Triplet
| Metric          | Value |
|:----------------|:------|
| cosine_accuracy | 0.95  |
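Triplet cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is more similar to the positive than to the negative. A minimal sketch over toy unit-length vectors (hypothetical data, not the actual evaluation set):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triplet_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative).
    Vectors are assumed unit-length, so cosine similarity is the dot product."""
    correct = sum(1 for anchor, pos, neg in triplets
                  if dot(anchor, pos) > dot(anchor, neg))
    return correct / len(triplets)

# Toy triplets: (anchor, positive, negative)
triplets = [
    ([1.0, 0.0], [0.8, 0.6], [0.0, 1.0]),  # correct:   0.80 > 0.00
    ([0.0, 1.0], [0.6, 0.8], [1.0, 0.0]),  # correct:   0.80 > 0.00
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.6]),  # incorrect: 0.00 < 0.80
    ([0.6, 0.8], [0.8, 0.6], [0.0, 1.0]),  # correct:   0.96 > 0.80
]
print(triplet_accuracy(triplets))  # 0.75
```

A score of 0.95 therefore means the model ranks the paraphrase above the contradiction for 95% of the evaluation triplets.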
Training Details
Training Dataset
all-nli
Evaluation Dataset
all-nli
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 1
- warmup_ratio: 0.1
- warmup_steps: 0.1
- fp16: True
- load_best_model_at_end: True
All Hyperparameters
Click to expand
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0.1
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: True
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
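With warmup_ratio: 0.1 and lr_scheduler_type: linear, the learning rate climbs from 0 to the peak of 2e-05 over the first 10% of training and then decays linearly to 0. A minimal sketch of that schedule shape; the 8,700-step total is taken from the training logs below, and exact Trainer rounding may differ:

```python
def linear_schedule_with_warmup(step, total_steps, warmup_ratio=0.1, peak_lr=2e-05):
    """Linear warmup to peak_lr, then linear decay back to 0
    (the shape of the 'linear' scheduler used here)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 8700  # optimizer steps for one epoch at batch size 64 (see the training logs)
print(linear_schedule_with_warmup(0, total))      # 0.0
print(linear_schedule_with_warmup(870, total))    # 2e-05 (peak, at the end of warmup)
print(linear_schedule_with_warmup(total, total))  # 0.0
```

Note that the configuration also lists warmup_steps: 0.1; with warmup_ratio set, the ratio-based warmup of roughly 870 steps is what the schedule above assumes.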
Training Logs
Click to expand
| Epoch | Step | Training Loss | Validation Loss | contra_eval_cosine_accuracy |
|:-----:|:----:|:-------------:|:---------------:|:---------------------------:|
| 0.0001 | 1 | 0.2363 | - | - |
| 0.0057 | 50 | 0.1877 | - | - |
| 0.0115 | 100 | 0.1786 | - | - |
| 0.0172 | 150 | 0.1672 | - | - |
| 0.0230 | 200 | 0.1529 | - | - |
| 0.0287 | 250 | 0.1392 | - | - |
| 0.0345 | 300 | 0.1278 | - | - |
| 0.0402 | 350 | 0.1233 | - | - |
| 0.0460 | 400 | 0.1157 | - | - |
| 0.0517 | 450 | 0.1116 | - | - |
| 0.0575 | 500 | 0.1063 | 0.0983 | 0.9260 |
| 0.0632 | 550 | 0.1087 | - | - |
| 0.0690 | 600 | 0.1016 | - | - |
| 0.0747 | 650 | 0.1026 | - | - |
| 0.0805 | 700 | 0.0967 | - | - |
| 0.0862 | 750 | 0.0990 | - | - |
| 0.0919 | 800 | 0.0925 | - | - |
| 0.0977 | 850 | 0.0965 | - | - |
| 0.1034 | 900 | 0.0981 | - | - |
| 0.1092 | 950 | 0.0881 | - | - |
| 0.1149 | 1000 | 0.0920 | 0.0829 | 0.9410 |
| 0.1207 | 1050 | 0.0882 | - | - |
| 0.1264 | 1100 | 0.0839 | - | - |
| 0.1322 | 1150 | 0.0896 | - | - |
| 0.1379 | 1200 | 0.0858 | - | - |
| 0.1437 | 1250 | 0.0878 | - | - |
| 0.1494 | 1300 | 0.0857 | - | - |
| 0.1552 | 1350 | 0.0902 | - | - |
| 0.1609 | 1400 | 0.0793 | - | - |
| 0.1666 | 1450 | 0.0830 | - | - |
| 0.1724 | 1500 | 0.0827 | 0.0788 | 0.9380 |
| 0.1781 | 1550 | 0.0789 | - | - |
| 0.1839 | 1600 | 0.0834 | - | - |
| 0.1896 | 1650 | 0.0805 | - | - |
| 0.1954 | 1700 | 0.0795 | - | - |
| 0.2011 | 1750 | 0.0846 | - | - |
| 0.2069 | 1800 | 0.0822 | - | - |
| 0.2126 | 1850 | 0.0858 | - | - |
| 0.2184 | 1900 | 0.0785 | - | - |
| 0.2241 | 1950 | 0.0777 | - | - |
| 0.2299 | 2000 | 0.0746 | 0.0721 | 0.9460 |
| 0.2356 | 2050 | 0.0798 | - | - |
| 0.2414 | 2100 | 0.0798 | - | - |
| 0.2471 | 2150 | 0.0794 | - | - |
| 0.2528 | 2200 | 0.0769 | - | - |
| 0.2586 | 2250 | 0.0805 | - | - |
| 0.2643 | 2300 | 0.0782 | - | - |
| 0.2701 | 2350 | 0.0776 | - | - |
| 0.2758 | 2400 | 0.0776 | - | - |
| 0.2816 | 2450 | 0.0733 | - | - |
| 0.2873 | 2500 | 0.0750 | 0.0718 | 0.9440 |
| 0.2931 | 2550 | 0.0764 | - | - |
| 0.2988 | 2600 | 0.0775 | - | - |
| 0.3046 | 2650 | 0.0767 | - | - |
| 0.3103 | 2700 | 0.0766 | - | - |
| 0.3161 | 2750 | 0.0755 | - | - |
| 0.3218 | 2800 | 0.0752 | - | - |
| 0.3275 | 2850 | 0.0717 | - | - |
| 0.3333 | 2900 | 0.0714 | - | - |
| 0.3390 | 2950 | 0.0726 | - | - |
| 0.3448 | 3000 | 0.0751 | 0.0695 | 0.9470 |
| 0.3505 | 3050 | 0.0730 | - | - |
| 0.3563 | 3100 | 0.0733 | - | - |
| 0.3620 | 3150 | 0.0738 | - | - |
| 0.3678 | 3200 | 0.0701 | - | - |
| 0.3735 | 3250 | 0.0723 | - | - |
| 0.3793 | 3300 | 0.0759 | - | - |
| 0.3850 | 3350 | 0.0675 | - | - |
| 0.3908 | 3400 | 0.0696 | - | - |
| 0.3965 | 3450 | 0.0707 | - | - |
| 0.4023 | 3500 | 0.0705 | 0.0669 | 0.9440 |
| 0.4080 | 3550 | 0.0702 | - | - |
| 0.4137 | 3600 | 0.0716 | - | - |
| 0.4195 | 3650 | 0.0697 | - | - |
| 0.4252 | 3700 | 0.0721 | - | - |
| 0.4310 | 3750 | 0.0723 | - | - |
| 0.4367 | 3800 | 0.0741 | - | - |
| 0.4425 | 3850 | 0.0702 | - | - |
| 0.4482 | 3900 | 0.0653 | - | - |
| 0.4540 | 3950 | 0.0704 | - | - |
| 0.4597 | 4000 | 0.0718 | 0.0652 | 0.9450 |
| 0.4655 | 4050 | 0.0683 | - | - |
| 0.4712 | 4100 | 0.0719 | - | - |
| 0.4770 | 4150 | 0.0674 | - | - |
| 0.4827 | 4200 | 0.0659 | - | - |
| 0.4884 | 4250 | 0.0735 | - | - |
| 0.4942 | 4300 | 0.0737 | - | - |
| 0.4999 | 4350 | 0.0707 | - | - |
| 0.5057 | 4400 | 0.0690 | - | - |
| 0.5114 | 4450 | 0.0707 | - | - |
| 0.5172 | 4500 | 0.0696 | 0.0637 | 0.9470 |
| 0.5229 | 4550 | 0.0686 | - | - |
| 0.5287 | 4600 | 0.0710 | - | - |
| 0.5344 | 4650 | 0.0681 | - | - |
| 0.5402 | 4700 | 0.0667 | - | - |
| 0.5459 | 4750 | 0.0673 | - | - |
| 0.5517 | 4800 | 0.0618 | - | - |
| 0.5574 | 4850 | 0.0715 | - | - |
| 0.5632 | 4900 | 0.0703 | - | - |
| 0.5689 | 4950 | 0.0675 | - | - |
| 0.5746 | 5000 | 0.0715 | 0.0638 | 0.9500 |
| 0.5804 | 5050 | 0.0681 | - | - |
| 0.5861 | 5100 | 0.0628 | - | - |
| 0.5919 | 5150 | 0.0654 | - | - |
| 0.5976 | 5200 | 0.0662 | - | - |
| 0.6034 | 5250 | 0.0626 | - | - |
| 0.6091 | 5300 | 0.0660 | - | - |
| 0.6149 | 5350 | 0.0652 | - | - |
| 0.6206 | 5400 | 0.0687 | - | - |
| 0.6264 | 5450 | 0.0677 | - | - |
| 0.6321 | 5500 | 0.0683 | 0.0631 | 0.9530 |
| 0.6379 | 5550 | 0.0666 | - | - |
| 0.6436 | 5600 | 0.0663 | - | - |
| 0.6494 | 5650 | 0.0637 | - | - |
| 0.6551 | 5700 | 0.0687 | - | - |
| 0.6608 | 5750 | 0.0620 | - | - |
| 0.6666 | 5800 | 0.0664 | - | - |
| 0.6723 | 5850 | 0.0666 | - | - |
| 0.6781 | 5900 | 0.0632 | - | - |
| 0.6838 | 5950 | 0.0676 | - | - |
| 0.6896 | 6000 | 0.0638 | 0.0634 | 0.9530 |
| 0.6953 | 6050 | 0.0655 | - | - |
| 0.7011 | 6100 | 0.0651 | - | - |
| 0.7068 | 6150 | 0.0675 | - | - |
| 0.7126 | 6200 | 0.0685 | - | - |
| 0.7183 | 6250 | 0.0647 | - | - |
| 0.7241 | 6300 | 0.0609 | - | - |
| 0.7298 | 6350 | 0.0643 | - | - |
| 0.7355 | 6400 | 0.0628 | - | - |
| 0.7413 | 6450 | 0.0627 | - | - |
| **0.7470** | **6500** | **0.0639** | **0.0621** | **0.9540** |
| 0.7528 | 6550 | 0.0658 | - | - |
| 0.7585 | 6600 | 0.0667 | - | - |
| 0.7643 | 6650 | 0.0632 | - | - |
| 0.7700 | 6700 | 0.0616 | - | - |
| 0.7758 | 6750 | 0.0666 | - | - |
| 0.7815 | 6800 | 0.0634 | - | - |
| 0.7873 | 6850 | 0.0647 | - | - |
| 0.7930 | 6900 | 0.0644 | - | - |
| 0.7988 | 6950 | 0.0617 | - | - |
| 0.8045 | 7000 | 0.0677 | 0.0626 | 0.9510 |
| 0.8103 | 7050 | 0.0616 | - | - |
| 0.8160 | 7100 | 0.0633 | - | - |
| 0.8217 | 7150 | 0.0645 | - | - |
| 0.8275 | 7200 | 0.0656 | - | - |
| 0.8332 | 7250 | 0.0597 | - | - |
| 0.8390 | 7300 | 0.0670 | - | - |
| 0.8447 | 7350 | 0.0638 | - | - |
| 0.8505 | 7400 | 0.0641 | - | - |
| 0.8562 | 7450 | 0.0660 | - | - |
| 0.8620 | 7500 | 0.0687 | 0.0618 | 0.9490 |
| 0.8677 | 7550 | 0.0654 | - | - |
| 0.8735 | 7600 | 0.0633 | - | - |
| 0.8792 | 7650 | 0.0660 | - | - |
| 0.8850 | 7700 | 0.0674 | - | - |
| 0.8907 | 7750 | 0.0681 | - | - |
| 0.8964 | 7800 | 0.0601 | - | - |
| 0.9022 | 7850 | 0.0612 | - | - |
| 0.9079 | 7900 | 0.0626 | - | - |
| 0.9137 | 7950 | 0.0641 | - | - |
| 0.9194 | 8000 | 0.0633 | 0.0619 | 0.9470 |
| 0.9252 | 8050 | 0.0637 | - | - |
| 0.9309 | 8100 | 0.0630 | - | - |
| 0.9367 | 8150 | 0.0646 | - | - |
| 0.9424 | 8200 | 0.0648 | - | - |
| 0.9482 | 8250 | 0.0647 | - | - |
| 0.9539 | 8300 | 0.0601 | - | - |
| 0.9597 | 8350 | 0.0600 | - | - |
| 0.9654 | 8400 | 0.0668 | - | - |
| 0.9712 | 8450 | 0.0640 | - | - |
| 0.9769 | 8500 | 0.0579 | 0.0618 | 0.9500 |
| 0.9826 | 8550 | 0.0645 | - | - |
| 0.9884 | 8600 | 0.0614 | - | - |
| 0.9941 | 8650 | 0.0642 | - | - |
| 0.9999 | 8700 | 0.0652 | - | - |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.2.2
- Transformers: 5.0.0
- PyTorch: 2.9.0+cu128
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```