SentenceTransformer based on VinitT/Embeddings-Trivia

This is a sentence-transformers model finetuned from VinitT/Embeddings-Trivia on the all-nli dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: VinitT/Embeddings-Trivia
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- all-nli
Language: en

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
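In the architecture above, the Pooling module has pooling_mode_mean_tokens set to True and is followed by Normalize. Each sentence embedding is therefore the average of its non-padding token vectors, scaled to unit length. The following is a minimal pure-Python sketch of those two steps. It uses tiny made-up 4-dimensional token vectors in place of real BertModel outputs, and the function names are illustrative, not part of the library:

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1, so padding tokens are ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [x / count for x in total]


def l2_normalize(vec: Vector, eps: float = 1e-12) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


# Toy input: 3 tokens with 4-dim embeddings (the real model uses 384 dims).
# The last token is padding (mask 0), so it must not affect the result.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 4.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)  # approximately [0.667, 0.333, 0.667, 0.0]
```

Because every output vector has unit length, the cosine similarity reported by model.similarity equals the plain dot product of two embeddings.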

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")  # placeholder: replace with this model's Hub ID
# Run inference
sentences = [
    'so he has overcome alcoholism at this point',
    "He's gotten stronger and has overcome alcoholism.",
    "He still is a heavy drinker and can't control it.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7603, 0.0849],
#         [0.7603, 1.0000, 0.0794],
#         [0.0849, 0.0794, 1.0000]])
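The similarity matrix can be read directly for paraphrase mining: for each sentence, take the highest-scoring other sentence. The sketch below hard-codes the printed values above so that it runs without downloading the model. With a live model, you would pass similarities.tolist() instead. The helper name is hypothetical:

```python
from typing import List

sentences = [
    "so he has overcome alcoholism at this point",
    "He's gotten stronger and has overcome alcoholism.",
    "He still is a heavy drinker and can't control it.",
]
# Values copied from the printed output above (rounded to 4 decimals).
similarities = [
    [1.0000, 0.7603, 0.0849],
    [0.7603, 1.0000, 0.0794],
    [0.0849, 0.0794, 1.0000],
]


def nearest_neighbour(sim: List[List[float]], i: int) -> int:
    """Index of the most similar *other* sentence to sentence i (the diagonal is skipped)."""
    candidates = [j for j in range(len(sim)) if j != i]
    return max(candidates, key=lambda j: sim[i][j])


for i, text in enumerate(sentences):
    j = nearest_neighbour(similarities, i)
    print(f"{text!r} -> {sentences[j]!r} ({similarities[i][j]:.4f})")
```

The entailed paraphrase scores 0.7603 against the first sentence, while the contradiction scores only 0.0849. This separation is what the contradiction-aware training targets.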

Evaluation

Metrics

Triplet

Dataset: contra_eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.95
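TripletEvaluator's cosine_accuracy is, roughly, the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative by cosine similarity. On contra_eval, 0.95 means the positive won in 95% of triplets. The following is a minimal pure-Python sketch of that computation on toy 2-dimensional vectors. The function names are illustrative, not the library's internals:

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)


# Toy embeddings: the first triplet is ranked correctly, the second is not.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.2, 0.9], [1.0, 0.1]),
]
print(triplet_cosine_accuracy(triplets))  # 0.5
```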

Training Details

Training Dataset

all-nli

Dataset: all-nli at d482672
Size: 556,850 training samples
Columns: anchor, positive, negative, and label

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative	label
type	string	string	string	int
min	5 tokens	5 tokens	5 tokens	-
mean	19.16 tokens	11.86 tokens	12.23 tokens	-
max	194 tokens	32 tokens	37 tokens	-
values	-	-	-	1: 100.00%

Samples:

anchor	positive	negative	label
`a young girl wearing blue smiles.`	`A little girl wears blue.`	`A little girl frowns as she wears an ugly burlap sack.`	`1`
`An old man wearing a tan jacket and blue pants standing on a sidewalk with a small suitcase.`	`A man wearing a jacket and jeans holds a suitcase.`	`A young woman sits on a bench holding her purse.`	`1`
`The people are inside.`	`Two people are dancing by a red couch.`	`People walk up and down the steps in front of a church.`	`1`

Loss: custom_loss.ContradictionMarginLoss with these parameters:

{
    "margin_neutral": 0.2,
    "margin_contradiction": 0.4
}
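The card does not include the source of custom_loss.ContradictionMarginLoss, and it does not say how margin_neutral and margin_contradiction are selected. Purely as an illustration, assume a hinge-style cosine triplet loss that is zero once the positive beats the negative by at least the margin. The sketch below is that generic technique, not the author's implementation. The function and constant names are hypothetical, and the margin is simply passed in as an argument:

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hypothetical hinge loss: max(0, margin - (cos(a, p) - cos(a, n)))."""
    gap = cosine(anchor, positive) - cosine(anchor, negative)
    return max(0.0, margin - gap)


# Margin values from the configuration above; which one applies to which example is an assumption left open here.
MARGIN_NEUTRAL = 0.2
MARGIN_CONTRADICTION = 0.4

anchor = [1.0, 0.0]
positive = [0.8, 0.6]  # cos(anchor, positive) = 0.8
negative = [0.6, 0.8]  # cos(anchor, negative) = 0.6, so the gap is 0.2

losses = {m: triplet_margin_loss(anchor, positive, negative, m) for m in (MARGIN_NEUTRAL, MARGIN_CONTRADICTION)}
print(losses)  # roughly {0.2: 0.0, 0.4: 0.2}
```

With a gap of 0.2, the smaller margin is already satisfied, while the larger margin still produces a loss of about 0.2. A larger margin pushes contradicting negatives further from the anchor.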

Evaluation Dataset

all-nli

Dataset: all-nli at d482672
Size: 1,000 evaluation samples
Columns: anchor, positive, negative, and label

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative	label
type	string	string	string	int
min	5 tokens	4 tokens	4 tokens	-
mean	18.67 tokens	11.92 tokens	12.13 tokens	-
max	86 tokens	41 tokens	40 tokens	-
values	-	-	-	1: 100.00%

Samples:

anchor	positive	negative	label
`An older man riding a bike.`	`An elderly man is biking`	`an old man is sleeping`	`1`
`The man is on a skateboard.`	`A shirtless man is doing a skateboard trick over a bike rail.`	`A man performs a bike trick on a ramp.`	`1`
`The Episcopalians are all going to hell.`	`The Episcopalians will not be going to heaven.`	`All Episcopalians will go to heaven.`	`1`

Loss: custom_loss.ContradictionMarginLoss with these parameters:

{
    "margin_neutral": 0.2,
    "margin_contradiction": 0.4
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 2e-05
weight_decay: 0.01
num_train_epochs: 1
warmup_ratio: 0.1
warmup_steps: 0.1
fp16: True
load_best_model_at_end: True
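With lr_scheduler_type linear, a peak learning rate of 2e-05, and a 10% warmup, the learning rate rises linearly from 0 to its peak and then decays linearly to 0. The sketch below reproduces that shape under stated assumptions: a single device, no gradient accumulation, one epoch over 556,850 samples with the last partial batch kept, and warmup resolved to 10% of total steps rounded up. It gives 8,701 steps, which is consistent with the log ending at step 8700 at epoch 0.9999. The function name is illustrative:

```python
import math

NUM_SAMPLES = 556_850  # training samples (from "Size" above)
BATCH_SIZE = 64        # per_device_train_batch_size, assuming a single device
PEAK_LR = 2e-5         # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # one epoch, dataloader_drop_last=False
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)  # 8701 871
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(total_steps))
```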

All Hyperparameters


do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0.1
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: True
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs


Epoch	Step	Training Loss	Validation Loss	contra_eval_cosine_accuracy
0.0001	1	0.2363	-	-
0.0057	50	0.1877	-	-
0.0115	100	0.1786	-	-
0.0172	150	0.1672	-	-
0.0230	200	0.1529	-	-
0.0287	250	0.1392	-	-
0.0345	300	0.1278	-	-
0.0402	350	0.1233	-	-
0.0460	400	0.1157	-	-
0.0517	450	0.1116	-	-
0.0575	500	0.1063	0.0983	0.9260
0.0632	550	0.1087	-	-
0.0690	600	0.1016	-	-
0.0747	650	0.1026	-	-
0.0805	700	0.0967	-	-
0.0862	750	0.0990	-	-
0.0919	800	0.0925	-	-
0.0977	850	0.0965	-	-
0.1034	900	0.0981	-	-
0.1092	950	0.0881	-	-
0.1149	1000	0.0920	0.0829	0.9410
0.1207	1050	0.0882	-	-
0.1264	1100	0.0839	-	-
0.1322	1150	0.0896	-	-
0.1379	1200	0.0858	-	-
0.1437	1250	0.0878	-	-
0.1494	1300	0.0857	-	-
0.1552	1350	0.0902	-	-
0.1609	1400	0.0793	-	-
0.1666	1450	0.0830	-	-
0.1724	1500	0.0827	0.0788	0.9380
0.1781	1550	0.0789	-	-
0.1839	1600	0.0834	-	-
0.1896	1650	0.0805	-	-
0.1954	1700	0.0795	-	-
0.2011	1750	0.0846	-	-
0.2069	1800	0.0822	-	-
0.2126	1850	0.0858	-	-
0.2184	1900	0.0785	-	-
0.2241	1950	0.0777	-	-
0.2299	2000	0.0746	0.0721	0.9460
0.2356	2050	0.0798	-	-
0.2414	2100	0.0798	-	-
0.2471	2150	0.0794	-	-
0.2528	2200	0.0769	-	-
0.2586	2250	0.0805	-	-
0.2643	2300	0.0782	-	-
0.2701	2350	0.0776	-	-
0.2758	2400	0.0776	-	-
0.2816	2450	0.0733	-	-
0.2873	2500	0.0750	0.0718	0.9440
0.2931	2550	0.0764	-	-
0.2988	2600	0.0775	-	-
0.3046	2650	0.0767	-	-
0.3103	2700	0.0766	-	-
0.3161	2750	0.0755	-	-
0.3218	2800	0.0752	-	-
0.3275	2850	0.0717	-	-
0.3333	2900	0.0714	-	-
0.3390	2950	0.0726	-	-
0.3448	3000	0.0751	0.0695	0.9470
0.3505	3050	0.0730	-	-
0.3563	3100	0.0733	-	-
0.3620	3150	0.0738	-	-
0.3678	3200	0.0701	-	-
0.3735	3250	0.0723	-	-
0.3793	3300	0.0759	-	-
0.3850	3350	0.0675	-	-
0.3908	3400	0.0696	-	-
0.3965	3450	0.0707	-	-
0.4023	3500	0.0705	0.0669	0.9440
0.4080	3550	0.0702	-	-
0.4137	3600	0.0716	-	-
0.4195	3650	0.0697	-	-
0.4252	3700	0.0721	-	-
0.4310	3750	0.0723	-	-
0.4367	3800	0.0741	-	-
0.4425	3850	0.0702	-	-
0.4482	3900	0.0653	-	-
0.4540	3950	0.0704	-	-
0.4597	4000	0.0718	0.0652	0.9450
0.4655	4050	0.0683	-	-
0.4712	4100	0.0719	-	-
0.4770	4150	0.0674	-	-
0.4827	4200	0.0659	-	-
0.4884	4250	0.0735	-	-
0.4942	4300	0.0737	-	-
0.4999	4350	0.0707	-	-
0.5057	4400	0.0690	-	-
0.5114	4450	0.0707	-	-
0.5172	4500	0.0696	0.0637	0.9470
0.5229	4550	0.0686	-	-
0.5287	4600	0.0710	-	-
0.5344	4650	0.0681	-	-
0.5402	4700	0.0667	-	-
0.5459	4750	0.0673	-	-
0.5517	4800	0.0618	-	-
0.5574	4850	0.0715	-	-
0.5632	4900	0.0703	-	-
0.5689	4950	0.0675	-	-
0.5746	5000	0.0715	0.0638	0.9500
0.5804	5050	0.0681	-	-
0.5861	5100	0.0628	-	-
0.5919	5150	0.0654	-	-
0.5976	5200	0.0662	-	-
0.6034	5250	0.0626	-	-
0.6091	5300	0.0660	-	-
0.6149	5350	0.0652	-	-
0.6206	5400	0.0687	-	-
0.6264	5450	0.0677	-	-
0.6321	5500	0.0683	0.0631	0.9530
0.6379	5550	0.0666	-	-
0.6436	5600	0.0663	-	-
0.6494	5650	0.0637	-	-
0.6551	5700	0.0687	-	-
0.6608	5750	0.0620	-	-
0.6666	5800	0.0664	-	-
0.6723	5850	0.0666	-	-
0.6781	5900	0.0632	-	-
0.6838	5950	0.0676	-	-
0.6896	6000	0.0638	0.0634	0.9530
0.6953	6050	0.0655	-	-
0.7011	6100	0.0651	-	-
0.7068	6150	0.0675	-	-
0.7126	6200	0.0685	-	-
0.7183	6250	0.0647	-	-
0.7241	6300	0.0609	-	-
0.7298	6350	0.0643	-	-
0.7355	6400	0.0628	-	-
0.7413	6450	0.0627	-	-
0.7470	6500	0.0639	0.0621	0.9540
0.7528	6550	0.0658	-	-
0.7585	6600	0.0667	-	-
0.7643	6650	0.0632	-	-
0.7700	6700	0.0616	-	-
0.7758	6750	0.0666	-	-
0.7815	6800	0.0634	-	-
0.7873	6850	0.0647	-	-
0.7930	6900	0.0644	-	-
0.7988	6950	0.0617	-	-
0.8045	7000	0.0677	0.0626	0.9510
0.8103	7050	0.0616	-	-
0.8160	7100	0.0633	-	-
0.8217	7150	0.0645	-	-
0.8275	7200	0.0656	-	-
0.8332	7250	0.0597	-	-
0.8390	7300	0.0670	-	-
0.8447	7350	0.0638	-	-
0.8505	7400	0.0641	-	-
0.8562	7450	0.0660	-	-
0.8620	7500	0.0687	0.0618	0.9490
0.8677	7550	0.0654	-	-
0.8735	7600	0.0633	-	-
0.8792	7650	0.0660	-	-
0.8850	7700	0.0674	-	-
0.8907	7750	0.0681	-	-
0.8964	7800	0.0601	-	-
0.9022	7850	0.0612	-	-
0.9079	7900	0.0626	-	-
0.9137	7950	0.0641	-	-
0.9194	8000	0.0633	0.0619	0.9470
0.9252	8050	0.0637	-	-
0.9309	8100	0.0630	-	-
0.9367	8150	0.0646	-	-
0.9424	8200	0.0648	-	-
0.9482	8250	0.0647	-	-
0.9539	8300	0.0601	-	-
0.9597	8350	0.0600	-	-
0.9654	8400	0.0668	-	-
0.9712	8450	0.0640	-	-
0.9769	8500	0.0579	0.0618	0.9500
0.9826	8550	0.0645	-	-
0.9884	8600	0.0614	-	-
0.9941	8650	0.0642	-	-
0.9999	8700	0.0652	-	-
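The hyperparameter list above does not show which metric load_best_model_at_end used (metric_for_best_model does not appear), so the sketch below reports both candidates from the evaluated rows of the log. The values are copied from the table. Validation loss ties at 0.0618 at the displayed precision, so min() returns the earlier step:

```python
# (step, validation loss, contra_eval_cosine_accuracy) for every evaluated row in the log above.
eval_rows = [
    (500, 0.0983, 0.9260), (1000, 0.0829, 0.9410), (1500, 0.0788, 0.9380),
    (2000, 0.0721, 0.9460), (2500, 0.0718, 0.9440), (3000, 0.0695, 0.9470),
    (3500, 0.0669, 0.9440), (4000, 0.0652, 0.9450), (4500, 0.0637, 0.9470),
    (5000, 0.0638, 0.9500), (5500, 0.0631, 0.9530), (6000, 0.0634, 0.9530),
    (6500, 0.0621, 0.9540), (7000, 0.0626, 0.9510), (7500, 0.0618, 0.9490),
    (8000, 0.0619, 0.9470), (8500, 0.0618, 0.9500),
]

best_by_loss = min(eval_rows, key=lambda row: row[1])     # lowest validation loss (first on ties)
best_by_accuracy = max(eval_rows, key=lambda row: row[2])  # highest triplet accuracy

print("lowest validation loss:", best_by_loss)
print("highest cosine accuracy:", best_by_accuracy)
```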

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.2.2
Transformers: 5.0.0
PyTorch: 2.9.0+cu128
Accelerate: 1.12.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}