SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the trivia-qa-triplet dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: trivia-qa-triplet
- Language: en
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
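The Pooling and Normalize stages listed above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the toy 2-dimensional vectors are assumptions made for illustration, standing in for the 384-dimensional token embeddings the Transformer module would produce.

```python
import math

def mean_pool_and_normalize(token_vectors, attention_mask):
    """Average the vectors of non-padding tokens, then scale to unit length."""
    # Keep only tokens whose mask entry is 1 (padding tokens are skipped).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Mean pooling: component-wise average over the kept tokens.
    pooled = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    # L2 normalization: divide by the Euclidean norm.
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled

# Toy input (hypothetical values): three token vectors, the last one is padding.
tokens = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool_and_normalize(tokens, mask)
print(sentence_vector)
```

Because the output has unit length, the cosine similarity of two such vectors reduces to their dot product.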
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
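from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("VinitT/Embeddings-Trivia")

# Example inputs: one question followed by two candidate passages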
sentences = [
'In the RAF, what is the rank immediately above Squadron Leader?',
'Squadron leader Squadron leader Squadron leader (Sqn Ldr in the RAF ; SQNLDR in the RAAF and RNZAF; formerly sometimes S/L in all services) is a commissioned rank in the Royal Air Force and the air forces of many countries which have historical British influence. It is also sometimes used as the English translation of an equivalent rank in countries which have a non-English air force-specific rank structure. An air force squadron leader ranks above flight lieutenant and immediately below wing commander and it is the most junior of the senior officer ranks. The air force rank of squadron leader has a',
'Squadron leader RAF used major as the equivalent rank to squadron leader. Royal Naval Air Service lieutenant-commanders and Royal Flying Corps majors on 31 March 1918 became RAF majors on 1 April 1918. On 31 August 1919, the RAF rank of major was superseded by squadron leader which has remained in continuous usage ever since. Promotion to squadron leader is strictly on merit, and requires the individual to be appointed to a Career Commission, which will see them remain in the RAF until retirement or voluntary resignation. Before the Second World War, a squadron leader commanded a squadron of aircraft. Today, however,',
]
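# Encode the sentences; each one becomes a 384-dimensional vector
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Pairwise cosine similarities between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)  # 3 x 3 matrix of scores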
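For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on its own, without loading the model. The helper names `cosine` and `rank_passages` and the toy 3-dimensional vectors are assumptions for illustration; in practice the vectors would come from `model.encode`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Toy vectors (hypothetical values) standing in for real embeddings.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```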
Evaluation
Metrics
Triplet
| Metric | Value |
|---|---|
| cosine_accuracy | 0.834 |
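A triplet cosine accuracy of this kind is usually read as the fraction of (anchor, positive, negative) triplets for which the anchor is more similar to the positive than to the negative. The following sketch shows that calculation under this assumption; the function names and the toy triplets are hypothetical and are not taken from the evaluation data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    triplets = list(triplets)
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Toy triplets (hypothetical values): (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),   # negative is closer
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),   # positive is closer
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # positive is closer
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```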
Training Details
Training Dataset
trivia-qa-triplet
Evaluation Dataset
trivia-qa-triplet
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 2e-05
weight_decay: 0.01
num_train_epochs: 4
warmup_ratio: 0.1
warmup_steps: 0.1
fp16: True
load_best_model_at_end: True
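To see how the learning rate, warmup ratio and linear scheduler interact, the sketch below computes the learning rate at a given step for linear warmup followed by linear decay to zero. It is a simplified standalone approximation, not the trainer's code. The function name is hypothetical, and the total of 3708 steps is an estimate inferred from the training log (3700 steps at epoch 3.9914), not a reported figure.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at the final step.
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

# ASSUMPTION: roughly 3708 optimizer steps in total (about 927 per epoch x 4 epochs).
TOTAL_STEPS = 3708
for step in (0, 100, 371, 2000, 3708):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak learning rate of 2e-05 is reached after about 371 steps, which is before the first evaluation at step 500.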
All Hyperparameters
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0.1
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: True
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | trivia_qa_eval_cosine_accuracy |
|---|---|---|---|---|
| 0.0011 | 1 | 0.7126 | - | - |
| 0.0539 | 50 | 0.7164 | - | - |
| 0.1079 | 100 | 0.7126 | - | - |
| 0.1618 | 150 | 0.6888 | - | - |
| 0.2157 | 200 | 0.6802 | - | - |
| 0.2697 | 250 | 0.6422 | - | - |
| 0.3236 | 300 | 0.6562 | - | - |
| 0.3776 | 350 | 0.6356 | - | - |
| 0.4315 | 400 | 0.6532 | - | - |
| 0.4854 | 450 | 0.6106 | - | - |
| 0.5394 | 500 | 0.6104 | 0.5472 | 0.7970 |
| 0.5933 | 550 | 0.6301 | - | - |
| 0.6472 | 600 | 0.6259 | - | - |
| 0.7012 | 650 | 0.5759 | - | - |
| 0.7551 | 700 | 0.6089 | - | - |
| 0.8091 | 750 | 0.5835 | - | - |
| 0.8630 | 800 | 0.5890 | - | - |
| 0.9169 | 850 | 0.5577 | - | - |
| 0.9709 | 900 | 0.5569 | - | - |
| 1.0248 | 950 | 0.5427 | - | - |
| 1.0787 | 1000 | 0.4698 | 0.5046 | 0.8190 |
| 1.1327 | 1050 | 0.4662 | - | - |
| 1.1866 | 1100 | 0.4634 | - | - |
| 1.2406 | 1150 | 0.4597 | - | - |
| 1.2945 | 1200 | 0.4585 | - | - |
| 1.3484 | 1250 | 0.5140 | - | - |
| 1.4024 | 1300 | 0.4542 | - | - |
| 1.4563 | 1350 | 0.4579 | - | - |
| 1.5102 | 1400 | 0.4910 | - | - |
| 1.5642 | 1450 | 0.5067 | - | - |
| 1.6181 | 1500 | 0.4800 | 0.4875 | 0.8300 |
| 1.6721 | 1550 | 0.4638 | - | - |
| 1.7260 | 1600 | 0.4760 | - | - |
| 1.7799 | 1650 | 0.4699 | - | - |
| 1.8339 | 1700 | 0.4912 | - | - |
| 1.8878 | 1750 | 0.4726 | - | - |
| 1.9417 | 1800 | 0.4764 | - | - |
| 1.9957 | 1850 | 0.4802 | - | - |
| 2.0496 | 1900 | 0.3941 | - | - |
| 2.1036 | 1950 | 0.3991 | - | - |
| 2.1575 | 2000 | 0.4114 | 0.4734 | 0.838 |
| 2.2114 | 2050 | 0.3981 | - | - |
| 2.2654 | 2100 | 0.4023 | - | - |
| 2.3193 | 2150 | 0.3932 | - | - |
| 2.3732 | 2200 | 0.3887 | - | - |
| 2.4272 | 2250 | 0.3894 | - | - |
| 2.4811 | 2300 | 0.3858 | - | - |
| 2.5351 | 2350 | 0.3907 | - | - |
| 2.5890 | 2400 | 0.3934 | - | - |
| 2.6429 | 2450 | 0.3871 | - | - |
| 2.6969 | 2500 | 0.3763 | 0.4681 | 0.8310 |
| 2.7508 | 2550 | 0.3997 | - | - |
| 2.8047 | 2600 | 0.3941 | - | - |
| 2.8587 | 2650 | 0.3884 | - | - |
| 2.9126 | 2700 | 0.3771 | - | - |
| 2.9666 | 2750 | 0.4168 | - | - |
| 3.0205 | 2800 | 0.3722 | - | - |
| 3.0744 | 2850 | 0.3565 | - | - |
| 3.1284 | 2900 | 0.3499 | - | - |
| 3.1823 | 2950 | 0.3428 | - | - |
| 3.2362 | 3000 | 0.3583 | 0.4669 | 0.8320 |
| 3.2902 | 3050 | 0.3444 | - | - |
| 3.3441 | 3100 | 0.3252 | - | - |
| 3.3981 | 3150 | 0.3563 | - | - |
| 3.4520 | 3200 | 0.3465 | - | - |
| 3.5059 | 3250 | 0.3328 | - | - |
| 3.5599 | 3300 | 0.3438 | - | - |
| 3.6138 | 3350 | 0.3330 | - | - |
| 3.6677 | 3400 | 0.3567 | - | - |
| 3.7217 | 3450 | 0.3462 | - | - |
| 3.7756 | 3500 | 0.3435 | 0.4639 | 0.8340 |
| 3.8296 | 3550 | 0.3532 | - | - |
| 3.8835 | 3600 | 0.3480 | - | - |
| 3.9374 | 3650 | 0.3361 | - | - |
| 3.9914 | 3700 | 0.3628 | - | - |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.2.2
- Transformers: 5.0.0
- PyTorch: 2.9.0+cu128
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("VinitT/Embeddings-Trivia")

sentences = [
    "Johnny Depp plays policeman Ichabod Crane in which 1999 film?",
    "Mythology in France epics and fairy tales as part of deeply embedded spiritual allegories and mythological archetypes: Mythology in France The mythologies in present-day France encompass the mythology of the Gauls, Franks, Normans, Bretons, and other peoples living in France, those ancient stories about divine or heroic beings that these particular cultures believed to be true and that often use supernatural events or characters to explain the nature of the universe and humanity. French mythology is listed for each culture. Bretons are a subset of the celtics that adopted Christianity. Celtic cosmology predominates their mythology: Gauls were another subset of Celtic people. Celtic",
    "Johnny Depp in a snuff film in exchange for money for his family. Depp was a fan and friend of writer Hunter S. Thompson, and played his alter ego Raoul Duke in \"Fear and Loathing in Las Vegas\" (1998), Terry Gilliam's film adaptation of Thompson's pseudobiographical novel of the same name. Depp's next venture with Burton was the period film \"Sleepy Hollow\" (1999), in which he played Ichabod Crane opposite Christina Ricci and Christopher Walken. For his performance, Depp took inspiration from Angela Lansbury, Roddy McDowall and Basil Rathbone. He stated that he \"always thought of Ichabod as a very delicate, fragile",
    "Ichabod Crane Kinderhook town school district (Ichabod Crane Central School District) is also named for the Irving character. It is claimed by many in Tarrytown that Samuel Youngs is the original from whom Irving drew his character of Ichabod Crane\". Author Gary Denis asserts that while the character of Ichabod Crane is loosely based on Kinderhook Schoolmaster, Jesse Merwin, it may possibly include elements from Samuel Youngs' life. Irving's characters drive the story and are most memorable because of his detail in describing each. He says of Ichabod Crane (the main character), 'He was tall, but exceedingly lank, with narrow shoulders, long"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]