SentenceTransformer based on distilbert/distilbert-base-uncased
This is a sentence-transformers model fine-tuned from distilbert/distilbert-base-uncased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: distilbert/distilbert-base-uncased
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
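The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions, and padded positions are assumed to be marked with 0 in the attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In the real model each token vector has 768 entries, which is why the pooled output is 768-dimensional.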
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Load the model; replace the placeholder with the actual model ID or a local path
model = SentenceTransformer("sentence_transformers_model_id")
sentences = [
    'T ENGINE TRANS TOP LAT 90 Deg Front 2025 U717 G-S',
    'T R F ACTIVE VENT SQUIB VOLT 90 Deg Front 2021 P702 VOLTS',
    'T ENGINE TRANS TOP LAT 30 Deg Front Angular Left 2020 P558 G-S',
]

# Encode the sentences into dense vectors
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Compute pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])
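Since the card lists cosine similarity as the model's similarity function, each entry of the matrix returned above is the cosine of the angle between two embeddings. The sketch below shows that computation in pure Python on toy 2-dimensional vectors. The helper names `cosine_similarity` and `similarity_matrix` are illustrative assumptions, not part of the Sentence Transformers API.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row and one column per vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for real 768-dimensional embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
print(len(matrix), len(matrix[0]))  # 3 3
```

The diagonal is 1 (every vector compared with itself), and orthogonal vectors score 0 regardless of their lengths.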
Evaluation
Metrics
Semantic Similarity
| Metric | Value |
|:--|:--|
| pearson_cosine | 0.4518 |
| spearman_cosine | 0.4762 |
| pearson_manhattan | 0.4253 |
| spearman_manhattan | 0.4638 |
| pearson_euclidean | 0.4262 |
| spearman_euclidean | 0.4652 |
| pearson_dot | 0.3898 |
| spearman_dot | 0.374 |
| pearson_max | 0.4518 |
| spearman_max | 0.4762 |
Semantic Similarity
| Metric | Value |
|:--|:--|
| pearson_cosine | 0.4412 |
| spearman_cosine | 0.4671 |
| pearson_manhattan | 0.4156 |
| spearman_manhattan | 0.456 |
| pearson_euclidean | 0.4167 |
| spearman_euclidean | 0.4575 |
| pearson_dot | 0.3753 |
| spearman_dot | 0.3629 |
| pearson_max | 0.4412 |
| spearman_max | 0.4671 |
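The metrics above are Pearson and Spearman correlations between the model's similarity scores and the gold labels, computed once per similarity function. In both tables the `_max` rows equal the cosine rows, which are the largest of the four. As a rough illustration of the two correlation types, here is a pure-Python sketch. The function names and the toy `gold` and `predicted` values are assumptions for demonstration only and are unrelated to the evaluation data.

```python
import math


def pearson(xs, ys):
    """Linear correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(xs, ys):
    """Pearson correlation of the ranks, i.e. monotonic agreement."""
    return pearson(ranks(xs), ranks(ys))


gold = [0.0, 0.2, 0.5, 0.9]            # hypothetical gold similarity labels
predicted = [0.10, 0.15, 0.30, 0.99]   # hypothetical model similarity scores
print(round(pearson(gold, predicted), 4))
print(round(spearman(gold, predicted), 4))
```

Spearman only looks at the ordering of the scores, so it reaches 1 whenever the predicted ranking matches the gold ranking, even if the relationship is not linear.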
Training Details
Training Dataset
Unnamed Dataset
Evaluation Dataset
Unnamed Dataset
Training Hyperparameters
Non-Default Hyperparameters
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 3e-05
num_train_epochs: 4
warmup_ratio: 0.1
fp16: True
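To make the interaction of `learning_rate`, `warmup_ratio`, and the linear scheduler concrete, the sketch below computes the learning rate at a given step. It assumes the usual linear warmup followed by linear decay to zero, and that the warmup length is the ratio times the total step count, rounded up. The total of 126268 steps is taken from the last row of the training log, and the helper names are illustrative, not library functions.

```python
import math

LEARNING_RATE = 3e-05   # learning_rate from this card
WARMUP_RATIO = 0.1      # warmup_ratio from this card
TOTAL_STEPS = 126268    # final step in the training log (epoch 4.0)


def warmup_steps(total_steps, ratio):
    """Number of warmup steps, assuming the ratio is rounded up."""
    return math.ceil(total_steps * ratio)


def linear_lr(step, total_steps, num_warmup, peak_lr):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < num_warmup:
        return peak_lr * step / max(1, num_warmup)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - num_warmup))


num_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print(num_warmup)  # 12627
print(linear_lr(num_warmup, TOTAL_STEPS, num_warmup, LEARNING_RATE))
print(linear_lr(TOTAL_STEPS // 2, TOTAL_STEPS, num_warmup, LEARNING_RATE))
```

Under these assumptions the peak rate of 3e-05 is reached roughly 40% of the way through the first epoch and then decays for the rest of training.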
All Hyperparameters
overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 3e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: linear
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 4
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 0
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: False
include_tokens_per_second: False
neftune_noise_alpha: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine |
|:--|:--|:--|:--|:--|
| 0.0317 | 1000 | 6.3069 | - | - |
| 0.0634 | 2000 | 6.1793 | - | - |
| 0.0950 | 3000 | 6.1607 | - | - |
| 0.1267 | 4000 | 6.1512 | - | - |
| 0.1584 | 5000 | 6.1456 | - | - |
| 0.1901 | 6000 | 6.1419 | - | - |
| 0.2218 | 7000 | 6.1398 | - | - |
| 0.2534 | 8000 | 6.1377 | - | - |
| 0.2851 | 9000 | 6.1352 | - | - |
| 0.3168 | 10000 | 6.1338 | - | - |
| 0.3485 | 11000 | 6.1332 | - | - |
| 0.3801 | 12000 | 6.1309 | - | - |
| 0.4118 | 13000 | 6.1315 | - | - |
| 0.4435 | 14000 | 6.1283 | - | - |
| 0.4752 | 15000 | 6.129 | - | - |
| 0.5069 | 16000 | 6.1271 | - | - |
| 0.5385 | 17000 | 6.1265 | - | - |
| 0.5702 | 18000 | 6.1238 | - | - |
| 0.6019 | 19000 | 6.1234 | - | - |
| 0.6336 | 20000 | 6.1225 | - | - |
| 0.6653 | 21000 | 6.1216 | - | - |
| 0.6969 | 22000 | 6.1196 | - | - |
| 0.7286 | 23000 | 6.1198 | - | - |
| 0.7603 | 24000 | 6.1178 | - | - |
| 0.7920 | 25000 | 6.117 | - | - |
| 0.8236 | 26000 | 6.1167 | - | - |
| 0.8553 | 27000 | 6.1165 | - | - |
| 0.8870 | 28000 | 6.1149 | - | - |
| 0.9187 | 29000 | 6.1146 | - | - |
| 0.9504 | 30000 | 6.113 | - | - |
| 0.9820 | 31000 | 6.1143 | - | - |
| 1.0 | 31567 | - | 6.1150 | 0.4829 |
| 1.0137 | 32000 | 6.1115 | - | - |
| 1.0454 | 33000 | 6.111 | - | - |
| 1.0771 | 34000 | 6.1091 | - | - |
| 1.1088 | 35000 | 6.1094 | - | - |
| 1.1404 | 36000 | 6.1078 | - | - |
| 1.1721 | 37000 | 6.1095 | - | - |
| 1.2038 | 38000 | 6.106 | - | - |
| 1.2355 | 39000 | 6.1071 | - | - |
| 1.2671 | 40000 | 6.1073 | - | - |
| 1.2988 | 41000 | 6.1064 | - | - |
| 1.3305 | 42000 | 6.1047 | - | - |
| 1.3622 | 43000 | 6.1054 | - | - |
| 1.3939 | 44000 | 6.1048 | - | - |
| 1.4255 | 45000 | 6.1053 | - | - |
| 1.4572 | 46000 | 6.1058 | - | - |
| 1.4889 | 47000 | 6.1037 | - | - |
| 1.5206 | 48000 | 6.1041 | - | - |
| 1.5523 | 49000 | 6.1023 | - | - |
| 1.5839 | 50000 | 6.1018 | - | - |
| 1.6156 | 51000 | 6.104 | - | - |
| 1.6473 | 52000 | 6.1004 | - | - |
| 1.6790 | 53000 | 6.1027 | - | - |
| 1.7106 | 54000 | 6.1017 | - | - |
| 1.7423 | 55000 | 6.1011 | - | - |
| 1.7740 | 56000 | 6.1002 | - | - |
| 1.8057 | 57000 | 6.0994 | - | - |
| 1.8374 | 58000 | 6.0985 | - | - |
| 1.8690 | 59000 | 6.0986 | - | - |
| 1.9007 | 60000 | 6.1006 | - | - |
| 1.9324 | 61000 | 6.0983 | - | - |
| 1.9641 | 62000 | 6.0983 | - | - |
| 1.9958 | 63000 | 6.0973 | - | - |
| 2.0 | 63134 | - | 6.1193 | 0.4828 |
| 2.0274 | 64000 | 6.0943 | - | - |
| 2.0591 | 65000 | 6.0941 | - | - |
| 2.0908 | 66000 | 6.0936 | - | - |
| 2.1225 | 67000 | 6.0909 | - | - |
| 2.1541 | 68000 | 6.0925 | - | - |
| 2.1858 | 69000 | 6.0932 | - | - |
| 2.2175 | 70000 | 6.0939 | - | - |
| 2.2492 | 71000 | 6.0919 | - | - |
| 2.2809 | 72000 | 6.0932 | - | - |
| 2.3125 | 73000 | 6.0916 | - | - |
| 2.3442 | 74000 | 6.0919 | - | - |
| 2.3759 | 75000 | 6.0919 | - | - |
| 2.4076 | 76000 | 6.0911 | - | - |
| 2.4393 | 77000 | 6.0924 | - | - |
| 2.4709 | 78000 | 6.0911 | - | - |
| 2.5026 | 79000 | 6.0922 | - | - |
| 2.5343 | 80000 | 6.0926 | - | - |
| 2.5660 | 81000 | 6.0911 | - | - |
| 2.5976 | 82000 | 6.0897 | - | - |
| 2.6293 | 83000 | 6.0922 | - | - |
| 2.6610 | 84000 | 6.0908 | - | - |
| 2.6927 | 85000 | 6.0884 | - | - |
| 2.7244 | 86000 | 6.0907 | - | - |
| 2.7560 | 87000 | 6.0904 | - | - |
| 2.7877 | 88000 | 6.0881 | - | - |
| 2.8194 | 89000 | 6.0902 | - | - |
| 2.8511 | 90000 | 6.088 | - | - |
| 2.8828 | 91000 | 6.0888 | - | - |
| 2.9144 | 92000 | 6.0884 | - | - |
| 2.9461 | 93000 | 6.0881 | - | - |
| 2.9778 | 94000 | 6.0896 | - | - |
| 3.0 | 94701 | - | 6.1225 | 0.4788 |
| 3.0095 | 95000 | 6.0857 | - | - |
| 3.0412 | 96000 | 6.0838 | - | - |
| 3.0728 | 97000 | 6.0843 | - | - |
| 3.1045 | 98000 | 6.0865 | - | - |
| 3.1362 | 99000 | 6.0827 | - | - |
| 3.1679 | 100000 | 6.0836 | - | - |
| 3.1995 | 101000 | 6.0837 | - | - |
| 3.2312 | 102000 | 6.0836 | - | - |
| 3.2629 | 103000 | 6.0837 | - | - |
| 3.2946 | 104000 | 6.084 | - | - |
| 3.3263 | 105000 | 6.0836 | - | - |
| 3.3579 | 106000 | 6.0808 | - | - |
| 3.3896 | 107000 | 6.0821 | - | - |
| 3.4213 | 108000 | 6.0817 | - | - |
| 3.4530 | 109000 | 6.082 | - | - |
| 3.4847 | 110000 | 6.083 | - | - |
| 3.5163 | 111000 | 6.0829 | - | - |
| 3.5480 | 112000 | 6.0832 | - | - |
| 3.5797 | 113000 | 6.0829 | - | - |
| 3.6114 | 114000 | 6.0837 | - | - |
| 3.6430 | 115000 | 6.082 | - | - |
| 3.6747 | 116000 | 6.0823 | - | - |
| 3.7064 | 117000 | 6.082 | - | - |
| 3.7381 | 118000 | 6.0833 | - | - |
| 3.7698 | 119000 | 6.0831 | - | - |
| 3.8014 | 120000 | 6.0814 | - | - |
| 3.8331 | 121000 | 6.0813 | - | - |
| 3.8648 | 122000 | 6.0797 | - | - |
| 3.8965 | 123000 | 6.0793 | - | - |
| 3.9282 | 124000 | 6.0818 | - | - |
| 3.9598 | 125000 | 6.0806 | - | - |
| 3.9915 | 126000 | 6.08 | - | - |
| 4.0 | 126268 | - | 6.1266 | 0.4671 |
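The Citation section of this card lists CoSENTLoss, which suggests, but does not confirm, that it was the training objective behind the loss values logged above. As a hedged illustration of how such a ranking loss behaves, here is a pure-Python sketch of the CoSENT formulation: for every pair of examples whose gold labels are ordered, it penalizes predicted cosines that are ordered the other way. The function name `cosent_loss`, the scale of 20, and the toy inputs are assumptions, not values taken from this card.

```python
import math


def cosent_loss(cosines, labels, scale=20.0):
    """log(1 + sum of exp(scale * (cos_a - cos_b))) over pairs with label_a < label_b."""
    terms = [0.0]  # the leading 0 contributes exp(0) = 1 inside the log
    for cos_a, label_a in zip(cosines, labels):
        for cos_b, label_b in zip(cosines, labels):
            if label_a < label_b:
                terms.append(scale * (cos_a - cos_b))
    peak = max(terms)  # subtract the maximum for a numerically stable log-sum-exp
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


labels = [0.9, 0.1]  # hypothetical gold similarities for two sentence pairs
print(cosent_loss([0.8, 0.2], labels))  # predicted order agrees with the labels
print(cosent_loss([0.2, 0.8], labels))  # predicted order is reversed
```

The first call returns a value close to zero because the more similar pair also gets the higher cosine, while the second call returns a large value because the ranking is inverted.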
Framework Versions
- Python: 3.10.6
- Sentence Transformers: 3.0.0
- Transformers: 4.35.0
- PyTorch: 2.1.0a0+4136153
- Accelerate: 0.30.1
- Datasets: 2.14.1
- Tokenizers: 0.14.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}