SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: distilbert/distilbert-base-uncased
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
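
Only pooling_mode_mean_tokens is enabled above, so each sentence embedding is the average of its token embeddings. The sketch below illustrates that step in NumPy. It assumes padded positions are excluded through the attention mask; the function name mean_pool and the random toy inputs are hypothetical and are not taken from the library.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) output of the transformer module
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for a row that is all padding.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy batch: 2 sentences, 4 token positions, 768-dimensional token vectors.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings.shape)
```

The result has one 768-dimensional vector per sentence, matching the output dimensionality listed in the model description.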

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'T ENGINE TRANS TOP LAT 90 Deg Front 2025 U717 G-S',
    'T R F ACTIVE VENT SQUIB VOLT 90 Deg Front 2021 P702 VOLTS',
    'T ENGINE TRANS TOP LAT 30 Deg Front Angular Left 2020 P558 G-S',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
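
Because the similarity function of this model is cosine similarity, the scores returned above should correspond to the cosine of the angle between each pair of embeddings. The following NumPy sketch shows that computation on random stand-in vectors, so it runs without downloading the model. The helper cosine_similarity_matrix is a hypothetical name used for illustration, not the library's implementation.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Random stand-ins for the (3, 768) array that model.encode() returns.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(3, 768))
toy_similarities = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(toy_similarities.shape)
```

Each diagonal entry is 1 (up to floating-point error) because every vector is compared with itself, and the matrix is symmetric.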

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.4518
spearman_cosine	0.4762
pearson_manhattan	0.4253
spearman_manhattan	0.4638
pearson_euclidean	0.4262
spearman_euclidean	0.4652
pearson_dot	0.3898
spearman_dot	0.374
pearson_max	0.4518
spearman_max	0.4762

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.4412
spearman_cosine	0.4671
pearson_manhattan	0.4156
spearman_manhattan	0.456
pearson_euclidean	0.4167
spearman_euclidean	0.4575
pearson_dot	0.3753
spearman_dot	0.3629
pearson_max	0.4412
spearman_max	0.4671
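
The pearson_* and spearman_* rows report the correlation between the gold scores and the model's similarity scores under each similarity function. As a rough illustration of the two statistics, the sketch below computes them with NumPy on made-up values. The gold and predicted lists are invented for the example and are not drawn from the evaluation data, and the helper names are hypothetical.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation: covariance normalised by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(x) -> np.ndarray:
    """Rank values from 1 to n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented gold scores and cosine similarities for five sentence pairs.
gold = [0.0, 0.2, 0.5, 0.7, 1.0]
predicted = [0.10, 0.05, 0.40, 0.90, 0.80]
print(pearson(gold, predicted), spearman(gold, predicted))
```

Spearman depends only on the ordering of the scores, which is why it is a common headline metric for similarity models whose raw scores are not calibrated.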

Training Details

Training Dataset

Unnamed Dataset

Size: 8,081,275 training samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	score
type	string	string	float
details	min: 23 tokens mean: 31.48 tokens max: 40 tokens	min: 16 tokens mean: 30.06 tokens max: 55 tokens	min: 0.0 mean: 0.44 max: 1.0

Samples:

sentence1	sentence2	score
`T L F DUMMY PELVIS VERT Dynamic Seat Sled Test 2025 U718 G-S`	`T SCS R2 HY REF 059 R C PLR REF Y SM LAT 90 Deg / Left Side Decel-4g 2020 CX483 G-S`	`0.21129386503072142`
`T L F DUMMY PELVIS VERT Dynamic Seat Sled Test 2025 U718 G-S`	`T R F DUMMY PELVIS VERT 75 Deg Oblique Right Side 10 in. Pole 2015 P552 G-S`	`0.4972955033248179`
`T L F DUMMY PELVIS VERT Dynamic Seat Sled Test 2025 U718 G-S`	`T SCS L1 HY REF 053 L B PLR REF Y SM LAT 90 Deg Front Bumper Override 2021 CX727 G-S`	`0.5701051768787058`

Loss: CoSENTLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}
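
To make the role of the scale parameter concrete, here is a minimal NumPy sketch of the CoSENT objective as it is usually written: log(1 + sum of exp(scale * (cos_i - cos_j))) over all pairs (i, j) in a batch where the gold score of i is lower than that of j. It is an illustration under that assumption, not the library's code, and the function name cosent_loss is hypothetical.

```python
import numpy as np

def cosent_loss(cos_sims, labels, scale: float = 20.0) -> float:
    """CoSENT-style ranking loss over one batch of sentence pairs.

    cos_sims: predicted cosine similarity for each sentence pair
    labels:   gold similarity score for each sentence pair
    """
    cos_sims = np.asarray(cos_sims, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # diff[i, j] = scale * (cos_i - cos_j)
    diff = scale * (cos_sims[:, None] - cos_sims[None, :])
    # Keep only (i, j) where pair i should rank below pair j.
    keep = labels[:, None] < labels[None, :]
    terms = diff[keep]
    # log(1 + sum(exp(terms))), computed stably; the leading 0 supplies the "1".
    stacked = np.concatenate(([0.0], terms))
    peak = stacked.max()
    return float(peak + np.log(np.exp(stacked - peak).sum()))

# A batch of 32 pairs with distinct gold scores.
labels = np.linspace(0.0, 1.0, 32)
flat = cosent_loss(np.full(32, 0.5), labels)      # every pair gets the same cosine
ordered = cosent_loss(labels, labels)             # cosines ranked like the gold scores
reversed_ = cosent_loss(labels[::-1], labels)     # cosines ranked in the opposite order
print(flat, ordered, reversed_)
```

In this sketch, a batch of 32 pairs with distinct gold scores and identical predicted cosines gives log(1 + 496), roughly 6.21. That may be a useful reference point when reading the training loss values logged for this model.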

Evaluation Dataset

Unnamed Dataset

Size: 1,726,581 evaluation samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	score
type	string	string	float
details	min: 22 tokens mean: 25.0 tokens max: 30 tokens	min: 16 tokens mean: 31.04 tokens max: 53 tokens	min: 0.0 mean: 0.44 max: 1.0

Samples:

sentence1	sentence2	score
`T R F ADAPTIVE TETHER VENT SQUIB VOLT 30 Deg Front Angular Right 20xx GENERIC VOLTS`	`T L F DUMMY T12 LONG 27 Deg Crabbed Left Side NHTSA 214 MDB to vehicle 2015 P552 G-S`	`0.6835618484879796`
`T R F ADAPTIVE TETHER VENT SQUIB VOLT 30 Deg Front Angular Right 20xx GENERIC VOLTS`	`T L F DUMMY R FEMUR LONG 90 Deg Front 2022 U553 G-S`	`0.666531064739`
`T R F ADAPTIVE TETHER VENT SQUIB VOLT 30 Deg Front Angular Right 20xx GENERIC VOLTS`	`T R F DUMMY NECK UPPER MZ LOAD 90 Deg Front 2019 P375ICA IN-LBS`	`0.46391834212079874`

Loss: CoSENTLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 3e-05
num_train_epochs: 4
warmup_ratio: 0.1
fp16: True
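
With learning_rate 3e-05, warmup_ratio 0.1, and the linear scheduler listed in the full hyperparameters, the learning rate would ramp up over the first 10% of steps and then decay linearly to zero. The sketch below reproduces that shape in plain Python. It takes the total of 126268 steps from the training log and assumes the warmup step count is rounded up; the function name linear_schedule_lr is hypothetical.

```python
import math

def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 3e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    # Assumption: warmup steps are the ceiling of total_steps * warmup_ratio.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

TOTAL_STEPS = 126268  # final step reported in the training log
for step in (0, 6000, 12627, 63134, 126268):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak rate of 3e-05 is reached at step 12627, a little under halfway through the first epoch.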

All Hyperparameters

overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 3e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 4
max_steps: -1
lr_scheduler_type: linear
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 4
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 0
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: False
include_tokens_per_second: False
neftune_noise_alpha: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	sts-dev_spearman_cosine
0.0317	1000	6.3069	-	-
0.0634	2000	6.1793	-	-
0.0950	3000	6.1607	-	-
0.1267	4000	6.1512	-	-
0.1584	5000	6.1456	-	-
0.1901	6000	6.1419	-	-
0.2218	7000	6.1398	-	-
0.2534	8000	6.1377	-	-
0.2851	9000	6.1352	-	-
0.3168	10000	6.1338	-	-
0.3485	11000	6.1332	-	-
0.3801	12000	6.1309	-	-
0.4118	13000	6.1315	-	-
0.4435	14000	6.1283	-	-
0.4752	15000	6.129	-	-
0.5069	16000	6.1271	-	-
0.5385	17000	6.1265	-	-
0.5702	18000	6.1238	-	-
0.6019	19000	6.1234	-	-
0.6336	20000	6.1225	-	-
0.6653	21000	6.1216	-	-
0.6969	22000	6.1196	-	-
0.7286	23000	6.1198	-	-
0.7603	24000	6.1178	-	-
0.7920	25000	6.117	-	-
0.8236	26000	6.1167	-	-
0.8553	27000	6.1165	-	-
0.8870	28000	6.1149	-	-
0.9187	29000	6.1146	-	-
0.9504	30000	6.113	-	-
0.9820	31000	6.1143	-	-
1.0	31567	-	6.1150	0.4829
1.0137	32000	6.1115	-	-
1.0454	33000	6.111	-	-
1.0771	34000	6.1091	-	-
1.1088	35000	6.1094	-	-
1.1404	36000	6.1078	-	-
1.1721	37000	6.1095	-	-
1.2038	38000	6.106	-	-
1.2355	39000	6.1071	-	-
1.2671	40000	6.1073	-	-
1.2988	41000	6.1064	-	-
1.3305	42000	6.1047	-	-
1.3622	43000	6.1054	-	-
1.3939	44000	6.1048	-	-
1.4255	45000	6.1053	-	-
1.4572	46000	6.1058	-	-
1.4889	47000	6.1037	-	-
1.5206	48000	6.1041	-	-
1.5523	49000	6.1023	-	-
1.5839	50000	6.1018	-	-
1.6156	51000	6.104	-	-
1.6473	52000	6.1004	-	-
1.6790	53000	6.1027	-	-
1.7106	54000	6.1017	-	-
1.7423	55000	6.1011	-	-
1.7740	56000	6.1002	-	-
1.8057	57000	6.0994	-	-
1.8374	58000	6.0985	-	-
1.8690	59000	6.0986	-	-
1.9007	60000	6.1006	-	-
1.9324	61000	6.0983	-	-
1.9641	62000	6.0983	-	-
1.9958	63000	6.0973	-	-
2.0	63134	-	6.1193	0.4828
2.0274	64000	6.0943	-	-
2.0591	65000	6.0941	-	-
2.0908	66000	6.0936	-	-
2.1225	67000	6.0909	-	-
2.1541	68000	6.0925	-	-
2.1858	69000	6.0932	-	-
2.2175	70000	6.0939	-	-
2.2492	71000	6.0919	-	-
2.2809	72000	6.0932	-	-
2.3125	73000	6.0916	-	-
2.3442	74000	6.0919	-	-
2.3759	75000	6.0919	-	-
2.4076	76000	6.0911	-	-
2.4393	77000	6.0924	-	-
2.4709	78000	6.0911	-	-
2.5026	79000	6.0922	-	-
2.5343	80000	6.0926	-	-
2.5660	81000	6.0911	-	-
2.5976	82000	6.0897	-	-
2.6293	83000	6.0922	-	-
2.6610	84000	6.0908	-	-
2.6927	85000	6.0884	-	-
2.7244	86000	6.0907	-	-
2.7560	87000	6.0904	-	-
2.7877	88000	6.0881	-	-
2.8194	89000	6.0902	-	-
2.8511	90000	6.088	-	-
2.8828	91000	6.0888	-	-
2.9144	92000	6.0884	-	-
2.9461	93000	6.0881	-	-
2.9778	94000	6.0896	-	-
3.0	94701	-	6.1225	0.4788
3.0095	95000	6.0857	-	-
3.0412	96000	6.0838	-	-
3.0728	97000	6.0843	-	-
3.1045	98000	6.0865	-	-
3.1362	99000	6.0827	-	-
3.1679	100000	6.0836	-	-
3.1995	101000	6.0837	-	-
3.2312	102000	6.0836	-	-
3.2629	103000	6.0837	-	-
3.2946	104000	6.084	-	-
3.3263	105000	6.0836	-	-
3.3579	106000	6.0808	-	-
3.3896	107000	6.0821	-	-
3.4213	108000	6.0817	-	-
3.4530	109000	6.082	-	-
3.4847	110000	6.083	-	-
3.5163	111000	6.0829	-	-
3.5480	112000	6.0832	-	-
3.5797	113000	6.0829	-	-
3.6114	114000	6.0837	-	-
3.6430	115000	6.082	-	-
3.6747	116000	6.0823	-	-
3.7064	117000	6.082	-	-
3.7381	118000	6.0833	-	-
3.7698	119000	6.0831	-	-
3.8014	120000	6.0814	-	-
3.8331	121000	6.0813	-	-
3.8648	122000	6.0797	-	-
3.8965	123000	6.0793	-	-
3.9282	124000	6.0818	-	-
3.9598	125000	6.0806	-	-
3.9915	126000	6.08	-	-
4.0	126268	-	6.1266	0.4671
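
The step counts in this log can be cross-checked against the dataset size and batch size reported earlier. The arithmetic below is a sketch of that check. The card does not state how many devices were used, so the device count is inferred from the numbers and should be read as an assumption.

```python
TRAIN_SAMPLES = 8_081_275   # training set size from the card
PER_DEVICE_BATCH = 32       # per_device_train_batch_size
STEPS_PER_EPOCH = 31_567    # step logged at epoch 1.0
EPOCHS = 4                  # num_train_epochs

# Samples consumed per optimizer step, implied by the log.
samples_per_step = TRAIN_SAMPLES / STEPS_PER_EPOCH

# Inferred, not stated in the card: the number of devices that explains the step count.
implied_devices = round(samples_per_step / PER_DEVICE_BATCH)
effective_batch = implied_devices * PER_DEVICE_BATCH

# With dataloader_drop_last enabled, the incomplete final batch is discarded.
expected_steps_per_epoch = TRAIN_SAMPLES // effective_batch
total_steps = expected_steps_per_epoch * EPOCHS

print(implied_devices, effective_batch, expected_steps_per_epoch, total_steps)
```

The numbers are consistent with an effective batch of 256 sentence pairs per step, and the resulting total matches the final step in the log.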

Framework Versions

Python: 3.10.6
Sentence Transformers: 3.0.0
Transformers: 4.35.0
PyTorch: 2.1.0a0+4136153
Accelerate: 0.30.1
Datasets: 2.14.1
Tokenizers: 0.14.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}