SentenceTransformer

This model was finetuned with Unsloth.

This is a sentence-transformers model. It maps sentences and paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Model Size: ~4B parameters (BF16)
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 2560 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
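The Pooling module above uses last-token pooling (`pooling_mode_lasttoken: True`): each text is represented by the hidden state of its final non-padding token, which the Normalize() module then scales to unit length. A minimal numpy sketch of that behaviour (shapes and function names here are illustrative, not the library's internals):

```python
import numpy as np

def last_token_pool(token_embeddings, attention_mask):
    """Pick the hidden state of the last non-padding token of each sequence."""
    # Index of the last position where attention_mask == 1, per sequence.
    last_idx = attention_mask.sum(axis=1) - 1            # [batch]
    batch_idx = np.arange(token_embeddings.shape[0])
    return token_embeddings[batch_idx, last_idx]         # [batch, dim]

def normalize(x):
    """L2-normalize so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy batch: 2 sequences, 4 token positions, 3-dim embeddings.
emb = np.random.randn(2, 4, 3)
mask = np.array([[1, 1, 1, 0],   # 3 real tokens, 1 pad
                 [1, 1, 1, 1]])  # 4 real tokens
pooled = normalize(last_token_pool(emb, mask))
print(pooled.shape)  # (2, 3)
```

Each pooled vector has unit length, so downstream cosine similarity reduces to a dot product.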

Evaluation Highlights

Pre-Post Train Relevancy

(Figures: relevancy plots for queries 7–10, before vs. after training.)

Pre/Post Train Spread

(Figures: similarity-score spread plots for queries 7–10, before vs. after training.)

Spread Summary

(Figure: spread summary across queries.)

Training Summary

(Figure: training statistics summary.)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Rnfudge/snapd-embedder-v1")
# Run inference
sentences = [
    'craniofacial',
    'head and face structure',
    'Anchor-positive pairs are fundamental to contrastive learning, serving to define what the model should consider as semantically similar data points, guiding it to learn meaningful representations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 2560]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7268, 0.0036],
#         [0.7268, 1.0000, 0.0179],
#         [0.0036, 0.0179, 1.0000]])
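Because the model ends in a Normalize() module, the embeddings it returns are unit-length, so cosine similarity is just a dot product and a simple semantic-search ranking needs nothing beyond matrix multiplication. A toy sketch with hand-made unit vectors standing in for `model.encode(...)` output:

```python
import numpy as np

def unit(v):
    """Toy stand-in for a normalized embedding from model.encode(...)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

query = unit([1.0, 0.2, 0.0])
corpus = np.stack([
    unit([0.9, 0.3, 0.1]),   # close to the query
    unit([0.0, 0.1, 1.0]),   # unrelated
    unit([1.0, 0.0, 0.0]),   # also close
])

# For normalized embeddings, cosine similarity is just a dot product.
scores = corpus @ query
ranking = np.argsort(-scores)  # best match first
print(ranking)                 # [0 2 1]
```

In practice you would replace the toy vectors with `model.encode(queries)` and `model.encode(corpus)` and rank with the same dot product.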

Training Details

Training Dataset

Unnamed Dataset

  • Size: 223,748 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 2 tokens, mean 8.95 tokens, max 33 tokens
    • positive: string; min 2 tokens, mean 38.48 tokens, max 124 tokens
  • Samples (anchor → positive):
    • groupthink → Psychological tendency for group conformity
    • customs and border protection → DHS component enforcing trade and immigration laws
    • What is the meaning and purpose of the //go:noescape directive in Go functions? → The //go:noescape comment is a hint to the Go compiler. It asserts that none of the pointer parameters of the decorated function will escape the function's stack frame. This is primarily used for performance tuning in low-level code, ensuring that objects pointed to by function arguments are not allocated on the heap, thus avoiding garbage collection cycles.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
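MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as a classification problem: for anchor i, positive i is the correct "class" among all positives in the batch (the others serve as in-batch negatives), scored by scaled cosine similarity. A minimal numpy sketch under those assumptions, using the scale of 20 from the config above (not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                       # [batch, batch] cosine sims
    # Softmax cross-entropy with labels on the diagonal (pair i matches i).
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 8))
aligned = mnrl_loss(pos + 0.01 * rng.normal(size=(4, 8)), pos)  # anchors ≈ positives
shuffled = mnrl_loss(rng.normal(size=(4, 8)), pos)              # unrelated anchors
print(aligned, shuffled)
```

Well-aligned anchor–positive pairs give a much lower loss than unrelated ones, which is exactly the signal that drives the training curve below toward zero.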
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • gradient_accumulation_steps: 8
  • learning_rate: 3e-05
  • num_train_epochs: 1
  • lr_scheduler_type: constant_with_warmup
  • warmup_ratio: 0.03
  • bf16: True
  • batch_sampler: no_duplicates
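With gradient accumulation, the effective batch per optimizer step is per_device_train_batch_size × gradient_accumulation_steps. Note, though, that each loss computation only sees one 64-sample forward pass, and with gather_across_devices: false no negatives are pooled across devices, so each anchor is contrasted against 63 in-batch negatives regardless of accumulation. The arithmetic:

```python
per_device_train_batch_size = 64
gradient_accumulation_steps = 8

# Samples contributing to each optimizer step (per device).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
# Negatives per anchor in MultipleNegativesRankingLoss: the rest of the micro-batch.
in_batch_negatives = per_device_train_batch_size - 1

print(effective_batch)     # 512
print(in_batch_negatives)  # 63
```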

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0023 1 0.5184
0.0046 2 0.5683
0.0069 3 0.5821
0.0092 4 0.4948
0.0114 5 0.4001
0.0137 6 0.3097
0.0160 7 0.257
0.0183 8 0.2752
0.0206 9 0.2311
0.0229 10 0.1433
0.0252 11 0.2507
0.0275 12 0.1944
0.0297 13 0.2052
0.0320 14 0.1044
0.0343 15 0.2027
0.0366 16 0.1969
0.0389 17 0.1833
0.0412 18 0.1641
0.0435 19 0.1629
0.0458 20 0.1702
0.0480 21 0.1855
0.0503 22 0.1697
0.0526 23 0.116
0.0549 24 0.1373
0.0572 25 0.1323
0.0595 26 0.1349
0.0618 27 0.1199
0.0641 28 0.1353
0.0663 29 0.143
0.0686 30 0.1305
0.0709 31 0.1088
0.0732 32 0.0908
0.0755 33 0.1502
0.0778 34 0.1139
0.0801 35 0.1311
0.0824 36 0.1291
0.0846 37 0.0977
0.0869 38 0.0962
0.0892 39 0.1166
0.0915 40 0.0965
0.0938 41 0.1242
0.0961 42 0.0705
0.0984 43 0.0813
0.1007 44 0.1545
0.1029 45 0.0868
0.1052 46 0.0987
0.1075 47 0.0938
0.1098 48 0.1086
0.1121 49 0.0982
0.1144 50 0.0817
0.1167 51 0.0527
0.1190 52 0.0986
0.1212 53 0.098
0.1235 54 0.1074
0.1258 55 0.1396
0.1281 56 0.1101
0.1304 57 0.0829
0.1327 58 0.1261
0.1350 59 0.048
0.1373 60 0.1215
0.1395 61 0.0981
0.1418 62 0.0739
0.1441 63 0.0525
0.1464 64 0.0757
0.1487 65 0.0543
0.1510 66 0.0878
0.1533 67 0.0791
0.1556 68 0.0816
0.1578 69 0.0999
0.1601 70 0.086
0.1624 71 0.0775
0.1647 72 0.1048
0.1670 73 0.0552
0.1693 74 0.0619
0.1716 75 0.0667
0.1739 76 0.0787
0.1762 77 0.1022
0.1784 78 0.0937
0.1807 79 0.0751
0.1830 80 0.0642
0.1853 81 0.0508
0.1876 82 0.1169
0.1899 83 0.09
0.1922 84 0.0725
0.1945 85 0.0476
0.1967 86 0.0737
0.1990 87 0.0968
0.2013 88 0.0988
0.2036 89 0.0575
0.2059 90 0.0629
0.2082 91 0.0627
0.2105 92 0.0565
0.2128 93 0.0696
0.2150 94 0.0413
0.2173 95 0.0625
0.2196 96 0.0593
0.2219 97 0.0511
0.2242 98 0.1168
0.2265 99 0.0601
0.2288 100 0.0919
0.2311 101 0.0471
0.2333 102 0.0701
0.2356 103 0.1032
0.2379 104 0.0823
0.2402 105 0.0825
0.2425 106 0.0626
0.2448 107 0.0821
0.2471 108 0.0532
0.2494 109 0.1171
0.2516 110 0.0814
0.2539 111 0.1167
0.2562 112 0.0918
0.2585 113 0.0704
0.2608 114 0.0726
0.2631 115 0.0522
0.2654 116 0.0628
0.2677 117 0.0716
0.2699 118 0.0676
0.2722 119 0.0616
0.2745 120 0.0505
0.2768 121 0.0653
0.2791 122 0.051
0.2814 123 0.0888
0.2837 124 0.1061
0.2860 125 0.104
0.2882 126 0.095
0.2905 127 0.0715
0.2928 128 0.0766
0.2951 129 0.076
0.2974 130 0.1154
0.2997 131 0.0463
0.3020 132 0.0596
0.3043 133 0.0705
0.3065 134 0.0654
0.3088 135 0.0802
0.3111 136 0.0882
0.3134 137 0.0872
0.3157 138 0.0853
0.3180 139 0.0661
0.3203 140 0.0633
0.3226 141 0.0784
0.3248 142 0.0832
0.3271 143 0.0799
0.3294 144 0.0954
0.3317 145 0.0744
0.3340 146 0.0559
0.3363 147 0.0892
0.3386 148 0.0424
0.3409 149 0.0742
0.3432 150 0.1025
0.3454 151 0.0814
0.3477 152 0.051
0.3500 153 0.1313
0.3523 154 0.0645
0.3546 155 0.1006
0.3569 156 0.0524
0.3592 157 0.0635
0.3615 158 0.0467
0.3637 159 0.0741
0.3660 160 0.0593
0.3683 161 0.0698
0.3706 162 0.0835
0.3729 163 0.0715
0.3752 164 0.0628
0.3775 165 0.0772
0.3798 166 0.1167
0.3820 167 0.0981
0.3843 168 0.0595
0.3866 169 0.041
0.3889 170 0.0728
0.3912 171 0.0937
0.3935 172 0.0757
0.3958 173 0.0603
0.3981 174 0.0542
0.4003 175 0.0701
0.4026 176 0.0372
0.4049 177 0.125
0.4072 178 0.0545
0.4095 179 0.0476
0.4118 180 0.0516
0.4141 181 0.1243
0.4164 182 0.0599
0.4186 183 0.1026
0.4209 184 0.077
0.4232 185 0.0732
0.4255 186 0.0798
0.4278 187 0.0538
0.4301 188 0.0679
0.4324 189 0.0759
0.4347 190 0.0761
0.4369 191 0.0557
0.4392 192 0.0534
0.4415 193 0.0747
0.4438 194 0.0672
0.4461 195 0.0376
0.4484 196 0.0466
0.4507 197 0.0783
0.4530 198 0.0864
0.4552 199 0.0423
0.4575 200 0.0708
0.4598 201 0.0429
0.4621 202 0.0718
0.4644 203 0.0802
0.4667 204 0.073
0.4690 205 0.0628
0.4713 206 0.055
0.4735 207 0.0468
0.4758 208 0.0536
0.4781 209 0.0429
0.4804 210 0.0388
0.4827 211 0.0962
0.4850 212 0.0475
0.4873 213 0.0589
0.4896 214 0.0606
0.4919 215 0.0512
0.4941 216 0.0836
0.4964 217 0.0659
0.4987 218 0.0924
0.5010 219 0.0711
0.5033 220 0.0676
0.5056 221 0.0393
0.5079 222 0.0668
0.5102 223 0.0511
0.5124 224 0.0575
0.5147 225 0.0594
0.5170 226 0.126
0.5193 227 0.0787
0.5216 228 0.0509
0.5239 229 0.0684
0.5262 230 0.0792
0.5285 231 0.0501
0.5307 232 0.0988
0.5330 233 0.0414
0.5353 234 0.0596
0.5376 235 0.0607
0.5399 236 0.0556
0.5422 237 0.0578
0.5445 238 0.0238
0.5468 239 0.0509
0.5490 240 0.0431
0.5513 241 0.0377
0.5536 242 0.0814
0.5559 243 0.0779
0.5582 244 0.0574
0.5605 245 0.0681
0.5628 246 0.0513
0.5651 247 0.0573
0.5673 248 0.0758
0.5696 249 0.0442
0.5719 250 0.0458
0.5742 251 0.0853
0.5765 252 0.0825
0.5788 253 0.065
0.5811 254 0.0429
0.5834 255 0.0438
0.5856 256 0.1028
0.5879 257 0.04
0.5902 258 0.0406
0.5925 259 0.0465
0.5948 260 0.068
0.5971 261 0.0532
0.5994 262 0.0503
0.6017 263 0.0421
0.6039 264 0.0663
0.6062 265 0.0621
0.6085 266 0.0845
0.6108 267 0.049
0.6131 268 0.0503
0.6154 269 0.0392
0.6177 270 0.0505
0.6200 271 0.0594
0.6222 272 0.0573
0.6245 273 0.0383
0.6268 274 0.0568
0.6291 275 0.0386
0.6314 276 0.0573
0.6337 277 0.0397
0.6360 278 0.0459
0.6383 279 0.0624
0.6405 280 0.0706
0.6428 281 0.0743
0.6451 282 0.0405
0.6474 283 0.0761
0.6497 284 0.0583
0.6520 285 0.0444
0.6543 286 0.0305
0.6566 287 0.0716
0.6589 288 0.041
0.6611 289 0.043
0.6634 290 0.0574
0.6657 291 0.0479
0.6680 292 0.062
0.6703 293 0.0441
0.6726 294 0.0657
0.6749 295 0.0515
0.6772 296 0.0718
0.6794 297 0.0839
0.6817 298 0.0751
0.6840 299 0.073
0.6863 300 0.0656
0.6886 301 0.0717
0.6909 302 0.0457
0.6932 303 0.0761
0.6955 304 0.0557
0.6977 305 0.0646
0.7000 306 0.0688
0.7023 307 0.0396
0.7046 308 0.0444
0.7069 309 0.0627
0.7092 310 0.0594
0.7115 311 0.0496
0.7138 312 0.0406
0.7160 313 0.0513
0.7183 314 0.0483
0.7206 315 0.0527
0.7229 316 0.0646
0.7252 317 0.0351
0.7275 318 0.0432
0.7298 319 0.06
0.7321 320 0.0487
0.7343 321 0.0398
0.7366 322 0.0279
0.7389 323 0.0594
0.7412 324 0.0808
0.7435 325 0.0461
0.7458 326 0.0452
0.7481 327 0.0887
0.7504 328 0.057
0.7526 329 0.082
0.7549 330 0.0693
0.7572 331 0.0245
0.7595 332 0.0476
0.7618 333 0.051
0.7641 334 0.0539
0.7664 335 0.0325
0.7687 336 0.0431
0.7709 337 0.0534
0.7732 338 0.0346
0.7755 339 0.0577
0.7778 340 0.086
0.7801 341 0.0705
0.7824 342 0.0412
0.7847 343 0.0426
0.7870 344 0.0829
0.7892 345 0.0767
0.7915 346 0.0702
0.7938 347 0.0662
0.7961 348 0.0436
0.7984 349 0.0292
0.8007 350 0.0586
0.8030 351 0.0416
0.8053 352 0.0874
0.8075 353 0.0378
0.8098 354 0.036
0.8121 355 0.0426
0.8144 356 0.0375
0.8167 357 0.0296
0.8190 358 0.0535
0.8213 359 0.0654
0.8236 360 0.0756
0.8259 361 0.0591
0.8281 362 0.0603
0.8304 363 0.0664
0.8327 364 0.0403
0.8350 365 0.0418
0.8373 366 0.047
0.8396 367 0.077
0.8419 368 0.0597
0.8442 369 0.0683
0.8464 370 0.0557
0.8487 371 0.0487
0.8510 372 0.0499
0.8533 373 0.0328
0.8556 374 0.0211
0.8579 375 0.0411
0.8602 376 0.0648
0.8625 377 0.0583
0.8647 378 0.0483
0.8670 379 0.0362
0.8693 380 0.0616
0.8716 381 0.0634
0.8739 382 0.0542
0.8762 383 0.053
0.8785 384 0.0436
0.8808 385 0.0426
0.8830 386 0.0503
0.8853 387 0.0522
0.8876 388 0.083
0.8899 389 0.0317
0.8922 390 0.0571
0.8945 391 0.0464
0.8968 392 0.0179
0.8991 393 0.0389
0.9013 394 0.0317
0.9036 395 0.0605
0.9059 396 0.0389
0.9082 397 0.0407
0.9105 398 0.0478
0.9128 399 0.0304
0.9151 400 0.0572
0.9174 401 0.037
0.9196 402 0.062
0.9219 403 0.0539
0.9242 404 0.039
0.9265 405 0.0265
0.9288 406 0.0398
0.9311 407 0.0369
0.9334 408 0.053
0.9357 409 0.0503
0.9379 410 0.0535
0.9402 411 0.0645
0.9425 412 0.0328
0.9448 413 0.0438
0.9471 414 0.0435
0.9494 415 0.1018
0.9517 416 0.0403
0.9540 417 0.0577
0.9562 418 0.0234
0.9585 419 0.041
0.9608 420 0.0226
0.9631 421 0.0497
0.9654 422 0.0493
0.9677 423 0.0223
0.9700 424 0.0192
0.9723 425 0.0322
0.9745 426 0.0483
0.9768 427 0.041
0.9791 428 0.0628
0.9814 429 0.0861
0.9837 430 0.0645
0.9860 431 0.0386
0.9883 432 0.0378
0.9906 433 0.0613
0.9929 434 0.067
0.9951 435 0.049
0.9974 436 0.0644
0.9997 437 0.02
1.0000 438 0.0001

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.3.0
  • Transformers: 4.56.2
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}