SentenceTransformer based on BAAI/bge-m3
This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
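These components can be confirmed at runtime by inspecting the loaded model. This is a minimal sketch; it assumes the model id used in the Usage section below.

from sentence_transformers import SentenceTransformer

# Minimal sanity check of the architecture described above
model = SentenceTransformer("seregadgl101/test_bge_2_10ep")
print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 1024
print(model[1].pooling_mode_mean_tokens)         # True -> mean pooling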
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("seregadgl101/test_bge_2_10ep")
# Run inference
sentences = [
'набор моя первая кухня',
'кухонные наборы',
'ea sports fc 23 ps4',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
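Since semantic search is among the intended uses, a small ranking sketch may help. The query and candidate strings below are illustrative placeholders, not taken from the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seregadgl101/test_bge_2_10ep")

# Illustrative query and candidates (placeholders)
query = "кухонные наборы"
candidates = [
    "набор моя первая кухня",
    "наручные часы orient",
    "ea sports fc 23 ps4",
]

# Encode and rank the candidates by cosine similarity to the query
query_emb = model.encode(query)
cand_embs = model.encode(candidates)
scores = model.similarity(query_emb, cand_embs)  # shape [1, 3]
best = int(scores.argmax())
print(candidates[best], float(scores[0, best]))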
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-dev
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.9702 |
| spearman_cosine | 0.9169 |
| pearson_manhattan | 0.9696 |
| spearman_manhattan | 0.9166 |
| pearson_euclidean | 0.9696 |
| spearman_euclidean | 0.9166 |
| pearson_dot | 0.9631 |
| spearman_dot | 0.9173 |
| pearson_max | 0.9702 |
| spearman_max | 0.9173 |
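The figures above come from the EmbeddingSimilarityEvaluator in Sentence Transformers. A minimal sketch of running such an evaluation is shown below; the sentence pairs and gold scores are placeholders, since the actual sts-dev split is not published with this card.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Placeholder pairs with gold similarity scores in [0, 1]
sentences1 = ["наручные часы orient casual", "кухонные наборы", "usb-магнитола acv avs-1718g"]
sentences2 = ["наручные часы orient", "ea sports fc 23 ps4", "автомагнитола acv avs-1718g"]
scores = [1.0, 0.0, 1.0]

model = SentenceTransformer("seregadgl101/test_bge_2_10ep")
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores, name="sts-dev")
results = evaluator(model)
print(results)  # Pearson/Spearman correlations for cosine, Euclidean, Manhattan and dot similarities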
Training Details
Training Dataset
Unnamed Dataset
- Size: 4,532 training samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 4 tokens, mean: 14.45 tokens, max: 48 tokens | min: 3 tokens, mean: 13.09 tokens, max: 51 tokens | min: 0.0, mean: 0.6, max: 1.0 |

- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| батут evo jump internal 12ft | батут evo jump internal 12ft | 1.0 |
| наручные часы orient casual | наручные часы orient | 1.0 |
| электрический духовой шкаф weissgauff eov 19 mw | электрический духовой шкаф weissgauff eov 19 mx | 0.4 |

- Loss: CoSENTLoss with these parameters: {"scale": 20.0, "similarity_fct": "pairwise_cos_sim"}
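A minimal sketch of constructing this loss for data in the sentence1/sentence2/score layout follows; the two placeholder rows stand in for the actual 4,532-pair training set, which is not published with this card.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import pairwise_cos_sim

# Placeholder rows in the sentence1 / sentence2 / score layout described above
train_dataset = Dataset.from_dict({
    "sentence1": ["наручные часы orient casual", "кухонные наборы"],
    "sentence2": ["наручные часы orient", "ea sports fc 23 ps4"],
    "score": [1.0, 0.0],
})

model = SentenceTransformer("BAAI/bge-m3")
# CoSENTLoss with scale=20.0 and pairwise cosine similarity, as listed above
loss = losses.CoSENTLoss(model, scale=20.0, similarity_fct=pairwise_cos_sim)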
Evaluation Dataset
Unnamed Dataset
- Size: 504 evaluation samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 4 tokens, mean: 14.93 tokens, max: 48 tokens | min: 4 tokens, mean: 13.1 tokens, max: 40 tokens | min: 0.0, mean: 0.59, max: 1.0 |

- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| потолочный светильник yeelight smart led ceiling light c2001s500 | yeelight smart led ceiling light c2001s500 | 1.0 |
| канцелярские принадлежности | канцелярские принадлежности разные | 0.4 |
| usb-магнитола acv avs-1718g | автомагнитола acv avs-1718g | 1.0 |

- Loss: CoSENTLoss with these parameters: {"scale": 20.0, "similarity_fct": "pairwise_cos_sim"}
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- learning_rate: 2e-05
- num_train_epochs: 10
- warmup_ratio: 0.1
- save_only_model: True
- seed: 33
- fp16: True
- load_best_model_at_end: True
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: True
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 33
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
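As a rough sketch, the non-default hyperparameters listed above map onto SentenceTransformerTrainingArguments (Sentence Transformers 3.x) roughly as follows. The output directory and the tiny train/eval datasets are placeholders, the model and loss are built as in the CoSENTLoss sketch above, and fp16 training assumes a GPU.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Placeholder train/eval splits in the sentence1 / sentence2 / score layout;
# the real 4,532 / 504 sample splits are not published with this card.
train_dataset = Dataset.from_dict({
    "sentence1": ["наручные часы orient casual"],
    "sentence2": ["наручные часы orient"],
    "score": [1.0],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["usb-магнитола acv avs-1718g"],
    "sentence2": ["автомагнитола acv avs-1718g"],
    "score": [1.0],
})

model = SentenceTransformer("BAAI/bge-m3")
loss = losses.CoSENTLoss(model, scale=20.0)

# Non-default hyperparameters from the lists above; everything else keeps its default
args = SentenceTransformerTrainingArguments(
    output_dir="output/test_bge_2_10ep",  # placeholder path
    eval_strategy="steps",
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_ratio=0.1,
    save_only_model=True,
    seed=33,
    fp16=True,  # requires a GPU
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()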
Training Logs
| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine |
|---|---|---|---|---|
| 0.0882 | 50 | - | 2.7444 | 0.4991 |
| 0.1764 | 100 | - | 2.5535 | 0.6093 |
| 0.2646 | 150 | - | 2.3365 | 0.6761 |
| 0.3527 | 200 | - | 2.1920 | 0.7247 |
| 0.4409 | 250 | - | 2.2210 | 0.7446 |
| 0.5291 | 300 | - | 2.1432 | 0.7610 |
| 0.6173 | 350 | - | 2.2488 | 0.7769 |
| 0.7055 | 400 | - | 2.3736 | 0.7749 |
| 0.7937 | 450 | - | 2.0688 | 0.7946 |
| 0.8818 | 500 | 2.3647 | 2.5331 | 0.7879 |
| 0.9700 | 550 | - | 2.1087 | 0.7742 |
| 1.0582 | 600 | - | 2.1302 | 0.8068 |
| 1.1464 | 650 | - | 2.2669 | 0.8114 |
| 1.2346 | 700 | - | 2.0269 | 0.8039 |
| 1.3228 | 750 | - | 2.2095 | 0.8138 |
| 1.4109 | 800 | - | 2.5288 | 0.8190 |
| 1.4991 | 850 | - | 2.3442 | 0.8222 |
| 1.5873 | 900 | - | 2.3759 | 0.8289 |
| 1.6755 | 950 | - | 2.1893 | 0.8280 |
| 1.7637 | 1000 | 2.0682 | 2.0056 | 0.8426 |
| 1.8519 | 1050 | - | 2.0832 | 0.8527 |
| 1.9400 | 1100 | - | 2.0336 | 0.8515 |
| 2.0282 | 1150 | - | 2.0571 | 0.8591 |
| 2.1164 | 1200 | - | 2.1516 | 0.8565 |
| 2.2046 | 1250 | - | 2.2035 | 0.8602 |
| 2.2928 | 1300 | - | 2.5294 | 0.8513 |
| 2.3810 | 1350 | - | 2.4177 | 0.8647 |
| 2.4691 | 1400 | - | 2.1630 | 0.8709 |
| 2.5573 | 1450 | - | 2.1279 | 0.8661 |
| 2.6455 | 1500 | 1.678 | 2.1639 | 0.8744 |
| 2.7337 | 1550 | - | 2.2592 | 0.8799 |
| 2.8219 | 1600 | - | 2.2288 | 0.8822 |
| 2.9101 | 1650 | - | 2.2427 | 0.8831 |
| 2.9982 | 1700 | - | 2.4380 | 0.8776 |
| 3.0864 | 1750 | - | 2.1689 | 0.8826 |
| 3.1746 | 1800 | - | 1.8099 | 0.8868 |
| 3.2628 | 1850 | - | 2.0881 | 0.8832 |
| 3.3510 | 1900 | - | 2.0785 | 0.8892 |
| 3.4392 | 1950 | - | 2.2512 | 0.8865 |
| 3.5273 | 2000 | 1.2168 | 2.1249 | 0.8927 |
| 3.6155 | 2050 | - | 2.1179 | 0.8950 |
| 3.7037 | 2100 | - | 2.1932 | 0.8973 |
| 3.7919 | 2150 | - | 2.2628 | 0.8967 |
| 3.8801 | 2200 | - | 2.0764 | 0.8972 |
| 3.9683 | 2250 | - | 1.9575 | 0.9012 |
| 4.0564 | 2300 | - | 2.3302 | 0.8985 |
| 4.1446 | 2350 | - | 2.3008 | 0.8980 |
| 4.2328 | 2400 | - | 2.2886 | 0.8968 |
| 4.3210 | 2450 | - | 2.1694 | 0.8973 |
| 4.4092 | 2500 | 1.0851 | 2.1102 | 0.9010 |
| 4.4974 | 2550 | - | 2.2596 | 0.9021 |
| 4.5855 | 2600 | - | 2.1944 | 0.9019 |
| 4.6737 | 2650 | - | 2.0728 | 0.9029 |
| 4.7619 | 2700 | - | 2.4573 | 0.9031 |
| 4.8501 | 2750 | - | 2.2306 | 0.9057 |
| 4.9383 | 2800 | - | 2.2637 | 0.9068 |
| 5.0265 | 2850 | - | 2.5110 | 0.9068 |
| 5.1146 | 2900 | - | 2.6613 | 0.9042 |
| 5.2028 | 2950 | - | 2.4713 | 0.9070 |
| 5.2910 | 3000 | 0.8143 | 2.3709 | 0.9082 |
| 5.3792 | 3050 | - | 2.6083 | 0.9058 |
| 5.4674 | 3100 | - | 2.5377 | 0.9044 |
| 5.5556 | 3150 | - | 2.3146 | 0.9071 |
| 5.6437 | 3200 | - | 2.2603 | 0.9085 |
| 5.7319 | 3250 | - | 2.5842 | 0.9068 |
| 5.8201 | 3300 | - | 2.6045 | 0.9093 |
| 5.9083 | 3350 | - | 2.6207 | 0.9103 |
| 5.9965 | 3400 | - | 2.5992 | 0.9098 |
| 6.0847 | 3450 | - | 2.7799 | 0.9090 |
| 6.1728 | 3500 | 0.5704 | 2.7198 | 0.9098 |
| 6.2610 | 3550 | - | 2.9783 | 0.9089 |
| 6.3492 | 3600 | - | 2.4165 | 0.9120 |
| 6.4374 | 3650 | - | 2.4488 | 0.9122 |
| 6.5256 | 3700 | - | 2.6764 | 0.9113 |
| 6.6138 | 3750 | - | 2.5327 | 0.9130 |
| 6.7019 | 3800 | - | 2.5875 | 0.9129 |
| 6.7901 | 3850 | - | 2.7036 | 0.9130 |
| 6.8783 | 3900 | - | 2.7566 | 0.9120 |
| 6.9665 | 3950 | - | 2.5488 | 0.9127 |
| 7.0547 | 4000 | 0.4287 | 2.8512 | 0.9127 |
| 7.1429 | 4050 | - | 2.7361 | 0.9128 |
| 7.2310 | 4100 | - | 2.7434 | 0.9135 |
| 7.3192 | 4150 | - | 2.9410 | 0.9129 |
| 7.4074 | 4200 | - | 2.9452 | 0.9126 |
| 7.4956 | 4250 | - | 2.8665 | 0.9140 |
| 7.5838 | 4300 | - | 2.8215 | 0.9145 |
| 7.6720 | 4350 | - | 2.6978 | 0.9147 |
| 7.7601 | 4400 | - | 2.8445 | 0.9143 |
| 7.8483 | 4450 | - | 2.6041 | 0.9155 |
| 7.9365 | 4500 | 0.3099 | 2.7219 | 0.9155 |
| 8.0247 | 4550 | - | 2.7180 | 0.9160 |
| 8.1129 | 4600 | - | 2.6906 | 0.9160 |
| 8.2011 | 4650 | - | 2.8628 | 0.9156 |
| 8.2892 | 4700 | - | 2.7820 | 0.9158 |
| 8.3774 | 4750 | - | 2.8457 | 0.9157 |
| 8.4656 | 4800 | - | 2.7286 | 0.9160 |
| 8.5538 | 4850 | - | 2.7131 | 0.9164 |
| 8.6420 | 4900 | - | 2.8368 | 0.9165 |
| 8.7302 | 4950 | - | 2.8033 | 0.9167 |
| 8.8183 | 5000 | 0.2342 | 2.7307 | 0.9169 |
| 8.9065 | 5050 | - | 2.8483 | 0.9167 |
| 8.9947 | 5100 | - | 2.9736 | 0.9167 |
| 9.0829 | 5150 | - | 2.9151 | 0.9168 |
| 9.1711 | 5200 | - | 2.9375 | 0.9167 |
| 9.2593 | 5250 | - | 2.9968 | 0.9168 |
| 9.3474 | 5300 | - | 3.0024 | 0.9167 |
| 9.4356 | 5350 | - | 2.9444 | 0.9167 |
| 9.5238 | 5400 | - | 2.9477 | 0.9167 |
| 9.6120 | 5450 | - | 2.9205 | 0.9168 |
| 9.7002 | 5500 | 0.1639 | 2.9286 | 0.9167 |
| 9.7884 | 5550 | - | 2.9421 | 0.9168 |
| 9.8765 | 5600 | - | 2.9733 | 0.9168 |
| 9.9647 | 5650 | - | 2.9777 | 0.9169 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.31.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}