xml-base base trained on Query triplets
This is a sentence-transformers model fine-tuned from heydariAI/persian-embeddings on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: heydariAI/persian-embeddings
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: json
- Language: en (as declared in the model metadata; the training and evaluation samples below are Persian)
- License: apache-2.0
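Cosine similarity, the similarity function listed above, compares two embeddings by the angle between them. The following minimal NumPy sketch (not the library's code) shows how a pairwise cosine matrix can be computed and that, once vectors are L2-normalized, it reduces to a dot product. The random 1024-dimensional vectors are stand-ins for real model outputs.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # Normalize each row to unit L2 length; the floor guards against zero vectors.
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # For unit vectors, the dot product equals the cosine of the angle between them.
    return a_norm @ b_norm.T

# Stand-in embeddings with this model's output dimensionality (1024).
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 1024)).astype(np.float32)
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)     # (3, 3)
print(np.diag(sims))  # each vector is maximally similar to itself (about 1.0)
```

Scores always fall in [-1, 1], which is why similarities from different queries can be compared on one scale.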
Model Sources
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
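The Pooling module above enables only pooling_mode_mean_tokens, so the sentence vector is the average of the transformer's token embeddings. Below is a NumPy sketch of masked mean pooling, assuming padding positions are marked 0 in the attention mask (as Hugging Face tokenizers do). Toy sizes replace the real 128-token, 1024-dimensional output, and the function name is illustrative.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden) output of the transformer.
    attention_mask:   (batch, seq_len) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# Toy example: hidden size 4 instead of 1024; the second sequence ends with one padding token.
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(tokens, mask)
print(pooled)  # rows: [4, 5, 6, 7] and [14, 15, 16, 17]
```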
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
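```python
from sentence_transformers import SentenceTransformer

# Download from the Hugging Face Hub (replace the placeholder with this model's repository id)
model = SentenceTransformer("sentence_transformers_model_id")

# Three Persian product queries: "desk fan", "small desk fan", "desk lamp"
sentences = [
    'پنکه رومیزی',
    'پنکه رومیزی کوچک',
    'چراغ رومیزی',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Pairwise cosine similarity scores between all embeddings (a 3 x 3 tensor)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```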
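For semantic search, embeddings produced by the code above can be ranked by cosine similarity against a query embedding. A small NumPy sketch, assuming the corpus has already been encoded; the 3-dimensional toy vectors stand in for 1024-dimensional `model.encode` output, and `top_k_search` is a hypothetical helper name.

```python
import numpy as np

def top_k_search(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return (index, score) pairs for the k corpus rows most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of each corpus row to the query
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical toy embeddings (real ones would have 1024 dimensions).
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([0.9, 0.1, 0.0])
hits = top_k_search(query, corpus, k=2)
print(hits)  # corpus rows 0 and 1, in that order
```

The library also provides `sentence_transformers.util.semantic_search`, which performs this kind of ranking on batches of query and corpus embeddings.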
Evaluation
Metrics
Triplet
| Metric | query-dev | query-test |
|:--|:--|:--|
| cosine_accuracy | 0.9676 | 0.9668 |
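cosine_accuracy is the share of evaluation triplets in which the anchor is closer, by cosine similarity, to its positive than to its negative. A value of 0.9676 therefore means that roughly 97 of every 100 query-dev triplets are ranked correctly. The NumPy sketch below illustrates that computation on toy 2-dimensional vectors. It reflects the idea behind Sentence Transformers' TripletEvaluator rather than reproducing its code.

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def row_cos(x, y):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        y = y / np.linalg.norm(y, axis=1, keepdims=True)
        return np.sum(x * y, axis=1)  # row-wise cosine, shape (n,)
    pos = row_cos(anchors, positives)
    neg = row_cos(anchors, negatives)
    return float(np.mean(pos > neg))

# Toy triplets: the first is ranked correctly, the second is not.
a = np.array([[1.0, 0.0], [1.0, 0.0]])
p = np.array([[0.9, 0.1], [0.0, 1.0]])
n = np.array([[0.0, 1.0], [1.0, 0.1]])
acc = triplet_cosine_accuracy(a, p, n)
print(acc)  # 0.5
```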
Training Details
Training Dataset
json
- Dataset: json
- Size: 801,402 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
|  | anchor | positive | negative |
|:--|:--|:--|:--|
| type | string | string | string |
| details | min: 3 tokens, mean: 7.99 tokens, max: 44 tokens | min: 4 tokens, mean: 9.86 tokens, max: 24 tokens | min: 4 tokens, mean: 8.13 tokens, max: 16 tokens |
- Samples:
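| anchor | positive | negative |
|:--|:--|:--|
| حراجی لباس بچه (children's clothing sale) | لباس بچگانه حراجی (children's clothing, on sale) | حراجی کفش زنانه (women's shoes sale) |
| گوشواره طلا دو حلقه اس (gold earring, double hoop) | گوشواره طلا زنانه دو حلقه (women's gold earring, double hoop) | انگشتر طلا زنانه دو بندی (women's gold ring, two-band) |
| redmy a3قاب گوشی (redmy a3 phone case) | قاب گوشی مناسب برای گوشی ردمی A3 (phone case suitable for the Redmi A3) | شارژر گوشی ردمی A3 (Redmi A3 phone charger) |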
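Each training example is an (anchor, positive, negative) triplet of short search queries. The card does not specify the file layout. The stdlib sketch below assumes a JSON Lines file with one object per line and keys matching the three columns, and writes two of the samples shown above. The file name `train_triplets.jsonl` is hypothetical.

```python
import json
import os
import tempfile

# Two triplets copied from the samples above; the JSON Lines layout is an assumption.
triplets = [
    {"anchor": "حراجی لباس بچه", "positive": "لباس بچگانه حراجی", "negative": "حراجی کفش زنانه"},
    {"anchor": "گوشواره طلا دو حلقه اس", "positive": "گوشواره طلا زنانه دو حلقه", "negative": "انگشتر طلا زنانه دو بندی"},
]

path = os.path.join(tempfile.mkdtemp(), "train_triplets.jsonl")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    for row in triplets:
        # ensure_ascii=False keeps the Persian text human-readable in the file.
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Read the file back and check that every record has exactly the three expected columns.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
assert all(set(r) == {"anchor", "positive", "negative"} for r in loaded)
print(len(loaded))  # 2
```

A file in this shape can be loaded with `datasets.load_dataset("json", data_files=path, split="train")`, which produces a dataset with the anchor, positive, and negative columns.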
- Loss: GISTEmbedLoss with these parameters:
```json
{
    "guide": "SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')",
    "temperature": 0.0493,
    "margin_strategy": "relative",
    "margin": 0.0516,
    "contrast_anchors": true,
    "contrast_positives": true,
    "gather_across_devices": false
}
```
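GISTEmbedLoss uses the frozen guide model (here paraphrase-multilingual-mpnet-base-v2) to exclude in-batch candidates that the guide scores about as highly as the true positive, because these are probably false negatives. The NumPy sketch below illustrates that idea with the "relative" margin strategy. It assumes the rule "mask a candidate if guide_sim > guide_pos × (1 − margin)". It computes only the loss value, with no gradients or cross-device gathering, and it is not the library's implementation. The helper names are hypothetical.

```python
import numpy as np

def pairwise_cos(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of x and every row of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x @ y.T

def gist_style_loss(a, p, n, ga, gp, gn, temperature=0.0493, margin=0.0516):
    """Simplified GISTEmbed-style loss with an assumed "relative" margin rule.

    a, p, n: (B, d) anchor/positive/negative embeddings from the model being trained.
    ga, gp, gn: embeddings of the same texts from the frozen guide model.
    """
    B = a.shape[0]
    # Guide similarity between each anchor and its own positive, shape (B, 1).
    guide_pos = np.diag(pairwise_cos(ga, gp))[:, None]
    # Assumed relative rule: candidates scoring above this are treated as false negatives.
    threshold = guide_pos * (1.0 - margin)

    def drop_false_negatives(sim, guide_sim, protect_diagonal=False):
        sim = sim.copy()
        suspect = guide_sim > threshold
        if protect_diagonal:
            suspect &= ~np.eye(B, dtype=bool)  # keep the true (anchor, positive) pair
        sim[suspect] = -np.inf               # excluded from the softmax
        return sim

    # Self-similarities on the anchor-anchor and positive-positive diagonals are
    # masked automatically, because the guide scores them at 1.0 (above the threshold).
    blocks = [
        drop_false_negatives(pairwise_cos(a, p), pairwise_cos(ga, gp), protect_diagonal=True),
        drop_false_negatives(pairwise_cos(a, a), pairwise_cos(ga, ga)),  # contrast_anchors
        drop_false_negatives(pairwise_cos(p, p), pairwise_cos(gp, gp)),  # contrast_positives
        drop_false_negatives(pairwise_cos(a, n), pairwise_cos(ga, gn)),  # hard negatives
    ]
    scores = np.concatenate(blocks, axis=1) / temperature  # (B, 4B)
    # Cross-entropy in which row i's correct column is i, its own positive.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(B), np.arange(B)].mean())

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 16))
p = a + 0.1 * rng.normal(size=(4, 16))  # positives close to their anchors
n = rng.normal(size=(4, 16))            # unrelated hard negatives
# Illustration only: the same vectors double as the "guide" embeddings.
loss = gist_style_loss(a, p, n, a, p, n)
print(loss)
```

The low temperature (0.0493) sharpens the softmax, so small cosine gaps between the positive and the surviving negatives translate into large differences in loss.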
Evaluation Dataset
json
- Dataset: json
- Size: 100,175 evaluation samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
|  | anchor | positive | negative |
|:--|:--|:--|:--|
| type | string | string | string |
| details | min: 3 tokens, mean: 7.8 tokens, max: 25 tokens | min: 4 tokens, mean: 9.86 tokens, max: 23 tokens | min: 4 tokens, mean: 8.09 tokens, max: 16 tokens |
- Samples:
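| anchor | positive | negative |
|:--|:--|:--|
| کراپ تیشرت زنانه ورزشی (women's sports crop T-shirt) | تیشرت کراپ زنانه ورزشی (women's sports crop T-shirt) | شلوار ورزشی زنانه (women's sports pants) |
| فیشیال دستگاه (facial device) | دستگاه بخور صورت برای فیشیال (facial steamer for facials) | دستگاه تصفیه هوای خانگی (home air purifier) |
| پیراهن مشکی مردانه یقه خرگوشی (men's black shirt, rabbit collar) | پیراهن مردانه مشکی یقه دار طرح خرگوشی (men's black collared shirt, rabbit design) | شلوار مشکی مردانه یقه خرگوشی (men's black pants, rabbit collar) |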
- Loss: GISTEmbedLoss with these parameters:
```json
{
    "guide": "SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')",
    "temperature": 0.0493,
    "margin_strategy": "relative",
    "margin": 0.0516,
    "contrast_anchors": true,
    "contrast_positives": true,
    "gather_across_devices": false
}
```
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 1.1701480000238433e-05
num_train_epochs: 5
warmup_ratio: 0.15873389962653162
fp16: True
batch_sampler: no_duplicates
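batch_sampler: no_duplicates matters for in-batch-negative losses such as GISTEmbedLoss. If the same text appeared twice in one batch, one copy would be scored as a negative for the other. The plain-Python sketch below is a simplified greedy sampler that conveys the idea behind Sentence Transformers' NoDuplicatesBatchSampler. It is not that class's implementation, and the toy rows and function name are hypothetical.

```python
import random

def no_duplicate_batches(rows, batch_size, seed=42, drop_last=True):
    """Group row indices into batches in which no text value appears twice.

    rows: list of dicts such as {"anchor": ..., "positive": ..., "negative": ...}.
    Rows that would introduce a duplicate are deferred to a later batch.
    """
    remaining = list(range(len(rows)))
    random.Random(seed).shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        if len(batch) < batch_size and drop_last:
            break  # discard an incomplete batch, mirroring dataloader_drop_last: True
        batches.append(batch)
        remaining = deferred
    return batches

# Hypothetical rows: rows 0 and 1 share the negative "n", so they must not share a batch.
rows = [
    {"anchor": "a1", "positive": "p1", "negative": "n"},
    {"anchor": "a2", "positive": "p2", "negative": "n"},
    {"anchor": "a3", "positive": "p3", "negative": "n3"},
    {"anchor": "a4", "positive": "p4", "negative": "n4"},
]
batches = no_duplicate_batches(rows, batch_size=2)
print(batches)
```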
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1.1701480000238433e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.15873389962653162
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 3
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
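With lr_scheduler_type: linear, warmup_steps: 0, and warmup_ratio ≈ 0.159, the Transformers version listed below raises the learning rate linearly from 0 to 1.17e-05 over the first ceil(total_steps × warmup_ratio) optimizer steps and then decays it linearly to 0. The stdlib sketch below reproduces that schedule. The figure of 12,521 steps per epoch is an assumption inferred from the training log (step 1000 at epoch 0.0799, step 62000 at epoch 4.9517) and is not stated in the card.

```python
import math

def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

base_lr = 1.1701480000238433e-05
warmup_ratio = 0.15873389962653162
steps_per_epoch = 12521  # assumption: inferred from the logged Epoch/Step pairs
total_steps = 5 * steps_per_epoch  # num_train_epochs: 5
warmup_steps = math.ceil(total_steps * warmup_ratio)  # ratio is used because warmup_steps is 0

print(total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_warmup_lr(step, base_lr, warmup_steps, total_steps))
```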
Training Logs
| Epoch | Step | Training Loss | Validation Loss | query-dev_cosine_accuracy | query-test_cosine_accuracy |
|:--|:--|:--|:--|:--|:--|
| -1 | -1 | - | - | 0.8824 | - |
| 0.0799 | 1000 | 0.2209 | 0.1147 | 0.9180 | - |
| 0.1597 | 2000 | 0.1248 | 0.0842 | 0.9316 | - |
| 0.2396 | 3000 | 0.0962 | 0.0693 | 0.9370 | - |
| 0.3195 | 4000 | 0.0842 | 0.0611 | 0.9426 | - |
| 0.3993 | 5000 | 0.0742 | 0.0555 | 0.9458 | - |
| 0.4792 | 6000 | 0.0681 | 0.0538 | 0.9490 | - |
| 0.5591 | 7000 | 0.0661 | 0.0498 | 0.9488 | - |
| 0.6389 | 8000 | 0.0637 | 0.0471 | 0.9525 | - |
| 0.7188 | 9000 | 0.0609 | 0.0461 | 0.9528 | - |
| 0.7987 | 10000 | 0.0573 | 0.0452 | 0.9525 | - |
| 0.8785 | 11000 | 0.055 | 0.0449 | 0.9550 | - |
| 0.9584 | 12000 | 0.0541 | 0.0431 | 0.9556 | - |
| 1.0383 | 13000 | 0.0553 | 0.0427 | 0.9547 | - |
| 1.1181 | 14000 | 0.053 | 0.0402 | 0.9586 | - |
| 1.1980 | 15000 | 0.0464 | 0.0401 | 0.9583 | - |
| 1.2779 | 16000 | 0.0437 | 0.0380 | 0.9586 | - |
| 1.3577 | 17000 | 0.0426 | 0.0373 | 0.9599 | - |
| 1.4376 | 18000 | 0.038 | 0.0376 | 0.9593 | - |
| 1.5175 | 19000 | 0.037 | 0.0361 | 0.9605 | - |
| 1.5973 | 20000 | 0.0348 | 0.0364 | 0.9607 | - |
| 1.6772 | 21000 | 0.033 | 0.0349 | 0.9621 | - |
| 1.7570 | 22000 | 0.029 | 0.0347 | 0.9609 | - |
| 1.8369 | 23000 | 0.0278 | 0.0345 | 0.9617 | - |
| 1.9168 | 24000 | 0.0261 | 0.0346 | 0.9620 | - |
| 1.9966 | 25000 | 0.0269 | 0.0334 | 0.9626 | - |
| 2.0765 | 26000 | 0.0267 | 0.0335 | 0.9632 | - |
| 2.1564 | 27000 | 0.0246 | 0.0333 | 0.9643 | - |
| 2.2362 | 28000 | 0.0227 | 0.0330 | 0.9629 | - |
| 2.3161 | 29000 | 0.0224 | 0.0327 | 0.9642 | - |
| 2.3960 | 30000 | 0.0209 | 0.0325 | 0.9642 | - |
| 2.4758 | 31000 | 0.0195 | 0.0330 | 0.9648 | - |
| 2.5557 | 32000 | 0.0191 | 0.0327 | 0.9652 | - |
| 2.6356 | 33000 | 0.0189 | 0.0316 | 0.9643 | - |
| 2.7154 | 34000 | 0.0165 | 0.0324 | 0.9645 | - |
| 2.7953 | 35000 | 0.015 | 0.0309 | 0.9644 | - |
| 2.8752 | 36000 | 0.0142 | 0.0323 | 0.9654 | - |
| 2.9550 | 37000 | 0.0139 | 0.0316 | 0.9646 | - |
| 3.0349 | 38000 | 0.0151 | 0.0303 | 0.9650 | - |
| 3.1148 | 39000 | 0.0145 | 0.0307 | 0.9664 | - |
| 3.1946 | 40000 | 0.0128 | 0.0303 | 0.9656 | - |
| 3.2745 | 41000 | 0.0127 | 0.0300 | 0.9659 | - |
| 3.3544 | 42000 | 0.0125 | 0.0305 | 0.9663 | - |
| 3.4342 | 43000 | 0.0106 | 0.0305 | 0.9661 | - |
| 3.5141 | 44000 | 0.011 | 0.0308 | 0.9670 | - |
| 3.5940 | 45000 | 0.0105 | 0.0295 | 0.9665 | - |
| 3.6738 | 46000 | 0.0101 | 0.0297 | 0.9666 | - |
| 3.7537 | 47000 | 0.0091 | 0.0299 | 0.9667 | - |
| 3.8336 | 48000 | 0.009 | 0.0297 | 0.9666 | - |
| 3.9134 | 49000 | 0.0082 | 0.0298 | 0.9662 | - |
| 3.9933 | 50000 | 0.0086 | 0.0301 | 0.9668 | - |
| 4.0732 | 51000 | 0.0087 | 0.0290 | 0.9674 | - |
| 4.1530 | 52000 | 0.0084 | 0.0287 | 0.9678 | - |
| 4.2329 | 53000 | 0.0078 | 0.0288 | 0.9667 | - |
| 4.3128 | 54000 | 0.008 | 0.0287 | 0.9669 | - |
| 4.3926 | 55000 | 0.0074 | 0.0287 | 0.9669 | - |
| 4.4725 | 56000 | 0.007 | 0.0288 | 0.9677 | - |
| 4.5524 | 57000 | 0.0068 | 0.0288 | 0.9674 | - |
| 4.6322 | 58000 | 0.007 | 0.0282 | 0.9677 | - |
| 4.7121 | 59000 | 0.0064 | 0.0286 | 0.9678 | - |
| 4.7919 | 60000 | 0.006 | 0.0283 | 0.9675 | - |
| 4.8718 | 61000 | 0.0059 | 0.0284 | 0.9675 | - |
| 4.9517 | 62000 | 0.0057 | 0.0284 | 0.9676 | - |
| -1 | -1 | - | - | 0.9676 | 0.9668 |
Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.55.0
- PyTorch: 2.7.1+cu126
- Accelerate: 1.10.0
- Datasets: 4.0.0
- Tokenizers: 0.21.4
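When reproducing the results, it may help to compare a local environment against the versions listed above. The small stdlib sketch below uses `importlib.metadata` with the usual PyPI distribution names, which is an assumption since the card does not list them. It reports differences by plain string comparison only.

```python
import platform
from importlib import metadata

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "sentence-transformers": "5.1.0",
    "transformers": "4.55.0",
    "torch": "2.7.1+cu126",
    "accelerate": "1.10.0",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}

def compare_versions(expected):
    """Map each package to (expected version, installed version or None if absent)."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report

print(f"python: card=3.12.11 installed={platform.python_version()}")
for name, (want, have) in compare_versions(EXPECTED).items():
    status = "same" if have == want else "differs"
    print(f"{name}: card={want} installed={have} [{status}]")
```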
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
GISTEmbedLoss
```bibtex
@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```