SentenceTransformer based on Syldehayem/all-MiniLM-L12-v2_embedder_train

This is a sentence-transformers model fine-tuned from Syldehayem/all-MiniLM-L12-v2_embedder_train. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Syldehayem/all-MiniLM-L12-v2_embedder_train
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
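
The Pooling and Normalize stages listed above can be illustrated with a small standard-library sketch. It is not the library's implementation: the function names and the toy 3-dimensional vectors are made up for illustration (the real model works on 384-dimensional token embeddings), and it only shows what mean pooling over non-padding tokens followed by L2 normalization computes.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 6.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # mean of the first two vectors
sentence_vec = l2_normalize(pooled)   # unit-length sentence embedding
print(pooled)
print(sentence_vec)
```

Because of the final normalization step, every sentence embedding has length 1, so dot product and cosine similarity give the same value.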

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Syldehayem/all-MiniLM-L12-v2_embedder_train")
# Run inference
sentences = [
    'Horror Short Film "Nice to Finally Meet You" | ALTER | Online Premiere',
    "The Curse of Pandora's Box Returns to #UniversalHHN 2021",
    'Mondays: The Spielberg Challenge Winner!',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
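
As a rough picture of what the similarity matrix contains, the sketch below computes pairwise cosine similarity in plain Python. It assumes the model's similarity function is cosine, which is the library default unless the model configuration overrides it; the helper names and the 2-dimensional toy vectors are hypothetical stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Pairwise similarities: entry [i][j] compares vectors[i] with vectors[j]."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy unit-length "embeddings" standing in for the encoded sentences.
toy = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(score, 2) for score in row])
```

The diagonal is always 1 (each sentence compared with itself), and the matrix is symmetric.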

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,712 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

               sentence_0     sentence_1     sentence_2
    type       string         string         string
    min        3 tokens       3 tokens       4 tokens
    mean       19.7 tokens    19.91 tokens   20.27 tokens
    max        49 tokens      49 tokens      50 tokens

  • Samples:
    sentence_0 sentence_1 sentence_2
    মেয়ে যখন মায়ের মতন Bidhilipi #Shorts
    A Sci-Fi Short Film: "Voltok" - by Jonathan Vleeschower TheCGBros CGI MoCap Demo : "Finger Mocap Without Any Post Animation" by the MocapLab
    LEAKY PIPES Taking care of a baby at 15 "Fifteen" - Short film by Sameh Alaa
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
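
To make the loss parameters above concrete, here is a minimal sketch of a Euclidean triplet loss for a single (anchor, positive, negative) triple, written as max(d(a, p) - d(a, n) + margin, 0). The function names and 2-dimensional vectors are illustrative only, not the library's code. One point worth noting, under the assumption that the loss is computed on the model's normalized outputs (the architecture ends in Normalize()): unit vectors are at most 2 apart, so with a margin of 5 the per-triplet loss cannot fall below 3.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on (distance to positive) - (distance to negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

# Toy unit-length vectors (hypothetical, 2-D for readability).
anchor = [1.0, 0.0]
positive = [0.0, 1.0]    # sqrt(2) away from the anchor
negative = [-1.0, 0.0]   # 2 away from the anchor, the maximum for unit vectors

typical = triplet_loss(anchor, positive, negative)
# Best case for unit vectors: positive equals anchor, negative is opposite.
best_case = triplet_loss(anchor, anchor, negative)
print(typical, best_case)
```

Under that assumption, logged training losses between 3 and 7 are the expected range for this configuration rather than a sign of a logging error.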
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 50
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
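
The combination of lr_scheduler_type: linear, learning_rate: 5e-05, and warmup_steps: 0 describes a learning rate that decays linearly to zero over the run. The sketch below is a hand-written approximation of that schedule, not the Transformers implementation. The total of 30,350 steps is an assumption derived from 9,712 samples, batch size 16, 50 epochs, a single device, and no gradient accumulation.

```python
def linear_lr(step, base_lr=5e-05, total_steps=30350, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, midpoint, and end of training.
for step in (0, 15175, 30350):
    print(step, linear_lr(step))
```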

Training Logs

Epoch Step Training Loss
0.8237 500 5.0075
1.6474 1000 4.9816
2.4712 1500 5.013
3.2949 2000 4.981
4.1186 2500 4.9981
4.9423 3000 4.9727
5.7661 3500 4.9698
6.5898 4000 4.9839
7.4135 4500 5.0001
8.2372 5000 4.9996
9.0610 5500 4.9993
9.8847 6000 4.9999
10.7084 6500 5.0015
11.5321 7000 4.9934
12.3558 7500 4.9903
13.1796 8000 4.9875
14.0033 8500 5.0018
14.8270 9000 5.0088
15.6507 9500 4.9643
16.4745 10000 4.9447
17.2982 10500 4.8911
18.1219 11000 4.8719
18.9456 11500 4.8671
19.7694 12000 4.8268
20.5931 12500 4.8195
21.4168 13000 4.7726
22.2405 13500 4.7479
23.0643 14000 4.7465
23.8880 14500 4.7776
24.7117 15000 4.7366
25.5354 15500 4.7076
26.3591 16000 4.74
27.1829 16500 4.7118
28.0066 17000 4.6797
28.8303 17500 4.7144
29.6540 18000 4.662
30.4778 18500 4.6849
31.3015 19000 4.6608
32.1252 19500 4.6844
32.9489 20000 4.6561
33.7727 20500 4.6513
34.5964 21000 4.6418
35.4201 21500 4.635
36.2438 22000 4.6418
37.0675 22500 4.62
37.8913 23000 4.615
38.7150 23500 4.6189
39.5387 24000 4.6113
40.3624 24500 4.6054
41.1862 25000 4.5824
42.0099 25500 4.5907
42.8336 26000 4.5949
43.6573 26500 4.5769
44.4811 27000 4.5758
45.3048 27500 4.5613
46.1285 28000 4.5816
46.9522 28500 4.5538
47.7759 29000 4.5645
48.5997 29500 4.5653
49.4234 30000 4.5494
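
The fractional epoch values in the log follow from the dataset size and batch size reported earlier. The short calculation below reproduces them, assuming a single device and no gradient accumulation (gradient_accumulation_steps is 1 in the hyperparameters); the variable and function names are illustrative.

```python
import math

train_samples = 9712   # training set size from the dataset section
batch_size = 16        # per_device_train_batch_size
num_epochs = 50        # num_train_epochs

# With dataloader_drop_last False, a partial final batch would still count.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
print(epoch_at(500), epoch_at(30000))
```

Logging every 500 steps means the last logged row (step 30,000) falls slightly before the end of the run.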

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0+cu126
  • Accelerate: 1.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 33.4M params (Safetensors, tensor type F32)