SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
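The printout corresponds to a three-module pipeline: a BERT encoder, mean pooling over token embeddings, and L2 normalization. The same stack can be assembled by hand via the models submodule; a minimal sketch mirroring the printout above:

from sentence_transformers import SentenceTransformer, models

# Module 0: the MiniLM encoder, truncating inputs at 256 tokens
word_embedding = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    max_seq_length=256,
)

# Module 1: mean pooling over token embeddings (384 dimensions)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode="mean",
)

# Module 2: L2-normalize, so dot product equals cosine similarity
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])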

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Syldehayem/all-MiniLM-L6-v2_embedder_train")
# Run inference
sentences = [
    'CGI 3D Animated Trailers: "Killing Anabella" - by Aman Bhanot | TheCGBros',
    'Kaise Katey Rajani | Khudito Pasan | Bengali Movie Video Song | Bengali Classic Song',
    'CGI 3D Animated Trailers: "Play On" - by Sun Woo Kang | TheCGBros',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
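Because the outputs are unit-normalized 384-dimensional vectors, ranking a corpus against a query reduces to one encode call plus a similarity matrix. A short sketch continuing the snippet above; the query string is made up for illustration:

# Hypothetical free-text query against the three titles encoded above
query_embedding = model.encode(["3D animated short film trailer"])

# Cosine similarity of the query against every title
scores = model.similarity(query_embedding, embeddings)  # shape: torch.Size([1, 3])
best = scores.argmax().item()
print(sentences[best], scores[0, best].item())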

Training Details

Training Dataset

Unnamed Dataset

  • Size: 9,712 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

                   sentence_0    sentence_1    sentence_2
      type         string        string        string
      min tokens   4             3             5
      mean tokens  19.63         20.02         20.32
      max tokens   49            49            62
  • Samples (each row is one sentence_0 / sentence_1 / sentence_2 triplet):
    13 Films In 2 Years - A Filmmaker's Journey EPISODE 2 (Documentary) দেওয়া নেওয়া ইত্যাদি Natok Korish Na Toh
    CGI & VFX Showreels: "B-War" - by Jorge Baldeon TheCGBros Hot Dog
    Excuse Me - Comedy Scene Mauchaak Ranjit Mallick, Mithu Mukherjee
  • Loss: TripletLoss with these parameters (a training sketch follows this list):
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
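The same loss can be constructed directly from the library; a minimal sketch, assuming the base model is fine-tuned from scratch as described above:

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Euclidean distance with a margin of 5, matching the parameters above
train_loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)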
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 50
  • multi_dataset_batch_sampler: round_robin
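Wired together with the loss sketch from the previous section, these values map onto a trainer setup along the following lines; the one-row dataset is a hypothetical stand-in for the real 9,712 triplets:

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

# Hypothetical stand-in for the 9,712-triplet training set
train_dataset = Dataset.from_dict({
    "sentence_0": ['CGI 3D Animated Trailers: "Killing Anabella" - by Aman Bhanot | TheCGBros'],
    "sentence_1": ['CGI 3D Animated Trailers: "Play On" - by Sun Woo Kang | TheCGBros'],
    "sentence_2": ['Kaise Katey Rajani | Khudito Pasan | Bengali Movie Video Song | Bengali Classic Song'],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v2_embedder_train",  # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=50,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,                  # the base model from the loss sketch
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,              # the TripletLoss from the loss sketch
)
trainer.train()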

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.8237 500 5.0003
1.6474 1000 4.9955
2.4712 1500 4.9898
3.2949 2000 4.9741
4.1186 2500 4.9602
4.9423 3000 4.9196
5.7661 3500 4.8714
6.5898 4000 4.8077
7.4135 4500 4.7834
8.2372 5000 4.7543
9.0610 5500 4.7321
9.8847 6000 4.7047
10.7084 6500 4.7031
11.5321 7000 4.6618
12.3558 7500 4.6335
13.1796 8000 4.6199
14.0033 8500 4.5678
14.8270 9000 4.585
15.6507 9500 4.5565
16.4745 10000 4.5897
17.2982 10500 4.532
18.1219 11000 4.5248
18.9456 11500 4.5226
19.7694 12000 4.4929
20.5931 12500 4.4835
21.4168 13000 4.468
22.2405 13500 4.4638
23.0643 14000 4.4377
23.8880 14500 4.4336
24.7117 15000 4.4322
25.5354 15500 4.4144
26.3591 16000 4.4041
27.1829 16500 4.4118
28.0066 17000 4.3932
28.8303 17500 4.3745
29.6540 18000 4.3673
30.4778 18500 4.3903
31.3015 19000 4.3573
32.1252 19500 4.3369
32.9489 20000 4.3424
33.7727 20500 4.3416
34.5964 21000 4.3402
35.4201 21500 4.3205
36.2438 22000 4.3288
37.0675 22500 4.3306
37.8913 23000 4.3067
38.7150 23500 4.3108
39.5387 24000 4.2793
40.3624 24500 4.3203
41.1862 25000 4.3012
42.0099 25500 4.288
42.8336 26000 4.2913
43.6573 26500 4.2956
44.4811 27000 4.2755
45.3048 27500 4.2914
46.1285 28000 4.2525
46.9522 28500 4.2877
47.7759 29000 4.2624
48.5997 29500 4.2649
49.4234 30000 4.2897

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0+cu126
  • Accelerate: 1.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1
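To reproduce this environment, the versions above can be pinned at install time (the +cu126 CUDA build of PyTorch may additionally require the matching PyTorch wheel index):

pip install sentence-transformers==4.1.0 transformers==4.51.3 torch==2.7.0 accelerate==1.6.0 datasets==3.5.1 tokenizers==0.21.1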

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}