LamaDiab's picture
Training in progress, epoch 3, checkpoint
636c605 verified
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:739818
  - loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: casa chandelier
    sentences:
      - cerulean blue set
      - steel frame lamp
      - chandlier
  - source_sentence: essence multi task concealer 15 natural nude
    sentences:
      - essence
      - face make-up
      - unicorn round bath bomb
  - source_sentence: the button down shirt
    sentences:
      - top
      - summer shirt
      - linen x tulle set
  - source_sentence: xbase 100 kids swimming goggles -  clear lenses - blue / yellow
    sentences:
      - glasses
      - strap adjustment goggles
      - evil eye wooden necklace
  - source_sentence: peblade warrior game
    sentences:
      - halo
      - ' peblade game'
      - toy figure
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.9734987616539001
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LamaDiab/NewMiniLM-V14Data-128BATCH-SemanticEngine")
# Run inference
sentences = [
    'peblade warrior game',
    ' peblade game',
    'halo',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8796, 0.3215],
#         [0.8796, 1.0000, 0.3155],
#         [0.3215, 0.3155, 1.0000]])

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9735

Training Details

Training Dataset

Unnamed Dataset

  • Size: 739,818 training samples
  • Columns: anchor, positive, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    anchor positive itemCategory
    type string string string
    details
    • min: 3 tokens
    • mean: 7.05 tokens
    • max: 121 tokens
    • min: 3 tokens
    • mean: 7.0 tokens
    • max: 44 tokens
    • min: 3 tokens
    • mean: 3.93 tokens
    • max: 11 tokens
  • Samples:
    anchor positive itemCategory
    maxical-d liquid medicine pediatrics medicine
    maltesers crunch cafe egg matesers chocolate sweet
    president'wys daughter books literature and fiction
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 9,509 evaluation samples
  • Columns: anchor, positive, negative, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    anchor positive negative itemCategory
    type string string string string
    details
    • min: 3 tokens
    • mean: 9.63 tokens
    • max: 43 tokens
    • min: 2 tokens
    • mean: 6.45 tokens
    • max: 150 tokens
    • min: 3 tokens
    • mean: 9.19 tokens
    • max: 38 tokens
    • min: 3 tokens
    • mean: 3.88 tokens
    • max: 10 tokens
  • Samples:
    anchor positive negative itemCategory
    pilot mechanical pencil progrex h-127 - 0.7 mm mechanical pencil stand with palestine planner pencil
    superior drawing marker -pen - set of 12 colors - 2 nib marker pen set falafel pocket marker
    first person singular author: haruki murakami english book little princess play set literature and fiction
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • num_train_epochs: 5
  • warmup_ratio: 0.2
  • fp16: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • dataloader_persistent_workers: True
  • push_to_hub: True
  • hub_model_id: LamaDiab/NewMiniLM-V14Data-128BATCH-SemanticEngine
  • hub_strategy: all_checkpoints

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: LamaDiab/NewMiniLM-V14Data-128BATCH-SemanticEngine
  • hub_strategy: all_checkpoints
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss cosine_accuracy
0.0002 1 3.1444 - -
0.1730 1000 2.8902 0.5921 0.9447
0.3460 2000 2.3137 0.5213 0.9536
0.5190 3000 2.0646 0.4818 0.9587
0.6920 4000 1.7373 0.4519 0.9621
0.8651 5000 1.2623 0.4316 0.9629
1.0381 6000 1.1672 0.4093 0.9669
1.2111 7000 1.5637 0.3973 0.9706
1.3841 8000 1.4262 0.3820 0.9701
1.5571 9000 1.3676 0.3758 0.9712
1.7301 10000 1.1239 0.3620 0.9715
1.9031 11000 0.8423 0.3590 0.9704
2.0761 12000 0.9595 0.3555 0.9735
2.2491 13000 1.1842 0.3600 0.9738
2.4221 14000 1.1436 0.3563 0.9746
2.5952 15000 1.147 0.3496 0.9736
2.7682 16000 0.8589 0.3432 0.9733
2.9412 17000 0.6846 0.3368 0.9735

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.2
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}