metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:798551
  - loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: chillax fluffy beanbag
    sentences:
      - living room furniture
      - lined orange
      - home_garden
      - home and garden
      - beanbag
  - source_sentence: must kindergarten backpack mermazing 2 cases
    sentences:
      - trolley travel bag 1 zipper 33 l colors coral high 16766
      - bag
      - fashion
      - girls backpack
      - fashion
  - source_sentence: rolltop vintage backpack black & havan
    sentences:
      - men backpack
      - bag
      - kids carpet sanford 1629-l
      - fashion
      - fashion
  - source_sentence: golden olive pouch
    sentences:
      - orange bucket hat mint with oranges without rope
      - ' bag'
      - fashion
      - bag
      - fashion
  - source_sentence: xbase 100 kids swimming goggles -  clear lenses - blue / yellow
    sentences:
      - snuggs wearable blanket monk red christmas deer
      - fashion
      - blue and yellow swimming goggles
      - fashion
      - glasses
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.9800189137458801
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
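The Pooling and Normalize stages listed above can be illustrated with a minimal sketch. It is written in plain Python so it runs without dependencies, it is not the library's implementation, and the toy token vectors and the helper names `mean_pool` and `l2_normalize` are assumptions made up for illustration.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

# Toy input: three token vectors of dimension 4; the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)           # padding token is ignored
sentence_embedding = l2_normalize(pooled)  # unit-length vector
print(pooled)
print(sentence_embedding)
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.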

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LamaDiab/MiniLM-v2-v28-SemanticEngine")
# Run inference
sentences = [
    'xbase 100 kids swimming goggles -  clear lenses - blue / yellow',
    'blue and yellow swimming goggles',
    'snuggs wearable blanket monk red christmas deer',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7809, 0.0460],
#         [0.7809, 1.0000, 0.0060],
#         [0.0460, 0.0060, 1.0000]])
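As a small follow-up, the sketch below shows one way to read such a similarity matrix: for each sentence, pick the most similar other sentence. The scores are copied from the example output above; the helper name `best_match` is hypothetical.

```python
# Similarity scores copied from the example output above.
similarities = [
    [1.0000, 0.7809, 0.0460],
    [0.7809, 1.0000, 0.0060],
    [0.0460, 0.0060, 1.0000],
]
sentences = [
    "xbase 100 kids swimming goggles -  clear lenses - blue / yellow",
    "blue and yellow swimming goggles",
    "snuggs wearable blanket monk red christmas deer",
]

def best_match(row_index, matrix):
    """Return the index of the most similar *other* row (self-similarity is skipped)."""
    candidates = [
        (score, j) for j, score in enumerate(matrix[row_index]) if j != row_index
    ]
    return max(candidates)[1]

for i, sentence in enumerate(sentences):
    j = best_match(i, similarities)
    print(f"{sentence!r} -> {sentences[j]!r} ({similarities[i][j]:.4f})")
```

The two goggles descriptions select each other, while the blanket description has only weak matches (its best score is 0.0460).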

Evaluation

Metrics

Triplet

Metric            Value
cosine_accuracy   0.98
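For readers unfamiliar with the metric, a triplet cosine accuracy is usually the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below assumes that definition with no margin; the toy vectors and function names are illustrative and not taken from the evaluation set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)

# Toy (anchor, positive, negative) embeddings, made up for illustration.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),  # negative is closer: wrong
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.5
```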

Training Details

Training Dataset

Unnamed Dataset

  • Size: 798,551 training samples
  • Columns: anchor, positive, itemCategory, shoppingCategory, and shoppingCategory_normalized
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 3 tokens, mean 11.21 tokens, max 93 tokens
    • positive (string): min 3 tokens, mean 5.92 tokens, max 32 tokens
    • itemCategory (string): min 3 tokens, mean 3.94 tokens, max 9 tokens
    • shoppingCategory (string): min 3 tokens, mean 3.38 tokens, max 5 tokens
    • shoppingCategory_normalized (string): min 3 tokens, mean 4.47 tokens, max 5 tokens
  • Samples:
    anchor positive itemCategory shoppingCategory shoppingCategory_normalized
    sky blue dotted t-chemise t-shirt sky dotted top fashion fashion
    coffee - set of scented candles pleasant aroma candle candle home and garden home_garden
    nasturtium passion fruit chocolate chocolate sweet restaurants food_dining
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
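To make the loss parameters concrete, here is a minimal sketch of a symmetric in-batch ranking loss, assuming the usual formulation: cosine similarities between every anchor and every positive in the batch are multiplied by `scale`, cross-entropy is computed once over rows (anchor to positive) and once over columns (positive to anchor), and the two terms are averaged. It is plain Python for illustration, not the library's code, and the toy similarity matrices are invented.

```python
import math

def cross_entropy_diagonal(scores):
    """Mean cross-entropy where the correct class for row i is column i."""
    losses = []
    for i, row in enumerate(scores):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        losses.append(log_sum_exp - row[i])
    return sum(losses) / len(losses)

def symmetric_ranking_loss(similarity, scale=20.0):
    """Average of the anchor->positive and positive->anchor ranking losses."""
    scaled = [[scale * s for s in row] for row in similarity]
    transposed = [list(column) for column in zip(*scaled)]
    forward = cross_entropy_diagonal(scaled)
    backward = cross_entropy_diagonal(transposed)
    return (forward + backward) / 2

# Rows are anchors, columns are positives; entry [i][j] is a cosine similarity.
well_aligned = [[0.9, 0.1], [0.2, 0.8]]   # matching pairs score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]     # matching pairs score lowest
print(symmetric_ranking_loss(well_aligned))
print(symmetric_ranking_loss(misaligned))
```

With `scale` set to 20.0, small differences in cosine similarity turn into large logit gaps, so the well-aligned batch yields a loss near zero and the misaligned batch a much larger one.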
    

Evaluation Dataset

Unnamed Dataset

  • Size: 9,509 evaluation samples
  • Columns: anchor, positive, negative, itemCategory, shoppingCategory, and shoppingCategory_normalized
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 3 tokens, mean 9.63 tokens, max 43 tokens
    • positive (string): min 3 tokens, mean 6.52 tokens, max 150 tokens
    • negative (string): min 3 tokens, mean 9.5 tokens, max 33 tokens
    • itemCategory (string): min 3 tokens, mean 3.86 tokens, max 9 tokens
    • shoppingCategory (string): min 3 tokens, mean 3.36 tokens, max 5 tokens
    • shoppingCategory_normalized (string): min 3 tokens, mean 4.44 tokens, max 5 tokens
  • Samples:
    anchor positive negative itemCategory shoppingCategory shoppingCategory_normalized
    pilot mechanical pencil progrex h-127 - 0.7 mm 0.7 mm pencil rush tip marker set, 6 pastel colors, carton box, black edition no. 116453 pencil stationary office_school
    superior drawing marker -pen - set of 12 colors - 2 nib marker pen set palestine scarf notebook (spirals) marker stationary office_school
    first person singular author: haruki murakami haruki murakami book chocolate flakes mocha literature and fiction entertainment sports_entertainment
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • dataloader_persistent_workers: True
  • push_to_hub: True
  • hub_model_id: LamaDiab/MiniLM-v2-v28-SemanticEngine
  • hub_strategy: all_checkpoints
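The following sketch works out what these settings imply for the optimizer schedule. It takes the dataset size, batch size, epoch count, warmup ratio, and linear scheduler from this card, and assumes a single device with no gradient accumulation (the device count is not stated here). The rounding of the warmup length and the function name `linear_schedule` are illustrative choices, not the trainer's exact code.

```python
import math

train_samples = 798_551   # training set size reported in this card
batch_size = 256          # per_device_train_batch_size
epochs = 3                # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-05           # learning_rate

# Assumption: one device and gradient_accumulation_steps = 1,
# so every batch corresponds to one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = round(total_steps * warmup_ratio)

def linear_schedule(step):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)  # 3120 9360 936
for step in (0, warmup_steps, total_steps):
    print(step, linear_schedule(step))
```

Under these assumptions, 3120 steps per epoch matches the training log, where step 1000 falls at roughly epoch 0.32.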

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: LamaDiab/MiniLM-v2-v28-SemanticEngine
  • hub_strategy: all_checkpoints
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step   Training Loss   Validation Loss   cosine_accuracy
0.0003   1      2.1128          -                 -
0.3205   1000   1.6867          0.7481            0.9634
0.6410   2000   1.1579          0.6880            0.9722
0.9615   3000   1.894           0.6943            0.9689
1.2819   4000   1.3968          0.6312            0.9768
1.6022   5000   1.2426          0.6099            0.9789
1.9225   6000   1.1736          0.6118            0.9785
2.2428   7000   1.1514          0.6085            0.9788
2.5631   8000   1.1096          0.6068            0.9797
2.8834   9000   1.1285          0.6031            0.9800
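Since `hub_strategy` is `all_checkpoints`, it can be useful to see which logged step scored best. The sketch below copies the evaluated rows from the log above (the step-1 row is left out because it has no validation metrics) and selects the best step by each criterion; the variable names are illustrative.

```python
# (epoch, step, training_loss, validation_loss, cosine_accuracy),
# copied from the training log above.
log_rows = [
    (0.3205, 1000, 1.6867, 0.7481, 0.9634),
    (0.6410, 2000, 1.1579, 0.6880, 0.9722),
    (0.9615, 3000, 1.894, 0.6943, 0.9689),
    (1.2819, 4000, 1.3968, 0.6312, 0.9768),
    (1.6022, 5000, 1.2426, 0.6099, 0.9789),
    (1.9225, 6000, 1.1736, 0.6118, 0.9785),
    (2.2428, 7000, 1.1514, 0.6085, 0.9788),
    (2.5631, 8000, 1.1096, 0.6068, 0.9797),
    (2.8834, 9000, 1.1285, 0.6031, 0.9800),
]

# Lowest validation loss and highest cosine accuracy among the logged steps.
best_by_loss = min(log_rows, key=lambda row: row[3])
best_by_accuracy = max(log_rows, key=lambda row: row[4])

print("lowest validation loss at step", best_by_loss[1])
print("highest cosine_accuracy at step", best_by_accuracy[1])
```

Both criteria point to step 9000, the last logged evaluation.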

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.2
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}