---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:485108
  - loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: must kindergarten backpack mermazing 2 cases
    sentences:
      - olive acid wash t-shirt
      - ' must backpack '
      - bag
  - source_sentence: y earrings
    sentences:
      - gold y earringscircles earrings
      - earring
      - malachite and black tourmaline bracelet - 8 mm
  - source_sentence: black corset
    sentences:
      - top
      - ' corset top'
      - white maron printed t-shirt
  - source_sentence: xbase 100 kids swimming goggles -  clear lenses - blue / yellow
    sentences:
      - glasses
      - strap adjustment goggles
      - duffle bag navy with brown leather
  - source_sentence: sand eel shad soft lure combo eelo 150 25 g ayu/blue
    sentences:
      - mfk 140 static kite - pulpy
      - soft combo lure
      - fishing
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.9181827902793884
            name: Cosine Accuracy
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
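
The same pipeline can also be assembled by hand with the sentence_transformers.models API. Below is a minimal sketch that mirrors the printout above; the settings are taken from the architecture listing, and loading the checkpoint directly is equivalent:

from sentence_transformers import SentenceTransformer, models

# Transformer backbone: BertModel with a 256-token window, no lowercasing
word_embeddings = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    max_seq_length=256,
    do_lower_case=False,
)
# Mean pooling over token embeddings -> one 384-dimensional vector per input
pooling = models.Pooling(
    word_embeddings.get_word_embedding_dimension(),  # 384
    pooling_mode="mean",
)
# L2-normalization so that dot product equals cosine similarity
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embeddings, pooling, normalize])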

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LamaDiab/MiniLM-V19Data-128ConstantBATCH-SemanticEngine")
# Run inference
sentences = [
    'sand eel shad soft lure combo eelo 150 25 g ayu/blue',
    'soft combo lure',
    'mfk 140 static kite - pulpy',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9577, 0.4595],
#         [0.9577, 1.0000, 0.5243],
#         [0.4595, 0.5243, 1.0000]])
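
For retrieval over a catalog rather than pairwise scoring, here is a minimal sketch using util.semantic_search. The catalog and query are illustrative; the titles are borrowed from the widget examples above:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("LamaDiab/MiniLM-V19Data-128ConstantBATCH-SemanticEngine")

# Hypothetical product catalog; any list of item titles works
catalog = [
    "olive acid wash t-shirt",
    "duffle bag navy with brown leather",
    "gold y earrings",
]
corpus_embeddings = model.encode(catalog, convert_to_tensor=True)

# Encode the shopper query and fetch the closest catalog entries
query_embedding = model.encode("navy travel bag", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(catalog[hit["corpus_id"]], round(hit["score"], 4))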

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9182
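
Cosine accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor is closer to the positive than to the negative under cosine similarity. A minimal sketch of reproducing the number with TripletEvaluator, using the widget triplet above as placeholder data (substitute the full evaluation split):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("LamaDiab/MiniLM-V19Data-128ConstantBATCH-SemanticEngine")

# Placeholder rows; use the real anchor/positive/negative columns
anchors = ["black corset"]
positives = [" corset top"]
negatives = ["white maron printed t-shirt"]

evaluator = TripletEvaluator(anchors=anchors, positives=positives, negatives=negatives, name="eval")
results = evaluator(model)  # returns a dict containing the cosine accuracy
print(results)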

Training Details

Training Dataset

Unnamed Dataset

  • Size: 485,108 training samples
  • Columns: anchor, positive, and itemCategory
  • Approximate statistics based on the first 1000 samples:
             anchor             positive            itemCategory
    type     string             string              string
    details  min: 3 tokens      min: 3 tokens       min: 3 tokens
             mean: 3.94 tokens  mean: 7.94 tokens   mean: 3.99 tokens
             max: 8 tokens      max: 106 tokens     max: 9 tokens
  • Samples:
    anchor    | positive                        | itemCategory
    sweet     | alpine milk chocolate cookies   | sweet
    purse     | pocket purse                    | bag
    hand soap | johnson hand wash latte blossom | hand soap
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 14.285714285714285,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
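
For reference, a minimal sketch of instantiating this loss with the same parameters (the scale of 14.2857 corresponds to a softmax temperature of 1/14.2857 = 0.07):

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# Symmetric variant of MultipleNegativesRankingLoss: in-batch negatives
# are ranked in both the anchor->positive and positive->anchor directions
loss = losses.MultipleNegativesSymmetricRankingLoss(
    model,
    scale=14.285714285714285,  # = 1 / 0.07
    similarity_fct=util.cos_sim,
)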
    

Evaluation Dataset

Unnamed Dataset

  • Size: 9,509 evaluation samples
  • Columns: anchor, positive, negative, and itemCategory
  • Approximate statistics based on the first 1000 samples:
             anchor             positive            negative            itemCategory
    type     string             string              string              string
    details  min: 3 tokens      min: 3 tokens       min: 3 tokens       min: 3 tokens
             mean: 9.63 tokens  mean: 7.01 tokens   mean: 9.39 tokens   mean: 3.88 tokens
             max: 43 tokens     max: 150 tokens     max: 39 tokens      max: 10 tokens
  • Samples:
    anchor | positive | negative | itemCategory
    pilot mechanical pencil progrex h-127 - 0.7 mm | pilot pencil | plastic sharpener faber castell 1 hole 24 degree + faces eraser colors 583513 | pencil
    superior drawing marker -pen - set of 12 colors - 2 nib | superior | true gel pen transparent orange 242615 | marker
    first person singular author: haruki murakami | first person singular book | small "o.w.t" - marble top | literature and fiction
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 14.285714285714285,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • dataloader_persistent_workers: True
  • push_to_hub: True
  • hub_model_id: LamaDiab/MiniLM-19Data-128ConstantBATCH-SemanticEngine
  • hub_strategy: all_checkpoints
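
A sketch of wiring these non-default values into SentenceTransformerTrainingArguments (output_dir is a placeholder; everything else mirrors the list above):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    weight_decay=0.001,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    dataloader_num_workers=1,
    dataloader_prefetch_factor=2,
    dataloader_persistent_workers=True,
    push_to_hub=True,
    hub_model_id="LamaDiab/MiniLM-19Data-128ConstantBATCH-SemanticEngine",
    hub_strategy="all_checkpoints",
)

Together with the loss sketch in the Training Dataset section and a datasets.Dataset holding the anchor/positive/itemCategory columns, these arguments can be passed to SentenceTransformerTrainer to rerun the fine-tuning.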

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: LamaDiab/MiniLM-19Data-128ConstantBATCH-SemanticEngine
  • hub_strategy: all_checkpoints
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss cosine_accuracy
0.0003 1 5.4604 - -
0.2639 1000 3.533 0.9935 0.8844
0.5277 2000 2.3361 0.9458 0.8922
0.7916 3000 1.3548 0.9701 0.8852
1.0554 4000 1.0121 0.9535 0.8963
1.3191 5000 1.3913 0.9373 0.9014
1.5828 6000 1.3354 0.9366 0.9083
1.8465 7000 1.2488 0.9145 0.9103
2.1102 8000 1.1746 0.9236 0.9104
2.3739 9000 1.127 0.9103 0.9129
2.6377 10000 1.0852 0.9026 0.9120
2.9014 11000 1.0764 0.8946 0.9143
3.1651 12000 1.0508 0.9052 0.9132
3.4288 13000 1.0045 0.9048 0.9143
3.6925 14000 0.998 0.9035 0.9154
3.9562 15000 0.994 0.8899 0.9173
4.2199 16000 0.9831 0.9013 0.9165
4.4836 17000 0.9434 0.8971 0.9170
4.7474 18000 0.9465 0.8993 0.9182

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.2
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}