metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:647236
  - loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: essence multi task concealer 15 natural nude
    sentences:
      - pure oxygen 20 vol
      - essence
      - face make-up
  - source_sentence: faber castell jumbo colored pencil, metallic copper
    sentences:
      - ' faber castell colored pencil'
      - pencil
      - a4 photographic paper, 5 colors, 100 sheets, 80 gsm
  - source_sentence: gedo & the champ
    sentences:
      - children book
      - ' book'
      - diary of a wimpy kid do-it-youself book
  - source_sentence: green track suit
    sentences:
      - outfit
      - green track suit
      - tres
  - source_sentence: must kindergarten backpack mermazing 2 cases
    sentences:
      - crescent stand with 3 dates plate gold
      - school supplies
      - bag
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy
            value: 0.9700284004211426
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
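The three modules above run in sequence. The BERT encoder emits one 384-dimensional vector per token. Pooling averages those vectors over the real (non-padding) tokens, because only `pooling_mode_mean_tokens` is True. Normalize then rescales the result to unit length. The NumPy sketch below mimics the last two steps on random stand-in token embeddings. The function name `mean_pool_and_normalize` and the random inputs are illustrative assumptions, not the library's own code.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the token axis, then L2 normalization (illustrative sketch).

    token_embeddings: (batch, seq_len, dim) output of the transformer.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # number of real tokens
    pooled = summed / counts                                          # mean pooling
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                       # unit-length output


rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))   # hypothetical stand-in for BertModel output
mask = np.array([[1, 1, 1, 0, 0],       # first text: 3 real tokens, 2 padding
                 [1, 1, 1, 1, 1]])
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)                        # (2, 384)
```

Because padding positions are masked out, changing them does not change the embedding. Because of the final normalization, every output vector has length 1.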

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LamaDiab/v2MiniLM-V22Data-128ConstantBATCH-SemanticEngine")
# Run inference
sentences = [
    'must kindergarten backpack mermazing 2 cases',
    'school supplies',
    'crescent stand with 3 dates plate gold',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.5733, -0.2166],
#         [ 0.5733,  1.0000, -0.0339],
#         [-0.2166, -0.0339,  1.0000]])
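The model ends with Normalize() and uses cosine similarity, so every embedding has unit length. For unit vectors, cosine similarity is just the dot product. That makes semantic search a matter of ranking corpus embeddings by their dot product with the query embedding. The sketch below shows this with random unit vectors standing in for `model.encode(...)` output. The helper names `cosine_matrix` and `top_k_by_cosine` are hypothetical. The library also provides `sentence_transformers.util.semantic_search` for this task.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def top_k_by_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus index, score) pairs for the k corpus rows most similar to query."""
    scores = cosine_matrix(query[None, :], corpus)[0]
    best = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in best]


rng = np.random.default_rng(0)
corpus = rng.normal(size=(4, 384))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit length, like this model's output

# For unit vectors, the plain dot product already equals cosine similarity.
assert np.allclose(corpus @ corpus.T, cosine_matrix(corpus, corpus))

query = corpus[2] + 0.01 * rng.normal(size=384)  # a slightly perturbed copy of corpus row 2
print(top_k_by_cosine(query, corpus))            # corpus row 2 ranks first
```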

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.97
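cosine_accuracy is a triplet metric. For each (anchor, positive, negative) triplet, it counts a success when the anchor's cosine similarity to the positive exceeds its similarity to the negative. A value of 0.97 therefore means about 97% of the evaluation triplets were ranked correctly. The evaluation set is presumably the 9,509-sample set with a negative column described under Training Details. The minimal NumPy sketch below assumes a strict comparison with zero margin. It mirrors the idea behind Sentence Transformers' TripletEvaluator but is not that class's code, and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def triplet_cosine_accuracy(anchor, positive, negative) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negative)
    pos_scores = np.sum(a * p, axis=1)  # row-wise cosine(anchor_i, positive_i)
    neg_scores = np.sum(a * n, axis=1)  # row-wise cosine(anchor_i, negative_i)
    return float(np.mean(pos_scores > neg_scores))


# Hypothetical toy triplets: the first two are ranked correctly and the third is not.
anchor = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
positive = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, -1.0]])
negative = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.2]])
print(triplet_cosine_accuracy(anchor, positive, negative))  # 0.6666666666666666
```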

Training Details

Training Dataset

Unnamed Dataset

  • Size: 647,236 training samples
  • Columns: anchor, positive, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 3 tokens, mean 11.56 tokens, max 50 tokens
    • positive: string; min 3 tokens, mean 4.55 tokens, max 12 tokens
    • itemCategory: string; min 3 tokens, mean 3.91 tokens, max 9 tokens
  • Samples:
    anchor positive itemCategory
    petrol samsung galaxy smart phone smart phone
    must trolley bag must true football 4 cases wheels cover backpack bag
    sanpellegrino chino is a bold and refreshing italian beverage with a unique bittersweet flavor made from herbal extracts and citrus best served chilled for a distinctive taste experience chino can drink beverage
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
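MultipleNegativesSymmetricRankingLoss is an in-batch contrastive loss. Within a batch, positive i is the match for anchor i, and every other positive serves as a negative. The loss averages two cross-entropy terms: one asks "which positive belongs to this anchor?" and the other asks "which anchor belongs to this positive?". The parameters above set the logits to 20 × cosine similarity. The NumPy sketch below covers only (anchor, positive) pairs and does not model how the itemCategory column enters training. The function names are hypothetical, and this is an illustration of the technique rather than the library's implementation.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity matrix between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean softmax cross-entropy; the correct class for row i is targets[i]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def symmetric_mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Symmetric in-batch ranking loss with scale * cosine similarity as logits."""
    scores = scale * cos_sim(anchors, positives)  # (batch, batch)
    labels = np.arange(len(anchors))              # correct pairs lie on the diagonal
    forward = cross_entropy(scores, labels)       # anchor -> which positive?
    backward = cross_entropy(scores.T, labels)    # positive -> which anchor?
    return (forward + backward) / 2


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 384))
positives = anchors + 0.1 * rng.normal(size=(8, 384))  # each positive near its anchor
aligned = symmetric_mnr_loss(anchors, positives)
shuffled = symmetric_mnr_loss(anchors, positives[::-1])  # every pair mismatched
print(aligned < shuffled)                               # True
```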
    

Evaluation Dataset

Unnamed Dataset

  • Size: 9,509 evaluation samples
  • Columns: anchor, positive, negative, and itemCategory
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 3 tokens, mean 9.63 tokens, max 43 tokens
    • positive: string; min 3 tokens, mean 6.61 tokens, max 150 tokens
    • negative: string; min 3 tokens, mean 9.58 tokens, max 46 tokens
    • itemCategory: string; min 3 tokens, mean 3.88 tokens, max 10 tokens
  • Samples:
    anchor positive negative itemCategory
    pilot mechanical pencil progrex h-127 - 0.7 mm pencil artist pen brush tip 1.5m gold no.250 pencil
    superior drawing marker -pen - set of 12 colors - 2 nib superior notte 11-101 a5 stapled squared notebook, 60 sheets, cardboard cover, 60 grams, 148 x 210 mm, turkish marker
    first person singular author: haruki murakami haruki murakami book yellow dinosaur assembling game literature and fiction
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • num_train_epochs: 6
  • warmup_ratio: 0.2
  • fp16: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • dataloader_persistent_workers: True
  • push_to_hub: True
  • hub_model_id: v2MiniLM-V22Data-128ConstantBATCH-SemanticEngine
  • hub_strategy: all_checkpoints
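The non-default values above, together with the default `lr_scheduler_type: linear` from the full list, determine the training length and learning-rate schedule. The sketch below works through that arithmetic. It assumes a single device, keeps the last partial batch (`dataloader_drop_last: False`), and assumes warmup steps are computed as ceil(total_steps × warmup_ratio), with linear warmup followed by linear decay to zero as in transformers' `get_linear_schedule_with_warmup`. The helper `linear_lr` is hypothetical. The resulting 5,057 steps per epoch agrees with the first epoch of the training logs: step 5000 is logged as epoch 0.9887 ≈ 5000/5057.

```python
import math

# Values taken from the hyperparameters listed above.
num_samples = 647_236
batch_size = 128
num_epochs = 6
warmup_ratio = 0.2
base_lr = 2e-5

# Assumption: single device, last partial batch kept (dataloader_drop_last: False).
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)       # 5057 30342 6069
print(linear_lr(warmup_steps), linear_lr(total_steps))  # 2e-05 0.0
```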

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: v2MiniLM-V22Data-128ConstantBATCH-SemanticEngine
  • hub_strategy: all_checkpoints
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step    Training Loss  Validation Loss  cosine_accuracy
0.0002  1       3.9486         -                -
0.1977  1000    3.172          0.5803           0.9392
0.3955  2000    2.6021         0.5286           0.9490
0.5932  3000    2.1529         0.4992           0.9545
0.7910  4000    1.3847         0.4794           0.9547
0.9887  5000    0.9942         0.4432           0.9548
1.1864  6000    1.4574         0.4378           0.9597
1.3841  7000    1.3286         0.4299           0.9629
1.5817  8000    1.2024         0.4179           0.9646
1.7794  9000    1.1554         0.4171           0.9648
1.9771  10000   1.0769         0.4174           0.9635
2.1747  11000   0.9984         0.4163           0.9677
2.3724  12000   0.9714         0.4026           0.9676
2.5701  13000   0.9208         0.4087           0.9674
2.7677  14000   0.9027         0.3975           0.9681
2.9654  15000   0.8854         0.4018           0.9680
3.1631  16000   0.8299         0.4085           0.9688
3.3607  17000   0.8103         0.3995           0.9687
3.5584  18000   0.7853         0.3974           0.9677
3.7561  19000   0.7734         0.3981           0.9685
3.9537  20000   0.7758         0.3996           0.9685
4.1514  21000   0.7463         0.4009           0.9690
4.3491  22000   0.7212         0.4014           0.9688
4.5467  23000   0.7312         0.3967           0.9695
4.7444  24000   0.7175         0.3956           0.9695
4.9421  25000   0.7196         0.3931           0.9701
5.1398  26000   0.6815         0.3936           0.9690
5.3374  27000   0.6875         0.3936           0.9695
5.5351  28000   0.6955         0.3948           0.9692
5.7328  29000   0.6946         0.3941           0.9697
5.9304  30000   0.676          0.3940           0.9700

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.1.2
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}