tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:705905
- loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: gerber baby food fruits apples bananas & cereal
sentences:
- world of sweets puzzle
- "baby food"
- baby food
- source_sentence: granville original one bite original rice crispy squares
sentences:
- ' one bite rice crispy '
- sweet
- bounty wafer rolls
- source_sentence: rosa / porcelain us andalusia mug
sentences:
- mug
- ' rosa mug'
- melamine small plate - teal
- source_sentence: cetaphil sunscreen spf 50+ cream 89 ml
sentences:
- sunscreen
- ' cetaphil sunscreen cream'
- garnier intensity (6.60) intense ruby
- source_sentence: italian dolce provolone
sentences:
- trident - gum strawberry flavor - 5 per pack
- >-
experience the authentic taste of italy with our italian dolce
provolone. indulge in its creamy texture, delicate flavors, and
versatility in both simple and sophisticated culinary creations.
- dairy
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
results:
- task:
type: triplet
name: Triplet
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy
value: 0.9695025682449341
name: Cosine Accuracy
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
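The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): token embeddings are averaged under the attention mask so padding positions are ignored, and Normalize() then rescales each sentence vector to unit length. A toy numpy sketch of those two steps, with small stand-in dimensions:

```python
import numpy as np

# Toy token embeddings: 1 sentence, 4 token positions, 3 dims
# (the real model uses up to 256 tokens and 384 dims).
token_emb = np.array([[[1.0, 2.0, 0.0],
                       [3.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0],   # padding
                       [0.0, 0.0, 0.0]]]) # padding
mask = np.array([[1.0, 1.0, 0.0, 0.0]])   # attention mask: 1 = real token

# Mean pooling: average only over real (unmasked) tokens.
pooled = (token_emb * mask[..., None]).sum(axis=1) / mask.sum(axis=1, keepdims=True)

# Normalize(): rescale each sentence vector to unit length.
sentence_emb = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
print(pooled)  # [[2. 1. 0.]]
```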
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("LamaDiab/v3MiniLM-V18Data-256ConstantBATCH-SemanticEngine")
# Run inference
sentences = [
'italian dolce provolone',
'experience the authentic taste of italy with our italian dolce provolone. indulge in its creamy texture, delicate flavors, and versatility in both simple and sophisticated culinary creations.',
'trident - gum strawberry flavor - 5 per pack',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8329, 0.2031],
# [0.8329, 1.0000, 0.2395],
# [0.2031, 0.2395, 1.0000]])
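Because the trailing Normalize() module makes every embedding unit-length, model.similarity with the default cosine function reduces to a matrix of pairwise dot products. A toy numpy sketch of what it computes, with 4-d vectors standing in for the real 384-d embeddings:

```python
import numpy as np

# Toy stand-ins for model.encode() output; real embeddings are 384-d.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Normalize() makes each row unit-length, so cosine similarity
# is just the pairwise dot product.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
similarities = emb @ emb.T
print(similarities)
# Diagonal is 1.0 (each sentence against itself); row 0 vs. row 1 is 0.6.
```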
Evaluation
Metrics
Triplet
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.9695 |
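cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor's cosine similarity to its positive exceeds its similarity to its negative. A small self-contained sketch of that computation, using toy 2-d vectors in place of real embeddings:

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    unit = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = unit(anchors), unit(positives), unit(negatives)
    # Row-wise dot products of unit vectors = cosine similarities.
    return float(np.mean(np.sum(a * p, axis=1) > np.sum(a * n, axis=1)))

# Two toy triplets: the first is ranked correctly, the second is not.
anchors   = np.array([[1.0, 0.0], [1.0, 0.0]])
positives = np.array([[0.9, 0.1], [0.0, 1.0]])
negatives = np.array([[0.0, 1.0], [0.9, 0.1]])
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 0.5
```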
Training Details
Training Dataset
Unnamed Dataset
- Size: 705,905 training samples
- Columns: anchor, positive, and itemCategory
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | itemCategory |
|---|---|---|---|
| type | string | string | string |
| min | 3 tokens | 3 tokens | 3 tokens |
| mean | 13.19 tokens | 4.46 tokens | 3.91 tokens |
| max | 51 tokens | 93 tokens | 11 tokens |
- Samples:

| anchor | positive | itemCategory |
|---|---|---|
| mango nos | nos | small |
| milk chocolate ganache | cake | sweet |
| lux soap creamy perfection 165 gm | soap | hand soap |
| grey deo original | classic deodrant | women's deodorant |

- Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

  { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
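MultipleNegativesSymmetricRankingLoss scores each anchor against every positive in the batch (the other positives act as in-batch negatives) and averages a cross-entropy ranking loss over both directions, anchor-to-positive and positive-to-anchor. A rough numpy sketch of the idea, not the library's implementation, using the scale and cosine-similarity settings above:

```python
import numpy as np

def symmetric_mnrl(anchor_emb, positive_emb, scale=20.0):
    """Sketch of the symmetric in-batch ranking loss: cross-entropy over
    scaled cosine similarities, averaged over both directions."""
    unit = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    scores = scale * (unit(anchor_emb) @ unit(positive_emb).T)  # (batch, batch)

    def cross_entropy(logits):
        # The correct "class" for row i is column i; other columns are in-batch negatives.
        logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy(scores) + cross_entropy(scores.T))

# Perfectly matched toy pairs give a near-zero loss; mismatched pairs a large one.
e = np.eye(4)
print(symmetric_mnrl(e, e) < symmetric_mnrl(e, e[::-1]))  # True
```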
Evaluation Dataset
Unnamed Dataset
- Size: 9,509 evaluation samples
- Columns: anchor, positive, negative, and itemCategory
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative | itemCategory |
|---|---|---|---|---|
| type | string | string | string | string |
| min | 3 tokens | 2 tokens | 3 tokens | 3 tokens |
| mean | 9.63 tokens | 6.53 tokens | 9.52 tokens | 3.88 tokens |
| max | 43 tokens | 150 tokens | 50 tokens | 10 tokens |
- Samples:

| anchor | positive | negative | itemCategory |
|---|---|---|---|
| pilot mechanical pencil progrex h-127 - 0.7 mm | office supplies | scary halloween skull mask | pencil |
| superior drawing marker -pen - set of 12 colors - 2 nib | superior | coloring and writing book 21 x 29.7 cm 100 gsm 18 pages number subtraction ma4014 | marker |
| first person singular author: haruki murakami | haruki murakami book | buried secrets | literature and fiction |

- Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

  { "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 6
- warmup_ratio: 0.2
- fp16: True
- dataloader_num_workers: 1
- dataloader_prefetch_factor: 2
- dataloader_persistent_workers: True
- push_to_hub: True
- hub_model_id: LamaDiab/v3MiniLM-V18Data-256ConstantBATCH-SemanticEngine
- hub_strategy: all_checkpoints
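For reference, the non-default values above map onto a SentenceTransformerTrainingArguments configuration roughly as follows. This is a sketch only: output_dir is a placeholder, and the original training script is not part of this card.

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder, not from the original run
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=6,
    warmup_ratio=0.2,
    fp16=True,
    dataloader_num_workers=1,
    dataloader_prefetch_factor=2,
    dataloader_persistent_workers=True,
    push_to_hub=True,
    hub_model_id="LamaDiab/v3MiniLM-V18Data-256ConstantBATCH-SemanticEngine",
    hub_strategy="all_checkpoints",
)
```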
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 6
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.2
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 1
- dataloader_prefetch_factor: 2
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: True
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: True
- resume_from_checkpoint: None
- hub_model_id: LamaDiab/v3MiniLM-V18Data-256ConstantBATCH-SemanticEngine
- hub_strategy: all_checkpoints
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
|---|---|---|---|---|
| 0.0004 | 1 | 4.1707 | - | - |
| 0.3626 | 1000 | 3.5534 | 0.5626 | 0.9461 |
| 0.7252 | 2000 | 2.3098 | 0.4896 | 0.9515 |
| 1.0877 | 3000 | 1.7306 | 0.4473 | 0.9593 |
| 1.45 | 4000 | 1.8694 | 0.4308 | 0.9606 |
| 1.8123 | 5000 | 1.6628 | 0.4218 | 0.9643 |
| 2.1746 | 6000 | 1.5159 | 0.4153 | 0.9648 |
| 2.5370 | 7000 | 1.435 | 0.4096 | 0.9669 |
| 2.8993 | 8000 | 1.3973 | 0.3964 | 0.9683 |
| 3.2616 | 9000 | 1.3101 | 0.3983 | 0.9674 |
| 3.6239 | 10000 | 1.3044 | 0.3955 | 0.9680 |
| 3.9862 | 11000 | 1.2367 | 0.3905 | 0.9683 |
| 4.3486 | 12000 | 1.2202 | 0.3892 | 0.9687 |
| 4.7109 | 13000 | 1.1993 | 0.3889 | 0.9685 |
| 5.0732 | 14000 | 1.1849 | 0.3886 | 0.9686 |
| 5.4355 | 15000 | 1.1555 | 0.3880 | 0.9692 |
| 5.7978 | 16000 | 1.1538 | 0.3887 | 0.9695 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 5.1.2
- Transformers: 4.53.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.9.0
- Datasets: 4.4.1
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}