SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
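
For reference, the same three-module pipeline can be assembled by hand from sentence_transformers.models. Loading the finetuned checkpoint by name (as shown under Usage) is the simpler route; this sketch just makes the architecture printout above concrete:

from sentence_transformers import SentenceTransformer, models

# (0) Transformer backbone: BERT-based MiniLM, truncating inputs at 256 tokens
word_embedding = models.Transformer(
    "sentence-transformers/all-MiniLM-L6-v2", max_seq_length=256
)
# (1) Mean pooling over token embeddings (384-dimensional)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(), pooling_mode="mean"
)
# (2) L2-normalize embeddings so dot product equals cosine similarity
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding, pooling, normalize])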

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CharlyR/clip_distilled_rgb_emb")
# Run inference
sentences = [
    'rgb(30,57,15)',
    'Mid Green',
    'Dusk Mauve',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6900, -0.2139],
#         [ 0.6900,  1.0000, -0.3807],
#         [-0.2139, -0.3807,  1.0000]])
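
Since the model embeds rgb(...) strings and color names into the same space, a natural follow-up is nearest-name lookup over a palette. A minimal sketch (the palette below is illustrative, not part of the model or its training data):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("CharlyR/clip_distilled_rgb_emb")

# Illustrative palette of candidate color names
palette = ["Mid Green", "Dusk Mauve", "Oxblood Red", "Windsor Purple"]
palette_emb = model.encode(palette)

# Embed a query color and rank the palette by cosine similarity
query_emb = model.encode(["rgb(30,57,15)"])
scores = model.similarity(query_emb, palette_emb)  # shape: (1, 4)
best = int(scores.argmax())
print(palette[best])  # likely "Mid Green", judging by the scores above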

Training Details

Training Dataset

Unnamed Dataset

  • Size: 125,000 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0                                        | sentence_1                                      |
    |:--------|:--------------------------------------------------|:------------------------------------------------|
    | type    | string                                            | string                                          |
    | details | min: 11 tokens, mean: 11.0 tokens, max: 11 tokens | min: 3 tokens, mean: 4.68 tokens, max: 8 tokens |
  • Samples:
    | sentence_0      | sentence_1     |
    |:----------------|:---------------|
    | rgb(116,59,58)  | Oxblood Red    |
    | rgb(101,92,166) | Windsor Purple |
    | rgb(232,19,216) | Purplish Pink  |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
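
With this loss, each rgb(...) string is pulled toward its paired color name while every other name in the same batch acts as an in-batch negative. A minimal sketch of the objective (an illustration, not the library's internal code):

import torch
import torch.nn.functional as F

def mnrl(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0):
    # Cosine similarity of every anchor against every positive in the batch
    sims = F.cosine_similarity(
        anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1
    )
    # Row i's true match is column i; all other columns serve as negatives
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(sims * scale, labels)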
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
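
The non-default values above can be reproduced with the SentenceTransformerTrainer API. A hedged sketch, with an illustrative two-pair stand-in for the unnamed 125,000-pair dataset and an assumed output directory name:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Illustrative stand-in for the unnamed (sentence_0, sentence_1) dataset
train_dataset = Dataset.from_dict({
    "sentence_0": ["rgb(116,59,58)", "rgb(101,92,166)"],
    "sentence_1": ["Oxblood Red", "Windsor Purple"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="clip_distilled_rgb_emb",  # assumed, not from the card
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),  # scale=20.0, cos_sim by default
)
trainer.train()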

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0640 500 2.9497
0.1280 1000 1.86
0.1920 1500 1.3757
0.2560 2000 1.2099
0.3200 2500 1.1121
0.3840 3000 1.0661
0.4480 3500 1.0024
0.5120 4000 1.0025
0.5760 4500 0.9696
0.6400 5000 0.9547
0.7040 5500 0.9279
0.7680 6000 0.911
0.8319 6500 0.9023
0.8959 7000 0.9043
0.9599 7500 0.892
1.0239 8000 0.8767
1.0879 8500 0.8663
1.1519 9000 0.866
1.2159 9500 0.854
1.2799 10000 0.8616
1.3439 10500 0.8469
1.4079 11000 0.8368
1.4719 11500 0.8299
1.5359 12000 0.8489
1.5999 12500 0.8107
1.6639 13000 0.8122
1.7279 13500 0.8288
1.7919 14000 0.8262
1.8559 14500 0.8132
1.9199 15000 0.8126
1.9839 15500 0.8213
2.0479 16000 0.7897
2.1119 16500 0.8009
2.1759 17000 0.822
2.2399 17500 0.8034
2.3039 18000 0.7876
2.3678 18500 0.79
2.4318 19000 0.7983
2.4958 19500 0.8155
2.5598 20000 0.7962
2.6238 20500 0.7949
2.6878 21000 0.7839
2.7518 21500 0.7865
2.8158 22000 0.7829
2.8798 22500 0.7861
2.9438 23000 0.7701
3.0078 23500 0.7972
3.0718 24000 0.7795
3.1358 24500 0.7655
3.1998 25000 0.7722
3.2638 25500 0.7603
3.3278 26000 0.766
3.3918 26500 0.7654
3.4558 27000 0.764
3.5198 27500 0.763
3.5838 28000 0.7708
3.6478 28500 0.764
3.7118 29000 0.7593
3.7758 29500 0.7667
3.8398 30000 0.7643
3.9038 30500 0.7555
3.9677 31000 0.7742
4.0317 31500 0.7554
4.0957 32000 0.7489
4.1597 32500 0.7545
4.2237 33000 0.7445
4.2877 33500 0.7701
4.3517 34000 0.7565
4.4157 34500 0.7352
4.4797 35000 0.7492
4.5437 35500 0.7526
4.6077 36000 0.7354
4.6717 36500 0.761
4.7357 37000 0.7436
4.7997 37500 0.749
4.8637 38000 0.7511
4.9277 38500 0.7264
4.9917 39000 0.7424
5.0557 39500 0.7428
5.1197 40000 0.7284
5.1837 40500 0.7302
5.2477 41000 0.7498
5.3117 41500 0.7272
5.3757 42000 0.7416
5.4397 42500 0.7169
5.5036 43000 0.7448
5.5676 43500 0.744
5.6316 44000 0.7396
5.6956 44500 0.7229
5.7596 45000 0.7262
5.8236 45500 0.7317
5.8876 46000 0.7358
5.9516 46500 0.7238
6.0156 47000 0.7274
6.0796 47500 0.7149
6.1436 48000 0.7262
6.2076 48500 0.7277
6.2716 49000 0.7161
6.3356 49500 0.7235
6.3996 50000 0.7321
6.4636 50500 0.7185
6.5276 51000 0.7303
6.5916 51500 0.7222
6.6556 52000 0.7202
6.7196 52500 0.7109
6.7836 53000 0.7122
6.8476 53500 0.728
6.9116 54000 0.7073
6.9756 54500 0.7237
7.0395 55000 0.7005
7.1035 55500 0.7303
7.1675 56000 0.7144
7.2315 56500 0.7172
7.2955 57000 0.7075
7.3595 57500 0.7188
7.4235 58000 0.7088
7.4875 58500 0.7133
7.5515 59000 0.7121
7.6155 59500 0.7216
7.6795 60000 0.7189
7.7435 60500 0.723
7.8075 61000 0.716
7.8715 61500 0.7083
7.9355 62000 0.7099
7.9995 62500 0.714
8.0635 63000 0.7008
8.1275 63500 0.7165
8.1915 64000 0.7044
8.2555 64500 0.703
8.3195 65000 0.6991
8.3835 65500 0.7048
8.4475 66000 0.7038
8.5115 66500 0.7137
8.5755 67000 0.6976
8.6394 67500 0.7176
8.7034 68000 0.7153
8.7674 68500 0.6924
8.8314 69000 0.7245
8.8954 69500 0.7057
8.9594 70000 0.6915
9.0234 70500 0.708
9.0874 71000 0.7071
9.1514 71500 0.7051
9.2154 72000 0.7049
9.2794 72500 0.7216
9.3434 73000 0.6964
9.4074 73500 0.6972
9.4714 74000 0.7049
9.5354 74500 0.6951
9.5994 75000 0.7055
9.6634 75500 0.6945
9.7274 76000 0.7058
9.7914 76500 0.7018
9.8554 77000 0.7029
9.9194 77500 0.6992
9.9834 78000 0.6876

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.1
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}