SentenceTransformer based on pkqinys/shawhin-clip-title-thumbnail-embeddings

This is a sentence-transformers model finetuned from pkqinys/shawhin-clip-title-thumbnail-embeddings on the shawhin-yt-title-thumbnail-pairs dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): CLIPModel()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pkqinys/shawhin-clip-title-thumbnail-embeddings")
# Run inference
sentences = [
    'The Hugging Face Transformers Library | Example Code + Chatbot UI with Gradio',
    'How to Evaluate (and Improve) Your LLM Apps',
    'How to Improve LLMs with Tools (ft. OpenAI Agents SDK)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
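The similarity scores above are cosine similarities. As a reference point, here is a minimal NumPy sketch of the same computation, using toy 2-dimensional vectors in place of the model's real 1024-dimensional embeddings:

```python
import numpy as np

# Toy 2-dimensional embeddings standing in for model.encode(...) output.
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

# Cosine similarity: normalize each row to unit length, then take dot products.
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sims = unit @ unit.T

print(sims.shape)            # (3, 3)
print(round(sims[0, 2], 4))  # 0.7071
```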

Evaluation

Metrics

Triplet

  • Datasets: yt-title-thumbnail-train and yt-title-thumbnail-valid
  • Evaluated with TripletEvaluator
Metric           yt-title-thumbnail-train  yt-title-thumbnail-valid
cosine_accuracy  1.0                       0.9375
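TripletEvaluator's cosine_accuracy is the fraction of triplets for which the anchor embedding is closer, by cosine similarity, to its positive than to its negative. A sketch of that metric with made-up toy vectors (not the real thumbnail/title embeddings):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical (anchor, positive, negative) triplets, purely illustrative.
triplets = [
    (np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])),
    (np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([1.0, 0.0])),
]

# cosine_accuracy: share of triplets where sim(anchor, positive) > sim(anchor, negative).
correct = sum(cosine(a, p) > cosine(a, n) for a, p, n in triplets)
print(correct / len(triplets))  # 1.0
```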

Training Details

Training Dataset

shawhin-yt-title-thumbnail-pairs

  • Dataset: shawhin-yt-title-thumbnail-pairs at 24578b7
  • Size: 75 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 75 samples:
    Column    Type                               Token counts
    anchor    PIL.JpegImagePlugin.JpegImageFile  -
    positive  string                             min: 8, mean: 15.52, max: 27
    negative  string                             min: 8, mean: 15.49, max: 27
  • Samples (anchor thumbnail images not shown):
    positive: A Practical Introduction to Large Language Models (LLMs)
    negative: Prompt Engineering: How to Trick AI into Solving Your Problems

    positive: How to Build a Notion AI Agent (in 18 minutes)
    negative: 4 Ways to Measure Fat Tails with Python (+ Example Code)

    positive: Context Engineering Explained (5 Practical Tips)
    negative: Why I Quit My $150,000 Data Science Job
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
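MultipleNegativesRankingLoss uses in-batch negatives: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy pushes the matching pair's score above the rest. A rough NumPy sketch of that idea (illustrative only, not the library implementation):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """Sketch of MultipleNegativesRankingLoss with cos_sim scoring:
    the other positives in the batch act as negatives for each anchor."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) similarity logits
    # Cross-entropy with the matching pair on the diagonal as the target.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9]])
print(mnr_loss(anchors, positives))  # near zero: each anchor already matches its positive
```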
    

Evaluation Dataset

shawhin-yt-title-thumbnail-pairs

  • Dataset: shawhin-yt-title-thumbnail-pairs at 24578b7
  • Size: 16 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 16 samples:
    Column    Type                               Token counts
    anchor    PIL.JpegImagePlugin.JpegImageFile  -
    positive  string                             min: 8, mean: 14.5, max: 22
    negative  string                             min: 9, mean: 13.56, max: 18
  • Samples (anchor thumbnail images not shown):
    Principal Component Analysis (PCA) Introduction & Example (Python) Code

    positive: 5 Reasons Why Every Data Scientist Should Consider Freelancing
    negative: Text Embeddings, Classification, and Semantic Search (w/ Python Code)

    positive: 5 AI Projects You Can Build This Weekend (with Python)
    negative: The Wavelet Transform
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • learning_rate: 1e-05
  • num_train_epochs: 4
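With warmup_steps 0 and the linear scheduler (see All Hyperparameters below), the learning rate decays linearly from 1e-05 to 0 over training. A small sketch, assuming the step count implied by the card (75 samples, batch size 4, so 19 steps per epoch and 76 total steps over 4 epochs, matching the training logs):

```python
def linear_lr(step, total_steps, base_lr=1e-05, warmup_steps=0):
    # Linear warmup (none in this configuration), then linear decay to zero.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total = 76  # 75 samples / batch size 4 -> 19 steps per epoch, 4 epochs
print(linear_lr(0, total))   # 1e-05 at the start
print(linear_lr(38, total))  # 5e-06 halfway through
print(linear_lr(76, total))  # 0.0 at the end
```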

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss yt-title-thumbnail-train_cosine_accuracy yt-title-thumbnail-valid_cosine_accuracy
0 0 - - 1.0 0.9375
0.2 1 0.5678 - - -
0.4 2 0.8427 - - -
0.6 3 0.8475 - - -
0.8 4 0.9326 - - -
1.0 5 0.9615 1.7712 1.0 0.9375
0.1 1 0.1598 - - -
0.2 2 0.084 - - -
0.3 3 0.2701 - - -
0.4 4 0.4036 - - -
0.5 5 0.3907 - - -
0.6 6 0.4165 - - -
0.7 7 0.6304 - - -
0.8 8 0.3674 - - -
0.9 9 0.7016 - - -
1.0 10 0.2821 1.2305 - -
1.1 11 0.2749 - - -
1.2 12 0.4927 - - -
1.3 13 0.2579 - - -
1.4 14 0.5595 - - -
1.5 15 0.5023 - - -
1.6 16 0.2853 - - -
1.7 17 0.44 - - -
1.8 18 0.5634 - - -
1.9 19 0.4209 - - -
2.0 20 0.0825 1.2300 - -
0.0526 1 0.0705 - - -
0.1053 2 0.0566 - - -
0.1579 3 0.0493 - - -
0.2105 4 0.0315 - - -
0.2632 5 0.0866 - - -
0.3158 6 0.069 - - -
0.3684 7 0.3537 - - -
0.4211 8 0.122 - - -
0.4737 9 0.1622 - - -
0.5263 10 0.1299 - - -
0.5789 11 0.2733 - - -
0.6316 12 0.1462 - - -
0.6842 13 0.3868 - - -
0.7368 14 0.2076 - - -
0.7895 15 0.1524 - - -
0.8421 16 0.2408 - - -
0.8947 17 0.3624 - - -
0.9474 18 0.3834 - - -
1.0 19 0.2456 0.7823 - -
1.0526 20 0.1087 - - -
1.1053 21 0.1406 - - -
1.1579 22 0.1029 - - -
1.2105 23 0.1475 - - -
1.2632 24 0.1049 - - -
1.3158 25 0.053 - - -
1.3684 26 0.3995 - - -
1.4211 27 0.1227 - - -
1.4737 28 0.2695 - - -
1.5263 29 0.2224 - - -
1.5789 30 0.0945 - - -
1.6316 31 0.1475 - - -
1.6842 32 0.4774 - - -
1.7368 33 0.0315 - - -
1.7895 34 0.1688 - - -
1.8421 35 0.3641 - - -
1.8947 36 0.1004 - - -
1.9474 37 0.1414 - - -
2.0 38 0.067 0.7837 - -
2.0526 39 0.1435 - - -
2.1053 40 0.2146 - - -
2.1579 41 0.2392 - - -
2.2105 42 0.1392 - - -
2.2632 43 0.3378 - - -
2.3158 44 0.0715 - - -
2.3684 45 0.1154 - - -
2.4211 46 0.2249 - - -
2.4737 47 0.0407 - - -
2.5263 48 0.0414 - - -
2.5789 49 0.1295 - - -
2.6316 50 0.0922 - - -
2.6842 51 0.077 - - -
2.7368 52 0.4554 - - -
2.7895 53 0.0699 - - -
2.8421 54 0.0663 - - -
2.8947 55 0.2612 - - -
2.9474 56 0.1907 - - -
3.0 57 0.1049 0.7811 - -
3.0526 58 0.1027 - - -
3.1053 59 0.2408 - - -
3.1579 60 0.0248 - - -
3.2105 61 0.2142 - - -
3.2632 62 0.1579 - - -
3.3158 63 0.0789 - - -
3.3684 64 0.0668 - - -
3.4211 65 0.1484 - - -
3.4737 66 0.3956 - - -
3.5263 67 0.1063 - - -
3.5789 68 0.4022 - - -
3.6316 69 0.5607 - - -
3.6842 70 0.0283 - - -
3.7368 71 0.0781 - - -
3.7895 72 0.248 - - -
3.8421 73 0.08 - - -
3.8947 74 0.2495 - - -
3.9474 75 0.1528 - - -
4.0 76 0.2234 - - -

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0
  • PyTorch: 2.3.0
  • Accelerate: 1.10.1
  • Datasets: 4.1.1
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 0.4B parameters (F32, Safetensors)