SentenceTransformer based on intfloat/e5-large-v2

This is a sentence-transformers model fine-tuned from intfloat/e5-large-v2. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-large-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
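
Because the Pooling module mean-pools token embeddings and the final Normalize() module scales each vector to unit length, the dot product of two embeddings already equals their cosine similarity. A minimal sketch illustrating this (the two sentences are placeholders):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")
embeddings = model.encode(["a first placeholder sentence", "a second placeholder sentence"])

# Normalize() makes every embedding unit-length...
print(np.linalg.norm(embeddings, axis=1))  # approximately [1. 1.]
# ...so a plain dot product equals the cosine similarity
print(float(embeddings[0] @ embeddings[1]))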

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")
# Run inference
sentences = [
    'shizuko yoshinaga first time shots',
    "Debut of a MILF AV Actress Document. First Time Shots! Cute Smile and Made-to-Fuck Body on a Mature Woman in her 50's. Shizuko Yoshinaga categorized as Mature Woman, Shaved Pussy, Documentary",
    '50 And Filming Her First Creampie Fumie Saito categorized as Mature Woman, Documentary',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4913, 0.1716],
#         [0.4913, 1.0000, 0.4514],
#         [0.1716, 0.4514, 1.0000]])
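
The same model can drive semantic search: encode the query and the corpus separately, then rank corpus entries by similarity. A minimal sketch, with placeholder query and corpus strings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")

# Placeholder corpus and query for illustration
corpus = ["first document text", "second document text", "third document text"]
query = "a search query"

query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)

# similarity() applies the model's configured similarity function (cosine here)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))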

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6884
cosine_accuracy@3 0.8069
cosine_accuracy@5 0.8451
cosine_accuracy@10 0.8874
cosine_precision@1 0.6884
cosine_precision@3 0.269
cosine_precision@5 0.169
cosine_precision@10 0.0887
cosine_recall@1 0.6884
cosine_recall@3 0.8069
cosine_recall@5 0.8451
cosine_recall@10 0.8874
cosine_ndcg@10 0.7879
cosine_mrr@10 0.756
cosine_map@100 0.7597
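
The identical accuracy@k and recall@k columns indicate that each evaluation query has exactly one relevant document. Metrics in this form are what sentence_transformers.evaluation.InformationRetrievalEvaluator reports; a minimal sketch of wiring up such an evaluation, with hypothetical ids and texts:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("dostoevskyIdiot/intfloat-e5-large-v2-jaiv-v2")

# Hypothetical evaluation data; ids and texts are placeholders
queries = {"q1": "first query", "q2": "second query"}
corpus = {"d1": "first document", "d2": "second document", "d3": "third document"}
relevant_docs = {"q1": {"d1"}, "q2": {"d3"}}  # one relevant document per query

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="test")
results = evaluator(model)
print(results)  # includes keys such as "test_cosine_ndcg@10"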

Training Details

Training Dataset

Unnamed Dataset

  • Size: 257,891 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 5 tokens, mean: 11.54 tokens, max: 70 tokens
    • sentence_1: string; min: 9 tokens, mean: 34.42 tokens, max: 176 tokens
    • sentence_2: string; min: 10 tokens, mean: 28.4 tokens, max: 122 tokens
  • Samples:
    • sentence_0: mature masseuse stimulates client's throat and pussy
      sentence_1: Remarkable Masseuse's Dick Stimulates Throat And Vagina 12 People 240 Minutes 5 categorized as Mature Woman, Massage
      sentence_2: Mature Woman On The Forefront Of The Sex Industry - Surprisingly Successful Mature Woman Massage Specialist's Seductive Technique! categorized as Mature Woman, Massage
    • sentence_0: Threesome featuring Akari Yukino and deep pussy digging
      sentence_1: Akari Yukino In Sweaty, Deep, Pussy Digging Sex categorized as Slender, Shemale, Anal Play, Threesome / Foursome, Facial, Daydreamers
      sentence_2: Super Fuck-a-thon! Cum-a-thon! 24-Hours Akari Hoshino Total Guerrilla SPECIAL!! categorized as Car Sex, Threesome / Foursome
    • sentence_0: man massages woman at spa with lotion
      sentence_1: A Male Esthetician In A Women-Only Massage Parlor... categorized as Massage Parlor, Lotion
      sentence_2: New Masseur Came to a Women Only Massage Parlor and Starts Sending Numerous Women To Pleasure Heaven! categorized as Massage Parlor, Voyeur, Massage
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
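
With this loss, sentence_0 acts as the query, sentence_1 as its positive, and sentence_2 as an explicit hard negative, with the other in-batch examples serving as additional negatives. A minimal sketch of constructing the loss with the two core parameters above (the remaining dumped keys are omitted):

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("intfloat/e5-large-v2")

# scale=20.0 and cosine similarity, matching the parameter dump above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)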
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 10
  • num_train_epochs: 2
  • eval_strategy: steps
  • per_device_eval_batch_size: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • per_device_train_batch_size: 10
  • num_train_epochs: 2
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 10
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
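
Taken together, these settings correspond to a SentenceTransformerTrainer run. A minimal sketch under stated assumptions: a tiny placeholder triplet dataset stands in for the 257,891-sample dataset above, and evaluation wiring is omitted:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("intfloat/e5-large-v2")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Placeholder triplets standing in for the real (sentence_0, sentence_1, sentence_2) columns
train_dataset = Dataset.from_dict({
    "sentence_0": ["query one", "query two"],
    "sentence_1": ["matching passage one", "matching passage two"],
    "sentence_2": ["hard negative one", "hard negative two"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # hypothetical output path
    per_device_train_batch_size=10,
    num_train_epochs=2,
    # the run also set eval_strategy="steps", per_device_eval_batch_size=10, and
    # multi_dataset_batch_sampler="round_robin"; evaluator setup is omitted here
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()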

Training Logs

Epoch Step Training Loss test_cosine_ndcg@10
0.0194 500 0.8527 0.7377
0.0388 1000 0.2592 0.7422
0.0582 1500 0.2065 0.7625
0.0775 2000 0.1876 0.7671
0.0969 2500 0.1892 0.7511
0.1163 3000 0.1812 0.7567
0.1357 3500 0.1890 0.7485
0.1551 4000 0.1898 0.7425
0.1745 4500 0.1809 0.7477
0.1939 5000 0.1920 0.7497
0.2133 5500 0.1877 0.7347
0.2326 6000 0.1953 0.7270
0.2520 6500 0.1907 0.7327
0.2714 7000 0.1912 0.7154
0.2908 7500 0.1882 0.7113
0.3102 8000 0.1912 0.7404
0.3296 8500 0.1884 0.7199
0.3490 9000 0.1761 0.7309
0.3684 9500 0.1864 0.7406
0.3877 10000 0.1805 0.7239
0.4071 10500 0.1695 0.7383
0.4265 11000 0.1814 0.7413
0.4459 11500 0.1689 0.7406
0.4653 12000 0.1607 0.7475
0.4847 12500 0.1654 0.7337
0.5041 13000 0.1714 0.7442
0.5235 13500 0.1639 0.7409
0.5428 14000 0.1560 0.7311
0.5622 14500 0.1521 0.7238
0.5816 15000 0.1665 0.7395
0.6010 15500 0.1686 0.7399
0.6204 16000 0.1619 0.7496
0.6398 16500 0.1593 0.7337
0.6592 17000 0.1604 0.7567
0.6786 17500 0.1646 0.7464
0.6979 18000 0.1597 0.7482
0.7173 18500 0.1590 0.7491
0.7367 19000 0.1526 0.7201
0.7561 19500 0.1591 0.7542
0.7755 20000 0.1465 0.7456
0.7949 20500 0.1580 0.7556
0.8143 21000 0.1474 0.7511
0.8337 21500 0.1443 0.7564
0.8530 22000 0.1396 0.7580
0.8724 22500 0.1419 0.7555
0.8918 23000 0.1414 0.7615
0.9112 23500 0.1386 0.7542
0.9306 24000 0.1532 0.7540
0.9500 24500 0.1469 0.7664
0.9694 25000 0.1476 0.7549
0.9888 25500 0.1441 0.7629
1.0 25790 - 0.7567
1.0081 26000 0.1210 0.7555
1.0275 26500 0.0976 0.7619
1.0469 27000 0.1011 0.7697
1.0663 27500 0.0989 0.7639
1.0857 28000 0.0917 0.7632
1.1051 28500 0.0971 0.7646
1.1245 29000 0.0958 0.7615
1.1439 29500 0.1000 0.7619
1.1632 30000 0.0932 0.7620
1.1826 30500 0.0966 0.7608
1.2020 31000 0.0922 0.7505
1.2214 31500 0.0903 0.7722
1.2408 32000 0.0997 0.7689
1.2602 32500 0.0818 0.7684
1.2796 33000 0.0926 0.7651
1.2990 33500 0.1002 0.7737
1.3183 34000 0.0893 0.7684
1.3377 34500 0.0945 0.7690
1.3571 35000 0.0855 0.7761
1.3765 35500 0.0918 0.7725
1.3959 36000 0.0982 0.7767
1.4153 36500 0.0854 0.7685
1.4347 37000 0.0883 0.7718
1.4541 37500 0.0921 0.7681
1.4734 38000 0.0912 0.7763
1.4928 38500 0.0908 0.7716
1.5122 39000 0.0891 0.7772
1.5316 39500 0.0912 0.7757
1.5510 40000 0.0811 0.7762
1.5704 40500 0.0833 0.7725
1.5898 41000 0.0830 0.7800
1.6092 41500 0.0874 0.7787
1.6285 42000 0.0890 0.7837
1.6479 42500 0.0822 0.7754
1.6673 43000 0.0805 0.7800
1.6867 43500 0.0799 0.7788
1.7061 44000 0.0898 0.7838
1.7255 44500 0.0816 0.7813
1.7449 45000 0.0817 0.7826
1.7642 45500 0.0779 0.7824
1.7836 46000 0.0846 0.7815
1.8030 46500 0.0804 0.7847
1.8224 47000 0.0773 0.7808
1.8418 47500 0.0767 0.7813
1.8612 48000 0.0898 0.7855
1.8806 48500 0.0857 0.7854
1.9000 49000 0.0837 0.7844
1.9193 49500 0.0834 0.7827
1.9387 50000 0.0753 0.7861
1.9581 50500 0.0880 0.7867
1.9775 51000 0.0847 0.7879

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.3.0
  • Transformers: 5.5.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}