---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:359997
  - loss:MultipleNegativesRankingLoss
base_model: prajjwal1/bert-small
widget:
  - source_sentence: >-
      When do you use Ms. or Mrs.? Is one for a married woman and one for one
      that's not married? Which one is for what?
    sentences:
      - >-
        When do you use Ms. or Mrs.? Is one for a married woman and one for one
        that's not married? Which one is for what?
      - Nations that do/does otherwise? Which one do I use?
      - What is the best way to make money on Quora?
  - source_sentence: >-
      Which ointment is applied to the face of UFC fighters at the commencement
      of a bout? What does it do?
    sentences:
      - Why don't bikes have a gear indicator?
      - >-
        Which ointment is applied to the face of UFC fighters at the
        commencement of a bout? What does it do?
      - How do I get the body of a UFC Fighter?
  - source_sentence: Do you love the life you live?
    sentences:
      - Which file formats are compatible with iTunes?
      - Do you love the life you're living?
      - >-
        What is the best way to find a person just using their phone by trying
        to track the other persons phone and get a location from it?
  - source_sentence: >-
      Can I do shoulder and triceps workout on same day? What other combinations
      like this can I do?
    sentences:
      - >-
        Can I do shoulder and triceps workout on same day? I can What other
        combinations like thisdo?
      - How can I save a Snapchat video that others posted?
      - >-
        Can I do shoulder and triceps workout on same day? What other
        combinations like this can I do?
  - source_sentence: I am a married woman and I'm in love with married man. what should I do?
    sentences:
      - How can I earn money easily online?
      - >-
        I am not a married woman and I 'm in love with married man . what should
        I do ?
      - I am a married woman and I'm in love with married man. what should I do?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_ndcg@10
  - cosine_mrr@1
  - cosine_mrr@5
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on prajjwal1/bert-small
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: val
          type: val
        metrics:
          - type: cosine_accuracy@1
            value: 0.828025
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9027
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.931025
            name: Cosine Accuracy@5
          - type: cosine_precision@1
            value: 0.828025
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3008999999999999
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.186205
            name: Cosine Precision@5
          - type: cosine_recall@1
            value: 0.828025
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9027
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.931025
            name: Cosine Recall@5
          - type: cosine_ndcg@10
            value: 0.8942284691055087
            name: Cosine Ndcg@10
          - type: cosine_mrr@1
            value: 0.828025
            name: Cosine Mrr@1
          - type: cosine_mrr@5
            value: 0.8677179166666629
            name: Cosine Mrr@5
          - type: cosine_mrr@10
            value: 0.8721162896825339
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8742240723304836
            name: Cosine Map@100
---

SentenceTransformer based on prajjwal1/bert-small

This is a sentence-transformers model finetuned from prajjwal1/bert-small. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: prajjwal1/bert-small
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
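
The pooling stage takes the hidden state of the first ([CLS]) token as the sentence embedding. Below is a minimal sketch of that computation using plain transformers on the prajjwal1/bert-small backbone; it illustrates the mechanics only (the fine-tuned weights live in redis/model-b-structured), and loading through SentenceTransformer as shown under Usage remains the supported path.

import torch
from transformers import AutoModel, AutoTokenizer

# Base checkpoint, used here purely to illustrate the pooling mechanics.
tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-small")
bert = AutoModel.from_pretrained("prajjwal1/bert-small")

inputs = tokenizer(
    ["Do you love the life you live?"],
    padding=True, truncation=True, max_length=128,  # matches max_seq_length above
    return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = bert(**inputs).last_hidden_state  # (batch, seq_len, 512)

# pooling_mode_cls_token=True: keep only the first ([CLS]) token's hidden
# state, yielding one 512-dimensional vector per input sentence.
sentence_embedding = token_embeddings[:, 0]
print(sentence_embedding.shape)  # torch.Size([1, 512])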

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/model-b-structured")
# Run inference
sentences = [
    "I am a married woman and I'm in love with married man. what should I do?",
    "I am a married woman and I'm in love with married man. what should I do?",
    "I am not a married woman and I 'm in love with married man . what should I do ?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 512)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 0.4050],
#         [1.0000, 1.0000, 0.4050],
#         [0.4050, 0.4050, 1.0000]])
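
The same embeddings support retrieval-style use cases such as semantic search and paraphrase mining. Below is a short sketch using sentence_transformers.util.semantic_search; the corpus and query are hypothetical, made up for illustration.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("redis/model-b-structured")

# Hypothetical corpus, for illustration only.
corpus = [
    "How can I earn money easily online?",
    "Which file formats are compatible with iTunes?",
    "Why don't bikes have a gear indicator?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode(
    "What is an easy way to make money on the internet?", convert_to_tensor=True
)

# Retrieve the top-2 corpus entries by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))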

Evaluation

Metrics

Information Retrieval

| Metric             | Value  |
|:-------------------|-------:|
| cosine_accuracy@1  | 0.828  |
| cosine_accuracy@3  | 0.9027 |
| cosine_accuracy@5  | 0.931  |
| cosine_precision@1 | 0.828  |
| cosine_precision@3 | 0.3009 |
| cosine_precision@5 | 0.1862 |
| cosine_recall@1    | 0.828  |
| cosine_recall@3    | 0.9027 |
| cosine_recall@5    | 0.931  |
| cosine_ndcg@10     | 0.8942 |
| cosine_mrr@1       | 0.828  |
| cosine_mrr@5       | 0.8677 |
| cosine_mrr@10      | 0.8721 |
| cosine_map@100     | 0.8742 |
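
These figures come from an information-retrieval evaluation on the held-out split. A sketch of how comparable numbers can be computed with InformationRetrievalEvaluator follows; the queries, corpus, and relevance judgments below are toy placeholders, since the card's values were measured on the 40,000-sample evaluation split described under Training Details.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("redis/model-b-structured")

# Hypothetical toy data, for illustration only.
queries = {"q1": "How can I earn money easily online?"}
corpus = {
    "d1": "What is the best way to make money on Quora?",
    "d2": "Why don't bikes have a gear indicator?",
}
relevant_docs = {"q1": {"d1"}}  # query id -> ids of relevant corpus entries

evaluator = InformationRetrievalEvaluator(
    queries=queries, corpus=corpus, relevant_docs=relevant_docs, name="val"
)
results = evaluator(model)
print(results)  # dict with keys such as "val_cosine_ndcg@10"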

Training Details

Training Dataset

Unnamed Dataset

  • Size: 359,997 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | positive | negative |
    |:--------|:-------|:---------|:---------|
    | type    | string | string | string |
    | details | min: 4 tokens, mean: 15.46 tokens, max: 49 tokens | min: 4 tokens, mean: 15.52 tokens, max: 49 tokens | min: 4 tokens, mean: 16.63 tokens, max: 59 tokens |

  • Samples:

    | anchor | positive | negative |
    |:-------|:---------|:---------|
    | Shall I upgrade my iPhone 5s to iOS 10 final version? | Should I upgrade an iPhone 5s to iOS 10? | Shall my iPhone 5s upgrade Ito iOS 10 final version? |
    | Is Donald Trump really going to be the president of United States? | Do you think Donald Trump could conceivably be the next President of the United States? | Is Donald Trump really going not to be the president of United States ? |
    | What are real tips to improve work life balance? | What are the best ways to create a work life balance? | How far is Miami from Fort Lauderdale? |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
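
These parameters map directly onto the loss constructor. A minimal sketch of wiring the loss to a model with the standard sentence-transformers API:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("prajjwal1/bert-small")

# scale=20.0 and cosine similarity match the parameters above; besides the
# explicit negative column, every other in-batch example acts as a negative.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)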
    

Evaluation Dataset

Unnamed Dataset

  • Size: 40,000 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | positive | negative |
    |:--------|:-------|:---------|:---------|
    | type    | string | string | string |
    | details | min: 6 tokens, mean: 15.71 tokens, max: 65 tokens | min: 6 tokens, mean: 15.79 tokens, max: 65 tokens | min: 5 tokens, mean: 16.59 tokens, max: 77 tokens |

  • Samples:

    | anchor | positive | negative |
    |:-------|:---------|:---------|
    | Why were feathered dinosaur fossils only found in the last 20 years? | Why were feathered dinosaur fossils only found in the last 20 years? | Why are only few people aware that many dinosaurs had feathers? |
    | If FOX News is the conservative news station, which cable news network is for liberals/progressives? | If FOX News is the conservative news station, which cable news network is for liberals/progressives? | How much did Fox News and conservative leaning media networks stoke the anger that contributed to Donald Trump's popularity? |
    | How can guys last longer during sex? | How do I last longer in sex? | Why does economics require calculus? |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • max_steps: 14060
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_drop_last: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 1
  • load_best_model_at_end: True
  • optim: adamw_torch
  • ddp_find_unused_parameters: False
  • push_to_hub: True
  • hub_model_id: redis/model-b-structured
  • eval_on_start: True
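
For readers reproducing the run, the values above correspond to the following SentenceTransformerTrainingArguments. This is a hedged sketch: the output directory is hypothetical, and omitted arguments keep their defaults (listed under "All Hyperparameters" below).

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/model-b-structured",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=2e-5,
    weight_decay=0.001,
    max_steps=14060,
    warmup_ratio=0.1,
    fp16=True,
    dataloader_drop_last=True,
    dataloader_num_workers=1,
    dataloader_prefetch_factor=1,
    load_best_model_at_end=True,
    optim="adamw_torch",
    ddp_find_unused_parameters=False,
    push_to_hub=True,
    hub_model_id="redis/model-b-structured",
    eval_on_start=True,
)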

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 14060
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 1
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: redis/model-b-structured
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Click to expand
Epoch Step Training Loss Validation Loss val_cosine_ndcg@10
0 0 - 1.7418 0.7821
0.0711 100 2.0777 0.7932 0.8130
0.1422 200 0.7966 0.4005 0.8510
0.2134 300 0.3991 0.2603 0.8615
0.2845 400 0.3153 0.2051 0.8652
0.3556 500 0.2593 0.1740 0.8681
0.4267 600 0.2231 0.1568 0.8707
0.4979 700 0.2017 0.1443 0.8727
0.5690 800 0.1933 0.1322 0.8746
0.6401 900 0.1818 0.1217 0.8755
0.7112 1000 0.1714 0.1141 0.8769
0.7824 1100 0.157 0.1060 0.8780
0.8535 1200 0.1467 0.0998 0.8788
0.9246 1300 0.1394 0.0937 0.8805
0.9957 1400 0.1343 0.0910 0.8813
1.0669 1500 0.1222 0.0853 0.8822
1.1380 1600 0.1173 0.0820 0.8821
1.2091 1700 0.1082 0.0797 0.8828
1.2802 1800 0.1105 0.0777 0.8835
1.3514 1900 0.1093 0.0734 0.8833
1.4225 2000 0.1034 0.0744 0.8840
1.4936 2100 0.1016 0.0713 0.8845
1.5647 2200 0.0995 0.0699 0.8851
1.6358 2300 0.0994 0.0679 0.8849
1.7070 2400 0.1024 0.0667 0.8867
1.7781 2500 0.0911 0.0658 0.8868
1.8492 2600 0.0907 0.0640 0.8861
1.9203 2700 0.0941 0.0632 0.8859
1.9915 2800 0.093 0.0625 0.8870
2.0626 2900 0.0814 0.0618 0.8875
2.1337 3000 0.0811 0.0609 0.8868
2.2048 3100 0.0773 0.0602 0.8880
2.2760 3200 0.0813 0.0590 0.8873
2.3471 3300 0.0806 0.0584 0.8876
2.4182 3400 0.0765 0.0575 0.8882
2.4893 3500 0.0774 0.0581 0.8889
2.5605 3600 0.0761 0.0560 0.8883
2.6316 3700 0.0735 0.0560 0.8886
2.7027 3800 0.0711 0.0555 0.8891
2.7738 3900 0.0747 0.0551 0.8889
2.8450 4000 0.0731 0.0552 0.8897
2.9161 4100 0.0708 0.0543 0.8898
2.9872 4200 0.0778 0.0536 0.8901
3.0583 4300 0.0697 0.0540 0.8893
3.1294 4400 0.0668 0.0533 0.8900
3.2006 4500 0.0679 0.0526 0.8893
3.2717 4600 0.0652 0.0532 0.8902
3.3428 4700 0.0673 0.0520 0.8899
3.4139 4800 0.0625 0.0514 0.8903
3.4851 4900 0.0669 0.0515 0.8912
3.5562 5000 0.0641 0.0515 0.8915
3.6273 5100 0.0637 0.0509 0.8909
3.6984 5200 0.0635 0.0506 0.8908
3.7696 5300 0.0606 0.0499 0.8915
3.8407 5400 0.0633 0.0503 0.8917
3.9118 5500 0.0656 0.0498 0.8913
3.9829 5600 0.0658 0.0492 0.8916
4.0541 5700 0.0606 0.0489 0.8917
4.1252 5800 0.0585 0.0485 0.8914
4.1963 5900 0.0613 0.0490 0.8914
4.2674 6000 0.0568 0.0487 0.8909
4.3385 6100 0.0576 0.0481 0.8918
4.4097 6200 0.0603 0.0481 0.8915
4.4808 6300 0.0569 0.0480 0.8918
4.5519 6400 0.0553 0.0477 0.8921
4.6230 6500 0.057 0.0472 0.8918
4.6942 6600 0.0602 0.0472 0.8925
4.7653 6700 0.0541 0.0468 0.8922
4.8364 6800 0.0588 0.0468 0.8917
4.9075 6900 0.0588 0.0471 0.8920
4.9787 7000 0.0549 0.0469 0.8921
5.0498 7100 0.0522 0.0466 0.8920
5.1209 7200 0.0527 0.0462 0.8924
5.1920 7300 0.0519 0.0461 0.8924
5.2632 7400 0.0544 0.0459 0.8927
5.3343 7500 0.0549 0.0456 0.8925
5.4054 7600 0.0527 0.0460 0.8932
5.4765 7700 0.0519 0.0453 0.8920
5.5477 7800 0.0528 0.0455 0.8928
5.6188 7900 0.0525 0.0451 0.8929
5.6899 8000 0.0535 0.0454 0.8931
5.7610 8100 0.0526 0.0452 0.8931
5.8321 8200 0.0507 0.0454 0.8930
5.9033 8300 0.0511 0.0451 0.8932
5.9744 8400 0.0489 0.0451 0.8930
6.0455 8500 0.0509 0.0451 0.8929
6.1166 8600 0.0487 0.0447 0.8931
6.1878 8700 0.0494 0.0449 0.8932
6.2589 8800 0.0474 0.0444 0.8932
6.3300 8900 0.049 0.0448 0.8934
6.4011 9000 0.0492 0.0446 0.8934
6.4723 9100 0.0493 0.0443 0.8931
6.5434 9200 0.0517 0.0442 0.8931
6.6145 9300 0.0502 0.0445 0.8938
6.6856 9400 0.0501 0.0441 0.8935
6.7568 9500 0.0484 0.0439 0.8935
6.8279 9600 0.0472 0.0437 0.8935
6.8990 9700 0.0484 0.0435 0.8936
6.9701 9800 0.051 0.0433 0.8933
7.0413 9900 0.0496 0.0435 0.8935
7.1124 10000 0.0469 0.0434 0.8937
7.1835 10100 0.0479 0.0432 0.8935
7.2546 10200 0.0476 0.0430 0.8937
7.3257 10300 0.0454 0.0431 0.8934
7.3969 10400 0.0445 0.0430 0.8937
7.4680 10500 0.0471 0.0427 0.8936
7.5391 10600 0.0441 0.0429 0.8938
7.6102 10700 0.046 0.0429 0.8932
7.6814 10800 0.046 0.0428 0.8934
7.7525 10900 0.049 0.0428 0.8938
7.8236 11000 0.0476 0.0427 0.8939
7.8947 11100 0.0468 0.0425 0.8938
7.9659 11200 0.0465 0.0426 0.8940
8.0370 11300 0.048 0.0428 0.8938
8.1081 11400 0.0448 0.0425 0.8937
8.1792 11500 0.0431 0.0424 0.8939
8.2504 11600 0.0428 0.0424 0.8935
8.3215 11700 0.046 0.0424 0.8937
8.3926 11800 0.0471 0.0423 0.8938
8.4637 11900 0.0466 0.0424 0.8943
8.5349 12000 0.0431 0.0421 0.8941
8.6060 12100 0.0462 0.0421 0.8938
8.6771 12200 0.0425 0.0423 0.8941
8.7482 12300 0.0455 0.0421 0.8941
8.8193 12400 0.0445 0.0422 0.8940
8.8905 12500 0.0455 0.0422 0.8943
8.9616 12600 0.0448 0.0421 0.8941
9.0327 12700 0.0462 0.0421 0.8940
9.1038 12800 0.0429 0.0421 0.8939
9.1750 12900 0.0452 0.0421 0.8942
9.2461 13000 0.0439 0.0420 0.8943
9.3172 13100 0.0472 0.0420 0.8942
9.3883 13200 0.0447 0.0420 0.8943
9.4595 13300 0.0426 0.0420 0.8942
9.5306 13400 0.0445 0.0420 0.8942
9.6017 13500 0.0436 0.0419 0.8942
9.6728 13600 0.0445 0.0419 0.8943
9.7440 13700 0.0477 0.0419 0.8943
9.8151 13800 0.0439 0.0419 0.8942
9.8862 13900 0.0438 0.0419 0.8942
9.9573 14000 0.0468 0.0419 0.8942

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}