metadata
language:
- en
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:557850
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: google-bert/bert-large-uncased
widget:
- source_sentence: A man is jumping unto his filthy bed.
sentences:
- A young male is looking at a newspaper while 2 females walks past him.
- The bed is dirty.
- The man is on the moon.
- source_sentence: >-
A carefully balanced male stands on one foot near a clean ocean beach
area.
sentences:
- A man is ouside near the beach.
- Three policemen patrol the streets on bikes
- A man is sitting on his couch.
- source_sentence: The man is wearing a blue shirt.
sentences:
- Near the trashcan the man stood and smoked
- >-
A man in a blue shirt leans on a wall beside a road with a blue van and
red car with water in the background.
- A man in a black shirt is playing a guitar.
- source_sentence: The girls are outdoors.
sentences:
- Two girls riding on an amusement part ride.
- a guy laughs while doing laundry
- >-
Three girls are standing together in a room, one is listening, one is
writing on a wall and the third is talking to them.
- source_sentence: >-
A construction worker peeking out of a manhole while his coworker sits on
the sidewalk smiling.
sentences:
- A worker is looking out of a manhole.
- A man is giving a presentation.
- The workers are both inside the manhole.
datasets:
- sentence-transformers/all-nli
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on google-bert/bert-large-uncased
results:
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts dev
type: sts-dev
metrics:
- type: pearson_cosine
value: 0.7987980294416869
name: Pearson Cosine
- type: spearman_cosine
value: 0.8164680887920585
name: Spearman Cosine
SentenceTransformer based on google-bert/bert-large-uncased
This is a sentence-transformers model finetuned from google-bert/bert-large-uncased on the all-nli dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: google-bert/bert-large-uncased
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: all-nli
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
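The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy inputs are hypothetical, and it assumes padding positions are excluded through an attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input (hypothetical): three token positions, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

In the real model each token vector has 1024 entries, so the pooled sentence embedding is 1024-dimensional as well.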
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'A construction worker peeking out of a manhole while his coworker sits on the sidewalk smiling.',
'A worker is looking out of a manhole.',
'The workers are both inside the manhole.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8028, 0.6435],
# [0.8028, 1.0000, 0.7869],
# [0.6435, 0.7869, 1.0000]])
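The training configuration in this card lists `matryoshka_dims` of `[768]`, which suggests the loss was computed on the first 768 dimensions of each embedding. A reader who wants to work with those dimensions only could truncate and re-normalize the vectors. The sketch below shows that step in plain Python on a toy vector; the helper name `truncate_and_normalize` is hypothetical and is not part of the library.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` entries and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-dimensional vector (hypothetical), truncated to 2 dimensions.
short = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(short)  # [0.6, 0.8]
```

For the real model the call would use `dim=768` on a 1024-dimensional embedding. Sentence Transformers also accepts a `truncate_dim` argument when the model is constructed, which may be more convenient than truncating by hand.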
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-dev
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.7988 |
| spearman_cosine | 0.8165 |
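As a rough illustration of what the two metrics measure: each sentence pair is scored by the cosine similarity of its embeddings, and those scores are then correlated with human similarity labels, linearly (Pearson) or by rank (Spearman). The sketch below is a simplified pure-Python version under that assumption, not the evaluator's actual code; the embeddings and gold scores are made-up toy values.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(ranks(xs), ranks(ys))


# Toy data (hypothetical): three embedding pairs with gold similarity labels.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 3.0, 0.0]
predicted = [cosine(a, b) for a, b in pairs]
print("pearson_cosine:", pearson(predicted, gold))
print("spearman_cosine:", spearman(predicted, gold))
```

Because Spearman only looks at the ordering of the scores, it reaches its maximum whenever the model ranks the pairs in the same order as the labels, even if the raw cosine values are not linearly related to them.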
Training Details
Training Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 557,850 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| min | 7 tokens | 6 tokens | 5 tokens |
| mean | 10.46 tokens | 12.81 tokens | 13.4 tokens |
| max | 46 tokens | 40 tokens | 50 tokens |

- Samples:

| anchor | positive | negative |
|---|---|---|
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette. |
| Children smiling and waving at camera | There are children present | The kids are frowning |
| A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk. |

- Loss: MatryoshkaLoss with these parameters: { "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [768], "matryoshka_weights": [1], "n_dims_per_step": -1 }
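To make the loss configuration more concrete, here is a minimal pure-Python sketch of how a Matryoshka wrapper around an in-batch ranking loss can be computed: embeddings are truncated to each listed dimension, the ranking loss is evaluated on the truncated vectors, and the weighted results are summed. It is a simplified illustration, not the library's code. The function names are hypothetical, and the cosine scoring with a scale of 20 is an assumption that this card does not state.

```python
import math


def cosine(a, b):
    # Assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranking_loss(anchors, positives, negatives, scale=20.0):
    """In-batch ranking loss: each anchor must pick out its own positive."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Cross-entropy with the matching positive (index i) as the target,
        # using the log-sum-exp trick for numerical stability.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def truncate(batch, dim):
    return [vector[:dim] for vector in batch]


def matryoshka_loss(anchors, positives, negatives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * ranking_loss(
            truncate(anchors, dim), truncate(positives, dim), truncate(negatives, dim)
        )
    return loss


# Toy batch (hypothetical): 4-dimensional vectors whose first two entries
# carry the signal, while the last two entries are noise.
anchors = [[1.0, 0.0, 9.0, 9.0], [0.0, 1.0, 9.0, 9.0]]
positives = [[1.0, 0.0, -9.0, 9.0], [0.0, 1.0, 9.0, -9.0]]
negatives = [[-1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]

loss_truncated = matryoshka_loss(anchors, positives, negatives, dims=(2,), weights=(1.0,))
loss_full = matryoshka_loss(anchors, positives, negatives, dims=(4,), weights=(1.0,))
print(loss_truncated, loss_full)
```

In this toy batch the loss on the first two dimensions is close to zero while the loss on all four dimensions is much larger, which shows that the value depends only on the dimensions listed. With the card's settings there is a single dimension (768) with weight 1, so the sum has one term.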
Evaluation Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 6,584 evaluation samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| min | 6 tokens | 4 tokens | 5 tokens |
| mean | 17.95 tokens | 9.78 tokens | 10.35 tokens |
| max | 63 tokens | 29 tokens | 29 tokens |

- Samples:

| anchor | positive | negative |
|---|---|---|
| Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli. |
| Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school. |
| A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe. |

- Loss: MatryoshkaLoss with these parameters: { "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [768], "matryoshka_weights": [1], "n_dims_per_step": -1 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 15
- warmup_ratio: 0.1
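The settings above determine the step counts and the shape of the learning-rate curve. The sketch below works through that arithmetic for a linear schedule with warmup, using the dataset size and the `learning_rate` of 5e-05 reported elsewhere in this card. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The function `learning_rate` is a hypothetical helper, not a library call.

```python
import math

DATASET_SIZE = 557_850   # training samples reported in this card
BATCH_SIZE = 32          # per_device_train_batch_size
EPOCHS = 15              # num_train_epochs
WARMUP_RATIO = 0.1       # warmup_ratio
BASE_LR = 5e-05          # learning_rate from the full hyperparameter list

# Assumption: one device, gradient_accumulation_steps of 1, partial batch kept.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 17433 261495 26150
print(round(500 / steps_per_epoch, 4))             # 0.0287
```

The second printed value agrees with the epoch fraction logged at step 500 in the training logs, which supports the single-device assumption.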
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 15
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.5941 |
| 0.0287 | 500 | 1.9263 | 0.7269 | 0.8006 |
| 0.0574 | 1000 | 0.8808 | 0.4899 | 0.8306 |
| 0.0860 | 1500 | 0.6811 | 0.3757 | 0.8432 |
| 0.1147 | 2000 | 0.5842 | 0.3250 | 0.8448 |
| 0.1434 | 2500 | 0.5269 | 0.3007 | 0.8472 |
| 0.1721 | 3000 | 0.4937 | 0.2855 | 0.8541 |
| 0.2008 | 3500 | 0.4717 | 0.2636 | 0.8510 |
| 0.2294 | 4000 | 0.4398 | 0.2596 | 0.8509 |
| 0.2581 | 4500 | 0.43 | 0.2507 | 0.8575 |
| 0.2868 | 5000 | 0.4094 | 0.2419 | 0.8566 |
| 0.3155 | 5500 | 0.3927 | 0.2349 | 0.8595 |
| 0.3442 | 6000 | 0.3904 | 0.2356 | 0.8568 |
| 0.3729 | 6500 | 0.3844 | 0.2275 | 0.8510 |
| 0.4015 | 7000 | 0.377 | 0.2220 | 0.8560 |
| 0.4302 | 7500 | 0.363 | 0.2235 | 0.8412 |
| 0.4589 | 8000 | 0.3616 | 0.2305 | 0.8531 |
| 0.4876 | 8500 | 0.3733 | 0.2306 | 0.8457 |
| 0.5163 | 9000 | 0.3675 | 0.2290 | 0.8460 |
| 0.5449 | 9500 | 0.358 | 0.2291 | 0.8459 |
| 0.5736 | 10000 | 0.3322 | 0.2218 | 0.8479 |
| 0.6023 | 10500 | 0.3376 | 0.2254 | 0.8339 |
| 0.6310 | 11000 | 0.3308 | 0.2140 | 0.8428 |
| 0.6597 | 11500 | 0.3475 | 0.2382 | 0.8339 |
| 0.6883 | 12000 | 0.3498 | 0.2172 | 0.8325 |
| 0.7170 | 12500 | 0.3266 | 0.2290 | 0.8479 |
| 0.7457 | 13000 | 0.3214 | 0.2297 | 0.8355 |
| 0.7744 | 13500 | 0.3237 | 0.2363 | 0.8325 |
| 0.8031 | 14000 | 0.3108 | 0.2334 | 0.8307 |
| 0.8318 | 14500 | 0.3143 | 0.3627 | 0.7954 |
| 0.8604 | 15000 | 0.3156 | 0.2238 | 0.8378 |
| 0.8891 | 15500 | 0.3204 | 0.2271 | 0.8390 |
| 0.9178 | 16000 | 0.314 | 0.2332 | 0.8349 |
| 0.9465 | 16500 | 0.3074 | 0.2277 | 0.8324 |
| 0.9752 | 17000 | 0.2937 | 0.2326 | 0.8274 |
| 1.0038 | 17500 | 0.2919 | 0.2350 | 0.8288 |
| 1.0325 | 18000 | 0.2483 | 0.2381 | 0.8367 |
| 1.0612 | 18500 | 0.2534 | 0.2397 | 0.8227 |
| 1.0899 | 19000 | 0.2699 | 0.2495 | 0.8221 |
| 1.1186 | 19500 | 0.2691 | 0.2468 | 0.8193 |
| 1.1472 | 20000 | 0.2843 | 0.2462 | 0.8346 |
| 1.1759 | 20500 | 0.2736 | 0.2387 | 0.8321 |
| 1.2046 | 21000 | 0.2728 | 0.2415 | 0.8364 |
| 1.2333 | 21500 | 0.2769 | 0.2483 | 0.8301 |
| 1.2620 | 22000 | 0.2633 | 0.2582 | 0.8340 |
| 1.2907 | 22500 | 0.2719 | 0.2484 | 0.8295 |
| 1.3193 | 23000 | 0.2787 | 0.2606 | 0.8297 |
| 1.3480 | 23500 | 0.2812 | 0.2595 | 0.8290 |
| 1.3767 | 24000 | 0.2868 | 0.2659 | 0.8208 |
| 1.4054 | 24500 | 0.2776 | 0.2520 | 0.8369 |
| 1.4341 | 25000 | 0.2772 | 0.2759 | 0.8307 |
| 1.4627 | 25500 | 0.2887 | 0.2735 | 0.8198 |
| 1.4914 | 26000 | 0.2892 | 0.2787 | 0.8367 |
| 1.5201 | 26500 | 0.2779 | 0.2612 | 0.8173 |
| 1.5488 | 27000 | 0.2791 | 0.2593 | 0.8230 |
| 1.5775 | 27500 | 0.2939 | 0.2678 | 0.8256 |
| 1.6061 | 28000 | 0.2808 | 0.2729 | 0.8241 |
| 1.6348 | 28500 | 0.2913 | 0.2700 | 0.8163 |
| 1.6635 | 29000 | 0.2919 | 0.2855 | 0.8315 |
| 1.6922 | 29500 | 0.284 | 0.2684 | 0.8338 |
| 1.7209 | 30000 | 0.2867 | 0.2703 | 0.8254 |
| 1.7496 | 30500 | 0.2781 | 0.2738 | 0.8186 |
| 1.7782 | 31000 | 0.2806 | 0.2621 | 0.8170 |
| 1.8069 | 31500 | 0.2859 | 0.2727 | 0.8197 |
| 1.8356 | 32000 | 0.2732 | 0.2716 | 0.8238 |
| 1.8643 | 32500 | 0.2797 | 0.2728 | 0.8178 |
| 1.8930 | 33000 | 0.2701 | 0.2715 | 0.8219 |
| 1.9216 | 33500 | 0.265 | 0.2638 | 0.8250 |
| 1.9503 | 34000 | 0.275 | 0.2660 | 0.8188 |
| 1.9790 | 34500 | 0.2684 | 0.2765 | 0.8112 |
| 2.0077 | 35000 | 0.2607 | 0.2648 | 0.8151 |
| 2.0364 | 35500 | 0.197 | 0.2673 | 0.8123 |
| 2.0650 | 36000 | 0.2075 | 0.2706 | 0.8129 |
| 2.0937 | 36500 | 0.2111 | 0.2647 | 0.8263 |
| 2.1224 | 37000 | 0.2202 | 0.2736 | 0.8133 |
| 2.1511 | 37500 | 0.2135 | 0.2640 | 0.8118 |
| 2.1798 | 38000 | 0.2229 | 0.2667 | 0.8166 |
| 2.2085 | 38500 | 0.209 | 0.2622 | 0.8090 |
| 2.2371 | 39000 | 0.2039 | 0.2639 | 0.8104 |
| 2.2658 | 39500 | 0.2113 | 0.2827 | 0.8235 |
| 2.2945 | 40000 | 0.2065 | 0.2698 | 0.8151 |
| 2.3232 | 40500 | 0.21 | 0.2593 | 0.8155 |
| 2.3519 | 41000 | 0.2083 | 0.2733 | 0.7975 |
| 2.3805 | 41500 | 0.231 | 0.2822 | 0.8088 |
| 2.4092 | 42000 | 0.2109 | 0.2667 | 0.8180 |
| 2.4379 | 42500 | 0.2006 | 0.2791 | 0.8071 |
| 2.4666 | 43000 | 0.2131 | 0.2747 | 0.8230 |
| 2.4953 | 43500 | 0.2101 | 0.2674 | 0.8165 |
Framework Versions
- Python: 3.13.0
- Sentence Transformers: 5.1.2
- Transformers: 4.57.1
- PyTorch: 2.9.1+cu128
- Accelerate: 1.11.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}