SentenceTransformer based on google/bert_uncased_L-2_H-128_A-2

This is a sentence-transformers model finetuned from google/bert_uncased_L-2_H-128_A-2 on the generator dataset. It maps sentences & paragraphs to a 128-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/bert_uncased_L-2_H-128_A-2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
  • Training Dataset:
    • generator

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 128, 'pooling_mode': 'mean', 'include_prompt': True})
)
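The Pooling module is configured with pooling_mode 'mean', which turns the per-token vectors from the Transformer module into a single sentence vector. The sketch below assumes the usual masked mean: the sum of the token vectors whose attention-mask entry is 1, divided by the number of such tokens. It uses hypothetical toy 4-dimensional vectors instead of real 128-dimensional model outputs, and `mean_pool` is an illustrative helper, not part of the Sentence Transformers API.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padded positions (mask 0) are ignored."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry per token is required")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    # Component-wise mean over the unmasked tokens only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 3-token sequence with 4-dim vectors; the last token is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 2.0, 2.0, 2.0]
```

The mask matters: averaging all three rows, padding included, would pull every component up to about 4.33.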

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("swardiantara/bert-tiny-sst5-full-fixed-cosine")
# Run inference
sentences = [
    'a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films',
    "... feels as if -lrb- there 's -rrb- a choke leash around your neck so director nick cassavetes can give it a good , hard yank whenever he wants you to feel something .",
    "what with the incessant lounge music playing in the film 's background , you may mistake love liza for an adam sandler chanukah song .",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 128)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2197, 0.2653],
#         [0.2197, 1.0000, 0.2309],
#         [0.2653, 0.2309, 1.0000]])
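The card lists cosine similarity as the similarity function, so the matrix printed above should contain the pairwise cosine scores of the three embeddings. This is why its diagonal is 1.0 and it is symmetric. A dependency-free sketch of that computation follows. It uses hypothetical toy vectors in place of real embeddings, and `cosine_similarity_matrix` is an illustrative name, not a library function.

```python
import math
from typing import List


def cosine_similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Return the cosine similarity of every embedding with every other one."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    if any(n == 0.0 for n in norms):
        raise ValueError("cosine similarity is undefined for zero vectors")
    # Dot product divided by the product of the two vector lengths.
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


# Toy 2-dimensional vectors standing in for real 128-dimensional embeddings.
vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
scores = cosine_similarity_matrix(vectors)
print([[round(s, 4) for s in row] for row in scores])
# [[1.0, 0.0, 0.7071], [0.0, 1.0, 0.7071], [0.7071, 0.7071, 1.0]]
```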

Training Details

Training Dataset

generator

  • Dataset: generator
  • Size: 36,495,696 training samples
  • Columns: text_a, text_b, and label
  • Approximate statistics based on the first 100 samples:
    • text_a: string (text); min: 21 tokens, mean: 21.0 tokens, max: 21 tokens
    • text_b: string (text); min: 5 tokens, mean: 24.62 tokens, max: 57 tokens
    • label: list; size: 2 elements
  • Samples:
    • text_a: a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films
      text_b: apparently reassembled from the cutting-room floor of any given daytime soap .
      label: [0.0, 0.75]
    • text_a: a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films
      text_b: they presume their audience wo n't sit still for a sociology lesson , however entertainingly presented , so they trot out the conventional science-fiction elements of bug-eyed monsters and futuristic women in skimpy clothes .
      label: [0.0, 0.75]
    • text_a: a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films
      text_b: the entire movie is filled with deja vu moments .
      label: [0.0, 0.5]
  • Loss: main.OrdinalProxyContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1024
  • num_train_epochs: 10
  • learning_rate: 2e-05
  • load_best_model_at_end: True
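With per_device_train_batch_size 1024 and the 36,495,696 training samples listed above, the number of optimizer steps per epoch follows from simple arithmetic. This sketch assumes a single device, gradient_accumulation_steps 1 and dataloader_drop_last False (both listed under All Hyperparameters), so the final partial batch still counts as a step. `steps_per_epoch` is a hypothetical helper. The result matches the epoch-boundary rows in the training log.

```python
def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch; a trailing partial batch counts unless drop_last is set."""
    full, remainder = divmod(num_samples, batch_size)
    return full + (1 if remainder and not drop_last else 0)


NUM_SAMPLES = 36_495_696  # training samples in the generator dataset
BATCH_SIZE = 1024         # per_device_train_batch_size (assumes one device)
NUM_EPOCHS = 10           # num_train_epochs

per_epoch = steps_per_epoch(NUM_SAMPLES, BATCH_SIZE)
print(per_epoch)               # 35641 (35,640 full batches plus one batch of 336)
print(per_epoch * NUM_EPOCHS)  # 356410 steps planned for 10 epochs
print(per_epoch * 3)           # 106923, the last step in the log (epoch 3.0)
```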

All Hyperparameters

  • per_device_train_batch_size: 1024
  • num_train_epochs: 10
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: None
  • trackio_bucket_id: None
  • trackio_static_space_id: None
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_static_graph: None
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: None
  • fsdp_config: None
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
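With lr_scheduler_type linear and warmup_steps 0, the learning rate starts at 2e-05 and decays linearly toward zero over the scheduled training length. The sketch below mirrors the common linear-warmup-then-linear-decay formula. It assumes the schedule was sized for all 10 epochs, i.e. 10 × 35,641 = 356,410 steps, where 35,641 is the step at which the training log records epoch 1.0. `linear_lr` is a hypothetical helper, not the library's scheduler object.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 356_410,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase (never entered here, since warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps, then stay at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 35_641, 106_923, 356_410):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions, the rate would be 1.8e-05 after the first epoch and 1.4e-05 at step 106,923, where the log ends.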

Training Logs

Epoch Step Training Loss
0.0140 500 0.0465
0.0281 1000 0.0441
0.0421 1500 0.0425
0.0561 2000 0.0409
0.0701 2500 0.0389
0.0842 3000 0.0367
0.0982 3500 0.0345
0.1122 4000 0.0327
0.1263 4500 0.0307
0.1403 5000 0.0281
0.1543 5500 0.0242
0.1683 6000 0.0199
0.1824 6500 0.0162
0.1964 7000 0.0131
0.2104 7500 0.0106
0.2245 8000 0.0084
0.2385 8500 0.0067
0.2525 9000 0.0053
0.2665 9500 0.0042
0.2806 10000 0.0034
0.2946 10500 0.0028
0.3086 11000 0.0023
0.3227 11500 0.0019
0.3367 12000 0.0017
0.3507 12500 0.0014
0.3647 13000 0.0012
0.3788 13500 0.0011
0.3928 14000 0.0010
0.4068 14500 0.0008
0.4209 15000 0.0007
0.4349 15500 0.0006
0.4489 16000 0.0006
0.4629 16500 0.0005
0.4770 17000 0.0005
0.4910 17500 0.0004
0.5050 18000 0.0004
0.5191 18500 0.0004
0.5331 19000 0.0003
0.5471 19500 0.0003
0.5612 20000 0.0003
0.5752 20500 0.0003
0.5892 21000 0.0002
0.6032 21500 0.0002
0.6173 22000 0.0002
0.6313 22500 0.0002
0.6453 23000 0.0002
0.6594 23500 0.0002
0.6734 24000 0.0002
0.6874 24500 0.0001
0.7014 25000 0.0001
0.7155 25500 0.0001
0.7295 26000 0.0001
0.7435 26500 0.0001
0.7576 27000 0.0001
0.7716 27500 0.0001
0.7856 28000 0.0001
0.7996 28500 0.0001
0.8137 29000 0.0001
0.8277 29500 0.0001
0.8417 30000 0.0001
0.8558 30500 0.0001
0.8698 31000 0.0001
0.8838 31500 0.0001
0.8978 32000 0.0001
0.9119 32500 0.0001
0.9259 33000 0.0001
0.9399 33500 0.0001
0.9540 34000 0.0001
0.9680 34500 0.0001
0.9820 35000 0.0001
0.9960 35500 0.0000
1.0 35641 -
1.0101 36000 0.0000
1.0241 36500 0.0000
1.0381 37000 0.0000
1.0522 37500 0.0000
1.0662 38000 0.0000
1.0802 38500 0.0000
1.0942 39000 0.0000
1.1083 39500 0.0000
1.1223 40000 0.0000
1.1363 40500 0.0000
1.1504 41000 0.0000
1.1644 41500 0.0000
1.1784 42000 0.0000
1.1924 42500 0.0000
1.2065 43000 0.0000
1.2205 43500 0.0000
1.2345 44000 0.0000
1.2486 44500 0.0000
1.2626 45000 0.0000
1.2766 45500 0.0000
1.2906 46000 0.0000
1.3047 46500 0.0000
1.3187 47000 0.0000
1.3327 47500 0.0000
1.3468 48000 0.0000
1.3608 48500 0.0000
1.3748 49000 0.0000
1.3888 49500 0.0000
1.4029 50000 0.0000
1.4169 50500 0.0000
1.4309 51000 0.0000
1.4450 51500 0.0000
1.4590 52000 0.0000
1.4730 52500 0.0000
1.4871 53000 0.0000
1.5011 53500 0.0000
1.5151 54000 0.0000
1.5291 54500 0.0000
1.5432 55000 0.0000
1.5572 55500 0.0000
1.5712 56000 0.0000
1.5853 56500 0.0000
1.5993 57000 0.0000
1.6133 57500 0.0000
1.6273 58000 0.0000
1.6414 58500 0.0000
1.6554 59000 0.0000
1.6694 59500 0.0000
1.6835 60000 0.0000
1.6975 60500 0.0000
1.7115 61000 0.0000
1.7255 61500 0.0000
1.7396 62000 0.0000
1.7536 62500 0.0000
1.7676 63000 0.0000
1.7817 63500 0.0000
1.7957 64000 0.0000
1.8097 64500 0.0000
1.8237 65000 0.0000
1.8378 65500 0.0000
1.8518 66000 0.0000
1.8658 66500 0.0000
1.8799 67000 0.0000
1.8939 67500 0.0000
1.9079 68000 0.0000
1.9219 68500 0.0000
1.9360 69000 0.0000
1.9500 69500 0.0000
1.9640 70000 0.0000
1.9781 70500 0.0000
1.9921 71000 0.0000
2.0 71282 -
2.0061 71500 0.0000
2.0201 72000 0.0000
2.0342 72500 0.0000
2.0482 73000 0.0000
2.0622 73500 0.0000
2.0763 74000 0.0000
2.0903 74500 0.0000
2.1043 75000 0.0000
2.1183 75500 0.0000
2.1324 76000 0.0000
2.1464 76500 0.0000
2.1604 77000 0.0000
2.1745 77500 0.0000
2.1885 78000 0.0000
2.2025 78500 0.0000
2.2165 79000 0.0000
2.2306 79500 0.0000
2.2446 80000 0.0000
2.2586 80500 0.0000
2.2727 81000 0.0000
2.2867 81500 0.0000
2.3007 82000 0.0000
2.3147 82500 0.0000
2.3288 83000 0.0000
2.3428 83500 0.0000
2.3568 84000 0.0000
2.3709 84500 0.0000
2.3849 85000 0.0000
2.3989 85500 0.0000
2.4130 86000 0.0000
2.4270 86500 0.0000
2.4410 87000 0.0000
2.4550 87500 0.0000
2.4691 88000 0.0000
2.4831 88500 0.0000
2.4971 89000 0.0000
2.5112 89500 0.0000
2.5252 90000 0.0000
2.5392 90500 0.0000
2.5532 91000 0.0000
2.5673 91500 0.0000
2.5813 92000 0.0000
2.5953 92500 0.0000
2.6094 93000 0.0000
2.6234 93500 0.0000
2.6374 94000 0.0000
2.6514 94500 0.0000
2.6655 95000 0.0000
2.6795 95500 0.0000
2.6935 96000 0.0000
2.7076 96500 0.0000
2.7216 97000 0.0000
2.7356 97500 0.0000
2.7496 98000 0.0000
2.7637 98500 0.0000
2.7777 99000 0.0000
2.7917 99500 0.0000
2.8058 100000 0.0000
2.8198 100500 0.0000
2.8338 101000 0.0000
2.8478 101500 0.0000
2.8619 102000 0.0000
2.8759 102500 0.0000
2.8899 103000 0.0000
2.9040 103500 0.0000
2.9180 104000 0.0000
2.9320 104500 0.0000
2.9460 105000 0.0000
2.9601 105500 0.0000
2.9741 106000 0.0000
2.9881 106500 0.0000
3.0 106923 -
  • The bold row denotes the saved checkpoint.
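The log rows can be checked mechanically. The Epoch column should equal Step divided by 35,641, the step recorded at epoch 1.0. Rows with "-" are epoch-boundary markers that have no loss value. Losses are shown to four decimals, so 0.0000 means a value below 0.00005, not exactly zero. The sketch below is a small parser over an excerpt of the table above. `parse_log` and the constants are hypothetical names.

```python
from typing import List, Optional, Tuple

# A few rows copied from the training log (Epoch, Step, Training Loss).
LOG_EXCERPT = """\
0.0140 500 0.0465
0.9820 35000 0.0001
0.9960 35500 0.0000
1.0 35641 -
1.0101 36000 0.0000
"""

STEPS_PER_EPOCH = 35_641  # step logged at epoch 1.0


def parse_log(text: str) -> List[Tuple[float, int, Optional[float]]]:
    """Turn 'epoch step loss' rows into tuples; a '-' loss becomes None."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, loss = line.split()
        rows.append((float(epoch), int(step), None if loss == "-" else float(loss)))
    return rows


rows = parse_log(LOG_EXCERPT)

# Every Epoch value should match Step / 35,641 up to the log's 4-decimal rounding.
assert all(abs(epoch - step / STEPS_PER_EPOCH) < 5e-5 for epoch, step, _ in rows)

# First logged step whose loss is displayed as 0.0000.
first_zero = next(step for _, step, loss in rows if loss is not None and loss == 0.0)
print(first_zero)  # 35500
```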

Training Time

  • Training: 4.3 hours
  • Evaluation: 2.8 seconds
  • Total: 4.3 hours

Framework Versions

  • Python: 3.12.4
  • Sentence Transformers: 5.5.1
  • Transformers: 5.11.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.13.0
  • Datasets: 2.21.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Safetensors

  • Model size: 4.39M params
  • Tensor type: F32