SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("omkar334/bert-base-uncased-retromae")
# Run inference
sentences = [
    'A kid throwing axes at targets in a competition.',
    'The audit steps in this section should be used to assess the potential risks posed by the lack of management or user support.',
    'This photograph is happy',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5798, 0.5999],
#         [0.5798, 1.0000, 0.5697],
#         [0.5999, 0.5697, 1.0000]])

Evaluation

Metrics

Semantic Similarity

Metric sts-dev sts-test
pearson_cosine 0.2916 0.7346
spearman_cosine 0.3173 0.721

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,950 training samples
  • Columns: text
  • Approximate statistics based on the first 100 samples:
    text
    type string
    modality text
    details
    • min: 4 tokens
    • mean: 17.63 tokens
    • max: 53 tokens
  • Samples:
    text
    oh i see i'm a former uh a former TI'er i just recently quit and so uh i got myself involved in a sales job and right now uh my list of books to be read have to do with uh the art of selling so
    and uh so i had her baby sitting but she was six months pregnant and it was getting too much for her so i just quit i'd rather quit and take care of my own kids than let somebody else raise them
    The PMG pulled out a new $50 bill and mailed to many boys' mothers.
  • Loss: RetroMAELoss with these parameters:
    {
        "encoder_mask_ratio": 0.15,
        "decoder_mask_ratio": 0.5,
        "encoder_mlm_loss_weight": 1.0
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 50 evaluation samples
  • Columns: text
  • Approximate statistics based on the first 50 samples:
    text
    type string
    modality text
    details
    • min: 6 tokens
    • mean: 16.1 tokens
    • max: 43 tokens
  • Samples:
    text
    The audit steps in this section should be used to assess the potential risks posed by the lack of management or user support.
    Rumor has it that the next object of touchy-feely bowdlerization by Disney is Beowulf.
    Here is a prediction.
  • Loss: RetroMAELoss with these parameters:
    {
        "encoder_mask_ratio": 0.15,
        "decoder_mask_ratio": 0.5,
        "encoder_mlm_loss_weight": 1.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss sts-dev_spearman_cosine sts-test_spearman_cosine
-1 -1 - 0.3173 -
0.6452 100 8.114 - -
-1 -1 - - 0.7210

Training Time

  • Training: 2.5 minutes

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.6.0.dev0
  • Transformers: 4.57.6
  • PyTorch: 2.12.0
  • Accelerate: 1.13.0
  • Datasets: 5.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

RetroMAELoss

@inproceedings{xiao-etal-2022-retromae,
    title = "{R}etro{MAE}: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder",
    author = "Xiao, Shitao and Liu, Zheng and Shao, Yingxia and Cao, Zhao",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.35/",
}
Downloads last month
25
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for omkar334/bert-base-uncased-retromae

Finetuned
(6784)
this model

Paper for omkar334/bert-base-uncased-retromae

Evaluation results