SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The training pairs for this checkpoint are SMILES strings of drug-like (statin-related) molecules, so the learned space reflects molecular rather than natural-language similarity.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
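
The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): BERT's token embeddings are averaged over the sequence, with padding positions excluded via the attention mask. A minimal sketch of that operation in plain PyTorch, using dummy tensors in place of real BERT output:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # avoid division by zero
    return summed / counts

# Dummy batch: 2 sequences of 4 tokens, 768-dim; the second has 2 padding tokens.
emb = torch.randn(2, 4, 768)
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # torch.Size([2, 768])
```

The masking step is what distinguishes this from a naive mean: padded positions contribute neither to the sum nor to the token count.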

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cafierom/5Epoch_905_Statin_Contrastive")
# Run inference
sentences = [
    'COc1cc(CN(C)C(=O)c2nn(c(OC[C@@H](O)C[C@@H](O)CC(O)=O)c2C(C)C)-c2ccc(F)cc2)cc(OC)c1',
    'CC(C)c1sc(c(C2CCCC2)c1\\C=C\\[C@@H](O)C[C@@H](O)CC([O-])=O)-c1ccccc1',
    'CC(C)[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)CNC(=O)[C@H](Cc1ccccc1)NC(=O)CN)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(O)=O',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2038, 0.4376],
#         [0.2038, 1.0000, 0.9656],
#         [0.4376, 0.9656, 1.0000]])
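
Since the similarity function is cosine similarity, the matrix printed by model.similarity can also be reproduced directly from the embeddings. A sketch in NumPy (the vectors here are illustrative 2-D dummies, not real 768-dimensional model output):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalise each row, then take all pairwise dot products."""
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normalized @ normalized.T

# Dummy stand-ins for model.encode(...) output.
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 4))
```

As in the tensor above, the result is symmetric with a unit diagonal (every embedding has cosine similarity 1 with itself).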

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 148,695 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min 23 / mean 67.64 / max 145 tokens
    • hypothesis (string): min 23 / mean 65.13 / max 126 tokens
    • label (int): 0: ~52.20%, 2: ~47.80%
  • Samples:
    • premise: Cc1cc(-c2ccc(Cl)cc2)c(\C=C[C@@H]2CC@@HCC(=O)O2)c(C)n1
      hypothesis: CCC@HC(=O)O[C@H]1CC@@HC=C2C=CC@HC@H[C@@H]12
      label: 2
    • premise: CC(C)c1c(OCC@@HCC@@HCC(O)=O)n(nc1C(=O)NCc1ccccc1C)-c1ccc(F)cc1
      hypothesis: CCOC(=O)c1c(C(C)C)n(CC[C@@H]2CC@@HCC(=O)O2)c(c1-c1ccccc1)-c1ccccc1
      label: 0
    • premise: CCC@HC(=O)O[C@H]1CCC=C2C=CC@HC@H[C@@H]12
      hypothesis: CC(c1ccc(F)cc1)c1cc(C)cc(C)c1OCC(O)CC@@HCC([O-])=O
      label: 0
  • Loss: SoftmaxLoss
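
SoftmaxLoss implements the SBERT classification objective: for a (premise, hypothesis) pair, the two pooled embeddings u and v are concatenated with their elementwise difference |u - v| and passed through a trainable linear layer, trained with cross-entropy against the pair label. A minimal sketch of that head (the 3-class setup is an assumption consistent with the 0/2 labels above; the actual implementation is sentence_transformers.losses.SoftmaxLoss):

```python
import torch
import torch.nn as nn

class SoftmaxHead(nn.Module):
    """Pair classifier used by the SBERT softmax objective:
    logits = Linear([u; v; |u - v|]) for pooled sentence embeddings u, v.
    """

    def __init__(self, embedding_dim: int = 768, num_labels: int = 3):
        super().__init__()
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)

# Dummy pooled embeddings for a batch of 2 pairs, with labels 0 and 2.
u, v = torch.randn(2, 768), torch.randn(2, 768)
logits = SoftmaxHead()(u, v)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))
print(logits.shape)  # torch.Size([2, 3])
```

Only the encoder is kept at inference time; the linear head exists solely to provide a training signal.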

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 26,241 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min 23 / mean 67.34 / max 145 tokens
    • hypothesis (string): min 23 / mean 65.76 / max 145 tokens
    • label (int): 0: ~52.90%, 2: ~47.10%
  • Samples:
    • premise: CC(C)n1c(CCC@@HCC@@HCC([O-])=O)c(-c2ccc(F)cc2)c2c1c(=O)n(-c1ccccc1)c1ccccc21
      hypothesis: C[C@@H]1CC(OC(=O)NCCCCCCCCCCC(=O)NC@@HC(C)(C)C)[C@@H]2C@@HC@@HC=CC2=C1
      label: 2
    • premise: CCC@HC(=O)O[C@H]1CC@@HC[C@@H]2C=CC@HC@H[C@@H]12
      hypothesis: CC(C)c1sc(c(C2CCCC2)c1\C=C[C@@H](O)CC@@HCC([O-])=O)-c1ccccc1
      label: 0
    • premise: CC(C)c1c(OCC@@HCC@@HCC(O)=O)n(nc1C(=O)NCc1ccccc1C)-c1ccc(F)cc1
      hypothesis: CCC@HC(=O)O[C@H]1CCC=C2C=CC@HC@HC12
      label: 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • weight_decay: 0.01
  • num_train_epochs: 5
  • warmup_steps: 100
  • fp16: True
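
In the Sentence Transformers v3+ trainer API, the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows (a sketch; the output_dir is an illustrative placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# All values mirror the non-default hyperparameters listed above;
# output_dir is illustrative, not the path actually used.
args = SentenceTransformerTrainingArguments(
    output_dir="output/statin-contrastive",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=0.01,
    num_train_epochs=5,
    warmup_steps=100,
    fp16=True,
)
```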

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0430 100 0.7368
0.0861 200 0.6716
0.1291 300 0.6334
0.1721 400 0.6271
0.2151 500 0.6048
0.2582 600 0.5918
0.3012 700 0.5863
0.3442 800 0.5739
0.3873 900 0.5865
0.4303 1000 0.5679
0.4733 1100 0.5724
0.5164 1200 0.5576
0.5594 1300 0.5719
0.6024 1400 0.5628
0.6454 1500 0.5639
0.6885 1600 0.5533
0.7315 1700 0.5495
0.7745 1800 0.5437
0.8176 1900 0.5487
0.8606 2000 0.544
0.9036 2100 0.5431
0.9466 2200 0.5576
0.9897 2300 0.5434
1.0327 2400 0.5436
1.0757 2500 0.5444
1.1188 2600 0.5385
1.1618 2700 0.5377
1.2048 2800 0.5366
1.2478 2900 0.5354
1.2909 3000 0.5384
1.3339 3100 0.5288
1.3769 3200 0.5271
1.4200 3300 0.5364
1.4630 3400 0.5253
1.5060 3500 0.5261
1.5491 3600 0.5279
1.5921 3700 0.532
1.6351 3800 0.535
1.6781 3900 0.5251
1.7212 4000 0.5252
1.7642 4100 0.5302
1.8072 4200 0.5247
1.8503 4300 0.5261
1.8933 4400 0.5224
1.9363 4500 0.5165
1.9793 4600 0.5333
2.0224 4700 0.52
2.0654 4800 0.5246
2.1084 4900 0.5141
2.1515 5000 0.5201
2.1945 5100 0.5218
2.2375 5200 0.5219
2.2806 5300 0.5144
2.3236 5400 0.5225
2.3666 5500 0.5206
2.4096 5600 0.513
2.4527 5700 0.5212
2.4957 5800 0.5211
2.5387 5900 0.5127
2.5818 6000 0.5041
2.6248 6100 0.5152
2.6678 6200 0.5152
2.7108 6300 0.5138
2.7539 6400 0.507
2.7969 6500 0.5182
2.8399 6600 0.4988
2.8830 6700 0.5078
2.9260 6800 0.5113
2.9690 6900 0.5114
3.0120 7000 0.5171
3.0551 7100 0.5108
3.0981 7200 0.5033
3.1411 7300 0.5065
3.1842 7400 0.5057
3.2272 7500 0.5055
3.2702 7600 0.5129
3.3133 7700 0.5122
3.3563 7800 0.5069
3.3993 7900 0.5011
3.4423 8000 0.5129
3.4854 8100 0.5098
3.5284 8200 0.5022
3.5714 8300 0.5039
3.6145 8400 0.5123
3.6575 8500 0.5105
3.7005 8600 0.5056
3.7435 8700 0.5061
3.7866 8800 0.5004
3.8296 8900 0.5001
3.8726 9000 0.5127
3.9157 9100 0.5062
3.9587 9200 0.501
4.0017 9300 0.4928
4.0448 9400 0.4981
4.0878 9500 0.4942
4.1308 9600 0.4982
4.1738 9700 0.5076
4.2169 9800 0.5013
4.2599 9900 0.5051
4.3029 10000 0.4983
4.3460 10100 0.5051
4.3890 10200 0.4948
4.4320 10300 0.496
4.4750 10400 0.503
4.5181 10500 0.5064
4.5611 10600 0.5049
4.6041 10700 0.4986
4.6472 10800 0.492
4.6902 10900 0.4984
4.7332 11000 0.5036
4.7762 11100 0.5017
4.8193 11200 0.5019
4.8623 11300 0.5021
4.9053 11400 0.4991
4.9484 11500 0.494
4.9914 11600 0.5089

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}