SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
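The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): the token embeddings produced by BERT are averaged into one sentence vector, with padding tokens masked out. A minimal numpy sketch of that operation on toy values (the real tensors are 768-dimensional, not 3-dimensional):

```python
import numpy as np

# Toy token embeddings: batch of 1 sentence, 4 token positions, dim 3
token_embeddings = np.array([[[1.0, 2.0, 3.0],
                              [3.0, 2.0, 1.0],
                              [5.0, 5.0, 5.0],
                              [0.0, 0.0, 0.0]]])
# Attention mask: the last position is padding and must be excluded
attention_mask = np.array([[1, 1, 1, 0]])

mask = attention_mask[..., None]                 # (1, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens only
counts = mask.sum(axis=1)                        # number of real tokens: 3
mean_pooled = summed / counts                    # (1, 3) sentence embedding

print(mean_pooled)  # [[3. 3. 3.]]
```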

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jimmy-Ooi/Tyrisonase_test_model_600_6epoch")
# Run inference
sentences = [
    'O=C(O)CSc1nnc(NC(=S)Nc2cccc(C(F)(F)F)c2)s1',
    'COc1ccc(NC(=O)NO)cc1',
    'CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6357, 0.8677],
#         [0.6357, 1.0000, 0.2004],
#         [0.8677, 0.2004, 1.0000]])
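model.similarity here applies the model's configured similarity function, cosine similarity. The same matrix can be computed by hand: L2-normalize each embedding, then take all pairwise dot products. A small numpy sketch with toy vectors standing in for the (3, 768) embeddings above:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize rows, then dot products."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Toy (3, 4) embeddings in place of the model's (3, 768) output
emb = np.array([[1.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
sims = cosine_similarity_matrix(emb)
print(np.round(sims, 4))
# Diagonal is 1.0 (each vector vs. itself); off-diagonal entries
# are in [-1, 1], e.g. sims[0, 1] = 1/sqrt(2) ≈ 0.7071
```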

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 67,830 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min 11, mean 38.27, max 145 tokens
    • hypothesis (string): min 11, mean 37.66, max 145 tokens
    • label (int): 0: ~52.90%, 2: ~47.10%
  • Samples:
    • premise: O=c1c(-c2ccc(O)cc2)coc2c(O)c(O)ccc12
      hypothesis: O=C(/C=C/c1ccc(O)cc1)c1ccc(NS(=O)(=O)c2ccc(N+[O-])cc2)cc1
      label: 0
    • premise: O=c1c(-c2ccc(O)c(O)c2)coc2cc(O)ccc12
      hypothesis: COc1ccc(C(=O)N/N=C/c2cc(OC)c(OC)c(OC)c2)cc1OC
      label: 0
    • premise: CC(C)=C/C(C)=N\NC(N)=S
      hypothesis: [O-][n+]1ccccc1O
      label: 2
  • Loss: SoftmaxLoss
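SoftmaxLoss implements the SBERT classification objective: the premise embedding u and hypothesis embedding v are concatenated with their element-wise difference |u - v| and passed through a linear classifier over the labels. A numpy sketch of that forward pass (the dimensions and random weights are illustrative, not the trained model's):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_labels = 4, 3            # real model: dim = 768

u = rng.normal(size=dim)          # embedding of the premise
v = rng.normal(size=dim)          # embedding of the hypothesis
features = np.concatenate([u, v, np.abs(u - v)])   # shape (3 * dim,)

W = rng.normal(size=(num_labels, 3 * dim))         # classifier weights
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()              # softmax over the label logits

print(probs.shape)  # (3,), one probability per label
```

During training the cross-entropy between these probabilities and the gold label is minimized; the classifier head is discarded at inference time, leaving only the sentence encoder.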

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 11,970 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min 11, mean 37.67, max 145 tokens
    • hypothesis (string): min 11, mean 38.81, max 145 tokens
    • label (int): 0: ~49.50%, 2: ~50.50%
  • Samples:
    • premise: COc1cc(OC)c(C2CCN(C)C2CO)c(O)c1-c1cc(-c2ccc(F)cc2)[nH]n1
      hypothesis: NC(=S)N/N=C/c1ccc(/C=C/c2ccccc2)cc1
      label: 2
    • premise: CC(=O)OC[C@H]1OC@@HC@HC@H[C@@H]1OC(C)=O
      hypothesis: COc1cc(O)cc(O)c1C(=O)/C=C/c1ccc(O)cc1O
      label: 2
    • premise: CCCCC(=O)NC(=S)Nc1ccc(Br)cc1
      hypothesis: CCCCc1ccc(/C(CC)=N/NC(N)=S)cc1
      label: 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • weight_decay: 0.01
  • num_train_epochs: 6
  • warmup_steps: 100
  • fp16: True
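With lr_scheduler_type: linear and warmup_steps: 100, the learning rate climbs linearly from 0 to 5e-05 over the first 100 steps, then decays linearly to 0 at the final training step. A sketch of that schedule; the total of 6,360 steps is an assumption derived from the dataset size, batch size, and epoch count, not read from a config:

```python
def linear_schedule_lr(step, base_lr=5e-05, warmup_steps=100, total_steps=6360):
    """Linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Decay from base_lr at the end of warmup to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_schedule_lr(50))    # halfway through warmup: 2.5e-05
print(linear_schedule_lr(100))   # peak learning rate: 5e-05
print(linear_schedule_lr(6360))  # end of training: 0.0
```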

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0943 100 0.7594
0.1887 200 0.661
0.2830 300 0.6166
0.3774 400 0.5928
0.4717 500 0.5826
0.5660 600 0.565
0.6604 700 0.573
0.7547 800 0.5631
0.8491 900 0.5509
0.9434 1000 0.5461
1.0377 1100 0.5462
1.1321 1200 0.5393
1.2264 1300 0.5488
1.3208 1400 0.5428
1.4151 1500 0.5383
1.5094 1600 0.532
1.6038 1700 0.5415
1.6981 1800 0.537
1.7925 1900 0.527
1.8868 2000 0.5157
1.9811 2100 0.5244
2.0755 2200 0.5231
2.1698 2300 0.5275
2.2642 2400 0.5255
2.3585 2500 0.5168
2.4528 2600 0.5195
2.5472 2700 0.5177
2.6415 2800 0.5192
2.7358 2900 0.5209
2.8302 3000 0.5196
2.9245 3100 0.5108
3.0189 3200 0.5171
3.1132 3300 0.5147
3.2075 3400 0.5146
3.3019 3500 0.517
3.3962 3600 0.5123
3.4906 3700 0.5061
3.5849 3800 0.5068
3.6792 3900 0.503
3.7736 4000 0.5158
3.8679 4100 0.5063
3.9623 4200 0.5062
4.0566 4300 0.5038
4.1509 4400 0.5022
4.2453 4500 0.5148
4.3396 4600 0.5032
4.4340 4700 0.5146
4.5283 4800 0.5132
4.6226 4900 0.5042
4.7170 5000 0.4963
4.8113 5100 0.4946
4.9057 5200 0.5023
5.0000 5300 0.5017
5.0943 5400 0.506
5.1887 5500 0.499
5.2830 5600 0.4953
5.3774 5700 0.4956
5.4717 5800 0.5036
5.5660 5900 0.5034
5.6604 6000 0.5132
5.7547 6100 0.4884
5.8491 6200 0.4981
5.9434 6300 0.4976
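The epoch column above is the step count divided by the number of optimizer steps per epoch. With 67,830 training samples, batch size 64, and dataloader_drop_last: False, each epoch has ceil(67,830 / 64) = 1,060 steps, which reproduces the logged epoch values:

```python
import math

num_samples, batch_size, epochs = 67_830, 64, 6
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)                    # 1060
print(total_steps)                        # 6360
print(round(100 / steps_per_epoch, 4))    # 0.0943 -- first log row
print(round(6300 / steps_per_epoch, 4))   # 5.9434 -- last log row
```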

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.0
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Model size: 0.1B params (F32, Safetensors)
