SentenceTransformer based on google/bert_uncased_L-2_H-128_A-2

This is a sentence-transformers model fine-tuned from google/bert_uncased_L-2_H-128_A-2 on the generator dataset. It maps sentences and paragraphs to a 128-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google/bert_uncased_L-2_H-128_A-2
Maximum Sequence Length: 128 tokens
Output Dimensionality: 128 dimensions
Similarity Function: Cosine Similarity
Supported Modality: Text
Training Dataset:
- generator

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 128, 'pooling_mode': 'mean', 'include_prompt': True})
)
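
The Pooling module above uses pooling_mode 'mean', which averages the token embeddings of a sentence into one vector. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The function name mean_pool and the toy 2-dimensional inputs are illustrative assumptions; the real model produces 128-dimensional token embeddings.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding token is excluded by the mask, so it does not distort the sentence vector.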

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("swardiantara/bert-tiny-sst5-full-fixed-cosine")
# Run inference
sentences = [
    'a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films',
    "... feels as if -lrb- there 's -rrb- a choke leash around your neck so director nick cassavetes can give it a good , hard yank whenever he wants you to feel something .",
    "what with the incessant lounge music playing in the film 's background , you may mistake love liza for an adam sandler chanukah song .",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 128)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2197, 0.2653],
#         [0.2197, 1.0000, 0.2309],
#         [0.2653, 0.2309, 1.0000]])
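
Since the model is intended for retrieval with cosine similarity, the sketch below shows how a query vector could be ranked against a small corpus. It is an illustration in plain Python, not the library's code: cosine_similarity and rank_by_similarity are hypothetical helper names, and the 2-dimensional vectors stand in for the 128-dimensional outputs of model.encode.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs, highest cosine similarity first."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Stand-in embeddings (hypothetical values, not model outputs).
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
ranking = rank_by_similarity(query, corpus, top_k=2)
print(ranking)
```

Cosine similarity ignores vector length, so the query [2.0, 0.0] matches corpus entry 0 exactly even though their norms differ.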

Training Details

Training Dataset

generator

Dataset: generator
Size: 36,495,696 training samples
Columns: text_a, text_b, and label
Approximate statistics based on the first 100 samples:
	text_a	text_b	label
type	string	string	list
modality	text	text	
details	min: 21 tokens, mean: 21.0 tokens, max: 21 tokens	min: 5 tokens, mean: 24.62 tokens, max: 57 tokens	size: 2 elements

Samples:

text_a	text_b	label
`a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films`	`apparently reassembled from the cutting-room floor of any given daytime soap .`	`[0.0, 0.75]`
`a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films`	`they presume their audience wo n't sit still for a sociology lesson , however entertainingly presented , so they trot out the conventional science-fiction elements of bug-eyed monsters and futuristic women in skimpy clothes .`	`[0.0, 0.75]`
`a stirring , funny and finally transporting re-imagining of beauty and the beast and 1930s horror films`	`the entire movie is filled with deja vu moments .`	`[0.0, 0.5]`

Loss: main.OrdinalProxyContrastiveLoss
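
The reported dataset size is consistent with enumerating every unordered pair of sentences from a pool of 8,544 sentences, since C(8544, 2) = 36,495,696. The card does not state this, so treat it as an inference: the pool size of 8,544 and the helper iter_pairs are assumptions introduced for illustration. The sketch checks the arithmetic and shows why the first samples would all share the same text_a.

```python
import math
from itertools import combinations, islice
from typing import Iterable, Iterator, Tuple

REPORTED_PAIRS = 36_495_696  # "Size" from the dataset summary above
NUM_SENTENCES = 8_544        # assumption: pool size that reproduces that count


def iter_pairs(sentences: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield every unordered pair (text_a, text_b) exactly once, in input order."""
    yield from combinations(sentences, 2)


pair_count = math.comb(NUM_SENTENCES, 2)
print(pair_count == REPORTED_PAIRS)  # True

# With this ordering, the first sentence is paired with all others first.
first_pairs = list(islice(iter_pairs(["s0", "s1", "s2", "s3"]), 3))
print(first_pairs)  # [('s0', 's1'), ('s0', 's2'), ('s0', 's3')]
```

Under this ordering, the first 100 samples would all have the same text_a, which would match the constant 21-token length reported for that column.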

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 1024
num_train_epochs: 10
learning_rate: 2e-05
load_best_model_at_end: True
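
The batch size and dataset size together determine the step counts that appear in the training logs. The sketch below reproduces that arithmetic, assuming a single device, gradient_accumulation_steps of 1, and dataloader_drop_last set to False (so the final partial batch counts as a step). The helper epoch_at is a hypothetical name.

```python
TRAIN_SAMPLES = 36_495_696  # dataset size reported in this card
BATCH_SIZE = 1024           # per_device_train_batch_size
NUM_EPOCHS = 10             # num_train_epochs

# Ceiling division: the last, partially filled batch still takes one step.
steps_per_epoch = (TRAIN_SAMPLES + BATCH_SIZE - 1) // BATCH_SIZE
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)    # 35641
print(total_steps)        # 356410
print(epoch_at(1000))     # 0.0281
print(epoch_at(106_923))  # 3.0
```

These values agree with the logged rows for step 1000 (epoch 0.0281), step 35641 (epoch 1.0), and step 106923 (epoch 3.0).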

All Hyperparameters

per_device_train_batch_size: 1024
num_train_epochs: 10
max_steps: -1
learning_rate: 2e-05
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0
optim: adamw_torch
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: False
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: None
trackio_bucket_id: None
trackio_static_space_id: None
per_device_eval_batch_size: 8
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_static_graph: None
ddp_backend: None
ddp_timeout: 1800
fsdp: None
fsdp_config: None
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
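
With lr_scheduler_type linear and warmup_steps 0, the learning rate decays in a straight line from 2e-05 to zero over the planned run. The following is a minimal sketch of that schedule, not the trainer's code. The total of 356,410 steps is an assumption derived from 10 epochs of 35,641 steps each, and linear_lr is a hypothetical helper name.

```python
BASE_LR = 2e-05
TOTAL_STEPS = 356_410  # assumption: 10 epochs x 35,641 steps per epoch


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Halfway through the planned steps the rate is half the base value, and it reaches zero at the final step.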

Training Logs

Epoch	Step	Training Loss
0.0140	500	0.0465
0.0281	1000	0.0441
0.0421	1500	0.0425
0.0561	2000	0.0409
0.0701	2500	0.0389
0.0842	3000	0.0367
0.0982	3500	0.0345
0.1122	4000	0.0327
0.1263	4500	0.0307
0.1403	5000	0.0281
0.1543	5500	0.0242
0.1683	6000	0.0199
0.1824	6500	0.0162
0.1964	7000	0.0131
0.2104	7500	0.0106
0.2245	8000	0.0084
0.2385	8500	0.0067
0.2525	9000	0.0053
0.2665	9500	0.0042
0.2806	10000	0.0034
0.2946	10500	0.0028
0.3086	11000	0.0023
0.3227	11500	0.0019
0.3367	12000	0.0017
0.3507	12500	0.0014
0.3647	13000	0.0012
0.3788	13500	0.0011
0.3928	14000	0.0010
0.4068	14500	0.0008
0.4209	15000	0.0007
0.4349	15500	0.0006
0.4489	16000	0.0006
0.4629	16500	0.0005
0.4770	17000	0.0005
0.4910	17500	0.0004
0.5050	18000	0.0004
0.5191	18500	0.0004
0.5331	19000	0.0003
0.5471	19500	0.0003
0.5612	20000	0.0003
0.5752	20500	0.0003
0.5892	21000	0.0002
0.6032	21500	0.0002
0.6173	22000	0.0002
0.6313	22500	0.0002
0.6453	23000	0.0002
0.6594	23500	0.0002
0.6734	24000	0.0002
0.6874	24500	0.0001
0.7014	25000	0.0001
0.7155	25500	0.0001
0.7295	26000	0.0001
0.7435	26500	0.0001
0.7576	27000	0.0001
0.7716	27500	0.0001
0.7856	28000	0.0001
0.7996	28500	0.0001
0.8137	29000	0.0001
0.8277	29500	0.0001
0.8417	30000	0.0001
0.8558	30500	0.0001
0.8698	31000	0.0001
0.8838	31500	0.0001
0.8978	32000	0.0001
0.9119	32500	0.0001
0.9259	33000	0.0001
0.9399	33500	0.0001
0.9540	34000	0.0001
0.9680	34500	0.0001
0.9820	35000	0.0001
0.9960	35500	0.0000
1.0	35641	-
1.0101	36000	0.0000
1.0241	36500	0.0000
1.0381	37000	0.0000
1.0522	37500	0.0000
1.0662	38000	0.0000
1.0802	38500	0.0000
1.0942	39000	0.0000
1.1083	39500	0.0000
1.1223	40000	0.0000
1.1363	40500	0.0000
1.1504	41000	0.0000
1.1644	41500	0.0000
1.1784	42000	0.0000
1.1924	42500	0.0000
1.2065	43000	0.0000
1.2205	43500	0.0000
1.2345	44000	0.0000
1.2486	44500	0.0000
1.2626	45000	0.0000
1.2766	45500	0.0000
1.2906	46000	0.0000
1.3047	46500	0.0000
1.3187	47000	0.0000
1.3327	47500	0.0000
1.3468	48000	0.0000
1.3608	48500	0.0000
1.3748	49000	0.0000
1.3888	49500	0.0000
1.4029	50000	0.0000
1.4169	50500	0.0000
1.4309	51000	0.0000
1.4450	51500	0.0000
1.4590	52000	0.0000
1.4730	52500	0.0000
1.4871	53000	0.0000
1.5011	53500	0.0000
1.5151	54000	0.0000
1.5291	54500	0.0000
1.5432	55000	0.0000
1.5572	55500	0.0000
1.5712	56000	0.0000
1.5853	56500	0.0000
1.5993	57000	0.0000
1.6133	57500	0.0000
1.6273	58000	0.0000
1.6414	58500	0.0000
1.6554	59000	0.0000
1.6694	59500	0.0000
1.6835	60000	0.0000
1.6975	60500	0.0000
1.7115	61000	0.0000
1.7255	61500	0.0000
1.7396	62000	0.0000
1.7536	62500	0.0000
1.7676	63000	0.0000
1.7817	63500	0.0000
1.7957	64000	0.0000
1.8097	64500	0.0000
1.8237	65000	0.0000
1.8378	65500	0.0000
1.8518	66000	0.0000
1.8658	66500	0.0000
1.8799	67000	0.0000
1.8939	67500	0.0000
1.9079	68000	0.0000
1.9219	68500	0.0000
1.9360	69000	0.0000
1.9500	69500	0.0000
1.9640	70000	0.0000
1.9781	70500	0.0000
1.9921	71000	0.0000
2.0	71282	-
2.0061	71500	0.0000
2.0201	72000	0.0000
2.0342	72500	0.0000
2.0482	73000	0.0000
2.0622	73500	0.0000
2.0763	74000	0.0000
2.0903	74500	0.0000
2.1043	75000	0.0000
2.1183	75500	0.0000
2.1324	76000	0.0000
2.1464	76500	0.0000
2.1604	77000	0.0000
2.1745	77500	0.0000
2.1885	78000	0.0000
2.2025	78500	0.0000
2.2165	79000	0.0000
2.2306	79500	0.0000
2.2446	80000	0.0000
2.2586	80500	0.0000
2.2727	81000	0.0000
2.2867	81500	0.0000
2.3007	82000	0.0000
2.3147	82500	0.0000
2.3288	83000	0.0000
2.3428	83500	0.0000
2.3568	84000	0.0000
2.3709	84500	0.0000
2.3849	85000	0.0000
2.3989	85500	0.0000
2.4130	86000	0.0000
2.4270	86500	0.0000
2.4410	87000	0.0000
2.4550	87500	0.0000
2.4691	88000	0.0000
2.4831	88500	0.0000
2.4971	89000	0.0000
2.5112	89500	0.0000
2.5252	90000	0.0000
2.5392	90500	0.0000
2.5532	91000	0.0000
2.5673	91500	0.0000
2.5813	92000	0.0000
2.5953	92500	0.0000
2.6094	93000	0.0000
2.6234	93500	0.0000
2.6374	94000	0.0000
2.6514	94500	0.0000
2.6655	95000	0.0000
2.6795	95500	0.0000
2.6935	96000	0.0000
2.7076	96500	0.0000
2.7216	97000	0.0000
2.7356	97500	0.0000
2.7496	98000	0.0000
2.7637	98500	0.0000
2.7777	99000	0.0000
2.7917	99500	0.0000
2.8058	100000	0.0000
2.8198	100500	0.0000
2.8338	101000	0.0000
2.8478	101500	0.0000
2.8619	102000	0.0000
2.8759	102500	0.0000
2.8899	103000	0.0000
2.9040	103500	0.0000
2.9180	104000	0.0000
2.9320	104500	0.0000
2.9460	105000	0.0000
2.9601	105500	0.0000
2.9741	106000	0.0000
2.9881	106500	0.0000
3.0	106923	-

The bold row denotes the saved checkpoint.

Training Time

Training: 4.3 hours
Evaluation: 2.8 seconds
Total: 4.3 hours

Framework Versions

Python: 3.12.4
Sentence Transformers: 5.5.1
Transformers: 5.11.0
PyTorch: 2.5.1+cu121
Accelerate: 1.13.0
Datasets: 2.21.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Safetensors

Model size: 4.39M params
Tensor type: F32
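
The 4.39M figure can be cross-checked by counting the weights of a 2-layer, 128-hidden BERT encoder. This is a sketch under assumptions not stated in this card: a vocabulary of 30,522 tokens, 512 position embeddings, 2 token types, a feed-forward size of 512, and a stored pooler layer. The helper names linear and layer_norm are illustrative.

```python
VOCAB_SIZE = 30_522   # assumption: standard uncased BERT vocabulary
MAX_POSITIONS = 512   # assumption
TYPE_VOCAB = 2        # assumption
HIDDEN = 128          # output dimensionality from this card
LAYERS = 2            # "L-2" in the base model name
INTERMEDIATE = 512    # assumption: 4 x hidden


def linear(in_features: int, out_features: int) -> int:
    """Weights plus biases of a dense layer."""
    return in_features * out_features + out_features


def layer_norm(size: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)      # query, key, value, attention output
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)  # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)  # feed-forward down-projection
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)

total_params = embeddings + LAYERS * encoder_layer + pooler
print(total_params)                   # 4385920
print(round(total_params / 1e6, 2))   # 4.39
```

Most of the parameters sit in the token embedding table; the two encoder layers contribute under 0.4M in total.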
