SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model fine-tuned from google-bert/bert-base-uncased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
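
The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of the token vectors produced by the Transformer module. The following is a minimal pure-Python sketch of that operation, not the library's implementation: the name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model works on 768-dimensional tensors), and padding positions are assumed to be marked with 0 in the attention mask.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [total / count for total in totals]


# Toy input: two real tokens followed by one padding token (assumed values).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is ignored, so the result depends only on the two unmasked tokens.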

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ar9av/tsdae-civic-bert")
# Run inference
sentences = [
    "right o'clock Oh Jack here . Man brought team 3rd November . going'll to . We Pledge of and . And we got guests Manager Brantley'll of . the . I appeal flag the United States of and to Republic which,, for . Let . God, come before this grateful live wonderful community We serve our community members in our respective capacities morning and give us eyes and full hearts, Lord, as handle business Thank you your, In Christ name amen Thank Price the pledge . First item is",
    "All right, it's nine o'clock. Oh, Jack's here too. Man, y'all brought the whole team. 9 o'clock, November 3rd, November already. Gosh. Time keeps on going. We'll call the meeting to order. We'll open with the Pledge of Allegiance and prayer. And since we've got special guests, I'll ask Town Manager Brantley Price if he'll lead us in the Pledge of Allegiance. And I'll say the prayer. I appeal to the flag of the United States of America and to the Republic for which it stands, one nation under God, indivisible, with liberty and justice for all. Let's pray together. God, we come before you this morning with grateful hearts to live in such a wonderful community. We're grateful for the opportunity to serve our community members in our respective capacities here this morning. Be with us and give us clear eyes and full hearts, Lord, as we handle the people's business. Thank you for your many, many blessings. In Christ's name, amen. Amen. Thank you, Manager Price, for the pledge. First item is",
    "for Wednesday, November 12th\nat 7:05 p.m. What you want? This meeting is being recorded. report anything. I have\nYeah. agenda\nbut there were other items\n>> that didn't get to this print Right. >> Caught you off guard there. >> I know you did. I'm not. Oh, so it\nworked very well. >> what did what did we end up\nbecause of the day? >> that's all right. You're fine. >> All right. So, we're still going to go\na wedding for\nI think we're gonna\nwant to go. All right. No problem. 1.7. First one is Parker Hill\nWell, it's already been\nConservation Commission. Yeah. And since it's not supposed to be\nwhat snacks,\n>> By the way, I I did autographs. put your\nyour name on your agenda and stuff and\nYou mind just announcing\nthat you've arrived and what time? Did provide? Guess we should wait. kind of that was\n>> right. So what doesn't go into the\ngeneral account instead\nof meeting\nI thought we had a\nCan you read it off? Oh, you\nYou just repeat the number.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8891, 0.8695],
#         [0.8891, 1.0000, 0.8283],
#         [0.8695, 0.8283, 1.0000]])
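
Because the similarity function of this model is cosine similarity, the matrix printed above holds the pairwise cosines of the three embeddings. As a rough illustration of what that computation involves, here is a pure-Python sketch on toy vectors. The function names and the 2-dimensional inputs are assumptions for illustration, and the vectors are assumed to be non-zero.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for embeddings (assumed values, not model output).
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(value, 4) for value in row])
```

The matrix is symmetric with ones on the diagonal (up to floating-point rounding), which matches the shape of the output shown above.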

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,000 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
             sentence_0       sentence_1
    type     string           string
    min      3 tokens         13 tokens
    mean     91.75 tokens     223.56 tokens
    max      202 tokens       256 tokens
  • Samples:
    Sample 1
    sentence_0: right I call this town council to order on Recognize is a quorum all those who attendance in those are in online . hello Davis are tonight . Thanks, I We having you . and expeditious tonight a big game JMU at . like have just a wo we've had two works who their a of for Our are with them time Ms. we have?? .? Here?? Here . Mr. Hunter? Here Hardy
    sentence_1: All right, I'd like to call this town council meeting to order on this November 12, 2025. Recognize there is a quorum present. Welcome all those who are in attendance in person and those who are tuned in online. A special hello to Dr. Davis's local and state government class students who are here tonight. Thanks for coming, although I think you were probably told you had to. We appreciate having you. We'll try and be expeditious in the people's business tonight because I know there's a big game later this evening between Longwood and JMU here at home. I'd also like to have just a moment. I won't mention names, but we've had two of our public works employees who lost their wives here very recently. And so we'll just have a brief moment of silence for them. Thank you. Our prayers and our hearts are with them at this difficult time. Ms. McKay, can we have a roll call, please? Mrs. Amos? Here. Mr. Reed? Here. Mr. Dwyer? Here. Mr. Parrott? Here. Mr. Yoland. Here. Mr. Hunter? Here. Mr. Hardy

    Sample 2
    sentence_0: everyone November the City Columbia Board of Chair for meeting like introduce the members: Harding'm, Davis Whittle, Sidney Bang, Duvall also to introduce the staff the, Andrew, Board, Erica Hyan, Deputy and Madeline Land . is special, appeals . for the record, wishing and come the . No can floor When come the podium state your and speak the because meeting is recorded Applicants the board
    sentence_1: Welcome, everyone, to the November meeting of the City of Columbia Board of Zoning Appeals. I am Catherine Fenner, Chair for the Board, and will be serving as the chair for today's meeting. I would like to introduce the other members of the board: Josh Harding, I'm sorry, Davis Whittle, Sidney Lanham, Jonathan Bang, and Sherard Duvall. I would also like to introduce the staff that assists the board, Andrew Livingood, Zoning Board Administrator, Erica Hyan, Deputy Zoning Administrator, and Madeline Bowden, Land Use Board Coordinator. The board is charged with hearing applications for special exceptions, variances, and administrative appeals. All testimony is recorded for the record, and anyone wishing to speak will need to be sworn in and come to the podium to speak. No testimony can be taken from the floor. When you come to the podium, state your name and please speak clearly into the microphone because this meeting is being recorded. Applicants with cases before the board are allotted

    Sample 3
    sentence_0: Corporation 7:30 WEDC Room 250 Highway Texas TO & PLEDGE OF ON ITEMS member the may Board not Agenda of the fill out a form meeting . that comments limited minutes for, six In addition, is not allowed to converse deliberate take on any presented CONSENT AGENDA matters Agenda are be routine by Board will be will not of items . discussion desired, that item will removed from the separately . act upon
    sentence_1: Wylie Economic Development Corporation
    Board Regular Meeting
    November 19, 2025 – 7:30 AM
    WEDC Office Conference Room - 250 South Highway 78, Wylie, Texas
    75098
    CALL TO ORDER
    INVOCATION & PLEDGE OF ALLEGIANCE
    COMMENTS ON NON-AGENDA ITEMS
    Any member of the public may address Board regarding an item that is not listed on the Agenda. Members of the public must
    fill out a form prior to the meeting in order to speak. Board requests that comments be limited to three minutes for an individual,
    six minutes for a group. In addition, Board is not allowed to converse, deliberate or take action on any matter presented during
    citizen participation. CONSENT AGENDA
    All matters listed under the Consent Agenda are considered to be routine by the Board and will be enacted by one motion. There will not be separate discussion of these items. If discussion is desired, that item will be removed from the Consent Agenda
    and will be considered separately. A. Consider and act upon appr
  • Loss: DenoisingAutoEncoderLoss
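
The sample pairs above are consistent with TSDAE-style training, in which sentence_0 is a corrupted copy of sentence_1 with words deleted and DenoisingAutoEncoderLoss trains the model to reconstruct the original. In the statistics above, sentence_0 averages roughly 41% as many tokens as sentence_1 (91.75 vs. 223.56), which is in line with a deletion ratio near 0.6. The sketch below illustrates that kind of deletion noise. Whitespace tokenization, the name add_deletion_noise, and the 0.6 default are assumptions for illustration, not the exact preprocessing used for this model.

```python
import random
from typing import Optional


def add_deletion_noise(
    text: str, del_ratio: float = 0.6, rng: Optional[random.Random] = None
) -> str:
    """Drop each whitespace-separated word independently with probability del_ratio."""
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    kept = [word for word in words if rng.random() > del_ratio]
    # Keep at least one word so the noisy input is never empty.
    if not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


original = "We'll call the meeting to order and open with the Pledge of Allegiance"
noisy = add_deletion_noise(original, rng=random.Random(0))
print(noisy)
```

Passing a seeded random.Random makes the corruption reproducible, and the surviving words always keep their original order.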

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step   Training Loss
0.4     500    4.3532
0.8     1000   3.4547
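
The epoch column in this log can be checked against the hyperparameters listed above: 10,000 samples at a per-device batch size of 8 give 1,250 optimizer steps for the single epoch, so step 500 falls at epoch 0.4 and step 1000 at epoch 0.8. The sketch below reproduces that arithmetic, along with the linear learning-rate decay implied by lr_scheduler_type: linear with no warmup. It assumes a single training device, and the helper names are illustrative.

```python
import math

# Values taken from the model card; single-device training is an assumption.
NUM_SAMPLES = 10_000
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 1
NUM_EPOCHS = 1
BASE_LR = 5e-05

steps_per_epoch = math.ceil(NUM_SAMPLES / (BATCH_SIZE * GRAD_ACCUM_STEPS))
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fraction of an epoch completed after `step` optimizer steps."""
    return step / steps_per_epoch


def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR to 0 over total_steps, with no warmup."""
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / total_steps


print(steps_per_epoch)  # 1250
for step in (500, 1000):
    print(step, epoch_at(step), linear_lr(step))
```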

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.4.1+cu124
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}

Model size: 0.1B params (Safetensors, tensor type F32)