This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the nz-hansard-triplets dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
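The modules after the Transformer can be read as a short pipeline: mean pooling over non-padding tokens, two bias-free linear projections with identity activation, then L2 normalisation. The following is a minimal pure-Python sketch of that flow, not the library's implementation. The helper names, the toy sizes (2 -> 3 -> 2 instead of 768 -> 3072 -> 768), and the weights are all made up for illustration.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def linear(vector, weight):
    """Bias-free linear layer with identity activation; weight is (out, in)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_vectors, attention_mask, w_up, w_down):
    pooled = mean_pool(token_vectors, attention_mask)  # module (1)
    hidden = linear(pooled, w_up)                      # module (2): 768 -> 3072 in the real model
    projected = linear(hidden, w_down)                 # module (3): 3072 -> 768 in the real model
    return l2_normalize(projected)                     # module (4)

# Hypothetical toy inputs: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up weights, shape (3, 2)
w_down = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]   # made-up weights, shape (2, 3)

embedding = embed(tokens, mask, w_up, w_down)
print(embedding)
```

Because the last step normalises the output, every embedding has unit length, so a dot product between two embeddings equals their cosine similarity.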
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("dinushiTJ/nz-hansard-embedding-gemma")
# Run inference
queries = [
"Freedom camping is important to Aotearoa New Zealand for both our people and our visitors, with many freedom campers travelling widely to experience our wonderful te taiao and spend money in our communities. However, while the number of freedom campers has been steadily increasing over the last decade, so too have the negative environmental and social impacts. A particular concern is those who freedom camp in vehicles that do not contain fixed toilets and are disposing of human waste inappropriately, polluting our environment and angering our communities. [Interruption] I can say to the two members opposite interjecting that even their councils support the moves that we are making in this legislation, and I recommend to them to strap into their toilet seats as we flush out the people who are misbehaving in this space, and I encourage them to start seeing the light on this matter and support this bill.",
]
documents = [
'The practice of freedom camping holds significant value for New Zealand, benefiting both residents and international tourists who explore our natural landscapes and contribute to local economies. Nevertheless, the growth in freedom camping over the past ten years has unfortunately been accompanied by a rise in detrimental environmental and societal consequences. A key issue arises from campers using vehicles without integrated toilet facilities, leading to improper disposal of human waste, environmental contamination, and community dissatisfaction. Even local authorities, including those represented by the opposition, endorse the legislative changes proposed to address this misconduct, urging support for this crucial bill.',
'The preservation of our ancestral lands and waterways is paramount for Māori communities, who uphold the principle of kaitiakitanga for future generations. The increasing prevalence of unregulated camping practices, particularly those involving the inappropriate disposal of waste, poses a direct threat to the mauri of our natural resources and sacred sites. This environmental degradation disproportionately impacts iwi and hapū, whose cultural identity and spiritual well-being are intrinsically linked to the health of the whenua and wai. Therefore, any legislative reform concerning land use and environmental protection must explicitly incorporate Te Tiriti o Waitangi principles, ensuring active partnership with Māori in developing and implementing regulations that safeguard their traditional territories and cultural values from such detrimental activities.',
"Mr Speaker, I wish to raise a procedural concern. Your decision today to disallow the initial phrasing of my colleague's inquiry to the Minister of Corrections, which mentioned April Fool's Day, is problematic. The Table Office had previously deemed this question acceptable. I draw your attention to Speaker's ruling 160/6 and Standing Order 377, both of which, in my view, support the original question's validity. I request a definitive ruling from you on this issue, as your current stance appears to preclude future primary questions that might include references to informal dates like Mother's Day or Father's Day.",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6540, 0.4763, 0.1748]])
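The scores above can be used to rank documents against the query: here the paraphrase scores highest and the unrelated procedural speech lowest. The sketch below shows that ranking step on toy vectors. It assumes the similarity scores are cosine similarities, which matches the cosine-based loss and metric reported in this card. The function names and the 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, best match first."""
    scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Hypothetical toy embeddings standing in for real 768-dimensional ones.
query = [1.0, 0.0, 0.0]
docs = [[0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.6, 0.0, 0.8]]

ranking = rank_documents(query, docs)
print([index for index, _ in ranking])  # [0, 2, 1]
```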
nz-hansard-triplet-eval, evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.988 |
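As a rough guide to reading this metric: cosine accuracy is understood here as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes it on toy 2-dimensional vectors. It is an illustration under that assumption, not the evaluator's own code, and the data are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Hypothetical toy triplets: the first three are ordered correctly, the last is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),
]

accuracy = cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```

Under this reading, a value of 0.988 means roughly 12 in every 1,000 evaluation triplets are ranked the wrong way round.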
Columns: anchor, positive, and negative
| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| anchor | positive | negative |
|---|---|---|
| One of the key purposes that they do is they amend the Court Security Act 1999, and some of the key things that they do there is they extend the powers of court security officers to deny entry, to remove and detain people who possess illegal drugs or who act threateningly or abusively or who commit minor crimes on court premises. I think this is an important part of these pieces of legislation, because it gives some clarity and it gives some particular powers to court security officers to exercise those powers when they are in the court, and it allows them to have the discretion not only when the court is in hearing but even when they’re not directed to take action against people who may come into the courts with different paraphernalia or with drugs, and to ensure that those matters are dealt with appropriately and quickly. It also just changes the definition of the court and clarifies what the court is and where they are able to exercise those powers. | A primary objective of these legislative changes involves modifying the Court Security Act of 1999, specifically by broadening the authority of court security personnel. This expanded mandate permits them to refuse admission, remove, and hold individuals found with illicit substances, or those exhibiting aggressive or abusive behaviour, or committing minor infractions within court facilities. This aspect of the legislation is crucial, as it provides explicit guidelines and specific powers for security officers to enforce order within the court environment. It grants them the necessary discretion to intervene, whether proceedings are active or not, against individuals bringing prohibited items or drugs into the courts, thereby ensuring swift and proper resolution of such incidents. Furthermore, the bill refines the legal definition of 'court' and clarifies the geographical scope within which these powers can be exercised. | A critical aspect of ensuring justice for all involves addressing the cultural safety and appropriate engagement of Māori within court settings. This legislation should have considered specific protocols for court security officers when interacting with Māori individuals, particularly those who may be unfamiliar with the Pākehā justice system or who are experiencing cultural distress. It is vital to ensure that security measures do not inadvertently create barriers or exacerbate existing inequities for Māori. This includes training for officers on Te Reo Māori, tikanga, and the historical context of Māori interactions with the justice system, to prevent misunderstandings and ensure respectful treatment. Furthermore, the definition of 'court premises' should acknowledge areas where Māori cultural practices, such as karakia or waiata, might occur, ensuring these are accommodated respectfully within security frameworks, rather than being seen as disruptive. |
| The most interesting part of the Tribunals Powers and Procedures Legislation Bill is probably in relation to the Human Rights Review Tribunal, and the submissions from its chair, Mr Rodger Haines QC, are very convincing. I thank Mr Haines for his contributions. Due to what Mr Haines described as artificial restrictions in the Human Rights Act 1993, a significant case backlog has developed over the past years. To illustrate the backlog and the frustration it has generated, we can simply have a look at the Stuff reports that people fighting for their human rights face a “beyond acceptable” wait of more than two or three years for justice after politicians and officials ignored repeated pleas for a law change to help clear the backlog. As Mr Rodger Haines said in his submission, for the past three or four years the workload of—now—five full-time decision makers has been carried by one person, namely the chairperson himself. Inevitably, a backlog of serious proportion is increasing year by... | Perhaps the most compelling aspect of the Tribunals Powers and Procedures Legislation Bill concerns the Human Rights Review Tribunal, where the compelling submissions from its chairperson, Mr Rodger Haines QC, warrant particular acknowledgement. I commend Mr Haines for his valuable input. As highlighted by Mr Haines, the Human Rights Act 1993's structural limitations have led to a substantial accumulation of cases over recent years. This backlog, and the resulting frustration, is evident in media reports detailing how individuals seeking to uphold their human rights endure 'unacceptable' delays of two to three years for resolution, despite persistent calls for legislative reform. Mr Haines' submission further revealed that for several years, the tribunal's entire workload, intended for five full-time decision-makers, has been managed by the chairperson alone. Consequently, a severe backlog continues to grow annually, rendering the tribunal effectively non-functional for many parties. T... | A critical concern within the human rights framework is the persistent challenge Māori face in accessing justice for breaches of their Treaty rights and cultural protections. The Human Rights Review Tribunal, while vital, often struggles to adequately address the unique dimensions of Māori human rights, which are intrinsically linked to Te Tiriti o Waitangi. The existing backlog disproportionately affects Māori claimants, who may already face systemic barriers in navigating the Pākehā legal system. Future legislative reforms must specifically consider how to enhance the tribunal's capacity to hear and resolve cases involving Māori cultural rights, land rights, and the Crown's Treaty obligations. This includes ensuring culturally competent processes, the availability of Te Reo Māori services, and a deeper understanding of tikanga within the tribunal's operations, to ensure that justice delayed is not justice denied for Māori. |
| The proposed changes should be commended because they will reduce the time that it takes to hear and resolve matters. They will ensure that tribunals can have more consistency in the way that they operate. In particular, it ensures that tribunals can continue to be the first option for the timely and specialist decision-making on particular matters of importance that can be required, and affirm the role of tribunals in providing a dispute resolution system that exists outside the court system. It’s important that the bill simplifies and standardises the statutory powers, because what that means in practice is that for people who are party to a dispute, there is a clearer process to follow, they can put the matter behind them sooner, and they can get on with their lives sooner, which in the end is the purpose that we want justice to serve—it allows people to go on with their lives and to make good contributions to our community. | These legislative amendments deserve praise for their potential to shorten the duration required to hear and conclude disputes. They are designed to foster greater uniformity in tribunal operations, thereby solidifying tribunals' position as the preferred initial avenue for prompt and expert resolution of significant issues. This reinforces the vital function of tribunals in offering an alternative dispute resolution framework distinct from the traditional court system. The bill's emphasis on simplifying and standardising statutory authorities is crucial, as it translates into a more straightforward process for individuals involved in disputes. This enables them to resolve their issues more quickly, move past the conflict, and resume their normal lives, which is ultimately the core objective of a functioning justice system—to empower citizens to contribute positively to society. | While enhancing the efficiency of general tribunals is valuable, it is equally imperative to ensure that dispute resolution mechanisms adequately serve Māori communities, respecting tikanga and Te Ao Māori principles. The current system often fails to provide culturally appropriate pathways for resolving disputes that arise within or affect Māori, such as those concerning whānau, hapū, or iwi. Future legislative efforts should explore strengthening or establishing specific Māori dispute resolution bodies, or integrating tikanga-based processes more deeply into existing tribunals, to ensure that Māori can access justice in a manner that aligns with their cultural values. This would not only improve access to justice but also affirm the Crown's Treaty obligations by recognising and supporting Māori self-determination in dispute resolution, moving beyond a one-size-fits-all approach to justice. |
TripletLoss with these parameters:
{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.3
}
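With these parameters, the per-triplet loss takes the standard triplet form max(d(a, p) - d(a, n) + 0.3, 0), where d is taken to be the cosine distance, 1 minus cosine similarity. The sketch below is a plain-Python illustration of that formula rather than the library's code. The function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Zero once the negative is at least `margin` farther away than the positive."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [1.0, 0.0]
positive = [0.6, 0.8]        # cosine similarity to the anchor is about 0.6
easy_negative = [0.0, 1.0]   # similarity 0.0, already well separated
hard_negative = [0.8, 0.6]   # similarity about 0.8, closer than the positive

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
print(easy_loss, hard_loss)
```

The easy triplet already satisfies the margin and contributes no loss, which is consistent with a batch size of 1 producing some logged training losses of exactly 0.0.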
Non-default hyperparameters:
eval_strategy: steps
per_device_train_batch_size: 1
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
load_best_model_at_end: True
eval_on_start: True
prompts: task: classification | query: 

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 1
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: True
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: task: classification | query: 
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

| Epoch | Step | Training Loss | nz-hansard-triplet-eval_cosine_accuracy |
|---|---|---|---|
| 0 | 0 | - | 0.9880 |
| 0.0181 | 50 | 0.0484 | - |
| 0.0361 | 100 | 0.0024 | - |
| 0.0542 | 150 | 0.0241 | - |
| 0.0722 | 200 | 0.0 | 0.9980 |
| 0.0903 | 250 | 0.013 | - |
| 0.1083 | 300 | 0.0099 | - |
| 0.1264 | 350 | 0.014 | - |
| 0.1444 | 400 | 0.0086 | 0.9980 |
| 0.1625 | 450 | 0.0207 | - |
| 0.1805 | 500 | 0.0515 | - |
| 0.1986 | 550 | 0.0135 | - |
| 0.2166 | 600 | 0.0398 | 0.9880 |
| 0.2347 | 650 | 0.0413 | - |
| 0.2527 | 700 | 0.025 | - |
| 0.2708 | 750 | 0.0086 | - |
| 0.2888 | 800 | 0.0259 | 0.9542 |
| 0.3069 | 850 | 0.0438 | - |
| 0.3249 | 900 | 0.0016 | - |
| 0.3430 | 950 | 0.0133 | - |
| 0.3610 | 1000 | 0.0326 | 0.9841 |
| 0.3791 | 1050 | 0.0282 | - |
| 0.3971 | 1100 | 0.0169 | - |
| 0.4152 | 1150 | 0.0083 | - |
| 0.4332 | 1200 | 0.0168 | 0.9841 |
| 0.4513 | 1250 | 0.0053 | - |
| 0.4693 | 1300 | 0.0349 | - |
| 0.4874 | 1350 | 0.0026 | - |
| 0.5054 | 1400 | 0.006 | 0.9701 |
| 0.5235 | 1450 | 0.0015 | - |
| 0.5415 | 1500 | 0.0091 | - |
| 0.5596 | 1550 | 0.0475 | - |
| 0.5776 | 1600 | 0.0112 | 0.9761 |
| 0.5957 | 1650 | 0.006 | - |
| 0.6137 | 1700 | 0.0007 | - |
| 0.6318 | 1750 | 0.029 | - |
| 0.6498 | 1800 | 0.0117 | 0.9880 |
| 0.6679 | 1850 | 0.0174 | - |
| 0.6859 | 1900 | 0.0395 | - |
| 0.7040 | 1950 | 0.0027 | - |
| 0.7220 | 2000 | 0.0066 | 0.9084 |
| 0.7401 | 2050 | 0.0254 | - |
| 0.7581 | 2100 | 0.0053 | - |
| 0.7762 | 2150 | 0.001 | - |
| 0.7942 | 2200 | 0.0051 | 0.9880 |
| 0.8123 | 2250 | 0.0114 | - |
| 0.8303 | 2300 | 0.0305 | - |
| 0.8484 | 2350 | 0.0406 | - |
| 0.8664 | 2400 | 0.0 | 0.9900 |
| 0.8845 | 2450 | 0.0058 | - |
| 0.9025 | 2500 | 0.0043 | - |
| 0.9206 | 2550 | 0.005 | - |
| 0.9386 | 2600 | 0.0 | 0.9880 |
| 0.9567 | 2650 | 0.0 | - |
| 0.9747 | 2700 | 0.0196 | - |
| 0.9928 | 2750 | 0.0 | - |
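To relate the logged steps to the learning rate, the sketch below reproduces the usual shape of a linear schedule with warmup, using the card's learning_rate of 2e-05 and warmup_ratio of 0.1. The total of 2,770 steps is an assumption inferred from the log (step 2750 falls at epoch 0.9928), and the function name is hypothetical. The trainer's own scheduler may differ in detail.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: the learning rate grows in proportion to the step.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: the learning rate shrinks linearly until total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 2770  # assumption inferred from the training log, not stated in the card

schedule = [linear_schedule_lr(step, TOTAL_STEPS) for step in range(TOTAL_STEPS + 1)]
for step in (0, 100, 1000, 2750):
    print(step, schedule[step])
```

Under these assumptions the learning rate peaks after roughly the first tenth of training and is close to zero by the last logged step.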
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Base model
google/embeddinggemma-300m