How to use jebish7/cde-small-v1_MNR_3 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("jebish7/cde-small-v1_MNR_3", trust_remote_code=True)
sentences = [
"In terms of audited accounts submission for an Applicant, could you clarify the scenarios in which the Regulator might agree that a reviewed pro forma statement of financial position is not needed, and what factors would be considered in making that determination?",
"DocumentID: 1 | PassageID: 4.2.1.(3) | Passage: Where the regulator in another jurisdiction does not permit the implementation of policies, procedures, systems and controls consistent with these Rules, the Relevant Person must:\n(a)\tinform the Regulator in writing immediately; and\n(b)\tapply appropriate additional measures to manage the money laundering risks posed by the relevant branch or subsidiary.",
"DocumentID: 11 | PassageID: 2.3.15.(4) | Passage: The Applicant must submit to the Regulator the following records, as applicable:\n(a)\tAudited accounts, for the purposes of this Rule and Rule 2.3.2(1), for the last three full financial years, noting that:\n(i)\tif the Applicant applies for admission less than ninety days after the end of its last financial year, unless the Applicant has audited accounts for its latest full financial year, the accounts may be for the three years to the end of the previous financial year, but must also include audited or reviewed accounts for its most recent semi-annual financial reporting period; and\n(ii)\tif the Applicant applies for admission more than six months and seventy-five days after the end of its last financial year, audited or reviewed accounts for its most recent semi-annual financial reporting period (or longer period if available).\n(b)\tUnless the Regulator agrees it is not needed, a reviewed pro forma statement of financial position. The review must be conducted by an accredited professional auditor of the company or an independent accountant.",
"DocumentID: 36 | PassageID: D.1.3. | Passage: Principle 1 – Oversight and responsibility of climate-related financial risk exposures.Certain functions related to the management of climate-related financial risks may be delegated, but, as with other risks, the board is ultimately responsible and accountable for monitoring, managing and overseeing climate-related risks for the financial firm.\n"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from jxm/cde-small-v1 on the csv dataset. It maps sentences and paragraphs to a dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({}) with Transformer model: DatasetTransformer
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
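# trust_remote_code=True matches the loading call shown at the top of this card
model = SentenceTransformer("jebish7/cde-small-v1_MNR_3", trust_remote_code=True)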
# Run inference
sentences = [
'What are the common scenarios or instances where assets and liabilities are not covered by the bases of accounting in Rule 5.3.2, and how should an Insurer address these in their reporting?',
'DocumentID: 12 | PassageID: 5.3.1.Guidance | Passage: \nThe exceptions provided in this Chapter relate to the following:\na.\tspecific Rules in respect of certain assets and liabilities, intended to achieve a regulatory objective not achieved by application of either or both of the bases of accounting set out in Rule \u200e5.3.2;\nb.\tassets and liabilities that are not dealt with in either or both of the bases of accounting set out in Rule \u200e5.3.2; and\nc.\tthe overriding power of the Regulator, set out in Rule \u200e5.1.6, to require an Insurer to adopt a particular measurement for a specific asset or liability.',
'DocumentID: 1 | PassageID: 14.4.1.Guidance.1. | Passage: Relevant Persons are reminded that in accordance with Federal AML Legislation, Relevant Persons or any of their Employees must not tip off any Person, that is, inform any Person that he is being scrutinised, or investigated by any other competent authority, for possible involvement in suspicious Transactions or activity related to money laundering or terrorist financing.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
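The similarity matrix above can be used for semantic search by ranking passages against a query. The following is a minimal sketch in plain Python, assuming cosine similarity as the scoring function; the helper names `cosine` and `rank_passages` are hypothetical, and the toy 3-dimensional vectors stand in for rows of the `model.encode(...)` output rather than real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): in practice these would be embeddings of one
# question and several "DocumentID | PassageID | Passage" strings.
query = [1.0, 0.0, 1.0]
passages = [[0.9, 0.1, 1.1], [-1.0, 0.5, 0.0], [0.0, 1.0, 0.0]]

ranking = rank_passages(query, passages)
best_index = ranking[0][0]  # index of the passage closest to the query
```

With real embeddings, the first row of `model.similarity(embeddings, embeddings)` (excluding the query's similarity to itself) carries the same information as `ranking` here.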
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

| anchor | positive |
|---|---|
| If a financial institution offers Money Remittance as one of its services, under what circumstances is it deemed to be holding Relevant Money and therefore subject to regulatory compliance (a)? | DocumentID: 13 |
| What are the consequences for a Recognised Body or Authorised Person if they fail to comply with ADGM's requirements regarding severance payments? | DocumentID: 7 |
| If a Public Fund is structured as an Investment Trust, to whom should the Fund Manager report the review findings regarding delegated Regulated Activities or outsourced functions? | DocumentID: 6 |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
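To make the two parameters concrete, here is a minimal pure-Python sketch of the in-batch-negatives objective that MultipleNegativesRankingLoss is based on (Henderson et al., 2017): each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the correct class. The function name `mnr_loss` and the 2-dimensional toy vectors are illustrative assumptions, not the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity, playing the role of similarity_fct = cos_sim."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch (assumption): anchors that match their own positive give a loss
# near zero; swapping the positives makes every pairing wrong.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Because the other positives in a batch serve as negatives, a duplicated text within a batch would count as a false negative, which is presumably why the training run pairs this loss with the `no_duplicates` batch sampler.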
Non-default hyperparameters:
- per_device_train_batch_size: 16
- learning_rate: 2e-05
- warmup_ratio: 0.1
- batch_sampler: no_duplicates

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss |
|---|---|---|
| 0.1082 | 100 | 1.9962 |
| 0.2165 | 200 | 1.1626 |
| 0.3247 | 300 | 0.9907 |
| 0.4329 | 400 | 0.8196 |
| 0.5411 | 500 | 0.8082 |
| 0.6494 | 600 | 0.6944 |
| 0.7576 | 700 | 0.6559 |
| 0.8658 | 800 | 0.6242 |
| 0.9740 | 900 | 0.6299 |
| 1.0823 | 1000 | 0.6051 |
| 1.1905 | 1100 | 0.567 |
| 1.2987 | 1200 | 0.4679 |
| 1.4069 | 1300 | 0.3443 |
| 1.5152 | 1400 | 0.3356 |
| 1.6234 | 1500 | 0.2958 |
| 1.7316 | 1600 | 0.254 |
| 1.8398 | 1700 | 0.2694 |
| 1.9481 | 1800 | 0.2497 |
| 2.0563 | 1900 | 0.2671 |
| 2.1645 | 2000 | 0.2558 |
| 2.2727 | 2100 | 0.1943 |
| 2.3810 | 2200 | 0.1242 |
| 2.4892 | 2300 | 0.116 |
| 2.5974 | 2400 | 0.1081 |
| 2.7056 | 2500 | 0.1056 |
| 2.8139 | 2600 | 0.107 |
| 2.9221 | 2700 | 0.1154 |
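The training log can be read alongside the learning-rate schedule implied by `learning_rate: 2e-05`, `warmup_ratio: 0.1`, and `lr_scheduler_type: linear`. The sketch below shows the usual shape of such a schedule (linear ramp-up, then linear decay to zero). The function name `linear_warmup_lr` is hypothetical, and the total of 2772 steps is an estimate inferred from the table (100 steps per roughly 0.1082 epochs over 3 epochs), not a figure stated in the card.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 2772  # assumption: about 924 steps per epoch times 3 epochs

# Learning rate at each step that appears in the training log (100, 200, ..., 2700).
lr_at_logged_steps = {s: linear_warmup_lr(s, TOTAL_STEPS) for s in range(100, 2800, 100)}
```

Under these assumptions the peak learning rate is reached shortly before step 300, so nearly all of the logged loss values fall in the decay phase.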
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: jxm/cde-small-v1