--- license: mit language: - en metrics: - accuracy - recall base_model: - BAAI/bge-base-en-v1.5 widget: - source_sentence: 'Represent this sentence for searching relevant passages: RCW 36.75.190' sentences: - 'RCW 36.75.190 - Engineer''s report—Hearing—Order. Upon report by the examining engineer for the erection and construction upon any county road, or for acquisition by purchase, gift or condemnation of any bridge, trestle, or any other structure crossing any stream, body of water, gulch, navigable water, swamp or other topographical formation, which constitutes a boundary, publication shall be made and joint hearing had upon such report in the same manner and upon the same procedure as in the case of resolution or petition for the laying out and establishing of county roads. If upon the hearing the governing authorities jointly order the erection and construction or acquisition of such bridge, trestle, or other structure, they may jointly acquire land necessary therefor by purchase, gift, or condemnation in the manner as provided for acquiring land for county roads, and shall advertise calls for bids, require contractor''s deposit and bond, award contracts, and supervise construction as by law provided and in the same manner as required in the case of the construction of county roads. Any such bridges, trestles or other structures may be operated free, or may be operated as toll bridges, trestles, or other structures under the provisions of the laws of this state relating thereto. [ 1963 c 4 s 36.75.190 . Prior: 1937 c 187 s 29 ; RRS s 6450-29.]' - 'RCW 28B.30.285 - State treasurer receiving agent of certain federal aid—Trust funds not subject to appropriation. All federal grants received by the state treasurer pursuant to RCW 28B.30.270 shall be deemed trust funds under the control of the state treasurer and not subject to appropriation by the legislature. [ 1969 ex.s. c 223 s 28B.30.285 . Prior: 1955 c 66 s 4 . 
Formerly RCW 28.80.224 .]' - 'RCW 48.09.160 - Directors—Disqualification. No individual shall be a director of a domestic mutual insurer by reason of his or her holding public office. Adjudication as a bankrupt or taking the benefit of any insolvency law or making a general assignment for the benefit of creditors disqualifies an individual from being or acting as a director. [ 2009 c 549 s 7037 ; 1947 c 79 s .09.16; Rem. Supp. 1947 s 45.09.16.]' - source_sentence: 'Represent this sentence for searching relevant passages: RCW disclosure suspect identity civil redress' sentences: - 'RCW 49.60.525 - Review of existing recorded covenants and deed restrictions to identify documents that include racial or other unlawful restrictions on property ownership.(Expires July 1, 2027.) (1) Subject to the availability of amounts appropriated for this specific purpose, the University of Washington and Eastern Washington University shall review existing recorded covenants and deed restrictions to identify those recorded documents that include racial or other restrictions on property ownership or use against protected classes that are unlawful under RCW 49.60.224 . For properties subject to such racial and other unlawful restrictions, the universities shall provide notice to the property owner and to the county auditor of the county in which the property is located. The universities shall provide information to the property owner on how such provisions can be struck pursuant to RCW 49.60.227 . The universities may contract with other public and private not-for-profit higher education institutions that are regionally accredited to carry out the review and notification requirements of this section. (2) This section expires July 1, 2027. [ 2021 c 256 s 2 .] 
Findings — Intent — 2021 c 256: "The legislature finds that the existence of racial, religious, or ethnic-based property restrictions or covenants on a deed or chain of title for real property is like having a monument to racism on that property and is repugnant to the tenets of equality. Furthermore, such restrictions and covenants may cause mental anguish and tarnish a property owner''s sense of ownership in the property because the owner feels as though they have participated in a racist act themselves. It is the intent of the legislature that the owner, occupant, or tenant or homeowners'' association board of the property which is subject to an unlawful deed restriction or covenant pursuant to RCW 49.60.224 is entitled to have discriminatory covenants and restrictions that are contrary to public policy struck from their chain of title. The legislature has presented two ways this can be accomplished through RCW 49.60.227 (1) (a) and (b). If the owner, occupant, or tenant or homeowners'' association board of the property elects to pursue a judicial remedy, the legislature intends that the court issue a declaratory judgment ordering the county auditor, or in charter counties the county official charged with the responsibility for recording instruments in the county records, to entirely strike the racist or otherwise discriminatory covenants from the chain of title. Striking the language does not prevent preservation of the original record, outside of the chain of title, for historical or archival purposes. The legislature finds that striking racist, religious, and ethnic restrictions or covenants from the chain of title is no different than having an offensive statutory monument which the owner may entirely remove. So too should the owner be able to entirely remove the offensive written monument to racism or other unconstitutional discrimination." [ 2021 c 256 s 1 .] 
Application — 2021 c 256: "This act applies to real estate transactions entered into on or after January 1, 2022." [ 2021 c 256 s 5 .]' - 'RCW 10.97.070 - Disclosure of suspect''s identity to victim. (1) Criminal justice agencies may, in their discretion, disclose to persons who have suffered physical loss, property damage, or injury compensable through civil action, the identity of persons suspected as being responsible for such loss, damage, or injury together with such information as the agency reasonably believes may be of assistance to the victim in obtaining civil redress. Such disclosure may be made without regard to whether the suspected offender is an adult or a juvenile, whether charges have or have not been filed, or a prosecuting authority has declined to file a charge or a charge has been dismissed. (2) Unless the agency determines release would interfere with an ongoing criminal investigation, in any action brought pursuant to this chapter, criminal justice agencies shall disclose identifying information, including photographs of suspects, if the acts are alleged by the plaintiff or victim to be a violation of RCW 9A.50.020 . (3) The disclosure by a criminal justice agency of investigative information pursuant to subsection (1) of this section shall not establish a duty to disclose any additional information concerning the same incident or make any subsequent disclosure of investigative information, except to the extent an additional disclosure is compelled by legal process. [ 1993 c 128 s 10 ; 1977 ex.s. c 314 s 7 .] Effective date — 1993 c 128: See RCW 9A.50.902 .' - 'RCW 65.16.110 - Affidavit to cover payment of fees. The affidavit of publication of all notices required by law to be published shall state the full amount of the fee charged for such publication and that the fee has been paid in full. 
[ 1921 c 99 s 7 ; RRS s 253-7.]' - source_sentence: 'Represent this sentence for searching relevant passages: RCW 87.80 form and contents of notice' sentences: - 'RCW 36.32.270 - Competitive bids—Exemptions. The county legislative authority may waive the competitive bidding requirements of this chapter pursuant to RCW 39.04.280 if an exemption contained within that section applies to the purchase or public work. [ 1998 c 278 s 4 ; 1963 c 4 s 36.32.270 . Prior: 1961 c 169 s 3 ; 1945 c 61 s 4 ; Rem. Supp. 1945 s 10322-18.]' - 'RCW 87.80.060 - Form and contents of notice. The notice of the hearing on the petition shall state that a petition requesting the creation of a board of joint control to administer the facilities and activities, naming them if named in the petition, has been filed with the board of county commissioners of the county, naming the county; that the board of joint control, if it is created, will have authority to provide for apportionment of costs to carry out the objects of its creation among the member irrigation entities (naming them); shall state the day, hour, and place of the hearing on the petition; shall state that any person interested in the creation of the board of joint control may appear on or before the day of hearing on the petition, and show cause in writing, if any, why the same should not be granted, and the notice shall be over the name of the clerk of the board of county commissioners. [ 1996 c 320 s 6 ; 1949 c 56 s 6 ; Rem. Supp. 1949 s 7505-25.]' - 'RCW 18.88B.090 - Reinstatement of certification. 
(1) A certificate that has been expired for five years or less may be reinstated if the person holding the expired certificate: (a) Completes an abbreviated application form; (b) Pays any necessary fees, including the current certification fee, late renewal fees, and expired credential reissuance fees, unless exempt pursuant to *RCW 18.88B.091 ; (c) Provides a written declaration that no action has been taken by a state or federal jurisdiction or hospital which would prevent or restrict the person holding the expired certificate from practicing as a home care aide; (d) Provides a written declaration that the person holding the expired certificate has not voluntarily given up any credential or privilege or has not been restricted from practicing as a home care aide in lieu of or to avoid formal action; and (e) Submits to a state and federal background check as required by RCW 74.39A.056 , if the certificate has been expired for more than one year. (2) In addition to meeting the requirements of subsection (1) of this section, a certificate that has been expired for more than five years may be reinstated if the person holding the expired certificate demonstrates competence to the standards established by the secretary and meets other requirements established by the secretary. [ 2023 c 424 s 3 .] *Reviser''s note: RCW 18.88B.091 expired July 1, 2025.' - source_sentence: 'Represent this sentence for searching relevant passages: RCW 48.30A.055' sentences: - 'RCW 48.30A.055 - Insurance antifraud plan—Review—Disapproval—Notice—Audit to ensure compliance. If after review of an insurer''s antifraud plan, the commissioner finds that the plan does not comply with RCW 48.30A.050 , the commissioner may disapprove the antifraud plan. Notice of disapproval must include a statement of the specific reasons for disapproval. The insurer shall refile a plan disapproved by the commissioner within sixty days of the date of the notice of disapproval. 
The commissioner may audit insurers to ensure compliance with antifraud plans. [ 1995 c 285 s 11 .]' - 'RCW 18.160.090 - Surety bond—Security deposit—Venue and time limit for actions upon bonds—Limit of liability of surety—Payment of claims. (1) Before granting a license under this chapter, the director of fire protection shall require that the applicant file with the state director of fire protection a surety bond issued by a surety insurer who meets the requirements of chapter 48.28 RCW in a form acceptable to the director of fire protection running to the state of Washington in the penal sum of ten thousand dollars. However, the surety bond for a fire protection sprinkler system contractor whose business is restricted solely to NFPA 13-D or NFPA 13-R systems shall be in the penal sum of six thousand dollars. The bond shall be conditioned that the applicant will pay all purchasers of fire protection sprinkler systems with whom the applicant has a contract for the applicant to install, inspect, maintain, or service a fire protection sprinkler system, and who have obtained a judgment against the applicant for the breach of such a contract. The term "purchaser" means an owner of property who has entered into a contract for the installation of a fire protection sprinkler system on that property, or a contractor who contracts to install, inspect, maintain, or service such a system with an owner of property and subcontracts the work to the applicant. No other person, including, but not limited to, persons who supply labor, materials, or rental equipment to the applicant, shall have any rights against the bond. (2) In lieu of the surety bond required by this section the applicant may file with the director of fire protection a deposit consisting of cash or other security acceptable to the director of fire protection in an amount equal to the penal sum of the required bond. 
The director of fire protection may adopt rules necessary for the proper administration of the security. (3) Before granting renewal of a fire protection sprinkler system contractor''s license to any applicant, the director of fire protection shall require that the applicant file with the director satisfactory evidence that the surety bond or cash deposit is in full force. (4) Any purchaser of a fire protection sprinkler system having a claim against the licensee for the breach of a contract for the licensee to install, inspect, maintain, or service a fire protection sprinkler system may bring suit upon such bond in superior court of the county in which the work was done or of any county in which jurisdiction of the licensee may be had. Any such action must be brought not later than one year after the expiration of the licensee''s license or renewal license then in effect at the time of the alleged breach of contract. (5) The bond shall be considered one continuous obligation, and the surety upon the bond shall not be liable in aggregate or cumulative amount exceeding ten thousand dollars, or six thousand dollars if the bond was issued to a licensee whose business is restricted solely to NFPA 13-D or NFPA 13-R systems, regardless of the number of years the bond is in effect, or whether it is reinstated, renewed, reissued, or otherwise continued, and regardless of the year in which any claim accrued. The bond shall not be liable for any liability of the licensee for tortious acts, whether or not such liability is imposed by statute or common law, or is imposed by contract. The bond shall not be a substitute or supplemental to any liability or other insurance required by law or by the contract. (6) If the surety desires to make payment without awaiting court action against it, the amount of the bond shall be reduced to the extent of any payment made by the surety in good faith under the bond. Any payment shall be based on final judgments received by the surety. 
(7) Claims against the bond shall be satisfied from the bond in the following order: (a) Claims by a purchaser of a fire protection sprinkler system for the breach of a contract for the licensee to install, inspect, maintain, or service a fire protection sprinkler system; (b) Any court costs, interest, and attorneys'' fees the plaintiff may be entitled to recover by contract, statute, or court rule. A condition precedent to the surety being liable to any claimant is a final judgment against the licensee, unless the surety desires to make payment without awaiting court action. In the event of a dispute regarding the apportionment of the bond proceeds among claimants, the surety may bring an action for interpleader against all claimants upon the bond. (8) Any purchaser of a fire protection sprinkler system having an unsatisfied final judgment against the licensee for the breach of a contract for the licensee to install, inspect, maintain, or service a fire protection sprinkler system may execute upon the security held by the director of fire protection by serving a certified copy of the unsatisfied final judgment by registered or certified mail upon the director within one year of the date of entry of such judgment. Upon the receipt of service of such certified copy the director shall pay or order paid from the deposit, through the registry of the court which rendered judgment, towards the amount of the unsatisfied judgment. The priority of payment by the director shall be the order of receipt by the director, but the director shall have no liability for payment in excess of the amount of the deposit. [ 1991 sp.s. c 6 s 1 .]' - 'RCW 18.100.010 - Legislative intent. It is the legislative intent to provide for the incorporation of an individual or group of individuals to render the same professional service to the public for which such individuals are required by law to be licensed or to obtain other legal authorization. 
[ 1969 c 122 s 1 .]' - source_sentence: 'Represent this sentence for searching relevant passages: washington RCW nonprofit canon law' sentences: - 'RCW 43.21C.220 - Incorporation of city or town exempt from chapter. The incorporation of a city or town is exempted from compliance with this chapter. [ 1982 c 220 s 6 .] Severability — 1982 c 220: See note following RCW 36.93.100 . Incorporation proceedings exempt from chapter: RCW 36.93.170 .' - 'RCW 79A.05.085 - Lease of parklands for television stations—Lease rental rates, terms—Attachment of antennae. The commission shall determine the fair market value for television station leases based upon independent appraisals and existing leases for television stations shall be extended at said fair market rental for at least one period of not more than twenty years: PROVIDED, That the rates in said leases shall be renegotiated at five year intervals: PROVIDED FURTHER, That said stations shall permit the attachment of antennae of publicly operated broadcast and microwave stations where electronically practical to combine the towers: PROVIDED FURTHER, That notwithstanding any term to the contrary in any lease, this section shall not preclude the commission from prescribing new and reasonable lease terms relating to the modification, placement, or design of facilities operated by or for a station, and any extension of a lease granted under this section shall be subject to this proviso: PROVIDED FURTHER, That notwithstanding any other provision of law the director in his or her discretion may waive any requirement that any environmental impact statement or environmental assessment be submitted as to any lease negotiated and signed between January 1, 1974, and December 31, 1974. [ 2013 c 23 s 265 ; 1974 ex.s. c 151 s 1 . Formerly RCW 43.51.063 .]' - 'RCW 24.03A.050 - Subordination to canon law. 
To the extent religious doctrine or canon law governing the internal affairs of a nonprofit corporation is inconsistent with this chapter, the religious doctrine or canon law controls to the extent required by the United States Constitution, the state Constitution, or both. [ 2021 c 176 s 1110 .] Effective date — 2021 c 176: See note following RCW 24.03A.005 .'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- legal
- law
- WA
- sentence-transformers
- feature-extraction
- sentence-similarity
- dense
- loss:MultipleNegativesRankingLoss
model-index:
- name: washington-state-law-embedding-model-Base
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: RCW Validation
      type: rcw-validation
    metrics:
    - type: accuracy_at_10
      value: 0.8441
      name: Accuracy@10
    - type: precision_at_10
      value: 0.0844
      name: Precision@10
    - type: recall_at_10
      value: 0.8441
      name: Recall@10
    - type: accuracy_at_1
      value: 0.0891
      name: Accuracy@1
    - type: accuracy_at_3
      value: 0.2595
      name: Accuracy@3
    - type: accuracy_at_5
      value: 0.4318
      name: Accuracy@5
    - type: ndcg_at_10
      value: 0.3876
      name: NDCG@10
    - type: mrr_at_10
      value: 0.2524
      name: MRR@10
    - type: map_at_100
      value: 0.2595
      name: MAP@100
datasets:
- CSI-lab/RCW_2025_Positive_Query_Pairs
---

# Washington-state-law-embedding-model-Base

**Washington-state-law-embedding-model-Base** is an embedding model fine-tuned for legal information retrieval (IR) over the laws of the State of Washington.

General-purpose embedding models often perform poorly on legal text because of the semantic gap between natural-language questions (e.g., "What dollar amount makes a theft a first degree felony?") and formal statutory language. This model is trained to narrow that gap, so that plain-English queries, legal scenarios, and document drafts map to their corresponding Washington State statutes (Revised Code of Washington, RCW).
## Available Models

| Model | Language | Description | Query Prefix |
|:------|:---------|:------------|:-------------|
| [CSI-lab/Washington-state-law-embedding-model-Large](https://huggingface.co/CSI-lab/Washington-state-law-embedding-model-Large) | English | Fine-tuned `large` model (1024d) for WA State RCWs. Best performance. | `Represent this sentence for searching relevant passages: ` |
| [CSI-lab/Washington-state-law-embedding-model-Base](https://huggingface.co/CSI-lab/Washington-state-law-embedding-model-Base) | English | Fine-tuned `base` model (768d) for WA State RCWs. Faster inference. | `Represent this sentence for searching relevant passages: ` |

## Model Overview

* **Base Model:** `BAAI/bge-base-en-v1.5`
* **Task:** Semantic Search / Information Retrieval / Legal Preemption Analysis
* **Language:** English (Legal Domain)
* **Max Sequence Length:** 512 tokens
* **Output Dimensionality:** 768 dimensions
* **Similarity Function:** Cosine Similarity

## Key Features

- Fine-tuned for the Washington State legal domain (RCW)
- Optimized for semantic search and retrieval tasks
- Supports natural language legal queries
- Designed for RAG-based legal assistants
- Improved retrieval accuracy over the base BGE embeddings

## Intended Use Cases

This model is intended to serve as the retriever component in legal Retrieval-Augmented Generation (RAG) pipelines. Primary use cases include:

1. **Statutory Cross-Referencing:** Mapping natural language legal questions to specific RCWs.
2. **Preemption Checking:** Automatically retrieving state laws that may preempt or conflict with proposed municipal ordinances.
3. **Legal Research Automation:** Clustering and searching local agency drafts against established state frameworks.
4. **AI Legal Assistants:** Powering chatbots and research tools that require accurate retrieval of Washington State laws before generating an answer.
5. **Automated Compliance:** Scanning contracts or external drafts against established state legislative frameworks.

## Technical Details & Training Methodology

### The Semantic Gap

A general-purpose dense retriever often struggles on legal tasks because it tends to match surface wording rather than the underlying legal concepts. To address this, `Washington-state-law-embedding-model` was fine-tuned on a synthetic, high-variance dataset.

### Training Data

The model was fine-tuned on 455,424 synthetic legal query–passage pairs generated from Washington State RCW statutes. The dataset includes:

- Natural language paraphrases of legal questions
- Hypothetical legal scenarios
- Statute-grounded positive document matches

The dataset spans 500+ legal categories derived from the RCW structure.

### Hyperparameters & Architecture

* **Loss Function:** Multiple Negatives Ranking (MNR) Loss
* **Batch Size:** 256
* **Epochs:** 4
* **fp16:** True
* **batch_sampler:** no_duplicates
* **multi_dataset_batch_sampler:** round_robin
* **Learning Rate Decay:** Linear
* **Infrastructure:** High-Performance Computing (HPC) Cluster

#### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 256
- `per_device_eval_batch_size`: 256
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}
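
As an illustration of the Multiple Negatives Ranking loss named under Hyperparameters & Architecture, the following sketch computes it in plain Python on toy 2-dimensional vectors. It is not the training code for this model: the function names, the toy embeddings, and the `scale=20.0` temperature are assumptions made for the example (20.0 is the default in `sentence-transformers`; this card does not state the value used). Each query's paired passage is treated as the positive, and every other passage in the batch as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(query_embs, passage_embs, scale=20.0):
    """Mean cross-entropy over a batch with in-batch negatives.

    passage_embs[i] is the positive for query_embs[i]; all other
    passages in the batch act as negatives. `scale` is an assumed
    temperature, not a value taken from this model card.
    """
    losses = []
    for i, query in enumerate(query_embs):
        logits = [scale * cosine(query, passage) for passage in passage_embs]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the correct passage.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy batch: two queries and their paired passages.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each passage close to its own query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each passage close to the wrong query

print(f"aligned pairs: {mnr_loss(queries, aligned):.4f}")
print(f"swapped pairs: {mnr_loss(queries, swapped):.4f}")
```

The loss is near zero when each query is closest to its own passage and large when the pairing is swapped. Because every other passage in the batch counts as a negative, a passage appearing twice in one batch would be penalized as a negative for its own query, which is presumably why the `no_duplicates` batch sampler is listed above.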
## Evaluation Metrics

The model was evaluated on a held-out validation set of synthetic municipal drafts, each mapped 1-to-1 to a Washington State RCW section. Fine-tuning yielded a **31.27 percentage-point absolute improvement in Recall@10** over the base model.

| Metric | Base Model (Not Fine-Tuned) | Fine-Tuned (Epoch 4) | Absolute Improvement |
|:-------|:-----------------------|:---------------------|:------------|
| **Recall@10** | 0.5314 | **0.8441** | +31.27 pp |
| **Recall@5** | 0.2636 | **0.4318** | +16.82 pp |
| **NDCG@10** | 0.2341 | **0.3876** | +15.35 pp |
| **MRR@10** | 0.1462 | **0.2524** | +10.62 pp |

Because each query has exactly one relevant statute, Recall@k here equals Accuracy@k: the share of queries whose statute appears in the top k results.

*Interpretation: For queries like those in the validation set, the governing statute appeared among the top 10 search results 84.4% of the time.*

## Limitations

- This model does not provide legal advice.
- Performance is limited to Washington State law (RCW) and may not generalize to other jurisdictions.
- Outputs depend on the quality of the underlying document corpus.
- It should be used as a retrieval tool, not as a final decision-making system.

## Usage Examples

### Semantic Search with `sentence-transformers`
**Warning:** Because this model is fine-tuned from a BGE base model, you **must** prepend the instruction prefix `"Represent this sentence for searching relevant passages: "` (including the trailing space) to your search queries for best performance. **Do not** add this prefix to the documents you index.
```python
from sentence_transformers import SentenceTransformer, util

# 1. Load the fine-tuned model
model = SentenceTransformer('CSI-lab/Washington-state-law-embedding-model-Base')

# 2. Define the laws (your vector database); documents get NO prefix
laws = [
    "RCW 9A.56.030: Theft in the first degree. A person is guilty of theft in the first degree if he or she commits theft of property or services which exceed(s) five thousand dollars in value.",
    "RCW 46.61.502: Driving under the influence. A person is guilty of driving while under the influence of intoxicating liquor...",
    "RCW 9A.36.011: Assault in the first degree. A person is guilty of assault in the first degree if he or she...",
]

# 3. Define the user's search query
user_query = "What dollar amount makes a theft a first degree felony?"

# 4. CRITICAL: Add the required BGE prefix to the query ONLY
query_prefix = "Represent this sentence for searching relevant passages: "
formatted_query = query_prefix + user_query

# 5. Encode the documents and the query
law_embeddings = model.encode(laws, convert_to_tensor=True)
query_embedding = model.encode(formatted_query, convert_to_tensor=True)

# 6. Calculate cosine similarity (result has shape [1, len(laws)])
cosine_scores = util.cos_sim(query_embedding, law_embeddings)

# 7. Print the top result
best_idx = cosine_scores.argmax().item()
print(f"Top Match: {laws[best_idx]}")
print(f"Similarity Score: {cosine_scores[0][best_idx]:.4f}")
```

## Model Citation

```bibtex
@misc{washington_state_law_embedding_base_2026,
    title={Washington-state-law-embedding-model-Base: Fine-Tuned Dense Retrieval for Washington State Law},
    author={Tomar, Shlok},
    year={2026},
    publisher={Hugging Face},
    howpublished={\url{https://huggingface.co/CSI-lab/Washington-state-law-embedding-model-Base}},
    note={Hugging Face Model Repository}
}
```

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```