metadata
license: mit
language:
  - en
metrics:
  - accuracy
  - recall
base_model:
  - BAAI/bge-base-en-v1.5
widget:
  - source_sentence: 'Represent this sentence for searching relevant passages: RCW 36.75.190'
    sentences:
      - >-
        RCW 36.75.190 - Engineer's report—Hearing—Order.

        Upon report by the examining engineer for the erection and construction
        upon any county road, or for acquisition by purchase, gift or
        condemnation of any bridge, trestle, or any other structure crossing any
        stream, body of water, gulch, navigable water, swamp or other
        topographical formation, which constitutes a boundary, publication shall
        be made and joint hearing had upon such report in the same manner and
        upon the same procedure as in the case of resolution or petition for the
        laying out and establishing of county roads. If upon the hearing the
        governing authorities jointly order the erection and construction or
        acquisition of such bridge, trestle, or other structure, they may
        jointly acquire land necessary therefor by purchase, gift, or
        condemnation in the manner as provided for acquiring land for county
        roads, and shall advertise calls for bids, require contractor's deposit
        and bond, award contracts, and supervise construction as by law provided
        and in the same manner as required in the case of the construction of
        county roads. Any such bridges, trestles or other structures may be
        operated free, or may be operated as toll bridges, trestles, or other
        structures under the provisions of the laws of this state relating
        thereto.

        [ 1963 c 4 s 36.75.190 . Prior: 1937 c 187 s 29 ; RRS s 6450-29.]
      - >-
        RCW 28B.30.285 - State treasurer receiving agent of certain federal
        aid—Trust funds not subject to appropriation.

        All federal grants received by the state treasurer pursuant to RCW
        28B.30.270 shall be deemed trust funds under the control of the state
        treasurer and not subject to appropriation by the legislature.

        [ 1969 ex.s. c 223 s 28B.30.285 . Prior: 1955 c 66 s 4 . Formerly RCW
        28.80.224 .]
      - >-
        RCW 48.09.160 - Directors—Disqualification.

        No individual shall be a director of a domestic mutual insurer by reason
        of his or her holding public office. Adjudication as a bankrupt or
        taking the benefit of any insolvency law or making a general assignment
        for the benefit of creditors disqualifies an individual from being or
        acting as a director.

        [ 2009 c 549 s 7037 ; 1947 c 79 s .09.16; Rem. Supp. 1947 s 45.09.16.]
  - source_sentence: >-
      Represent this sentence for searching relevant passages: RCW disclosure
      suspect identity civil redress
    sentences:
      - >-
        RCW 49.60.525 - Review of existing recorded covenants and deed
        restrictions to identify documents that include racial or other unlawful
        restrictions on property ownership.(Expires July 1, 2027.)

        (1) Subject to the availability of amounts appropriated for this
        specific purpose, the University of Washington and Eastern Washington
        University shall review existing recorded covenants and deed
        restrictions to identify those recorded documents that include racial or
        other restrictions on property ownership or use against protected
        classes that are unlawful under RCW 49.60.224 . For properties subject
        to such racial and other unlawful restrictions, the universities shall
        provide notice to the property owner and to the county auditor of the
        county in which the property is located. The universities shall provide
        information to the property owner on how such provisions can be struck
        pursuant to RCW 49.60.227 . The universities may contract with other
        public and private not-for-profit higher education institutions that are
        regionally accredited to carry out the review and notification
        requirements of this section. (2) This section expires July 1, 2027.

        [ 2021 c 256 s 2 .]

        Findings  Intent  2021 c 256: "The legislature finds that the
        existence of racial, religious, or ethnic-based property restrictions or
        covenants on a deed or chain of title for real property is like having a
        monument to racism on that property and is repugnant to the tenets of
        equality. Furthermore, such restrictions and covenants may cause mental
        anguish and tarnish a property owner's sense of ownership in the
        property because the owner feels as though they have participated in a
        racist act themselves. It is the intent of the legislature that the
        owner, occupant, or tenant or homeowners' association board of the
        property which is subject to an unlawful deed restriction or covenant
        pursuant to RCW 49.60.224 is entitled to have discriminatory covenants
        and restrictions that are contrary to public policy struck from their
        chain of title. The legislature has presented two ways this can be
        accomplished through RCW 49.60.227 (1) (a) and (b). If the owner,
        occupant, or tenant or homeowners' association board of the property
        elects to pursue a judicial remedy, the legislature intends that the
        court issue a declaratory judgment ordering the county auditor, or in
        charter counties the county official charged with the responsibility for
        recording instruments in the county records, to entirely strike the
        racist or otherwise discriminatory covenants from the chain of title.
        Striking the language does not prevent preservation of the original
        record, outside of the chain of title, for historical or archival
        purposes. The legislature finds that striking racist, religious, and
        ethnic restrictions or covenants from the chain of title is no different
        than having an offensive statutory monument which the owner may entirely
        remove. So too should the owner be able to entirely remove the offensive
        written monument to racism or other unconstitutional discrimination." [
        2021 c 256 s 1 .]

        Application  2021 c 256: "This act applies to real estate transactions
        entered into on or after January 1, 2022." [ 2021 c 256 s 5 .]
      - >-
        RCW 10.97.070 - Disclosure of suspect's identity to victim.

        (1) Criminal justice agencies may, in their discretion, disclose to
        persons who have suffered physical loss, property damage, or injury
        compensable through civil action, the identity of persons suspected as
        being responsible for such loss, damage, or injury together with such
        information as the agency reasonably believes may be of assistance to
        the victim in obtaining civil redress. Such disclosure may be made
        without regard to whether the suspected offender is an adult or a
        juvenile, whether charges have or have not been filed, or a prosecuting
        authority has declined to file a charge or a charge has been dismissed.
        (2) Unless the agency determines release would interfere with an ongoing
        criminal investigation, in any action brought pursuant to this chapter,
        criminal justice agencies shall disclose identifying information,
        including photographs of suspects, if the acts are alleged by the
        plaintiff or victim to be a violation of RCW 9A.50.020 . (3) The
        disclosure by a criminal justice agency of investigative information
        pursuant to subsection (1) of this section shall not establish a duty to
        disclose any additional information concerning the same incident or make
        any subsequent disclosure of investigative information, except to the
        extent an additional disclosure is compelled by legal process.

        [ 1993 c 128 s 10 ; 1977 ex.s. c 314 s 7 .]

        Effective date  1993 c 128: See RCW 9A.50.902 .
      - >-
        RCW 65.16.110 - Affidavit to cover payment of fees.

        The affidavit of publication of all notices required by law to be
        published shall state the full amount of the fee charged for such
        publication and that the fee has been paid in full.

        [ 1921 c 99 s 7 ; RRS s 253-7.]
  - source_sentence: >-
      Represent this sentence for searching relevant passages: RCW 87.80 form
      and contents of notice
    sentences:
      - >-
        RCW 36.32.270 - Competitive bids—Exemptions.

        The county legislative authority may waive the competitive bidding
        requirements of this chapter pursuant to RCW 39.04.280 if an exemption
        contained within that section applies to the purchase or public work.

        [ 1998 c 278 s 4 ; 1963 c 4 s 36.32.270 . Prior: 1961 c 169 s 3 ; 1945 c
        61 s 4 ; Rem. Supp. 1945 s 10322-18.]
      - >-
        RCW 87.80.060 - Form and contents of notice.

        The notice of the hearing on the petition shall state that a petition
        requesting the creation of a board of joint control to administer the
        facilities and activities, naming them if named in the petition, has
        been filed with the board of county commissioners of the county, naming
        the county; that the board of joint control, if it is created, will have
        authority to provide for apportionment of costs to carry out the objects
        of its creation among the member irrigation entities (naming them);
        shall state the day, hour, and place of the hearing on the petition;
        shall state that any person interested in the creation of the board of
        joint control may appear on or before the day of hearing on the
        petition, and show cause in writing, if any, why the same should not be
        granted, and the notice shall be over the name of the clerk of the board
        of county commissioners.

        [ 1996 c 320 s 6 ; 1949 c 56 s 6 ; Rem. Supp. 1949 s 7505-25.]
      - >-
        RCW 18.88B.090 - Reinstatement of certification.

        (1) A certificate that has been expired for five years or less may be
        reinstated if the person holding the expired certificate: (a) Completes
        an abbreviated application form; (b) Pays any necessary fees, including
        the current certification fee, late renewal fees, and expired credential
        reissuance fees, unless exempt pursuant to *RCW 18.88B.091 ; (c)
        Provides a written declaration that no action has been taken by a state
        or federal jurisdiction or hospital which would prevent or restrict the
        person holding the expired certificate from practicing as a home care
        aide; (d) Provides a written declaration that the person holding the
        expired certificate has not voluntarily given up any credential or
        privilege or has not been restricted from practicing as a home care aide
        in lieu of or to avoid formal action; and (e) Submits to a state and
        federal background check as required by RCW 74.39A.056 , if the
        certificate has been expired for more than one year. (2) In addition to
        meeting the requirements of subsection (1) of this section, a
        certificate that has been expired for more than five years may be
        reinstated if the person holding the expired certificate demonstrates
        competence to the standards established by the secretary and meets other
        requirements established by the secretary.

        [ 2023 c 424 s 3 .]

        *Reviser's note: RCW 18.88B.091 expired July 1, 2025.
  - source_sentence: 'Represent this sentence for searching relevant passages: RCW 48.30A.055'
    sentences:
      - >-
        RCW 48.30A.055 - Insurance antifraud
        plan—Review—Disapproval—Notice—Audit to ensure compliance.

        If after review of an insurer's antifraud plan, the commissioner finds
        that the plan does not comply with RCW 48.30A.050 , the commissioner may
        disapprove the antifraud plan. Notice of disapproval must include a
        statement of the specific reasons for disapproval. The insurer shall
        refile a plan disapproved by the commissioner within sixty days of the
        date of the notice of disapproval. The commissioner may audit insurers
        to ensure compliance with antifraud plans.

        [ 1995 c 285 s 11 .]
      - >-
        RCW 18.160.090 - Surety bond—Security deposit—Venue and time limit for
        actions upon bonds—Limit of liability of surety—Payment of claims.

        (1) Before granting a license under this chapter, the director of fire
        protection shall require that the applicant file with the state director
        of fire protection a surety bond issued by a surety insurer who meets
        the requirements of chapter 48.28 RCW in a form acceptable to the
        director of fire protection running to the state of Washington in the
        penal sum of ten thousand dollars. However, the surety bond for a fire
        protection sprinkler system contractor whose business is restricted
        solely to NFPA 13-D or NFPA 13-R systems shall be in the penal sum of
        six thousand dollars. The bond shall be conditioned that the applicant
        will pay all purchasers of fire protection sprinkler systems with whom
        the applicant has a contract for the applicant to install, inspect,
        maintain, or service a fire protection sprinkler system, and who have
        obtained a judgment against the applicant for the breach of such a
        contract. The term "purchaser" means an owner of property who has
        entered into a contract for the installation of a fire protection
        sprinkler system on that property, or a contractor who contracts to
        install, inspect, maintain, or service such a system with an owner of
        property and subcontracts the work to the applicant. No other person,
        including, but not limited to, persons who supply labor, materials, or
        rental equipment to the applicant, shall have any rights against the
        bond. (2) In lieu of the surety bond required by this section the
        applicant may file with the director of fire protection a deposit
        consisting of cash or other security acceptable to the director of fire
        protection in an amount equal to the penal sum of the required bond. The
        director of fire protection may adopt rules necessary for the proper
        administration of the security. (3) Before granting renewal of a fire
        protection sprinkler system contractor's license to any applicant, the
        director of fire protection shall require that the applicant file with
        the director satisfactory evidence that the surety bond or cash deposit
        is in full force. (4) Any purchaser of a fire protection sprinkler
        system having a claim against the licensee for the breach of a contract
        for the licensee to install, inspect, maintain, or service a fire
        protection sprinkler system may bring suit upon such bond in superior
        court of the county in which the work was done or of any county in which
        jurisdiction of the licensee may be had. Any such action must be brought
        not later than one year after the expiration of the licensee's license
        or renewal license then in effect at the time of the alleged breach of
        contract. (5) The bond shall be considered one continuous obligation,
        and the surety upon the bond shall not be liable in aggregate or
        cumulative amount exceeding ten thousand dollars, or six thousand
        dollars if the bond was issued to a licensee whose business is
        restricted solely to NFPA 13-D or NFPA 13-R systems, regardless of the
        number of years the bond is in effect, or whether it is reinstated,
        renewed, reissued, or otherwise continued, and regardless of the year in
        which any claim accrued. The bond shall not be liable for any liability
        of the licensee for tortious acts, whether or not such liability is
        imposed by statute or common law, or is imposed by contract. The bond
        shall not be a substitute or supplemental to any liability or other
        insurance required by law or by the contract. (6) If the surety desires
        to make payment without awaiting court action against it, the amount of
        the bond shall be reduced to the extent of any payment made by the
        surety in good faith under the bond. Any payment shall be based on final
        judgments received by the surety. (7) Claims against the bond shall be
        satisfied from the bond in the following order: (a) Claims by a
        purchaser of a fire protection sprinkler system for the breach of a
        contract for the licensee to install, inspect, maintain, or service a
        fire protection sprinkler system; (b) Any court costs, interest, and
        attorneys' fees the plaintiff may be entitled to recover by contract,
        statute, or court rule. A condition precedent to the surety being liable
        to any claimant is a final judgment against the licensee, unless the
        surety desires to make payment without awaiting court action. In the
        event of a dispute regarding the apportionment of the bond proceeds
        among claimants, the surety may bring an action for interpleader against
        all claimants upon the bond. (8) Any purchaser of a fire protection
        sprinkler system having an unsatisfied final judgment against the
        licensee for the breach of a contract for the licensee to install,
        inspect, maintain, or service a fire protection sprinkler system may
        execute upon the security held by the director of fire protection by
        serving a certified copy of the unsatisfied final judgment by registered
        or certified mail upon the director within one year of the date of entry
        of such judgment. Upon the receipt of service of such certified copy the
        director shall pay or order paid from the deposit, through the registry
        of the court which rendered judgment, towards the amount of the
        unsatisfied judgment. The priority of payment by the director shall be
        the order of receipt by the director, but the director shall have no
        liability for payment in excess of the amount of the deposit.

        [ 1991 sp.s. c 6 s 1 .]
      - >-
        RCW 18.100.010 - Legislative intent.

        It is the legislative intent to provide for the incorporation of an
        individual or group of individuals to render the same professional
        service to the public for which such individuals are required by law to
        be licensed or to obtain other legal authorization.

        [ 1969 c 122 s 1 .]
  - source_sentence: >-
      Represent this sentence for searching relevant passages: washington RCW
      nonprofit canon law
    sentences:
      - >-
        RCW 43.21C.220 - Incorporation of city or town exempt from chapter.

        The incorporation of a city or town is exempted from compliance with
        this chapter.

        [ 1982 c 220 s 6 .]

        Severability  1982 c 220: See note following RCW 36.93.100 .

        Incorporation proceedings exempt from chapter: RCW 36.93.170 .
      - >-
        RCW 79A.05.085 - Lease of parklands for television stations—Lease rental
        rates, terms—Attachment of antennae.

        The commission shall determine the fair market value for television
        station leases based upon independent appraisals and existing leases for
        television stations shall be extended at said fair market rental for at
        least one period of not more than twenty years: PROVIDED, That the rates
        in said leases shall be renegotiated at five year intervals: PROVIDED
        FURTHER, That said stations shall permit the attachment of antennae of
        publicly operated broadcast and microwave stations where electronically
        practical to combine the towers: PROVIDED FURTHER, That notwithstanding
        any term to the contrary in any lease, this section shall not preclude
        the commission from prescribing new and reasonable lease terms relating
        to the modification, placement, or design of facilities operated by or
        for a station, and any extension of a lease granted under this section
        shall be subject to this proviso: PROVIDED FURTHER, That notwithstanding
        any other provision of law the director in his or her discretion may
        waive any requirement that any environmental impact statement or
        environmental assessment be submitted as to any lease negotiated and
        signed between January 1, 1974, and December 31, 1974.

        [ 2013 c 23 s 265 ; 1974 ex.s. c 151 s 1 . Formerly RCW 43.51.063 .]
      - >-
        RCW 24.03A.050 - Subordination to canon law.

        To the extent religious doctrine or canon law governing the internal
        affairs of a nonprofit corporation is inconsistent with this chapter,
        the religious doctrine or canon law controls to the extent required by
        the United States Constitution, the state Constitution, or both.

        [ 2021 c 176 s 1110 .]

        Effective date  2021 c 176: See note following RCW 24.03A.005 .
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
  - legal
  - law
  - WA
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - dense
  - loss:MultipleNegativesRankingLoss
model-index:
  - name: washington-state-law-embedding-model-Base
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: RCW Validation
          type: rcw-validation
        metrics:
          - type: accuracy_at_10
            value: 0.8441
            name: Accuracy@10
          - type: precision_at_10
            value: 0.0844
            name: Precision@10
          - type: recall_at_10
            value: 0.8441
            name: Recall@10
          - type: accuracy_at_1
            value: 0.0891
            name: Accuracy@1
          - type: accuracy_at_3
            value: 0.2595
            name: Accuracy@3
          - type: accuracy_at_5
            value: 0.4318
            name: Accuracy@5
          - type: ndcg_at_10
            value: 0.3876
            name: NDCG@10
          - type: mrr_at_10
            value: 0.2524
            name: MRR@10
          - type: map_at_100
            value: 0.2595
            name: MAP@100
datasets:
  - CSI-lab/RCW_2025_Positive_Query_Pairs

Washington-state-law-embedding-model-Base

Washington-state-law-embedding-model-Base is a sentence embedding model fine-tuned from BAAI/bge-base-en-v1.5 for legal information retrieval (IR) over Washington State statutes.

Generic embedding models often underperform on legal text because of the semantic gap between natural-language questions (e.g., "What dollar amount makes a theft a first degree felony?") and formal statutory language. This model is trained to narrow that gap, so that plain-English queries, legal scenarios, and document drafts can be mapped to the corresponding sections of the Revised Code of Washington (RCW).

Available Models

| Model | Language | Description | Query Prefix |
| --- | --- | --- | --- |
| CSI-lab/Washington-state-law-embedding-model-Large | English | Fine-tuned large model (1024d) for WA State RCWs. Best performance. | Represent this sentence for searching relevant passages: |
| CSI-lab/Washington-state-law-embedding-model-Base | English | Fine-tuned base model (768d) for WA State RCWs. Faster inference. | Represent this sentence for searching relevant passages: |

Model Overview

  • Base Model: BAAI/bge-base-en-v1.5
  • Task: Semantic Search / Information Retrieval / Legal Preemption Analysis
  • Language: English (Legal Domain)
  • Max Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
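Because the similarity function is cosine, one common storage pattern (an assumption about your vector store, not something the model requires) is to L2-normalize each 768-dimensional embedding once at indexing time; after that, a plain dot product equals cosine similarity, so an inner-product index can serve cosine search. A minimal stdlib-only sketch, with toy 3-dimensional vectors standing in for real model outputs:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (raises on the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity computed directly from its definition."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    return dot_ab / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dot(a, b):
    """Plain inner product, as used by inner-product vector indexes."""
    return sum(x * y for x, y in zip(a, b))

# Toy vectors standing in for 768-d embeddings of a query and a statute.
query = [0.2, 0.9, 0.1]
statute = [0.4, 0.8, 0.3]

# After normalization, the inner product reproduces cosine similarity.
print(cosine(query, statute), dot(l2_normalize(query), l2_normalize(statute)))
```

With sentence-transformers, `model.encode(texts, normalize_embeddings=True)` returns unit-length vectors directly, so the normalization step can happen at encode time.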

Key Features

  • Fine-tuned for Washington State legal domain (RCW)
  • Optimized for semantic search and retrieval tasks
  • Supports natural language legal queries
  • Designed for RAG-based legal assistants
  • Higher retrieval accuracy than the base BGE model (Recall@10 of 0.8441 vs. 0.5314 on the RCW validation set)

Intended Use Cases

This model is optimized to act as the retriever component in legal Retrieval-Augmented Generation (RAG) pipelines. Primary use cases include:

  1. Statutory Cross-Referencing: Mapping natural language legal questions to specific RCWs.
  2. Preemption Checking: Automatically retrieving state laws that may preempt or conflict with proposed municipal ordinances.
  3. Legal Research Automation: Clustering and searching local agency drafts against established state frameworks.
  4. AI Legal Assistants: Powering chatbots and research tools that require accurate retrieval of Washington State laws before generating an answer.
  5. Automated Compliance: Scanning contracts or external drafts against established state legislative frameworks.
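In each of these use cases the model's job is the same: turn a question (or an ordinance draft, for preemption checks) into a short ranked list of statutes that a generator or reviewer then reads. The sketch below shows that retriever step with the encoder passed in as a plain function, so it runs without downloading the model. `retrieve`, `build_context`, and the keyword-counting `toy_encode` are hypothetical names for illustration only; in practice `encode` would wrap `SentenceTransformer.encode` on this model.

```python
import math
from typing import Callable, List, Sequence, Tuple

QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def _unit(vec: List[float]) -> List[float]:
    """L2-normalize; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(question: str, statutes: Sequence[str],
             encode: Callable[[str], List[float]], k: int = 10) -> List[Tuple[float, str]]:
    """Rank statutes by cosine similarity to the question and keep the top k."""
    q = _unit(encode(QUERY_PREFIX + question))  # the prefix goes on the query only
    scored = []
    for text in statutes:
        d = _unit(encode(text))  # statutes are encoded without the prefix
        scored.append((sum(a * b for a, b in zip(q, d)), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_context(hits: Sequence[Tuple[float, str]]) -> str:
    """Number the retrieved statutes so a generator can cite them."""
    return "\n\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(hits, start=1))

# Hypothetical toy encoder for the demo: counts a few keywords per text.
VOCAB = ["theft", "dollars", "driving", "assault"]
def toy_encode(text: str) -> List[float]:
    words = [w.strip(".,:?") for w in text.lower().split()]
    return [float(sum(w == v for w in words)) for v in VOCAB]

statutes = [
    "RCW 9A.56.030: Theft in the first degree. A person is guilty of theft in the first degree if he or she commits theft of property or services which exceed(s) five thousand dollars in value.",
    "RCW 46.61.502: Driving under the influence. A person is guilty of driving while under the influence of intoxicating liquor...",
    "RCW 9A.36.011: Assault in the first degree. A person is guilty of assault in the first degree if he or she...",
]
hits = retrieve("What dollar amount makes a theft a first degree felony?", statutes, toy_encode, k=2)
print(build_context(hits))
```

Keeping the encoder injectable makes the ranking and prompt-assembly logic testable on its own; swapping in the real model changes only the `encode` argument.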

Technical Details & Training Methodology

The Semantic Gap

A general-purpose dense retriever has not been trained on statutory drafting conventions, so it may rank passages by surface wording rather than by the legal concept a query describes. To address this, Washington-state-law-embedding-model-Base was fine-tuned on a synthetic dataset covering varied query styles.

Training Data

The model was fine-tuned on synthetic legal query–passage pairs generated from Washington State RCW statutes.

The dataset includes:

  • Size: 455,424 training samples
  • Natural language paraphrases of legal questions
  • Hypothetical legal scenarios
  • Statute-grounded positive document matches

The dataset spans 500+ legal categories derived from RCW structure.
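The widget examples in this card's metadata suggest the pair layout: the query carries the BGE instruction prefix, and the positive is the statute text beginning with its citation and caption. A minimal sketch of assembling pairs in that layout, for example to extend the dataset; `make_pair` and the keys `anchor` and `positive` are hypothetical (sentence-transformers' trainer matches loss inputs by column order, so the names themselves are illustrative):

```python
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def make_pair(query: str, citation: str, caption: str, body: str) -> dict:
    """Build one (anchor, positive) record laid out like the widget examples.

    Hypothetical helper; the key names are for readability only.
    """
    return {
        "anchor": QUERY_PREFIX + query.strip(),                    # prefix on queries only
        "positive": f"{citation} - {caption}\n\n{body.strip()}",  # statutes carry no prefix
    }

pair = make_pair(
    query="RCW 87.80 form and contents of notice",
    citation="RCW 87.80.060",
    caption="Form and contents of notice.",
    body="The notice of the hearing on the petition shall state that a petition "
         "requesting the creation of a board of joint control ...",
)
print(pair["anchor"])
print(pair["positive"])
```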

Hyperparameters & Architecture

  • Loss Function: Multiple Negatives Ranking (MNR) Loss
  • Batch Size: 256
  • Epochs: 4
  • fp16: True
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
  • Learning Rate Decay: Linear
  • Infrastructure: High-Performance Computing (HPC) Cluster
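Multiple Negatives Ranking loss treats each query's paired statute as the positive and every other statute in the same batch as a negative. That is why a large batch (256 here) supplies many negatives per step, and why the `no_duplicates` sampler matters: a statute repeated within a batch would be scored as a negative for its own query. A stdlib-only sketch of the loss for one batch; the card does not state the similarity scale, so this assumes sentence-transformers' defaults (cosine similarity multiplied by 20):

```python
import math
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(queries: List[List[float]], positives: List[List[float]],
             scale: float = 20.0) -> float:
    """Mean cross-entropy in which query i's correct 'class' is positive i.

    Row i compares query i against every positive in the batch; the
    off-diagonal entries act as in-batch negatives.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

# Toy batches of two (query, statute) embedding pairs.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
# Same pair twice in one batch: even a perfect encoder cannot go below log(2).
duplicate = mnr_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
print(aligned, swapped, duplicate)
```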

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
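These settings imply the length of the optimization schedule. Assuming a single device (the list gives `per_device_train_batch_size: 256` and `gradient_accumulation_steps: 1` but not the number of GPUs) and ignoring any batches the `no_duplicates` sampler may shorten, one epoch is about ceil(455,424 / 256) optimizer steps, and with `warmup_steps: 0` the `linear` scheduler decays the learning rate from 5e-5 toward 0 over all four epochs. A small sketch of that arithmetic:

```python
import math

SAMPLES = 455_424   # training samples reported in this card
BATCH_SIZE = 256    # per_device_train_batch_size (assumes one device)
EPOCHS = 4
PEAK_LR = 5e-5
WARMUP_STEPS = 0

steps_per_epoch = math.ceil(SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step: int) -> float:
    """Linear warmup (none here) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - WARMUP_STEPS))

print(steps_per_epoch, total_steps)  # 1779 7116
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))  # 5e-05 2.5e-05 0.0
```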

Evaluation Metrics

The model was evaluated on a held-out validation set of synthetic municipal drafts, each mapped 1-to-1 to a Washington State RCW. Fine-tuning improved Recall@10 by 31.27 percentage points over the base model (0.5314 to 0.8441).

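| Metric | Base Model (bge-base-en-v1.5, not fine-tuned) | Fine-Tuned (Epoch 4) | Absolute Improvement (percentage points) |
| --- | --- | --- | --- |
| Recall@10 | 0.5314 | 0.8441 | +31.27 |
| Recall@5 | 0.2636 | 0.4318 | +16.82 |
| NDCG@10 | 0.2341 | 0.3876 | +15.35 |
| MRR@10 | 0.1462 | 0.2524 | +10.62 |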

Interpretation: On this validation set, the target RCW appeared among the top 10 results for 84.4% of queries. Because the queries are synthetic, accuracy on real user questions may differ.
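Because each validation query maps to exactly one RCW, these metrics reduce to functions of the rank at which that statute appears: Recall@k (equal to Accuracy@k here) is the fraction of queries whose statute ranks within the top k, MRR@10 averages 1/rank over those hits, and NDCG@10 averages 1/log2(rank + 1). The same fact explains why Precision@10 in the metadata (0.0844) is one tenth of Recall@10 (0.8441). A stdlib-only sketch, using made-up ranks rather than the real evaluation data, plus the percentage-point arithmetic behind the table:

```python
import math
from typing import List, Optional

def ir_metrics(ranks: List[Optional[int]], k: int = 10) -> dict:
    """Metrics for queries that each have exactly one relevant document.

    ranks[i] is the 1-based rank of query i's statute, or None if it was
    not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]
    return {
        f"recall@{k}": len(hits) / n,
        f"mrr@{k}": sum(1.0 / r for r in hits) / n,
        f"ndcg@{k}": sum(1.0 / math.log2(r + 1) for r in hits) / n,  # ideal DCG is 1
    }

# Made-up ranks for four queries (illustrative only).
m = ir_metrics([1, 3, 12, None])
print(m)

# Recall@10 values from the table above.
base, tuned = 0.5314, 0.8441
points = (tuned - base) * 100            # absolute gain in percentage points
relative = (tuned - base) / base * 100   # relative gain in percent
print(f"+{points:.2f} pp absolute, +{relative:.1f}% relative")  # +31.27 pp absolute, +58.8% relative
```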

Limitations

  • This model does not provide legal advice.
  • Performance is limited to Washington State law (RCW) and may not generalize to other jurisdictions.
  • Outputs depend on the quality of the underlying document corpus.
  • Should be used as a retrieval tool, not a final decision-making system.

Usage Examples

Semantic Search with sentence-transformers

Warning: This model was fine-tuned with the BGE query instruction, so you must prepend the prefix
"Represent this sentence for searching relevant passages: "
to every search query for best results.

Do not add this prefix to the database documents.

from sentence_transformers import SentenceTransformer, util

# 1. Load the fine-tuned model
model = SentenceTransformer('CSI-lab/Washington-state-law-embedding-model-Base')

# 2. Define the laws (Your Vector Database)
laws = [
    "RCW 9A.56.030: Theft in the first degree. A person is guilty of theft in the first degree if he or she commits theft of property or services which exceed(s) five thousand dollars in value.",
    "RCW 46.61.502: Driving under the influence. A person is guilty of driving while under the influence of intoxicating liquor...",
    "RCW 9A.36.011: Assault in the first degree. A person is guilty of assault in the first degree if he or she..."
]

# 3. Define the user's search query
user_query = "What dollar amount makes a theft a first degree felony?"

# 4. CRITICAL: Add the required BGE prefix to the query ONLY
query_prefix = "Represent this sentence for searching relevant passages: "
formatted_query = query_prefix + user_query

# 5. Encode the documents and the query
law_embeddings = model.encode(laws, convert_to_tensor=True)
query_embedding = model.encode(formatted_query, convert_to_tensor=True)

# 6. Calculate Cosine Similarity
cosine_scores = util.cos_sim(query_embedding, law_embeddings)

# 7. Print the top result
best_idx = cosine_scores.argmax().item()
print(f"Top Match: {laws[best_idx]}")
print(f"Similarity Score: {cosine_scores[0][best_idx]:.4f}")
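The example above prints only the single best match. Given the metadata figures (Accuracy@1 of 0.0891 versus Accuracy@10 of 0.8441), it is usually safer to hand several candidates to a reviewer or a RAG generator. A small helper for that, written over plain lists so it can be tested without the model; `top_k` is a hypothetical name, the scores below are invented, and with the tensors from the example you could call `top_k(cosine_scores[0].tolist(), laws, k=10)`:

```python
import heapq
from typing import List, Sequence, Tuple

def top_k(scores: Sequence[float], docs: Sequence[str], k: int = 10) -> List[Tuple[float, str]]:
    """Return up to k (score, document) pairs, highest score first."""
    if len(scores) != len(docs):
        raise ValueError("scores and docs must be the same length")
    return heapq.nlargest(k, zip(scores, docs), key=lambda pair: pair[0])

# Invented scores for three statutes, for illustration only.
results = top_k([0.41, 0.87, 0.55], ["RCW 46.61.502", "RCW 9A.56.030", "RCW 9A.36.011"], k=2)
for rank, (score, doc) in enumerate(results, start=1):
    print(f"{rank}. {doc} ({score:.4f})")
```

For larger corpora, `sentence_transformers.util.semantic_search` performs the same top-k selection directly on embedding tensors.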

Model Citation

@misc{washington_state_law_embedding_base_2026,
  title={Washington-state-law-embedding-model-Base: Fine-Tuned Dense Retrieval for Washington State Law},
  author={Tomar, Shlok},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/CSI-lab/Washington-state-law-embedding-model-Base}},
  note={Hugging Face Model Repository}
}

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}