license: mit
language:
- en
metrics:
- accuracy
- recall
base_model:
- BAAI/bge-base-en-v1.5
widget:
- source_sentence: 'Represent this sentence for searching relevant passages: RCW 36.75.190'
sentences:
- >-
RCW 36.75.190 - Engineer's report—Hearing—Order.
Upon report by the examining engineer for the erection and construction
upon any county road, or for acquisition by purchase, gift or
condemnation of any bridge, trestle, or any other structure crossing any
stream, body of water, gulch, navigable water, swamp or other
topographical formation, which constitutes a boundary, publication shall
be made and joint hearing had upon such report in the same manner and
upon the same procedure as in the case of resolution or petition for the
laying out and establishing of county roads. If upon the hearing the
governing authorities jointly order the erection and construction or
acquisition of such bridge, trestle, or other structure, they may
jointly acquire land necessary therefor by purchase, gift, or
condemnation in the manner as provided for acquiring land for county
roads, and shall advertise calls for bids, require contractor's deposit
and bond, award contracts, and supervise construction as by law provided
and in the same manner as required in the case of the construction of
county roads. Any such bridges, trestles or other structures may be
operated free, or may be operated as toll bridges, trestles, or other
structures under the provisions of the laws of this state relating
thereto.
[ 1963 c 4 s 36.75.190 . Prior: 1937 c 187 s 29 ; RRS s 6450-29.]
- >-
RCW 28B.30.285 - State treasurer receiving agent of certain federal
aid—Trust funds not subject to appropriation.
All federal grants received by the state treasurer pursuant to RCW
28B.30.270 shall be deemed trust funds under the control of the state
treasurer and not subject to appropriation by the legislature.
[ 1969 ex.s. c 223 s 28B.30.285 . Prior: 1955 c 66 s 4 . Formerly RCW
28.80.224 .]
- >-
RCW 48.09.160 - Directors—Disqualification.
No individual shall be a director of a domestic mutual insurer by reason
of his or her holding public office. Adjudication as a bankrupt or
taking the benefit of any insolvency law or making a general assignment
for the benefit of creditors disqualifies an individual from being or
acting as a director.
[ 2009 c 549 s 7037 ; 1947 c 79 s .09.16; Rem. Supp. 1947 s 45.09.16.]
- source_sentence: >-
Represent this sentence for searching relevant passages: RCW disclosure
suspect identity civil redress
sentences:
- >-
RCW 49.60.525 - Review of existing recorded covenants and deed
restrictions to identify documents that include racial or other unlawful
restrictions on property ownership.(Expires July 1, 2027.)
(1) Subject to the availability of amounts appropriated for this
specific purpose, the University of Washington and Eastern Washington
University shall review existing recorded covenants and deed
restrictions to identify those recorded documents that include racial or
other restrictions on property ownership or use against protected
classes that are unlawful under RCW 49.60.224 . For properties subject
to such racial and other unlawful restrictions, the universities shall
provide notice to the property owner and to the county auditor of the
county in which the property is located. The universities shall provide
information to the property owner on how such provisions can be struck
pursuant to RCW 49.60.227 . The universities may contract with other
public and private not-for-profit higher education institutions that are
regionally accredited to carry out the review and notification
requirements of this section. (2) This section expires July 1, 2027.
[ 2021 c 256 s 2 .]
Findings — Intent — 2021 c 256: "The legislature finds that the
existence of racial, religious, or ethnic-based property restrictions or
covenants on a deed or chain of title for real property is like having a
monument to racism on that property and is repugnant to the tenets of
equality. Furthermore, such restrictions and covenants may cause mental
anguish and tarnish a property owner's sense of ownership in the
property because the owner feels as though they have participated in a
racist act themselves. It is the intent of the legislature that the
owner, occupant, or tenant or homeowners' association board of the
property which is subject to an unlawful deed restriction or covenant
pursuant to RCW 49.60.224 is entitled to have discriminatory covenants
and restrictions that are contrary to public policy struck from their
chain of title. The legislature has presented two ways this can be
accomplished through RCW 49.60.227 (1) (a) and (b). If the owner,
occupant, or tenant or homeowners' association board of the property
elects to pursue a judicial remedy, the legislature intends that the
court issue a declaratory judgment ordering the county auditor, or in
charter counties the county official charged with the responsibility for
recording instruments in the county records, to entirely strike the
racist or otherwise discriminatory covenants from the chain of title.
Striking the language does not prevent preservation of the original
record, outside of the chain of title, for historical or archival
purposes. The legislature finds that striking racist, religious, and
ethnic restrictions or covenants from the chain of title is no different
than having an offensive statutory monument which the owner may entirely
remove. So too should the owner be able to entirely remove the offensive
written monument to racism or other unconstitutional discrimination." [
2021 c 256 s 1 .]
Application — 2021 c 256: "This act applies to real estate transactions
entered into on or after January 1, 2022." [ 2021 c 256 s 5 .]
- >-
RCW 10.97.070 - Disclosure of suspect's identity to victim.
(1) Criminal justice agencies may, in their discretion, disclose to
persons who have suffered physical loss, property damage, or injury
compensable through civil action, the identity of persons suspected as
being responsible for such loss, damage, or injury together with such
information as the agency reasonably believes may be of assistance to
the victim in obtaining civil redress. Such disclosure may be made
without regard to whether the suspected offender is an adult or a
juvenile, whether charges have or have not been filed, or a prosecuting
authority has declined to file a charge or a charge has been dismissed.
(2) Unless the agency determines release would interfere with an ongoing
criminal investigation, in any action brought pursuant to this chapter,
criminal justice agencies shall disclose identifying information,
including photographs of suspects, if the acts are alleged by the
plaintiff or victim to be a violation of RCW 9A.50.020 . (3) The
disclosure by a criminal justice agency of investigative information
pursuant to subsection (1) of this section shall not establish a duty to
disclose any additional information concerning the same incident or make
any subsequent disclosure of investigative information, except to the
extent an additional disclosure is compelled by legal process.
[ 1993 c 128 s 10 ; 1977 ex.s. c 314 s 7 .]
Effective date — 1993 c 128: See RCW 9A.50.902 .
- >-
RCW 65.16.110 - Affidavit to cover payment of fees.
The affidavit of publication of all notices required by law to be
published shall state the full amount of the fee charged for such
publication and that the fee has been paid in full.
[ 1921 c 99 s 7 ; RRS s 253-7.]
- source_sentence: >-
Represent this sentence for searching relevant passages: RCW 87.80 form
and contents of notice
sentences:
- >-
RCW 36.32.270 - Competitive bids—Exemptions.
The county legislative authority may waive the competitive bidding
requirements of this chapter pursuant to RCW 39.04.280 if an exemption
contained within that section applies to the purchase or public work.
[ 1998 c 278 s 4 ; 1963 c 4 s 36.32.270 . Prior: 1961 c 169 s 3 ; 1945 c
61 s 4 ; Rem. Supp. 1945 s 10322-18.]
- >-
RCW 87.80.060 - Form and contents of notice.
The notice of the hearing on the petition shall state that a petition
requesting the creation of a board of joint control to administer the
facilities and activities, naming them if named in the petition, has
been filed with the board of county commissioners of the county, naming
the county; that the board of joint control, if it is created, will have
authority to provide for apportionment of costs to carry out the objects
of its creation among the member irrigation entities (naming them);
shall state the day, hour, and place of the hearing on the petition;
shall state that any person interested in the creation of the board of
joint control may appear on or before the day of hearing on the
petition, and show cause in writing, if any, why the same should not be
granted, and the notice shall be over the name of the clerk of the board
of county commissioners.
[ 1996 c 320 s 6 ; 1949 c 56 s 6 ; Rem. Supp. 1949 s 7505-25.]
- >-
RCW 18.88B.090 - Reinstatement of certification.
(1) A certificate that has been expired for five years or less may be
reinstated if the person holding the expired certificate: (a) Completes
an abbreviated application form; (b) Pays any necessary fees, including
the current certification fee, late renewal fees, and expired credential
reissuance fees, unless exempt pursuant to *RCW 18.88B.091 ; (c)
Provides a written declaration that no action has been taken by a state
or federal jurisdiction or hospital which would prevent or restrict the
person holding the expired certificate from practicing as a home care
aide; (d) Provides a written declaration that the person holding the
expired certificate has not voluntarily given up any credential or
privilege or has not been restricted from practicing as a home care aide
in lieu of or to avoid formal action; and (e) Submits to a state and
federal background check as required by RCW 74.39A.056 , if the
certificate has been expired for more than one year. (2) In addition to
meeting the requirements of subsection (1) of this section, a
certificate that has been expired for more than five years may be
reinstated if the person holding the expired certificate demonstrates
competence to the standards established by the secretary and meets other
requirements established by the secretary.
[ 2023 c 424 s 3 .]
*Reviser's note: RCW 18.88B.091 expired July 1, 2025.
- source_sentence: 'Represent this sentence for searching relevant passages: RCW 48.30A.055'
sentences:
- >-
RCW 48.30A.055 - Insurance antifraud
plan—Review—Disapproval—Notice—Audit to ensure compliance.
If after review of an insurer's antifraud plan, the commissioner finds
that the plan does not comply with RCW 48.30A.050 , the commissioner may
disapprove the antifraud plan. Notice of disapproval must include a
statement of the specific reasons for disapproval. The insurer shall
refile a plan disapproved by the commissioner within sixty days of the
date of the notice of disapproval. The commissioner may audit insurers
to ensure compliance with antifraud plans.
[ 1995 c 285 s 11 .]
- >-
RCW 18.160.090 - Surety bond—Security deposit—Venue and time limit for
actions upon bonds—Limit of liability of surety—Payment of claims.
(1) Before granting a license under this chapter, the director of fire
protection shall require that the applicant file with the state director
of fire protection a surety bond issued by a surety insurer who meets
the requirements of chapter 48.28 RCW in a form acceptable to the
director of fire protection running to the state of Washington in the
penal sum of ten thousand dollars. However, the surety bond for a fire
protection sprinkler system contractor whose business is restricted
solely to NFPA 13-D or NFPA 13-R systems shall be in the penal sum of
six thousand dollars. The bond shall be conditioned that the applicant
will pay all purchasers of fire protection sprinkler systems with whom
the applicant has a contract for the applicant to install, inspect,
maintain, or service a fire protection sprinkler system, and who have
obtained a judgment against the applicant for the breach of such a
contract. The term "purchaser" means an owner of property who has
entered into a contract for the installation of a fire protection
sprinkler system on that property, or a contractor who contracts to
install, inspect, maintain, or service such a system with an owner of
property and subcontracts the work to the applicant. No other person,
including, but not limited to, persons who supply labor, materials, or
rental equipment to the applicant, shall have any rights against the
bond. (2) In lieu of the surety bond required by this section the
applicant may file with the director of fire protection a deposit
consisting of cash or other security acceptable to the director of fire
protection in an amount equal to the penal sum of the required bond. The
director of fire protection may adopt rules necessary for the proper
administration of the security. (3) Before granting renewal of a fire
protection sprinkler system contractor's license to any applicant, the
director of fire protection shall require that the applicant file with
the director satisfactory evidence that the surety bond or cash deposit
is in full force. (4) Any purchaser of a fire protection sprinkler
system having a claim against the licensee for the breach of a contract
for the licensee to install, inspect, maintain, or service a fire
protection sprinkler system may bring suit upon such bond in superior
court of the county in which the work was done or of any county in which
jurisdiction of the licensee may be had. Any such action must be brought
not later than one year after the expiration of the licensee's license
or renewal license then in effect at the time of the alleged breach of
contract. (5) The bond shall be considered one continuous obligation,
and the surety upon the bond shall not be liable in aggregate or
cumulative amount exceeding ten thousand dollars, or six thousand
dollars if the bond was issued to a licensee whose business is
restricted solely to NFPA 13-D or NFPA 13-R systems, regardless of the
number of years the bond is in effect, or whether it is reinstated,
renewed, reissued, or otherwise continued, and regardless of the year in
which any claim accrued. The bond shall not be liable for any liability
of the licensee for tortious acts, whether or not such liability is
imposed by statute or common law, or is imposed by contract. The bond
shall not be a substitute or supplemental to any liability or other
insurance required by law or by the contract. (6) If the surety desires
to make payment without awaiting court action against it, the amount of
the bond shall be reduced to the extent of any payment made by the
surety in good faith under the bond. Any payment shall be based on final
judgments received by the surety. (7) Claims against the bond shall be
satisfied from the bond in the following order: (a) Claims by a
purchaser of a fire protection sprinkler system for the breach of a
contract for the licensee to install, inspect, maintain, or service a
fire protection sprinkler system; (b) Any court costs, interest, and
attorneys' fees the plaintiff may be entitled to recover by contract,
statute, or court rule. A condition precedent to the surety being liable
to any claimant is a final judgment against the licensee, unless the
surety desires to make payment without awaiting court action. In the
event of a dispute regarding the apportionment of the bond proceeds
among claimants, the surety may bring an action for interpleader against
all claimants upon the bond. (8) Any purchaser of a fire protection
sprinkler system having an unsatisfied final judgment against the
licensee for the breach of a contract for the licensee to install,
inspect, maintain, or service a fire protection sprinkler system may
execute upon the security held by the director of fire protection by
serving a certified copy of the unsatisfied final judgment by registered
or certified mail upon the director within one year of the date of entry
of such judgment. Upon the receipt of service of such certified copy the
director shall pay or order paid from the deposit, through the registry
of the court which rendered judgment, towards the amount of the
unsatisfied judgment. The priority of payment by the director shall be
the order of receipt by the director, but the director shall have no
liability for payment in excess of the amount of the deposit.
[ 1991 sp.s. c 6 s 1 .]
- >-
RCW 18.100.010 - Legislative intent.
It is the legislative intent to provide for the incorporation of an
individual or group of individuals to render the same professional
service to the public for which such individuals are required by law to
be licensed or to obtain other legal authorization.
[ 1969 c 122 s 1 .]
- source_sentence: >-
Represent this sentence for searching relevant passages: washington RCW
nonprofit canon law
sentences:
- >-
RCW 43.21C.220 - Incorporation of city or town exempt from chapter.
The incorporation of a city or town is exempted from compliance with
this chapter.
[ 1982 c 220 s 6 .]
Severability — 1982 c 220: See note following RCW 36.93.100 .
Incorporation proceedings exempt from chapter: RCW 36.93.170 .
- >-
RCW 79A.05.085 - Lease of parklands for television stations—Lease rental
rates, terms—Attachment of antennae.
The commission shall determine the fair market value for television
station leases based upon independent appraisals and existing leases for
television stations shall be extended at said fair market rental for at
least one period of not more than twenty years: PROVIDED, That the rates
in said leases shall be renegotiated at five year intervals: PROVIDED
FURTHER, That said stations shall permit the attachment of antennae of
publicly operated broadcast and microwave stations where electronically
practical to combine the towers: PROVIDED FURTHER, That notwithstanding
any term to the contrary in any lease, this section shall not preclude
the commission from prescribing new and reasonable lease terms relating
to the modification, placement, or design of facilities operated by or
for a station, and any extension of a lease granted under this section
shall be subject to this proviso: PROVIDED FURTHER, That notwithstanding
any other provision of law the director in his or her discretion may
waive any requirement that any environmental impact statement or
environmental assessment be submitted as to any lease negotiated and
signed between January 1, 1974, and December 31, 1974.
[ 2013 c 23 s 265 ; 1974 ex.s. c 151 s 1 . Formerly RCW 43.51.063 .]
- >-
RCW 24.03A.050 - Subordination to canon law.
To the extent religious doctrine or canon law governing the internal
affairs of a nonprofit corporation is inconsistent with this chapter,
the religious doctrine or canon law controls to the extent required by
the United States Constitution, the state Constitution, or both.
[ 2021 c 176 s 1110 .]
Effective date — 2021 c 176: See note following RCW 24.03A.005 .
pipeline_tag: sentence-similarity
library_name: sentence-transformers
tags:
- legal
- law
- WA
- sentence-transformers
- feature-extraction
- sentence-similarity
- dense
- loss:MultipleNegativesRankingLoss
model-index:
- name: washington-state-law-embedding-model-Base
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: RCW Validation
type: rcw-validation
metrics:
- type: accuracy_at_10
value: 0.8441
name: Accuracy@10
- type: precision_at_10
value: 0.0844
name: Precision@10
- type: recall_at_10
value: 0.8441
name: Recall@10
- type: accuracy_at_1
value: 0.0891
name: Accuracy@1
- type: accuracy_at_3
value: 0.2595
name: Accuracy@3
- type: accuracy_at_5
value: 0.4318
name: Accuracy@5
- type: ndcg_at_10
value: 0.3876
name: NDCG@10
- type: mrr_at_10
value: 0.2524
name: MRR@10
- type: map_at_100
value: 0.2595
name: MAP@100
datasets:
- CSI-lab/RCW_2025_Positive_Query_Pairs
Washington-state-law-embedding-model-Base
Washington-state-law-embedding-model-Base is an embedding model fine-tuned for legal information retrieval (IR) over Washington State law.
Generic embedding models often perform suboptimally on legal texts due to the semantic gap between natural language questions (e.g., "What dollar amount makes a theft a first degree felony?") and formal statutory legalese. This model bridges that gap, allowing plain-English queries, legal scenarios, and document drafts to be accurately mapped to their corresponding Washington State statutes (Revised Code of Washington - RCW).
Available Models
| Model | Language | Description | Query Prefix |
|---|---|---|---|
| CSI-lab/Washington-state-law-embedding-model-Large | English | Fine-tuned large model (1024d) for WA State RCWs. Best performance. | Represent this sentence for searching relevant passages: |
| CSI-lab/Washington-state-law-embedding-model-Base | English | Fine-tuned base model (768d) for WA State RCWs. Faster inference. | Represent this sentence for searching relevant passages: |
Model Overview
- Base Model: BAAI/bge-base-en-v1.5
- Task: Semantic Search / Information Retrieval / Legal Preemption Analysis
- Language: English (Legal Domain)
- Max Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Key Features
- Fine-tuned for Washington State legal domain (RCW)
- Optimized for semantic search and retrieval tasks
- Supports natural language legal queries
- Designed for RAG-based legal assistants
- Improved retrieval accuracy over base BGE embeddings
Intended Use Cases
This model is optimized to act as the retriever component in legal Retrieval-Augmented Generation (RAG) pipelines. Primary use cases include:
- Statutory Cross-Referencing: Mapping natural language legal questions to specific RCWs.
- Preemption Checking: Automatically retrieving state laws that may preempt or conflict with proposed municipal ordinances.
- Legal Research Automation: Clustering and searching local agency drafts against established state frameworks.
- AI Legal Assistants: Powering chatbots and research tools that require accurate retrieval of Washington State laws before generating an answer.
- Automated Compliance: Scanning contracts or external drafts against established state legislative frameworks.
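The retriever role described above comes down to ranking stored statute embeddings by cosine similarity to a query embedding and keeping the top k. The following is a minimal sketch of that ranking step only, written in plain Python so it runs without the model. The 4-dimensional vectors are hypothetical stand-ins (the real model outputs 768 dimensions), and `cosine` and `top_k` are illustrative helper names, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=10):
    """Return up to k (doc_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings standing in for three statutes.
doc_vecs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
]
# Hypothetical query embedding.
query_vec = [0.9, 0.1, 0.0, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
print([i for i, _ in results])  # -> [0, 2]
```

In a RAG pipeline the returned indices would be used to look up the statute text that is passed to the generator. A production system would typically replace the linear scan with a vector index.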
Technical Details & Training Methodology
The Semantic Gap
A general-purpose dense retriever often fails on legal tasks because plain-English questions and statutory text share little vocabulary, and the model has not learned the conceptual mapping between them. To address this, Washington-state-law-embedding-model was fine-tuned using a synthetic, high-variance dataset.
Training Data
The model was fine-tuned on synthetic legal query–passage pairs generated from Washington State RCW statutes.
The dataset includes:
- Size: 455,424 training samples
- Natural language paraphrases of legal questions
- Hypothetical legal scenarios
- Statute-grounded positive document matches
The dataset spans 500+ legal categories derived from RCW structure.
Hyperparameters & Architecture
- Loss Function: Multiple Negatives Ranking (MNR) Loss
- Batch Size: 256
- Epochs: 4
- fp16: True
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: round_robin
- Learning Rate Decay: Linear
- Infrastructure: High-Performance Computing (HPC) Cluster
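To make the loss function listed above concrete, here is a minimal pure-Python sketch of Multiple Negatives Ranking Loss. For each query in a batch, its paired passage is the positive and every other passage in the batch acts as a negative; the loss is the cross-entropy of picking the positive from the scaled cosine similarities. The `scale=20.0` value and the helper names are assumptions for illustration, not values taken from this card, and this is not the training code used for the model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i.

    scale is an assumed temperature-like multiplier on the similarities.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, p) for p in passage_vecs]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching passage.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Hypothetical 2-dimensional embeddings for a batch of two pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own passage
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other passage

print(mnr_loss(queries, aligned) < mnr_loss(queries, swapped))  # -> True
```

Because every other passage in the batch is treated as a negative, a batch that contains the same passage twice would penalize a correct match. That is the likely motivation for the `no_duplicates` batch sampler listed above, and a larger batch size (256 here) gives each query more negatives per step.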
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
Evaluation Metrics
The model was evaluated on a held-out validation set of synthetic municipal drafts mapped 1-to-1 against Washington State RCWs. Fine-tuning raised Recall@10 by 31.27 percentage points over the base model (from 0.5314 to 0.8441).
| Metric | Base Model (Not Fine-Tuned) | Fine-Tuned (Epoch 4) | Absolute Improvement (percentage points) |
|---|---|---|---|
| Recall@10 | 0.5314 | 0.8441 | +31.27 |
| Recall@5 | 0.2636 | 0.4318 | +16.82 |
| NDCG@10 | 0.2341 | 0.3876 | +15.35 |
| MRR@10 | 0.1462 | 0.2524 | +10.62 |
Interpretation: On this validation set, the exact governing state law appears in the top 10 search results for 84.4% of queries.
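Because each validation query maps to exactly one statute, the reported metrics can all be derived from the rank at which that statute is retrieved. The sketch below assumes the standard definitions of Recall@k, Precision@k, MRR@k, and NDCG@k for the single-relevant-document case; the function name and the example ranks are hypothetical, and this is not the evaluation script used to produce the table.

```python
import math

def metrics_at_k(ranks, k=10):
    """Compute retrieval metrics when each query has one relevant document.

    ranks holds the 1-based rank of the relevant document for each query,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]
    return {
        # Fraction of queries whose statute is in the top k.
        "recall": len(hits) / n,
        # At most one of the k returned results can be relevant.
        "precision": len(hits) / (n * k),
        # Reciprocal rank, counted as 0 when the statute is outside the top k.
        "mrr": sum(1.0 / r for r in hits) / n,
        # The ideal DCG is 1 (relevant document at rank 1), so NDCG equals DCG.
        "ndcg": sum(1.0 / math.log2(r + 1) for r in hits) / n,
    }

# Hypothetical ranks for four queries: two hits, one at rank 12, one miss.
example = metrics_at_k([1, 3, 12, None], k=10)
print(example)
```

Under these definitions Precision@10 equals Recall@10 divided by 10, which is consistent with the 0.0844 and 0.8441 values reported in the card's metadata.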
Limitations
- This model does not provide legal advice.
- Performance is limited to Washington State law (RCW) and may not generalize to other jurisdictions.
- Outputs depend on the quality of the underlying document corpus.
- Should be used as a retrieval tool, not a final decision-making system.
Usage Examples
Semantic Search with sentence-transformers
Warning: Because this model is built on BGE (BAAI/bge-base-en-v1.5), you must prepend the instruction prefix "Represent this sentence for searching relevant passages: " to your search queries to achieve optimal performance.
Do not add this prefix to the database documents.
import torch
from sentence_transformers import SentenceTransformer, util
# 1. Load the fine-tuned model
model = SentenceTransformer('CSI-lab/Washington-state-law-embedding-model-Base')
# 2. Define the laws (Your Vector Database)
laws = [
"RCW 9A.56.030: Theft in the first degree. A person is guilty of theft in the first degree if he or she commits theft of property or services which exceed(s) five thousand dollars in value.",
"RCW 46.61.502: Driving under the influence. A person is guilty of driving while under the influence of intoxicating liquor...",
"RCW 9A.36.011: Assault in the first degree. A person is guilty of assault in the first degree if he or she..."
]
# 3. Define the user's search query
user_query = "What dollar amount makes a theft a first degree felony?"
# 4. CRITICAL: Add the required BGE prefix to the query ONLY
query_prefix = "Represent this sentence for searching relevant passages: "
formatted_query = query_prefix + user_query
# 5. Encode the documents and the query
law_embeddings = model.encode(laws, convert_to_tensor=True)
query_embedding = model.encode(formatted_query, convert_to_tensor=True)
# 6. Calculate Cosine Similarity
cosine_scores = util.cos_sim(query_embedding, law_embeddings)
# 7. Print the top result
best_idx = cosine_scores.argmax().item()
print(f"Top Match: {laws[best_idx]}")
print(f"Similarity Score: {cosine_scores[0][best_idx]:.4f}")
Model Citation
@misc{washington_state_law_embedding_base_2026,
title={Washington-state-law-embedding-model-Base: Fine-Tuned Dense Retrieval for Washington State Law},
author={Tomar, Shlok},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/CSI-lab/Washington-state-law-embedding-model-Base}},
note={Hugging Face Model Repository}
}
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}