# Symio-ai/legal-precedent-matcher

## Model Description
Legal Precedent Matcher identifies cases with similar factual patterns, legal issues, and procedural postures. Given a case summary or fact pattern, it retrieves the most analogous precedents from the vector store, scoring them by factual similarity, legal issue overlap, and jurisdictional relevance.
It enables the GLACIER pipeline to surface controlling and persuasive authority that matches the user's case facts.
## Intended Use
- Primary: Match current case facts against known precedents for authority identification
- Secondary: Find distinguishable cases for anticipating opposing counsel arguments
- Integration: Powers precedent retrieval in GLACIER Stages 2, 3, and 5
## Task Type
sentence-similarity -- Bi-encoder embedding model producing similarity scores for case-to-case matching
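At its core, a bi-encoder embeds the query and each candidate case independently, then compares the vectors with cosine similarity. A minimal sketch of that comparison, using toy 4-dimensional vectors in place of the real 1024-dimensional Voyage Law-2 embeddings (case names and values are illustrative only):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for 1024-d Voyage Law-2 vectors.
query_emb = np.array([0.1, 0.9, 0.2, 0.0])
case_embs = {
    "smith_v_jones": np.array([0.1, 0.8, 0.3, 0.1]),
    "doe_v_roe":     np.array([0.9, 0.1, 0.0, 0.2]),
}

# Score every candidate against the query and take the best match.
scores = {cid: cosine_similarity(query_emb, e) for cid, e in case_embs.items()}
best = max(scores, key=scores.get)
```

Because queries and candidates are encoded separately, candidate embeddings can be computed once and cached, which is what makes batch throughput targets like 500 docs/sec feasible.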
## Base Model
voyage-ai/voyage-law-2 -- Purpose-built legal embedding model, deployed via AWS SageMaker (Voyage Law-2 marketplace subscription active)
## Training Data
| Source | Records | Description |
|---|---|---|
| CourtListener Paired Cases | ~2M pairs | Cases citing each other as analogous or distinguishable |
| Legal Brief Citations | ~500K pairs | Brief-to-cited-case pairs showing attorney judgment of relevance |
| Expert Fact Pattern Matches | ~100K pairs | Attorney-curated "similar cases" datasets |
| Negative Pairs | ~1M | Same practice area but factually dissimilar cases |
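The four sources above reduce to a common contrastive-pair shape: an anchor text, a candidate text, and a binary analogous/dissimilar label. A hypothetical record layout (field names are illustrative, not the actual dataset schema):

```python
# Hypothetical pair records for contrastive training data.
positive_pair = {
    "anchor": "Plaintiff slipped on an unmarked wet floor in a grocery store...",
    "match":  "Customer fell on an unmarked spill; store had constructive notice...",
    "label":  1,   # 1 = analogous (positive pair)
    "source": "courtlistener_paired",
}
negative_pair = {
    "anchor": "Plaintiff slipped on an unmarked wet floor in a grocery store...",
    "match":  "Product liability claim over a defective ladder design...",
    "label":  0,   # 0 = same practice area, factually dissimilar (hard negative)
    "source": "hard_negatives",
}

def is_valid_pair(rec: dict) -> bool:
    """Check a record has the expected fields and a binary label."""
    return {"anchor", "match", "label", "source"} <= rec.keys() and rec["label"] in (0, 1)
```

The ~1M same-practice-area negatives act as hard negatives: they force the model to separate topical similarity from genuine factual analogy.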
## Matching Dimensions
- Factual similarity: How closely do the facts align?
- Legal issue overlap: Are the same causes of action or defenses involved?
- Procedural posture: Same stage of litigation?
- Jurisdictional relevance: Binding vs. persuasive authority
- Temporal proximity: Recency weighting (newer = higher weight)
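One way to combine these dimensions into a single ranking score is a weighted sum with an exponential recency decay. The weights and half-life below are hand-set for illustration; the model card does not specify how the real system weights dimensions:

```python
from datetime import date

# Illustrative weights only; the production weighting is not specified here.
WEIGHTS = {"factual": 0.35, "issue": 0.25, "posture": 0.10,
           "jurisdiction": 0.20, "recency": 0.10}

def recency_score(decision_date: date, today: date,
                  half_life_years: float = 10.0) -> float:
    """Exponential decay: a case half_life_years old scores 0.5."""
    age = (today - decision_date).days / 365.25
    return 0.5 ** (age / half_life_years)

def combined_score(dims: dict) -> float:
    """Weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * dims[k] for k in WEIGHTS)

dims = {
    "factual": 0.92,        # close factual alignment
    "issue": 0.85,          # overlapping causes of action
    "posture": 1.0,         # same litigation stage
    "jurisdiction": 1.0,    # binding authority
    "recency": recency_score(date(2020, 1, 15), date(2026, 4, 10)),
}
score = combined_score(dims)
```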
## Benchmark Criteria (90%+ Target)
| Metric | Target | Description |
|---|---|---|
| Recall@20 | >= 90% | Relevant precedents in top-20 results |
| MAP | >= 0.85 | Mean average precision across test queries |
| Factual Similarity Correlation | >= 0.87 | Agreement with expert fact-pattern ratings |
| Cross-jurisdiction accuracy | >= 82% | Finding persuasive authority across jurisdictions |
| Embedding throughput | >= 500 docs/sec | Batch encoding speed |
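Recall@20 and MAP are standard retrieval metrics and can be computed directly from a ranked result list plus a relevance set. A self-contained sketch with toy data:

```python
def recall_at_k(retrieved: list, relevant: set, k: int = 20) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def average_precision(retrieved: list, relevant: set) -> float:
    """AP for one query: precision at each relevant hit,
    averaged over the total number of relevant items.
    MAP is the mean of this value across all test queries."""
    hits, ap = 0, 0.0
    for i, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            ap += hits / i
    return ap / len(relevant)

# Toy example: relevant cases found at ranks 1, 3, and 5.
retrieved = ["c1", "c9", "c2", "c7", "c3"]
relevant = {"c1", "c2", "c3"}
r20 = recall_at_k(retrieved, relevant, k=20)
ap = average_precision(retrieved, relevant)
```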
## GLACIER Pipeline Integration
- Stage 2 (Research): precedent-matcher retrieves analogous cases from the bedrock-legal cache
- Stage 3 (WDC #1): matched precedents validate legal theory strength
- Stage 5 (WDC #2): verify that cited precedents are truly analogous, not merely topical
Integration: Embeddings stored in bedrock-legal SQLite vector store. Retrieved via cosine similarity with Voyage Law-2 embeddings, then reranked by legal-research-ranker.
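The retrieval path can be sketched as a brute-force cosine scan over embeddings stored as BLOBs in SQLite. This is a minimal illustration only: the actual bedrock-legal schema, table names, and indexing strategy are not documented here, and the toy 4-d vectors stand in for 1024-d Voyage Law-2 embeddings:

```python
import sqlite3
import numpy as np

# Hypothetical minimal schema; the real bedrock-legal layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE precedents (case_id TEXT PRIMARY KEY, embedding BLOB)")

def to_blob(v: np.ndarray) -> bytes:
    return v.astype(np.float32).tobytes()

def from_blob(b: bytes) -> np.ndarray:
    return np.frombuffer(b, dtype=np.float32)

rows = [("case_001", to_blob(np.array([0.2, 0.7, 0.4, 0.0]))),
        ("case_002", to_blob(np.array([0.8, 0.1, 0.1, 0.3])))]
conn.executemany("INSERT INTO precedents VALUES (?, ?)", rows)

def top_k(query: np.ndarray, k: int = 5) -> list:
    """Return (case_id, cosine_score) pairs, best first."""
    q = query / np.linalg.norm(query)
    scored = []
    for case_id, blob in conn.execute("SELECT case_id, embedding FROM precedents"):
        v = from_blob(blob)
        scored.append((case_id, float(np.dot(q, v / np.linalg.norm(v)))))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

results = top_k(np.array([0.1, 0.9, 0.2, 0.0]))
```

In production this first-stage retrieval would feed its top-k candidates into legal-research-ranker for reranking, as described above.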
## Training Configuration
- Contrastive learning with in-batch negatives
- Temperature: 0.05
- Batch size: 128
- Embedding dimension: 1024
- Hardware: AWS SageMaker ml.g5.4xlarge (Voyage Law-2 endpoint)
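Contrastive learning with in-batch negatives is commonly implemented as an InfoNCE-style loss: each anchor's same-index positive is the target, and every other positive in the batch serves as a negative. A NumPy sketch of the loss (forward pass only, toy batch of 8 instead of the configured 128; the actual training objective may differ in detail):

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives: row i of `positives` is the
    positive for row i of `anchors`; all other rows are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # cross-entropy on the diagonal

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.05 * rng.normal(size=(8, 16))  # near-duplicate positives
loss = info_nce_loss(anchors, positives)
```

The low temperature (0.05) sharpens the softmax, penalizing any negative that scores nearly as high as the true positive; larger batches supply more in-batch negatives per anchor.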
## Limitations
- Factual similarity is inherently subjective; experts disagree ~15% of the time
- Very novel fact patterns with no close precedent will return low-confidence matches
- Matching is limited to US case law; international jurisdictions are not supported
- Temporal dynamics (overruled/superseded) require separate validation
## Version History
| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |