# Symio-ai/legal-precedent-matcher

## Model Description
Legal Precedent Matcher identifies cases with similar factual patterns, legal issues, and procedural postures. Given a case summary or fact pattern, it retrieves the most analogous precedents from the vector store, scoring them by factual similarity, legal issue overlap, and jurisdictional relevance.
It enables the GLACIER pipeline to surface controlling and persuasive authority that matches the user's case facts.
## Intended Use
- Primary: Match current case facts against known precedents for authority identification
- Secondary: Find distinguishable cases for anticipating opposing counsel arguments
- Integration: Powers precedent retrieval in GLACIER Stages 2, 3, and 5
## Task Type
sentence-similarity -- Bi-encoder embedding model producing similarity scores for case-to-case matching
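At its core, a bi-encoder embeds the query and each candidate case independently, then compares the vectors with cosine similarity. A minimal sketch of that comparison, using toy 4-dimensional vectors in place of the real 1024-dimensional Voyage Law-2 embeddings (case names and values are illustrative only):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for 1024-d Voyage Law-2 vectors.
query_emb = np.array([0.1, 0.9, 0.2, 0.0])
case_embs = {
    "smith_v_jones": np.array([0.1, 0.8, 0.3, 0.1]),
    "doe_v_roe":     np.array([0.9, 0.1, 0.0, 0.2]),
}

# Score every candidate against the query and take the best match.
scores = {cid: cosine_similarity(query_emb, e) for cid, e in case_embs.items()}
best = max(scores, key=scores.get)
```

Because queries and candidates are encoded separately, candidate embeddings can be computed once and cached, which is what makes batch throughput targets like 500 docs/sec feasible.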
## Base Model
voyage-ai/voyage-law-2 -- Purpose-built legal embedding model, deployed via AWS SageMaker (Voyage Law-2 marketplace subscription active)
## Training Data
| Source | Records | Description |
|---|---|---|
| CourtListener Paired Cases | ~2M pairs | Cases citing each other as analogous or distinguishable |
| Legal Brief Citations | ~500K pairs | Brief-to-cited-case pairs showing attorney judgment of relevance |
| Expert Fact Pattern Matches | ~100K pairs | Attorney-curated "similar cases" datasets |
| Negative Pairs | ~1M | Same practice area but factually dissimilar cases |
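The four sources above reduce to a common contrastive-pair shape: an anchor text, a candidate text, and a binary analogous/dissimilar label. A hypothetical record layout (field names are illustrative, not the actual dataset schema):

```python
# Hypothetical pair records for contrastive training data.
positive_pair = {
    "anchor": "Plaintiff slipped on an unmarked wet floor in a grocery store...",
    "match":  "Customer fell on an unmarked spill; store had constructive notice...",
    "label":  1,   # 1 = analogous (positive pair)
    "source": "courtlistener_paired",
}
negative_pair = {
    "anchor": "Plaintiff slipped on an unmarked wet floor in a grocery store...",
    "match":  "Product liability claim over a defective ladder design...",
    "label":  0,   # 0 = same practice area, factually dissimilar (hard negative)
    "source": "hard_negatives",
}

def is_valid_pair(rec: dict) -> bool:
    """Check a record has the expected fields and a binary label."""
    return {"anchor", "match", "label", "source"} <= rec.keys() and rec["label"] in (0, 1)
```

The ~1M same-practice-area negatives act as hard negatives: they force the model to separate topical similarity from genuine factual analogy.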
## Matching Dimensions
- Factual similarity: How closely do the facts align?
- Legal issue overlap: Are the same causes of action or defenses involved?
- Procedural posture: Same stage of litigation?
- Jurisdictional relevance: Binding vs. persuasive authority
- Temporal proximity: Recency weighting (newer = higher weight)
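One way to combine these dimensions into a single ranking score is a weighted sum with an exponential recency decay. The weights and half-life below are hand-set for illustration; the model card does not specify how the real system weights dimensions:

```python
from datetime import date

# Illustrative weights only; the production weighting is not specified here.
WEIGHTS = {"factual": 0.35, "issue": 0.25, "posture": 0.10,
           "jurisdiction": 0.20, "recency": 0.10}

def recency_score(decision_date: date, today: date,
                  half_life_years: float = 10.0) -> float:
    """Exponential decay: a case half_life_years old scores 0.5."""
    age = (today - decision_date).days / 365.25
    return 0.5 ** (age / half_life_years)

def combined_score(dims: dict) -> float:
    """Weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(WEIGHTS[k] * dims[k] for k in WEIGHTS)

dims = {
    "factual": 0.92,        # close factual alignment
    "issue": 0.85,          # overlapping causes of action
    "posture": 1.0,         # same litigation stage
    "jurisdiction": 1.0,    # binding authority
    "recency": recency_score(date(2020, 1, 15), date(2026, 4, 10)),
}
score = combined_score(dims)
```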
## Benchmark Criteria (90%+ Target)
| Metric | Target | Description |
|---|---|---|
| Recall@20 | >= 90% | Relevant precedents in top-20 results |
| MAP | >= 0.85 | Mean average precision across test queries |
| Factual Similarity Correlation | >= 0.87 | Agreement with expert fact-pattern ratings |
| Cross-jurisdiction accuracy | >= 82% | Finding persuasive authority across jurisdictions |
| Embedding throughput | >= 500 docs/sec | Batch encoding speed |
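Recall@20 and MAP are standard retrieval metrics and can be computed directly from a ranked result list plus a relevance set. A self-contained sketch with toy data:

```python
def recall_at_k(retrieved: list, relevant: set, k: int = 20) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def average_precision(retrieved: list, relevant: set) -> float:
    """AP for one query: precision at each relevant hit,
    averaged over the total number of relevant items.
    MAP is the mean of this value across all test queries."""
    hits, ap = 0, 0.0
    for i, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            ap += hits / i
    return ap / len(relevant)

# Toy example: relevant cases found at ranks 1, 3, and 5.
retrieved = ["c1", "c9", "c2", "c7", "c3"]
relevant = {"c1", "c2", "c3"}
r20 = recall_at_k(retrieved, relevant, k=20)
ap = average_precision(retrieved, relevant)
```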
## GLACIER Pipeline Integration
- Stage 2 (Research): precedent-matcher retrieves analogous cases from the bedrock-legal cache
- Stage 3 (WDC #1): matched precedents validate legal theory strength
- Stage 5 (WDC #2): verify that cited precedents are truly analogous, not merely topical
Integration: Embeddings stored in bedrock-legal SQLite vector store. Retrieved via cosine similarity with Voyage Law-2 embeddings, then reranked by legal-research-ranker.
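The retrieval path can be sketched as a brute-force cosine scan over embeddings stored as BLOBs in SQLite. This is a minimal illustration only: the actual bedrock-legal schema, table names, and indexing strategy are not documented here, and the toy 4-d vectors stand in for 1024-d Voyage Law-2 embeddings:

```python
import sqlite3
import numpy as np

# Hypothetical minimal schema; the real bedrock-legal layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE precedents (case_id TEXT PRIMARY KEY, embedding BLOB)")

def to_blob(v: np.ndarray) -> bytes:
    return v.astype(np.float32).tobytes()

def from_blob(b: bytes) -> np.ndarray:
    return np.frombuffer(b, dtype=np.float32)

rows = [("case_001", to_blob(np.array([0.2, 0.7, 0.4, 0.0]))),
        ("case_002", to_blob(np.array([0.8, 0.1, 0.1, 0.3])))]
conn.executemany("INSERT INTO precedents VALUES (?, ?)", rows)

def top_k(query: np.ndarray, k: int = 5) -> list:
    """Return (case_id, cosine_score) pairs, best first."""
    q = query / np.linalg.norm(query)
    scored = []
    for case_id, blob in conn.execute("SELECT case_id, embedding FROM precedents"):
        v = from_blob(blob)
        scored.append((case_id, float(np.dot(q, v / np.linalg.norm(v)))))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

results = top_k(np.array([0.1, 0.9, 0.2, 0.0]))
```

In production this first-stage retrieval would feed its top-k candidates into legal-research-ranker for reranking, as described above.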
## Training Configuration
- Contrastive learning with in-batch negatives
- Temperature: 0.05
- Batch size: 128
- Embedding dimension: 1024
- Hardware: AWS SageMaker ml.g5.4xlarge (Voyage Law-2 endpoint)
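Contrastive learning with in-batch negatives is commonly implemented as an InfoNCE-style loss: each anchor's same-index positive is the target, and every other positive in the batch serves as a negative. A NumPy sketch of the loss (forward pass only, toy batch of 8 instead of the configured 128; the actual training objective may differ in detail):

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """InfoNCE with in-batch negatives: row i of `positives` is the
    positive for row i of `anchors`; all other rows are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # cross-entropy on the diagonal

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.05 * rng.normal(size=(8, 16))  # near-duplicate positives
loss = info_nce_loss(anchors, positives)
```

The low temperature (0.05) sharpens the softmax, penalizing any negative that scores nearly as high as the true positive; larger batches supply more in-batch negatives per anchor.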
## Limitations
- Factual similarity is inherently subjective; experts disagree ~15% of the time
- Very novel fact patterns with no close precedent will return low-confidence matches
- Matching is limited to US case law; international jurisdictions are not supported
- Temporal dynamics (overruled/superseded) require separate validation
## Version History
| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |