Symio-ai/legal-case-classifier

Model Description

Legal Case Classifier classifies legal opinions and filings by practice area, procedural posture, outcome, and relevance to a query. Given a case text or summary, it assigns multiple labels at once, drawn from 50+ legal categories and procedural postures.

These labels enable routing and prioritization of research results in the GLACIER pipeline.

Intended Use

  • Primary: Classify retrieved case law by practice area and relevance during research
  • Secondary: Categorize incoming case documents for knowledge base organization
  • Integration: Powers the research ranking and precedent matching pipeline

Task Type

text-classification -- Multi-label classification across a legal taxonomy
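In multi-label classification, each label gets its own independent probability instead of competing with the others in a softmax, so one case can be tagged with several practice areas and a posture at the same time. The following is a minimal decoding sketch. It assumes a sigmoid over per-label logits and a fixed 0.5 threshold. The label names and logit values are hypothetical and do not reflect the model's actual label set.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def decode_multilabel(logits, labels, threshold=0.5):
    """Return (label, probability) for every label whose sigmoid probability meets the threshold."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = [sigmoid(z) for z in logits]
    return [(label, p) for label, p in zip(labels, probs) if p >= threshold]

# Hypothetical labels and logits for a single document.
labels = ["practice:contract", "practice:tort", "posture:MSJ", "outcome:reversed"]
logits = [2.1, -1.3, 0.4, -3.0]
print([label for label, _ in decode_multilabel(logits, labels)])
# ['practice:contract', 'posture:MSJ']
```

Because the labels are decided independently, a per-label threshold (tuned on validation data) can replace the single 0.5 value without changing the decoding logic.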

Base Model

nlpaueb/legal-bert-base-uncased -- Legal-domain pre-training provides a strong baseline for legal text understanding
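The following sketch shows how the base checkpoint might be prepared for multi-label fine-tuning with the Hugging Face `transformers` library. The label list and the `load_multilabel_model` helper are hypothetical. `problem_type="multi_label_classification"` tells the sequence-classification head to use a sigmoid-based BCE loss instead of softmax cross-entropy.

```python
# Hypothetical label subset; the real taxonomy has 50+ labels.
LABELS = ["practice:tort", "practice:contract", "posture:MTD", "posture:MSJ", "relevance:on-point"]

id2label = {i: name for i, name in enumerate(LABELS)}
label2id = {name: i for i, name in id2label.items()}

BASE_MODEL = "nlpaueb/legal-bert-base-uncased"

def load_multilabel_model(base_model: str = BASE_MODEL):
    """Load the base checkpoint with a freshly initialized multi-label classification head."""
    # Imported lazily so the label mapping can be built without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
        # Makes the model compute BCEWithLogitsLoss when float multi-hot labels are passed.
        problem_type="multi_label_classification",
    )
    return tokenizer, model
```

Calling `load_multilabel_model()` downloads the checkpoint from the Hugging Face Hub. The classification head starts randomly initialized, so it only becomes useful after fine-tuning.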

Training Data

Source                  | Records  | Description
CourtListener Opinions  | ~5M      | Court opinions with practice area metadata
Caselaw Access Project  | ~6.7M    | Historical case law with topic classifications
FOLIO Legal Ontology    | taxonomy | Legal practice area taxonomy (200+ categories)
Manual Annotations      | ~100K    | Expert-labeled cases for fine-grained categories

Classification Categories (top-level)

  • Practice areas: tort, contract, property, criminal, family, administrative, constitutional, etc.
  • Procedural posture: motion to dismiss (MTD), motion for summary judgment (MSJ), trial, appeal, certiorari, habeas, etc.
  • Outcome: plaintiff-win, defendant-win, mixed, remanded, reversed, etc.
  • Jurisdiction: federal, FL state, MS state, other
  • Relevance: on-point, persuasive, distinguishable, contrary
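One simple way to train a single multi-label head over these groups is to flatten them into a "group:value" label space and encode each case as a multi-hot vector. The sketch below assumes that approach and uses a hypothetical subset of each group. Relevance depends on the query, so it also assumes the query is fixed when the relevance labels are assigned.

```python
# Hypothetical subset of the top-level groups listed above.
TAXONOMY = {
    "practice": ["tort", "contract", "property", "criminal"],
    "posture": ["MTD", "MSJ", "trial", "appeal"],
    "outcome": ["plaintiff-win", "defendant-win", "mixed", "remanded", "reversed"],
    "jurisdiction": ["federal", "FL state", "MS state", "other"],
    "relevance": ["on-point", "persuasive", "distinguishable", "contrary"],
}

def build_label_space(taxonomy):
    # Flatten groups into one ordered list of "group:value" labels.
    return [f"{group}:{value}" for group, values in taxonomy.items() for value in values]

LABEL_SPACE = build_label_space(TAXONOMY)
INDEX = {label: i for i, label in enumerate(LABEL_SPACE)}

def to_multi_hot(labels):
    """Encode a case's gold labels as a 0/1 float vector over LABEL_SPACE."""
    vec = [0.0] * len(LABEL_SPACE)
    for label in labels:
        if label not in INDEX:
            raise KeyError(f"unknown label: {label}")
        vec[INDEX[label]] = 1.0
    return vec

# A contract/tort case decided on summary judgment and reversed on appeal.
example = to_multi_hot(["practice:contract", "practice:tort", "posture:MSJ", "outcome:reversed"])
```

Float targets are used because binary cross-entropy expects them, and a case with several practice areas sets several positions to 1.0.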

Benchmark Criteria (Targets)

Metric                       | Target          | Description
Practice Area F1             | >= 91%          | Multi-label F1 for practice area classification
Procedural Posture Accuracy  | >= 93%          | Correct procedural posture identification
Outcome Accuracy             | >= 90%          | Win/loss/mixed classification
Relevance Precision          | >= 88%          | Must not over-classify as "on-point"
Throughput                   | >= 100 docs/sec | Batch classification speed
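The quality targets could be checked with small helper functions like the ones below. This is a minimal sketch with two assumptions: practice-area F1 is micro-averaged over per-document label sets (the card does not say whether it is micro- or macro-averaged), and relevance precision is computed for the "on-point" label, with one relevance label per case. The function names and toy data are hypothetical.

```python
def multilabel_micro_f1(y_true, y_pred):
    """Micro-averaged F1; y_true and y_pred are lists of label sets, one set per document."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(gold, pred):
    """Fraction of documents whose single predicted label (e.g. posture, outcome) is correct."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def precision_for(label, gold, pred):
    """Of the documents predicted as `label`, the fraction whose gold label is `label`."""
    predicted = [g for g, p in zip(gold, pred) if p == label]
    return sum(g == label for g in predicted) / len(predicted) if predicted else 0.0

# Toy data for illustration only.
gold_pa = [{"contract"}, {"tort", "property"}, {"criminal"}]
pred_pa = [{"contract"}, {"tort"}, {"criminal", "family"}]
gold_rel = ["on-point", "persuasive", "on-point", "contrary"]
pred_rel = ["on-point", "on-point", "on-point", "contrary"]
```

On the toy data, over-predicting "on-point" for the persuasive case lowers relevance precision to 2/3, which is the failure mode the 88% target is meant to catch.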

GLACIER Pipeline Integration

STAGE 2 (Research) --> case-classifier filters and categorizes research results
STAGE 3 (WDC #1)  --> classified cases inform jurisdiction/posture validation
STAGE 5 (WDC #2)  --> verify cited cases match the claimed practice area/posture
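The Stage 2 and Stage 5 roles could look roughly like the sketch below: filter and rank research results by classified practice area and relevance, then compare a brief's claims about a cited case against the classifier's labels. Everything here is hypothetical, including the `ClassifiedCase` structure, the relevance ordering, and the case names. It only illustrates the data flow and is not GLACIER's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class ClassifiedCase:
    citation: str
    practice_areas: set = field(default_factory=set)
    posture: str = ""
    relevance: str = ""

# Hypothetical ordering used to prioritize research results (lower = shown first).
RELEVANCE_RANK = {"on-point": 0, "persuasive": 1, "distinguishable": 2, "contrary": 3}

def route_research_results(cases, practice_area):
    """Stage 2 sketch: keep cases in the target practice area, most relevant first."""
    kept = [c for c in cases if practice_area in c.practice_areas]
    return sorted(kept, key=lambda c: RELEVANCE_RANK.get(c.relevance, len(RELEVANCE_RANK)))

def verify_citation(case, claimed_area, claimed_posture):
    """Stage 5 sketch: list mismatches between a claimed area/posture and the classifier output."""
    problems = []
    if claimed_area not in case.practice_areas:
        problems.append(f"{case.citation}: practice area '{claimed_area}' not predicted")
    if claimed_posture != case.posture:
        problems.append(f"{case.citation}: posture '{claimed_posture}' vs predicted '{case.posture}'")
    return problems

cases = [
    ClassifiedCase("Case A", {"contract"}, "MSJ", "persuasive"),
    ClassifiedCase("Case B", {"tort"}, "MTD", "on-point"),
    ClassifiedCase("Case C", {"contract", "tort"}, "appeal", "on-point"),
]
ranked = route_research_results(cases, "contract")
```

Returning a list of mismatch messages, instead of a boolean, lets a later validation stage show the reviewer exactly which claim failed.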

Training Configuration

  • Epochs: 8
  • Learning rate: 2e-5
  • Batch size: 32
  • Max sequence length: 512
  • Loss: Binary cross-entropy (multi-label)
  • Hardware: AWS SageMaker ml.g5.4xlarge
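Binary cross-entropy treats each label as an independent yes/no decision, which is what makes the loss multi-label. Below is a minimal, dependency-free sketch of the per-example loss computed from raw logits in the numerically stable form. It illustrates the objective and is not the actual training code. Averaging over labels mirrors the default mean reduction of PyTorch's `BCEWithLogitsLoss`.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Each term equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))],
    rewritten as max(z, 0) - z*y + log(1 + exp(-|z|)) to avoid overflow.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

The stable form stays finite even for extreme logits such as 1000.0, where computing `sigmoid` first and then taking `log(1 - p)` would produce `log(0)`.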

Limitations

  • US-focused classification taxonomy; international law categories are sparse
  • Very recent statutory changes may not be reflected in the training data
  • Cases that straddle multiple practice areas may receive lower-confidence predictions
  • Does not perform substantive legal analysis, only classification
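Cases that straddle practice areas tend to produce probabilities near the decision threshold. One downstream option is to send those predictions to human review. The sketch below assumes that approach. The `flag_uncertain` helper and the 0.5 threshold and 0.15 margin are hypothetical illustrative values, not tuned settings from this model.

```python
def flag_uncertain(probs, threshold=0.5, margin=0.15):
    """Return labels whose probability lies within `margin` of the decision threshold."""
    return sorted(label for label, p in probs.items() if abs(p - threshold) < margin)

# Hypothetical per-label probabilities for one case.
probs = {
    "practice:contract": 0.58,
    "practice:property": 0.44,
    "practice:tort": 0.93,
    "practice:family": 0.03,
}
```

Here the contract and property labels would be flagged, while the confident tort and family decisions pass through unchanged.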

Version History

Version | Date       | Notes
v0.1    | 2026-04-10 | Initial model card, repo created