Symio-ai/legal-case-classifier
Model Description
Legal Case Classifier classifies legal opinions and filings by practice area, procedural posture, outcome, jurisdiction, and relevance to a query. Given the text or summary of a case, it assigns multi-label classifications across 50+ legal categories spanning these dimensions.
It is used to route and prioritize research results in the GLACIER pipeline.
Intended Use
- Primary: Classify retrieved case law by practice area and relevance during research
- Secondary: Categorize incoming case documents for knowledge base organization
- Integration: Powers the research ranking and precedent matching pipeline
Task Type
text-classification -- multi-label classification across a legal taxonomy
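In a multi-label setup, each category gets an independent probability instead of competing in a softmax, so a single case can be both "contract" and "tort". The sketch below shows one common way to decode such outputs: apply a sigmoid to each logit and keep every label at or above a threshold. The label names and the 0.5 threshold are illustrative assumptions. The card does not state the decision threshold the model actually uses.

```python
import math

# Hypothetical label names for illustration; the real model has 50+ categories.
LABELS = ["practice:contract", "practice:tort", "posture:msj", "outcome:reversed", "relevance:on-point"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits: list[float], labels: list[str], threshold: float = 0.5) -> dict[str, float]:
    """Return every label whose independent sigmoid probability meets the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    return {label: p for label, p in probs.items() if p >= threshold}


# Assumed example logits, one per label in LABELS.
result = decode_multilabel([2.0, -1.0, 0.3, -3.0, 1.2], LABELS)
# Keeps practice:contract, posture:msj and relevance:on-point.
```

Because every label is thresholded independently, a case may receive zero, one, or several labels. A production system might tune a separate threshold per label on validation data.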
Base Model
nlpaueb/legal-bert-base-uncased -- legal-domain pre-training provides a strong baseline for legal text understanding
Training Data
| Source | Records | Description |
|---|---|---|
| CourtListener Opinions | ~5M | Court opinions with practice area metadata |
| Caselaw Access Project | ~6.7M | Historical case law with topic classifications |
| FOLIO Legal Ontology | taxonomy | Legal practice area taxonomy (200+ categories) |
| Manual Annotations | ~100K | Expert-labeled cases for fine-grained categories |
Classification Categories (top-level)
- Practice areas: tort, contract, property, criminal, family, admin (administrative), constitutional, etc.
- Procedural posture: MTD (motion to dismiss), MSJ (motion for summary judgment), trial, appeal, cert (certiorari), habeas (habeas corpus), etc.
- Outcome: plaintiff-win, defendant-win, mixed, remanded, reversed, etc.
- Jurisdiction: federal, FL state (Florida), MS state (Mississippi), other
- Relevance: on-point, persuasive, distinguishable, contrary
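One way to turn these annotations into training targets is a flat multi-hot vector with one slot per label. This layout pairs naturally with a binary cross-entropy loss. The sketch below uses only the example categories listed above. The "head:label" naming, the flat layout, and the `to_multi_hot` helper are hypothetical, and the real taxonomy has 50+ categories.

```python
# Hypothetical subset of the taxonomy, built from the example categories above.
TAXONOMY = {
    "practice_area": ["tort", "contract", "property", "criminal", "family", "admin", "constitutional"],
    "posture": ["MTD", "MSJ", "trial", "appeal", "cert", "habeas"],
    "outcome": ["plaintiff-win", "defendant-win", "mixed", "remanded", "reversed"],
    "jurisdiction": ["federal", "FL state", "MS state", "other"],
    "relevance": ["on-point", "persuasive", "distinguishable", "contrary"],
}

# Flatten to "head:label" keys so each model output unit maps to exactly one label.
LABEL_INDEX: dict[str, int] = {}
for head, labels in TAXONOMY.items():
    for label in labels:
        LABEL_INDEX[f"{head}:{label}"] = len(LABEL_INDEX)


def to_multi_hot(annotations: dict[str, list[str]]) -> list[int]:
    """Convert per-head annotations into a 0/1 target vector (hypothetical helper)."""
    vec = [0] * len(LABEL_INDEX)
    for head, labels in annotations.items():
        for label in labels:
            vec[LABEL_INDEX[f"{head}:{label}"]] = 1
    return vec


# A case tagged with two practice areas, an MSJ posture and Florida jurisdiction.
target = to_multi_hot({
    "practice_area": ["contract", "tort"],
    "posture": ["MSJ"],
    "jurisdiction": ["FL state"],
})
```

A flat vector lets one classification head cover every dimension. Whether the model uses one head or separate heads per dimension is not stated in this card.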
Benchmark Criteria (Targets)
| Metric | Target | Description |
|---|---|---|
| Practice Area F1 | >= 91% | Multi-label F1 for practice area classification |
| Procedural Posture Accuracy | >= 93% | Correct procedural posture identification |
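| Outcome Accuracy | >= 90% | Outcome label classification (plaintiff-win, defendant-win, mixed, remanded, reversed, etc.) |
| Relevance Precision | >= 88% | Precision of the "on-point" label; the model must not over-classify cases as on-point |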
| Throughput | >= 100 docs/sec | Batch classification speed |
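The F1 and precision targets can be checked on a held-out set with small helpers like the ones below, which operate on 0/1 label vectors. This is a sketch under an assumption. The card does not say whether Practice Area F1 is micro- or macro-averaged, and this code assumes micro-averaging. The function names are illustrative and do not belong to any published evaluation script.

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and negatives over all (doc, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif p == 1:
                fp += 1
            elif t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_precision(y_true: list[list[int]], y_pred: list[list[int]], label_idx: int) -> float:
    """Precision for one label column, e.g. "on-point"; returns 0.0 if it was never predicted."""
    tp = fp = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if pred_row[label_idx] == 1:
            if true_row[label_idx] == 1:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0


y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0]]
# micro_f1(y_true, y_pred) -> 0.666...; label_precision(y_true, y_pred, 1) -> 0.5
```

Precision on a single label is what the "Relevance Precision" row guards. Every false "on-point" prediction lowers it, while missed on-point cases do not.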
GLACIER Pipeline Integration
STAGE 2 (Research) --> case-classifier filters and categorizes research results
STAGE 3 (WDC #1) --> classified cases inform jurisdiction/posture validation
STAGE 5 (WDC #2) --> verify cited cases match the claimed practice area/posture
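In Stage 2, the classifier's outputs are used to filter and order research results. The sketch below shows one plausible filtering rule: keep on-point and persuasive cases from allowed jurisdictions, then rank on-point cases first and sort by score within each group. The `ClassifiedCase` record, `route_results`, and the ranking rule are hypothetical illustrations. They are not the GLACIER pipeline's actual logic.

```python
from dataclasses import dataclass


@dataclass
class ClassifiedCase:
    """Hypothetical record for one retrieved case after classification."""
    case_id: str
    jurisdiction: str       # e.g. "federal", "FL state"
    relevance: str          # one of the relevance labels
    relevance_score: float  # model probability for that relevance label


# Lower value = higher priority when sorting.
RELEVANCE_ORDER = {"on-point": 0, "persuasive": 1, "distinguishable": 2, "contrary": 3}


def route_results(
    cases: list[ClassifiedCase],
    allowed_jurisdictions: set[str],
    keep: tuple[str, ...] = ("on-point", "persuasive"),
) -> list[ClassifiedCase]:
    """Keep wanted relevance labels in allowed jurisdictions; on-point first, then by score."""
    selected = [
        c for c in cases
        if c.relevance in keep and c.jurisdiction in allowed_jurisdictions
    ]
    return sorted(selected, key=lambda c: (RELEVANCE_ORDER[c.relevance], -c.relevance_score))


cases = [
    ClassifiedCase("A", "FL state", "persuasive", 0.90),
    ClassifiedCase("B", "FL state", "on-point", 0.70),
    ClassifiedCase("C", "MS state", "on-point", 0.95),
    ClassifiedCase("D", "FL state", "contrary", 0.99),
    ClassifiedCase("E", "federal", "on-point", 0.80),
]
routed = route_results(cases, {"FL state", "federal"})
# IDs in order: E, B, A
```

Contrary and distinguishable cases are dropped here only for brevity. A real research workflow would likely keep contrary authority visible for Stage 3 validation.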
Training Configuration
- Epochs: 8
- Learning rate: 2e-5
- Batch size: 32
- Max sequence length: 512
- Loss: Binary cross-entropy (multi-label)
- Hardware: AWS SageMaker ml.g5.4xlarge
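Binary cross-entropy treats each label as its own yes/no problem, which is what makes the model multi-label. The sketch below computes the mean loss over labels directly from raw logits using the standard numerically stable form. That form gives the same result as PyTorch's `torch.nn.BCEWithLogitsLoss` with mean reduction. The card does not say whether the with-logits variant, class weighting (`pos_weight`), or a different reduction is used, so treat this as an assumption.

```python
import math


def bce_with_logits(logits: list[float], targets: list[int]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which equals
    -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))] without overflow.
    """
    if len(logits) != len(targets):
        raise ValueError("expected exactly one target per logit")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# A logit of 0 means p = 0.5, so the loss for a positive label is log(2).
loss = bce_with_logits([0.0], [1])  # ≈ 0.6931
```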
Limitations
- US-focused classification taxonomy; international law categories are sparse
- Very recent statutory changes may not be reflected in training data
- Cases that straddle multiple practice areas may receive lower-confidence predictions
- Does not perform substantive legal analysis, only classification
Version History
| Version | Date | Notes |
|---|---|---|
| v0.1 | 2026-04-10 | Initial model card, repo created |