base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - legal
  - locus
  - modernbert
license: apache-2.0
datasets:
  - LocalLaws/LOCUS-v1.0

LocalLaws/LOCUS-Substantive

A ModernBERT classifier for the Substantive (binary) axis of the LOCUS (Local Ordinances Corpus, United States) dataset.

Fine-tuned from answerdotai/ModernBERT-base on LocalLaws/LOCUS-v1.0.

Labels

  • not_substantive
  • substantive

Training

Base model: answerdotai/ModernBERT-base
Max length: 1024 tokens
Classifier pooling: mean
Train / val / test examples: 79106 / 10447 / 10447
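
As a rough illustration of what "mean" classifier pooling refers to, the sketch below averages token vectors while skipping padded positions. It is a minimal pure-Python sketch under the assumption that padding is excluded through the attention mask; the function name `masked_mean_pool` and the toy vectors are hypothetical and this is not the library's own implementation.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    hidden_states: list of per-token vectors (lists of floats), one sequence.
    attention_mask: list of 0/1 flags, same length as hidden_states.
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Toy sequence: two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
print(pooled)
```

The padded vector does not affect the result, which is the point of masking before averaging. The pooled vector is what a classification head would then map to the two label logits.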

Evaluation

Metric: binary F1 (F1 of class 1)
Validation binary F1: 0.9402
Test binary F1: 0.9422
Test accuracy: 0.9328

Test-set classification report:
              precision    recall  f1-score   support

           0     0.9517    0.8898    0.9197      4519
           1     0.9200    0.9656    0.9422      5928

    accuracy                         0.9328     10447
   macro avg     0.9358    0.9277    0.9310     10447
weighted avg     0.9337    0.9328    0.9325     10447
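
The summary rows of the report can be re-derived from the per-class rows. The sketch below recomputes per-class F1, macro and weighted F1, and accuracy from the rounded precision, recall, and support values printed above. Because the inputs are already rounded to four decimals, the results should agree with the report only up to rounding. The helper name `f1_from_pr` is hypothetical.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the test-set report above.
report = {
    0: (0.9517, 0.8898, 4519),
    1: (0.9200, 0.9656, 5928),
}

total = sum(support for _, _, support in report.values())
f1 = {label: f1_from_pr(p, r) for label, (p, r, _) in report.items()}

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: mean over classes weighted by support.
weighted_f1 = sum(f1[label] * s for label, (_, _, s) in report.items()) / total
# Accuracy: correctly classified examples (recall * support per class) over all examples.
accuracy = sum(r * s for _, r, s in report.values()) / total

for label, value in f1.items():
    print(f"class {label} F1: {value:.4f}")
print(f"macro F1: {macro_f1:.4f}")
print(f"weighted F1: {weighted_f1:.4f}")
print(f"accuracy: {accuracy:.4f}")
```

The class 1 F1 is the figure reported as the test binary F1, which shows that the headline metric treats class 1 as the positive class.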

Usage

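from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Substantive")
model.eval()  # inference mode: disables dropout

text = "No person shall keep any swine within the city limits."
# Truncate to the same maximum length used during training.
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():  # no gradients needed for prediction
    logits = model(**enc).logits
# Pick the highest-scoring class and map its id to the label name.
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])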
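
If a confidence score is wanted in addition to the predicted label, the logits can be passed through a softmax. The following is a minimal pure-Python sketch of that step. The logits shown are hypothetical values, not model output, and the `ID2LABEL` mapping assumes that id 0 is not_substantive and id 1 is substantive (the order in which the labels are listed above). In practice `model.config.id2label` should be treated as authoritative.

```python
import math

# Assumed mapping; check model.config.id2label for the real one.
ID2LABEL = {0: "not_substantive", 1: "substantive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def label_scores(logits, id2label=ID2LABEL):
    """Map each label name to its softmax probability."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


# Hypothetical logits for one input, e.g. from logits[0].tolist().
example_logits = [-1.2, 2.3]
scores = label_scores(example_logits)
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

With the torch tensors from the usage snippet, `torch.softmax(logits, dim=-1)` computes the same probabilities directly.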