base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - legal
  - locus
  - modernbert
license: apache-2.0
datasets:
  - LocalLaws/LOCUS-v1.0

LocalLaws/LOCUS-Substantive

A ModernBERT classifier for the Substantive (binary) axis of the LOCUS (Local Ordinances Corpus, United States) dataset.

Fine-tuned from answerdotai/ModernBERT-base on LocalLaws/LOCUS-v1.0.

Labels

not_substantive
substantive

Training


Base model	`answerdotai/ModernBERT-base`
Max length	1024
Classifier pooling	`mean`
Train / val / test	79106 / 10447 / 10447
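
The `mean` pooling setting suggests that the classification head reads an average of the token representations rather than a single token's vector. The following is a minimal pure-Python sketch of masked mean pooling under that assumption; it is not the library's implementation, and `mean_pool`, `hidden`, and `mask` are hypothetical names used only for illustration.

```python
def mean_pool(token_vectors, attention_mask):
    """Average token vectors over positions where attention_mask is 1."""
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for j, value in enumerate(vec):
                sums[j] += value
    if count == 0:
        raise ValueError("attention_mask has no non-padding positions")
    return [s / count for s in sums]


# Toy example: three token vectors of width 2, the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(hidden, mask)
print(pooled)
```

The padded position is excluded from both the sum and the divisor, so padding does not dilute the pooled vector.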

Evaluation


Metric	binary-F1
Validation binary-F1	0.9402
Test binary-F1	0.9422
Test accuracy	0.9328

              precision    recall  f1-score   support

           0     0.9517    0.8898    0.9197      4519
           1     0.9200    0.9656    0.9422      5928

    accuracy                         0.9328     10447
   macro avg     0.9358    0.9277    0.9310     10447
weighted avg     0.9337    0.9328    0.9325     10447
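
The summary metrics can be re-derived from the per-class rows of the report. The sketch below copies precision, recall, and support from the table above and recomputes per-class F1, accuracy, and macro F1; the variable and function names are illustrative. The F1 for class `1` comes out equal to the reported test binary-F1, which suggests that class `1` is treated as the positive class.

```python
# Per-class values copied from the classification report (test split).
report = {
    0: {"precision": 0.9517, "recall": 0.8898, "support": 4519},
    1: {"precision": 0.9200, "recall": 0.9656, "support": 5928},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_class = {label: f1(r["precision"], r["recall"]) for label, r in report.items()}

# Accuracy is the support-weighted average of per-class recall.
total = sum(r["support"] for r in report.values())
accuracy = sum(r["recall"] * r["support"] for r in report.values()) / total

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

print(f"F1 (class 1): {f1_by_class[1]:.4f}")
print(f"Accuracy:     {accuracy:.4f}")
print(f"Macro F1:     {macro_f1:.4f}")
```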

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Substantive")
model.eval()  # disable dropout for inference

text = "No person shall keep any swine within the city limits."
# Truncate to the 1024-token maximum length used in training.
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    logits = model(**enc).logits
# Take the highest-scoring class and map it to its label name.
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
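
The snippet above prints only the top label. If a confidence score is also wanted, the logits can be passed through a softmax. The following is a minimal, dependency-free sketch: `example_logits` is a made-up stand-in for `logits[0].tolist()`, and the `id2label` mapping is an assumption based on the label order listed above, so read `model.config.id2label` for the real mapping.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits; in practice use logits[0].tolist() from the model output.
example_logits = [-1.2, 2.3]
# Assumed mapping; in practice read it from model.config.id2label.
id2label = {0: "not_substantive", 1: "substantive"}

probs = softmax(example_logits)
scores = {id2label[i]: p for i, p in enumerate(probs)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

With real model outputs, `torch.softmax(logits, dim=-1)` does the same job directly on the tensor.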