---
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - legal
  - locus
  - modernbert
license: apache-2.0
datasets:
  - LocalLaws/LOCUS-v1.0
---

# LocalLaws/LOCUS-Substantive

A ModernBERT classifier for the **Substantive (binary)** axis of the LOCUS
(Local Ordinances Corpus, United States) dataset.

Fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on
[LocalLaws/LOCUS-v1.0](https://huggingface.co/datasets/LocalLaws/LOCUS-v1.0).

## Labels

- `not_substantive`
- `substantive`
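
The class indices are assumed to follow the order above (`0 = not_substantive`, `1 = substantive`), matching the classification report below. A quick sanity check against the mapping shipped with the checkpoint:

```python
from transformers import AutoConfig

# Print the label mapping bundled with the checkpoint; it is assumed to be
# {0: "not_substantive", 1: "substantive"} -- verify against your download.
config = AutoConfig.from_pretrained("LocalLaws/LOCUS-Substantive")
print(config.id2label)
print(config.label2id)
```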

## Training

| Setting | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Max sequence length | 1024 tokens |
| Classifier pooling | `mean` |
| Train / val / test examples | 79,106 / 10,447 / 10,447 |
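
The original training script is not reproduced here, but a minimal sketch of instantiating a classification head with the settings above might look like this (`classifier_pooling`, `num_labels`, and the label maps are assumptions inferred from the table and the Labels section):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed setup mirroring the table above: binary head, mean pooling,
# inputs truncated to 1024 tokens. Not the original training script.
tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=2,
    classifier_pooling="mean",  # ModernBERT supports "cls" or "mean"
    id2label={0: "not_substantive", 1: "substantive"},
    label2id={"not_substantive": 0, "substantive": 1},
)

enc = tok("Example ordinance text.", return_tensors="pt",
          truncation=True, max_length=1024)
```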

## Evaluation

| Metric | Value |
|---|---|
| Selection metric | binary-F1 |
| Validation binary-F1 | 0.9402 |
| Test binary-F1 | 0.9422 |
| Test accuracy | 0.9328 |

Per-class results on the test split:

```
              precision    recall  f1-score   support

           0     0.9517    0.8898    0.9197      4519
           1     0.9200    0.9656    0.9422      5928

    accuracy                         0.9328     10447
   macro avg     0.9358    0.9277    0.9310     10447
weighted avg     0.9337    0.9328    0.9325     10447

```
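
A hedged sketch of how these test-split numbers could be recomputed with `datasets` and `scikit-learn`; the split and column names (`test`, `text`, `label`) are assumptions, so check the LOCUS-v1.0 dataset card for the actual schema:

```python
import torch
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Split/column names ("test", "text", "label") are assumptions; consult the
# LocalLaws/LOCUS-v1.0 dataset card for the real field names before running.
ds = load_dataset("LocalLaws/LOCUS-v1.0", split="test")
tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
model = AutoModelForSequenceClassification.from_pretrained(
    "LocalLaws/LOCUS-Substantive"
).eval()

preds = []
for batch in ds.iter(batch_size=32):
    enc = tok(batch["text"], return_tensors="pt", padding=True,
              truncation=True, max_length=1024)
    with torch.no_grad():
        logits = model(**enc).logits
    preds.extend(logits.argmax(dim=-1).tolist())

print(classification_report(ds["label"], preds, digits=4))
```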

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Substantive")
model.eval()

text = "No person shall keep any swine within the city limits."

# Tokenize with the same 1024-token limit used during training.
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)

# Forward pass without gradients, then take the highest-scoring class.
with torch.no_grad():
    logits = model(**enc).logits
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # "substantive" or "not_substantive"
```
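
Alternatively, the checkpoint can be run through the high-level `pipeline` API, which handles tokenization and label mapping internally (the printed score below is illustrative only):

```python
from transformers import pipeline

# High-level alternative to the manual tokenize / forward / argmax flow above.
clf = pipeline("text-classification", model="LocalLaws/LOCUS-Substantive")
print(clf("No person shall keep any swine within the city limits."))
# Illustrative output shape: [{'label': 'substantive', 'score': ...}]
```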