---
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - legal
  - locus
  - modernbert
license: apache-2.0
datasets:
  - LocalLaws/LOCUS-v1.0
---

# LocalLaws/LOCUS-Topic

A ModernBERT classifier for the **Topic** axis of the LOCUS
(Local Ordinances Corpus, United States) dataset.

Fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on
[LocalLaws/LOCUS-v1.0](https://huggingface.co/datasets/LocalLaws/LOCUS-v1.0).

## Labels

- `Buildings`
- `Business`
- `Nuisance`
- `Other`
- `Zoning`

## Training

| Setting | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Max length (tokens) | 1024 |
| Classifier pooling | `mean` |
| Train / val / test split sizes | 45183 / 5848 / 5928 |
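With `mean` classifier pooling, the classification head sees the average of the encoder's token vectors instead of a single special-token vector. The following minimal pure-Python sketch shows attention-masked mean pooling. It assumes padding positions carry mask value 0 and are excluded from the average. The function name `masked_mean_pool` and the toy vectors are hypothetical and do not come from the model's own code.

```python
from typing import List


def masked_mean_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1; padding (mask 0) is ignored.

    Illustrative sketch only, not the transformers implementation.
    """
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same sequence length")
    n_real = sum(mask)
    if n_real == 0:
        raise ValueError("mask selects no tokens")
    pooled = [0.0] * len(hidden[0])
    for vec, m in zip(hidden, mask):
        if m:  # only real (non-padding) tokens contribute
            for i, x in enumerate(vec):
                pooled[i] += x
    return [x / n_real for x in pooled]


# Three 2-dimensional token vectors; the last position is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean_pool(hidden, mask))  # [2.0, 3.0]
```

Masking matters here. Without it, the padding vector in the toy example would dominate the average.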

## Evaluation

| Metric | Value |
|---|---|
| Primary metric | macro-F1 |
| Validation macro-F1 | 0.8127 |
| Test macro-F1 | 0.8173 |
| Test accuracy | 0.8190 |

Per-class results on the test split:

```text
              precision    recall  f1-score   support

   Buildings     0.7438    0.8506    0.7936       877
    Business     0.8273    0.8381    0.8326       846
    Nuisance     0.7617    0.8419    0.7998       930
       Other     0.8916    0.7657    0.8239      2083
      Zoning     0.8169    0.8574    0.8367      1192

    accuracy                         0.8190      5928
   macro avg     0.8083    0.8307    0.8173      5928
weighted avg     0.8251    0.8190    0.8194      5928
```
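Macro-F1 is the unweighted mean of the per-class F1 scores, so every label counts equally regardless of its support. The weighted average instead weights each class by its support. The following sketch recomputes both from the rounded per-class rows of the report above. The dictionary layout and the helper names `f1`, `macro`, and `weighted` are illustrative only.

```python
# Per-class (precision, recall, f1, support) rows copied from the test-set report.
report = {
    "Buildings": (0.7438, 0.8506, 0.7936, 877),
    "Business": (0.8273, 0.8381, 0.8326, 846),
    "Nuisance": (0.7617, 0.8419, 0.7998, 930),
    "Other": (0.8916, 0.7657, 0.8239, 2083),
    "Zoning": (0.8169, 0.8574, 0.8367, 1192),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro(col: int) -> float:
    """Unweighted mean of one column across classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted(col: int) -> float:
    """Support-weighted mean of one column across classes."""
    total = sum(row[3] for row in report.values())
    return sum(row[col] * row[3] for row in report.values()) / total


# Macro values print as 0.8083, 0.8307 and 0.8173, matching the "macro avg" row.
for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:9s} macro={macro(col):.4f} weighted={weighted(col):.4f}")
```

The inputs are rounded to four decimals, so the recomputed weighted figures can differ from the reported ones in the last digit.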

## Usage

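```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# ModernBERT architectures require a recent transformers release (4.48 or later).
tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Topic")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Topic")
model.eval()  # disable dropout for inference

text = "No person shall keep any swine within the city limits."
# Truncate to the 1024-token maximum length used in training.
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():  # gradients are not needed for prediction
    logits = model(**enc).logits  # shape: (1, num_labels)
pred = logits.argmax(-1).item()  # index of the highest-scoring label
print(model.config.id2label[pred])  # one of the five labels listed above
```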
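The snippet above returns only the top label. A softmax converts the logits into probabilities, which shows how confident the model is across all five topics. The following plain-Python sketch illustrates the idea. The logits and the `id2label` ordering below are made up for illustration. With the real model, read the mapping from `model.config.id2label` and the scores from `logits[0].tolist()`.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits: List[float], id2label: Dict[int, str]) -> List[Tuple[str, float]]:
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda t: t[1], reverse=True)


# Hypothetical label order and logits, for illustration only.
id2label = {0: "Buildings", 1: "Business", 2: "Nuisance", 3: "Other", 4: "Zoning"}
logits = [0.2, -1.0, 3.1, 0.5, -0.3]

for label, p in rank_labels(logits, id2label):
    print(f"{label:10s} {p:.3f}")
```

In PyTorch, `torch.softmax(logits, dim=-1)` gives the same probabilities directly on the logits tensor.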