---
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- legal
- locus
- modernbert
license: apache-2.0
datasets:
- LocalLaws/LOCUS-v1.0
---

# LocalLaws/LOCUS-Topic

A ModernBERT classifier for the **Topic** axis of the LOCUS (Local Ordinances Corpus, United States) dataset.

Fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on [LocalLaws/LOCUS-v1.0](https://huggingface.co/datasets/LocalLaws/LOCUS-v1.0).

## Labels

- `Buildings`
- `Business`
- `Nuisance`
- `Other`
- `Zoning`

## Training

| | |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Max length | 1024 |
| Classifier pooling | `mean` |
| Train / val / test | 45183 / 5848 / 5928 |

## Evaluation

| | |
|---|---|
| Metric | macro-F1 |
| Validation macro-F1 | 0.8127 |
| Test macro-F1 | 0.8173 |
| Test accuracy | 0.8190 |

```
              precision    recall  f1-score   support

   Buildings     0.7438    0.8506    0.7936       877
    Business     0.8273    0.8381    0.8326       846
    Nuisance     0.7617    0.8419    0.7998       930
       Other     0.8916    0.7657    0.8239      2083
      Zoning     0.8169    0.8574    0.8367      1192

    accuracy                         0.8190      5928
   macro avg     0.8083    0.8307    0.8173      5928
weighted avg     0.8251    0.8190    0.8194      5928
```

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Topic")
model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Topic")
model.eval()

text = "No person shall keep any swine within the city limits."
enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    logits = model(**enc).logits
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```
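
The usage snippet prints only the top label. If per-label confidence scores are also useful, one option is to apply a softmax to the logits and rank the labels. The sketch below is a minimal, dependency-free illustration: `rank_labels` is a hypothetical helper, the `ID2LABEL` ordering is assumed from the label list for illustration only (the authoritative mapping is `model.config.id2label`), and the example logits are made-up values rather than real model output.

```python
import math

# Assumed label order, for illustration only. In practice, read the mapping
# from model.config.id2label on the loaded checkpoint.
ID2LABEL = {0: "Buildings", 1: "Business", 2: "Nuisance", 3: "Other", 4: "Zoning"}


def rank_labels(logits, id2label=ID2LABEL):
    """Return (label, probability) pairs sorted by descending probability."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    pairs = [(id2label[i], e / total) for i, e in enumerate(exps)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up logits for a single input; not produced by the model.
example_logits = [0.2, -1.0, 3.1, 0.5, -0.3]
for label, prob in rank_labels(example_logits):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage snippet, the same helper could be called as `rank_labels(logits[0].tolist(), model.config.id2label)`.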