	---
	base_model: answerdotai/ModernBERT-base
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- text-classification
	- legal
	- locus
	- modernbert
	license: apache-2.0
	datasets:
	- LocalLaws/LOCUS-v1.0
	---

	# LocalLaws/LOCUS-Substantive

	A ModernBERT classifier for the Substantive (binary) axis of the LOCUS
	(Local Ordinances Corpus, United States) dataset.

	Fine-tuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on
	[LocalLaws/LOCUS-v1.0](https://huggingface.co/datasets/LocalLaws/LOCUS-v1.0).

	## Labels

	- `not_substantive`
	- `substantive`

	## Training

	| Field | Value |
	|---|---|
	| Base model | `answerdotai/ModernBERT-base` |
	| Max length (tokens) | 1024 |
	| Classifier pooling | `mean` |
	| Train / val / test examples | 79106 / 10447 / 10447 |
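
With `mean` pooling, the classification head sees an average of the token embeddings rather than a single leading-token vector. The sketch below illustrates masked mean pooling in plain Python, assuming padding positions are excluded through the attention mask. The helper `mean_pool` and the toy vectors are hypothetical and only illustrate the idea; this is not the library's implementation.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors over positions where attention_mask is 1."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    # Keep only real tokens; padding positions carry a mask value of 0.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three real tokens and one padding position, hidden size 2.
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(vectors, mask)
print(pooled)  # [3.0, 4.0]
```

The padding vector `[9.0, 9.0]` does not affect the result, which is the point of weighting by the mask.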

	## Evaluation

	| Field | Value |
	|---|---|
	| Metric | binary-F1 |
	| Validation binary-F1 | 0.9402 |
	| Test binary-F1 | 0.9422 |
	| Test accuracy | 0.9328 |

	```
	              precision    recall  f1-score   support

	           0     0.9517    0.8898    0.9197      4519
	           1     0.9200    0.9656    0.9422      5928

	    accuracy                         0.9328     10447
	   macro avg     0.9358    0.9277    0.9310     10447
	weighted avg     0.9337    0.9328    0.9325     10447
	```
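
The summary rows of the report follow from the per-class rows, and the test binary-F1 of 0.9422 matches the F1 of class `1`. The sketch below recomputes the derived figures from the reported precision, recall, and support. The variable names are illustrative, and because the inputs are already rounded to four decimals, the results should agree with the report only to within rounding.

```python
# Per-class values copied from the report: (precision, recall, support).
report = {
    0: (0.9517, 0.8898, 4519),
    1: (0.9200, 0.9656, 5928),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in report.values())
f1_by_class = {label: f1(p, r) for label, (p, r, _) in report.items()}

# Accuracy equals the support-weighted mean of per-class recall.
accuracy = sum(r * s for _, r, s in report.values()) / total
# Macro F1 weights both classes equally; weighted F1 weights by support.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)
weighted_f1 = sum(f1_by_class[label] * s for label, (_, _, s) in report.items()) / total

print({label: round(value, 4) for label, value in f1_by_class.items()})
print(round(accuracy, 4), round(macro_f1, 4), round(weighted_f1, 4))
```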

	## Usage

	```python
	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	# Load the fine-tuned tokenizer and classifier from the Hub.
	tok = AutoTokenizer.from_pretrained("LocalLaws/LOCUS-Substantive")
	model = AutoModelForSequenceClassification.from_pretrained("LocalLaws/LOCUS-Substantive")
	model.eval()  # disable dropout for inference

	text = "No person shall keep any swine within the city limits."
	# Truncate to the 1024-token maximum length used in training.
	enc = tok(text, return_tensors="pt", truncation=True, max_length=1024)
	with torch.no_grad():  # no gradients are needed at inference time
	    logits = model(**enc).logits
	pred = logits.argmax(-1).item()  # index of the highest-scoring class
	print(model.config.id2label[pred])
	```
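
The usage snippet prints only the top label. When a confidence score is also useful, the raw logits can be turned into per-label probabilities with a softmax. The sketch below is dependency-free: the helper `label_probabilities`, the example logits, and the index-to-label order in `ID2LABEL` are assumptions for illustration, so in practice the mapping should be read from `model.config.id2label`.

```python
import math
from typing import Dict, Sequence

# Assumed index-to-label order; read model.config.id2label for the real mapping.
ID2LABEL = {0: "not_substantive", 1: "substantive"}


def label_probabilities(
    logits: Sequence[float], id2label: Dict[int, str] = ID2LABEL
) -> Dict[str, float]:
    """Softmax over raw logits, keyed by label name."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Hypothetical logits; with the usage snippet, pass logits[0].tolist() instead.
probs = label_probabilities([-1.2, 2.3])
print(max(probs, key=probs.get), probs)
```

A threshold on the `substantive` probability other than 0.5 could then trade precision against recall, depending on the downstream use.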