Jim Crow Law Classifier (ModernBERT-base)
A text classification model fine-tuned on biglam/on_the_books to identify Jim Crow laws in historical US legislative text.
Model Description
This model classifies sections of US state legislation as either Jim Crow laws (discriminatory laws targeting racial minorities) or non-Jim Crow laws. It was fine-tuned from answerdotai/ModernBERT-base, which supports up to 8,192 tokens of context.
Performance
Evaluated on a stratified 15% held-out test set (268 samples):
| Metric | Score |
|---|---|
| F1 | 0.9487 |
| Accuracy | 0.9701 |
| Precision | 0.9367 |
| Recall | 0.9610 |
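The four scores are mutually consistent. As a sanity check, the sketch below recomputes them from confusion-matrix counts, treating `jim_crow` as the positive class (an assumption). The counts themselves are also an assumption: they are back-calculated from the reported precision, recall and accuracy (77 `jim_crow` and 191 `no_jim_crow` sections in the 268-sample test set, close to the 29% share described under Dataset). They are not taken from the model's actual predictions.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1 and accuracy for the positive class (assumed: jim_crow)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Back-calculated, NOT published: one confusion matrix consistent with the table above,
# assuming 77 jim_crow and 191 no_jim_crow sections in the 268-sample test set.
m = binary_metrics(tp=74, fp=5, fn=3, tn=186)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# precision: 0.9367
# recall: 0.9610
# f1: 0.9487
# accuracy: 0.9701
```

Because F1 reduces to 2·TP / (2·TP + FP + FN), it sits between precision and recall, which is what the table shows.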
Training Details
- Base model: answerdotai/ModernBERT-base (149M parameters)
- Dataset: biglam/on_the_books (1,785 samples total; 1,517 train / 268 test)
- Max sequence length: 1024 tokens
- Epochs: 5 (best checkpoint at epoch 5 by F1)
- Batch size: 16
- Learning rate: 2e-5 with linear decay
- Warmup: 6% of training steps
- Weight decay: 0.01
- Hardware: NVIDIA T4 GPU
- Training time: ~8 minutes
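To make the schedule concrete, the sketch below works out the optimizer step counts implied by the hyperparameters above and the learning rate at a given step. It assumes training on a single GPU with no gradient accumulation, that the last partial batch of each epoch is kept, and that the 6% warmup is rounded up to a whole step; none of these details are stated in the card. The shape (linear warmup from 0, then linear decay to 0) mirrors `transformers`' `get_linear_schedule_with_warmup`, but `linear_warmup_lr` is a hypothetical helper, not the training code.

```python
import math

def linear_warmup_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float = 2e-5) -> float:
    """Hypothetical helper: learning rate at `step` with linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

train_samples, batch_size, epochs, warmup_ratio = 1517, 16, 5, 0.06
steps_per_epoch = math.ceil(train_samples / batch_size)  # partial last batch kept: 95
total_steps = steps_per_epoch * epochs                   # 475
warmup_steps = math.ceil(total_steps * warmup_ratio)     # 6% of 475 = 28.5, rounded up to 29

print(steps_per_epoch, total_steps, warmup_steps)          # 95 475 29
print(linear_warmup_lr(warmup_steps, total_steps, warmup_steps))  # 2e-05 (peak)
```

Under these assumptions the learning rate peaks at 2e-5 after 29 steps and reaches 0 at step 475, the end of epoch 5.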
Usage
```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="davanstrien/jim-crow-laws-ml-agent")

text = "The Commission shall provide separate sleeping quarters and separate eating space for the different races."

# Returns a list with one {'label', 'score'} dict per input text
result = classifier(text)
print(result)
# [{'label': 'jim_crow', 'score': 0.99...}]
```
Dataset
The On the Books dataset contains 1,785 sections of North Carolina state legislation from the Jim Crow era, annotated by historians as either Jim Crow laws or non-Jim Crow laws. The dataset is imbalanced: 71% non-Jim Crow, 29% Jim Crow.
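Because of this imbalance, the held-out test set described under Performance was stratified so that it keeps roughly the same class ratio. The card does not say how the split was produced; the sketch below is one way to do it in plain Python, using a hypothetical `stratified_split` helper and a toy 71/29 label list in place of the real labels. It rounds the test size up (as scikit-learn does for a fractional `test_size`), which turns 15% of 1,785 into the 268 test samples reported above, and gives any leftover test slots to the classes with the largest fractional remainders.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_percent=15, seed=0):
    """Hypothetical helper: split indices so each class keeps its share in the test set."""
    n = len(labels)
    n_test = -(-n * test_percent // 100)  # ceiling division: 15% of 1,785 -> 268
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Proportional quota per class (rounded down), then hand leftover slots
    # to the classes with the largest fractional remainders.
    quota = {c: len(idx) * n_test // n for c, idx in by_class.items()}
    leftover = n_test - sum(quota.values())
    by_remainder = sorted(by_class, key=lambda c: len(by_class[c]) * n_test % n, reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    # Shuffle within each class, then take that class's quota for the test set
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for c, idx in by_class.items():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        test_idx += shuffled[: quota[c]]
        train_idx += shuffled[quota[c]:]
    return sorted(train_idx), sorted(test_idx)

# Toy labels in the card's 71/29 ratio (0 = no_jim_crow, 1 = jim_crow); not the real data
labels = [0] * 71 + [1] * 29
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))  # 85 15 4
```

In practice, scikit-learn's `train_test_split(..., test_size=0.15, stratify=labels)` does the same job; the hand-rolled version only makes the rounding explicit.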
Labels
- `no_jim_crow` (0): Non-discriminatory legislation
- `jim_crow` (1): Jim Crow law (racially discriminatory legislation)
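When comparing the pipeline's string labels with the dataset's integer labels, it helps to map names back to IDs. A minimal sketch, assuming the model config's `id2label` matches the mapping above; the predictions and gold labels below are made-up placeholders in the pipeline's `[{'label': ..., 'score': ...}]` output format, not real model output.

```python
# Assumed to mirror the model config's id2label mapping listed above
ID2LABEL = {0: "no_jim_crow", 1: "jim_crow"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

def predictions_to_ids(predictions):
    """Convert text-classification pipeline output (one dict per input) to integer label IDs."""
    return [LABEL2ID[p["label"]] for p in predictions]

# Hypothetical pipeline output for three sections, and hypothetical gold labels
preds = [
    {"label": "jim_crow", "score": 0.98},
    {"label": "no_jim_crow", "score": 0.91},
    {"label": "jim_crow", "score": 0.64},
]
gold = [1, 0, 0]

pred_ids = predictions_to_ids(preds)
accuracy = sum(p == g for p, g in zip(pred_ids, gold)) / len(gold)
print(pred_ids, round(accuracy, 3))  # [1, 0, 1] 0.667
```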
Limitations
- Trained only on North Carolina legislation; may not generalize to other states
- Historical language patterns may not transfer to modern legal text
- The model may be biased toward the specific annotation criteria used in the dataset