Jim Crow Law Classifier (ModernBERT-base)

A text classification model fine-tuned on biglam/on_the_books to identify Jim Crow laws in historical US legislative text.

Model Description

This model classifies sections of US state legislation as either Jim Crow laws (discriminatory laws targeting racial minorities) or non-Jim Crow laws. It was fine-tuned from answerdotai/ModernBERT-base, which supports up to 8,192 tokens of context; fine-tuning itself used a maximum sequence length of 1,024 tokens (see Training Details).

Performance

Evaluated on a stratified 15% held-out test set (268 samples):

Metric      Score
F1          0.9487
Accuracy    0.9701
Precision   0.9367
Recall      0.9610
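
The four scores are tied together through the confusion matrix, which this card does not report. As a sanity check, the sketch below uses hypothetical counts (`tp`, `fp`, `fn`, `tn`) that were picked only because they sum to the 268 test samples and round to the scores listed above; the real counts may differ.

```python
# Hypothetical confusion-matrix counts: NOT reported in this model card.
# They are an assumption, chosen because they sum to 268 and round to the
# published scores. Positive class = jim_crow.
tp, fp, fn, tn = 74, 5, 3, 186

precision = tp / (tp + fp)                           # share of predicted jim_crow that is correct
recall = tp / (tp + fn)                              # share of true jim_crow that is found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all sections labelled correctly

for name, value in [("F1", f1), ("Accuracy", accuracy),
                    ("Precision", precision), ("Recall", recall)]:
    print(f"{name:<10} {value:.4f}")
```

Under these assumed counts the model would miss 3 Jim Crow sections and raise 5 false alarms on the test set, which is one way to read the gap between recall and precision.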

Training Details

  • Base model: answerdotai/ModernBERT-base (149M parameters)
  • Dataset: biglam/on_the_books (1,785 samples total; 1,517 train / 268 test)
  • Max sequence length: 1024 tokens
  • Epochs: 5 (best checkpoint at epoch 5 by F1)
  • Batch size: 16
  • Learning rate: 2e-5 with linear decay
  • Warmup: 6% of training steps
  • Weight decay: 0.01
  • Hardware: NVIDIA T4 GPU
  • Training time: ~8 minutes
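
To make the numbers above concrete, here is a minimal sketch of the split sizes, step counts, and learning-rate schedule they imply. It assumes a single GPU, no gradient accumulation, a kept final partial batch, and warmup steps rounded up; none of these details are stated in this card, and the function name `learning_rate` is purely illustrative.

```python
import math

# Values taken from the list above.
TOTAL_SAMPLES = 1785
TEST_FRACTION = 0.15
BATCH_SIZE = 16
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_RATIO = 0.06

# Split sizes: the test share is rounded up (assumption).
test_size = math.ceil(TOTAL_SAMPLES * TEST_FRACTION)
train_size = TOTAL_SAMPLES - test_size

# Optimizer steps, assuming one step per batch and a kept last partial batch.
steps_per_epoch = math.ceil(train_size / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to zero (illustrative)."""
    if step < warmup_steps:
        return BASE_LR * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * (remaining / (total_steps - warmup_steps))


print(f"train={train_size} test={test_size}")
print(f"total_steps={total_steps} warmup_steps={warmup_steps}")
print(f"lr at step 0: {learning_rate(0):.2e}, at peak: {learning_rate(warmup_steps):.2e}")
```

Under these assumptions the split reproduces the 1,517 / 268 figures, and the run is short (a few hundred optimizer steps), which fits the reported training time of about 8 minutes on a T4.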

Usage

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="davanstrien/jim-crow-laws-ml-agent")

# A single section of legislative text to classify.
text = "The Commission shall provide separate sleeping quarters and separate eating space for the different races."
result = classifier(text)
print(result)
# [{'label': 'jim_crow', 'score': 0.99...}]
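
When screening many sections at once, it can help to keep only confident `jim_crow` predictions for human review. The helper below is a hypothetical sketch, not part of this model or of transformers: `flag_jim_crow` and its `threshold` default are illustrative, and `classify` is any callable that returns pipeline-style dictionaries with `label` and `score` keys.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Prediction = Dict[str, object]  # e.g. {"label": "jim_crow", "score": 0.99}


def flag_jim_crow(
    classify: Callable[[List[str]], List[Prediction]],
    sections: Sequence[str],
    threshold: float = 0.9,  # illustrative cut-off, not a tuned value
) -> List[Tuple[str, float]]:
    """Return (section, score) pairs predicted as jim_crow at or above threshold."""
    predictions = classify(list(sections))
    flagged = []
    for section, prediction in zip(sections, predictions):
        score = float(prediction["score"])
        if prediction["label"] == "jim_crow" and score >= threshold:
            flagged.append((section, score))
    return flagged
```

With the pipeline loaded as above, one plausible wiring is `flag_jim_crow(lambda texts: classifier(texts, truncation=True, max_length=1024), sections)`. Truncating to 1,024 tokens mirrors the training sequence length; whether that is the best choice for very long sections is an open question.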

Dataset

The On the Books dataset contains 1,785 sections of North Carolina state legislation from the Jim Crow era, annotated by historians as either Jim Crow laws or non-Jim Crow laws. The dataset is imbalanced: 71% non-Jim Crow, 29% Jim Crow.

Labels

  • no_jim_crow (0): Non-discriminatory legislation
  • jim_crow (1): Jim Crow law (racially discriminatory legislation)
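
For readers working with raw model outputs rather than the pipeline, the sketch below shows how two logits would map onto these labels. It assumes the usual single-label setup, where a softmax over the two logits gives the score; `ID2LABEL` and `logits_to_prediction` are illustrative names, not identifiers taken from the model's code.

```python
import math
from typing import Dict, Sequence, Union

# Index-to-label mapping as listed above.
ID2LABEL = {0: "no_jim_crow", 1: "jim_crow"}


def logits_to_prediction(logits: Sequence[float]) -> Dict[str, Union[str, float]]:
    """Apply a numerically stable softmax and return the top label with its score."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]  # subtract max for stability
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)    # argmax over class indices
    return {"label": ID2LABEL[best], "score": probs[best]}


# Example with made-up logits that favour class 1.
print(logits_to_prediction([-2.0, 3.0]))
```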

Limitations

  • Trained only on North Carolina legislation; may not generalize to other states
  • Historical language patterns may not transfer to modern legal text
  • The model may be biased toward the specific annotation criteria used in the dataset