---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- biglam/on_the_books
language:
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- legal
- glam
- digital-humanities
- jim-crow
- north-carolina
- legislation
- generated_from_trainer
metrics:
- f1
- accuracy
- roc_auc
model-index:
- name: dhd-demo
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: biglam/on_the_books
      type: biglam/on_the_books
      split: train (held-out 10%)
    metrics:
    - type: accuracy
      value: 0.9832
    - type: f1
      value: 0.9709
    - type: precision
      value: 0.9615
    - type: recall
      value: 0.9804
    - type: f1_macro
      value: 0.9796
    - type: roc_auc
      value: 0.9980
---
# dhd-demo: ModernBERT Jim Crow law classifier
Fine-tuned [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on
[`biglam/on_the_books`](https://huggingface.co/datasets/biglam/on_the_books) to classify North
Carolina session-law sections (1866–1967) as Jim Crow laws or not.
Built as a live demo for the *Digital Humanities & Discovery* webinar
(2026-05-05) showing end-to-end fine-tuning via `hf jobs`.
## Labels
- `0` = `no_jim_crow`
- `1` = `jim_crow`
## Training data
[`biglam/on_the_books`](https://huggingface.co/datasets/biglam/on_the_books): 1,785 expert-labeled chapter/section pairs from NC session
laws (512 positive, 1,273 negative), split 90/10 (stratified) into train and eval sets.
Class imbalance handled with inverse-frequency cross-entropy weights.
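The card does not say which normalization the inverse-frequency weights use. As a minimal sketch, assuming the common "balanced" convention `n_total / (n_classes * n_c)` and using the full-dataset counts quoted above (the real run presumably computed them on the training split, which has nearly the same ratio after stratification):

```python
# Class counts taken from the dataset description above.
counts = {"no_jim_crow": 1273, "jim_crow": 512}


def inverse_frequency_weights(class_counts):
    """Return n_total / (n_classes * n_c) for each class c.

    Assumption: this "balanced" scaling is one common convention; the card
    does not state the exact formula used in training.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * n) for name, n in class_counts.items()}


weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.4f}")
```

Under this convention the minority `jim_crow` class is weighted about 2.5 times more heavily than `no_jim_crow`. In PyTorch, weights like these would typically be passed, ordered by label id, as the `weight` tensor of `torch.nn.CrossEntropyLoss`.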
## Training setup
| Setting | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Epochs | 4 |
| Batch size | 16 |
| Learning rate | 5e-5 |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Max sequence length | 1024 |
| Precision | bf16 |
| Loss | weighted cross-entropy |
| Seed | 42 |
| Hardware | 1× NVIDIA L4 (24 GB) via `hf jobs` |
| Train runtime | 223 s |
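To put the warmup setting in context, the table values can be turned into an approximate optimizer-step budget. This is a rough sketch, not the actual training script (which is not shown in this card). It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The dictionary keys follow `transformers.TrainingArguments` keyword names, but the dictionary itself is illustrative.

```python
import math

# Hyperparameters from the table, keyed by TrainingArguments keyword names.
# Assumption: illustrative only; the card does not include the training script.
hparams = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "warmup_steps": 50,
    "weight_decay": 0.01,
    "bf16": True,
    "seed": 42,
}

n_examples = 1785            # full dataset size
n_eval = 179                 # held-out split size reported in the card
n_train = n_examples - n_eval

# Assumption: one device, no gradient accumulation, partial last batch kept.
steps_per_epoch = math.ceil(n_train / hparams["per_device_train_batch_size"])
total_steps = steps_per_epoch * hparams["num_train_epochs"]
warmup_fraction = hparams["warmup_steps"] / total_steps

print(f"train examples:  {n_train}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup fraction: {warmup_fraction:.1%}")
```

Under these assumptions, the 50 warmup steps cover roughly the first half of epoch 1.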
## Evaluation (held-out 10% split, n=179)
| Metric | Value |
|---|---|
| Accuracy | 0.9832 |
| F1 (positive class) | 0.9709 |
| Precision | 0.9615 |
| Recall | 0.9804 |
| F1 (macro) | 0.9796 |
| ROC-AUC | 0.9980 |
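The card reports no confusion matrix, but the metrics above are consistent with one particular set of counts on the 179-example eval split. The counts below are inferred from the reported values, not taken from the source. The sketch recomputes each threshold-based metric from them as a sanity check:

```python
# Inferred counts (assumption): 51 positives and 128 negatives in the eval split.
tp, fp, fn, tn = 50, 2, 1, 126

n = tp + fp + fn + tn
accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_pos = 2 * tp / (2 * tp + fp + fn)   # F1 for the jim_crow class
f1_neg = 2 * tn / (2 * tn + fp + fn)   # F1 for the no_jim_crow class
f1_macro = (f1_pos + f1_neg) / 2

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("f1 (positive)", f1_pos),
    ("f1 (macro)", f1_macro),
]:
    print(f"{name:14s} {value:.4f}")
```

Read this way, the final checkpoint makes three errors on the eval split: two false positives and one missed Jim Crow law. ROC-AUC cannot be recovered from counts alone, since it depends on the ranking of scores.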
### Per-epoch results
| Epoch | Train loss | Val loss | Accuracy | F1 (positive class) | Precision | Recall | ROC-AUC |
|------:|-----------:|---------:|---------:|----:|----------:|-------:|--------:|
| 1 | 0.0856 | 0.1061 | 0.9553 | 0.9273 | 0.8644 | 1.0000 | 0.9960 |
| 2 | 0.0353 | 0.0538 | 0.9777 | 0.9615 | 0.9434 | 0.9804 | 0.9989 |
| 3 | 0.0015 | 0.1310 | 0.9777 | 0.9600 | 0.9796 | 0.9412 | 0.9980 |
| 4 | 0.0019 | 0.0949 | **0.9832** | **0.9709** | 0.9615 | 0.9804 | 0.9980 |
## Usage
```python
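from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
clf = pipeline("text-classification", model="davanstrien/dhd-demo")
# Returns a list with one dict per input, each holding a "label" and a "score"
print(clf("All schools for the white and colored races shall be kept separate."))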
```
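The pipeline returns dictionaries with a `label` string and a `score`. The card does not say whether the model config maps ids to the names in the Labels section or leaves the default `LABEL_0` / `LABEL_1` strings. The hypothetical helper below (`is_flagged` is not part of any library) accepts both forms and turns a prediction into a boolean flag:

```python
# Assumption: the positive class may appear under any of these label strings,
# depending on how id2label is set in the model config.
POSITIVE_LABELS = {"jim_crow", "label_1", "1"}


def is_flagged(prediction, threshold=0.5):
    """Return True if a pipeline prediction marks the text as a Jim Crow law.

    `prediction` is one dict from the pipeline output, e.g.
    {"label": "jim_crow", "score": 0.99}. The score belongs to the
    predicted label, so it is only compared when that label is positive.
    """
    label = str(prediction["label"]).lower()
    return label in POSITIVE_LABELS and prediction["score"] >= threshold


# Hand-written example predictions, not real model output.
print(is_flagged({"label": "jim_crow", "score": 0.97}))     # True
print(is_flagged({"label": "LABEL_0", "score": 0.99}))      # False
print(is_flagged({"label": "LABEL_1", "score": 0.62}, 0.9)) # False
```

Raising `threshold` trades recall for precision, which may suit workflows where every flagged section goes to a human reviewer.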
## Limitations
- Trained on **North Carolina** laws, 1866–1967. Will not transfer cleanly to
other jurisdictions or modern legal language.
- The training labels reflect what named expert sources and project staff
flagged. The negative class means "not flagged," not "verified
non-discriminatory."
- OCR noise from period scans is present in training and will be present at
inference time on similar corpora.
- Eval set is small (n=179); treat the high metrics as encouraging but
bounded by sample size.
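To make "bounded by sample size" concrete, one option is a Wilson score interval around the reported proportions. This is a sketch under stated assumptions: the counts 176/179 (accuracy) and 50/51 (recall) are inferred from the reported metrics rather than given in the card, and a 95% interval (`z = 1.96`) is an arbitrary but conventional choice.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumption: counts inferred from accuracy 0.9832 (n=179) and recall 0.9804.
acc_lo, acc_hi = wilson_interval(176, 179)
rec_lo, rec_hi = wilson_interval(50, 51)

print(f"accuracy 95% interval: {acc_lo:.3f} to {acc_hi:.3f}")
print(f"recall   95% interval: {rec_lo:.3f} to {rec_hi:.3f}")
```

The recall interval is noticeably wider than the accuracy interval because it rests on only about 51 positive examples.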
See the [dataset card](https://huggingface.co/datasets/biglam/on_the_books) for full
context, including the *Algorithms of Resistance* framing of the original
**On the Books** project at UNC Chapel Hill Libraries.
## Citation
Please cite the original project:
> On the Books: Jim Crow and Algorithms of Resistance.
> University of North Carolina at Chapel Hill Libraries.
> https://onthebooks.lib.unc.edu — DOI: https://doi.org/10.17615/5c4g-sd44
## Framework versions
- Transformers 5.7.0
- PyTorch 2.11.0+cu130
- Datasets 4.8.5
- Tokenizers 0.22.2