---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
datasets:
- biglam/on_the_books
language:
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- legal
- glam
- digital-humanities
- jim-crow
- north-carolina
- legislation
- generated_from_trainer
metrics:
- f1
- accuracy
- roc_auc
model-index:
- name: dhd-demo
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: biglam/on_the_books
      type: biglam/on_the_books
      split: train (held-out 10%)
    metrics:
    - type: accuracy
      value: 0.9832
    - type: f1
      value: 0.9709
    - type: precision
      value: 0.9615
    - type: recall
      value: 0.9804
    - type: f1_macro
      value: 0.9796
    - type: roc_auc
      value: 0.9980
---
# dhd-demo: ModernBERT Jim Crow law classifier

Fine-tuned [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on
[`biglam/on_the_books`](https://huggingface.co/datasets/biglam/on_the_books) to classify North
Carolina session-law sections (1866–1967) as Jim Crow laws or not.

Built as a live demo for the *Digital Humanities & Discovery* webinar
(2026-05-05) showing end-to-end fine-tuning via `hf jobs`.
## Labels

- `0` = `no_jim_crow`
- `1` = `jim_crow`
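If the hosted config does not set `id2label`, the pipeline returns generic `LABEL_{i}` names. A minimal sketch for mapping those back to the labels above (the `readable` helper and the `LABEL_{i}` output format are assumptions, not part of this repo):

```python
# Map integer class ids to the label names documented above.
ID2LABEL = {0: "no_jim_crow", 1: "jim_crow"}

def readable(pred: dict) -> dict:
    """Rewrite a pipeline prediction such as {'label': 'LABEL_1', 'score': 0.98}."""
    idx = int(pred["label"].rsplit("_", 1)[-1])
    return {**pred, "label": ID2LABEL[idx]}

print(readable({"label": "LABEL_1", "score": 0.98}))
# → {'label': 'jim_crow', 'score': 0.98}
```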
## Training data

[`biglam/on_the_books`](https://huggingface.co/datasets/biglam/on_the_books) — 1,785 expert-labeled chapter/section pairs from NC session
laws, 512 positive / 1,273 negative. Split 90/10 (stratified) for train/eval.
Class imbalance handled with inverse-frequency cross-entropy weights.
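The inverse-frequency weights follow directly from the class counts above. A sketch (the exact normalisation used in training is an assumption):

```python
# Class counts from the dataset: 1,273 negative, 512 positive.
counts = {"no_jim_crow": 1273, "jim_crow": 512}
total = sum(counts.values())  # 1,785 labeled sections

# Inverse-frequency weight per class: total / (n_classes * count).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(weights)  # the minority class (jim_crow) gets the larger weight
```

In training these would typically be turned into a tensor and passed as `weight=` to `torch.nn.CrossEntropyLoss`, e.g. from a `Trainer` subclass that overrides `compute_loss`.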
## Training setup

| Setting | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Epochs | 4 |
| Batch size | 16 |
| Learning rate | 5e-5 |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Max sequence length | 1024 |
| Precision | bf16 |
| Loss | weighted cross-entropy |
| Seed | 42 |
| Hardware | 1× NVIDIA L4 (24 GB) via `hf jobs` |
| Train runtime | 223 s |
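The table maps roughly onto `transformers.TrainingArguments` as below (a sketch, not the exact training script; `output_dir` is a placeholder, and the 1024 max sequence length is set on the tokenizer rather than here):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="dhd-demo",            # placeholder path
    num_train_epochs=4,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,                        # the L4 supports bf16
    seed=42,
)
```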
## Evaluation (held-out 10% split, n=179)

| Metric | Value |
|---|---|
| Accuracy | 0.9832 |
| F1 (positive class) | 0.9709 |
| Precision | 0.9615 |
| Recall | 0.9804 |
| F1 (macro) | 0.9796 |
| ROC-AUC | 0.9980 |
### Per-epoch results

| Epoch | Train loss | Val loss | Accuracy | F1 | Precision | Recall | ROC-AUC |
|------:|-----------:|---------:|---------:|----:|----------:|-------:|--------:|
| 1 | 0.0856 | 0.1061 | 0.9553 | 0.9273 | 0.8644 | 1.0000 | 0.9960 |
| 2 | 0.0353 | 0.0538 | 0.9777 | 0.9615 | 0.9434 | 0.9804 | 0.9989 |
| 3 | 0.0015 | 0.1310 | 0.9777 | 0.9600 | 0.9796 | 0.9412 | 0.9980 |
| 4 | 0.0019 | 0.0949 | **0.9832** | **0.9709** | 0.9615 | 0.9804 | 0.9980 |
## Usage

```python
from transformers import pipeline

clf = pipeline("text-classification", model="davanstrien/dhd-demo")
clf("All schools for the white and colored races shall be kept separate.")
```
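Session-law sections longer than the 1024-token training length will be truncated by the tokenizer. One pragmatic workaround is to split long documents into overlapping word windows first (a sketch; the 700-word budget is a rough heuristic for 1024 tokens, not an exact count):

```python
def chunk_text(text: str, max_words: int = 700, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping word windows so each
    chunk stays within the model's 1024-token context."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)] or [text]

chunks = chunk_text("section " * 1500)
print(len(chunks))  # 3 windows for a 1,500-word document
```

Each chunk can then be passed to the pipeline, taking e.g. the maximum positive-class score across chunks as the document-level prediction.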
## Limitations

- Trained on **North Carolina** laws, 1866–1967. Will not transfer cleanly to
  other jurisdictions or modern legal language.
- The training labels reflect what named expert sources / project staff
  flagged. The negative class is "not flagged," not "verified
  non-discriminatory."
- OCR noise from period scans is present in training and will be present at
  inference time on similar corpora.
- Eval set is small (n=179); treat the high metrics as encouraging but
  bounded by sample size.

See the [dataset card](https://huggingface.co/datasets/biglam/on_the_books) for full
context, including the *Algorithms of Resistance* framing of the original
**On the Books** project at UNC Chapel Hill Libraries.
## Citation

Please cite the original project:

> On the Books: Jim Crow and Algorithms of Resistance.
> University of North Carolina at Chapel Hill Libraries.
> https://onthebooks.lib.unc.edu — DOI: https://doi.org/10.17615/5c4g-sd44
## Framework versions

- Transformers 5.7.0
- PyTorch 2.11.0+cu130
- Datasets 4.8.5
- Tokenizers 0.22.2