Biblical Entity Recognizer (Chirho) - Model 6
A DistilBERT-based Named Entity Recognition model fine-tuned on 200K+ annotated tokens from the King James Version (KJV) Bible to recognize six types of biblical entities using BIO tagging.
Repo: LoveJesus/biblical-entity-recognizer-chirho
Model Overview
| Property | Value |
|---|---|
| Base Model | distilbert-base-uncased |
| Parameters | ~66M |
| Task | Token Classification (NER) |
| Tagging Scheme | BIO (Beginning-Inside-Outside) |
| Number of Labels | 13 |
| Entity Types | 6 |
| Max Sequence Length | 128 tokens |
| Framework | HuggingFace Transformers |
| License | MIT |
Entity Types
| Entity Type | BIO Tags | Description | Examples |
|---|---|---|---|
| PERSON | B-PERSON, I-PERSON | Biblical persons and figures | Moses, David, Paul, Mary |
| DIVINE | B-DIVINE, I-DIVINE | Names and titles of God | God, LORD, Jesus Christ, Holy Spirit |
| PEOPLE_GROUP | B-PEOPLE_GROUP, I-PEOPLE_GROUP | Nations, tribes, and groups | Israelites, Philistines, Pharisees, Corinthians |
| PLACE | B-PLACE, I-PLACE | Geographical locations | Jerusalem, Bethlehem, Egypt, Gethsemane |
| EVENT | B-EVENT, I-EVENT | Biblical events and feasts | Passover, Pentecost, Sabbath |
| ARTIFACT | B-ARTIFACT, I-ARTIFACT | Sacred objects and instruments | Urim, Thummim |
Full Label Set (13 Labels)
O, B-PERSON, I-PERSON, B-DIVINE, I-DIVINE, B-PEOPLE_GROUP, I-PEOPLE_GROUP,
B-PLACE, I-PLACE, B-EVENT, I-EVENT, B-ARTIFACT, I-ARTIFACT
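The 13 labels follow mechanically from the six entity types: one `O` tag plus a `B-`/`I-` pair per type. The sketch below builds the two lookup tables under the assumption that label ids follow the listing order above. The authoritative mapping is whatever `model.config.id2label` reports for the published checkpoint, and the names `ENTITY_TYPES_CHIRHO`, `label2id_chirho`, and `id2label_chirho` are illustrative.

```python
# Entity types in the order used by the label list above.
ENTITY_TYPES_CHIRHO = ["PERSON", "DIVINE", "PEOPLE_GROUP", "PLACE", "EVENT", "ARTIFACT"]

# "O" first, then a B-/I- pair for each entity type.
# Assumption: ids follow this listing order; verify against model.config.id2label.
labels_chirho = ["O"] + [
    f"{prefix_chirho}-{type_chirho}"
    for type_chirho in ENTITY_TYPES_CHIRHO
    for prefix_chirho in ("B", "I")
]

label2id_chirho = {label: idx for idx, label in enumerate(labels_chirho)}
id2label_chirho = {idx: label for label, idx in label2id_chirho.items()}

print(len(labels_chirho))   # 13
print(labels_chirho[:3])    # ['O', 'B-PERSON', 'I-PERSON']
```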
Evaluation Results
| Metric | Score |
|---|---|
| F1 | 0.9810 |
| Precision | 0.9778 |
| Recall | 0.9843 |
| Best Epoch | 4 |
Entity-level metrics are computed with the seqeval library, which scores complete entity spans rather than individual token labels: a predicted entity counts as correct only when both its boundaries and its type match the gold annotation.
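To make the difference between span-level and token-level scoring concrete, here is a simplified pure-Python re-implementation of the idea. It is an illustration only, not seqeval's actual code, and the function names are hypothetical. In the example, three of four token labels agree, yet only one of two entities is counted as correct.

```python
def extract_spans_chirho(tags_chirho):
    """Return the set of (entity_type, start, end) spans in one BIO tag sequence."""
    spans_chirho = set()
    start_chirho, type_chirho = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for idx_chirho, tag_chirho in enumerate(list(tags_chirho) + ["O"]):
        continues_chirho = tag_chirho.startswith("I-") and tag_chirho[2:] == type_chirho
        if type_chirho is not None and not continues_chirho:
            spans_chirho.add((type_chirho, start_chirho, idx_chirho))
            start_chirho, type_chirho = None, None
        # A B- tag opens a span; a stray I- tag is leniently treated as an opening too.
        if tag_chirho.startswith("B-") or (tag_chirho.startswith("I-") and type_chirho is None):
            start_chirho, type_chirho = idx_chirho, tag_chirho[2:]
    return spans_chirho


def span_scores_chirho(gold_seqs_chirho, pred_seqs_chirho):
    """Entity-level precision, recall, and F1 over parallel lists of tag sequences."""
    gold_chirho, pred_chirho = set(), set()
    for sent_chirho, (g_chirho, p_chirho) in enumerate(zip(gold_seqs_chirho, pred_seqs_chirho)):
        gold_chirho |= {(sent_chirho, *span) for span in extract_spans_chirho(g_chirho)}
        pred_chirho |= {(sent_chirho, *span) for span in extract_spans_chirho(p_chirho)}
    hits_chirho = len(gold_chirho & pred_chirho)
    precision_chirho = hits_chirho / len(pred_chirho) if pred_chirho else 0.0
    recall_chirho = hits_chirho / len(gold_chirho) if gold_chirho else 0.0
    total_chirho = precision_chirho + recall_chirho
    f1_chirho = 2 * precision_chirho * recall_chirho / total_chirho if total_chirho else 0.0
    return precision_chirho, recall_chirho, f1_chirho


gold_chirho = [["B-PERSON", "O", "B-DIVINE", "I-DIVINE"]]
pred_chirho = [["B-PERSON", "O", "B-DIVINE", "O"]]  # DIVINE span cut short
print(span_scores_chirho(gold_chirho, pred_chirho))  # (0.5, 0.5, 0.5)
```

The same harmonic-mean formula reproduces the reported overall F1: 2 × 0.9778 × 0.9843 / (0.9778 + 0.9843) ≈ 0.9810.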
Per-Entity F1
| Entity Type | F1 |
|---|---|
| DIVINE | 0.9993 |
| PLACE | 0.9742 |
| PERSON | 0.9617 |
| PEOPLE_GROUP | 0.9512 |
| Overall (macro) | 0.9786 |
Per-entity metrics were evaluated with seqeval on the held-out test set. The model is strongest on divine names (F1 = 0.9993) and exceeds 0.95 F1 on each of the four entity types listed above.
Usage
Quick Start: Pipeline API
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
from transformers import pipeline
ner_pipeline_chirho = pipeline(
    "token-classification",
    model="LoveJesus/biblical-entity-recognizer-chirho",
    aggregation_strategy="simple",
)
text_chirho = "And Moses said unto the LORD in the land of Egypt"
entities_chirho = ner_pipeline_chirho(text_chirho)
for entity_chirho in entities_chirho:
    print(f"{entity_chirho['word']}: {entity_chirho['entity_group']} ({entity_chirho['score']:.3f})")
# Output (the uncased tokenizer lowercases the reported words):
# moses: PERSON (0.998)
# lord: DIVINE (0.999)
# egypt: PLACE (0.997)
Manual Inference
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/biblical-entity-recognizer-chirho")
model_chirho = AutoModelForTokenClassification.from_pretrained("LoveJesus/biblical-entity-recognizer-chirho")
text_chirho = "Then Jesus went with them unto a place called Gethsemane."
inputs_chirho = tokenizer_chirho(text_chirho, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs_chirho = model_chirho(**inputs_chirho)
predictions_chirho = torch.argmax(outputs_chirho.logits, dim=2)
tokens_chirho = tokenizer_chirho.convert_ids_to_tokens(inputs_chirho["input_ids"][0])
pred_ids_chirho = predictions_chirho[0].tolist()
id2label_chirho = model_chirho.config.id2label
for token_chirho, pred_id_chirho in zip(tokens_chirho, pred_ids_chirho):
    label_chirho = id2label_chirho[pred_id_chirho]
    if label_chirho != "O" and token_chirho not in ["[CLS]", "[SEP]", "[PAD]"]:
        print(f" {token_chirho}: {label_chirho}")
# jesus: B-DIVINE
# gethsemane: B-PLACE
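The loop above prints one line per WordPiece token, so a rare name that the tokenizer splits into several pieces would appear as several lines. Because only the first piece of each word carried a label during training (see Subword Alignment), the predictions on `##` continuation pieces are not meaningful. A minimal sketch of merging pieces back into words while keeping the first piece's label follows; the helper name and the example split of "gethsemane" are hypothetical and were not taken from the real tokenizer.

```python
def merge_wordpieces_chirho(tokens_chirho, labels_chirho):
    """Merge '##' continuation pieces into whole words, keeping the first piece's label."""
    words_chirho = []
    for token_chirho, label_chirho in zip(tokens_chirho, labels_chirho):
        if token_chirho in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # special tokens carry no entity information
        if token_chirho.startswith("##") and words_chirho:
            prev_word_chirho, prev_label_chirho = words_chirho[-1]
            # Append the piece text; its own predicted label is discarded.
            words_chirho[-1] = (prev_word_chirho + token_chirho[2:], prev_label_chirho)
        else:
            words_chirho.append((token_chirho, label_chirho))
    return words_chirho


# Hypothetical tokenization and per-token labels, for illustration only.
example_tokens_chirho = ["[CLS]", "a", "place", "called", "get", "##hs", "##ema", "##ne", ".", "[SEP]"]
example_labels_chirho = ["O", "O", "O", "O", "B-PLACE", "O", "I-PLACE", "O", "O", "O"]

merged_chirho = merge_wordpieces_chirho(example_tokens_chirho, example_labels_chirho)
print([(w, l) for w, l in merged_chirho if l != "O"])  # [('gethsemane', 'B-PLACE')]
```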
Batch Processing
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
from transformers import pipeline
ner_pipeline_chirho = pipeline(
    "token-classification",
    model="LoveJesus/biblical-entity-recognizer-chirho",
    aggregation_strategy="simple",
)
verses_chirho = [
    "Now when Jesus was born in Bethlehem of Judaea in the days of Herod the king.",
    "And Solomon built the house of the LORD in Jerusalem.",
    "The LORD is my shepherd; I shall not want.",
    "And Paul said unto the Corinthians, Grace be unto you from God our Father.",
]
for verse_chirho in verses_chirho:
    entities_chirho = ner_pipeline_chirho(verse_chirho)
    print(f"\n{verse_chirho}")
    for entity_chirho in entities_chirho:
        print(f" {entity_chirho['word']}: {entity_chirho['entity_group']} ({entity_chirho['score']:.3f})")
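For downstream uses such as entity highlighting or knowledge-graph building, it can help to group results by type and to recover the original casing, since the `word` field comes from the uncased tokenizer. With `aggregation_strategy="simple"`, each result also carries `start` and `end` character offsets into the input. The sketch below runs on a hand-written stand-in for the pipeline output: the scores are placeholders, not model results, and `group_entities_chirho` is a hypothetical helper.

```python
from collections import defaultdict


def group_entities_chirho(text_chirho, entities_chirho):
    """Group pipeline results by entity type, slicing the original text for casing."""
    grouped_chirho = defaultdict(list)
    for entity_chirho in entities_chirho:
        surface_chirho = text_chirho[entity_chirho["start"]:entity_chirho["end"]]
        grouped_chirho[entity_chirho["entity_group"]].append(surface_chirho)
    return dict(grouped_chirho)


verse_chirho = "And Solomon built the house of the LORD in Jerusalem."

# Stand-in for ner_pipeline_chirho(verse_chirho); scores are placeholders.
sample_entities_chirho = [
    {"entity_group": "PERSON", "word": "solomon", "score": 0.99, "start": 4, "end": 11},
    {"entity_group": "DIVINE", "word": "lord", "score": 0.99, "start": 35, "end": 39},
    {"entity_group": "PLACE", "word": "jerusalem", "score": 0.99, "start": 43, "end": 52},
]

print(group_entities_chirho(verse_chirho, sample_entities_chirho))
# {'PERSON': ['Solomon'], 'DIVINE': ['LORD'], 'PLACE': ['Jerusalem']}
```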
Training Details
Dataset
| Property | Value |
|---|---|
| Source Text | King James Version (KJV) Bible |
| Text Source | ScrollMapper bible_databases (Public Domain) |
| Entity Source | STEPBible TIPNR (CC BY) + curated divine names list |
| Annotated Tokens | 200,000+ |
| Annotation Scheme | BIO (Beginning-Inside-Outside) |
| Format | JSONL with tokens_chirho, ner_tags_chirho, reference_chirho fields |
| Split Strategy | 80/10/10 by book (not verse) to prevent data leakage |
| Dataset Repo | LoveJesus/biblical-ner-dataset-chirho |
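Splitting by book means every verse of a given book lands in the same split, so the model is never evaluated on a book it saw during training. The sketch below shows one way to do this on JSONL records with the three fields listed above. Several details are assumptions: that `reference_chirho` looks like `"1 Samuel 17:4"`, that tags are stored as strings, that the 80/10/10 ratio is applied to the number of books, and that the shuffle seed is 42. The helper names are hypothetical.

```python
import json
import random


def book_of_chirho(record_chirho):
    """Book name of a record. Assumption: references look like '1 Samuel 17:4'."""
    return record_chirho["reference_chirho"].rsplit(" ", 1)[0]


def split_by_book_chirho(records_chirho, seed_chirho=42, train_frac_chirho=0.8, val_frac_chirho=0.1):
    """Assign whole books to train/validation/test so no book spans two splits."""
    books_chirho = sorted({book_of_chirho(r) for r in records_chirho})
    random.Random(seed_chirho).shuffle(books_chirho)
    n_train_chirho = int(len(books_chirho) * train_frac_chirho)
    n_val_chirho = int(len(books_chirho) * val_frac_chirho)
    splits_chirho = {"train": [], "validation": [], "test": []}
    for record_chirho in records_chirho:
        rank_chirho = books_chirho.index(book_of_chirho(record_chirho))
        if rank_chirho < n_train_chirho:
            splits_chirho["train"].append(record_chirho)
        elif rank_chirho < n_train_chirho + n_val_chirho:
            splits_chirho["validation"].append(record_chirho)
        else:
            splits_chirho["test"].append(record_chirho)
    return splits_chirho


# Placeholder JSONL lines: ten books, two verses each (not real dataset rows).
demo_books_chirho = ["Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy",
                     "Joshua", "Judges", "Ruth", "1 Samuel", "2 Samuel"]
jsonl_lines_chirho = [
    json.dumps({"tokens_chirho": ["placeholder"], "ner_tags_chirho": ["O"],
                "reference_chirho": f"{book} {chapter}:1"})
    for book in demo_books_chirho for chapter in (1, 2)
]
records_chirho = [json.loads(line) for line in jsonl_lines_chirho]

splits_chirho = split_by_book_chirho(records_chirho)
print({name: len(rows) for name, rows in splits_chirho.items()})
# {'train': 16, 'validation': 2, 'test': 2}
```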
Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 5.0e-5 |
| Batch Size | 32 |
| Epochs | 5 (best at epoch 4) |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.1 |
| Max Sequence Length | 128 |
| Seed | 42 |
| Optimizer | AdamW (default Trainer) |
| Early Stopping Patience | 3 epochs |
| Metric for Best Model | F1 (entity-level) |
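A warmup ratio of 0.1 means the learning rate ramps up from zero over the first 10% of optimizer steps before decaying. The table does not name the scheduler, so the sketch below assumes the Trainer's default linear schedule (linear warmup to the peak, then linear decay to zero). The total step count is a hypothetical round number chosen for readability; the real value depends on dataset size, batch size, and epochs.

```python
import math


def linear_warmup_lr_chirho(step_chirho, total_steps_chirho, peak_lr_chirho=5.0e-5, warmup_ratio_chirho=0.1):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    warmup_steps_chirho = math.ceil(total_steps_chirho * warmup_ratio_chirho)
    if step_chirho < warmup_steps_chirho:
        # Ramp from 0 up to the peak learning rate.
        return peak_lr_chirho * step_chirho / max(1, warmup_steps_chirho)
    # Decay from the peak down to 0 at the final step.
    remaining_chirho = total_steps_chirho - step_chirho
    decay_steps_chirho = max(1, total_steps_chirho - warmup_steps_chirho)
    return peak_lr_chirho * max(0.0, remaining_chirho / decay_steps_chirho)


total_steps_chirho = 1000  # hypothetical
for step_chirho in (0, 50, 100, 550, 1000):
    lr_chirho = linear_warmup_lr_chirho(step_chirho, total_steps_chirho)
    print(f"step {step_chirho:4d}: lr = {lr_chirho:.2e}")
# step    0: lr = 0.00e+00
# step   50: lr = 2.50e-05
# step  100: lr = 5.00e-05
# step  550: lr = 2.50e-05
# step 1000: lr = 0.00e+00
```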
Subword Alignment
When DistilBERT's WordPiece tokenizer splits a word into multiple subword tokens, only the first subtoken receives the original BIO label. Subsequent subtokens of the same word receive -100 (ignored in loss computation). This prevents the model from being penalized on tokens it cannot meaningfully label.
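The alignment rule can be sketched as a small function over the per-token word indices that fast HuggingFace tokenizers expose through `word_ids()`. This is an illustration of the rule described above, not the project's actual training code. It additionally assumes that special tokens such as `[CLS]` and `[SEP]` (whose word index is `None`) also receive -100, and the example word indices and label ids are hypothetical.

```python
IGNORE_INDEX_CHIRHO = -100  # label value skipped by the loss


def align_labels_chirho(word_ids_chirho, word_label_ids_chirho):
    """Give each word's label to its first subtoken and -100 to everything else."""
    aligned_chirho = []
    previous_chirho = None
    for word_id_chirho in word_ids_chirho:
        if word_id_chirho is None:
            # Special tokens (assumption: ignored in the loss as well).
            aligned_chirho.append(IGNORE_INDEX_CHIRHO)
        elif word_id_chirho != previous_chirho:
            # First subtoken of a new word keeps the word's BIO label.
            aligned_chirho.append(word_label_ids_chirho[word_id_chirho])
        else:
            # Later subtokens of the same word are ignored.
            aligned_chirho.append(IGNORE_INDEX_CHIRHO)
        previous_chirho = word_id_chirho
    return aligned_chirho


# Hypothetical example: two words, the second split into four subtokens.
# Label ids assume the listing order of the label set (0 = O, 7 = B-PLACE).
example_word_ids_chirho = [None, 0, 1, 1, 1, 1, None]
example_word_labels_chirho = [0, 7]

print(align_labels_chirho(example_word_ids_chirho, example_word_labels_chirho))
# [-100, 0, 7, -100, -100, -100, -100]
```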
Hardware Compatibility
The training script supports:
- Apple MPS (Metal Performance Shaders) for Apple Silicon Macs
- CUDA for NVIDIA GPUs
- CPU fallback
Architecture
DistilBERT-base-uncased (66M parameters)
|
v
6-layer Transformer Encoder
|
v
Token Classification Head (Linear: 768 -> 13)
|
v
BIO Label Predictions per Token
DistilBERT is a distilled version of BERT that, according to its authors, retains 97% of BERT's language understanding performance while being 60% faster and 40% smaller. It uses 6 transformer layers (vs. 12 in BERT-base), a hidden size of 768, and 12 attention heads.
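The classification head adds very little on top of the encoder. As a back-of-the-envelope check, assuming the head is a single linear layer with a bias term as drawn in the diagram above, its size and the shape of the logits for one full-length sequence work out as follows (the constant names are illustrative):

```python
HIDDEN_SIZE_CHIRHO = 768   # DistilBERT hidden size
NUM_LABELS_CHIRHO = 13     # O plus B-/I- tags for six entity types
MAX_SEQ_LEN_CHIRHO = 128   # maximum sequence length used by the model

# Linear layer: one weight per (hidden unit, label) pair plus one bias per label.
head_params_chirho = HIDDEN_SIZE_CHIRHO * NUM_LABELS_CHIRHO + NUM_LABELS_CHIRHO

# One score per label for every token position in the sequence.
logits_shape_chirho = (MAX_SEQ_LEN_CHIRHO, NUM_LABELS_CHIRHO)

print(head_params_chirho)    # 9997
print(logits_shape_chirho)   # (128, 13)
```

At roughly ten thousand parameters, the head is a tiny fraction of the ~66M total, so nearly all of the model's capacity sits in the pre-trained encoder.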
Limitations
- Domain Specificity: Trained exclusively on KJV Bible text; may not generalize well to modern English biblical translations or extra-biblical religious texts
- Archaic Language: Optimized for Early Modern English (KJV) vocabulary and syntax ("thou", "unto", "begat")
- Entity Coverage: The six entity categories may not cover all possible biblical entity types (e.g., no separate category for books of the Bible, religious practices, or time periods)
- Base Model Vocabulary: DistilBERT was pre-trained on modern English; some rare biblical proper nouns may be heavily subword-tokenized, potentially reducing recognition accuracy for uncommon names
- Assistive Tool: This model is intended as an assistive tool for Bible study and research, not as a replacement for careful scriptural reading
Intended Use
- Bible study applications requiring automatic entity highlighting
- Biblical text analysis and digital humanities research
- Building knowledge graphs of biblical persons, places, and events
- Enhancing Bible search engines with entity-aware queries
- Educational tools for learning biblical geography, persons, and events
Citation
@misc{lovejesus2026biblicalentityrecognizer,
title={Biblical Entity Recognizer: DistilBERT NER for KJV Bible Text},
author={LoveJesus},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/LoveJesus/biblical-entity-recognizer-chirho}
}
License
MIT
For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life. - John 3:16