# Biblical Morphological Parser (mT5-small)

> *For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life.* (John 3:16)
## What This Does
This model parses biblical Hebrew and Greek words into their morphological components: part of speech, stem, lemma, tense, person, gender, number, and English gloss.
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-parser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-parser-chirho")

# Parse a Hebrew word
input_text = 'parse [hebrew]: בָּרָא [GEN 1:1] context: בְּרֵאשִׁית אֱלֹהִים'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "class:verb | stem:qal | lemma:ברא | morph:... | person:3 | gender:m | number:s | gloss:he created"

# Parse a Greek word
input_text = 'parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Input Format

```
parse [{language}]: {word} [{verse_ref}] context: {surrounding_words}
```

- `{language}`: `hebrew` or `greek`
- `{word}`: the biblical word in its original script
- `{verse_ref}`: book chapter:verse reference
- `{surrounding_words}`: the two words before and after, for disambiguation
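As a sketch, the input string can be assembled with a small helper (`build_parse_input` is our name for illustration, not part of the model's tooling):

```python
def build_parse_input(language: str, word: str, verse_ref: str, context: str) -> str:
    """Assemble the model's expected prompt from its four components."""
    return f"parse [{language}]: {word} [{verse_ref}] context: {context}"

print(build_parse_input("greek", "λόγος", "JHN 1:1", "ἐν ἀρχῇ ἦν"))
# → parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν
```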
## Output Format

Pipe-separated morphological tags:

```
class:{pos} | stem:{stem} | lemma:{lemma} | morph:{code} | person:{p} | gender:{g} | number:{n} | gloss:{english}
```
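A minimal sketch for turning this pipe-separated output back into a dictionary (`parse_output` is a hypothetical helper, not shipped with the model). Note that tag values such as the gloss may themselves contain spaces, so we split on `|` first and only then on the first colon:

```python
def parse_output(text: str) -> dict:
    """Split 'key:value | key:value' model output into a tag dictionary."""
    tags = {}
    for field in text.split("|"):
        key, sep, value = field.strip().partition(":")
        if sep:  # skip malformed fields with no colon
            tags[key.strip()] = value.strip()
    return tags

example = "class:verb | stem:qal | lemma:ברא | person:3 | gender:m | number:s | gloss:he created"
print(parse_output(example)["gloss"])  # → he created
```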
## Training Data
- Macula Hebrew (Clear-Bible): ~425K OT words with morphology and glosses
- Macula Greek SBLGNT (Clear-Bible): ~138K NT words with morphology and glosses
- Subsampled to ~200K words (100K per language), stratified by book
## Model Details
| Property | Value |
|---|---|
| Base model | google/mt5-small (300M params) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 5 epochs, lr=3e-4, batch=32 |
| Hardware | NVIDIA A100/H200 GPU |
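The reported hyperparameters could be expressed as a `Seq2SeqTrainingArguments` configuration along these lines (a sketch only; the actual training script is not included in this card, and `output_dir` and the generation setting are assumptions):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical configuration matching the reported hyperparameters
training_args = Seq2SeqTrainingArguments(
    output_dir="biblical-parser-chirho",  # assumed name
    num_train_epochs=5,
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    predict_with_generate=True,  # assumed, for seq2seq eval
)
```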
## Limitations
- Trained on Macula morphological annotations — may not match all scholarly traditions
- Handles individual words, not full syntactic analysis
- Performance may vary on words not well-represented in training data
## Evaluation Results

Evaluated on a held-out test set of ~20K word-level parsing examples.

### Overall Metrics
| Metric | Score |
|---|---|
| Exact Match (all tags correct) | 0.525 |
| Average Tag F1 (across all tags) | 0.886 |
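As a rough illustration of how such numbers can be computed (our sketch, not the card's evaluation script): exact match requires every tag of an example to agree, while per-tag F1 scores each tag independently, counting a predicted tag as correct only when its value matches the gold value:

```python
from collections import Counter

def tags_of(text: str) -> dict:
    """Parse 'key:value | key:value' into a dict, ignoring malformed fields."""
    out = {}
    for field in text.split("|"):
        key, sep, value = field.strip().partition(":")
        if sep:
            out[key.strip()] = value.strip()
    return out

def exact_match_rate(pairs) -> float:
    """Fraction of (pred, gold) string pairs whose full tag sets agree."""
    return sum(tags_of(p) == tags_of(g) for p, g in pairs) / len(pairs)

def per_tag_f1(pairs) -> dict:
    """F1 per tag over (pred, gold) string pairs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred_str, gold_str in pairs:
        pred, gold = tags_of(pred_str), tags_of(gold_str)
        for tag, value in pred.items():
            if gold.get(tag) == value:
                tp[tag] += 1
            else:
                fp[tag] += 1  # predicted but wrong (or gold lacks the tag)
        for tag, value in gold.items():
            if pred.get(tag) != value:
                fn[tag] += 1  # gold tag missed or mispredicted
    return {t: 2 * tp[t] / (2 * tp[t] + fp[t] + fn[t])
            for t in set(tp) | set(fp) | set(fn)}

pairs = [("class:verb | number:s", "class:verb | number:p")]
print(exact_match_rate(pairs))     # → 0.0
print(per_tag_f1(pairs)["class"])  # → 1.0
```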
### Per-Tag F1
| Tag | F1 |
|---|---|
| class (POS) | 0.963 |
| number | 0.966 |
| POS | 0.958 |
| lemma | 0.935 |
| person | 0.933 |
| gender | 0.928 |
| type | 0.900 |
| morph | 0.890 |
| state | 0.878 |
| stem | 0.859 |
| gloss | 0.539 |
### Per-Language Exact Match
| Language | Exact Match |
|---|---|
| Hebrew | 0.514 |
| Greek | 0.559 |
The `gloss` tag (English translation) is the hardest to predict exactly, pulling down the overall exact-match rate. The model achieves strong F1 on the structural/morphological tags (class, number, POS, person, gender, all above 0.92).
Built with love for Jesus. Published by LoveJesus. Part of the bible.systems project.