
# Biblical Morphological Parser (mT5-small)

> *For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life.* - John 3:16

## What This Does

This model parses biblical Hebrew and Greek words into their morphological components: part of speech, stem, lemma, tense, person, gender, number, and English gloss.

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-parser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-parser-chirho")

# Parse a Hebrew word
input_text = 'parse [hebrew]: בָּרָא [GEN 1:1] context: בְּרֵאשִׁית אֱלֹהִים'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "class:verb | stem:qal | lemma:ברא | morph:... | person:3 | gender:m | number:s | gloss:he created"

# Parse a Greek word
input_text = 'parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Input Format

```
parse [{language}]: {word} [{verse_ref}] context: {surrounding_words}
```

- `{language}`: `hebrew` or `greek`
- `{word}`: the biblical word in its original script
- `{verse_ref}`: book chapter:verse reference
- `{surrounding_words}`: two words before and after, for disambiguation
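Assembling the prompt by hand is easy to get subtly wrong; a small helper (a sketch, `build_input` is not part of the released model) can construct it from the components above:

```python
def build_input(language: str, word: str, verse_ref: str, context: str) -> str:
    """Build a prompt in the model's expected input format.

    language:  "hebrew" or "greek"
    word:      the target word in its original script
    verse_ref: e.g. "GEN 1:1"
    context:   surrounding words (roughly two before and two after)
    """
    return f"parse [{language}]: {word} [{verse_ref}] context: {context}"

prompt = build_input("hebrew", "בָּרָא", "GEN 1:1", "בְּרֵאשִׁית אֱלֹהִים")
```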

## Output Format

Pipe-separated morphological tags:

```
class:{pos} | stem:{stem} | lemma:{lemma} | morph:{code} | person:{p} | gender:{g} | number:{n} | gloss:{english}
```
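Downstream code usually wants these tags as key-value pairs. A minimal sketch of splitting the output string (the `parse_output` helper is an assumption, not shipped with the model):

```python
def parse_output(text: str) -> dict:
    """Split the model's pipe-separated tag string into a dict.

    Values may contain spaces (e.g. gloss:he created), so only the
    first ":" in each field is treated as the key/value separator.
    """
    tags = {}
    for field in text.split("|"):
        key, sep, value = field.strip().partition(":")
        if sep:  # skip malformed fields with no ":"
            tags[key] = value
    return tags

tags = parse_output(
    "class:verb | stem:qal | lemma:ברא | person:3 | gender:m | number:s | gloss:he created"
)
# tags["stem"] == "qal", tags["gloss"] == "he created"
```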

## Training Data

- Macula Hebrew (Clear-Bible): ~425K OT words with morphology and glosses
- Macula Greek SBLGNT (Clear-Bible): ~138K NT words with morphology and glosses
- Subsampled to ~200K words (100K per language), stratified by book
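The subsampling script is not published; the sketch below shows one way "stratified by book" could work, assuming each example carries `language` and `book` fields (both names are assumptions):

```python
import random
from collections import defaultdict

def stratified_subsample(examples, per_language=100_000, seed=0):
    """Sample up to per_language examples for each language, keeping
    each book's share of the sample proportional to its share of the corpus."""
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for ex in examples:
        by_lang[ex["language"]].append(ex)

    sampled = []
    for items in by_lang.values():
        frac = min(1.0, per_language / len(items))
        by_book = defaultdict(list)
        for ex in items:
            by_book[ex["book"]].append(ex)
        for book_items in by_book.values():
            # keep at least one example per book
            k = min(len(book_items), max(1, round(frac * len(book_items))))
            sampled.extend(rng.sample(book_items, k))
    return sampled
```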

## Model Details

| Property | Value |
|---|---|
| Base model | google/mt5-small (300M params) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 5 epochs, lr=3e-4, batch=32 |
| Hardware | NVIDIA A100/H200 GPU |

## Limitations

- Trained on Macula morphological annotations, which may not match all scholarly traditions
- Handles individual words, not full syntactic analysis
- Performance may vary on words not well represented in the training data

## Evaluation Results

Evaluated on a held-out test set of ~20K word-level parsing examples.

### Overall Metrics

| Metric | Score |
|---|---|
| Exact Match (all tags correct) | 0.525 |
| Average Tag F1 (across all tags) | 0.886 |

### Per-Tag F1

| Tag | F1 |
|---|---|
| class (POS) | 0.963 |
| number | 0.966 |
| POS | 0.958 |
| lemma | 0.935 |
| person | 0.933 |
| gender | 0.928 |
| type | 0.900 |
| morph | 0.890 |
| state | 0.878 |
| stem | 0.859 |
| gloss | 0.539 |

### Per-Language Exact Match

| Language | Exact Match |
|---|---|
| Hebrew | 0.514 |
| Greek | 0.559 |

The gloss tag (English translation) is the hardest to predict exactly, pulling down the overall exact match rate. The model achieves strong F1 on structural/morphological tags (class, number, POS, person, gender all > 0.92).
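The evaluation code is not published; a minimal sketch of how the exact-match metric could be computed from pipe-separated predictions and references (field parsing and normalization are assumptions):

```python
def split_tags(text: str) -> dict:
    """Pipe-separated tag string -> dict of tag name to value."""
    tags = {}
    for field in text.split("|"):
        key, sep, value = field.strip().partition(":")
        if sep:
            tags[key] = value
    return tags

def exact_match(predictions, references) -> float:
    """Fraction of examples where every predicted tag equals the reference."""
    hits = sum(split_tags(p) == split_tags(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["class:verb | number:s", "class:noun | number:p"]
refs  = ["class:verb | number:s", "class:noun | number:s"]
print(exact_match(preds, refs))  # 0.5
```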


Built with love for Jesus. Published by LoveJesus. Part of the bible.systems project.
