# Biblical Interlinear Glosser (mT5-small)

> For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life. - John 3:16

## What This Does
This model produces word-by-word English glosses for biblical Hebrew and Greek verses, creating an interlinear translation.
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-glosser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-glosser-chirho")

# Gloss Genesis 1:1
input_text = 'gloss [hebrew]: בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ [GEN 1:1]'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "In-beginning | created | God | [direct object marker] | the-heavens | and | the-earth"

# Gloss John 1:1
input_text = 'gloss [greek]: Ἐν ἀρχῇ ἦν ὁ λόγος [JHN 1:1]'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "In | [the] beginning | was | the | Word"
```
## Input Format

```
gloss [{language}]: {verse_text_in_original_script} [{verse_ref}]
```
## Output Format

Word-by-word English glosses separated by `|`:

```
word1_gloss | word2_gloss | word3_gloss | ...
```
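The pipe-delimited output is easy to turn back into a per-word list with ordinary string handling. A minimal sketch (the `parse_glosses` helper name is ours, not part of the model's API):

```python
def parse_glosses(output_text: str) -> list[str]:
    """Split the model's pipe-delimited gloss string into one gloss per word."""
    return [gloss.strip() for gloss in output_text.split("|")]

glosses = parse_glosses("In | [the] beginning | was | the | Word")
print(glosses)
# → ['In', '[the] beginning', 'was', 'the', 'Word']
```

Keeping the bracketed annotations (e.g. `[the]`, `[direct object marker]`) intact preserves the Macula glossing conventions the model was trained on.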
## Training Data
- Macula Hebrew (Clear-Bible): ~23K OT verses with word-level glosses
- Macula Greek SBLGNT (Clear-Bible): ~8K NT verses with word-level glosses
- Total: ~31K verse-level glossing examples
## Model Details
| Property | Value |
|---|---|
| Base model | google/mt5-small (300M params) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 8 epochs, lr=3e-4, batch=16 |
| Hardware | NVIDIA A100/H200 GPU |
## Limitations
- Glosses are word-level, not fluent English translations
- Based on Macula glossing conventions — may differ from other interlinear traditions
- Long verses (>30 words) may be truncated due to sequence length limits
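One workaround for the truncation limitation is to split long verses into windows before glossing and concatenate the results. This is our own sketch, not a documented feature of the model, and splitting mid-clause may reduce gloss quality since the model loses context across chunk boundaries:

```python
def chunk_verse(verse_text: str, max_words: int = 30) -> list[str]:
    """Split a whitespace-tokenized verse into chunks of at most max_words words."""
    words = verse_text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Each chunk would then be wrapped in the input format and glossed separately,
# e.g. f'gloss [hebrew]: {chunk} [EST 8:9]', joining the outputs with " | ".
print(chunk_verse("a b c d e", max_words=2))
# → ['a b', 'c d', 'e']
```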
## Evaluation Results
Evaluated on a held-out test set of glossing examples.
| Metric | Score |
|---|---|
| BLEU | 22.06 |
| Word Accuracy | 0.20 |
BLEU measures n-gram overlap between predicted and reference glosses. Word accuracy measures exact word-level match rate. Interlinear glossing is challenging because many Hebrew/Greek words have multiple valid English glosses, so these metrics represent a lower bound on actual quality.
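As a rough illustration of the word-accuracy metric (our own sketch; the actual evaluation script is not published here), predicted and reference glosses can be aligned position by position and scored on exact matches:

```python
def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of reference positions whose predicted gloss matches exactly."""
    pred = [g.strip() for g in predicted.split("|")]
    ref = [g.strip() for g in reference.split("|")]
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / len(ref)

acc = word_accuracy("In | beginning | was | the | Word",
                    "In | [the] beginning | was | the | Word")
# 4 of 5 positions match exactly → 0.8
```

Because any deviation from the reference wording counts as an error, even a fluent synonym scores zero at that position, which is why exact-match accuracy understates real gloss quality.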
Built with love for Jesus. Published by LoveJesus. Part of the bible.systems project.