
Biblical Interlinear Glosser (mT5-small)

For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life. - John 3:16

What This Does

This model produces word-by-word English glosses for biblical Hebrew and Greek verses, creating an interlinear translation.

Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-glosser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-glosser-chirho")

# Gloss Genesis 1:1
input_text = 'gloss [hebrew]: בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ [GEN 1:1]'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "In-beginning | created | God | [direct object marker] | the-heavens | and | the-earth"

# Gloss John 1:1
input_text = 'gloss [greek]: Ἐν ἀρχῇ ἦν ὁ λόγος [JHN 1:1]'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "In | [the] beginning | was | the | Word"
```

Input Format

gloss [{language}]: {verse_text_in_original_script} [{verse_ref}]
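The template above can be filled with a one-line helper. This is a minimal sketch; `build_gloss_input` is a hypothetical function name, not part of the model or its tokenizer.

```python
# Hypothetical helper for the card's input template:
# gloss [{language}]: {verse_text_in_original_script} [{verse_ref}]
def build_gloss_input(language: str, verse_text: str, verse_ref: str) -> str:
    """Format a verse into the model's expected prompt string."""
    return f"gloss [{language}]: {verse_text} [{verse_ref}]"

print(build_gloss_input("greek", "Ἐν ἀρχῇ ἦν ὁ λόγος", "JHN 1:1"))
# gloss [greek]: Ἐν ἀρχῇ ἦν ὁ λόγος [JHN 1:1]
```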

Output Format

Word-by-word English glosses separated by |:

word1_gloss | word2_gloss | word3_gloss | ...
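Since glosses are pipe-separated, the output can be split back into per-word glosses with ordinary string handling. A minimal sketch (`parse_glosses` is a hypothetical helper, not shipped with the model):

```python
# Hypothetical helper: split the model's pipe-separated output into
# a list of per-word glosses, trimming surrounding whitespace.
def parse_glosses(output: str) -> list[str]:
    return [g.strip() for g in output.split("|") if g.strip()]

glosses = parse_glosses(
    "In-beginning | created | God | [direct object marker] | the-heavens | and | the-earth"
)
print(len(glosses))  # 7
```

Note that bracketed annotations like `[direct object marker]` are a single gloss, so splitting only on `|` keeps them intact.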

Training Data

  • Macula Hebrew (Clear-Bible): ~23K OT verses with word-level glosses
  • Macula Greek SBLGNT (Clear-Bible): ~8K NT verses with word-level glosses
  • Total: ~31K verse-level glossing examples

Model Details

| Property | Value |
|---|---|
| Base model | google/mt5-small (300M params) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 8 epochs, lr=3e-4, batch size 16 |
| Hardware | NVIDIA A100/H200 GPU |

Limitations

  • Glosses are word-level, not fluent English translations
  • Based on Macula glossing conventions — may differ from other interlinear traditions
  • Long verses (>30 words) may be truncated due to sequence length limits
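One way to work around the truncation limitation is to pre-check verse length and split long verses into word-count-bounded chunks, glossing each chunk separately. This is a hypothetical workaround sketch; the 30-word figure comes from the limitation above, but the chunking helper is not part of the model.

```python
# Hypothetical pre-check for the >30-word truncation limitation:
# split a long verse into chunks that each fit under the word limit.
def chunk_verse(words: list[str], max_words: int = 30) -> list[list[str]]:
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]

verse = ["w"] * 45  # stand-in for a 45-word verse
print([len(c) for c in chunk_verse(verse)])  # [30, 15]
```

Chunking mid-verse may degrade glosses that depend on surrounding context, so it is a trade-off rather than a full fix.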

Evaluation Results

Evaluated on a held-out test set of glossing examples.

| Metric | Score |
|---|---|
| BLEU | 22.06 |
| Word accuracy | 0.20 |

BLEU measures n-gram overlap between predicted and reference glosses. Word accuracy measures exact word-level match rate. Interlinear glossing is challenging because many Hebrew/Greek words have multiple valid English glosses, so these metrics represent a lower bound on actual quality.
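The word-accuracy metric described above can be sketched as the fraction of positions where the predicted gloss exactly matches the reference gloss. This is an assumed position-wise formulation; the card does not spell out the alignment actually used in evaluation.

```python
# Assumed formulation of exact-match word accuracy: fraction of
# positions where the predicted gloss equals the reference gloss.
def word_accuracy(pred: list[str], ref: list[str]) -> float:
    if not ref:
        return 0.0
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / len(ref)

pred = ["In-beginning", "made", "God", "the-heavens"]
ref = ["In-beginning", "created", "God", "the-heavens"]
print(word_accuracy(pred, ref))  # 0.75
```

Under this formulation, a synonymous but non-identical gloss ("made" vs. "created") counts as a miss, which is why exact-match accuracy understates quality when multiple glosses are valid.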


Built with love for Jesus. Published by LoveJesus. Part of the bible.systems project.
