---
language:
- he
- el
license: mit
tags:
- biblical-hebrew
- biblical-greek
- morphology
- parsing
- mt5
- seq2seq
datasets:
- LoveJesus/biblical-tutor-dataset-chirho
pipeline_tag: text2text-generation
model-index:
- name: biblical-parser-chirho
results:
- task:
type: text2text-generation
name: Morphological Parsing
dataset:
type: LoveJesus/biblical-tutor-dataset-chirho
name: Biblical Tutor Dataset (Chirho)
metrics:
- type: exact_match
value: 0.525
name: Exact Match
- type: f1
value: 0.886
name: Average Tag F1
---
# Biblical Morphological Parser (mT5-small)
*For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life. - John 3:16*
## What This Does
This model parses biblical Hebrew and Greek words into their morphological components: part of speech, stem, lemma, tense, person, gender, number, and English gloss.
## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-parser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-parser-chirho")
# Parse a Hebrew word
input_text = 'parse [hebrew]: בָּרָא [GEN 1:1] context: בְּרֵאשִׁית אֱלֹהִים'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "class:verb | stem:qal | lemma:ברא | morph:... | person:3 | gender:m | number:s | gloss:he created"
# Parse a Greek word
input_text = 'parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Input Format
```
parse [{language}]: {word} [{verse_ref}] context: {surrounding_words}
```
- `{language}`: `hebrew` or `greek`
- `{word}`: The biblical word in original script
- `{verse_ref}`: Book chapter:verse reference
- `{surrounding_words}`: 2 words before and after for disambiguation
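The fields above can be assembled with a small helper. This is an illustrative sketch, not part of the model's published API; the function name and signature are assumptions for this example.

```python
def build_parse_input(language: str, word: str, verse_ref: str, context: str) -> str:
    """Assemble a model input string in the documented format.

    language: "hebrew" or "greek"
    word: the biblical word in original script
    verse_ref: book chapter:verse reference, e.g. "JHN 1:1"
    context: surrounding words for disambiguation
    """
    return f"parse [{language}]: {word} [{verse_ref}] context: {context}"

print(build_parse_input("greek", "λόγος", "JHN 1:1", "ἐν ἀρχῇ ἦν"))
# → parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν
```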
## Output Format
Pipe-separated morphological tags:
```
class:{pos} | stem:{stem} | lemma:{lemma} | morph:{code} | person:{p} | gender:{g} | number:{n} | gloss:{english}
```
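Since the output is a flat pipe-separated string, it is straightforward to convert into a dictionary for downstream use. A minimal sketch (the helper name is an assumption, not part of the model):

```python
def parse_tags(output: str) -> dict:
    """Split a pipe-separated tag string like 'class:verb | stem:qal | ...'
    into a {tag: value} dict. Splits on the first ':' only, so values
    containing spaces (e.g. glosses) are preserved."""
    tags = {}
    for field in output.split("|"):
        key, _, value = field.strip().partition(":")
        tags[key] = value
    return tags

example = "class:verb | stem:qal | lemma:ברא | person:3 | gender:m | number:s | gloss:he created"
print(parse_tags(example)["gloss"])  # → he created
```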
## Training Data
- **Macula Hebrew** (Clear-Bible): ~425K OT words with morphology and glosses
- **Macula Greek SBLGNT** (Clear-Bible): ~138K NT words with morphology and glosses
- Subsampled to ~200K words (100K per language), stratified by book
## Model Details
| Property | Value |
|----------|-------|
| Base model | google/mt5-small (300M params) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 5 epochs, lr=3e-4, batch=32 |
| Hardware | NVIDIA A100/H200 GPU |
## Limitations
- Trained on Macula morphological annotations — may not match all scholarly traditions
- Handles individual words, not full syntactic analysis
- Performance may vary on words not well-represented in training data
## Evaluation Results
Evaluated on a held-out test set of ~20K word-level parsing examples.
### Overall Metrics
| Metric | Score |
|--------|-------|
| **Exact Match** (all tags correct) | **0.525** |
| **Average Tag F1** (across all tags) | **0.886** |
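To make the two metrics concrete, here is a simplified scoring sketch. Exact match requires every tag to agree between prediction and reference; the per-tag score below is a plain correctness check per tag (the card reports tag-level F1; the actual evaluation script is not reproduced here, so treat this as an approximation).

```python
def to_tags(s: str) -> dict:
    """Parse a pipe-separated tag string into a {tag: value} dict."""
    return dict(f.strip().split(":", 1) for f in s.split("|"))

def exact_match(pred: str, gold: str) -> bool:
    """True only if every tag value matches the reference."""
    return to_tags(pred) == to_tags(gold)

pred = "class:verb | stem:qal | number:s"
gold = "class:verb | stem:qal | number:p"

print(exact_match(pred, gold))  # → False

# Tag-level correctness: one wrong tag lowers the tag score only slightly,
# but any wrong tag makes exact match fail — which is why Average Tag F1
# (0.886) is much higher than Exact Match (0.525).
tags_p, tags_g = to_tags(pred), to_tags(gold)
per_tag = {k: tags_p.get(k) == v for k, v in tags_g.items()}
print(per_tag)  # → {'class': True, 'stem': True, 'number': False}
```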
### Per-Tag F1
| Tag | F1 |
|-----|-----|
| class (POS) | 0.963 |
| number | 0.966 |
| POS | 0.958 |
| lemma | 0.935 |
| person | 0.933 |
| gender | 0.928 |
| type | 0.900 |
| morph | 0.890 |
| state | 0.878 |
| stem | 0.859 |
| gloss | 0.539 |
### Per-Language Exact Match
| Language | Exact Match |
|----------|-------------|
| Hebrew | 0.514 |
| Greek | 0.559 |
> The `gloss` tag (English translation) is the hardest to predict exactly, pulling down the overall exact match rate. The model achieves strong F1 on structural/morphological tags (class, number, POS, person, gender all > 0.92).
---
Built with love for Jesus. Published by [LoveJesus](https://huggingface.co/LoveJesus).
Part of the [bible.systems](https://bible.systems) project.