LoveJesus committed 1600c5a (verified) · 1 parent: 87959a7

Add model card

Files changed (1): README.md (added, +91 -0)

---
language:
- he
- el
license: mit
tags:
- biblical-hebrew
- biblical-greek
- morphology
- parsing
- mt5
- seq2seq
datasets:
- LoveJesus/biblical-tutor-dataset-chirho
pipeline_tag: text2text-generation
---

# Biblical Morphological Parser (mT5-small)

*For God so loved the world that he gave his only begotten Son, that whoever believes in him should not perish but have eternal life. - John 3:16*

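## What This Does

This model parses individual biblical Hebrew and Koine Greek words into their morphological components: part of speech, stem, lemma, tense, person, gender, number, and English gloss. Given a word, its verse reference, and a few surrounding words, it generates the analysis as a single line of text.
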
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("LoveJesus/biblical-parser-chirho")
model = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/biblical-parser-chirho")

# Parse a Hebrew word
input_text = 'parse [hebrew]: בָּרָא [GEN 1:1] context: בְּרֵאשִׁית אֱלֹהִים'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: "class:verb | stem:qal | lemma:ברא | morph:... | person:3 | gender:m | number:s | gloss:he created"

# Parse a Greek word
input_text = 'parse [greek]: λόγος [JHN 1:1] context: ἐν ἀρχῇ ἦν'
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Input Format

```
parse [{language}]: {word} [{verse_ref}] context: {surrounding_words}
```

- `{language}`: `hebrew` or `greek`
- `{word}`: the target word in its original script
- `{verse_ref}`: book abbreviation and chapter:verse reference (e.g. `GEN 1:1`, `JHN 1:1`)
- `{surrounding_words}`: 2 words before and after the target word, for disambiguation
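
The template can be filled in programmatically. The sketch below uses a hypothetical helper, `build_parse_input`, which is not part of this repository. It assumes the verse is already split into a list of words and that the context excludes the target word itself, which is how the Hebrew example in the Usage section appears to be built.

```python
def build_parse_input(
    language: str, words: list[str], index: int, verse_ref: str, window: int = 2
) -> str:
    """Format one target word as a `parse [...]: ... context: ...` prompt.

    Hypothetical helper: assumes `words` holds the verse's words in order and
    that the context excludes the target word itself.
    """
    if language not in ("hebrew", "greek"):
        raise ValueError(f"unsupported language: {language!r}")
    # Take up to `window` words on each side; slicing clips at the verse edges.
    before = words[max(0, index - window):index]
    after = words[index + 1:index + 1 + window]
    context = " ".join(before + after)
    return f"parse [{language}]: {words[index]} [{verse_ref}] context: {context}"


# The first three words of Genesis 1:1
verse = ["בְּרֵאשִׁית", "בָּרָא", "אֱלֹהִים"]
print(build_parse_input("hebrew", verse, 1, "GEN 1:1"))
```

With this three-word slice of the verse, the helper should reproduce the Hebrew prompt from the Usage section. If your source text segments words differently from the training data, the context window will differ as well.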

## Output Format

The model returns pipe-separated `key:value` fields:
```
class:{pos} | stem:{stem} | lemma:{lemma} | morph:{code} | person:{p} | gender:{g} | number:{n} | gloss:{english}
```
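
To use the output programmatically, split it on the pipes and then split each field on its first colon. This is a minimal sketch; `parse_output` is a hypothetical helper, not part of the model repository. It assumes that field values never contain a `|`, and it keeps whatever fields the model emits rather than requiring all eight.

```python
def parse_output(text: str) -> dict[str, str]:
    """Split 'key:value | key:value | ...' into a dict (hypothetical helper)."""
    fields = {}
    for part in text.split("|"):
        part = part.strip()
        if not part:
            continue
        # Split on the first colon only, in case a value itself contains a colon.
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


example = "class:verb | stem:qal | lemma:ברא | morph:... | person:3 | gender:m | number:s | gloss:he created"
parsed = parse_output(example)
print(parsed["class"], parsed["gloss"])  # verb he created
```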

## Training Data

- **Macula Hebrew** (Clear-Bible): ~425K Old Testament words with morphology and glosses
- **Macula Greek SBLGNT** (Clear-Bible): ~138K New Testament words with morphology and glosses
- Subsampled to ~200K words (~100K per language), stratified by book
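
The card does not say how the per-book quotas were chosen. One common approach is proportional allocation, where each book contributes to the sample in proportion to its share of the corpus. The sketch below assumes that approach and uses hypothetical names (`stratified_sample`, a `book` key on each record). It illustrates stratified subsampling in general and is not necessarily the procedure used to build this dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records: list[dict], target: int, seed: int = 0) -> list[dict]:
    """Sample `target` records with per-book quotas proportional to book size.

    Hypothetical sketch: quotas use largest-remainder rounding so that they
    sum exactly to `target`.
    """
    if not 0 <= target <= len(records):
        raise ValueError("target must be between 0 and len(records)")
    by_book = defaultdict(list)
    for rec in records:
        by_book[rec["book"]].append(rec)

    total = len(records)
    # Exact proportional share for each book, rounded down to start with.
    shares = {book: len(recs) * target / total for book, recs in by_book.items()}
    quotas = {book: int(share) for book, share in shares.items()}
    # Give the leftover slots to the books with the largest fractional parts.
    leftover = target - sum(quotas.values())
    for book in sorted(shares, key=lambda b: shares[b] - quotas[b], reverse=True)[:leftover]:
        quotas[book] += 1

    rng = random.Random(seed)
    sample = []
    for book, recs in by_book.items():
        sample.extend(rng.sample(recs, quotas[book]))
    return sample


# Toy corpus: 600 "GEN" records and 60 "RUT" records (illustrative only).
records = [{"book": "GEN", "id": i} for i in range(600)]
records += [{"book": "RUT", "id": i} for i in range(60)]
sample = stratified_sample(records, target=110)
counts = Counter(rec["book"] for rec in sample)
print(dict(counts))  # {'GEN': 100, 'RUT': 10}
```

Largest-remainder rounding keeps the sample size exact; simply rounding each share independently can overshoot or undershoot the target by a few records.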

## Model Details

| Property | Value |
|----------|-------|
| Base model | google/mt5-small (~300M parameters) |
| Architecture | Encoder-decoder (Seq2Seq) |
| Languages | Biblical Hebrew, Koine Greek |
| Training | 5 epochs, learning rate 3e-4, batch size 32 |
| Hardware | NVIDIA A100/H200 GPU |
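
For a rough sense of scale, the training row implies the following number of optimizer steps. This back-of-the-envelope sketch assumes that all ~200K subsampled examples are used for training (no held-out split), that 32 is the effective batch size with no gradient accumulation, and that a partial final batch counts as a step. None of these details are stated in the card.

```python
import math

examples = 200_000  # ~100K per language; assumed to all be training examples
batch_size = 32     # from the Model Details table
epochs = 5          # from the Model Details table

steps_per_epoch = math.ceil(examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 6250 31250
```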

## Limitations

- Trained on Macula morphological annotations, so its parses may not match every scholarly tradition
- Parses one word at a time; it does not perform full syntactic analysis
- Accuracy may be lower for rare words and forms that are poorly represented in the training data

---

Built with love for Jesus. Published by [LoveJesus](https://huggingface.co/LoveJesus).
Part of the [bible.systems](https://bible.systems) project.