---
license: mit
language:
  - en
tags:
  - text2text-generation
  - flan-t5
  - bible
  - simplification
  - readability
  - difficulty-scoring
  - multi-task
  - seq2seq
datasets:
  - LoveJesus/passage-difficulty-simplifier-dataset-chirho
pipeline_tag: text2text-generation
base_model: google/flan-t5-base
model-index:
  - name: passage-difficulty-simplifier-chirho
    results:
    - task:
        type: text2text-generation
        name: Text Generation
      metrics:
      - name: Eval Loss
        type: eval_loss
        value: 2.228
      - name: Difficulty Accuracy
        type: accuracy
        value: 0.9377
      - name: Combined Score
        type: combined_score
        value: 0.3781
---

<!-- For God so loved the world that he gave his only begotten Son,
that whoever believes in him should not perish but have eternal life. - John 3:16 -->

# Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

A fine-tuned **google/flan-t5-base** (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.

## Model Description

This model takes Bible passages as input and performs one of two tasks, selected by a natural language prefix:

### Task 1: Difficulty Scoring

Analyzes a Bible passage and produces a structured difficulty assessment.

- **Prefix**: `rate difficulty:`
- **Output format**: `reading_level: [1-12] | vocab_complexity: [low/medium/high] | archaic_forms: [count] | difficulty: [easy/medium/hard]`

### Task 2: Simplification

Converts archaic or complex Bible passages into plain modern English.

- **Prefix**: `simplify:`
- **Output**: Plain-language paraphrase of the input verse

## Training Details

| Parameter | Value |
|---|---|
| **Base model** | `google/flan-t5-base` (248M params) |
| **Architecture** | Encoder-Decoder (T5) |
| **Training approach** | Full fine-tuning, multi-task |
| **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
| **Epochs** | 5 |
| **Batch size** | 32 (H200 GPU) |
| **Effective batch size** | 32 (gradient accumulation = 1 on H200) |
| **Learning rate** | 2e-4 |
| **LR scheduler** | Cosine with 10% warmup |
| **Weight decay** | 0.01 |
| **Label smoothing** | 0.1 |
| **Mixed precision** | bf16 (H200) |
| **Max input length** | 256 tokens |
| **Max target length** | 256 tokens |
| **Early stopping** | Patience = 2, monitoring `eval_loss` |
| **Best model selection** | Lowest `eval_loss` |
| **Generation (eval)** | `predict_with_generate=True`, beam search |
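
The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows (a sketch, not the actual training script: `output_dir` is illustrative, and argument names follow recent `transformers` releases):

```python
# Sketch of the training configuration described in the table above.
from transformers import Seq2SeqTrainingArguments

training_args_chirho = Seq2SeqTrainingArguments(
    output_dir="models-chirho/simplifier-chirho",  # illustrative path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,        # effective batch size 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                     # 10% warmup
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    bf16=True,
    predict_with_generate=True,
    generation_num_beams=4,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```

Early stopping (patience = 2) would be attached separately via `EarlyStoppingCallback` when constructing the `Seq2SeqTrainer`.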

### Dataset

Trained on approximately **120K examples** combining both tasks, split 80/10/10 by Bible book so that no verse from a held-out book appears in training:

| Task | Target Count | Description |
|---|---|---|
| Difficulty scoring | ~64K | Verses from 6 translations with algorithmically computed labels |
| Simplification | ~96K | Cross-translation pairs mapping complex to simple English |
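
The by-book split can be sketched like this (the `book` field and helper name are illustrative; the actual dataset builder is a Bun/TypeScript script, but the logic is the same):

```python
import random

def split_by_book_chirho(examples_chirho, seed_chirho=42):
    """Assign whole Bible books to train/val/test (80/10/10) so no
    book -- and therefore no verse -- leaks across splits."""
    books_chirho = sorted({ex["book"] for ex in examples_chirho})
    random.Random(seed_chirho).shuffle(books_chirho)
    n_chirho = len(books_chirho)
    train_books_chirho = set(books_chirho[: int(n_chirho * 0.8)])
    val_books_chirho = set(books_chirho[int(n_chirho * 0.8): int(n_chirho * 0.9)])
    splits_chirho = {"train": [], "val": [], "test": []}
    for ex in examples_chirho:
        if ex["book"] in train_books_chirho:
            splits_chirho["train"].append(ex)
        elif ex["book"] in val_books_chirho:
            splits_chirho["val"].append(ex)
        else:
            splits_chirho["test"].append(ex)
    return splits_chirho
```

Splitting by book rather than by verse matters because the same verse appears in up to six translations; a random verse-level split would put near-duplicates in both train and test.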

#### Translations Used

| Translation | Style | Role |
|---|---|---|
| KJV (King James Version) | Formal, archaic | Complex source |
| ASV (American Standard Version) | Formal, dated | Complex source |
| YLT (Young's Literal Translation) | Ultra-literal | Complex source |
| Darby Bible | Literal, dated | Complex source / Difficulty scoring |
| BBE (Bible in Basic English) | 850-word vocabulary, ~Grade 4 | Simple target |
| OEB (Open English Bible) | Modern, public domain | Simple target |

#### Simplification Pairs

| Complex Source | Simple Target |
|---|---|
| KJV | BBE |
| KJV | OEB |
| ASV | BBE |
| YLT | OEB |
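
Each pairing above is a verse-reference join: verses sharing the same (book, chapter, verse) key in the complex and simple translations become one `simplify:` training example. A minimal sketch (the verse-dict shape and helper name are illustrative):

```python
def build_pairs_chirho(complex_verses_chirho, simple_verses_chirho):
    """Join two translations on (book, chapter, verse) and emit
    'simplify:' input/target training examples."""
    simple_by_ref_chirho = {
        (v["book"], v["chapter"], v["verse"]): v["text"]
        for v in simple_verses_chirho
    }
    pairs_chirho = []
    for v in complex_verses_chirho:
        ref_chirho = (v["book"], v["chapter"], v["verse"])
        # Skip verses absent from either edition (verse numbering differs
        # slightly between translations).
        if ref_chirho in simple_by_ref_chirho:
            pairs_chirho.append({
                "input": "simplify: " + v["text"],
                "target": simple_by_ref_chirho[ref_chirho],
            })
    return pairs_chirho
```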

#### Data Source

Bible text sourced from **ScrollMapper Bible Databases** (public domain translations on GitHub).

#### Difficulty Scoring Labels

Labels are computed algorithmically from textual features:

- **Reading level** (1-12): Flesch-Kincaid-style grade estimate, adjusted upward for archaic vocabulary and a high ratio of uncommon words
- **Vocabulary complexity** (low/medium/high): Ratio of words outside a ~3,000-word common English vocabulary
- **Archaic forms** (count): Number of archaic English words detected (thee, thou, hath, doth, -eth/-est verb endings, etc.)
- **Difficulty** (easy/medium/hard): Composite score from reading level, vocabulary complexity, and archaic form count
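
As an illustration of the archaic-form count, a crude detector can combine a fixed word list with -eth/-est suffix matching. This is an approximation for exposition only: the word list below is a hypothetical subset, and naive suffix matching will occasionally misfire on modern words like "interest".

```python
import re

ARCHAIC_WORDS_CHIRHO = {  # illustrative subset, not the actual list used
    "thee", "thou", "thy", "thine", "ye", "hath", "doth",
    "shalt", "wilt", "art", "unto", "wherefore", "verily",
}
# Verbs ending in -eth/-est with at least three preceding letters.
SUFFIX_RE_CHIRHO = re.compile(r"\b[a-z]{3,}(?:eth|est)\b")

def count_archaic_forms_chirho(text_chirho):
    """Count archaic word tokens plus -eth/-est verb endings."""
    lowered_chirho = text_chirho.lower()
    words_chirho = re.findall(r"[a-z]+", lowered_chirho)
    word_hits_chirho = sum(1 for w in words_chirho if w in ARCHAIC_WORDS_CHIRHO)
    suffix_hits_chirho = len(SUFFIX_RE_CHIRHO.findall(lowered_chirho))
    return word_hits_chirho + suffix_hits_chirho
```

For example, "Verily, verily, I say unto thee" yields a count of 4 under this sketch (verily x2, unto, thee).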

## Usage

### Quick Start: Simplification

```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

input_text_chirho = "simplify: And the LORD God formed man of the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul."

inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
result_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)

print(result_chirho)
# Expected: A simplified, modern English version of the verse
```

### Quick Start: Difficulty Scoring

```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import re

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

input_text_chirho = "rate difficulty: For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life."

inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
raw_output_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)

print(raw_output_chirho)
# Expected: "reading_level: X | vocab_complexity: Y | archaic_forms: Z | difficulty: W"

# Parse structured output
reading_level_chirho = re.search(r"reading_level:\s*(\d+)", raw_output_chirho)
difficulty_chirho = re.search(r"difficulty:\s*(\w+)", raw_output_chirho)
vocab_chirho = re.search(r"vocab_complexity:\s*(\w+)", raw_output_chirho)
archaic_chirho = re.search(r"archaic_forms:\s*(\d+)", raw_output_chirho)

if reading_level_chirho:
    print(f"Reading Level: Grade {reading_level_chirho.group(1)}")
if difficulty_chirho:
    print(f"Difficulty: {difficulty_chirho.group(1)}")
if vocab_chirho:
    print(f"Vocabulary Complexity: {vocab_chirho.group(1)}")
if archaic_chirho:
    print(f"Archaic Forms: {archaic_chirho.group(1)}")
```

### Batch Inference

```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho.eval()

verses_chirho = [
    "simplify: Verily, verily, I say unto thee, Except a man be born again, he cannot see the kingdom of God.",
    "simplify: Wherefore, as by one man sin entered into the world, and death by sin; and so death passed upon all men, for that all have sinned:",
    "rate difficulty: In the beginning God created the heaven and the earth.",
    "rate difficulty: Jesus wept.",
]

inputs_chirho = tokenizer_chirho(verses_chirho, return_tensors="pt", max_length=256, truncation=True, padding=True)

with torch.no_grad():
    outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)

results_chirho = tokenizer_chirho.batch_decode(outputs_chirho, skip_special_tokens=True)

for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
    print(f"Input:  {verse_chirho}")
    print(f"Output: {result_chirho}\n")
```

## Evaluation

### Metrics

| Task | Metric | Description |
|---|---|---|
| Difficulty Scoring | `difficulty_accuracy_chirho` | Exact match on easy/medium/hard label |
| Difficulty Scoring | Reading level MAE | Mean absolute error on grade level (1-12) |
| Difficulty Scoring | Vocab complexity accuracy | Exact match on low/medium/high |
| Simplification | BLEU | Corpus-level BLEU score (sacrebleu) |
| Simplification | BERTScore F1 | Semantic similarity to reference simplifications |
| Simplification | Exact match | Proportion of predictions matching reference exactly |
| Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |
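
Plugging in the v2 numbers reproduces the reported combined score:

```python
# Combined score = 0.4 * difficulty accuracy + 0.6 * simplification exact match
difficulty_accuracy_chirho = 0.9377
simplification_exact_match_chirho = 0.0050  # 0.50%
combined_score_chirho = (0.4 * difficulty_accuracy_chirho
                         + 0.6 * simplification_exact_match_chirho)
print(round(combined_score_chirho, 4))  # 0.3781
```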

### Results (v2 - flan-t5-base upgrade)

| Metric | Score |
|---|---|
| **Eval loss** | **2.228** (best at epoch 3) |
| **Difficulty accuracy** | **93.8%** |
| **Simplification exact match** | 0.50% |
| **Combined score** | **0.378** |
| Train loss | 1.964 |
| Hardware | NVIDIA H200 (143GB), ~64 min |

### Training Trajectory

| Epoch | Eval Loss | Difficulty Acc | Combined Score |
|-------|-----------|----------------|----------------|
| 1 | 2.282 | 87.1% | 0.351 |
| 2 | 2.244 | 91.9% | 0.370 |
| **3** | **2.228** | 93.8% | 0.378 |
| 4 | 2.236 | 94.7% | 0.382 |
| 5 | 2.241 | 94.8% | 0.382 |

Best model selected by lowest eval_loss (epoch 3). Difficulty accuracy continued improving through epoch 5 but loss began increasing at epoch 4, indicating mild overfitting on the simplification task.

## Try It Live

**[Interactive Demo on HuggingFace Spaces](https://huggingface.co/spaces/LoveJesus/passage-difficulty-simplifier-chirho)**

The Gradio-powered demo provides two tabs:
- **Simplify**: Enter any Bible verse and receive a plain-language version
- **Difficulty**: Enter a verse and get reading level, vocabulary complexity, archaic form count, and overall difficulty

## Limitations

- Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
- Simplification quality varies by verse length and complexity; very long passages may be truncated
- Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
- The base model size (248M params) was chosen to balance accuracy against inference cost; larger models would likely produce more fluent simplifications
- Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
- Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
- The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible

## Intended Use

- Bible study tools that need plain-language paraphrasing of archaic translations
- Reading level assessment for curriculum planning or children's ministry materials
- Accessibility applications that present Bible text at appropriate reading levels
- Research into text simplification for historical English

## Out-of-Scope Use

- Replacing authoritative Bible translations for doctrinal study
- General-purpose text simplification outside of biblical literature
- Machine translation between languages (this model operates only in English)

## Model Architecture

```
google/flan-t5-base (Encoder-Decoder)
  Encoder: 12 layers, 12 heads, d_model=768
  Decoder: 12 layers, 12 heads, d_model=768
  Total parameters: ~248M (all trainable, full fine-tuning)
  Vocabulary: SentencePiece, 32,128 tokens
```

## Repository Structure

```
passage-difficulty-simplifier-chirho/
  src-chirho/
    train-chirho/train-simplifier-chirho.py    # Training script
    eval-chirho/evaluate-chirho.py             # Evaluation script
    data-chirho/build-simplifier-dataset-chirho.ts  # Dataset builder (Bun/TS)
    data-chirho/download-translations-chirho.ts     # Translation downloader
    upload-hf-chirho.py                        # HuggingFace upload script
  space-chirho/
    app.py                                     # Gradio demo application
  data-chirho/
    raw-chirho/                                # Raw Bible CSVs
    processed-chirho/                          # JSONL train/val/test splits
  models-chirho/
    simplifier-chirho/best-chirho/             # Best checkpoint
  cards-chirho/
    simplifier-card-chirho.md                  # This model card
  config-chirho.yaml                           # Training configuration
  spec-chirho/
    progress-chirho.sqlite                     # Agent progress log
```

## Training Reproducibility

```bash
# 1. Download Bible translations
cd passage-difficulty-simplifier-chirho
bun run src-chirho/data-chirho/download-translations-chirho.ts

# 2. Build dual-task dataset
bun run src-chirho/data-chirho/build-simplifier-dataset-chirho.ts

# 3. Train model
python src-chirho/train-chirho/train-simplifier-chirho.py

# 4. Evaluate
python src-chirho/eval-chirho/evaluate-chirho.py

# 5. Upload to HuggingFace
python src-chirho/upload-hf-chirho.py
```

## License

MIT

## Citation

```bibtex
@misc{lovejesus2026passagedifficultysimplifier,
  title={Passage Difficulty Scorer & Plain-Language Simplifier: Multi-Task Flan-T5 for Bible Readability},
  author={loveJesus},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/LoveJesus/passage-difficulty-simplifier-chirho}
}
```

---

Built with love for Jesus. Published by [loveJesus](https://huggingface.co/LoveJesus).