Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)
A fine-tuned google/flan-t5-base (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.
Model Description
This model takes Bible passages as input and performs one of two tasks, selected by a natural language prefix:
Task 1: Difficulty Scoring
Analyzes a Bible passage and produces a structured difficulty assessment.
- Prefix: `rate difficulty: `
- Output format: `reading_level: [1-12] | vocab_complexity: [low/medium/high] | archaic_forms: [count] | difficulty: [easy/medium/hard]`
Task 2: Simplification
Converts archaic or complex Bible passages into plain modern English.
- Prefix: `simplify: `
- Output: Plain-language paraphrase of the input verse
Training Details
| Parameter | Value |
|---|---|
| Base model | google/flan-t5-base (248M params) |
| Architecture | Encoder-Decoder (T5) |
| Training approach | Full fine-tuning, multi-task |
| Trainer | Seq2SeqTrainer with DataCollatorForSeq2Seq |
| Epochs | 5 |
| Batch size | 32 (H200 GPU) |
| Effective batch size | 32 (gradient accumulation = 1 on H200) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.1 |
| Mixed precision | bf16 (H200) |
| Max input length | 256 tokens |
| Max target length | 256 tokens |
| Early stopping | Patience = 2, monitoring eval_loss |
| Best model selection | Lowest eval_loss |
| Generation (eval) | predict_with_generate=True, beam search |
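For reference, the hyperparameters in the table map onto a `Seq2SeqTrainingArguments` configuration roughly like the sketch below. This is an illustration, not the actual training script (`train-simplifier-chirho.py`): the dataset variables and output directory are placeholders, and argument names follow recent versions of transformers.

```python
# Sketch of the training setup implied by the table above (not the real script).
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

args = Seq2SeqTrainingArguments(
    output_dir="models-chirho/simplifier-chirho",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    label_smoothing_factor=0.1,
    bf16=True,
    predict_with_generate=True,
    generation_max_length=256,
    generation_num_beams=4,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

train_ds = val_ds = None  # placeholders for the tokenized train/val datasets

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
```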
Dataset
Trained on over 120K examples combining both tasks, with an 80/10/10 split by Bible book to prevent verse-level leakage:
| Task | Target Count | Description |
|---|---|---|
| Difficulty scoring | ~64K | Verses from 6 translations with algorithmically computed labels |
| Simplification | ~96K | Cross-translation pairs mapping complex to simple English |
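The book-level split can be sketched as follows. This is a hypothetical helper for illustration (the actual builder is the TypeScript script listed under Repository Structure); the key property is that every verse of a book lands in the same split, so no verse pair can leak across splits.

```python
import hashlib

def split_for_book(book: str) -> str:
    """Assign an entire Bible book to train/val/test (80/10/10)."""
    # Hash the book name to a stable bucket in [0, 100); hashing keeps the
    # assignment deterministic across runs without storing a split table.
    bucket = int(hashlib.md5(book.encode()).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"
    elif bucket < 90:
        return "val"
    return "test"

# Every verse of "Genesis" gets the same split, whichever one that is:
assert split_for_book("Genesis") == split_for_book("Genesis")
```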
Translations Used
| Translation | Style | Role |
|---|---|---|
| KJV (King James Version) | Formal, archaic | Complex source |
| ASV (American Standard Version) | Formal, dated | Complex source |
| YLT (Young's Literal Translation) | Ultra-literal | Complex source |
| Darby Bible | Literal, dated | Complex source / Difficulty scoring |
| BBE (Bible in Basic English) | 850-word vocabulary, ~Grade 4 | Simple target |
| OEB (Open English Bible) | Modern, public domain | Simple target |
Simplification Pairs
| Complex Source | Simple Target |
|---|---|
| KJV | BBE |
| KJV | OEB |
| ASV | BBE |
| YLT | OEB |
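Pair construction aligns verses across translations by their (book, chapter, verse) reference. The toy sketch below illustrates the idea; the real data comes from the ScrollMapper CSVs, and the dicts here are illustrative stand-ins, not actual pipeline code.

```python
# Toy stand-ins for two verse-keyed translations (reference -> text).
kjv = {("John", 11, 35): "Jesus wept."}
bbe = {("John", 11, 35): "And Jesus himself was weeping."}

# Extend with the remaining pairings: KJV->OEB, ASV->BBE, YLT->OEB.
PAIRINGS = [("KJV", kjv, "BBE", bbe)]

def build_pairs():
    examples = []
    for src_name, src, tgt_name, tgt in PAIRINGS:
        for ref, complex_text in src.items():
            simple_text = tgt.get(ref)
            if simple_text:  # keep only verses present in both translations
                examples.append({
                    "input": f"simplify: {complex_text}",
                    "target": simple_text,
                })
    return examples

print(build_pairs()[0]["input"])  # simplify: Jesus wept.
```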
Data Source
Bible text sourced from ScrollMapper Bible Databases (public domain translations on GitHub).
Difficulty Scoring Labels
Labels are computed algorithmically from textual features:
- Reading level (1-12): Approximate Flesch-Kincaid grade level analog, adjusted for archaic vocabulary and uncommon word ratio
- Vocabulary complexity (low/medium/high): Ratio of words outside a ~3,000-word common English vocabulary
- Archaic forms (count): Number of archaic English words detected (thee, thou, hath, doth, -eth/-est verb endings, etc.)
- Difficulty (easy/medium/hard): Composite score from reading level, vocabulary complexity, and archaic form count
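A minimal sketch of how such labels can be computed is shown below. The word lists, thresholds, and grade-level formula here are abbreviated stand-ins invented for illustration; the real pipeline uses a larger archaic list, a ~3,000-word common vocabulary, and a Flesch-Kincaid-style estimate.

```python
import re

# Abbreviated stand-ins (illustrative only, not the production lists).
ARCHAIC = {"thee", "thou", "thy", "ye", "hath", "doth", "unto", "wherefore", "verily"}
SUFFIX = re.compile(r"[a-z]+(?:eth|est)\b")  # crude: also matches e.g. "best"
COMMON = {"for", "god", "so", "loved", "the", "world", "that", "he", "gave",
          "his", "only", "son", "in", "him", "should", "not", "but", "have", "life"}

def label(text: str) -> str:
    words = re.findall(r"[a-z']+", text.lower())
    # Archaic forms: word-list hits plus -eth/-est verb endings.
    archaic = sum(w in ARCHAIC for w in words) + len(SUFFIX.findall(text.lower()))
    # Vocabulary complexity: share of words outside the common list.
    uncommon = sum(w not in COMMON for w in words) / max(len(words), 1)
    vocab = "low" if uncommon < 0.15 else "medium" if uncommon < 0.35 else "high"
    # Crude grade-level analog: longer words and archaic forms push it up.
    avg_len = sum(map(len, words)) / max(len(words), 1)
    level = max(1, min(12, round(2 * avg_len - 4 + archaic)))
    # Composite difficulty from all three signals.
    score = level + {"low": 0, "medium": 2, "high": 4}[vocab] + archaic
    diff = "easy" if score < 6 else "medium" if score < 12 else "hard"
    return (f"reading_level: {level} | vocab_complexity: {vocab} | "
            f"archaic_forms: {archaic} | difficulty: {diff}")
```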
Usage
Quick Start: Simplification
```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

input_text_chirho = "simplify: And the LORD God formed man of the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul."
inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
result_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)
print(result_chirho)
# Expected: A simplified, modern English version of the verse
```
Quick Start: Difficulty Scoring
```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import re

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")

input_text_chirho = "rate difficulty: For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life."
inputs_chirho = tokenizer_chirho(input_text_chirho, return_tensors="pt", max_length=256, truncation=True)
outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
raw_output_chirho = tokenizer_chirho.decode(outputs_chirho[0], skip_special_tokens=True)
print(raw_output_chirho)
# Expected: "reading_level: X | vocab_complexity: Y | archaic_forms: Z | difficulty: W"

# Parse the structured output
reading_level_chirho = re.search(r"reading_level:\s*(\d+)", raw_output_chirho)
difficulty_chirho = re.search(r"difficulty:\s*(\w+)", raw_output_chirho)
vocab_chirho = re.search(r"vocab_complexity:\s*(\w+)", raw_output_chirho)
archaic_chirho = re.search(r"archaic_forms:\s*(\d+)", raw_output_chirho)

if reading_level_chirho:
    print(f"Reading Level: Grade {reading_level_chirho.group(1)}")
if difficulty_chirho:
    print(f"Difficulty: {difficulty_chirho.group(1)}")
if vocab_chirho:
    print(f"Vocabulary Complexity: {vocab_chirho.group(1)}")
if archaic_chirho:
    print(f"Archaic Forms: {archaic_chirho.group(1)}")
```
Batch Inference
```python
# For God so loved the world that he gave his only begotten Son,
# that whoever believes in him should not perish but have eternal life. - John 3:16
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer_chirho = AutoTokenizer.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho = AutoModelForSeq2SeqLM.from_pretrained("LoveJesus/passage-difficulty-simplifier-chirho")
model_chirho.eval()

verses_chirho = [
    "simplify: Verily, verily, I say unto thee, Except a man be born again, he cannot see the kingdom of God.",
    "simplify: Wherefore, as by one man sin entered into the world, and death by sin; and so death passed upon all men, for that all have sinned:",
    "rate difficulty: In the beginning God created the heaven and the earth.",
    "rate difficulty: Jesus wept.",
]

inputs_chirho = tokenizer_chirho(verses_chirho, return_tensors="pt", max_length=256, truncation=True, padding=True)
with torch.no_grad():
    outputs_chirho = model_chirho.generate(**inputs_chirho, max_length=256, num_beams=4, early_stopping=True)
results_chirho = tokenizer_chirho.batch_decode(outputs_chirho, skip_special_tokens=True)

for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
    print(f"Input: {verse_chirho}")
    print(f"Output: {result_chirho}\n")
```
Evaluation
Metrics
| Task | Metric | Description |
|---|---|---|
| Difficulty Scoring | difficulty_accuracy_chirho | Exact match on easy/medium/hard label |
| Difficulty Scoring | Reading level MAE | Mean absolute error on grade level (1-12) |
| Difficulty Scoring | Vocab complexity accuracy | Exact match on low/medium/high |
| Simplification | BLEU | Corpus-level BLEU score (sacrebleu) |
| Simplification | BERTScore F1 | Semantic similarity to reference simplifications |
| Simplification | Exact match | Proportion of predictions matching reference exactly |
| Combined | combined_score_chirho | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |
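As a sanity check, the reported combined score follows directly from this formula and the v2 metrics (difficulty accuracy 93.8%, simplification exact match 0.50%):

```python
# Combined score from the reported v2 metrics.
difficulty_accuracy = 0.938         # 93.8%
simplification_exact_match = 0.005  # 0.50%

combined = 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match
print(round(combined, 3))  # 0.378
```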
Results (v2 - flan-t5-base upgrade)
| Metric | Score |
|---|---|
| Eval loss | 2.228 (best at epoch 3) |
| Difficulty accuracy | 93.8% |
| Simplification exact match | 0.50% |
| Combined score | 0.378 |
| Train loss | 1.964 |
| Hardware | NVIDIA H200 (143GB), ~64 min |
Training Trajectory
| Epoch | Eval Loss | Difficulty Acc | Combined Score |
|---|---|---|---|
| 1 | 2.282 | 87.1% | 0.351 |
| 2 | 2.244 | 91.9% | 0.370 |
| 3 | 2.228 | 93.8% | 0.378 |
| 4 | 2.236 | 94.7% | 0.382 |
| 5 | 2.241 | 94.8% | 0.382 |
Best model selected by lowest eval_loss (epoch 3). Difficulty accuracy continued improving through epoch 5 but loss began increasing at epoch 4, indicating mild overfitting on the simplification task.
Try It Live
Interactive Demo on HuggingFace Spaces
The Gradio-powered demo provides two tabs:
- Simplify: Enter any Bible verse and receive a plain-language version
- Difficulty: Enter a verse and get reading level, vocabulary complexity, archaic form count, and overall difficulty
Limitations
- Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
- Simplification quality varies by verse length and complexity; very long passages may be truncated
- Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
- The 248M-parameter base model favors deployability over peak accuracy; a larger model would likely produce more fluent simplifications
- Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
- Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
- The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
Intended Use
- Bible study tools that need plain-language paraphrasing of archaic translations
- Reading level assessment for curriculum planning or children's ministry materials
- Accessibility applications that present Bible text at appropriate reading levels
- Research into text simplification for historical English
Out-of-Scope Use
- Replacing authoritative Bible translations for doctrinal study
- General-purpose text simplification outside of biblical literature
- Machine translation between languages (this model operates only in English)
Model Architecture
google/flan-t5-base (T5 encoder-decoder):
- Encoder: 12 layers, 12 heads, d_model=768
- Decoder: 12 layers, 12 heads, d_model=768
- Total parameters: ~248M (all trainable, full fine-tuning)
- Vocabulary: SentencePiece, 32,128 tokens
Repository Structure
```
passage-difficulty-simplifier-chirho/
  src-chirho/
    train-chirho/train-simplifier-chirho.py         # Training script
    eval-chirho/evaluate-chirho.py                  # Evaluation script
    data-chirho/build-simplifier-dataset-chirho.ts  # Dataset builder (Bun/TS)
    data-chirho/download-translations-chirho.ts     # Translation downloader
    upload-hf-chirho.py                             # HuggingFace upload script
  space-chirho/
    app.py                                          # Gradio demo application
  data-chirho/
    raw-chirho/                                     # Raw Bible CSVs
    processed-chirho/                               # JSONL train/val/test splits
  models-chirho/
    simplifier-chirho/best-chirho/                  # Best checkpoint
  cards-chirho/
    simplifier-card-chirho.md                       # This model card
  config-chirho.yaml                                # Training configuration
  spec-chirho/
    progress-chirho.sqlite                          # Agent progress log
```
Training Reproducibility
```bash
# 1. Download Bible translations
cd passage-difficulty-simplifier-chirho
bun run src-chirho/data-chirho/download-translations-chirho.ts

# 2. Build dual-task dataset
bun run src-chirho/data-chirho/build-simplifier-dataset-chirho.ts

# 3. Train model
python src-chirho/train-chirho/train-simplifier-chirho.py

# 4. Evaluate
python src-chirho/eval-chirho/evaluate-chirho.py

# 5. Upload to HuggingFace
python src-chirho/upload-hf-chirho.py
```
License
MIT
Citation
```bibtex
@misc{lovejesus2026passagedifficultysimplifier,
  title={Passage Difficulty Scorer \& Plain-Language Simplifier: Multi-Task Flan-T5 for Bible Readability},
  author={loveJesus},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/LoveJesus/passage-difficulty-simplifier-chirho}
}
```
Built with love for Jesus. Published by loveJesus.