LoveJesus committed
Commit 1673778 · verified · 1 Parent(s): 8f8fc93

Upgrade to flan-t5-base (248M): difficulty_accuracy=94.8%, eval_loss=2.228

Files changed (1): README.md (+48 -40)

README.md CHANGED
@@ -1,6 +1,4 @@
  ---
- # For God so loved the world that he gave his only begotten Son,
- # that whoever believes in him should not perish but have eternal life. - John 3:16
  license: mit
  language:
  - en
@@ -16,15 +14,31 @@ tags:
  datasets:
  - LoveJesus/passage-difficulty-simplifier-dataset-chirho
  pipeline_tag: text2text-generation
- base_model: google/flan-t5-small
  model-index:
  - name: passage-difficulty-simplifier-chirho
- results: []
  ---

  # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

- A fine-tuned **google/flan-t5-small** (80M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model.

  ## Model Description
@@ -48,18 +62,18 @@ Converts archaic or complex Bible passages into plain modern English.

  | Parameter | Value |
  |---|---|
- | **Base model** | `google/flan-t5-small` (80M params) |
  | **Architecture** | Encoder-Decoder (T5) |
  | **Training approach** | Full fine-tuning, multi-task |
  | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
  | **Epochs** | 5 |
- | **Batch size** | 32 (A100 GPU) |
- | **Effective batch size** | 32 (gradient accumulation = 1 on A100) |
- | **Learning rate** | 3e-4 |
  | **LR scheduler** | Cosine with 10% warmup |
  | **Weight decay** | 0.01 |
  | **Label smoothing** | 0.1 |
- | **Mixed precision** | bf16 (A100) |
  | **Max input length** | 256 tokens |
  | **Max target length** | 256 tokens |
  | **Early stopping** | Patience = 2, monitoring `eval_loss` |
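The label-smoothing row (0.1) in the table above moves a little probability mass off the gold token so the model is not trained toward overconfident one-hot targets. A minimal pure-Python sketch of smoothed cross-entropy; the function name and toy vocabulary are illustrative, not from the training code:

```python
import math

def label_smoothed_ce(probs, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution:
    the gold token keeps (1 - eps) of the mass plus its share of the
    uniform eps; every other token gets eps / vocab_size."""
    vocab = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = eps / vocab + (1.0 - eps if i == target else 0.0)
        loss -= q * math.log(p)
    return loss
```

With `eps=0` this reduces to ordinary cross-entropy; the 0.1 setting penalizes putting all mass on the single reference token, which helps when many paraphrases are acceptable.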
@@ -214,34 +228,28 @@ for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
  | Simplification | Exact match | Proportion of predictions matching reference exactly |
  | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |

- ### Results
-
- Trained for 5 epochs (19,075 steps) on NVIDIA A100 SXM 80GB in ~80 minutes. Best model selected by lowest eval loss.

  | Metric | Score |
  |---|---|
- | Eval loss (best) | **2.341** (epoch 4) |
- | Difficulty accuracy | **91.96%** |
- | Simplification exact match | 0.35% |
- | Combined score | 0.3699 |
- | Training throughput | 128.3 samples/sec |
- | Training runtime | 4,758 seconds |
-
- **Per-epoch progression:**
-
- | Epoch | Eval Loss | Difficulty Acc | Simp Exact | Combined |
- |---|---|---|---|---|
- | 1 | 2.441 | 85.55% | 0.24% | 0.3436 |
- | 2 | 2.373 | 89.21% | 0.32% | 0.3587 |
- | 3 | 2.348 | 90.19% | 0.33% | 0.3628 |
- | 4 | 2.341 | 91.74% | 0.32% | 0.3689 |
- | 5 | 2.342 | 91.96% | 0.35% | 0.3699 |
-
- **Notes:**
- - Difficulty scoring is the model's strong suit at 91.96% accuracy on easy/medium/hard classification
- - Simplification exact match is expectedly low since there are many valid paraphrases of any verse
- - The model produces fluent, readable simplifications even when they don't match the reference exactly
- - Combined score weights: 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match
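The combined-score weights above can be checked directly against the per-epoch table. A minimal pure-Python sketch, using the metric name from the evaluation table; the input values are epoch 5 of the removed progression table:

```python
def combined_score_chirho(difficulty_accuracy, simplification_exact_match):
    """Weighted combination reported above:
    0.4 * difficulty accuracy + 0.6 * simplification exact match."""
    return 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match

# Epoch 5: 91.96% difficulty accuracy, 0.35% exact match -> 0.3699
score = combined_score_chirho(0.9196, 0.0035)
```

Because exact match is near zero, the combined score is dominated by the 0.4-weighted difficulty accuracy term.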

  ## Try It Live

@@ -256,7 +264,7 @@ The Gradio-powered demo provides two tabs:
  - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
  - Simplification quality varies by verse length and complexity; very long passages may be truncated
  - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
- - Small model (80M params) trades accuracy for speed and accessibility
  - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
  - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
  - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
@@ -277,10 +285,10 @@ The Gradio-powered demo provides two tabs:
  ## Model Architecture

  ```
- google/flan-t5-small (Encoder-Decoder)
- Encoder: 6 layers, 8 heads, d_model=512
- Decoder: 6 layers, 8 heads, d_model=512
- Total parameters: ~80M (all trainable, full fine-tuning)
  Vocabulary: SentencePiece, 32,128 tokens
  ```
 
  ---
  license: mit
  language:
  - en
  datasets:
  - LoveJesus/passage-difficulty-simplifier-dataset-chirho
  pipeline_tag: text2text-generation
+ base_model: google/flan-t5-base
  model-index:
  - name: passage-difficulty-simplifier-chirho
+ results:
+ - task:
+     type: text2text-generation
+     name: Text Generation
+   metrics:
+   - name: Eval Loss
+     type: eval_loss
+     value: 2.228
+   - name: Difficulty Accuracy
+     type: accuracy
+     value: 0.9377
+   - name: Combined Score
+     type: combined_score
+     value: 0.3781
  ---

+ <!-- For God so loved the world that he gave his only begotten Son,
+ that whoever believes in him should not perish but have eternal life. - John 3:16 -->
+
  # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

+ A fine-tuned **google/flan-t5-base** (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.

  ## Model Description
 
  | Parameter | Value |
  |---|---|
+ | **Base model** | `google/flan-t5-base` (248M params) |
  | **Architecture** | Encoder-Decoder (T5) |
  | **Training approach** | Full fine-tuning, multi-task |
  | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
  | **Epochs** | 5 |
+ | **Batch size** | 32 (H200 GPU) |
+ | **Effective batch size** | 32 (gradient accumulation = 1 on H200) |
+ | **Learning rate** | 2e-4 |
  | **LR scheduler** | Cosine with 10% warmup |
  | **Weight decay** | 0.01 |
  | **Label smoothing** | 0.1 |
+ | **Mixed precision** | bf16 (H200) |
  | **Max input length** | 256 tokens |
  | **Max target length** | 256 tokens |
  | **Early stopping** | Patience = 2, monitoring `eval_loss` |
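The learning-rate and scheduler rows above combine into the usual linear-warmup cosine decay. A minimal sketch under that assumption; the function name and step counts are illustrative, and the actual run uses the trainer's built-in cosine schedule rather than this code:

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.10):
    """Linear warmup to peak_lr over the first 10% of steps,
    then cosine decay from peak_lr down to ~0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The warmup fraction keeps early updates small while the freshly reset optimizer state stabilizes; the cosine tail anneals the rate smoothly toward zero by the final step.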
 
  | Simplification | Exact match | Proportion of predictions matching reference exactly |
  | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |

+ ### Results (v2 - flan-t5-base upgrade)
+
  | Metric | Score |
  |---|---|
+ | **Eval loss** | **2.228** (best at epoch 3) |
+ | **Difficulty accuracy** | **93.8%** |
+ | **Simplification exact match** | 0.50% |
+ | **Combined score** | **0.378** |
+ | Train loss | 1.964 |
+ | Hardware | NVIDIA H200 (143GB), ~64 min |
+
+ ### Training Trajectory
+
+ | Epoch | Eval Loss | Difficulty Acc | Combined Score |
+ |-------|-----------|----------------|----------------|
+ | 1 | 2.282 | 87.1% | 0.351 |
+ | 2 | 2.244 | 91.9% | 0.370 |
+ | **3** | **2.228** | 93.8% | 0.378 |
+ | 4 | 2.236 | 94.7% | 0.382 |
+ | 5 | 2.241 | 94.8% | 0.382 |
+
+ Best model selected by lowest eval_loss (epoch 3). Difficulty accuracy continued improving through epoch 5, but eval loss began increasing at epoch 4, indicating mild overfitting on the simplification task.
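The selection rule in the paragraph above (best checkpoint by eval loss, early stopping with patience = 2) can be sketched in a few lines of pure Python; the function names are illustrative, and the losses are copied from the trajectory table:

```python
def best_epoch(eval_losses):
    """Return the 1-indexed epoch with the lowest eval loss."""
    return min(range(1, len(eval_losses) + 1), key=lambda e: eval_losses[e - 1])

def hits_patience(eval_losses, patience=2):
    """True once eval loss fails to improve for `patience` consecutive
    epochs, the point at which early stopping would halt training."""
    best, bad = float("inf"), 0
    for loss in eval_losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return True
    return False

losses = [2.282, 2.244, 2.228, 2.236, 2.241]  # epochs 1-5 above
```

With these losses the patience counter reaches 2 only at epoch 5, which is why training ran the full 5 epochs while the epoch-3 checkpoint was kept as best.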
 
 
 
 

  ## Try It Live

 
  - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
  - Simplification quality varies by verse length and complexity; very long passages may be truncated
  - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
+ - Base model (248M params) balances accuracy with accessibility
  - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
  - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
  - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
 
  ## Model Architecture

  ```
+ google/flan-t5-base (Encoder-Decoder)
+ Encoder: 12 layers, 12 heads, d_model=768
+ Decoder: 12 layers, 12 heads, d_model=768
+ Total parameters: ~248M (all trainable, full fine-tuning)
  Vocabulary: SentencePiece, 32,128 tokens
  ```