Upgrade to flan-t5-base (248M): difficulty_accuracy=94.8%, eval_loss=2.228
README.md (CHANGED)
```diff
@@ -1,6 +1,4 @@
 ---
-# For God so loved the world that he gave his only begotten Son,
-# that whoever believes in him should not perish but have eternal life. - John 3:16
 license: mit
 language:
 - en
```
```diff
@@ -16,15 +14,31 @@ tags:
 datasets:
 - LoveJesus/passage-difficulty-simplifier-dataset-chirho
 pipeline_tag: text2text-generation
-base_model: google/flan-t5-small
 model-index:
 - name: passage-difficulty-simplifier-chirho
-  results: []
 ---
 
 # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)
 
-A fine-tuned **google/flan-t5-small**
 
 ## Model Description
 
```
```diff
@@ -48,18 +62,18 @@ Converts archaic or complex Bible passages into plain modern English.
 
 | Parameter | Value |
 |---|---|
-| **Base model** | `google/flan-t5-small` (80M params) |
 | **Architecture** | Encoder-Decoder (T5) |
 | **Training approach** | Full fine-tuning, multi-task |
 | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
 | **Epochs** | 5 |
-| **Batch size** | 32 (A100 GPU) |
-| **Effective batch size** | 32 (gradient accumulation = 1 on A100) |
-| **Learning rate** |
 | **LR scheduler** | Cosine with 10% warmup |
 | **Weight decay** | 0.01 |
 | **Label smoothing** | 0.1 |
-| **Mixed precision** | bf16 (A100) |
 | **Max input length** | 256 tokens |
 | **Max target length** | 256 tokens |
 | **Early stopping** | Patience = 2, monitoring `eval_loss` |
```
```diff
@@ -214,34 +228,28 @@ for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
 | Simplification | Exact match | Proportion of predictions matching reference exactly |
 | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |
 
-### Results
-
-Trained for 5 epochs (19,075 steps) on NVIDIA A100 SXM 80GB in ~80 minutes. Best model selected by lowest eval loss.
 
 | Metric | Score |
 |---|---|
-| Eval loss
-| Difficulty accuracy | **91.96%** |
-| Simplification exact match | 0.
-| Combined score | 0.
-
-
-
-
-
-| Epoch | Eval Loss | Difficulty Acc |
-|---|---|---|
-| 1 | 2.
-| 2 | 2.
-| 3 | 2.
-| 4 | 2.
-| 5 | 2.
-
-
-- Difficulty scoring is the model's strong suit at 91.96% accuracy on easy/medium/hard classification
-- Simplification exact match is expectedly low since there are many valid paraphrases of any verse
-- The model produces fluent, readable simplifications even when they don't match the reference exactly
-- Combined score weights: 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match
 
 ## Try It Live
 
```
```diff
@@ -256,7 +264,7 @@ The Gradio-powered demo provides two tabs:
 - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
 - Simplification quality varies by verse length and complexity; very long passages may be truncated
 - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
--
 - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
 - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
 - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
```
````diff
@@ -277,10 +285,10 @@
 ## Model Architecture
 
 ```
-google/flan-t5-small (Encoder-Decoder)
-Encoder: 8 layers, 6 heads, d_model=512
-Decoder: 8 layers, 6 heads, d_model=512
-Total parameters: ~80M (all trainable, full fine-tuning)
 Vocabulary: SentencePiece, 32,128 tokens
 ```
 
````
```diff
@@ -1,6 +1,4 @@
 ---
 license: mit
 language:
 - en
```
```diff
@@ -16,15 +14,31 @@ tags:
 datasets:
 - LoveJesus/passage-difficulty-simplifier-dataset-chirho
 pipeline_tag: text2text-generation
+base_model: google/flan-t5-base
 model-index:
 - name: passage-difficulty-simplifier-chirho
+  results:
+  - task:
+      type: text2text-generation
+      name: Text Generation
+    metrics:
+    - name: Eval Loss
+      type: eval_loss
+      value: 2.228
+    - name: Difficulty Accuracy
+      type: accuracy
+      value: 0.9377
+    - name: Combined Score
+      type: combined_score
+      value: 0.3781
 ---
 
+<!-- For God so loved the world that he gave his only begotten Son,
+that whoever believes in him should not perish but have eternal life. - John 3:16 -->
+
 # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)
 
+A fine-tuned **google/flan-t5-base** (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.
 
 ## Model Description
 
```
```diff
@@ -48,18 +62,18 @@ Converts archaic or complex Bible passages into plain modern English.
 
 | Parameter | Value |
 |---|---|
+| **Base model** | `google/flan-t5-base` (248M params) |
 | **Architecture** | Encoder-Decoder (T5) |
 | **Training approach** | Full fine-tuning, multi-task |
 | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
 | **Epochs** | 5 |
+| **Batch size** | 32 (H200 GPU) |
+| **Effective batch size** | 32 (gradient accumulation = 1 on H200) |
+| **Learning rate** | 2e-4 |
 | **LR scheduler** | Cosine with 10% warmup |
 | **Weight decay** | 0.01 |
 | **Label smoothing** | 0.1 |
+| **Mixed precision** | bf16 (H200) |
 | **Max input length** | 256 tokens |
 | **Max target length** | 256 tokens |
 | **Early stopping** | Patience = 2, monitoring `eval_loss` |
```
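The learning-rate row above (2e-4, cosine schedule, 10% warmup) can be sketched numerically. A minimal illustration, assuming the usual linear-warmup-then-cosine-decay shape of `transformers`' cosine schedule; the 19,075-step count is the v1 (A100) figure quoted in the old results text and stands in only as an example total:

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.10):
    """Linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 19075  # v1 step count, for illustration
print(lr_at_step(0, total))      # 0.0 at the start of warmup
print(lr_at_step(1907, total))   # 2e-4 peak once warmup ends
print(lr_at_step(total, total))  # decayed back to 0.0
```

The same shape is what `get_cosine_schedule_with_warmup` produces; only the peak LR and warmup fraction come from the table.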
```diff
@@ -214,34 +228,28 @@ for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
 | Simplification | Exact match | Proportion of predictions matching reference exactly |
 | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |
 
+### Results (v2 - flan-t5-base upgrade)
 
 | Metric | Score |
 |---|---|
+| **Eval loss** | **2.228** (best at epoch 3) |
+| **Difficulty accuracy** | **93.8%** |
+| **Simplification exact match** | 0.50% |
+| **Combined score** | **0.378** |
+| Train loss | 1.964 |
+| Hardware | NVIDIA H200 (143GB), ~64 min |
+
+### Training Trajectory
+
+| Epoch | Eval Loss | Difficulty Acc | Combined Score |
+|-------|-----------|----------------|----------------|
+| 1 | 2.282 | 87.1% | 0.351 |
+| 2 | 2.244 | 91.9% | 0.370 |
+| **3** | **2.228** | 93.8% | 0.378 |
+| 4 | 2.236 | 94.7% | 0.382 |
+| 5 | 2.241 | 94.8% | 0.382 |
+
+Best model selected by lowest `eval_loss` (epoch 3). Difficulty accuracy continued improving through epoch 5, but eval loss began increasing at epoch 4, indicating mild overfitting on the simplification task.
 
 ## Try It Live
 
```
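The combined score and the epoch-3 checkpoint choice follow directly from the numbers above. A small sketch reproducing them (values copied from the tables; the patience-2 rule is a simplified stand-in for `EarlyStoppingCallback`):

```python
def combined_score(difficulty_acc, exact_match):
    # Weighting from the card: 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match
    return 0.4 * difficulty_acc + 0.6 * exact_match

# Final eval metrics: 0.9377 accuracy (YAML metadata), 0.50% exact match
print(round(combined_score(0.9377, 0.005), 4))  # 0.3781, as reported

# Eval losses per epoch from the training trajectory table
eval_losses = [2.282, 2.244, 2.228, 2.236, 2.241]
best_epoch = min(range(len(eval_losses)), key=eval_losses.__getitem__) + 1
print(best_epoch)  # 3: the checkpoint kept as "best model"

def stops_after(losses, patience=2):
    """Stop once eval loss fails to improve `patience` evaluations in a row."""
    best, bad = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return epoch
    return len(losses)

print(stops_after(eval_losses))  # 5: epochs 4 and 5 both regressed, exhausting patience
```

This also shows why training ended exactly at epoch 5: the two consecutive regressions after epoch 3 consumed the patience budget.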
```diff
@@ -256,7 +264,7 @@ The Gradio-powered demo provides two tabs:
 - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
 - Simplification quality varies by verse length and complexity; very long passages may be truncated
 - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
+- Base model size (248M params) balances accuracy against download size and inference cost
 - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
 - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
 - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
```
````diff
@@ -277,10 +285,10 @@
 ## Model Architecture
 
 ```
+google/flan-t5-base (Encoder-Decoder)
+Encoder: 12 layers, 12 heads, d_model=768
+Decoder: 12 layers, 12 heads, d_model=768
+Total parameters: ~248M (all trainable, full fine-tuning)
 Vocabulary: SentencePiece, 32,128 tokens
 ```
 
````
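The ~248M figure can be sanity-checked from the dimensions in the block above. A back-of-the-envelope count, assuming the standard flan-t5-base (T5 v1.1) configuration: d_ff=2048, gated-GeLU feed-forward with two input projections, no attention biases, and an LM head untied from the input embedding (RMSNorm weights and relative-position biases are ignored as negligible):

```python
d_model, d_ff, vocab = 768, 2048, 32128
n_enc = n_dec = 12

attn = 4 * d_model * d_model               # Q, K, V, O projections (T5 has no biases)
ffn = 2 * d_model * d_ff + d_ff * d_model  # gated-GeLU: wi_0, wi_1, wo

enc_layer = attn + ffn        # self-attention + FFN
dec_layer = 2 * attn + ffn    # self-attention + cross-attention + FFN

total = (
    vocab * d_model           # input embedding
    + vocab * d_model         # untied LM head (T5 v1.1 / flan-t5)
    + n_enc * enc_layer
    + n_dec * dec_layer
)
print(f"{total/1e6:.0f}M")    # 248M
```

The estimate lands within rounding of the advertised 248M, which is a quick check that the layer counts and d_model in the architecture block are mutually consistent.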