LoveJesus committed
Commit 1673778 · verified · 1 Parent(s): 8f8fc93

Upgrade to flan-t5-base (248M): difficulty_accuracy=94.8%, eval_loss=2.228

Files changed (1): README.md (+48 -40)

README.md CHANGED
@@ -1,6 +1,4 @@
  ---
- # For God so loved the world that he gave his only begotten Son,
- # that whoever believes in him should not perish but have eternal life. - John 3:16
  license: mit
  language:
  - en
@@ -16,15 +14,31 @@ tags:
  datasets:
  - LoveJesus/passage-difficulty-simplifier-dataset-chirho
  pipeline_tag: text2text-generation
- base_model: google/flan-t5-small
  model-index:
  - name: passage-difficulty-simplifier-chirho
- results: []
  ---

  # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

- A fine-tuned **google/flan-t5-small** (80M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model.

  ## Model Description
@@ -48,18 +62,18 @@ Converts archaic or complex Bible passages into plain modern English.

  | Parameter | Value |
  |---|---|
- | **Base model** | `google/flan-t5-small` (80M params) |
  | **Architecture** | Encoder-Decoder (T5) |
  | **Training approach** | Full fine-tuning, multi-task |
  | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
  | **Epochs** | 5 |
- | **Batch size** | 32 (A100 GPU) |
- | **Effective batch size** | 32 (gradient accumulation = 1 on A100) |
- | **Learning rate** | 3e-4 |
  | **LR scheduler** | Cosine with 10% warmup |
  | **Weight decay** | 0.01 |
  | **Label smoothing** | 0.1 |
- | **Mixed precision** | bf16 (A100) |
  | **Max input length** | 256 tokens |
  | **Max target length** | 256 tokens |
  | **Early stopping** | Patience = 2, monitoring `eval_loss` |
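The label-smoothing row (0.1) in the table above moves a little probability mass off the gold token so the model is not trained toward overconfident one-hot targets. A minimal pure-Python sketch of smoothed cross-entropy; the function name and toy vocabulary are illustrative, not from the training code:

```python
import math

def label_smoothed_ce(probs, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution:
    the gold token keeps (1 - eps) of the mass plus its share of the
    uniform eps; every other token gets eps / vocab_size."""
    vocab = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = eps / vocab + (1.0 - eps if i == target else 0.0)
        loss -= q * math.log(p)
    return loss
```

With `eps=0` this reduces to ordinary cross-entropy; the 0.1 setting penalizes putting all mass on the single reference token, which helps when many paraphrases are acceptable.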
@@ -214,34 +228,28 @@ for verse_chirho, result_chirho in zip(verses_chirho, results_chirho):
  | Simplification | Exact match | Proportion of predictions matching reference exactly |
  | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |

- ### Results
-
- Trained for 5 epochs (19,075 steps) on NVIDIA A100 SXM 80GB in ~80 minutes. Best model selected by lowest eval loss.

  | Metric | Score |
  |---|---|
- | Eval loss (best) | **2.341** (epoch 4) |
- | Difficulty accuracy | **91.96%** |
- | Simplification exact match | 0.35% |
- | Combined score | 0.3699 |
- | Training throughput | 128.3 samples/sec |
- | Training runtime | 4,758 seconds |
-
- **Per-epoch progression:**
-
- | Epoch | Eval Loss | Difficulty Acc | Simp Exact | Combined |
- |---|---|---|---|---|
- | 1 | 2.441 | 85.55% | 0.24% | 0.3436 |
- | 2 | 2.373 | 89.21% | 0.32% | 0.3587 |
- | 3 | 2.348 | 90.19% | 0.33% | 0.3628 |
- | 4 | 2.341 | 91.74% | 0.32% | 0.3689 |
- | 5 | 2.342 | 91.96% | 0.35% | 0.3699 |
-
- **Notes:**
- - Difficulty scoring is the model's strong suit at 91.96% accuracy on easy/medium/hard classification
- - Simplification exact match is expectedly low since there are many valid paraphrases of any verse
- - The model produces fluent, readable simplifications even when they don't match the reference exactly
- - Combined score weights: 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match
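The combined-score weights above can be checked directly against the per-epoch table. A minimal pure-Python sketch, using the metric name from the evaluation table; the input values are epoch 5 of the removed progression table:

```python
def combined_score_chirho(difficulty_accuracy, simplification_exact_match):
    """Weighted combination reported above:
    0.4 * difficulty accuracy + 0.6 * simplification exact match."""
    return 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match

# Epoch 5: 91.96% difficulty accuracy, 0.35% exact match -> 0.3699
score = combined_score_chirho(0.9196, 0.0035)
```

Because exact match is near zero, the combined score is dominated by the 0.4-weighted difficulty accuracy term.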

  ## Try It Live

@@ -256,7 +264,7 @@ The Gradio-powered demo provides two tabs:
  - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
  - Simplification quality varies by verse length and complexity; very long passages may be truncated
  - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
- - Small model (80M params) trades accuracy for speed and accessibility
  - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
  - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
  - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
@@ -277,10 +285,10 @@ The Gradio-powered demo provides two tabs:
  ## Model Architecture

  ```
- google/flan-t5-small (Encoder-Decoder)
- Encoder: 6 layers, 8 heads, d_model=512
- Decoder: 6 layers, 8 heads, d_model=512
- Total parameters: ~80M (all trainable, full fine-tuning)
  Vocabulary: SentencePiece, 32,128 tokens
  ```
 
  ---
  license: mit
  language:
  - en
  datasets:
  - LoveJesus/passage-difficulty-simplifier-dataset-chirho
  pipeline_tag: text2text-generation
+ base_model: google/flan-t5-base
  model-index:
  - name: passage-difficulty-simplifier-chirho
+ results:
+ - task:
+     type: text2text-generation
+     name: Text Generation
+   metrics:
+   - name: Eval Loss
+     type: eval_loss
+     value: 2.228
+   - name: Difficulty Accuracy
+     type: accuracy
+     value: 0.9377
+   - name: Combined Score
+     type: combined_score
+     value: 0.3781
  ---

+ <!-- For God so loved the world that he gave his only begotten Son,
+ that whoever believes in him should not perish but have eternal life. - John 3:16 -->
+
  # Passage Difficulty Scorer & Plain-Language Simplifier (Model 8)

+ A fine-tuned **google/flan-t5-base** (248M parameters) for dual-task Bible passage processing: (1) reading difficulty scoring and (2) archaic-to-modern English simplification. Both tasks are learned jointly through multi-task training on the same model. Upgraded from flan-t5-small (80M) for improved accuracy.

  ## Model Description
 
  | Parameter | Value |
  |---|---|
+ | **Base model** | `google/flan-t5-base` (248M params) |
  | **Architecture** | Encoder-Decoder (T5) |
  | **Training approach** | Full fine-tuning, multi-task |
  | **Trainer** | `Seq2SeqTrainer` with `DataCollatorForSeq2Seq` |
  | **Epochs** | 5 |
+ | **Batch size** | 32 (H200 GPU) |
+ | **Effective batch size** | 32 (gradient accumulation = 1 on H200) |
+ | **Learning rate** | 2e-4 |
  | **LR scheduler** | Cosine with 10% warmup |
  | **Weight decay** | 0.01 |
  | **Label smoothing** | 0.1 |
+ | **Mixed precision** | bf16 (H200) |
  | **Max input length** | 256 tokens |
  | **Max target length** | 256 tokens |
  | **Early stopping** | Patience = 2, monitoring `eval_loss` |
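The learning-rate and scheduler rows above combine into the usual linear-warmup cosine decay. A minimal sketch under that assumption; the function name and step counts are illustrative, and the actual run uses the trainer's built-in cosine schedule rather than this code:

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.10):
    """Linear warmup to peak_lr over the first 10% of steps,
    then cosine decay from peak_lr down to ~0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The warmup fraction keeps early updates small while the freshly reset optimizer state stabilizes; the cosine tail anneals the rate smoothly toward zero by the final step.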
 
  | Simplification | Exact match | Proportion of predictions matching reference exactly |
  | Combined | `combined_score_chirho` | 0.4 * difficulty_accuracy + 0.6 * simplification_exact_match |

+ ### Results (v2 - flan-t5-base upgrade)
+
  | Metric | Score |
  |---|---|
+ | **Eval loss** | **2.228** (best at epoch 3) |
+ | **Difficulty accuracy** | **93.8%** |
+ | **Simplification exact match** | 0.50% |
+ | **Combined score** | **0.378** |
+ | Train loss | 1.964 |
+ | Hardware | NVIDIA H200 (143GB), ~64 min |
+
+ ### Training Trajectory
+
+ | Epoch | Eval Loss | Difficulty Acc | Combined Score |
+ |-------|-----------|----------------|----------------|
+ | 1 | 2.282 | 87.1% | 0.351 |
+ | 2 | 2.244 | 91.9% | 0.370 |
+ | **3** | **2.228** | 93.8% | 0.378 |
+ | 4 | 2.236 | 94.7% | 0.382 |
+ | 5 | 2.241 | 94.8% | 0.382 |
+
+ Best model selected by lowest eval_loss (epoch 3). Difficulty accuracy continued improving through epoch 5, but eval loss began increasing at epoch 4, indicating mild overfitting on the simplification task.
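The selection rule in the paragraph above (best checkpoint by eval loss, early stopping with patience = 2) can be sketched in a few lines of pure Python; the function names are illustrative, and the losses are copied from the trajectory table:

```python
def best_epoch(eval_losses):
    """Return the 1-indexed epoch with the lowest eval loss."""
    return min(range(1, len(eval_losses) + 1), key=lambda e: eval_losses[e - 1])

def hits_patience(eval_losses, patience=2):
    """True once eval loss fails to improve for `patience` consecutive
    epochs, the point at which early stopping would halt training."""
    best, bad = float("inf"), 0
    for loss in eval_losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                return True
    return False

losses = [2.282, 2.244, 2.228, 2.236, 2.241]  # epochs 1-5 above
```

With these losses the patience counter reaches 2 only at epoch 5, which is why training ran the full 5 epochs while the epoch-3 checkpoint was kept as best.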
 
 
 
 

  ## Try It Live

 
  - Trained exclusively on Bible text; does not generalize to other literary or domain-specific texts
  - Simplification quality varies by verse length and complexity; very long passages may be truncated
  - Difficulty scoring labels are algorithmically generated (not human-annotated), which introduces systematic biases
+ - Base model (248M params) balances accuracy with accessibility
  - Simplification targets (BBE, OEB) have their own translation biases; outputs reflect those stylistic choices
  - Archaic form detection relies on a fixed word list and may miss uncommon archaic constructions
  - The model does not preserve verse references or theological nuance; it is a readability tool, not a study Bible
 
  ## Model Architecture

  ```
+ google/flan-t5-base (Encoder-Decoder)
+ Encoder: 12 layers, 12 heads, d_model=768
+ Decoder: 12 layers, 12 heads, d_model=768
+ Total parameters: ~248M (all trainable, full fine-tuning)
  Vocabulary: SentencePiece, 32,128 tokens
  ```