Talmud Punctuator: Model B (Steinsaltz)
Fine-tuned BEREL 3.0 for predicting punctuation in Talmudic Aramaic/Hebrew text.
Model B reflects the Steinsaltz/William Davidson Edition punctuation style, trained on a 15,000-sentence sample across 36 masekhtot of the Babylonian Talmud.
This is a preliminary model. For better accuracy, retrain on the full 80K-sentence
dataset using the included Colab notebook (train_on_colab.ipynb).
Training details
- Base model: BEREL 3.0 (`dicta-il/BEREL_3.0`)
- Head: Linear classification (768 → 8 labels)
- Data: 15,000 sentences sampled from 36 masekhtot (~80K available)
- Epochs: 3, Batch size: 16, LR: 2e-5
- Final loss: 0.2148
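
To put the numbers above in perspective, here is a small arithmetic sketch. It assumes the head is a single linear layer with a bias term, that every sampled sentence is used for training (no held-out split), that the last partial batch is kept, and that there is no gradient accumulation. None of this is stated in the card, and the helper names are illustrative.

```python
import math


def head_parameter_count(hidden_size: int = 768, num_labels: int = 8) -> int:
    """Weights plus biases of one linear layer mapping hidden_size -> num_labels.

    Assumes the head has a bias term (hypothetical; the card only says "Linear").
    """
    return hidden_size * num_labels + num_labels


def optimizer_steps(num_sentences: int, batch_size: int = 16, epochs: int = 3) -> int:
    """Total optimizer steps, assuming all sentences are trained on each epoch
    and the final partial batch is not dropped."""
    return math.ceil(num_sentences / batch_size) * epochs


print(head_parameter_count())    # 6152 parameters in the classification head
print(optimizer_steps(15_000))   # 2814 steps for the 15,000-sentence sample
print(optimizer_steps(80_000))   # 15000 steps if the same settings are used on 80K
```

Under these assumptions, retraining on roughly 80K sentences takes a little over five times as many steps as this preliminary run.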
Labels
| Label | Meaning |
|---|---|
| O | No punctuation |
| , | Comma |
| . | Period |
| : | Colon |
| ; | Semicolon |
| ? | Question mark |
| ! | Exclamation mark |
| — | Em-dash |
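
As a rough illustration of how these labels turn back into punctuated text, the sketch below attaches one predicted label to each word. It assumes one label per word, with the mark placed directly after that word, and it assumes the label order shown in the table (ending with the em-dash). The card does not confirm the index order or how `punctuator.py` does this, and `apply_labels` is a hypothetical helper.

```python
# Label set taken from the table above. The order, and therefore the class
# index of each label, is an assumption and not stated in the model card.
LABELS = ["O", ",", ".", ":", ";", "?", "!", "—"]


def apply_labels(words, labels):
    """Join words into one string, appending each non-"O" label to its word."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    out = []
    for word, label in zip(words, labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        # "O" means no punctuation follows this word.
        out.append(word if label == "O" else word + label)
    return " ".join(out)


# Placeholder words stand in for Talmudic text; the labels are made up.
words = ["w1", "w2", "w3", "w4"]
labels = ["O", ",", "O", "?"]
print(apply_labels(words, labels))  # w1 w2, w3 w4?
```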
Usage
Use with the punctuator.py script from mivami.
Upgrading to the full dataset
Use the Colab notebook to train on all 80K sentences with a free GPU:
- Set `CONTINUE_FROM = "Joshua2/talmud-punctuator-B"` to start from this checkpoint
- Upload `Steinsaltz_combined.txt`
- Run on a T4 GPU (~3-5 hours)