Token Classification

Official Aramaic (700-300 BCE)

Talmud Punctuator — Model B (Steinsaltz)

A fine-tuned BEREL 3.0 model that predicts punctuation in Talmudic Aramaic/Hebrew text.

Model B reflects the punctuation style of the Steinsaltz/William Davidson Edition. It was trained on a 15,000-sentence sample drawn from 36 masekhtot (tractates) of the Babylonian Talmud.

This is a preliminary model. Retraining on the full dataset (~80K sentences) with the included Colab notebook (train_on_colab.ipynb) should improve accuracy.

Training details

Base model: BEREL 3.0 (dicta-il/BEREL_3.0)
Head: Linear classification (768 → 8 labels)
Data: 15,000 sentences sampled from 36 masekhtot (~80K available)
Epochs: 3, Batch size: 16, LR: 2e-5
Final loss: 0.2148
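
As a rough illustration of what these settings imply, the sketch below counts optimizer steps for the 15,000-sentence sample and for the full dataset. It assumes no gradient accumulation, that the last partial batch is kept, and that the full dataset is exactly 80,000 sentences (the card only says ~80K); `optimizer_steps` is a hypothetical helper, not part of the training notebook.

```python
import math


def optimizer_steps(num_sentences: int, batch_size: int = 16, epochs: int = 3) -> int:
    """Count optimizer steps, assuming one step per batch and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_sentences / batch_size)  # last partial batch is kept
    return steps_per_epoch * epochs


print(optimizer_steps(15_000))  # 2814 steps for the preliminary sample
print(optimizer_steps(80_000))  # 15000 steps, assuming exactly 80,000 sentences
```

Under these assumptions, the full dataset needs a little over five times as many steps as the preliminary run.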

Labels

Label	Meaning
`O`	No punctuation
`,`	Comma
`.`	Period
`:`	Colon
`;`	Semicolon
`?`	Question mark
`!`	Exclamation mark
`—`	Em-dash
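
To make the label scheme concrete, here is a minimal sketch of turning per-word predictions back into punctuated text. It assumes one label per word and that each label marks the punctuation that follows that word; neither assumption is confirmed by this card, and `apply_punctuation` is a hypothetical helper rather than the logic of punctuator.py. The tokens `w1`, `w2`, `w3` are placeholders.

```python
# The eight labels from the table above; "O" means no punctuation.
LABELS = {"O", ",", ".", ":", ";", "?", "!", "—"}


def apply_punctuation(words: list[str], labels: list[str]) -> str:
    """Attach each predicted mark to its word (assumed: the mark follows the word)."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out = []
    for word, label in zip(words, labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        # "O" leaves the word untouched; any other label is appended directly.
        out.append(word if label == "O" else word + label)
    return " ".join(out)


print(apply_punctuation(["w1", "w2", "w3"], ["O", ",", "."]))  # w1 w2, w3.
```

If the model predicts on subword tokens, the predictions would first have to be aligned to whole words; that step is omitted here.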

Usage

Use this model with the punctuator.py script from mivami.

Upgrading to full dataset

Use the Colab notebook to train on the full dataset (~80K sentences) with a free GPU:

Set CONTINUE_FROM = "Joshua2/talmud-punctuator-B" to start from this checkpoint
Upload Steinsaltz_combined.txt
Run on a T4 GPU (~3-5 hours)
