Talmud Punctuator: Model B (Steinsaltz)

Fine-tuned BEREL 3.0 for predicting punctuation in Talmudic Aramaic/Hebrew text.

Model B reflects the Steinsaltz/William Davidson Edition punctuation style, trained on a 15,000-sentence sample across 36 masekhtot of the Babylonian Talmud.

This is a preliminary model trained on a subset of the available data. For better accuracy, retrain on the full ~80K-sentence dataset using the included Colab notebook (train_on_colab.ipynb).

Training details

  • Base model: BEREL 3.0 (dicta-il/BEREL_3.0)
  • Head: Linear classification (768 → 8 labels)
  • Data: 15,000 sentences sampled from 36 masekhtot (~80K sentences available in the full dataset)
  • Epochs: 3, Batch size: 16, LR: 2e-5
  • Final loss: 0.2148
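
As a rough sanity check on the configuration above, the sketch below works out how many optimizer steps the run takes and how many parameters the classification head adds. It assumes all 15,000 sentences are used for training (no held-out split), no gradient accumulation, and a bias term on the linear head; none of these are stated in this card, and the function names are illustrative only.

```python
import math


def optimizer_steps(num_sentences, batch_size, epochs):
    """Return (steps per epoch, total steps).

    Assumes every sentence is a training example and the last
    partial batch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_sentences / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


def head_parameters(hidden_size, num_labels):
    """Parameter count of a linear head: weight matrix plus (assumed) bias."""
    return hidden_size * num_labels + num_labels


print(optimizer_steps(15_000, 16, 3))  # (938, 2814)
print(head_parameters(768, 8))         # 6152
```

Under the same assumptions, the full ~80K-sentence dataset would take about 5,000 steps per epoch at batch size 16.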

Labels

Label Meaning
O No punctuation
, Comma
. Period
: Colon
; Semicolon
? Question mark
! Exclamation mark
— Em-dash
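
To illustrate how these labels can be turned back into punctuated text, here is a minimal sketch. It assumes one label per word and that each mark attaches directly after the word it is predicted for; the card does not specify either convention, and the order of label ids in LABELS simply follows the table above. The function name apply_punctuation is hypothetical and is not part of punctuator.py.

```python
# Label order copied from the table above. The id-to-label mapping is an
# assumption, not read from the model config. "\u2014" is the em-dash.
LABELS = ["O", ",", ".", ":", ";", "?", "!", "\u2014"]


def apply_punctuation(tokens, labels):
    """Attach each predicted mark to the word it follows.

    "O" leaves the word unchanged. Assumes one label per word.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    out = []
    for token, label in zip(tokens, labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        out.append(token if label == "O" else token + label)
    return " ".join(out)


# Transliterated placeholder words; real input would be Aramaic/Hebrew text.
words = ["amar", "rava", "mai", "taama"]
predicted = ["O", ":", "O", "?"]
print(apply_punctuation(words, predicted))  # amar rava: mai taama?
```

If the model predicts labels per subword rather than per word, the predictions would first need to be aligned to words (for example, by taking the label of the last subword), which this sketch does not cover.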

Usage

Use with the punctuator.py script from mivami.

Upgrading to full dataset

Use the Colab notebook (train_on_colab.ipynb) to train on all ~80K sentences with a free GPU:

  1. Set CONTINUE_FROM = "Joshua2/talmud-punctuator-B" to start from this checkpoint
  2. Upload Steinsaltz_combined.txt
  3. Run on a T4 GPU (~3-5 hours)