BEREL Linker NER

A Hebrew Named Entity Recognition (NER) model for Rabbinic literature, fine-tuned from BEREL 3.0, a BERT-based language model pre-trained on Rabbinic Hebrew texts by DICTA.

Model Description

This model identifies two entity types in Rabbinic Hebrew text:

Label | Hebrew | Description
Cit (B-מקור / I-מקור) | מקור | Citations: references to Jewish texts and sources
Per (B-בן-אדם / I-בן-אדם) | בן-אדם | Persons: names of people

It uses BIO tagging (B marks the first token of an entity, I a continuation token, O a token outside any entity) and was trained to automatically link citations and persons in Sefaria's corpus of Rabbinic literature.
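To make the BIO scheme concrete, here is a minimal sketch of how per-token tags can be grouped into entity spans. The function name `bio_to_spans` and the example tags are illustrative assumptions, not model output and not part of the released code.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span that is currently open, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        # Split on the first hyphen only, so "B-בן-אדם" yields type "בן-אדם".
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A B- tag (or a stray I- tag of a different type) opens a new span.
            flush()
            current_type, current_tokens = entity_type, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": outside any entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hypothetical tags for the usage sentence, for illustration only.
tokens = ["דברי", 'הרמב"ם', "בהלכות", "שבת"]
tags = ["O", "B-בן-אדם", "B-מקור", "I-מקור"]
print(bio_to_spans(tokens, tags))
```

In practice the `transformers` pipeline performs this grouping itself when an aggregation strategy is set; the sketch only shows what the tags encode.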

Training Details

  • Base model: dicta-il/BEREL_3.0
  • Architecture: BertForTokenClassification (BERT-base, 12 layers, 12 attention heads, hidden size 768)
  • Parameters: ~183.8M (larger than standard BERT-base due to 128,000-token Hebrew vocabulary)
  • Training dataset: Sefaria/he_berel_gold (~3,000 annotated examples of Rabbinic text, split 80/20 train/test)
  • Batch size: 16
  • Early stopping patience: 2 epochs
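The parameter count above can be sanity-checked with a little arithmetic. The sketch below uses only the vocabulary size, hidden size, and total from this card. The comparison figure of 30,522 tokens is an assumption: it is the vocabulary size of the original English `bert-base-uncased`, used here as the "standard BERT-base" reference.

```python
VOCAB_SIZE = 128_000        # from this card
HIDDEN_SIZE = 768           # from this card
TOTAL_PARAMS = 183.8e6      # approximate total, from this card
BERT_BASE_VOCAB = 30_522    # assumption: vocabulary of bert-base-uncased

# The token embedding matrix has one HIDDEN_SIZE-wide row per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
reference_embedding_params = BERT_BASE_VOCAB * HIDDEN_SIZE
extra_params = embedding_params - reference_embedding_params
embedding_share = embedding_params / TOTAL_PARAMS

print(f"token embeddings:        {embedding_params:,}")            # 98,304,000
print(f"reference embeddings:    {reference_embedding_params:,}")  # 23,440,896
print(f"extra from larger vocab: {extra_params:,}")                # 74,863,104
print(f"share of total:          {embedding_share:.1%}")
```

Under these assumptions, the token embeddings alone account for roughly half of the model's parameters, which is consistent with the card's explanation for the larger size.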

Performance

The best checkpoint was saved at epoch 3 (of a possible 10) via early stopping:

Metric | Score
F1 | 87.2%
Precision | 85.7%
Recall | 88.8%
Eval loss | 0.0815
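As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall, assuming F1 here is their harmonic mean (the usual definition; the card does not state how the metrics were computed). The helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.857, 0.888)
print(f"{f1:.1%}")  # 87.2%
```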

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "Sefaria/berel-linker-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="first",
    stride=128,
)

# Hebrew for: "The words of the Rambam in the Laws of Shabbat"
text = "דברי הרמב\"ם בהלכות שבת"
entities = ner(text)
print(entities)
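With an aggregation strategy set, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below shows one way to filter that list by confidence and group it by entity type. The helper name `group_entities`, the 0.5 threshold, and the sample entries (including their scores and offsets) are hypothetical, not actual model output.

```python
from collections import defaultdict


def group_entities(text, entities, min_score=0.5):
    """Group pipeline results by entity type, dropping low-confidence spans.

    Returns {entity_group: [(start, end, surface_text), ...]}.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            start, end = ent["start"], ent["end"]
            # Slice the original text rather than trusting the detokenized "word".
            grouped[ent["entity_group"]].append((start, end, text[start:end]))
    return dict(grouped)


# Hypothetical pipeline output for the sentence above, for illustration only.
text = 'דברי הרמב"ם בהלכות שבת'
sample_entities = [
    {"entity_group": "בן-אדם", "score": 0.98, "word": 'הרמב"ם', "start": 5, "end": 11},
    {"entity_group": "מקור", "score": 0.91, "word": "בהלכות שבת", "start": 12, "end": 22},
    {"entity_group": "מקור", "score": 0.20, "word": "דברי", "start": 0, "end": 4},
]
print(group_entities(text, sample_entities))
```

Slicing `text[start:end]` keeps the original surface form, which is useful when the spans are later passed to a linker.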

Label Map

{
  "O": 0,
  "I-מקור": 1,
  "I-בן-אדם": 2,
  "B-מקור": 5,
  "B-בן-אדם": 6
}
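When working with raw logits instead of the pipeline, predicted class IDs need to be mapped back to label strings. A minimal sketch follows, using only the IDs listed above. The helper name `decode_ids` and the placeholder format are illustrative assumptions. IDs 3 and 4 do not appear in the published map, so the sketch marks them rather than guessing a label.

```python
label2id = {
    "O": 0,
    "I-מקור": 1,
    "I-בן-אדם": 2,
    "B-מקור": 5,
    "B-בן-אדם": 6,
}

# Invert the map so predicted class IDs can be turned back into labels.
id2label = {idx: label for label, idx in label2id.items()}


def decode_ids(ids):
    """Map class IDs to labels; IDs missing from the map get a placeholder."""
    return [id2label.get(i, f"<unlisted:{i}>") for i in ids]


print(decode_ids([0, 6, 5, 1]))
```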

About

Developed by Sefaria for automated entity linking in classical Jewish texts.
