HalleluBERT: Let every token that has meaning bear its weight

HalleluBERT is a family of RoBERTa-based Modern Hebrew language models pre-trained from scratch on ~49.1 GB of deduplicated Hebrew text: the HeDC4 / HeRo web corpus plus Hebrew Wikipedia. The models aim to provide the first fully converged Hebrew RoBERTa encoder family, including a large variant, and to push state-of-the-art performance on core Hebrew benchmarks.

We release two variants:

  • HalleluBERT-base: 126M parameters (fp32)
  • HalleluBERT-large: 357M parameters (fp32)
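
Both checkpoints are standard RoBERTa encoders, so they load with the Hugging Face transformers library. A minimal sketch, assuming the repository ID `HalleluBERT/HalleluBERT_base` (swap in the large variant or the actual hub path as needed):

```python
# Minimal loading sketch with Hugging Face transformers.
# The repository ID is an assumption; replace it with the actual hub path.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "HalleluBERT/HalleluBERT_base"  # assumed ID; HalleluBERT_large works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a Hebrew sentence and inspect the contextual token embeddings.
inputs = tokenizer("שלום עולם", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```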

Model Details

| Detail | HalleluBERT-base | HalleluBERT-large |
|---|---|---|
| Architecture | RoBERTa-base | RoBERTa-large |
| Parameters | ~126M | ~357M |
| Tokenizer | GPT-2 style byte-level BPE (52,009 vocab) | Same |
| Pretraining corpus | HeDC4 (mC4 + OSCAR22) + Hebrew Wikipedia (~49.1 GB) | Same |
| Objective | Masked Language Modeling | Same |
| Training steps | 100k updates, global batch size 8k | Same |
| LR schedule | 10k warmup + polynomial decay | Same |
| Peak learning rate | 0.0004 | 0.00015 |
| Training time | ~30.2 hours (TPUv4-128 pod) | ~6.0 days (TPUv4-128 pod) |
| Precision | fp32 | fp32 |
| Framework | fairseq | fairseq |
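
Since the models are trained with a masked-language-modeling objective, a quick sanity check is to fill a masked token. A sketch using the fill-mask pipeline; the repository ID and the example sentence are illustrative, not from the paper:

```python
# Illustrative fill-mask check of the MLM pretraining objective.
from transformers import pipeline

# Assumed repository ID; RoBERTa-style checkpoints use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model="HalleluBERT/HalleluBERT_base")

for pred in fill_mask("ירושלים היא <mask> של ישראל."):
    print(f"{pred['token_str']!r}  {pred['score']:.3f}")
```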

Downstream Evaluation

We evaluate HalleluBERT on three Hebrew benchmarks (following the HeRo suite, restricted to NER + sentiment):

  • NER (BMC split 1): micro-F1
  • NER (NEMO², token-level): micro-F1
  • Sentiment (SMCD, deduplicated): macro-F1

We select the best configuration by validation performance and report the best score out of 10 runs on the official test sets.
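
Both tasks are standard fine-tuning setups: token classification for NER and sequence classification for sentiment. A sketch of how the heads attach in transformers; the label counts and repository ID are placeholders, not the exact configurations used for the reported scores:

```python
# Sketch: task heads for the two evaluation settings (placeholder label counts).
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)

model_id = "HalleluBERT/HalleluBERT_base"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# NER (BMC / NEMO²): one BIO-style label per token.
ner_model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)

# Sentiment (SMCD): one label per example.
sentiment_model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
```

Either head can then be trained with the usual Trainer loop on the tokenized benchmark data.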

🧪 Evaluation Results

Legend: Bold = best, underline = second-best within each model size group.

| Model | BMC (micro-F1) | NEMO (micro-F1) | AVG NER | SMCD (macro-F1) | AVG (all) |
|---|---|---|---|---|---|
| **Large models** | | | | | |
| HalleluBERT_large | **93.23** | **88.70** | **90.96** | **84.91** | **88.95** |
| XLM-RoBERTa_large | <u>92.31</u> | <u>86.41</u> | <u>89.36</u> | <u>83.74</u> | <u>87.49</u> |
| **Base models** | | | | | |
| HeBERT | 89.33 | 76.16 | 82.74 | 82.64 | 82.71 |
| AlephBERT | 91.36 | 81.52 | 86.44 | **83.66** | 85.51 |
| HeRo | 92.00 | 83.35 | 87.68 | 80.95 | 85.43 |
| HalleluBERT_base | **93.33** | **87.06** | **90.20** | 83.09 | **87.83** |
| mmBERT_small | 83.96 | 71.95 | 77.96 | 81.89 | 79.27 |
| AlephBERT-Gimmel | <u>92.46</u> | <u>85.86</u> | <u>89.16</u> | 82.66 | <u>86.99</u> |
| XLM-RoBERTa_base | 86.32 | 79.37 | 82.84 | 82.07 | 82.59 |
| mmBERT_base | 84.61 | 77.97 | 81.29 | <u>83.55</u> | 82.04 |

Fairseq Checkpoint

Get the fairseq checkpoint here.
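
For use outside transformers, the checkpoint can also be loaded through fairseq's RoBERTa hub interface. A sketch under the assumption that the archive unpacks into a directory containing the checkpoint, the dictionary, and the BPE files (paths and file names below are placeholders):

```python
# Sketch: load the fairseq checkpoint directly (paths and file names are assumptions).
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained(
    "/path/to/hallelubert_base",   # directory with the checkpoint, dict.txt, and BPE files
    checkpoint_file="model.pt",    # assumed checkpoint file name
)
roberta.eval()

tokens = roberta.encode("שלום עולם")
features = roberta.extract_features(tokens)
print(features.shape)  # (1, sequence_length, hidden_size)
```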

Citation

If you use HalleluBERT in your research, please cite the corresponding paper:

@misc{scheibleschmitt2025hallelubertlettokenmeaning,
      title={HalleluBERT: Let every token that has meaning bear its weight}, 
      author={Raphael Scheible-Schmitt},
      year={2025},
      eprint={2510.21372},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.21372}, 
}

📜 License

MIT License
