Lev Israel

metadata
language:
  - he
  - arc
  - en
license: cc-by-4.0
task_categories:
  - sentence-similarity
  - text-retrieval
tags:
  - rabbinic
  - hebrew
  - aramaic
  - talmud
  - cross-lingual
  - bitext
  - sefaria
size_categories:
  - 1K<n<10K

Rabbinic Hebrew/Aramaic - English Parallel Corpus

A benchmark dataset for evaluating embedding models on Rabbinic Hebrew and Aramaic texts, with parallel English translations sourced from Sefaria.

Dataset Description

This dataset contains parallel text pairs spanning diverse Rabbinic literature across multiple centuries and genres. It is designed for evaluating cross-lingual embedding models on their ability to align Hebrew/Aramaic source texts with English translations.

Languages

  • Source: Rabbinic Hebrew, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic
  • Target: English

Dataset Structure

Each example contains:

  • ref: Sefaria reference string (e.g., "Berakhot.2a:1")
  • he: Hebrew/Aramaic source text
  • en: English translation
  • category: Text category
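
As a rough illustration of the record layout above, the following sketch models one example as a small Python dataclass and rejects records with missing or empty fields. The class name `ParallelPair`, the helper `parse_pair`, and the placeholder text values are assumptions made for this example; only the four field names and the sample reference string come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPair:
    """One parallel example; field names follow the dataset description."""
    ref: str       # Sefaria reference string
    he: str        # Hebrew/Aramaic source text
    en: str        # English translation
    category: str  # Text category


REQUIRED_FIELDS = ("ref", "he", "en", "category")


def parse_pair(record: dict) -> ParallelPair:
    """Build a ParallelPair, rejecting missing, non-string, or empty fields."""
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {name}")
    return ParallelPair(**{name: record[name] for name in REQUIRED_FIELDS})


# Placeholder values for illustration only, not actual dataset content.
example = parse_pair({
    "ref": "Berakhot.2a:1",
    "he": "<Hebrew/Aramaic source text>",
    "en": "<English translation>",
    "category": "Talmud",
})
print(example.ref, example.category)
```

A check like this can be run over every row before evaluation, so that an empty translation is caught early instead of silently lowering retrieval scores.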

Categories

| Category | Count | Description |
|---|---|---|
| Mishnah | 789 | Tannaitic legal compilation (~200 CE) |
| Tanakh Commentary | 674 | Rashi, Ramban, Radak, Rabbeinu Behaye on Torah |
| Jerusalem Talmud | 520 | Palestinian Talmud (~400 CE) |
| Talmud | 480 | Babylonian Talmud (~500 CE) |
| Midrash Rabbah | 393 | Midrashic compilations |
| Hasidic/Kabbalistic | 304 | Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah |
| Philosophy | 240 | Guide for the Perplexed, Sefer HaIkkarim |
| Halacha | 160 | Sefer HaChinukh, Mishneh Torah |
| Mussar/Ethics | 108 | Chafetz Chaim, Kav HaYashar, Iggeret HaRamban |
| Targum | 40 | Aramaic Targum to Song of Songs |
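
The category counts can be tallied to see how the corpus is distributed. The short sketch below copies the counts listed above into a dictionary (the variable names are illustrative) and prints each category's share of the total. The counts sum to 3,708 pairs, which is consistent with the `1K<n<10K` size category in the metadata.

```python
# Counts copied from the category listing above.
CATEGORY_COUNTS = {
    "Mishnah": 789,
    "Tanakh Commentary": 674,
    "Jerusalem Talmud": 520,
    "Talmud": 480,
    "Midrash Rabbah": 393,
    "Hasidic/Kabbalistic": 304,
    "Philosophy": 240,
    "Halacha": 160,
    "Mussar/Ethics": 108,
    "Targum": 40,
}

total = sum(CATEGORY_COUNTS.values())
shares = {name: count / total for name, count in CATEGORY_COUNTS.items()}

for name, count in CATEGORY_COUNTS.items():
    print(f"{name:<20} {count:>5} {shares[name]:>6.1%}")
print(f"{'Total':<20} {total:>5}")
```

Because the categories are unevenly sized (Mishnah has roughly twenty times as many pairs as Targum), per-category scores may be more informative than a single pooled number.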

Intended Use

Evaluating embedding models for cross-lingual retrieval:

  • Given a Hebrew/Aramaic text, can the model find its English translation from a pool of candidates?
  • Models that excel at this task likely capture the semantics of Rabbinic literature well.
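
One common way to score the retrieval task described above is recall@k: for each source text, rank all English candidates by cosine similarity and check whether the true translation is among the top k. The sketch below is a minimal illustration and not necessarily the metric used by any particular leaderboard. It uses only the standard library, and the toy 3-dimensional vectors stand in for embeddings that a real model would produce; the function names are assumptions for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(source_vecs, target_vecs, k=1):
    """Fraction of sources whose true translation (same index) ranks in the top k."""
    hits = 0
    for i, src in enumerate(source_vecs):
        scores = [cosine(src, tgt) for tgt in target_vecs]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)


# Toy vectors for illustration only; row i of each list forms one parallel pair.
he_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
en_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.0], [0.0, 0.5, 0.9]]

print("recall@1:", recall_at_k(he_vecs, en_vecs, k=1))
print("recall@2:", recall_at_k(he_vecs, en_vecs, k=2))
```

On these toy vectors the second source is closest to the third candidate, so recall@1 is 2/3 while recall@2 is 1.0. With real embeddings, the candidate pool would be the English side of the whole dataset or of a single category.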

Source

All texts and translations are from Sefaria, a free library of Jewish texts.

Translations

Translations come from several sources, including:

  • William Davidson Talmud (Steinsaltz)
  • Sefaria Community translations
  • Historical translations (e.g., Friedlander's Guide for the Perplexed)

Citation

If you use this dataset, please cite Sefaria:

@misc{sefaria,
  title = {Sefaria: A Living Library of Jewish Texts},
  url = {https://www.sefaria.org},
  year = {2026}
}

License

The dataset is released under CC-BY 4.0, following Sefaria's licensing of its open texts.