language:
- he
- arc
- en
license: cc-by-4.0
task_categories:
- sentence-similarity
- text-retrieval
tags:
- rabbinic
- hebrew
- aramaic
- talmud
- cross-lingual
- bitext
- sefaria
size_categories:
- 1K<n<10K
Rabbinic Hebrew/Aramaic - English Parallel Corpus
A benchmark dataset for evaluating embedding models on Rabbinic Hebrew and Aramaic texts, with parallel English translations sourced from Sefaria.
Dataset Description
This dataset contains parallel text pairs spanning diverse Rabbinic literature across multiple centuries and genres. It is designed for evaluating cross-lingual embedding models on their ability to align Hebrew/Aramaic source texts with English translations.
Languages
- Source: Rabbinic Hebrew, Jewish Babylonian Aramaic, Jewish Palestinian Aramaic
- Target: English
Dataset Structure
Each example contains:
- ref: Sefaria reference string (e.g., "Berakhot.2a:1")
- he: Hebrew/Aramaic source text
- en: English translation
- category: Text category
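As a minimal sketch of how one example could be represented and checked in Python, assuming each row arrives as a plain dict with the four fields listed above. The `ParallelPair` class and `parse_pair` helper are hypothetical names introduced for illustration, not part of the dataset or of any library, and the text values are placeholders rather than real corpus content.

```python
from dataclasses import dataclass

# The four fields each example is described as containing.
REQUIRED_FIELDS = ("ref", "he", "en", "category")


@dataclass(frozen=True)
class ParallelPair:
    """One source/translation pair (hypothetical container, for illustration)."""
    ref: str       # Sefaria reference string, e.g. "Berakhot.2a:1"
    he: str        # Hebrew/Aramaic source text
    en: str        # English translation
    category: str  # Text category, e.g. "Talmud"


def parse_pair(row: dict) -> ParallelPair:
    """Build a ParallelPair from a dict, rejecting missing or empty fields."""
    missing = [name for name in REQUIRED_FIELDS if not row.get(name)]
    if missing:
        raise ValueError(f"row is missing fields: {', '.join(missing)}")
    return ParallelPair(**{name: str(row[name]) for name in REQUIRED_FIELDS})


# Placeholder values only; the real rows hold the actual texts.
example = parse_pair({
    "ref": "Berakhot.2a:1",
    "he": "<Hebrew/Aramaic source text>",
    "en": "<English translation>",
    "category": "Talmud",
})
```

Rejecting empty fields up front keeps incomplete pairs out of a retrieval pool, where a blank translation would otherwise act as a meaningless candidate.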
Categories
| Category | Count | Description |
|---|---|---|
| Mishnah | 789 | Tannaitic legal compilation (~200 CE) |
| Tanakh Commentary | 674 | Rashi, Ramban, Radak, Rabbeinu Behaye on Torah |
| Jerusalem Talmud | 520 | Palestinian Talmud (~400 CE) |
| Talmud | 480 | Babylonian Talmud (~500 CE) |
| Midrash Rabbah | 393 | Midrashic compilations |
| Hasidic/Kabbalistic | 304 | Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah |
| Philosophy | 240 | Guide for the Perplexed, Sefer HaIkkarim |
| Halacha | 160 | Sefer HaChinukh, Mishneh Torah |
| Mussar/Ethics | 108 | Chafetz Chaim, Kav HaYashar, Iggeret HaRamban |
| Targum | 40 | Aramaic Targum to Song of Songs |
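The category sizes are uneven, which matters if scores are averaged over the whole pool rather than per category. The sketch below copies the counts from the table above and computes the total and each category's share; the variable names are illustrative only.

```python
# Counts copied from the category table above.
CATEGORY_COUNTS = {
    "Mishnah": 789,
    "Tanakh Commentary": 674,
    "Jerusalem Talmud": 520,
    "Talmud": 480,
    "Midrash Rabbah": 393,
    "Hasidic/Kabbalistic": 304,
    "Philosophy": 240,
    "Halacha": 160,
    "Mussar/Ethics": 108,
    "Targum": 40,
}

total_pairs = sum(CATEGORY_COUNTS.values())  # 3708

# Fraction of the corpus contributed by each category.
category_share = {
    name: count / total_pairs for name, count in CATEGORY_COUNTS.items()
}

for name, share in category_share.items():
    print(f"{name:<20} {share:6.1%}")
```

The total of 3,708 pairs is consistent with the 1K<n<10K size category. Because the largest category is roughly twenty times the size of the smallest, reporting per-category scores alongside a pooled score may give a fairer picture.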
Intended Use
Evaluating embedding models for cross-lingual retrieval:
- Given a Hebrew/Aramaic text, can the model find its English translation from a pool of candidates?
- Models that excel at this task likely capture the semantics of Rabbinic literature well.
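The retrieval task described above can be scored with recall@k: for each source text, rank all English candidates by similarity and check whether the true translation appears in the top k. The sketch below is one possible scoring routine, not an official evaluation script for this dataset. It assumes embeddings have already been computed by some model and are aligned by index (row i of the source list is the translation pair of row i of the target list), and it uses cosine similarity as an assumed metric. The function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(source_vecs: Sequence[Vector],
                target_vecs: Sequence[Vector],
                k: int = 1) -> float:
    """Fraction of sources whose paired target ranks in the top k candidates."""
    if len(source_vecs) != len(target_vecs):
        raise ValueError("source and target lists must be aligned by index")
    if not source_vecs:
        return 0.0
    hits = 0
    for i, src in enumerate(source_vecs):
        scores = [cosine(src, tgt) for tgt in target_vecs]
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)


# Toy 2-dimensional vectors standing in for real model embeddings.
he_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
en_vecs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]

r1 = recall_at_k(he_vecs, en_vecs, k=1)
r3 = recall_at_k(he_vecs, en_vecs, k=3)
```

This pure-Python version compares every source against every candidate, which is fine for a few thousand pairs but slow beyond that; a matrix-based implementation would be the usual choice for larger pools.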
Source
All texts and translations are from Sefaria, a free library of Jewish texts.
Translations
Translations come from various sources, including:
- William Davidson Talmud (Steinsaltz)
- Sefaria Community translations
- Historical translations (e.g., Friedlander's Guide for the Perplexed)
Citation
If you use this dataset, please cite Sefaria:
@misc{sefaria,
title = {Sefaria: A Living Library of Jewish Texts},
url = {https://www.sefaria.org},
year = {2026}
}
License
The dataset is released under CC-BY 4.0, following Sefaria's licensing of its open texts.