---
title: Rabbinic Embedding Benchmark
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
  - Sefaria/Rabbinic-Hebrew-English-Pairs
  - Sefaria/Rabbinic-Embedding-Leaderboard
---
# Rabbinic Hebrew/Aramaic Embedding Benchmark
Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.
## How It Works
Given a Hebrew/Aramaic text, can the model find its correct English translation among a pool of candidates? Models that perform well on this task are likely to produce high-quality embeddings for Rabbinic literature.
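One way to picture the retrieval step is to rank every candidate English embedding by its cosine similarity to the Hebrew/Aramaic query embedding. The sketch below is an illustration under stated assumptions, not the Space's implementation: it assumes embeddings are already computed as plain lists of floats and that cosine similarity is the scoring function, and `cosine_similarity`, `rank_candidates`, and `rank_of_correct` are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_candidates(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def rank_of_correct(query: Sequence[float], candidates: Sequence[Sequence[float]], correct_index: int) -> int:
    """1-based rank at which the correct translation appears."""
    return rank_candidates(query, candidates).index(correct_index) + 1


# Toy 3-dimensional "embeddings" (assumed values): one Hebrew query, three English candidates.
hebrew_query = [0.9, 0.1, 0.0]
english_pool = [
    [0.0, 1.0, 0.0],  # unrelated text
    [0.8, 0.2, 0.1],  # the true translation
    [0.5, 0.5, 0.5],  # partially similar text
]
print(rank_candidates(hebrew_query, english_pool))      # [1, 2, 0]
print(rank_of_correct(hebrew_query, english_pool, 1))   # 1
```

Real embedding models return vectors with hundreds or thousands of dimensions, but the ranking logic is the same: the rank of the true translation is what the metrics below are computed from.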
## Metrics
| Metric | Description |
|---|---|
| MRR | Mean Reciprocal Rank: the average, over all queries, of 1/rank of the correct translation |
| Recall@k | Percentage of queries whose correct translation appears in the top k results |
| Bitext Accuracy | Accuracy at distinguishing true Hebrew/English pairs from random (mismatched) pairs |
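Given the 1-based rank of the correct translation for each query, MRR and Recall@k follow directly from their definitions in the table. The sketch below is a minimal illustration; the function names are hypothetical, and the bitext formulation (a true pair counts as correct when its similarity score beats that of a randomly drawn mismatched pair) is an assumption, since the exact classification rule the Space uses is not documented here.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR: average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Recall@k as a percentage: share of queries whose correct answer ranks in the top k."""
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def bitext_accuracy(true_scores: Sequence[float], random_scores: Sequence[float]) -> float:
    """ASSUMED formulation: fraction of items where the true pair's similarity
    score is higher than the score of a random (mismatched) pair."""
    correct = sum(1 for t, r in zip(true_scores, random_scores) if t > r)
    return correct / len(true_scores)


# Hypothetical ranks of the correct translation for five queries.
ranks = [1, 3, 2, 1, 10]
print(round(mean_reciprocal_rank(ranks), 4))  # 0.5867
print(recall_at_k(ranks, 1))                  # 40.0
print(recall_at_k(ranks, 5))                  # 80.0

# Hypothetical similarity scores for three true pairs and three random pairs.
print(round(bitext_accuracy([0.82, 0.75, 0.40], [0.30, 0.80, 0.10]), 4))  # 0.6667
```

MRR rewards placing the correct translation at the very top, while Recall@k only asks whether it lands anywhere within the first k results, so the two can diverge for models that are "close but not first".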
## Corpus
The benchmark uses the Sefaria/Rabbinic-Hebrew-English-Pairs dataset, which includes diverse texts with English translations:
- Talmud: Bavli & Yerushalmi
- Mishnah: Selected tractates
- Midrash: Midrash Rabbah
- Commentary: Rashi, Ramban, Radak, Rabbeinu Behaye
- Philosophy: Guide for the Perplexed, Sefer HaIkkarim
- Hasidic/Kabbalistic: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
- Mussar: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
- Halacha: Sefer HaChinukh, Mishneh Torah
All texts are sourced from Sefaria.
## Leaderboard
Results are stored persistently in the Sefaria/Rabbinic-Embedding-Leaderboard dataset.
## Configuration (Space Secrets)
The following environment variables can be set in the Space settings:
### Required for Leaderboard Persistence
| Secret | Description |
|---|---|
| `HF_TOKEN` | Hugging Face token with write access to Sefaria/Rabbinic-Embedding-Leaderboard. Without it, evaluations still run, but results are not saved to the leaderboard. |
### Optional for API-based Models
| Secret | Description |
|---|---|
| `OPENAI_API_KEY` | For OpenAI embedding models |
| `VOYAGE_API_KEY` | For Voyage AI embedding models |
| `GEMINI_API_KEY` | For Google Gemini embedding models |
Users can also enter API keys directly in the interface (they are not stored).
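The interplay between secrets and keys typed into the interface can be sketched as a small lookup. This is a hypothetical illustration, not the Space's code: the precedence rule (a key entered in the interface wins over the Space secret) and the names `resolve_api_key`, `can_persist_results`, and `API_KEY_ENV_VARS` are assumptions; only the environment variable names come from the secrets tables.

```python
import os
from typing import Optional

# Environment variable names from the secrets tables; the provider labels are hypothetical.
API_KEY_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "voyage": "VOYAGE_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def resolve_api_key(provider: str, ui_key: Optional[str] = None) -> Optional[str]:
    """ASSUMED precedence: a key typed into the interface wins; otherwise use the Space secret."""
    if ui_key and ui_key.strip():
        return ui_key.strip()  # used for this request only, never written anywhere
    env_name = API_KEY_ENV_VARS.get(provider)
    if env_name is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    return os.environ.get(env_name) or None  # None when the secret is unset or empty


def can_persist_results() -> bool:
    """Leaderboard writes need HF_TOKEN; without it, evaluations run but are not saved."""
    return bool(os.environ.get("HF_TOKEN"))
```

For example, with `VOYAGE_API_KEY` set as a Space secret, `resolve_api_key("voyage")` returns the secret's value, while `resolve_api_key("voyage", "typed-key")` uses the key entered in the interface instead.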
## Local Development
```bash
# Clone and install dependencies
git clone https://huggingface.co/spaces/Sefaria/Rabbinic-Embedding-Benchmark
cd Rabbinic-Embedding-Benchmark
pip install -r requirements.txt

# Run locally (leaderboard will be read-only without HF_TOKEN)
python app.py

# Or with write access to the leaderboard
export HF_TOKEN=your_token_here
python app.py
```