
metadata
title: Rabbinic Embedding Benchmark
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
  - Sefaria/Rabbinic-Hebrew-English-Pairs
  - Sefaria/Rabbinic-Embedding-Leaderboard

Rabbinic Hebrew/Aramaic Embedding Benchmark

Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.

How It Works

Given a Hebrew/Aramaic text, can the model find its correct English translation from a pool of candidates? Models that excel at this task produce high-quality embeddings for Rabbinic literature.
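The retrieval step described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: it assumes candidates are ranked by cosine similarity (the README does not name the similarity function), and `cosine` and `rank_candidates` are hypothetical helper names working on toy 2-dimensional vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy example: one "Hebrew" query embedding and three "English" candidate embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
ranking = rank_candidates(query, candidates)
print(ranking)  # candidate 1 points in nearly the same direction as the query
```

If candidate 1 is the true translation, it is ranked first here, so the query counts as a hit at rank 1.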

Metrics

Metric           Description
MRR              Mean Reciprocal Rank (average of 1/rank of correct answer)
Recall@k         % of queries where correct translation is in top k results
Bitext Accuracy  True pair vs random pair classification
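
As a rough illustration of the two ranking metrics, the sketch below computes MRR and Recall@k from a list of 1-based ranks, where each rank is the position of the correct translation for one query. The function names and the toy ranks are assumptions made for this example; the benchmark's own implementation may differ.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct translation appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy ranks for four queries: correct answer found at positions 1, 2, 4, and 1.
ranks = [1, 2, 4, 1]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4 + 1) / 4
recall_1 = recall_at_k(ranks, 1)    # 2 of 4 queries hit at rank 1
recall_3 = recall_at_k(ranks, 3)    # 3 of 4 queries hit within the top 3
print(mrr, recall_1, recall_3)
```

Multiply the Recall@k fraction by 100 to express it as a percentage, as in the table.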

Corpus

The benchmark uses the Sefaria/Rabbinic-Hebrew-English-Pairs dataset, which includes diverse texts with English translations:

  • Talmud: Bavli & Yerushalmi
  • Mishnah: Selected tractates
  • Midrash: Midrash Rabbah
  • Commentary: Rashi, Ramban, Radak, Rabbeinu Behaye
  • Philosophy: Guide for the Perplexed, Sefer HaIkkarim
  • Hasidic/Kabbalistic: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
  • Mussar: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
  • Halacha: Sefer HaChinukh, Mishneh Torah

All texts sourced from Sefaria.

Leaderboard

Results are stored persistently in the Sefaria/Rabbinic-Embedding-Leaderboard dataset.

Configuration (Space Secrets)

The following environment variables can be set in Space settings:

Required for Leaderboard Persistence

Secret    Description
HF_TOKEN  HuggingFace token with write access to Sefaria/Rabbinic-Embedding-Leaderboard. Without this, evaluations will run but results won't be saved to the leaderboard.

Optional for API-based Models

Secret          Description
OPENAI_API_KEY  For OpenAI embedding models
VOYAGE_API_KEY  For Voyage AI embedding models
GEMINI_API_KEY  For Google Gemini embedding models

Users can also enter API keys directly in the interface (they are not stored).
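
One way an app could inspect these settings at startup is sketched below. Only the environment variable names come from this README; `API_KEY_VARS` and `describe_config` are hypothetical names, and the actual `app.py` may read its configuration differently.

```python
import os

# Environment variable names as listed above; the provider labels are illustrative.
API_KEY_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Voyage AI": "VOYAGE_API_KEY",
    "Google Gemini": "GEMINI_API_KEY",
}

def describe_config(env=None):
    """Summarize which secrets are set, without exposing their values."""
    env = os.environ if env is None else env
    return {
        # Without HF_TOKEN, evaluations run but results are not saved.
        "leaderboard_writable": bool(env.get("HF_TOKEN")),
        "providers_with_keys": sorted(
            name for name, var in API_KEY_VARS.items() if env.get(var)
        ),
    }

if __name__ == "__main__":
    print(describe_config())
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.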

Local Development

# Clone and install dependencies
git clone https://huggingface.co/spaces/Sefaria/Rabbinic-Embedding-Benchmark
cd Rabbinic-Embedding-Benchmark
pip install -r requirements.txt

# Run locally (leaderboard will be read-only without HF_TOKEN)
python app.py

# Or with write access to leaderboard
export HF_TOKEN=your_token_here
python app.py

Related