---
title: Rabbinic Embedding Benchmark
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
  - Sefaria/Rabbinic-Hebrew-English-Pairs
  - Sefaria/Rabbinic-Embedding-Leaderboard
---

# Rabbinic Hebrew/Aramaic Embedding Benchmark

Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.

## How It Works

Given a Hebrew/Aramaic text, can the model find its correct English translation from a pool of candidates? Models that excel at this task produce high-quality embeddings for Rabbinic literature.
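Concretely, the retrieval step amounts to ranking the English candidates by cosine similarity to each Hebrew/Aramaic query embedding. A minimal NumPy sketch of that ranking (the function name and toy vectors are illustrative; the Space's actual pipeline embeds real text with the model under test):

```python
import numpy as np

def retrieve(query_embs: np.ndarray, candidate_embs: np.ndarray) -> np.ndarray:
    """Rank English candidates for each Hebrew/Aramaic query by cosine similarity.

    Returns one row per query: candidate indices sorted best-first.
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = q @ c.T                    # shape: (n_queries, n_candidates)
    return np.argsort(-sims, axis=1)  # negate so the best match comes first

# Toy 2-D embeddings: query 0 should match candidate 0, query 1 candidate 1.
queries = np.array([[1.0, 0.1], [0.1, 1.0]])
candidates = np.array([[0.9, 0.0], [0.0, 0.9]])
ranking = retrieve(queries, candidates)
print(ranking)  # [[0 1]
                #  [1 0]]
```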

## Metrics

| Metric | Description |
|--------|-------------|
| **MRR** | Mean Reciprocal Rank (average of 1/rank of correct answer) |
| **Recall@k** | Percentage of queries where the correct translation appears in the top k results |
| **Bitext Accuracy** | Accuracy at distinguishing a true Hebrew–English pair from a randomly mismatched one |
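The three metrics can be sketched in a few lines of NumPy. The function names and the toy ranking below are illustrative only, not the app's actual code:

```python
import numpy as np

def mrr(rankings, gold):
    """Mean Reciprocal Rank: average of 1/rank of the correct candidate."""
    ranks = [list(r).index(g) + 1 for r, g in zip(rankings, gold)]
    return float(np.mean([1.0 / r for r in ranks]))

def recall_at_k(rankings, gold, k):
    """Fraction of queries whose correct candidate appears in the top k."""
    return float(np.mean([g in list(r[:k]) for r, g in zip(rankings, gold)]))

def bitext_accuracy(true_sims, random_sims):
    """Fraction of cases where the true pair outscores a random mismatched pair."""
    return float(np.mean(np.array(true_sims) > np.array(random_sims)))

# Two queries ranked over four candidates; the correct answers are 0 and 2.
rankings = [[0, 3, 1, 2],   # correct candidate at rank 1
            [1, 2, 0, 3]]   # correct candidate at rank 2
gold = [0, 2]
print(mrr(rankings, gold))             # (1/1 + 1/2) / 2 = 0.75
print(recall_at_k(rankings, gold, 1))  # 0.5
print(recall_at_k(rankings, gold, 2))  # 1.0
```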

## Corpus

The benchmark uses the [Sefaria/Rabbinic-Hebrew-English-Pairs](https://huggingface.co/datasets/Sefaria/Rabbinic-Hebrew-English-Pairs) dataset, which includes diverse texts with English translations:

- **Talmud**: Bavli & Yerushalmi
- **Mishnah**: Selected tractates
- **Midrash**: Midrash Rabbah
- **Commentary**: Rashi, Ramban, Radak, Rabbeinu Behaye
- **Philosophy**: Guide for the Perplexed, Sefer HaIkkarim
- **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
- **Mussar**: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
- **Halacha**: Sefer HaChinukh, Mishneh Torah

All texts sourced from [Sefaria](https://www.sefaria.org).

## Leaderboard

Results are stored persistently in the [Sefaria/Rabbinic-Embedding-Leaderboard](https://huggingface.co/datasets/Sefaria/Rabbinic-Embedding-Leaderboard) dataset.

## Configuration (Space Secrets)

The following environment variables can be set in Space settings:

### Required for Leaderboard Persistence

| Secret | Description |
|--------|-------------|
| `HF_TOKEN` | HuggingFace token with write access to `Sefaria/Rabbinic-Embedding-Leaderboard`. Without this, evaluations will run but results won't be saved to the leaderboard. |

### Optional for API-based Models

| Secret | Description |
|--------|-------------|
| `OPENAI_API_KEY` | For OpenAI embedding models |
| `VOYAGE_API_KEY` | For Voyage AI embedding models |
| `GEMINI_API_KEY` | For Google Gemini embedding models |

Users can also enter API keys directly in the interface (they are not stored).

## Local Development

```bash
# Clone and install dependencies
git clone https://huggingface.co/spaces/Sefaria/Rabbinic-Embedding-Benchmark
cd Rabbinic-Embedding-Benchmark
pip install -r requirements.txt

# Run locally (leaderboard will be read-only without HF_TOKEN)
python app.py

# Or with write access to leaderboard
export HF_TOKEN=your_token_here
python app.py
```

## Related

- [Benchmark Dataset](https://huggingface.co/datasets/Sefaria/Rabbinic-Hebrew-English-Pairs)
- [Leaderboard Dataset](https://huggingface.co/datasets/Sefaria/Rabbinic-Embedding-Leaderboard)
- [Sefaria](https://www.sefaria.org)