Merge branch 'main' of https://github.com/EliezerIsrael/RabHebBench
Changed files:
- README.md: +2 -13
- space_README.md: +0 -48 (deleted)
README.md CHANGED

@@ -22,13 +22,7 @@ Given a Hebrew/Aramaic text, the benchmark tests whether the embedding model can
 ## Corpus
 
 The benchmark includes diverse texts from Sefaria with English translations:
-
-- **Talmud**: Bavli and Yerushalmi (Aramaic + Hebrew)
-- **Mishnah**: All tractates (Rabbinic Hebrew)
-- **Midrash**: Midrash Rabbah (Hebrew/Aramaic)
-- **Tanakh Commentary**: Rashi and Ramban on Tanakh (Hebrew)
-- **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah (Hebrew)
-- **Halacha**: Sefer HaHinuch, Intro to Shev Shmateta (Hebrew)
+Representative Segment pairs from Talmud Bavli, Yerushalmi, Mishnah, Midrash, Tanakh Commentary, Halacha, Hassidic texts, Works of Philosophy, and Kabbalah.
 
 ## Usage

@@ -38,12 +32,7 @@ The benchmark includes diverse texts from Sefaria with English translations:
 
 ## Models
 
-- `intfloat/multilingual-e5-large`
-- `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
-- `BAAI/bge-m3`
-
-You can also evaluate any sentence-transformer compatible model from Hugging Face Hub.
+Support for OpenAI, Google, and Voyage embedding APIs, and any sentence-transformer compatible model from Hugging Face Hub.
 
 ## Local Development
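The cross-lingual retrieval task both READMEs describe — embed a Hebrew/Aramaic segment, then rank candidate English translations by similarity to it — can be sketched as follows. This is a toy illustration with made-up 2-d vectors and a function name of my own choosing, not the repository's actual code:

```python
from math import sqrt

def rank_of_correct(query, candidates, correct_idx):
    """Rank (1-based) of the correct translation among candidates,
    scored by cosine similarity to the query embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))
    sims = [cos(query, c) for c in candidates]
    # Candidate indices sorted by descending similarity to the query.
    order = sorted(range(len(sims)), key=lambda i: -sims[i])
    return order.index(correct_idx) + 1

# Toy "embeddings": candidate 1 points the same way as the query,
# so it should come back at rank 1.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
print(rank_of_correct(query, candidates, correct_idx=1))  # 1
```

In the real benchmark the vectors would come from one of the embedding models above; the ranking and scoring step is the same regardless of which model produced them.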
space_README.md DELETED

@@ -1,48 +0,0 @@
----
-title: Rabbinic Embedding Benchmark
-emoji: 📚
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-sdk_version: 4.44.0
-app_file: app.py
-pinned: false
-license: mit
----
-
-# Rabbinic Hebrew/Aramaic Embedding Benchmark
-
-Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.
-
-## How It Works
-
-Given a Hebrew/Aramaic text, can the model find its correct English translation from a pool of candidates? Models that excel at this task produce high-quality embeddings for Rabbinic literature.
-
-## Metrics
-
-| Metric | Description |
-|--------|-------------|
-| **MRR** | Mean Reciprocal Rank (average of 1/rank of correct answer) |
-| **Recall@k** | % of queries where correct translation is in top k results |
-| **Bitext Accuracy** | True pair vs random pair classification |
-
-## Corpus
-
-The benchmark includes diverse texts with English translations:
-
-- **Talmud**: Bavli & Yerushalmi
-- **Mishnah**: Selected tractates
-- **Midrash**: Midrash Rabbah
-- **Commentary**: Rashi, Ramban, Radak, Rabbeinu Behaye
-- **Philosophy**: Guide for the Perplexed, Sefer HaIkkarim
-- **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
-- **Mussar**: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
-- **Halacha**: Sefer HaChinukh, Mishneh Torah
-
-All texts sourced from [Sefaria](https://www.sefaria.org).
-
-## API Keys
-
-For API-based models (OpenAI, Voyage AI, Gemini), you can either:
-- Enter your API key in the interface (not stored)
-- Set environment variables in Space settings: `OPENAI_API_KEY`, `VOYAGE_API_KEY`, `GEMINI_API_KEY`
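The metrics defined in the deleted space_README.md can be computed directly from the per-query rank of the correct translation. A minimal sketch (function names and the example ranks are my own, not taken from the repository):

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct translation is in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical 1-based ranks of the correct English translation
# for four Hebrew/Aramaic queries.
ranks = [1, 2, 5, 1]
print(mrr(ranks))             # (1 + 0.5 + 0.2 + 1) / 4 = 0.675
print(recall_at_k(ranks, 3))  # 3 of 4 ranks are <= 3 -> 0.75
```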