Lev Israel committed on
Commit 5990acd · 2 Parent(s): a9dad42dfe0a7c

Merge branch 'main' of https://github.com/EliezerIsrael/RabHebBench

Files changed (2):
  1. README.md +2 -13
  2. space_README.md +0 -48
README.md CHANGED
@@ -22,13 +22,7 @@ Given a Hebrew/Aramaic text, the benchmark tests whether the embedding model can
  ## Corpus
 
  The benchmark includes diverse texts from Sefaria with English translations:
- 
- - **Talmud**: Bavli and Yerushalmi (Aramaic + Hebrew)
- - **Mishnah**: All tractates (Rabbinic Hebrew)
- - **Midrash**: Midrash Rabbah (Hebrew/Aramaic)
- - **Tanakh Commentary**: Rashi and Ramban on Tanakh (Hebrew)
- - **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah (Hebrew)
- - **Halacha**: Sefer HaHinuch, Intro to Shev Shmateta (Hebrew)
+ Representative Segment pairs from Talmud Bavli, Yerushalmi, Mishnah, Midrash, Tanakh Commentary, Halacha, Hassidic texts, Works of Philosophy, and Kabbalah.
 
  ## Usage
 
@@ -38,12 +32,7 @@ The benchmark includes diverse texts from Sefaria with English translations:
 
  ## Models
 
- ### Curated Models
- - `intfloat/multilingual-e5-large`
- - `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
- - `BAAI/bge-m3`
-
- You can also evaluate any sentence-transformer compatible model from Hugging Face Hub.
+ Support for OpenAI, Google, and Voyage embedding APIs, and any sentence-transformer compatible model from Hugging Face Hub.
 
  ## Local Development
 
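The retrieval task this README describes (given an embedded Hebrew/Aramaic segment, find its correct English translation among candidates) can be sketched with toy vectors. This is an illustrative sketch, not the benchmark's actual code: the `cosine` helper and the example vectors are made up, standing in for real embedding-model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d "embedding" of a Hebrew/Aramaic query segment (hypothetical values).
query = [0.9, 0.1, 0.0]

# Toy embeddings of candidate English translations (hypothetical values).
candidates = {
    "correct translation": [0.8, 0.2, 0.1],
    "distractor A": [0.0, 1.0, 0.0],
    "distractor B": [0.1, 0.0, 1.0],
}

# Rank candidates by similarity to the query; the top hit should be the
# true translation if the embedding model handles the language pair well.
ranked = sorted(candidates, key=lambda c: cosine(query, candidates[c]), reverse=True)
print(ranked[0])  # "correct translation"
```

A real run would replace the toy vectors with output from one of the supported backends (an embedding API or a sentence-transformers model).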
space_README.md DELETED
@@ -1,48 +0,0 @@
- ---
- title: Rabbinic Embedding Benchmark
- emoji: 📚
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: 4.44.0
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- # Rabbinic Hebrew/Aramaic Embedding Benchmark
-
- Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.
-
- ## How It Works
-
- Given a Hebrew/Aramaic text, can the model find its correct English translation from a pool of candidates? Models that excel at this task produce high-quality embeddings for Rabbinic literature.
-
- ## Metrics
-
- | Metric | Description |
- |--------|-------------|
- | **MRR** | Mean Reciprocal Rank (average of 1/rank of correct answer) |
- | **Recall@k** | % of queries where correct translation is in top k results |
- | **Bitext Accuracy** | True pair vs random pair classification |
-
- ## Corpus
-
- The benchmark includes diverse texts with English translations:
-
- - **Talmud**: Bavli & Yerushalmi
- - **Mishnah**: Selected tractates
- - **Midrash**: Midrash Rabbah
- - **Commentary**: Rashi, Ramban, Radak, Rabbeinu Behaye
- - **Philosophy**: Guide for the Perplexed, Sefer HaIkkarim
- - **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
- - **Mussar**: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
- - **Halacha**: Sefer HaChinukh, Mishneh Torah
-
- All texts sourced from [Sefaria](https://www.sefaria.org).
-
- ## API Keys
-
- For API-based models (OpenAI, Voyage AI, Gemini), you can either:
- - Enter your API key in the interface (not stored)
- - Set environment variables in Space settings: `OPENAI_API_KEY`, `VOYAGE_API_KEY`, `GEMINI_API_KEY`
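The MRR and Recall@k definitions from the deleted space_README.md metrics table follow directly from their descriptions. The sketch below is illustrative only (the function names and the sample rank list are made up, not the benchmark's code); it assumes each query's result is summarized as the rank of its correct translation.

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct translation is in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the correct English translation for 4 queries.
ranks = [1, 3, 2, 10]
print(mrr(ranks))             # (1 + 1/3 + 1/2 + 1/10) / 4
print(recall_at_k(ranks, 3))  # 3 of 4 queries rank in the top 3 -> 0.75
```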