Lev Israel committed on
Commit 5990acd · 2 Parent(s): a9dad42dfe0a7c

Merge branch 'main' of https://github.com/EliezerIsrael/RabHebBench

Files changed (2):
  1. README.md +2 -13
  2. space_README.md +0 -48
README.md CHANGED
@@ -22,13 +22,7 @@ Given a Hebrew/Aramaic text, the benchmark tests whether the embedding model can
  ## Corpus
 
  The benchmark includes diverse texts from Sefaria with English translations:
- 
- - **Talmud**: Bavli and Yerushalmi (Aramaic + Hebrew)
- - **Mishnah**: All tractates (Rabbinic Hebrew)
- - **Midrash**: Midrash Rabbah (Hebrew/Aramaic)
- - **Tanakh Commentary**: Rashi and Ramban on Tanakh (Hebrew)
- - **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah (Hebrew)
- - **Halacha**: Sefer HaHinuch, Intro to Shev Shmateta (Hebrew)
+ Representative Segment pairs from Talmud Bavli, Yerushalmi, Mishnah, Midrash, Tanakh Commentary, Halacha, Hassidic texts, Works of Philosophy, and Kabbalah.
 
  ## Usage
 
@@ -38,12 +32,7 @@ The benchmark includes diverse texts from Sefaria with English translations:
 
  ## Models
 
- ### Curated Models
- - `intfloat/multilingual-e5-large`
- - `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
- - `BAAI/bge-m3`
-
- You can also evaluate any sentence-transformer compatible model from Hugging Face Hub.
+ Support for OpenAI, Google, and Voyage embedding APIs, and any sentence-transformer compatible model from Hugging Face Hub.
 
  ## Local Development
 
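The retrieval task this README describes (given an embedded Hebrew/Aramaic segment, find its correct English translation among candidates) can be sketched with toy vectors. This is an illustrative sketch, not the benchmark's actual code: the `cosine` helper and the example vectors are made up, standing in for real embedding-model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d "embedding" of a Hebrew/Aramaic query segment (hypothetical values).
query = [0.9, 0.1, 0.0]

# Toy embeddings of candidate English translations (hypothetical values).
candidates = {
    "correct translation": [0.8, 0.2, 0.1],
    "distractor A": [0.0, 1.0, 0.0],
    "distractor B": [0.1, 0.0, 1.0],
}

# Rank candidates by similarity to the query; the top hit should be the
# true translation if the embedding model handles the language pair well.
ranked = sorted(candidates, key=lambda c: cosine(query, candidates[c]), reverse=True)
print(ranked[0])  # "correct translation"
```

A real run would replace the toy vectors with output from one of the supported backends (an embedding API or a sentence-transformers model).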
space_README.md DELETED
@@ -1,48 +0,0 @@
- ---
- title: Rabbinic Embedding Benchmark
- emoji: 📚
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: 4.44.0
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- # Rabbinic Hebrew/Aramaic Embedding Benchmark
-
- Evaluate embedding models on cross-lingual retrieval between Hebrew/Aramaic source texts and their English translations from Sefaria.
-
- ## How It Works
-
- Given a Hebrew/Aramaic text, can the model find its correct English translation from a pool of candidates? Models that excel at this task produce high-quality embeddings for Rabbinic literature.
-
- ## Metrics
-
- | Metric | Description |
- |--------|-------------|
- | **MRR** | Mean Reciprocal Rank (average of 1/rank of correct answer) |
- | **Recall@k** | % of queries where correct translation is in top k results |
- | **Bitext Accuracy** | True pair vs random pair classification |
-
- ## Corpus
-
- The benchmark includes diverse texts with English translations:
-
- - **Talmud**: Bavli & Yerushalmi
- - **Mishnah**: Selected tractates
- - **Midrash**: Midrash Rabbah
- - **Commentary**: Rashi, Ramban, Radak, Rabbeinu Behaye
- - **Philosophy**: Guide for the Perplexed, Sefer HaIkkarim
- - **Hasidic/Kabbalistic**: Likutei Moharan, Tomer Devorah, Kalach Pitchei Chokhmah
- - **Mussar**: Chafetz Chaim, Kav HaYashar, Iggeret HaRamban
- - **Halacha**: Sefer HaChinukh, Mishneh Torah
-
- All texts sourced from [Sefaria](https://www.sefaria.org).
-
- ## API Keys
-
- For API-based models (OpenAI, Voyage AI, Gemini), you can either:
- - Enter your API key in the interface (not stored)
- - Set environment variables in Space settings: `OPENAI_API_KEY`, `VOYAGE_API_KEY`, `GEMINI_API_KEY`
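The MRR and Recall@k definitions from the deleted space_README.md metrics table follow directly from their descriptions. The sketch below is illustrative only (the function names and the sample rank list are made up, not the benchmark's code); it assumes each query's result is summarized as the rank of its correct translation.

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct translation is in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the correct English translation for 4 queries.
ranks = [1, 3, 2, 10]
print(mrr(ranks))             # (1 + 1/3 + 1/2 + 1/10) / 4
print(recall_at_k(ranks, 3))  # 3 of 4 queries rank in the top 3 -> 0.75
```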