title: Hadith Search
emoji: ๐
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
license: mit
Hadith Search
Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.
What Is This?
A hybrid AI-powered search engine over a comprehensive corpus of Islamic Hadith (prophetic traditions). It combines neural semantic embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query uses different wording than the Hadith itself.
Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.
Demo
Live on HuggingFace Spaces
How It Works
              User Query (Arabic)
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  Arabic Preprocessing                       │
│  Remove tashkeel · Normalize letters        │
│  Unicode variant unification                │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────┐
│  Hybrid Search (3 signals)                  │
│                                             │
│  ① Anchor   40% - hadith entity match       │
│  ② Semantic 35% - neural meaning match      │
│  ③ BM25     25% - keyword precision         │
│                                             │
│  Model: paraphrase-multilingual-MiniLM-L12  │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
             Top-K ranked Hadiths
      (text · isnad · topic · source URL)
The anchor signal carries the largest weight (40%) because Hadith have strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, which makes entity-aware matching the dominant signal.
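As a rough illustration of the weighting described above, the sketch below min-max normalizes each signal's raw scores and combines them with the 40/35/25 weights. Only the weights come from this README; the function names, the normalization step, and the sample scores are assumptions for illustration, and the actual fusion logic in retrieval.py may differ.

```python
from typing import Dict, List

# Weights taken from this README; everything else here is an assumption.
WEIGHTS = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def min_max(scores: List[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(signals: Dict[str, List[float]], top_k: int = 5) -> List[int]:
    """Return the indices of the top_k documents by weighted combined score."""
    normalized = {name: min_max(vals) for name, vals in signals.items()}
    n_docs = len(next(iter(normalized.values())))
    combined = [
        sum(WEIGHTS[name] * normalized[name][i] for name in normalized)
        for i in range(n_docs)
    ]
    return sorted(range(n_docs), key=lambda i: combined[i], reverse=True)[:top_k]


# Hypothetical raw scores for three Hadiths, one list per signal.
example = {
    "anchor": [0.9, 0.1, 0.4],
    "semantic": [0.2, 0.8, 0.5],
    "bm25": [3.0, 7.5, 1.0],
}
ranking = fuse(example, top_k=2)
```

Normalizing before weighting matters because BM25 scores are unbounded while cosine similarities are not; without it, the 25% weight on BM25 could still dominate the sum.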
Features
- Anchor-weighted hybrid: prioritizes entity matching (40%) over pure semantics
- Full Hadith metadata: text, Isnad chain, topic classification, source URL
- Arabic-native: built for Arabic queries with proper diacritic handling
- RTL Arabic UI: responsive glassmorphism design
- Fast cold start: model baked into Docker image at build time
- Cached embeddings: TTL-based in-memory cache for repeated queries
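The TTL-based cache in the last feature can be sketched as follows. This is a minimal, hypothetical version: the class name, the `get_or_compute` method, and the injectable clock are assumptions, not the code in hf_model.py. The `compute` callable stands in for the embedding model's encode step.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny thread-safe cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[Any], Any]) -> Any:
        now = self._clock()
        with self._lock:
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]  # still fresh: reuse the cached value
        # Compute outside the lock so a slow encoder does not block other threads.
        value = compute(key)
        with self._lock:
            self._store[key] = (self._clock(), value)
        return value
```

Two threads that miss on the same key at the same time will both compute the value; that trade-off keeps the lock short and is harmless when the computation is deterministic.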
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI + Uvicorn |
| Embeddings | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
| Vector Search | FAISS (CPU) |
| Keyword Search | BM25 (rank-bm25) |
| Frontend | Vanilla HTML/CSS/JS, RTL Arabic |
| Deployment | Docker on HuggingFace Spaces |
Project Structure
├── app.py                        # FastAPI entrypoint, /api/search endpoint
├── hadith_mcp.py                 # Search orchestrator, RAG initialization
├── retrieval.py                  # Hybrid search: BM25 + semantic + anchor
├── hf_model.py                   # Thread-safe SentenceTransformer + TTL cache
├── utils.py                      # Arabic text utilities (tashkeel, normalization)
├── index.html                    # Frontend UI
├── assets/
│   ├── script.js                 # Fetch + render result cards
│   └── style.css                 # Glassmorphism RTL design
├── data/
│   ├── hadith.csv                # Hadith corpus (text, isnad, title, topic, url)
│   ├── hadith_embeddings.npy     # Pre-computed embeddings
│   ├── bm25.pkl                  # BM25 index
│   ├── faiss_anchor.index        # FAISS anchor index
│   ├── anchor_dict.pkl           # anchor -> hadith row indices
│   └── unique_anchor_texts.pkl   # Ordered anchor list
└── Dockerfile
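The Arabic text utilities listed for utils.py (tashkeel removal, letter normalization, Unicode variant unification) might look roughly like the sketch below. The function name and the exact set of letter mappings are assumptions; the project's own normalization rules may differ.

```python
import re
import unicodedata

# Arabic diacritics (tashkeel) U+064B..U+0652, superscript alef U+0670, tatweel U+0640.
TASHKEEL = re.compile("[\u064B-\u0652\u0670\u0640]")

# An assumed set of common letter normalizations.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, unify letter variants, and collapse whitespace."""
    # NFKC folds presentation-form variants (e.g. ligatures) into base letters.
    text = unicodedata.normalize("NFKC", text)
    text = TASHKEEL.sub("", text)
    text = text.translate(LETTER_MAP)
    return " ".join(text.split())


# The same normalization would be applied to both queries and corpus text.
normalized_query = normalize_arabic("إنما الأعمال بالنيات")
```

Applying one function to both sides of the comparison is what lets a query typed without diacritics match a fully vocalized Hadith text.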
API
POST /api/search
// Request (the query reads "Actions are judged only by intentions")
{ "query": "إنما الأعمال بالنيات", "top_k": 5 }

// Response
{
  "results": [
    {
      "rank": 1,
      "title": "حديث النية",
      "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
      "topic": "النية والإخلاص",
      "source_url": "https://..."
    }
  ]
}
top_k accepts values from 1 to 10.
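A small client for this endpoint could look like the sketch below, using only the Python standard library. The base URL assumes the local server from the Local Setup section; the helper names `build_payload` and `search` are hypothetical, and the client-side range check simply mirrors the documented 1 to 10 limit.

```python
import json
import urllib.request

# Assumed base URL: the local server started in the Local Setup section.
BASE_URL = "http://localhost:7860"


def build_payload(query: str, top_k: int = 5) -> bytes:
    """Encode the request body, rejecting top_k outside the documented range."""
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    body = {"query": query, "top_k": top_k}
    # ensure_ascii=False keeps the Arabic query readable in the encoded JSON.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def search(query: str, top_k: int = 5, base_url: str = BASE_URL) -> list:
    """POST the query to /api/search and return the list under "results"."""
    request = urllib.request.Request(
        f"{base_url}/api/search",
        data=build_payload(query, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["results"]


# Usage (requires the server to be running):
#   for hit in search("إنما الأعمال بالنيات", top_k=3):
#       print(hit["rank"], hit["title"], hit["source_url"])
```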
Local Setup
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
# open http://localhost:7860
Built by
ูุญูู ุงูููุณุงูู โ HuggingFace
Part of a series of Islamic knowledge retrieval engines. See also: Tafsir Search · Quran Semantic Retrieval