---
title: Hadith Search
emoji: 📜
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
license: mit
---

# 📜 Hadith Search

**Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.**

[![HuggingFace Space](https://img.shields.io/badge/🤗%20HuggingFace-Live%20Demo-yellow?style=for-the-badge)](https://huggingface.co/spaces/NightPrince/Hadith_Search)
[![License: MIT](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)](LICENSE)
[![Python 3.10](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](https://python.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-teal?style=for-the-badge&logo=fastapi)](https://fastapi.tiangolo.com)

---

## What Is This?

A hybrid AI-powered search engine over a comprehensive corpus of Islamic Hadith (prophetic traditions). It combines neural semantic embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query uses different wording than the Hadith itself.

Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.

---

## Demo

🔗 **[Live on HuggingFace Spaces →](https://huggingface.co/spaces/NightPrince/Hadith_Search)**

---

## How It Works

```
         User Query (Arabic)
                  │
                  ▼
┌─────────────────────────────────────────────┐
│          Arabic Preprocessing               │
│  Remove tashkeel · Normalize letters        │
│  Unicode variant unification                │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│          Hybrid Search (3 signals)          │
│                                             │
│  ① Anchor    40%  - hadith entity match     │
│  ② Semantic  35%  - neural meaning match    │
│  ③ BM25      25%  - keyword precision       │
│                                             │
│  Model: paraphrase-multilingual-MiniLM-L12  │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
        Top-K ranked Hadiths
 (text · isnad · topic · source URL)
```

The **anchor signal** is weighted higher here (40%) because Hadith have strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, making entity-aware matching the dominant signal.
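
The two stages in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `utils.py` or `retrieval.py`: the function names `normalize_arabic` and `fuse_scores`, the letter-mapping table, and the min-max rescaling of each signal are all assumptions made for the example. Only the 40/35/25 weights come from the description above.

```python
import re
import unicodedata

# Tashkeel (U+064B..U+0652), superscript alef (U+0670) and tatweel (U+0640).
_TASHKEEL = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Assumed letter normalization table; the project may use a different one.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
})

# Signal weights as stated in the README.
WEIGHTS = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def normalize_arabic(text: str) -> str:
    """Unify Unicode variants, strip diacritics, normalize letter forms."""
    text = unicodedata.normalize("NFKC", text)  # folds presentation forms
    text = _TASHKEEL.sub("", text)
    return text.translate(_LETTER_MAP)


def _min_max(scores):
    """Rescale raw scores to [0, 1] so the signals are comparable (assumed)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(signals, top_k=5):
    """Combine per-document scores from each signal into one weighted ranking.

    `signals` maps a signal name ("anchor", "semantic", "bm25") to a list of
    raw scores, one per candidate document, all in the same document order.
    Returns the top_k (document_index, combined_score) pairs, best first.
    """
    lengths = {len(raw) for raw in signals.values()}
    if len(lengths) != 1:
        raise ValueError("every signal needs one score per document")
    n = lengths.pop()
    combined = [0.0] * n
    for name, raw in signals.items():
        for i, score in enumerate(_min_max(raw)):
            combined[i] += WEIGHTS[name] * score
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    return [(i, combined[i]) for i in ranked[:top_k]]


# Usage with made-up scores for three candidate documents.
query = normalize_arabic("إِنَّمَا الأَعْمَالُ بِالنِّيَّاتِ")
example = {
    "anchor": [1.0, 0.0, 0.5],
    "semantic": [0.2, 0.9, 0.4],
    "bm25": [3.0, 1.0, 2.0],
}
print(query)
print(fuse_scores(example, top_k=2))
```

Rescaling each signal before weighting matters because raw BM25 scores are unbounded while cosine similarities are not; without some normalization step the fixed weights would not reflect the intended 40/35/25 split.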

---

## Features

- **Anchor-weighted hybrid**: prioritizes entity matching (40%) over pure semantics
- **Full Hadith metadata**: text, Isnad chain, topic classification, source URL
- **Arabic-native**: built for Arabic queries with proper diacritic handling
- **RTL Arabic UI**: responsive glassmorphism design
- **Fast cold start**: model baked into Docker image at build time
- **Cached embeddings**: TTL-based in-memory cache for repeated queries

---

## Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI + Uvicorn |
| Embeddings | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| Vector Search | FAISS (CPU) |
| Keyword Search | BM25 (`rank-bm25`) |
| Frontend | Vanilla HTML/CSS/JS, RTL Arabic |
| Deployment | Docker on HuggingFace Spaces |

---

## Project Structure

```
├── app.py                       # FastAPI entrypoint, /api/search endpoint
├── hadith_mcp.py                # Search orchestrator, RAG initialization
├── retrieval.py                 # Hybrid search: BM25 + semantic + anchor
├── hf_model.py                  # Thread-safe SentenceTransformer + TTL cache
├── utils.py                     # Arabic text utilities (tashkeel, normalization)
├── index.html                   # Frontend UI
├── assets/
│   ├── script.js                # Fetch + render result cards
│   └── style.css                # Glassmorphism RTL design
├── data/
│   ├── hadith.csv               # Hadith corpus (text, isnad, title, topic, url)
│   ├── hadith_embeddings.npy    # Pre-computed embeddings
│   ├── bm25.pkl                 # BM25 index
│   ├── faiss_anchor.index       # FAISS anchor index
│   ├── anchor_dict.pkl          # anchor → hadith row indices
│   └── unique_anchor_texts.pkl  # Ordered anchor list
└── Dockerfile
```

---

## API

### `POST /api/search`

```json
// Request
{
  "query": "إنما الأعمال بالنيات",
  "top_k": 5
}

// Response
{
  "results": [
    {
      "rank": 1,
      "title": "حديث النية",
      "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
      "topic": "النية والإخلاص",
      "source_url": "https://..."
    }
  ]
}
```

`top_k` accepts 1–10.

---

## Local Setup

```bash
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
# open http://localhost:7860
```

---

## Built by

**يحيى النوساني** · [HuggingFace](https://huggingface.co/NightPrince)

---

*Part of a series of Islamic knowledge retrieval engines. See also: [Tafsir Search](https://github.com/NightPrinceY/Tafsir_Search) · [Quran Semantic Retrieval](https://github.com/NightPrinceY/Quran-Semantic-Retrieval)*