---
title: Hadith Search
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
license: mit
---
<div align="center">

# Hadith Search

**Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.**

[HuggingFace Space](https://huggingface.co/spaces/NightPrince/Hadith_Search) ·
[License: MIT](LICENSE) ·
[Python](https://python.org) ·
[FastAPI](https://fastapi.tiangolo.com)

</div>

---
## What Is This?

A hybrid AI-powered search engine over a comprehensive corpus of Islamic Hadith (prophetic traditions). It combines neural semantic embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query is worded differently from the Hadith itself.

Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.

---
## Demo

**[Live on HuggingFace Spaces →](https://huggingface.co/spaces/NightPrince/Hadith_Search)**

---
## How It Works

```
         User Query (Arabic)
                  │
                  ▼
┌─────────────────────────────────────────────┐
│            Arabic Preprocessing             │
│   Remove tashkeel · Normalize letters       │
│   Unicode variant unification               │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│          Hybrid Search (3 signals)          │
│                                             │
│  ① Anchor   40% → hadith entity match       │
│  ② Semantic 35% → neural meaning match      │
│  ③ BM25     25% → keyword precision         │
│                                             │
│  Model: paraphrase-multilingual-MiniLM-L12  │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
        Top-K ranked Hadiths
 (text · isnad · topic · source URL)
```
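
The preprocessing stage in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the project's `utils.py`: the function name `normalize_arabic`, the exact diacritic range, and the letter-folding map are assumptions chosen to match the three steps named in the diagram (remove tashkeel, normalize letters, unify Unicode variants).

```python
import re
import unicodedata

# Tashkeel (U+064B..U+0652), superscript alef (U+0670) and tatweel (U+0640).
# Assumption: the real project may strip a different set of marks.
_TASHKEEL = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical letter-folding rules; adjust to match the indexed corpus.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})


def normalize_arabic(text: str) -> str:
    """Return a diacritic-free, letter-normalized form of an Arabic string."""
    # NFKC folds presentation forms and ligatures into their base letters.
    text = unicodedata.normalize("NFKC", text)
    # Drop short vowels, shadda, sukun, tanween and tatweel.
    text = _TASHKEEL.sub("", text)
    # Collapse common spelling variants of the same letter.
    text = text.translate(_LETTER_MAP)
    # Collapse runs of whitespace.
    return " ".join(text.split())
```

The same function would need to be applied to both the corpus at indexing time and the query at search time; otherwise keyword and anchor matches would silently miss.
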

The **anchor signal** carries the largest weight (40%) because Hadith have strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, which makes entity-aware matching the dominant signal.
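
One plausible way to combine the three signals with the stated 40/35/25 weights is a weighted sum over min-max-normalized scores. The sketch below is an assumption about how such a fusion could look; `fuse`, `min_max`, and the normalization step are hypothetical and may differ from what `retrieval.py` actually does.

```python
from typing import Dict, List, Tuple

# Weights as stated in the diagram: anchor 40%, semantic 35%, BM25 25%.
WEIGHTS = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale raw scores to [0, 1] so different signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie; treat them as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(signals: Dict[str, Dict[int, float]], top_k: int = 5) -> List[Tuple[int, float]]:
    """Combine per-signal scores (keyed by hadith row index) into one ranking."""
    combined: Dict[int, float] = {}
    for name, weight in WEIGHTS.items():
        # A document missing from a signal simply contributes 0 for it.
        for doc, score in min_max(signals.get(name, {})).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

With this scheme, a Hadith that wins on the anchor signal alone scores 0.40, while one that wins on both semantic and BM25 scores 0.60, so the anchor weight dominates any single other signal but not the two combined.
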

---

## Features

- **Anchor-weighted hybrid**: prioritizes entity matching (40%) over pure semantics
- **Full Hadith metadata**: text, Isnad chain, topic classification, source URL
- **Arabic-native**: built for Arabic queries with proper diacritic handling
- **RTL Arabic UI**: responsive glassmorphism design
- **Fast cold start**: model baked into Docker image at build time
- **Cached embeddings**: TTL-based in-memory cache for repeated queries
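
The cached-embeddings feature can be pictured as a small thread-safe map with expiry times. The following is a sketch under assumptions: the class name `TTLCache`, the 300-second default, and the `get_or_compute` method are illustrative and not taken from `hf_model.py`.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[float, List[float]]] = {}

    def get_or_compute(self, key: str,
                       compute: Callable[[str], List[float]]) -> List[float]:
        now = self._clock()
        with self._lock:
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]  # fresh entry: skip the expensive call
        # Compute outside the lock so a slow model call does not block readers.
        # Two threads may occasionally compute the same key; the last write wins.
        value = compute(key)
        with self._lock:
            self._store[key] = (now, value)
        return value
```

In a service like this one, `compute` would be the call that encodes a normalized query with the sentence-transformer model, so a repeated query within the TTL window avoids a second forward pass.
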

---

## Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI + Uvicorn |
| Embeddings | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| Vector Search | FAISS (CPU) |
| Keyword Search | BM25 (`rank-bm25`) |
| Frontend | Vanilla HTML/CSS/JS (RTL Arabic) |
| Deployment | Docker on HuggingFace Spaces |

---
## Project Structure

```
├── app.py                      # FastAPI entrypoint, /api/search endpoint
├── hadith_mcp.py               # Search orchestrator, RAG initialization
├── retrieval.py                # Hybrid search: BM25 + semantic + anchor
├── hf_model.py                 # Thread-safe SentenceTransformer + TTL cache
├── utils.py                    # Arabic text utilities (tashkeel, normalization)
├── index.html                  # Frontend UI
├── assets/
│   ├── script.js               # Fetch + render result cards
│   └── style.css               # Glassmorphism RTL design
├── data/
│   ├── hadith.csv              # Hadith corpus (text, isnad, title, topic, url)
│   ├── hadith_embeddings.npy   # Pre-computed embeddings
│   ├── bm25.pkl                # BM25 index
│   ├── faiss_anchor.index      # FAISS anchor index
│   ├── anchor_dict.pkl         # anchor → hadith row indices
│   └── unique_anchor_texts.pkl # Ordered anchor list
└── Dockerfile
```

---
## API

### `POST /api/search`

```json
// Request (the query reads "Actions are judged by intentions")
{ "query": "إنما الأعمال بالنيات", "top_k": 5 }
// Response
{
  "results": [
    {
      "rank": 1,
      "title": "حديث النية",
      "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
      "topic": "النية والإخلاص",
      "source_url": "https://..."
    }
  ]
}
```

`top_k` accepts values from 1 to 10.
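
A client call against this endpoint might look like the following sketch, using only the standard library. The helper names `build_search_request` and `search` are hypothetical, and the base URL assumes a local instance on port 7860; the request and response fields follow the JSON example above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_search_request(base_url: str, query: str, top_k: int = 5) -> urllib.request.Request:
    """Build the POST request for /api/search without sending it."""
    if not 1 <= top_k <= 10:
        # Mirror the documented limit client-side to fail fast.
        raise ValueError("top_k must be between 1 and 10")
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, query: str, top_k: int = 5,
           timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Send the query and return the list under the "results" key."""
    request = build_search_request(base_url, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["results"]


# Example (requires a running server):
# for hit in search("http://localhost:7860", "إنما الأعمال بالنيات", top_k=3):
#     print(hit["rank"], hit["title"], hit["source_url"])
```

Sending the body as UTF-8 with `ensure_ascii=False` keeps the Arabic query readable in logs; the server receives the same text either way.
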

---

## Local Setup

```bash
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
# open http://localhost:7860
```

---
## Built by

**NightPrince** ([HuggingFace](https://huggingface.co/NightPrince))

---

*Part of a series of Islamic knowledge retrieval engines. See also: [Tafsir Search](https://github.com/NightPrinceY/Tafsir_Search) · [Quran Semantic Retrieval](https://github.com/NightPrinceY/Quran-Semantic-Retrieval)*