Commit e36606a
1 Parent(s): 808922d
Add impressive README with architecture, API docs, and cross-project links
README.md
CHANGED
|
@@ -1,11 +1,162 @@
|
|
| 1 |
---
|
| 2 |
title: Hadith Search
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: indigo
|
| 5 |
-
colorTo:
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
---
|
| 10 |
|
| 11 |
-
|
| 1 |
---
|
| 2 |
title: Hadith Search
|
| 3 |
+
emoji: ๐
|
| 4 |
colorFrom: indigo
|
| 5 |
+
colorTo: green
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
---
|
| 10 |
|
| 11 |
+
<div align="center">
|
| 12 |
+
|
| 13 |
+
# Hadith Search
|
| 14 |
+
|
| 15 |
+
**Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.**
|
| 16 |
+
|
| 17 |
+
[Hugging Face Space](https://huggingface.co/spaces/NightPrince/Hadith_Search)
|
| 18 |
+
[License: MIT](LICENSE)
|
| 19 |
+
[Python](https://python.org)
|
| 20 |
+
[FastAPI](https://fastapi.tiangolo.com)
|
| 21 |
+
|
| 22 |
+
</div>
|
| 23 |
+
|
| 24 |
+
---
|
| 25 |
+
|
| 26 |
+
## What Is This?
|
| 27 |
+
|
| 28 |
+
A hybrid search engine over a corpus of Islamic Hadith (Prophetic traditions). It combines neural semantic embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query is worded differently from the Hadith itself.
|
| 29 |
+
|
| 30 |
+
Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## Demo
|
| 35 |
+
|
| 36 |
+
**[Live on HuggingFace Spaces](https://huggingface.co/spaces/NightPrince/Hadith_Search)**
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## How It Works
|
| 41 |
+
|
| 42 |
+
```
         User Query (Arabic)
                  │
                  ▼
┌─────────────────────────────────────────────┐
│            Arabic Preprocessing             │
│     Remove tashkeel · Normalize letters     │
│         Unicode variant unification         │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│          Hybrid Search (3 signals)          │
│                                             │
│  ① Anchor    40%  - hadith entity match     │
│  ② Semantic  35%  - neural meaning match    │
│  ③ BM25      25%  - keyword precision       │
│                                             │
│  Model: paraphrase-multilingual-MiniLM-L12  │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
         Top-K ranked Hadiths
 (text · isnad · topic · source URL)
```
|
| 67 |
+
|
| 68 |
+
The **anchor signal** carries the highest weight (40%) because Hadith have strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, which makes entity-aware matching the dominant signal.
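
To make the 40/35/25 weighting concrete, here is a minimal sketch of one way three score lists could be fused. Only the weights come from this README; the min-max normalization, the `fuse` and `min_max` helpers, and the dictionary layout (row index to raw score) are assumptions for illustration and may not match what `retrieval.py` actually does.

```python
from typing import Dict, List, Tuple

# Weights taken from the diagram above; everything else is illustrative.
WEIGHTS = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale raw scores to [0, 1] so the three signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie on this signal; give each full credit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(signals: Dict[str, Dict[int, float]], top_k: int = 5) -> List[Tuple[int, float]]:
    """Weighted sum of normalized scores; a document missing from a signal gets 0 for it."""
    combined: Dict[int, float] = {}
    for name, weight in WEIGHTS.items():
        for doc, s in min_max(signals.get(name, {})).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical raw scores keyed by hadith row index.
signals = {
    "anchor": {0: 0.9, 1: 0.1},
    "semantic": {0: 0.2, 1: 0.8, 2: 0.5},
    "bm25": {1: 3.0, 2: 1.0},
}
ranked = fuse(signals)
```

In this toy input, row 1 wins because it leads on two of the three signals, even though row 0 has the strongest anchor match.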
|
| 69 |
+
|
| 70 |
+
---
|
| 71 |
+
|
| 72 |
+
## Features
|
| 73 |
+
|
| 74 |
+
- **Anchor-weighted hybrid**: prioritizes entity matching (40%) over pure semantics
- **Full Hadith metadata**: text, Isnad chain, topic classification, source URL
- **Arabic-native**: built for Arabic queries with proper diacritic handling
- **RTL Arabic UI**: responsive glassmorphism design
- **Fast cold start**: model baked into Docker image at build time
- **Cached embeddings**: TTL-based in-memory cache for repeated queries
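
The diacritic handling and letter normalization mentioned above can be sketched with the standard library alone. This is a hedged illustration rather than the contents of `utils.py`: the function name `normalize_arabic` and the particular letter-folding table are assumptions, and the project may fold a different set of letters.

```python
import re
import unicodedata

# Tashkeel (harakat) U+064B-U+0652, superscript alef U+0670, and tatweel U+0640.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical letter-folding table; the project's own rules may differ.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Unify Unicode variants, strip tashkeel, fold letter variants, collapse spaces."""
    text = unicodedata.normalize("NFKC", text)  # presentation forms -> base letters
    text = _DIACRITICS.sub("", text)
    text = text.translate(_LETTER_MAP)
    return " ".join(text.split())
```

Applying the same function to both the corpus and the incoming query keeps a fully vocalized Hadith text and an unvocalized user query comparable for BM25 and anchor matching.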
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
## Tech Stack
|
| 84 |
+
|
| 85 |
+
| Layer | Technology |
|---|---|
| Backend | FastAPI + Uvicorn |
| Embeddings | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
| Vector Search | FAISS (CPU) |
| Keyword Search | BM25 (`rank-bm25`) |
| Frontend | Vanilla HTML/CSS/JS, RTL Arabic |
| Deployment | Docker on HuggingFace Spaces |
|
| 93 |
+
|
| 94 |
+
---
|
| 95 |
+
|
| 96 |
+
## Project Structure
|
| 97 |
+
|
| 98 |
+
```
├── app.py                       # FastAPI entrypoint, /api/search endpoint
├── hadith_mcp.py                # Search orchestrator, RAG initialization
├── retrieval.py                 # Hybrid search: BM25 + semantic + anchor
├── hf_model.py                  # Thread-safe SentenceTransformer + TTL cache
├── utils.py                     # Arabic text utilities (tashkeel, normalization)
├── index.html                   # Frontend UI
├── assets/
│   ├── script.js                # Fetch + render result cards
│   └── style.css                # Glassmorphism RTL design
├── data/
│   ├── hadith.csv               # Hadith corpus (text, isnad, title, topic, url)
│   ├── hadith_embeddings.npy    # Pre-computed embeddings
│   ├── bm25.pkl                 # BM25 index
│   ├── faiss_anchor.index       # FAISS anchor index
│   ├── anchor_dict.pkl          # anchor → hadith row indices
│   └── unique_anchor_texts.pkl  # Ordered anchor list
└── Dockerfile
```
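
The tree describes `hf_model.py` as a thread-safe model wrapper with a TTL cache. As a rough sketch of that idea, assuming nothing about the real file, the class below caches one value per query string and recomputes it once the entry is older than the TTL. `TTLCache`, `get_or_compute`, and the injectable `clock` are hypothetical names, and the `compute` callback stands in for a call such as a SentenceTransformer `encode`.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[float, List[float]]] = {}

    def get_or_compute(self, key: str,
                       compute: Callable[[str], List[float]]) -> List[float]:
        now = self._clock()
        with self._lock:
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]
        # Compute outside the lock so a slow encode does not block other readers.
        value = compute(key)
        with self._lock:
            self._store[key] = (now, value)
        return value
```

Computing outside the lock means two threads may occasionally encode the same new query at once; that trades a little duplicate work for never holding the lock during model inference.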
|
| 117 |
+
|
| 118 |
+
---
|
| 119 |
+
|
| 120 |
+
## API
|
| 121 |
+
|
| 122 |
+
### `POST /api/search`
|
| 123 |
+
|
| 124 |
+
```json
// Request
{ "query": "إنما الأعمال بالنيات", "top_k": 5 }

// Response
{
  "results": [
    {
      "rank": 1,
      "title": "حديث النية",
      "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
      "topic": "النية والإخلاص",
      "source_url": "https://..."
    }
  ]
}
```
|
| 141 |
+
|
| 142 |
+
`top_k` accepts values from 1 to 10.
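
For calling the endpoint from Python, a small standard-library client might look like the sketch below. The request and response shapes follow the example above, and the base URL is the one from Local Setup; the helper names `build_payload` and `search`, the client-side range check, and the 30-second timeout are assumptions. How the server itself responds to an out-of-range `top_k` is not documented here.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:7860"  # local server from Local Setup; adjust as needed


def build_payload(query: str, top_k: int = 5) -> bytes:
    """Encode the request body, rejecting top_k values outside the documented 1-10 range."""
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    body = {"query": query, "top_k": top_k}
    # ensure_ascii=False keeps the Arabic query readable in logs; the bytes are UTF-8.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def search(query: str, top_k: int = 5, base_url: str = BASE_URL) -> List[Dict[str, Any]]:
    """POST the query to /api/search and return the list under "results"."""
    request = urllib.request.Request(
        f"{base_url}/api/search",
        data=build_payload(query, top_k),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]
```

With the server running locally, `search("إنما الأعمال بالنيات", top_k=3)` should return up to three result dictionaries with the fields shown in the response example.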
|
| 143 |
+
|
| 144 |
+
---
|
| 145 |
+
|
| 146 |
+
## Local Setup
|
| 147 |
+
|
| 148 |
+
```bash
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
# open http://localhost:7860
```
|
| 153 |
+
|
| 154 |
+
---
|
| 155 |
+
|
| 156 |
+
## Built by
|
| 157 |
+
|
| 158 |
+
**يحيى النوساني** - [HuggingFace](https://huggingface.co/NightPrince)
|
| 159 |
+
|
| 160 |
+
---
|
| 161 |
+
|
| 162 |
+
*Part of a series of Islamic knowledge retrieval engines. See also: [Tafsir Search](https://github.com/NightPrinceY/Tafsir_Search) · [Quran Semantic Retrieval](https://github.com/NightPrinceY/Quran-Semantic-Retrieval)*
|