NightPrince Claude Sonnet 4.6 committed on
Commit
e36606a
·
1 Parent(s): 808922d

Add impressive README with architecture, API docs, and cross-project links

  1. README.md +154 -3
README.md CHANGED
@@ -1,11 +1,162 @@
  ---
  title: Hadith Search
- emoji: ๐Ÿข
  colorFrom: indigo
- colorTo: gray
  sdk: docker
  pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Hadith Search
+ emoji: 📜
  colorFrom: indigo
+ colorTo: green
  sdk: docker
  pinned: false
  license: mit
  ---

+ <div align="center">
+
+ # 📜 Hadith Search
+
+ **Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.**
+
+ [![HuggingFace Space](https://img.shields.io/badge/🤗%20HuggingFace-Live%20Demo-yellow?style=for-the-badge)](https://huggingface.co/spaces/NightPrince/Hadith_Search)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)](LICENSE)
+ [![Python 3.10](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)](https://python.org)
+ [![FastAPI](https://img.shields.io/badge/FastAPI-teal?style=for-the-badge&logo=fastapi)](https://fastapi.tiangolo.com)
+
+ </div>
+
+ ---
+
+ ## What Is This?
+
+ A hybrid search engine over a corpus of Islamic Hadith (prophetic traditions). It combines neural sentence embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query is worded differently from the Hadith itself.
+
+ Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.
+
+ ---
+
+ ## Demo
+
+ 🔗 **[Live on HuggingFace Spaces →](https://huggingface.co/spaces/NightPrince/Hadith_Search)**
+
+ ---
+
+ ## How It Works
+
+ ```
+          User Query (Arabic)
+                   │
+                   ▼
+ ┌─────────────────────────────────────────────┐
+ │            Arabic Preprocessing             │
+ │  Remove tashkeel · Normalize letters        │
+ │  Unicode variant unification                │
+ └─────────────────┬───────────────────────────┘
+                   │
+                   ▼
+ ┌─────────────────────────────────────────────┐
+ │          Hybrid Search (3 signals)          │
+ │                                             │
+ │  ① Anchor   40% - hadith entity match       │
+ │  ② Semantic 35% - neural meaning match      │
+ │  ③ BM25     25% - keyword precision         │
+ │                                             │
+ │  Model: paraphrase-multilingual-MiniLM-L12  │
+ └─────────────────┬───────────────────────────┘
+                   │
+                   ▼
+         Top-K ranked Hadiths
+  (text · isnad · topic · source URL)
+ ```
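
The preprocessing stage in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's `utils.py`: the function name `normalize_arabic` and the exact letter-folding table are assumptions, since the README only names the three steps (tashkeel removal, letter normalization, Unicode variant unification).

```python
import re
import unicodedata

# Arabic diacritics (tashkeel) U+064B-U+0652, superscript alef U+0670, tatweel U+0640.
_TASHKEEL = re.compile(r"[\u064B-\u0652\u0670\u0640]")

# Assumed letter folding; the project's real table may differ.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Unify Unicode variants, strip tashkeel, and fold letter variants."""
    text = unicodedata.normalize("NFKC", text)  # presentation forms -> base letters
    text = _TASHKEEL.sub("", text)              # drop diacritics and tatweel
    text = text.translate(_LETTER_MAP)          # fold alef/yeh/teh-marbuta variants
    return " ".join(text.split())               # collapse runs of whitespace
```

Applying the same function to both the corpus and the query keeps a vocalized Hadith text and an unvocalized user query comparable for the keyword signal.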
+
+ The **anchor signal** carries the largest weight (40%) because Hadith contain strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, which makes entity-aware matching the dominant signal.
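
One way to combine the three signals with these weights is a normalized weighted sum. The sketch below is an assumption about how such fusion could work, not the code in `retrieval.py`: the names `WEIGHTS`, `min_max`, and `fuse` are hypothetical, and min-max scaling is only one of several ways to make anchor, semantic, and BM25 scores comparable.

```python
from typing import Dict, List, Tuple

# Weights taken from the diagram: anchor 40%, semantic 35%, BM25 25%.
WEIGHTS = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def min_max(scores: Dict[int, float]) -> Dict[int, float]:
    """Rescale one signal's scores to [0, 1] so the signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(signals: Dict[str, Dict[int, float]], top_k: int = 5) -> List[Tuple[int, float]]:
    """Weighted sum of normalized scores; a document missing from a signal adds 0."""
    combined: Dict[int, float] = {}
    for name, weight in WEIGHTS.items():
        for doc, score in min_max(signals.get(name, {})).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Each inner dictionary maps a Hadith row index to that signal's raw score, so a document found by only one retriever still receives a partial combined score.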
+
+ ---
+
+ ## Features
+
+ - **Anchor-weighted hybrid**: weights entity matching (40%) above semantic similarity (35%) and BM25 (25%)
+ - **Full Hadith metadata**: text, Isnad chain, topic classification, source URL
+ - **Arabic-native**: built for Arabic queries, with diacritics (tashkeel) removed and letters normalized before search
+ - **RTL Arabic UI**: responsive glassmorphism design
+ - **Fast cold start**: model baked into the Docker image at build time
+ - **Cached embeddings**: TTL-based in-memory cache for repeated queries
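
The cached-embeddings feature can be pictured with a small TTL cache. This is a sketch under assumptions: the README says only that the cache is TTL-based, in-memory, and thread-safe, so the class name `TTLCache`, the `get_or_compute` method, and the injectable `clock` parameter are hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Small thread-safe in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._items: Dict[Hashable, Tuple[float, object]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        """Return the cached value for `key`, recomputing it if missing or expired."""
        with self._lock:
            hit = self._items.get(key)
            if hit is not None and self._clock() - hit[0] < self._ttl:
                return hit[1]
        value = compute()  # run outside the lock so a slow encode does not block readers
        with self._lock:
            self._items[key] = (self._clock(), value)
        return value
```

In a service like this one, the key would typically be the normalized query string and `compute` a call that encodes it with the sentence-embedding model.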
+
+ ---
+
+ ## Tech Stack
+
+ | Layer | Technology |
+ |---|---|
+ | Backend | FastAPI + Uvicorn |
+ | Embeddings | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` |
+ | Vector Search | FAISS (CPU) |
+ | Keyword Search | BM25 (`rank-bm25`) |
+ | Frontend | Vanilla HTML/CSS/JS (RTL Arabic) |
+ | Deployment | Docker on HuggingFace Spaces |
+
+ ---
+
+ ## Project Structure
+
+ ```
+ ├── app.py                      # FastAPI entrypoint, /api/search endpoint
+ ├── hadith_mcp.py               # Search orchestrator, RAG initialization
+ ├── retrieval.py                # Hybrid search: BM25 + semantic + anchor
+ ├── hf_model.py                 # Thread-safe SentenceTransformer + TTL cache
+ ├── utils.py                    # Arabic text utilities (tashkeel, normalization)
+ ├── index.html                  # Frontend UI
+ ├── assets/
+ │   ├── script.js               # Fetch + render result cards
+ │   └── style.css               # Glassmorphism RTL design
+ ├── data/
+ │   ├── hadith.csv              # Hadith corpus (text, isnad, title, topic, url)
+ │   ├── hadith_embeddings.npy   # Pre-computed embeddings
+ │   ├── bm25.pkl                # BM25 index
+ │   ├── faiss_anchor.index      # FAISS anchor index
+ │   ├── anchor_dict.pkl         # anchor → hadith row indices
+ │   └── unique_anchor_texts.pkl # Ordered anchor list
+ └── Dockerfile
+ ```
+
+ ---
+
+ ## API
+
+ ### `POST /api/search`
+
+ ```json
+ // Request
+ { "query": "إنما الأعمال بالنيات", "top_k": 5 }
+
+ // Response
+ {
+   "results": [
+     {
+       "rank": 1,
+       "title": "حديث النية",
+       "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
+       "topic": "النية والإخلاص",
+       "source_url": "https://..."
+     }
+   ]
+ }
+ ```
+
+ `top_k` accepts values from 1 to 10.
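
A client call against this endpoint might look like the sketch below, which uses only the Python standard library. The base URL assumes a local run on port 7860 as in the setup section, and the helper names `build_search_request` and `search` are illustrative rather than part of the project.

```python
import json
import urllib.request

# Assumed base URL for a local run; replace it with the deployed Space's URL if needed.
BASE_URL = "http://localhost:7860"


def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST /api/search request, enforcing the documented top_k range."""
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, top_k: int = 5) -> list:
    """Send the request and return the `results` list from the JSON response."""
    request = build_search_request(query, top_k)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]
```

With the server running, `search("إنما الأعمال بالنيات", top_k=3)` would return up to three result objects shaped like the response example above.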
+
+ ---
+
+ ## Local Setup
+
+ ```bash
+ pip install -r requirements.txt
+ uvicorn app:app --host 0.0.0.0 --port 7860 --reload
+ # open http://localhost:7860
+ ```
+
+ ---
+
+ ## Built by
+
+ **يحيى النوساني** · [HuggingFace](https://huggingface.co/NightPrince)
+
+ ---
+
+ *Part of a series of Islamic knowledge retrieval engines. See also: [Tafsir Search](https://github.com/NightPrinceY/Tafsir_Search) · [Quran Semantic Retrieval](https://github.com/NightPrinceY/Quran-Semantic-Retrieval)*