title: Hadith Search
emoji: 📜
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
license: mit

📜 Hadith Search

Semantic search across thousands of Prophetic traditions: find the Hadith closest to your question by meaning, not just keywords.

HuggingFace Space · License: MIT · Python 3.10 · FastAPI


What Is This?

A hybrid AI-powered search engine over a comprehensive corpus of Islamic Hadith (prophetic traditions). It combines neural semantic embeddings with classical BM25 and anchor-based retrieval to surface the most relevant traditions, even when your query is worded differently from the Hadith itself.

Each result includes the full Hadith text, its chain of narration (Isnad), topic classification, and a direct source link.


Demo

🔗 Live on HuggingFace Spaces →


How It Works

User Query (Arabic)
      │
      ▼
┌─────────────────────────────────────────────┐
│           Arabic Preprocessing              │
│  Remove tashkeel · Normalize letters        │
│  Unicode variant unification                │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────┐
│         Hybrid Search (3 signals)           │
│                                             │
│  ① Anchor     40%  - hadith entity match    │
│  ② Semantic   35%  - neural meaning match   │
│  ③ BM25       25%  - keyword precision      │
│                                             │
│  Model: paraphrase-multilingual-MiniLM-L12  │
└─────────────────┬───────────────────────────┘
                  │
                  ▼
         Top-K ranked Hadiths
   (text · isnad · topic · source URL)

The anchor signal carries the largest weight (40%) because Hadith contain strong named-entity anchors (narrators, topics, keywords) that are highly discriminative, which makes entity-aware matching the dominant signal.
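
As a rough illustration of how three signals might be fused with the 40/35/25 weights, here is a minimal sketch. It assumes each signal produces one score per Hadith and that scores are min-max normalized before weighting; the names `min_max` and `fuse_scores`, the normalization step, and the toy numbers are assumptions for illustration and are not taken from `retrieval.py`.

```python
from typing import Dict, List

# Weights quoted in the diagram; everything else in this sketch is hypothetical.
WEIGHTS: Dict[str, float] = {"anchor": 0.40, "semantic": 0.35, "bm25": 0.25}


def min_max(scores: List[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(signals: Dict[str, List[float]], top_k: int = 5) -> List[int]:
    """Return indices of the top_k documents by weighted sum of normalized signals."""
    n = len(next(iter(signals.values())))
    fused = [0.0] * n
    for name, weight in WEIGHTS.items():
        for i, s in enumerate(min_max(signals[name])):
            fused[i] += weight * s
    return sorted(range(n), key=lambda i: fused[i], reverse=True)[:top_k]


# Toy scores for three documents (made-up numbers, not real output).
example = {
    "anchor":   [0.9, 0.1, 0.5],
    "semantic": [0.2, 0.8, 0.6],
    "bm25":     [3.0, 1.0, 7.0],
}
ranking = fuse_scores(example, top_k=2)
```

Normalizing first keeps the unbounded BM25 scores from swamping the other two signals; a different scheme (for example rank-based fusion) would work with the same weights.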


Features

  • Anchor-weighted hybrid: prioritizes entity matching (40%) over pure semantics
  • Full Hadith metadata: text, Isnad chain, topic classification, source URL
  • Arabic-native: built for Arabic queries with proper diacritic handling
  • RTL Arabic UI: responsive glassmorphism design
  • Fast cold start: model baked into Docker image at build time
  • Cached embeddings: TTL-based in-memory cache for repeated queries
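
The diacritic handling listed under Arabic-native, together with the preprocessing stage in the pipeline (remove tashkeel, normalize letters, unify Unicode variants), could look roughly like the sketch below. The function name `normalize_arabic` and the specific letter-folding table are assumptions; the rules actually used in `utils.py` may differ.

```python
import re
import unicodedata

# Arabic diacritics (tashkeel) U+064B..U+0652, superscript alef U+0670,
# plus tatweel U+0640 (an elongation mark that is commonly stripped as well).
_TASHKEEL = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical letter-folding table; the project's actual rules may differ.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics, fold letter variants, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # unify presentation-form variants
    text = _TASHKEEL.sub("", text)
    text = text.translate(_LETTER_MAP)
    return " ".join(text.split())
```

Applying the same function to both the corpus and the query is what lets a query typed without diacritics match fully vocalized Hadith text.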

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | FastAPI + Uvicorn |
| Embeddings | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
| Vector Search | FAISS (CPU) |
| Keyword Search | BM25 (rank-bm25) |
| Frontend | Vanilla HTML/CSS/JS, RTL Arabic |
| Deployment | Docker on HuggingFace Spaces |

Project Structure

├── app.py              # FastAPI entrypoint, /api/search endpoint
├── hadith_mcp.py       # Search orchestrator, RAG initialization
├── retrieval.py        # Hybrid search: BM25 + semantic + anchor
├── hf_model.py         # Thread-safe SentenceTransformer + TTL cache
├── utils.py            # Arabic text utilities (tashkeel, normalization)
├── index.html          # Frontend UI
├── assets/
│   ├── script.js       # Fetch + render result cards
│   └── style.css       # Glassmorphism RTL design
├── data/
│   ├── hadith.csv               # Hadith corpus (text, isnad, title, topic, url)
│   ├── hadith_embeddings.npy    # Pre-computed embeddings
│   ├── bm25.pkl                 # BM25 index
│   ├── faiss_anchor.index       # FAISS anchor index
│   ├── anchor_dict.pkl          # anchor → hadith row indices
│   └── unique_anchor_texts.pkl  # Ordered anchor list
└── Dockerfile
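
The tree describes `hf_model.py` as a thread-safe model wrapper with a TTL cache. A minimal sketch of that caching idea follows, assuming query embeddings are cached by query string and expire after a fixed number of seconds. The class `TTLCache`, its method names, and the injectable clock are hypothetical and not taken from the project's code.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny thread-safe cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[float, List[float]]] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], List[float]]) -> List[float]:
        now = self._clock()
        with self._lock:
            hit = self._store.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]
        # Computed outside the lock so a slow embedding call does not block other threads.
        value = compute(key)
        with self._lock:
            self._store[key] = (now, value)
        return value
```

In a real service the `compute` callback would wrap the embedding model's encode call. Two threads that miss on the same key at the same moment will both compute the value, which is harmless here because the result is deterministic.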

API

POST /api/search

// Request
{ "query": "إنما الأعمال بالنيات", "top_k": 5 }

// Response
{
  "results": [
    {
      "rank": 1,
      "title": "حديث النية",
      "text": "عَنْ عُمَرَ بْنِ الْخَطَّابِ قَالَ سَمِعْتُ رَسُولَ اللَّهِ...",
      "topic": "النية والإخلاص",
      "source_url": "https://..."
    }
  ]
}

top_k accepts values from 1 to 10.
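
A client call against this endpoint might be sketched as follows, using only the standard library. The base URL is taken from the Local Setup command and is an assumption for any deployed instance; the helper names `build_search_request` and `search` are hypothetical, and the client-side range check simply mirrors the documented 1 to 10 limit.

```python
import json
import urllib.request

# Assumed base URL (matches the local uvicorn command); adjust for a deployed Space.
BASE_URL = "http://localhost:7860"


def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /api/search request."""
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    body = json.dumps({"query": query, "top_k": top_k}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/search",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def search(query: str, top_k: int = 5) -> list:
    """Send the request and return the "results" list from the JSON response."""
    with urllib.request.urlopen(build_search_request(query, top_k), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["results"]
```

Calling `search(...)` requires the server to be running; each returned item is expected to carry the fields shown in the response example (`rank`, `title`, `text`, `topic`, `source_url`).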


Local Setup

pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
# open http://localhost:7860

Built by

يحيى النوساني · HuggingFace


Part of a series of Islamic knowledge retrieval engines. See also: Tafsir Search · Quran Semantic Retrieval