---
title: QModel
emoji: 📖
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
license: mit
tags:
- quran
- hadith
- islamic
- rag
- faiss
- nlp
- arabic
language:
- ar
- en
---

# QModel v6 - Islamic RAG System

**Specialized Qur'an & Hadith Knowledge System with Dual LLM Support**

> A production-ready Retrieval-Augmented Generation system specialized exclusively in authenticated Islamic knowledge. No hallucinations and no outside knowledge; only content from verified sources.

---

## Features

### Qur'an Capabilities

- **Verse Lookup**: Find verses by topic or keyword
- **Word Frequency**: Count occurrences with a per-Surah breakdown
- **Bilingual**: Full Arabic + English translation support
- **Tafsir Integration**: AI-powered contextual interpretation

### Hadith Capabilities

- **Authenticity Verification**: Check whether a Hadith appears in authenticated collections
- **Grade Display**: Show Sahih/Hasan/Da'if authenticity levels
- **Topic Search**: Find relevant Hadiths across 9 major collections
- **Collection Navigation**: Filter by Bukhari, Muslim, Abu Dawud, etc.

### Safety Features

- **Confidence Gating**: Low-confidence queries return "not found" instead of guesses
- **Source Attribution**: Every answer cites the exact verse/Hadith reference
- **Verbatim Quotes**: Text is copied directly from the data, never paraphrased
- **Anti-Hallucination**: Hardened prompts with few-shot "not found" examples

### Integration

- **OpenAI-Compatible API**: Use with Open-WebUI, LangChain, or any OpenAI client
- **OpenAI Schema**: Full support for `/v1/chat/completions` and `/v1/models`
- **Streaming Responses**: SSE streaming for long-form answers

### Technical

- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Hybrid Search**: Dense (FAISS) + sparse (BM25) scoring
- **Async API**: FastAPI with async/await throughout
- **Caching**: TTL-based LRU cache for frequent queries
- **Scale**: 6,236 Quranic verses + 41,390 Hadiths indexed
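The TTL-based LRU cache named above combines two eviction rules: entries expire after a fixed time-to-live, and the least recently used entry is dropped once the cache is full. A minimal sketch of the idea (an illustrative stand-in, not the project's `app/cache.py`, which is async; all names are hypothetical):

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Entries expire after `ttl` seconds; past `max_size`,
    the least recently used entry is evicted."""

    def __init__(self, max_size=512, ttl=3600):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = TTLLRUCache(max_size=2, ttl=60)
cache.put("q1", "answer 1")
cache.put("q2", "answer 2")
cache.get("q1")              # touching q1 makes q2 the LRU entry
cache.put("q3", "answer 3")  # evicts q2
```

The defaults mirror `CACHE_SIZE=512` and `CACHE_TTL=3600` from the configuration table below.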

---

## Quick Start

### Prerequisites

- Python 3.10+
- 16 GB RAM minimum (for embeddings + LLM)
- GPU recommended for the HuggingFace backend
- Ollama installed (for local development) OR internet access (for HuggingFace)

### Installation

```bash
# Clone and enter project
git clone https://github.com/Logicsoft/QModel.git && cd QModel
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Configure (choose one backend)
# Option A - Ollama (local development):
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama2
# Make sure Ollama is running: ollama serve

# Option B - HuggingFace (production):
export LLM_BACKEND=hf
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct

# Run
python main.py

# Query
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
```
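The same query can be issued from Python using only the standard library. A sketch (the `/ask` endpoint and its `q` parameter come from the examples in this README; the response schema is not assumed here):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"

def ask_url(question: str, **params) -> str:
    """Build the /ask URL with a properly encoded query string."""
    query = urlencode({"q": question, **params})
    return f"{BASE_URL}/ask?{query}"

def ask(question: str, **params):
    """Send the question to a running QModel instance and parse the JSON reply."""
    with urlopen(ask_url(question, **params)) as resp:
        return json.load(resp)

print(ask_url("What does Islam say about mercy?"))
# ask("prayer", source_type="hadith", grade_filter="sahih")  # requires the server running
```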

API docs: http://localhost:8000/docs

### Data & Index

Pre-built data files are included:

- `metadata.json` - 47,626 documents (6,236 Quran verses + 41,390 Hadiths from 9 canonical collections)
- `QModel.index` - FAISS search index

To rebuild after dataset changes:

```bash
python build_index.py
```

---

## Example Queries

```bash
# Basic question
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"

# Word frequency
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"

# Authentic Hadiths only
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"

# Quran text search
curl "http://localhost:8000/quran/search?q=bismillah"

# Quran topic search
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"

# Quran word frequency
curl "http://localhost:8000/quran/word-frequency?word=mercy"

# Single chapter
curl "http://localhost:8000/quran/chapter/2"

# Exact verse
curl "http://localhost:8000/quran/verse/2:255"

# Hadith text search
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"

# Hadith topic search (Sahih only)
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"

# Verify Hadith authenticity
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"

# Browse a collection
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"

# Streaming (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```
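The streaming endpoint follows the standard OpenAI SSE wire format: each event is a `data: {json}` line carrying a delta, terminated by `data: [DONE]`. A minimal parser for such a stream (a sketch; the sample chunks below are illustrative, not captured QModel output):

```python
import json

def collect_sse_content(lines):
    """Extract the assistant text from OpenAI-style SSE chunk lines."""
    pieces = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        pieces.append(delta.get("content", ""))
    return "".join(pieces)

# Illustrative chunks in the OpenAI delta format:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Charity (zakat) "}}]}',
    'data: {"choices":[{"delta":{"content":"is a pillar of Islam."}}]}',
    "data: [DONE]",
]
print(collect_sse_content(stream))  # Charity (zakat) is a pillar of Islam.
```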

---

## Configuration

All configuration is via environment variables (a `.env` file or exported directly):

### Backend Selection

| Backend | Pros | Cons | When to Use |
|---------|------|------|-------------|
| **Ollama** | Fast setup, no GPU needed, free | Smaller models | Development, testing |
| **HuggingFace** | Larger models, better quality | Requires a GPU or significant RAM | Production |

### Ollama Backend (Development)

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2  # or: mistral, neural-chat, orca-mini
```

Requires `ollama serve` running and the model pulled (`ollama pull llama2`).

### HuggingFace Backend (Production)

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto  # auto | cuda | cpu
HF_MAX_NEW_TOKENS=2048
```

### All Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| **Backend** | | |
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
| **Embedding & Data** | | |
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
| `FAISS_INDEX` | `QModel.index` | Index file path |
| `METADATA_FILE` | `metadata.json` | Dataset file |
| **Retrieval** | | |
| `TOP_K_SEARCH` | `20` | Candidate pool (5-100) |
| `TOP_K_RETURN` | `5` | Results shown to the user (1-20) |
| `RERANK_ALPHA` | `0.6` | Dense vs. sparse weight (0.0-1.0) |
| **Generation** | | |
| `TEMPERATURE` | `0.2` | Creativity (0.0-1.0; use 0.1-0.2 for religious content) |
| `MAX_TOKENS` | `2048` | Max response length |
| **Safety** | | |
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call the LLM (higher = fewer hallucinations) |
| `HADITH_BOOST` | `0.08` | Score boost for Hadiths on Hadith queries |
| **Other** | | |
| `CACHE_SIZE` | `512` | Query response cache entries |
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
| `ALLOWED_ORIGINS` | `*` | CORS origins |
| `MAX_EXAMPLES` | `3` | Few-shot examples in the system prompt |
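Loading these variables in code can look like the following (a sketch, not the project's actual `app/config.py`; the defaults mirror the table above):

```python
import os
from dataclasses import dataclass, field

def _env(name, default, cast=str):
    """Read an environment variable, falling back to the documented default."""
    return cast(os.environ.get(name, default))

@dataclass
class Config:
    """A subset of the settings table, resolved at instantiation time."""
    llm_backend: str = field(default_factory=lambda: _env("LLM_BACKEND", "hf"))
    ollama_host: str = field(default_factory=lambda: _env("OLLAMA_HOST", "http://localhost:11434"))
    top_k_search: int = field(default_factory=lambda: _env("TOP_K_SEARCH", "20", int))
    rerank_alpha: float = field(default_factory=lambda: _env("RERANK_ALPHA", "0.6", float))
    temperature: float = field(default_factory=lambda: _env("TEMPERATURE", "0.2", float))
    confidence_threshold: float = field(default_factory=lambda: _env("CONFIDENCE_THRESHOLD", "0.30", float))
    cache_size: int = field(default_factory=lambda: _env("CACHE_SIZE", "512", int))
    cache_ttl: int = field(default_factory=lambda: _env("CACHE_TTL", "3600", int))

config = Config()
print(config.llm_backend, config.top_k_search)
```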

### Configuration Examples

**Development (Ollama)**

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
ALLOWED_ORIGINS=*
```

**Production (HuggingFace + GPU)**

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=cuda
TOP_K_SEARCH=30
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
```

### Tuning Tips

- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
- **Faster responses**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`

---

## Docker Deployment

### Docker Compose (Recommended)

```bash
cp .env.example .env  # Configure backend (see Configuration section)
docker-compose up
```

### Docker CLI

```bash
docker build -t qmodel .

# With the Ollama backend
docker run -p 8000:8000 \
  --env-file .env \
  --add-host host.docker.internal:host-gateway \
  qmodel

# With the HuggingFace backend
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```

### Docker with Ollama

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```

Requires Ollama running on the host (`ollama serve`).

### Docker with HuggingFace

```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto

# Pass the HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```

### Docker Compose with GPU (Linux)

```yaml
services:
  qmodel:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### Production Tips

- Remove the dev volume mount (`.:/app`) in `docker-compose.yml`
- Set `restart: on-failure:5`
- Use specific `ALLOWED_ORIGINS` instead of `*`

---

## Open-WebUI Integration

QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.

### Setup

```bash
# Start QModel
python main.py

# Start Open-WebUI
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
```

### Connect

1. **Settings** → **Models** → **Manage Models**
2. Click **"Connect to OpenAI-compatible API"**
3. **API Base URL**: `http://localhost:8000/v1`
4. **Model Name**: `QModel`
5. **API Key**: Leave blank
6. **Save & Test** → ✅ Connected

### Docker Compose (QModel + Ollama + Open-WebUI)

```yaml
version: '3.8'
services:
  qmodel:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LLM_BACKEND=ollama
      - OLLAMA_HOST=http://ollama:11434

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"

  web-ui:
    image: ghcr.io/open-webui/open-webui:latest
    ports:
      - "3000:8080"
    depends_on:
      - qmodel
```

### Supported Features

| Feature | Status |
|---------|--------|
| Chat | ✅ Full support |
| Streaming | ✅ `stream: true` |
| Multi-turn context | ✅ Handled by Open-WebUI |
| Temperature | ✅ Configurable |
| Token limits | ✅ `max_tokens` |
| Model listing | ✅ `/v1/models` |
| Source attribution | ✅ `x_metadata.sources` |

---

## Architecture

### Module Structure

```
main.py           - FastAPI app + router registration
app/
  config.py       - Config class (env vars)
  llm.py          - LLM providers (Ollama, HuggingFace)
  cache.py        - TTL-LRU async cache
  arabic_nlp.py   - Arabic normalization, stemming, language detection
  search.py       - Hybrid FAISS+BM25, text search, query rewriting
  analysis.py     - Intent detection, analytics, counting
  prompts.py      - Prompt engineering (persona, anti-hallucination)
  models.py       - Pydantic schemas
  state.py        - AppState, lifespan, RAG pipeline
  routers/
    quran.py      - 6 Quran endpoints
    hadith.py     - 5 Hadith endpoints
    chat.py       - /ask + OpenAI-compatible chat
    ops.py        - health, models, debug scores
```

### Data Pipeline

1. **Ingest**: 47,626 documents (6,236 Quran verses + 41,390 Hadiths from 9 collections)
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
3. **Index**: FAISS `IndexFlatIP` for dense retrieval

### Retrieval & Ranking

1. Dense retrieval (FAISS semantic scoring)
2. Sparse retrieval (BM25 term frequency)
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost (+0.08 to Hadiths when intent = hadith)
5. Type filter (quran_only / hadith_only / authenticated_only)
6. Text-search fallback (exact phrase + word overlap)
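Steps 3-4 reduce to a weighted combination; with `RERANK_ALPHA=0.6` and `HADITH_BOOST=0.08` from the configuration table, the fused score can be sketched as (an illustrative sketch that assumes both input scores are already scaled to a common range):

```python
def fused_score(dense, sparse, is_hadith=False, intent="general",
                alpha=0.6, hadith_boost=0.08):
    """Weighted dense/sparse fusion with an intent-aware Hadith boost."""
    score = alpha * dense + (1.0 - alpha) * sparse
    if is_hadith and intent == "hadith":
        score += hadith_boost  # nudge Hadiths up on Hadith-intent queries
    return score

print(fused_score(0.5, 0.5, is_hadith=True, intent="hadith"))  # boosted
print(fused_score(0.5, 0.5))                                   # no boost
```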

### Anti-Hallucination Measures

- Few-shot examples including the "not found" refusal path
- Hardcoded citation-format rules
- Verbatim-copy rules (no text reconstruction)
- Confidence-threshold gating (default: 0.30)
- Post-generation citation verification
- Grade inference from the collection name
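The confidence gate is why low-scoring queries never reach the LLM: if the best retrieval score falls below `CONFIDENCE_THRESHOLD`, the system short-circuits to a "not found" answer instead of letting the model guess. A sketch (function and message names are illustrative, not the project's API):

```python
NOT_FOUND = "I could not find this in the authenticated sources."

def gate(results, threshold=0.30):
    """Return retrieved results only if the best score clears the threshold;
    otherwise signal a refusal so the LLM is never called."""
    if not results or max(score for _, score in results) < threshold:
        return None  # caller answers with NOT_FOUND
    return results

hits = [("Quran 2:255", 0.41), ("Bukhari 1", 0.22)]
assert gate(hits) is not None          # confident: proceed to generation
assert gate([("doc", 0.12)]) is None   # below 0.30: refuse
```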

### Performance

| Operation | Time | Backend |
|-----------|------|---------|
| Query (cached) | ~50 ms | Both |
| Query (Ollama) | 400-800 ms | Ollama |
| Query (HF GPU) | 500-1500 ms | CUDA |
| Query (HF CPU) | 2-5 s | CPU |

---

## Troubleshooting

### "Cannot connect to Ollama"

```bash
ollama serve  # Ensure Ollama is running on the host
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
```

### "HuggingFace model not found"

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxx  # Set a token for gated models
```

### "Out of memory"

- Use a smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat`
- Reduce `MAX_TOKENS` to 1024
- Increase the Docker memory limit in `docker-compose.yml`

### "Assistant returns 'Not found'"

This is expected behavior: QModel rejects low-confidence queries. Try:

- More specific queries
- Lowering `CONFIDENCE_THRESHOLD` in `.env`
- Checking raw scores: `GET /debug/scores?q=your+query`

### "Port already in use"

```bash
docker-compose down && docker system prune
# Or change the port: ports: ["8001:8000"]
```

---

## Roadmap

- [x] Grade-based filtering
- [x] Streaming responses (SSE)
- [x] Modular architecture (4 routers, 16 endpoints)
- [x] Dual LLM backend (Ollama + HuggingFace)
- [x] Text search (exact substring + fuzzy matching)
- [ ] Chain of narrators (isnad display)
- [ ] Synonym expansion (mercy → rahma, compassion)
- [ ] Batch processing (multiple questions per request)
- [ ] Islamic calendar integration (Hijri dates)
- [ ] Tafsir endpoint with scholar citations

---

## Data Sources

- **Qur'an**: [risan/quran-json](https://github.com/risan/quran-json) - 114 Surahs, 6,236 verses
- **Hadith**: [AhmedBaset/hadith-json](https://github.com/AhmedBaset/hadith-json) - 9 canonical collections, 41,390 Hadiths

---

## Architecture Overview

```
User Query
    ↓
Query Rewriting & Intent Detection
    ↓
Hybrid Search (FAISS dense + BM25 sparse)
    ↓
Filtering & Ranking
    ↓
Confidence Gate (skip LLM if low-scoring)
    ↓
LLM Generation (Ollama or HuggingFace)
    ↓
Formatted Response with Sources
```

See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design.
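The flow above composes into a single pipeline function. The sketch below wires the stages together with stub implementations, so the shape of the control flow is visible end to end (every name here is an illustrative stand-in for the real modules under `app/`):

```python
def rewrite(query):
    """Stub query rewriting: lowercase + trim (the real step also detects intent)."""
    return query.strip().lower()

def hybrid_search(query):
    """Stub retrieval: a tiny hardcoded corpus scored by word overlap,
    standing in for FAISS + BM25 fusion."""
    corpus = {
        "Quran 2:255": "allah there is no deity except him",
        "Bukhari 1": "actions are judged by intentions",
    }
    words = set(query.split())
    return sorted(
        ((ref, len(words & set(text.split())) / max(len(words), 1))
         for ref, text in corpus.items()),
        key=lambda x: -x[1],
    )

def generate(query, sources):
    """Stub for the LLM step: just cite the top source."""
    return f"Based on {sources[0][0]}: ..."

def answer(query, threshold=0.30):
    q = rewrite(query)
    results = hybrid_search(q)
    if not results or results[0][1] < threshold:   # confidence gate
        return "Not found in the authenticated sources."
    return generate(q, results)

print(answer("Actions are judged by intentions"))
print(answer("unrelated topic entirely"))
```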

---

## Troubleshooting Quick Reference

| Issue | Solution |
|-------|----------|
| "Service is initialising" | Wait 60-90 s for the embedding model to load |
| Low retrieval scores | Check `/debug/scores`, try synonyms, lower the threshold |
| "Model not found" (HF) | Run `huggingface-cli login` |
| Out of memory | Use a smaller model or the CPU backend |
| No results | Verify data files exist: `metadata.json` and `QModel.index` |

See [SETUP.md](SETUP.md) and [DOCKER.md](DOCKER.md) for more detailed troubleshooting.

---

## What's New in v6

- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Grade Filtering**: Return only Sahih/Hasan authenticated Hadiths
- **Source Filtering**: Quran-only or Hadith-only queries
- **Hadith Verification**: `/hadith/verify` endpoint
- **Enhanced Frequency**: Word counts by Surah
- **OpenAI Compatible**: Works with any OpenAI client
- **Production Ready**: Structured logging, error handling, async throughout

---

## Next Steps

1. **Get started**: See [SETUP.md](SETUP.md)
2. **Integrate with Open-WebUI**: See [OPEN_WEBUI.md](OPEN_WEBUI.md)
3. **Deploy with Docker**: See [DOCKER.md](DOCKER.md)
4. **Understand the architecture**: See [ARCHITECTURE.md](ARCHITECTURE.md)

---

## License

This project uses open-source data from:

- [Qur'an JSON](https://github.com/risan/quran-json) - open source
- [Hadith API](https://github.com/AhmedBaset/hadith-json) - open source

See the individual repositories for license details.

---

**Made with ❤️ for Islamic scholarship.**

Version 6.0.0 | March 2025 | Production-Ready