---
title: QModel
emoji: 📖
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
license: mit
tags:
  - quran
  - hadith
  - islamic
  - rag
  - faiss
  - nlp
  - arabic
language:
  - ar
  - en
---
# QModel v6 – Islamic RAG System

**Specialized Qur'an & Hadith Knowledge System with Dual LLM Support**

A production-ready Retrieval-Augmented Generation system specialized exclusively in authenticated Islamic knowledge. No hallucinations, no outside knowledge – only content from verified sources.
## Features

### 📖 Qur'an Capabilities

- **Verse Lookup**: Find verses by topic or keyword
- **Word Frequency**: Count occurrences with a per-Surah breakdown
- **Bilingual**: Full Arabic + English translation support
- **Tafsir Integration**: AI-powered contextual interpretation
### 📜 Hadith Capabilities

- **Authenticity Verification**: Check whether a Hadith appears in the authenticated collections
- **Grade Display**: Show Sahih/Hasan/Da'if authenticity levels
- **Topic Search**: Find relevant Hadiths across 9 major collections
- **Collection Navigation**: Filter by Bukhari, Muslim, Abu Dawud, etc.
### 🛡️ Safety Features

- **Confidence Gating**: Low-confidence queries return "not found" instead of guesses
- **Source Attribution**: Every answer cites the exact verse/Hadith reference
- **Verbatim Quotes**: Text is copied directly from the data, never paraphrased
- **Anti-Hallucination**: Hardened prompts with few-shot "not found" examples
### 🔌 Integration

- **OpenAI-Compatible API**: Use with Open-WebUI, LangChain, or any OpenAI client
- **OpenAI Schema**: Full support for `/v1/chat/completions` and `/v1/models`
- **Streaming Responses**: SSE streaming for long-form answers
### ⚙️ Technical

- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Hybrid Search**: Dense (FAISS) + sparse (BM25) scoring
- **Async API**: FastAPI with async/await throughout
- **Caching**: TTL-based LRU cache for frequent queries
- **Scale**: 6,236 Qur'anic verses + 41,390 Hadiths indexed
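The TTL-based LRU cache mentioned above can be sketched as follows. This is a minimal synchronous illustration of the idea (least-recently-used eviction plus per-entry expiry), not the project's actual async `app/cache.py`:

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """LRU cache whose entries also expire after a fixed time-to-live."""

    def __init__(self, max_size=512, ttl=3600):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # key -> (value, insertion timestamp)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, ts = item
        if time.time() - ts > self.ttl:  # entry expired: drop it
            del self._data[key]
            return None
        self._data.move_to_end(key)      # mark as recently used
        return value

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.time())
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```

The defaults mirror the documented `CACHE_SIZE=512` and `CACHE_TTL=3600` settings.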
## Quick Start

### Prerequisites

- Python 3.10+
- 16 GB RAM minimum (for embeddings + LLM)
- GPU recommended for the HuggingFace backend
- Ollama installed (for local development) OR internet access (for HuggingFace)

### Installation

```bash
# Clone and enter the project
git clone https://github.com/Logicsoft/QModel.git && cd QModel
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Configure (choose one backend)
# Option A – Ollama (local development):
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama2
# Make sure Ollama is running: ollama serve

# Option B – HuggingFace (production):
export LLM_BACKEND=hf
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct

# Run
python main.py

# Query
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
```

API docs: http://localhost:8000/docs
## Data & Index

Pre-built data files are included:

- `metadata.json` – 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 canonical collections)
- `QModel.index` – FAISS search index

To rebuild after dataset changes:

```bash
python build_index.py
```
## Example Queries

```bash
# Basic question
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"

# Word frequency
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"

# Authentic Hadiths only
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"

# Quran text search
curl "http://localhost:8000/quran/search?q=bismillah"

# Quran topic search
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"

# Quran word frequency
curl "http://localhost:8000/quran/word-frequency?word=mercy"

# Single chapter
curl "http://localhost:8000/quran/chapter/2"

# Exact verse
curl "http://localhost:8000/quran/verse/2:255"

# Hadith text search
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"

# Hadith topic search (Sahih only)
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"

# Verify Hadith authenticity
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"

# Browse a collection
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"

# Streaming (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```
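The same streaming endpoint can be consumed from Python. A sketch using `requests` (assuming the server is running locally and emits standard OpenAI-style SSE frames; the helper names are illustrative, not part of QModel):

```python
import json
import requests

def parse_sse_delta(line):
    """Extract the text delta from one SSE 'data:' line, or None."""
    if not line or not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":          # OpenAI-style end-of-stream sentinel
        return None
    delta = json.loads(data)["choices"][0]["delta"]
    return delta.get("content")

def ask_qmodel(question, base_url="http://localhost:8000"):
    """Stream an answer from QModel's OpenAI-compatible endpoint."""
    payload = {
        "model": "QModel",
        "messages": [{"role": "user", "content": question}],
        "stream": True,
    }
    parts = []
    with requests.post(f"{base_url}/v1/chat/completions",
                       json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            chunk = parse_sse_delta(line)
            if chunk:
                parts.append(chunk)
    return "".join(parts)
```

Any OpenAI-compatible client library should work the same way by pointing its base URL at `http://localhost:8000/v1`.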
## Configuration

All configuration is via environment variables (`.env` file or exported directly).

### Backend Selection

| Backend | Pros | Cons | When to Use |
|---|---|---|---|
| Ollama | Fast setup, no GPU, free | Smaller models | Development, testing |
| HuggingFace | Larger models, better quality | Requires GPU or significant RAM | Production |

### Ollama Backend (Development)

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2   # or: mistral, neural-chat, orca-mini
```

Requires `ollama serve` running and the model pulled (`ollama pull llama2`).

### HuggingFace Backend (Production)

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto        # auto | cuda | cpu
HF_MAX_NEW_TOKENS=2048
```
### All Environment Variables

| Variable | Default | Description |
|---|---|---|
| **Backend** | | |
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
| **Embedding & Data** | | |
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
| `FAISS_INDEX` | `QModel.index` | Index file path |
| `METADATA_FILE` | `metadata.json` | Dataset file |
| **Retrieval** | | |
| `TOP_K_SEARCH` | `20` | Candidate pool (5–100) |
| `TOP_K_RETURN` | `5` | Results shown to the user (1–20) |
| `RERANK_ALPHA` | `0.6` | Dense vs. sparse weight (0.0–1.0) |
| **Generation** | | |
| `TEMPERATURE` | `0.2` | Creativity (0.0–1.0; use 0.1–0.2 for religious content) |
| `MAX_TOKENS` | `2048` | Max response length |
| **Safety** | | |
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call the LLM (higher = fewer hallucinations) |
| `HADITH_BOOST` | `0.08` | Score boost for Hadiths on Hadith queries |
| **Other** | | |
| `CACHE_SIZE` | `512` | Query response cache entries |
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
| `ALLOWED_ORIGINS` | `*` | CORS origins |
| `MAX_EXAMPLES` | `3` | Few-shot examples in the system prompt |
### Configuration Examples

#### Development (Ollama)

```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
ALLOWED_ORIGINS=*
```

#### Production (HuggingFace + GPU)

```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=cuda
TOP_K_SEARCH=30
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
```
### Tuning Tips

- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
- **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`
## Docker Deployment

### Docker Compose (Recommended)

```bash
cp .env.example .env   # Configure backend (see Configuration section)
docker-compose up
```

### Docker CLI

```bash
docker build -t qmodel .

# With Ollama backend
docker run -p 8000:8000 \
  --env-file .env \
  --add-host host.docker.internal:host-gateway \
  qmodel

# With HuggingFace backend
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```
### Docker with Ollama

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```

Requires Ollama running on the host (`ollama serve`).

### Docker with HuggingFace

```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto
```

```bash
# Pass the HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```
### Docker Compose with GPU (Linux)

```yaml
services:
  qmodel:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
### Production Tips

- Remove the dev volume mount (`.:/app`) in `docker-compose.yml`
- Set `restart: on-failure:5`
- Use specific `ALLOWED_ORIGINS` instead of `*`
## Open-WebUI Integration

QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.

### Setup

```bash
# Start QModel
python main.py

# Start Open-WebUI
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
```

### Connect

1. Settings → Models → Manage Models
2. Click "Connect to OpenAI-compatible API"
3. API Base URL: `http://localhost:8000/v1`
4. Model Name: `QModel`
5. API Key: leave blank
6. Save & Test → ✅ Connected
Docker Compose (QModel + Ollama + Open-WebUI)
version: '3.8'
services:
qmodel:
build: .
ports:
- "8000:8000"
environment:
- LLM_BACKEND=ollama
- OLLAMA_HOST=http://ollama:11434
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
web-ui:
image: ghcr.io/open-webui/open-webui:latest
ports:
- "3000:8080"
depends_on:
- qmodel
### Supported Features

| Feature | Status |
|---|---|
| Chat | ✅ Full support |
| Streaming | ✅ `stream: true` |
| Multi-turn context | ✅ Handled by Open-WebUI |
| Temperature | ✅ Configurable |
| Token limits | ✅ `max_tokens` |
| Model listing | ✅ `/v1/models` |
| Source attribution | ✅ `x_metadata.sources` |
## Architecture

### Module Structure

```text
main.py              → FastAPI app + router registration
app/
  config.py          → Config class (env vars)
  llm.py             → LLM providers (Ollama, HuggingFace)
  cache.py           → TTL-LRU async cache
  arabic_nlp.py      → Arabic normalization, stemming, language detection
  search.py          → Hybrid FAISS+BM25, text search, query rewriting
  analysis.py        → Intent detection, analytics, counting
  prompts.py         → Prompt engineering (persona, anti-hallucination)
  models.py          → Pydantic schemas
  state.py           → AppState, lifespan, RAG pipeline
  routers/
    quran.py         → 6 Qur'an endpoints
    hadith.py        → 5 Hadith endpoints
    chat.py          → /ask + OpenAI-compatible chat
    ops.py           → health, models, debug scores
```
### Data Pipeline

1. **Ingest**: 47,626 documents (6,236 Qur'an verses + 41,390 Hadiths from 9 collections)
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
3. **Index**: FAISS `IndexFlatIP` for dense retrieval
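`IndexFlatIP` performs exact (brute-force) inner-product search, which equals cosine similarity once the embeddings are L2-normalized. Its behavior can be illustrated with a NumPy equivalent (a sketch of the retrieval math only; the real system uses the FAISS index built by `build_index.py`):

```python
import numpy as np

def build_index(embeddings):
    """L2-normalize rows so inner product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(index, query_vec, top_k=5):
    """Exact inner-product search, mimicking faiss.IndexFlatIP.search()."""
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    scores = index @ q                   # one dot product per document
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return scores[order], order
```

FAISS's flat index does exactly this scan in optimized C++; at 47,626 documents an exhaustive scan is still fast, which is why no approximate index (IVF/HNSW) is needed here.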
### Retrieval & Ranking

1. Dense retrieval (FAISS semantic scoring)
2. Sparse retrieval (BM25 term frequency)
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost (+0.08 for Hadiths when intent = hadith)
5. Type filter (quran_only / hadith_only / authenticated_only)
6. Text-search fallback (exact phrase + word overlap)
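With the default `RERANK_ALPHA=0.6`, the fusion step is a weighted sum of the two score lists after putting them on a common scale. Roughly (a sketch of the idea with min-max normalization assumed; the project's actual `app/search.py` may normalize differently):

```python
def fuse_scores(dense, sparse, alpha=0.6):
    """Blend dense (FAISS) and sparse (BM25) scores per document id.

    Both inputs are dicts of doc_id -> raw score. Each list is min-max
    normalized first so the two scales are comparable; alpha weights
    the dense side (0.6 dense / 0.4 sparse by default).
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on ties
        return {d: (s - lo) / span for d, s in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    ids = set(d) | set(s)  # a doc found by only one retriever still scores
    return {i: alpha * d.get(i, 0.0) + (1 - alpha) * s.get(i, 0.0)
            for i in ids}
```

Raising `alpha` toward 1.0 favors semantic matches; lowering it favors exact term overlap, which helps for Arabic keyword queries.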
### Anti-Hallucination Measures

- Few-shot examples including a "not found" refusal path
- Hardcoded citation-format rules
- Verbatim copy rules (no text reconstruction)
- Confidence-threshold gating (default: 0.30)
- Post-generation citation verification
- Grade inference from the collection name
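The confidence gate is the simplest of these measures: if the best retrieval score falls below `CONFIDENCE_THRESHOLD`, the LLM is never called. A sketch of the idea (function and message names are illustrative, not QModel's actual code):

```python
def answer_or_refuse(ranked_results, generate, threshold=0.30):
    """Call the LLM only when retrieval is confident enough.

    ranked_results: list of (score, document) pairs sorted best-first.
    generate: callable that turns the retrieved documents into an answer.
    """
    if not ranked_results or ranked_results[0][0] < threshold:
        # Refuse rather than let the model guess without grounding.
        return "Not found in the authenticated sources."
    docs = [doc for _, doc in ranked_results]
    return generate(docs)
```

Because the gate sits before generation, a failed lookup costs only the retrieval step, which also explains the fast "not found" responses.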
## Performance

| Operation | Time | Backend |
|---|---|---|
| Query (cached) | ~50 ms | Both |
| Query (Ollama) | 400–800 ms | Ollama |
| Query (HF GPU) | 500–1500 ms | CUDA |
| Query (HF CPU) | 2–5 s | CPU |
## Troubleshooting

### "Cannot connect to Ollama"

```bash
ollama serve   # Ensure Ollama is running on the host
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
```

### "HuggingFace model not found"

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxx   # Set a token for gated models
```

### "Out of memory"

- Use a smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat`
- Reduce `MAX_TOKENS` to 1024
- Increase the Docker memory limit in `docker-compose.yml`

### "Assistant returns 'Not found'"

This is expected – QModel rejects low-confidence queries. Try:

- More specific queries
- Lowering `CONFIDENCE_THRESHOLD` in `.env`
- Checking raw scores: `GET /debug/scores?q=your+query`

### "Port already in use"

```bash
docker-compose down && docker system prune
# Or change the port: ports: ["8001:8000"]
```
## Roadmap

- [x] Grade-based filtering
- [x] Streaming responses (SSE)
- [x] Modular architecture (4 routers, 16 endpoints)
- [x] Dual LLM backend (Ollama + HuggingFace)
- [x] Text search (exact substring + fuzzy matching)
- [ ] Chain of narrators (Isnad display)
- [ ] Synonym expansion (mercy → rahma, compassion)
- [ ] Batch processing (multiple questions per request)
- [ ] Islamic calendar integration (Hijri dates)
- [ ] Tafsir endpoint with scholar citations
## Data Sources

- **Qur'an**: `risan/quran-json` – 114 Surahs, 6,236 verses
- **Hadith**: `AhmedBaset/hadith-json` – 9 canonical collections, 41,390 Hadiths
## Architecture Overview

```text
User Query
    ↓
Query Rewriting & Intent Detection
    ↓
Hybrid Search (FAISS dense + BM25 sparse)
    ↓
Filtering & Ranking
    ↓
Confidence Gate (skip LLM if low-scoring)
    ↓
LLM Generation (Ollama or HuggingFace)
    ↓
Formatted Response with Sources
```

See `ARCHITECTURE.md` for the detailed system design.
## Troubleshooting Quick Reference

| Issue | Solution |
|---|---|
| "Service is initialising" | Wait 60–90 s for the embedding model to load |
| Low retrieval scores | Check `/debug/scores`, try synonyms, lower the threshold |
| "Model not found" (HF) | Run `huggingface-cli login` |
| Out of memory | Use a smaller model or the CPU backend |
| No results | Verify the data files exist: `metadata.json` and `QModel.index` |

See `SETUP.md` and `DOCKER.md` for more detailed troubleshooting.
## What's New in v6

- ✨ **Dual LLM Backend** – Ollama (dev) + HuggingFace (prod)
- ✨ **Grade Filtering** – Return only Sahih/Hasan authenticated Hadiths
- ✨ **Source Filtering** – Qur'an-only or Hadith-only queries
- ✨ **Hadith Verification** – `/hadith/verify` endpoint
- ✨ **Enhanced Frequency** – Word counts by Surah
- ✨ **OpenAI Compatible** – Use with any OpenAI client
- ✨ **Production Ready** – Structured logging, error handling, async throughout
## Next Steps

- **Get Started**: See `SETUP.md`
- **Integrate with Open-WebUI**: See `OPEN_WEBUI.md`
- **Deploy with Docker**: See `DOCKER.md`
- **Understand the Architecture**: See `ARCHITECTURE.md`
## License

This project uses open-source data from:

- Qur'an JSON – open source
- Hadith API – open source

See the individual repositories for license details.

---

Made with ❤️ for Islamic scholarship.

**Version 6.0.0 | March 2025 | Production-Ready**