---
title: QModel
emoji: 🕌
colorFrom: green
colorTo: blue
sdk: docker
app_port: 8000
license: mit
tags:
- quran
- hadith
- islamic
- rag
- faiss
- nlp
- arabic
language:
- ar
- en
---
# QModel v6 — Islamic RAG System
**Specialized Qur'an & Hadith Knowledge System with Dual LLM Support**
> A production-ready Retrieval-Augmented Generation system specialized exclusively in authenticated Islamic knowledge. No hallucinations, no outside knowledge — only content from verified sources.
![Version](https://img.shields.io/badge/version-6.0.0-blue)
![Backend](https://img.shields.io/badge/backend-ollama%20%7C%20huggingface-green)
![Status](https://img.shields.io/badge/status-production--ready-success)
---
## Features
### 📖 Qur'an Capabilities
- **Verse Lookup**: Find verses by topic or keyword
- **Word Frequency**: Count occurrences with Surah breakdown
- **Bilingual**: Full Arabic + English translation support
- **Tafsir Integration**: AI-powered contextual interpretation
### 📚 Hadith Capabilities
- **Authenticity Verification**: Check if Hadith is in authenticated collections
- **Grade Display**: Show Sahih/Hasan/Da'if authenticity levels
- **Topic Search**: Find relevant Hadiths across 9 major collections
- **Collection Navigation**: Filter by Bukhari, Muslim, Abu Dawud, etc.
### 🛡️ Safety Features
- **Confidence Gating**: Low-confidence queries return "not found" instead of guesses
- **Source Attribution**: Every answer cites exact verse/Hadith reference
- **Verbatim Quotes**: Text copied directly from data, never paraphrased
- **Anti-Hallucination**: Hardened prompts with few-shot "not found" examples
### 🚀 Integration
- **OpenAI-Compatible API**: Use with Open-WebUI, Langchain, or any OpenAI client
- **OpenAI Schema**: Full support for `/v1/chat/completions` and `/v1/models`
- **Streaming Responses**: SSE streaming for long-form answers
### ⚙️ Technical
- **Dual LLM Backend**: Ollama (dev) + HuggingFace (prod)
- **Hybrid Search**: Dense (FAISS) + Sparse (BM25) scoring
- **Async API**: FastAPI with async/await throughout
- **Caching**: TTL-based LRU cache for frequent queries
- **Scale**: 6,236 Quranic verses + 41,390 Hadiths indexed
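The TTL-based LRU cache can be pictured as follows (a minimal synchronous sketch; class and method names are illustrative and not the actual `app/cache.py` implementation, which is async):

```python
# Illustrative TTL + LRU cache: entries expire after `ttl` seconds, and the
# least recently used entry is evicted once `max_size` is exceeded.
import time
from collections import OrderedDict

class TTLLRUCache:
    def __init__(self, max_size=512, ttl=3600):
        self.max_size = max_size
        self.ttl = ttl
        self._store = OrderedDict()  # key -> (value, expiry time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]          # expired: evict and report a miss
            return None
        self._store.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (value, time.monotonic() + self.ttl)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used

cache = TTLLRUCache(max_size=2, ttl=60)
cache.put("q1", "answer 1")
cache.put("q2", "answer 2")
cache.get("q1")              # touch q1 so q2 becomes least recently used
cache.put("q3", "answer 3")  # evicts q2
```

The defaults mirror `CACHE_SIZE=512` and `CACHE_TTL=3600` from the configuration table below.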
---
## Quick Start
### Prerequisites
- Python 3.10+
- 16 GB RAM minimum (for embeddings + LLM)
- GPU recommended for HuggingFace backend
- Ollama installed (for local development) OR internet access (for HuggingFace)
### Installation
```bash
# Clone and enter project
git clone https://github.com/Logicsoft/QModel.git && cd QModel
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Configure (choose one backend)
# Option A — Ollama (local development):
export LLM_BACKEND=ollama
export OLLAMA_MODEL=llama2
# Make sure Ollama is running: ollama serve
# Option B — HuggingFace (production):
export LLM_BACKEND=hf
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
# Run
python main.py
# Query
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
```
API docs: http://localhost:8000/docs
### Data & Index
Pre-built data files are included:
- `metadata.json` — 47,626 documents (6,236 Quran verses + 41,390 hadiths from 9 canonical collections)
- `QModel.index` — FAISS search index
To rebuild after dataset changes:
```bash
python build_index.py
```
---
## Example Queries
```bash
# Basic question
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
# Word frequency
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"
# Authentic Hadiths only
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"
# Quran text search
curl "http://localhost:8000/quran/search?q=bismillah"
# Quran topic search
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"
# Quran word frequency
curl "http://localhost:8000/quran/word-frequency?word=mercy"
# Single chapter
curl "http://localhost:8000/quran/chapter/2"
# Exact verse
curl "http://localhost:8000/quran/verse/2:255"
# Hadith text search
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"
# Hadith topic search (Sahih only)
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"
# Verify Hadith authenticity
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
# Browse a collection
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"
# Streaming (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```
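For non-shell clients, the streaming endpoint can be consumed with any SSE-capable HTTP client. A stdlib-only sketch (the chunk layout follows the OpenAI streaming format; `parse_sse_line` and `stream_answer` are illustrative names, not part of QModel):

```python
# Consume QModel's OpenAI-style SSE stream with only the standard library.
import json
import urllib.request

def parse_sse_line(line: str):
    """Return the text delta from one 'data: {...}' SSE line, else None."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):].strip()
    if data == "[DONE]":                  # OpenAI-style end-of-stream marker
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

def stream_answer(question: str, base_url: str = "http://localhost:8000"):
    payload = {
        "model": "QModel",
        "messages": [{"role": "user", "content": question}],
        "stream": True,
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:                  # SSE arrives line by line
            delta = parse_sse_line(raw.decode("utf-8").rstrip("\r\n"))
            if delta:
                print(delta, end="", flush=True)
```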
---
## Configuration
All configuration via environment variables (`.env` file or exported directly):
### Backend Selection
| Backend | Pros | Cons | When to Use |
|---------|------|------|------------|
| **Ollama** | Fast setup, no GPU, free | Smaller models | Development, testing |
| **HuggingFace** | Larger models, better quality | Requires GPU or significant RAM | Production |
### Ollama Backend (Development)
```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2 # or: mistral, neural-chat, orca-mini
```
Requires: `ollama serve` running and model pulled (`ollama pull llama2`).
### HuggingFace Backend (Production)
```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto # auto | cuda | cpu
HF_MAX_NEW_TOKENS=2048
```
### All Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| **Backend** | | |
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
| **Embedding & Data** | | |
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
| `FAISS_INDEX` | `QModel.index` | Index file path |
| `METADATA_FILE` | `metadata.json` | Dataset file |
| **Retrieval** | | |
| `TOP_K_SEARCH` | `20` | Candidate pool (5–100) |
| `TOP_K_RETURN` | `5` | Results shown to user (1–20) |
| `RERANK_ALPHA` | `0.6` | Dense vs Sparse weight (0.0–1.0) |
| **Generation** | | |
| `TEMPERATURE` | `0.2` | Creativity (0.0–1.0; use 0.1–0.2 for religious content) |
| `MAX_TOKENS` | `2048` | Max response length |
| **Safety** | | |
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call LLM (higher = fewer hallucinations) |
| `HADITH_BOOST` | `0.08` | Score boost for hadith on hadith queries |
| **Other** | | |
| `CACHE_SIZE` | `512` | Query response cache entries |
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
| `ALLOWED_ORIGINS` | `*` | CORS origins |
| `MAX_EXAMPLES` | `3` | Few-shot examples in system prompt |
### Configuration Examples
**Development (Ollama)**
```bash
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
ALLOWED_ORIGINS=*
```
**Production (HuggingFace + GPU)**
```bash
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=cuda
TOP_K_SEARCH=30
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
```
### Tuning Tips
- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
- **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`
---
## Docker Deployment
### Docker Compose (Recommended)
```bash
cp .env.example .env # Configure backend (see Configuration section)
docker-compose up
```
### Docker CLI
```bash
docker build -t qmodel .
# With Ollama backend
docker run -p 8000:8000 \
--env-file .env \
--add-host host.docker.internal:host-gateway \
qmodel
# With HuggingFace backend
docker run -p 8000:8000 \
--env-file .env \
--env HF_TOKEN=your_token_here \
qmodel
```
### Docker with Ollama
```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```
Requires Ollama running on the host (`ollama serve`).
### Docker with HuggingFace
```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto
# Pass HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```
### Docker Compose with GPU (Linux)
```yaml
services:
qmodel:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
```
### Production Tips
- Remove dev volume mount (`.:/app`) in `docker-compose.yml`
- Set `restart: on-failure:5`
- Use specific `ALLOWED_ORIGINS` instead of `*`
---
## Open-WebUI Integration
QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.
### Setup
```bash
# Start QModel
python main.py
# Start Open-WebUI
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
```
### Connect
1. **Settings** → **Models** → **Manage Models**
2. Click **"Connect to OpenAI-compatible API"**
3. **API Base URL**: `http://localhost:8000/v1`
4. **Model Name**: `QModel`
5. **API Key**: Leave blank
6. **Save & Test** → ✅ Connected
### Docker Compose (QModel + Ollama + Open-WebUI)
```yaml
version: '3.8'
services:
qmodel:
build: .
ports:
- "8000:8000"
environment:
- LLM_BACKEND=ollama
- OLLAMA_HOST=http://ollama:11434
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
web-ui:
image: ghcr.io/open-webui/open-webui:latest
ports:
- "3000:8080"
depends_on:
- qmodel
```
### Supported Features
| Feature | Status |
|---------|--------|
| Chat | ✅ Full support |
| Streaming | ✅ `stream: true` |
| Multi-turn context | ✅ Handled by Open-WebUI |
| Temperature | ✅ Configurable |
| Token limits | ✅ `max_tokens` |
| Model listing | ✅ `/v1/models` |
| Source attribution | ✅ `x_metadata.sources` |
---
## Architecture
### Module Structure
```
main.py ← FastAPI app + router registration
app/
config.py ← Config class (env vars)
llm.py ← LLM providers (Ollama, HuggingFace)
cache.py ← TTL-LRU async cache
arabic_nlp.py ← Arabic normalization, stemming, language detection
search.py ← Hybrid FAISS+BM25, text search, query rewriting
analysis.py ← Intent detection, analytics, counting
prompts.py ← Prompt engineering (persona, anti-hallucination)
models.py ← Pydantic schemas
state.py ← AppState, lifespan, RAG pipeline
routers/
quran.py ← 6 Quran endpoints
hadith.py ← 5 Hadith endpoints
chat.py ← /ask + OpenAI-compatible chat
ops.py ← health, models, debug scores
```
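By way of illustration, Arabic normalization of the kind `arabic_nlp.py` performs typically strips diacritics and unifies letter variants before matching (a sketch under common conventions, not the module's actual code):

```python
# Common Arabic text normalization steps for search/matching.
import re

# Tashkeel marks (U+064B-U+0652) plus the dagger alef (U+0670)
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)      # strip vowel marks
    text = re.sub("[إأآ]", "ا", text)    # unify hamza/madda alef variants
    text = text.replace("ى", "ي")        # alef maqsura -> ya
    text = text.replace("ة", "ه")        # ta marbuta -> ha
    return text
```

Normalizing both the indexed text and the query this way lets vowelled Qur'anic script match unvowelled user input.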
### Data Pipeline
1. **Ingest**: 47,626 documents (6,236 Quran verses + 41,390 Hadiths from 9 collections)
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
3. **Index**: FAISS `IndexFlatIP` for dense retrieval
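`IndexFlatIP` is an exact, brute-force inner-product index: every query is scored against every stored vector (and with L2-normalized embeddings, inner product equals cosine similarity). Conceptually it amounts to this pure-Python sketch of the idea, not how FAISS itself is implemented:

```python
# Toy exact inner-product index mimicking FAISS IndexFlatIP behaviour.
class FlatIPIndex:
    def __init__(self):
        self.vectors = []

    def add(self, vecs):
        self.vectors.extend(vecs)

    def search(self, query, k):
        # Score every stored vector by inner product, highest first
        scored = sorted(
            ((sum(q * v for q, v in zip(query, vec)), i)
             for i, vec in enumerate(self.vectors)),
            reverse=True,
        )
        return scored[:k]

index = FlatIPIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
top = index.search([1.0, 0.0], k=2)  # [(1.0, 0), (0.5, 2)]
```

FAISS performs the same exact search over the 47,626 document embeddings, just with vectorized math instead of Python loops.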
### Retrieval & Ranking
1. Dense retrieval (FAISS semantic scoring)
2. Sparse retrieval (BM25 term-frequency)
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost (+0.08 to Hadith when intent=hadith)
5. Type filter (quran_only / hadith_only / authenticated_only)
6. Text search fallback (exact phrase + word-overlap)
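Steps 1–4 above can be sketched as follows (illustrative only: the min-max normalization is an assumption, and the real logic lives in `app/search.py`). `RERANK_ALPHA` weights the dense score and `HADITH_BOOST` is the intent-aware bump:

```python
def normalize(scores):
    """Min-max normalize a score list into [0, 1] (an assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(dense, sparse, doc_types, alpha=0.6, intent=None, hadith_boost=0.08):
    """Fuse dense/sparse scores (alpha = RERANK_ALPHA), then boost by intent."""
    d, s = normalize(dense), normalize(sparse)
    fused = [alpha * di + (1 - alpha) * si for di, si in zip(d, s)]
    if intent == "hadith":
        fused = [f + hadith_boost if t == "hadith" else f
                 for f, t in zip(fused, doc_types)]
    return fused

scores = fuse(dense=[0.9, 0.5, 0.1],    # FAISS similarities
              sparse=[2.0, 8.0, 4.0],   # BM25 scores
              doc_types=["quran", "hadith", "hadith"],
              intent="hadith")
```

With the defaults this is the "60% dense + 40% sparse" split from step 3; raising `RERANK_ALPHA` shifts ranking toward semantic similarity, lowering it toward exact term overlap.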
### Anti-Hallucination Measures
- Few-shot examples including "not found" refusal path
- Hardcoded citation format rules
- Verbatim copy rules (no text reconstruction)
- Confidence threshold gating (default: 0.30)
- Post-generation citation verification
- Grade inference from collection name
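The confidence gate itself is simple to picture (a sketch; the function names and refusal message here are illustrative, not QModel's actual strings):

```python
# If the best retrieval score falls below CONFIDENCE_THRESHOLD, the LLM is
# never called and a fixed refusal is returned instead of a guess.
NOT_FOUND = "I could not find this in the authenticated sources."

def answer(query, retrieve, generate, threshold=0.30):
    results = retrieve(query)            # list of (score, doc), best first
    if not results or results[0][0] < threshold:
        return NOT_FOUND                 # refuse rather than hallucinate
    return generate(query, [doc for _, doc in results])

# Toy retriever/generator pairs showing both sides of the gate:
refused = answer("obscure question",
                 retrieve=lambda q: [(0.12, "weak match")],
                 generate=lambda q, docs: "should never run")
grounded = answer("well-covered question",
                  retrieve=lambda q: [(0.62, "Quran 2:255")],
                  generate=lambda q, docs: f"grounded in {len(docs)} source(s)")
```

Raising `CONFIDENCE_THRESHOLD` trades recall for safety: more queries get the refusal, fewer get a weakly grounded generation.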
### Performance
| Operation | Time | Backend |
|-----------|------|---------|
| Query (cached) | ~50ms | Both |
| Query (Ollama) | 400–800ms | Ollama |
| Query (HF GPU) | 500–1500ms | CUDA |
| Query (HF CPU) | 2–5s | CPU |
---
## Troubleshooting
### "Cannot connect to Ollama"
```bash
ollama serve # Ensure Ollama is running on host
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
```
### "HuggingFace model not found"
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxx # Set token for gated models
```
### "Out of memory"
- Use smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat`
- Reduce `MAX_TOKENS` to 1024
- Increase Docker memory limit in `docker-compose.yml`
### "Assistant returns 'Not found'"
This is expected — QModel rejects low-confidence queries. Try:
- More specific queries
- Lower `CONFIDENCE_THRESHOLD` in `.env`
- Check raw scores: `GET /debug/scores?q=your+query`
### "Port already in use"
```bash
docker-compose down && docker system prune
# Or change port: ports: ["8001:8000"]
```
---
## Roadmap
- [x] Grade-based filtering
- [x] Streaming responses (SSE)
- [x] Modular architecture (4 routers, 16 endpoints)
- [x] Dual LLM backend (Ollama + HuggingFace)
- [x] Text search (exact substring + fuzzy matching)
- [ ] Chain of narrators (Isnad display)
- [ ] Synonym expansion (mercy β†’ rahma, compassion)
- [ ] Batch processing (multiple questions per request)
- [ ] Islamic calendar integration (Hijri dates)
- [ ] Tafsir endpoint with scholar citations
---
## Data Sources
- **Qur'an**: [risan/quran-json](https://github.com/risan/quran-json) — 114 Surahs, 6,236 verses
- **Hadith**: [AhmedBaset/hadith-json](https://github.com/AhmedBaset/hadith-json) — 9 canonical collections, 41,390 hadiths
---
## Request Flow
```
User Query
↓
Query Rewriting & Intent Detection
↓
Hybrid Search (FAISS dense + BM25 sparse)
↓
Filtering & Ranking
↓
Confidence Gate (skip LLM if low-scoring)
↓
LLM Generation (Ollama or HuggingFace)
↓
Formatted Response with Sources
```
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design.
---
## Troubleshooting Quick Reference
| Issue | Solution |
|-------|----------|
| "Service is initialising" | Wait 60–90s for the embedding model to load |
| Low retrieval scores | Check `/debug/scores`, try synonyms, lower threshold |
| "Model not found" (HF) | Run `huggingface-cli login` |
| Out of memory | Use smaller model or CPU backend |
| No results | Verify data files exist: `metadata.json` and `QModel.index` |
See [SETUP.md](SETUP.md) and [DOCKER.md](DOCKER.md) for more detailed troubleshooting.
---
## What's New in v6
✨ **Dual LLM Backend** — Ollama (dev) + HuggingFace (prod)
✨ **Grade Filtering** — Return only Sahih/Hasan authenticated Hadiths
✨ **Source Filtering** — Quran-only or Hadith-only queries
✨ **Hadith Verification** — `/hadith/verify` endpoint
✨ **Enhanced Frequency** — Word counts by Surah
✨ **OpenAI Compatible** — Use with any OpenAI client
✨ **Production Ready** — Structured logging, error handling, async throughout
---
## Next Steps
1. **Get Started**: See [SETUP.md](SETUP.md)
2. **Integrate with Open-WebUI**: See [OPEN_WEBUI.md](OPEN_WEBUI.md)
3. **Deploy with Docker**: See [DOCKER.md](DOCKER.md)
4. **Understand Architecture**: See [ARCHITECTURE.md](ARCHITECTURE.md)
---
## License
This project uses open-source data from:
- [Qur'an JSON](https://github.com/risan/quran-json) — Open source
- [Hadith API](https://github.com/AhmedBaset/hadith-json) — Open source
See individual repositories for license details.
---
**Made with ❤️ for Islamic scholarship.**
Version 6.0.0 | March 2025 | Production-Ready