Upload folder using huggingface_hub
- ARCHITECTURE.md +0 -334
- DOCKER.md +0 -443
- OPEN_WEBUI.md +0 -385
- README.md +503 -80
- SETUP.md +0 -590
- app/routers/chat.py +43 -2
- app/routers/ops.py +43 -4
- main.py +4 -2
ARCHITECTURE.md
DELETED
@@ -1,334 +0,0 @@
# QModel v6 Architecture: Detailed System Design

> For a quick overview, see [README.md](README.md#architecture-overview)

## System Vision
A RAG system specialized **exclusively** in authenticated Qur'an and Hadith. No hallucinations, no outside knowledge: only content from verified sources.

## Core Capabilities

### 1. **Quran Verse Lookup** (by partial text)
- Text search: find any verse by typing part of its Arabic or English text
- Exact substring + fuzzy word-overlap matching

### 2. **Quran Topic Search**
- Semantic hybrid search to find verses related to any topic
- Full Tafsir-aware prompting

### 3. **Quran Word Frequency & Analytics**
- Count how many times a word appears across all 114 Surahs
- Per-surah breakdown with example verses
- Chapter-level analytics (verse count, revelation type)

### 4. **Hadith Lookup** (by partial text)
- Text search across 9 Hadith collections
- Optional collection filter

### 5. **Hadith Topic Search**
- Semantic hybrid search for Hadiths by topic
- Optional grade filter (sahih, hasan, etc.)

### 6. **Hadith Authenticity Verification**
- Dual-method verification: text search + semantic search
- Grade inference from collection name when not explicitly provided
- Sources: Bukhari, Muslim, Abu Dawud, Tirmidhi, Ibn Majah, Nasa'i, Malik, Ahmad, Darimi

### 7. **Safety First**
- **Confidence Gating**: Low-confidence queries return "not found" instead of an LLM guess
- **Source Attribution**: Every answer cites the exact verse/Hadith with its reference
- **Grade Filtering**: Optionally return only Sahih-authenticated Hadiths
- **Verbatim Quotes**: Text is copied directly from the data, with no paraphrasing

## Modular Architecture (v6)

```
main.py                    # Thin launcher (73 lines)
app/
  config.py                # Config class (env vars)
  llm.py                   # LLM providers (Ollama, HuggingFace)
  cache.py                 # TTL-LRU async cache
  arabic_nlp.py            # Arabic normalisation, stemming, language detection
  search.py                # Hybrid FAISS+BM25, text search, query rewriting
  analysis.py              # Intent detection, analytics, counting
  prompts.py               # Prompt engineering (persona, task instructions)
  models.py                # Pydantic schemas
  state.py                 # AppState, lifespan, RAG pipeline
  routers/
    quran.py               # 6 Quran endpoints
    hadith.py              # 5 Hadith endpoints
    chat.py                # 2 OpenAI-compatible + inference endpoints
    ops.py                 # 3 operational endpoints (health, models, debug)
```

---

## Data Pipeline

The system follows a three-phase approach:

**Metadata Schema** (47,179 entries: 6,236 Quran + 40,943 Hadith):
```json
{
  "id": "surah:verse or hadith_prefix_number",
  "arabic": "...",
  "english": "...",
  "source": "Surah Al-Baqarah 2:43 | Sahih al-Bukhari 1",
  "type": "quran | hadith",

  // Quran only
  "surah_number": 2,
  "surah_name_en": "Al-Baqarah",
  "surah_name_ar": "البقرة",
  "verse_number": 43,

  // Hadith only
  "collection": "Sahih al-Bukhari",
  "grade": "Sahih",
  "hadith_number": 1
}
```

### Phase 2: Indexing
```
build_index.py
├── Load Quran + Hadith JSON
├── Encode all texts with multilingual-e5-large
│   ├── Dual embeddings: Arabic + English per item
│   └── Normalize before encoding
└── Build FAISS IndexFlatIP for dense retrieval
```

### Phase 3: Retrieval & Ranking

**Hybrid Search Algorithm** (`app/search.py`):
1. Dense retrieval: FAISS semantic scoring
2. Sparse retrieval: BM25 term-frequency ranking
3. Fusion: 60% dense + 40% sparse
4. Intent-aware boost: +0.08 to Hadith items when intent=hadith
5. Type filter: Optional (quran_only / hadith_only / authenticated_only)
6. Phrase matching: Exact phrase + word-overlap scoring for text search

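The fusion, boost, and filter steps above can be sketched as follows. This is a minimal illustration, not the code in `app/search.py`: the function name `fuse_scores`, the candidate dictionary keys, and the assumption that dense and sparse scores are already normalised to [0, 1] are hypothetical. Only the 60/40 weights and the +0.08 Hadith boost come from the list above.

```python
DENSE_WEIGHT = 0.6   # weight of the FAISS semantic score (from the list above)
SPARSE_WEIGHT = 0.4  # weight of the BM25 score (from the list above)
HADITH_BOOST = 0.08  # added to Hadith items when intent == "hadith"


def fuse_scores(candidates, intent=None, type_filter=None, top_k=5):
    """Rank candidates by a weighted sum of dense and sparse scores.

    Each candidate is a dict with the hypothetical keys "id", "type",
    "dense" and "sparse"; both scores are assumed to lie in [0, 1].
    """
    ranked = []
    for item in candidates:
        if type_filter is not None and item["type"] != type_filter:
            continue  # optional type filter, e.g. "quran" or "hadith"
        score = DENSE_WEIGHT * item["dense"] + SPARSE_WEIGHT * item["sparse"]
        if intent == "hadith" and item["type"] == "hadith":
            score += HADITH_BOOST
        ranked.append({**item, "score": round(score, 4)})
    ranked.sort(key=lambda r: r["score"], reverse=True)
    return ranked[:top_k]
```

With two candidates scoring 0.68 (Quran) and 0.66 (Hadith) before boosting, passing `intent="hadith"` lifts the Hadith item to 0.74 and moves it to the top of the ranking.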
---

## Module Reference

### `app/config.py`: Configuration
- `Config` dataclass with all environment variables
- Singleton `cfg` instance
- Loads `.env` via dotenv

### `app/llm.py`: LLM Providers
- `LLMProvider` abstract base class
- `OllamaProvider`: primary (3-model fallback chain)
- `HuggingFaceProvider`: alternative local inference
- `create_llm_provider()` factory dispatches on `LLM_BACKEND` env var

### `app/cache.py`: TTL-LRU Cache
- `TTLCache` with size limit (1024) and TTL (300s)
- Pre-built instances: `search_cache`, `analysis_cache`, `rewrite_cache`

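A TTL-LRU cache of the kind described above could look roughly like this. It is a synchronous sketch under stated assumptions, not the project's `TTLCache`: the method names `get` and `set` and the injectable `clock` parameter are hypothetical, and the real class is described as async. The defaults (1024 entries, 300 seconds) are the ones listed above.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire `ttl` seconds after being set."""

    def __init__(self, maxsize=1024, ttl=300.0, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock          # injectable so expiry can be tested
        self._data = OrderedDict()   # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]      # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used
```

Keeping the expiry time next to each value means a lookup needs only one dictionary access, and `OrderedDict` gives least-recently-used eviction without a separate list.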
### `app/arabic_nlp.py`: Arabic NLP
- `normalize_arabic()`: tashkeel removal, hamza normalization
- `light_stem()`: prefix/suffix stripping
- `tokenize_ar()`: Arabic-aware tokenization
- `detect_language()` / `language_instruction()`: route persona by language

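As an illustration of the two normalisation steps named for `normalize_arabic()`, the sketch below strips tashkeel and folds hamza-carrying alef forms into a bare alef. The exact character sets are an assumption; the project's implementation may cover more or fewer code points.

```python
import re

# Arabic diacritics (tashkeel): fathatan through sukun, plus superscript alef.
_TASHKEEL = re.compile("[\u064B-\u0652\u0670]")
# Alef with madda, hamza above, or hamza below; all mapped to bare alef.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Remove diacritics and unify alef/hamza variants (assumed rule set)."""
    text = _TASHKEEL.sub("", text)
    return _ALEF_VARIANTS.sub("\u0627", text)
```

Normalising both the indexed text and the query in the same way lets a query typed without diacritics match fully vocalised Quran text.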
### `app/search.py`: Retrieval Engine
- `rewrite_query()`: dual-language normalization, LLM-assisted rewriting
- `hybrid_search()`: FAISS + BM25 fusion with intent-aware boosting
- `text_search()`: exact substring + word-overlap matching (for verse/hadith lookup by partial text)
- `build_context()`: format retrieved items for LLM prompt

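The "exact substring + word-overlap" behaviour described for `text_search()` can be sketched as below. The signature, the `min_overlap` cut-off, and the 0.9 scaling of fuzzy scores are assumptions made for illustration; only the two matching strategies and the `english` field come from this document.

```python
def text_search(query, items, limit=5, min_overlap=0.5):
    """Rank items by exact substring match first, then by word overlap."""
    q = query.lower().strip()
    q_words = set(q.split())
    scored = []
    for item in items:
        text = item["english"].lower()
        if q and q in text:
            score = 1.0  # exact substring match ranks highest
        elif q_words:
            overlap = len(q_words & set(text.split())) / len(q_words)
            if overlap < min_overlap:
                continue  # too few shared words to count as a match
            score = overlap * 0.9  # keep fuzzy matches below exact ones
        else:
            continue  # empty query matches nothing
        scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]
```

An Arabic variant would apply the same logic to the `arabic` field after normalisation.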
### `app/analysis.py`: Analytics & Intent Detection
- `detect_analysis_intent()`: identifies count / analytics / chapter queries
- `count_occurrences()`: word frequency across all Surahs
- `get_quran_analytics()`: chapter-level stats
- `get_hadith_analytics()`: collection-level stats
- `get_chapter_info()`: single Surah metadata
- `get_verse()`: exact verse by surah:ayah
- `detect_surah_info()` / `lookup_surah_info()`: Surah name resolution

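A word-frequency count with a per-surah breakdown, as described for `count_occurrences()`, might be written like this. The signature and return shape are hypothetical; the `english` and `surah_number` keys follow the metadata schema shown earlier. Whole-word, case-insensitive matching is an assumption.

```python
import re
from collections import Counter


def count_occurrences(word, verses):
    """Count whole-word matches of `word` per surah and in total."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    per_surah = Counter()
    for verse in verses:
        hits = len(pattern.findall(verse["english"]))
        if hits:
            per_surah[verse["surah_number"]] += hits
    return {
        "word": word,
        "total": sum(per_surah.values()),
        "per_surah": dict(per_surah),
    }
```

Matching on word boundaries keeps "mercy" from also counting "merciful"; a stem-based count would need the light stemmer instead.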
### `app/prompts.py`: Prompt Engineering
- `PERSONA`: Islamic scholar persona definition
- `TASK_INSTRUCTIONS`: verbatim-quoting, anti-hallucination rules
- `FORMAT_RULES`: citation box format
- `build_messages()`: intent-aware system + user message construction
- `not_found_answer()`: safe "not in dataset" response

### `app/models.py`: Pydantic Schemas
All request/response models:
- `ChatMessage`, `ChatCompletionRequest/Response/Choice`: OpenAI-compatible
- `AskResponse`, `AnalysisResult`, `SourceItem`: RAG pipeline
- `HadithVerifyResponse`: authenticity verification
- `VerseItem`, `HadithItem`, `TextSearchResponse`: text search
- `ChapterResponse`, `QuranAnalyticsResponse`, `HadithAnalyticsResponse`: analytics
- `WordFrequencyResponse`: word counting
- `ModelInfo`, `ModelsListResponse`: OpenAI models list

### `app/state.py`: Application State & Lifecycle
- `AppState`: holds FAISS index, metadata, embedder, LLM provider
- `lifespan()`: async startup (loads index, model, metadata)
- `check_ready()`: dependency guard for endpoints
- `run_rag_pipeline()`: full RAG (rewrite → search → context → LLM → response)
- `infer_hadith_grade()`: grade detection from collection name

---

## API Endpoints (16 total)

### Quran Router (`/quran/...`): 6 endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/quran/search?q=...` | GET | Text search: find verses by partial Arabic/English text |
| `/quran/topic?q=...&top_k=5` | GET | Semantic search: find verses related to a topic |
| `/quran/word-frequency?word=...` | GET | Count word occurrences across all Surahs |
| `/quran/analytics` | GET | Overall Quran stats (total verses, Surahs, types) |
| `/quran/chapter/{number}` | GET | Single Surah metadata (name, verse count, type) |
| `/quran/verse/{surah}:{ayah}` | GET | Exact verse lookup by reference |

### Hadith Router (`/hadith/...`): 5 endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/hadith/search?q=...&collection=...` | GET | Text search across collections |
| `/hadith/topic?q=...&top_k=5&grade=...` | GET | Semantic search by topic with optional grade filter |
| `/hadith/verify?q=...` | GET | Authenticity verification (text + semantic search) |
| `/hadith/collection/{name}?limit=20` | GET | Browse a specific collection |
| `/hadith/analytics` | GET | Collection-level statistics |

### Chat Router: 2 endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | OpenAI-compatible chat (SSE streaming supported) |
| `/ask?q=...&top_k=5` | GET | Direct RAG query with full source attribution |

### Ops Router: 3 endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Readiness check |
| `/v1/models` | GET | OpenAI-compatible model listing |
| `/debug/scores?q=...&top_k=10` | GET | Raw retrieval scores (no LLM call) |

---

## Anti-Hallucination Measures

- Few-shot examples including "not found" refusal path
- Hardcoded format rules (box/citation format required)
- Verbatim copy rules (no reconstruction from memory)
- Confidence threshold gating (default: 0.30)
- Grade inference for Hadith authenticity (collection-based)

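Confidence gating is the easiest of these measures to show in code. In the sketch below the LLM is only called when the best retrieval score reaches the threshold. The function name `answer_with_gate`, the result shape, and the `call_llm` callback are hypothetical; the 0.30 default is the value stated above.

```python
CONFIDENCE_THRESHOLD = 0.30  # default stated in the list above
NOT_FOUND = "Not in available dataset"


def answer_with_gate(results, call_llm, threshold=CONFIDENCE_THRESHOLD):
    """Call the LLM only when the best retrieval score clears the threshold."""
    top_score = max((r["score"] for r in results), default=0.0)
    if top_score < threshold:
        # Below the gate: return a fixed refusal and never invoke the model.
        return {"answer": NOT_FOUND, "llm_called": False, "top_score": top_score}
    return {"answer": call_llm(results), "llm_called": True, "top_score": top_score}
```

Because the refusal path returns before `call_llm` is touched, a low-scoring query cannot produce generated text at all.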
---

## Configuration

**`.env` variables**:
```
OLLAMA_HOST            # Ollama server URL
LLM_MODEL              # Primary model (e.g. minimax-m2.7:cloud)
LLM_BACKEND            # "ollama" (default) or "huggingface"
EMBED_MODEL            # Embedding model (intfloat/multilingual-e5-large)
FAISS_INDEX            # Path to QModel.index
METADATA_FILE          # Path to metadata.json
CONFIDENCE_THRESHOLD   # Min hybrid score for LLM call (default: 0.30)
HADITH_BOOST           # Intent-aware boost for Hadith (default: 0.08)
TOP_K_SEARCH           # Retrieval candidate pool (default: 20)
TOP_K_RETURN           # Results returned to user (default: 5)
TEMPERATURE            # LLM creativity (default: 0.2 for factual)
```

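One way to read these variables into a typed object is sketched below. It covers only the variables whose defaults are stated above and reads `os.environ` directly; the field names and the `_env` helper are hypothetical, and the real `Config` is described as also loading `.env` via dotenv, which this sketch leaves out.

```python
import os
from dataclasses import dataclass, field


def _env(name, default, cast=str):
    """Read an environment variable, falling back to `default` when unset."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)


@dataclass(frozen=True)
class Config:
    # Defaults mirror the values documented in the variable list above.
    llm_backend: str = field(default_factory=lambda: _env("LLM_BACKEND", "ollama"))
    confidence_threshold: float = field(
        default_factory=lambda: _env("CONFIDENCE_THRESHOLD", 0.30, float))
    hadith_boost: float = field(
        default_factory=lambda: _env("HADITH_BOOST", 0.08, float))
    top_k_search: int = field(default_factory=lambda: _env("TOP_K_SEARCH", 20, int))
    top_k_return: int = field(default_factory=lambda: _env("TOP_K_RETURN", 5, int))
    temperature: float = field(default_factory=lambda: _env("TEMPERATURE", 0.2, float))
```

Using `default_factory` means the environment is read when `Config()` is constructed, not when the module is imported, which makes the values easy to override in tests.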
---

## Deployment

### Local Development
```bash
python main.py
# API at http://localhost:8000
# Docs at http://localhost:8000/docs
```

### Docker
```bash
docker-compose up
# Ollama on port 11434
# QModel on port 8000
```

---

## Testing Examples

### 1. Quran Verse Lookup (Capability 1)
```bash
curl "http://localhost:8000/quran/search?q=bismillah"
```

### 2. Quran Topic Search (Capability 2)
```bash
curl "http://localhost:8000/quran/topic?q=patience&top_k=5"
```

### 3. Word Frequency (Capability 3)
```bash
curl "http://localhost:8000/quran/word-frequency?word=mercy"
# → Returns: count per surah + total + examples
```

### 4. Quran Analytics (Capability 3)
```bash
curl "http://localhost:8000/quran/analytics"
curl "http://localhost:8000/quran/chapter/2"
```

### 5. Hadith Text Search (Capability 4)
```bash
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"
```

### 6. Hadith Topic Search (Capability 5)
```bash
curl "http://localhost:8000/hadith/topic?q=fasting&grade=sahih"
```

### 7. Hadith Authenticity Verification (Capability 6)
```bash
curl "http://localhost:8000/hadith/verify?q=Actions+are+judged+by+intentions"
# → Returns: found=true, grade="Sahih", source="Sahih al-Bukhari 1"
```

### 8. Confidence Gate in Action (Safety)
```
Q: "Who was Muhammad's 7th wife?" (not in dataset)
→ Retrieval score: 0.15 (below 0.30 threshold)
→ Returns: "Not in available dataset"
→ LLM not called (prevents hallucination)
```

### 9. OpenAI-Compatible Chat (Streaming)
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qmodel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
```

---

## Roadmap: v6+ Enhancements

- [x] Grade-based filtering: `?grade=sahih` to return only authenticated Hadiths
- [x] Streaming responses: SSE for long-form answers
- [x] Modular architecture: Separate routers, models, and services
- [x] Dual LLM backend: Ollama + HuggingFace support
- [x] Text search: Exact substring + fuzzy word-overlap matching
- [x] Expanded endpoints: 16 endpoints across 4 routers
- [ ] Chain of narrators: Display Isnad with full narrator details
- [ ] Synonym expansion: Better topic matching (e.g., "mercy" → "rahma, compassion")
- [ ] Multi-Surah topics: Topics spanning multiple Surahs
- [ ] Batch processing: Handle multiple questions in one request
- [ ] Islamic calendar integration: Hijri date references
- [ ] Tafsir integration: Dedicated Tafsir endpoint with scholar citations
DOCKER.md
DELETED
@@ -1,443 +0,0 @@
# QModel Docker Guide

Complete guide for running QModel in Docker with both backend options.

## Quick Start

### Option 1: Docker Compose (Recommended)

```bash
# 1. Copy example config
cp .env.example .env

# 2. Edit .env and choose your backend (see below)
nano .env

# 3. Run with compose
docker-compose up
```

API available at: `http://localhost:8000`

### Option 2: Docker CLI

```bash
# Build image
docker build -t qmodel .

# Run with Ollama backend
docker run -p 8000:8000 \
  --env-file .env \
  --add-host host.docker.internal:host-gateway \
  qmodel

# Or run with HuggingFace backend
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```

---

## Backend Configuration

Configure which backend to use via `.env` file:

### Backend 1: Ollama (Local)

**Best for**: Development, testing, Docker Desktop

```bash
# .env
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
```

**Prerequisites**:
- Ollama installed on host machine
- Running: `ollama serve`
- Model pulled: `ollama pull llama2`

**Why**:
- ✅ Fast setup
- ✅ No GPU required
- ✅ Works on Docker Desktop (Mac/Windows)
- ❌ Requires host Ollama service

### Backend 2: HuggingFace (Remote)

**Best for**: Production, GPU servers, containerized environments

```bash
# .env
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto
```

**Prerequisites**:
- GPU (recommended) OR significant RAM
- HuggingFace token (for gated models)

**Passing HF Token**:
```bash
# Via docker-compose
export HF_TOKEN=your_token_here
docker-compose up

# Via docker run
docker run -p 8000:8000 \
  --env-file .env \
  --env HF_TOKEN=your_token_here \
  qmodel
```

---

## Docker Compose Configuration

The `docker-compose.yml` includes:

| Setting | Value | Description |
|---------|-------|-------------|
| **Image** | Builds from `Dockerfile` | Python 3.11 + dependencies |
| **Port** | `8000:8000` | API port mapping |
| **Env File** | `.env` | Configuration source |
| **HF Token** | From `.env` or `${HF_TOKEN}` | For HuggingFace auth |
| **Ollama Host** | `host.docker.internal:11434` | Connect to host Ollama |
| **Volumes** | `.:/app` | Code changes sync (dev mode) |
| **HF Cache** | `/root/.cache/huggingface` | Persistent model cache |
| **Networks** | `qmodel-network` | Internal network |
| **Health Check** | `/health` endpoint | Auto-restart on failure |

### For Production

Modify `docker-compose.yml`:
```yaml
services:
  qmodel:
    # ... (same as above)
    volumes:
      # Remove live code volume
      - huggingface_cache:/root/.cache/huggingface
    restart: on-failure:5
```

---

## Examples

### Development with Ollama

```bash
# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Run QModel
cat > .env << EOF
LLM_BACKEND=ollama
OLLAMA_HOST=http://host.docker.internal:11434
OLLAMA_MODEL=llama2
TEMPERATURE=0.2
CONFIDENCE_THRESHOLD=0.30
EOF

docker-compose up
```

Access: `http://localhost:8000`

### Production with HuggingFace

```bash
# Create .env for production
cat > .env << EOF
LLM_BACKEND=hf
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
HF_DEVICE=auto
TEMPERATURE=0.1
CONFIDENCE_THRESHOLD=0.35
ALLOWED_ORIGINS=yourdomain.com
EOF

# Export HF token
export HF_TOKEN=hf_xxxxxxxxxxxxx

# Run
docker-compose up -d
docker-compose logs -f
```

### Detached Mode

```bash
# Run in background
docker-compose up -d

# View logs
docker-compose logs -f

# Check status
docker-compose ps

# Stop
docker-compose down
```

---

## Troubleshooting

### "Cannot connect to Ollama"

**Symptom**: `ConnectionRefusedError` when using Ollama backend

**Solution**:
```bash
# Ensure Ollama is running on host
ollama serve

# Verify in Docker container
docker run --add-host host.docker.internal:host-gateway qmodel \
  python -c "import requests; print(requests.get('http://host.docker.internal:11434/api/tags').json())"
```

### "HuggingFace model not found"

**Symptom**: `OSError: ... not found`

**Solution**:
```bash
# Check HF token is set
echo $HF_TOKEN

# If not set, export it
export HF_TOKEN=hf_xxxxxxxxxxxxx
docker-compose up
```

### "Out of memory"

**Symptom**: Container exits with no error message

**Solution**:
- Use smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
- Use Ollama with `neural-chat` model
- Increase Docker memory limits:

```yaml
# Edit docker-compose.yml
services:
  qmodel:
    deploy:
      resources:
        limits:
          memory: 16G
```

### "Port already in use"

**Symptom**: `Address already in use`

**Solution**:
```bash
# Change port in docker-compose.yml
ports:
  - "8001:8000"

# Or stop the existing container
docker-compose down
docker system prune
```

---

## Building Custom Images

### Build for Specific Backend

No code changes needed - just use `.env` to configure.

### Build with Custom Requirements

```bash
# Edit requirements.txt, then rebuild
docker build -t qmodel:custom .
```

### Push to Registry

```bash
# Tag for registry
docker tag qmodel myregistry/qmodel:v6.1

# Push
docker push myregistry/qmodel:v6.1

# Run from registry
docker run -p 8000:8000 \
  --env-file .env \
  myregistry/qmodel:v6.1
```

---

## Performance Tips

### Docker Compose with GPU (Linux)

```yaml
services:
  qmodel:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Then set in `.env`:
```bash
HF_DEVICE=cuda
```

### Reduce Memory Usage

```bash
# In .env
HF_MODEL_NAME=gpt2       # Tiny model
OLLAMA_MODEL=orca-mini   # Smaller Ollama model
TOP_K_SEARCH=10          # Fewer candidates
```

### Cache Management

```bash
# Clear HuggingFace cache
docker-compose down
docker volume rm qmodel_huggingface_cache

# Or cleanup all
docker system prune -a
```

---

## Docker Networking

### Access QModel from Host

```bash
# Default (works)
curl http://localhost:8000/health
```

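When a script needs to wait for the container to come up, polling `/health` is more reliable than a fixed sleep. The following is a small sketch using only the Python standard library. The helper names `http_ok` and `wait_until_ready` are hypothetical, and it assumes `/health` answers with HTTP 200 once the service is ready.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe()` until it returns True or `attempts` run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

A typical call would be `wait_until_ready(lambda: http_ok("http://localhost:8000/health"))`, which returns `False` after roughly 30 seconds if the API never becomes healthy.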
### Custom Network

```bash
# Create network
docker network create qmodel-net

# Run with network
docker-compose -f docker-compose.yml up
```

### Multiple Containers

```yaml
# docker-compose.yml
services:
  qmodel:
    networks:
      - custom-network
  other-service:
    networks:
      - custom-network

networks:
  custom-network:
    driver: bridge
```

---

| 368 |
-
## CI/CD Integration
|
| 369 |
-
|
| 370 |
-
### GitHub Actions Example
|
| 371 |
-
|
| 372 |
-
```yaml
name: Deploy QModel

on: [push]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t qmodel .

      - name: Run tests
        run: |
          # -d runs the container in the background; -p publishes the port
          docker run -d -p 8000:8000 qmodel
          sleep 30
          # --fail makes the step fail on an HTTP error status
          curl --fail http://localhost:8000/health

      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin
          docker tag qmodel myregistry/qmodel:${{ github.sha }}
          docker push myregistry/qmodel:${{ github.sha }}
```
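The fixed `sleep 30` in the workflow can be too short on a cold runner and wasteful on a warm one. A minimal sketch of polling the health endpoint instead, assuming `/health` answers with HTTP 200 once the service is ready; `http_probe` and `wait_for_health` are hypothetical helper names, not part of QModel:

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_for_health(url, attempts=30, delay=1.0, probe=http_probe, sleep=time.sleep):
    """Poll `url` until the probe succeeds; return the attempt number, or 0 on failure."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0
```

A CI step could call `wait_for_health("http://localhost:8000/health")` and fail the job when it returns `0`. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.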
|
| 398 |
-
|
| 399 |
-
---
|
| 400 |
-
|
| 401 |
-
## Security Considerations
|
| 402 |
-
|
| 403 |
-
### Secrets Management
|
| 404 |
-
|
| 405 |
-
```bash
|
| 406 |
-
# Don't commit .env with real tokens
|
| 407 |
-
echo ".env" >> .gitignore
|
| 408 |
-
|
| 409 |
-
# Use Docker secrets (Swarm mode); the trailing "-" reads the secret value from stdin
|
| 410 |
-
docker secret create hf_token -
|
| 411 |
-
# Then use in compose:
|
| 412 |
-
# HF_TOKEN=${HF_TOKEN_FILE}
|
| 413 |
-
```
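Docker mounts each secret as a file under `/run/secrets/<name>` inside the container, so the application has to read the token from that file rather than from the variable's value. A minimal sketch, assuming `HF_TOKEN_FILE` holds a file path and `HF_TOKEN` holds a literal token; how QModel itself resolves these variables is not shown here, and `load_hf_token` is a hypothetical helper:

```python
import os
from pathlib import Path


def load_hf_token(env=None):
    """Resolve the token from HF_TOKEN_FILE (a file path) or HF_TOKEN (a literal value)."""
    env = os.environ if env is None else env
    token_file = env.get("HF_TOKEN_FILE")
    if token_file:
        path = Path(token_file)
        if path.is_file():
            # Secret files often end with a newline; strip it.
            return path.read_text(encoding="utf-8").strip()
    # Fall back to the plain environment variable, or None if unset/empty.
    return env.get("HF_TOKEN") or None
```

With a secret named `hf_token`, setting `HF_TOKEN_FILE=/run/secrets/hf_token` would make this helper return the secret's contents.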
|
| 414 |
-
|
| 415 |
-
### CORS Configuration
|
| 416 |
-
|
| 417 |
-
```bash
|
| 418 |
-
# In .env (restrict in production)
|
| 419 |
-
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
|
| 420 |
-
```
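The value is a comma-separated list, so the server has to split it before handing it to any CORS layer. A minimal sketch of that parsing step; `parse_allowed_origins` is a hypothetical name and QModel's actual parsing may differ:

```python
def parse_allowed_origins(raw):
    """Split a comma-separated origin list, dropping blanks, trailing slashes and duplicates."""
    origins = []
    for item in (raw or "").split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

Browsers send the `Origin` header with a scheme, so entries such as `https://yourdomain.com` are what a CORS check normally compares against.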
|
| 421 |
-
|
| 422 |
-
### Network Isolation
|
| 423 |
-
|
| 424 |
-
```yaml
# docker-compose.yml
services:
  qmodel:
    networks:
      - internal

networks:
  internal:
    internal: true
```
|
| 435 |
-
|
| 436 |
-
---
|
| 437 |
-
|
| 438 |
-
## Reference
|
| 439 |
-
|
| 440 |
-
- **Dockerfile**: Multi-stage build, health checks, proper layer caching
|
| 441 |
-
- **docker-compose.yml**: Service definition, volumes, networking, health checks
|
| 442 |
-
- **Environment**: Fully configurable via `.env`
|
| 443 |
-
- **Backends**: Ollama (local) or HuggingFace (remote) via `LLM_BACKEND` variable
OPEN_WEBUI.md
DELETED
|
@@ -1,385 +0,0 @@
|
|
| 1 |
-
# Using QModel v6 with Open-WebUI
|
| 2 |
-
|
| 3 |
-
QModel v6 is fully compatible with **Open-WebUI** thanks to its OpenAI-compatible API endpoints. This guide shows you how to integrate them.
|
| 4 |
-
|
| 5 |
-
## Prerequisites
|
| 6 |
-
|
| 7 |
-
1. **QModel running** on your local machine or server
|
| 8 |
-
```bash
|
| 9 |
-
python main.py
|
| 10 |
-
# Runs on http://localhost:8000
|
| 11 |
-
```
|
| 12 |
-
|
| 13 |
-
2. **Open-WebUI installed** (Docker recommended)
|
| 14 |
-
```bash
|
| 15 |
-
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
|
| 16 |
-
# Runs on http://localhost:3000
|
| 17 |
-
```
|
| 18 |
-
|
| 19 |
-
---
|
| 20 |
-
|
| 21 |
-
## Integration Steps
|
| 22 |
-
|
| 23 |
-
### Step 1: Add QModel as a Custom OpenAI-Compatible Model
|
| 24 |
-
|
| 25 |
-
In Open-WebUI:
|
| 26 |
-
|
| 27 |
-
1. **Settings** → **Models** → **Manage Models**
|
| 28 |
-
2. Click **"Connect to OpenAI-compatible API"**
|
| 29 |
-
3. Enter:
|
| 30 |
-
- **API Base URL**: `http://localhost:8000/v1`
|
| 31 |
-
- **Model Name**: `QModel` (or `qmodel`)
|
| 32 |
-
- **API Key**: Leave blank (no auth required)
|
| 33 |
-
|
| 34 |
-
4. Click **"Save & Test"**
|
| 35 |
-
5. You should see: ✅ **Model connected successfully**
|
| 36 |
-
|
| 37 |
-
### Step 2: Start Using QModel
|
| 38 |
-
|
| 39 |
-
1. Open a **New Chat** in Open-WebUI
|
| 40 |
-
2. Select **QModel** from the model dropdown
|
| 41 |
-
3. Type your Islamic question:
|
| 42 |
-
```
|
| 43 |
-
What does the Quran say about mercy?
|
| 44 |
-
```
|
| 45 |
-
|
| 46 |
-
4. Press Enter and get an Islamic-grounded RAG response with sources!
|
| 47 |
-
|
| 48 |
-
---
|
| 49 |
-
|
| 50 |
-
## API Endpoints (OpenAI-Compatible)
|
| 51 |
-
|
| 52 |
-
### POST `/v1/chat/completions`
|
| 53 |
-
Standard OpenAI chat completions endpoint.
|
| 54 |
-
|
| 55 |
-
**Request:**
|
| 56 |
-
```json
{
  "model": "QModel",
  "messages": [
    {"role": "user", "content": "What does Islam say about patience?"}
  ],
  "temperature": 0.2,
  "max_tokens": 2048,
  "top_k": 5,
  "stream": false
}
```
|
| 68 |
-
|
| 69 |
-
**Response:**
|
| 70 |
-
```json
{
  "id": "qmodel-1234567890",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "QModel",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Islam emphasizes patience as a core virtue..."
      },
      "finish_reason": "stop"
    }
  ],
  "x_metadata": {
    "language": "english",
    "intent": "general",
    "top_score": 0.876,
    "latency_ms": 342,
    "sources": [
      {
        "source": "Surah Al-Imran 3:200",
        "type": "quran",
        "grade": null,
        "score": 0.876
      }
    ]
  }
}
```
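Source attribution lives outside the standard OpenAI fields, under `x_metadata.sources`, so a client has to read it explicitly. A minimal sketch that turns the response shown above into printable citation lines; `summarize_sources` is a hypothetical helper, and the field names are taken from the example response only:

```python
def summarize_sources(response):
    """Return one 'source (type[, grade], score=...)' string per cited source."""
    metadata = response.get("x_metadata") or {}
    lines = []
    for src in metadata.get("sources", []):
        label = src.get("type", "unknown")
        grade = src.get("grade")
        if grade:
            # Hadith entries may carry an authenticity grade; Quran entries have null.
            label = f"{label}, {grade}"
        score = src.get("score", 0.0)
        lines.append(f"{src.get('source', '?')} ({label}, score={score:.3f})")
    return lines
```

Responses without `x_metadata` yield an empty list, so the helper is safe to call on output from other OpenAI-compatible backends.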
|
| 102 |
-
|
| 103 |
-
### GET `/v1/models`
|
| 104 |
-
List available models.
|
| 105 |
-
|
| 106 |
-
**Response:**
|
| 107 |
-
```json
{
  "object": "list",
  "data": [
    {
      "id": "QModel",
      "object": "model",
      "created": 1234567890,
      "owned_by": "elgendy"
    }
  ]
}
```
|
| 120 |
-
|
| 121 |
-
---
|
| 122 |
-
|
| 123 |
-
## Advanced Query Parameters (Open-WebUI Compatible)
|
| 124 |
-
|
| 125 |
-
When using Open-WebUI, you can include special parameters:
|
| 126 |
-
|
| 127 |
-
### Islamic-Specific Parameters
|
| 128 |
-
|
| 129 |
-
**URL Query String:**
|
| 130 |
-
```
|
| 131 |
-
/v1/chat/completions?source_type=hadith&grade_filter=sahih&top_k=5
|
| 132 |
-
```
|
| 133 |
-
|
| 134 |
-
**Supported Parameters:**
|
| 135 |
-
- `source_type`: `quran` | `hadith` | (both, default)
|
| 136 |
-
- `grade_filter`: `sahih` | `hasan` | (all, default)
|
| 137 |
-
- `top_k`: 1-20 (number of sources to retrieve)
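The parameters above can be validated on the client before a request is sent. A minimal sketch that builds the query string from the documented values; `build_chat_url` and the two constant sets are hypothetical names introduced for illustration:

```python
from urllib.parse import urlencode

VALID_SOURCE_TYPES = {"quran", "hadith"}
VALID_GRADES = {"sahih", "hasan"}


def build_chat_url(base_url, source_type=None, grade_filter=None, top_k=None):
    """Build the /v1/chat/completions URL, rejecting values outside the documented ranges."""
    params = {}
    if source_type is not None:
        if source_type not in VALID_SOURCE_TYPES:
            raise ValueError(f"source_type must be one of {sorted(VALID_SOURCE_TYPES)}")
        params["source_type"] = source_type
    if grade_filter is not None:
        if grade_filter not in VALID_GRADES:
            raise ValueError(f"grade_filter must be one of {sorted(VALID_GRADES)}")
        params["grade_filter"] = grade_filter
    if top_k is not None:
        if not 1 <= top_k <= 20:
            raise ValueError("top_k must be between 1 and 20")
        params["top_k"] = top_k
    url = base_url.rstrip("/") + "/v1/chat/completions"
    # Omitted parameters fall back to the server defaults (both sources, all grades).
    return f"{url}?{urlencode(params)}" if params else url
```

Leaving a parameter as `None` keeps it out of the URL entirely, which matches the "default" column in the list above.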
|
| 138 |
-
|
| 139 |
-
### Example Requests via curl
|
| 140 |
-
|
| 141 |
-
```bash
# 1. Basic query (both Quran + Hadith)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "QModel",
    "messages": [{"role": "user", "content": "What does Islam say about mercy?"}]
  }'

# 2. Quran-only query (quote the URL so the shell does not interpret "?")
curl -X POST "http://localhost:8000/v1/chat/completions?source_type=quran" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "QModel",
    "messages": [{"role": "user", "content": "What does the Quran say about patience?"}]
  }'

# 3. Authentic hadiths only (Sahih grade); the quotes keep "&" from backgrounding the command
curl -X POST "http://localhost:8000/v1/chat/completions?source_type=hadith&grade_filter=sahih" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "QModel",
    "messages": [{"role": "user", "content": "Hadiths about prayer"}]
  }'

# 4. Streaming response
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "QModel",
    "messages": [{"role": "user", "content": "Tell me about Zakat"}],
    "stream": true
  }'
```
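With `"stream": true` the answer arrives as server-sent events rather than one JSON document. A minimal sketch of reassembling the text, assuming QModel follows the usual OpenAI chunk layout (`data: {...}` lines carrying `choices[].delta.content`, ended by `data: [DONE]`); that layout is an assumption here and `collect_stream_text` is a hypothetical helper:

```python
import json


def collect_stream_text(lines):
    """Concatenate the delta content from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

The function takes any iterable of text lines, so it can be fed from a decoded HTTP response body or from a saved transcript of the stream.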
|
| 175 |
-
|
| 176 |
-
---
|
| 177 |
-
|
| 178 |
-
## Open-WebUI Features Supported
|
| 179 |
-
|
| 180 |
-
| Feature | Status | Notes |
|---------|--------|-------|
| **Chat** | ✅ Full support | Normal Q&A |
| **Streaming** | ✅ Supported | Set `stream: true` in request |
| **Context** | ✅ Multi-turn | Open-WebUI handles conversation history |
| **Temperature** | ✅ Configurable | Via Open-WebUI settings |
| **Token Limits** | ✅ Supported | Via `max_tokens` parameter |
| **Model List** | ✅ Available | Via `/v1/models` endpoint |
| **Source Attribution** | ✅ In metadata | Via `x_metadata.sources` |
|
| 189 |
-
|
| 190 |
-
---
|
| 191 |
-
|
| 192 |
-
## Custom System Prompts in Open-WebUI
|
| 193 |
-
|
| 194 |
-
To customize QModel for specific Islamic tasks, create a custom chatbot in Open-WebUI:
|
| 195 |
-
|
| 196 |
-
1. **Home** → **+ New Chatbot**
|
| 197 |
-
2. Configure:
|
| 198 |
-
- **Name**: "Islamic Scholar" (or your choice)
|
| 199 |
-
- **Model**: QModel
|
| 200 |
-
- **System Prompt**:
|
| 201 |
-
```
|
| 202 |
-
You are an expert Islamic scholar specializing in Qur'an and Hadith.
|
| 203 |
-
Always cite sources exactly as provided.
|
| 204 |
-
Only answer from the provided Islamic context; never use outside knowledge.
|
| 205 |
-
If information is not in the dataset, say so clearly.
|
| 206 |
-
```
|
| 207 |
-
- **Top K Sources**: 5
|
| 208 |
-
- **Temperature**: 0.1 (for consistency)
|
| 209 |
-
|
| 210 |
-
3. **Save** and start chatting!
|
| 211 |
-
|
| 212 |
-
---
|
| 213 |
-
|
| 214 |
-
## Troubleshooting
|
| 215 |
-
|
| 216 |
-
### Issue: "Failed to connect to QModel"
|
| 217 |
-
|
| 218 |
-
**Solutions:**
|
| 219 |
-
1. Check QModel is running: `curl http://localhost:8000/health`
|
| 220 |
-
2. Verify API Base URL is correct: `http://localhost:8000/v1`
|
| 221 |
-
3. Check firewall: Port 8000 must be accessible
|
| 222 |
-
4. Check logs: `python main.py` to see startup messages
|
| 223 |
-
|
| 224 |
-
### Issue: "No sources in response"
|
| 225 |
-
|
| 226 |
-
**Solutions:**
|
| 227 |
-
1. Check `/debug/scores` endpoint directly:
|
| 228 |
-
```bash
|
| 229 |
-
curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
|
| 230 |
-
```
|
| 231 |
-
2. Adjust `CONFIDENCE_THRESHOLD` in `.env` if retrievals are low-quality
|
| 232 |
-
3. Try synonyms: "mercy" instead of "compassion"
|
| 233 |
-
|
| 234 |
-
### Issue: "Assistant returns 'Not found'"
|
| 235 |
-
|
| 236 |
-
**This is expected behavior!** QModel has safety checks:
|
| 237 |
-
1. If retrieval score is too low (< 0.30), it returns "not found"
|
| 238 |
-
2. This prevents hallucinations
|
| 239 |
-
3. Try more specific queries or adjust `CONFIDENCE_THRESHOLD`
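The safety check described above amounts to comparing the best retrieval score against a cutoff before any answer is generated. A minimal sketch of that gate, assuming each retrieved item carries a numeric `score`; `answer_or_not_found` and `NOT_FOUND` are hypothetical names, and QModel's real implementation may differ:

```python
NOT_FOUND = "not found"


def answer_or_not_found(results, threshold=0.30):
    """Return results ranked by score, or NOT_FOUND when the best score is below the threshold."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    if not ranked or ranked[0]["score"] < threshold:
        # Nothing retrieved is trustworthy enough to ground an answer.
        return NOT_FOUND
    return ranked
```

Raising `threshold` (the role `CONFIDENCE_THRESHOLD` plays in `.env`) trades more "not found" replies for fewer weakly grounded answers.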
|
| 240 |
-
|
| 241 |
-
---
|
| 242 |
-
|
| 243 |
-
## Configuration for Open-WebUI
|
| 244 |
-
|
| 245 |
-
### Recommended Settings
|
| 246 |
-
|
| 247 |
-
For best results with Open-WebUI:
|
| 248 |
-
|
| 249 |
-
```env
|
| 250 |
-
# More conservative (fewer hallucinations)
|
| 251 |
-
CONFIDENCE_THRESHOLD=0.40
|
| 252 |
-
TEMPERATURE=0.1
|
| 253 |
-
HADITH_BOOST=0.08
|
| 254 |
-
|
| 255 |
-
# More liberal (more answers, higher hallucination risk)
|
| 256 |
-
CONFIDENCE_THRESHOLD=0.20
|
| 257 |
-
TEMPERATURE=0.3
|
| 258 |
-
HADITH_BOOST=0.05
|
| 259 |
-
```
|
| 260 |
-
|
| 261 |
-
### Docker Compose Integration
|
| 262 |
-
|
| 263 |
-
To run both QModel and Open-WebUI together:
|
| 264 |
-
|
| 265 |
-
```yaml
version: '3.8'
services:
  qmodel:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LLM_BACKEND=ollama
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"

  web-ui:
    image: ghcr.io/open-webui/open-webui:latest
    ports:
      - "3000:8080"
    depends_on:
      - qmodel
```
|
| 290 |
-
|
| 291 |
-
Run: `docker-compose up`
|
| 292 |
-
|
| 293 |
-
---
|
| 294 |
-
|
| 295 |
-
## Using QModel in Open-WebUI Workflows
|
| 296 |
-
|
| 297 |
-
### Example 1: Islamic Q&A Chatbot
|
| 298 |
-
|
| 299 |
-
1. Create chatbot with system prompt about Islamic knowledge
|
| 300 |
-
2. Select QModel as backend
|
| 301 |
-
3. Set temperature to 0.1 for consistency
|
| 302 |
-
4. Enable web search toggle (optional, for cross-verification)
|
| 303 |
-
|
| 304 |
-
### Example 2: Hadith Research Tool
|
| 305 |
-
|
| 306 |
-
1. Create chatbot: "Hadith Researcher"
|
| 307 |
-
2. System prompt:
|
| 308 |
-
```
|
| 309 |
-
You are a Hadith researcher. For each query:
|
| 310 |
-
1. Search authenticated Hadiths only (Sahih grade)
|
| 311 |
-
2. Display the full text with authenticity grade
|
| 312 |
-
3. Explain the Hadith's significance
|
| 313 |
-
4. Always cite the collection and number
|
| 314 |
-
```
|
| 315 |
-
3. Enable grade filtering: `grade_filter=sahih`
|
| 316 |
-
|
| 317 |
-
### Example 3: Qur'anic Study Assistant
|
| 318 |
-
|
| 319 |
-
1. Create chatbot: "Qur'an Tafsir"
|
| 320 |
-
2. Set `source_type=quran` in parameters
|
| 321 |
-
3. System prompt focusing on Qur'anic interpretation
|
| 322 |
-
4. Enable multi-turn for deeper exploration
|
| 323 |
-
|
| 324 |
-
---
|
| 325 |
-
|
| 326 |
-
## API Testing
|
| 327 |
-
|
| 328 |
-
### Test with Open-WebUI's Developer Tools
|
| 329 |
-
|
| 330 |
-
1. Open Open-WebUI console (F12)
|
| 331 |
-
2. Go to **Network** tab
|
| 332 |
-
3. Send a message to QModel
|
| 333 |
-
4. Inspect the request/response to `/v1/chat/completions`
|
| 334 |
-
|
| 335 |
-
### Test with cURL
|
| 336 |
-
|
| 337 |
-
```bash
|
| 338 |
-
# 1. Health check
|
| 339 |
-
curl http://localhost:8000/health | jq
|
| 340 |
-
|
| 341 |
-
# 2. List models
|
| 342 |
-
curl http://localhost:8000/v1/models | jq
|
| 343 |
-
|
| 344 |
-
# 3. Simple chat
|
| 345 |
-
curl -X POST http://localhost:8000/v1/chat/completions \
|
| 346 |
-
-H "Content-Type: application/json" \
|
| 347 |
-
-d '{"model":"QModel","messages":[{"role":"user","content":"Assalam alaikum"}]}' | jq
|
| 348 |
-
```
|
| 349 |
-
|
| 350 |
-
---
|
| 351 |
-
|
| 352 |
-
## Performance Tips
|
| 353 |
-
|
| 354 |
-
### For Optimal Open-WebUI Experience
|
| 355 |
-
|
| 356 |
-
1. **Use Ollama locally** for responsive chat (400-800ms per query)
|
| 357 |
-
2. **Set `max_tokens=1024`** to avoid long waits
|
| 358 |
-
3. **Use temperature=0.1** for reliable, consistent answers
|
| 359 |
-
4. **Increase `CACHE_TTL`** for frequently asked questions
|
| 360 |
-
5. **Reduce `TOP_K_SEARCH`** if queries are slow (default 20)
|
| 361 |
-
|
| 362 |
-
---
|
| 363 |
-
|
| 364 |
-
## Security Notes
|
| 365 |
-
|
| 366 |
-
### For Production Deployments
|
| 367 |
-
|
| 368 |
-
1. **Restrict CORS**: Set `ALLOWED_ORIGINS=your-domain.com` in `.env`
|
| 369 |
-
2. **Use HTTPS**: Proxy through nginx with TLS
|
| 370 |
-
3. **Rate limit**: Add rate limiting middleware (not in v6, but recommended)
|
| 371 |
-
4. **Authentication**: Consider adding API key validation layer
|
| 372 |
-
5. **Network**: Don't expose QModel directly to the internet without auth
|
| 373 |
-
|
| 374 |
-
---
|
| 375 |
-
|
| 376 |
-
## Support
|
| 377 |
-
|
| 378 |
-
- Full setup guide: See `SETUP.md`
- Debugging: Use `/debug/scores` to inspect retrievals
- Questions about Open-WebUI: See https://docs.openwebui.com
- Islamic knowledge: See `ARCHITECTURE.md` for system details
|
| 382 |
-
|
| 383 |
-
---
|
| 384 |
-
|
| 385 |
-
**Happy chatting with QModel + Open-WebUI!**
README.md
CHANGED
|
@@ -64,33 +64,218 @@ language:
|
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
-
## Quick Start
|
| 68 |
|
| 69 |
```bash
|
| 70 |
-
#
|
| 71 |
-
git clone https://github.com/
|
| 72 |
python3 -m venv .venv && source .venv/bin/activate
|
| 73 |
pip install -r requirements.txt
|
| 74 |
|
| 75 |
-
#
|
| 76 |
-
#
|
| 77 |
export LLM_BACKEND=ollama
|
| 78 |
export OLLAMA_MODEL=llama2
|
| 79 |
# Make sure Ollama is running: ollama serve
|
| 80 |
|
| 81 |
-
#
|
| 82 |
export LLM_BACKEND=hf
|
| 83 |
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 84 |
|
| 85 |
-
#
|
| 86 |
python main.py
|
| 87 |
|
| 88 |
-
#
|
| 89 |
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
|
| 90 |
```
|
| 91 |
|
| 92 |
API docs: http://localhost:8000/docs
|
| 93 |
|
| 94 |
---
|
| 95 |
|
| 96 |
## Example Queries
|
|
@@ -105,134 +290,372 @@ curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"
|
|
| 105 |
# Authentic Hadiths only
|
| 106 |
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"
|
| 107 |
|
| 108 |
-
#
|
| 109 |
-
curl "http://localhost:8000/
|
| 110 |
-
```
|
| 111 |
|
| 112 |
-
|
| 113 |
|
| 114 |
-
#
|
| 115 |
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
|
|
|
|
|
|
| 122 |
|
| 123 |
---
|
| 124 |
|
| 125 |
-
##
|
|
|
|
|
|
|
| 126 |
|
| 127 |
### Backend Selection
|
| 128 |
-
- **Ollama**: Fast setup, no GPU, great for development, `LLM_BACKEND=ollama`
|
| 129 |
-
- **HuggingFace**: Production-grade, better quality, GPU recommended, `LLM_BACKEND=hf`
|
| 130 |
|
| 131 |
-
|
|
|
|
|
|
|
|
|
|
| 132 |
|
| 133 |
-
###
|
| 134 |
-
- **47,626 documents**: 6,236 Quranic verses + 41,390 hadiths from 9 canonical collections
|
| 135 |
-
- **Pre-built**: `metadata.json` and `QModel.index` included, ready to use
|
| 136 |
-
- **Dual-language**: Arabic and English support
|
| 137 |
|
| 138 |
-
|
| 139 |
|
| 140 |
-
|
| 141 |
|
| 142 |
-
|
| 143 |
|
| 144 |
```bash
|
| 145 |
-
|
| 146 |
-
|
|
|
|
|
|
|
|
|
|
| 147 |
|
| 148 |
-
#
|
| 149 |
-
|
| 150 |
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
```
|
| 155 |
|
| 156 |
-
|
| 157 |
|
| 158 |
---
|
| 159 |
|
| 160 |
-
##
|
| 161 |
|
| 162 |
-
###
|
| 163 |
```
|
| 164 |
-
|
| 165 |
```
|
| 166 |
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
|
| 173 |
-
###
|
| 174 |
-
- `GET /debug/scores?q=<question>&top_k=10`: Inspect raw retrieval scores
|
| 175 |
-
- `GET /hadith/verify?q=<hadith_text>`: Check hadith authenticity
|
| 176 |
-
- `POST /v1/chat/completions`: OpenAI-compatible endpoint
|
| 177 |
-
- `GET /health`: Health check
|
| 178 |
|
| 179 |
-
|
|
|
|
|
|
|
| 180 |
|
| 181 |
---
|
| 182 |
|
| 183 |
-
##
|
| 184 |
|
| 185 |
-
|
| 186 |
|
| 187 |
-
|
| 188 |
-
# Backend (required)
|
| 189 |
-
LLM_BACKEND=ollama # or: hf
|
| 190 |
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
|
| 195 |
-
#
|
| 196 |
-
|
| 197 |
-
|
| 198 |
|
| 199 |
-
#
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
```
|
| 204 |
|
| 205 |
-
|
| 206 |
|
| 207 |
---
|
| 208 |
|
| 209 |
-
##
|
| 210 |
|
| 211 |
| Operation | Time | Backend |
|
| 212 |
|-----------|------|---------|
|
| 213 |
| Query (cached) | ~50ms | Both |
|
| 214 |
-
| Query (Ollama) | 400
|
| 215 |
-
| Query (HF GPU) | 500
|
| 216 |
-
| Query (HF CPU) | 2
|
| 217 |
|
| 218 |
---
|
| 219 |
|
| 220 |
-
##
|
| 221 |
|
| 222 |
-
###
|
| 223 |
```bash
|
| 224 |
-
|
|
|
|
| 225 |
```
|
| 226 |
|
| 227 |
-
###
|
| 228 |
```bash
|
| 229 |
-
|
| 230 |
```
|
| 231 |
|
| 232 |
-
###
|
| 233 |
-
|
| 234 |
|
| 235 |
-
|
| 236 |
|
| 237 |
---
|
| 238 |
|
|
|
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
+
## Quick Start
|
| 68 |
+
|
| 69 |
+
### Prerequisites
|
| 70 |
+
- Python 3.10+
|
| 71 |
+
- 16 GB RAM minimum (for embeddings + LLM)
|
| 72 |
+
- GPU recommended for HuggingFace backend
|
| 73 |
+
- Ollama installed (for local development) OR internet access (for HuggingFace)
|
| 74 |
+
|
| 75 |
+
### Installation
|
| 76 |
|
| 77 |
```bash
|
| 78 |
+
# Clone and enter project
|
| 79 |
+
git clone https://github.com/Logicsoft/QModel.git && cd QModel
|
| 80 |
python3 -m venv .venv && source .venv/bin/activate
|
| 81 |
pip install -r requirements.txt
|
| 82 |
|
| 83 |
+
# Configure (choose one backend)
|
| 84 |
+
# Option A: Ollama (local development)
|
| 85 |
export LLM_BACKEND=ollama
|
| 86 |
export OLLAMA_MODEL=llama2
|
| 87 |
# Make sure Ollama is running: ollama serve
|
| 88 |
|
| 89 |
+
# Option B: HuggingFace (production)
|
| 90 |
export LLM_BACKEND=hf
|
| 91 |
export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 92 |
|
| 93 |
+
# Run
|
| 94 |
python main.py
|
| 95 |
|
| 96 |
+
# Query
|
| 97 |
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
|
| 98 |
```
|
| 99 |
|
| 100 |
API docs: http://localhost:8000/docs
|
| 101 |
|
| 102 |
+
### Data & Index
|
| 103 |
+
|
| 104 |
+
Pre-built data files are included:
|
| 105 |
+
- `metadata.json`: 47,626 documents (6,236 Quran verses + 41,390 hadiths from 9 canonical collections)
|
| 106 |
+
- `QModel.index`: FAISS search index
|
| 107 |
+
|
| 108 |
+
To rebuild after dataset changes:
|
| 109 |
+
```bash
|
| 110 |
+
python build_index.py
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
---
|
| 114 |
+
|
| 115 |
+
## API Reference (18 endpoints)
|
| 116 |
+
|
| 117 |
+
### Inference
|
| 118 |
+
|
| 119 |
+
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/ask?q=...&top_k=5&source_type=&grade_filter=` | GET | Direct RAG query with full source attribution |
| `/v1/chat/completions` | POST | OpenAI-compatible chat (SSE streaming supported) |
|
| 123 |
+
|
| 124 |
+
### Quran (`/quran/...`)
|
| 125 |
+
|
| 126 |
+
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/quran/search?q=...&limit=10` | GET | Text search: find verses by partial Arabic/English text |
| `/quran/topic?topic=...&top_k=10` | GET | Semantic search: find verses related to a topic |
| `/quran/word-frequency?word=...` | GET | Count word occurrences across all Surahs |
| `/quran/analytics` | GET | Overall Quran stats (total verses, Surahs, revelation types) |
| `/quran/chapter/{number}` | GET | All verses and metadata for a specific Surah |
| `/quran/verse/{surah}:{ayah}` | GET | Exact verse lookup by reference (e.g. `/quran/verse/2:255`) |
|
| 134 |
+
|
| 135 |
+
### Hadith (`/hadith/...`)
|
| 136 |
+
|
| 137 |
+
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/hadith/search?q=...&collection=&limit=10` | GET | Text search across collections |
| `/hadith/topic?topic=...&top_k=10&grade_filter=` | GET | Semantic search by topic with optional grade filter |
| `/hadith/verify?q=...&collection=` | GET | Authenticity verification (text + semantic search) |
| `/hadith/collection/{name}?limit=20&offset=0` | GET | Browse a specific collection |
| `/hadith/analytics` | GET | Collection-level statistics |
|
| 144 |
+
|
| 145 |
+
### Operations
|
| 146 |
+
|
| 147 |
+
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Readiness check |
| `/v1/models` | GET | OpenAI-compatible model listing |
| `/debug/scores?q=...&top_k=10&source_type=` | GET | Raw retrieval scores (no LLM call) |
|
| 152 |
+
|
| 153 |
+
---
|
| 154 |
+
|
| 155 |
+
### GET `/ask`: Main Query
|
| 156 |
+
|
| 157 |
+
```bash
|
| 158 |
+
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?&top_k=5"
|
| 159 |
+
```
|
| 160 |
+
|
| 161 |
+
**Parameters:**
|
| 162 |
+
| Parameter | Default | Description |
|-----------|---------|-------------|
| `q` | *(required)* | Your Islamic question |
| `top_k` | `5` | Number of sources to retrieve (1-20) |
| `source_type` | both | `quran` or `hadith` |
| `grade_filter` | all | `sahih` or `hasan` |
|
| 168 |
+
|
| 169 |
+
**Response:**
|
| 170 |
+
```json
{
  "question": "What does Islam say about mercy?",
  "answer": "Islam emphasizes mercy as a core value...",
  "language": "english",
  "intent": "general",
  "analysis": null,
  "sources": [
    {
      "source": "Surah Al-Baqarah 2:178",
      "type": "quran",
      "grade": null,
      "arabic": "...",
      "english": "...",
      "_score": 0.876
    }
  ],
  "top_score": 0.876,
  "latency_ms": 342
}
```
|
| 191 |
+
|
| 192 |
+
### POST `/v1/chat/completions`: OpenAI-Compatible
|
| 193 |
+
|
| 194 |
+
```bash
|
| 195 |
+
curl -X POST http://localhost:8000/v1/chat/completions \
|
| 196 |
+
-H "Content-Type: application/json" \
|
| 197 |
+
-d '{
|
| 198 |
+
"model": "QModel",
|
| 199 |
+
"messages": [{"role": "user", "content": "What does Islam say about patience?"}],
|
| 200 |
+
"temperature": 0.2,
|
| 201 |
+
"max_tokens": 2048,
|
| 202 |
+
"top_k": 5,
|
| 203 |
+
"stream": false
|
| 204 |
+
}'
|
| 205 |
+
```
|
| 206 |
+
|
| 207 |
+
**Response:**
|
| 208 |
+
```json
|
| 209 |
+
{
|
| 210 |
+
"id": "qmodel-1234567890",
|
| 211 |
+
"object": "chat.completion",
|
| 212 |
+
"created": 1234567890,
|
| 213 |
+
"model": "QModel",
|
| 214 |
+
"choices": [
|
| 215 |
+
{
|
| 216 |
+
"index": 0,
|
| 217 |
+
"message": { "role": "assistant", "content": "Islam emphasizes patience..." },
|
| 218 |
+
"finish_reason": "stop"
|
| 219 |
+
}
|
| 220 |
+
],
|
| 221 |
+
"x_metadata": {
|
| 222 |
+
"language": "english",
|
| 223 |
+
"intent": "general",
|
| 224 |
+
"top_score": 0.876,
|
| 225 |
+
"latency_ms": 342,
|
| 226 |
+
"sources": [{ "source": "Surah Al-Imran 3:200", "type": "quran", "score": 0.876 }]
|
| 227 |
+
}
|
| 228 |
+
}
|
| 229 |
+
```
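
A short Python sketch of the request body and of reading the non-standard `x_metadata` block. The function names are hypothetical; the field names follow the request and response shown above. Note that the sources inside `x_metadata` use `score`, not `_score`.

```python
def build_chat_payload(question, temperature=0.2, max_tokens=2048, top_k=5):
    """Build the JSON body for POST /v1/chat/completions (non-streaming)."""
    return {
        "model": "QModel",
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_k": top_k,
        "stream": False,
    }


def summarize(response):
    """Return (answer_text, [(source, score), ...]) from a completion response."""
    answer = response["choices"][0]["message"]["content"]
    # x_metadata is an extension field, so tolerate its absence
    sources = response.get("x_metadata", {}).get("sources", [])
    return answer, [(s["source"], s["score"]) for s in sources]
```

Clients that only understand the standard OpenAI schema can ignore `x_metadata`; the answer text is still in `choices[0].message.content`.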
|
| 230 |
+
|
| 231 |
+
### GET `/hadith/verify` - Authenticity Check
|
| 232 |
+
|
| 233 |
+
```bash
|
| 234 |
+
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
|
| 235 |
+
```
|
| 236 |
+
|
| 237 |
+
**Response:**
|
| 238 |
+
```json
|
| 239 |
+
{
|
| 240 |
+
"query": "Actions are judged by intentions",
|
| 241 |
+
"found": true,
|
| 242 |
+
"collection": "Sahih al-Bukhari",
|
| 243 |
+
"grade": "Sahih",
|
| 244 |
+
"reference": "Sahih al-Bukhari 1",
|
| 245 |
+
"arabic": "إنما الأعمال بالنيات",
|
| 246 |
+
"english": "Verily, actions are judged by intentions...",
|
| 247 |
+
"latency_ms": 156
|
| 248 |
+
}
|
| 249 |
+
```
|
| 250 |
+
|
| 251 |
+
### GET `/debug/scores` - Retrieval Inspection
|
| 252 |
+
|
| 253 |
+
```bash
|
| 254 |
+
curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
|
| 255 |
+
```
|
| 256 |
+
|
| 257 |
+
Use this to calibrate `CONFIDENCE_THRESHOLD`. If queries you expect to work have `_score < threshold`, lower the threshold.
|
| 258 |
+
|
| 259 |
+
**Response:**
|
| 260 |
+
```json
|
| 261 |
+
{
|
| 262 |
+
"query": "patience",
|
| 263 |
+
"intent": "general",
|
| 264 |
+
"threshold": 0.3,
|
| 265 |
+
"count": 10,
|
| 266 |
+
"results": [
|
| 267 |
+
{
|
| 268 |
+
"rank": 1,
|
| 269 |
+
"source": "Surah Al-Baqarah 2:45",
|
| 270 |
+
"type": "quran",
|
| 271 |
+
"_dense": 0.8234,
|
| 272 |
+
"_sparse": 0.5421,
|
| 273 |
+
"_score": 0.7234
|
| 274 |
+
}
|
| 275 |
+
]
|
| 276 |
+
}
|
| 277 |
+
```
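
One way to use this output for calibration is to collect `/debug/scores` responses for queries you expect to succeed and list the ones that would be rejected. The sketch below assumes the gate compares the best `_score` in `results` against the threshold; the function names are hypothetical.

```python
def best_score(debug_response):
    """Highest fused _score in a /debug/scores response (0.0 if empty)."""
    return max((r["_score"] for r in debug_response["results"]), default=0.0)


def rejected_queries(debug_responses, threshold):
    """Queries whose best score falls below the given threshold."""
    return [d["query"] for d in debug_responses if best_score(d) < threshold]
```

If queries you consider valid show up in `rejected_queries(...)`, the threshold is probably too high for your data.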
|
| 278 |
+
|
| 279 |
---
|
| 280 |
|
| 281 |
## Example Queries
|
|
|
|
| 290 |
# Authentic Hadiths only
|
| 291 |
curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"
|
| 292 |
|
| 293 |
+
# Quran text search
|
| 294 |
+
curl "http://localhost:8000/quran/search?q=bismillah"
|
|
|
|
| 295 |
|
| 296 |
+
# Quran topic search
|
| 297 |
+
curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"
|
| 298 |
+
|
| 299 |
+
# Quran word frequency
|
| 300 |
+
curl "http://localhost:8000/quran/word-frequency?word=mercy"
|
| 301 |
+
|
| 302 |
+
# Single chapter
|
| 303 |
+
curl "http://localhost:8000/quran/chapter/2"
|
| 304 |
|
| 305 |
+
# Exact verse
|
| 306 |
+
curl "http://localhost:8000/quran/verse/2:255"
|
| 307 |
+
|
| 308 |
+
# Hadith text search
|
| 309 |
+
curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"
|
| 310 |
+
|
| 311 |
+
# Hadith topic search (Sahih only)
|
| 312 |
+
curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"
|
| 313 |
+
|
| 314 |
+
# Verify Hadith authenticity
|
| 315 |
+
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
|
| 316 |
|
| 317 |
+
# Browse a collection
|
| 318 |
+
curl "http://localhost:8000/hadith/collection/bukhari?limit=5"
|
| 319 |
+
|
| 320 |
+
# Streaming (OpenAI-compatible)
|
| 321 |
+
curl -X POST http://localhost:8000/v1/chat/completions \
|
| 322 |
+
-H "Content-Type: application/json" \
|
| 323 |
+
-d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
|
| 324 |
+
```
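
For the streaming request, a client has to reassemble the answer from server-sent events. The sketch below assumes the stream follows the usual OpenAI chunk format (`data: {json}` lines carrying `choices[0].delta.content`, ended by `data: [DONE]`); this README only states that streaming is OpenAI-compatible, so treat the exact chunk shape as an assumption.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        text = chunk["choices"][0].get("delta", {}).get("content")
        if text:
            yield text
```

Joining the fragments with `"".join(iter_stream_text(lines))` gives the full answer text.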
|
| 325 |
|
| 326 |
---
|
| 327 |
|
| 328 |
+
## Configuration
|
| 329 |
+
|
| 330 |
+
All configuration via environment variables (`.env` file or exported directly):
|
| 331 |
|
| 332 |
### Backend Selection
|
|
|
|
|
|
|
| 333 |
|
| 334 |
+
| Backend | Pros | Cons | When to Use |
|
| 335 |
+
|---------|------|------|------------|
|
| 336 |
+
| **Ollama** | Fast setup, no GPU, free | Smaller models | Development, testing |
|
| 337 |
+
| **HuggingFace** | Larger models, better quality | Requires GPU or significant RAM | Production |
|
| 338 |
|
| 339 |
+
### Ollama Backend (Development)
|
|
|
|
|
|
|
|
|
|
| 340 |
|
| 341 |
+
```bash
|
| 342 |
+
LLM_BACKEND=ollama
|
| 343 |
+
OLLAMA_HOST=http://localhost:11434
|
| 344 |
+
OLLAMA_MODEL=llama2 # or: mistral, neural-chat, orca-mini
|
| 345 |
+
```
|
| 346 |
|
| 347 |
+
Requires: `ollama serve` running and model pulled (`ollama pull llama2`).
|
| 348 |
|
| 349 |
+
### HuggingFace Backend (Production)
|
| 350 |
|
| 351 |
```bash
|
| 352 |
+
LLM_BACKEND=hf
|
| 353 |
+
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 354 |
+
HF_DEVICE=auto # auto | cuda | cpu
|
| 355 |
+
HF_MAX_NEW_TOKENS=2048
|
| 356 |
+
```
|
| 357 |
|
| 358 |
+
### All Environment Variables
|
| 359 |
+
|
| 360 |
+
| Variable | Default | Description |
|
| 361 |
+
|----------|---------|-------------|
|
| 362 |
+
| **Backend** | | |
|
| 363 |
+
| `LLM_BACKEND` | `hf` | `ollama` or `hf` |
|
| 364 |
+
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
|
| 365 |
+
| `OLLAMA_MODEL` | `llama2` | Ollama model name |
|
| 366 |
+
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
|
| 367 |
+
| `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
|
| 368 |
+
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
|
| 369 |
+
| **Embedding & Data** | | |
|
| 370 |
+
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
|
| 371 |
+
| `FAISS_INDEX` | `QModel.index` | Index file path |
|
| 372 |
+
| `METADATA_FILE` | `metadata.json` | Dataset file |
|
| 373 |
+
| **Retrieval** | | |
|
| 374 |
+
| `TOP_K_SEARCH` | `20` | Candidate pool (5-100) |
|
| 375 |
+
| `TOP_K_RETURN` | `5` | Results shown to user (1-20) |
|
| 376 |
+
| `RERANK_ALPHA` | `0.6` | Weight of the dense score in fusion; the sparse score gets `1 - RERANK_ALPHA` (0.0-1.0) |
|
| 377 |
+
| **Generation** | | |
|
| 378 |
+
| `TEMPERATURE` | `0.2` | Creativity (0.0-1.0, use 0.1-0.2 for religious content) |
|
| 379 |
+
| `MAX_TOKENS` | `2048` | Max response length |
|
| 380 |
+
| **Safety** | | |
|
| 381 |
+
| `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call LLM (higher = fewer hallucinations) |
|
| 382 |
+
| `HADITH_BOOST` | `0.08` | Score boost for hadith on hadith queries |
|
| 383 |
+
| **Other** | | |
|
| 384 |
+
| `CACHE_SIZE` | `512` | Query response cache entries |
|
| 385 |
+
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
|
| 386 |
+
| `ALLOWED_ORIGINS` | `*` | CORS origins |
|
| 387 |
+
| `MAX_EXAMPLES` | `3` | Few-shot examples in system prompt |
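
As an illustration of how these variables and their defaults fit together, here is a minimal loader sketch. It is not QModel's actual `Config` class; the function name, the returned keys, and the comma-separated parsing of `ALLOWED_ORIGINS` are assumptions. The defaults and ranges are taken from the table above.

```python
import os


def load_config(env=None):
    """Read a subset of QModel settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    cfg = {
        "llm_backend": env.get("LLM_BACKEND", "hf"),
        "top_k_search": int(env.get("TOP_K_SEARCH", "20")),
        "top_k_return": int(env.get("TOP_K_RETURN", "5")),
        "rerank_alpha": float(env.get("RERANK_ALPHA", "0.6")),
        "temperature": float(env.get("TEMPERATURE", "0.2")),
        "confidence_threshold": float(env.get("CONFIDENCE_THRESHOLD", "0.30")),
        # assumption: multiple origins are separated by commas
        "allowed_origins": [o.strip() for o in env.get("ALLOWED_ORIGINS", "*").split(",")],
    }
    if cfg["llm_backend"] not in ("ollama", "hf"):
        raise ValueError("LLM_BACKEND must be 'ollama' or 'hf'")
    if not 5 <= cfg["top_k_search"] <= 100:
        raise ValueError("TOP_K_SEARCH must be in 5-100")
    if not 1 <= cfg["top_k_return"] <= 20:
        raise ValueError("TOP_K_RETURN must be in 1-20")
    if not 0.0 <= cfg["rerank_alpha"] <= 1.0:
        raise ValueError("RERANK_ALPHA must be in 0.0-1.0")
    return cfg
```

Validating ranges at startup makes a mistyped `.env` value fail immediately instead of silently degrading retrieval.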
|
| 388 |
+
|
| 389 |
+
### Configuration Examples
|
| 390 |
+
|
| 391 |
+
**Development (Ollama)**
|
| 392 |
+
```bash
|
| 393 |
+
LLM_BACKEND=ollama
|
| 394 |
+
OLLAMA_HOST=http://localhost:11434
|
| 395 |
+
OLLAMA_MODEL=llama2
|
| 396 |
+
TEMPERATURE=0.2
|
| 397 |
+
CONFIDENCE_THRESHOLD=0.30
|
| 398 |
+
ALLOWED_ORIGINS=*
|
| 399 |
+
```
|
| 400 |
|
| 401 |
+
**Production (HuggingFace + GPU)**
|
| 402 |
+
```bash
|
| 403 |
+
LLM_BACKEND=hf
|
| 404 |
+
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 405 |
+
HF_DEVICE=cuda
|
| 406 |
+
TOP_K_SEARCH=30
|
| 407 |
+
TEMPERATURE=0.1
|
| 408 |
+
CONFIDENCE_THRESHOLD=0.35
|
| 409 |
+
ALLOWED_ORIGINS=https://yourdomain.com,https://api.yourdomain.com
|
| 410 |
```
|
| 411 |
|
| 412 |
+
### Tuning Tips
|
| 413 |
+
|
| 414 |
+
- **Better results**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD`, use `TEMPERATURE=0.1`
|
| 415 |
+
- **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
|
| 416 |
+
- **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`
|
| 417 |
|
| 418 |
---
|
| 419 |
|
| 420 |
+
## Docker Deployment
|
| 421 |
|
| 422 |
+
### Docker Compose (Recommended)
|
| 423 |
+
|
| 424 |
+
```bash
|
| 425 |
+
cp .env.example .env # Configure backend (see Configuration section)
|
| 426 |
+
docker-compose up
|
| 427 |
```
|
| 428 |
+
|
| 429 |
+
### Docker CLI
|
| 430 |
+
|
| 431 |
+
```bash
|
| 432 |
+
docker build -t qmodel .
|
| 433 |
+
|
| 434 |
+
# With Ollama backend
|
| 435 |
+
docker run -p 8000:8000 \
|
| 436 |
+
--env-file .env \
|
| 437 |
+
--add-host host.docker.internal:host-gateway \
|
| 438 |
+
qmodel
|
| 439 |
+
|
| 440 |
+
# With HuggingFace backend
|
| 441 |
+
docker run -p 8000:8000 \
|
| 442 |
+
--env-file .env \
|
| 443 |
+
--env HF_TOKEN=your_token_here \
|
| 444 |
+
qmodel
|
| 445 |
+
```
|
| 446 |
+
|
| 447 |
+
### Docker with Ollama
|
| 448 |
+
|
| 449 |
+
```bash
|
| 450 |
+
# .env
|
| 451 |
+
LLM_BACKEND=ollama
|
| 452 |
+
OLLAMA_HOST=http://host.docker.internal:11434
|
| 453 |
+
OLLAMA_MODEL=llama2
|
| 454 |
```
|
| 455 |
|
| 456 |
+
Requires Ollama running on the host (`ollama serve`).
|
| 457 |
+
|
| 458 |
+
### Docker with HuggingFace
|
| 459 |
+
|
| 460 |
+
```bash
|
| 461 |
+
# .env
|
| 462 |
+
LLM_BACKEND=hf
|
| 463 |
+
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 464 |
+
HF_DEVICE=auto
|
| 465 |
+
|
| 466 |
+
# Pass HF token
|
| 467 |
+
export HF_TOKEN=hf_xxxxxxxxxxxxx
|
| 468 |
+
docker-compose up
|
| 469 |
+
```
|
| 470 |
+
|
| 471 |
+
### Docker Compose with GPU (Linux)
|
| 472 |
+
|
| 473 |
+
```yaml
|
| 474 |
+
services:
|
| 475 |
+
  qmodel:
|
| 476 |
+
    deploy:
|
| 477 |
+
      resources:
|
| 478 |
+
        reservations:
|
| 479 |
+
          devices:
|
| 480 |
+
            - driver: nvidia
|
| 481 |
+
              count: 1
|
| 482 |
+
              capabilities: [gpu]
|
| 483 |
+
```
|
| 484 |
|
| 485 |
+
### Production Tips
|
|
|
|
|
|
|
|
|
|
|
|
|
| 486 |
|
| 487 |
+
- Remove dev volume mount (`.:/app`) in `docker-compose.yml`
|
| 488 |
+
- Set `restart: on-failure:5`
|
| 489 |
+
- Use specific `ALLOWED_ORIGINS` instead of `*`
|
| 490 |
|
| 491 |
---
|
| 492 |
|
| 493 |
+
## Open-WebUI Integration
|
| 494 |
|
| 495 |
+
QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.
|
| 496 |
|
| 497 |
+
### Setup
|
|
|
|
|
|
|
| 498 |
|
| 499 |
+
```bash
|
| 500 |
+
# Start QModel
|
| 501 |
+
python main.py
|
| 502 |
|
| 503 |
+
# Start Open-WebUI
|
| 504 |
+
docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
|
| 505 |
+
```
|
| 506 |
|
| 507 |
+
### Connect
|
| 508 |
+
|
| 509 |
+
1. **Settings** → **Models** → **Manage Models**
|
| 510 |
+
2. Click **"Connect to OpenAI-compatible API"**
|
| 511 |
+
3. **API Base URL**: `http://localhost:8000/v1`
|
| 512 |
+
4. **Model Name**: `QModel`
|
| 513 |
+
5. **API Key**: Leave blank
|
| 514 |
+
6. **Save & Test** → ✅ Connected
|
| 515 |
+
|
| 516 |
+
### Docker Compose (QModel + Ollama + Open-WebUI)
|
| 517 |
+
|
| 518 |
+
```yaml
|
| 519 |
+
version: '3.8'
|
| 520 |
+
services:
|
| 521 |
+
  qmodel:
|
| 522 |
+
    build: .
|
| 523 |
+
    ports:
|
| 524 |
+
      - "8000:8000"
|
| 525 |
+
    environment:
|
| 526 |
+
      - LLM_BACKEND=ollama
|
| 527 |
+
      - OLLAMA_HOST=http://ollama:11434
|
| 528 |
+
|
| 529 |
+
  ollama:
|
| 530 |
+
    image: ollama/ollama:latest
|
| 531 |
+
    ports:
|
| 532 |
+
      - "11434:11434"
|
| 533 |
+
|
| 534 |
+
  web-ui:
|
| 535 |
+
    image: ghcr.io/open-webui/open-webui:latest
|
| 536 |
+
    ports:
|
| 537 |
+
      - "3000:8080"
|
| 538 |
+
    depends_on:
|
| 539 |
+
      - qmodel
|
| 540 |
```
|
| 541 |
|
| 542 |
+
### Supported Features
|
| 543 |
+
|
| 544 |
+
| Feature | Status |
|
| 545 |
+
|---------|--------|
|
| 546 |
+
| Chat | ✅ Full support |
|
| 547 |
+
| Streaming | ✅ `stream: true` |
|
| 548 |
+
| Multi-turn context | ✅ Handled by Open-WebUI |
|
| 549 |
+
| Temperature | ✅ Configurable |
|
| 550 |
+
| Token limits | ✅ `max_tokens` |
|
| 551 |
+
| Model listing | ✅ `/v1/models` |
|
| 552 |
+
| Source attribution | ✅ `x_metadata.sources` |
|
| 553 |
|
| 554 |
---
|
| 555 |
|
| 556 |
+
## Architecture
|
| 557 |
+
|
| 558 |
+
### Module Structure
|
| 559 |
+
|
| 560 |
+
```
|
| 561 |
+
main.py - FastAPI app + router registration
|
| 562 |
+
app/
|
| 563 |
+
  config.py - Config class (env vars)
|
| 564 |
+
  llm.py - LLM providers (Ollama, HuggingFace)
|
| 565 |
+
  cache.py - TTL-LRU async cache
|
| 566 |
+
  arabic_nlp.py - Arabic normalization, stemming, language detection
|
| 567 |
+
  search.py - Hybrid FAISS+BM25, text search, query rewriting
|
| 568 |
+
  analysis.py - Intent detection, analytics, counting
|
| 569 |
+
  prompts.py - Prompt engineering (persona, anti-hallucination)
|
| 570 |
+
  models.py - Pydantic schemas
|
| 571 |
+
  state.py - AppState, lifespan, RAG pipeline
|
| 572 |
+
  routers/
|
| 573 |
+
    quran.py - 6 Quran endpoints
|
| 574 |
+
    hadith.py - 5 Hadith endpoints
|
| 575 |
+
    chat.py - /ask + OpenAI-compatible chat
|
| 576 |
+
    ops.py - health, models, debug scores
|
| 577 |
+
```
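
To make the "TTL-LRU" idea behind `cache.py` concrete, here is a minimal synchronous sketch. It is not the project's implementation: the class name and the injectable `clock` are assumptions, and an async version would additionally guard access with something like `asyncio.Lock`. The defaults mirror `CACHE_SIZE=512` and `CACHE_TTL=3600`.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Bounded cache: entries expire after `ttl` seconds, oldest-used evicted first."""

    def __init__(self, max_size=512, ttl=3600.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (expires_at, value), least recent first
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Passing the clock in as a parameter keeps expiry testable without sleeping.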
|
| 578 |
+
|
| 579 |
+
### Data Pipeline
|
| 580 |
+
|
| 581 |
+
1. **Ingest**: 47,626 documents (6,236 Quran verses + 41,390 Hadiths from 9 collections)
|
| 582 |
+
2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
|
| 583 |
+
3. **Index**: FAISS `IndexFlatIP` for dense retrieval
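
FAISS `IndexFlatIP` performs an exhaustive inner-product search; when vectors are L2-normalized, the inner product equals cosine similarity. The pure-Python sketch below shows that computation on toy vectors. It stands in for FAISS for illustration only, and whether QModel normalizes its embeddings is an assumption.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_inner_product(query, passages, k):
    """Return [(passage_index, score), ...] for the k highest inner products."""
    scored = [
        (sum(q * p for q, p in zip(query, vec)), idx)
        for idx, vec in enumerate(passages)
    ]
    scored.sort(reverse=True)  # exhaustive scan, like a flat index
    return [(idx, score) for score, idx in scored[:k]]
```

A flat index scans every vector, which is exact and simple; at roughly 47,000 documents that cost is usually acceptable.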
|
| 584 |
+
|
| 585 |
+
### Retrieval & Ranking
|
| 586 |
+
|
| 587 |
+
1. Dense retrieval (FAISS semantic scoring)
|
| 588 |
+
2. Sparse retrieval (BM25 term-frequency)
|
| 589 |
+
3. Fusion: 60% dense + 40% sparse
|
| 590 |
+
4. Intent-aware boost (+0.08 to Hadith when intent=hadith)
|
| 591 |
+
5. Type filter (quran_only / hadith_only / authenticated_only)
|
| 592 |
+
6. Text search fallback (exact phrase + word-overlap)
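
Steps 3 and 4 can be sketched as a weighted sum followed by an additive boost. The function names and the candidate dictionary keys (`dense`, `sparse`, `type`) are hypothetical, and the sketch assumes both scores are already normalized to a comparable range; the 0.6 weight and the 0.08 boost are the documented defaults.

```python
def fuse(dense, sparse, alpha=0.6):
    """Weighted fusion: alpha * dense + (1 - alpha) * sparse."""
    return alpha * dense + (1.0 - alpha) * sparse


def rank(candidates, intent, alpha=0.6, hadith_boost=0.08):
    """Score candidates, boost hadith on hadith-intent queries, sort best first."""
    scored = []
    for cand in candidates:
        score = fuse(cand["dense"], cand["sparse"], alpha)
        if intent == "hadith" and cand["type"] == "hadith":
            score += hadith_boost
        scored.append({**cand, "score": score})
    return sorted(scored, key=lambda c: c["score"], reverse=True)
```

Because the boost is additive and small, it only reorders candidates whose fused scores are already close.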
|
| 593 |
+
|
| 594 |
+
### Anti-Hallucination Measures
|
| 595 |
+
|
| 596 |
+
- Few-shot examples including "not found" refusal path
|
| 597 |
+
- Hardcoded citation format rules
|
| 598 |
+
- Verbatim copy rules (no text reconstruction)
|
| 599 |
+
- Confidence threshold gating (default: 0.30)
|
| 600 |
+
- Post-generation citation verification
|
| 601 |
+
- Grade inference from collection name
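
The confidence gate in particular is easy to picture: if the best retrieval score is below the threshold, the LLM is never called. A minimal sketch, assuming the gate uses the highest `_score` among retrieved sources; the function name, the `generate` callable, and the refusal text are hypothetical.

```python
REFUSAL = "Not found in the indexed sources."  # hypothetical refusal message


def answer_or_refuse(question, sources, generate, threshold=0.30):
    """Call `generate` only when the best source score clears the threshold."""
    top_score = max((s["_score"] for s in sources), default=0.0)
    if top_score < threshold:
        # Weak evidence: refuse instead of letting the model improvise
        return {"answer": REFUSAL, "top_score": top_score, "llm_called": False}
    return {
        "answer": generate(question, sources),
        "top_score": top_score,
        "llm_called": True,
    }
```

Raising the threshold trades coverage for safety: more questions are refused, and fewer answers rest on weak matches.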
|
| 602 |
+
|
| 603 |
+
### Performance
|
| 604 |
|
| 605 |
| Operation | Time | Backend |
|
| 606 |
|-----------|------|---------|
|
| 607 |
| Query (cached) | ~50ms | Both |
|
| 608 |
+
| Query (Ollama) | 400-800ms | Ollama |
|
| 609 |
+
| Query (HF GPU) | 500-1500ms | CUDA |
|
| 610 |
+
| Query (HF CPU) | 2-5s | CPU |
|
| 611 |
|
| 612 |
---
|
| 613 |
|
| 614 |
+
## Troubleshooting
|
| 615 |
|
| 616 |
+
### "Cannot connect to Ollama"
|
| 617 |
```bash
|
| 618 |
+
ollama serve # Ensure Ollama is running on host
|
| 619 |
+
# In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
|
| 620 |
```
|
| 621 |
|
| 622 |
+
### "HuggingFace model not found"
|
| 623 |
```bash
|
| 624 |
+
export HF_TOKEN=hf_xxxxxxxxxxxxx # Set token for gated models
|
| 625 |
```
|
| 626 |
|
| 627 |
+
### "Out of memory"
|
| 628 |
+
- Use smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
|
| 629 |
+
- Use Ollama with `neural-chat`
|
| 630 |
+
- Reduce `MAX_TOKENS` to 1024
|
| 631 |
+
- Increase Docker memory limit in `docker-compose.yml`
|
| 632 |
+
|
| 633 |
+
### "Assistant returns 'Not found'"
|
| 634 |
+
This is expected: QModel rejects low-confidence queries. Try:
|
| 635 |
+
- More specific queries
|
| 636 |
+
- Lower `CONFIDENCE_THRESHOLD` in `.env`
|
| 637 |
+
- Check raw scores: `GET /debug/scores?q=your+query`
|
| 638 |
+
|
| 639 |
+
### "Port already in use"
|
| 640 |
+
```bash
|
| 641 |
+
docker-compose down && docker system prune
|
| 642 |
+
# Or change port: ports: ["8001:8000"]
|
| 643 |
+
```
|
| 644 |
+
|
| 645 |
+
---
|
| 646 |
|
| 647 |
+
## Roadmap
|
| 648 |
+
|
| 649 |
+
- [x] Grade-based filtering
|
| 650 |
+
- [x] Streaming responses (SSE)
|
| 651 |
+
- [x] Modular architecture (4 routers, 18 endpoints)
|
| 652 |
+
- [x] Dual LLM backend (Ollama + HuggingFace)
|
| 653 |
+
- [x] Text search (exact substring + fuzzy matching)
|
| 654 |
+
- [ ] Chain of narrators (Isnad display)
|
| 655 |
+
- [ ] Synonym expansion (mercy → rahma, compassion)
|
| 656 |
+
- [ ] Batch processing (multiple questions per request)
|
| 657 |
+
- [ ] Islamic calendar integration (Hijri dates)
|
| 658 |
+
- [ ] Tafsir endpoint with scholar citations
|
| 659 |
|
| 660 |
---
|
| 661 |
|
SETUP.md
DELETED
|
@@ -1,590 +0,0 @@
|
|
| 1 |
-
# QModel v6 Setup & Deployment Guide
|
| 2 |
-
|
| 3 |
-
## Quick Start
|
| 4 |
-
|
| 5 |
-
### 1. Prerequisites
|
| 6 |
-
- Python 3.10+
|
| 7 |
-
- 16 GB RAM minimum (for embeddings + LLM)
|
| 8 |
-
- GPU recommended for HuggingFace backend
|
| 9 |
-
- Ollama installed (for local development) OR internet access (for HuggingFace)
|
| 10 |
-
|
| 11 |
-
### 2. Installation
|
| 12 |
-
|
| 13 |
-
```bash
|
| 14 |
-
# Clone and enter project
|
| 15 |
-
cd /Users/elgendy/Projects/QModel
|
| 16 |
-
|
| 17 |
-
# Create virtual environment
|
| 18 |
-
python3 -m venv .venv
|
| 19 |
-
source .venv/bin/activate
|
| 20 |
-
|
| 21 |
-
# Install dependencies
|
| 22 |
-
pip install -r requirements.txt
|
| 23 |
-
```
|
| 24 |
-
|
| 25 |
-
### 3. Data & Index
|
| 26 |
-
|
| 27 |
-
The project includes pre-built data files:
|
| 28 |
-
- `metadata.json` - 47,626 documents (6,236 Quran verses + 41,390 hadiths from 9 canonical collections)
|
| 29 |
-
- `QModel.index` - FAISS search index (pre-generated)
|
| 30 |
-
|
| 31 |
-
If you need to rebuild the index after dataset changes:
|
| 32 |
-
```bash
|
| 33 |
-
python build_index.py
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
---
|
| 37 |
-
|
| 38 |
-
## Backend Configuration
|
| 39 |
-
|
| 40 |
-
QModel supports two LLM backends. Choose based on your environment:
|
| 41 |
-
|
| 42 |
-
| Backend | Pros | Cons | When to Use |
|
| 43 |
-
|---------|------|------|------------|
|
| 44 |
-
| **Ollama** (local) | Fast setup, no GPU needed, no model downloads, free | Smaller models, limited customization | Development, testing, resource-constrained |
|
| 45 |
-
| **HuggingFace** (remote) | Larger models, better quality, full control | Requires GPU or significant RAM, slower downloads | Production, high-quality responses |
|
| 46 |
-
|
| 47 |
-
### LLM Backend Selection
|
| 48 |
-
|
| 49 |
-
**Option 1: Local Ollama (Development)**
|
| 50 |
-
|
| 51 |
-
For development, testing, and when you already have Ollama running locally:
|
| 52 |
-
|
| 53 |
-
```bash
|
| 54 |
-
LLM_BACKEND=ollama
|
| 55 |
-
OLLAMA_HOST=http://localhost:11434
|
| 56 |
-
OLLAMA_MODEL=llama2 # or: mistral, neural-chat, orca-mini
|
| 57 |
-
```
|
| 58 |
-
|
| 59 |
-
**Available Ollama Models:**
|
| 60 |
-
- `llama2` - Fast, good quality (default, recommended)
|
| 61 |
-
- `mistral` - Better Arabic support
|
| 62 |
-
- `neural-chat` - Good balance
|
| 63 |
-
- `openchat` - Good instruction following
|
| 64 |
-
- `orca-mini` - Lightweight
|
| 65 |
-
|
| 66 |
-
**Option 2: Remote HuggingFace (Production)**
|
| 67 |
-
|
| 68 |
-
For production deployments with better quality and control:
|
| 69 |
-
|
| 70 |
-
```bash
|
| 71 |
-
LLM_BACKEND=hf
|
| 72 |
-
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct # Excellent Arabic support
|
| 73 |
-
HF_DEVICE=auto # auto | cuda | cpu
|
| 74 |
-
HF_MAX_NEW_TOKENS=2048
|
| 75 |
-
```
|
| 76 |
-
|
| 77 |
-
**Recommended HuggingFace Models:**
|
| 78 |
-
- `Qwen/Qwen2-7B-Instruct` - Excellent Arabic, strong reasoning (default)
|
| 79 |
-
- `mistralai/Mistral-7B-Instruct-v0.2` - Very capable, fast
|
| 80 |
-
- `meta-llama/Llama-2-13b-chat-hf` - Larger, needs HF token
|
| 81 |
-
|
| 82 |
-
**Device Options:**
|
| 83 |
-
- `auto` - Auto-detect (GPU if available, else CPU)
|
| 84 |
-
- `cuda` - Force GPU (requires NVIDIA GPU)
|
| 85 |
-
- `cpu` - Force CPU (slower, but works everywhere)
|
| 86 |
-
|
| 87 |
-
### Complete Environment Variables Reference
|
| 88 |
-
|
| 89 |
-
#### Backend Selection
|
| 90 |
-
| Variable | Default | Options | Example |
|
| 91 |
-
|----------|---------|---------|---------|
|
| 92 |
-
| `LLM_BACKEND` | `hf` | `ollama`, `hf` | `ollama` |
|
| 93 |
-
|
| 94 |
-
#### Ollama Backend
|
| 95 |
-
| Variable | Default | Description | Example |
|
| 96 |
-
|----------|---------|-------------|---------|
|
| 97 |
-
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL | `http://localhost:11434` |
|
| 98 |
-
| `OLLAMA_MODEL` | `llama2` | Model name | `mistral` |
|
| 99 |
-
|
| 100 |
-
#### HuggingFace Backend
|
| 101 |
-
| Variable | Default | Description | Example |
|
| 102 |
-
|----------|---------|-------------|---------|
|
| 103 |
-
| `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | Model ID | `Qwen/Qwen2-7B-Instruct` |
|
| 104 |
-
| `HF_DEVICE` | `auto` | Device to use | `cuda` |
|
| 105 |
-
| `HF_MAX_NEW_TOKENS` | `2048` | Max output length | `2048` |
|
| 106 |
-
|
| 107 |
-
#### Embedding & Data
|
| 108 |
-
| Variable | Default | Description |
|
| 109 |
-
|----------|---------|-------------|
|
| 110 |
-
| `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model (keep default) |
|
| 111 |
-
| `FAISS_INDEX` | `QModel.index` | Index file path |
|
| 112 |
-
| `METADATA_FILE` | `metadata.json` | Dataset file |
|
| 113 |
-
|
| 114 |
-
#### Retrieval & Ranking
|
| 115 |
-
| Variable | Default | Range | Purpose |
|
| 116 |
-
|----------|---------|-------|---------|
|
| 117 |
-
| `TOP_K_SEARCH` | `20` | 5-100 | Candidate pool (higher = slower but more coverage) |
|
| 118 |
-
| `TOP_K_RETURN` | `5` | 1-20 | Results shown to user |
|
| 119 |
-
| `RERANK_ALPHA` | `0.6` | 0.0-1.0 | Dense (0.6) vs Sparse (0.4) weighting |
|
| 120 |
-
|
| 121 |
-
#### Generation
|
| 122 |
-
| Variable | Default | Range | Purpose |
|
| 123 |
-
|----------|---------|-------|---------|
|
| 124 |
-
| `TEMPERATURE` | `0.2` | 0.0-1.0 | 0.0=deterministic, 1.0=creative (use 0.1-0.2 for religious) |
|
| 125 |
-
| `MAX_TOKENS` | `2048` | 512-4096 | Max response length |
|
| 126 |
-
|
| 127 |
-
#### Safety & Quality
|
| 128 |
-
| Variable | Default | Range | Purpose |
|
| 129 |
-
|----------|---------|-------|---------|
|
| 130 |
-
| `CONFIDENCE_THRESHOLD` | `0.30` | 0.0-1.0 | Min score to call LLM (higher = fewer hallucinations) |
|
| 131 |
-
| `HADITH_BOOST` | `0.08` | 0.0-1.0 | Score boost for hadith on hadith queries |
|
| 132 |
-
|
| 133 |
-
#### Other Settings
|
| 134 |
-
| Variable | Default | Description |
|
| 135 |
-
|----------|---------|-------------|
|
| 136 |
-
| `CACHE_SIZE` | `512` | Query response cache entries |
|
| 137 |
-
| `CACHE_TTL` | `3600` | Cache expiry in seconds |
|
| 138 |
-
| `ALLOWED_ORIGINS` | `*` | CORS origins (use specific domains in production) |
|
| 139 |
-
| `MAX_EXAMPLES` | `3` | Few-shot examples in system prompt |
|
| 140 |
-
|
| 141 |
-
### Configuration Examples
|
| 142 |
-
|
| 143 |
-
**Development (Ollama) - Recommended for getting started**
|
| 144 |
-
```bash
|
| 145 |
-
LLM_BACKEND=ollama
|
| 146 |
-
OLLAMA_HOST=http://localhost:11434
|
| 147 |
-
OLLAMA_MODEL=llama2
|
| 148 |
-
|
| 149 |
-
EMBED_MODEL=intfloat/multilingual-e5-large
|
| 150 |
-
FAISS_INDEX=QModel.index
|
| 151 |
-
METADATA_FILE=metadata.json
|
| 152 |
-
|
| 153 |
-
TOP_K_SEARCH=20
|
| 154 |
-
TOP_K_RETURN=5
|
| 155 |
-
TEMPERATURE=0.2
|
| 156 |
-
CONFIDENCE_THRESHOLD=0.30
|
| 157 |
-
ALLOWED_ORIGINS=*
|
| 158 |
-
```
|
| 159 |
-
|
| 160 |
-
**Production (HuggingFace + GPU) - Best quality, uses GPU**
|
| 161 |
-
```bash
|
| 162 |
-
LLM_BACKEND=hf
|
| 163 |
-
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 164 |
-
HF_DEVICE=cuda
|
| 165 |
-
|
| 166 |
-
EMBED_MODEL=intfloat/multilingual-e5-large
|
| 167 |
-
FAISS_INDEX=QModel.index
|
| 168 |
-
METADATA_FILE=metadata.json
|
| 169 |
-
|
| 170 |
-
TOP_K_SEARCH=30 # More candidates for better quality
|
| 171 |
-
TOP_K_RETURN=5
|
| 172 |
-
TEMPERATURE=0.1 # More deterministic
|
| 173 |
-
CONFIDENCE_THRESHOLD=0.35
|
| 174 |
-
ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
|
| 175 |
-
```
|
| 176 |
-
|
| 177 |
-
**Production (HuggingFace + CPU) - CPU-only, slower but no GPU required**
|
| 178 |
-
```bash
|
| 179 |
-
LLM_BACKEND=hf
|
| 180 |
-
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
|
| 181 |
-
HF_DEVICE=cpu
|
| 182 |
-
|
| 183 |
-
TEMPERATURE=0.1
|
| 184 |
-
MAX_TOKENS=1024 # Reduce for faster responses
|
| 185 |
-
CONFIDENCE_THRESHOLD=0.35
|
| 186 |
-
```
|
| 187 |
-
|
| 188 |
-
### Tuning Tips
|
| 189 |
-
|
| 190 |
-
**For Better Results:**
|
| 191 |
-
- Increase `TOP_K_SEARCH` (costs slightly more compute)
|
| 192 |
-
- Lower `CONFIDENCE_THRESHOLD` (may get some hallucinations)
|
| 193 |
-
- Use larger model with more parameters
|
| 194 |
-
- Set `TEMPERATURE=0.1` for most consistent answers
|
| 195 |
-
|
| 196 |
-
**For Faster Performance:**
|
| 197 |
-
- Lower `TOP_K_SEARCH` and `TOP_K_RETURN`
|
| 198 |
-
- Use Ollama backend (faster inference)
|
| 199 |
-
- Reduce `MAX_TOKENS`
|
| 200 |
-
- Set `HF_DEVICE=cpu` if using HF (faster than auto-selecting)
|
| 201 |
-
|
| 202 |
-
**For More Accurate/Conservative Answers:**
|
| 203 |
-
- Increase `CONFIDENCE_THRESHOLD` (skip borderline queries)
|
| 204 |
-
- Lower `TEMPERATURE` (more deterministic)
|
| 205 |
-
- Use larger model (7B+ parameters)
|
| 206 |
-
|
| 207 |
-
**For CPU-Only (No GPU Available):**
|
| 208 |
-
- Use Ollama backend with `neural-chat` model
|
| 209 |
-
- Set `HF_DEVICE=cpu` if using HF
|
| 210 |
-
- Reduce `MAX_TOKENS` to 1024
|
| 211 |
-
|
| 212 |
-
---
|
| 213 |
-
|
| 214 |
-
## Running QModel
|
| 215 |
-
|
| 216 |
-
### Step-by-Step: Starting the API
|
| 217 |
-
|
| 218 |
-
1. **Create `.env` file**:
|
| 219 |
-
```bash
|
| 220 |
-
cp .env.example .env
|
| 221 |
-
# Edit .env and choose your backend (see Configuration section above)
|
| 222 |
-
```
|
| 223 |
-
|
| 224 |
-
2. **Start the backend service**:
|
| 225 |
-
|
| 226 |
-
**If using Ollama:**
|
| 227 |
-
```bash
|
| 228 |
-
# Terminal 1: Start Ollama daemon
|
| 229 |
-
ollama serve
|
| 230 |
-
|
| 231 |
-
# Terminal 2: Pull a model (first time only)
|
| 232 |
-
ollama pull llama2 # or: mistral, neural-chat
|
| 233 |
-
```
|
| 234 |
-
|
| 235 |
-
**If using HuggingFace:**
|
| 236 |
-
- No separate service needed, models download automatically
|
| 237 |
-
|
| 238 |
-
3. **Start QModel API**:
|
| 239 |
-
```bash
|
| 240 |
-
python main.py
|
| 241 |
-
```
|
| 242 |
-
|
| 243 |
-
API available at `http://localhost:8000`
|
| 244 |
-
|
| 245 |
-
View interactive docs: `http://localhost:8000/docs`
|
| 246 |
-
|
| 247 |
-
### Docker Option
|
| 248 |
-
|
| 249 |
-
```bash
|
| 250 |
-
# Configure your backend in .env (see Configuration section)
|
| 251 |
-
cp .env.example .env
|
| 252 |
-
nano .env # Choose LLM_BACKEND=ollama or hf
|
| 253 |
-
|
| 254 |
-
# Run with Docker Compose
|
| 255 |
-
docker-compose up
|
| 256 |
-
```
|
| 257 |
-
|
| 258 |
-
For full Docker documentation (including production deployment, troubleshooting, and multi-container setup), see **[DOCKER.md](DOCKER.md)**.
|
| 259 |
-
|
| 260 |
-
---
|
| 261 |
-
|
| 262 |
-
## API Endpoints
|
| 263 |
-
|
| 264 |
-
### Main Query Endpoint
|
| 265 |
-
|
| 266 |
-
```bash
|
| 267 |
-
GET /ask?q=<question>&top_k=5&source_type=<filter>&grade_filter=<filter>
|
| 268 |
-
```
|
| 269 |
-
|
| 270 |
-
**Parameters:**
|
| 271 |
-
- `q` (required): Your Islamic question
|
| 272 |
-
- `top_k`: Number of sources to retrieve (1-20, default: 5)
|
| 273 |
-
- `source_type`: Filter by source type
|
| 274 |
-
  - `quran` - Quranic verses only
|
| 275 |
-
  - `hadith` - Hadiths only
|
| 276 |
-
  - `null` (default) - Both
|
| 277 |
-
- `grade_filter`: Filter Hadith by authenticity grade
|
| 278 |
-
  - `sahih` - Only Sahih-graded Hadiths
|
| 279 |
-
  - `hasan` - Sahih + Hasan
|
| 280 |
-
  - `null` (default) - All grades
|
| 281 |
-
|
| 282 |
-
**Example Requests:**
|
| 283 |
-
|
| 284 |
-
```bash
|
| 285 |
-
# General question
|
| 286 |
-
curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
|
| 287 |
-
|
| 288 |
-
# Quran-only with word frequency
|
| 289 |
-
curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?&source_type=quran"
|
| 290 |
-
|
| 291 |
-
# Authentic Hadiths only
|
| 292 |
-
curl "http://localhost:8000/ask?q=Hadiths%20about%20prayer&source_type=hadith&grade_filter=sahih"
|
| 293 |
-
```
|
| 294 |
-
|
| 295 |
-
**Response:**
|
| 296 |
-
```json
|
| 297 |
-
{
|
| 298 |
-
"question": "What does Islam say about mercy?",
|
| 299 |
-
"answer": "Islam emphasizes mercy as a core value...",
|
| 300 |
-
"language": "english",
|
| 301 |
-
"intent": "general",
|
| 302 |
-
"analysis": null,
|
| 303 |
-
"sources": [
|
| 304 |
-
{
|
| 305 |
-
"source": "Surah Al-Baqarah 2:178",
|
| 306 |
-
"type": "quran",
|
| 307 |
-
"grade": null,
|
| 308 |
-
"arabic": "...",
|
| 309 |
-
"english": "...",
|
| 310 |
-
"_score": 0.876
|
| 311 |
-
}
|
| 312 |
-
],
|
| 313 |
-
"top_score": 0.876,
|
| 314 |
-
"latency_ms": 342
|
| 315 |
-
}
|
| 316 |
-
```
|
| 317 |
-
|
| 318 |
-
---
|
| 319 |
-
|
| 320 |
-
### Hadith Verification Endpoint
|
| 321 |
-
|
| 322 |
-
```bash
|
| 323 |
-
GET /hadith/verify?q=<hadith_text>&collection=<filter>
|
| 324 |
-
```
|
| 325 |
-
|
| 326 |
-
**Purpose:** Quick authenticity check for a Hadith
|
| 327 |
-
|
| 328 |
-
**Example:**
|
| 329 |
-
```bash
|
| 330 |
-
curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
|
| 331 |
-
```
|
| 332 |
-
|
| 333 |
-
**Response:**
|
| 334 |
-
```json
|
| 335 |
-
{
|
| 336 |
-
"query": "Actions are judged by intentions",
|
| 337 |
-
"found": true,
|
| 338 |
-
"collection": "Sahih al-Bukhari",
|
| 339 |
-
"grade": "Sahih",
|
| 340 |
-
"reference": "Sahih al-Bukhari 1",
|
| 341 |
-
"arabic": "إنما الأعمال بالنيات",
|
| 342 |
-
"english": "Verily, actions are judged by intentions...",
|
| 343 |
-
"latency_ms": 156
|
| 344 |
-
}
|
| 345 |
-
```
|
| 346 |
-
|
| 347 |
-
---
|
| 348 |
-
|
| 349 |
-
### Debug Endpoint
|
| 350 |
-
|
| 351 |
-
```bash
|
| 352 |
-
GET /debug/scores?q=<question>&top_k=10
|
| 353 |
-
```
|
| 354 |
-
|
| 355 |
-
**Purpose:** Inspect raw retrieval scores without LLM call. Use to calibrate `CONFIDENCE_THRESHOLD`.
|
| 356 |
-
|
| 357 |
-
**Example:**
|
| 358 |
-
```bash
|
| 359 |
-
curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
|
| 360 |
-
```
|
| 361 |
-
|
| 362 |
-
**Response:**
|
| 363 |
-
```json
|
| 364 |
-
{
|
| 365 |
-
"intent": "general",
|
| 366 |
-
"threshold": 0.3,
|
| 367 |
-
"results": [
|
| 368 |
-
{
|
| 369 |
-
"rank": 1,
|
| 370 |
-
"source": "Surah Al-Baqarah 2:45",
|
| 371 |
-
"type": "quran",
|
| 372 |
-
"grade": null,
|
| 373 |
-
"_dense": 0.8234,
|
| 374 |
-
"_sparse": 0.5421,
|
| 375 |
-
"_score": 0.7234
|
| 376 |
-
}
|
| 377 |
-
]
|
| 378 |
-
}
|
| 379 |
-
```
|
| 380 |
-
|
| 381 |
-
Use this to fine-tune `CONFIDENCE_THRESHOLD`. If queries you expect to work have `_score < threshold`, lower the threshold.
|
| 382 |
-
|
| 383 |
-
---
|
| 384 |
-
|
| 385 |
-
### Health & Metadata
|
| 386 |
-
|
| 387 |
-
```bash
|
| 388 |
-
# Health check
|
| 389 |
-
curl http://localhost:8000/health
|
| 390 |
-
|
| 391 |
-
# List available models
|
| 392 |
-
curl http://localhost:8000/v1/models
|
| 393 |
-
|
| 394 |
-
# Interactive API docs
|
| 395 |
-
http://localhost:8000/docs
|
| 396 |
-
```
|
| 397 |
-
|
| 398 |
-
---
|
| 399 |
-
|
| 400 |
-
## Query Examples
|
| 401 |
-
|
| 402 |
-
### 1. Word Frequency Analysis
|
| 403 |
-
|
| 404 |
-
**Question:** "How many times is the word 'mercy' mentioned in the Quran?"
|
| 405 |
-
|
| 406 |
-
**System detects:** `intent=count`
|
| 407 |
-
|
| 408 |
-
**Response includes:**
|
| 409 |
-
```json
|
| 410 |
-
{
|
| 411 |
-
"analysis": {
|
| 412 |
-
"keyword": "mercy",
|
| 413 |
-
"total_count": 87,
|
| 414 |
-
"by_surah": {
|
| 415 |
-
"2": {"name": "Al-Baqarah", "count": 12},
|
| 416 |
-
"7": {"name": "Al-A'raf", "count": 8},
|
| 417 |
-
...
|
| 418 |
-
}
|
| 419 |
-
}
|
| 420 |
-
}
|
| 421 |
-
```
|
| 422 |
-
|
| 423 |
-
---
|
| 424 |
-
|
| 425 |
-
### 2. Topic-Based Aya Retrieval
|
| 426 |
-
|
| 427 |
-
**Question:** "What does the Quran say about patience?"
|
| 428 |
-
|
| 429 |
-
**System detects:** `intent=tafsir`
|
| 430 |
-
|
| 431 |
-
**Response:**
|
| 432 |
-
- Retrieves top 5 verses about patience
|
| 433 |
-
- LLM explains each with Tafsir
|
| 434 |
-
- Shows interconnections between verses
|
| 435 |
-
|
| 436 |
-
---
|
| 437 |
-
|
| 438 |
-
### 3. Hadith Authentication
|
| 439 |
-
|
| 440 |
-
**Question:** "Is the Hadith 'Actions are judged by intentions' authentic?"
|
| 441 |
-
|
| 442 |
-
**System detects:** `intent=auth`
|
| 443 |
-
|
| 444 |
-
**LLM response:**
|
| 445 |
-
- "Yes, this is found in Sahih al-Bukhari 1"
|
| 446 |
-
- "Grade: Sahih (authentic)"
|
| 447 |
-
- "Explanation: This Hadith establishes the principle of intention..."
|
| 448 |
-
|
| 449 |
-
---
|
| 450 |
-
|
| 451 |
-
### 4. Bilingual Support
|
| 452 |
-
|
| 453 |
-
**Arabic Question:** "ما أهمية الصبر في الإسلام؟" ("What is the importance of patience in Islam?")
|
| 454 |
-
|
| 455 |
-
**System detects:** Language = arabic
|
| 456 |
-
|
| 457 |
-
**Response:** Full Arabic response with proper vocalization
|
| 458 |
-
|
| 459 |
-
---
|
| 460 |
-
|
| 461 |
-
## Tuning & Optimization
|
| 462 |
-
|
| 463 |
-
### Confidence Threshold
|
| 464 |
-
|
| 465 |
-
The `CONFIDENCE_THRESHOLD` (default 0.30) controls when to call the LLM:
|
| 466 |
-
|
| 467 |
-
- **Too high (e.g., 0.70)**: Many queries rejected as "not found" (safer but less helpful)
|
| 468 |
-
- **Too low (e.g., 0.10)**: LLM called on weak matches (more hallucinations)
|
| 469 |
-
- **Sweet spot (0.30-0.50)**: Most queries get through, but low-quality matches rejected
|
| 470 |
-
|
| 471 |
-
**To calibrate:**
|
| 472 |
-
1. Run `/debug/scores` on representative queries
|
| 473 |
-
2. Check what `_score` values are returned
|
| 474 |
-
3. Adjust `CONFIDENCE_THRESHOLD` in `.env`
|
| 475 |
-
4. Restart service
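
The calibration steps above can be sketched as a small helper. This is only one possible heuristic, not the project's method: it assumes you have collected the top `_score` for queries that should be answered and for queries that should be rejected, and it proposes the midpoint between the two groups. The function name `suggest_threshold` and the sample scores are hypothetical.

```python
from __future__ import annotations


def suggest_threshold(expected_hits: list[float], expected_misses: list[float]) -> float | None:
    """Suggest a CONFIDENCE_THRESHOLD between two groups of top scores.

    expected_hits:   top _score of queries that should reach the LLM
    expected_misses: top _score of queries that should be rejected
    Returns None when the groups overlap, since no single cut-off separates them.
    """
    lowest_hit = min(expected_hits)
    highest_miss = max(expected_misses)
    if highest_miss >= lowest_hit:
        return None
    # Midpoint heuristic: leave equal margin on both sides of the cut-off.
    return round((lowest_hit + highest_miss) / 2, 2)


# Hypothetical scores gathered by hand from /debug/scores.
print(suggest_threshold([0.72, 0.55, 0.41], [0.18, 0.25]))  # prints 0.33
```

When the helper returns `None`, the score distributions overlap and the threshold alone cannot separate good from bad matches; improving the queries or the index is likely a better lever.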
|
| 476 |
-
|
| 477 |
-
---
|
| 478 |
-
|
| 479 |
-
### Temperature
|
| 480 |
-
|
| 481 |
-
- **0.0**: Deterministic (best for factual Islamic answers)
|
| 482 |
-
- **0.2**: Slightly creative (default)
|
| 483 |
-
- **0.5+**: More creative (not recommended for religious content)
|
| 484 |
-
|
| 485 |
-
---
|
| 486 |
-
|
| 487 |
-
### Model Selection
|
| 488 |
-
|
| 489 |
-
#### For Development (Ollama)
|
| 490 |
-
- **llama2**: Fastest, good quality, easy setup
|
| 491 |
-
- **mistral**: Better Arabic, slightly slower
|
| 492 |
-
- **neural-chat**: Good balance
|
| 493 |
-
|
| 494 |
-
```bash
|
| 495 |
-
ollama pull llama2
|
| 496 |
-
OLLAMA_MODEL=llama2 python main.py
|
| 497 |
-
```
|
| 498 |
-
|
| 499 |
-
#### For Production (HuggingFace)
|
| 500 |
-
- **Qwen/Qwen2-7B-Instruct**: Strong Arabic, 7B params
|
| 501 |
-
- **mistralai/Mistral-7B-Instruct-v0.2**: Very capable
|
| 502 |
-
- **meta-llama/Llama-2-13b-chat-hf**: Larger, better quality (requires HF token)
|
| 503 |
-
|
| 504 |
-
```bash
|
| 505 |
-
HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct python main.py
|
| 506 |
-
```
|
| 507 |
-
|
| 508 |
-
---
|
| 509 |
-
|
| 510 |
-
## Troubleshooting
|
| 511 |
-
|
| 512 |
-
### Issue: "Service is still initialising"
|
| 513 |
-
|
| 514 |
-
**Solution:** Wait 60-90 seconds for the embedding model to load, then check the logs:
|
| 515 |
-
```bash
|
| 516 |
-
tail -f <logfile>
|
| 517 |
-
```
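
Instead of waiting a fixed time, a client could poll `/health` until it reports `"status": "ok"` (the endpoint returns `"initialising"` until the service is ready). The following is a rough sketch using only the standard library; the function names, the default base URL, and the timeout values are assumptions for illustration.

```python
import json
import time
import urllib.request


def fetch_status(base_url: str = "http://localhost:8000") -> str:
    """Return the "status" field reported by GET /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_until_ready(fetch=fetch_status, timeout_s=120.0, interval_s=5.0, sleep=time.sleep) -> bool:
    """Poll until the service reports "ok" or timeout_s elapses.

    fetch and sleep are injectable so the loop can be exercised without a server.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if fetch() == "ok":
                return True
        except OSError:
            # Connection refused or timed out: the server is probably still starting.
            pass
        sleep(interval_s)
    return False


# Usage against a running server (not executed here):
#   if not wait_until_ready():
#       raise SystemExit("Service did not become ready in time")
```

`urllib.error.URLError` is a subclass of `OSError`, so connection failures during startup are retried rather than raised.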
|
| 518 |
-
|
| 519 |
-
### Issue: Low retrieval scores
|
| 520 |
-
|
| 521 |
-
**Cause:** Queries don't match the language or wording of the dataset
|
| 522 |
-
|
| 523 |
-
**Solution:**
|
| 524 |
-
1. Check `/debug/scores` output
|
| 525 |
-
2. Ensure query is in Arabic or clear English
|
| 526 |
-
3. Try synonyms (e.g., "mercy" vs "compassion")
|
| 527 |
-
4. Lower `CONFIDENCE_THRESHOLD` in `.env`
|
| 528 |
-
|
| 529 |
-
### Issue: LLM model not found (HF backend)
|
| 530 |
-
|
| 531 |
-
**Solution:**
|
| 532 |
-
```bash
|
| 533 |
-
huggingface-cli login
|
| 534 |
-
export HF_TOKEN=<your_token>
|
| 535 |
-
```
|
| 536 |
-
|
| 537 |
-
### Issue: Out of memory
|
| 538 |
-
|
| 539 |
-
**Solution:**
|
| 540 |
-
- Use `OLLAMA_MODEL=neural-chat` (smaller)
|
| 541 |
-
- Set `HF_DEVICE=cpu` (slower but uses RAM instead of VRAM)
|
| 542 |
-
- Reduce `TOP_K_SEARCH` in `.env`
|
| 543 |
-
|
| 544 |
-
---
|
| 545 |
-
|
| 546 |
-
## Production Checklist
|
| 547 |
-
|
| 548 |
-
- [ ] Test with at least 10 representative queries
|
| 549 |
-
- [ ] Verify `/debug/scores` on low-confidence queries
|
| 550 |
-
- [ ] Adjust `CONFIDENCE_THRESHOLD` to acceptable false-positive rate
|
| 551 |
-
- [ ] Set `ALLOWED_ORIGINS` to your domain only (security)
|
| 552 |
-
- [ ] Use production-grade LLM model (Qwen 7B+ or Mistral)
|
| 553 |
-
- [ ] Set `TEMPERATURE=0.1` for highly consistent output (or `0.0` for fully deterministic answers)
|
| 554 |
-
- [ ] Monitor first 100 queries for quality
|
| 555 |
-
- [ ] Enable access logging and error tracking
|
| 556 |
-
|
| 557 |
-
---
|
| 558 |
-
|
| 559 |
-
## Architecture Files
|
| 560 |
-
|
| 561 |
-
- **main.py**: Core API + RAG pipeline (LLM backend abstraction, retrieval, generation)
|
| 562 |
-
- **build_index.py**: FAISS index generation from metadata
|
| 563 |
-
- **enrich_dataset.py**: Dataset enrichment script (fetch hadith collections, deduplicate)
|
| 564 |
-
- **metadata.json**: Combined dataset: 6,236 Quran verses + 41,390 hadiths
|
| 565 |
-
- **QModel.index**: FAISS vector index (pre-built, ready to use)
|
| 566 |
-
- **ARCHITECTURE.md**: Detailed system design
|
| 567 |
-
- **requirements.txt**: Python dependencies
|
| 568 |
-
|
| 569 |
-
---
|
| 570 |
-
|
| 571 |
-
## Next Steps
|
| 572 |
-
|
| 573 |
-
After setup, consider:
|
| 574 |
-
1. Grade filtering: Try `?grade_filter=sahih` for authenticated-only results
|
| 575 |
-
2. Source filtering: Use `?source_type=quran` vs `?source_type=hadith`
|
| 576 |
-
3. Batch processing: Add endpoint for multiple questions
|
| 577 |
-
4. Webhook integration: Stream answers as they generate
|
| 578 |
-
5. Caching improvements: Persistent Redis cache for production
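
For the first two items, a query URL can be assembled programmatically. The sketch below assumes the filters apply to the `GET /ask` endpoint, which accepts `q`, `top_k`, `source_type`, and `grade_filter` as query parameters; the helper name `build_ask_url` and the default base URL are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_ask_url(
    question: str,
    top_k: int = 5,
    source_type: str | None = None,
    grade_filter: str | None = None,
    base_url: str = "http://localhost:8000",
) -> str:
    """Build a GET /ask URL, adding only the filters that were provided."""
    params: dict[str, str | int] = {"q": question, "top_k": top_k}
    if source_type is not None:
        params["source_type"] = source_type  # "quran" or "hadith"
    if grade_filter is not None:
        params["grade_filter"] = grade_filter  # e.g. "sahih"
    return f"{base_url}/ask?{urlencode(params)}"


print(build_ask_url("patience", source_type="quran"))
# prints http://localhost:8000/ask?q=patience&top_k=5&source_type=quran
```

`urlencode` percent-encodes the question, so Arabic text and spaces are safe to pass as-is.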
|
| 579 |
-
|
| 580 |
-
---
|
| 581 |
-
|
| 582 |
-
## Support
|
| 583 |
-
|
| 584 |
-
For issues:
|
| 585 |
-
1. Check logs: `python main.py` (stdout)
|
| 586 |
-
2. Test endpoints: http://localhost:8000/docs
|
| 587 |
-
3. Review `/debug/scores` for retrieval quality
|
| 588 |
-
4. Check `.env` configuration
|
| 589 |
-
|
| 590 |
-
Happy querying!
|
app/routers/chat.py
CHANGED
|
@@ -1,16 +1,18 @@
|
|
| 1 |
-
"""Chat / inference endpoints - OpenAI-compatible."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import json
|
| 6 |
import logging
|
| 7 |
import time
|
|
|
|
| 8 |
|
| 9 |
-
from fastapi import APIRouter, HTTPException
|
| 10 |
from fastapi.responses import StreamingResponse
|
| 11 |
|
| 12 |
from app.config import cfg
|
| 13 |
from app.models import (
|
|
|
|
| 14 |
ChatCompletionChoice,
|
| 15 |
ChatCompletionMessage,
|
| 16 |
ChatCompletionRequest,
|
|
@@ -23,6 +25,45 @@ logger = logging.getLogger("qmodel.chat")
|
|
| 23 |
router = APIRouter(tags=["inference"])
|
| 24 |
|
| 25 |
|
| 26 |
# ───────────────────────────────────────────────────────
|
| 27 |
# POST /v1/chat/completions - OpenAI-compatible
|
| 28 |
# ───────────────────────────────────────────────────────
|
|
|
|
| 1 |
+
"""Chat / inference endpoints - OpenAI-compatible + convenience /ask."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import json
|
| 6 |
import logging
|
| 7 |
import time
|
| 8 |
+
from typing import Literal, Optional
|
| 9 |
|
| 10 |
+
from fastapi import APIRouter, HTTPException, Query
|
| 11 |
from fastapi.responses import StreamingResponse
|
| 12 |
|
| 13 |
from app.config import cfg
|
| 14 |
from app.models import (
|
| 15 |
+
AskResponse,
|
| 16 |
ChatCompletionChoice,
|
| 17 |
ChatCompletionMessage,
|
| 18 |
ChatCompletionRequest,
|
|
|
|
| 25 |
router = APIRouter(tags=["inference"])
|
| 26 |
|
| 27 |
|
| 28 |
+
# ───────────────────────────────────────────────────────
|
| 29 |
+
# GET /ask - convenience RAG query endpoint
|
| 30 |
+
# ───────────────────────────────────────────────────────
|
| 31 |
+
@router.get("/ask", response_model=AskResponse)
|
| 32 |
+
async def ask(
|
| 33 |
+
q: str = Query(..., min_length=1, max_length=500, description="Your Islamic question"),
|
| 34 |
+
top_k: int = Query(5, ge=1, le=20, description="Number of sources to retrieve"),
|
| 35 |
+
source_type: Optional[Literal["quran", "hadith"]] = Query(None, description="Filter: quran | hadith"),
|
| 36 |
+
grade_filter: Optional[str] = Query(None, description="Hadith grade filter: sahih | hasan"),
|
| 37 |
+
):
|
| 38 |
+
"""Direct RAG query with full source attribution.
|
| 39 |
+
|
| 40 |
+
Returns an AI-generated answer grounded in Quran and Hadith sources,
|
| 41 |
+
with language detection, intent classification, and scored references.
|
| 42 |
+
"""
|
| 43 |
+
check_ready()
|
| 44 |
+
result = await run_rag_pipeline(q, top_k=top_k, source_type=source_type, grade_filter=grade_filter)
|
| 45 |
+
return AskResponse(
|
| 46 |
+
question=q,
|
| 47 |
+
answer=result["answer"],
|
| 48 |
+
language=result["language"],
|
| 49 |
+
intent=result["intent"],
|
| 50 |
+
analysis=result.get("analysis"),
|
| 51 |
+
sources=[
|
| 52 |
+
{
|
| 53 |
+
"source": s.get("source") or s.get("reference", ""),
|
| 54 |
+
"type": s.get("type", ""),
|
| 55 |
+
"grade": s.get("grade"),
|
| 56 |
+
"arabic": s.get("arabic", ""),
|
| 57 |
+
"english": s.get("english", ""),
|
| 58 |
+
"_score": round(s.get("_score", 0), 4),
|
| 59 |
+
}
|
| 60 |
+
for s in result.get("sources", [])
|
| 61 |
+
],
|
| 62 |
+
top_score=round(result["top_score"], 4),
|
| 63 |
+
latency_ms=result["latency_ms"],
|
| 64 |
+
)
|
| 65 |
+
|
| 66 |
+
|
| 67 |
# ───────────────────────────────────────────────────────
|
| 68 |
# POST /v1/chat/completions - OpenAI-compatible
|
| 69 |
# ───────────────────────────────────────────────────────
|
app/routers/ops.py
CHANGED
|
@@ -1,14 +1,16 @@
|
|
| 1 |
-
"""Operational endpoints - health, models."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import time
|
|
|
|
| 6 |
|
| 7 |
-
from fastapi import APIRouter
|
| 8 |
|
| 9 |
from app.config import cfg
|
| 10 |
from app.models import ModelInfo, ModelsListResponse
|
| 11 |
-
from app.
|
|
|
|
| 12 |
|
| 13 |
router = APIRouter(tags=["ops"])
|
| 14 |
|
|
@@ -18,7 +20,7 @@ def health():
|
|
| 18 |
"""Health check endpoint."""
|
| 19 |
return {
|
| 20 |
"status": "ok" if state.ready else "initialising",
|
| 21 |
-
"version": "
|
| 22 |
"llm_backend": cfg.LLM_BACKEND,
|
| 23 |
"dataset_size": len(state.dataset) if state.dataset else 0,
|
| 24 |
"faiss_total": state.faiss_index.ntotal if state.faiss_index else 0,
|
|
@@ -35,3 +37,40 @@ def list_models():
|
|
| 35 |
ModelInfo(id="qmodel", created=int(time.time()), owned_by="elgendy"),
|
| 36 |
]
|
| 37 |
)
|
| 1 |
+
"""Operational endpoints - health, models, debug."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import time
|
| 6 |
+
from typing import Literal, Optional
|
| 7 |
|
| 8 |
+
from fastapi import APIRouter, Query
|
| 9 |
|
| 10 |
from app.config import cfg
|
| 11 |
from app.models import ModelInfo, ModelsListResponse
|
| 12 |
+
from app.search import hybrid_search, rewrite_query
|
| 13 |
+
from app.state import check_ready, state
|
| 14 |
|
| 15 |
router = APIRouter(tags=["ops"])
|
| 16 |
|
|
|
|
| 20 |
"""Health check endpoint."""
|
| 21 |
return {
|
| 22 |
"status": "ok" if state.ready else "initialising",
|
| 23 |
+
"version": "6.0.0",
|
| 24 |
"llm_backend": cfg.LLM_BACKEND,
|
| 25 |
"dataset_size": len(state.dataset) if state.dataset else 0,
|
| 26 |
"faiss_total": state.faiss_index.ntotal if state.faiss_index else 0,
|
|
|
|
| 37 |
ModelInfo(id="qmodel", created=int(time.time()), owned_by="elgendy"),
|
| 38 |
]
|
| 39 |
)
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
@router.get("/debug/scores", tags=["debug"])
|
| 43 |
+
async def debug_scores(
|
| 44 |
+
q: str = Query(..., min_length=1, max_length=500, description="Query to inspect"),
|
| 45 |
+
top_k: int = Query(10, ge=1, le=50, description="Number of results"),
|
| 46 |
+
source_type: Optional[Literal["quran", "hadith"]] = Query(None, description="Filter: quran | hadith"),
|
| 47 |
+
):
|
| 48 |
+
"""Inspect raw retrieval scores without calling the LLM.
|
| 49 |
+
|
| 50 |
+
Use this to calibrate CONFIDENCE_THRESHOLD and debug search quality.
|
| 51 |
+
"""
|
| 52 |
+
check_ready()
|
| 53 |
+
rewrite = await rewrite_query(q, state.llm)
|
| 54 |
+
results = await hybrid_search(
|
| 55 |
+
q, rewrite,
|
| 56 |
+
state.embed_model, state.faiss_index, state.dataset,
|
| 57 |
+
top_n=top_k, source_type=source_type,
|
| 58 |
+
)
|
| 59 |
+
return {
|
| 60 |
+
"query": q,
|
| 61 |
+
"intent": rewrite.get("intent", "general"),
|
| 62 |
+
"threshold": cfg.CONFIDENCE_THRESHOLD,
|
| 63 |
+
"count": len(results),
|
| 64 |
+
"results": [
|
| 65 |
+
{
|
| 66 |
+
"rank": i + 1,
|
| 67 |
+
"source": r.get("source") or r.get("reference", ""),
|
| 68 |
+
"type": r.get("type", ""),
|
| 69 |
+
"grade": r.get("grade"),
|
| 70 |
+
"_dense": round(r.get("_dense", 0), 4),
|
| 71 |
+
"_sparse": round(r.get("_sparse", 0), 4),
|
| 72 |
+
"_score": round(r.get("_score", 0), 4),
|
| 73 |
+
}
|
| 74 |
+
for i, r in enumerate(results)
|
| 75 |
+
],
|
| 76 |
+
}
|
main.py
CHANGED
|
@@ -33,7 +33,7 @@ logging.basicConfig(
|
|
| 33 |
|
| 34 |
from app.config import cfg
|
| 35 |
from app.state import lifespan
|
| 36 |
-
from app.routers import chat, ops
|
| 37 |
|
| 38 |
# ───────────────────────────────────────────────────────────────────────
|
| 39 |
# FASTAPI APP
|
|
@@ -47,7 +47,7 @@ app = FastAPI(
|
|
| 47 |
"- Streaming support\n"
|
| 48 |
"- Islamic knowledge RAG pipeline"
|
| 49 |
),
|
| 50 |
-
version="
|
| 51 |
lifespan=lifespan,
|
| 52 |
)
|
| 53 |
|
|
@@ -62,6 +62,8 @@ app.add_middleware(
|
|
| 62 |
# Register routers
|
| 63 |
app.include_router(ops.router)
|
| 64 |
app.include_router(chat.router)
|
|
|
|
|
|
|
| 65 |
|
| 66 |
|
| 67 |
if __name__ == "__main__":
|
|
|
|
| 33 |
|
| 34 |
from app.config import cfg
|
| 35 |
from app.state import lifespan
|
| 36 |
+
from app.routers import chat, hadith, ops, quran
|
| 37 |
|
| 38 |
# ───────────────────────────────────────────────────────────────────────
|
| 39 |
# FASTAPI APP
|
|
|
|
| 47 |
"- Streaming support\n"
|
| 48 |
"- Islamic knowledge RAG pipeline"
|
| 49 |
),
|
| 50 |
+
version="6.0.0",
|
| 51 |
lifespan=lifespan,
|
| 52 |
)
|
| 53 |
|
|
|
|
| 62 |
# Register routers
|
| 63 |
app.include_router(ops.router)
|
| 64 |
app.include_router(chat.router)
|
| 65 |
+
app.include_router(quran.router)
|
| 66 |
+
app.include_router(hadith.router)
|
| 67 |
|
| 68 |
|
| 69 |
if __name__ == "__main__":
|