aelgendy committed on
Commit
0eb0a7a
·
1 Parent(s): 0d33f13

Upload folder using huggingface_hub

Files changed (8)
  1. ARCHITECTURE.md +0 -334
  2. DOCKER.md +0 -443
  3. OPEN_WEBUI.md +0 -385
  4. README.md +503 -80
  5. SETUP.md +0 -590
  6. app/routers/chat.py +43 -2
  7. app/routers/ops.py +43 -4
  8. main.py +4 -2
ARCHITECTURE.md DELETED
@@ -1,334 +0,0 @@
1
- # QModel v6 Architecture: Detailed System Design
2
-
3
- > For a quick overview, see [README.md](README.md#architecture-overview)
4
-
5
- ## System Vision
6
- A RAG system specialized **exclusively** in authenticated Qur'an and Hadith. No hallucinations, no outside knowledge: only content from verified sources.
7
-
8
- ## Core Capabilities
9
-
10
- ### 1. **Quran Verse Lookup** (by partial text)
11
- - Text search: find any verse by typing part of its Arabic or English text
12
- - Exact substring + fuzzy word-overlap matching
13
-
14
- ### 2. **Quran Topic Search**
15
- - Semantic hybrid search to find verses related to any topic
16
- - Full Tafsir-aware prompting
17
-
18
- ### 3. **Quran Word Frequency & Analytics**
19
- - Count how many times a word appears across all 114 Surahs
20
- - Per-surah breakdown with example verses
21
- - Chapter-level analytics (verse count, revelation type)
22
-
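The word-frequency capability above can be illustrated with a minimal sketch. It assumes whole-word, case-insensitive matching on the `english` field of each metadata entry; the function name `count_occurrences` comes from the module reference, but its real signature and matching rules are not shown in this document, and the sample rows are placeholder text rather than real verses.

```python
import re
from collections import Counter


def count_occurrences(verses, word):
    """Count whole-word, case-insensitive matches of `word`, grouped by surah."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    per_surah = Counter()
    examples = {}  # first matching verse reference per surah
    for v in verses:
        hits = len(pattern.findall(v["english"]))
        if hits:
            per_surah[v["surah_number"]] += hits
            examples.setdefault(
                v["surah_number"], f'{v["surah_number"]}:{v["verse_number"]}'
            )
    return {
        "word": word,
        "total": sum(per_surah.values()),
        "per_surah": dict(per_surah),
        "examples": examples,
    }


# Placeholder rows (NOT real verse text), shaped like the metadata schema.
sample = [
    {"surah_number": 1, "verse_number": 1, "english": "mercy upon mercy"},
    {"surah_number": 1, "verse_number": 2, "english": "no match here"},
    {"surah_number": 2, "verse_number": 1, "english": "Mercy, again"},
]
result = count_occurrences(sample, "mercy")
```

Keeping the per-surah counts in a `Counter` makes the total a simple sum, which matches the "count per surah + total + examples" shape described for the endpoint.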
23
- ### 4. **Hadith Lookup** (by partial text)
24
- - Text search across 9 Hadith collections
25
- - Optional collection filter
26
-
27
- ### 5. **Hadith Topic Search**
28
- - Semantic hybrid search for Hadiths by topic
29
- - Optional grade filter (sahih, hasan, etc.)
30
-
31
- ### 6. **Hadith Authenticity Verification**
32
- - Dual-method verification: text search + semantic search
33
- - Grade inference from collection name when not explicitly provided
34
- - Sources: Bukhari, Muslim, Abu Dawud, Tirmidhi, Ibn Majah, Nasa'i, Malik, Ahmad, Darimi
35
-
36
- ### 7. **Safety First**
37
- - **Confidence Gating**: Low-confidence queries return "not found" instead of LLM guess
38
- - **Source Attribution**: Every answer cites exact verse/Hadith with reference
39
- - **Grade Filtering**: Optional: only return Sahih-authenticated Hadiths
40
- - **Verbatim Quotes**: Copy text directly from data, no paraphrasing
41
-
42
- ## Modular Architecture (v6)
43
-
44
- ```
45
- main.py              ← Thin launcher (73 lines)
46
- app/
47
-   config.py          ← Config class (env vars)
48
-   llm.py             ← LLM providers (Ollama, HuggingFace)
49
-   cache.py           ← TTL-LRU async cache
50
-   arabic_nlp.py      ← Arabic normalisation, stemming, language detection
51
-   search.py          ← Hybrid FAISS+BM25, text search, query rewriting
52
-   analysis.py        ← Intent detection, analytics, counting
53
-   prompts.py         ← Prompt engineering (persona, task instructions)
54
-   models.py          ← Pydantic schemas
55
-   state.py           ← AppState, lifespan, RAG pipeline
56
-   routers/
57
-     quran.py         ← 6 Quran endpoints
58
-     hadith.py        ← 5 Hadith endpoints
59
-     chat.py          ← 2 OpenAI-compatible + inference endpoints
60
-     ops.py           ← 3 operational endpoints (health, models, debug)
61
- ```
62
-
63
- ---
64
-
65
- ## Data Pipeline
66
-
67
- The system follows a three-phase approach:
68
-
69
- **Metadata Schema** (47,179 entries: 6,236 Quran + 40,943 Hadith):
70
- ```json
71
- {
72
-   "id": "surah:verse or hadith_prefix_number",
73
-   "arabic": "...",
74
-   "english": "...",
75
-   "source": "Surah Al-Baqarah 2:43 | Sahih al-Bukhari 1",
76
-   "type": "quran | hadith",
77
-
78
-   // Quran only
79
-   "surah_number": 2,
80
-   "surah_name_en": "Al-Baqarah",
81
-   "surah_name_ar": "البقرة",
82
-   "verse_number": 43,
83
-
84
-   // Hadith only
85
-   "collection": "Sahih al-Bukhari",
86
-   "grade": "Sahih",
87
-   "hadith_number": 1
88
- }
89
- ```
90
-
91
- ### Phase 2: Indexing
92
- ```
93
- build_index.py
94
- ├── Load Quran + Hadith JSON
95
- ├── Encode all texts with multilingual-e5-large
96
- │   ├── Dual embeddings: Arabic + English per item
97
- │   └── Normalize before encoding
98
- └── Build FAISS IndexFlatIP for dense retrieval
99
- ```
100
-
101
- ### Phase 3: Retrieval & Ranking
102
-
103
- **Hybrid Search Algorithm** (`app/search.py`):
104
- 1. Dense retrieval: FAISS semantic scoring
105
- 2. Sparse retrieval: BM25 term-frequency ranking
106
- 3. Fusion: 60% dense + 40% sparse
107
- 4. Intent-aware boost: +0.08 to Hadith items when intent=hadith
108
- 5. Type filter: Optional (quran_only / hadith_only / authenticated_only)
109
- 6. Phrase matching: Exact phrase + word-overlap scoring for text search
110
-
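Steps 3 to 5 of the list above can be sketched as a small scoring function. This is an illustration only: it assumes the dense and sparse scores have already been scaled to a comparable range (the document does not say how), and the names `Candidate`, `fuse`, and the `source_type` argument are hypothetical stand-ins for whatever `app/search.py` actually uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    id: str
    type: str      # "quran" or "hadith"
    dense: float   # FAISS similarity, assumed scaled to [0, 1]
    sparse: float  # BM25 score, assumed scaled to [0, 1]


def fuse(cands, intent=None, source_type=None,
         w_dense=0.6, w_sparse=0.4, hadith_boost=0.08):
    """Weighted fusion with an intent-aware Hadith boost and optional type filter."""
    ranked = []
    for c in cands:
        if source_type and c.type != source_type:
            continue  # type filter: keep only the requested source type
        score = w_dense * c.dense + w_sparse * c.sparse
        if intent == "hadith" and c.type == "hadith":
            score += hadith_boost
        ranked.append((round(score, 4), c.id))
    ranked.sort(reverse=True)  # highest fused score first
    return ranked


cands = [
    Candidate("2:153", "quran", dense=0.8, sparse=0.5),
    Candidate("bukhari_1", "hadith", dense=0.7, sparse=0.6),
]
default_order = [cid for _, cid in fuse(cands)]
hadith_order = [cid for _, cid in fuse(cands, intent="hadith")]
```

With these toy numbers the Quran item wins by a small margin until the query intent is `hadith`, at which point the +0.08 boost moves the Hadith item to the top.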
111
- ---
112
-
113
- ## Module Reference
114
-
115
- ### `app/config.py`: Configuration
116
- - `Config` dataclass with all environment variables
117
- - Singleton `cfg` instance
118
- - Loads `.env` via dotenv
119
-
120
- ### `app/llm.py`: LLM Providers
121
- - `LLMProvider` abstract base class
122
- - `OllamaProvider`: primary (3-model fallback chain)
123
- - `HuggingFaceProvider`: alternative local inference
124
- - `create_llm_provider()` factory dispatches on `LLM_BACKEND` env var
125
-
126
- ### `app/cache.py`: TTL-LRU Cache
127
- - `TTLCache` with size limit (1024) and TTL (300s)
128
- - Pre-built instances: `search_cache`, `analysis_cache`, `rewrite_cache`
129
-
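A TTL-LRU cache with the limits quoted above (1024 entries, 300 s) could look roughly like the following. This is a synchronous sketch for illustration; the real `TTLCache` is described as async and its method names are not shown here, so `get`, `set`, and the injectable `clock` parameter are assumptions.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Least-recently-used cache whose entries also expire after `ttl` seconds."""

    def __init__(self, maxsize=1024, ttl=300.0, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock          # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]     # expired: drop and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used
```

Separate instances such as `search_cache` and `rewrite_cache` would then simply be `TTLCache()` objects with their own key spaces.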
130
- ### `app/arabic_nlp.py`: Arabic NLP
131
- - `normalize_arabic()`: tashkeel removal, hamza normalization
132
- - `light_stem()`: prefix/suffix stripping
133
- - `tokenize_ar()`: Arabic-aware tokenization
134
- - `detect_language()` / `language_instruction()`: route persona by language
135
-
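To make "tashkeel removal, hamza normalization" concrete, here is a minimal sketch. It assumes the tashkeel marks of interest are the Unicode range U+064B to U+0652 and that hamza normalization means folding the alif variants into bare alif; the project's actual `normalize_arabic()` may cover more characters.

```python
import re

# Tashkeel (short-vowel and related marks): U+064B .. U+0652
_TASHKEEL = re.compile("[\u064B-\u0652]")

# Assumption: alif with hamza above/below and alif madda fold to bare alif (U+0627)
_HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
})


def normalize_arabic(text: str) -> str:
    """Strip tashkeel, then fold hamza-carrying alif forms to bare alif."""
    return _TASHKEEL.sub("", text).translate(_HAMZA_MAP)


# "bismi" written with vowel marks becomes the bare three-letter form.
vocalized = "\u0628\u0650\u0633\u0652\u0645\u0650"
bare = normalize_arabic(vocalized)
```

Normalizing both the indexed text and the query the same way is what lets a user's unvocalized input match fully vocalized source text.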
136
- ### `app/search.py`: Retrieval Engine
137
- - `rewrite_query()`: dual-language normalization, LLM-assisted rewriting
138
- - `hybrid_search()`: FAISS + BM25 fusion with intent-aware boosting
139
- - `text_search()`: exact substring + word-overlap matching (for verse/hadith lookup by partial text)
140
- - `build_context()`: format retrieved items for LLM prompt
141
-
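The "exact substring + word-overlap" idea behind `text_search()` can be sketched as below. The scoring scale (1.0 for a substring hit, otherwise the fraction of query words found) and the `min_score` cut-off are assumptions made for the example, not values taken from the project.

```python
def text_match_score(query: str, text: str) -> float:
    """1.0 for an exact substring match, else the share of query words present."""
    q, t = query.lower().strip(), text.lower()
    if not q:
        return 0.0
    if q in t:
        return 1.0
    q_words, t_words = set(q.split()), set(t.split())
    return len(q_words & t_words) / len(q_words)


def text_search(query, items, limit=5, min_score=0.5):
    """Rank items by match score on their `english` field (hypothetical cut-off)."""
    scored = [(text_match_score(query, it["english"]), it["id"]) for it in items]
    hits = [s for s in scored if s[0] >= min_score]
    return sorted(hits, key=lambda s: -s[0])[:limit]


items = [
    {"id": "a", "english": "actions are judged by intentions"},
    {"id": "b", "english": "judged by their actions alone"},
    {"id": "c", "english": "unrelated words"},
]
hits = text_search("judged by intentions", items)
```

Item `a` contains the query verbatim, `b` shares two of the three query words, and `c` falls below the cut-off.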
142
- ### `app/analysis.py`: Analytics & Intent Detection
143
- - `detect_analysis_intent()`: identifies count / analytics / chapter queries
144
- - `count_occurrences()`: word frequency across all Surahs
145
- - `get_quran_analytics()`: chapter-level stats
146
- - `get_hadith_analytics()`: collection-level stats
147
- - `get_chapter_info()`: single Surah metadata
148
- - `get_verse()`: exact verse by surah:ayah
149
- - `detect_surah_info()` / `lookup_surah_info()`: Surah name resolution
150
-
151
- ### `app/prompts.py`: Prompt Engineering
152
- - `PERSONA`: Islamic scholar persona definition
153
- - `TASK_INSTRUCTIONS`: verbatim-quoting, anti-hallucination rules
154
- - `FORMAT_RULES`: citation box format
155
- - `build_messages()`: intent-aware system + user message construction
156
- - `not_found_answer()`: safe "not in dataset" response
157
-
158
- ### `app/models.py`: Pydantic Schemas
159
- All request/response models:
160
- - `ChatMessage`, `ChatCompletionRequest/Response/Choice`: OpenAI-compatible
161
- - `AskResponse`, `AnalysisResult`, `SourceItem`: RAG pipeline
162
- - `HadithVerifyResponse`: authenticity verification
163
- - `VerseItem`, `HadithItem`, `TextSearchResponse`: text search
164
- - `ChapterResponse`, `QuranAnalyticsResponse`, `HadithAnalyticsResponse`: analytics
165
- - `WordFrequencyResponse`: word counting
166
- - `ModelInfo`, `ModelsListResponse`: OpenAI models list
167
-
168
- ### `app/state.py`: Application State & Lifecycle
169
- - `AppState`: holds FAISS index, metadata, embedder, LLM provider
170
- - `lifespan()`: async startup (loads index, model, metadata)
171
- - `check_ready()`: dependency guard for endpoints
172
- - `run_rag_pipeline()`: full RAG (rewrite → search → context → LLM → response)
173
- - `infer_hadith_grade()`: grade detection from collection name
174
-
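Grade detection from the collection name might work along these lines. This is a hedged sketch: the mapping below covers only the two collections conventionally titled "Sahih" and is an assumption for illustration; the document does not list which grades the real `infer_hadith_grade()` assigns to the other collections.

```python
from typing import Optional

# Assumption: only Bukhari and Muslim are mapped; everything else stays ungraded.
_COLLECTION_GRADES = {"bukhari": "Sahih", "muslim": "Sahih"}


def infer_hadith_grade(collection: str,
                       explicit_grade: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit grade; otherwise infer one from the collection name."""
    if explicit_grade:
        return explicit_grade
    name = collection.lower()
    for key, grade in _COLLECTION_GRADES.items():
        if key in name:
            return grade
    return None  # unknown: callers should not present this as authenticated


inferred = infer_hadith_grade("Sahih al-Bukhari")
```

Returning `None` rather than a default grade keeps the grade filter conservative when the collection is not recognized.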
175
- ---
176
-
177
- ## API Endpoints (16 total)
178
-
179
- ### Quran Router (`/quran/...`): 6 endpoints
180
-
181
- | Endpoint | Method | Description |
182
- |----------|--------|-------------|
183
- | `/quran/search?q=...` | GET | Text search: find verses by partial Arabic/English text |
184
- | `/quran/topic?q=...&top_k=5` | GET | Semantic search: find verses related to a topic |
185
- | `/quran/word-frequency?word=...` | GET | Count word occurrences across all Surahs |
186
- | `/quran/analytics` | GET | Overall Quran stats (total verses, Surahs, types) |
187
- | `/quran/chapter/{number}` | GET | Single Surah metadata (name, verse count, type) |
188
- | `/quran/verse/{surah}:{ayah}` | GET | Exact verse lookup by reference |
189
-
190
- ### Hadith Router (`/hadith/...`): 5 endpoints
191
-
192
- | Endpoint | Method | Description |
193
- |----------|--------|-------------|
194
- | `/hadith/search?q=...&collection=...` | GET | Text search across collections |
195
- | `/hadith/topic?q=...&top_k=5&grade=...` | GET | Semantic search by topic with optional grade filter |
196
- | `/hadith/verify?q=...` | GET | Authenticity verification (text + semantic search) |
197
- | `/hadith/collection/{name}?limit=20` | GET | Browse a specific collection |
198
- | `/hadith/analytics` | GET | Collection-level statistics |
199
-
200
- ### Chat Router: 2 endpoints
201
-
202
- | Endpoint | Method | Description |
203
- |----------|--------|-------------|
204
- | `/v1/chat/completions` | POST | OpenAI-compatible chat (SSE streaming supported) |
205
- | `/ask?q=...&top_k=5` | GET | Direct RAG query with full source attribution |
206
-
207
- ### Ops Router: 3 endpoints
208
-
209
- | Endpoint | Method | Description |
210
- |----------|--------|-------------|
211
- | `/health` | GET | Readiness check |
212
- | `/v1/models` | GET | OpenAI-compatible model listing |
213
- | `/debug/scores?q=...&top_k=10` | GET | Raw retrieval scores (no LLM call) |
214
-
215
- ---
216
-
217
- ## Anti-Hallucination Measures
218
-
219
- - Few-shot examples including "not found" refusal path
220
- - Hardcoded format rules (box/citation format required)
221
- - Verbatim copy rules (no reconstruction from memory)
222
- - Confidence threshold gating (default: 0.30)
223
- - Grade inference for Hadith authenticity (collection-based)
224
-
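The confidence gate listed above is the simplest of these measures to show in code. In this sketch the LLM is only invoked when the best retrieval score reaches the threshold; the function name `answer`, the `generate` callback, and the result shape are hypothetical, while the 0.30 default and the refusal text come from this document.

```python
NOT_FOUND = "Not in available dataset"


def answer(query, results, generate, threshold=0.30):
    """Call `generate` only when the top retrieval score clears the threshold."""
    top = max((r["score"] for r in results), default=0.0)
    if top < threshold:
        # Refuse without touching the LLM, so nothing can be invented.
        return {"answer": NOT_FOUND, "llm_called": False, "top_score": top}
    return {"answer": generate(query, results), "llm_called": True, "top_score": top}


calls = []


def fake_llm(query, results):
    """Stand-in for the real provider; records that it was invoked."""
    calls.append(query)
    return "grounded answer"


low = answer("off-topic question", [{"score": 0.15}], fake_llm)
high = answer("patience", [{"score": 0.876}], fake_llm)
```

Raising `CONFIDENCE_THRESHOLD` trades coverage for safety: more queries are refused, but fewer weakly grounded prompts reach the model.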
225
- ---
226
-
227
- ## Configuration
228
-
229
- **`.env` variables**:
230
- ```
231
- OLLAMA_HOST # Ollama server URL
232
- LLM_MODEL # Primary model (e.g. minimax-m2.7:cloud)
233
- LLM_BACKEND # "ollama" (default) or "huggingface"
234
- EMBED_MODEL # Embedding model (intfloat/multilingual-e5-large)
235
- FAISS_INDEX # Path to QModel.index
236
- METADATA_FILE # Path to metadata.json
237
- CONFIDENCE_THRESHOLD # Min hybrid score for LLM call (default: 0.30)
238
- HADITH_BOOST # Intent-aware boost for Hadith (default: 0.08)
239
- TOP_K_SEARCH # Retrieval candidate pool (default: 20)
240
- TOP_K_RETURN # Results returned to user (default: 5)
241
- TEMPERATURE # LLM creativity (default: 0.2 for factual)
242
- ```
243
-
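As an illustration of how these variables might be read, the sketch below loads a subset of them into a frozen dataclass using the defaults quoted above. The field names and the `load_config` helper are assumptions; the real `Config` class also loads `.env` via dotenv, which is omitted here to keep the example dependency-free.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Config:
    llm_backend: str
    confidence_threshold: float
    hadith_boost: float
    top_k_search: int
    top_k_return: int
    temperature: float


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from `env`, falling back to the documented defaults."""
    return Config(
        llm_backend=env.get("LLM_BACKEND", "ollama"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.30")),
        hadith_boost=float(env.get("HADITH_BOOST", "0.08")),
        top_k_search=int(env.get("TOP_K_SEARCH", "20")),
        top_k_return=int(env.get("TOP_K_RETURN", "5")),
        temperature=float(env.get("TEMPERATURE", "0.2")),
    )


defaults = load_config({})
overridden = load_config({"TOP_K_RETURN": "3", "CONFIDENCE_THRESHOLD": "0.40"})
```

Passing the environment as a mapping makes the loader easy to test without mutating `os.environ`.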
244
- ---
245
-
246
- ## Deployment
247
-
248
- ### Local Development
249
- ```bash
250
- python main.py
251
- # API at http://localhost:8000
252
- # Docs at http://localhost:8000/docs
253
- ```
254
-
255
- ### Docker
256
- ```bash
257
- docker-compose up
258
- # Ollama on port 11434
259
- # QModel on port 8000
260
- ```
261
-
262
- ---
263
-
264
- ## Testing Examples
265
-
266
- ### 1. Quran Verse Lookup (Capability 1)
267
- ```bash
268
- curl "http://localhost:8000/quran/search?q=bismillah"
269
- ```
270
-
271
- ### 2. Quran Topic Search (Capability 2)
272
- ```bash
273
- curl "http://localhost:8000/quran/topic?q=patience&top_k=5"
274
- ```
275
-
276
- ### 3. Word Frequency (Capability 3)
277
- ```bash
278
- curl "http://localhost:8000/quran/word-frequency?word=mercy"
279
- # → Returns: count per surah + total + examples
280
- ```
281
-
282
- ### 4. Quran Analytics (Capability 3)
283
- ```bash
284
- curl "http://localhost:8000/quran/analytics"
285
- curl "http://localhost:8000/quran/chapter/2"
286
- ```
287
-
288
- ### 5. Hadith Text Search (Capability 4)
289
- ```bash
290
- curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"
291
- ```
292
-
293
- ### 6. Hadith Topic Search (Capability 5)
294
- ```bash
295
- curl "http://localhost:8000/hadith/topic?q=fasting&grade=sahih"
296
- ```
297
-
298
- ### 7. Hadith Authenticity Verification (Capability 6)
299
- ```bash
300
- curl "http://localhost:8000/hadith/verify?q=Actions+are+judged+by+intentions"
301
- # → Returns: found=true, grade="Sahih", source="Sahih al-Bukhari 1"
302
- ```
303
-
304
- ### 8. Confidence Gate in Action (Safety)
305
- ```
306
- Q: "Who was Muhammad's 7th wife?" (not in dataset)
307
- → Retrieval score: 0.15 (below 0.30 threshold)
308
- → Returns: "Not in available dataset"
309
- → LLM not called (prevents hallucination)
310
- ```
311
-
312
- ### 9. OpenAI-Compatible Chat (Streaming)
313
- ```bash
314
- curl -X POST http://localhost:8000/v1/chat/completions \
315
- -H "Content-Type: application/json" \
316
- -d '{"model":"qmodel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
317
- ```
318
-
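On the client side, a streamed reply can be reassembled from the SSE lines. The sketch below assumes the server follows the usual OpenAI streaming shape (`data: {json}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`); the document only states that SSE streaming is supported, so the exact chunk format is an assumption, and the sample lines are made up.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


# Made-up stream for illustration only.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Charity "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "is"}}]}',
    "data: [DONE]",
    'data: {"choices": [{"index": 0, "delta": {"content": "ignored"}}]}',
]
text = "".join(iter_stream_text(sample_stream))
```

In practice `lines` would be the decoded line iterator of an HTTP response opened with streaming enabled.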
319
- ---
320
-
321
- ## Roadmap: v6+ Enhancements
322
-
323
- - [x] Grade-based filtering: `?grade=sahih` to return only authenticated Hadiths
324
- - [x] Streaming responses: SSE for long-form answers
325
- - [x] Modular architecture: Separate routers, models, and services
326
- - [x] Dual LLM backend: Ollama + HuggingFace support
327
- - [x] Text search: Exact substring + fuzzy word-overlap matching
328
- - [x] Expanded endpoints: 16 endpoints across 4 routers
329
- - [ ] Chain of narrators: Display Isnad with full narrator details
330
- - [ ] Synonym expansion: Better topic matching (e.g., "mercy" → "rahma, compassion")
331
- - [ ] Multi-Surah topics: Topics spanning multiple Surahs
332
- - [ ] Batch processing: Handle multiple questions in one request
333
- - [ ] Islamic calendar integration: Hijri date references
334
- - [ ] Tafsir integration: Dedicated Tafsir endpoint with scholar citations
DOCKER.md DELETED
@@ -1,443 +0,0 @@
1
- # QModel Docker Guide
2
-
3
- Complete guide for running QModel in Docker with both backend options.
4
-
5
- ## Quick Start
6
-
7
- ### Option 1: Docker Compose (Recommended)
8
-
9
- ```bash
10
- # 1. Copy example config
11
- cp .env.example .env
12
-
13
- # 2. Edit .env and choose your backend (see below)
14
- nano .env
15
-
16
- # 3. Run with compose
17
- docker-compose up
18
- ```
19
-
20
- API available at: `http://localhost:8000`
21
-
22
- ### Option 2: Docker CLI
23
-
24
- ```bash
25
- # Build image
26
- docker build -t qmodel .
27
-
28
- # Run with Ollama backend
29
- docker run -p 8000:8000 \
30
- --env-file .env \
31
- --add-host host.docker.internal:host-gateway \
32
- qmodel
33
-
34
- # Or run with HuggingFace backend
35
- docker run -p 8000:8000 \
36
- --env-file .env \
37
- --env HF_TOKEN=your_token_here \
38
- qmodel
39
- ```
40
-
41
- ---
42
-
43
- ## Backend Configuration
44
-
45
- Configure which backend to use via `.env` file:
46
-
47
- ### Backend 1: Ollama (Local)
48
-
49
- **Best for**: Development, testing, Docker Desktop
50
-
51
- ```bash
52
- # .env
53
- LLM_BACKEND=ollama
54
- OLLAMA_HOST=http://host.docker.internal:11434
55
- OLLAMA_MODEL=llama2
56
- ```
57
-
58
- **Prerequisites**:
59
- - Ollama installed on host machine
60
- - Running: `ollama serve`
61
- - Model pulled: `ollama pull llama2`
62
-
63
- **Why**:
64
- - ✅ Fast setup
65
- - ✅ No GPU required
66
- - ✅ Works on Docker Desktop (Mac/Windows)
67
- - ❌ Requires host Ollama service
68
-
69
- ### Backend 2: HuggingFace (Remote)
70
-
71
- **Best for**: Production, GPU servers, containerized environments
72
-
73
- ```bash
74
- # .env
75
- LLM_BACKEND=hf
76
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
77
- HF_DEVICE=auto
78
- ```
79
-
80
- **Prerequisites**:
81
- - GPU (recommended) OR significant RAM
82
- - HuggingFace token (for gated models)
83
-
84
- **Passing HF Token**:
85
- ```bash
86
- # Via docker-compose
87
- export HF_TOKEN=your_token_here
88
- docker-compose up
89
-
90
- # Via docker run
91
- docker run -p 8000:8000 \
92
- --env-file .env \
93
- --env HF_TOKEN=your_token_here \
94
- qmodel
95
- ```
96
-
97
- ---
98
-
99
- ## Docker Compose Configuration
100
-
101
- The `docker-compose.yml` includes:
102
-
103
- | Setting | Value | Description |
104
- |---------|-------|-------------|
105
- | **Image** | Builds from `Dockerfile` | Python 3.11 + dependencies |
106
- | **Port** | `8000:8000` | API port mapping |
107
- | **Env File** | `.env` | Configuration source |
108
- | **HF Token** | From `.env` or `${HF_TOKEN}` | For HuggingFace auth |
109
- | **Ollama Host** | `host.docker.internal:11434` | Connect to host Ollama |
110
- | **Volumes** | `.:/app` | Code changes sync (dev mode) |
111
- | **HF Cache** | `/root/.cache/huggingface` | Persistent model cache |
112
- | **Networks** | `qmodel-network` | Internal network |
113
- | **Health Check** | `/health` endpoint | Auto-restart on failure |
114
-
115
- ### For Production
116
-
117
- Modify `docker-compose.yml`:
118
- ```yaml
119
- services:
120
-   qmodel:
121
-     # ... (same as above)
122
-     volumes:
123
-       # Remove live code volume
124
-       - huggingface_cache:/root/.cache/huggingface
125
-     restart: on-failure:5
126
- ```
127
-
128
- ---
129
-
130
- ## Examples
131
-
132
- ### Development with Ollama
133
-
134
- ```bash
135
- # Terminal 1: Start Ollama
136
- ollama serve
137
-
138
- # Terminal 2: Run QModel
139
- cat > .env << EOF
140
- LLM_BACKEND=ollama
141
- OLLAMA_HOST=http://host.docker.internal:11434
142
- OLLAMA_MODEL=llama2
143
- TEMPERATURE=0.2
144
- CONFIDENCE_THRESHOLD=0.30
145
- EOF
146
-
147
- docker-compose up
148
- ```
149
-
150
- Access: `http://localhost:8000`
151
-
152
- ### Production with HuggingFace
153
-
154
- ```bash
155
- # Create .env for production
156
- cat > .env << EOF
157
- LLM_BACKEND=hf
158
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
159
- HF_DEVICE=auto
160
- TEMPERATURE=0.1
161
- CONFIDENCE_THRESHOLD=0.35
162
- ALLOWED_ORIGINS=yourdomain.com
163
- EOF
164
-
165
- # Export HF token
166
- export HF_TOKEN=hf_xxxxxxxxxxxxx
167
-
168
- # Run
169
- docker-compose up -d
170
- docker-compose logs -f
171
- ```
172
-
173
- ### Detached Mode
174
-
175
- ```bash
176
- # Run in background
177
- docker-compose up -d
178
-
179
- # View logs
180
- docker-compose logs -f
181
-
182
- # Check status
183
- docker-compose ps
184
-
185
- # Stop
186
- docker-compose down
187
- ```
188
-
189
- ---
190
-
191
- ## Troubleshooting
192
-
193
- ### "Cannot connect to Ollama"
194
-
195
- **Symptom**: `ConnectionRefusedError` when using Ollama backend
196
-
197
- **Solution**:
198
- ```bash
199
- # Ensure Ollama is running on host
200
- ollama serve
201
-
202
- # Verify in Docker container
203
- docker run --add-host host.docker.internal:host-gateway qmodel \
204
- python -c "import requests; print(requests.get('http://host.docker.internal:11434/api/tags').json())"
205
- ```
206
-
207
- ### "HuggingFace model not found"
208
-
209
- **Symptom**: `OSError: ... not found`
210
-
211
- **Solution**:
212
- ```bash
213
- # Check HF token is set
214
- echo $HF_TOKEN
215
-
216
- # If not set, export it
217
- export HF_TOKEN=hf_xxxxxxxxxxxxx
218
- docker-compose up
219
- ```
220
-
221
- ### "Out of memory"
222
-
223
- **Symptom**: Container exits with no error message
224
-
225
- **Solution**:
226
- - Use smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
227
- - Use Ollama with `neural-chat` model
228
- - Increase Docker memory limits:
229
-
230
- ```bash
231
- # Edit docker-compose.yml
232
- services:
233
-   qmodel:
234
-     deploy:
235
-       resources:
236
-         limits:
237
-           memory: 16G
238
- ```
239
-
240
- ### "Port already in use"
241
-
242
- **Symptom**: `Address already in use`
243
-
244
- **Solution**:
245
- ```bash
246
- # Change port in docker-compose.yml
247
- ports:
248
-   - "8001:8000"
249
-
250
- # Or kill existing container
251
- docker-compose down
252
- docker system prune
253
- ```
254
-
255
- ---
256
-
257
- ## Building Custom Images
258
-
259
- ### Build for Specific Backend
260
-
261
- No code changes needed - just use `.env` to configure.
262
-
263
- ### Build with Custom Requirements
264
-
265
- ```bash
266
- # Edit requirements.txt, then rebuild
267
- docker build -t qmodel:custom .
268
- ```
269
-
270
- ### Push to Registry
271
-
272
- ```bash
273
- # Tag for registry
274
- docker tag qmodel myregistry/qmodel:v6.1
275
-
276
- # Push
277
- docker push myregistry/qmodel:v6.1
278
-
279
- # Run from registry
280
- docker run -p 8000:8000 \
281
- --env-file .env \
282
- myregistry/qmodel:v6.1
283
- ```
284
-
285
- ---
286
-
287
- ## Performance Tips
288
-
289
- ### Docker Compose with GPU (Linux)
290
-
291
- ```yaml
292
- services:
293
-   qmodel:
294
-     deploy:
295
-       resources:
296
-         reservations:
297
-           devices:
298
-             - driver: nvidia
299
-               count: 1
300
-               capabilities: [gpu]
301
- ```
302
-
303
- Then set in `.env`:
304
- ```bash
305
- HF_DEVICE=cuda
306
- ```
307
-
308
- ### Reduce Memory Usage
309
-
310
- ```bash
311
- # In .env
312
- HF_MODEL_NAME=gpt2 # Tiny model
313
- OLLAMA_MODEL=orca-mini # Smaller Ollama model
314
- TOP_K_SEARCH=10 # Fewer candidates
315
- ```
316
-
317
- ### Cache Management
318
-
319
- ```bash
320
- # Clear HuggingFace cache
321
- docker-compose down
322
- docker volume rm qmodel_huggingface_cache
323
-
324
- # Or cleanup all
325
- docker system prune -a
326
- ```
327
-
328
- ---
329
-
330
- ## Docker Networking
331
-
332
- ### Access QModel from Host
333
-
334
- ```bash
335
- # Default (works)
336
- curl http://localhost:8000/health
337
- ```
338
-
339
- ### Custom Network
340
-
341
- ```bash
342
- # Create network
343
- docker network create qmodel-net
344
-
345
- # Run with network
346
- docker-compose -f docker-compose.yml up
347
- ```
348
-
349
- ### Multiple Containers
350
-
351
- ```yaml
352
- # docker-compose.yml
353
- services:
354
-   qmodel:
355
-     networks:
356
-       - custom-network
357
-   other-service:
358
-     networks:
359
-       - custom-network
360
-
361
- networks:
362
-   custom-network:
363
-     driver: bridge
364
- ```
365
-
366
- ---
367
-
368
- ## CI/CD Integration
369
-
370
- ### GitHub Actions Example
371
-
372
- ```yaml
373
- name: Deploy QModel
374
-
375
- on: [push]
376
-
377
- jobs:
378
-   deploy:
379
-     runs-on: ubuntu-latest
380
-     steps:
381
-       - uses: actions/checkout@v2
382
-
383
-       - name: Build Docker image
384
-         run: docker build -t qmodel .
385
-
386
-       - name: Run tests
387
-         run: |
388
-           docker run -p 8000:8000 qmodel &
389
-           sleep 30
390
-           curl http://localhost:8000/health
391
-
392
-       - name: Push to registry
393
-         run: |
394
-           echo "${{ secrets.REGISTRY_TOKEN }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin
395
-           docker tag qmodel myregistry/qmodel:${{ github.sha }}
396
-           docker push myregistry/qmodel:${{ github.sha }}
397
- ```
398
-
399
- ---
400
-
401
- ## Security Considerations
402
-
403
- ### Secrets Management
404
-
405
- ```bash
406
- # Don't commit .env with real tokens
407
- echo ".env" >> .gitignore
408
-
409
- # Use Docker secrets (Swarm mode)
410
- docker secret create hf_token -
411
- # Then use in compose:
412
- # HF_TOKEN=${HF_TOKEN_FILE}
413
- ```
414
-
415
- ### CORS Configuration
416
-
417
- ```bash
418
- # In .env (restrict in production)
419
- ALLOWED_ORIGINS=yourdomain.com,api.yourdomain.com
420
- ```
421
-
422
- ### Network Isolation
423
-
424
- ```yaml
425
- # docker-compose.yml
426
- services:
427
-   qmodel:
428
-     networks:
429
-       - internal
430
-
431
- networks:
432
-   internal:
433
-     internal: true
434
- ```
435
-
436
- ---
437
-
438
- ## Reference
439
-
440
- - **Dockerfile**: Multi-stage build, health checks, proper layer caching
441
- - **docker-compose.yml**: Service definition, volumes, networking, health checks
442
- - **Environment**: Fully configurable via `.env`
443
- - **Backends**: Ollama (local) or HuggingFace (remote) via `LLM_BACKEND` variable
OPEN_WEBUI.md DELETED
@@ -1,385 +0,0 @@
1
- # Using QModel v6 with Open-WebUI
2
-
3
- QModel v6 is fully compatible with **Open-WebUI** thanks to its OpenAI-compatible API endpoints. This guide shows you how to integrate them.
4
-
5
- ## Prerequisites
6
-
7
- 1. **QModel running** on your local machine or server
8
- ```bash
9
- python main.py
10
- # Runs on http://localhost:8000
11
- ```
12
-
13
- 2. **Open-WebUI installed** (Docker recommended)
14
- ```bash
15
- docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
16
- # Runs on http://localhost:3000
17
- ```
18
-
19
- ---
20
-
21
- ## Integration Steps
22
-
23
- ### Step 1: Add QModel as a Custom OpenAI-Compatible Model
24
-
25
- In Open-WebUI:
26
-
27
- 1. **Settings** → **Models** → **Manage Models**
28
- 2. Click **"Connect to OpenAI-compatible API"**
29
- 3. Enter:
30
- - **API Base URL**: `http://localhost:8000/v1`
31
- - **Model Name**: `QModel` (or `qmodel`)
32
- - **API Key**: Leave blank (no auth required)
33
-
34
- 4. Click **"Save & Test"**
35
- 5. You should see: ✅ **Model connected successfully**
36
-
37
- ### Step 2: Start Using QModel
38
-
39
- 1. Open a **New Chat** in Open-WebUI
40
- 2. Select **QModel** from the model dropdown
41
- 3. Type your Islamic question:
42
- ```
43
- What does the Quran say about mercy?
44
- ```
45
-
46
- 4. Press Enter and get an Islamic-grounded RAG response with sources!
47
-
48
- ---
49
-
50
- ## API Endpoints (OpenAI-Compatible)
51
-
52
- ### POST `/v1/chat/completions`
53
- Standard OpenAI chat completions endpoint.
54
-
55
- **Request:**
56
- ```json
57
- {
58
- "model": "QModel",
59
- "messages": [
60
- {"role": "user", "content": "What does Islam say about patience?"}
61
- ],
62
- "temperature": 0.2,
63
- "max_tokens": 2048,
64
- "top_k": 5,
65
- "stream": false
66
- }
67
- ```
68
-
69
- **Response:**
70
- ```json
71
- {
72
- "id": "qmodel-1234567890",
73
- "object": "chat.completion",
74
- "created": 1234567890,
75
- "model": "QModel",
76
- "choices": [
77
- {
78
- "index": 0,
79
- "message": {
80
- "role": "assistant",
81
- "content": "Islam emphasizes patience as a core virtue..."
82
- },
83
- "finish_reason": "stop"
84
- }
85
- ],
86
- "x_metadata": {
87
- "language": "english",
88
- "intent": "general",
89
- "top_score": 0.876,
90
- "latency_ms": 342,
91
- "sources": [
92
- {
93
- "source": "Surah Al-Imran 3:200",
94
- "type": "quran",
95
- "grade": null,
96
- "score": 0.876
97
- }
98
- ]
99
- }
100
- }
101
- ```
102
-
103
- ### GET `/v1/models`
104
- List available models.
105
-
106
- **Response:**
107
- ```json
108
- {
109
- "object": "list",
110
- "data": [
111
- {
112
- "id": "QModel",
113
- "object": "model",
114
- "created": 1234567890,
115
- "owned_by": "elgendy"
116
- }
117
- ]
118
- }
119
- ```
120
-
121
- ---
122
-
123
- ## Advanced Query Parameters (Open-WebUI Compatible)
124
-
125
- When using Open-WebUI, you can include special parameters:
126
-
127
- ### Islamic-Specific Parameters
128
-
129
- **URL Query String:**
130
- ```
131
- /v1/chat/completions?source_type=hadith&grade_filter=sahih&top_k=5
132
- ```
133
-
134
- **Supported Parameters:**
135
- - `source_type`: `quran` | `hadith` | (both, default)
136
- - `grade_filter`: `sahih` | `hasan` | (all, default)
137
- - `top_k`: 1-20 (number of sources to retrieve)
138
-
139
- ### Example Requests via curl
140
-
141
- ```bash
142
- # 1. Basic query (both Quran + Hadith)
143
- curl -X POST http://localhost:8000/v1/chat/completions \
144
- -H "Content-Type: application/json" \
145
- -d '{
146
- "model": "QModel",
147
- "messages": [{"role": "user", "content": "What does Islam say about mercy?"}]
148
- }'
149
-
150
- # 2. Quran-only query
151
- curl -X POST "http://localhost:8000/v1/chat/completions?source_type=quran" \
152
- -H "Content-Type: application/json" \
153
- -d '{
154
- "model": "QModel",
155
- "messages": [{"role": "user", "content": "What does the Quran say about patience?"}]
156
- }'
157
-
158
- # 3. Authenticated Hadiths only (Sahih grade)
159
- curl -X POST "http://localhost:8000/v1/chat/completions?source_type=hadith&grade_filter=sahih" \
160
- -H "Content-Type: application/json" \
161
- -d '{
162
- "model": "QModel",
163
- "messages": [{"role": "user", "content": "Hadiths about prayer"}]
164
- }'
165
-
166
- # 4. Streaming response
167
- curl -X POST http://localhost:8000/v1/chat/completions \
168
- -H "Content-Type: application/json" \
169
- -d '{
170
- "model": "QModel",
171
- "messages": [{"role": "user", "content": "Tell me about Zakat"}],
172
- "stream": true
173
- }'
174
- ```
175
-
176
- ---
177
-
178
- ## Open-WebUI Features Supported
179
-
180
- | Feature | Status | Notes |
181
- |---------|--------|-------|
182
- | **Chat** | ✅ Full support | Normal Q&A |
183
- | **Streaming** | ✅ Supported | Set `stream: true` in request |
184
- | **Context** | ✅ Multi-turn | Open-WebUI handles conversation history |
185
- | **Temperature** | ✅ Configurable | Via Open-WebUI settings |
186
- | **Token Limits** | ✅ Supported | Via `max_tokens` parameter |
187
- | **Model List** | ✅ Available | Via `/v1/models` endpoint |
188
- | **Source Attribution** | ✅ In metadata | Via `x_metadata.sources` |
189
-
190
- ---
191
-
192
- ## Custom System Prompts in Open-WebUI
193
-
194
- To customize QModel for specific Islamic tasks, create a custom chatbot in Open-WebUI:
195
-
196
- 1. **Home** → **+ New Chatbot**
197
- 2. Configure:
198
- - **Name**: "Islamic Scholar" (or your choice)
199
- - **Model**: QModel
200
- - **System Prompt**:
201
- ```
202
- You are an expert Islamic scholar specializing in Qur'an and Hadith.
203
- Always cite sources exactly as provided.
204
- Only answer from the provided Islamic context; never use outside knowledge.
205
- If information is not in the dataset, say so clearly.
206
- ```
207
- - **Top K Sources**: 5
208
- - **Temperature**: 0.1 (for consistency)
209
-
210
- 3. **Save** and start chatting!
211
-
212
- ---
213
-
214
- ## Troubleshooting
215
-
216
- ### Issue: "Failed to connect to QModel"
217
-
218
- **Solutions:**
219
- 1. Check QModel is running: `curl http://localhost:8000/health`
220
- 2. Verify API Base URL is correct: `http://localhost:8000/v1`
221
- 3. Check firewall: Port 8000 must be accessible
222
- 4. Check logs: `python main.py` to see startup messages
223
-
224
- ### Issue: "No sources in response"
225
-
226
- **Solutions:**
227
- 1. Check `/debug/scores` endpoint directly:
228
- ```bash
229
- curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
230
- ```
231
- 2. Adjust `CONFIDENCE_THRESHOLD` in `.env` if retrievals are low-quality
232
- 3. Try synonyms: "mercy" instead of "compassion"
233
-
234
- ### Issue: "Assistant returns 'Not found'"
235
-
236
- **This is expected behavior!** QModel has safety checks:
237
- 1. If retrieval score is too low (< 0.30), it returns "not found"
238
- 2. This prevents hallucinations
239
- 3. Try more specific queries or adjust `CONFIDENCE_THRESHOLD`
240
-
241
- ---
242
-
243
- ## Configuration for Open-WebUI
244
-
245
- ### Recommended Settings
246
-
247
- For best results with Open-WebUI:
248
-
249
- ```env
250
- # More conservative (fewer hallucinations)
251
- CONFIDENCE_THRESHOLD=0.40
252
- TEMPERATURE=0.1
253
- HADITH_BOOST=0.08
254
-
255
- # More liberal (more answers, higher hallucination risk)
256
- CONFIDENCE_THRESHOLD=0.20
257
- TEMPERATURE=0.3
258
- HADITH_BOOST=0.05
259
- ```
260
-
261
- ### Docker Compose Integration
262
-
263
- To run both QModel and Open-WebUI together:
264
-
265
- ```yaml
266
- version: '3.8'
267
- services:
268
- qmodel:
269
- build: .
270
- ports:
271
- - "8000:8000"
272
- environment:
273
- - LLM_BACKEND=ollama
274
- - OLLAMA_HOST=http://ollama:11434
275
- depends_on:
276
- - ollama
277
-
278
- ollama:
279
- image: ollama/ollama:latest
280
- ports:
281
- - "11434:11434"
282
-
283
- web-ui:
284
- image: ghcr.io/open-webui/open-webui:latest
285
- ports:
286
- - "3000:8080"
287
- depends_on:
288
- - qmodel
289
- ```
290
-
291
- Run: `docker-compose up`
292
-
293
- ---
294
-
295
- ## Using QModel in Open-WebUI Workflows
296
-
297
- ### Example 1: Islamic Q&A Chatbot
298
-
299
- 1. Create chatbot with system prompt about Islamic knowledge
300
- 2. Select QModel as backend
301
- 3. Set temperature to 0.1 for consistency
302
- 4. Enable web search toggle (optional, for cross-verification)
303
-
304
- ### Example 2: Hadith Research Tool
305
-
306
- 1. Create chatbot: "Hadith Researcher"
307
- 2. System prompt:
308
- ```
309
- You are a Hadith researcher. For each query:
310
- 1. Search authenticated Hadiths only (Sahih grade)
311
- 2. Display the full text with authenticity grade
312
- 3. Explain the Hadith's significance
313
- 4. Always cite the collection and number
314
- ```
315
- 3. Enable grade filtering: `grade_filter=sahih`
316
-
317
- ### Example 3: Qur'anic Study Assistant
318
-
319
- 1. Create chatbot: "Qur'an Tafsir"
320
- 2. Set `source_type=quran` in parameters
321
- 3. System prompt focusing on Qur'anic interpretation
322
- 4. Enable multi-turn for deeper exploration
323
-
324
- ---
325
-
326
- ## API Testing
327
-
328
- ### Test with Open-WebUI's Developer Tools
329
-
330
- 1. Open the browser developer tools (F12) while Open-WebUI is loaded
331
- 2. Go to **Network** tab
332
- 3. Send a message to QModel
333
- 4. Inspect the request/response to `/v1/chat/completions`
334
-
335
- ### Test with cURL
336
-
337
- ```bash
338
- # 1. Health check
339
- curl http://localhost:8000/health | jq
340
-
341
- # 2. List models
342
- curl http://localhost:8000/v1/models | jq
343
-
344
- # 3. Simple chat
345
- curl -X POST http://localhost:8000/v1/chat/completions \
346
- -H "Content-Type: application/json" \
347
- -d '{"model":"QModel","messages":[{"role":"user","content":"Assalam alaikum"}]}' | jq
348
- ```
349
-
350
- ---
351
-
352
- ## Performance Tips
353
-
354
- ### For Optimal Open-WebUI Experience
355
-
356
- 1. **Use Ollama locally** for responsive chat (400-800ms per query)
357
- 2. **Set `max_tokens=1024`** to avoid long waits
358
- 3. **Use temperature=0.1** for reliable, consistent answers
359
- 4. **Increase `CACHE_TTL`** for frequently asked questions
360
- 5. **Reduce `TOP_K_SEARCH`** if queries are slow (default 20)
361
-
362
- ---
363
-
364
- ## Security Notes
365
-
366
- ### For Production Deployments
367
-
368
- 1. **Restrict CORS**: Set `ALLOWED_ORIGINS=your-domain.com` in `.env`
369
- 2. **Use HTTPS**: Proxy through nginx with TLS
370
- 3. **Rate limit**: Add rate limiting middleware (not in v6, but recommended)
371
- 4. **Authentication**: Consider adding API key validation layer
372
- 5. **Network**: Don't expose QModel directly to the internet without auth
373
-
374
- ---
375
-
376
- ## Support
377
-
378
- 📖 Full setup guide: See `SETUP.md`
379
- 🔍 Debugging: Use `/debug/scores` to inspect retrievals
380
- 💬 Questions about Open-WebUI: See https://docs.openwebui.com
381
- 🕌 Islamic knowledge: See `ARCHITECTURE.md` for system details
382
-
383
- ---
384
-
385
- **Happy chatting with QModel + Open-WebUI! 🕌**
README.md CHANGED
@@ -64,33 +64,218 @@ language:
64
 
65
  ---
66
 
67
- ## Quick Start (5 minutes)
68
 
69
  ```bash
70
- # 1. Install
71
- git clone https://github.com/elgendy/QModel.git && cd QModel
72
  python3 -m venv .venv && source .venv/bin/activate
73
  pip install -r requirements.txt
74
 
75
- # 2. Configure (choose one)
76
- # For local development - Ollama:
77
  export LLM_BACKEND=ollama
78
  export OLLAMA_MODEL=llama2
79
  # Make sure Ollama is running: ollama serve
80
 
81
- # OR for production - HuggingFace:
82
  export LLM_BACKEND=hf
83
  export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
84
 
85
- # 3. Run
86
  python main.py
87
 
88
- # 4. Query
89
  curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
90
  ```
91
 
92
  API docs: http://localhost:8000/docs
93

94
  ---
95
 
96
  ## Example Queries
@@ -105,134 +290,372 @@ curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?"
105
  # Authentic Hadiths only
106
  curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"
107
 
108
- # Verify Hadith
109
- curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
110
- ```
111
 
112
- ---
113
 
114
- ## Documentation
115
 
116
- | Document | Purpose |
117
- |----------|---------|
118
- | **[SETUP.md](SETUP.md)** | Installation, configuration (both backends), API endpoints, examples |
119
- | **[DOCKER.md](DOCKER.md)** | Docker deployment, production setup, troubleshooting |
120
- | **[ARCHITECTURE.md](ARCHITECTURE.md)** | System design, data pipeline, core components |
121
- | **[OPEN_WEBUI.md](OPEN_WEBUI.md)** | Integration with Open-WebUI chat interface |
122
 
123
  ---
124
 
125
- ## Key Decisions
126
 
127
  ### Backend Selection
128
- - **Ollama**: Fast setup, no GPU, great for development, `LLM_BACKEND=ollama`
129
- - **HuggingFace**: Production-grade, better quality, GPU recommended, `LLM_BACKEND=hf`
130
 
131
- Both are equally supported via the same `.env` configuration. Just set `LLM_BACKEND` and restart.
132
 
133
- ### Data
134
- - **47,626 documents**: 6,236 Quranic verses + 41,390 hadiths from 9 canonical collections
135
- - **Pre-built**: `metadata.json` and `QModel.index` included, ready to use
136
- - **Dual-language**: Arabic and English support
137
 
138
- ---
139
 
140
- ## Open-WebUI Integration
141
 
142
- QModel integrates seamlessly with Open-WebUI for a chat interface:
143
 
144
  ```bash
145
- # Start QModel
146
- python main.py
147
 
148
- # Start Open-WebUI (Docker)
149
- docker run -p 3000:8080 ghcr.io/open-webui/open-webui:latest
150
 
151
- # In Open-WebUI: Settings → Models → Add OpenAI-compatible
152
- # API Base: http://localhost:8000/v1
153
- # Model: QModel
154
  ```
155
 
156
- See [OPEN_WEBUI.md](OPEN_WEBUI.md) for detailed integration guide.
157
 
158
  ---
159
 
160
- ## API Reference (Quick)
161
 
162
- ### Main Query
163
  ```
164
- GET /ask?q=<question>&top_k=5&source_type=<quran|hadith>&grade_filter=<sahih|hasan>
165
  ```
166
 
167
- **Response includes:**
168
- - AI-generated answer
169
- - Listed sources with scores
170
- - Language detection (Arabic/English)
171
- - Query intent classification
172
 
173
- ### Other Endpoints
174
- - `GET /debug/scores?q=<question>&top_k=10`: Inspect raw retrieval scores
175
- - `GET /hadith/verify?q=<hadith_text>`: Check hadith authenticity
176
- - `POST /v1/chat/completions`: OpenAI-compatible endpoint
177
- - `GET /health`: Health check
178
 
179
- See [SETUP.md](SETUP.md) for full endpoint documentation.
180
 
181
  ---
182
 
183
- ## Configuration
184
 
185
- All configuration via environment variables (no code changes needed):
186
 
187
- ```bash
188
- # Backend (required)
189
- LLM_BACKEND=ollama # or: hf
190
 
191
- # Ollama settings
192
- OLLAMA_HOST=http://localhost:11434
193
- OLLAMA_MODEL=llama2 # or: mistral, neural-chat
194
 
195
- # HuggingFace settings
196
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
197
- HF_DEVICE=auto # auto, cuda, or cpu
198
 
199
- # Quality tuning
200
- TEMPERATURE=0.2 # 0=deterministic, 1=creative
201
- CONFIDENCE_THRESHOLD=0.30 # Min score for LLM call
202
- TOP_K_RETURN=5 # Results per query
203
  ```
204
 
205
- See [SETUP.md](SETUP.md) for comprehensive configuration reference.
206
 
207
  ---
208
 
209
- ## Performance
210
 
211
  | Operation | Time | Backend |
212
  |-----------|------|---------|
213
  | Query (cached) | ~50ms | Both |
214
- | Query (Ollama) | 400-800ms | Ollama |
215
- | Query (HF GPU) | 500-1500ms | CUDA |
216
- | Query (HF CPU) | 2-5s | CPU |
217
 
218
  ---
219
 
220
- ## Deployment
221
 
222
- ### Local Development
223
  ```bash
224
- python main.py
225
  ```
226
 
227
- ### Docker (with Ollama backend)
228
  ```bash
229
- docker-compose up
230
  ```
231
 
232
- ### Docker (with HuggingFace backend)
233
- Set `LLM_BACKEND=hf` in `.env` then `docker-compose up`
234
 
235
- See [DOCKER.md](DOCKER.md) for production deployment, troubleshooting, and advanced configuration.
236
 
237
  ---
238
 
 
64
 
65
  ---
66
 
67
+ ## Quick Start
68
+
69
+ ### Prerequisites
70
+ - Python 3.10+
71
+ - 16 GB RAM minimum (for embeddings + LLM)
72
+ - GPU recommended for HuggingFace backend
73
+ - Ollama installed (for local development) OR internet access (for HuggingFace)
74
+
75
+ ### Installation
76
 
77
  ```bash
78
+ # Clone and enter project
79
+ git clone https://github.com/Logicsoft/QModel.git && cd QModel
80
  python3 -m venv .venv && source .venv/bin/activate
81
  pip install -r requirements.txt
82
 
83
+ # Configure (choose one backend)
84
+ # Option A - Ollama (local development):
85
  export LLM_BACKEND=ollama
86
  export OLLAMA_MODEL=llama2
87
  # Make sure Ollama is running: ollama serve
88
 
89
+ # Option B - HuggingFace (production):
90
  export LLM_BACKEND=hf
91
  export HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
92
 
93
+ # Run
94
  python main.py
95
 
96
+ # Query
97
  curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
98
  ```
99
 
100
  API docs: http://localhost:8000/docs
101
 
102
+ ### Data & Index
103
+
104
+ Pre-built data files are included:
105
+ - `metadata.json`: 47,626 documents (6,236 Quran verses + 41,390 hadiths from 9 canonical collections)
106
+ - `QModel.index`: FAISS search index
107
+
108
+ To rebuild after dataset changes:
109
+ ```bash
110
+ python build_index.py
111
+ ```
112
+
113
+ ---
114
+
115
+ ## API Reference (18 endpoints)
116
+
117
+ ### Inference
118
+
119
+ | Endpoint | Method | Description |
120
+ |----------|--------|-------------|
121
+ | `/ask?q=...&top_k=5&source_type=&grade_filter=` | GET | Direct RAG query with full source attribution |
122
+ | `/v1/chat/completions` | POST | OpenAI-compatible chat (SSE streaming supported) |
123
+
124
+ ### Quran (`/quran/...`)
125
+
126
+ | Endpoint | Method | Description |
127
+ |----------|--------|-------------|
128
+ | `/quran/search?q=...&limit=10` | GET | Text search: find verses by partial Arabic/English text |
129
+ | `/quran/topic?topic=...&top_k=10` | GET | Semantic search: find verses related to a topic |
130
+ | `/quran/word-frequency?word=...` | GET | Count word occurrences across all Surahs |
131
+ | `/quran/analytics` | GET | Overall Quran stats (total verses, Surahs, revelation types) |
132
+ | `/quran/chapter/{number}` | GET | All verses and metadata for a specific Surah |
133
+ | `/quran/verse/{surah}:{ayah}` | GET | Exact verse lookup by reference (e.g. `/quran/verse/2:255`) |
134
+
135
+ ### Hadith (`/hadith/...`)
136
+
137
+ | Endpoint | Method | Description |
138
+ |----------|--------|-------------|
139
+ | `/hadith/search?q=...&collection=&limit=10` | GET | Text search across collections |
140
+ | `/hadith/topic?topic=...&top_k=10&grade_filter=` | GET | Semantic search by topic with optional grade filter |
141
+ | `/hadith/verify?q=...&collection=` | GET | Authenticity verification (text + semantic search) |
142
+ | `/hadith/collection/{name}?limit=20&offset=0` | GET | Browse a specific collection |
143
+ | `/hadith/analytics` | GET | Collection-level statistics |
144
+
145
+ ### Operations
146
+
147
+ | Endpoint | Method | Description |
148
+ |----------|--------|-------------|
149
+ | `/health` | GET | Readiness check |
150
+ | `/v1/models` | GET | OpenAI-compatible model listing |
151
+ | `/debug/scores?q=...&top_k=10&source_type=` | GET | Raw retrieval scores (no LLM call) |
152
+
153
+ ---
154
+
155
+ ### GET `/ask`: Main Query
156
+
157
+ ```bash
158
+ curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?&top_k=5"
159
+ ```
160
+
161
+ **Parameters:**
162
+ | Parameter | Default | Description |
163
+ |-----------|---------|-------------|
164
+ | `q` | *(required)* | Your Islamic question |
165
+ | `top_k` | `5` | Number of sources to retrieve (1–20) |
166
+ | `source_type` | both | `quran` or `hadith` |
167
+ | `grade_filter` | all | `sahih` or `hasan` |
168
+
169
+ **Response:**
170
+ ```json
171
+ {
172
+ "question": "What does Islam say about mercy?",
173
+ "answer": "Islam emphasizes mercy as a core value...",
174
+ "language": "english",
175
+ "intent": "general",
176
+ "analysis": null,
177
+ "sources": [
178
+ {
179
+ "source": "Surah Al-Baqarah 2:178",
180
+ "type": "quran",
181
+ "grade": null,
182
+ "arabic": "...",
183
+ "english": "...",
184
+ "_score": 0.876
185
+ }
186
+ ],
187
+ "top_score": 0.876,
188
+ "latency_ms": 342
189
+ }
190
+ ```
191
+
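For scripted access, the `/ask` call above can be wrapped in a few lines of Python. This is a minimal sketch, assuming the server listens on `http://localhost:8000` and returns the JSON shape shown above; `build_ask_url`, `summarize`, and `ask` are illustrative helper names, not part of QModel.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ask_url(question, base="http://localhost:8000", top_k=5,
                  source_type=None, grade_filter=None):
    """Build the /ask URL; optional filters are only sent when set."""
    params = {"q": question, "top_k": top_k}
    if source_type:
        params["source_type"] = source_type    # "quran" or "hadith"
    if grade_filter:
        params["grade_filter"] = grade_filter  # "sahih" or "hasan"
    return f"{base}/ask?{urlencode(params)}"


def summarize(response):
    """Render the answer followed by one line per cited source."""
    lines = [response["answer"]]
    for src in response.get("sources", []):
        lines.append(f'- {src["source"]} ({src["type"]}, score {src["_score"]:.3f})')
    return "\n".join(lines)


def ask(question, **kwargs):
    """Call a running QModel server (assumed reachable) and parse the JSON body."""
    with urlopen(build_ask_url(question, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

With a server running, `print(summarize(ask("What does Islam say about mercy?")))` would print the answer and its sources. `urlencode` handles the percent-encoding that the curl examples write by hand.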
192
+ ### POST `/v1/chat/completions`: OpenAI-Compatible
193
+
194
+ ```bash
195
+ curl -X POST http://localhost:8000/v1/chat/completions \
196
+ -H "Content-Type: application/json" \
197
+ -d '{
198
+ "model": "QModel",
199
+ "messages": [{"role": "user", "content": "What does Islam say about patience?"}],
200
+ "temperature": 0.2,
201
+ "max_tokens": 2048,
202
+ "top_k": 5,
203
+ "stream": false
204
+ }'
205
+ ```
206
+
207
+ **Response:**
208
+ ```json
209
+ {
210
+ "id": "qmodel-1234567890",
211
+ "object": "chat.completion",
212
+ "created": 1234567890,
213
+ "model": "QModel",
214
+ "choices": [
215
+ {
216
+ "index": 0,
217
+ "message": { "role": "assistant", "content": "Islam emphasizes patience..." },
218
+ "finish_reason": "stop"
219
+ }
220
+ ],
221
+ "x_metadata": {
222
+ "language": "english",
223
+ "intent": "general",
224
+ "top_score": 0.876,
225
+ "latency_ms": 342,
226
+ "sources": [{ "source": "Surah Al-Imran 3:200", "type": "quran", "score": 0.876 }]
227
+ }
228
+ }
229
+ ```
230
+
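Generic OpenAI clients may not surface the non-standard `x_metadata` field, so a caller that wants citations has to read it from the raw JSON. The sketch below assumes the response shape shown above; `unpack_completion` is a hypothetical helper name.

```python
def unpack_completion(completion):
    """Return (answer_text, [(source, type, score), ...]) from a parsed response."""
    answer = completion["choices"][0]["message"]["content"]
    # x_metadata is a QModel extension; tolerate its absence.
    meta = completion.get("x_metadata") or {}
    sources = [(s["source"], s["type"], s["score"]) for s in meta.get("sources", [])]
    return answer, sources


example = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Islam emphasizes patience..."},
         "finish_reason": "stop"}
    ],
    "x_metadata": {
        "top_score": 0.876,
        "sources": [{"source": "Surah Al-Imran 3:200", "type": "quran", "score": 0.876}],
    },
}

answer, sources = unpack_completion(example)
```

Note that the source entries here use the key `score`, while the `/ask` response uses `_score`.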
231
+ ### GET `/hadith/verify`: Authenticity Check
232
+
233
+ ```bash
234
+ curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
235
+ ```
236
+
237
+ **Response:**
238
+ ```json
239
+ {
240
+ "query": "Actions are judged by intentions",
241
+ "found": true,
242
+ "collection": "Sahih al-Bukhari",
243
+ "grade": "Sahih",
244
+ "reference": "Sahih al-Bukhari 1",
245
+ "arabic": "إنما الأعمال بالنيات",
246
+ "english": "Verily, actions are judged by intentions...",
247
+ "latency_ms": 156
248
+ }
249
+ ```
250
+
251
+ ### GET `/debug/scores`: Retrieval Inspection
252
+
253
+ ```bash
254
+ curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
255
+ ```
256
+
257
+ Use this to calibrate `CONFIDENCE_THRESHOLD`. If queries you expect to work have `_score < threshold`, lower the threshold.
258
+
259
+ **Response:**
260
+ ```json
261
+ {
262
+ "query": "patience",
263
+ "intent": "general",
264
+ "threshold": 0.3,
265
+ "count": 10,
266
+ "results": [
267
+ {
268
+ "rank": 1,
269
+ "source": "Surah Al-Baqarah 2:45",
270
+ "type": "quran",
271
+ "_dense": 0.8234,
272
+ "_sparse": 0.5421,
273
+ "_score": 0.7234
274
+ }
275
+ ]
276
+ }
277
+ ```
278
+
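One way to use these scores for calibration is to collect `/debug/scores` responses for queries that should be answered and see which ones clear the gate. The sketch below works on already-parsed responses of the shape shown above; the helper names and the `margin` parameter are assumptions made for illustration.

```python
def top_score(debug_response):
    """Highest fused score among the returned results (0.0 if there are none)."""
    return max((r["_score"] for r in debug_response.get("results", [])), default=0.0)


def would_answer(debug_response):
    """True when the best result clears the threshold reported by the server."""
    return top_score(debug_response) >= debug_response["threshold"]


def suggest_threshold(responses, margin=0.02):
    """Largest threshold (minus a safety margin) that lets every query through."""
    tops = [top_score(r) for r in responses]
    return round(min(tops) - margin, 2) if tops else None


strong = {"query": "patience", "threshold": 0.3, "results": [{"_score": 0.7234}]}
weak = {"query": "sabr in hardship", "threshold": 0.3, "results": [{"_score": 0.25}]}
```

Here `weak` would be refused at the default threshold. Lowering the threshold far enough to admit it also admits other weak matches, so inspect the retrieved sources before changing `CONFIDENCE_THRESHOLD`.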
279
  ---
280
 
281
  ## Example Queries
 
290
  # Authentic Hadiths only
291
  curl "http://localhost:8000/ask?q=prayer&source_type=hadith&grade_filter=sahih"
292
 
293
+ # Quran text search
294
+ curl "http://localhost:8000/quran/search?q=bismillah"
295
 
296
+ # Quran topic search
297
+ curl "http://localhost:8000/quran/topic?topic=patience&top_k=5"
298
+
299
+ # Quran word frequency
300
+ curl "http://localhost:8000/quran/word-frequency?word=mercy"
301
+
302
+ # Single chapter
303
+ curl "http://localhost:8000/quran/chapter/2"
304
 
305
+ # Exact verse
306
+ curl "http://localhost:8000/quran/verse/2:255"
307
+
308
+ # Hadith text search
309
+ curl "http://localhost:8000/hadith/search?q=actions+are+judged+by+intentions"
310
+
311
+ # Hadith topic search (Sahih only)
312
+ curl "http://localhost:8000/hadith/topic?topic=fasting&grade_filter=sahih"
313
+
314
+ # Verify Hadith authenticity
315
+ curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
316
 
317
+ # Browse a collection
318
+ curl "http://localhost:8000/hadith/collection/bukhari?limit=5"
319
+
320
+ # Streaming (OpenAI-compatible)
321
+ curl -X POST http://localhost:8000/v1/chat/completions \
322
+ -H "Content-Type: application/json" \
323
+ -d '{"model":"QModel","messages":[{"role":"user","content":"What does Islam say about charity?"}],"stream":true}'
324
+ ```
325
 
326
  ---
327
 
328
+ ## Configuration
329
+
330
+ All configuration via environment variables (`.env` file or exported directly):
331
 
332
  ### Backend Selection
333
 
334
+ | Backend | Pros | Cons | When to Use |
335
+ |---------|------|------|------------|
336
+ | **Ollama** | Fast setup, no GPU, free | Smaller models | Development, testing |
337
+ | **HuggingFace** | Larger models, better quality | Requires GPU or significant RAM | Production |
338
 
339
+ ### Ollama Backend (Development)
340
 
341
+ ```bash
342
+ LLM_BACKEND=ollama
343
+ OLLAMA_HOST=http://localhost:11434
344
+ OLLAMA_MODEL=llama2 # or: mistral, neural-chat, orca-mini
345
+ ```
346
 
347
+ Requires: `ollama serve` running and model pulled (`ollama pull llama2`).
348
 
349
+ ### HuggingFace Backend (Production)
350
 
351
  ```bash
352
+ LLM_BACKEND=hf
353
+ HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
354
+ HF_DEVICE=auto # auto | cuda | cpu
355
+ HF_MAX_NEW_TOKENS=2048
356
+ ```
357
 
358
+ ### All Environment Variables
359
+
360
+ | Variable | Default | Description |
361
+ |----------|---------|-------------|
362
+ | **Backend** | | |
363
+ | `LLM_BACKEND` | `hf` | `ollama` or `hf` |
364
+ | `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
365
+ | `OLLAMA_MODEL` | `llama2` | Ollama model name |
366
+ | `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | HuggingFace model ID |
367
+ | `HF_DEVICE` | `auto` | `auto`, `cuda`, or `cpu` |
368
+ | `HF_MAX_NEW_TOKENS` | `2048` | Max output length |
369
+ | **Embedding & Data** | | |
370
+ | `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model |
371
+ | `FAISS_INDEX` | `QModel.index` | Index file path |
372
+ | `METADATA_FILE` | `metadata.json` | Dataset file |
373
+ | **Retrieval** | | |
374
+ | `TOP_K_SEARCH` | `20` | Candidate pool (5–100) |
375
+ | `TOP_K_RETURN` | `5` | Results shown to user (1–20) |
376
+ | `RERANK_ALPHA` | `0.6` | Weight of the dense score in fusion (0.0–1.0); the sparse score gets the remainder |
377
+ | **Generation** | | |
378
+ | `TEMPERATURE` | `0.2` | Creativity (0.0–1.0, use 0.1–0.2 for religious content) |
379
+ | `MAX_TOKENS` | `2048` | Max response length |
380
+ | **Safety** | | |
381
+ | `CONFIDENCE_THRESHOLD` | `0.30` | Min score to call LLM (higher = fewer hallucinations) |
382
+ | `HADITH_BOOST` | `0.08` | Score boost for hadith on hadith queries |
383
+ | **Other** | | |
384
+ | `CACHE_SIZE` | `512` | Query response cache entries |
385
+ | `CACHE_TTL` | `3600` | Cache expiry in seconds |
386
+ | `ALLOWED_ORIGINS` | `*` | CORS origins |
387
+ | `MAX_EXAMPLES` | `3` | Few-shot examples in system prompt |
388
+
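To make the defaults in the table concrete, here is a minimal sketch of reading a few of these variables with fallbacks and basic validation. It is not the project's `app/config.py`, whose contents are not shown here; `Settings`, `_read`, and `load_settings` are hypothetical names, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_backend: str
    top_k_search: int
    top_k_return: int
    rerank_alpha: float
    temperature: float
    confidence_threshold: float
    cache_ttl: int


def _read(env, name, default, cast):
    """Return cast(env[name]) when set and non-empty, else the default."""
    raw = env.get(name, "")
    return cast(raw) if raw.strip() else default


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), using the table's defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        llm_backend=_read(env, "LLM_BACKEND", "hf", str),
        top_k_search=_read(env, "TOP_K_SEARCH", 20, int),
        top_k_return=_read(env, "TOP_K_RETURN", 5, int),
        rerank_alpha=_read(env, "RERANK_ALPHA", 0.6, float),
        temperature=_read(env, "TEMPERATURE", 0.2, float),
        confidence_threshold=_read(env, "CONFIDENCE_THRESHOLD", 0.30, float),
        cache_ttl=_read(env, "CACHE_TTL", 3600, int),
    )
    if settings.llm_backend not in ("ollama", "hf"):
        raise ValueError(f"LLM_BACKEND must be 'ollama' or 'hf', got {settings.llm_backend!r}")
    if not 0.0 <= settings.rerank_alpha <= 1.0:
        raise ValueError("RERANK_ALPHA must be between 0.0 and 1.0")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"LLM_BACKEND": "ollama"})`.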
389
+ ### Configuration Examples
390
+
391
+ **Development (Ollama)**
392
+ ```bash
393
+ LLM_BACKEND=ollama
394
+ OLLAMA_HOST=http://localhost:11434
395
+ OLLAMA_MODEL=llama2
396
+ TEMPERATURE=0.2
397
+ CONFIDENCE_THRESHOLD=0.30
398
+ ALLOWED_ORIGINS=*
399
+ ```
400
 
401
+ **Production (HuggingFace + GPU)**
402
+ ```bash
403
+ LLM_BACKEND=hf
404
+ HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
405
+ HF_DEVICE=cuda
406
+ TOP_K_SEARCH=30
407
+ TEMPERATURE=0.1
408
+ CONFIDENCE_THRESHOLD=0.35
409
+ ALLOWED_ORIGINS=https://yourdomain.com,https://api.yourdomain.com
410
  ```
411
 
412
+ ### Tuning Tips
413
+
414
+ - **Broader coverage**: Increase `TOP_K_SEARCH`, lower `CONFIDENCE_THRESHOLD` (more answers, but higher hallucination risk), use `TEMPERATURE=0.1`
415
+ - **Faster performance**: Lower `TOP_K_SEARCH` and `TOP_K_RETURN`, reduce `MAX_TOKENS`, use Ollama
416
+ - **More conservative**: Increase `CONFIDENCE_THRESHOLD`, lower `TEMPERATURE`
417
 
418
  ---
419
 
420
+ ## Docker Deployment
421
 
422
+ ### Docker Compose (Recommended)
423
+
424
+ ```bash
425
+ cp .env.example .env # Configure backend (see Configuration section)
426
+ docker-compose up
427
  ```
428
+
429
+ ### Docker CLI
430
+
431
+ ```bash
432
+ docker build -t qmodel .
433
+
434
+ # With Ollama backend
435
+ docker run -p 8000:8000 \
436
+ --env-file .env \
437
+ --add-host host.docker.internal:host-gateway \
438
+ qmodel
439
+
440
+ # With HuggingFace backend
441
+ docker run -p 8000:8000 \
442
+ --env-file .env \
443
+ --env HF_TOKEN=your_token_here \
444
+ qmodel
445
+ ```
446
+
447
+ ### Docker with Ollama
448
+
449
+ ```bash
450
+ # .env
451
+ LLM_BACKEND=ollama
452
+ OLLAMA_HOST=http://host.docker.internal:11434
453
+ OLLAMA_MODEL=llama2
454
  ```
455
 
456
+ Requires Ollama running on the host (`ollama serve`).
457
+
458
+ ### Docker with HuggingFace
459
+
460
+ ```bash
461
+ # .env
462
+ LLM_BACKEND=hf
463
+ HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
464
+ HF_DEVICE=auto
465
+
466
+ # Pass HF token
467
+ export HF_TOKEN=hf_xxxxxxxxxxxxx
468
+ docker-compose up
469
+ ```
470
+
471
+ ### Docker Compose with GPU (Linux)
472
+
473
+ ```yaml
474
+ services:
475
+ qmodel:
476
+ deploy:
477
+ resources:
478
+ reservations:
479
+ devices:
480
+ - driver: nvidia
481
+ count: 1
482
+ capabilities: [gpu]
483
+ ```
484
 
485
+ ### Production Tips
486
 
487
+ - Remove dev volume mount (`.:/app`) in `docker-compose.yml`
488
+ - Set `restart: on-failure:5`
489
+ - Use specific `ALLOWED_ORIGINS` instead of `*`
490
 
491
  ---
492
 
493
+ ## Open-WebUI Integration
494
 
495
+ QModel is fully OpenAI-compatible and works out of the box with Open-WebUI.
496
 
497
+ ### Setup
498
 
499
+ ```bash
500
+ # Start QModel
501
+ python main.py
502
 
503
+ # Start Open-WebUI
504
+ docker run -d -p 3000:8080 --name open-webui ghcr.io/open-webui/open-webui:latest
505
+ ```
506
 
507
+ ### Connect
508
+
509
+ 1. **Settings** → **Models** → **Manage Models**
510
+ 2. Click **"Connect to OpenAI-compatible API"**
511
+ 3. **API Base URL**: `http://localhost:8000/v1`
512
+ 4. **Model Name**: `QModel`
513
+ 5. **API Key**: Leave blank
514
+ 6. **Save & Test** → ✅ Connected
515
+
516
+ ### Docker Compose (QModel + Ollama + Open-WebUI)
517
+
518
+ ```yaml
519
+ version: '3.8'
520
+ services:
521
+ qmodel:
522
+ build: .
523
+ ports:
524
+ - "8000:8000"
525
+ environment:
526
+ - LLM_BACKEND=ollama
527
+ - OLLAMA_HOST=http://ollama:11434
528
+
529
+ ollama:
530
+ image: ollama/ollama:latest
531
+ ports:
532
+ - "11434:11434"
533
+
534
+ web-ui:
535
+ image: ghcr.io/open-webui/open-webui:latest
536
+ ports:
537
+ - "3000:8080"
538
+ depends_on:
539
+ - qmodel
540
  ```
541
 
542
+ ### Supported Features
543
+
544
+ | Feature | Status |
545
+ |---------|--------|
546
+ | Chat | ✅ Full support |
547
+ | Streaming | ✅ `stream: true` |
548
+ | Multi-turn context | ✅ Handled by Open-WebUI |
549
+ | Temperature | ✅ Configurable |
550
+ | Token limits | ✅ `max_tokens` |
551
+ | Model listing | ✅ `/v1/models` |
552
+ | Source attribution | ✅ `x_metadata.sources` |
553
 
554
  ---
555
 
556
+ ## Architecture
557
+
558
+ ### Module Structure
559
+
560
+ ```
561
+ main.py ← FastAPI app + router registration
562
+ app/
563
+   config.py ← Config class (env vars)
564
+   llm.py ← LLM providers (Ollama, HuggingFace)
565
+   cache.py ← TTL-LRU async cache
566
+   arabic_nlp.py ← Arabic normalization, stemming, language detection
567
+   search.py ← Hybrid FAISS+BM25, text search, query rewriting
568
+   analysis.py ← Intent detection, analytics, counting
569
+   prompts.py ← Prompt engineering (persona, anti-hallucination)
570
+   models.py ← Pydantic schemas
571
+   state.py ← AppState, lifespan, RAG pipeline
572
+   routers/
573
+     quran.py ← 6 Quran endpoints
574
+     hadith.py ← 5 Hadith endpoints
575
+     chat.py ← /ask + OpenAI-compatible chat
576
+     ops.py ← health, models, debug scores
577
+ ```
578
+
579
+ ### Data Pipeline
580
+
581
+ 1. **Ingest**: 47,626 documents (6,236 Quran verses + 41,390 Hadiths from 9 collections)
582
+ 2. **Embed**: Encode with `multilingual-e5-large` (Arabic + English dual embeddings)
583
+ 3. **Index**: FAISS `IndexFlatIP` for dense retrieval
584
+
585
+ ### Retrieval & Ranking
586
+
587
+ 1. Dense retrieval (FAISS semantic scoring)
588
+ 2. Sparse retrieval (BM25 term-frequency)
589
+ 3. Fusion: 60% dense + 40% sparse
590
+ 4. Intent-aware boost (+0.08 to Hadith when intent=hadith)
591
+ 5. Type filter (quran_only / hadith_only / authenticated_only)
592
+ 6. Text search fallback (exact phrase + word-overlap)
593
+
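Steps 3 and 4 above, together with the confidence gate, can be sketched as follows. This is an illustration under stated assumptions, not the code in `app/search.py`: it assumes both scores are already scaled to [0, 1], and `Candidate`, `fuse`, and `passes_gate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str
    type: str      # "quran" or "hadith"
    dense: float   # semantic similarity, assumed scaled to [0, 1]
    sparse: float  # BM25 score, assumed scaled to [0, 1]


def fuse(candidates, alpha=0.6, intent="general", hadith_boost=0.08):
    """Blend dense and sparse scores, boost hadith on hadith intent, sort descending."""
    scored = []
    for c in candidates:
        score = alpha * c.dense + (1.0 - alpha) * c.sparse
        if intent == "hadith" and c.type == "hadith":
            score += hadith_boost
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def passes_gate(scored, threshold=0.30):
    """Only call the LLM when the best fused score reaches the threshold."""
    return bool(scored) and scored[0][0] >= threshold


pool = [
    Candidate("Surah Al-Baqarah 2:45", "quran", dense=0.80, sparse=0.50),
    Candidate("Sahih al-Bukhari 1", "hadith", dense=0.75, sparse=0.50),
]
```

With these made-up inputs the Quran verse ranks first for a general query (0.68 vs 0.65), while `intent="hadith"` lifts the hadith to 0.73 and puts it on top. The defaults mirror `RERANK_ALPHA`, `HADITH_BOOST`, and `CONFIDENCE_THRESHOLD`.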
594
+ ### Anti-Hallucination Measures
595
+
596
+ - Few-shot examples including "not found" refusal path
597
+ - Hardcoded citation format rules
598
+ - Verbatim copy rules (no text reconstruction)
599
+ - Confidence threshold gating (default: 0.30)
600
+ - Post-generation citation verification
601
+ - Grade inference from collection name
602
+
603
+ ### Performance
604
 
605
  | Operation | Time | Backend |
606
  |-----------|------|---------|
607
  | Query (cached) | ~50ms | Both |
608
+ | Query (Ollama) | 400–800ms | Ollama |
609
+ | Query (HF GPU) | 500–1500ms | CUDA |
610
+ | Query (HF CPU) | 2–5s | CPU |
611
 
612
  ---
613
 
614
+ ## Troubleshooting
615
 
616
+ ### "Cannot connect to Ollama"
617
  ```bash
618
+ ollama serve # Ensure Ollama is running on host
619
+ # In Docker, use OLLAMA_HOST=http://host.docker.internal:11434
620
  ```
621
 
622
+ ### "HuggingFace model not found"
623
  ```bash
624
+ export HF_TOKEN=hf_xxxxxxxxxxxxx # Set token for gated models
625
  ```
626
 
627
+ ### "Out of memory"
628
+ - Use smaller model: `HF_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2`
629
+ - Use Ollama with `neural-chat`
630
+ - Reduce `MAX_TOKENS` to 1024
631
+ - Increase Docker memory limit in `docker-compose.yml`
632
+
633
+ ### "Assistant returns 'Not found'"
634
+ This is expected: QModel rejects low-confidence queries. Try:
635
+ - More specific queries
636
+ - Lower `CONFIDENCE_THRESHOLD` in `.env`
637
+ - Check raw scores: `GET /debug/scores?q=your+query`
638
+
639
+ ### "Port already in use"
640
+ ```bash
641
+ docker-compose down && docker system prune
642
+ # Or change port: ports: ["8001:8000"]
643
+ ```
644
+
645
+ ---
646
 
647
+ ## Roadmap
648
+
649
+ - [x] Grade-based filtering
650
+ - [x] Streaming responses (SSE)
651
+ - [x] Modular architecture (4 routers, 18 endpoints)
652
+ - [x] Dual LLM backend (Ollama + HuggingFace)
653
+ - [x] Text search (exact substring + fuzzy matching)
654
+ - [ ] Chain of narrators (Isnad display)
655
+ - [ ] Synonym expansion (mercy → rahma, compassion)
656
+ - [ ] Batch processing (multiple questions per request)
657
+ - [ ] Islamic calendar integration (Hijri dates)
658
+ - [ ] Tafsir endpoint with scholar citations
659
 
660
  ---
661
 
SETUP.md DELETED
@@ -1,590 +0,0 @@
1
- # QModel v6 Setup & Deployment Guide
2
-
3
- ## Quick Start
4
-
5
- ### 1. Prerequisites
6
- - Python 3.10+
7
- - 16 GB RAM minimum (for embeddings + LLM)
8
- - GPU recommended for HuggingFace backend
9
- - Ollama installed (for local development) OR internet access (for HuggingFace)
10
-
11
- ### 2. Installation
12
-
13
- ```bash
14
- # Clone and enter project
15
- cd /Users/elgendy/Projects/QModel
16
-
17
- # Create virtual environment
18
- python3 -m venv .venv
19
- source .venv/bin/activate
20
-
21
- # Install dependencies
22
- pip install -r requirements.txt
23
- ```
24
-
25
- ### 3. Data & Index
26
-
27
- The project includes pre-built data files:
28
- - `metadata.json`: 47,626 documents (6,236 Quran verses + 41,390 hadiths from 9 canonical collections)
29
- - `QModel.index`: FAISS search index (pre-generated)
30
-
31
- If you need to rebuild the index after dataset changes:
32
- ```bash
33
- python build_index.py
34
- ```
35
-
36
- ---
37
-
38
- ## Backend Configuration
39
-
40
- QModel supports two LLM backends. Choose based on your environment:
41
-
42
- | Backend | Pros | Cons | When to Use |
43
- |---------|------|------|------------|
44
- | **Ollama** (local) | Fast setup, no GPU needed, no model downloads, free | Smaller models, limited customization | Development, testing, resource-constrained |
45
- | **HuggingFace** (remote) | Larger models, better quality, full control | Requires GPU or significant RAM, slower downloads | Production, high-quality responses |
46
-
47
- ### LLM Backend Selection
48
-
49
- **Option 1: Local Ollama (Development)**
50
-
51
- For development, testing, and when you already have Ollama running locally:
52
-
53
- ```bash
54
- LLM_BACKEND=ollama
55
- OLLAMA_HOST=http://localhost:11434
56
- OLLAMA_MODEL=llama2 # or: mistral, neural-chat, orca-mini
57
- ```
58
-
59
- **Available Ollama Models:**
60
- - `llama2`: Fast, good quality (default, recommended)
61
- - `mistral`: Better Arabic support
62
- - `neural-chat`: Good balance
63
- - `openchat`: Good instruction following
64
- - `orca-mini`: Lightweight
65
-
66
- **Option 2: Remote HuggingFace (Production)**
67
-
68
- For production deployments with better quality and control:
69
-
70
- ```bash
71
- LLM_BACKEND=hf
72
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct # Excellent Arabic support
73
- HF_DEVICE=auto # auto | cuda | cpu
74
- HF_MAX_NEW_TOKENS=2048
75
- ```
76
-
77
- **Recommended HuggingFace Models:**
78
- - `Qwen/Qwen2-7B-Instruct`: Excellent Arabic, strong reasoning (default)
79
- - `mistralai/Mistral-7B-Instruct-v0.2`: Very capable, fast
80
- - `meta-llama/Llama-2-13b-chat-hf`: Larger, needs HF token
81
-
82
- **Device Options:**
83
- - `auto`: Auto-detect (GPU if available, else CPU)
84
- - `cuda`: Force GPU (requires NVIDIA GPU)
85
- - `cpu`: Force CPU (slower, but works everywhere)
86
-
87
- ### Complete Environment Variables Reference
88
-
89
- #### Backend Selection
90
- | Variable | Default | Options | Example |
91
- |----------|---------|---------|---------|
92
- | `LLM_BACKEND` | `hf` | `ollama`, `hf` | `ollama` |
93
-
94
- #### Ollama Backend
95
- | Variable | Default | Description | Example |
96
- |----------|---------|-------------|---------|
97
- | `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL | `http://localhost:11434` |
98
- | `OLLAMA_MODEL` | `llama2` | Model name | `mistral` |
99
-
100
- #### HuggingFace Backend
101
- | Variable | Default | Description | Example |
102
- |----------|---------|-------------|---------|
103
- | `HF_MODEL_NAME` | `Qwen/Qwen2-7B-Instruct` | Model ID | `Qwen/Qwen2-7B-Instruct` |
104
- | `HF_DEVICE` | `auto` | Device to use | `cuda` |
105
- | `HF_MAX_NEW_TOKENS` | `2048` | Max output length | `2048` |
106
-
107
- #### Embedding & Data
108
- | Variable | Default | Description |
109
- |----------|---------|-------------|
110
- | `EMBED_MODEL` | `intfloat/multilingual-e5-large` | Embedding model (keep default) |
111
- | `FAISS_INDEX` | `QModel.index` | Index file path |
112
- | `METADATA_FILE` | `metadata.json` | Dataset file |
113
-
114
- #### Retrieval & Ranking
115
- | Variable | Default | Range | Purpose |
116
- |----------|---------|-------|---------|
117
- | `TOP_K_SEARCH` | `20` | 5-100 | Candidate pool size (higher = slower but more coverage) |
118
- | `TOP_K_RETURN` | `5` | 1-20 | Results shown to user |
119
- | `RERANK_ALPHA` | `0.6` | 0.0-1.0 | Weight of the dense score; the sparse score gets the remainder (0.6 dense / 0.4 sparse by default) |
120
-
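The `RERANK_ALPHA` row can be read as a linear blend of the two retrieval scores. The sketch below is an illustration only: it assumes the blend is `alpha * dense + (1 - alpha) * sparse`, which this guide does not spell out, and the helper names `fuse_scores` and `rerank` are hypothetical rather than part of QModel.

```python
def fuse_scores(dense: float, sparse: float, alpha: float = 0.6) -> float:
    """Blend a dense (embedding) score with a sparse (keyword) score.

    Assumed formula: alpha * dense + (1 - alpha) * sparse.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * dense + (1.0 - alpha) * sparse


def rerank(candidates: list[dict], alpha: float = 0.6, top_k: int = 5) -> list[dict]:
    """Attach a fused `_score` to each candidate and keep the best `top_k`."""
    scored = [
        {**c, "_score": fuse_scores(c["_dense"], c["_sparse"], alpha)}
        for c in candidates
    ]
    return sorted(scored, key=lambda c: c["_score"], reverse=True)[:top_k]
```

With `alpha=1.0` the ranking is purely dense and with `alpha=0.0` purely sparse, so the extremes are a quick way to see which signal drives a given result.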
121
- #### Generation
122
- | Variable | Default | Range | Purpose |
123
- |----------|---------|-------|---------|
124
- | `TEMPERATURE` | `0.2` | 0.0-1.0 | 0.0=deterministic, 1.0=creative (use 0.1-0.2 for religious) |
125
- | `MAX_TOKENS` | `2048` | 512-4096 | Max response length |
126
-
127
- #### Safety & Quality
128
- | Variable | Default | Range | Purpose |
129
- |----------|---------|-------|---------|
130
- | `CONFIDENCE_THRESHOLD` | `0.30` | 0.0-1.0 | Min score to call LLM (higher = fewer hallucinations) |
131
- | `HADITH_BOOST` | `0.08` | 0.0-1.0 | Score boost for hadith on hadith queries |
132
-
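Confidence gating, as described for `CONFIDENCE_THRESHOLD`, amounts to comparing the best retrieval score with the threshold before the LLM is called. A minimal sketch, assuming each result carries a `_score` field as in the API responses; the function name `gate` is hypothetical:

```python
def gate(results: list[dict], threshold: float = 0.30) -> tuple[bool, float]:
    """Return (should_call_llm, top_score) for a list of retrieval results.

    An empty result list is treated as a top score of 0.0, so it never passes.
    """
    top_score = max((r.get("_score", 0.0) for r in results), default=0.0)
    return top_score >= threshold, top_score
```

A caller would return a "not found" response when the first element is `False`, and only build the LLM prompt otherwise.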
133
- #### Other Settings
134
- | Variable | Default | Description |
135
- |----------|---------|-------------|
136
- | `CACHE_SIZE` | `512` | Query response cache entries |
137
- | `CACHE_TTL` | `3600` | Cache expiry in seconds |
138
- | `ALLOWED_ORIGINS` | `*` | CORS origins (use specific domains in production) |
139
- | `MAX_EXAMPLES` | `3` | Few-shot examples in system prompt |
140
-
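The variables in these tables could be read into a typed settings object along the following lines. This is a sketch under stated assumptions, not the project's actual `app.config`: the `Settings` class and `load_settings` function are hypothetical, and only a subset of the variables is shown with the defaults listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_backend: str
    top_k_search: int
    top_k_return: int
    temperature: float
    confidence_threshold: float
    cache_ttl: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to documented defaults."""
    env = os.environ if env is None else env
    backend = env.get("LLM_BACKEND", "hf")
    if backend not in ("ollama", "hf"):
        raise ValueError(f"LLM_BACKEND must be 'ollama' or 'hf', got {backend!r}")
    threshold = float(env.get("CONFIDENCE_THRESHOLD", "0.30"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("CONFIDENCE_THRESHOLD must be between 0.0 and 1.0")
    return Settings(
        llm_backend=backend,
        top_k_search=int(env.get("TOP_K_SEARCH", "20")),
        top_k_return=int(env.get("TOP_K_RETURN", "5")),
        temperature=float(env.get("TEMPERATURE", "0.2")),
        confidence_threshold=threshold,
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.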
141
- ### Configuration Examples
142
-
143
- **Development (Ollama) - Recommended for getting started**
144
- ```bash
145
- LLM_BACKEND=ollama
146
- OLLAMA_HOST=http://localhost:11434
147
- OLLAMA_MODEL=llama2
148
-
149
- EMBED_MODEL=intfloat/multilingual-e5-large
150
- FAISS_INDEX=QModel.index
151
- METADATA_FILE=metadata.json
152
-
153
- TOP_K_SEARCH=20
154
- TOP_K_RETURN=5
155
- TEMPERATURE=0.2
156
- CONFIDENCE_THRESHOLD=0.30
157
- ALLOWED_ORIGINS=*
158
- ```
159
-
160
- **Production (HuggingFace + GPU) - Best quality, uses GPU**
161
- ```bash
162
- LLM_BACKEND=hf
163
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
164
- HF_DEVICE=cuda
165
-
166
- EMBED_MODEL=intfloat/multilingual-e5-large
167
- FAISS_INDEX=QModel.index
168
- METADATA_FILE=metadata.json
169
-
170
- TOP_K_SEARCH=30 # More candidates for better quality
171
- TOP_K_RETURN=5
172
- TEMPERATURE=0.1 # More deterministic
173
- CONFIDENCE_THRESHOLD=0.35
174
- ALLOWED_ORIGINS=https://yourdomain.com,https://api.yourdomain.com
175
- ```
176
-
177
- **Production (HuggingFace + CPU) - CPU-only, slower but no GPU required**
178
- ```bash
179
- LLM_BACKEND=hf
180
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct
181
- HF_DEVICE=cpu
182
-
183
- TEMPERATURE=0.1
184
- MAX_TOKENS=1024 # Reduce for faster responses
185
- CONFIDENCE_THRESHOLD=0.35
186
- ```
187
-
188
- ### Tuning Tips
189
-
190
- **For Better Results:**
191
- - Increase `TOP_K_SEARCH` (costs slightly more compute)
192
- - Lower `CONFIDENCE_THRESHOLD` (may get some hallucinations)
193
- - Use larger model with more parameters
194
- - Set `TEMPERATURE=0.1` for most consistent answers
195
-
196
- **For Faster Performance:**
197
- - Lower `TOP_K_SEARCH` and `TOP_K_RETURN`
198
- - Use Ollama backend (faster inference)
199
- - Reduce `MAX_TOKENS`
200
- - Set `HF_DEVICE=cuda` if using HF and a GPU is available (CPU inference is much slower)
201
-
202
- **For More Accurate/Conservative Answers:**
203
- - Increase `CONFIDENCE_THRESHOLD` (skip borderline queries)
204
- - Lower `TEMPERATURE` (more deterministic)
205
- - Use larger model (7B+ parameters)
206
-
207
- **For CPU-Only (No GPU Available):**
208
- - Use Ollama backend with `neural-chat` model
209
- - Set `HF_DEVICE=cpu` if using HF
210
- - Reduce `MAX_TOKENS` to 1024
211
-
212
- ---
213
-
214
- ## Running QModel
215
-
216
- ### Step-by-Step: Starting the API
217
-
218
- 1. **Create `.env` file**:
219
- ```bash
220
- cp .env.example .env
221
- # Edit .env and choose your backend (see Configuration section above)
222
- ```
223
-
224
- 2. **Start the backend service**:
225
-
226
- **If using Ollama:**
227
- ```bash
228
- # Terminal 1: Start Ollama daemon
229
- ollama serve
230
-
231
- # Terminal 2: Pull a model (first time only)
232
- ollama pull llama2 # or: mistral, neural-chat
233
- ```
234
-
235
- **If using HuggingFace:**
236
- - No separate service needed, models download automatically
237
-
238
- 3. **Start QModel API**:
239
- ```bash
240
- python main.py
241
- ```
242
-
243
- API available at `http://localhost:8000`
244
-
245
- View interactive docs: `http://localhost:8000/docs`
246
-
247
- ### Docker Option
248
-
249
- ```bash
250
- # Configure your backend in .env (see Configuration section)
251
- cp .env.example .env
252
- nano .env # Choose LLM_BACKEND=ollama or hf
253
-
254
- # Run with Docker Compose
255
- docker-compose up
256
- ```
257
-
258
- For full Docker documentation (including production deployment, troubleshooting, and multi-container setup), see **[DOCKER.md](DOCKER.md)**.
259
-
260
- ---
261
-
262
- ## API Endpoints
263
-
264
- ### Main Query Endpoint
265
-
266
- ```bash
267
- GET /ask?q=<question>&top_k=5&source_type=<filter>&grade_filter=<filter>
268
- ```
269
-
270
- **Parameters:**
271
- - `q` (required): Your Islamic question
272
- - `top_k`: Number of sources to retrieve (1-20, default: 5)
273
- - `source_type`: Filter by source type
274
-   - `quran`: Quranic verses only
275
-   - `hadith`: Hadiths only
276
-   - omitted (default): both
277
- - `grade_filter`: Filter Hadith by authenticity grade
278
-   - `sahih`: Only Sahih-graded Hadiths
279
-   - `hasan`: Sahih + Hasan
280
-   - omitted (default): all grades
281
-
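The `grade_filter` values above form a small hierarchy: `sahih` admits only Sahih, while `hasan` admits Sahih and Hasan. One way to express that is a rank comparison, sketched below. It assumes items shaped like the `sources` entries in the API response, assumes that Quran items are left untouched by the filter, and uses a hypothetical helper name, `passes_grade_filter`.

```python
from typing import Optional

# Higher rank means a stronger authenticity grade.
GRADE_RANK = {"hasan": 1, "sahih": 2}


def passes_grade_filter(item: dict, grade_filter: Optional[str]) -> bool:
    """Return True if `item` should be kept under the given grade filter."""
    # No filter, or a non-hadith item: nothing to check.
    if grade_filter is None or item.get("type") != "hadith":
        return True
    required = GRADE_RANK.get(grade_filter.lower())
    if required is None:
        raise ValueError(f"unknown grade_filter: {grade_filter!r}")
    grade = (item.get("grade") or "").lower()
    # Ungraded or unrecognized grades rank 0 and are dropped by any filter.
    return GRADE_RANK.get(grade, 0) >= required
```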
282
- **Example Requests:**
283
-
284
- ```bash
285
- # General question
286
- curl "http://localhost:8000/ask?q=What%20does%20Islam%20say%20about%20mercy?"
287
-
288
- # Quran-only with word frequency
289
- curl "http://localhost:8000/ask?q=How%20many%20times%20is%20mercy%20mentioned?&source_type=quran"
290
-
291
- # Authentic Hadiths only
292
- curl "http://localhost:8000/ask?q=Hadiths%20about%20prayer&source_type=hadith&grade_filter=sahih"
293
- ```
294
-
295
- **Response:**
296
- ```json
297
- {
298
- "question": "What does Islam say about mercy?",
299
- "answer": "Islam emphasizes mercy as a core value...",
300
- "language": "english",
301
- "intent": "general",
302
- "analysis": null,
303
- "sources": [
304
- {
305
- "source": "Surah Al-Baqarah 2:178",
306
- "type": "quran",
307
- "grade": null,
308
- "arabic": "...",
309
- "english": "...",
310
- "_score": 0.876
311
- }
312
- ],
313
- "top_score": 0.876,
314
- "latency_ms": 342
315
- }
316
- ```
317
-
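When calling `/ask` from code rather than `curl`, it is easier to let the standard library do the percent-encoding. The sketch below only builds the URL; the helper name `build_ask_url` is an assumption, and the parameter names are the ones documented above.

```python
from typing import Optional
from urllib.parse import urlencode


def build_ask_url(
    base_url: str,
    q: str,
    top_k: int = 5,
    source_type: Optional[str] = None,
    grade_filter: Optional[str] = None,
) -> str:
    """Build a GET /ask URL, leaving out filters that are not set."""
    params = {"q": q, "top_k": top_k}
    if source_type is not None:
        params["source_type"] = source_type
    if grade_filter is not None:
        params["grade_filter"] = grade_filter
    # urlencode escapes spaces as "+" and reserved characters such as "?".
    return f"{base_url.rstrip('/')}/ask?{urlencode(params)}"
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, and the JSON body decoded with `json.loads`.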
318
- ---
319
-
320
- ### Hadith Verification Endpoint
321
-
322
- ```bash
323
- GET /hadith/verify?q=<hadith_text>&collection=<filter>
324
- ```
325
-
326
- **Purpose:** Quick authenticity check for a Hadith
327
-
328
- **Example:**
329
- ```bash
330
- curl "http://localhost:8000/hadith/verify?q=Actions%20are%20judged%20by%20intentions"
331
- ```
332
-
333
- **Response:**
334
- ```json
335
- {
336
- "query": "Actions are judged by intentions",
337
- "found": true,
338
- "collection": "Sahih al-Bukhari",
339
- "grade": "Sahih",
340
- "reference": "Sahih al-Bukhari 1",
341
- "arabic": "إنما الأعمال بالنيات",
342
- "english": "Verily, actions are judged by intentions...",
343
- "latency_ms": 156
344
- }
345
- ```
346
-
347
- ---
348
-
349
- ### Debug Endpoint
350
-
351
- ```bash
352
- GET /debug/scores?q=<question>&top_k=10
353
- ```
354
-
355
- **Purpose:** Inspect raw retrieval scores without LLM call. Use to calibrate `CONFIDENCE_THRESHOLD`.
356
-
357
- **Example:**
358
- ```bash
359
- curl "http://localhost:8000/debug/scores?q=patience&top_k=10"
360
- ```
361
-
362
- **Response:**
363
- ```json
364
- {
365
- "intent": "general",
366
- "threshold": 0.3,
367
- "results": [
368
- {
369
- "rank": 1,
370
- "source": "Surah Al-Baqarah 2:45",
371
- "type": "quran",
372
- "grade": null,
373
- "_dense": 0.8234,
374
- "_sparse": 0.5421,
375
- "_score": 0.7234
376
- }
377
- ]
378
- }
379
- ```
380
-
381
- Use this to fine-tune `CONFIDENCE_THRESHOLD`. If queries you expect to work have `_score < threshold`, lower the threshold.
382
-
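Calibration with `/debug/scores` can be made a little more systematic by collecting the top `_score` of each representative query and checking them against a candidate threshold. The helpers below are a sketch: `rejected_queries` and `suggest_threshold` are hypothetical names, and the safety margin is an arbitrary illustrative choice.

```python
def rejected_queries(top_scores: dict[str, float], threshold: float) -> list[str]:
    """Queries whose best retrieval score falls below the threshold."""
    return sorted(q for q, score in top_scores.items() if score < threshold)


def suggest_threshold(good_scores: list[float], margin: float = 0.02) -> float:
    """Suggest a threshold just under the weakest score among known-good queries."""
    if not good_scores:
        raise ValueError("need at least one score from a known-good query")
    return max(0.0, round(min(good_scores) - margin, 4))
```

Feeding `suggest_threshold` the scores of queries that should be answered gives a starting value for `CONFIDENCE_THRESHOLD`; `rejected_queries` then shows which queries a given value would turn into "not found".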
383
- ---
384
-
385
- ### Health & Metadata
386
-
387
- ```bash
388
- # Health check
389
- curl http://localhost:8000/health
390
-
391
- # List available models
392
- curl http://localhost:8000/v1/models
393
-
394
- # Interactive API docs (open in a browser)
395
- # http://localhost:8000/docs
396
- ```
397
-
398
- ---
399
-
400
- ## Query Examples
401
-
402
- ### 1. Word Frequency Analysis
403
-
404
- **Question:** "How many times is the word 'mercy' mentioned in the Quran?"
405
-
406
- **System detects:** `intent=count`
407
-
408
- **Response includes:**
409
- ```json
410
- {
411
- "analysis": {
412
- "keyword": "mercy",
413
- "total_count": 87,
414
- "by_surah": {
415
- "2": {"name": "Al-Baqarah", "count": 12},
416
- "7": {"name": "Al-A'raf", "count": 8},
417
- ...
418
- }
419
- }
420
- }
421
- ```
422
-
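The shape of the `analysis` object suggests a per-surah tally of keyword hits. A minimal sketch of such a count follows. It assumes verses are dicts with `surah` and `english` keys, which may not match the real `metadata.json` schema, and the function name `count_keyword` is hypothetical.

```python
import re
from collections import Counter


def count_keyword(verses: list[dict], keyword: str) -> dict:
    """Count whole-word, case-insensitive occurrences of `keyword` per surah."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    by_surah: Counter = Counter()
    for verse in verses:
        hits = len(pattern.findall(verse["english"]))
        if hits:
            by_surah[verse["surah"]] += hits
    return {
        "keyword": keyword,
        "total_count": sum(by_surah.values()),
        "by_surah": dict(by_surah),
    }


# Placeholder strings for illustration only, not actual verse text.
sample = [
    {"surah": 2, "english": "Mercy is mentioned here, and mercy again."},
    {"surah": 7, "english": "A line about mercy."},
    {"surah": 7, "english": "A line using the word merciful instead."},
]
result = count_keyword(sample, "mercy")
```

Whole-word matching keeps "merciful" out of the tally for "mercy"; counting by root or lemma instead would need language-aware stemming, which this sketch does not attempt.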
423
- ---
424
-
425
- ### 2. Topic-Based Aya Retrieval
426
-
427
- **Question:** "What does the Quran say about patience?"
428
-
429
- **System detects:** `intent=tafsir`
430
-
431
- **Response:**
432
- - Retrieves top 5 verses about patience
433
- - LLM explains each with Tafsir
434
- - Shows interconnections between verses
435
-
436
- ---
437
-
438
- ### 3. Hadith Authentication
439
-
440
- **Question:** "Is the Hadith 'Actions are judged by intentions' authentic?"
441
-
442
- **System detects:** `intent=auth`
443
-
444
- **LLM response:**
445
- - "Yes, this is found in Sahih al-Bukhari 1"
446
- - "Grade: Sahih (authentic)"
447
- - "Explanation: This Hadith establishes the principle of intention..."
448
-
449
- ---
450
-
451
- ### 4. Bilingual Support
452
-
453
- **Arabic Question:** "ما أهمية الصبر في الإسلام؟"
454
-
455
- **System detects:** Language = arabic
456
-
457
- **Response:** Full Arabic response with proper vocalization
458
-
459
- ---
460
-
461
- ## Tuning & Optimization
462
-
463
- ### Confidence Threshold
464
-
465
- The `CONFIDENCE_THRESHOLD` (default 0.30) controls when to call the LLM:
466
-
467
- - **Too high (e.g., 0.70)**: Many queries rejected as "not found" (safer but less helpful)
468
- - **Too low (e.g., 0.10)**: LLM called on weak matches (more hallucinations)
469
- - **Sweet spot (0.30-0.50)**: Most queries get through, but low-quality matches rejected
470
-
471
- **To calibrate:**
472
- 1. Run `/debug/scores` on representative queries
473
- 2. Check what `_score` values are returned
474
- 3. Adjust `CONFIDENCE_THRESHOLD` in `.env`
475
- 4. Restart service
476
-
477
- ---
478
-
479
- ### Temperature
480
-
481
- - **0.0**: Deterministic (best for factual Islamic answers)
482
- - **0.2**: Slightly creative (default)
483
- - **0.5+**: More creative (not recommended for religious content)
484
-
485
- ---
486
-
487
- ### Model Selection
488
-
489
- #### For Development (Ollama)
490
- - **llama2**: Fastest, good quality, easy setup
491
- - **mistral**: Better Arabic, slightly slower
492
- - **neural-chat**: Good balance
493
-
494
- ```bash
495
- ollama pull llama2
496
- OLLAMA_MODEL=llama2 python main.py
497
- ```
498
-
499
- #### For Production (HuggingFace)
500
- - **Qwen/Qwen2-7B-Instruct**: Strong Arabic, 7B params
501
- - **mistralai/Mistral-7B-Instruct-v0.2**: Very capable
502
- - **meta-llama/Llama-2-13b-chat-hf**: Larger, better quality (requires HF token)
503
-
504
- ```bash
505
- HF_MODEL_NAME=Qwen/Qwen2-7B-Instruct python main.py
506
- ```
507
-
508
- ---
509
-
510
- ## Troubleshooting
511
-
512
- ### Issue: "Service is still initialising"
513
-
514
- **Solution:** Wait 60-90 seconds for the embedding model to load. Check logs:
515
- ```bash
516
- tail -f <logfile>
517
- ```
518
-
519
- ### Issue: Low retrieval scores
520
-
521
- **Cause:** The query does not match the language or wording of the dataset well
522
-
523
- **Solution:**
524
- 1. Check `/debug/scores` output
525
- 2. Ensure query is in Arabic or clear English
526
- 3. Try synonyms (e.g., "mercy" vs "compassion")
527
- 4. Lower `CONFIDENCE_THRESHOLD` in `.env`
528
-
529
- ### Issue: LLM model not found (HF backend)
530
-
531
- **Solution:**
532
- ```bash
533
- huggingface-cli login
534
- export HF_TOKEN=<your_token>
535
- ```
536
-
537
- ### Issue: Out of memory
538
-
539
- **Solution:**
540
- - Use `OLLAMA_MODEL=neural-chat` (smaller)
541
- - Set `HF_DEVICE=cpu` (slower but uses RAM instead of VRAM)
542
- - Reduce `TOP_K_SEARCH` in `.env`
543
-
544
- ---
545
-
546
- ## Production Checklist
547
-
548
- - [ ] Test with at least 10 representative queries
549
- - [ ] Verify `/debug/scores` on low-confidence queries
550
- - [ ] Adjust `CONFIDENCE_THRESHOLD` to an acceptable false-positive rate
551
- - [ ] Set `ALLOWED_ORIGINS` to your domain only (security)
552
- - [ ] Use production-grade LLM model (Qwen 7B+ or Mistral)
553
- - [ ] Set `TEMPERATURE=0.1` for maximum consistency
554
- - [ ] Monitor first 100 queries for quality
555
- - [ ] Enable access logging and error tracking
556
-
557
- ---
558
-
559
- ## Architecture Files
560
-
561
- - **main.py**: Core API + RAG pipeline (LLM backend abstraction, retrieval, generation)
562
- - **build_index.py**: FAISS index generation from metadata
563
- - **enrich_dataset.py**: Dataset enrichment script (fetch hadith collections, deduplicate)
564
- - **metadata.json**: Combined dataset: 6,236 Quran verses + 41,390 hadiths
565
- - **QModel.index**: FAISS vector index (pre-built, ready to use)
566
- - **ARCHITECTURE.md**: Detailed system design
567
- - **requirements.txt**: Python dependencies
568
-
569
- ---
570
-
571
- ## Next Steps
572
-
573
- After setup, consider:
574
- 1. Grade filtering: Try `?grade_filter=sahih` for authenticated-only results
575
- 2. Source filtering: Use `?source_type=quran` vs `?source_type=hadith`
576
- 3. Batch processing: Add endpoint for multiple questions
577
- 4. Webhook integration: Stream answers as they generate
578
- 5. Caching improvements: Persistent Redis cache for production
579
-
580
- ---
581
-
582
- ## Support
583
-
584
- For issues:
585
- 1. Check logs: `python main.py` (stdout)
586
- 2. Test endpoints: http://localhost:8000/docs
587
- 3. Review `/debug/scores` for retrieval quality
588
- 4. Check `.env` configuration
589
-
590
- Happy querying! 🕌
app/routers/chat.py CHANGED
@@ -1,16 +1,18 @@
1
- """Chat / inference endpoints - OpenAI-compatible."""
2
 
3
  from __future__ import annotations
4
 
5
  import json
6
  import logging
7
  import time
 
8
 
9
- from fastapi import APIRouter, HTTPException
10
  from fastapi.responses import StreamingResponse
11
 
12
  from app.config import cfg
13
  from app.models import (
 
14
  ChatCompletionChoice,
15
  ChatCompletionMessage,
16
  ChatCompletionRequest,
@@ -23,6 +25,45 @@ logger = logging.getLogger("qmodel.chat")
23
  router = APIRouter(tags=["inference"])
24
 
25
 
26
  # ───────────────────────────────────────────────────────
27
  # POST /v1/chat/completions - OpenAI-compatible
28
  # ───────────────────────────────────────────────────────
 
1
+ """Chat / inference endpoints - OpenAI-compatible + convenience /ask."""
2
 
3
  from __future__ import annotations
4
 
5
  import json
6
  import logging
7
  import time
8
+ from typing import Literal, Optional
9
 
10
+ from fastapi import APIRouter, HTTPException, Query
11
  from fastapi.responses import StreamingResponse
12
 
13
  from app.config import cfg
14
  from app.models import (
15
+ AskResponse,
16
  ChatCompletionChoice,
17
  ChatCompletionMessage,
18
  ChatCompletionRequest,
 
25
  router = APIRouter(tags=["inference"])
26
 
27
 
28
+ # ───────────────────────────────────────────────────────
29
+ # GET /ask - convenience RAG query endpoint
30
+ # ───────────────────────────────────────────────────────
31
+ @router.get("/ask", response_model=AskResponse)
32
+ async def ask(
33
+ q: str = Query(..., min_length=1, max_length=500, description="Your Islamic question"),
34
+ top_k: int = Query(5, ge=1, le=20, description="Number of sources to retrieve"),
35
+ source_type: Optional[Literal["quran", "hadith"]] = Query(None, description="Filter: quran | hadith"),
36
+ grade_filter: Optional[str] = Query(None, description="Hadith grade filter: sahih | hasan"),
37
+ ):
38
+ """Direct RAG query with full source attribution.
39
+
40
+ Returns an AI-generated answer grounded in Quran and Hadith sources,
41
+ with language detection, intent classification, and scored references.
42
+ """
43
+ check_ready()
44
+ result = await run_rag_pipeline(q, top_k=top_k, source_type=source_type, grade_filter=grade_filter)
45
+ return AskResponse(
46
+ question=q,
47
+ answer=result["answer"],
48
+ language=result["language"],
49
+ intent=result["intent"],
50
+ analysis=result.get("analysis"),
51
+ sources=[
52
+ {
53
+ "source": s.get("source") or s.get("reference", ""),
54
+ "type": s.get("type", ""),
55
+ "grade": s.get("grade"),
56
+ "arabic": s.get("arabic", ""),
57
+ "english": s.get("english", ""),
58
+ "_score": round(s.get("_score", 0), 4),
59
+ }
60
+ for s in result.get("sources", [])
61
+ ],
62
+ top_score=round(result["top_score"], 4),
63
+ latency_ms=result["latency_ms"],
64
+ )
65
+
66
+
67
  # ───────────────────────────────────────────────────────
68
  # POST /v1/chat/completions - OpenAI-compatible
69
  # ───────────────────────────────────────────────────────
app/routers/ops.py CHANGED
@@ -1,14 +1,16 @@
1
- """Operational endpoints - health, models."""
2
 
3
  from __future__ import annotations
4
 
5
  import time
 
6
 
7
- from fastapi import APIRouter
8
 
9
  from app.config import cfg
10
  from app.models import ModelInfo, ModelsListResponse
11
- from app.state import state
 
12
 
13
  router = APIRouter(tags=["ops"])
14
 
@@ -18,7 +20,7 @@ def health():
18
  """Health check endpoint."""
19
  return {
20
  "status": "ok" if state.ready else "initialising",
21
- "version": "5.0.0",
22
  "llm_backend": cfg.LLM_BACKEND,
23
  "dataset_size": len(state.dataset) if state.dataset else 0,
24
  "faiss_total": state.faiss_index.ntotal if state.faiss_index else 0,
@@ -35,3 +37,40 @@ def list_models():
35
  ModelInfo(id="qmodel", created=int(time.time()), owned_by="elgendy"),
36
  ]
37
  )
1
+ """Operational endpoints - health, models, debug."""
2
 
3
  from __future__ import annotations
4
 
5
  import time
6
+ from typing import Literal, Optional
7
 
8
+ from fastapi import APIRouter, Query
9
 
10
  from app.config import cfg
11
  from app.models import ModelInfo, ModelsListResponse
12
+ from app.search import hybrid_search, rewrite_query
13
+ from app.state import check_ready, state
14
 
15
  router = APIRouter(tags=["ops"])
16
 
 
20
  """Health check endpoint."""
21
  return {
22
  "status": "ok" if state.ready else "initialising",
23
+ "version": "6.0.0",
24
  "llm_backend": cfg.LLM_BACKEND,
25
  "dataset_size": len(state.dataset) if state.dataset else 0,
26
  "faiss_total": state.faiss_index.ntotal if state.faiss_index else 0,
 
37
  ModelInfo(id="qmodel", created=int(time.time()), owned_by="elgendy"),
38
  ]
39
  )
40
+
41
+
42
+ @router.get("/debug/scores", tags=["debug"])
43
+ async def debug_scores(
44
+ q: str = Query(..., min_length=1, max_length=500, description="Query to inspect"),
45
+ top_k: int = Query(10, ge=1, le=50, description="Number of results"),
46
+ source_type: Optional[Literal["quran", "hadith"]] = Query(None, description="Filter: quran | hadith"),
47
+ ):
48
+ """Inspect raw retrieval scores without calling the LLM.
49
+
50
+ Use this to calibrate CONFIDENCE_THRESHOLD and debug search quality.
51
+ """
52
+ check_ready()
53
+ rewrite = await rewrite_query(q, state.llm)
54
+ results = await hybrid_search(
55
+ q, rewrite,
56
+ state.embed_model, state.faiss_index, state.dataset,
57
+ top_n=top_k, source_type=source_type,
58
+ )
59
+ return {
60
+ "query": q,
61
+ "intent": rewrite.get("intent", "general"),
62
+ "threshold": cfg.CONFIDENCE_THRESHOLD,
63
+ "count": len(results),
64
+ "results": [
65
+ {
66
+ "rank": i + 1,
67
+ "source": r.get("source") or r.get("reference", ""),
68
+ "type": r.get("type", ""),
69
+ "grade": r.get("grade"),
70
+ "_dense": round(r.get("_dense", 0), 4),
71
+ "_sparse": round(r.get("_sparse", 0), 4),
72
+ "_score": round(r.get("_score", 0), 4),
73
+ }
74
+ for i, r in enumerate(results)
75
+ ],
76
+ }
main.py CHANGED
@@ -33,7 +33,7 @@ logging.basicConfig(
33
 
34
  from app.config import cfg
35
  from app.state import lifespan
36
- from app.routers import chat, ops
37
 
38
  # ═══════════════════════════════════════════════════════════════════════
39
  # FASTAPI APP
@@ -47,7 +47,7 @@ app = FastAPI(
47
  "- Streaming support\n"
48
  "- Islamic knowledge RAG pipeline"
49
  ),
50
- version="5.0.0",
51
  lifespan=lifespan,
52
  )
53
 
@@ -62,6 +62,8 @@ app.add_middleware(
62
  # Register routers
63
  app.include_router(ops.router)
64
  app.include_router(chat.router)
 
 
65
 
66
 
67
  if __name__ == "__main__":
 
33
 
34
  from app.config import cfg
35
  from app.state import lifespan
36
+ from app.routers import chat, hadith, ops, quran
37
 
38
  # ═══════════════════════════════════════════════════════════════════════
39
  # FASTAPI APP
 
47
  "- Streaming support\n"
48
  "- Islamic knowledge RAG pipeline"
49
  ),
50
+ version="6.0.0",
51
  lifespan=lifespan,
52
  )
53
 
 
62
  # Register routers
63
  app.include_router(ops.router)
64
  app.include_router(chat.router)
65
+ app.include_router(quran.router)
66
+ app.include_router(hadith.router)
67
 
68
 
69
  if __name__ == "__main__":