stanleydukor commited on
Commit
702ea87
·
1 Parent(s): 1c61c6e

Initial deployment

Browse files
.gitignore ADDED
@@ -0,0 +1,162 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
 + Pipfile.lock
28
+
29
+ # PyInstaller
30
+ *.manifest
31
+ *.spec
32
+
33
+ # Installer logs
34
+ pip-log.txt
35
+ pip-delete-this-directory.txt
36
+
37
+ # Unit test / coverage reports
38
+ htmlcov/
39
+ .tox/
40
+ .nox/
41
+ .coverage
42
+ .coverage.*
43
+ .cache
44
+ nosetests.xml
45
+ coverage.xml
46
+ *.cover
47
+ *.py,cover
48
+ .hypothesis/
49
+ .pytest_cache/
50
+ cover/
51
+
52
+ # Translations
53
+ *.mo
54
+ *.pot
55
+
56
+ # Django stuff:
57
+ *.log
58
+ local_settings.py
59
+ db.sqlite3
60
+ db.sqlite3-journal
61
+
62
+ # Flask stuff:
63
+ instance/
64
+ .webassets-cache
65
+
66
+ # Scrapy stuff:
67
+ .scrapy
68
+
69
+ # Sphinx documentation
70
+ docs/_build/
71
+
72
+ # PyBuilder
73
+ .pybuilder/
74
+ target/
75
+
76
+ # Jupyter Notebook
77
+ .ipynb_checkpoints
78
+
79
+ # IPython
80
+ profile_default/
81
+ ipython_config.py
82
+
83
+ # pyenv
84
+ .python-version
85
+
86
+ # pipenv
87
+ Pipfile.lock
88
+
89
+ # poetry
90
+ poetry.lock
91
+
92
+ # pdm
93
+ .pdm.toml
94
+
95
+ # PEP 582
96
+ __pypackages__/
97
+
98
+ # Celery stuff
99
+ celerybeat-schedule
100
+ celerybeat.pid
101
+
102
+ # SageMath parsed files
103
+ *.sage.py
104
+
105
+ # Environments
106
+ .env
107
+ .venv
108
+ env/
109
+ venv/
110
+ ENV/
111
+ env.bak/
112
+ venv.bak/
113
+
114
+ # Spyder project settings
115
+ .spyderproject
116
+ .spyproject
117
+
118
+ # Rope project settings
119
+ .ropeproject
120
+
121
+ # mkdocs documentation
122
+ /site
123
+
124
+ # mypy
125
+ .mypy_cache/
126
+ .dmypy.json
127
+ dmypy.json
128
+
129
+ # Pyre type checker
130
+ .pyre/
131
+
132
+ # pytype static type analyzer
133
+ .pytype/
134
+
135
+ # Cython debug symbols
136
+ cython_debug/
137
+
138
+ # IDE
139
+ .vscode/
140
+ .idea/
141
+ *.swp
142
+ *.swo
143
+ *~
144
+ .DS_Store
145
+
146
+ # Project specific
147
+ data/raw/*
148
+ !data/raw/.gitkeep
149
+ data/processed/*
150
+ !data/processed/.gitkeep
151
+ data/vectorstore/*
152
+ !data/vectorstore/.gitkeep
153
+
154
+ # Model files
155
+ *.bin
156
+ *.onnx
157
+ *.pt
158
+ *.pth
159
+
160
+ # Logs
161
+ logs/
162
+ *.log
CHANGELOG.md ADDED
@@ -0,0 +1,95 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Changelog
2
+
3
+ ## [2.0.0] - 2026-01-05
4
+
5
+ ### Major Improvements
6
+
7
+ #### Gradio UI Enhancements
8
+ - **Fixed HTML rendering issue**: Changed from HTML badges to clean emoji-based confidence indicators
9
+ - High Confidence: ✅ (≥70%)
10
+ - Medium Confidence: ⚠️ (50-69%)
11
+ - Low Confidence: ⚡ (<50%)
12
+ - **Improved message formatting**: Removed raw HTML display in chat interface
13
+ - **Cleaner disclaimers**: Updated medical disclaimer to be more concise
14
+
15
+ #### Content Updates
16
+ - **Removed "educational purposes" language** across all files:
17
+ - Updated system prompts
18
+ - Updated medical disclaimers
19
+ - Updated README
20
+ - Updated UI text
21
+ - **Streamlined medical disclaimers**: More professional, less verbose
22
+
23
+ #### Bug Fixes
24
+ - **Fixed Ollama GPU support**: Configured Ollama to use RTX 5090 GPU instead of CPU
25
+ - Added GPU initialization script
26
+ - Set proper CUDA environment variables
27
+ - Verified VRAM usage (4.79 GB on GPU)
28
+ - Performance improvement: ~10-50x faster inference
29
+
30
+ - **Fixed Qdrant API compatibility**: Updated to qdrant-client v1.16.1 API
31
+ - Changed from `client.search()` to `client.query_points()`
32
+ - Added `using="dense"` parameter for named vectors
33
+ - Fixed both search and hybrid_search methods
34
+
35
+ - **Fixed Pydantic validation errors**:
36
+ - Removed `ge=0.0` constraint from `RetrievalResult.score` (cross-encoder scores can be negative)
37
+ - Removed `ge=0.0, le=1.0` constraints from `SourceInfo.relevance_score`
38
+
39
+ - **Fixed QdrantStoreManager initialization**:
40
+ - Changed `vector_size` → `embedding_dim`
41
+ - Changed `qdrant_path` → `path`
42
+ - Use `embedding_client.embedding_dim` instead of non-existent settings attribute
43
+
44
+ - **Added missing Settings attributes**:
45
+ - `ollama_timeout` (default: 30)
46
+ - `reranker_model` (default: "cross-encoder/ms-marco-MiniLM-L-6-v2")
47
+ - `max_context_tokens` (default: 4096)
48
+
49
+ - **Fixed OllamaClient embedding model verification**:
50
+ - Skip embedding model verification when `embedding_model=None`
51
+ - Prevents false errors when using SentenceTransformerClient for embeddings
52
+
53
+ #### Code Cleanup
54
+ - Removed unnecessary comments and annotations
55
+ - Cleaned up fix-related comments
56
+ - Improved code documentation
57
+ - Removed redundant validation constraints
58
+
59
+ ### Technical Details
60
+
61
+ #### Performance
62
+ - **GPU Acceleration**: Full GPU support for Ollama (RTX 5090)
63
+ - **Model Loading**: 4.79 GB VRAM usage confirmed
64
+ - **Faster Inference**: Significant speedup from CPU to GPU
65
+
66
+ #### API Changes
67
+ - Qdrant API updated to v1.16.1 syntax
68
+ - Improved error handling for cross-encoder scores
69
+ - Better validation for unbounded reranker scores
70
+
71
+ #### Configuration
72
+ - New environment variables for Ollama GPU support:
73
+ - `CUDA_VISIBLE_DEVICES=0`
74
+ - `OLLAMA_NUM_PARALLEL=1`
75
+ - `OLLAMA_MAX_LOADED_MODELS=1`
76
+
77
+ ### Files Modified
78
+ - `src/api/gradio_ui.py` - UI improvements and HTML rendering fix
79
+ - `src/api/main.py` - Fixed initialization parameters
80
+ - `src/rag/query_engine.py` - Updated disclaimers and validation
81
+ - `src/rag/retriever.py` - Removed score constraints
82
+ - `src/vectorstore/qdrant_store.py` - Updated Qdrant API calls
83
+ - `src/llm/ollama_client.py` - Fixed embedding model handling
84
+ - `config/settings.py` - Added missing configuration fields
85
+ - `prompts/medical_disclaimer.txt` - Removed educational language
86
+ - `prompts/system_prompt.txt` - Streamlined instructions
87
+ - `README.md` - Updated disclaimers and documentation
88
+
89
+ ### Breaking Changes
90
+ - None - all changes are backward compatible
91
+
92
+ ### Upgrade Notes
93
+ 1. Restart Ollama with GPU support using provided script
94
+ 2. Clear Python cache if experiencing import issues
95
+ 3. Verify GPU usage with `curl -s http://localhost:11434/api/ps | python3 -m json.tool`
DOCKER.md ADDED
@@ -0,0 +1,574 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Docker Deployment Guide
2
+
3
+ Complete guide for deploying EyeWiki RAG using Docker.
4
+
5
+ ## 📋 Table of Contents
6
+ - [Prerequisites](#prerequisites)
7
+ - [Quick Start](#quick-start)
8
+ - [Architecture](#architecture)
9
+ - [Configuration](#configuration)
10
+ - [Operations](#operations)
11
+ - [Troubleshooting](#troubleshooting)
12
+ - [Production](#production)
13
+
14
+ ## Prerequisites
15
+
16
+ ### Required Software
17
+ - **Docker** 20.10+ ([Install Docker](https://docs.docker.com/get-docker/))
18
+ - **Docker Compose** 2.0+ ([Install Compose](https://docs.docker.com/compose/install/))
19
+ - **Ollama** running on host ([Install Ollama](https://ollama.ai/download))
20
+
21
+ ### System Requirements
22
+ - 8GB+ RAM allocated to Docker
23
+ - 20GB+ disk space
24
+ - CPU: 4+ cores recommended
25
+ - GPU: Optional, for faster processing
26
+
27
+ ### Verify Installation
28
+ ```bash
29
+ docker --version
30
+ docker-compose --version
31
+ ollama --version
32
+ ```
33
+
34
+ ## Quick Start
35
+
36
+ ### 1. Prepare Ollama (Host Machine)
37
+
38
+ ```bash
39
+ # Start Ollama service
40
+ ollama serve
41
+
42
+ # Pull required models
43
+ ollama pull nomic-embed-text # ~270MB
44
+ ollama pull mistral # ~4.1GB
45
+
46
+ # Verify models
47
+ ollama list
48
+ ```
49
+
50
+ ### 2. Build and Start Services
51
+
52
+ ```bash
53
+ # Clone repository
54
+ git clone <repo-url>
55
+ cd eyewiki-rag
56
+
57
+ # Build images
58
+ docker-compose build
59
+
60
+ # Start services
61
+ docker-compose up -d
62
+
63
+ # Check status
64
+ docker-compose ps
65
+ ```
66
+
67
+ ### 3. Verify Services
68
+
69
+ ```bash
70
+ # Check API health
71
+ curl http://localhost:8000/health
72
+
73
+ # Check Qdrant
74
+ curl http://localhost:6333/
75
+
76
+ # View logs
77
+ docker-compose logs -f
78
+ ```
79
+
80
+ ### 4. Access Services
81
+
82
+ - **API**: http://localhost:8000
83
+ - **Gradio UI**: http://localhost:8000/ui
84
+ - **API Docs**: http://localhost:8000/docs
85
+ - **Qdrant Dashboard**: http://localhost:6333/dashboard
86
+
87
+ ## Architecture
88
+
89
+ ### Container Network
90
+
91
+ ```
92
+ ┌─────────────────────────────────────────────┐
93
+ │ Host Machine │
94
+ │ ┌──────────────────────────────────────┐ │
95
+ │ │ Ollama (GPU Access) │ │
96
+ │ │ - Port: 11434 │ │
97
+ │ │ - Models: mistral, nomic-embed │ │
98
+ │ └────────────┬─────────────────────────┘ │
99
+ │ │ │
100
+ │ ┌────────────▼─────────────────────────┐ │
101
+ │ │ Docker Network │ │
102
+ │ │ ┌─────────────────────────────────┐ │ │
103
+ │ │ │ eyewiki-rag (API Server) │ │ │
104
+ │ │ │ - Port: 8000 │ │ │
105
+ │ │ │ - Connects to Ollama via │ │ │
106
+ │ │ │ host.docker.internal │ │ │
107
+ │ │ └─────────────┬───────────────────┘ │ │
108
+ │ │ │ │ │
109
+ │ │ ┌─────────────▼───────────────────┐ │ │
110
+ │ │ │ qdrant (Vector DB) │ │ │
111
+ │ │ │ - Ports: 6333, 6334 │ │ │
112
+ │ │ │ - Persistent volume │ │ │
113
+ │ │ └─────────────────────────────────┘ │ │
114
+ │ └──────────────────────────────────────┘ │
115
+ └─────────────────────────────────────────────┘
116
+ ```
117
+
118
+ ### Data Flow
119
+
120
+ 1. **User Request** → API Container (port 8000)
121
+ 2. **Query Engine** → Qdrant Container (vector search)
122
+ 3. **Embedding** → Ollama on Host (via host.docker.internal)
123
+ 4. **LLM Generation** → Ollama on Host
124
+ 5. **Response** → User
125
+
126
+ ### Volumes
127
+
128
+ | Volume | Path | Purpose |
129
+ |--------|------|---------|
130
+ | `./data/raw` | `/app/data/raw` | Scraped content |
131
+ | `./data/processed` | `/app/data/processed` | Chunked documents |
132
+ | `qdrant_data` | `/app/data/qdrant` | Vector database |
133
+ | `./prompts` | `/app/prompts` | Customizable prompts |
134
+
135
+ ## Configuration
136
+
137
+ ### Environment Variables
138
+
139
+ Edit `docker-compose.yml`:
140
+
141
+ ```yaml
142
+ environment:
143
+ # Ollama Configuration
144
+ - OLLAMA_BASE_URL=http://host.docker.internal:11434
145
+ - LLM_MODEL=mistral
146
+ - EMBEDDING_MODEL=nomic-embed-text
147
+ - OLLAMA_TIMEOUT=120
148
+
149
+ # Qdrant Configuration
150
+ - QDRANT_HOST=qdrant
151
+ - QDRANT_PORT=6333
152
+ - QDRANT_COLLECTION_NAME=eyewiki_rag
153
+ - QDRANT_PATH=/app/data/qdrant
154
+
155
+ # Processing Configuration
156
+ - CHUNK_SIZE=512
157
+ - CHUNK_OVERLAP=50
158
+ - MIN_CHUNK_SIZE=100
159
+ - MAX_CONTEXT_TOKENS=4000
160
+
161
+ # Retrieval Configuration
162
+ - RETRIEVAL_K=20
163
+ - RERANK_K=5
164
 + - RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
165
+ ```
166
+
167
+ ### Custom Prompts
168
+
169
+ Edit files in `./prompts/` directory (mounted into container):
170
+ - `system_prompt.txt`
171
+ - `query_prompt.txt`
172
+ - `medical_disclaimer.txt`
173
+
174
+ Changes take effect on container restart.
175
+
176
+ ### Resource Limits
177
+
178
+ Add to service in `docker-compose.yml`:
179
+
180
+ ```yaml
181
+ deploy:
182
+ resources:
183
+ limits:
184
+ cpus: '4'
185
+ memory: 8G
186
+ reservations:
187
+ cpus: '2'
188
+ memory: 4G
189
+ ```
190
+
191
+ ## Operations
192
+
193
+ ### Makefile Commands
194
+
195
+ ```bash
196
+ # Service Management
197
+ make up # Start services
198
+ make down # Stop services
199
+ make restart # Restart services
200
+ make ps # Show status
201
+ make logs # View all logs
202
+ make logs-api # View API logs only
203
+ make logs-qdrant # View Qdrant logs only
204
+
205
+ # Health & Monitoring
206
+ make health # Check service health
207
+ make stats # Show resource usage
208
+
209
+ # Data Operations
210
+ make scrape # Run scraper
211
+ make build-index # Build vector index
212
+ make evaluate # Run evaluation
213
+ make test # Run tests
214
+
215
+ # Maintenance
216
+ make clean # Remove containers & volumes
217
+ make rebuild # Clean rebuild
218
+ make backup-qdrant # Backup vector DB
219
+ make restore-qdrant # Restore from backup
220
+
221
+ # Development
222
+ make exec-api # Bash into API container
223
+ make exec-qdrant # Shell into Qdrant container
224
+ ```
225
+
226
+ ### Manual Commands
227
+
228
+ #### Start Services
229
+ ```bash
230
+ docker-compose up -d
231
+ ```
232
+
233
+ #### Stop Services
234
+ ```bash
235
+ docker-compose down
236
+ ```
237
+
238
+ #### View Logs
239
+ ```bash
240
+ # All services
241
+ docker-compose logs -f
242
+
243
+ # Specific service
244
+ docker-compose logs -f eyewiki-rag
245
+ docker-compose logs -f qdrant
246
+
247
+ # Last N lines
248
+ docker-compose logs --tail=100 -f
249
+ ```
250
+
251
+ #### Execute Commands in Container
252
+ ```bash
253
+ # Run scraper
254
+ docker-compose exec eyewiki-rag \
255
+ python scripts/scrape_eyewiki.py --max-pages 100
256
+
257
+ # Build index
258
+ docker-compose exec eyewiki-rag \
259
+ python scripts/build_index.py --index-vectors
260
+
261
+ # Run evaluation
262
+ docker-compose exec eyewiki-rag \
263
+ python scripts/evaluate.py -v
264
+
265
+ # Run tests
266
+ docker-compose exec eyewiki-rag pytest tests/ -v
267
+
268
+ # Interactive shell
269
+ docker-compose exec eyewiki-rag bash
270
+ ```
271
+
272
+ #### Inspect Services
273
+ ```bash
274
+ # Container status
275
+ docker-compose ps
276
+
277
+ # Resource usage
278
+ docker stats eyewiki-rag-api eyewiki-qdrant
279
+
280
+ # Network info
281
+ docker network inspect eyewiki-network
282
+
283
+ # Volume info
284
+ docker volume ls
285
+ docker volume inspect eyewiki-rag_qdrant_data
286
+ ```
287
+
288
+ ### Data Management
289
+
290
+ #### Backup Qdrant
291
+ ```bash
292
+ # Using Makefile
293
+ make backup-qdrant
294
+
295
+ # Manual
296
+ docker-compose exec qdrant tar -czf /tmp/backup.tar.gz /qdrant/storage
297
+ docker cp eyewiki-qdrant:/tmp/backup.tar.gz ./backups/qdrant-$(date +%Y%m%d).tar.gz
298
+ ```
299
+
300
+ #### Restore Qdrant
301
+ ```bash
302
+ # Stop services
303
+ docker-compose down
304
+
305
+ # Restore backup
306
+ docker-compose up -d qdrant
307
+ docker cp ./backups/qdrant-20241209.tar.gz eyewiki-qdrant:/tmp/backup.tar.gz
308
+ docker-compose exec qdrant tar -xzf /tmp/backup.tar.gz -C /
309
+
310
+ # Restart all services
311
+ docker-compose up -d
312
+ ```
313
+
314
+ #### Clear Data
315
+ ```bash
316
+ # Remove all data and volumes
317
+ docker-compose down -v
318
+
319
+ # Remove only processed data
320
+ rm -rf data/processed/*
321
+ rm -rf data/qdrant/*
322
+ ```
323
+
324
+ ## Troubleshooting
325
+
326
+ ### Cannot Connect to Ollama
327
+
328
+ **Symptoms:**
329
+ - `ConnectionError: Failed to connect to Ollama`
330
+ - 503 errors on API startup
331
+
332
+ **Solutions:**
333
+
334
+ 1. **Verify Ollama is running:**
335
+ ```bash
336
+ curl http://localhost:11434/api/tags
337
+ ```
338
+
339
+ 2. **On Linux, add to docker-compose.yml:**
340
+ ```yaml
341
+ extra_hosts:
342
+ - "host.docker.internal:host-gateway"
343
+ ```
344
+
345
+ 3. **Use host IP instead:**
346
+ ```bash
347
+ # Get host IP
348
+ ip addr show docker0 | grep inet
349
+
350
+ # Update OLLAMA_BASE_URL
351
+ OLLAMA_BASE_URL=http://172.17.0.1:11434
352
+ ```
353
+
354
+ ### Qdrant Permission Errors
355
+
356
+ **Symptoms:**
357
+ - Permission denied errors in Qdrant logs
358
+ - Cannot write to volume
359
+
360
+ **Solution:**
361
+ ```bash
362
+ # Fix permissions
363
+ sudo chown -R 1000:1000 data/qdrant/
364
+
365
+ # Or recreate volume
366
+ docker-compose down -v
367
+ docker-compose up -d
368
+ ```
369
+
370
+ ### Out of Memory
371
+
372
+ **Symptoms:**
373
+ - Container killed (exit code 137)
374
+ - Slow performance
375
+
376
+ **Solutions:**
377
+
378
+ 1. **Increase Docker memory:**
379
+ - Docker Desktop: Settings → Resources → Memory → 8GB+
380
+
381
+ 2. **Add resource limits:**
382
+ ```yaml
383
+ deploy:
384
+ resources:
385
+ limits:
386
+ memory: 8G
387
+ ```
388
+
389
+ 3. **Use smaller models:**
390
+ ```bash
391
+ ollama pull llama3.2:3b # Instead of mistral
392
+ ```
393
+
394
+ ### Port Already in Use
395
+
396
+ **Symptoms:**
397
+ - `Bind for 0.0.0.0:8000 failed: port is already allocated`
398
+
399
+ **Solutions:**
400
+
401
+ 1. **Find and kill process:**
402
+ ```bash
403
+ lsof -i :8000
404
+ kill <PID>
405
+ ```
406
+
407
+ 2. **Change port in docker-compose.yml:**
408
+ ```yaml
409
+ ports:
410
+ - "8080:8000" # Use 8080 instead
411
+ ```
412
+
413
+ ### Slow Performance
414
+
415
+ **Solutions:**
416
+
417
+ 1. **Reduce batch sizes:**
418
+ ```yaml
419
+ environment:
420
+ - RETRIEVAL_K=10 # Instead of 20
421
+ - RERANK_K=3 # Instead of 5
422
+ ```
423
+
424
+ 2. **Allocate more resources:**
425
+ ```yaml
426
+ deploy:
427
+ resources:
428
+ limits:
429
+ cpus: '4'
430
+ memory: 8G
431
+ ```
432
+
433
+ 3. **Use GPU for Ollama** (on host)
434
+
435
+ ## Production
436
+
437
+ ### Production Configuration
438
+
439
+ Create `docker-compose.prod.yml`:
440
+
441
+ ```yaml
442
+ version: '3.8'
443
+
444
+ services:
445
+ eyewiki-rag:
446
+ restart: always
447
+ deploy:
448
+ resources:
449
+ limits:
450
+ cpus: '4'
451
+ memory: 8G
452
+ reservations:
453
+ cpus: '2'
454
+ memory: 4G
455
+ logging:
456
+ driver: "json-file"
457
+ options:
458
+ max-size: "100m"
459
+ max-file: "5"
460
+ environment:
461
+ - LOG_LEVEL=WARNING
462
+ healthcheck:
463
+ interval: 30s
464
+ timeout: 10s
465
+ retries: 3
466
+ start_period: 60s
467
+
468
+ qdrant:
469
+ restart: always
470
+ deploy:
471
+ resources:
472
+ limits:
473
+ cpus: '2'
474
+ memory: 4G
475
+ logging:
476
+ driver: "json-file"
477
+ options:
478
+ max-size: "50m"
479
+ max-file: "3"
480
+ ```
481
+
482
+ ### Start Production
483
+
484
+ ```bash
485
+ # Use production config
486
+ docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
487
+
488
+ # Or use Makefile
489
+ make prod
490
+ ```
491
+
492
+ ### Monitoring
493
+
494
+ ```bash
495
+ # Watch container status
496
+ watch docker-compose ps
497
+
498
+ # Monitor resources
499
+ docker stats --no-stream eyewiki-rag-api eyewiki-qdrant
500
+
501
+ # Check logs
502
+ docker-compose logs --tail=100 -f
503
+
504
+ # Test health endpoints
505
+ watch curl -s http://localhost:8000/health
506
+ ```
507
+
508
+ ### Backup Strategy
509
+
510
+ ```bash
511
+ # Daily backup script (add to cron)
512
+ #!/bin/bash
513
+ BACKUP_DIR="/backups/eyewiki-rag"
514
+ DATE=$(date +%Y%m%d)
515
+
516
+ # Backup Qdrant
517
+ make backup-qdrant
518
+
519
+ # Backup configuration
520
+ tar -czf $BACKUP_DIR/config-$DATE.tar.gz \
521
+ docker-compose.yml prompts/ data/raw/
522
+
523
+ # Keep last 7 days
524
+ find $BACKUP_DIR -name "*.tar.gz" -mtime +7 -delete
525
+ ```
526
+
527
+ ### Update Strategy
528
+
529
+ ```bash
530
+ # 1. Backup current state
531
+ make backup-qdrant
532
+
533
+ # 2. Pull latest code
534
+ git pull origin main
535
+
536
+ # 3. Rebuild images
537
+ docker-compose build --no-cache
538
+
539
+ # 4. Restart services with zero downtime
540
+ docker-compose up -d --no-deps --build eyewiki-rag
541
+
542
+ # 5. Verify health
543
+ make health
544
+ ```
545
+
546
+ ## Best Practices
547
+
548
+ ### Security
549
+ - Use environment files for secrets
550
+ - Don't expose unnecessary ports
551
+ - Run as non-root user (add to Dockerfile)
552
+ - Keep base images updated
553
+ - Use Docker secrets for production
554
+
555
+ ### Performance
556
+ - Allocate sufficient memory (8GB+)
557
+ - Use volume for Qdrant data
558
+ - Monitor resource usage
559
+ - Scale horizontally if needed
560
+
561
+ ### Maintenance
562
+ - Regular backups
563
+ - Monitor logs for errors
564
+ - Update dependencies
565
+ - Prune unused images/volumes
566
+
567
+ ### Development
568
+ - Use `docker-compose.override.yml` for local config
569
+ - Mount source code as volume for hot reload
570
+ - Keep production and development configs separate
571
+
572
+ ---
573
+
574
+ For more information, see the main [README.md](README.md).
Makefile ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # EyeWiki RAG System - Makefile for Docker operations
2
+
3
+ .PHONY: help build up down restart logs ps clean test
4
+
5
+ help: ## Show this help message
6
+ @echo "EyeWiki RAG System - Docker Commands"
7
+ @echo ""
8
+ @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'
9
+
10
+ build: ## Build Docker images
11
+ docker-compose build
12
+
13
+ up: ## Start all services
14
+ docker-compose up -d
15
+ @echo "Services starting..."
16
+ @echo "API: http://localhost:8000"
17
+ @echo "Gradio UI: http://localhost:8000/ui"
18
+ @echo "API Docs: http://localhost:8000/docs"
19
+ @echo "Qdrant: http://localhost:6333/dashboard"
20
+
21
+ down: ## Stop all services
22
+ docker-compose down
23
+
24
+ restart: ## Restart all services
25
+ docker-compose restart
26
+
27
+ logs: ## View logs from all services
28
+ docker-compose logs -f
29
+
30
+ logs-api: ## View API logs only
31
+ docker-compose logs -f eyewiki-rag
32
+
33
+ logs-qdrant: ## View Qdrant logs only
34
+ docker-compose logs -f qdrant
35
+
36
+ ps: ## Show running containers
37
+ docker-compose ps
38
+
39
+ health: ## Check health of services
40
+ @echo "Checking Qdrant..."
41
+ @curl -s http://localhost:6333/healthz || echo "Qdrant not healthy"
42
+ @echo "\nChecking API..."
43
+ @curl -s http://localhost:8000/health | python -m json.tool || echo "API not healthy"
44
+
45
+ exec-api: ## Execute bash in API container
46
+ docker-compose exec eyewiki-rag bash
47
+
48
+ exec-qdrant: ## Execute bash in Qdrant container
49
+ docker-compose exec qdrant /bin/sh
50
+
51
+ clean: ## Remove all containers, volumes, and images
52
+ docker-compose down -v
53
+ docker rmi eyewiki-rag_eyewiki-rag 2>/dev/null || true
54
+
55
+ clean-volumes: ## Remove only volumes (keeps images)
56
+ docker-compose down -v
57
+
58
+ rebuild: clean build up ## Clean rebuild and start
59
+
60
+ test: ## Run tests in container
61
+ docker-compose exec eyewiki-rag pytest tests/ -v
62
+
63
+ scrape: ## Run scraper in container (example: make scrape ARGS="--max-pages 50")
64
+ docker-compose exec eyewiki-rag python scripts/scrape_eyewiki.py $(ARGS)
65
+
66
+ build-index: ## Build vector index in container
67
+ docker-compose exec eyewiki-rag python scripts/build_index.py --index-vectors
68
+
69
+ evaluate: ## Run evaluation in container
70
+ docker-compose exec eyewiki-rag python scripts/evaluate.py
71
+
72
+ stats: ## Show system statistics
73
+ @echo "Docker stats:"
74
+ docker stats --no-stream eyewiki-rag-api eyewiki-qdrant
75
+ @echo "\nDisk usage:"
76
+ docker system df
77
+
78
+ backup-qdrant: ## Backup Qdrant data
79
+ docker-compose exec qdrant tar -czf /tmp/qdrant-backup.tar.gz /qdrant/storage
80
+ docker cp eyewiki-qdrant:/tmp/qdrant-backup.tar.gz ./backups/qdrant-backup-$$(date +%Y%m%d-%H%M%S).tar.gz
81
+ @echo "Backup saved to ./backups/"
82
+
83
+ restore-qdrant: ## Restore Qdrant data (usage: make restore-qdrant BACKUP=backups/file.tar.gz)
84
+ docker cp $(BACKUP) eyewiki-qdrant:/tmp/qdrant-backup.tar.gz
85
+ docker-compose exec qdrant tar -xzf /tmp/qdrant-backup.tar.gz -C /
86
+
87
+ prod: ## Start in production mode (detached, with restart policy)
88
+ docker-compose up -d --remove-orphans
89
+ @echo "Production services started"
90
+
91
+ dev: ## Start in development mode (with logs)
92
+ docker-compose up
93
+
94
+ .DEFAULT_GOAL := help
README.md CHANGED
@@ -1,11 +1,1026 @@
1
- ---
2
- title: Eye Wiki
3
- emoji: 📊
4
- colorFrom: gray
5
- colorTo: red
6
- sdk: docker
7
- pinned: false
8
- short_description: Eye Wiki RAG
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9
  ---
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
+ # 🏥 EyeWiki RAG System
2
+
3
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
4
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
5
+
6
+ A production-ready Retrieval-Augmented Generation (RAG) system for ophthalmology knowledge, powered by EyeWiki content and local LLMs.
7
+
8
+ ## 📋 Overview
9
+
10
+ The EyeWiki RAG system provides intelligent question-answering capabilities for ophthalmology topics by combining:
11
+ - **Web scraping** of authoritative EyeWiki content
12
+ - **Semantic search** with hybrid retrieval (dense + sparse)
13
+ - **Cross-encoder reranking** for precision
14
+ - **Local LLM inference** via Ollama for privacy and control
15
+ - **RESTful API** with interactive web UI
16
+
17
+ Built for medical professionals, researchers, and students seeking quick, evidence-based answers to ophthalmology questions.
18
+
19
+ ## ✨ Features
20
+
21
+ ### Core Capabilities
22
+ - 🔍 **Intelligent Retrieval**: Hybrid search combining dense embeddings and sparse BM25
23
+ - 🎯 **Precise Reranking**: Cross-encoder models for relevance scoring
24
+ - 🏠 **Local Processing**: All data stays on your machine (HIPAA-friendly)
25
+ - 📚 **Source Citations**: Every answer includes EyeWiki article references
26
+ - ⚡ **Streaming Responses**: Real-time answer generation
27
+ - 🌐 **Web Interface**: Beautiful Gradio UI for easy interaction
28
+ - 🔌 **REST API**: Programmatic access with FastAPI
29
+ - ✅ **Comprehensive Testing**: 25+ pytest tests with mocking
30
+
31
+ ### Technical Highlights
32
+ - **Polite Web Scraping**: Respects robots.txt and implements rate limiting
33
+ - **Smart Chunking**: Hierarchical markdown splitting with section awareness
34
+ - **Metadata Extraction**: Automatic ICD-10 codes, anatomical terms, medications
35
+ - **Vector Store**: Local Qdrant with payload indexing
36
+ - **Medical Disclaimer**: Automatic inclusion in all responses
37
+
38
+ ## 🏗️ Architecture
39
+
40
+ ```
41
+ ┌─────────────────────────────────────────────────────────────────┐
42
+ │ User Interface │
43
+ │ ┌──────────────────┐ ┌─────────────────────────┐ │
44
+ │ │ Gradio Web UI │ │ REST API (FastAPI) │ │
45
+ │ │ - Chat interface│ │ - /query │ │
46
+ │ │ - Examples │ │ - /query/stream │ │
47
+ │ │ - Source display│ │ - /health, /stats │ │
48
+ │ └────────┬─────────┘ └───────────┬─────────────┘ │
49
+ └───────────┼────────────────────────────────────┼────────────────┘
50
+ │ │
51
+ └────────────────┬───────────────────┘
52
+
53
+ ┌────────────────────────────────────────┐
54
+ │ Query Engine (Orchestrator) │
55
+ │ - Context assembly │
56
+ │ - Prompt formatting │
57
+ │ - Source diversity │
58
+ └──┬────────────────────────┬────────────┘
59
+ │ │
60
+ ┌───────▼──────┐ ┌──────▼──────────┐
61
+ │ Retriever │ │ Ollama Client │
62
+ │ (Hybrid) │ │ - LLM (Mistral)│
63
+ │ Dense: 0.7 │ │ │
64
+ │ Sparse: 0.3 │ │ Sentence- │
65
+ └──┬───────────┘ │ Transformers │
66
+ │ │ - Embeddings │
67
+ │ │ (all-mpnet) │
68
+ │ └─────────────────┘
69
+
70
+ ┌───────▼──────────┐
71
+ │ Reranker │
72
+ │ (CrossEncoder) │
73
+ │ ms-marco-MiniLM │
74
+ └──┬───────────────┘
75
+
76
+
77
+ ┌────────────────────────────────────┐
78
+ │ Qdrant Vector Store │
79
+ │ - Dense vectors (768-dim) │
80
+ │ - Sparse vectors (BM25) │
81
+ │ - Metadata filtering │
82
+ │ - Local storage │
83
+ └────────────────────────────────────┘
84
+ ```
85
+
86
+ **Data Flow:**
87
+ 1. **Scraping** → EyeWiki → Raw Markdown
88
+ 2. **Processing** → Chunking → Metadata Extraction → JSON
89
+ 3. **Indexing** → Embeddings → Vector Store
90
+ 4. **Query** → Retrieval → Reranking → LLM → Response
91
+
92
+ ## 📁 Project Structure
93
+
94
+ ```
95
+ eyewiki-rag/
96
+ ├── src/
97
+ │ ├── scraper/ # Web scraping (crawl4ai)
98
+ │ │ └── eyewiki_crawler.py
99
+ │ ├── processing/ # Document processing
100
+ │ │ ├── chunker.py # Semantic chunking
101
+ │ │ └── metadata_extractor.py # Medical metadata
102
+ │ ├── vectorstore/ # Vector database
103
+ │ │ └── qdrant_store.py
104
+ │ ├── rag/ # RAG components
105
+ │ │ ├── retriever.py # Hybrid retrieval
106
+ │ │ ├── reranker.py # Cross-encoder reranking
107
+ │ │ └── query_engine.py # Main orchestrator
108
+ │ ├── llm/ # LLM integration
109
+ │ │ ├── ollama_client.py # Ollama for LLM generation
110
+ │ │ └── sentence_transformer_client.py # Stable embeddings
111
+ │ ├── api/ # FastAPI server
112
+ │ │ ├── main.py # API endpoints
113
+ │ │ └── gradio_ui.py # Web interface
114
+ │ └── config/ # Configuration
115
+ │ └── settings.py
116
+ ├── prompts/ # Customizable prompts
117
+ │ ├── system_prompt.txt
118
+ │ ├── query_prompt.txt
119
+ │ └── medical_disclaimer.txt
120
+ ├── scripts/ # Utility scripts
121
+ │ ├── scrape_eyewiki.py # Web scraping
122
+ │ ├── build_index.py # Index building
123
+ │ ├── run_server.py # Server startup
124
+ │ └── evaluate.py # System evaluation
125
+ ├── tests/ # Comprehensive test suite
126
+ │ ├── test_components.py # Component tests
127
+ │ ├── test_questions.json # Evaluation questions
128
+ │ └── conftest.py
129
+ ├── data/ # Data storage (gitignored)
130
+ │ ├── raw/ # Scraped content
131
+ │ ├── processed/ # Chunked documents
132
+ │ └── qdrant/ # Vector database
133
+ └── requirements.txt # Python dependencies
134
+ ```
135
+
136
+ ## 📋 Prerequisites
137
+
138
+ ### Required
139
+ - **Python 3.10+** (tested on 3.10, 3.11)
140
+ - **Ollama** (for local LLM text generation only)
141
+ - Install: https://ollama.ai/download
142
+ - Note: Embeddings now use sentence-transformers (more stable)
143
+ - **8GB+ RAM** (16GB recommended for larger datasets)
144
+ - **10GB+ disk space** (for models and vector store)
145
+
146
+ ### Optional
147
+ - **CUDA-capable GPU** (for faster embedding generation with sentence-transformers)
148
+ - **Docker** (if running Qdrant in container)
149
+
150
+ ### System Requirements by Component
151
+ | Component | RAM | CPU | GPU | Disk |
152
+ |-----------|-----|-----|-----|------|
153
+ | Scraping | 2GB | 2 cores | No | 500MB |
154
+ | Processing | 4GB | 4 cores | No | 2GB |
155
+ | Indexing | 8GB | 4 cores | Optional | 5GB |
156
+ | API Server | 4GB | 2 cores | Optional | 100MB |
157
+
158
+ ## 🚀 Quick Start
159
+
160
+ ### Step 1: Installation
161
+
162
+ ```bash
163
+ # Clone repository
164
+ git clone <repository-url>
165
+ cd eyewiki-rag
166
+
167
+ # Create virtual environment
168
+ python -m venv venv
169
+ source venv/bin/activate # Windows: venv\Scripts\activate
170
+
171
+ # Install Python dependencies
172
+ pip install -r requirements.txt
173
+
174
+ # Install system dependencies for Playwright (Linux/WSL only)
175
+ # This installs required shared libraries (libnss3, libnspr4, etc.)
176
+ python -m playwright install-deps
177
+ ```
178
+
179
+ ### Step 2: Install Ollama and LLM Model
180
+
181
+ ```bash
182
+ # Install Ollama from https://ollama.ai/download
183
+ # Then pull required LLM model:
184
+
185
+ ollama pull mistral # LLM model (4.1GB)
186
+ # or use smaller alternative:
187
+ ollama pull llama3.2:3b # Smaller LLM (2GB)
188
+
189
+ # Note: Embedding model (sentence-transformers) will be auto-downloaded
190
+ # when you first run build_index.py (no Ollama needed for embeddings!)
191
+ ```
192
+
193
+ ### Step 3: Scrape EyeWiki
194
+
195
+ ```bash
196
+ # Quick test (50 pages, ~5 minutes)
197
+ python scripts/scrape_eyewiki.py --max-pages 50
198
+
199
+ # Full crawl (1000+ pages, ~2 hours)
200
+ python scripts/scrape_eyewiki.py --max-pages 1000
201
+
202
+ # Resume from checkpoint
203
+ python scripts/scrape_eyewiki.py --resume
204
+ ```
205
+
206
+ **Output:** `data/raw/*.json` (markdown files with metadata)
207
+
208
+ ### Step 4: Build Vector Index
209
+
210
+ ```bash
211
+ # Process documents and build vector index
212
+ python scripts/build_index.py --index-vectors
213
+
214
+ # This will:
215
+ # 1. Chunk documents (data/processed/)
216
+ # 2. Extract metadata
217
+ # 3. Generate embeddings using sentence-transformers (all-mpnet-base-v2)
218
+ # 4. Build Qdrant index (data/qdrant/)
219
+
220
+ # Optional: Use different embedding model
221
+ python scripts/build_index.py --index-vectors --embedding-model "BAAI/bge-base-en-v1.5"
222
+ ```
223
+
224
+ **Time:** ~10-30 minutes depending on dataset size
225
+ **Note:** First run will download the embedding model (~400MB for all-mpnet-base-v2)
226
+
227
+ ### Step 5: Start Server
228
+
229
+ ```bash
230
+ # Run with pre-flight checks
231
+ python scripts/run_server.py
232
+
233
+ # Development mode with hot reload
234
+ python scripts/run_server.py --reload
235
+
236
+ # Custom port
237
+ python scripts/run_server.py --port 8080
238
+ ```
239
+
240
+ ### Step 6: Access the System
241
+
242
+ **Web Interface:** http://localhost:8000/ui
243
+ - Beautiful chat interface
244
+ - Example questions
245
+ - Source citations
246
+ - Settings sidebar
247
+
248
+ **API Docs:** http://localhost:8000/docs
249
+ - Swagger UI
250
+ - Interactive testing
251
+ - Full API documentation
252
+
253
+ **Health Check:** http://localhost:8000/health
254
+
255
+ ### Example Query
256
+
257
+ ```bash
258
+ curl -X POST http://localhost:8000/query \
259
+ -H "Content-Type: application/json" \
260
+ -d '{
261
+ "question": "What are the symptoms of glaucoma?",
262
+ "include_sources": true
263
+ }'
264
+ ```
265
+
266
+ ## 🐳 Docker Deployment
267
+
268
+ ### Prerequisites
269
+ - **Docker** and **Docker Compose** installed
270
+ - **Ollama** running on host machine (for GPU access)
271
+ - **8GB+ RAM** allocated to Docker
272
+
273
+ ### Quick Start with Docker
274
+
275
+ ```bash
276
+ # 1. Ensure Ollama is running on host
277
+ ollama serve
278
+
279
+ # 2. Pull required models (on host)
280
+ ollama pull nomic-embed-text   # only needed if you configure Ollama embeddings; the default setup uses sentence-transformers
281
+ ollama pull mistral
282
+
283
+ # 3. Build and start services
284
+ docker-compose up -d
285
+
286
+ # 4. Check status
287
+ docker-compose ps
288
+
289
+ # 5. View logs
290
+ docker-compose logs -f eyewiki-rag
291
+ ```
292
+
293
+ **Access:**
294
+ - API: http://localhost:8000
295
+ - Gradio UI: http://localhost:8000/ui
296
+ - API Docs: http://localhost:8000/docs
297
+ - Qdrant Dashboard: http://localhost:6333/dashboard
298
+
299
+ ### Using Makefile Commands
300
+
301
+ ```bash
302
+ # Start services
303
+ make up
304
+
305
+ # View logs
306
+ make logs
307
+
308
+ # Check health
309
+ make health
310
+
311
+ # Run scraper in container
312
+ make scrape ARGS="--max-pages 50"
313
+
314
+ # Build index
315
+ make build-index
316
+
317
+ # Run evaluation
318
+ make evaluate
319
+
320
+ # Stop services
321
+ make down
322
+
323
+ # Clean everything
324
+ make clean
325
+ ```
326
+
327
+ ### Docker Compose Services
328
+
329
+ **eyewiki-rag** (API Server)
330
+ - Built from Dockerfile
331
+ - Exposes port 8000
332
+ - Connects to Ollama on host via `host.docker.internal`
333
+ - Connects to Qdrant container
334
+ - Mounts data volumes for persistence
335
+
336
+ **qdrant** (Vector Database)
337
+ - Official Qdrant image
338
+ - Exposes ports 6333 (REST) and 6334 (gRPC)
339
+ - Persistent volume for vector storage
340
+ - Health checks enabled
341
+
342
+ ### Volume Management
343
+
344
+ **Persistent volumes:**
345
+ - `./data/raw` - Scraped content
346
+ - `./data/processed` - Chunked documents
347
+ - `qdrant_data` - Vector database (Docker volume)
348
+ - `./prompts` - Customizable prompts
349
+
350
+ **Backup Qdrant data:**
351
+ ```bash
352
+ make backup-qdrant
353
+ # Saves to ./backups/qdrant-backup-YYYYMMDD-HHMMSS.tar.gz
354
+ ```
355
+
356
+ **Restore Qdrant data:**
357
+ ```bash
358
+ make restore-qdrant BACKUP=backups/qdrant-backup-20241209-120000.tar.gz
359
+ ```
360
+
361
+ ### Configuration via Environment Variables
362
+
363
+ Edit `docker-compose.yml` to customize:
364
+
365
+ ```yaml
366
+ environment:
367
+ # Ollama settings
368
+ - OLLAMA_BASE_URL=http://host.docker.internal:11434
369
+ - LLM_MODEL=mistral
370
+ - EMBEDDING_MODEL=nomic-embed-text
371
+
372
+ # Qdrant settings
373
+ - QDRANT_HOST=qdrant
374
+ - QDRANT_PORT=6333
375
+
376
+ # Processing settings
377
+ - CHUNK_SIZE=512
378
+ - TOP_K=20
379
+ - RERANK_TOP_K=5
380
+ ```
381
+
382
+ ### Running Scripts in Container
383
+
384
+ ```bash
385
+ # Scrape EyeWiki
386
+ docker-compose exec eyewiki-rag python scripts/scrape_eyewiki.py --max-pages 100
387
+
388
+ # Build index
389
+ docker-compose exec eyewiki-rag python scripts/build_index.py --index-vectors
390
+
391
+ # Run evaluation
392
+ docker-compose exec eyewiki-rag python scripts/evaluate.py
393
+
394
+ # Run tests
395
+ docker-compose exec eyewiki-rag pytest tests/ -v
396
+ ```
397
+
398
+ ### Production Deployment
399
+
400
+ ```bash
401
+ # Start in production mode
402
+ make prod
403
+
404
+ # Or manually:
405
+ docker-compose up -d --remove-orphans
406
+
407
+ # Monitor with healthchecks
408
+ watch docker-compose ps
409
+
410
+ # View metrics
411
+ docker stats eyewiki-rag-api eyewiki-qdrant
412
+ ```
413
+
414
+ ### Troubleshooting Docker
415
+
416
+ **Problem:** Cannot connect to Ollama
417
+
418
+ **Solution:**
419
+ ```bash
420
+ # macOS/Windows (Docker Desktop): host.docker.internal resolves automatically; on Linux use extra_hosts (below)
421
+ # If not working, use host IP:
422
+ docker network inspect eyewiki-network
423
+ # Update OLLAMA_BASE_URL to http://<host-ip>:11434
424
+
425
+ # Or on Linux, add to docker-compose.yml:
426
+ extra_hosts:
427
+ - "host.docker.internal:host-gateway"
428
+ ```
429
+
430
+ **Problem:** Qdrant volume permission issues
431
+
432
+ **Solution:**
433
+ ```bash
434
+ # Fix permissions
435
+ sudo chown -R 1000:1000 data/qdrant/
436
+ ```
437
+
438
+ **Problem:** Out of memory
439
+
440
+ **Solution:**
441
+ ```bash
442
+ # Increase Docker memory limit in Docker Desktop
443
+ # Or in docker-compose.yml, add:
444
+ deploy:
445
+ resources:
446
+ limits:
447
+ memory: 8G
448
+ ```
449
+
450
+ ### Docker Image Sizes
451
+
452
+ | Image | Size | Purpose |
453
+ |-------|------|---------|
454
+ | eyewiki-rag | ~2.5GB | API server with dependencies |
455
+ | qdrant/qdrant | ~200MB | Vector database |
456
+ | **Total** | ~2.7GB | Both services |
457
+
458
+ **Note:** Ollama models (~4-5GB) run on host for GPU access.
459
+
460
+ ## ⚙️ Configuration
461
+
462
+ Configuration via `src/config/settings.py` (uses pydantic-settings):
463
+
464
+ | Parameter | Default | Description |
465
+ |-----------|---------|-------------|
466
+ | **LLM Settings** |
467
+ | `llm_model` | `mistral` | Ollama LLM model name |
468
+ | `ollama_base_url` | `http://localhost:11434` | Ollama API URL |
469
+ | `llm_temperature` | `0.7` | LLM sampling temperature |
470
+ | `llm_max_tokens` | `2048` | Max tokens for LLM response |
471
+ | **Embedding Settings** |
472
+ | `embedding_model` | `all-mpnet-base-v2` | Sentence-transformers model (note: code default in `settings.py` is `nomic-embed-text` — override via `.env`) |
473
+ | **Vector Store** |
474
+ | `qdrant_collection_name` | `eyewiki_rag` | Collection name |
475
+ | `qdrant_path` | `./data/vectorstore` | Local storage path |
476
+ | `qdrant_url` | `None` | Remote Qdrant URL (optional) |
477
+ | **Chunking** |
478
+ | `chunk_size` | `512` | Max tokens per chunk |
479
+ | `chunk_overlap` | `50` | Overlap between chunks |
480
+ | `min_chunk_size` | `100` | Minimum chunk size |
481
+ | **Retrieval** |
482
+ | `top_k` | `10` | Initial retrieval count |
483
+ | `rerank_top_k` | `5` | After reranking |
484
+ | `similarity_threshold` | `0.7` | Minimum similarity score |
485
+ | **Scraper** |
486
+ | `scraper_delay` | `1.0` | Delay between requests (seconds) |
487
+ | `scraper_timeout` | `30` | Request timeout (seconds) |
488
+
489
+ ### Environment Variables
490
+
491
+ Create `.env` file to override defaults (see `.env.example`):
492
+
493
+ ```env
494
+ # Ollama Configuration (for LLM only)
495
+ OLLAMA_BASE_URL=http://localhost:11434
496
+ LLM_MODEL=mistral
497
+ LLM_TEMPERATURE=0.7
498
+ LLM_MAX_TOKENS=2048
499
+
500
+ # Embedding Configuration (sentence-transformers)
501
+ EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
502
+
503
+ # Qdrant Vector Store
504
+ QDRANT_COLLECTION_NAME=eyewiki_rag
505
+ QDRANT_PATH=./data/vectorstore
506
+ # QDRANT_URL=http://localhost:6333 # For remote Qdrant
507
+ # QDRANT_API_KEY=your-key # For Qdrant Cloud
508
+
509
+ # Document Processing
510
+ CHUNK_SIZE=512
511
+ CHUNK_OVERLAP=50
512
+ MIN_CHUNK_SIZE=100
513
+
514
+ # RAG Retrieval
515
+ TOP_K=10
516
+ RERANK_TOP_K=5
517
+ SIMILARITY_THRESHOLD=0.7
518
+
519
+ # Web Scraper
520
+ SCRAPER_DELAY=1.0
521
+ SCRAPER_TIMEOUT=30
522
+
523
+ # API Server
524
+ API_HOST=0.0.0.0
525
+ API_PORT=8000
526
+ API_WORKERS=4
527
+
528
+ # Gradio UI
529
+ GRADIO_HOST=0.0.0.0
530
+ GRADIO_PORT=7860
531
+ GRADIO_SHARE=false
532
+
533
+ # Data Paths
534
+ DATA_RAW_PATH=./data/raw
535
+ DATA_PROCESSED_PATH=./data/processed
536
+
537
+ # Logging
538
+ LOG_LEVEL=INFO
539
+ LOG_FILE=logs/eyewiki_rag.log
540
+ ```
541
+
542
+ ### Customizing Prompts
543
+
544
+ Edit files in `prompts/` directory:
545
+ - `system_prompt.txt` - System instructions for LLM
546
+ - `query_prompt.txt` - Query template with `{context}` and `{question}` placeholders
547
+ - `medical_disclaimer.txt` - Medical disclaimer text
548
+
549
+ ## 📡 API Documentation
550
+
551
+ ### Endpoints
552
+
553
+ #### `GET /`
554
+ Root endpoint with API information
555
+
556
+ #### `GET /health`
557
+ Health check endpoint
558
+
559
+ **Response:**
560
+ ```json
561
+ {
562
+ "status": "healthy",
563
+ "ollama": {"status": "healthy", "models": {...}},
564
+ "qdrant": {"status": "healthy", "vectors_count": 1234},
565
+ "query_engine": {"status": "initialized"},
566
+ "timestamp": 1702134567.89
567
+ }
568
+ ```
569
+
570
+ #### `POST /query`
571
+ Main query endpoint
572
+
573
+ **Request:**
574
+ ```json
575
+ {
576
+ "question": "What is glaucoma?",
577
+ "include_sources": true,
578
+ "filters": {"disease_name": "Glaucoma"} // optional
579
+ }
580
+ ```
581
+
582
+ **Response:**
583
+ ```json
584
+ {
585
+ "answer": "Glaucoma is a group of eye diseases...",
586
+ "sources": [
587
+ {
588
+ "title": "Primary Open-Angle Glaucoma",
589
+ "url": "https://eyewiki.aao.org/...",
590
+ "section": "Overview",
591
+ "relevance_score": 0.89
592
+ }
593
+ ],
594
+ "confidence": 0.85,
595
+ "disclaimer": "Medical disclaimer text...",
596
+ "query": "What is glaucoma?"
597
+ }
598
+ ```
599
+
600
+ #### `POST /query/stream`
601
+ Streaming query with Server-Sent Events
602
+
603
+ **Request:**
604
+ ```json
605
+ {
606
+ "question": "What is glaucoma?",
607
+ "filters": {} // optional
608
+ }
609
+ ```
610
+
611
+ **Response:** SSE stream
612
+ ```
613
+ data: Glaucoma
614
+ data: is
615
+ data: a group of eye diseases...
616
+ ```
617
+
618
+ #### `GET /stats`
619
+ Index and pipeline statistics
620
+
621
+ **Response:**
622
+ ```json
623
+ {
624
+ "collection_info": {
625
+ "name": "eyewiki_rag",
626
+ "vectors_count": 1234
627
+ },
628
+ "pipeline_config": {
629
+ "retrieval_k": 20,
630
+ "rerank_k": 5,
631
+ "llm_model": "mistral"
632
+ },
633
+ "documents_indexed": 1234,
634
+ "timestamp": 1702134567.89
635
+ }
636
+ ```
637
+
638
+ ### Python Client Example
639
+
640
+ ```python
641
+ import requests
642
+
643
+ # Query the API
644
+ response = requests.post(
645
+ "http://localhost:8000/query",
646
+ json={
647
+ "question": "What causes diabetic retinopathy?",
648
+ "include_sources": True
649
+ }
650
+ )
651
+
652
+ result = response.json()
653
+ print(f"Answer: {result['answer']}")
654
+ print(f"Confidence: {result['confidence']:.2%}")
655
+ print(f"Sources: {len(result['sources'])}")
656
+ ```
657
+
658
+ ### Streaming Example
659
+
660
+ ```python
661
+ import requests
662
+
663
+ response = requests.post(
664
+ "http://localhost:8000/query/stream",
665
+ json={"question": "What is glaucoma?"},
666
+ stream=True
667
+ )
668
+
669
+ for line in response.iter_lines():
670
+ if line.startswith(b"data: "):
671
+ chunk = line[6:].decode()
672
+ print(chunk, end="", flush=True)
673
+ ```
674
+
675
+ ## 🧪 Development
676
+
677
+ ### Running Tests
678
+
679
+ ```bash
680
+ # Run all tests
681
+ pytest
682
+
683
+ # Run with coverage
684
+ pytest --cov=src --cov-report=html
685
+
686
+ # Run specific test file
687
+ pytest tests/test_components.py -v
688
+
689
+ # Run specific test
690
+ pytest tests/test_components.py::test_chunk_respects_headers -v
691
+
692
+ # Run by marker
693
+ pytest -m unit # Fast unit tests
694
+ pytest -m api # API tests
695
+ ```
696
+
697
+ ### Code Quality
698
+
699
+ ```bash
700
+ # Format code
701
+ black src/ scripts/ tests/
702
+ isort src/ scripts/ tests/
703
+
704
+ # Lint
705
+ flake8 src/
706
+ pylint src/
707
+
708
+ # Type checking
709
+ mypy src/
710
+ ```
711
+
712
+ ### Evaluation
713
+
714
+ Run system evaluation on test questions:
715
+
716
+ ```bash
717
+ # Run evaluation
718
+ python scripts/evaluate.py
719
+
720
+ # With custom questions
721
+ python scripts/evaluate.py --questions tests/custom_questions.json
722
+
723
+ # Save results
724
+ python scripts/evaluate.py --output results/eval.json
725
+
726
+ # Verbose output
727
+ python scripts/evaluate.py -v
728
+ ```
729
+
730
+ **Metrics:**
731
+ - Retrieval Recall
732
+ - Answer Relevance
733
+ - Citation Precision/Recall/F1
734
+ - Performance by category
735
+
736
+ ## 🔧 Troubleshooting
737
+
738
+ ### Ollama Issues
739
+
740
+ **Problem:** "Connection refused" to Ollama
741
+
742
+ **Solution:**
743
+ ```bash
744
+ # Check if Ollama is running
745
+ curl http://localhost:11434/api/tags
746
+
747
+ # Start Ollama
748
+ ollama serve
749
+
750
+ # Verify models are installed
751
+ ollama list
752
+ ```
753
+
754
+ **Problem:** "Model not found"
755
+
756
+ **Solution:**
757
+ ```bash
758
+ # Pull required models
759
+ ollama pull nomic-embed-text   # only if Ollama embeddings are configured (default uses sentence-transformers)
760
+ ollama pull mistral
761
+
762
+ # List available models
763
+ ollama list
764
+ ```
765
+
766
+ ### Vector Store Issues
767
+
768
+ **Problem:** "Collection not found"
769
+
770
+ **Solution:**
771
+ ```bash
772
+ # Rebuild the index
773
+ python scripts/build_index.py --index-vectors --recreate-collection
774
+
775
+ # Check Qdrant data directory
776
+ ls -la data/vectorstore/   # default qdrant_path in settings.py
777
+ ```
778
+
779
+ **Problem:** "Out of memory during indexing"
780
+
781
+ **Solution:**
782
+ ```bash
783
+ # Use smaller batch size
784
+ python scripts/build_index.py --index-vectors --embedding-batch-size 16
785
+
786
+ # Or process in stages
787
+ python scripts/build_index.py # Process only (no indexing)
788
+ python scripts/build_index.py --index-only # Index separately
789
+ ```
790
+
791
+ ### Scraping Issues
792
+
793
+ **Problem:** "Rate limited by EyeWiki"
794
+
795
+ **Solution:**
796
+ ```bash
797
+ # Increase delay between requests
798
+ python scripts/scrape_eyewiki.py --delay 5.0
799
+
800
+ # Resume from checkpoint if interrupted
801
+ python scripts/scrape_eyewiki.py --resume
802
+ ```
803
+
804
+ **Problem:** "Timeout during scraping"
805
+
806
+ **Solution:**
807
+ ```bash
808
+ # Increase timeout
809
+ python scripts/scrape_eyewiki.py --timeout 60
810
+ ```
811
+
812
+ **Problem:** "error while loading shared libraries: libnspr4.so" or browser crashes
813
+
814
+ **Solution:**
815
+ ```bash
816
+ # Install Playwright system dependencies (Linux/WSL)
817
+ python -m playwright install-deps
818
+
819
+ # Or manually install required libraries
820
+ sudo apt-get update && sudo apt-get install -y \
821
+ libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 \
822
+ libcups2 libdrm2 libdbus-1-3 libxkbcommon0 \
823
+ libatspi2.0-0 libxcomposite1 libxdamage1 \
824
+ libxfixes3 libxrandr2 libgbm1 libasound2
825
+ ```
826
+
827
+ **Problem:** "Executable doesn't exist" - Chromium browser not found
828
+
829
+ **Solution:**
830
+ ```bash
831
+ # Install Playwright browsers
832
+ playwright install chromium
833
+
834
+ # Or install all browsers
835
+ playwright install
836
+ ```
837
+
838
+ ### API Server Issues
839
+
840
+ **Problem:** "Pre-flight checks failed"
841
+
842
+ **Solution:**
843
+ 1. Check Ollama is running: `ollama serve`
844
+ 2. Verify models: `ollama list`
845
+ 3. Check vector store: `ls data/vectorstore/` (default `qdrant_path`)
846
+ 4. View logs for specific error
847
+
848
+ **Problem:** "Gradio UI not loading"
849
+
850
+ **Solution:**
851
+ ```bash
852
+ # Check if port is in use
853
+ lsof -i :8000
854
+
855
+ # Use different port
856
+ python scripts/run_server.py --port 8080
857
+
858
+ # Skip checks for testing
859
+ python scripts/run_server.py --skip-checks
860
+ ```
861
+
862
+ ### Performance Issues
863
+
864
+ **Problem:** "Slow query responses"
865
+
866
+ **Solution:**
867
+ 1. Use GPU for embeddings (if available)
868
+ 2. Reduce `top_k` and `rerank_top_k` in config
869
+ 3. Decrease `max_context_tokens`
870
+ 4. Use smaller LLM model (llama3.2:3b instead of mistral)
871
+
872
+ **Problem:** "High memory usage"
873
+
874
+ **Solution:**
875
+ ```bash
876
+ # Use smaller models
877
+ ollama pull llama3.2:3b # Only 2GB
878
+
879
+ # Reduce batch sizes in config
880
+ # Edit src/config/settings.py:
881
+ # chunk_size = 256 (instead of 512)
882
+ # top_k = 5 (instead of 10)
883
+ ```
884
+
885
+ ### Common Error Messages
886
+
887
+ | Error | Cause | Solution |
888
+ |-------|-------|----------|
889
+ | `ConnectionError: Ollama` | Ollama not running | `ollama serve` |
890
+ | `Collection 'eyewiki_rag' not found` | Index not built | `python scripts/build_index.py --index-vectors` |
891
+ | `Model 'mistral' not found` | Model not pulled | `ollama pull mistral` |
892
+ | `503 Service Unavailable` | System not initialized | Check logs, verify dependencies |
893
+ | `422 Validation Error` | Invalid request format | Check API docs |
894
+
895
+ ## 📊 Performance Benchmarks
896
+
897
+ Typical performance on a modern laptop (16GB RAM, M1/M2 or equivalent):
898
+
899
+ | Operation | Time | Notes |
900
+ |-----------|------|-------|
901
+ | Scraping (100 pages) | ~5-10 min | Network dependent |
902
+ | Processing | ~2-5 min | 100 documents |
903
+ | Embedding generation | ~5-10 min | 100 documents |
904
+ | Index building | ~3-5 min | 100 documents |
905
+ | Query (no streaming) | ~2-5s | Includes retrieval + LLM |
906
+ | Query (streaming) | ~0.5s first token | Then ~50 tokens/s |
907
+
908
+ ## 📚 Additional Resources
909
+
910
+ ### Documentation
911
+ - [EyeWiki](https://eyewiki.aao.org/) - Source of medical content
912
+ - [Ollama Documentation](https://github.com/ollama/ollama/blob/main/docs/README.md)
913
+ - [Qdrant Documentation](https://qdrant.tech/documentation/)
914
+ - [FastAPI Documentation](https://fastapi.tiangolo.com/)
915
+
916
+ ### Related Projects
917
+ - [LlamaIndex](https://www.llamaindex.ai/) - Data framework for LLM applications
918
+ - [LangChain](https://www.langchain.com/) - Framework for developing LLM applications
919
+ - [Haystack](https://haystack.deepset.ai/) - End-to-end NLP framework
920
+
921
+ ### Papers & Resources
922
+ - [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)
923
+ - [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906)
924
+
925
+ ## ⚠️ Medical Disclaimer
926
+
927
+ **IMPORTANT:** This system provides information from EyeWiki, a resource of the American Academy of Ophthalmology (AAO).
928
+
929
+ The information provided by this system:
930
+ - Is not a substitute for professional medical advice, diagnosis, or treatment
931
+ - May contain errors due to AI limitations
932
+ - Should be verified with authoritative sources before clinical use
933
+
934
+ Always consult with a qualified ophthalmologist or eye care professional for medical concerns. This system should not be used for:
935
+ - Clinical decision-making
936
+ - Patient diagnosis
937
+ - Treatment recommendations
938
+ - Emergency medical situations
939
+
940
+ ## 📄 License
941
+
942
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
943
+
944
+ ### Third-Party Licenses
945
+ - **EyeWiki Content**: © American Academy of Ophthalmology - Used under fair use for research purposes
946
+ - **Ollama**: Apache 2.0 License
947
+ - **Qdrant**: Apache 2.0 License
948
+ - **FastAPI**: MIT License
949
+ - **Gradio**: Apache 2.0 License
950
+
951
+ ## 🙏 Attribution
952
+
953
+ ### EyeWiki & AAO
954
+ This project uses content from [EyeWiki](https://eyewiki.aao.org/), the collaborative online encyclopedia of ophthalmology created and maintained by the [American Academy of Ophthalmology (AAO)](https://www.aao.org/).
955
+
956
+ **Citation:**
957
+ > American Academy of Ophthalmology. EyeWiki. Available at: https://eyewiki.aao.org/. Accessed [Date].
958
+
959
+ ### Models & Libraries
960
+ - **nomic-embed-text**: [Nomic AI](https://www.nomic.ai/)
961
+ - **mistral**: [Mistral AI](https://mistral.ai/)
962
+ - **sentence-transformers**: [UKPLab](https://www.ukp.tu-darmstadt.de/)
963
+ - **crawl4ai**: [Web scraping framework](https://github.com/unclecode/crawl4ai)
964
+
965
+ ## 🤝 Contributing
966
+
967
+ Contributions are welcome! Here's how you can help:
968
+
969
+ ### Areas for Contribution
970
+ - 🐛 Bug fixes
971
+ - ✨ New features
972
+ - 📝 Documentation improvements
973
+ - 🧪 Test coverage
974
+ - 🎨 UI/UX enhancements
975
+ - 🌍 Internationalization
976
+
977
+ ### Development Workflow
978
+ 1. Fork the repository
979
+ 2. Create a feature branch: `git checkout -b feature/amazing-feature`
980
+ 3. Make your changes
981
+ 4. Run tests: `pytest`
982
+ 5. Commit: `git commit -m 'Add amazing feature'`
983
+ 6. Push: `git push origin feature/amazing-feature`
984
+ 7. Open a Pull Request
985
+
986
+ ### Code Style
987
+ - Follow PEP 8
988
+ - Use Black for formatting
989
+ - Add type hints
990
+ - Write docstrings
991
+ - Include tests for new features
992
+
993
+ ## 📞 Support
994
+
995
+ - **Issues**: [GitHub Issues](https://github.com/your-repo/issues)
996
+ - **Discussions**: [GitHub Discussions](https://github.com/your-repo/discussions)
997
+ - **Email**: your-email@example.com
998
+
999
+ ## 🗺️ Roadmap
1000
+
1001
+ ### Planned Features
1002
+ - [ ] Multi-language support
1003
+ - [ ] PDF document upload
1004
+ - [ ] Advanced filtering (date, author, etc.)
1005
+ - [ ] Conversation history
1006
+ - [ ] Feedback mechanism
1007
+ - [ ] Export answers to PDF
1008
+ - [ ] Mobile-responsive UI
1009
+ - [x] Docker deployment
1010
+ - [ ] Cloud deployment guide (AWS, GCP, Azure)
1011
+ - [ ] Integration with medical record systems
1012
+
1013
+ ### Future Improvements
1014
+ - [ ] Support for images in articles
1015
+ - [ ] Better handling of tables and diagrams
1016
+ - [ ] Citation formatting options (APA, MLA, etc.)
1017
+ - [ ] Multi-modal retrieval (text + images)
1018
+ - [ ] Custom model fine-tuning
1019
+
1020
+ ## ⭐ Star History
1021
+
1022
+ If you find this project helpful, please consider giving it a star!
1023
+
1024
  ---
1025
 
1026
+ **Built with ❤️ for the ophthalmology community**
config/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # Configuration package
config/settings.py ADDED
@@ -0,0 +1,229 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Configuration settings for EyeWiki RAG system."""
2
+
3
+ from enum import Enum
4
+ from pathlib import Path
5
+ from typing import Optional
6
+
7
+ from pydantic import Field
8
+ from pydantic_settings import BaseSettings, SettingsConfigDict
9
+
10
+
11
+ class LLMProvider(str, Enum):
12
+ """Supported LLM providers."""
13
+
14
+ OLLAMA = "ollama"
15
+ OPENAI = "openai"
16
+
17
+
18
+ class Settings(BaseSettings):
19
+ """Application settings loaded from environment variables."""
20
+
21
+ model_config = SettingsConfigDict(
22
+ env_file=".env",
23
+ env_file_encoding="utf-8",
24
+ case_sensitive=False,
25
+ extra="ignore",
26
+ )
27
+
28
+ # LLM Provider Configuration
29
+ llm_provider: LLMProvider = Field(
30
+ default=LLMProvider.OLLAMA,
31
+ description="LLM provider to use: 'ollama' for local Ollama, 'openai' for OpenAI-compatible APIs (Groq, DeepSeek, OpenAI)",
32
+ )
33
+
34
+ # Ollama Configuration
35
+ ollama_base_url: str = Field(
36
+ default="http://localhost:11434",
37
+ description="Base URL for Ollama API",
38
+ )
39
+ ollama_timeout: int = Field(
40
+ default=30,
41
+ gt=0,
42
+ description="Request timeout for Ollama API in seconds",
43
+ )
44
+ embedding_model: str = Field(
45
+ default="nomic-embed-text",
46
+ description="Ollama embedding model name",
47
+ )
48
+ llm_model: str = Field(
49
+ default="mistral",
50
+ description="Ollama LLM model name",
51
+ )
52
+ llm_temperature: float = Field(
53
+ default=0.7,
54
+ ge=0.0,
55
+ le=2.0,
56
+ description="LLM temperature for response generation",
57
+ )
58
+ llm_max_tokens: int = Field(
59
+ default=2048,
60
+ gt=0,
61
+ description="Maximum tokens for LLM response",
62
+ )
63
+
64
+ # OpenAI-compatible API Configuration (for Groq, DeepSeek, OpenAI, etc.)
65
+ openai_api_key: Optional[str] = Field(
66
+ default=None,
67
+ description="API key for OpenAI-compatible provider",
68
+ )
69
+ openai_base_url: Optional[str] = Field(
70
+ default=None,
71
+ description="Base URL for OpenAI-compatible API (e.g., https://api.groq.com/openai/v1 for Groq)",
72
+ )
73
+ openai_model: str = Field(
74
+ default="llama-3.3-70b-versatile",
75
+ description="Model name for OpenAI-compatible provider (e.g., llama-3.3-70b-versatile for Groq)",
76
+ )
77
+
78
+ # Qdrant Configuration
79
+ qdrant_path: str = Field(
80
+ default="./data/vectorstore",
81
+ description="Path to Qdrant vector database",
82
+ )
83
+ qdrant_collection_name: str = Field(
84
+ default="eyewiki_rag",
85
+ description="Qdrant collection name",
86
+ )
87
+ qdrant_url: Optional[str] = Field(
88
+ default=None,
89
+ description="Qdrant server URL (for remote Qdrant)",
90
+ )
91
+ qdrant_api_key: Optional[str] = Field(
92
+ default=None,
93
+ description="Qdrant API key (for Qdrant Cloud)",
94
+ )
95
+
96
+ # Document Processing Configuration
97
+ chunk_size: int = Field(
98
+ default=512,
99
+ gt=0,
100
+ description="Size of text chunks for processing",
101
+ )
102
+ chunk_overlap: int = Field(
103
+ default=50,
104
+ ge=0,
105
+ description="Overlap between consecutive chunks",
106
+ )
107
+ min_chunk_size: int = Field(
108
+ default=100,
109
+ gt=0,
110
+ description="Minimum chunk size in tokens (skip smaller chunks)",
111
+ )
112
+
113
+ # RAG Configuration
114
+ top_k: int = Field(
115
+ default=10,
116
+ gt=0,
117
+ description="Number of documents to retrieve",
118
+ )
119
+ rerank_top_k: int = Field(
120
+ default=5,
121
+ gt=0,
122
+ description="Number of documents after reranking",
123
+ )
124
+ similarity_threshold: float = Field(
125
+ default=0.7,
126
+ ge=0.0,
127
+ le=1.0,
128
+ description="Minimum similarity score for retrieval",
129
+ )
130
+ reranker_model: str = Field(
131
+ default="cross-encoder/ms-marco-MiniLM-L-6-v2",
132
+ description="Cross-encoder model for reranking",
133
+ )
134
+ max_context_tokens: int = Field(
135
+ default=4096,
136
+ gt=0,
137
+ description="Maximum tokens for context in LLM prompt",
138
+ )
139
+
140
+ # Scraper Configuration
141
+ scraper_delay: float = Field(
142
+ default=1.0,
143
+ ge=0.0,
144
+ description="Delay between scraping requests in seconds",
145
+ )
146
+ scraper_max_pages: Optional[int] = Field(
147
+ default=None,
148
+ description="Maximum number of pages to scrape (None for unlimited)",
149
+ )
150
+ scraper_timeout: int = Field(
151
+ default=30,
152
+ gt=0,
153
+ description="Request timeout in seconds",
154
+ )
155
+
156
+ # API Configuration
157
+ api_host: str = Field(
158
+ default="0.0.0.0",
159
+ description="API server host",
160
+ )
161
+ api_port: int = Field(
162
+ default=8000,
163
+ gt=0,
164
+ le=65535,
165
+ description="API server port",
166
+ )
167
+ api_workers: int = Field(
168
+ default=4,
169
+ gt=0,
170
+ description="Number of API workers",
171
+ )
172
+
173
+ # Gradio UI Configuration
174
+ gradio_host: str = Field(
175
+ default="0.0.0.0",
176
+ description="Gradio UI host",
177
+ )
178
+ gradio_port: int = Field(
179
+ default=7860,
180
+ gt=0,
181
+ le=65535,
182
+ description="Gradio UI port",
183
+ )
184
+ gradio_share: bool = Field(
185
+ default=False,
186
+ description="Create public Gradio share link",
187
+ )
188
+
189
+ # Data Paths
190
+ data_raw_path: str = Field(
191
+ default="./data/raw",
192
+ description="Path to raw scraped data",
193
+ )
194
+ data_processed_path: str = Field(
195
+ default="./data/processed",
196
+ description="Path to processed documents",
197
+ )
198
+
199
+ # Logging
200
+ log_level: str = Field(
201
+ default="INFO",
202
+ description="Logging level",
203
+ )
204
+ log_file: Optional[str] = Field(
205
+ default="logs/eyewiki_rag.log",
206
+ description="Log file path",
207
+ )
208
+
209
+ def get_data_paths(self) -> dict[str, Path]:
210
+ """Get all data paths as Path objects."""
211
+ return {
212
+ "raw": Path(self.data_raw_path),
213
+ "processed": Path(self.data_processed_path),
214
+ "vectorstore": Path(self.qdrant_path),
215
+ }
216
+
217
+ def ensure_data_directories(self) -> None:
218
+ """Create data directories if they don't exist."""
219
+ for path in self.get_data_paths().values():
220
+ path.mkdir(parents=True, exist_ok=True)
221
+
222
+ # Create logs directory if log_file is specified
223
+ if self.log_file:
224
+ log_path = Path(self.log_file)
225
+ log_path.parent.mkdir(parents=True, exist_ok=True)
226
+
227
+
228
+ # Create global settings instance
229
+ settings = Settings()
data/processed/.gitkeep ADDED
File without changes
data/raw/.gitkeep ADDED
File without changes
data/vectorstore/.gitkeep ADDED
File without changes
deployment_readme.md ADDED
@@ -0,0 +1,151 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Deployment Guide - EyeWiki RAG on Free Hosting
2
+
3
+ This guide covers deploying the EyeWiki RAG system using free/cheap cloud services:
4
+
5
+ - **App Hosting**: Hugging Face Spaces (Docker SDK)
6
+ - **Vector Database**: Qdrant Cloud (Free Tier)
7
+ - **LLM Provider**: Groq (Free Tier) or any OpenAI-compatible API
8
+
9
+ ## Prerequisites
10
+
11
+ - A [Hugging Face](https://huggingface.co) account
12
+ - A [Qdrant Cloud](https://cloud.qdrant.io) account
13
+ - A [Groq](https://console.groq.com) account (or other OpenAI-compatible provider)
14
+
15
+ ---
16
+
17
+ ## Step 1: Set Up Qdrant Cloud
18
+
19
+ 1. Go to [Qdrant Cloud](https://cloud.qdrant.io) and create a free cluster.
20
+ 2. Once created, note down:
21
+ - **Cluster URL** (e.g., `https://abc123-xyz.aws.cloud.qdrant.io:6333`)
22
+ - **API Key** (from the cluster dashboard)
23
+ 3. You will need to index your data into the Qdrant Cloud cluster. You can do this locally:
24
+
25
+ ```bash
26
+ export QDRANT_URL="https://your-cluster-url:6333"
27
+ export QDRANT_API_KEY="your-qdrant-api-key"
28
+ python scripts/build_index.py --index-vectors
29
+ ```
30
+
31
+ ## Step 2: Get a Groq API Key
32
+
33
+ 1. Go to [Groq Console](https://console.groq.com) and sign up.
34
+ 2. Create an API key from the dashboard.
35
+ 3. Note down the API key.
36
+
37
+ ## Step 3: Deploy to Hugging Face Spaces
38
+
39
+ ### Option A: Via the HF Web UI
40
+
41
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces) and click **Create new Space**.
42
+ 2. Choose **Docker** as the SDK.
43
+ 3. Upload the project files (or connect a Git repo).
44
+ 4. In the Space **Settings > Variables and secrets**, add:
45
+
46
+ | Variable | Value |
47
+ |-------------------|-----------------------------------------------|
48
+ | `LLM_PROVIDER` | `openai` |
49
+ | `OPENAI_API_KEY` | `gsk_your_groq_api_key` |
50
+ | `OPENAI_BASE_URL` | `https://api.groq.com/openai/v1` |
51
+ | `OPENAI_MODEL` | `llama-3.3-70b-versatile` |
52
+ | `QDRANT_URL` | `https://your-cluster.cloud.qdrant.io:6333` |
53
+ | `QDRANT_API_KEY` | `your_qdrant_api_key` |
54
+
55
+ 5. The Space will build using `Dockerfile.deploy` and start automatically.
56
+
57
+ ### Option B: Via the HF CLI
58
+
59
+ ```bash
60
+ # Install HF CLI
61
+ pip install huggingface_hub
62
+
63
+ # Login
64
+ huggingface-cli login
65
+
66
+ # Create Space
67
+ huggingface-cli repo create eyewiki-rag --type space --space-sdk docker
68
+
69
+ # Clone and push
70
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/eyewiki-rag
71
+ cd eyewiki-rag
72
+ # Copy project files here, then:
73
+ cp /path/to/project/Dockerfile.deploy ./Dockerfile
74
+ git add . && git commit -m "Initial deployment" && git push
75
+ ```
76
+
77
+ Then add the environment variables via the web UI (Settings > Variables and secrets).
78
+
79
+ ## Step 4: Verify Deployment
80
+
81
+ Once the Space is running:
82
+
83
+ 1. Visit your Space URL (e.g., `https://your-username-eyewiki-rag.hf.space`)
84
+ 2. Check the health endpoint: `https://your-username-eyewiki-rag.hf.space/health`
85
+ 3. Try the Gradio UI: `https://your-username-eyewiki-rag.hf.space/ui`
86
+
87
+ ---
88
+
89
+ ## Environment Variables Reference
90
+
91
+ | Variable | Required | Default | Description |
92
+ |---------------------------|----------|----------------------------|----------------------------------------------------|
93
+ | `LLM_PROVIDER` | No | `ollama` | LLM provider: `ollama` or `openai` |
94
+ | `OPENAI_API_KEY` | If openai| - | API key for OpenAI-compatible provider |
95
+ | `OPENAI_BASE_URL` | No | `https://api.openai.com/v1`| Base URL for OpenAI-compatible API |
96
+ | `OPENAI_MODEL` | No | `llama-3.3-70b-versatile` | Model name for the provider |
97
+ | `OLLAMA_BASE_URL` | No | `http://localhost:11434` | Ollama API URL (only for ollama provider) |
98
+ | `LLM_MODEL` | No | `mistral` | Ollama model name (only for ollama provider) |
99
+ | `QDRANT_URL` | No | - | Qdrant Cloud cluster URL |
100
+ | `QDRANT_API_KEY` | No | - | Qdrant Cloud API key |
101
+ | `QDRANT_PATH` | No | `./data/vectorstore` | Local Qdrant path (if not using cloud) |
102
+ | `QDRANT_COLLECTION_NAME` | No | `eyewiki_rag` | Qdrant collection name |
103
+ | `EMBEDDING_MODEL` | No | `nomic-embed-text` | Sentence-transformer embedding model |
104
+ | `API_PORT` | No | `8000` | API server port |
105
+
106
+ ---
107
+
108
+ ## Provider Examples
109
+
110
+ ### Groq (Free Tier)
111
+
112
+ ```env
113
+ LLM_PROVIDER=openai
114
+ OPENAI_API_KEY=gsk_your_key_here
115
+ OPENAI_BASE_URL=https://api.groq.com/openai/v1
116
+ OPENAI_MODEL=llama-3.3-70b-versatile
117
+ ```
118
+
119
+ ### OpenAI
120
+
121
+ ```env
122
+ LLM_PROVIDER=openai
123
+ OPENAI_API_KEY=sk-your_key_here
124
+ OPENAI_MODEL=gpt-4o-mini
125
+ ```
126
+
127
+ ### DeepSeek
128
+
129
+ ```env
130
+ LLM_PROVIDER=openai
131
+ OPENAI_API_KEY=your_key_here
132
+ OPENAI_BASE_URL=https://api.deepseek.com/v1
133
+ OPENAI_MODEL=deepseek-chat
134
+ ```
135
+
136
+ ### Local Ollama (Default)
137
+
138
+ ```env
139
+ LLM_PROVIDER=ollama
140
+ OLLAMA_BASE_URL=http://localhost:11434
141
+ LLM_MODEL=mistral
142
+ ```
143
+
144
+ ---
145
+
146
+ ## Troubleshooting
147
+
148
+ - **Space fails to build**: Check that `Dockerfile.deploy` is renamed to `Dockerfile` in the Space repo.
149
+ - **Model download slow on startup**: The embedding model (`all-mpnet-base-v2`) downloads on first run. Subsequent restarts use the cached version.
150
+ - **Qdrant connection errors**: Verify your `QDRANT_URL` includes the port (`:6333`) and the API key is correct.
151
+ - **LLM errors**: Check that your API key is valid and the model name is supported by your provider.
docker-compose.yml ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # EyeWiki RAG System - Docker Compose Configuration
2
+ version: '3.8'
3
+
4
+ services:
5
+ # Qdrant vector database
6
+ qdrant:
7
+ image: qdrant/qdrant:latest
8
+ container_name: eyewiki-qdrant
9
+ ports:
10
+ - "6333:6333" # REST API
11
+ - "6334:6334" # gRPC (optional)
12
+ volumes:
13
+ - qdrant_data:/qdrant/storage
14
+ environment:
15
+ - QDRANT__SERVICE__GRPC_PORT=6334
16
+ networks:
17
+ - eyewiki-network
18
+ restart: unless-stopped
19
+ healthcheck:
20
+ test: ["CMD", "curl", "-f", "http://localhost:6333/"]
21
+ interval: 30s
22
+ timeout: 10s
23
+ retries: 3
24
+ start_period: 40s
25
+
26
+ # EyeWiki RAG API
27
+ eyewiki-rag:
28
+ build:
29
+ context: .
30
+ dockerfile: Dockerfile
31
+ container_name: eyewiki-rag-api
32
+ ports:
33
+ - "8000:8000"
34
+ volumes:
35
+ # Mount data directories for persistence
36
+ - ./data/raw:/app/data/raw
37
+ - ./data/processed:/app/data/processed
38
+ - qdrant_data:/app/data/qdrant
39
+ # Mount prompts for easy customization
40
+ - ./prompts:/app/prompts
41
+ environment:
42
+ # Ollama on host (access via host.docker.internal)
43
+ - OLLAMA_BASE_URL=http://host.docker.internal:11434
44
+ - LLM_MODEL=mistral
45
+ - EMBEDDING_MODEL=nomic-embed-text
46
+
47
+ # Qdrant service
48
+ - QDRANT_HOST=qdrant
49
+ - QDRANT_PORT=6333
50
+ - QDRANT_COLLECTION_NAME=eyewiki_rag
51
+ - QDRANT_PATH=/app/data/qdrant
52
+
53
+ # Processing settings
54
+ - CHUNK_SIZE=512
55
+ - CHUNK_OVERLAP=50
56
+ - MAX_CONTEXT_TOKENS=4000
57
+
58
+ # Retrieval settings
59
+ - RETRIEVAL_K=20
60
+ - RERANK_K=5
61
+ networks:
62
+ - eyewiki-network
63
+ depends_on:
64
+ qdrant:
65
+ condition: service_healthy
66
+ restart: unless-stopped
67
+ healthcheck:
68
+ test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
69
+ interval: 30s
70
+ timeout: 10s
71
+ retries: 3
72
+ start_period: 60s
73
+
74
+ networks:
75
+ eyewiki-network:
76
+ driver: bridge
77
+
78
+ volumes:
79
+ qdrant_data:
80
+ driver: local
plan/implementation_plan.md ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Implementation Plan - EyeWiki RAG Deployment
2
+
3
+ This plan outlines the steps to prepare the EyeWiki RAG application for deployment on free/cheap hosting providers (specifically Hugging Face Spaces + Groq + Qdrant Cloud), by decoupling the local Ollama dependency.
4
+
5
+ ## User Review Required
6
+
7
+ > [!IMPORTANT]
8
+ > **LLM Provider Switch**: The deployment will support switching from local Ollama to "OpenAI-compatible" APIs (like Groq, DeepSeek, or OpenAI itself). This requires an API key for the chosen provider.
9
+
10
+ > [!NOTE]
11
+ > **Hosting Choice**: The recommended "free" stack is **Hugging Face Spaces (Docker)** for the app, **Qdrant Cloud (Free Tier)** for the vector DB, and **Groq (Free Tier)** for the LLM.
12
+
13
+ ## Proposed Changes
14
+
15
+ ### Configuration
16
+ #### [MODIFY] [settings.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/config/settings.py)
17
+ - Add `llm_provider` field (enum: "ollama", "openai").
18
+ - Add `openai_api_key`, `openai_base_url`, `openai_model` fields.
19
+
20
+ ### LLM Abstraction
21
+ #### [NEW] [llm_client.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/src/llm/llm_client.py)
22
+ - Define `LLMClient` abstract base class/protocol with `generate` and `stream_generate` methods.
23
+
24
+ #### [MODIFY] [ollama_client.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/src/llm/ollama_client.py)
25
+ - Implement `LLMClient` interface.
26
+
27
+ #### [NEW] [openai_client.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/src/llm/openai_client.py)
28
+ - Implement `LLMClient` using `openai` python package.
29
+ - Support standard OpenAI API and compatible endpoints (Groq).
30
+
31
+ ### Application Logic
32
+ #### [MODIFY] [query_engine.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/src/rag/query_engine.py)
33
+ - Update type hints to use abstract `LLMClient`.
34
+
35
+ #### [MODIFY] [main.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/src/api/main.py)
36
+ - Instantiate appropriate client based on `settings.llm_provider`.
37
+ - Update lifecycle events.
38
+
39
+ #### [MODIFY] [run_server.py](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/scripts/run_server.py)
40
+ - Modify pre-flight checks to only check Ollama if `llm_provider_is_ollama`.
41
+ - Add checks for API keys if provider is OpenAI/Groq.
42
+
43
+ ### Deployment Configuration
44
+ #### [NEW] [Dockerfile.deploy](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/Dockerfile.deploy)
45
+ - Optimized Dockerfile for Hugging Face Spaces (non-root user, specific cache directories).
46
+
47
+ #### [NEW] [deployment_readme.md](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/deployment_readme.md)
48
+ - Step-by-step guide for deploying to HF Spaces and setting up Qdrant Cloud.
49
+
50
+ #### [MODIFY] [requirements.txt](file:///home/obum/Projects/Care1/research-and-ml/eyewiki-rag/requirements.txt)
51
+ - Add `openai>=1.0.0`.
52
+
53
+ ## Verification Plan
54
+
55
+ ### Automated Tests
56
+ - Run existing tests to ensure no regression: `pytest tests/`
57
+ - *Note:* New client tests would require mocking OpenAI API, which might be out of scope for a "test deployment", but we will verify the code compiles and runs.
58
+
59
+ ### Manual Verification
60
+ 1. **Local Test (Ollama)**: Run server with `LLM_PROVIDER=ollama` and verify standard functionality.
61
+ 2. **Local Test (Mock/Groq)**: Run server with `LLM_PROVIDER=openai` and a valid API key (or mock) to verify the switch works.
62
+ 3. **Deployment Build**: Build the `Dockerfile.deploy` locally to ensure it builds correctly.
prompts/medical_disclaimer.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ **Medical Disclaimer:** This information is sourced from EyeWiki, a resource of the American Academy of Ophthalmology (AAO). It is not a substitute for professional medical advice, diagnosis, or treatment. AI systems can make errors. Always consult with a qualified ophthalmologist or eye care professional for medical concerns and verify any critical information with authoritative sources.
prompts/query_prompt.txt ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ You are answering a question using information from the EyeWiki medical knowledge base.
2
+
3
+ CONTEXT FROM EYEWIKI:
4
+ {context}
5
+
6
+ ---
7
+
8
+ QUESTION: {question}
9
+
10
+ INSTRUCTIONS:
11
+ 1. Answer the question using ONLY the information provided in the context above
12
+ 2. Cite sources for all claims using the format: [Source: Article Title]
13
+ 3. If the context does not contain enough information to fully answer the question, clearly state: "The provided sources do not contain sufficient information about [specific aspect]"
14
+ 4. Organize your answer with:
15
+ - A direct answer to the question (1-2 sentences)
16
+ - Supporting details from the sources with citations
17
+ - Any relevant additional context from the sources
18
+ 5. Use clear medical terminology with explanations for technical terms
19
+ 6. Do NOT make up or infer information beyond what is explicitly stated in the context
20
+
21
+ ANSWER:
prompts/system_prompt.txt ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ You are an expert ophthalmology knowledge assistant powered by the EyeWiki medical database. Your role is to provide accurate, evidence-based information about eye diseases, conditions, treatments, and procedures.
2
+
3
+ CRITICAL GUIDELINES:
4
+
5
+ 1. CONTEXT-ONLY RESPONSES
6
+ - Base ALL answers strictly on the provided context from EyeWiki articles
7
+ - NEVER make up, infer, or add information that is not explicitly in the context
8
+ - If the context does not contain enough information to answer a question, clearly state this
9
+ - Do not use general knowledge or information from other sources
10
+
11
+ 2. SOURCE CITATION
12
+ - Always cite the specific EyeWiki article when referencing information
13
+ - Use the format: [Source: Article Title] or "According to [Article Title]..."
14
+ - When multiple sources support a point, cite all relevant sources
15
+ - Include section names when specific information comes from a particular section
16
+
17
+ 3. RESPONSE STRUCTURE
18
+ Your answers should follow this format:
19
+
20
+ a) Direct Answer
21
+ - Begin with a clear, concise answer to the specific question
22
+ - Use 1-2 sentences to address the core query
23
+
24
+ b) Supporting Details
25
+ - Provide relevant details, definitions, and explanations from the sources
26
+ - Use proper medical terminology, but include clear explanations for complex terms
27
+ - Organize information logically (e.g., causes, symptoms, diagnosis, treatment)
28
+
29
+ c) Additional Context (when appropriate)
30
+ - Include related information that provides valuable context
31
+ - Mention important considerations, risk factors, or variations
32
+ - Connect concepts to help understanding
33
+
34
+ d) Limitations
35
+ - If the context is incomplete, specify what information is missing
36
+ - Acknowledge when a question requires clinical judgment or patient-specific evaluation
37
+
38
+ 4. MEDICAL TERMINOLOGY
39
+ - Use accurate medical terminology as it appears in the sources
40
+ - Immediately follow technical terms with clear explanations in parentheses
41
+ - Example: "trabecular meshwork (the eye's drainage system)"
42
+ - Balance professional precision with accessibility
43
+
44
+ 5. UNCERTAINTY AND LIMITATIONS
45
+ When you cannot fully answer a question:
46
+ - Explicitly state: "The provided sources do not contain sufficient information about..."
47
+ - Offer what partial information IS available
48
+ - Suggest what type of information would be needed for a complete answer
49
+ - NEVER guess or extrapolate beyond what the sources explicitly state
50
+
51
+ 6. CLINICAL CONSULTATION REMINDER
52
+ - For questions about specific symptoms, diagnosis, or treatment decisions, remind users to consult a qualified eye care professional
53
+ - Emphasize that individual cases vary and require professional medical evaluation
54
+ - Do not provide specific medical advice for individual situations
55
+
56
+ 7. RESPONSE QUALITY
57
+ - Be thorough but concise - avoid unnecessary verbosity
58
+ - Use clear section headers for longer responses
59
+ - Present information in a logical, easy-to-follow structure
60
+ - Use bullet points or numbered lists when appropriate for clarity
61
+ - Maintain a professional yet approachable tone
62
+
63
+ 8. ACCURACY PRIORITIES
64
+ - Accuracy is more important than completeness
65
+ - It is better to say "I don't have enough information" than to speculate
66
+ - When sources conflict or present multiple perspectives, present all views and cite each
67
+ - Distinguish between established facts and areas of ongoing research or debate
68
+
69
+ EXAMPLE RESPONSE PATTERNS:
70
+
71
+ Good Response:
72
+ "Primary open-angle glaucoma (POAG) is characterized by progressive optic nerve damage and visual field loss [Source: Primary Open-Angle Glaucoma]. The primary risk factor is elevated intraocular pressure (IOP), which occurs when the eye's drainage system (trabecular meshwork) becomes less efficient at draining aqueous humor [Source: Glaucoma Pathophysiology]..."
73
+
74
+ Poor Response:
75
+ "Glaucoma is usually treated with eye drops, and most patients do well with treatment."
76
+ (No citations, no source verification, making general claims)
77
+
78
+ When Uncertain:
79
+ "The provided sources discuss glaucoma treatment options including medications and surgery [Source: Glaucoma Management], but do not contain specific information about the long-term success rates you're asking about. For detailed statistics on treatment outcomes, you would need additional clinical research data."
80
+
81
+ REMEMBER:
82
+ - You are a knowledge assistant, not a medical professional
83
+ - Your purpose is to provide information, not to diagnose or prescribe
84
+ - Every piece of information should be traceable to the provided sources
85
+ - Professional consultation is irreplaceable for medical care
86
+
87
+ Maintain these standards in every response to ensure users receive accurate, well-sourced, and appropriately contextualized medical information.
pytest.ini ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [pytest]
2
+ # Pytest configuration file
3
+
4
+ # Test discovery patterns
5
+ python_files = test_*.py
6
+ python_classes = Test*
7
+ python_functions = test_*
8
+
9
+ # Test paths
10
+ testpaths = tests
11
+
12
+ # Output options
13
+ addopts =
14
+ -v
15
+ --strict-markers
16
+ --tb=short
17
+ --disable-warnings
18
+
19
+ # Markers
20
+ markers =
21
+ unit: Unit tests (fast, isolated)
22
+ integration: Integration tests (may be slow)
23
+ api: API tests (requires server components)
24
+
25
+ # Minimum pytest version (note: pytest's `minversion` is the required pytest version, not Python)
26
+ minversion = 3.8
requirements.txt ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Web Scraping
2
+ crawl4ai>=0.3.0
3
+ beautifulsoup4>=4.12.0
4
+ markdownify>=0.11.0
5
+
6
+ # RAG Framework
7
+ llama-index>=0.10.0
8
+ llama-index-vector-stores-qdrant>=0.2.0
9
+ llama-index-embeddings-ollama>=0.1.0
10
+ llama-index-llms-ollama>=0.1.0
11
+
12
+ # Vector Storage
13
+ qdrant-client>=1.7.0
14
+
15
+ # Embeddings & Reranking
16
+ sentence-transformers>=2.2.0 # For stable embeddings and cross-encoder reranking
17
+ torch>=2.0.0 # Required by sentence-transformers
18
+
19
+ # API Server
20
+ fastapi>=0.104.0
21
+ uvicorn[standard]>=0.24.0
22
+
23
+ # UI
24
+ gradio>=4.0.0
25
+
26
+ # Configuration
27
+ python-dotenv>=1.0.0
28
+ pydantic>=2.0.0
29
+ pydantic-settings>=2.0.0
30
+
31
+ # CLI Output & Progress
32
+ rich>=13.0.0
33
+ tqdm>=4.66.0
34
+
35
+ # OpenAI-compatible API
36
+ openai>=1.0.0
37
+
38
+ # Utilities
39
+ requests>=2.31.0
40
+ aiohttp>=3.9.0
41
+
42
+ # Development
43
+ pytest>=7.4.0
44
+ pytest-asyncio>=0.21.0
45
+ black>=23.11.0
46
+ isort>=5.12.0
47
+ flake8>=6.1.0
scripts/build_index.py ADDED
@@ -0,0 +1,683 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """Build index by processing raw markdown files into semantic chunks with metadata."""
3
+
4
+ import argparse
5
+ import json
6
+ import sys
7
+ import traceback
8
+ from pathlib import Path
9
+ from typing import Dict, List
10
+
11
+ from tqdm import tqdm
12
+
13
+ # Add parent directory to path
14
+ sys.path.insert(0, str(Path(__file__).parent.parent))
15
+
16
+ from src.processing.chunker import SemanticChunker, ChunkNode
17
+ from src.processing.metadata_extractor import MetadataExtractor
18
+ from src.vectorstore.qdrant_store import QdrantStoreManager
19
+ from src.llm.sentence_transformer_client import SentenceTransformerClient
20
+ from config.settings import settings
21
+ from rich.console import Console
22
+ from rich.panel import Panel
23
+ from rich.table import Table
24
+
25
+
26
+ def parse_args():
27
+ """Parse command line arguments."""
28
+ parser = argparse.ArgumentParser(
29
+ description="Process raw EyeWiki markdown into semantic chunks with medical metadata",
30
+ formatter_class=argparse.RawDescriptionHelpFormatter,
31
+ epilog="""
32
+ Examples:
33
+ # Just process files (no vector indexing)
34
+ python scripts/build_index.py
35
+
36
+ # Process AND build vector index
37
+ python scripts/build_index.py --index-vectors
38
+
39
+ # Only build vector index from existing processed files
40
+ python scripts/build_index.py --index-only
41
+
42
+ # Process with custom directories
43
+ python scripts/build_index.py --input-dir ./my_raw --output-dir ./my_processed
44
+
45
+ # Force rebuild with fresh Qdrant collection
46
+ python scripts/build_index.py --rebuild --index-vectors --recreate-collection
47
+
48
+ # Process only files matching pattern
49
+ python scripts/build_index.py --pattern "Glaucoma*.md" --index-vectors
50
+
51
+ # Custom chunking and embedding parameters
52
+ python scripts/build_index.py --chunk-size 1024 --embedding-batch-size 64 --index-vectors
53
+ """,
54
+ )
55
+
56
+ parser.add_argument(
57
+ "--input-dir",
58
+ type=str,
59
+ default=None,
60
+ help=f"Input directory with raw markdown files (default: {settings.data_raw_path})",
61
+ )
62
+
63
+ parser.add_argument(
64
+ "--output-dir",
65
+ type=str,
66
+ default=None,
67
+ help=f"Output directory for processed chunks (default: {settings.data_processed_path})",
68
+ )
69
+
70
+ parser.add_argument(
71
+ "--rebuild",
72
+ action="store_true",
73
+ help="Force rebuild even if output files exist",
74
+ )
75
+
76
+ parser.add_argument(
77
+ "--pattern",
78
+ type=str,
79
+ default="*.md",
80
+ help="Glob pattern for files to process (default: *.md)",
81
+ )
82
+
83
+ parser.add_argument(
84
+ "--chunk-size",
85
+ type=int,
86
+ default=None,
87
+ help=f"Chunk size in tokens (default: {settings.chunk_size})",
88
+ )
89
+
90
+ parser.add_argument(
91
+ "--chunk-overlap",
92
+ type=int,
93
+ default=None,
94
+ help=f"Chunk overlap in tokens (default: {settings.chunk_overlap})",
95
+ )
96
+
97
+ parser.add_argument(
98
+ "--min-chunk-size",
99
+ type=int,
100
+ default=None,
101
+ help=f"Minimum chunk size in tokens (default: {settings.min_chunk_size})",
102
+ )
103
+
104
+ parser.add_argument(
105
+ "--verbose",
106
+ "-v",
107
+ action="store_true",
108
+ help="Enable verbose output with detailed error messages",
109
+ )
110
+
111
+ parser.add_argument(
112
+ "--index-vectors",
113
+ action="store_true",
114
+ help="Build vector index in Qdrant after processing",
115
+ )
116
+
117
+ parser.add_argument(
118
+ "--index-only",
119
+ action="store_true",
120
+ help="Skip processing, only build vector index from existing processed files",
121
+ )
122
+
123
+ parser.add_argument(
124
+ "--recreate-collection",
125
+ action="store_true",
126
+ help="Recreate Qdrant collection (deletes existing data)",
127
+ )
128
+
129
+ parser.add_argument(
130
+ "--embedding-batch-size",
131
+ type=int,
132
+ default=32,
133
+ help="Batch size for embedding generation (default: 32)",
134
+ )
135
+
136
+ parser.add_argument(
137
+ "--embedding-model",
138
+ type=str,
139
+ default="sentence-transformers/all-mpnet-base-v2",
140
+ help="Sentence transformer model name (default: all-mpnet-base-v2)",
141
+ )
142
+
143
+ return parser.parse_args()
144
+
145
+
146
+ def print_banner(console: Console):
147
+ """Print welcome banner."""
148
+ banner = """
149
+ [bold cyan]EyeWiki Index Builder[/bold cyan]
150
+ [dim]Processing pipeline: Markdown → Metadata Extraction → Semantic Chunking → JSON[/dim]
151
+ """
152
+ console.print(Panel(banner, border_style="cyan"))
153
+
154
+
155
+ def load_markdown_file(md_file: Path) -> tuple[str, Dict]:
156
+ """
157
+ Load markdown content and corresponding JSON metadata.
158
+
159
+ Args:
160
+ md_file: Path to markdown file
161
+
162
+ Returns:
163
+ Tuple of (content, metadata)
164
+
165
+ Raises:
166
+ FileNotFoundError: If JSON metadata file not found
167
+ ValueError: If content is empty or metadata is invalid
168
+ """
169
+ # Read markdown content
170
+ with open(md_file, "r", encoding="utf-8") as f:
171
+ content = f.read()
172
+
173
+ if not content.strip():
174
+ raise ValueError("Empty markdown content")
175
+
176
+ # Look for corresponding JSON metadata
177
+ json_file = md_file.with_suffix(".json")
178
+ if not json_file.exists():
179
+ raise FileNotFoundError(f"Metadata file not found: {json_file}")
180
+
181
+ # Read metadata
182
+ with open(json_file, "r", encoding="utf-8") as f:
183
+ metadata = json.load(f)
184
+
185
+ if not isinstance(metadata, dict):
186
+ raise ValueError("Invalid metadata format (must be dict)")
187
+
188
+ return content, metadata
189
+
190
+
191
+ def process_file(
192
+ md_file: Path,
193
+ output_dir: Path,
194
+ chunker: SemanticChunker,
195
+ extractor: MetadataExtractor,
196
+ rebuild: bool = False,
197
+ verbose: bool = False,
198
+ ) -> Dict:
199
+ """
200
+ Process a single markdown file through the pipeline.
201
+
202
+ Pipeline:
203
+ 1. Load markdown and metadata
204
+ 2. Extract medical metadata
205
+ 3. Chunk document
206
+ 4. Save chunks to JSON
207
+
208
+ Args:
209
+ md_file: Path to markdown file
210
+ output_dir: Output directory for chunks
211
+ chunker: SemanticChunker instance
212
+ extractor: MetadataExtractor instance
213
+ rebuild: Force rebuild even if output exists
214
+ verbose: Enable verbose error output
215
+
216
+ Returns:
217
+ Dictionary with processing results and statistics
218
+ """
219
+ result = {
220
+ "file": md_file.name,
221
+ "status": "pending",
222
+ "chunks_created": 0,
223
+ "total_tokens": 0,
224
+ "error": None,
225
+ }
226
+
227
+ output_file = output_dir / f"{md_file.stem}_chunks.json"
228
+
229
+ # Check if output already exists
230
+ if output_file.exists() and not rebuild:
231
+ result["status"] = "skipped"
232
+ result["error"] = "Output already exists (use --rebuild to force)"
233
+ return result
234
+
235
+ try:
236
+ # Step 1: Load file
237
+ content, metadata = load_markdown_file(md_file)
238
+
239
+ # Step 2: Extract medical metadata
240
+ enhanced_metadata = extractor.extract(content, metadata)
241
+
242
+ # Step 3: Chunk document
243
+ chunks = chunker.chunk_document(content, enhanced_metadata)
244
+
245
+ if not chunks:
246
+ result["status"] = "skipped"
247
+ result["error"] = "No chunks created (content too small or filtered)"
248
+ return result
249
+
250
+ # Step 4: Save chunks to JSON
251
+ output_dir.mkdir(parents=True, exist_ok=True)
252
+ with open(output_file, "w", encoding="utf-8") as f:
253
+ chunk_dicts = [chunk.to_dict() for chunk in chunks]
254
+ json.dump(chunk_dicts, f, indent=2, ensure_ascii=False)
255
+
256
+ # Update result
257
+ result["status"] = "success"
258
+ result["chunks_created"] = len(chunks)
259
+ result["total_tokens"] = sum(chunk.token_count for chunk in chunks)
260
+
261
+ except FileNotFoundError as e:
262
+ result["status"] = "error"
263
+ result["error"] = f"File not found: {e}"
264
+ if verbose:
265
+ result["traceback"] = traceback.format_exc()
266
+
267
+ except ValueError as e:
268
+ result["status"] = "error"
269
+ result["error"] = f"Invalid data: {e}"
270
+ if verbose:
271
+ result["traceback"] = traceback.format_exc()
272
+
273
+ except Exception as e:
274
+ result["status"] = "error"
275
+ result["error"] = f"Unexpected error: {e}"
276
+ if verbose:
277
+ result["traceback"] = traceback.format_exc()
278
+
279
+ return result
280
+
281
+
282
def print_statistics(results: List[Dict], console: Console):
    """
    Print processing statistics.

    Args:
        results: List of processing results
        console: Rich console for output
    """
    # Status tallies.
    n_total = len(results)
    n_ok = len([r for r in results if r["status"] == "success"])
    n_skipped = len([r for r in results if r["status"] == "skipped"])
    n_failed = len([r for r in results if r["status"] == "error"])

    chunk_total = sum(r["chunks_created"] for r in results)
    token_total = sum(r["total_tokens"] for r in results)

    # Guard each average against a zero denominator.
    mean_chunks = chunk_total / n_ok if n_ok > 0 else 0
    mean_tokens_per_chunk = token_total / chunk_total if chunk_total > 0 else 0
    mean_tokens_per_doc = token_total / n_ok if n_ok > 0 else 0

    # Metric/value summary table.
    table = Table(title="Processing Statistics", border_style="green")
    table.add_column("Metric", style="cyan", justify="left")
    table.add_column("Value", style="white", justify="right")

    rows = [
        ("Total Files", f"{n_total:,}"),
        ("Successfully Processed", f"{n_ok:,}"),
        ("Skipped", f"{n_skipped:,}"),
        ("Errors", f"{n_failed:,}"),
        ("", ""),  # Separator
        ("Total Chunks Created", f"{chunk_total:,}"),
        ("Total Tokens", f"{token_total:,}"),
        ("", ""),  # Separator
        ("Avg Chunks per Document", f"{mean_chunks:.1f}"),
        ("Avg Tokens per Chunk", f"{mean_tokens_per_chunk:.1f}"),
        ("Avg Tokens per Document", f"{mean_tokens_per_doc:.1f}"),
    ]
    for label, value in rows:
        table.add_row(label, value)

    console.print("\n")
    console.print(table)

    # Error details, capped at 10 entries to keep the output readable.
    failed = [r for r in results if r["status"] == "error"]
    if failed:
        console.print("\n[yellow]Error Details:[/yellow]")
        for i, result in enumerate(failed[:10], 1):
            console.print(f"  {i}. [red]{result['file']}[/red]")
            console.print(f"     [dim]{result['error']}[/dim]")
            if "traceback" in result:
                console.print(f"     [dim]{result['traceback']}[/dim]")

        if len(failed) > 10:
            console.print(f"  [dim]... and {len(failed) - 10} more errors[/dim]")

    # Skipped files are only listed when the list is short enough to be useful.
    skipped = [r for r in results if r["status"] == "skipped"]
    if skipped and len(skipped) <= 5:
        console.print("\n[yellow]Skipped Files:[/yellow]")
        for i, result in enumerate(skipped, 1):
            console.print(f"  {i}. {result['file']}: {result['error']}")
342
+
343
+
344
def load_processed_chunks(processed_dir: Path, console: Console) -> List[ChunkNode]:
    """
    Load all processed chunks from JSON files.

    Args:
        processed_dir: Directory containing processed chunk JSON files
        console: Rich console for output

    Returns:
        List of ChunkNode objects
    """
    chunk_files = list(processed_dir.glob("*_chunks.json"))

    if not chunk_files:
        console.print(f"[yellow]No processed chunk files found in {processed_dir}[/yellow]")
        return []

    console.print(f"\n[cyan]Loading processed chunks from {len(chunk_files)} files...[/cyan]")

    loaded: List[ChunkNode] = []
    with tqdm(chunk_files, desc="Loading chunks", unit="file") as pbar:
        for path in pbar:
            try:
                with open(path, "r", encoding="utf-8") as fh:
                    raw_chunks = json.load(fh)

                # Rehydrate each serialized dict into a ChunkNode.
                loaded.extend(ChunkNode.from_dict(d) for d in raw_chunks)

                pbar.set_postfix({"total_chunks": len(loaded)})

            except Exception as exc:
                # One corrupt file should not abort the whole load.
                console.print(f"[red]Error loading {path.name}: {exc}[/red]")

    console.print(f"[green]✓[/green] Loaded {len(loaded):,} chunks")
    return loaded
383
+
384
+
385
def build_vector_index(
    chunks: List[ChunkNode],
    embedding_client: SentenceTransformerClient,
    qdrant_manager: QdrantStoreManager,
    batch_size: int,
    console: Console,
) -> Dict:
    """
    Build vector index by generating embeddings and inserting into Qdrant.

    Args:
        chunks: List of ChunkNode objects
        embedding_client: SentenceTransformerClient for stable embeddings
        qdrant_manager: QdrantStoreManager for vector storage
        batch_size: Batch size for embedding generation
        console: Rich console for output

    Returns:
        Dictionary with indexing statistics
    """
    if not chunks:
        console.print("[yellow]No chunks to index[/yellow]")
        return {"chunks_indexed": 0, "time_taken": 0}

    console.print(f"\n[bold cyan]Building Vector Index[/bold cyan]")
    console.print(f"Chunks to index: {len(chunks):,}")
    console.print(f"Embedding batch size: {batch_size}")

    import time
    started = time.time()

    # Embed the raw chunk text.
    texts = [chunk.content for chunk in chunks]

    console.print("\n[cyan]Generating embeddings...[/cyan]")
    try:
        dense_vectors = embedding_client.embed_batch(
            texts=texts,
            batch_size=batch_size,
            show_progress=True,
        )
    except Exception as e:
        console.print(f"[red]Failed to generate embeddings: {e}[/red]")
        raise

    console.print("\n[cyan]Inserting into Qdrant...[/cyan]")
    try:
        num_added = qdrant_manager.add_documents(
            chunks=chunks,
            dense_embeddings=dense_vectors,
        )
    except Exception as e:
        console.print(f"[red]Failed to insert into Qdrant: {e}[/red]")
        raise

    elapsed = time.time() - started

    # Collection info is best-effort: indexing has already succeeded,
    # so a failure here only degrades the reported statistics.
    try:
        collection_info = qdrant_manager.get_collection_info()
    except Exception as e:
        console.print(f"[yellow]Could not get collection info: {e}[/yellow]")
        collection_info = {}

    return {
        "chunks_indexed": num_added,
        "time_taken": elapsed,
        "chunks_per_second": num_added / elapsed if elapsed > 0 else 0,
        "collection_info": collection_info,
    }
459
+
460
+
461
def print_index_statistics(stats: Dict, console: Console):
    """
    Print vector indexing statistics.

    Args:
        stats: Statistics dictionary
        console: Rich console for output
    """
    table = Table(title="Vector Index Statistics", border_style="green")
    table.add_column("Metric", style="cyan", justify="left")
    table.add_column("Value", style="white", justify="right")

    for label, value in (
        ("Chunks Indexed", f"{stats['chunks_indexed']:,}"),
        ("Time Taken", f"{stats['time_taken']:.1f}s"),
        ("Chunks/Second", f"{stats['chunks_per_second']:.1f}"),
    ):
        table.add_row(label, value)

    # Collection details are optional (missing/empty when lookup failed).
    info = stats.get("collection_info")
    if info:
        table.add_row("", "")  # Separator
        table.add_row("Collection Name", info.get("name", "N/A"))
        table.add_row("Total Vectors", f"{info.get('vectors_count', 0):,}")
        table.add_row("Total Points", f"{info.get('points_count', 0):,}")
        table.add_row("Status", info.get("status", "N/A"))

    console.print("\n")
    console.print(table)
487
+
488
+
489
def main():
    """Main entry point for index building.

    Runs up to two phases driven by CLI flags:
      1. Processing: chunk markdown files from the input directory and write
         per-file ``*_chunks.json`` artifacts to the output directory
         (skipped with --index-only).
      2. Vector indexing: embed the processed chunks and insert them into
         Qdrant (enabled by --index-vectors or --index-only).

    Returns:
        Process exit code: 0 on success, 1 on failure, 130 on Ctrl+C
        during indexing.
    """
    args = parse_args()
    console = Console()

    # Print banner
    print_banner(console)

    # Prepare directories; CLI arguments override the configured defaults.
    input_dir = Path(args.input_dir) if args.input_dir else Path(settings.data_raw_path)
    output_dir = Path(args.output_dir) if args.output_dir else Path(settings.data_processed_path)

    # Check mode: --index-only implies indexing without reprocessing.
    index_only = args.index_only
    should_index = args.index_vectors or args.index_only

    # Print mode
    if index_only:
        console.print("[cyan]Mode:[/cyan] Index only (skip processing)")
    elif should_index:
        console.print("[cyan]Mode:[/cyan] Process and build vector index")
    else:
        console.print("[cyan]Mode:[/cyan] Process only (no vector indexing)")

    # Validate input directory (only needed if not index-only)
    if not index_only and not input_dir.exists():
        console.print(f"[bold red]Error: Input directory does not exist: {input_dir}[/bold red]")
        return 1

    # Validate output directory exists (needed for index-only)
    if index_only and not output_dir.exists():
        console.print(f"[bold red]Error: Output directory does not exist: {output_dir}[/bold red]")
        console.print("[yellow]Please run processing first without --index-only[/yellow]")
        return 1

    # Print configuration
    if not index_only:
        # Find all markdown files
        md_files = list(input_dir.glob(args.pattern))

        if not md_files:
            console.print(f"[yellow]No files matching pattern '{args.pattern}' found in {input_dir}[/yellow]")
            return 0

        console.print(f"[cyan]Input directory:[/cyan] {input_dir}")
        console.print(f"[cyan]Output directory:[/cyan] {output_dir}")
        console.print(f"[cyan]Files found:[/cyan] {len(md_files)}")
        console.print(f"[cyan]Pattern:[/cyan] {args.pattern}")
        console.print(f"[cyan]Rebuild mode:[/cyan] {'Yes' if args.rebuild else 'No'}")
    else:
        console.print(f"[cyan]Processed directory:[/cyan] {output_dir}")

    # Initialize components (only if processing)
    results = []

    if not index_only:
        # CLI values override settings; None means "use the configured default".
        chunker = SemanticChunker(
            chunk_size=args.chunk_size if args.chunk_size is not None else settings.chunk_size,
            chunk_overlap=args.chunk_overlap if args.chunk_overlap is not None else settings.chunk_overlap,
            min_chunk_size=args.min_chunk_size if args.min_chunk_size is not None else settings.min_chunk_size,
        )

        extractor = MetadataExtractor()

        console.print(f"[cyan]Chunk size:[/cyan] {chunker.chunk_size} tokens")
        console.print(f"[cyan]Chunk overlap:[/cyan] {chunker.chunk_overlap} tokens")
        console.print(f"[cyan]Min chunk size:[/cyan] {chunker.min_chunk_size} tokens")
        console.print()

        # Process files with progress bar
        console.print("[bold cyan]Processing Files...[/bold cyan]\n")

        with tqdm(
            total=len(md_files),
            desc="Processing",
            unit="file",
            ncols=100,
            bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}]",
        ) as pbar:

            for md_file in md_files:
                # Update progress bar description (name truncated/padded to 30 chars)
                pbar.set_description(f"Processing {md_file.name[:30]:30}")

                # Process file
                result = process_file(
                    md_file=md_file,
                    output_dir=output_dir,
                    chunker=chunker,
                    extractor=extractor,
                    rebuild=args.rebuild,
                    verbose=args.verbose,
                )

                results.append(result)

                # Update progress bar postfix with running stats
                successful = sum(1 for r in results if r["status"] == "success")
                chunks = sum(r["chunks_created"] for r in results)
                pbar.set_postfix({"success": successful, "chunks": chunks})

                pbar.update(1)

        # Print statistics
        print_statistics(results, console)

        # Check processing status
        successful = sum(1 for r in results if r["status"] == "success")
        errors = sum(1 for r in results if r["status"] == "error")

        console.print()
        if errors == 0 and successful > 0:
            console.print("[bold green]Processing completed successfully![/bold green]")
            console.print(f"[green]Processed files saved to: {output_dir}[/green]")
        elif successful > 0:
            console.print("[bold yellow]Processing completed with some errors.[/bold yellow]")
            console.print(f"[yellow]Processed files saved to: {output_dir}[/yellow]")
        else:
            console.print("[bold red]Processing failed - no files were processed successfully.[/bold red]")
            # NOTE(review): when processing fails entirely but indexing was
            # requested, execution falls through and still attempts to index
            # previously processed chunks on disk — confirm this is intended.
            if not should_index:
                return 1

    # Vector indexing phase
    if should_index:
        try:
            # Initialize embedding client with sentence-transformers
            console.print("\n[bold cyan]Initializing Sentence Transformers Client...[/bold cyan]")
            try:
                embedding_client = SentenceTransformerClient(model_name=args.embedding_model)
                model_info = embedding_client.get_model_info()
                console.print(f"[green]✓[/green] Loaded model: {model_info['model_name']}")
                console.print(f"[green]✓[/green] Device: {model_info['device']}")
                console.print(f"[green]✓[/green] Embedding dimension: {model_info['embedding_dim']}")
            except Exception as e:
                console.print(f"[bold red]Failed to initialize Sentence Transformers: {e}[/bold red]")
                console.print("[yellow]Install sentence-transformers: pip install sentence-transformers torch[/yellow]")
                return 1

            # Initialize Qdrant store
            console.print("\n[bold cyan]Initializing Qdrant Store...[/bold cyan]")
            try:
                qdrant_manager = QdrantStoreManager()
                qdrant_manager.initialize_collection(recreate=args.recreate_collection)
            except Exception as e:
                console.print(f"[bold red]Failed to initialize Qdrant: {e}[/bold red]")
                return 1

            # Load processed chunks from the output directory written above
            # (or by a previous run when --index-only is used).
            chunks = load_processed_chunks(output_dir, console)

            if not chunks:
                console.print("[yellow]No chunks to index. Please process documents first.[/yellow]")
                return 0

            # Build vector index
            try:
                index_stats = build_vector_index(
                    chunks=chunks,
                    embedding_client=embedding_client,
                    qdrant_manager=qdrant_manager,
                    batch_size=args.embedding_batch_size,
                    console=console,
                )

                # Print index statistics
                print_index_statistics(index_stats, console)

                console.print("\n[bold green]Vector indexing completed successfully![/bold green]")

            except Exception as e:
                console.print(f"\n[bold red]Vector indexing failed: {e}[/bold red]")
                if args.verbose:
                    traceback.print_exc()
                return 1

        except KeyboardInterrupt:
            # 130 is the conventional exit code for SIGINT.
            console.print("\n[yellow]Indexing interrupted by user (Ctrl+C)[/yellow]")
            return 130

    return 0
669
+
670
+
671
if __name__ == "__main__":
    # Translate main()'s return value into a process exit code, keeping
    # Ctrl+C (130) and unexpected failures (1) distinct.
    try:
        sys.exit(main())
    except KeyboardInterrupt:
        Console().print("\n[yellow]Process interrupted by user (Ctrl+C)[/yellow]")
        sys.exit(130)
    except Exception as e:
        Console().print(f"\n[bold red]Fatal error: {e}[/bold red]")
        traceback.print_exc()
        sys.exit(1)
scripts/evaluate.py ADDED
@@ -0,0 +1,696 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Evaluation script for EyeWiki RAG system.
4
+
5
+ Evaluates the system on a set of test questions and measures:
6
+ - Retrieval recall (relevant sources retrieved)
7
+ - Answer relevance (expected topics covered)
8
+ - Source citation accuracy
9
+
10
+ Usage:
11
+ python scripts/evaluate.py
12
+ python scripts/evaluate.py --questions tests/custom_questions.json
13
+ python scripts/evaluate.py --output results/eval_results.json
14
+ """
15
+
16
+ import argparse
17
+ import json
18
+ import sys
19
+ import time
20
+ from pathlib import Path
21
+ from typing import Dict, List, Any
22
+
23
+ from rich.console import Console
24
+ from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TimeElapsedColumn
25
+ from rich.table import Table
26
+ from rich.panel import Panel
27
+
28
+ # Add project root to path
29
+ project_root = Path(__file__).parent.parent
30
+ sys.path.insert(0, str(project_root))
31
+
32
+ from config.settings import Settings
33
+ from src.llm.ollama_client import OllamaClient
34
+ from src.rag.query_engine import EyeWikiQueryEngine
35
+ from src.rag.reranker import CrossEncoderReranker
36
+ from src.rag.retriever import HybridRetriever
37
+ from src.vectorstore.qdrant_store import QdrantStoreManager
38
+
39
+
40
+ console = Console()
41
+
42
+
43
+ # ============================================================================
44
+ # Evaluation Metrics
45
+ # ============================================================================
46
+
47
def calculate_retrieval_recall(
    retrieved_sources: List[str],
    expected_sources: List[str],
) -> float:
    """
    Calculate retrieval recall.

    Recall = (# of expected sources retrieved) / (# of expected sources)

    Matching is case-insensitive and allows partial matches: an expected
    title counts as retrieved when it contains, or is contained in, any
    retrieved title.

    Args:
        retrieved_sources: List of retrieved source titles
        expected_sources: List of expected source titles

    Returns:
        Recall score (0-1)
    """
    if not expected_sources:
        # Nothing was expected, so nothing can be missed.
        return 1.0

    # Lowercase sets for case-insensitive, deduplicated comparison.
    retrieved_lower = {s.lower() for s in retrieved_sources}
    expected_lower = {s.lower() for s in expected_sources}

    hits = sum(
        1
        for expected in expected_lower
        if any(expected in retrieved or retrieved in expected for retrieved in retrieved_lower)
    )

    return hits / len(expected_sources)
81
+
82
+
83
def calculate_answer_relevance(
    answer: str,
    expected_topics: List[str],
) -> float:
    """
    Calculate answer relevance based on topic coverage.

    Relevance = (# of expected topics found) / (# of expected topics)

    A topic counts as covered when it appears as a case-insensitive
    substring of the answer.

    Args:
        answer: Generated answer text
        expected_topics: List of expected topic keywords

    Returns:
        Relevance score (0-1)
    """
    if not expected_topics:
        # No topics expected: vacuously relevant.
        return 1.0

    text = answer.lower()
    covered = [topic for topic in expected_topics if topic.lower() in text]
    return len(covered) / len(expected_topics)
109
+
110
+
111
def calculate_citation_accuracy(
    answer: str,
    cited_sources: List[str],
    expected_sources: List[str],
) -> Dict[str, float]:
    """
    Calculate citation accuracy metrics.

    Precision = fraction of unique cited sources that match some expected source.
    Recall    = fraction of unique expected sources matched by some cited source.
    Matching is case-insensitive and allows partial matches (one title
    contained in the other).

    Args:
        answer: Generated answer text
        cited_sources: Sources returned by system
        expected_sources: Expected sources

    Returns:
        Dictionary with citation metrics:
        "has_explicit_citations", "precision", "recall", "f1"
    """
    # Check if answer contains explicit citations
    has_citations = "[Source:" in answer or "According to" in answer

    if cited_sources and expected_sources:
        cited_set = {s.lower() for s in cited_sources}
        expected_set = {s.lower() for s in expected_sources}

        def _match(a: str, b: str) -> bool:
            # Partial match: either title contained in the other.
            return a in b or b in a

        # BUG FIX: the original counted matches over the deduplicated sets but
        # divided by the raw list lengths, which understated precision when
        # citations were duplicated; it also counted *cited* hits against the
        # *expected* denominator, letting recall exceed 1.0 when several cited
        # titles matched the same expected title. Count and divide over the
        # same deduplicated sets, per-side.
        cited_hits = sum(
            1 for cited in cited_set if any(_match(cited, exp) for exp in expected_set)
        )
        expected_hits = sum(
            1 for exp in expected_set if any(_match(exp, cited) for cited in cited_set)
        )

        precision = cited_hits / len(cited_set)
        recall = expected_hits / len(expected_set)

        # F1 score (harmonic mean), guarded against division by zero.
        f1 = (
            2 * (precision * recall) / (precision + recall)
            if (precision + recall) > 0
            else 0.0
        )
    else:
        # No citations or no expectations: all scores are zero by convention.
        precision = 0.0
        recall = 0.0
        f1 = 0.0

    return {
        "has_explicit_citations": has_citations,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
163
+
164
+
165
+ # ============================================================================
166
+ # Question Evaluation
167
+ # ============================================================================
168
+
169
def evaluate_question(
    question_data: Dict[str, Any],
    query_engine: EyeWikiQueryEngine,
) -> Dict[str, Any]:
    """
    Evaluate a single question.

    Queries the engine, then scores the response on retrieval recall,
    answer relevance (topic coverage) and citation accuracy. Any exception
    raised while querying or scoring is captured into a failure record
    rather than propagated.

    Args:
        question_data: Question data with expected answers; must contain
            "id", "question", "expected_topics" and "expected_sources"
            ("category" is optional and defaults to "unknown").
        query_engine: Query engine instance

    Returns:
        Evaluation results dictionary. On success it holds "answer",
        "confidence", "query_time", per-question "metrics" and "details";
        on failure it holds "error" instead, with "success" set to False.
    """
    # NOTE(review): the required keys are read outside the try block, so a
    # malformed question record raises KeyError here instead of producing a
    # failure result — confirm that is intended.
    question_id = question_data["id"]
    question = question_data["question"]
    expected_topics = question_data["expected_topics"]
    expected_sources = question_data["expected_sources"]

    # Query the system
    start_time = time.time()
    try:
        response = query_engine.query(
            question=question,
            include_sources=True,
        )
        query_time = time.time() - start_time

        # Extract retrieved sources
        retrieved_sources = [s.title for s in response.sources]

        # Calculate metrics
        retrieval_recall = calculate_retrieval_recall(
            retrieved_sources, expected_sources
        )

        answer_relevance = calculate_answer_relevance(
            response.answer, expected_topics
        )

        citation_metrics = calculate_citation_accuracy(
            response.answer, retrieved_sources, expected_sources
        )

        # Detailed topic analysis: case-insensitive substring presence of
        # each expected topic keyword in the generated answer.
        topics_found = [
            topic for topic in expected_topics if topic.lower() in response.answer.lower()
        ]
        topics_missing = [
            topic
            for topic in expected_topics
            if topic.lower() not in response.answer.lower()
        ]

        # Source analysis: partial, case-insensitive matching in either
        # direction (expected title contained in retrieved title or vice versa).
        sources_retrieved = []
        sources_missing = []

        for expected in expected_sources:
            found = False
            for retrieved in retrieved_sources:
                if expected.lower() in retrieved.lower() or retrieved.lower() in expected.lower():
                    sources_retrieved.append(expected)
                    found = True
                    break
            if not found:
                sources_missing.append(expected)

        result = {
            "id": question_id,
            "question": question,
            "category": question_data.get("category", "unknown"),
            "answer": response.answer,
            "confidence": response.confidence,
            "query_time": query_time,
            "metrics": {
                "retrieval_recall": retrieval_recall,
                "answer_relevance": answer_relevance,
                "citation_precision": citation_metrics["precision"],
                "citation_recall": citation_metrics["recall"],
                "citation_f1": citation_metrics["f1"],
            },
            "details": {
                "retrieved_sources": retrieved_sources,
                "expected_sources": expected_sources,
                "sources_retrieved": sources_retrieved,
                "sources_missing": sources_missing,
                "topics_found": topics_found,
                "topics_missing": topics_missing,
                "has_explicit_citations": citation_metrics["has_explicit_citations"],
            },
            "success": True,
        }

    except Exception as e:
        # Failure record: keeps the evaluation loop running and preserves
        # how long the attempt took before it failed.
        result = {
            "id": question_id,
            "question": question,
            "category": question_data.get("category", "unknown"),
            "error": str(e),
            "query_time": time.time() - start_time,
            "success": False,
        }

    return result
274
+
275
+
276
+ # ============================================================================
277
+ # Aggregate Analysis
278
+ # ============================================================================
279
+
280
def calculate_aggregate_metrics(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Calculate aggregate metrics across all questions.

    Only successful evaluations contribute to the averages; failures are
    counted but otherwise ignored.

    Args:
        results: List of evaluation results

    Returns:
        Aggregate metrics, or {"error": ...} when nothing succeeded.
    """
    ok = [r for r in results if r["success"]]

    if not ok:
        return {"error": "No successful evaluations"}

    n = len(ok)

    def metric_mean(key: str) -> float:
        # Mean of one per-question metric over the successful results.
        return sum(r["metrics"][key] for r in ok) / n

    # How many answers carried explicit citations.
    citations_present = sum(1 for r in ok if r["details"]["has_explicit_citations"])

    # Per-category sums, then converted to means below.
    categories: Dict[str, Dict[str, Any]] = {}
    for r in ok:
        bucket = categories.setdefault(
            r["category"],
            {"count": 0, "retrieval_recall": 0, "answer_relevance": 0},
        )
        bucket["count"] += 1
        bucket["retrieval_recall"] += r["metrics"]["retrieval_recall"]
        bucket["answer_relevance"] += r["metrics"]["answer_relevance"]

    for bucket in categories.values():
        bucket["retrieval_recall"] /= bucket["count"]
        bucket["answer_relevance"] /= bucket["count"]

    return {
        "total_questions": len(results),
        "successful": n,
        "failed": len(results) - n,
        "metrics": {
            "retrieval_recall": metric_mean("retrieval_recall"),
            "answer_relevance": metric_mean("answer_relevance"),
            "citation_precision": metric_mean("citation_precision"),
            "citation_recall": metric_mean("citation_recall"),
            "citation_f1": metric_mean("citation_f1"),
            "avg_confidence": sum(r["confidence"] for r in ok) / n,
            "avg_query_time": sum(r["query_time"] for r in ok) / n,
            "citation_rate": citations_present / n,
        },
        "by_category": categories,
    }
365
+
366
+
367
+ # ============================================================================
368
+ # Output Functions
369
+ # ============================================================================
370
+
371
def print_question_result(result: Dict[str, Any]):
    """Print result for a single question."""
    if not result["success"]:
        console.print(
            f"\n[red]✗ {result['id']}: {result['question']}[/red]",
            f"[red]Error: {result['error']}[/red]",
        )
        return

    # Compact label/value table for the per-question metrics.
    table = Table(show_header=False, box=None, padding=(0, 1))
    table.add_column(style="cyan")
    table.add_column(style="yellow")

    metrics = result["metrics"]
    for label, value in (
        ("Retrieval Recall", f"{metrics['retrieval_recall']:.2%}"),
        ("Answer Relevance", f"{metrics['answer_relevance']:.2%}"),
        ("Citation F1", f"{metrics['citation_f1']:.2%}"),
        ("Confidence", f"{result['confidence']:.2%}"),
        ("Query Time", f"{result['query_time']:.2f}s"),
    ):
        table.add_row(label, value)

    # Pass/partial/fail banner from the mean of retrieval and relevance.
    avg_score = (metrics["retrieval_recall"] + metrics["answer_relevance"]) / 2
    if avg_score >= 0.8:
        status = "[green]✓ PASS[/green]"
    elif avg_score >= 0.6:
        status = "[yellow]~ PARTIAL[/yellow]"
    else:
        status = "[red]✗ FAIL[/red]"

    console.print(f"\n{status} [bold]{result['id']}:[/bold] {result['question']}")
    console.print(table)

    # Surface what was missed, when anything was.
    details = result["details"]
    if details["topics_missing"]:
        console.print(
            f"  [dim]Missing topics: {', '.join(details['topics_missing'])}[/dim]"
        )
    if details["sources_missing"]:
        console.print(
            f"  [dim]Missing sources: {', '.join(details['sources_missing'])}[/dim]"
        )
414
+
415
+
416
def print_aggregate_results(aggregate: Dict[str, Any]):
    """Print aggregate results."""
    console.print("\n")
    console.print(
        Panel.fit(
            "[bold cyan]Evaluation Summary[/bold cyan]",
            border_style="cyan",
        )
    )

    # Overall metrics table
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Metric", style="cyan")
    table.add_column("Score", style="yellow", justify="right")
    table.add_column("Grade", style="green", justify="center")

    metrics = aggregate["metrics"]

    def get_grade(score: float) -> str:
        # Letter grade thresholds: A>=0.9, B>=0.8, C>=0.7, D>=0.6, else F.
        for cutoff, grade in (
            (0.9, "[green]A[/green]"),
            (0.8, "[green]B[/green]"),
            (0.7, "[yellow]C[/yellow]"),
            (0.6, "[yellow]D[/yellow]"),
        ):
            if score >= cutoff:
                return grade
        return "[red]F[/red]"

    for label, key in (
        ("Retrieval Recall", "retrieval_recall"),
        ("Answer Relevance", "answer_relevance"),
        ("Citation Precision", "citation_precision"),
        ("Citation Recall", "citation_recall"),
        ("Citation F1", "citation_f1"),
    ):
        table.add_row(label, f"{metrics[key]:.2%}", get_grade(metrics[key]))

    console.print(table)

    # Run-level statistics
    console.print(f"\n[bold]Statistics:[/bold]")
    console.print(
        f"  Total Questions: {aggregate['total_questions']}",
        f"  Successful: [green]{aggregate['successful']}[/green]",
        f"  Failed: [red]{aggregate['failed']}[/red]",
        f"  Avg Confidence: {metrics['avg_confidence']:.2%}",
        f"  Avg Query Time: {metrics['avg_query_time']:.2f}s",
        f"  Citation Rate: {metrics['citation_rate']:.2%}",
    )

    # Category breakdown
    if aggregate["by_category"]:
        console.print(f"\n[bold]Performance by Category:[/bold]")
        cat_table = Table(show_header=True, header_style="bold magenta")
        cat_table.add_column("Category", style="cyan")
        cat_table.add_column("Count", justify="right")
        cat_table.add_column("Retrieval", justify="right")
        cat_table.add_column("Relevance", justify="right")

        for category, data in sorted(aggregate["by_category"].items()):
            cat_table.add_row(
                category,
                str(data["count"]),
                f"{data['retrieval_recall']:.2%}",
                f"{data['answer_relevance']:.2%}",
            )

        console.print(cat_table)
503
+
504
+
505
+ # ============================================================================
506
+ # Main Evaluation
507
+ # ============================================================================
508
+
509
def load_test_questions(questions_file: Path) -> List[Dict[str, Any]]:
    """Load test questions from JSON file.

    Args:
        questions_file: Path to the JSON file holding the question list.

    Returns:
        List of question dictionaries.

    Exits the process with status 1 when the file does not exist.
    """
    if not questions_file.exists():
        console.print(f"[red]Error: Questions file not found: {questions_file}[/red]")
        sys.exit(1)

    # FIX: declare the encoding explicitly instead of relying on the platform
    # default — the rest of this pipeline reads/writes its JSON as UTF-8.
    with open(questions_file, "r", encoding="utf-8") as f:
        questions = json.load(f)

    console.print(f"[green]✓[/green] Loaded {len(questions)} test questions")
    return questions
520
+
521
+
522
def initialize_system() -> EyeWikiQueryEngine:
    """Initialize the RAG system and return a ready-to-use query engine."""
    console.print("[bold]Initializing RAG system...[/bold]")

    # Load settings
    settings = Settings()

    # LLM + embedding access via Ollama.
    ollama_client = OllamaClient(
        base_url=settings.ollama_base_url,
        llm_model=settings.llm_model,
        embedding_model=settings.embedding_model,
    )

    # Vector store backing hybrid retrieval.
    qdrant_manager = QdrantStoreManager(
        collection_name=settings.qdrant_collection_name,
        qdrant_path=settings.qdrant_path,
        vector_size=settings.embedding_dim,
    )

    retriever = HybridRetriever(
        qdrant_manager=qdrant_manager,
        ollama_client=ollama_client,
    )

    reranker = CrossEncoderReranker(
        model_name=settings.reranker_model,
    )

    # Prompt files are optional: pass None for any that are absent.
    prompts_dir = project_root / "prompts"
    prompt_kwargs = {
        key: (path if path.exists() else None)
        for key, path in (
            ("system_prompt_path", prompts_dir / "system_prompt.txt"),
            ("query_prompt_path", prompts_dir / "query_prompt.txt"),
            ("disclaimer_path", prompts_dir / "medical_disclaimer.txt"),
        )
    }

    query_engine = EyeWikiQueryEngine(
        retriever=retriever,
        reranker=reranker,
        llm_client=ollama_client,
        max_context_tokens=settings.max_context_tokens,
        retrieval_k=20,
        rerank_k=5,
        **prompt_kwargs,
    )

    console.print("[green]✓[/green] System initialized\n")
    return query_engine
571
+
572
+
573
def run_evaluation(
    questions_file: Path,
    output_file: "Path | None" = None,
    verbose: bool = False,
):
    """
    Run evaluation on test questions.

    Args:
        questions_file: Path to test questions JSON
        output_file: Optional path to save results (JSON); None disables saving
        verbose: Print detailed results for each question as it completes
    """
    console.print(
        Panel.fit(
            "[bold blue]EyeWiki RAG Evaluation[/bold blue]",
            border_style="blue",
        )
    )

    # Load questions
    questions = load_test_questions(questions_file)

    # Initialize system
    query_engine = initialize_system()

    # Evaluate questions one by one with a progress bar
    results = []
    console.print("[bold]Evaluating questions...[/bold]\n")

    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        BarColumn(),
        TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
        TimeElapsedColumn(),
        console=console,
    ) as progress:

        task = progress.add_task("Processing...", total=len(questions))

        for question_data in questions:
            result = evaluate_question(question_data, query_engine)
            results.append(result)

            if verbose:
                print_question_result(result)

            progress.update(task, advance=1)

    # Calculate aggregate metrics
    aggregate = calculate_aggregate_metrics(results)

    # Print per-question results here only if they were not already shown
    # live in verbose mode.
    if not verbose:
        console.print("\n[bold]Per-Question Results:[/bold]")
        for result in results:
            print_question_result(result)

    print_aggregate_results(aggregate)

    # Save results
    if output_file:
        output_data = {
            "results": results,
            "aggregate": aggregate,
            "timestamp": time.time(),
        }

        output_file.parent.mkdir(parents=True, exist_ok=True)
        # Explicit encoding avoids platform-dependent defaults.
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(output_data, f, indent=2)

        console.print(f"\n[green]✓[/green] Results saved to {output_file}")
647
+
648
+
649
def main():
    """CLI entry point: parse arguments, then run the evaluation."""
    arg_parser = argparse.ArgumentParser(
        description="Evaluate EyeWiki RAG system on test questions"
    )
    arg_parser.add_argument(
        "--questions",
        type=Path,
        default=project_root / "tests" / "test_questions.json",
        help="Path to test questions JSON file",
    )
    arg_parser.add_argument(
        "--output",
        type=Path,
        default=None,
        help="Path to save evaluation results (JSON)",
    )
    arg_parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Print detailed results for each question",
    )
    args = arg_parser.parse_args()

    try:
        run_evaluation(
            questions_file=args.questions,
            output_file=args.output,
            verbose=args.verbose,
        )
    except KeyboardInterrupt:
        # Ctrl+C: report and exit non-zero without a traceback.
        console.print("\n[yellow]Evaluation interrupted by user[/yellow]")
        sys.exit(1)
    except Exception as e:
        console.print(f"\n[red]Error: {e}[/red]")
        import traceback

        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()
scripts/run_server.py ADDED
@@ -0,0 +1,479 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Server startup script with pre-flight checks.
4
+
5
+ Usage:
6
+ python scripts/run_server.py
7
+ python scripts/run_server.py --port 8080 --reload
8
+ python scripts/run_server.py --host 0.0.0.0 --port 8000
9
+ """
10
+
11
+ import argparse
12
+ import sys
13
+ import time
14
+ from pathlib import Path
15
+
16
+ import requests
17
+ from rich.console import Console
18
+ from rich.panel import Panel
19
+ from rich.table import Table
20
+
21
+ # Add project root to path
22
+ project_root = Path(__file__).parent.parent
23
+ sys.path.insert(0, str(project_root))
24
+
25
+ from config.settings import LLMProvider, Settings
26
+
27
+
28
+ console = Console()
29
+
30
+
31
def parse_args():
    """Parse and return command line arguments for the server launcher."""
    parser = argparse.ArgumentParser(
        description="Start EyeWiki RAG API server with pre-flight checks"
    )

    # (flags, keyword options) pairs for every supported option.
    option_specs = [
        (
            "--host",
            dict(type=str, default="0.0.0.0", help="Host to bind (default: 0.0.0.0)"),
        ),
        (
            "--port",
            dict(type=int, default=8000, help="Port number (default: 8000)"),
        ),
        (
            "--reload",
            dict(action="store_true", help="Enable hot reload for development"),
        ),
        (
            "--skip-checks",
            dict(action="store_true", help="Skip pre-flight checks (not recommended)"),
        ),
    ]
    for flag, kwargs in option_specs:
        parser.add_argument(flag, **kwargs)

    return parser.parse_args()
64
+
65
+
66
def print_header():
    """Render the startup banner panel, padded with blank lines."""
    banner = (
        "[bold blue]EyeWiki RAG API Server[/bold blue]\n"
        "[dim]Retrieval-Augmented Generation for Medical Knowledge[/dim]"
    )
    console.print()
    console.print(Panel.fit(banner, border_style="blue"))
    console.print()
77
+
78
+
79
def check_ollama(settings: Settings) -> bool:
    """
    Check if Ollama is running and has required models.

    Args:
        settings: Application settings (provides base URL and model names)

    Returns:
        True if Ollama responds and every required model is available
    """
    console.print("[bold cyan]1. Checking Ollama service...[/bold cyan]")

    try:
        # Check if Ollama is running
        response = requests.get(f"{settings.ollama_base_url}/api/tags", timeout=5)
        response.raise_for_status()

        models_data = response.json()
        available_models = [model["name"] for model in models_data.get("models", [])]

        # Check for required LLM model (embedding model is sentence-transformers, not Ollama)
        required_models = {
            "LLM": settings.llm_model,
        }

        def _is_available(name: str) -> bool:
            # Loose substring match so "llama3" matches "llama3:latest" and
            # vice versa (tags are optional in both places).
            return any(name in m or m in name for m in available_models)

        table = Table(show_header=True, header_style="bold magenta")
        table.add_column("Model Type", style="cyan")
        table.add_column("Required Model", style="yellow")
        table.add_column("Status", style="green")

        # Compute availability once; previously it was re-checked in the
        # missing-model hint loop below.
        missing = []
        for model_type, model_name in required_models.items():
            found = _is_available(model_name)
            status = "[green]✓ Found[/green]" if found else "[red]✗ Missing[/red]"
            table.add_row(model_type, model_name, status)
            if not found:
                missing.append(model_name)

        console.print(table)

        if missing:
            console.print(
                "\n[red]Error:[/red] Some required models are missing. "
                "Pull them with:"
            )
            for model_name in missing:
                console.print(f"  [yellow]ollama pull {model_name}[/yellow]")
            console.print()
            return False

        console.print("[green]✓ Ollama is running with all required models[/green]\n")
        return True

    except requests.RequestException as e:
        console.print(f"[red]✗ Failed to connect to Ollama:[/red] {e}")
        console.print(
            f"\nMake sure Ollama is running at [yellow]{settings.ollama_base_url}[/yellow]"
        )
        console.print("Start it with: [yellow]ollama serve[/yellow]\n")
        return False
148
+
149
+
150
def check_openai_config(settings: Settings) -> bool:
    """
    Check if OpenAI-compatible API is configured with required API key.

    Args:
        settings: Application settings

    Returns:
        True if check passed, False otherwise
    """
    console.print("[bold cyan]1. Checking OpenAI-compatible API configuration...[/bold cyan]")

    summary = Table(show_header=True, header_style="bold magenta")
    summary.add_column("Property", style="cyan")
    summary.add_column("Value", style="yellow")
    summary.add_column("Status", style="green")

    api_key = settings.openai_api_key
    if api_key:
        # Show only a prefix of the key so it is never fully printed.
        summary.add_row("API Key", f"{api_key[:8]}...", "[green]✓ Set[/green]")
    else:
        summary.add_row("API Key", "(not set)", "[red]✗ Missing[/red]")

    summary.add_row("Base URL", settings.openai_base_url or "(OpenAI default)", "[green]✓[/green]")
    summary.add_row("Model", settings.openai_model, "[green]✓[/green]")

    console.print(summary)

    if not api_key:
        console.print(
            "\n[red]Error:[/red] API key is required for OpenAI-compatible provider."
        )
        console.print(
            "Set the [yellow]OPENAI_API_KEY[/yellow] environment variable or add it to your [yellow].env[/yellow] file.\n"
        )
        return False

    console.print("[green]✓ OpenAI-compatible API configuration looks good[/green]\n")
    return True
193
+
194
+
195
def check_vector_store(settings: Settings) -> bool:
    """
    Check if vector store exists and has documents.

    Args:
        settings: Application settings

    Returns:
        True if check passed, False otherwise
    """
    console.print("[bold cyan]2. Checking vector store...[/bold cyan]")

    store_path = Path(settings.qdrant_path)
    collection = settings.qdrant_collection_name

    # The embedded Qdrant store lives on disk; no directory means nothing
    # has been indexed yet.
    if not store_path.exists():
        console.print(f"[red]✗ Qdrant directory not found:[/red] {store_path}")
        console.print(
            "\nRun the indexing pipeline first:\n"
            "  [yellow]python scripts/build_index.py --index-vectors[/yellow]\n"
        )
        return False

    try:
        from qdrant_client import QdrantClient

        qdrant = QdrantClient(path=str(store_path))

        existing = [col.name for col in qdrant.get_collections().collections]
        if collection not in existing:
            console.print(
                f"[red]✗ Collection '{collection}' not found[/red]\n"
                f"Available collections: {existing}"
            )
            console.print(
                "\nRun the indexing pipeline first:\n"
                "  [yellow]python scripts/build_index.py --index-vectors[/yellow]\n"
            )
            return False

        points_count = qdrant.get_collection(collection).points_count

        # A present-but-empty collection is also treated as a failure.
        if points_count == 0:
            console.print(
                f"[yellow]⚠ Collection '{collection}' exists but is empty[/yellow]"
            )
            console.print(
                "\nRun the indexing pipeline:\n"
                "  [yellow]python scripts/build_index.py --index-vectors[/yellow]\n"
            )
            return False

        stats = Table(show_header=True, header_style="bold magenta")
        stats.add_column("Property", style="cyan")
        stats.add_column("Value", style="yellow")
        stats.add_row("Collection", collection)
        stats.add_row("Location", str(store_path))
        stats.add_row("Documents", f"{points_count:,}")

        console.print(stats)
        console.print("[green]✓ Vector store is ready[/green]\n")
        return True

    except Exception as e:
        console.print(f"[red]✗ Failed to access vector store:[/red] {e}\n")
        return False
270
+
271
+
272
def check_required_files() -> bool:
    """
    Check if all required prompt files exist.

    Returns:
        True if all files exist, False otherwise
    """
    console.print("[bold cyan]3. Checking required files...[/bold cyan]")

    prompts_dir = project_root / "prompts"
    required_files = {
        "System Prompt": prompts_dir / "system_prompt.txt",
        "Query Prompt": prompts_dir / "query_prompt.txt",
        "Medical Disclaimer": prompts_dir / "medical_disclaimer.txt",
    }

    report = Table(show_header=True, header_style="bold magenta")
    report.add_column("File", style="cyan")
    report.add_column("Path", style="yellow")
    report.add_column("Status", style="green")

    missing = []
    for label, path in required_files.items():
        if path.exists():
            status = "[green]✓ Found[/green]"
        else:
            status = "[red]✗ Missing[/red]"
            missing.append(label)
        report.add_row(label, str(path.relative_to(project_root)), status)

    console.print(report)

    if missing:
        console.print(
            "\n[red]Error:[/red] Some required files are missing.\n"
            "Make sure all prompt files are in the [yellow]prompts/[/yellow] directory.\n"
        )
        return False

    console.print("[green]✓ All required files found[/green]\n")
    return True
312
+
313
+
314
def run_preflight_checks(skip_checks: bool = False) -> bool:
    """
    Run all pre-flight checks.

    Args:
        skip_checks: Skip all checks if True

    Returns:
        True if all checks passed, False otherwise
    """
    if skip_checks:
        console.print("[yellow]⚠ Skipping pre-flight checks[/yellow]\n")
        return True

    console.print("[bold yellow]Running Pre-flight Checks...[/bold yellow]\n")

    try:
        settings = Settings()
    except Exception as e:
        console.print(f"[red]✗ Failed to load settings:[/red] {e}\n")
        return False

    console.print(f"[dim]LLM Provider: {settings.llm_provider.value}[/dim]\n")

    # Pick the LLM check matching the configured provider, then run the
    # provider-independent checks. All checks run unconditionally so every
    # problem is reported in one pass.
    if settings.llm_provider == LLMProvider.OLLAMA:
        llm_ok = check_ollama(settings)
    else:
        llm_ok = check_openai_config(settings)

    store_ok = check_vector_store(settings)
    files_ok = check_required_files()

    if not (llm_ok and store_ok and files_ok):
        console.print("[bold red]✗ Pre-flight checks failed[/bold red]")
        console.print("Fix the issues above and try again.\n")
        return False

    console.print("[bold green]✓ All pre-flight checks passed![/bold green]\n")
    return True
359
+
360
+
361
def print_access_urls(host: str, port: int):
    """
    Print access URLs for the server.

    Args:
        host: Server host
        port: Server port
    """
    # Bind-all / loopback hosts are shown as "localhost" for convenience.
    display_host = host if host not in ("0.0.0.0", "127.0.0.1") else "localhost"
    base = f"http://{display_host}:{port}"

    table = Table(
        show_header=True,
        header_style="bold magenta",
        title="[bold green]Server Access URLs[/bold green]",
        title_style="bold green",
    )
    table.add_column("Service", style="cyan", width=20)
    table.add_column("URL", style="yellow")
    table.add_column("Description", style="dim")

    table.add_row("API Root", base, "API information")
    table.add_row("Health Check", f"{base}/health", "Service health status")
    table.add_row("Interactive Docs", f"{base}/docs", "Swagger UI documentation")
    table.add_row("ReDoc", f"{base}/redoc", "Alternative API docs")
    table.add_row("Gradio UI", f"{base}/ui", "Web chat interface")

    console.print()
    console.print(table)
    console.print()

    # Copy-pasteable curl commands for a quick smoke test.
    console.print("[bold cyan]Quick Test Commands:[/bold cyan]")
    console.print(
        f"  [dim]# Test health endpoint[/dim]\n"
        f"  [yellow]curl {base}/health[/yellow]\n"
    )
    console.print(
        f"  [dim]# Query the API[/dim]\n"
        f"  [yellow]curl -X POST {base}/query \\[/yellow]\n"
        f'  [yellow]  -H "Content-Type: application/json" \\[/yellow]\n'
        f'  [yellow]  -d \'{{"question": "What is glaucoma?"}}\' [/yellow]\n'
    )
417
+
418
+
419
def start_server(host: str, port: int, reload: bool):
    """
    Start the uvicorn server.

    Args:
        host: Server host
        port: Server port
        reload: Enable hot reload
    """
    console.print("[bold green]Starting server...[/bold green]\n")

    # Print URLs before starting
    print_access_urls(host, port)

    # Import uvicorn here to avoid import errors if not installed
    try:
        import uvicorn
    except ImportError:
        console.print("[red]Error:[/red] uvicorn is not installed")
        console.print("Install it with: [yellow]pip install uvicorn[/yellow]\n")
        sys.exit(1)

    # Start server
    try:
        # Two positional strings: console.print joins them with a space.
        console.print(
            f"[dim]Server listening on {host}:{port}[/dim]",
            "[dim](Press CTRL+C to stop)[/dim]\n",  # fixed: was an f-string with no placeholders
        )

        # Blocks until the server exits.
        uvicorn.run(
            "src.api.main:app",
            host=host,
            port=port,
            reload=reload,
            log_level="info",
        )

    except KeyboardInterrupt:
        console.print("\n\n[yellow]Server stopped by user[/yellow]")
    except Exception as e:
        console.print(f"\n[red]Error starting server:[/red] {e}")
        sys.exit(1)
461
+
462
+
463
def main():
    """Launcher entry point: banner, pre-flight checks, then the server."""
    args = parse_args()

    print_header()

    # Abort before binding the port if any pre-flight check fails.
    if not run_preflight_checks(skip_checks=args.skip_checks):
        console.print("[red]Startup aborted due to failed checks[/red]\n")
        sys.exit(1)

    start_server(host=args.host, port=args.port, reload=args.reload)


if __name__ == "__main__":
    main()
scripts/scrape_eyewiki.py ADDED
@@ -0,0 +1,278 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """CLI script to run the EyeWiki crawler."""
3
+
4
+ import argparse
5
+ import asyncio
6
+ import sys
7
+ import time
8
+ from pathlib import Path
9
+
10
+ # Add parent directory to path
11
+ sys.path.insert(0, str(Path(__file__).parent.parent))
12
+
13
+ from src.scraper.eyewiki_crawler import EyeWikiCrawler
14
+ from config.settings import settings
15
+ from rich.console import Console
16
+ from rich.panel import Panel
17
+ from rich.table import Table
18
+
19
+
20
def parse_args():
    """Parse command line arguments for the EyeWiki crawler CLI."""
    cli = argparse.ArgumentParser(
        description="Crawl EyeWiki medical articles",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
    # Crawl up to 100 pages with default settings
    python scripts/scrape_eyewiki.py --max-pages 100

    # Resume previous crawl
    python scripts/scrape_eyewiki.py --resume

    # Crawl with depth 3 to custom directory
    python scripts/scrape_eyewiki.py --depth 3 --output-dir ./my_data

    # Full crawl (no page limit)
    python scripts/scrape_eyewiki.py
""",
    )

    # Crawl scope controls
    cli.add_argument(
        "--max-pages",
        type=int,
        default=None,
        help="Maximum number of pages to crawl (default: unlimited)",
    )
    cli.add_argument(
        "--depth",
        type=int,
        default=2,
        help="Maximum crawl depth (default: 2)",
    )
    cli.add_argument(
        "--start-urls",
        type=str,
        nargs="+",
        default=None,
        help="Starting URLs for crawl (default: EyeWiki main page and disease category)",
    )

    # Output and checkpointing
    cli.add_argument(
        "--output-dir",
        type=str,
        default=None,
        help=f"Output directory for scraped articles (default: {settings.data_raw_path})",
    )
    cli.add_argument(
        "--resume",
        action="store_true",
        help="Resume from previous checkpoint if available",
    )
    cli.add_argument(
        "--checkpoint-file",
        type=str,
        default=None,
        help="Custom checkpoint file path (default: output_dir/crawler_checkpoint.json)",
    )

    # Politeness / network settings
    cli.add_argument(
        "--delay",
        type=float,
        default=None,
        help=f"Delay between requests in seconds (default: {settings.scraper_delay})",
    )
    cli.add_argument(
        "--timeout",
        type=int,
        default=None,
        help=f"Request timeout in seconds (default: {settings.scraper_timeout})",
    )

    return cli.parse_args()
98
+
99
+
100
def print_banner(console: Console):
    """Print welcome banner."""
    banner = """
    [bold cyan]EyeWiki Medical Article Crawler[/bold cyan]
    [dim]Powered by crawl4ai[/dim]
    """
    console.print(Panel(banner, border_style="cyan"))
107
+
108
+
109
def print_configuration(console: Console, args, crawler: EyeWikiCrawler):
    """Print crawler configuration."""
    # Collect the (setting, value) pairs first, then render them in order.
    rows = [
        ("Output Directory", str(crawler.output_dir)),
        ("Max Pages", str(args.max_pages) if args.max_pages else "Unlimited"),
        ("Depth", str(args.depth)),
        ("Delay", f"{crawler.delay}s"),
        ("Timeout", f"{crawler.timeout}s"),
        ("Checkpoint File", str(crawler.checkpoint_file)),
        ("Resume Mode", "Yes" if args.resume else "No"),
    ]

    table = Table(title="Crawler Configuration", show_header=False, border_style="blue")
    table.add_column("Setting", style="cyan")
    table.add_column("Value", style="white")
    for setting, value in rows:
        table.add_row(setting, value)

    console.print(table)
    console.print()
125
+
126
+
127
def print_summary(console: Console, crawler: EyeWikiCrawler, elapsed_time: float):
    """Print crawl summary statistics."""
    console.print("\n")

    saved = crawler.articles_saved
    visited = len(crawler.visited_urls)

    # Derived stats; guards avoid division by zero on empty or instant runs.
    pages_per_minute = (saved / elapsed_time * 60) if elapsed_time > 0 else 0
    success_rate = (saved / visited * 100) if crawler.visited_urls else 0

    table = Table(title="Crawl Summary", border_style="green", show_header=True)
    table.add_column("Metric", style="cyan", justify="left")
    table.add_column("Value", style="white", justify="right")

    for metric, value in (
        ("Articles Saved", f"{saved:,}"),
        ("URLs Visited", f"{visited:,}"),
        ("URLs Failed", f"{len(crawler.failed_urls):,}"),
        ("URLs Remaining", f"{len(crawler.to_crawl):,}"),
        ("Success Rate", f"{success_rate:.1f}%"),
        ("Time Elapsed", f"{elapsed_time:.1f}s"),
        ("Pages/Minute", f"{pages_per_minute:.1f}"),
    ):
        table.add_row(metric, value)

    console.print(table)

    # Show at most five failed URLs with their error messages.
    if crawler.failed_urls:
        console.print("\n[yellow]Failed URLs:[/yellow]")
        for i, (url, error) in enumerate(list(crawler.failed_urls.items())[:5], 1):
            console.print(f" {i}. [red]{url}[/red]")
            console.print(f" [dim]{error}[/dim]")

        if len(crawler.failed_urls) > 5:
            console.print(f" [dim]... and {len(crawler.failed_urls) - 5} more[/dim]")

    # Final status line depends on whether anything was saved at all.
    console.print()
    if saved > 0:
        console.print("[bold green]Crawl completed successfully![/bold green]")
        console.print(f"[green]Articles saved to: {crawler.output_dir}[/green]")
    else:
        console.print("[bold yellow]No articles were saved.[/bold yellow]")
        console.print("[yellow]Check the logs above for errors.[/yellow]")
173
+
174
+
175
async def main():
    """Main entry point for the crawler script.

    Returns a process exit code: 0 on success, 1 on error, 130 on Ctrl+C.
    """
    # Parse arguments
    args = parse_args()

    # Initialize console
    console = Console()

    # Print banner
    print_banner(console)

    # Prepare output directory
    output_dir = Path(args.output_dir) if args.output_dir else Path(settings.data_raw_path)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Prepare checkpoint file (None lets the crawler pick its default path)
    checkpoint_file = None
    if args.checkpoint_file:
        checkpoint_file = Path(args.checkpoint_file)

    # If not resuming and a checkpoint exists, warn the user.
    # NOTE(review): this only covers a *custom* --checkpoint-file path; an
    # existing default checkpoint (output_dir/crawler_checkpoint.json) is not
    # detected here — confirm whether that is intentional.
    if not args.resume and checkpoint_file and checkpoint_file.exists():
        console.print("[yellow]Warning: Checkpoint file exists![/yellow]")
        console.print(f"[yellow]File: {checkpoint_file}[/yellow]")
        console.print("[yellow]Use --resume to continue from checkpoint, or it will be overwritten.[/yellow]")
        console.print()

    # Initialize crawler
    try:
        crawler = EyeWikiCrawler(
            base_url="https://eyewiki.org",
            output_dir=output_dir,
            checkpoint_file=checkpoint_file,
            delay=args.delay if args.delay is not None else settings.scraper_delay,
            timeout=args.timeout if args.timeout is not None else settings.scraper_timeout,
        )
    except Exception as e:
        console.print(f"[bold red]Error initializing crawler: {e}[/bold red]")
        return 1

    # Print configuration
    print_configuration(console, args, crawler)

    # Prepare start URLs
    start_urls = args.start_urls
    if not start_urls and not args.resume:
        # Start with popular medical articles that link to many other articles
        start_urls = [
            "https://eyewiki.org/Category:Articles"
        ]
        console.print("[blue]Using default start URLs (seed articles):[/blue]")
        for url in start_urls:
            console.print(f" - {url}")
        console.print()

    # Start crawling
    start_time = time.time()

    try:
        await crawler.crawl(
            max_pages=args.max_pages,
            depth=args.depth,
            start_urls=start_urls,
        )

        elapsed_time = time.time() - start_time

        # Print summary
        print_summary(console, crawler, elapsed_time)

        return 0

    except KeyboardInterrupt:
        elapsed_time = time.time() - start_time
        console.print("\n[yellow]Crawl interrupted by user (Ctrl+C)[/yellow]")
        console.print("[yellow]Saving checkpoint...[/yellow]")

        # Crawler already saves checkpoint in its exception handler
        # Just print summary
        print_summary(console, crawler, elapsed_time)

        console.print("\n[blue]You can resume with:[/blue]")
        # fixed: was an f-string with no placeholders
        console.print("[blue] python scripts/scrape_eyewiki.py --resume[/blue]")

        return 130  # Standard exit code for SIGINT

    except Exception as e:
        elapsed_time = time.time() - start_time
        console.print(f"\n[bold red]Unexpected error: {e}[/bold red]")

        # Print summary of what was accomplished
        print_summary(console, crawler, elapsed_time)

        return 1
269
+
270
+
271
if __name__ == "__main__":
    # Run the async entry point and propagate its exit code. SystemExit is
    # not an Exception subclass, so sys.exit() passes through the handler.
    try:
        sys.exit(asyncio.run(main()))
    except Exception as e:
        Console().print(f"[bold red]Fatal error: {e}[/bold red]")
        sys.exit(1)
src/__init__.py ADDED
File without changes
src/api/__init__.py ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ """API module for EyeWiki RAG system."""
2
+
3
+ from src.api.main import app
4
+
5
+ __all__ = ["app"]
src/api/gradio_ui.py ADDED
@@ -0,0 +1,548 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Gradio UI for EyeWiki RAG system."""
2
+
3
+ import logging
4
+ from typing import List, Dict
5
+
6
+ import gradio as gr
7
+
8
+ from src.rag.query_engine import QueryResponse
9
+
10
+
11
+ logger = logging.getLogger(__name__)
12
+
13
+
14
# ============================================================================
# Example Questions
# ============================================================================

# Seed questions rendered as one-click buttons in the sidebar of
# create_gradio_interface(); clicking one copies it into the input box.
EXAMPLE_QUESTIONS = [
    "What are the symptoms of glaucoma?",
    "How is diabetic retinopathy treated?",
    "What causes macular degeneration?",
    "What is the difference between open-angle and angle-closure glaucoma?",
    "What are the risk factors for cataracts?",
    "How is retinal detachment diagnosed?",
]


# ============================================================================
# Styling
# ============================================================================

# Stylesheet passed to gr.Blocks(css=...). The class names below are attached
# to components via elem_classes in create_gradio_interface(); the
# confidence-* classes are also emitted by format_sources_html().
CUSTOM_CSS = """
/* Main container */
.gradio-container {
    font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
    max-width: 1400px;
    margin: 0 auto;
}

/* Header */
.header {
    background: linear-gradient(135deg, #1e3a8a 0%, #3b82f6 100%);
    color: white;
    padding: 2rem;
    border-radius: 12px;
    margin-bottom: 2rem;
    box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}

.header h1 {
    margin: 0 0 0.5rem 0;
    font-size: 2rem;
    font-weight: 700;
}

.header p {
    margin: 0;
    font-size: 1rem;
    opacity: 0.95;
}

/* Chat interface */
.chatbot {
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
}

/* Text input */
.input-text textarea {
    border: 2px solid #e5e7eb;
    border-radius: 8px;
    font-size: 1rem;
    padding: 0.75rem;
    transition: border-color 0.2s;
}

.input-text textarea:focus {
    border-color: #3b82f6;
    outline: none;
}

/* Buttons */
.primary-button {
    background: linear-gradient(135deg, #3b82f6 0%, #2563eb 100%);
    color: white;
    border: none;
    border-radius: 8px;
    padding: 0.75rem 1.5rem;
    font-weight: 600;
    cursor: pointer;
    transition: transform 0.1s, box-shadow 0.2s;
}

.primary-button:hover {
    transform: translateY(-1px);
    box-shadow: 0 4px 8px rgba(59, 130, 246, 0.3);
}

.secondary-button {
    background: white;
    color: #374151;
    border: 1px solid #d1d5db;
    border-radius: 8px;
    padding: 0.5rem 1rem;
    font-weight: 500;
    cursor: pointer;
    transition: background 0.2s;
}

.secondary-button:hover {
    background: #f9fafb;
}

/* Sources accordion */
.accordion {
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    margin-top: 1rem;
}

/* Disclaimer */
.disclaimer {
    background: #fef3c7;
    border-left: 4px solid #f59e0b;
    padding: 1rem;
    border-radius: 8px;
    margin-top: 2rem;
    font-size: 0.875rem;
    line-height: 1.5;
}

.disclaimer strong {
    color: #92400e;
    font-weight: 700;
}

/* Settings sidebar */
.settings {
    background: #f9fafb;
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    padding: 1rem;
}

/* Example questions */
.examples {
    background: white;
    border: 1px solid #e5e7eb;
    border-radius: 8px;
    padding: 1rem;
    margin-bottom: 1rem;
}

.example-btn {
    display: block;
    width: 100%;
    text-align: left;
    padding: 0.75rem;
    margin-bottom: 0.5rem;
    background: white;
    border: 1px solid #e5e7eb;
    border-radius: 6px;
    cursor: pointer;
    transition: all 0.2s;
    font-size: 0.875rem;
}

.example-btn:hover {
    background: #f0f9ff;
    border-color: #3b82f6;
    transform: translateX(4px);
}

/* Confidence indicator */
.confidence-high {
    color: #059669;
    font-weight: 600;
}

.confidence-medium {
    color: #d97706;
    font-weight: 600;
}

.confidence-low {
    color: #dc2626;
    font-weight: 600;
}

/* Source cards */
.source-card {
    background: white;
    border: 1px solid #e5e7eb;
    border-radius: 6px;
    padding: 0.75rem;
    margin-bottom: 0.5rem;
    transition: box-shadow 0.2s;
}

.source-card:hover {
    box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}

.source-title {
    font-weight: 600;
    color: #1e40af;
    margin-bottom: 0.25rem;
}

.source-score {
    font-size: 0.75rem;
    color: #6b7280;
}
"""
216
+
217
+
218
+ # ============================================================================
219
+ # Formatting Functions
220
+ # ============================================================================
221
+
222
def format_sources_html(response: QueryResponse, max_sources: int = 5) -> str:
    """
    Format sources as HTML.

    Each source becomes a "card" (styled by CUSTOM_CSS .source-card) with a
    link to the article, the section name when present, and a colour-coded
    relevance percentage.

    Args:
        response: Query response with sources
        max_sources: Maximum number of sources to display

    Returns:
        HTML string with formatted sources
    """
    if not response.sources:
        return "<p style='color: #6b7280; font-style: italic;'>No sources available.</p>"

    html_parts = []

    # Limit sources
    sources = response.sources[:max_sources]

    for i, source in enumerate(sources, 1):
        # Confidence indicator — thresholds mirror format_confidence_text()
        score_pct = int(source.relevance_score * 100)
        if source.relevance_score >= 0.7:
            score_class = "confidence-high"
        elif source.relevance_score >= 0.5:
            score_class = "confidence-medium"
        else:
            score_class = "confidence-low"

        # NOTE(review): source.title / source.section are interpolated into
        # HTML without escaping — assumes they contain no markup; confirm
        # upstream sanitization.
        html = f"""
        <div class="source-card">
            <div class="source-title">
                {i}. <a href="{source.url}" target="_blank" style="text-decoration: none;">
                    {source.title}
                </a>
            </div>
            {f'<div style="font-size: 0.875rem; color: #6b7280; margin-bottom: 0.25rem;">Section: {source.section}</div>' if source.section else ''}
            <div class="source-score">
                Relevance: <span class="{score_class}">{score_pct}%</span>
            </div>
        </div>
        """
        html_parts.append(html)

    return "\n".join(html_parts)
267
+
268
+
269
def format_confidence_text(confidence: float) -> str:
    """
    Render a 0-1 confidence score as an emoji-labelled percentage.

    Args:
        confidence: Confidence score (0-1)

    Returns:
        Formatted string, e.g. "✅ High Confidence (85%)"
    """
    # Thresholds mirror the confidence-* CSS classes used for sources.
    if confidence >= 0.7:
        emoji, label = "✅", "High Confidence"
    elif confidence >= 0.5:
        emoji, label = "⚠️", "Medium Confidence"
    else:
        emoji, label = "⚡", "Low Confidence"

    return f"{emoji} {label} ({int(confidence * 100)}%)"
292
+
293
+
294
+ # ============================================================================
295
+ # Chat Interface Functions
296
+ # ============================================================================
297
+
298
def process_question(
    question: str,
    history: List[Dict[str, str]],
    include_sources: bool,
    max_sources: int,
    query_engine_getter,
) -> tuple[List[Dict[str, str]], str]:
    """
    Process a user question and update chat history.

    Args:
        question: User's question
        history: Chat history (list of message dicts with 'role' and 'content');
            mutated in place and also returned, per the Gradio callback pattern
        include_sources: Whether to include sources
        max_sources: Maximum number of sources to show
        query_engine_getter: Callable that returns the query engine instance,
            or a falsy value while the engine is still initializing

    Returns:
        Updated history and sources HTML
    """
    # Ignore empty / whitespace-only submissions.
    if not question or not question.strip():
        return history, ""

    # Resolve the engine lazily so the UI can come up before the RAG
    # pipeline has finished loading.
    # Fix: removed leftover debug `print(query_engine)` that wrote to stdout
    # on every submission.
    query_engine = query_engine_getter()
    if not query_engine:
        error_msg = "System is still initializing. Please wait a moment and try again."
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": error_msg})
        return history, ""

    try:
        # Run the full RAG pipeline.
        response = query_engine.query(
            question=question,
            include_sources=include_sources,
        )

        # Prefix the answer with a confidence badge.
        confidence_text = format_confidence_text(response.confidence)
        answer = f"**{confidence_text}**\n\n{response.answer}"

        # Append the disclaimer unless it is the generic "educational
        # purposes" one (filtered by keyword).
        if response.disclaimer and not any(word in response.disclaimer.lower() for word in ['educational', 'education']):
            answer += f"\n\n---\n\n{response.disclaimer}"

        # Update history with message dicts
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": answer})

        # Format sources panel (empty when the user disabled sources).
        sources_html = format_sources_html(response, max_sources) if include_sources else ""

        return history, sources_html

    except Exception as e:
        # Surface pipeline failures in-chat instead of crashing the UI.
        logger.error(f"Error processing question: {e}", exc_info=True)
        error_msg = f"Sorry, I encountered an error processing your question: {str(e)}"
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": error_msg})
        return history, ""
360
+
361
+
362
def clear_chat() -> tuple[List, str]:
    """
    Reset the conversation.

    Returns:
        An empty chat history and an empty sources panel.
    """
    fresh_history: List = []
    return fresh_history, ""
370
+
371
+
372
def load_example(example: str) -> str:
    """
    Load an example question.

    Identity callback: copies the clicked example's text into the
    question input box.

    Args:
        example: Example question text

    Returns:
        The example question, unchanged
    """
    return example
383
+
384
+
385
+ # ============================================================================
386
+ # Gradio Interface
387
+ # ============================================================================
388
+
389
def create_gradio_interface(query_engine_getter) -> gr.Blocks:
    """
    Create Gradio interface for EyeWiki RAG.

    Builds a two-column layout (chat + sources on the left, settings and
    example questions on the right) and wires the event handlers to
    process_question / clear_chat / load_example.

    Args:
        query_engine_getter: Callable that returns the query engine instance
            (captured by the submit lambdas so the engine can finish loading
            after the UI is built)

    Returns:
        Gradio Blocks interface
    """
    with gr.Blocks(
        css=CUSTOM_CSS,
        title="EyeWiki Medical Assistant",
        theme=gr.themes.Soft(
            primary_hue="blue",
            secondary_hue="gray",
            neutral_hue="slate",
        ),
    ) as interface:

        # Header
        gr.HTML("""
        <div class="header">
            <h1>🏥 EyeWiki Medical Assistant</h1>
            <p>Ask questions about ophthalmology conditions, treatments, and procedures</p>
        </div>
        """)

        with gr.Row():
            # Main content (left side)
            with gr.Column(scale=3):

                # Chat interface
                chatbot = gr.Chatbot(
                    label="Conversation",
                    height=500,
                    elem_classes=["chatbot"],
                    show_label=False,
                    avatar_images=(None, "🏥"),
                )

                # Input
                with gr.Row():
                    question_input = gr.Textbox(
                        placeholder="Ask a question about eye health...",
                        label="Your Question",
                        lines=2,
                        elem_classes=["input-text"],
                        scale=4,
                    )

                with gr.Row():
                    submit_btn = gr.Button(
                        "Send",
                        variant="primary",
                        elem_classes=["primary-button"],
                        scale=1,
                    )
                    clear_btn = gr.Button(
                        "Clear",
                        elem_classes=["secondary-button"],
                        scale=1,
                    )

                # Sources accordion (collapsed by default; filled per-answer)
                with gr.Accordion("📚 Sources", open=False, elem_classes=["accordion"]):
                    sources_display = gr.HTML(
                        value="<p style='color: #6b7280; font-style: italic;'>Sources will appear here after asking a question.</p>"
                    )

                # Medical disclaimer
                gr.HTML("""
                <div class="disclaimer">
                    <strong>⚠️ Medical Disclaimer:</strong> This information is sourced from EyeWiki,
                    a resource of the American Academy of Ophthalmology (AAO). It is not a substitute
                    for professional medical advice, diagnosis, or treatment. AI systems can make errors.
                    Always consult with a qualified ophthalmologist or eye care professional for medical
                    concerns and verify any critical information with authoritative sources.
                </div>
                """)

            # Sidebar (right side)
            with gr.Column(scale=1, elem_classes=["settings"]):

                gr.Markdown("### ⚙️ Settings")

                include_sources = gr.Checkbox(
                    label="Show sources",
                    value=True,
                    info="Include source citations in responses"
                )

                max_sources = gr.Slider(
                    minimum=1,
                    maximum=10,
                    value=5,
                    step=1,
                    label="Max sources",
                    info="Maximum number of sources to display"
                )

                gr.Markdown("---")
                gr.Markdown("### 💡 Example Questions")

                # Example buttons — one per entry in EXAMPLE_QUESTIONS
                example_buttons = []
                for example in EXAMPLE_QUESTIONS:
                    btn = gr.Button(
                        example,
                        elem_classes=["example-btn"],
                        size="sm",
                    )
                    example_buttons.append(btn)

                gr.Markdown("---")
                gr.Markdown("""
                ### 📖 About

                **EyeWiki RAG System** - Powered by:
                - Hybrid retrieval (semantic + keyword search)
                - Cross-encoder reranking for precision
                - Local LLM inference (GPU-accelerated)
                - EyeWiki knowledge base (AAO)

                All processing happens locally on your machine.
                """)

        # Event handlers.
        # The .then(...) step clears the input box after the answer is posted.
        submit_event = submit_btn.click(
            fn=lambda q, h, inc, max_s: process_question(q, h, inc, max_s, query_engine_getter),
            inputs=[question_input, chatbot, include_sources, max_sources],
            outputs=[chatbot, sources_display],
        ).then(
            fn=lambda: "",
            outputs=[question_input],
        )

        # Pressing Enter in the textbox mirrors the Send button.
        question_input.submit(
            fn=lambda q, h, inc, max_s: process_question(q, h, inc, max_s, query_engine_getter),
            inputs=[question_input, chatbot, include_sources, max_sources],
            outputs=[chatbot, sources_display],
        ).then(
            fn=lambda: "",
            outputs=[question_input],
        )

        clear_btn.click(
            fn=clear_chat,
            outputs=[chatbot, sources_display],
        )

        # Example button handlers
        # NOTE(review): the Button itself is used as the input; Gradio passes
        # its value (the label text) to load_example — confirm against the
        # Gradio version in use.
        for btn in example_buttons:
            btn.click(
                fn=load_example,
                inputs=[btn],
                outputs=[question_input],
            )

    return interface
src/api/main.py ADDED
@@ -0,0 +1,627 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """FastAPI application for EyeWiki RAG system."""
2
+
3
+ import logging
4
+ import time
5
+ from contextlib import asynccontextmanager
6
+ from pathlib import Path
7
+ from typing import Optional
8
+
9
+ from fastapi import FastAPI, HTTPException, Request, status
10
+ from fastapi.middleware.cors import CORSMiddleware
11
+ from fastapi.responses import StreamingResponse
12
+ from pydantic import BaseModel, Field
13
+ import gradio as gr
14
+
15
+ from src.api.gradio_ui import create_gradio_interface
16
+ from config.settings import LLMProvider, Settings
17
+ from src.llm.llm_client import LLMClient
18
+ from src.llm.ollama_client import OllamaClient
19
+ from src.llm.openai_client import OpenAIClient
20
+ from src.llm.sentence_transformer_client import SentenceTransformerClient
21
+ from src.rag.query_engine import EyeWikiQueryEngine, QueryResponse
22
+ from src.rag.reranker import CrossEncoderReranker
23
+ from src.rag.retriever import HybridRetriever
24
+ from src.vectorstore.qdrant_store import QdrantStoreManager
25
+
26
+
27
+ # Configure logging
28
+ logging.basicConfig(
29
+ level=logging.INFO,
30
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
31
+ )
32
+ logger = logging.getLogger(__name__)
33
+
34
+
35
+ # ============================================================================
36
+ # Request/Response Models
37
+ # ============================================================================
38
+
39
class QueryRequest(BaseModel):
    """
    Request model for the POST /query endpoint.

    Attributes:
        question: User's question
        include_sources: Whether to include source information
        filters: Optional metadata filters (disease_name, icd_codes, etc.)
    """
    # min_length=3 rejects degenerate one/two-character questions at validation time
    question: str = Field(..., min_length=3, description="User's question")
    include_sources: bool = Field(default=True, description="Include source documents")
    filters: Optional[dict] = Field(default=None, description="Metadata filters")
51
+
52
+
53
class StreamQueryRequest(BaseModel):
    """
    Request model for the streaming POST /query/stream endpoint.

    Same fields as QueryRequest but without the include_sources flag.

    Attributes:
        question: User's question
        filters: Optional metadata filters
    """
    question: str = Field(..., min_length=3, description="User's question")
    filters: Optional[dict] = Field(default=None, description="Metadata filters")
63
+
64
+
65
class HealthResponse(BaseModel):
    """
    Response model for health check.

    Attributes:
        status: Overall status ("healthy", "degraded", or "unhealthy" —
            see health_check() for how it is derived)
        llm: LLM service status
        qdrant: Qdrant service status
        query_engine: Query engine initialization status
        timestamp: Check timestamp
    """
    status: str = Field(..., description="Overall status")
    llm: dict = Field(..., description="LLM service status")
    qdrant: dict = Field(..., description="Qdrant service status")
    query_engine: dict = Field(..., description="Query engine status")
    timestamp: float = Field(..., description="Unix timestamp")
81
+
82
+
83
class StatsResponse(BaseModel):
    """
    Response model for the GET /stats endpoint.

    Attributes:
        collection_info: Qdrant collection information
        pipeline_config: Query engine pipeline configuration
        documents_indexed: Number of indexed documents
        timestamp: Stats timestamp
    """
    collection_info: dict = Field(..., description="Collection information")
    pipeline_config: dict = Field(..., description="Pipeline configuration")
    documents_indexed: int = Field(..., description="Number of indexed documents")
    timestamp: float = Field(..., description="Unix timestamp")
97
+
98
+
99
class ErrorResponse(BaseModel):
    """
    Error response model.

    Attributes:
        error: Error message
        detail: Optional detailed error information
        timestamp: Error timestamp
    """
    # NOTE(review): the endpoints visible in this file raise HTTPException
    # directly rather than returning this model — confirm intended use.
    error: str = Field(..., description="Error message")
    detail: Optional[str] = Field(default=None, description="Error details")
    timestamp: float = Field(..., description="Unix timestamp")
111
+
112
+
113
+ # ============================================================================
114
+ # Global State
115
+ # ============================================================================
116
+
117
class AppState:
    """Application state container.

    Holds every long-lived service object built during startup (see
    lifespan()); all attributes stay None/False until initialization runs.
    """

    def __init__(self):
        self.settings: Optional[Settings] = None
        self.llm_client: Optional[LLMClient] = None
        self.embedding_client: Optional[SentenceTransformerClient] = None
        self.qdrant_manager: Optional[QdrantStoreManager] = None
        self.retriever: Optional[HybridRetriever] = None
        self.reranker: Optional[CrossEncoderReranker] = None
        self.query_engine: Optional[EyeWikiQueryEngine] = None
        # True once lifespan() finished building the pipeline successfully.
        self.initialized: bool = False
        # Human-readable startup failure; surfaced by /health and 503 errors.
        self.initialization_error: Optional[str] = None
130
+
131
+
132
+ app_state = AppState()
133
+
134
+
135
+ # ============================================================================
136
+ # Lifecycle Management
137
+ # ============================================================================
138
+
139
@asynccontextmanager
async def lifespan(app: FastAPI):
    """
    Application lifespan manager.

    Startup: builds the full RAG stack (settings -> LLM client ->
    embeddings -> Qdrant -> retriever -> reranker -> query engine) and
    records success/failure on app_state. Shutdown: closes the Qdrant client.
    A startup failure is logged but NOT raised, so the server still starts
    and endpoints report the error via check_initialization()/health.
    """
    # Startup
    logger.info("Starting EyeWiki RAG API...")

    try:
        # Load settings
        logger.info("Loading settings...")
        app_state.settings = Settings()

        # Initialize LLM client based on provider
        logger.info(f"Initializing LLM client (provider: {app_state.settings.llm_provider.value})...")
        if app_state.settings.llm_provider == LLMProvider.OPENAI:
            app_state.llm_client = OpenAIClient(
                api_key=app_state.settings.openai_api_key,
                base_url=app_state.settings.openai_base_url,
                model=app_state.settings.openai_model,
            )
        else:
            app_state.llm_client = OllamaClient(
                base_url=app_state.settings.ollama_base_url,
                embedding_model=None,  # We use SentenceTransformerClient for embeddings
                llm_model=app_state.settings.llm_model,
                timeout=app_state.settings.ollama_timeout,
            )

        # Initialize embedding client (sentence-transformers for stable embeddings)
        logger.info("Initializing embedding client...")
        app_state.embedding_client = SentenceTransformerClient(
            model_name=app_state.settings.embedding_model,
        )
        logger.info(f"Embedding model loaded: {app_state.settings.embedding_model}")

        # Initialize Qdrant manager
        logger.info("Initializing Qdrant manager...")
        app_state.qdrant_manager = QdrantStoreManager(
            collection_name=app_state.settings.qdrant_collection_name,
            path=app_state.settings.qdrant_path,
            embedding_dim=app_state.embedding_client.embedding_dim,
        )

        # Verify collection exists — the index must be built offline first.
        collection_info = app_state.qdrant_manager.get_collection_info()
        if not collection_info:
            raise RuntimeError(
                f"Qdrant collection '{app_state.settings.qdrant_collection_name}' not found. "
                "Please run 'python scripts/build_index.py --index-vectors' first."
            )

        logger.info(
            f"Qdrant collection loaded: {collection_info['vectors_count']} vectors"
        )

        # Initialize retriever
        logger.info("Initializing retriever...")
        app_state.retriever = HybridRetriever(
            qdrant_manager=app_state.qdrant_manager,
            embedding_client=app_state.embedding_client,
        )

        # Initialize reranker
        logger.info("Initializing reranker...")
        app_state.reranker = CrossEncoderReranker(
            model_name=app_state.settings.reranker_model,
        )

        # Load prompt files from <project root>/prompts
        project_root = Path(__file__).parent.parent.parent
        prompts_dir = project_root / "prompts"

        system_prompt_path = prompts_dir / "system_prompt.txt"
        query_prompt_path = prompts_dir / "query_prompt.txt"
        disclaimer_path = prompts_dir / "medical_disclaimer.txt"

        # Verify prompts exist; missing files degrade to None (engine defaults)
        if not system_prompt_path.exists():
            logger.warning(f"System prompt not found: {system_prompt_path}")
            system_prompt_path = None

        if not query_prompt_path.exists():
            logger.warning(f"Query prompt not found: {query_prompt_path}")
            query_prompt_path = None

        if not disclaimer_path.exists():
            logger.warning(f"Disclaimer not found: {disclaimer_path}")
            disclaimer_path = None

        # Initialize query engine
        logger.info("Initializing query engine...")
        app_state.query_engine = EyeWikiQueryEngine(
            retriever=app_state.retriever,
            reranker=app_state.reranker,
            llm_client=app_state.llm_client,
            system_prompt_path=system_prompt_path,
            query_prompt_path=query_prompt_path,
            disclaimer_path=disclaimer_path,
            max_context_tokens=app_state.settings.max_context_tokens,
            retrieval_k=20,  # pipeline sizing — see EyeWikiQueryEngine
            rerank_k=5,
        )

        app_state.initialized = True
        logger.info("EyeWiki RAG API started successfully")
        logger.info("Gradio UI available at /ui")

    except Exception as e:
        error_msg = f"Failed to initialize application: {e}"
        logger.error(error_msg, exc_info=True)
        app_state.initialization_error = error_msg
        # Don't raise - allow app to start but endpoints will return errors

    yield

    # Shutdown
    logger.info("Shutting down EyeWiki RAG API...")

    # Cleanup Qdrant client
    if app_state.qdrant_manager:
        try:
            app_state.qdrant_manager.close()
            logger.info("Qdrant client closed")
        except Exception as e:
            logger.error(f"Error closing Qdrant client: {e}")
267
+
268
+
269
+ # ============================================================================
270
+ # FastAPI App
271
+ # ============================================================================
272
+
273
+ app = FastAPI(
274
+ title="EyeWiki RAG API",
275
+ description="Retrieval-Augmented Generation API for EyeWiki medical knowledge base",
276
+ version="1.0.0",
277
+ lifespan=lifespan,
278
+ )
279
+
280
+
281
+ # ============================================================================
282
+ # Middleware
283
+ # ============================================================================
284
+
285
+ # CORS middleware for local development
286
+ app.add_middleware(
287
+ CORSMiddleware,
288
+ allow_origins=["*"], # Configure appropriately for production
289
+ allow_credentials=True,
290
+ allow_methods=["*"],
291
+ allow_headers=["*"],
292
+ )
293
+
294
+
295
+ @app.middleware("http")
296
+ async def log_requests(request: Request, call_next):
297
+ """
298
+ Request logging middleware.
299
+
300
+ Logs all incoming requests with timing information.
301
+ """
302
+ start_time = time.time()
303
+
304
+ # Log request
305
+ logger.info(
306
+ f"Request: {request.method} {request.url.path} "
307
+ f"from {request.client.host if request.client else 'unknown'}"
308
+ )
309
+
310
+ # Process request
311
+ response = await call_next(request)
312
+
313
+ # Log response
314
+ duration = time.time() - start_time
315
+ logger.info(
316
+ f"Response: {response.status_code} "
317
+ f"in {duration:.3f}s"
318
+ )
319
+
320
+ return response
321
+
322
+
323
+ # ============================================================================
324
+ # Helper Functions
325
+ # ============================================================================
326
+
327
def check_initialization():
    """
    Check if application is initialized.

    Raises:
        HTTPException: 503 with the recorded startup error if the app
            failed (or has not yet finished) initializing
    """
    if app_state.initialized:
        return

    raise HTTPException(
        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        detail=app_state.initialization_error or "Application not initialized",
    )
340
+
341
+
342
+ # ============================================================================
343
+ # Endpoints
344
+ # ============================================================================
345
+
346
+ @app.get("/")
347
+ async def root():
348
+ """
349
+ Root endpoint.
350
+
351
+ Returns:
352
+ Welcome message with API information
353
+ """
354
+ return {
355
+ "name": "EyeWiki RAG API",
356
+ "version": "1.0.0",
357
+ "description": "Retrieval-Augmented Generation API for EyeWiki medical knowledge base",
358
+ "endpoints": {
359
+ "health": "GET /health",
360
+ "query": "POST /query",
361
+ "stream": "POST /query/stream",
362
+ "stats": "GET /stats",
363
+ "docs": "GET /docs",
364
+ }
365
+ }
366
+
367
+
368
+ @app.get("/health", response_model=HealthResponse)
369
+ async def health_check():
370
+ """
371
+ Health check endpoint.
372
+
373
+ Checks status of:
374
+ - Ollama service
375
+ - Qdrant service
376
+ - Query engine initialization
377
+
378
+ Returns:
379
+ HealthResponse with service statuses
380
+ """
381
+ timestamp = time.time()
382
+
383
+ # Check LLM provider
384
+ llm_status = {"status": "unknown", "detail": None}
385
+ if app_state.llm_client:
386
+ provider = app_state.settings.llm_provider.value if app_state.settings else "unknown"
387
+ llm_status["provider"] = provider
388
+ try:
389
+ if isinstance(app_state.llm_client, OllamaClient):
390
+ health_ok = app_state.llm_client.check_health()
391
+ llm_status["status"] = "healthy" if health_ok else "unhealthy"
392
+ llm_status["model"] = app_state.llm_client.llm_model
393
+ else:
394
+ # For OpenAI-compatible clients, assume healthy if initialized
395
+ llm_status["status"] = "healthy"
396
+ llm_status["model"] = app_state.llm_client.llm_model
397
+ except Exception as e:
398
+ llm_status = {"status": "unhealthy", "detail": str(e), "provider": provider}
399
+ else:
400
+ llm_status = {"status": "not_initialized", "detail": "Client not created"}
401
+
402
+ # Check Qdrant
403
+ qdrant_status = {"status": "unknown", "detail": None}
404
+ if app_state.qdrant_manager:
405
+ try:
406
+ info = app_state.qdrant_manager.get_collection_info()
407
+ if info:
408
+ qdrant_status = {
409
+ "status": "healthy",
410
+ "collection": info["name"],
411
+ "vectors_count": info["vectors_count"],
412
+ }
413
+ else:
414
+ qdrant_status = {
415
+ "status": "unhealthy",
416
+ "detail": "Collection not found"
417
+ }
418
+ except Exception as e:
419
+ qdrant_status = {"status": "unhealthy", "detail": str(e)}
420
+ else:
421
+ qdrant_status = {"status": "not_initialized", "detail": "Manager not created"}
422
+
423
+ # Check query engine
424
+ query_engine_status = {
425
+ "status": "initialized" if app_state.initialized else "not_initialized",
426
+ "error": app_state.initialization_error,
427
+ }
428
+
429
+ # Overall status
430
+ overall_status = "healthy"
431
+ if not app_state.initialized:
432
+ overall_status = "unhealthy"
433
+ elif llm_status["status"] != "healthy" or qdrant_status["status"] != "healthy":
434
+ overall_status = "degraded"
435
+
436
+ return HealthResponse(
437
+ status=overall_status,
438
+ llm=llm_status,
439
+ qdrant=qdrant_status,
440
+ query_engine=query_engine_status,
441
+ timestamp=timestamp,
442
+ )
443
+
444
+
445
@app.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest):
    """
    Main query endpoint.

    Runs the question through the complete RAG pipeline: hybrid retrieval,
    cross-encoder reranking, context assembly, and LLM answer generation.

    Args:
        request: QueryRequest with question and options

    Returns:
        QueryResponse with answer, sources, and disclaimer

    Raises:
        HTTPException: If service unavailable or query fails
    """
    check_initialization()

    try:
        logger.info(f"Processing query: '{request.question}'")

        result = app_state.query_engine.query(
            question=request.question,
            include_sources=request.include_sources,
            filters=request.filters,
        )

        logger.info(
            f"Query complete: {len(result.sources)} sources, "
            f"confidence: {result.confidence:.2f}"
        )

        return result

    except Exception as e:
        # Surface any pipeline failure as a 500 with the underlying message.
        logger.error(f"Error processing query: {e}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Error processing query: {str(e)}"
        )
489
+
490
+
491
@app.post("/query/stream")
async def stream_query(request: StreamQueryRequest):
    """
    Streaming query endpoint.

    Streams the generated answer back as Server-Sent Events (SSE).

    Args:
        request: StreamQueryRequest with question and options

    Returns:
        StreamingResponse with SSE

    Raises:
        HTTPException: If service unavailable or query fails
    """
    check_initialization()

    async def event_stream():
        """Yield SSE-formatted chunks from the query engine."""
        try:
            logger.info(f"Processing streaming query: '{request.question}'")

            chunks = app_state.query_engine.stream_query(
                question=request.question,
                filters=request.filters,
            )
            for chunk in chunks:
                # SSE frame format: "data: <content>\n\n"
                yield f"data: {chunk}\n\n"

            logger.info("Streaming query complete")

        except Exception as e:
            # Errors are delivered in-band since headers are already sent.
            logger.error(f"Error in streaming query: {e}", exc_info=True)
            yield f"data: [ERROR] {str(e)}\n\n"

    sse_headers = {
        "Cache-Control": "no-cache",
        "Connection": "keep-alive",
    }
    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers=sse_headers,
    )
536
+
537
+
538
@app.get("/stats", response_model=StatsResponse)
async def get_stats():
    """
    Get index and pipeline statistics.

    Returns:
        StatsResponse with collection info and pipeline config

    Raises:
        HTTPException: If service unavailable or stats retrieval fails
    """
    check_initialization()

    try:
        info = app_state.qdrant_manager.get_collection_info()
        if not info:
            # No collection to report on — treat as a 404, not a server error.
            raise HTTPException(
                status_code=status.HTTP_404_NOT_FOUND,
                detail="Collection not found"
            )

        config = app_state.query_engine.get_pipeline_info()

        return StatsResponse(
            collection_info=info,
            pipeline_config=config,
            documents_indexed=info.get("vectors_count", 0),
            timestamp=time.time(),
        )

    except HTTPException:
        # Re-raise deliberate HTTP errors (e.g. the 404 above) untouched.
        raise
    except Exception as e:
        logger.error(f"Error retrieving stats: {e}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Error retrieving stats: {str(e)}"
        )
578
+
579
+
580
+ # ============================================================================
581
+ # Error Handlers
582
+ # ============================================================================
583
+
584
@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    """
    Handle HTTP exceptions.

    FastAPI exception handlers must return a Response object; the previous
    implementation returned a plain dict, which itself errors at request
    time and loses the intended status code. Wrap the payload in a
    JSONResponse carrying the exception's own status code.

    Args:
        request: The incoming request that triggered the exception
        exc: The raised HTTPException

    Returns:
        JSONResponse with the error detail and the exception's status code
    """
    # Local import: the file-level import block is outside this view.
    from fastapi.responses import JSONResponse

    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": exc.detail,
            "status_code": exc.status_code,
            "timestamp": time.time(),
        },
    )
597
+
598
+
599
@app.exception_handler(Exception)
async def general_exception_handler(request: Request, exc: Exception):
    """
    Handle uncaught exceptions.

    FastAPI exception handlers must return a Response object; the previous
    implementation returned a plain dict, which fails at request time.
    Wrap the payload in a JSONResponse with a 500 status code.

    Args:
        request: The incoming request that triggered the exception
        exc: The unhandled exception

    Returns:
        JSONResponse with a generic error message and 500 status
    """
    # Local import: the file-level import block is outside this view.
    from fastapi.responses import JSONResponse

    logger.error(f"Unhandled exception: {exc}", exc_info=True)

    return JSONResponse(
        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
        content={
            "error": "Internal server error",
            "detail": str(exc),
            "status_code": status.HTTP_500_INTERNAL_SERVER_ERROR,
            "timestamp": time.time(),
        },
    )
615
+
616
+
617
+ # ============================================================================
618
+ # Mount Gradio UI
619
+ # ============================================================================
620
+
621
# Create and mount the Gradio web UI onto the FastAPI app.
# The getter is a lambda so Gradio resolves app_state.query_engine lazily,
# after startup initialization has populated it (it may still be None until
# initialization completes — TODO confirm the UI handles that case).
gradio_interface = create_gradio_interface(
    query_engine_getter=lambda: app_state.query_engine
)
# Rebind `app` to the combined ASGI application with the UI served at /ui.
app = gr.mount_gradio_app(app, gradio_interface, path="/ui")
logger.info("Gradio UI mounted at /ui")
src/llm/__init__.py ADDED
File without changes
src/llm/llm_client.py ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Abstract base class for LLM clients."""
2
+
3
+ from abc import ABC, abstractmethod
4
+ from typing import Generator, List, Optional
5
+
6
+
7
class LLMClient(ABC):
    """
    Common interface for all LLM backends.

    Every provider (Ollama, OpenAI-compatible endpoints, etc.) implements
    this contract so the RAG pipeline can swap backends without code
    changes. Implementations must also expose a ``llm_model`` attribute
    (str) naming the model in use.
    """

    # Name of the underlying model; set by each concrete implementation.
    llm_model: str

    @abstractmethod
    def generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> str:
        """
        Produce a complete response in a single call (non-streaming).

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate
            stop: Stop sequences

        Returns:
            Generated text
        """
        ...

    @abstractmethod
    def stream_generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> Generator[str, None, None]:
        """
        Produce a response incrementally, yielding text as it is generated.

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature
            max_tokens: Maximum tokens to generate
            stop: Stop sequences

        Yields:
            Generated text chunks
        """
        ...
src/llm/ollama_client.py ADDED
@@ -0,0 +1,512 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Ollama client for embeddings and LLM inference."""
2
+
3
+ import logging
4
+ import time
5
+ from typing import Generator, List, Optional
6
+
7
+ import requests
8
+ from rich.console import Console
9
+
10
+ from config.settings import settings
11
+ from src.llm.llm_client import LLMClient
12
+
13
+
14
+ # Configure logging
15
+ logging.basicConfig(level=logging.INFO)
16
+ logger = logging.getLogger(__name__)
17
+
18
+
19
class OllamaConnectionError(Exception):
    """Raised when the Ollama server cannot be reached or a request to it ultimately fails."""
23
+
24
+
25
class OllamaModelNotFoundError(Exception):
    """Raised when a requested model has not been pulled into the local Ollama instance."""
29
+
30
+
31
class OllamaClient(LLMClient):
    """
    Client for interacting with Ollama for embeddings and LLM inference.

    Features:
    - Embedding generation (single and batch)
    - LLM text generation (streaming and non-streaming)
    - Health checks
    - Automatic retry with exponential backoff
    - Model verification
    """

    # Fallback embedding dimension used for zero vectors on empty/failed
    # inputs. NOTE(review): assumed to match the configured embedding model
    # (768-dim) — confirm if the embedding model changes.
    EMBED_DIM = 768

    def __init__(
        self,
        base_url: Optional[str] = None,
        embedding_model: Optional[str] = None,
        llm_model: Optional[str] = None,
        timeout: int = 30,
        max_retries: int = 3,
    ):
        """
        Initialize Ollama client.

        Args:
            base_url: Ollama API base URL (default: from settings)
            embedding_model: Embedding model name (None to skip, or from settings if not provided)
            llm_model: LLM model name (default: from settings)
            timeout: Request timeout in seconds
            max_retries: Maximum number of retries for failed requests

        Raises:
            OllamaConnectionError: If the Ollama server is unreachable
            OllamaModelNotFoundError: If a required model is not pulled
        """
        self.base_url = (base_url or settings.ollama_base_url).rstrip("/")
        self.embedding_model = embedding_model
        self.llm_model = llm_model or settings.llm_model
        self.timeout = timeout
        self.max_retries = max_retries

        self.console = Console()

        # Test connection and verify models
        self._initialize()

    def _initialize(self):
        """Verify server connectivity and availability of configured models."""
        # Check if Ollama is running
        if not self.check_health():
            error_msg = (
                f"Cannot connect to Ollama at {self.base_url}. "
                "Please ensure Ollama is running."
            )
            logger.error(error_msg)
            raise OllamaConnectionError(error_msg)

        self.console.print(f"[green][/green] Connected to Ollama at {self.base_url}")

        # Verify embedding model (only if specified)
        if self.embedding_model and not self._check_model_exists(self.embedding_model):
            error_msg = (
                f"Embedding model '{self.embedding_model}' not found. "
                f"Please pull it with: ollama pull {self.embedding_model}"
            )
            logger.error(error_msg)
            raise OllamaModelNotFoundError(error_msg)

        # Get and log embedding model info
        if self.embedding_model:
            embed_info = self._get_model_info(self.embedding_model)
            if embed_info:
                self.console.print(
                    f"[green][/green] Embedding model: {self.embedding_model}"
                )
                logger.info(f"Embedding model info: {embed_info}")

        # Verify LLM model
        if not self._check_model_exists(self.llm_model):
            error_msg = (
                f"LLM model '{self.llm_model}' not found. "
                f"Please pull it with: ollama pull {self.llm_model}"
            )
            logger.error(error_msg)
            raise OllamaModelNotFoundError(error_msg)

        # Get and log LLM model info
        llm_info = self._get_model_info(self.llm_model)
        if llm_info:
            self.console.print(f"[green][/green] LLM model: {self.llm_model}")
            logger.info(f"LLM model info: {llm_info}")

    def check_health(self) -> bool:
        """
        Check if Ollama server is running and reachable.

        Returns:
            True if server is healthy, False otherwise
        """
        try:
            response = requests.get(
                f"{self.base_url}/api/tags", timeout=self.timeout
            )
            return response.status_code == 200
        except requests.exceptions.RequestException as e:
            logger.warning(f"Health check failed: {e}")
            return False

    def _check_model_exists(self, model_name: str) -> bool:
        """
        Check if a model exists in Ollama.

        Args:
            model_name: Name of the model to check

        Returns:
            True if model exists, False otherwise
        """
        try:
            response = requests.get(
                f"{self.base_url}/api/tags", timeout=self.timeout
            )
            if response.status_code == 200:
                data = response.json()
                models = [m["name"] for m in data.get("models", [])]
                # Accept exact match, ":latest", or any other tag variant
                return (
                    model_name in models
                    or f"{model_name}:latest" in models
                    or any(m.startswith(f"{model_name}:") for m in models)
                )
        except requests.exceptions.RequestException as e:
            logger.error(f"Error checking model existence: {e}")

        return False

    def _get_model_info(self, model_name: str) -> Optional[dict]:
        """
        Get information about a model.

        Args:
            model_name: Name of the model

        Returns:
            Dictionary with model information or None
        """
        try:
            response = requests.post(
                f"{self.base_url}/api/show",
                json={"name": model_name},
                timeout=self.timeout,
            )
            if response.status_code == 200:
                return response.json()
        except requests.exceptions.RequestException as e:
            logger.warning(f"Could not get model info: {e}")

        return None

    def _retry_with_backoff(self, func, *args, **kwargs):
        """
        Retry a function with exponential backoff (assumes max_retries >= 1).

        Args:
            func: Function to retry
            *args: Positional arguments for func
            **kwargs: Keyword arguments for func

        Returns:
            Function result

        Raises:
            requests.exceptions.RequestException: Last failure if all retries fail
        """
        last_exception = None

        for attempt in range(self.max_retries):
            try:
                return func(*args, **kwargs)
            except requests.exceptions.RequestException as e:
                last_exception = e
                if attempt < self.max_retries - 1:
                    # Exponential backoff: 1s, 2s, 4s, ...
                    wait_time = 2**attempt
                    logger.warning(
                        f"Request failed (attempt {attempt + 1}/{self.max_retries}), "
                        f"retrying in {wait_time}s: {e}"
                    )
                    time.sleep(wait_time)
                else:
                    logger.error(f"All {self.max_retries} attempts failed")

        raise last_exception

    def embed_text(self, text: str, return_zero_on_failure: bool = False, max_chars: int = 2000) -> List[float]:
        """
        Generate embedding for a single text.

        Args:
            text: Input text to embed
            return_zero_on_failure: If True, return zero vector instead of raising exception
            max_chars: Maximum characters to send to Ollama (default: 2000, safe limit for WSL2)

        Returns:
            Embedding vector as list of floats

        Raises:
            OllamaConnectionError: If request fails after retries (unless return_zero_on_failure=True)
        """
        # Handle empty text
        if not text or not text.strip():
            logger.warning("Empty text provided for embedding, returning zero vector")
            return [0.0] * self.EMBED_DIM

        # Truncate if too long to prevent context overflow
        original_length = len(text)
        if len(text) > max_chars:
            text = text[:max_chars]
            logger.debug(f"Truncated text from {original_length} to {max_chars} chars for embedding")

        def _embed():
            response = requests.post(
                f"{self.base_url}/api/embed",  # Correct endpoint for Ollama 0.13.2+
                json={"model": self.embedding_model, "input": text},  # Use 'input' not 'prompt'
                timeout=self.timeout,
            )
            response.raise_for_status()
            data = response.json()
            # API returns embeddings array, we want the first one
            return data["embeddings"][0] if "embeddings" in data else data["embedding"]

        try:
            return self._retry_with_backoff(_embed)
        except requests.exceptions.RequestException as e:
            if return_zero_on_failure:
                logger.warning(f"Failed to generate embedding (text length: {len(text)}), returning zero vector: {e}")
                return [0.0] * self.EMBED_DIM
            else:
                logger.error(f"Failed to generate embedding: {e}")
                raise OllamaConnectionError(f"Embedding generation failed: {e}")

    def embed_batch(
        self, texts: List[str], batch_size: int = 1, show_progress: bool = True
    ) -> List[List[float]]:
        """
        Generate embeddings for multiple texts sequentially.

        Note: batch_size parameter is kept for API compatibility but is ignored.
        Processing is always sequential to avoid overwhelming local Ollama instance.

        Args:
            texts: List of input texts
            batch_size: Ignored (kept for compatibility)
            show_progress: Show progress bar

        Returns:
            List of embedding vectors (zero vectors mark failed inputs)
        """
        embeddings = []
        failed_count = 0
        # Hoisted: zero-vector sentinel built once instead of per iteration
        zero_vector = [0.0] * self.EMBED_DIM

        pbar = None
        if show_progress:
            from tqdm import tqdm
            pbar = tqdm(total=len(texts), desc="Generating embeddings", unit="chunk")

        for i, text in enumerate(texts):
            try:
                # Brief pause between requests to avoid overloading Ollama
                # (uses the module-level `time` import; the previous local
                # `import time` was redundant)
                if i > 0:
                    time.sleep(0.5)

                # Use return_zero_on_failure to prevent single failures from stopping the entire process
                embedding = self.embed_text(text, return_zero_on_failure=True)
                embeddings.append(embedding)

                # A zero vector signals that embed_text swallowed a failure
                if embedding == zero_vector:
                    failed_count += 1

            except Exception as e:
                logger.error(f"Unexpected error embedding text {i}: {e}")
                # Fallback to zero vector (fresh copy so rows never alias)
                embeddings.append(list(zero_vector))
                failed_count += 1

            if pbar is not None:
                pbar.update(1)

        if pbar is not None:
            pbar.close()

        success_count = len(texts) - failed_count
        logger.info(f"Generated {success_count}/{len(texts)} embeddings successfully")

        if failed_count > 0:
            logger.warning(f"{failed_count} chunks failed and were assigned zero vectors")

        return embeddings

    def generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> str:
        """
        Generate text using LLM (non-streaming).

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature (default: from settings)
            max_tokens: Maximum tokens to generate (default: from settings)
            stop: Stop sequences

        Returns:
            Generated text

        Raises:
            OllamaConnectionError: If generation fails
        """
        temperature = temperature if temperature is not None else settings.llm_temperature
        max_tokens = max_tokens if max_tokens is not None else settings.llm_max_tokens

        def _generate():
            payload = {
                "model": self.llm_model,
                "prompt": prompt,
                "stream": False,
                "options": {
                    "temperature": temperature,
                    "num_predict": max_tokens,
                },
            }

            if system_prompt:
                payload["system"] = system_prompt

            if stop:
                payload["options"]["stop"] = stop

            response = requests.post(
                f"{self.base_url}/api/generate",
                json=payload,
                timeout=self.timeout * 2,  # Longer timeout for generation
            )
            response.raise_for_status()
            data = response.json()
            return data["response"]

        try:
            return self._retry_with_backoff(_generate)
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to generate text: {e}")
            raise OllamaConnectionError(f"Text generation failed: {e}")

    def stream_generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> Generator[str, None, None]:
        """
        Generate text using LLM with streaming.

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature (default: from settings)
            max_tokens: Maximum tokens to generate (default: from settings)
            stop: Stop sequences

        Yields:
            Generated text chunks

        Raises:
            OllamaConnectionError: If generation fails
        """
        temperature = temperature if temperature is not None else settings.llm_temperature
        max_tokens = max_tokens if max_tokens is not None else settings.llm_max_tokens

        payload = {
            "model": self.llm_model,
            "prompt": prompt,
            "stream": True,
            "options": {
                "temperature": temperature,
                "num_predict": max_tokens,
            },
        }

        if system_prompt:
            payload["system"] = system_prompt

        if stop:
            payload["options"]["stop"] = stop

        # Hoisted out of the streaming loop (was re-imported per line)
        import json

        try:
            response = requests.post(
                f"{self.base_url}/api/generate",
                json=payload,
                stream=True,
                timeout=self.timeout * 2,
            )
            response.raise_for_status()

            # Each non-empty line is a standalone JSON object with a chunk
            for line in response.iter_lines():
                if not line:
                    continue

                data = json.loads(line)
                if "response" in data:
                    yield data["response"]

                # Check if done
                if data.get("done", False):
                    break

        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to stream generate text: {e}")
            raise OllamaConnectionError(f"Streaming generation failed: {e}")

    def get_available_models(self) -> List[str]:
        """
        Get list of available models in Ollama.

        Returns:
            List of model names (empty list on error)
        """
        try:
            response = requests.get(
                f"{self.base_url}/api/tags", timeout=self.timeout
            )
            if response.status_code == 200:
                data = response.json()
                return [m["name"] for m in data.get("models", [])]
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to get available models: {e}")

        return []

    def pull_model(self, model_name: str) -> bool:
        """
        Pull a model from Ollama registry.

        Args:
            model_name: Name of model to pull

        Returns:
            True if successful

        Note:
            This is a blocking operation that may take a while
        """
        # Hoisted out of the progress loop (was re-imported per line)
        import json

        try:
            self.console.print(f"[cyan]Pulling model: {model_name}...[/cyan]")

            response = requests.post(
                f"{self.base_url}/api/pull",
                json={"name": model_name},
                stream=True,
                timeout=None,  # No timeout for pulling
            )
            # Fail fast on HTTP errors instead of silently reporting success
            response.raise_for_status()

            # Stream progress
            for line in response.iter_lines():
                if line:
                    data = json.loads(line)
                    status = data.get("status", "")
                    if status:
                        self.console.print(f"  {status}")

            self.console.print(f"[green][/green] Model pulled: {model_name}")
            return True

        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to pull model: {e}")
            self.console.print(f"[red][/red] Failed to pull model: {e}")
            return False
src/llm/openai_client.py ADDED
@@ -0,0 +1,187 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """OpenAI-compatible client for LLM inference (supports Groq, DeepSeek, OpenAI, etc.)."""
2
+
3
+ import logging
4
+ from typing import Generator, List, Optional
5
+
6
+ from rich.console import Console
7
+
8
+ from config.settings import settings
9
+ from src.llm.llm_client import LLMClient
10
+
11
+
12
+ # Configure logging
13
+ logging.basicConfig(level=logging.INFO)
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
class OpenAIClientError(Exception):
    """Raised when a call to an OpenAI-compatible API fails."""
21
+
22
+
23
class OpenAIClient(LLMClient):
    """
    Client for interacting with OpenAI-compatible APIs for LLM inference.

    Supports:
    - OpenAI (https://api.openai.com/v1)
    - Groq (https://api.groq.com/openai/v1)
    - DeepSeek (https://api.deepseek.com/v1)
    - Any OpenAI-compatible endpoint

    Features:
    - Non-streaming and streaming text generation
    - Configurable model, temperature, and max tokens
    - Automatic retry via the openai SDK
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        model: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
    ):
        """
        Initialize OpenAI-compatible client.

        Args:
            api_key: API key (default: from settings)
            base_url: Base URL for the API (default: from settings, or OpenAI default)
            model: Model name (default: from settings)
            temperature: Default temperature (default: from settings)
            max_tokens: Default max tokens (default: from settings)

        Raises:
            ImportError: If the 'openai' package is not installed
            OpenAIClientError: If no API key is available
        """
        try:
            from openai import OpenAI
        except ImportError:
            raise ImportError(
                "The 'openai' package is required for OpenAI-compatible providers. "
                "Install it with: pip install openai>=1.0.0"
            )

        self._api_key = api_key or settings.openai_api_key
        if not self._api_key:
            raise OpenAIClientError(
                "API key is required for OpenAI-compatible provider. "
                "Set OPENAI_API_KEY environment variable or pass api_key parameter."
            )

        self._base_url = base_url or settings.openai_base_url
        self.llm_model = model or settings.openai_model
        self._temperature = temperature if temperature is not None else settings.llm_temperature
        self._max_tokens = max_tokens if max_tokens is not None else settings.llm_max_tokens

        self.console = Console()

        # Initialize OpenAI client (base_url omitted -> SDK's OpenAI default)
        client_kwargs = {"api_key": self._api_key}
        if self._base_url:
            client_kwargs["base_url"] = self._base_url

        self._client = OpenAI(**client_kwargs)

        # Log initialization
        provider_name = self._base_url or "OpenAI (default)"
        self.console.print(f"[green][/green] OpenAI-compatible client initialized")
        self.console.print(f"  Provider: {provider_name}")
        self.console.print(f"  Model: {self.llm_model}")
        logger.info(f"OpenAI client initialized: provider={provider_name}, model={self.llm_model}")

    @staticmethod
    def _build_messages(prompt: str, system_prompt: Optional[str]) -> List[dict]:
        """Assemble the chat messages payload (system prompt first, when given)."""
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        return messages

    def generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> str:
        """
        Generate text using the OpenAI-compatible API (non-streaming).

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature (default: from init/settings)
            max_tokens: Maximum tokens to generate (default: from init/settings)
            stop: Stop sequences

        Returns:
            Generated text

        Raises:
            OpenAIClientError: If generation fails
        """
        temperature = temperature if temperature is not None else self._temperature
        max_tokens = max_tokens if max_tokens is not None else self._max_tokens

        messages = self._build_messages(prompt, system_prompt)

        try:
            response = self._client.chat.completions.create(
                model=self.llm_model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                stop=stop,
            )
            # content may be None (e.g. refusals); normalize to empty string
            return response.choices[0].message.content or ""

        except Exception as e:
            logger.error(f"Failed to generate text via OpenAI-compatible API: {e}")
            raise OpenAIClientError(f"Text generation failed: {e}")

    def stream_generate(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        stop: Optional[List[str]] = None,
    ) -> Generator[str, None, None]:
        """
        Generate text using the OpenAI-compatible API with streaming.

        Args:
            prompt: User prompt
            system_prompt: Optional system prompt
            temperature: Sampling temperature (default: from init/settings)
            max_tokens: Maximum tokens to generate (default: from init/settings)
            stop: Stop sequences

        Yields:
            Generated text chunks

        Raises:
            OpenAIClientError: If generation fails
        """
        temperature = temperature if temperature is not None else self._temperature
        max_tokens = max_tokens if max_tokens is not None else self._max_tokens

        messages = self._build_messages(prompt, system_prompt)

        try:
            stream = self._client.chat.completions.create(
                model=self.llm_model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                stop=stop,
                stream=True,
            )

            for chunk in stream:
                # Skip keep-alive/empty deltas
                if chunk.choices and chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content

        except Exception as e:
            logger.error(f"Failed to stream generate text via OpenAI-compatible API: {e}")
            raise OpenAIClientError(f"Streaming generation failed: {e}")
src/llm/sentence_transformer_client.py ADDED
@@ -0,0 +1,161 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Sentence Transformers client for reliable embeddings."""
2
+
3
+ import logging
4
+ from typing import List
5
+
6
+ import torch
7
+ from sentence_transformers import SentenceTransformer
8
+ from tqdm import tqdm
9
+
10
+
11
+ # Configure logging
12
+ logging.basicConfig(level=logging.INFO)
13
+ logger = logging.getLogger(__name__)
14
+
15
+
16
class SentenceTransformerClient:
    """
    Client for generating embeddings with sentence-transformers.

    Drop-in replacement for OllamaClient embeddings with much better stability
    and performance: HuggingFace models run in-process, so no server is needed.
    """

    def __init__(
        self,
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        device: str = None,
    ):
        """
        Initialize the sentence transformer client.

        Args:
            model_name: HuggingFace model name. Options include:
                - "sentence-transformers/all-MiniLM-L6-v2" (384 dim, fast, general)
                - "sentence-transformers/all-mpnet-base-v2" (768 dim, better quality)
                - "BAAI/bge-small-en-v1.5" (384 dim, good for retrieval)
                - "BAAI/bge-base-en-v1.5" (768 dim, better quality)
            device: Device to use ('cuda', 'cpu', or None for auto-detect)
        """
        self.model_name = model_name
        # Prefer CUDA when available unless the caller pinned a device.
        self.device = device if device is not None else (
            "cuda" if torch.cuda.is_available() else "cpu"
        )

        logger.info(f"Loading embedding model: {model_name}")
        logger.info(f"Using device: {self.device}")

        self.model = SentenceTransformer(model_name, device=self.device)

        # Cached output dimensionality; used to size zero-vector fallbacks.
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
        logger.info(f"Embedding dimension: {self.embedding_dim}")

    def embed_text(self, text: str, return_zero_on_failure: bool = False) -> List[float]:
        """
        Generate an embedding for a single text.

        Args:
            text: Input text to embed
            return_zero_on_failure: If True, return a zero vector on error
                instead of re-raising (kept for compatibility)

        Returns:
            Embedding vector as a list of floats
        """
        # Blank input always maps to the zero vector.
        if not text or not text.strip():
            logger.warning("Empty text provided, returning zero vector")
            return [0.0] * self.embedding_dim

        try:
            vector = self.model.encode(
                text,
                convert_to_numpy=True,
                show_progress_bar=False,
            )
            return vector.tolist()
        except Exception as e:
            logger.error(f"Failed to generate embedding: {e}")
            if return_zero_on_failure:
                return [0.0] * self.embedding_dim
            raise

    def embed_batch(
        self,
        texts: List[str],
        batch_size: int = 32,
        show_progress: bool = True,
    ) -> List[List[float]]:
        """
        Generate embeddings for multiple texts efficiently.

        Args:
            texts: List of input texts
            batch_size: Number of texts to process in parallel
            show_progress: Show progress bar

        Returns:
            List of embedding vectors
        """
        if not texts:
            return []

        logger.info(f"Generating embeddings for {len(texts)} texts (batch_size={batch_size})")

        try:
            matrix = self.model.encode(
                texts,
                batch_size=batch_size,
                show_progress_bar=show_progress,
                convert_to_numpy=True,
            )
            embeddings_list = matrix.tolist()
            logger.info(f"Successfully generated {len(embeddings_list)} embeddings")
            return embeddings_list

        except Exception as e:
            logger.error(f"Batch embedding failed: {e}")
            # Batch path failed: degrade gracefully to one-by-one encoding,
            # substituting zero vectors for individual failures.
            logger.warning("Falling back to sequential processing")
            source = tqdm(texts, desc="Generating embeddings") if show_progress else texts
            embeddings = [
                self.embed_text(item, return_zero_on_failure=True) for item in source
            ]

            zero_vector = [0.0] * self.embedding_dim
            failed_count = sum(1 for emb in embeddings if emb == zero_vector)
            if failed_count > 0:
                logger.warning(f"{failed_count} embeddings failed and were assigned zero vectors")

            return embeddings

    def get_model_info(self) -> dict:
        """Get information about the loaded model."""
        return {
            "model_name": self.model_name,
            "device": self.device,
            "embedding_dim": self.embedding_dim,
            "max_seq_length": self.model.max_seq_length,
        }
149
+
150
+
151
# Convenience factory mirroring the project's default embedding settings.
def create_embedding_client(
    model_name: str = "sentence-transformers/all-mpnet-base-v2",
) -> SentenceTransformerClient:
    """
    Create an embedding client with default settings.

    Defaults to all-mpnet-base-v2, whose 768-dim embeddings match
    nomic-embed-text in size while offering better quality.
    """
    return SentenceTransformerClient(model_name=model_name)
src/processing/__init__.py ADDED
File without changes
src/processing/chunker.py ADDED
@@ -0,0 +1,423 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Semantic chunker for processing markdown documents with hierarchical structure."""
2
+
3
+ import hashlib
4
+ import json
5
+ import re
6
+ from pathlib import Path
7
+ from typing import Dict, List, Optional, Tuple
8
+
9
+ from llama_index.core.node_parser import SentenceSplitter
10
+ from pydantic import BaseModel, Field
11
+ from rich.console import Console
12
+ from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
13
+
14
+ from config.settings import settings
15
+
16
+
17
class ChunkNode(BaseModel):
    """
    Pydantic model representing a semantic chunk of text.

    Attributes:
        chunk_id: Unique identifier for the chunk
        content: The actual text content
        parent_section: The section header this chunk belongs to
        document_title: Original article title
        source_url: EyeWiki URL of the source document
        chunk_index: Position of chunk in the document (0-indexed)
        token_count: Approximate number of tokens in the chunk
        metadata: Additional metadata from the source document
    """

    # Hash-based ID derived from source URL, position, and a content prefix,
    # so it stays stable when the same document is re-chunked.
    chunk_id: str = Field(..., description="Unique identifier (hash-based)")
    content: str = Field(..., description="Text content of the chunk")
    parent_section: str = Field(default="", description="Parent section header")
    document_title: str = Field(default="", description="Original document title")
    source_url: str = Field(default="", description="Source URL")
    # ge=0: positions and token counts can never be negative.
    chunk_index: int = Field(..., ge=0, description="Position in document")
    token_count: int = Field(..., ge=0, description="Approximate token count")
    metadata: Dict = Field(default_factory=dict, description="Additional metadata")

    def to_dict(self) -> Dict:
        """Convert to dictionary representation (pydantic v2 ``model_dump``)."""
        return self.model_dump()

    @classmethod
    def from_dict(cls, data: Dict) -> "ChunkNode":
        """Create ChunkNode from dictionary (inverse of ``to_dict``)."""
        return cls(**data)
49
+
50
+
51
class SemanticChunker:
    """
    Hierarchical semantic chunker that respects markdown structure.

    Splits documents on ``##`` headers first, then breaks oversized sections
    into semantic chunks with LlamaIndex's SentenceSplitter, keeping the
    parent section header attached to every chunk. Chunk size and overlap
    are configurable.
    """

    def __init__(
        self,
        chunk_size: Optional[int] = None,
        chunk_overlap: Optional[int] = None,
        min_chunk_size: int = 100,
    ):
        """
        Initialize the SemanticChunker.

        Args:
            chunk_size: Target chunk size in tokens (default: from settings)
            chunk_overlap: Overlap between chunks in tokens (default: from settings)
            min_chunk_size: Minimum chunk size to keep (default: 100 tokens)
        """
        # Falsy values (None or 0) fall back to the project settings.
        self.chunk_size = chunk_size or settings.chunk_size
        self.chunk_overlap = chunk_overlap or settings.chunk_overlap
        self.min_chunk_size = min_chunk_size

        # LlamaIndex splitter handles the sentence-aware splitting.
        self.sentence_splitter = SentenceSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap,
        )

        self.console = Console()

    def _estimate_tokens(self, text: str) -> int:
        """
        Approximate token count using the ~4-characters-per-token heuristic
        (more accurate than word count for medical/technical text).
        """
        return len(text) // 4

    def _generate_chunk_id(self, content: str, chunk_index: int, source_url: str) -> str:
        """
        Derive a stable 16-hex-char chunk ID from source URL, index, and a
        content prefix.
        """
        fingerprint = f"{source_url}:{chunk_index}:{content[:100]}"
        return hashlib.sha256(fingerprint.encode()).hexdigest()[:16]

    def _parse_markdown_sections(self, markdown: str) -> List[Tuple[str, str]]:
        """
        Split markdown into (header, content) pairs on ``##`` (h2) headers.

        The header line itself stays inside the section content; any text
        before the first header is returned with an empty header string.
        """
        header_re = r"^##\s+(.+?)$"
        sections: List[Tuple[str, str]] = []
        active_header = ""
        buffered: List[str] = []

        for line in markdown.split("\n"):
            hit = re.match(header_re, line)
            if hit is None:
                buffered.append(line)
                continue
            # Flush the previous section before opening the new one.
            if buffered:
                sections.append((active_header, "\n".join(buffered)))
            active_header = hit.group(1).strip()
            buffered = [line]  # Keep the header line inside the content.

        if buffered:
            sections.append((active_header, "\n".join(buffered)))

        return sections

    def _split_large_section(self, text: str) -> List[str]:
        """Break an oversized section into semantic chunks via SentenceSplitter."""
        return self.sentence_splitter.split_text(text)

    def _clean_content(self, content: str) -> str:
        """Collapse runs of 3+ newlines to a blank line and trim the edges."""
        return re.sub(r"\n{3,}", "\n\n", content).strip()

    def _build_chunk(
        self,
        content: str,
        section_header: str,
        chunk_index: int,
        token_count: int,
        metadata: Dict,
        source_url: str,
        document_title: str,
    ) -> "ChunkNode":
        """Assemble a ChunkNode with a freshly generated ID."""
        return ChunkNode(
            chunk_id=self._generate_chunk_id(content, chunk_index, source_url),
            content=content,
            parent_section=section_header,
            document_title=document_title,
            source_url=source_url,
            chunk_index=chunk_index,
            token_count=token_count,
            metadata=metadata,
        )

    def chunk_document(
        self,
        markdown_content: str,
        metadata: Dict,
    ) -> List[ChunkNode]:
        """
        Chunk a markdown document with hierarchical structure.

        Sections at or below the target chunk size become single chunks
        (when they meet the minimum size); larger sections are split
        semantically. Every chunk keeps its parent section header.

        Args:
            markdown_content: Markdown text content
            metadata: Document metadata (must include 'url' and 'title')

        Returns:
            List of ChunkNode objects
        """
        source_url = metadata.get("url", "")
        document_title = metadata.get("title", "Untitled")

        sections = self._parse_markdown_sections(markdown_content)
        # Degenerate case: no headers at all -> whole document is one section.
        if not sections or (len(sections) == 1 and not sections[0][0]):
            sections = [("", markdown_content)]

        chunks: List[ChunkNode] = []
        next_index = 0

        for section_header, section_content in sections:
            section_content = self._clean_content(section_content)
            if not section_content:
                continue

            section_tokens = self._estimate_tokens(section_content)

            if section_tokens <= self.chunk_size:
                # Small enough to stand alone; drop tiny fragments.
                if section_tokens >= self.min_chunk_size:
                    chunks.append(
                        self._build_chunk(
                            section_content,
                            section_header,
                            next_index,
                            section_tokens,
                            metadata,
                            source_url,
                            document_title,
                        )
                    )
                    next_index += 1
                continue

            # Oversized section: split semantically, filtering tiny pieces.
            for piece in self._split_large_section(section_content):
                piece = self._clean_content(piece)
                piece_tokens = self._estimate_tokens(piece)
                if piece_tokens < self.min_chunk_size:
                    continue
                chunks.append(
                    self._build_chunk(
                        piece,
                        section_header,
                        next_index,
                        piece_tokens,
                        metadata,
                        source_url,
                        document_title,
                    )
                )
                next_index += 1

        return chunks

    def chunk_directory(
        self,
        input_dir: Path,
        output_dir: Path,
        pattern: str = "*.md",
    ) -> Dict[str, int]:
        """
        Process all markdown files in a directory.

        For each .md file, looks for a sibling .json metadata file, chunks
        the document, and writes the chunks to the output directory.

        Args:
            input_dir: Directory containing markdown files
            output_dir: Directory to save chunked outputs
            pattern: Glob pattern for files to process (default: "*.md")

        Returns:
            Dictionary with processing statistics
        """
        input_dir = Path(input_dir)
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)

        markdown_files = list(input_dir.glob(pattern))
        if not markdown_files:
            self.console.print(f"[yellow]No files matching '{pattern}' found in {input_dir}[/yellow]")
            return {"processed": 0, "failed": 0, "total_chunks": 0}

        stats = {
            "processed": 0,
            "failed": 0,
            "skipped": 0,
            "total_chunks": 0,
            "total_tokens": 0,
        }

        self.console.print("\n[bold cyan]Chunking Documents[/bold cyan]")
        self.console.print(f"Input: {input_dir}")
        self.console.print(f"Output: {output_dir}")
        self.console.print(f"Files found: {len(markdown_files)}\n")

        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TaskProgressColumn(),
            console=self.console,
        ) as progress:

            task = progress.add_task("[cyan]Processing...", total=len(markdown_files))

            for md_file in markdown_files:
                try:
                    # Metadata lives in a sibling .json next to each .md file.
                    meta_path = md_file.with_suffix(".json")
                    if not meta_path.exists():
                        self.console.print(
                            f"[yellow]Skipping {md_file.name}: No metadata file found[/yellow]"
                        )
                        stats["skipped"] += 1
                        progress.advance(task)
                        continue

                    markdown_content = md_file.read_text(encoding="utf-8")
                    metadata = json.loads(meta_path.read_text(encoding="utf-8"))

                    # Documents below the minimum chunk size are not worth keeping.
                    if self._estimate_tokens(markdown_content) < self.min_chunk_size:
                        self.console.print(
                            f"[yellow]Skipping {md_file.name}: Content too small[/yellow]"
                        )
                        stats["skipped"] += 1
                        progress.advance(task)
                        continue

                    chunks = self.chunk_document(markdown_content, metadata)
                    if not chunks:
                        self.console.print(
                            f"[yellow]Skipping {md_file.name}: No chunks created[/yellow]"
                        )
                        stats["skipped"] += 1
                        progress.advance(task)
                        continue

                    out_path = output_dir / f"{md_file.stem}_chunks.json"
                    with open(out_path, "w", encoding="utf-8") as handle:
                        json.dump(
                            [chunk.to_dict() for chunk in chunks],
                            handle,
                            indent=2,
                            ensure_ascii=False,
                        )

                    stats["processed"] += 1
                    stats["total_chunks"] += len(chunks)
                    stats["total_tokens"] += sum(chunk.token_count for chunk in chunks)

                    progress.update(
                        task,
                        description=f"[cyan]Processing ({stats['processed']} done, {stats['total_chunks']} chunks): {md_file.name[:40]}...",
                    )
                    progress.advance(task)

                except Exception as e:
                    self.console.print(f"[red]Error processing {md_file.name}: {e}[/red]")
                    stats["failed"] += 1
                    progress.advance(task)

        self.console.print("\n[bold cyan]Chunking Summary[/bold cyan]")
        self.console.print(f"Files processed: {stats['processed']}")
        self.console.print(f"Files skipped: {stats['skipped']}")
        self.console.print(f"Files failed: {stats['failed']}")
        self.console.print(f"Total chunks created: {stats['total_chunks']}")
        self.console.print(f"Total tokens: {stats['total_tokens']:,}")

        if stats["processed"] > 0:
            avg_chunks = stats["total_chunks"] / stats["processed"]
            avg_tokens = stats["total_tokens"] / stats["total_chunks"] if stats["total_chunks"] > 0 else 0
            self.console.print(f"Average chunks per document: {avg_chunks:.1f}")
            self.console.print(f"Average tokens per chunk: {avg_tokens:.1f}")

        return stats
src/processing/metadata_extractor.py ADDED
@@ -0,0 +1,433 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Medical metadata extractor for EyeWiki articles."""
2
+
3
+ import re
4
+ from typing import Dict, List, Set
5
+
6
+
7
class MetadataExtractor:
    """
    Extract medical metadata from EyeWiki articles.

    Extracted fields: disease names, ICD-10 codes, anatomical structures,
    symptoms, treatments (medications and procedures), and categories.
    All vocabulary matching is case-insensitive and word-bounded.
    """

    # Comprehensive list of eye anatomical structures.
    ANATOMICAL_STRUCTURES = {
        # Major structures
        "cornea", "corneal", "sclera", "scleral", "retina", "retinal",
        "lens", "crystalline lens", "iris", "iridial", "pupil", "pupillary",
        "choroid", "choroidal", "vitreous", "vitreous humor",
        "optic nerve", "optic disc", "optic cup",
        # Anterior segment
        "anterior chamber", "posterior chamber", "anterior segment",
        "trabecular meshwork", "schlemm's canal", "ciliary body", "ciliary muscle",
        "zonules", "zonular", "aqueous humor", "aqueous",
        # Posterior segment
        "posterior segment", "macula", "macular", "fovea", "foveal",
        "retinal pigment epithelium", "rpe", "photoreceptors",
        "rods", "cones", "ganglion cells",
        # Retinal layers
        "inner limiting membrane", "nerve fiber layer", "ganglion cell layer",
        "inner plexiform layer", "inner nuclear layer", "outer plexiform layer",
        "outer nuclear layer", "external limiting membrane",
        "photoreceptor layer", "bruch's membrane",
        # Extraocular
        "eyelid", "eyelids", "conjunctiva", "conjunctival",
        "lacrimal gland", "tear film", "meibomian glands",
        "extraocular muscles", "rectus muscle", "oblique muscle",
        "orbit", "orbital", "optic chiasm",
        # Blood vessels
        "central retinal artery", "central retinal vein",
        "retinal vessels", "vascular", "vasculature",
        "choriocapillaris",
        # Angles and spaces
        "angle", "iridocorneal angle", "suprachoroidal space",
    }

    # Common ophthalmic medications.
    MEDICATIONS = {
        # Glaucoma medications
        "latanoprost", "timolol", "dorzolamide", "brinzolamide",
        "brimonidine", "apraclonidine", "bimatoprost", "travoprost",
        "tafluprost", "pilocarpine", "carbachol",
        "acetazolamide", "methazolamide",
        # Anti-VEGF agents
        "bevacizumab", "ranibizumab", "aflibercept", "brolucizumab",
        "pegaptanib", "faricimab",
        # Steroids
        "prednisolone", "dexamethasone", "triamcinolone", "fluocinolone",
        "difluprednate", "fluorometholone", "loteprednol",
        "betamethasone", "hydrocortisone",
        # Antibiotics
        "moxifloxacin", "gatifloxacin", "ciprofloxacin", "ofloxacin",
        "levofloxacin", "tobramycin", "gentamicin", "erythromycin",
        "azithromycin", "bacitracin", "polymyxin", "neomycin",
        "vancomycin", "ceftazidime", "cefazolin",
        # Antivirals
        "acyclovir", "ganciclovir", "valganciclovir", "valacyclovir",
        "trifluridine", "foscarnet",
        # Anti-inflammatory
        "ketorolac", "diclofenac", "nepafenac", "bromfenac",
        "cyclosporine", "tacrolimus", "lifitegrast",
        # Mydriatics/Cycloplegics
        "tropicamide", "cyclopentolate", "atropine", "homatropine",
        "phenylephrine",
        # Other
        "mitomycin", "5-fluorouracil", "interferon",
        "methotrexate", "chlorambucil",
    }

    # Common ophthalmic procedures.
    PROCEDURES = {
        # Cataract surgery
        "phacoemulsification", "phaco", "cataract extraction",
        "extracapsular cataract extraction", "ecce",
        "intracapsular cataract extraction", "icce",
        "iol implantation", "intraocular lens",
        # Glaucoma procedures
        "trabeculectomy", "tube shunt", "glaucoma drainage device",
        "ahmed valve", "baerveldt implant", "molteno implant",
        "selective laser trabeculoplasty", "slt", "argon laser trabeculoplasty", "alt",
        "laser peripheral iridotomy", "lpi", "iridotomy",
        "cyclophotocoagulation", "cyclocryotherapy",
        "minimally invasive glaucoma surgery", "migs",
        "trabectome", "istent", "kahook dual blade", "goniotomy",
        # Retinal procedures
        "vitrectomy", "pars plana vitrectomy", "ppv",
        "membrane peeling", "epiretinal membrane peeling",
        "endolaser", "photocoagulation", "panretinal photocoagulation", "prp",
        "focal laser", "grid laser",
        "pneumatic retinopexy", "scleral buckle",
        "silicone oil", "gas tamponade", "c3f8", "sf6",
        # Corneal procedures
        "penetrating keratoplasty", "pkp", "corneal transplant",
        "descemet stripping endothelial keratoplasty", "dsek", "dsaek",
        "descemet membrane endothelial keratoplasty", "dmek",
        "deep anterior lamellar keratoplasty", "dalk",
        "phototherapeutic keratectomy", "ptk",
        "corneal crosslinking", "cxl",
        # Refractive surgery
        "lasik", "prk", "photorefractive keratectomy",
        "smile", "lasek", "refractive lens exchange",
        "phakic iol", "icl",
        # Injections
        "intravitreal injection", "intravitreal",
        "subtenon injection", "retrobulbar block", "peribulbar block",
        # Laser procedures
        "yag laser capsulotomy", "laser capsulotomy",
        "laser iridotomy", "laser trabeculoplasty",
        # Other
        "enucleation", "evisceration", "exenteration",
        "orbital decompression", "ptosis repair", "blepharoplasty",
        "dacryocystorhinostomy", "dcr",
    }

    # Common ophthalmic symptoms.
    SYMPTOMS = {
        # Visual symptoms
        "blurred vision", "blurring", "vision loss", "visual loss",
        "decreased vision", "blindness", "blind spot",
        "photophobia", "light sensitivity", "glare", "halos",
        "diplopia", "double vision", "metamorphopsia", "distortion",
        "scotoma", "floaters", "flashes", "photopsia",
        "night blindness", "nyctalopia", "color vision defect",
        "visual field defect", "peripheral vision loss",
        # Pain and discomfort
        "eye pain", "ocular pain", "pain", "foreign body sensation",
        "irritation", "burning", "stinging", "grittiness",
        "discomfort", "ache", "headache",
        # Discharge and tearing
        "discharge", "tearing", "epiphora", "watery eyes",
        "mucus", "crusting", "mattering",
        # Redness and inflammation
        "redness", "red eye", "injection", "hyperemia",
        "swelling", "edema", "chemosis", "inflammation",
        # Other
        "itching", "pruritus", "dryness", "dry eye",
        "eye strain", "asthenopia", "fatigue",
    }

    def __init__(self):
        """Initialize the metadata extractor and pre-compile regex patterns."""
        self.icd_pattern = re.compile(
            r'\b[A-Z]\d{2}(?:\.\d{1,2})?\b|'  # ICD-10: H40.1, H35.32, etc.
            r'\b[H][0-5]\d(?:\.\d{1,3})?\b'   # Ophthalmic ICD-10 (H00-H59)
        )

    @staticmethod
    def _match_vocabulary(text: str, vocabulary: Set[str], allow_plural: bool = False) -> List[str]:
        """
        Return the sorted subset of a vocabulary that appears in text.

        Matching is case-insensitive and word-bounded (re.escape guards terms
        containing regex metacharacters); when allow_plural is True a trailing
        's' on the match is also accepted.
        """
        haystack = text.lower()
        tail = r's?\b' if allow_plural else r'\b'
        hits = {
            term
            for term in vocabulary
            if re.search(r'\b' + re.escape(term) + tail, haystack)
        }
        return sorted(hits)

    def extract_icd_codes(self, text: str) -> List[str]:
        """
        Extract ICD-10 codes from text using regex.

        H-prefixed candidates are kept only when their numeric part falls in
        the ophthalmic H00-H59 range; other letters pass through unchanged.

        Args:
            text: Input text to search

        Returns:
            Sorted list of unique ICD-10 codes found
        """
        valid_codes = set()
        for code in self.icd_pattern.findall(text):
            if not code.startswith('H'):
                valid_codes.add(code)
                continue
            try:
                if 0 <= int(code[1:3]) <= 59:
                    valid_codes.add(code)
            except (ValueError, IndexError):
                continue
        return sorted(valid_codes)

    def extract_anatomical_terms(self, text: str) -> List[str]:
        """
        Extract anatomical structure mentions (plural forms allowed).

        Args:
            text: Input text to search

        Returns:
            Sorted list of unique anatomical structures found
        """
        return self._match_vocabulary(text, self.ANATOMICAL_STRUCTURES, allow_plural=True)

    def extract_medications(self, text: str) -> List[str]:
        """
        Extract medication mentions from text.

        Args:
            text: Input text to search

        Returns:
            Sorted list of unique medications found
        """
        return self._match_vocabulary(text, self.MEDICATIONS)

    def extract_procedures(self, text: str) -> List[str]:
        """
        Extract procedure mentions from text.

        Args:
            text: Input text to search

        Returns:
            Sorted list of unique procedures found
        """
        return self._match_vocabulary(text, self.PROCEDURES)

    def extract_symptoms(self, text: str) -> List[str]:
        """
        Extract symptom mentions from text.

        Args:
            text: Input text to search

        Returns:
            Sorted list of unique symptoms found
        """
        return self._match_vocabulary(text, self.SYMPTOMS)

    def extract_disease_name(self, existing_metadata: Dict) -> str:
        """
        Extract the primary disease name from metadata.

        Sources tried in order: article title (with Disease:/Condition:/
        Syndrome: prefixes stripped), first category, last URL path segment.

        Args:
            existing_metadata: Metadata dict with 'title', 'url', 'categories'

        Returns:
            Primary disease/condition name, or "Unknown"
        """
        title = existing_metadata.get("title", "")
        if title:
            return re.sub(
                r'^(Disease|Condition|Syndrome):\s*', '', title, flags=re.IGNORECASE
            ).strip()

        categories = existing_metadata.get("categories", [])
        if categories and len(categories) > 0:
            return categories[0].strip()

        url = existing_metadata.get("url", "")
        if url:
            tail = re.search(r'/([^/]+)$', url)
            if tail:
                # URL slugs use underscores for spaces.
                return tail.group(1).replace('_', ' ').strip()

        return "Unknown"

    def extract(self, content: str, existing_metadata: Dict) -> Dict:
        """
        Extract comprehensive medical metadata from article content.

        Args:
            content: Article text content (markdown)
            existing_metadata: Existing metadata dict with basic info

        Returns:
            Enhanced metadata dictionary with medical information
        """
        enriched = existing_metadata.copy()

        enriched["disease_name"] = self.extract_disease_name(existing_metadata)
        enriched["icd_codes"] = self.extract_icd_codes(content)
        enriched["anatomical_structures"] = self.extract_anatomical_terms(content)
        enriched["symptoms"] = self.extract_symptoms(content)

        medications = self.extract_medications(content)
        procedures = self.extract_procedures(content)
        enriched["treatments"] = {
            "medications": medications,
            "procedures": procedures,
        }

        # Preserve existing categories; guarantee the key exists.
        enriched.setdefault("categories", [])

        enriched["extraction_stats"] = {
            "icd_codes_found": len(enriched["icd_codes"]),
            "anatomical_terms_found": len(enriched["anatomical_structures"]),
            "symptoms_found": len(enriched["symptoms"]),
            "medications_found": len(medications),
            "procedures_found": len(procedures),
        }

        return enriched

    def extract_batch(self, documents: List[Dict]) -> List[Dict]:
        """
        Extract metadata from multiple documents.

        Args:
            documents: List of dicts with 'content' and 'metadata' keys

        Returns:
            List of enhanced metadata dictionaries
        """
        return [
            self.extract(doc.get("content", ""), doc.get("metadata", {}))
            for doc in documents
        ]

    def get_anatomical_vocabulary(self) -> Set[str]:
        """Get a copy of the full anatomical vocabulary set."""
        return self.ANATOMICAL_STRUCTURES.copy()

    def get_medication_vocabulary(self) -> Set[str]:
        """Get a copy of the full medication vocabulary set."""
        return self.MEDICATIONS.copy()

    def get_procedure_vocabulary(self) -> Set[str]:
        """Get a copy of the full procedure vocabulary set."""
        return self.PROCEDURES.copy()

    def get_symptom_vocabulary(self) -> Set[str]:
        """Get a copy of the full symptom vocabulary set."""
        return self.SYMPTOMS.copy()
src/rag/__init__.py ADDED
File without changes
src/rag/query_engine.py ADDED
@@ -0,0 +1,537 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Query engine orchestrating the full RAG pipeline."""
2
+
3
+ import logging
4
+ from pathlib import Path
5
+ from typing import Generator, List, Optional
6
+
7
+ from pydantic import BaseModel, Field
8
+ from rich.console import Console
9
+
10
+ from src.rag.retriever import HybridRetriever, RetrievalResult
11
+ from src.rag.reranker import CrossEncoderReranker
12
+ from src.llm.llm_client import LLMClient
13
+
14
+
15
+ # Configure logging
16
+ logging.basicConfig(level=logging.INFO)
17
+ logger = logging.getLogger(__name__)
18
+
19
+
20
+ # Medical disclaimer (default)
21
+ MEDICAL_DISCLAIMER = (
22
+ "**Medical Disclaimer:** This information is sourced from EyeWiki, a resource of the "
23
+ "American Academy of Ophthalmology (AAO). It is not a substitute for professional "
24
+ "medical advice, diagnosis, or treatment. AI systems can make errors. Always consult "
25
+ "with a qualified ophthalmologist or eye care professional for medical concerns and "
26
+ "verify any critical information with authoritative sources."
27
+ )
28
+
29
+ # Default system prompt
30
+ DEFAULT_SYSTEM_PROMPT = """You are an expert ophthalmology assistant with comprehensive knowledge of eye diseases, treatments, and procedures.
31
+
32
+ Your role is to provide accurate, evidence-based information from the EyeWiki medical knowledge base.
33
+
34
+ Guidelines:
35
+ - Base your answers strictly on the provided context
36
+ - Cite sources using [Source: Title] format when referencing information
37
+ - If the context doesn't contain enough information, say so explicitly
38
+ - Use clear, precise medical terminology while remaining accessible
39
+ - Structure your responses logically with appropriate sections
40
+ - For treatment information, emphasize the importance of professional consultation
41
+ - Always maintain professional medical standards"""
42
+
43
+
44
class SourceInfo(BaseModel):
    """
    Information about a single source document backing an answer.

    Attributes:
        title: Document title
        url: Source URL
        section: Section within document ("" when unknown)
        relevance_score: Relevance score; cross-encoder scores are
            unbounded and may be negative
    """

    title: str = Field(..., description="Document title")
    url: str = Field(..., description="Source URL")
    section: str = Field(default="", description="Section within document")
    relevance_score: float = Field(..., description="Relevance score (cross-encoder, unbounded)")
59
+
60
+
61
class QueryResponse(BaseModel):
    """
    Response from the query engine.

    Attributes:
        answer: Generated answer text
        sources: List of source documents used
        confidence: Confidence score, constrained to [0, 1]
        disclaimer: Medical disclaimer text (defaults to MEDICAL_DISCLAIMER)
        query: Original query
    """

    answer: str = Field(..., description="Generated answer")
    sources: List[SourceInfo] = Field(default_factory=list, description="Source documents")
    confidence: float = Field(..., ge=0.0, le=1.0, description="Confidence score")
    disclaimer: str = Field(default=MEDICAL_DISCLAIMER, description="Medical disclaimer")
    query: str = Field(..., description="Original query")
78
+
79
+
80
+ class EyeWikiQueryEngine:
81
+ """
82
+ Query engine orchestrating the full RAG pipeline.
83
+
84
+ Pipeline:
85
+ 1. Query -> Retriever (hybrid search)
86
+ 2. Results -> Reranker (cross-encoder)
87
+ 3. Top results -> Context assembly
88
+ 4. Context + Query -> LLM generation
89
+ 5. Response + Sources + Disclaimer
90
+
91
+ Features:
92
+ - Two-stage retrieval (fast + precise)
93
+ - Context assembly with token limits
94
+ - Source diversity prioritization
95
+ - Medical disclaimer inclusion
96
+ - Streaming and non-streaming modes
97
+ """
98
+
99
def __init__(
    self,
    retriever: HybridRetriever,
    reranker: CrossEncoderReranker,
    llm_client: LLMClient,
    system_prompt_path: Optional[Path] = None,
    query_prompt_path: Optional[Path] = None,
    disclaimer_path: Optional[Path] = None,
    max_context_tokens: int = 4000,
    retrieval_k: int = 20,
    rerank_k: int = 5,
):
    """
    Initialize query engine.

    Each optional file path overrides a built-in default (system prompt,
    query prompt template, medical disclaimer) when the file exists;
    otherwise the module-level default is used.

    Args:
        retriever: HybridRetriever instance
        reranker: CrossEncoderReranker instance
        llm_client: LLMClient instance (OllamaClient or OpenAIClient)
        system_prompt_path: Path to custom system prompt file
        query_prompt_path: Path to custom query prompt template
        disclaimer_path: Path to custom medical disclaimer file
        max_context_tokens: Maximum (estimated) tokens for context assembly
        retrieval_k: Number of documents to retrieve initially
        rerank_k: Number of documents kept after reranking
    """
    self.retriever = retriever
    self.reranker = reranker
    self.llm_client = llm_client
    self.max_context_tokens = max_context_tokens
    self.retrieval_k = retrieval_k
    self.rerank_k = rerank_k

    self.console = Console()

    # Load system prompt (file content used verbatim, not stripped).
    if system_prompt_path and system_prompt_path.exists():
        with open(system_prompt_path, "r") as f:
            self.system_prompt = f.read()
        logger.info(f"Loaded system prompt from {system_prompt_path}")
    else:
        self.system_prompt = DEFAULT_SYSTEM_PROMPT
        logger.info("Using default system prompt")

    # Load query prompt template. When left as None, _format_prompt
    # falls back to its inline default formatting.
    if query_prompt_path and query_prompt_path.exists():
        with open(query_prompt_path, "r") as f:
            self.query_prompt_template = f.read()
        logger.info(f"Loaded query prompt from {query_prompt_path}")
    else:
        self.query_prompt_template = None
        logger.info("Using inline query prompt formatting")

    # Load medical disclaimer (stripped, unlike the system prompt).
    if disclaimer_path and disclaimer_path.exists():
        with open(disclaimer_path, "r") as f:
            self.medical_disclaimer = f.read().strip()
        logger.info(f"Loaded medical disclaimer from {disclaimer_path}")
    else:
        self.medical_disclaimer = MEDICAL_DISCLAIMER
        logger.info("Using default medical disclaimer")
160
+
161
+ def _estimate_tokens(self, text: str) -> int:
162
+ """
163
+ Estimate token count for text.
164
+
165
+ Uses simple heuristic: ~4 characters per token.
166
+
167
+ Args:
168
+ text: Input text
169
+
170
+ Returns:
171
+ Estimated token count
172
+ """
173
+ return len(text) // 4
174
+
175
def _prioritize_diverse_sources(
    self, results: List[RetrievalResult]
) -> List[RetrievalResult]:
    """
    Reorder results so each document's first chunk comes before any
    duplicate-document chunks.

    The first chunk seen for every distinct document_title keeps its
    original relative order; all remaining chunks follow, also in
    original order. No results are dropped.

    Args:
        results: Sorted list of retrieval results

    Returns:
        Reordered list prioritizing source diversity
    """
    titles_seen = set()
    primary_chunks = []
    duplicate_chunks = []

    for candidate in results:
        if candidate.document_title in titles_seen:
            duplicate_chunks.append(candidate)
        else:
            titles_seen.add(candidate.document_title)
            primary_chunks.append(candidate)

    return primary_chunks + duplicate_chunks
206
+
207
def _assemble_context(self, results: List[RetrievalResult]) -> str:
    """
    Assemble an LLM context string from retrieval results.

    Features:
    - Formats each chunk with a numbered [Source N: ...] citation header
    - Limits total size to max_context_tokens (heuristic estimate)
    - Prioritizes diverse sources before duplicate-document chunks
    - Includes source citations

    Note: assembly stops at the FIRST chunk that would exceed the token
    budget; later (possibly smaller) chunks are not considered.

    Args:
        results: List of retrieval results

    Returns:
        Formatted context string ("" when results is empty)
    """
    if not results:
        return ""

    # Prioritize diversity: one chunk per document first.
    diverse_results = self._prioritize_diverse_sources(results)

    context_parts = []
    total_tokens = 0

    for i, result in enumerate(diverse_results, 1):
        # Format context chunk with a citation header the LLM can echo.
        chunk_text = f"[Source {i}: {result.document_title}"
        if result.section:
            chunk_text += f" - {result.section}"
        chunk_text += f"]\n{result.content}\n"

        # Check token limit (~4 chars/token estimate, not a tokenizer).
        chunk_tokens = self._estimate_tokens(chunk_text)

        if total_tokens + chunk_tokens > self.max_context_tokens:
            logger.info(
                f"Reached context token limit ({self.max_context_tokens}), "
                f"using {i-1} of {len(diverse_results)} chunks"
            )
            break

        context_parts.append(chunk_text)
        total_tokens += chunk_tokens

    context = "\n".join(context_parts)

    logger.info(
        f"Assembled context: {len(context_parts)} chunks, "
        f"~{total_tokens} tokens"
    )

    return context
260
+
261
def _extract_sources(self, results: List[RetrievalResult]) -> List[SourceInfo]:
    """
    Build one SourceInfo per distinct document title.

    The first (highest-ranked) chunk of each document supplies the URL,
    section, and relevance score; later chunks of the same document are
    ignored. Output order follows first appearance.

    Args:
        results: List of retrieval results

    Returns:
        List of SourceInfo objects, deduplicated by title
    """
    by_title = {}

    for hit in results:
        if hit.document_title in by_title:
            continue
        by_title[hit.document_title] = SourceInfo(
            title=hit.document_title,
            url=hit.source_url,
            section=hit.section,
            relevance_score=hit.score,
        )

    return list(by_title.values())
287
+
288
def _calculate_confidence(self, results: List[RetrievalResult]) -> float:
    """
    Derive a confidence score from retrieval scores.

    Averages the scores of up to rerank_k top results, then clamps the
    average into [0, 1] (cross-encoder scores can fall outside that
    range).

    Args:
        results: List of retrieval results

    Returns:
        Confidence score in [0, 1]; 0.0 for an empty result list
    """
    top_scores = [hit.score for hit in results[: self.rerank_k]]

    if not top_scores:
        return 0.0

    average = sum(top_scores) / len(top_scores)

    # Clamp into the [0, 1] range expected by QueryResponse.
    return min(1.0, max(0.0, average))
315
+
316
+ def _format_prompt(self, query: str, context: str) -> str:
317
+ """
318
+ Format the prompt for LLM.
319
+
320
+ Uses query_prompt_template if loaded, otherwise uses default format.
321
+
322
+ Args:
323
+ query: User query
324
+ context: Assembled context
325
+
326
+ Returns:
327
+ Formatted prompt
328
+ """
329
+ if self.query_prompt_template:
330
+ # Use template with placeholders
331
+ prompt = self.query_prompt_template.format(
332
+ context=context,
333
+ question=query
334
+ )
335
+ else:
336
+ # Default inline formatting
337
+ prompt = f"""Context from EyeWiki medical knowledge base:
338
+
339
+ {context}
340
+
341
+ ---
342
+
343
+ Question: {query}
344
+
345
+ Please provide a comprehensive answer based on the context above. Structure your response clearly and cite sources where appropriate."""
346
+
347
+ return prompt
348
+
349
def query(
    self,
    question: str,
    include_sources: bool = True,
    filters: Optional[dict] = None,
) -> QueryResponse:
    """
    Run the full RAG pipeline for a single question.

    Pipeline:
    1. Retrieve documents (retrieval_k candidates)
    2. Rerank with cross-encoder (keep rerank_k)
    3. Assemble context with token limits
    4. Generate answer with LLM
    5. Return response with sources and disclaimer

    LLM failures are swallowed and replaced with an apologetic answer;
    this method does not raise on generation errors.

    Args:
        question: User question
        include_sources: Include source information in response
        filters: Optional metadata filters for retrieval

    Returns:
        QueryResponse object (confidence 0.0 and a canned answer when
        retrieval finds nothing)
    """
    logger.info(f"Processing query: '{question}'")

    # Step 1: Retrieve documents
    logger.info(f"Retrieving top {self.retrieval_k} candidates...")
    retrieval_results = self.retriever.retrieve(
        query=question,
        top_k=self.retrieval_k,
        filters=filters,
    )

    # Short-circuit: nothing retrieved -> canned answer, no LLM call.
    if not retrieval_results:
        logger.warning("No results found for query")
        return QueryResponse(
            answer="I couldn't find relevant information to answer this question in the EyeWiki knowledge base.",
            sources=[],
            confidence=0.0,
            query=question,
        )

    # Step 2: Rerank for precision
    logger.info(f"Reranking to top {self.rerank_k}...")
    reranked_results = self.reranker.rerank(
        query=question,
        documents=retrieval_results,
        top_k=self.rerank_k,
    )

    # Step 3: Assemble context
    context = self._assemble_context(reranked_results)

    # Step 4: Generate answer
    logger.info("Generating answer with LLM...")
    prompt = self._format_prompt(question, context)

    try:
        answer = self.llm_client.generate(
            prompt=prompt,
            system_prompt=self.system_prompt,
            temperature=0.1,  # Low temperature for factual responses
        )
    except Exception as e:
        # Degrade to a fixed message rather than propagating LLM errors.
        logger.error(f"Error generating answer: {e}")
        answer = (
            "I encountered an error while generating the answer. "
            "Please try again or rephrase your question."
        )

    # Step 5: Extract sources
    sources = self._extract_sources(reranked_results) if include_sources else []

    # Step 6: Calculate confidence
    confidence = self._calculate_confidence(reranked_results)

    # Create response
    response = QueryResponse(
        answer=answer,
        sources=sources,
        confidence=confidence,
        query=question,
    )

    logger.info(
        f"Query complete: {len(sources)} sources, "
        f"confidence: {confidence:.2f}"
    )

    return response
440
+
441
def stream_query(
    self,
    question: str,
    filters: Optional[dict] = None,
) -> Generator[str, None, None]:
    """
    Query with streaming response.

    Runs the same retrieve -> rerank -> context -> prompt pipeline as
    query(), but yields answer text incrementally. Unlike query(), no
    sources, confidence score, or disclaimer are produced.

    Args:
        question: User question
        filters: Optional metadata filters

    Yields:
        Answer chunks as they are generated
    """
    logger.info(f"Processing streaming query: '{question}'")

    # Retrieval and reranking (same as query())
    retrieval_results = self.retriever.retrieve(
        query=question,
        top_k=self.retrieval_k,
        filters=filters,
    )

    if not retrieval_results:
        yield "I couldn't find relevant information to answer this question."
        return

    reranked_results = self.reranker.rerank(
        query=question,
        documents=retrieval_results,
        top_k=self.rerank_k,
    )

    # Assemble context
    context = self._assemble_context(reranked_results)

    # Generate prompt
    prompt = self._format_prompt(question, context)

    # Stream generation; errors surface as a final bracketed message
    # instead of raising out of the generator.
    try:
        for chunk in self.llm_client.stream_generate(
            prompt=prompt,
            system_prompt=self.system_prompt,
            temperature=0.1,
        ):
            yield chunk

    except Exception as e:
        logger.error(f"Error in streaming generation: {e}")
        yield "\n\n[Error: Failed to generate response]"
496
def batch_query(
    self,
    questions: List[str],
    include_sources: bool = True,
) -> List[QueryResponse]:
    """
    Answer several questions sequentially via query().

    Args:
        questions: List of questions
        include_sources: Include sources in each response

    Returns:
        One QueryResponse per question, in input order
    """
    return [
        self.query(question, include_sources=include_sources)
        for question in questions
    ]
518
+
519
def get_pipeline_info(self) -> dict:
    """
    Report the pipeline's current configuration.

    Returns:
        Dictionary with retrieval/rerank settings, retriever weights,
        reranker model info, and the LLM model identifier.
    """
    info = {
        "retrieval_k": self.retrieval_k,
        "rerank_k": self.rerank_k,
        "max_context_tokens": self.max_context_tokens,
    }
    info["retriever_config"] = {
        "dense_weight": self.retriever.dense_weight,
        "sparse_weight": self.retriever.sparse_weight,
        "term_expansion": self.retriever.enable_term_expansion,
    }
    info["reranker_info"] = self.reranker.get_model_info()
    info["llm_model"] = self.llm_client.llm_model
    return info
src/rag/reranker.py ADDED
@@ -0,0 +1,293 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Cross-encoder reranker for improved retrieval relevance."""
2
+
3
+ import logging
4
+ from typing import List, Optional, Tuple
5
+
6
+ import torch
7
+ from sentence_transformers import CrossEncoder
8
+ from rich.console import Console
9
+
10
+ from src.rag.retriever import RetrievalResult
11
+
12
+
13
+ # Configure logging
14
+ logging.basicConfig(level=logging.INFO)
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ class CrossEncoderReranker:
19
+ """
20
+ Reranker using cross-encoder models for improved relevance.
21
+
22
+ Features:
23
+ - Uses sentence-transformers cross-encoder
24
+ - Automatic GPU/CPU detection
25
+ - Model caching for efficiency
26
+ - Preserves original retrieval scores
27
+ - Batch processing for speed
28
+ """
29
+
30
+ # Model cache to avoid reloading
31
+ _model_cache = {}
32
+
33
+ # Available models
34
+ AVAILABLE_MODELS = {
35
+ "ms-marco-mini": "cross-encoder/ms-marco-MiniLM-L-6-v2",
36
+ "ms-marco-base": "cross-encoder/ms-marco-MiniLM-L-12-v2",
37
+ "medicalai": "pritamdeka/BioBERT-mnli-snli-scinli-scitail-mednli-stsb", # Medical domain
38
+ }
39
+
40
def __init__(
    self,
    model_name: str = "ms-marco-mini",
    device: Optional[str] = None,
    max_length: int = 512,
):
    """
    Initialize cross-encoder reranker.

    Args:
        model_name: Key from AVAILABLE_MODELS, or any full
            sentence-transformers model path (in which case the
            reported model_name becomes "custom")
        device: Device to use ('cuda', 'cpu', or None for auto-detect)
        max_length: Maximum sequence length for the cross-encoder
    """
    # Resolve short alias to full model path; unknown names are treated
    # as full model paths/identifiers.
    if model_name in self.AVAILABLE_MODELS:
        self.model_path = self.AVAILABLE_MODELS[model_name]
        self.model_name = model_name
    else:
        self.model_path = model_name
        self.model_name = "custom"

    self.max_length = max_length
    self.console = Console()

    # Detect device: prefer CUDA when available and none was requested.
    if device is None:
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
    else:
        self.device = device

    # Load model (may be served from the class-level cache).
    self._load_model()
73
+
74
def _load_model(self):
    """
    Load the cross-encoder model, reusing a cached instance when possible.

    The cache key combines model path and device, so the same weights
    loaded on different devices are cached separately. The cache dict
    itself is class-level and shared across instances.

    Raises:
        Exception: Re-raises whatever CrossEncoder() raised on failure.
    """
    cache_key = f"{self.model_path}_{self.device}"

    # Check class-level cache first to avoid reloading weights.
    if cache_key in self._model_cache:
        self.model = self._model_cache[cache_key]
        logger.info(f"Loaded reranker model from cache: {self.model_name}")
        return

    # Load model
    try:
        self.console.print(f"[cyan]Loading reranker model: {self.model_name}...[/cyan]")

        self.model = CrossEncoder(
            self.model_path,
            max_length=self.max_length,
            device=self.device,
        )

        # Cache model for subsequent instances.
        self._model_cache[cache_key] = self.model

        device_info = f"GPU ({torch.cuda.get_device_name(0)})" if self.device == "cuda" else "CPU"
        # NOTE(review): the "[green][/green]" markup below renders nothing —
        # a status glyph (likely a checkmark) appears lost to encoding; confirm.
        self.console.print(
            f"[green][/green] Loaded reranker model: {self.model_name} on {device_info}"
        )
        logger.info(
            f"Loaded cross-encoder model: {self.model_path} on {self.device}"
        )

    except Exception as e:
        logger.error(f"Failed to load reranker model: {e}")
        self.console.print(f"[red][/red] Failed to load reranker model: {e}")
        raise
109
+
110
def score_pairs(self, query: str, documents: List[str]) -> List[float]:
    """
    Score each (query, document) pair with the cross-encoder.

    Args:
        query: Search query
        documents: List of document texts

    Returns:
        One relevance score per document (higher is better); a list of
        zeros if the model call fails.
    """
    if not documents:
        return []

    # Build the (query, doc) pairs the cross-encoder expects.
    query_doc_pairs = [[query, text] for text in documents]

    try:
        raw_scores = self.model.predict(query_doc_pairs, convert_to_numpy=True)
        score_list = raw_scores.tolist()
        logger.debug(f"Scored {len(documents)} documents")
        return score_list
    except Exception as e:
        # Degrade gracefully: zero scores keep the pipeline running.
        logger.error(f"Error scoring pairs: {e}")
        return [0.0] * len(documents)
142
+
143
def rerank(
    self,
    query: str,
    documents: List[RetrievalResult],
    top_k: Optional[int] = None,
) -> List[RetrievalResult]:
    """
    Re-score documents with the cross-encoder and sort by that score.

    Each returned result's primary `score` is the cross-encoder score;
    the original retrieval score is preserved in its metadata under
    "original_retrieval_score" (the reranker score is also mirrored
    under "reranker_score").

    Args:
        query: Search query
        documents: List of RetrievalResult objects from retriever
        top_k: Number of top results to return (None for all)

    Returns:
        List of RetrievalResult objects sorted by reranker score
    """
    if not documents:
        logger.warning("No documents to rerank")
        return []

    # Score all documents in one batch.
    logger.info(f"Reranking {len(documents)} documents for query: '{query[:50]}...'")
    rerank_scores = self.score_pairs(query, [doc.content for doc in documents])

    # Rebuild each result with the reranker score as its primary score.
    rescored = []
    for original, new_score in zip(documents, rerank_scores):
        meta = dict(original.metadata)
        meta["original_retrieval_score"] = original.score
        meta["reranker_score"] = float(new_score)

        rescored.append(
            RetrievalResult(
                content=original.content,
                metadata=meta,
                score=float(new_score),
                source_url=original.source_url,
                section=original.section,
                chunk_id=original.chunk_id,
                document_title=original.document_title,
            )
        )

    # Sort by reranker score (descending).
    rescored.sort(key=lambda item: item.score, reverse=True)

    # Log score change of the new top result.
    if rescored:
        logger.info(
            f"Reranking complete. Top result score: {rescored[0].score:.4f} "
            f"(original: {rescored[0].metadata.get('original_retrieval_score', 0):.4f})"
        )

    return rescored if top_k is None else rescored[:top_k]
207
+
208
def rerank_with_comparison(
    self,
    query: str,
    documents: List[RetrievalResult],
    top_k: Optional[int] = None,
) -> List[Tuple[RetrievalResult, dict]]:
    """
    Rerank with a detailed before/after comparison of scores and ranks.

    Args:
        query: Search query
        documents: List of RetrievalResult objects
        top_k: Number of top results to return

    Returns:
        List of (RetrievalResult, comparison_dict) tuples where
        comparison_dict contains:
        - original_score: Original retrieval score
        - reranker_score: Cross-encoder score
        - score_change: Difference (reranker - original)
        - original_rank: Position before reranking
        - new_rank: Position after reranking
        - rank_change: Change in ranking position (positive = moved up)
    """
    if not documents:
        return []

    # Remember each chunk's pre-rerank position, keyed by chunk_id.
    # NOTE(review): assumes chunk_id is unique within `documents`;
    # duplicates would silently keep only the last position — confirm.
    original_rankings = {doc.chunk_id: idx for idx, doc in enumerate(documents)}

    # Rerank the full list; top_k slicing happens at the end so rank
    # changes are computed over every document.
    reranked_docs = self.rerank(query, documents, top_k=None)

    # Create comparison results
    results_with_comparison = []

    for new_rank, doc in enumerate(reranked_docs):
        original_rank = original_rankings[doc.chunk_id]
        original_score = doc.metadata.get("original_retrieval_score", 0.0)
        reranker_score = doc.score

        comparison = {
            "original_score": original_score,
            "reranker_score": reranker_score,
            "score_change": reranker_score - original_score,
            "original_rank": original_rank,
            "new_rank": new_rank,
            "rank_change": original_rank - new_rank,  # Positive = moved up
        }

        results_with_comparison.append((doc, comparison))

    # Return top_k if specified
    if top_k is not None:
        return results_with_comparison[:top_k]

    return results_with_comparison
263
+
264
def get_model_info(self) -> dict:
    """
    Describe the loaded reranker model and its execution environment.

    Returns:
        Dictionary with model name/path, configured device, max sequence
        length, and GPU availability/name (gpu_name is None without CUDA).
    """
    cuda_available = torch.cuda.is_available()
    return {
        "model_name": self.model_name,
        "model_path": self.model_path,
        "device": self.device,
        "max_length": self.max_length,
        "gpu_available": cuda_available,
        "gpu_name": torch.cuda.get_device_name(0) if cuda_available else None,
    }
279
+
280
def clear_cache(self):
    """Clear the model cache.

    Note: _model_cache is a class-level dict, so this drops cached
    models for every CrossEncoderReranker instance in the process.
    """
    self._model_cache.clear()
    logger.info("Cleared model cache")
284
+
285
@classmethod
def get_available_models(cls) -> dict:
    """
    List the built-in model aliases.

    Returns:
        A copy of the alias -> model-path mapping (safe to mutate)
    """
    return dict(cls.AVAILABLE_MODELS)
src/rag/retriever.py ADDED
@@ -0,0 +1,483 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Hybrid retriever combining dense and sparse search for optimal retrieval."""
2
+
3
+ import logging
4
+ import re
5
+ from typing import Dict, List, Optional, Tuple
6
+
7
+ from pydantic import BaseModel, Field
8
+ from rich.console import Console
9
+
10
+ from src.vectorstore.qdrant_store import QdrantStoreManager, SearchResult
11
+ from src.llm.sentence_transformer_client import SentenceTransformerClient
12
+
13
+
14
+ # Configure logging
15
+ logging.basicConfig(level=logging.INFO)
16
+ logger = logging.getLogger(__name__)
17
+
18
+
19
class RetrievalResult(BaseModel):
    """
    Pydantic model for retrieval results.

    Attributes:
        content: Retrieved text content
        metadata: Document metadata (disease, ICD codes, etc.)
        score: Relevance score; may be negative once replaced by a
            cross-encoder reranker score
        source_url: EyeWiki source URL
        section: Parent section header
        chunk_id: Unique chunk identifier
        document_title: Article title
    """

    content: str = Field(..., description="Retrieved text content")
    metadata: Dict = Field(default_factory=dict, description="Document metadata")
    score: float = Field(..., description="Relevance score (can be negative for cross-encoder)")
    source_url: str = Field(default="", description="EyeWiki source URL")
    section: str = Field(default="", description="Parent section header")
    chunk_id: str = Field(default="", description="Unique chunk identifier")
    document_title: str = Field(default="", description="Article title")

    @classmethod
    def from_search_result(cls, result: SearchResult) -> "RetrievalResult":
        """
        Convert a Qdrant SearchResult to a RetrievalResult.

        Note: SearchResult's `parent_section` maps onto this model's
        `section` field; all other fields copy across by name.

        Args:
            result: SearchResult from Qdrant

        Returns:
            RetrievalResult instance
        """
        return cls(
            content=result.content,
            metadata=result.metadata,
            score=result.score,
            source_url=result.source_url,
            section=result.parent_section,
            chunk_id=result.chunk_id,
            document_title=result.document_title,
        )
61
+
62
+
63
+ class HybridRetriever:
64
+ """
65
+ Hybrid retriever combining dense (semantic) and sparse (BM25) search.
66
+
67
+ Features:
68
+ - Dense vector search via embeddings (default weight: 0.7)
69
+ - Sparse BM25 keyword search (default weight: 0.3)
70
+ - Configurable fusion weights
71
+ - Query preprocessing
72
+ - Medical term expansion
73
+ - Metadata filtering
74
+ """
75
+
76
+ # Medical term synonyms and abbreviations for query expansion
77
+ MEDICAL_TERM_EXPANSIONS = {
78
+ # Common abbreviations
79
+ "iop": ["intraocular pressure", "iop"],
80
+ "amd": ["age-related macular degeneration", "amd"],
81
+ "armd": ["age-related macular degeneration", "armd"],
82
+ "dme": ["diabetic macular edema", "dme"],
83
+ "dr": ["diabetic retinopathy", "dr"],
84
+ "poag": ["primary open-angle glaucoma", "poag"],
85
+ "pacg": ["primary angle-closure glaucoma", "pacg"],
86
+ "rvo": ["retinal vein occlusion", "rvo"],
87
+ "rao": ["retinal artery occlusion", "rao"],
88
+ "crvo": ["central retinal vein occlusion", "crvo"],
89
+ "brvo": ["branch retinal vein occlusion", "brvo"],
90
+ "crao": ["central retinal artery occlusion", "crao"],
91
+ "vegf": ["vascular endothelial growth factor", "vegf"],
92
+ "oct": ["optical coherence tomography", "oct"],
93
+ "fa": ["fluorescein angiography", "fa"],
94
+ "icg": ["indocyanine green angiography", "icg"],
95
+ "erg": ["electroretinography", "erg"],
96
+ "vf": ["visual field", "vf"],
97
+ "va": ["visual acuity", "va"],
98
+
99
+ # Common synonyms
100
+ "retina": ["retina", "retinal"],
101
+ "cornea": ["cornea", "corneal"],
102
+ "glaucoma": ["glaucoma", "glaucomatous"],
103
+ "cataract": ["cataract", "lens opacity"],
104
+ "macula": ["macula", "macular"],
105
+ "optic nerve": ["optic nerve", "optic disc", "optic cup"],
106
+ }
107
+
108
def __init__(
    self,
    qdrant_manager: QdrantStoreManager,
    embedding_client: SentenceTransformerClient,
    dense_weight: float = 0.7,
    sparse_weight: float = 0.3,
    enable_term_expansion: bool = True,
):
    """
    Initialize hybrid retriever.

    Args:
        qdrant_manager: QdrantStoreManager for vector search
        embedding_client: SentenceTransformerClient for query embeddings
        dense_weight: Weight for dense (semantic) search (0-1)
        sparse_weight: Weight for sparse (BM25) search (0-1)
        enable_term_expansion: Enable medical term expansion

    Raises:
        ValueError: If dense_weight + sparse_weight is zero or negative,
            which makes weight normalization meaningless.
    """
    self.qdrant_manager = qdrant_manager
    self.embedding_client = embedding_client
    self.dense_weight = dense_weight
    self.sparse_weight = sparse_weight
    self.enable_term_expansion = enable_term_expansion

    self.console = Console()

    # Validate weights and normalize them to sum to 1.0.
    total_weight = dense_weight + sparse_weight
    if total_weight <= 0:
        # Fix: previously this divided by total_weight unconditionally,
        # raising ZeroDivisionError for (0, 0) and silently flipping
        # signs for negative totals.
        raise ValueError(
            f"dense_weight + sparse_weight must be positive, got {total_weight}"
        )
    if not (0.99 <= total_weight <= 1.01):  # Allow small floating point error
        logger.warning(
            f"Weights sum to {total_weight:.2f}, not 1.0. "
            "Normalizing weights."
        )
        self.dense_weight = dense_weight / total_weight
        self.sparse_weight = sparse_weight / total_weight

    logger.info(
        f"Initialized HybridRetriever (dense: {self.dense_weight:.2f}, "
        f"sparse: {self.sparse_weight:.2f})"
    )
148
+
149
+ def _preprocess_query(self, query: str) -> str:
150
+ """
151
+ Preprocess query text.
152
+
153
+ - Convert to lowercase
154
+ - Remove excessive whitespace
155
+ - Normalize punctuation
156
+
157
+ Args:
158
+ query: Raw query string
159
+
160
+ Returns:
161
+ Preprocessed query
162
+ """
163
+ # Convert to lowercase
164
+ query = query.lower()
165
+
166
+ # Remove excessive whitespace
167
+ query = re.sub(r'\s+', ' ', query)
168
+
169
+ # Strip leading/trailing whitespace
170
+ query = query.strip()
171
+
172
+ return query
173
+
174
+ def _expand_medical_terms(self, query: str) -> str:
175
+ """
176
+ Expand medical abbreviations and add synonyms.
177
+
178
+ Args:
179
+ query: Preprocessed query
180
+
181
+ Returns:
182
+ Expanded query with synonyms
183
+ """
184
+ if not self.enable_term_expansion:
185
+ return query
186
+
187
+ expanded_terms = []
188
+ words = query.split()
189
+
190
+ for word in words:
191
+ # Check if word matches any abbreviation or term
192
+ if word in self.MEDICAL_TERM_EXPANSIONS:
193
+ # Add all expansions
194
+ expansions = self.MEDICAL_TERM_EXPANSIONS[word]
195
+ expanded_terms.extend(expansions)
196
+ else:
197
+ # Keep original word
198
+ expanded_terms.append(word)
199
+
200
+ # Join and deduplicate
201
+ expanded_query = " ".join(expanded_terms)
202
+
203
+ logger.debug(f"Query expansion: '{query}' � '{expanded_query}'")
204
+
205
+ return expanded_query
206
+
207
+ def _generate_query_embedding(self, query: str) -> List[float]:
208
+ """
209
+ Generate embedding for query.
210
+
211
+ Args:
212
+ query: Query text
213
+
214
+ Returns:
215
+ Query embedding vector
216
+ """
217
+ try:
218
+ embedding = self.embedding_client.embed_text(query)
219
+ return embedding
220
+ except Exception as e:
221
+ logger.error(f"Failed to generate query embedding: {e}")
222
+ raise
223
+
224
    def _merge_results(
        self,
        dense_results: List[SearchResult],
        sparse_results: Optional[List[SearchResult]] = None,
    ) -> List[Tuple[RetrievalResult, float]]:
        """
        Merge dense and sparse results using weighted fusion.

        Combines the two result lists with a weighted linear sum of their
        raw scores (score * weight), keyed by chunk_id.  Note this is NOT
        Reciprocal Rank Fusion despite earlier notes: raw scores are
        combined directly, so the two score scales are assumed comparable.

        Args:
            dense_results: Results from dense search
            sparse_results: Results from sparse search (if available)

        Returns:
            List of (RetrievalResult, combined_score) tuples, sorted by
            combined score in descending order
        """
        # If no sparse results, just weight the dense scores and keep order.
        if not sparse_results:
            results = []
            for result in dense_results:
                retrieval_result = RetrievalResult.from_search_result(result)
                # Apply dense weight to score
                weighted_score = result.score * self.dense_weight
                results.append((retrieval_result, weighted_score))
            return results

        # Create score dictionaries keyed by chunk_id
        dense_scores = {r.chunk_id: r.score for r in dense_results}
        sparse_scores = {r.chunk_id: r.score for r in sparse_results}

        # Get all unique chunk_ids
        all_chunk_ids = set(dense_scores.keys()) | set(sparse_scores.keys())

        # Create lookup for full result objects; dense results win on ties.
        result_lookup = {}
        for result in dense_results:
            result_lookup[result.chunk_id] = result
        for result in sparse_results:
            if result.chunk_id not in result_lookup:
                result_lookup[result.chunk_id] = result

        # Calculate weighted combined scores (a missing side contributes 0.0).
        combined_results = []
        for chunk_id in all_chunk_ids:
            dense_score = dense_scores.get(chunk_id, 0.0)
            sparse_score = sparse_scores.get(chunk_id, 0.0)

            # Weighted combination
            combined_score = (
                dense_score * self.dense_weight + sparse_score * self.sparse_weight
            )

            result = result_lookup[chunk_id]
            retrieval_result = RetrievalResult.from_search_result(result)
            combined_results.append((retrieval_result, combined_score))

        # Sort by combined score (descending)
        combined_results.sort(key=lambda x: x[1], reverse=True)

        return combined_results
285
+
286
    def retrieve_with_scores(
        self,
        query: str,
        top_k: int = 10,
        filters: Optional[Dict] = None,
    ) -> List[Tuple[RetrievalResult, float]]:
        """
        Retrieve documents with scores.

        Pipeline: preprocess -> medical-term expansion -> embed -> dense
        vector search -> fusion (currently dense-only) -> truncate to top_k.

        Args:
            query: Search query
            top_k: Number of results to return
            filters: Optional metadata filters

        Returns:
            List of (RetrievalResult, score) tuples, best first
        """
        # Preprocess query
        processed_query = self._preprocess_query(query)

        # Expand medical terms
        expanded_query = self._expand_medical_terms(processed_query)

        logger.info(f"Retrieving for query: '{query}'")
        logger.debug(f"Processed query: '{expanded_query}'")

        # Generate query embedding
        query_embedding = self._generate_query_embedding(expanded_query)

        # Perform dense search; over-fetch so fusion has candidates to rerank
        dense_results = self.qdrant_manager.search(
            query_embedding=query_embedding,
            top_k=top_k * 2,  # Get more for fusion
            filters=filters,
        )

        logger.info(f"Dense search returned {len(dense_results)} results")

        # Note: For true hybrid search with sparse vectors, you would also:
        # 1. Generate sparse vector for query (BM25)
        # 2. Perform sparse search via qdrant_manager.hybrid_search()
        # 3. Merge results using RRF
        #
        # For now, we'll use dense search only
        # In production, implement proper BM25 sparse vector generation

        sparse_results = None  # Placeholder for sparse search

        # Merge results (with sparse_results=None this just scales every
        # dense score by dense_weight — ranking is unchanged)
        combined_results = self._merge_results(dense_results, sparse_results)

        # Return top_k
        return combined_results[:top_k]
339
+
340
+ def retrieve(
341
+ self,
342
+ query: str,
343
+ top_k: int = 10,
344
+ filters: Optional[Dict] = None,
345
+ ) -> List[RetrievalResult]:
346
+ """
347
+ Retrieve documents (without scores).
348
+
349
+ Args:
350
+ query: Search query
351
+ top_k: Number of results to return
352
+ filters: Optional metadata filters
353
+
354
+ Returns:
355
+ List of RetrievalResult objects
356
+ """
357
+ results_with_scores = self.retrieve_with_scores(query, top_k, filters)
358
+
359
+ # Extract just the results, drop scores
360
+ results = [result for result, score in results_with_scores]
361
+
362
+ return results
363
+
364
+ def retrieve_by_disease(
365
+ self,
366
+ query: str,
367
+ disease_name: str,
368
+ top_k: int = 10,
369
+ ) -> List[RetrievalResult]:
370
+ """
371
+ Retrieve documents filtered by disease name.
372
+
373
+ Args:
374
+ query: Search query
375
+ disease_name: Disease name to filter by
376
+ top_k: Number of results to return
377
+
378
+ Returns:
379
+ List of RetrievalResult objects
380
+ """
381
+ filters = {"disease_name": disease_name}
382
+ return self.retrieve(query, top_k, filters)
383
+
384
+ def retrieve_by_icd_code(
385
+ self,
386
+ query: str,
387
+ icd_codes: List[str],
388
+ top_k: int = 10,
389
+ ) -> List[RetrievalResult]:
390
+ """
391
+ Retrieve documents filtered by ICD codes.
392
+
393
+ Args:
394
+ query: Search query
395
+ icd_codes: List of ICD codes to filter by
396
+ top_k: Number of results to return
397
+
398
+ Returns:
399
+ List of RetrievalResult objects
400
+ """
401
+ filters = {"icd_codes": icd_codes}
402
+ return self.retrieve(query, top_k, filters)
403
+
404
+ def retrieve_by_anatomy(
405
+ self,
406
+ query: str,
407
+ anatomical_structures: List[str],
408
+ top_k: int = 10,
409
+ ) -> List[RetrievalResult]:
410
+ """
411
+ Retrieve documents filtered by anatomical structures.
412
+
413
+ Args:
414
+ query: Search query
415
+ anatomical_structures: List of anatomical terms
416
+ top_k: Number of results to return
417
+
418
+ Returns:
419
+ List of RetrievalResult objects
420
+ """
421
+ filters = {"anatomical_structures": anatomical_structures}
422
+ return self.retrieve(query, top_k, filters)
423
+
424
+ def get_similar_sections(
425
+ self,
426
+ section_content: str,
427
+ top_k: int = 5,
428
+ filters: Optional[Dict] = None,
429
+ ) -> List[RetrievalResult]:
430
+ """
431
+ Find similar sections based on content.
432
+
433
+ Useful for "related sections" or "see also" features.
434
+
435
+ Args:
436
+ section_content: Content to find similar sections for
437
+ top_k: Number of results to return
438
+ filters: Optional metadata filters
439
+
440
+ Returns:
441
+ List of RetrievalResult objects
442
+ """
443
+ # Use the section content itself as the query
444
+ return self.retrieve(section_content, top_k, filters)
445
+
446
+ def multi_query_retrieve(
447
+ self,
448
+ queries: List[str],
449
+ top_k: int = 10,
450
+ filters: Optional[Dict] = None,
451
+ deduplicate: bool = True,
452
+ ) -> List[RetrievalResult]:
453
+ """
454
+ Retrieve using multiple queries and combine results.
455
+
456
+ Useful for query decomposition or multi-faceted questions.
457
+
458
+ Args:
459
+ queries: List of query strings
460
+ top_k: Total number of results to return
461
+ filters: Optional metadata filters
462
+ deduplicate: Remove duplicate results
463
+
464
+ Returns:
465
+ List of RetrievalResult objects
466
+ """
467
+ all_results = []
468
+ seen_chunk_ids = set()
469
+
470
+ # Retrieve for each query
471
+ for query in queries:
472
+ results = self.retrieve(query, top_k=top_k, filters=filters)
473
+
474
+ for result in results:
475
+ if deduplicate:
476
+ if result.chunk_id not in seen_chunk_ids:
477
+ all_results.append(result)
478
+ seen_chunk_ids.add(result.chunk_id)
479
+ else:
480
+ all_results.append(result)
481
+
482
+ # Return top_k overall
483
+ return all_results[:top_k]
src/scraper/__init__.py ADDED
File without changes
src/scraper/eyewiki_crawler.py ADDED
@@ -0,0 +1,489 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """EyeWiki crawler for medical article scraping using crawl4ai."""
2
+
3
import asyncio
import json
import re
from collections import deque
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional, Set
from urllib.parse import urljoin, urlparse, parse_qs
from urllib.robotparser import RobotFileParser
12
+
13
+ import aiohttp
14
+ from bs4 import BeautifulSoup
15
+ from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
16
+ from rich.console import Console
17
+ from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
18
+
19
+ from config.settings import settings
20
+
21
+
22
+ class EyeWikiCrawler:
23
+ """
24
+ Asynchronous crawler for EyeWiki medical articles.
25
+
26
+ Features:
27
+ - Asynchronous crawling with crawl4ai
28
+ - Respects robots.txt
29
+ - Polite crawling with configurable delays
30
+ - Markdown content extraction
31
+ - Checkpointing for resume capability
32
+ - Progress tracking with rich console
33
+ """
34
+
35
    def __init__(
        self,
        base_url: str = "https://eyewiki.org",
        output_dir: Optional[Path] = None,
        checkpoint_file: Optional[Path] = None,
        delay: float = 1.5,
        timeout: int = 30,
    ):
        """
        Initialize the EyeWiki crawler.

        Args:
            base_url: Base URL for EyeWiki
            output_dir: Directory to save scraped articles
            checkpoint_file: Path to checkpoint file
            delay: Delay between requests in seconds
            timeout: Request timeout in seconds
        """
        self.base_url = base_url
        # Host part of base_url, used to restrict crawling to this site
        self.domain = urlparse(base_url).netloc
        self.output_dir = output_dir or Path(settings.data_raw_path)
        self.checkpoint_file = checkpoint_file or (self.output_dir / "crawler_checkpoint.json")
        self.delay = delay
        self.timeout = timeout

        # Ensure output directory exists
        self.output_dir.mkdir(parents=True, exist_ok=True)

        # Crawl state (to_crawl holds (url, depth) tuples)
        self.visited_urls: Set[str] = set()
        self.to_crawl: deque = deque()
        self.failed_urls: Dict[str, str] = {}
        self.articles_saved: int = 0

        # Rich console for logging
        self.console = Console()

        # Robot parser; robots.txt is actually fetched later via read() in crawl()
        self.robot_parser = RobotFileParser()
        self.robot_parser.set_url(urljoin(base_url, "/robots.txt"))

        # URL patterns that are never treated as articles (wiki plumbing)
        self.skip_patterns = [
            r"/index\.php\?title=.*&action=",  # Edit, history, etc.
            r"/index\.php\?title=.*&diff=",  # Page diffs
            r"/index\.php\?title=.*&oldid=",  # Page history/revisions
            r"/index\.php\?title=.*&direction=",  # Page navigation
            r"/index\.php\?title=Special:",  # Special pages (login, create account, etc.)
            r"/Special:",  # Special pages
            r"/User:",  # User pages
            r"/User_talk:",  # User talk pages
            r"/Talk:",  # Talk pages
            r"/File:",  # File pages
            r"/Template:",  # Template pages
            r"/Help:",  # Help pages
            r"/MediaWiki:",  # MediaWiki pages
            r"#",  # Anchor links
        ]
93
+
94
+ def _is_valid_article_url(self, url: str) -> bool:
95
+ """
96
+ Check if URL is a valid medical article.
97
+
98
+ Args:
99
+ url: URL to check
100
+
101
+ Returns:
102
+ True if valid article URL
103
+ """
104
+ # Must be from eyewiki.org domain
105
+ if self.domain not in url:
106
+ return False
107
+
108
+ # Skip patterns (these take precedence)
109
+ for pattern in self.skip_patterns:
110
+ if re.search(pattern, url):
111
+ return False
112
+
113
+ # Parse URL to check path
114
+ parsed = urlparse(url)
115
+ path = parsed.path.strip("/")
116
+
117
+ # Must be article-like URL
118
+ # EyeWiki articles can be:
119
+ # 1. Direct: /Article_Name (e.g., /Cataract)
120
+ # 2. Wiki-style: /wiki/Article_Name
121
+ # 3. Query-based: /w/index.php?title=Article_Name
122
+
123
+ # For query-based URLs, check if title parameter exists and is not a special page
124
+ if parsed.query and "title=" in parsed.query:
125
+ return True
126
+
127
+ # For direct URLs, check if path is non-empty and looks like an article
128
+ # (starts with capital letter, no file extension)
129
+ if path and not path.startswith("w/") and not "." in path:
130
+ # Path should look like an article name (capitalized, underscores/spaces)
131
+ if path[0].isupper() or path.startswith("wiki/"):
132
+ return True
133
+
134
+ return False
135
+
136
+ def _normalize_url(self, url: str) -> str:
137
+ """
138
+ Normalize URL for consistent comparison.
139
+
140
+ Args:
141
+ url: URL to normalize
142
+
143
+ Returns:
144
+ Normalized URL
145
+ """
146
+ # Remove fragment
147
+ url = url.split("#")[0]
148
+ # Remove trailing slash
149
+ url = url.rstrip("/")
150
+ return url
151
+
152
+ def _can_fetch(self, url: str) -> bool:
153
+ """
154
+ Check if URL can be fetched according to robots.txt.
155
+
156
+ Args:
157
+ url: URL to check
158
+
159
+ Returns:
160
+ True if allowed to fetch
161
+ """
162
+ try:
163
+ return self.robot_parser.can_fetch("*", url)
164
+ except Exception as e:
165
+ self.console.print(f"[yellow]Warning: Could not check robots.txt: {e}[/yellow]")
166
+ return True # Be permissive if robots.txt check fails
167
+
168
+ def _extract_links(self, html: str, current_url: str) -> Set[str]:
169
+ """
170
+ Extract valid article links from HTML.
171
+
172
+ Args:
173
+ html: HTML content
174
+ current_url: Current page URL for resolving relative links
175
+
176
+ Returns:
177
+ Set of valid article URLs
178
+ """
179
+ soup = BeautifulSoup(html, "html.parser")
180
+ links = set()
181
+
182
+ for a_tag in soup.find_all("a", href=True):
183
+ href = a_tag["href"]
184
+ # Resolve relative URLs
185
+ absolute_url = urljoin(current_url, href)
186
+ normalized_url = self._normalize_url(absolute_url)
187
+
188
+ if self._is_valid_article_url(normalized_url):
189
+ links.add(normalized_url)
190
+
191
+ return links
192
+
193
+ def _extract_metadata(self, soup: BeautifulSoup, url: str) -> Dict:
194
+ """
195
+ Extract metadata from article page.
196
+
197
+ Args:
198
+ soup: BeautifulSoup object
199
+ url: Article URL
200
+
201
+ Returns:
202
+ Dictionary of metadata
203
+ """
204
+ metadata = {
205
+ "url": url,
206
+ "title": "",
207
+ "last_updated": None,
208
+ "categories": [],
209
+ "scraped_at": datetime.utcnow().isoformat(),
210
+ }
211
+
212
+ # Extract title
213
+ title_tag = soup.find("h1", {"id": "firstHeading"}) or soup.find("h1")
214
+ if title_tag:
215
+ metadata["title"] = title_tag.get_text(strip=True)
216
+
217
+ # Extract categories
218
+ category_links = soup.find_all("a", href=re.compile(r"/Category:"))
219
+ metadata["categories"] = [link.get_text(strip=True) for link in category_links]
220
+
221
+ # Extract last modified date (if available)
222
+ last_modified = soup.find("li", {"id": "footer-info-lastmod"})
223
+ if last_modified:
224
+ metadata["last_updated"] = last_modified.get_text(strip=True)
225
+
226
+ return metadata
227
+
228
+ def save_article(self, content: Dict, filepath: Path) -> None:
229
+ """
230
+ Save article content and metadata to files.
231
+
232
+ Args:
233
+ content: Dictionary with 'markdown' and 'metadata' keys
234
+ filepath: Base filepath (without extension)
235
+ """
236
+ # Save markdown content
237
+ md_file = filepath.with_suffix(".md")
238
+ with open(md_file, "w", encoding="utf-8") as f:
239
+ f.write(content["markdown"])
240
+
241
+ # Save metadata as JSON sidecar
242
+ json_file = filepath.with_suffix(".json")
243
+ with open(json_file, "w", encoding="utf-8") as f:
244
+ json.dump(content["metadata"], f, indent=2, ensure_ascii=False)
245
+
246
+ self.articles_saved += 1
247
+ self.console.print(f"[green][/green] Saved: {content['metadata'].get('title', 'Untitled')}")
248
+
249
+ def load_checkpoint(self) -> bool:
250
+ """
251
+ Load checkpoint data to resume crawling.
252
+
253
+ Returns:
254
+ True if checkpoint was loaded successfully
255
+ """
256
+ if not self.checkpoint_file.exists():
257
+ return False
258
+
259
+ try:
260
+ with open(self.checkpoint_file, "r") as f:
261
+ data = json.load(f)
262
+
263
+ self.visited_urls = set(data.get("visited_urls", []))
264
+ self.to_crawl = deque(data.get("to_crawl", []))
265
+ self.failed_urls = data.get("failed_urls", {})
266
+ self.articles_saved = data.get("articles_saved", 0)
267
+
268
+ self.console.print(f"[blue]Loaded checkpoint:[/blue] {len(self.visited_urls)} visited, "
269
+ f"{len(self.to_crawl)} queued, {self.articles_saved} saved")
270
+ return True
271
+ except Exception as e:
272
+ self.console.print(f"[red]Error loading checkpoint: {e}[/red]")
273
+ return False
274
+
275
+ def save_checkpoint(self) -> None:
276
+ """Save current crawl state to checkpoint file."""
277
+ data = {
278
+ "visited_urls": list(self.visited_urls),
279
+ "to_crawl": list(self.to_crawl),
280
+ "failed_urls": self.failed_urls,
281
+ "articles_saved": self.articles_saved,
282
+ "last_checkpoint": datetime.utcnow().isoformat(),
283
+ }
284
+
285
+ try:
286
+ with open(self.checkpoint_file, "w") as f:
287
+ json.dump(data, f, indent=2)
288
+ except Exception as e:
289
+ self.console.print(f"[red]Error saving checkpoint: {e}[/red]")
290
+
291
    async def crawl_single_page(self, url: str) -> Optional[Dict]:
        """
        Crawl a single page and extract content.

        Checks robots.txt first, then renders the page with crawl4ai and
        returns its markdown, metadata, raw HTML and out-links.

        Args:
            url: URL to crawl

        Returns:
            Dictionary with 'markdown', 'metadata', 'html' and 'links'
            keys, or None if the fetch was blocked or failed.
        """
        if not self._can_fetch(url):
            self.console.print(f"[yellow]Blocked by robots.txt:[/yellow] {url}")
            return None

        try:
            # Configure browser settings
            browser_config = BrowserConfig(
                headless=True,
                verbose=False,
            )

            # Configure crawler settings
            crawler_config = CrawlerRunConfig(
                cache_mode=CacheMode.BYPASS,
                page_timeout=self.timeout * 1000,  # Convert to milliseconds
                wait_for="body",
            )

            # Create crawler and run.
            # NOTE(review): a fresh browser is launched for every page;
            # reusing one AsyncWebCrawler across pages would likely be
            # much cheaper — confirm crawl4ai supports it here.
            async with AsyncWebCrawler(config=browser_config) as crawler:
                result = await crawler.arun(
                    url=url,
                    config=crawler_config,
                )

                if not result.success:
                    self.console.print(f"[red]Failed to crawl:[/red] {url}")
                    return None

                # Parse HTML for metadata
                soup = BeautifulSoup(result.html, "html.parser")
                metadata = self._extract_metadata(soup, url)

                # Get markdown content
                markdown = result.markdown

                return {
                    "markdown": markdown,
                    "metadata": metadata,
                    "html": result.html,
                    "links": self._extract_links(result.html, url),
                }

        except Exception as e:
            # Record the failure so crawl() can report it in the summary
            self.console.print(f"[red]Error crawling {url}:[/red] {e}")
            self.failed_urls[url] = str(e)
            return None
348
+
349
+ async def crawl(
350
+ self,
351
+ max_pages: Optional[int] = None,
352
+ depth: int = 2,
353
+ start_urls: Optional[list] = None,
354
+ ) -> None:
355
+ """
356
+ Crawl EyeWiki starting from the main page.
357
+
358
+ Args:
359
+ max_pages: Maximum number of pages to crawl (None for unlimited)
360
+ depth: Maximum depth to crawl
361
+ start_urls: Optional list of starting URLs (defaults to base_url)
362
+ """
363
+ # Try to load checkpoint
364
+ checkpoint_loaded = self.load_checkpoint()
365
+
366
+ # Initialize robot parser
367
+ try:
368
+ self.robot_parser.read()
369
+ self.console.print("[green][/green] Loaded robots.txt")
370
+ except Exception as e:
371
+ self.console.print(f"[yellow]Warning: Could not load robots.txt: {e}[/yellow]")
372
+
373
+ # Initialize queue if not loaded from checkpoint
374
+ if not checkpoint_loaded:
375
+ if start_urls:
376
+ self.to_crawl.extend([(url, 0) for url in start_urls])
377
+ else:
378
+ self.to_crawl.append((self.base_url, 0))
379
+
380
+ self.console.print(f"\n[bold cyan]Starting EyeWiki Crawl[/bold cyan]")
381
+ self.console.print(f"Max pages: {max_pages or 'unlimited'}")
382
+ self.console.print(f"Max depth: {depth}")
383
+ self.console.print(f"Delay: {self.delay}s\n")
384
+
385
+ with Progress(
386
+ SpinnerColumn(),
387
+ TextColumn("[progress.description]{task.description}"),
388
+ BarColumn(),
389
+ TaskProgressColumn(),
390
+ console=self.console,
391
+ ) as progress:
392
+
393
+ task = progress.add_task(
394
+ "[cyan]Crawling...",
395
+ total=max_pages if max_pages else 100,
396
+ )
397
+
398
+ try:
399
+ while self.to_crawl:
400
+ # Check max_pages limit
401
+ if max_pages and self.articles_saved >= max_pages:
402
+ self.console.print(f"\n[yellow]Reached max_pages limit: {max_pages}[/yellow]")
403
+ break
404
+
405
+ # Get next URL
406
+ current_url, current_depth = self.to_crawl.popleft()
407
+
408
+ # Skip if already visited
409
+ if current_url in self.visited_urls:
410
+ continue
411
+
412
+ # Check depth limit
413
+ if current_depth > depth:
414
+ continue
415
+
416
+ # Mark as visited
417
+ self.visited_urls.add(current_url)
418
+
419
+ # Update progress
420
+ progress.update(
421
+ task,
422
+ completed=self.articles_saved,
423
+ description=f"[cyan]Crawling ({self.articles_saved} saved, {len(self.to_crawl)} queued): {current_url[:60]}...",
424
+ )
425
+
426
+ # Crawl the page
427
+ result = await self.crawl_single_page(current_url)
428
+
429
+ if result:
430
+ # Create filename from URL
431
+ parsed = urlparse(current_url)
432
+
433
+ # For URLs with query parameters (like index.php?title=Article_Name),
434
+ # extract the title parameter
435
+ if parsed.query:
436
+ query_params = parse_qs(parsed.query)
437
+ if 'title' in query_params:
438
+ # Use the title parameter as filename
439
+ filename = query_params['title'][0]
440
+ else:
441
+ # Fallback: use the entire query string
442
+ filename = parsed.query
443
+ else:
444
+ # Use path-based filename for clean URLs like /wiki/Article_Name
445
+ path_parts = parsed.path.strip("/").split("/")
446
+ filename = "_".join(path_parts[-2:]) if len(path_parts) > 1 else path_parts[-1]
447
+
448
+ # Clean filename
449
+ filename = re.sub(r"[^\w\s-]", "_", filename)
450
+ filename = re.sub(r"[-\s]+", "_", filename)
451
+ filename = filename[:200] # Limit length
452
+
453
+ # Save article
454
+ filepath = self.output_dir / filename
455
+ self.save_article(result, filepath)
456
+
457
+ # Add discovered links to queue
458
+ for link in result["links"]:
459
+ if link not in self.visited_urls:
460
+ self.to_crawl.append((link, current_depth + 1))
461
+
462
+ # Polite delay
463
+ await asyncio.sleep(self.delay)
464
+
465
+ # Periodic checkpoint save (every 10 articles)
466
+ if self.articles_saved % 10 == 0:
467
+ self.save_checkpoint()
468
+
469
+ except KeyboardInterrupt:
470
+ self.console.print("\n[yellow]Crawl interrupted by user[/yellow]")
471
+ except Exception as e:
472
+ self.console.print(f"\n[red]Error during crawl: {e}[/red]")
473
+ finally:
474
+ # Final checkpoint save
475
+ self.save_checkpoint()
476
+
477
+ # Print summary
478
+ self.console.print("\n[bold cyan]Crawl Summary[/bold cyan]")
479
+ self.console.print(f"Articles saved: {self.articles_saved}")
480
+ self.console.print(f"URLs visited: {len(self.visited_urls)}")
481
+ self.console.print(f"URLs failed: {len(self.failed_urls)}")
482
+ self.console.print(f"URLs remaining: {len(self.to_crawl)}")
483
+
484
+ if self.failed_urls:
485
+ self.console.print("\n[yellow]Failed URLs:[/yellow]")
486
+ for url, error in list(self.failed_urls.items())[:10]:
487
+ self.console.print(f" - {url}: {error}")
488
+ if len(self.failed_urls) > 10:
489
+ self.console.print(f" ... and {len(self.failed_urls) - 10} more")
src/vectorstore/__init__.py ADDED
File without changes
src/vectorstore/qdrant_store.py ADDED
@@ -0,0 +1,587 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Qdrant vector store manager for EyeWiki RAG system."""
2
+
3
+ import uuid
4
+ import logging
5
+ from pathlib import Path
6
+ from typing import Dict, List, Optional
7
+
8
+ from pydantic import BaseModel, Field
9
+ from qdrant_client import QdrantClient
10
+ from qdrant_client.models import (
11
+ Distance,
12
+ VectorParams,
13
+ SparseVectorParams,
14
+ SparseIndexParams,
15
+ PointStruct,
16
+ Filter,
17
+ FieldCondition,
18
+ MatchValue,
19
+ MatchAny,
20
+ Range,
21
+ ScoredPoint,
22
+ )
23
+ from rich.console import Console
24
+
25
+ from config.settings import settings
26
+ from src.processing.chunker import ChunkNode
27
+
28
+
29
+ # Configure logging
30
+ logging.basicConfig(level=logging.INFO)
31
+ logger = logging.getLogger(__name__)
32
+
33
+
34
class SearchResult(BaseModel):
    """
    Pydantic model for a single vector-search hit.

    Attributes:
        id: Unique identifier of the result
        score: Relevance score
        chunk_id: Chunk identifier
        content: Text content
        parent_section: Section header
        document_title: Article title
        source_url: EyeWiki URL
        metadata: Additional metadata
    """

    id: str = Field(..., description="Unique result identifier")
    # NOTE(review): ge=0.0 will reject negative similarity scores; cosine
    # similarity can be negative — confirm the expected score range.
    score: float = Field(..., ge=0.0, description="Relevance score")
    chunk_id: str = Field(..., description="Chunk identifier")
    content: str = Field(..., description="Text content")
    parent_section: str = Field(default="", description="Parent section header")
    document_title: str = Field(default="", description="Document title")
    source_url: str = Field(default="", description="Source URL")
    metadata: Dict = Field(default_factory=dict, description="Additional metadata")

    @classmethod
    def from_scored_point(cls, point: ScoredPoint) -> "SearchResult":
        """
        Create SearchResult from Qdrant ScoredPoint.

        Missing payload fields fall back to empty strings / empty dict, so
        hits from sparsely-populated points still validate.

        Args:
            point: Qdrant scored point

        Returns:
            SearchResult instance
        """
        # Payload may be None for points stored without one
        payload = point.payload or {}

        return cls(
            id=str(point.id),
            score=point.score,
            chunk_id=payload.get("chunk_id", ""),
            content=payload.get("content", ""),
            parent_section=payload.get("parent_section", ""),
            document_title=payload.get("document_title", ""),
            source_url=payload.get("source_url", ""),
            metadata=payload.get("metadata", {}),
        )
81
+
82
+
83
+ class QdrantStoreManager:
84
+ """
85
+ Qdrant vector store manager for EyeWiki documents.
86
+
87
+ Features:
88
+ - Local/persistent Qdrant storage
89
+ - Dense vector search (semantic)
90
+ - Sparse vector search (BM25)
91
+ - Hybrid search combining both
92
+ - Metadata filtering
93
+ - Batched operations for efficiency
94
+ """
95
+
96
    def __init__(
        self,
        collection_name: Optional[str] = None,
        path: Optional[str] = None,
        embedding_dim: int = 768,  # Default for nomic-embed-text
        batch_size: int = 100,
    ):
        """
        Initialize Qdrant store manager.

        Args:
            collection_name: Name of the collection (default: from settings)
            path: Path to Qdrant storage (default: from settings)
            embedding_dim: Dimension of dense embeddings
            batch_size: Batch size for bulk operations

        Raises:
            Exception: Propagates any failure from QdrantClient creation
                after logging it.
        """
        self.collection_name = collection_name or settings.qdrant_collection_name
        self.path = Path(path or settings.qdrant_path)
        self.embedding_dim = embedding_dim
        self.batch_size = batch_size

        # Create storage directory
        self.path.mkdir(parents=True, exist_ok=True)

        # Initialize Qdrant client (local/persistent mode — no server process)
        try:
            self.client = QdrantClient(path=str(self.path))
            logger.info(f"Initialized Qdrant client at {self.path}")
        except Exception as e:
            logger.error(f"Failed to initialize Qdrant client: {e}")
            raise

        self.console = Console()
129
+
130
+ def initialize_collection(self, recreate: bool = False) -> None:
131
+ """
132
+ Initialize the Qdrant collection with vector configurations.
133
+
134
+ Creates collection with:
135
+ - Dense vectors for semantic search (cosine similarity)
136
+ - Sparse vectors for BM25/keyword search
137
+ - Payload indexing for metadata filtering
138
+
139
+ Args:
140
+ recreate: If True, delete existing collection and recreate
141
+ """
142
+ try:
143
+ # Check if collection exists
144
+ collections = self.client.get_collections().collections
145
+ collection_exists = any(c.name == self.collection_name for c in collections)
146
+
147
+ if collection_exists:
148
+ if recreate:
149
+ self.console.print(
150
+ f"[yellow]Deleting existing collection: {self.collection_name}[/yellow]"
151
+ )
152
+ self.client.delete_collection(self.collection_name)
153
+ else:
154
+ self.console.print(
155
+ f"[blue]Collection already exists: {self.collection_name}[/blue]"
156
+ )
157
+ return
158
+
159
+ # Create collection with dense and sparse vector configurations
160
+ self.console.print(f"[cyan]Creating collection: {self.collection_name}[/cyan]")
161
+
162
+ self.client.create_collection(
163
+ collection_name=self.collection_name,
164
+ vectors_config={
165
+ # Dense vector for semantic search
166
+ "dense": VectorParams(
167
+ size=self.embedding_dim,
168
+ distance=Distance.COSINE,
169
+ ),
170
+ },
171
+ sparse_vectors_config={
172
+ # Sparse vector for BM25/keyword search
173
+ "sparse": SparseVectorParams(
174
+ index=SparseIndexParams(
175
+ on_disk=False, # Keep in memory for speed
176
+ ),
177
+ ),
178
+ },
179
+ )
180
+
181
+ # Create payload indexes for efficient filtering
182
+ # Index on key metadata fields
183
+ self.client.create_payload_index(
184
+ collection_name=self.collection_name,
185
+ field_name="document_title",
186
+ field_schema="keyword",
187
+ )
188
+
189
+ self.client.create_payload_index(
190
+ collection_name=self.collection_name,
191
+ field_name="parent_section",
192
+ field_schema="keyword",
193
+ )
194
+
195
+ self.client.create_payload_index(
196
+ collection_name=self.collection_name,
197
+ field_name="metadata.disease_name",
198
+ field_schema="keyword",
199
+ )
200
+
201
+ self.client.create_payload_index(
202
+ collection_name=self.collection_name,
203
+ field_name="metadata.icd_codes",
204
+ field_schema="keyword",
205
+ )
206
+
207
+ self.console.print(
208
+ f"[green][/green] Collection created: {self.collection_name}"
209
+ )
210
+ logger.info(f"Created collection: {self.collection_name}")
211
+
212
+ except Exception as e:
213
+ logger.error(f"Failed to initialize collection: {e}")
214
+ raise
215
+
216
+ def add_documents(
217
+ self,
218
+ chunks: List[ChunkNode],
219
+ dense_embeddings: List[List[float]],
220
+ sparse_embeddings: Optional[List[Dict]] = None,
221
+ ) -> int:
222
+ """
223
+ Add documents to the vector store with batched upserts.
224
+
225
+ Args:
226
+ chunks: List of ChunkNode objects
227
+ dense_embeddings: List of dense embedding vectors
228
+ sparse_embeddings: Optional list of sparse vectors (for BM25)
229
+
230
+ Returns:
231
+ Number of documents successfully added
232
+
233
+ Raises:
234
+ ValueError: If chunks and embeddings length mismatch
235
+ """
236
+ if len(chunks) != len(dense_embeddings):
237
+ raise ValueError(
238
+ f"Chunks ({len(chunks)}) and embeddings ({len(dense_embeddings)}) "
239
+ "must have same length"
240
+ )
241
+
242
+ if sparse_embeddings and len(sparse_embeddings) != len(chunks):
243
+ raise ValueError(
244
+ f"Chunks ({len(chunks)}) and sparse embeddings ({len(sparse_embeddings)}) "
245
+ "must have same length"
246
+ )
247
+
248
+ total_added = 0
249
+
250
+ try:
251
+ # Process in batches
252
+ for i in range(0, len(chunks), self.batch_size):
253
+ batch_chunks = chunks[i : i + self.batch_size]
254
+ batch_dense = dense_embeddings[i : i + self.batch_size]
255
+ batch_sparse = (
256
+ sparse_embeddings[i : i + self.batch_size]
257
+ if sparse_embeddings
258
+ else None
259
+ )
260
+
261
+ # Create points for batch
262
+ points = []
263
+ for j, chunk in enumerate(batch_chunks):
264
+ # Prepare vector dict
265
+ vectors = {"dense": batch_dense[j]}
266
+
267
+ # Add sparse vector if available
268
+ if batch_sparse:
269
+ vectors["sparse"] = batch_sparse[j]
270
+
271
+ # Create point
272
+ point_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, chunk.chunk_id))
273
+ point = PointStruct(
274
+ id=point_id,
275
+ vector=vectors,
276
+ payload={
277
+ "chunk_id": chunk.chunk_id,
278
+ "content": chunk.content,
279
+ "parent_section": chunk.parent_section,
280
+ "document_title": chunk.document_title,
281
+ "source_url": chunk.source_url,
282
+ "chunk_index": chunk.chunk_index,
283
+ "token_count": chunk.token_count,
284
+ "metadata": chunk.metadata,
285
+ },
286
+ )
287
+ points.append(point)
288
+
289
+ # Upsert batch
290
+ self.client.upsert(
291
+ collection_name=self.collection_name,
292
+ points=points,
293
+ )
294
+
295
+ total_added += len(points)
296
+
297
+ logger.info(
298
+ f"Uploaded batch {i // self.batch_size + 1}: "
299
+ f"{len(points)} points (total: {total_added})"
300
+ )
301
+
302
+ self.console.print(
303
+ f"[green][/green] Added {total_added} documents to {self.collection_name}"
304
+ )
305
+ return total_added
306
+
307
+ except Exception as e:
308
+ logger.error(f"Failed to add documents: {e}")
309
+ raise
310
+
311
+ def search(
312
+ self,
313
+ query_embedding: List[float],
314
+ top_k: int = 10,
315
+ filters: Optional[Dict] = None,
316
+ score_threshold: Optional[float] = None,
317
+ ) -> List[SearchResult]:
318
+ """
319
+ Search using dense vector (semantic search).
320
+
321
+ Args:
322
+ query_embedding: Dense query vector
323
+ top_k: Number of results to return
324
+ filters: Optional metadata filters (e.g., {"disease_name": "Glaucoma"})
325
+ score_threshold: Minimum score threshold
326
+
327
+ Returns:
328
+ List of SearchResult objects
329
+ """
330
+ try:
331
+ # Build filter conditions
332
+ query_filter = self._build_filter(filters) if filters else None
333
+
334
+ # Perform search
335
+ results = self.client.query_points(
336
+ collection_name=self.collection_name,
337
+ query=query_embedding,
338
+ using="dense", # Specify which named vector to use
339
+ limit=top_k,
340
+ query_filter=query_filter,
341
+ score_threshold=score_threshold,
342
+ ).points
343
+
344
+ # Convert to SearchResult objects
345
+ search_results = [SearchResult.from_scored_point(r) for r in results]
346
+
347
+ logger.info(f"Dense search returned {len(search_results)} results")
348
+ return search_results
349
+
350
+ except Exception as e:
351
+ logger.error(f"Search failed: {e}")
352
+ raise
353
+
354
+ def hybrid_search(
355
+ self,
356
+ query_embedding: List[float],
357
+ query_sparse: Optional[Dict] = None,
358
+ top_k: int = 10,
359
+ filters: Optional[Dict] = None,
360
+ ) -> List[SearchResult]:
361
+ """
362
+ Hybrid search combining dense (semantic) and sparse (BM25) vectors.
363
+
364
+ Args:
365
+ query_embedding: Dense query vector
366
+ query_sparse: Sparse query vector for BM25
367
+ top_k: Number of results to return
368
+ filters: Optional metadata filters
369
+
370
+ Returns:
371
+ List of SearchResult objects with combined scores
372
+ """
373
+ try:
374
+ # If no sparse vector provided, fall back to dense search
375
+ if query_sparse is None:
376
+ logger.warning("No sparse vector provided, using dense search only")
377
+ return self.search(query_embedding, top_k, filters)
378
+
379
+ # Build filter conditions
380
+ query_filter = self._build_filter(filters) if filters else None
381
+
382
+ # Perform hybrid search
383
+ # Note: Qdrant supports multiple vectors in search, but for true hybrid
384
+ # we'd need to do two separate searches and merge results
385
+ # For simplicity, we'll use the query API with dense vector
386
+ # In production, you'd want to implement proper RRF (Reciprocal Rank Fusion)
387
+
388
+ results = self.client.query_points(
389
+ collection_name=self.collection_name,
390
+ query=query_embedding,
391
+ using="dense", # Specify which named vector to use
392
+ limit=top_k * 2, # Get more results for reranking
393
+ query_filter=query_filter,
394
+ ).points
395
+
396
+ # Convert to SearchResult objects
397
+ search_results = [SearchResult.from_scored_point(r) for r in results]
398
+
399
+ # For now, return top_k results
400
+ # In production, implement RRF combining dense and sparse results
401
+ logger.info(f"Hybrid search returned {len(search_results[:top_k])} results")
402
+ return search_results[:top_k]
403
+
404
+ except Exception as e:
405
+ logger.error(f"Hybrid search failed: {e}")
406
+ raise
407
+
408
+ def _build_filter(self, filters: Dict) -> Filter:
409
+ """
410
+ Build Qdrant filter from dictionary.
411
+
412
+ Supports:
413
+ - disease_name: str
414
+ - icd_codes: List[str]
415
+ - anatomical_structures: List[str]
416
+ - document_title: str
417
+ - parent_section: str
418
+
419
+ Args:
420
+ filters: Dictionary of filter conditions
421
+
422
+ Returns:
423
+ Qdrant Filter object
424
+ """
425
+ conditions = []
426
+
427
+ # Disease name filter
428
+ if "disease_name" in filters:
429
+ conditions.append(
430
+ FieldCondition(
431
+ key="metadata.disease_name",
432
+ match=MatchValue(value=filters["disease_name"]),
433
+ )
434
+ )
435
+
436
+ # ICD codes filter (match any)
437
+ if "icd_codes" in filters:
438
+ icd_list = filters["icd_codes"]
439
+ if isinstance(icd_list, str):
440
+ icd_list = [icd_list]
441
+ conditions.append(
442
+ FieldCondition(
443
+ key="metadata.icd_codes",
444
+ match=MatchAny(any=icd_list),
445
+ )
446
+ )
447
+
448
+ # Anatomical structures filter
449
+ if "anatomical_structures" in filters:
450
+ structures = filters["anatomical_structures"]
451
+ if isinstance(structures, str):
452
+ structures = [structures]
453
+ conditions.append(
454
+ FieldCondition(
455
+ key="metadata.anatomical_structures",
456
+ match=MatchAny(any=structures),
457
+ )
458
+ )
459
+
460
+ # Document title filter
461
+ if "document_title" in filters:
462
+ conditions.append(
463
+ FieldCondition(
464
+ key="document_title",
465
+ match=MatchValue(value=filters["document_title"]),
466
+ )
467
+ )
468
+
469
+ # Parent section filter
470
+ if "parent_section" in filters:
471
+ conditions.append(
472
+ FieldCondition(
473
+ key="parent_section",
474
+ match=MatchValue(value=filters["parent_section"]),
475
+ )
476
+ )
477
+
478
+ # Token count range filter
479
+ if "min_tokens" in filters or "max_tokens" in filters:
480
+ range_filter = {}
481
+ if "min_tokens" in filters:
482
+ range_filter["gte"] = filters["min_tokens"]
483
+ if "max_tokens" in filters:
484
+ range_filter["lte"] = filters["max_tokens"]
485
+
486
+ conditions.append(
487
+ FieldCondition(
488
+ key="token_count",
489
+ range=Range(**range_filter),
490
+ )
491
+ )
492
+
493
+ return Filter(must=conditions) if conditions else None
494
+
495
+ def get_collection_info(self) -> Dict:
496
+ """
497
+ Get information about the collection.
498
+
499
+ Returns:
500
+ Dictionary with collection statistics
501
+ """
502
+ try:
503
+ info = self.client.get_collection(self.collection_name)
504
+
505
+ return {
506
+ "name": self.collection_name,
507
+ "vectors_count": getattr(info, "vectors_count", 0),
508
+ "points_count": info.points_count,
509
+ "status": info.status,
510
+ "optimizer_status": info.optimizer_status,
511
+ "indexed_vectors_count": getattr(info, "indexed_vectors_count", 0),
512
+ }
513
+
514
+ except Exception as e:
515
+ logger.error(f"Failed to get collection info: {e}")
516
+ raise
517
+
518
+ def delete_collection(self) -> bool:
519
+ """
520
+ Delete the collection.
521
+
522
+ Returns:
523
+ True if successful
524
+ """
525
+ try:
526
+ result = self.client.delete_collection(self.collection_name)
527
+ self.console.print(
528
+ f"[yellow]Deleted collection: {self.collection_name}[/yellow]"
529
+ )
530
+ logger.info(f"Deleted collection: {self.collection_name}")
531
+ return result
532
+
533
+ except Exception as e:
534
+ logger.error(f"Failed to delete collection: {e}")
535
+ raise
536
+
537
+ def count_documents(self) -> int:
538
+ """
539
+ Count total documents in collection.
540
+
541
+ Returns:
542
+ Number of documents
543
+ """
544
+ try:
545
+ info = self.client.get_collection(self.collection_name)
546
+ return info.points_count or 0
547
+
548
+ except Exception as e:
549
+ logger.error(f"Failed to count documents: {e}")
550
+ return 0
551
+
552
+ def get_document_by_id(self, doc_id: str) -> Optional[SearchResult]:
553
+ """
554
+ Retrieve a specific document by ID.
555
+
556
+ Args:
557
+ doc_id: Document ID (chunk_id)
558
+
559
+ Returns:
560
+ SearchResult if found, None otherwise
561
+ """
562
+ try:
563
+ points = self.client.retrieve(
564
+ collection_name=self.collection_name,
565
+ ids=[doc_id],
566
+ )
567
+
568
+ if not points:
569
+ return None
570
+
571
+ point = points[0]
572
+ payload = point.payload or {}
573
+
574
+ return SearchResult(
575
+ id=str(point.id),
576
+ score=1.0, # No score for direct retrieval
577
+ chunk_id=payload.get("chunk_id", ""),
578
+ content=payload.get("content", ""),
579
+ parent_section=payload.get("parent_section", ""),
580
+ document_title=payload.get("document_title", ""),
581
+ source_url=payload.get("source_url", ""),
582
+ metadata=payload.get("metadata", {}),
583
+ )
584
+
585
+ except Exception as e:
586
+ logger.error(f"Failed to get document by ID: {e}")
587
+ return None
tests/README.md ADDED
@@ -0,0 +1,172 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Tests
2
+
3
+ Comprehensive test suite for the EyeWiki RAG system.
4
+
5
+ ## Installation
6
+
7
+ Install test dependencies:
8
+
9
+ ```bash
10
+ pip install pytest pytest-cov pytest-mock requests
11
+ ```
12
+
13
+ ## Running Tests
14
+
15
+ ### Run all tests:
16
+ ```bash
17
+ pytest
18
+ ```
19
+
20
+ ### Run with verbose output:
21
+ ```bash
22
+ pytest -v
23
+ ```
24
+
25
+ ### Run specific test file:
26
+ ```bash
27
+ pytest tests/test_components.py -v
28
+ ```
29
+
30
+ ### Run specific test:
31
+ ```bash
32
+ pytest tests/test_components.py::test_chunk_respects_headers -v
33
+ ```
34
+
35
+ ### Run tests by marker:
36
+ ```bash
37
+ # Run only unit tests
38
+ pytest -m unit
39
+
40
+ # Run only integration tests
41
+ pytest -m integration
42
+
43
+ # Run only API tests
44
+ pytest -m api
45
+ ```
46
+
47
+ ### Run with coverage:
48
+ ```bash
49
+ pytest --cov=src --cov-report=html
50
+ ```
51
+
52
+ This will generate a coverage report in `htmlcov/index.html`.
53
+
54
+ ## Test Categories
55
+
56
+ ### Unit Tests (`@pytest.mark.unit`)
57
+ - Fast, isolated tests
58
+ - Mock external dependencies
59
+ - Test individual components
60
+
61
+ ### Integration Tests (`@pytest.mark.integration`)
62
+ - Test multiple components together
63
+ - May be slower
64
+ - May require real dependencies
65
+
66
+ ### API Tests (`@pytest.mark.api`)
67
+ - Test FastAPI endpoints
68
+ - Require server components
69
+ - Use TestClient
70
+
71
+ ## Test Structure
72
+
73
+ ### Chunker Tests
74
+ - `test_chunk_respects_headers()` - Verifies markdown header handling
75
+ - `test_chunk_size_limits()` - Checks chunk size constraints
76
+ - `test_metadata_preserved()` - Ensures metadata propagation
77
+
78
+ ### Retriever Tests
79
+ - `test_retrieval_returns_results()` - Basic retrieval functionality
80
+ - `test_hybrid_search_combines_scores()` - Score combination logic
81
+ - `test_filters_work()` - Metadata filtering
82
+
83
+ ### Reranker Tests
84
+ - `test_reranking_changes_order()` - Verifies reranking effect
85
+ - `test_top_k_respected()` - Checks top_k parameter
86
+
87
+ ### Query Engine Tests
88
+ - `test_full_query_pipeline()` - End-to-end query flow
89
+ - `test_sources_included()` - Source citation functionality
90
+ - `test_disclaimer_present()` - Medical disclaimer inclusion
91
+ - `test_streaming_query()` - Streaming response
92
+
93
+ ### API Tests
94
+ - `test_health_endpoint()` - Health check endpoint
95
+ - `test_query_endpoint()` - Main query endpoint
96
+ - `test_query_endpoint_validation()` - Input validation
97
+
98
+ ### Metadata Tests
99
+ - `test_icd_code_extraction()` - ICD-10 code extraction
100
+ - `test_anatomical_term_extraction()` - Anatomical term detection
101
+ - `test_medication_extraction()` - Medication identification
102
+
103
+ ## Fixtures
104
+
105
+ Reusable test fixtures are defined in `test_components.py`:
106
+
107
+ - `semantic_chunker` - SemanticChunker instance
108
+ - `metadata_extractor` - MetadataExtractor instance
109
+ - `sample_chunks` - Sample ChunkNode objects
110
+ - `mock_retriever` - Mocked HybridRetriever
111
+ - `mock_reranker` - Mocked CrossEncoderReranker
112
+ - `mock_ollama_client` - Mocked OllamaClient
113
+ - `query_engine` - Fully configured QueryEngine with mocks
114
+ - `test_client` - FastAPI TestClient
115
+
116
+ ## Writing New Tests
117
+
118
+ ### Example unit test:
119
+ ```python
120
+ @pytest.mark.unit
121
+ def test_my_component(my_fixture):
122
+ """Test description."""
123
+ result = my_fixture.some_method()
124
+ assert result == expected_value
125
+ ```
126
+
127
+ ### Example integration test:
128
+ ```python
129
+ @pytest.mark.integration
130
+ def test_component_interaction():
131
+ """Test multiple components together."""
132
+ # Setup
133
+ component_a = ComponentA()
134
+ component_b = ComponentB(component_a)
135
+
136
+ # Test
137
+ result = component_b.process()
138
+
139
+ # Assert
140
+ assert result.is_valid()
141
+ ```
142
+
143
+ ### Example API test:
144
+ ```python
145
+ @pytest.mark.api
146
+ def test_my_endpoint(test_client):
147
+ """Test API endpoint."""
148
+ response = test_client.get("/my-endpoint")
149
+ assert response.status_code == 200
150
+ assert "expected_field" in response.json()
151
+ ```
152
+
153
+ ## Continuous Integration
154
+
155
+ These tests are designed to run in CI/CD pipelines. Mock external dependencies (Ollama, Qdrant) to ensure tests run in any environment.
156
+
157
+ ## Troubleshooting
158
+
159
+ ### Import Errors
160
+ Make sure the project root is in PYTHONPATH:
161
+ ```bash
162
+ export PYTHONPATH=/path/to/eyewiki-rag:$PYTHONPATH
163
+ ```
164
+
165
+ ### Mock Issues
166
+ If mocks aren't working properly, check that you're using the correct spec:
167
+ ```python
168
+ mock = Mock(spec=RealClass)
169
+ ```
170
+
171
+ ### API Tests Failing
172
+ API tests may fail if the application isn't properly initialized. Use mocking to isolate components.
tests/__init__.py ADDED
File without changes
tests/conftest.py ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Pytest configuration and shared fixtures.
3
+ """
4
+
5
+ import sys
6
+ from pathlib import Path
7
+
8
+ # Add project root to Python path
9
+ project_root = Path(__file__).parent.parent
10
+ sys.path.insert(0, str(project_root))
11
+
12
+
13
def pytest_configure(config):
    """Register this suite's custom markers with pytest.

    Declaring markers here keeps ``pytest --strict-markers`` (and marker
    warnings) happy for the integration/api/unit markers used in the tests.
    """
    marker_definitions = (
        "integration: mark test as integration test (may be slow)",
        "api: mark test as API test (requires server components)",
        "unit: mark test as unit test (fast, isolated)",
    )
    for definition in marker_definitions:
        config.addinivalue_line("markers", definition)
tests/test_components.py ADDED
@@ -0,0 +1,699 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Comprehensive tests for EyeWiki RAG components.
3
+
4
+ Run with:
5
+ pytest tests/test_components.py -v
6
+ pytest tests/test_components.py::test_chunk_respects_headers -v
7
+ """
8
+
9
+ import pytest
10
+ from pathlib import Path
11
+ from unittest.mock import Mock, patch, MagicMock
12
+ from typing import List
13
+
14
+ from src.processing.chunker import ChunkNode, SemanticChunker
15
+ from src.processing.metadata_extractor import MetadataExtractor
16
+ from src.rag.retriever import HybridRetriever, RetrievalResult
17
+ from src.rag.reranker import CrossEncoderReranker
18
+ from src.rag.query_engine import EyeWikiQueryEngine, QueryResponse, SourceInfo
19
+
20
+
21
+ # ============================================================================
22
+ # Test Data
23
+ # ============================================================================
24
+
25
+ SAMPLE_MARKDOWN = """# Glaucoma
26
+
27
+ ## Overview
28
+
29
+ Glaucoma is a group of eye conditions that damage the optic nerve.
30
+
31
+ ## Symptoms
32
+
33
+ Common symptoms include:
34
+ - Vision loss
35
+ - Eye pain
36
+ - Halos around lights
37
+
38
+ ## Treatment
39
+
40
+ Treatment options include:
41
+ - Medications (IOP-lowering drops)
42
+ - Laser procedures
43
+ - Surgery
44
+
45
+ ### Medications
46
+
47
+ Beta-blockers and prostaglandin analogs are commonly used.
48
+
49
+ ### Surgery
50
+
51
+ Trabeculectomy is a common surgical procedure.
52
+ """
53
+
54
+ SAMPLE_METADATA = {
55
+ "title": "Glaucoma",
56
+ "url": "https://eyewiki.aao.org/Glaucoma",
57
+ "source": "eyewiki",
58
+ }
59
+
60
+
61
+ # ============================================================================
62
+ # Fixtures
63
+ # ============================================================================
64
+
65
+ @pytest.fixture
66
+ def semantic_chunker():
67
+ """Create a SemanticChunker instance."""
68
+ return SemanticChunker(
69
+ chunk_size=200,
70
+ chunk_overlap=20,
71
+ min_chunk_size=50,
72
+ )
73
+
74
+
75
+ @pytest.fixture
76
+ def metadata_extractor():
77
+ """Create a MetadataExtractor instance."""
78
+ return MetadataExtractor()
79
+
80
+
81
+ @pytest.fixture
82
+ def sample_chunks():
83
+ """Create sample retrieval results for testing."""
84
+ return [
85
+ ChunkNode(
86
+ id="chunk_1",
87
+ content="Glaucoma is characterized by elevated intraocular pressure (IOP).",
88
+ document_title="Glaucoma",
89
+ source_url="https://eyewiki.aao.org/Glaucoma",
90
+ parent_section="Overview",
91
+ metadata={"icd_codes": ["H40.1"], "anatomical_terms": ["optic nerve"]},
92
+ chunk_index=0,
93
+ total_chunks=5,
94
+ ),
95
+ ChunkNode(
96
+ id="chunk_2",
97
+ content="Treatment includes beta-blockers and prostaglandin analogs.",
98
+ document_title="Glaucoma",
99
+ source_url="https://eyewiki.aao.org/Glaucoma",
100
+ parent_section="Treatment",
101
+ metadata={"medications": ["beta-blockers", "prostaglandin analogs"]},
102
+ chunk_index=1,
103
+ total_chunks=5,
104
+ ),
105
+ ChunkNode(
106
+ id="chunk_3",
107
+ content="Diabetic retinopathy affects the retinal blood vessels.",
108
+ document_title="Diabetic Retinopathy",
109
+ source_url="https://eyewiki.aao.org/Diabetic_Retinopathy",
110
+ parent_section="Overview",
111
+ metadata={"icd_codes": ["E11.3"], "anatomical_terms": ["retina"]},
112
+ chunk_index=0,
113
+ total_chunks=3,
114
+ ),
115
+ ]
116
+
117
+
118
+ @pytest.fixture
119
+ def mock_retriever(sample_chunks):
120
+ """Create a mock HybridRetriever."""
121
+ retriever = Mock(spec=HybridRetriever)
122
+
123
+ # Convert ChunkNodes to RetrievalResults
124
+ retrieval_results = [
125
+ RetrievalResult(
126
+ id=chunk.id,
127
+ content=chunk.content,
128
+ document_title=chunk.document_title,
129
+ source_url=chunk.source_url,
130
+ section=chunk.parent_section,
131
+ metadata=chunk.metadata,
132
+ score=0.9 - (i * 0.1), # Decreasing scores
133
+ )
134
+ for i, chunk in enumerate(sample_chunks)
135
+ ]
136
+
137
+ retriever.retrieve.return_value = retrieval_results
138
+ return retriever
139
+
140
+
141
+ @pytest.fixture
142
+ def mock_reranker():
143
+ """Create a mock CrossEncoderReranker."""
144
+ reranker = Mock(spec=CrossEncoderReranker)
145
+
146
+ def rerank_func(query: str, documents: List[RetrievalResult], top_k: int):
147
+ # Reverse order to simulate reranking
148
+ reranked = list(reversed(documents[:top_k]))
149
+ # Update scores
150
+ for i, doc in enumerate(reranked):
151
+ doc.score = 0.95 - (i * 0.05)
152
+ return reranked
153
+
154
+ reranker.rerank.side_effect = rerank_func
155
+ return reranker
156
+
157
+
158
+ @pytest.fixture
159
+ def mock_ollama_client():
160
+ """Create a mock OllamaClient."""
161
+ client = Mock()
162
+ client.generate.return_value = (
163
+ "Glaucoma is a group of eye diseases that damage the optic nerve. "
164
+ "It is often associated with elevated intraocular pressure (IOP). "
165
+ "[Source: Glaucoma]"
166
+ )
167
+ client.stream_generate.return_value = iter(["Glaucoma ", "is ", "a disease."])
168
+ client.embed_text.return_value = [0.1] * 768
169
+ return client
170
+
171
+
172
+ @pytest.fixture
173
+ def query_engine(mock_retriever, mock_reranker, mock_ollama_client, tmp_path):
174
+ """Create a QueryEngine instance with mocked dependencies."""
175
+ # Create temporary prompt files
176
+ system_prompt = tmp_path / "system_prompt.txt"
177
+ system_prompt.write_text("You are an expert ophthalmology assistant.")
178
+
179
+ query_prompt = tmp_path / "query_prompt.txt"
180
+ query_prompt.write_text("Context: {context}\n\nQuestion: {question}\n\nAnswer:")
181
+
182
+ disclaimer = tmp_path / "disclaimer.txt"
183
+ disclaimer.write_text("Medical disclaimer text.")
184
+
185
+ return EyeWikiQueryEngine(
186
+ retriever=mock_retriever,
187
+ reranker=mock_reranker,
188
+ llm_client=mock_ollama_client,
189
+ system_prompt_path=system_prompt,
190
+ query_prompt_path=query_prompt,
191
+ disclaimer_path=disclaimer,
192
+ max_context_tokens=4000,
193
+ retrieval_k=20,
194
+ rerank_k=5,
195
+ )
196
+
197
+
198
+ # ============================================================================
199
+ # Chunker Tests
200
+ # ============================================================================
201
+
202
+ def test_chunk_respects_headers(semantic_chunker):
203
+ """Test that chunker respects markdown headers."""
204
+ chunks = semantic_chunker.chunk_document(
205
+ markdown_content=SAMPLE_MARKDOWN,
206
+ metadata=SAMPLE_METADATA,
207
+ )
208
+
209
+ # Should have multiple chunks based on headers
210
+ assert len(chunks) > 0
211
+
212
+ # Check that parent sections are correctly identified
213
+ sections = {chunk.parent_section for chunk in chunks}
214
+ assert "Overview" in sections or "Symptoms" in sections or "Treatment" in sections
215
+
216
+ # Verify each chunk has required fields
217
+ for chunk in chunks:
218
+ assert chunk.content
219
+ assert chunk.document_title == "Glaucoma"
220
+ assert chunk.source_url == SAMPLE_METADATA["url"]
221
+ assert chunk.id
222
+ assert isinstance(chunk.chunk_index, int)
223
+ assert isinstance(chunk.total_chunks, int)
224
+
225
+
226
+ def test_chunk_size_limits(semantic_chunker):
227
+ """Test that chunks respect size limits."""
228
+ # Create a very long section
229
+ long_text = "This is a test sentence. " * 200 # Very long text
230
+ long_markdown = f"# Test\n\n## Section\n\n{long_text}"
231
+
232
+ chunks = semantic_chunker.chunk_document(
233
+ markdown_content=long_markdown,
234
+ metadata=SAMPLE_METADATA,
235
+ )
236
+
237
+ # All chunks should respect min size
238
+ for chunk in chunks:
239
+ # Token estimation: len(text) // 4
240
+ estimated_tokens = len(chunk.content) // 4
241
+ # Should not be too small (unless it's the last chunk)
242
+ if chunk.chunk_index < chunk.total_chunks - 1:
243
+ assert estimated_tokens >= semantic_chunker.min_chunk_size
244
+
245
+ # Should have created multiple chunks for long text
246
+ assert len(chunks) > 1
247
+
248
+
249
+ def test_metadata_preserved(semantic_chunker):
250
+ """Test that metadata is preserved in chunks."""
251
+ custom_metadata = {
252
+ "title": "Test Document",
253
+ "url": "https://example.com/test",
254
+ "custom_field": "custom_value",
255
+ }
256
+
257
+ chunks = semantic_chunker.chunk_document(
258
+ markdown_content=SAMPLE_MARKDOWN,
259
+ metadata=custom_metadata,
260
+ )
261
+
262
+ # All chunks should have the same base metadata
263
+ for chunk in chunks:
264
+ assert chunk.document_title == custom_metadata["title"]
265
+ assert chunk.source_url == custom_metadata["url"]
266
+
267
+
268
+ # ============================================================================
269
+ # Retriever Tests
270
+ # ============================================================================
271
+
272
+ def test_retrieval_returns_results(mock_retriever):
273
+ """Test that retriever returns results."""
274
+ query = "What is glaucoma?"
275
+ results = mock_retriever.retrieve(query=query, top_k=10)
276
+
277
+ assert len(results) > 0
278
+ assert all(isinstance(r, RetrievalResult) for r in results)
279
+
280
+ # Verify result structure
281
+ for result in results:
282
+ assert result.id
283
+ assert result.content
284
+ assert result.document_title
285
+ assert result.source_url
286
+ assert 0 <= result.score <= 1
287
+
288
+
289
+ def test_hybrid_search_combines_scores(mock_retriever):
290
+ """Test that hybrid search returns combined scores."""
291
+ query = "glaucoma treatment"
292
+ results = mock_retriever.retrieve(query=query, top_k=5)
293
+
294
+ # Scores should be in descending order
295
+ scores = [r.score for r in results]
296
+ assert scores == sorted(scores, reverse=True)
297
+
298
+ # All scores should be valid
299
+ assert all(0 <= score <= 1 for score in scores)
300
+
301
+
302
+ def test_filters_work(mock_retriever):
303
+ """Test that metadata filters work."""
304
+ # Add filter functionality to mock
305
+ def retrieve_with_filter(query: str, top_k: int, filters: dict = None):
306
+ results = mock_retriever.retrieve(query=query, top_k=top_k)
307
+
308
+ if filters:
309
+ # Simple filter implementation for testing
310
+ filtered = []
311
+ for r in results:
312
+ if "disease_name" in filters:
313
+ if filters["disease_name"] in r.document_title:
314
+ filtered.append(r)
315
+ else:
316
+ filtered.append(r)
317
+ return filtered
318
+ return results
319
+
320
+ mock_retriever.retrieve.side_effect = retrieve_with_filter
321
+
322
+ # Test with filter
323
+ results = mock_retriever.retrieve(
324
+ query="treatment",
325
+ top_k=10,
326
+ filters={"disease_name": "Glaucoma"}
327
+ )
328
+
329
+ # All results should match filter
330
+ assert all("Glaucoma" in r.document_title for r in results)
331
+
332
+
333
+ # ============================================================================
334
+ # Reranker Tests
335
+ # ============================================================================
336
+
337
def test_reranking_changes_order(mock_reranker, sample_chunks):
    """Reranking should reorder candidates relative to the retrieval order."""
    # Give every candidate an identical retrieval score so that any change in
    # ordering is attributable to the reranker alone.
    candidates = [
        RetrievalResult(
            id=chunk.id,
            content=chunk.content,
            document_title=chunk.document_title,
            source_url=chunk.source_url,
            section=chunk.parent_section,
            metadata=chunk.metadata,
            score=0.5,
        )
        for chunk in sample_chunks
    ]

    before = [c.id for c in candidates]

    reordered = mock_reranker.rerank(
        query="What is glaucoma?",
        documents=candidates,
        top_k=3,
    )
    after = [r.id for r in reordered]

    # The mock reranker reverses its input, so the order must differ.
    assert after != before
365
+
366
+
367
def test_top_k_respected(mock_reranker, sample_chunks):
    """The reranker must truncate its output to exactly top_k entries."""
    candidates = [
        RetrievalResult(
            id=chunk.id,
            content=chunk.content,
            document_title=chunk.document_title,
            source_url=chunk.source_url,
            section=chunk.parent_section,
            metadata=chunk.metadata,
            score=0.5,
        )
        for chunk in sample_chunks
    ]

    limit = 2
    kept = mock_reranker.rerank(
        query="treatment options",
        documents=candidates,
        top_k=limit,
    )

    # Exactly `limit` results survive reranking — no more, no fewer.
    assert len(kept) == limit
391
+
392
+
393
+ # ============================================================================
394
+ # Query Engine Tests
395
+ # ============================================================================
396
+
397
def test_full_query_pipeline(query_engine):
    """Exercise retrieval -> rerank -> generation end to end via the engine."""
    question = "What is glaucoma?"

    response = query_engine.query(question=question, include_sources=True)

    # The response object must be fully populated.
    assert isinstance(response, QueryResponse)
    assert response.answer
    assert response.query == question
    assert 0 <= response.confidence <= 1
    assert response.disclaimer
412
+
413
+
414
def test_sources_included(query_engine):
    """A query with include_sources=True must attach well-formed citations."""
    response = query_engine.query(
        question="What is glaucoma?",
        include_sources=True,
    )

    assert len(response.sources) > 0

    for src in response.sources:
        # Each citation needs a title, a URL, and a normalized relevance score.
        assert isinstance(src, SourceInfo)
        assert src.title
        assert src.url
        assert 0 <= src.relevance_score <= 1
430
+
431
+
432
+ def test_disclaimer_present(query_engine):
433
+ """Test that medical disclaimer is present."""
434
+ response = query_engine.query(
435
+ question="How is glaucoma treated?",
436
+ include_sources=True,
437
+ )
438
+
439
+ # Disclaimer should be present
440
+ assert response.disclaimer
441
+ assert len(response.disclaimer) > 0
442
+
443
+
444
+ def test_query_without_sources(query_engine):
445
+ """Test query with sources disabled."""
446
+ response = query_engine.query(
447
+ question="What is glaucoma?",
448
+ include_sources=False,
449
+ )
450
+
451
+ # Should still have answer
452
+ assert response.answer
453
+
454
+ # Sources should be empty
455
+ assert len(response.sources) == 0
456
+
457
+
458
+ def test_streaming_query(query_engine):
459
+ """Test streaming query functionality."""
460
+ chunks = list(query_engine.stream_query(
461
+ question="What is glaucoma?",
462
+ ))
463
+
464
+ # Should have received chunks
465
+ assert len(chunks) > 0
466
+
467
+ # All chunks should be strings
468
+ assert all(isinstance(chunk, str) for chunk in chunks)
469
+
470
+
471
+ def test_confidence_calculation(query_engine):
472
+ """Test confidence score calculation."""
473
+ response = query_engine.query(
474
+ question="What is glaucoma?",
475
+ include_sources=True,
476
+ )
477
+
478
+ # Confidence should be calculated
479
+ assert response.confidence is not None
480
+ assert 0 <= response.confidence <= 1
481
+
482
+ # With high-scoring retrieval results, confidence should be high
483
+ # (Our mock returns scores like 0.9, 0.8, 0.7)
484
+ assert response.confidence > 0.5
485
+
486
+
487
+ def test_empty_retrieval_results(query_engine, mock_retriever):
488
+ """Test handling of empty retrieval results."""
489
+ # Mock retriever to return empty list
490
+ mock_retriever.retrieve.return_value = []
491
+
492
+ response = query_engine.query(
493
+ question="What is xyzabc?", # Non-existent topic
494
+ include_sources=True,
495
+ )
496
+
497
+ # Should still return a response
498
+ assert response.answer
499
+ assert "couldn't find" in response.answer.lower() or "no results" in response.answer.lower()
500
+ assert len(response.sources) == 0
501
+ assert response.confidence == 0.0
502
+
503
+
504
+ # ============================================================================
505
+ # API Tests
506
+ # ============================================================================
507
+
508
@pytest.fixture
def test_client():
    """Build a FastAPI TestClient bound to the application instance."""
    # Imported lazily so collecting tests doesn't require the API stack.
    from fastapi.testclient import TestClient
    from src.api.main import app

    client = TestClient(app)
    return client
515
+
516
+
517
+ def test_health_endpoint(test_client):
518
+ """Test the health check endpoint."""
519
+ response = test_client.get("/health")
520
+
521
+ # Should return 200 or 503 depending on initialization
522
+ assert response.status_code in [200, 503]
523
+
524
+ # Should have JSON response
525
+ data = response.json()
526
+ assert "status" in data
527
+ assert "timestamp" in data
528
+
529
+
530
+ def test_root_endpoint(test_client):
531
+ """Test the root endpoint."""
532
+ response = test_client.get("/")
533
+
534
+ assert response.status_code == 200
535
+ data = response.json()
536
+
537
+ assert "name" in data
538
+ assert "version" in data
539
+ assert "endpoints" in data
540
+
541
+
542
+ def test_query_endpoint(test_client):
543
+ """Test the query endpoint."""
544
+ # Note: This will likely fail if system is not fully initialized
545
+ # In real testing, you'd mock the app_state
546
+
547
+ response = test_client.post(
548
+ "/query",
549
+ json={
550
+ "question": "What is glaucoma?",
551
+ "include_sources": True,
552
+ }
553
+ )
554
+
555
+ # Should return 200 if initialized, 503 if not
556
+ assert response.status_code in [200, 503]
557
+
558
+ if response.status_code == 200:
559
+ data = response.json()
560
+ assert "answer" in data
561
+ assert "query" in data
562
+ assert "confidence" in data
563
+ assert "disclaimer" in data
564
+
565
+
566
+ def test_query_endpoint_validation(test_client):
567
+ """Test query endpoint input validation."""
568
+ # Test with invalid input
569
+ response = test_client.post(
570
+ "/query",
571
+ json={
572
+ "question": "", # Empty question
573
+ }
574
+ )
575
+
576
+ # Should return validation error
577
+ assert response.status_code == 422 # Unprocessable Entity
578
+
579
+
580
+ def test_stats_endpoint(test_client):
581
+ """Test the stats endpoint."""
582
+ response = test_client.get("/stats")
583
+
584
+ # Should return 200 if initialized, 503 if not
585
+ assert response.status_code in [200, 503, 404]
586
+
587
+ if response.status_code == 200:
588
+ data = response.json()
589
+ assert "collection_info" in data
590
+ assert "pipeline_config" in data
591
+ assert "documents_indexed" in data
592
+
593
+
594
+ # ============================================================================
595
+ # Metadata Extractor Tests
596
+ # ============================================================================
597
+
598
+ def test_icd_code_extraction(metadata_extractor):
599
+ """Test ICD-10 code extraction."""
600
+ text = "Patient diagnosed with H40.1 (Primary open-angle glaucoma) and E11.3 (Type 2 diabetes with ophthalmic complications)."
601
+
602
+ icd_codes = metadata_extractor.extract_icd_codes(text)
603
+
604
+ assert "H40.1" in icd_codes
605
+ assert "E11.3" in icd_codes
606
+
607
+
608
+ def test_anatomical_term_extraction(metadata_extractor):
609
+ """Test anatomical term extraction."""
610
+ text = "The optic nerve and retina are affected. The cornea appears normal."
611
+
612
+ terms = metadata_extractor.extract_anatomical_terms(text)
613
+
614
+ assert "optic nerve" in terms
615
+ assert "retina" in terms
616
+ assert "cornea" in terms
617
+
618
+
619
+ def test_medication_extraction(metadata_extractor):
620
+ """Test medication extraction."""
621
+ text = "Prescribed latanoprost and timolol for IOP reduction."
622
+
623
+ medications = metadata_extractor.extract_medications(text)
624
+
625
+ assert "latanoprost" in medications or "timolol" in medications
626
+
627
+
628
+ def test_full_metadata_extraction(metadata_extractor):
629
+ """Test full metadata extraction."""
630
+ text = """
631
+ Patient with H40.1 primary open-angle glaucoma affecting the optic nerve.
632
+ Prescribed latanoprost drops. Vision loss and eye pain reported.
633
+ """
634
+
635
+ metadata = metadata_extractor.extract(text, existing_metadata={})
636
+
637
+ # Should extract various metadata
638
+ assert "icd_codes" in metadata
639
+ assert "anatomical_terms" in metadata
640
+ assert "medications" in metadata
641
+ assert "symptoms" in metadata
642
+
643
+
644
+ # ============================================================================
645
+ # Integration Tests
646
+ # ============================================================================
647
+
648
def test_end_to_end_chunk_to_query():
    """Wire chunker output through mocked retriever/reranker/LLM and query."""
    # Step 1: split the sample document into chunks.
    chunker = SemanticChunker(chunk_size=200, chunk_overlap=20)
    chunks = chunker.chunk_document(
        markdown_content=SAMPLE_MARKDOWN,
        metadata=SAMPLE_METADATA,
    )
    assert len(chunks) > 0

    # Step 2: wrap the first few chunks as retrieval hits.
    hits = [
        RetrievalResult(
            id=chunk.id,
            content=chunk.content,
            document_title=chunk.document_title,
            source_url=chunk.source_url,
            section=chunk.parent_section,
            metadata=chunk.metadata,
            score=0.8,
        )
        for chunk in chunks[:3]
    ]

    # Steps 3-5: stub out the heavyweight components.
    reranker = Mock(spec=CrossEncoderReranker)
    reranker.rerank.return_value = hits[:2]

    llm = Mock()
    llm.generate.return_value = "Glaucoma is an eye disease."

    retriever = Mock(spec=HybridRetriever)
    retriever.retrieve.return_value = hits

    # Step 6: assemble the engine around the stubs.
    engine = EyeWikiQueryEngine(
        retriever=retriever,
        reranker=reranker,
        llm_client=llm,
        max_context_tokens=4000,
        retrieval_k=20,
        rerank_k=5,
    )

    # Step 7: run a query and sanity-check the response.
    response = engine.query("What is glaucoma?")
    assert response.answer
    assert response.confidence > 0
tests/test_questions.json ADDED
@@ -0,0 +1,245 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "id": "q1",
4
+ "question": "What are the main symptoms of glaucoma?",
5
+ "expected_topics": [
6
+ "vision loss",
7
+ "peripheral vision",
8
+ "eye pressure",
9
+ "optic nerve damage",
10
+ "blind spots"
11
+ ],
12
+ "expected_sources": [
13
+ "Glaucoma",
14
+ "Primary Open-Angle Glaucoma"
15
+ ],
16
+ "category": "symptoms"
17
+ },
18
+ {
19
+ "id": "q2",
20
+ "question": "How is diabetic retinopathy treated?",
21
+ "expected_topics": [
22
+ "laser treatment",
23
+ "anti-VEGF",
24
+ "photocoagulation",
25
+ "vitrectomy",
26
+ "blood sugar control"
27
+ ],
28
+ "expected_sources": [
29
+ "Diabetic Retinopathy",
30
+ "Proliferative Diabetic Retinopathy"
31
+ ],
32
+ "category": "treatment"
33
+ },
34
+ {
35
+ "id": "q3",
36
+ "question": "What causes age-related macular degeneration?",
37
+ "expected_topics": [
38
+ "aging",
39
+ "macula",
40
+ "drusen",
41
+ "photoreceptor",
42
+ "central vision"
43
+ ],
44
+ "expected_sources": [
45
+ "Age-Related Macular Degeneration",
46
+ "AMD",
47
+ "Macular Degeneration"
48
+ ],
49
+ "category": "etiology"
50
+ },
51
+ {
52
+ "id": "q4",
53
+ "question": "What is the difference between open-angle and angle-closure glaucoma?",
54
+ "expected_topics": [
55
+ "drainage angle",
56
+ "trabecular meshwork",
57
+ "acute",
58
+ "chronic",
59
+ "iridotomy"
60
+ ],
61
+ "expected_sources": [
62
+ "Glaucoma",
63
+ "Primary Open-Angle Glaucoma",
64
+ "Angle-Closure Glaucoma"
65
+ ],
66
+ "category": "classification"
67
+ },
68
+ {
69
+ "id": "q5",
70
+ "question": "What are the risk factors for cataracts?",
71
+ "expected_topics": [
72
+ "age",
73
+ "diabetes",
74
+ "UV exposure",
75
+ "smoking",
76
+ "steroid"
77
+ ],
78
+ "expected_sources": [
79
+ "Cataract",
80
+ "Age-Related Cataract"
81
+ ],
82
+ "category": "risk_factors"
83
+ },
84
+ {
85
+ "id": "q6",
86
+ "question": "How is retinal detachment diagnosed?",
87
+ "expected_topics": [
88
+ "dilated eye exam",
89
+ "ophthalmoscopy",
90
+ "ultrasound",
91
+ "floaters",
92
+ "flashes"
93
+ ],
94
+ "expected_sources": [
95
+ "Retinal Detachment",
96
+ "Rhegmatogenous Retinal Detachment"
97
+ ],
98
+ "category": "diagnosis"
99
+ },
100
+ {
101
+ "id": "q7",
102
+ "question": "What medications are used to lower intraocular pressure?",
103
+ "expected_topics": [
104
+ "prostaglandin analogs",
105
+ "beta-blockers",
106
+ "alpha agonists",
107
+ "carbonic anhydrase inhibitors",
108
+ "latanoprost",
109
+ "timolol"
110
+ ],
111
+ "expected_sources": [
112
+ "Glaucoma",
113
+ "Medical Therapy for Glaucoma"
114
+ ],
115
+ "category": "pharmacology"
116
+ },
117
+ {
118
+ "id": "q8",
119
+ "question": "What is keratoconus and how is it managed?",
120
+ "expected_topics": [
121
+ "cornea",
122
+ "thinning",
123
+ "cone-shaped",
124
+ "corneal crosslinking",
125
+ "contact lenses"
126
+ ],
127
+ "expected_sources": [
128
+ "Keratoconus"
129
+ ],
130
+ "category": "corneal_disease"
131
+ },
132
+ {
133
+ "id": "q9",
134
+ "question": "What are the complications of cataract surgery?",
135
+ "expected_topics": [
136
+ "posterior capsule opacification",
137
+ "infection",
138
+ "endophthalmitis",
139
+ "cystoid macular edema",
140
+ "retinal detachment"
141
+ ],
142
+ "expected_sources": [
143
+ "Cataract Surgery",
144
+ "Phacoemulsification"
145
+ ],
146
+ "category": "complications"
147
+ },
148
+ {
149
+ "id": "q10",
150
+ "question": "How does dry eye syndrome present?",
151
+ "expected_topics": [
152
+ "burning",
153
+ "irritation",
154
+ "tear film",
155
+ "meibomian gland",
156
+ "artificial tears"
157
+ ],
158
+ "expected_sources": [
159
+ "Dry Eye",
160
+ "Dry Eye Syndrome"
161
+ ],
162
+ "category": "symptoms"
163
+ },
164
+ {
165
+ "id": "q11",
166
+ "question": "What is the pathophysiology of uveitis?",
167
+ "expected_topics": [
168
+ "inflammation",
169
+ "uvea",
170
+ "anterior",
171
+ "posterior",
172
+ "immune-mediated"
173
+ ],
174
+ "expected_sources": [
175
+ "Uveitis",
176
+ "Anterior Uveitis"
177
+ ],
178
+ "category": "pathophysiology"
179
+ },
180
+ {
181
+ "id": "q12",
182
+ "question": "What imaging modalities are used for macular disease?",
183
+ "expected_topics": [
184
+ "OCT",
185
+ "optical coherence tomography",
186
+ "fluorescein angiography",
187
+ "fundus photography",
188
+ "angiography"
189
+ ],
190
+ "expected_sources": [
191
+ "Macular Degeneration",
192
+ "OCT",
193
+ "Optical Coherence Tomography"
194
+ ],
195
+ "category": "imaging"
196
+ },
197
+ {
198
+ "id": "q13",
199
+ "question": "What is optic neuritis and what are its causes?",
200
+ "expected_topics": [
201
+ "optic nerve inflammation",
202
+ "vision loss",
203
+ "pain with eye movement",
204
+ "multiple sclerosis",
205
+ "demyelination"
206
+ ],
207
+ "expected_sources": [
208
+ "Optic Neuritis"
209
+ ],
210
+ "category": "neuro_ophthalmology"
211
+ },
212
+ {
213
+ "id": "q14",
214
+ "question": "How is proliferative diabetic retinopathy different from non-proliferative?",
215
+ "expected_topics": [
216
+ "neovascularization",
217
+ "microaneurysms",
218
+ "hemorrhages",
219
+ "vitreous hemorrhage",
220
+ "retinal ischemia"
221
+ ],
222
+ "expected_sources": [
223
+ "Diabetic Retinopathy",
224
+ "Proliferative Diabetic Retinopathy",
225
+ "Non-Proliferative Diabetic Retinopathy"
226
+ ],
227
+ "category": "classification"
228
+ },
229
+ {
230
+ "id": "q15",
231
+ "question": "What are the signs of papilledema?",
232
+ "expected_topics": [
233
+ "optic disc swelling",
234
+ "increased intracranial pressure",
235
+ "headache",
236
+ "blurred vision",
237
+ "nausea"
238
+ ],
239
+ "expected_sources": [
240
+ "Papilledema",
241
+ "Optic Disc Edema"
242
+ ],
243
+ "category": "diagnosis"
244
+ }
245
+ ]