Spaces: NinjainPJs / Ragcore


NinjainPJs Claude Opus 4.6 (1M context) committed on Mar 18

Commit

a34068e

0 Parent(s):

Initial deploy: RagCore RAG system with hybrid search and Gradio UI


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (42)

.dockerignore +14 -0
.github/workflows/ci.yml +31 -0
.gitignore +39 -0
Dockerfile +19 -0
README.md +1499 -0
app/__init__.py +0 -0
app/api/__init__.py +0 -0
app/api/deps.py +53 -0
app/api/routes/__init__.py +0 -0
app/api/routes/health.py +42 -0
app/api/routes/ingest.py +155 -0
app/api/routes/query.py +128 -0
app/config.py +58 -0
app/core/__init__.py +0 -0
app/core/bm25.py +124 -0
app/core/chunker.py +60 -0
app/core/embedder.py +43 -0
app/core/generator.py +128 -0
app/core/llm.py +88 -0
app/core/metadata.py +62 -0
app/core/query_analyzer.py +127 -0
app/core/reranker.py +52 -0
app/core/retriever.py +126 -0
app/core/vectorstore.py +219 -0
app/main.py +75 -0
app/models/__init__.py +0 -0
app/models/document.py +32 -0
app/models/schemas.py +70 -0
app/ui/__init__.py +0 -0
app/ui/gradio_app.py +427 -0
app/utils/__init__.py +0 -0
app/utils/helpers.py +51 -0
app/utils/parsers.py +76 -0
docker-compose.yml +10 -0
requirements.txt +18 -0
tests/__init__.py +0 -0
tests/conftest.py +21 -0
tests/test_api.py +23 -0
tests/test_chunker.py +42 -0
tests/test_parsers.py +35 -0
tests/test_query_analyzer.py +52 -0
tests/test_retrieval.py +56 -0

.dockerignore ADDED

	@@ -0,0 +1,14 @@

+.git
+__pycache__
+*.pyc
+.env
+.venv
+venv
+tests/
+.github/
+.pytest_cache
+.coverage
+htmlcov
+*.egg-info
+flashrank_cache/
+.cache/

.github/workflows/ci.yml ADDED

	@@ -0,0 +1,31 @@

+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+      - name: Lint
+        run: ruff check .
+      - name: Run unit tests
+        run: pytest tests/ -v --ignore=tests/test_integration.py -x
+        env:
+          GEMINI_API_KEY: "test"
+          QDRANT_URL: "http://localhost:6333"
+          QDRANT_API_KEY: "test"

.gitignore ADDED

	@@ -0,0 +1,39 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg-info/
+dist/
+build/
+*.egg
+# Virtual environment
+.venv/
+venv/
+env/
+# Environment variables
+.env
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db
+# Models cache
+flashrank_cache/
+.cache/
+# Uploads
+uploads/
+# Testing
+.pytest_cache/
+htmlcov/
+.coverage

Dockerfile ADDED

	@@ -0,0 +1,19 @@

+FROM python:3.12-slim
+WORKDIR /app
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Pre-download models so they're cached in the image layer
+RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
+RUN python -c "from flashrank import Ranker; Ranker(model_name='ms-marco-MiniLM-L-12-v2')"
+COPY . .
+EXPOSE 7860
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]

README.md ADDED

	@@ -0,0 +1,1499 @@

+---
+title: RagCore
+emoji: 🔍
+colorFrom: indigo
+colorTo: purple
+sdk: docker
+app_port: 7860
+pinned: false
+---
+# RagCore
+**A production-ready Retrieval-Augmented Generation system with hybrid search, metadata filtering, and a conversational UI.**
+RagCore solves the problem of querying unstructured documents (PDFs, text files, HTML pages) using natural language. It ingests documents, splits them into semantically meaningful chunks, indexes them in both a vector database and a BM25 keyword index, then retrieves and reranks the most relevant passages to generate grounded, citation-backed answers using Google Gemini.
+Unlike naive RAG implementations that rely solely on vector similarity, RagCore combines dense (semantic) and sparse (keyword) retrieval using Reciprocal Rank Fusion, applies a cross-encoder reranker to promote the most relevant passages, and uses an intelligent query analyzer that automatically extracts filters (date ranges, document types, sources) from natural language queries.
+---
+## Table of Contents
+1. [Architecture Overview](#architecture-overview)
+2. [Tech Stack](#tech-stack)
+3. [Project Structure](#project-structure)
+4. [Core Components Deep Dive](#core-components-deep-dive)
+5. [Data Models](#data-models)
+6. [API Reference](#api-reference)
+7. [UI Guide](#ui-guide)
+8. [Setup and Installation](#setup-and-installation)
+9. [Deployment](#deployment)
+10. [Configuration Reference](#configuration-reference)
+11. [How It Works End-to-End](#how-it-works-end-to-end)
+12. [Testing](#testing)
+13. [CI/CD](#cicd)
+14. [Performance and Limits](#performance-and-limits)
+15. [Troubleshooting](#troubleshooting)
+---
+## Architecture Overview
+RagCore is built as a FastAPI application with two main pipelines: **Ingestion** and **Query**. A Gradio-based UI is mounted directly onto the FastAPI app at `/ui`.
+### Ingestion Pipeline
+```
++------------------+     +----------------+     +-------------------+
+|   File Upload    | --> |    Parser      | --> |    Text Cleaner   |
+| (PDF/TXT/HTML)   |     | (pypdf/bs4)    |     | (regex cleanup)   |
++------------------+     +----------------+     +-------------------+
+                                                        |
+                                                        v
++------------------+     +----------------+     +-------------------+
+|  Qdrant Cloud    | <-- |   Embedder     | <-- |    Chunker        |
+|  (vector store)  |     | (MiniLM-L6-v2) |     | (sentence-aware)  |
++------------------+     +----------------+     +-------------------+
+        |                                               |
+        |                                               v
+        |                                      +-------------------+
+        +------------------------------------> |  BM25 Index       |
+                                               | (in-memory)       |
+                                               +-------------------+
+                                                        ^
+                                                        |
+                                               +-------------------+
+                                               | Metadata Extractor|
+                                               | (title/dates/tags)|
+                                               +-------------------+
+```
+**Step-by-step flow:**
+1. User uploads a file via the `/api/ingest` endpoint or the Gradio UI.
+2. The **Parser** detects file type by extension and extracts raw text (pypdf for PDFs, BeautifulSoup for HTML, direct decoding for TXT).
+3. The **Text Cleaner** normalizes whitespace, collapses blank lines, and trims each line.
+4. The **Metadata Extractor** pulls out the document title (first non-empty line), dates (via regex patterns), and tags (frequent capitalized phrases).
+5. The **Chunker** splits text into overlapping chunks at sentence boundaries, respecting a configurable word-count limit.
+6. The **Embedder** encodes each chunk into a 384-dimensional vector using the `all-MiniLM-L6-v2` sentence transformer.
+7. Chunks with their vectors and payload metadata are upserted into **Qdrant Cloud** in batches of 100.
+8. The same chunks are added to the in-memory **BM25 index** for keyword search.
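A minimal sketch of the Text Cleaner step (step 3 above), not taken from the commit. The real `clean_text()` lives in `app/utils/helpers.py` and is not shown in this part of the diff, so the exact regexes here are assumptions that only mirror the described behavior: normalize whitespace, collapse blank lines, trim each line.

```python
import re


def clean_text(text: str) -> str:
    """Illustrative cleanup: trim lines, squeeze spaces/tabs, collapse blank runs."""
    # Collapse runs of spaces/tabs inside each line, then trim the line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    # Collapse three or more newlines into a single blank line (assumed policy).
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


print(repr(clean_text("  Hello   world \n\n\n\nNext\tline  ")))
```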
+### Query Pipeline
+```
++------------------+     +-------------------+     +------------------+
+|   User Query     | --> |  Query Analyzer   | --> |  Hybrid Retriever|
+| "What is RAG     |     | (intent, filters, |     |                  |
+|  from PDFs?"     |     |  cleaned query)   |     |  +----------+   |
++------------------+     +-------------------+     |  |Dense     |   |
+                                                   |  |(Qdrant)  |   |
+                                                   |  +----------+   |
+                                                   |       |         |
+                                                   |  +----------+   |
+                                                   |  |Sparse    |   |
+                                                   |  |(BM25)    |   |
+                                                   |  +----------+   |
+                                                   |       |         |
+                                                   |  +----------+   |
+                                                   |  |RRF Fusion|   |
+                                                   |  +----------+   |
+                                                   +------------------+
+                                                          |
+                                                          v
+                         +-------------------+     +------------------+
+                         |  Answer Generator | <-- |   Reranker       |
+                         | (Gemini Flash)    |     | (FlashRank)      |
+                         +-------------------+     +------------------+
+                                |
+                                v
+                         +-------------------+
+                         |  Cited Answer     |
+                         |  with Sources     |
+                         +-------------------+
+```
+**Step-by-step flow:**
+1. User submits a natural language query.
+2. The **Query Analyzer** classifies intent (factual, summarize, comparative, list, explanatory), extracts inline filters (doc type, date range, source filename), and produces a cleaned query.
+3. The **Hybrid Retriever** runs two parallel searches:
+   - **Dense search**: encodes the query with the same embedding model, queries Qdrant with cosine similarity, fetching `top_k * 2` results.
+   - **Sparse search**: tokenizes the query and scores all chunks via BM25Okapi, also fetching `top_k * 2` results.
+4. Results are fused using **Reciprocal Rank Fusion (RRF)** with configurable weights (default: 0.6 dense, 0.4 sparse).
+5. The top-K fused results are passed to the **Reranker** (FlashRank cross-encoder), which rescores them and keeps the best `RERANK_TOP_K` passages (default: 5).
+6. The **Answer Generator** builds a prompt with numbered context passages and sends it to **Google Gemini Flash**, which generates a cited, markdown-formatted answer.
+7. The answer is returned with source references (streaming or non-streaming).
+---
+## Tech Stack
+| Technology | Version | Purpose |
+|---|---|---|
+| **Python** | 3.12 | Runtime language. Chosen for its ML/NLP ecosystem. |
+| **FastAPI** | >=0.110 | Async web framework. High performance, automatic OpenAPI docs, dependency injection. |
+| **Uvicorn** | >=0.29 | ASGI server for running FastAPI in production. |
+| **Pydantic** | >=2.6 | Data validation and serialization for all request/response models. |
+| **pydantic-settings** | >=2.2 | Environment-based configuration with `.env` file support. |
+| **sentence-transformers** | >=2.6 | Embedding model loading and inference (`all-MiniLM-L6-v2`). Chosen for fast CPU inference and high quality at 384 dimensions. |
+| **qdrant-client** | >=1.8 | Client for Qdrant vector database. Chosen for its generous free tier (1GB), filtering support, and payload storage. |
+| **rank-bm25** | >=0.2.2 | BM25Okapi implementation for sparse keyword retrieval. Lightweight, pure-Python, no external dependencies. |
+| **FlashRank** | >=0.2 | Ultra-fast cross-encoder reranker (`ms-marco-MiniLM-L-12-v2`). Runs on CPU, no GPU required. |
+| **google-generativeai** | >=0.5 | Official Google Gemini SDK. Gemini 2.0 Flash offers a free tier with 15 RPM. |
+| **Gradio** | >=4.20 | Web UI framework mounted directly on FastAPI. Two-tab interface for Q&A and document management. |
+| **pypdf** | >=4.1 | PDF text extraction. Handles most PDF formats without external system dependencies. |
+| **beautifulsoup4** | >=4.12 | HTML parsing with tag stripping (removes scripts, styles, nav, footer, header). |
+| **httpx** | >=0.27 | Async/sync HTTP client used by the Gradio UI to call the FastAPI backend. |
+| **python-multipart** | >=0.0.9 | Required by FastAPI for file upload support. |
+| **python-dateutil** | >=2.9 | Fuzzy date parsing for the query analyzer's absolute date extraction. |
+| **Ruff** | >=0.3 | Fast Python linter. Used in CI for code quality checks. |
+| **pytest** | >=8.0 | Test framework. Unit tests for chunker, parsers, query analyzer, retrieval, and API. |
+| **Docker** | - | Containerization. Pre-downloads ML models in the build step for fast cold starts. |
+---
+## Project Structure
+```
+ragcore/
+|-- .github/
+|   +-- workflows/
+|       +-- ci.yml                  # GitHub Actions CI pipeline (lint + test)
+|-- app/
+|   |-- __init__.py
+|   |-- config.py                   # Settings class with all env vars, setup_logging()
+|   |-- main.py                     # FastAPI app creation, lifespan, middleware, routing
+|   |-- api/
+|   |   |-- __init__.py
+|   |   |-- deps.py                 # Dependency injection factories for all services
+|   |   +-- routes/
+|   |       |-- __init__.py
+|   |       |-- health.py           # GET /health endpoint
+|   |       |-- ingest.py           # POST /api/ingest, GET /api/documents, DELETE /api/documents/{id}
+|   |       +-- query.py            # POST /api/search, POST /api/ask (with streaming)
+|   |-- core/
+|   |   |-- __init__.py
+|   |   |-- bm25.py                 # BM25 index: tokenization, search, rebuild from vectorstore
+|   |   |-- chunker.py              # Sentence-aware text chunking with overlap
+|   |   |-- embedder.py             # SentenceTransformer embedding service
+|   |   |-- generator.py            # Answer generation with prompt templates and streaming
+|   |   |-- llm.py                  # Gemini API client with rate limiting
+|   |   |-- metadata.py             # Metadata extraction (title, dates, tags)
+|   |   |-- query_analyzer.py       # Query intent classification and filter extraction
+|   |   |-- reranker.py             # FlashRank cross-encoder reranking
+|   |   |-- retriever.py            # Hybrid retriever with RRF fusion
+|   |   +-- vectorstore.py          # Qdrant client wrapper (CRUD, search, filtering)
+|   |-- models/
+|   |   |-- __init__.py
+|   |   |-- document.py             # DocumentMetadata, Chunk, Document models
+|   |   +-- schemas.py              # API request/response schemas (IngestResponse, QueryRequest, etc.)
+|   |-- ui/
+|   |   |-- __init__.py
+|   |   +-- gradio_app.py           # Gradio Blocks UI (Ask tab, Documents tab)
+|   +-- utils/
+|       |-- __init__.py
+|       |-- helpers.py              # generate_id, clean_text, count_words, timer, retry_with_backoff
+|       +-- parsers.py              # File parsing (PDF, TXT, HTML) and page count extraction
+|-- tests/
+|   |-- __init__.py
+|   |-- conftest.py                 # Shared fixtures (TestClient, sample_text)
+|   |-- test_api.py                 # API integration tests (health, redirect, docs)
+|   |-- test_chunker.py             # Chunker unit tests (empty, single, multiple, overlap)
+|   |-- test_parsers.py             # Parser unit tests (UTF-8, Latin-1, HTML, unsupported)
+|   |-- test_query_analyzer.py      # Query analyzer tests (intents, filters, dates)
+|   +-- test_retrieval.py           # RRF fusion tests (basic, empty, weights, filters)
+|-- .dockerignore
+|-- .env                            # Environment variables (not committed to git)
+|-- .gitignore
+|-- Dockerfile                      # Python 3.12-slim, pre-downloads ML models
+|-- docker-compose.yml              # Single-service compose with env_file
++-- requirements.txt                # All Python dependencies with version constraints
+```
+---
+## Core Components Deep Dive
+### Parsers (`app/utils/parsers.py`)
+**What it does:** Extracts raw text from uploaded files based on their extension.
+**Supported formats:** `.pdf`, `.txt`, `.html`, `.htm`
+**How it works internally:**
+- `parse_document(file_bytes, filename)` is the main dispatcher. It reads the file extension and calls the appropriate parser.
+- **PDF parsing** uses `pypdf.PdfReader` to iterate over all pages, extract text from each, and join them with double newlines.
+- **HTML parsing** uses `BeautifulSoup` with the `html.parser` backend. Before extracting text, it decomposes `<script>`, `<style>`, `<nav>`, `<footer>`, and `<header>` tags to remove boilerplate content. Text is extracted with `get_text(separator="\n")`.
+- **TXT parsing** attempts UTF-8 decoding first, falling back to Latin-1 for non-UTF-8 files.
+- All parsers pass their output through `clean_text()` for normalization.
+**Key functions:**
+```python
+def parse_document(file_bytes: bytes, filename: str) -> str
+def parse_pdf(file_bytes: bytes, filename: str) -> str
+def parse_text(file_bytes: bytes, filename: str) -> str
+def parse_html(file_bytes: bytes, filename: str) -> str
+def get_page_count(file_bytes: bytes, filename: str) -> int | None
+```
+**Configuration:** No direct configuration. File size is validated at the API layer (`max_file_size_mb`).
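A small sketch of two behaviors described above: extension-based dispatch and the UTF-8 to Latin-1 fallback for TXT files. The helper names `detect_extension` and `decode_text` are hypothetical and do not appear in the commit; only the supported extensions and the fallback order come from the README.

```python
import os

# Extensions the README lists as supported.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".html", ".htm"}


def detect_extension(filename: str) -> str:
    """Return the lowercased extension, rejecting unsupported file types."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    return ext


def decode_text(file_bytes: bytes) -> str:
    """Try UTF-8 first; fall back to Latin-1, which accepts any byte sequence."""
    try:
        return file_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return file_bytes.decode("latin-1")
```

Latin-1 maps every byte to a code point, so the fallback never raises, though it may mis-render text that was actually in another encoding.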
+---
+### Chunker (`app/core/chunker.py`)
+**What it does:** Splits raw text into overlapping chunks at sentence boundaries, sized by word count.
+**How it works internally:**
+1. Text is split into sentences using the regex pattern `(?<=[.!?])\s+` (splits after sentence-ending punctuation followed by whitespace).
+2. Whole sentences are appended to the current chunk while a running word count is kept.
+3. When adding the next sentence would exceed `chunk_size` words, the current chunk is finalized.
+4. Overlap is implemented by retaining the last `chunk_overlap` words from the previous chunk as the start of the new chunk.
+5. Each chunk records its `text`, `start_char`, `end_char`, and `chunk_index`.
+**Key function:**
+```python
+def chunk_text(
+    text: str,
+    chunk_size: int = 512,      # Maximum words per chunk
+    chunk_overlap: int = 50,    # Number of overlapping words between consecutive chunks
+) -> list[dict]
+```
+**Return format:** Each dict contains `{"text": str, "start_char": int, "end_char": int, "chunk_index": int}`.
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `CHUNK_SIZE` | 512 | Maximum number of words per chunk |
+| `CHUNK_OVERLAP` | 50 | Number of overlapping words between consecutive chunks |
+**Design note:** Sentence-aware splitting avoids cutting mid-sentence, which improves both retrieval relevance and answer generation quality compared to fixed-character splitting.
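The chunking steps above can be sketched as follows. This is a simplified illustration, not the code from `app/core/chunker.py`: it omits the `start_char`/`end_char` bookkeeping, and the handling of a single sentence longer than `chunk_size` (kept whole here) is an assumption.

```python
import re


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[dict]:
    """Split text at sentence boundaries into word-limited, overlapping chunks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []  # words in the chunk being built
    for sentence in sentences:
        words = sentence.split()
        # Finalize the chunk if this sentence would push it past the limit.
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            # Carry the last `chunk_overlap` words into the next chunk.
            current = current[-chunk_overlap:] if chunk_overlap > 0 else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return [{"text": c, "chunk_index": i} for i, c in enumerate(chunks)]


demo = chunk_text("One two three. Four five six. Seven eight nine.", chunk_size=6, chunk_overlap=2)
for chunk in demo:
    print(chunk)
```

With `chunk_size=6` and `chunk_overlap=2`, the second chunk begins with the last two words of the first, which is the overlap behavior step 4 describes.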
+---
+### Metadata Extractor (`app/core/metadata.py`)
+**What it does:** Automatically extracts structured metadata from raw document text.
+**How it works internally:**
+- **Title extraction:** Scans lines from the top of the document, returning the first non-empty line with more than 3 characters (truncated to 200 chars).
+- **Date extraction:** Searches the first 2000 characters for dates using three regex patterns:
+  - `YYYY-MM-DD` (ISO format)
+  - `MM/DD/YYYY` (US format)
+  - `Month DD, YYYY` (long format, e.g., "January 15, 2024")
+- **Tag extraction:** Finds all capitalized phrases (e.g., "Machine Learning", "Neural Network") using regex, counts their occurrences, and returns the top 10 that appear at least twice. Tags are lowercased before returning.
+- **Doc type:** Derived from the file extension (e.g., "pdf", "html", "txt").
+**Key function:**
+```python
+def extract_metadata(raw_text: str, filename: str, page_count: int | None = None) -> DocumentMetadata
+```
+**Supporting functions:**
+```python
+def extract_title(text: str) -> str | None
+def extract_dates(text: str) -> datetime | None
+def extract_tags(text: str, max_tags: int = 10) -> list[str]
+```
+---
+### Embedder (`app/core/embedder.py`)
+**What it does:** Converts text into dense vector representations using a sentence transformer model.
+**How it works internally:**
+- Uses `sentence-transformers` to load the `all-MiniLM-L6-v2` model on CPU at startup.
+- Encodes text in batches of 64 with L2 normalization enabled (so cosine similarity is equivalent to dot product).
+- The model produces 384-dimensional embeddings.
+- Singleton pattern via `get_embedder()` ensures the model is loaded only once.
+**Key class:** `EmbedderService`
+```python
+class EmbedderService:
+    EMBEDDING_DIM = 384
+    def __init__(self, model_name: str)
+    def embed_texts(self, texts: list[str]) -> list[list[float]]   # Batch embedding
+    def embed_query(self, query: str) -> list[float]                # Single query embedding
+```
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | HuggingFace sentence-transformers model name |
+| `EMBEDDING_DIM` | 384 | Embedding vector dimensionality |
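The note that L2 normalization makes cosine similarity equal to the dot product can be checked with a few lines of plain Python. This is a standalone illustration with toy 2-dimensional vectors, not the 384-dimensional model output, and none of these helpers come from the commit.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# Both expressions give the same value once the inputs are unit-length.
print(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```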
+---
+### Vector Store -- Qdrant (`app/core/vectorstore.py`)
+**What it does:** Manages all interactions with the Qdrant vector database: collection management, upserting chunks, searching, filtering, scrolling, and deleting.
+**How it works internally:**
+- On initialization, connects to Qdrant Cloud using the provided URL and API key.
+- `ensure_collection()` checks if the collection exists; if not, creates it with cosine distance and the configured vector size.
+- **Upsert:** Chunks are uploaded in batches of 100 as `PointStruct` objects, with the chunk text and all metadata stored in the payload.
+- **Search:** Uses `query_points()` with an optional `Filter` object built from `SearchFilters`. Over-fetches `top_k * 2` results to give the fusion step more candidates.
+- **Filtering:** Supports exact match on `source` and `doc_type`, `MatchAny` on `tags`, and `Range` on `created_date`.
+- **Scroll:** Iterates through all points in the collection using offset-based pagination (batch size 100). Used to rebuild the BM25 index on startup.
+- **Document listing:** Aggregates all points by `document_id` to return a list of unique documents with chunk counts.
+**Key class:** `VectorStoreService`
+```python
+class VectorStoreService:
+    def __init__(self, url: str, api_key: str, collection_name: str)
+    def ensure_collection(self, vector_size: int = 384) -> None
+    def upsert_chunks(self, chunks: list[Chunk], embeddings: list[list[float]]) -> None
+    def search(self, query_vector: list[float], limit: int = 10, filters: SearchFilters | None = None) -> list[dict]
+    def delete_document(self, document_id: str) -> int
+    def scroll_all(self, batch_size: int = 100) -> list[dict]
+    def get_document_ids(self) -> list[dict]
+    def count(self) -> int
+```
+**Payload schema stored per point:**
+```json
+{
+    "text": "chunk text content",
+    "document_id": "uuid-string",
+    "chunk_index": 0,
+    "source": "filename.pdf",
+    "doc_type": "pdf",
+    "title": "Document Title or null",
+    "created_date": "2024-01-15T00:00:00 or null",
+    "tags": ["machine learning", "neural networks"],
+    "page_count": 12
+}
+```
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `QDRANT_URL` | (required) | Qdrant Cloud cluster URL |
+| `QDRANT_API_KEY` | (required) | Qdrant Cloud API key |
+| `QDRANT_COLLECTION` | `ragcore_docs` | Collection name in Qdrant |
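Two of the behaviors above, batching in groups of 100 and aggregating points by `document_id`, do not depend on Qdrant itself and can be sketched with plain Python. Both helper names (`batches`, `list_documents`) are hypothetical, and the payload dicts stand in for whatever the scroll call returns.

```python
from collections.abc import Iterator


def batches(items: list, size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def list_documents(payloads: list[dict]) -> list[dict]:
    """Group point payloads by document_id and count chunks per document."""
    docs: dict[str, dict] = {}
    for payload in payloads:
        doc = docs.setdefault(
            payload["document_id"],
            {"document_id": payload["document_id"],
             "source": payload.get("source"),
             "chunk_count": 0},
        )
        doc["chunk_count"] += 1
    return list(docs.values())


points = [
    {"document_id": "d1", "source": "a.pdf"},
    {"document_id": "d1", "source": "a.pdf"},
    {"document_id": "d2", "source": "b.txt"},
]
print(list_documents(points))
```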
+---
+### BM25 Index (`app/core/bm25.py`)
+**What it does:** Maintains an in-memory BM25 keyword index for sparse retrieval alongside the dense vector search.
+**How it works internally:**
+- **Tokenization:** Text is lowercased, split into words via `\b\w+\b`, then filtered to remove stop words (58 common English words) and single-character tokens.
+- Uses `rank_bm25.BM25Okapi`, which implements the Okapi BM25 scoring formula:
+  ```
+  score(D, Q) = SUM[ IDF(q) * (f(q,D) * (k1+1)) / (f(q,D) + k1 * (1 - b + b * |D|/avgdl)) ]
+  ```
+- On startup, the index is rebuilt from all existing points in Qdrant via `rebuild_from_vectorstore()`, which scrolls through all stored chunks.
+- When new documents are ingested, `add_documents()` appends them and rebuilds the full BM25 corpus (the index is not incremental -- it rebuilds from the full document list).
+- Search returns scored results filtered to only those with `score > 0`.
+**Key class:** `BM25Index`
+```python
+class BM25Index:
+    def __init__(self)
+    def build_index(self, chunks: list[Chunk]) -> None
+    def add_documents(self, chunks: list[Chunk]) -> None
+    def search(self, query: str, top_k: int = 10) -> list[dict]
+    def rebuild_from_vectorstore(self, vectorstore) -> None
+    @property
+    def doc_count(self) -> int
+```
+**Tokenization function:**
+```python
+def tokenize(text: str) -> list[str]
+```
+**Design note:** The in-memory approach means the BM25 index is rebuilt on every application restart (from Qdrant data). This is acceptable for small-to-medium collections (thousands of chunks) but would need a persistent store for larger deployments.
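To make the tokenization and scoring formula concrete, here is a self-contained sketch that does not use `rank_bm25`. The stop-word set is a small illustrative subset (not the 58-word list the README mentions), the IDF variant and the `k1`/`b` values are assumptions, and `BM25Okapi` may compute IDF slightly differently.

```python
import math
import re

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}  # illustrative subset


def tokenize(text: str) -> list[str]:
    """Lowercase, split on word boundaries, drop stop words and 1-char tokens."""
    return [t for t in re.findall(r"\b\w+\b", text.lower())
            if t not in STOP_WORDS and len(t) > 1]


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the formula shown above."""
    corpus = [tokenize(d) for d in docs]
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n if n else 0.0
    query_terms = set(tokenize(query))
    scores = []
    for doc in corpus:
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in corpus if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # assumed IDF variant
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


print(bm25_scores("cat", ["the cat sat", "dogs bark loudly"]))
```

Documents that share no term with the query score exactly zero, which is why filtering on `score > 0` removes them.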
+---
+### Hybrid Retriever with RRF (`app/core/retriever.py`)
+**What it does:** Combines dense (vector) and sparse (BM25) retrieval results using Reciprocal Rank Fusion.
+**How it works internally:**
+1. Embeds the query using the same `EmbedderService`.
+2. Runs a dense search via Qdrant, fetching `top_k * 2` candidates (over-fetch to give fusion more options).
+3. Runs a BM25 search, also fetching `top_k * 2` candidates.
+4. If filters were provided, applies them post-hoc to BM25 results (since BM25 does not natively support metadata filtering).
+5. Fuses both result lists using the **RRF formula**:
+```
+RRF_score(d) = SUM_over_lists[ weight_i * 1 / (k + rank_i(d)) ]
+```
+Where `k = 60` (smoothing constant), `rank_i(d)` is the rank of document `d` in list `i` (0-indexed), and `weight_i` is the list weight (default: 0.6 for dense, 0.4 for sparse).
+6. Deduplicates by `chunk_id` and returns the top-K results as `RetrievedChunk` objects.
+**Key class:** `HybridRetriever`
+```python
+class HybridRetriever:
+    def __init__(self, vectorstore: VectorStoreService, bm25: BM25Index, embedder: EmbedderService)
+    def retrieve(self, query: str, top_k: int = 10, filters: SearchFilters | None = None,
+                 dense_weight: float = 0.6, sparse_weight: float = 0.4) -> list[RetrievedChunk]
+    @staticmethod
+    def rrf_fuse(result_lists: list[list[dict]], k: int = 60,
+                 weights: list[float] | None = None) -> list[dict]
+    @staticmethod
+    def _apply_filters(results: list[dict], filters: SearchFilters) -> list[dict]
+```
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `TOP_K` | 10 | Number of chunks to return from retrieval |
+| `DENSE_WEIGHT` | 0.6 | Weight for dense (vector) search in RRF |
+| `SPARSE_WEIGHT` | 0.4 | Weight for sparse (BM25) search in RRF |
+**Why RRF?** Reciprocal Rank Fusion is a score-agnostic fusion method. Since BM25 scores and cosine similarity scores are on different scales, RRF uses only rank positions, making it a robust choice for combining heterogeneous retrieval signals.
+---
+### Reranker (`app/core/reranker.py`)
+**What it does:** Rescores retrieved chunks using a cross-encoder model to improve ranking precision.
+**How it works internally:**
+- Uses FlashRank with the `ms-marco-MiniLM-L-12-v2` model, which is a lightweight cross-encoder trained on the MS MARCO passage ranking dataset.
+- Unlike embedding models (which encode query and document independently), cross-encoders process the query-document pair jointly, allowing richer interaction signals.
+- Input: the query string and a list of `RetrievedChunk` objects from the hybrid retriever.
+- Output: the top `rerank_top_k` chunks reordered by cross-encoder score.
+- The reranker model is cached in `./flashrank_cache/` to avoid re-downloading on each startup.
+**Key class:** `RerankerService`
+```python
+class RerankerService:
+    def __init__(self)
+    def rerank(self, query: str, chunks: list[RetrievedChunk], top_k: int = 5) -> list[RetrievedChunk]
+```
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `RERANK_TOP_K` | 5 | Number of chunks to keep after reranking |
+---
+### LLM Client (`app/core/llm.py`)
+**What it does:** Manages all communication with the Google Gemini API, including rate limiting and streaming.
+**How it works internally:**
+- Configures the `google.generativeai` library with the provided API key.
+- Instantiates a `GenerativeModel` for the configured model name (default: `gemini-2.0-flash`).
+- **Rate limiting:** Enforces a minimum interval between API calls based on `rpm_limit`. For the free tier (15 RPM), the minimum interval is 4 seconds. Uses `time.sleep()` for synchronous calls and `asyncio.sleep()` for async calls.
+- **Synchronous generation:** `generate(prompt, temperature, max_tokens)` returns the full response text.
+- **Streaming generation:** `generate_stream(prompt, temperature, max_tokens)` is an async generator that yields text chunks as they arrive from the API.
+**Key class:** `GeminiService`
+```python
+class GeminiService:
+    def __init__(self, api_key: str, model_name: str, rpm_limit: int = 15)
+    def generate(self, prompt: str, temperature: float = 0.3, max_tokens: int = 2048) -> str
+    async def generate_stream(self, prompt: str, temperature: float = 0.3,
+                               max_tokens: int = 2048) -> AsyncGenerator[str, None]
+```
+**Configuration:**
+| Setting | Default | Description |
+|---|---|---|
+| `GEMINI_API_KEY` | (required) | Google Gemini API key |
+| `GEMINI_MODEL` | `gemini-2.0-flash` | Gemini model identifier |
+| `GEMINI_RPM_LIMIT` | 15 | Requests per minute limit |
+| `GEMINI_TEMPERATURE` | 0.3 | Generation temperature (lower = more deterministic) |
+| `GEMINI_MAX_TOKENS` | 2048 | Maximum output tokens per generation |
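The minimum-interval rate limiting described above (60 / 15 RPM = 4 seconds between calls) can be sketched as below. `MinIntervalLimiter` is a hypothetical class, not the one in `app/core/llm.py`; the injectable `clock` and `sleep` parameters are an assumption added so the behavior can be exercised without real waiting.

```python
import time


class MinIntervalLimiter:
    """Block until at least 60 / rpm_limit seconds have passed since the last call."""

    def __init__(self, rpm_limit: int = 15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm_limit
        self._clock = clock
        self._sleep = sleep
        self._last_call: float | None = None

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds waited."""
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited


limiter = MinIntervalLimiter(rpm_limit=15)
print(limiter.min_interval)  # seconds between calls at 15 RPM
```

An async variant would follow the same arithmetic with `await asyncio.sleep(remaining)` in place of the blocking sleep.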
+---
+### Query Analyzer (`app/core/query_analyzer.py`)
+**What it does:** Parses natural language queries to extract intent, metadata filters, and a cleaned query string.
+**How it works internally:**
+The analyzer performs multiple regex-based extractions in sequence:
+1. **Document type extraction:** Matches patterns like "PDFs", "pdf", "HTML", "text files", "txt" and sets the `doc_type` filter.
+2. **Relative date extraction:** Matches temporal phrases like "last week", "last month", "this year", "today", "yesterday" and converts them to `date_from`/`date_to` datetime ranges.
+3. **Absolute date extraction:** Matches "after {date}" and "before {date}" patterns. Uses `python-dateutil` for fuzzy parsing of the date string.
+4. **Source extraction:** Matches "from {filename.ext}" patterns to filter by specific source file.
+5. **Query cleaning:** Removes all matched filter phrases from the query, collapses whitespace, and strips dangling prepositions (about, from, in, on).
+6. **Intent classification:** Matches the original query against patterns for five intent types:
+   - `summarize` -- "summarize", "summary", "overview"
+   - `comparative` -- "compare", "difference", "vs", "versus"
+   - `list` -- "list", "enumerate", "what are all"
+   - `explanatory` -- starts with "why", "how", "explain"
+   - `factual` -- starts with "what", "who", "when", "where", "how many/much" (default fallback)
+7. **Confidence scoring:** Starts at 0.5, incremented by 0.1 for each filter successfully extracted, capped at 1.0.
+**Key class:** `QueryAnalyzer`
+```python
+class QueryAnalyzer:
+    def analyze(self, query: str) -> AnalyzedQuery
+```
+**Example:**
+Input: `"summarize PDFs from last month"`
+Output:
+```json
+{
+    "original_query": "summarize PDFs from last month",
+    "clean_query": "summarize",
+    "intent": "summarize",
+    "extracted_filters": {
+        "doc_type": "pdf",
+        "date_from": "2026-02-17T00:00:00",
+        "date_to": "2026-03-17T00:00:00"
+    },
+    "confidence": 0.7
+}
+```
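The intent and confidence rules described above can be sketched roughly as follows. This is a minimal illustration and not the code in `query_analyzer.py`: the regex patterns and the helper names `classify_intent`, `extract_doc_type`, and `confidence` are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical patterns; the real ones in query_analyzer.py may differ.
DOC_TYPE_PATTERNS = {
    "pdf": re.compile(r"\bpdfs?\b", re.IGNORECASE),
    "html": re.compile(r"\bhtml\b", re.IGNORECASE),
    "txt": re.compile(r"\b(?:text files?|txt)\b", re.IGNORECASE),
}

# Checked in order; "factual" is the fallback when nothing matches.
INTENT_PATTERNS = [
    ("summarize", re.compile(r"\b(?:summarize|summary|overview)\b", re.IGNORECASE)),
    ("comparative", re.compile(r"\b(?:compare|difference|vs|versus)\b", re.IGNORECASE)),
    ("list", re.compile(r"\b(?:list|enumerate|what are all)\b", re.IGNORECASE)),
    # "how many" / "how much" are excluded so they fall through to "factual".
    ("explanatory", re.compile(r"^(?:why|how|explain)\b(?!\s+(?:many|much)\b)", re.IGNORECASE)),
]


def classify_intent(query: str) -> str:
    """Return the first matching intent, or "factual" as the default."""
    stripped = query.strip()
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(stripped):
            return intent
    return "factual"


def extract_doc_type(query: str) -> Optional[str]:
    """Return the first document type mentioned in the query, if any."""
    for doc_type, pattern in DOC_TYPE_PATTERNS.items():
        if pattern.search(query):
            return doc_type
    return None


def confidence(num_filters: int) -> float:
    """Start at 0.5, add 0.1 per extracted filter, cap at 1.0."""
    return round(min(1.0, 0.5 + 0.1 * num_filters), 2)
```

With two extracted filters (document type and date range), `confidence(2)` gives the 0.7 shown in the example output above.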
+---
+### Answer Generator (`app/core/generator.py`)
+**What it does:** Builds a prompt from retrieved chunks and generates a cited answer using the LLM.
+**How it works internally:**
+1. **Reranking:** Calls the `RerankerService` to narrow the retrieved chunks to `rerank_top_k`.
+2. **Context building:** Formats each reranked chunk as a numbered passage with its source filename:
+   ```
+   [1] (Source: report.pdf)
+   Chunk text content here...
+   [2] (Source: notes.txt)
+   Another chunk text...
+   ```
+3. **Prompt selection:** Uses `SYSTEM_PROMPT` for most intents and `SUMMARY_PROMPT` when the intent is "summarize".
+4. **Prompt rules instruct the LLM to:**
+   - Answer based ONLY on the provided context
+   - Cite sources inline using [1], [2], etc.
+   - Admit when context is insufficient
+   - Use markdown formatting
+5. **Streaming:** The `generate_answer_stream()` async generator yields text chunks during generation, then yields a final `GeneratedAnswer` object with source metadata.
+**Key class:** `AnswerGenerator`
+```python
+class AnswerGenerator:
+    def __init__(self, llm: GeminiService, reranker: RerankerService)
+    def generate_answer(self, query: str, chunks: list[RetrievedChunk],
+                        rerank_top_k: int = 5, intent: str = "factual") -> GeneratedAnswer
+    async def generate_answer_stream(self, query: str, chunks: list[RetrievedChunk],
+                                      rerank_top_k: int = 5, intent: str = "factual") -> AsyncGenerator
+```
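The context-building step can be illustrated with a small helper. This is a sketch under assumptions: `build_context` is a hypothetical name, chunks are reduced to `(source, text)` pairs, and the blank line between passages is a guess, since the README shows only the passage layout.

```python
def build_context(chunks: list[tuple[str, str]]) -> str:
    """Format (source, text) pairs as numbered passages, starting at [1]."""
    passages = []
    for number, (source, text) in enumerate(chunks, start=1):
        passages.append(f"[{number}] (Source: {source})\n{text}")
    # Assumption: passages are separated by one blank line.
    return "\n\n".join(passages)
```

Numbering from 1 keeps the passage labels aligned with the inline `[1]`, `[2]` citations the prompt asks the LLM to produce.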
+---
+## Data Models
+All models are defined using Pydantic v2 and live in `app/models/`.
+### Core Document Models (`app/models/document.py`)
+#### `DocumentMetadata`
+Stores extracted metadata for a document or chunk.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `source` | `str` | `""` | Original filename |
+| `doc_type` | `str` | `""` | File type without dot (e.g., "pdf", "html", "txt") |
+| `title` | `str \| None` | `None` | Extracted title (first meaningful line) |
+| `created_date` | `datetime \| None` | `None` | Extracted date from document content |
+| `tags` | `list[str]` | `[]` | Auto-extracted topic tags |
+| `page_count` | `int \| None` | `None` | Number of pages (PDFs only) |
+#### `Chunk`
+Represents a single text chunk derived from a document.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `chunk_id` | `str` | `uuid4()` | Unique chunk identifier |
+| `document_id` | `str` | `""` | Parent document identifier |
+| `text` | `str` | `""` | Chunk text content |
+| `metadata` | `DocumentMetadata` | `{}` | Inherited document metadata |
+| `chunk_index` | `int` | `0` | Position of this chunk in the document |
+| `start_char` | `int` | `0` | Start character offset in original text |
+| `end_char` | `int` | `0` | End character offset in original text |
+#### `Document`
+Represents a full ingested document.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `document_id` | `str` | `uuid4()` | Unique document identifier |
+| `filename` | `str` | `""` | Original filename |
+| `metadata` | `DocumentMetadata` | `{}` | Extracted metadata |
+| `chunks` | `list[Chunk]` | `[]` | Child chunks (populated during ingestion) |
+| `raw_text` | `str` | `""` | Full extracted text |
+### API Schemas (`app/models/schemas.py`)
+#### `IngestResponse`
+Returned after successful document ingestion.
+| Field | Type | Description |
+|---|---|---|
+| `document_id` | `str` | Assigned UUID |
+| `filename` | `str` | Original filename |
+| `num_chunks` | `int` | Number of chunks created |
+| `message` | `str` | Human-readable success message |
+#### `SearchFilters`
+Used for metadata filtering in search and query operations.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `source` | `str \| None` | `None` | Filter by exact source filename |
+| `doc_type` | `str \| None` | `None` | Filter by document type |
+| `date_from` | `datetime \| None` | `None` | Filter documents created on or after this date |
+| `date_to` | `datetime \| None` | `None` | Filter documents created on or before this date |
+| `tags` | `list[str] \| None` | `None` | Filter by any matching tag |
+#### `RetrievedChunk`
+A chunk returned from retrieval, with its relevance score and rank.
+| Field | Type | Description |
+|---|---|---|
+| `chunk_id` | `str` | Chunk identifier |
+| `document_id` | `str` | Parent document identifier |
+| `text` | `str` | Chunk text |
+| `score` | `float` | Relevance score (RRF-fused or reranker score) |
+| `metadata` | `DocumentMetadata` | Chunk metadata |
+| `rank` | `int` | Position in the result list (0-indexed) |
+#### `SearchRequest`
+Request body for the `/api/search` endpoint.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `query` | `str` | (required) | Natural language search query |
+| `top_k` | `int` | `10` | Number of results to return |
+| `filters` | `SearchFilters \| None` | `None` | Optional explicit filters (overrides auto-extraction) |
+#### `SearchResponse`
+Response from the `/api/search` endpoint.
+| Field | Type | Description |
+|---|---|---|
+| `query` | `str` | Original query |
+| `results` | `list[RetrievedChunk]` | Retrieved and ranked chunks |
+| `total_results` | `int` | Number of results returned |
+| `search_time_ms` | `float` | Total search time in milliseconds |
+#### `QueryRequest`
+Request body for the `/api/ask` endpoint.
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `query` | `str` | (required) | Natural language question |
+| `top_k` | `int` | `10` | Number of chunks to retrieve |
+| `rerank_top_k` | `int` | `5` | Number of chunks to keep after reranking |
+| `filters` | `SearchFilters \| None` | `None` | Optional explicit filters |
+| `stream` | `bool` | `False` | Enable Server-Sent Events streaming |
+#### `GeneratedAnswer`
+Response from the `/api/ask` endpoint (non-streaming).
+| Field | Type | Description |
+|---|---|---|
+| `query` | `str` | Original question |
+| `answer` | `str` | Generated markdown answer with inline citations |
+| `sources` | `list[RetrievedChunk]` | Source chunks used for generation |
+| `generation_time_ms` | `float` | Total generation time in milliseconds |
+| `model` | `str` | LLM model name used |
+#### `AnalyzedQuery`
+Internal model from the query analyzer (not directly exposed via API).
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `original_query` | `str` | - | The raw user query |
+| `clean_query` | `str` | - | Query with filter phrases removed |
+| `intent` | `str` | `"factual"` | Classified intent |
+| `extracted_filters` | `SearchFilters` | `{}` | Automatically extracted filters |
+| `confidence` | `float` | `0.5` | Confidence in filter extraction |
+---
+## API Reference
+The FastAPI app automatically generates interactive API documentation at `/docs` (Swagger UI) and `/redoc` (ReDoc).
+### Health Check
+```
+GET /health
+```
+Returns the status of all system components.
+**Response:**
+```json
+{
+    "status": "ok",
+    "components": {
+        "embedder": "loaded",
+        "bm25": "142 documents",
+        "vectorstore": "connected"
+    }
+}
+```
+**curl example:**
+```bash
+curl http://localhost:7860/health
+```
+---
+### Ingest Document
+```
+POST /api/ingest
+Content-Type: multipart/form-data
+```
+Uploads and indexes a document. The file is parsed, chunked, embedded, and stored in both the vector database and the BM25 index.
+**Request:** Multipart form with a `file` field.
+**Constraints:**
+- Supported extensions: `.pdf`, `.txt`, `.html`, `.htm`
+- Maximum file size: 10 MB (configurable via `MAX_FILE_SIZE_MB`)
+**Response (200):**
+```json
+{
+    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
+    "filename": "report.pdf",
+    "num_chunks": 47,
+    "message": "Successfully ingested 'report.pdf' with 47 chunks"
+}
+```
+**Error responses:**
+- `400` -- Missing filename or unsupported file type
+- `413` -- File exceeds maximum size
+- `422` -- Could not extract text from file
+**curl example:**
+```bash
+curl -X POST http://localhost:7860/api/ingest \
+  -F "file=@/path/to/document.pdf"
+```
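The documented constraints map onto the `400` and `413` responses roughly like this. The sketch below is illustrative only: `validate_upload` is a hypothetical helper, the order of the checks is an assumption, and the `422` case (text extraction failure) is left out because it happens later, during parsing.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".html", ".htm"}
MAX_FILE_SIZE_MB = 10  # mirrors the documented default


def validate_upload(filename: str, size_bytes: int) -> int:
    """Return the HTTP status implied by the documented rules (200 = accepted)."""
    if not filename:
        return 400  # missing filename
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        return 400  # unsupported file type
    if size_bytes > MAX_FILE_SIZE_MB * 1024 * 1024:
        return 413  # file exceeds maximum size
    return 200
```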
+---
+### List Documents
+```
+GET /api/documents
+```
+Returns all indexed documents with their metadata and chunk counts.
+**Response (200):**
+```json
+{
+    "documents": [
+        {
+            "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
+            "source": "report.pdf",
+            "title": "Annual Report 2024",
+            "doc_type": "pdf",
+            "num_chunks": 47
+        }
+    ],
+    "total": 1
+}
+```
+**curl example:**
+```bash
+curl http://localhost:7860/api/documents
+```
+---
+### Delete Document
+```
+DELETE /api/documents/{document_id}
+```
+Removes all chunks for the given document from Qdrant and rebuilds the BM25 index.
+**Response (200):**
+```json
+{
+    "message": "Document 'a1b2c3d4-e5f6-7890-abcd-ef1234567890' deleted successfully"
+}
+```
+**curl example:**
+```bash
+curl -X DELETE http://localhost:7860/api/documents/a1b2c3d4-e5f6-7890-abcd-ef1234567890
+```
+---
+### Search (Retrieval Only)
+```
+POST /api/search
+Content-Type: application/json
+```
+Performs hybrid retrieval without LLM generation. Useful for inspecting which chunks would be retrieved for a given query.
+**Request body:**
+```json
+{
+    "query": "What is retrieval-augmented generation?",
+    "top_k": 10,
+    "filters": {
+        "doc_type": "pdf",
+        "tags": ["machine learning"]
+    }
+}
+```
+**Response (200):**
+```json
+{
+    "query": "What is retrieval-augmented generation?",
+    "results": [
+        {
+            "chunk_id": "uuid",
+            "document_id": "uuid",
+            "text": "Retrieval-Augmented Generation (RAG) is...",
+            "score": 0.0234,
+            "metadata": {
+                "source": "report.pdf",
+                "doc_type": "pdf",
+                "title": "Annual Report",
+                "created_date": null,
+                "tags": ["machine learning"],
+                "page_count": 12
+            },
+            "rank": 0
+        }
+    ],
+    "total_results": 10,
+    "search_time_ms": 142.5
+}
+```
+**curl example:**
+```bash
+curl -X POST http://localhost:7860/api/search \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What is RAG?", "top_k": 5}'
+```
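For callers who prefer Python over curl, the same request can be assembled with the standard library. This is a sketch: `build_search_request` is a hypothetical helper, and the base URL assumes the local setup described in this README.

```python
import json
import urllib.request
from typing import Optional


def build_search_request(base_url: str, query: str, top_k: int = 10,
                         filters: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/search endpoint."""
    body = {"query": query, "top_k": top_k}
    if filters is not None:
        body["filters"] = filters
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen()` against a running server should yield the JSON response shown above; each entry in `results` carries `rank`, `score`, and `metadata`.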
+---
+### Ask (Full RAG Pipeline)
+```
+POST /api/ask
+Content-Type: application/json
+```
+Runs the full pipeline: query analysis, hybrid retrieval, reranking, and LLM answer generation.
+**Request body:**
+```json
+{
+    "query": "What are the key findings in the report?",
+    "top_k": 10,
+    "rerank_top_k": 5,
+    "filters": null,
+    "stream": false
+}
+```
+**Response (200, non-streaming):**
+```json
+{
+    "query": "What are the key findings in the report?",
+    "answer": "Based on the provided documents, the key findings are:\n\n1. **Finding one** [1]...\n2. **Finding two** [2]...",
+    "sources": [
+        {
+            "chunk_id": "uuid",
+            "document_id": "uuid",
+            "text": "chunk text...",
+            "score": 0.892,
+            "metadata": { "source": "report.pdf", "..." : "..." },
+            "rank": 0
+        }
+    ],
+    "generation_time_ms": 3420.5,
+    "model": "gemini-2.0-flash"
+}
+```
+**Streaming response (`"stream": true`):**
+Returns `text/event-stream` with Server-Sent Events:
+```
+data: {"text": "Based on"}
+data: {"text": " the provided"}
+data: {"text": " documents..."}
+data: {"done": true, "sources": [...], "model": "gemini-2.0-flash", "time_ms": 3420.5}
+```
+**curl examples:**
+```bash
+# Non-streaming
+curl -X POST http://localhost:7860/api/ask \
+  -H "Content-Type: application/json" \
+  -d '{"query": "Summarize the report", "stream": false}'
+# Streaming
+curl -X POST http://localhost:7860/api/ask \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What is RAG?", "stream": true}' \
+  --no-buffer
+```
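On the client side, the event stream shown above can be consumed line by line. The following is a minimal sketch, assuming every event is a single `data:` line carrying a JSON object as in the example; `iter_sse_events` and `collect_answer` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        yield json.loads(line[len("data:"):].strip())


def collect_answer(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate text events and return (answer, final_event)."""
    parts = []
    final: dict = {}
    for event in iter_sse_events(lines):
        if event.get("done"):
            final = event  # carries sources, model, and time_ms
            break
        parts.append(event.get("text", ""))
    return "".join(parts), final
```

A UI would normally render each `text` fragment as it arrives instead of waiting for the final event; `collect_answer` only shows how the pieces fit together.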
+---
+## UI Guide
+RagCore includes a Gradio web interface mounted at `/ui` (the root `/` redirects there automatically).
+### Ask Tab
+The primary interaction surface for querying your documents.
+**Components:**
+- **Query input** -- A text box where you type your question in natural language. Supports pressing Enter to submit.
+- **Document Type filter** -- Dropdown to restrict results to a specific file type: All, PDF, TXT, or HTML.
+- **Stream response toggle** -- Checkbox (default: on) to enable real-time streaming of the answer as it is generated.
+- **Ask button** -- Submits the query.
+- **Answer area** -- Displays the generated answer with markdown formatting, followed by a "Sources" section listing each referenced chunk with its filename, relevance score, and a text snippet.
+- **Example queries** -- Pre-filled example questions you can click to populate the query input.
+### Documents Tab
+Manages the document collection.
+**Components:**
+- **File upload zone** -- Drag-and-drop or click to select a file (`.pdf`, `.txt`, `.html`, `.htm`).
+- **Upload & Index button** -- Triggers the ingestion pipeline. Shows a status card with filename, chunk count, and document ID on success.
+- **Indexed Documents table** -- Displays all ingested documents with their filename, type, chunk count, and truncated document ID. Click "Refresh" to update.
+- **Delete section** -- Paste a full document ID and click "Delete" to remove a document and all its chunks.
+### Stats Bar
+At the top of every tab, a card shows the current count of indexed documents and total chunks.
+---
+## Setup and Installation
+### Prerequisites
+- Python 3.12 or later
+- A Qdrant Cloud account (free tier)
+- A Google AI Studio account (free tier Gemini API key)
+- (Optional) Docker and Docker Compose
+### Step 1: Get API Keys
+**Qdrant Cloud (vector database):**
+1. Go to [https://cloud.qdrant.io](https://cloud.qdrant.io) and create a free account.
+2. Create a new cluster (the free tier provides 1 GB of storage).
+3. Copy the cluster URL (e.g., `https://abc123-xyz.us-east4-0.gcp.cloud.qdrant.io:6333`).
+4. Generate an API key from the cluster dashboard.
+**Google Gemini (LLM):**
+1. Go to [https://aistudio.google.com/apikey](https://aistudio.google.com/apikey).
+2. Click "Create API key" and select or create a Google Cloud project.
+3. Copy the generated API key. The free tier allows 15 requests per minute for Gemini 2.0 Flash.
+### Step 2: Clone and Configure
+```bash
+git clone <repository-url>
+cd ragcore
+```
+Create a `.env` file in the `ragcore/` directory:
+```env
+# Required
+GEMINI_API_KEY=your-gemini-api-key-here
+QDRANT_URL=https://your-cluster.cloud.qdrant.io:6333
+QDRANT_API_KEY=your-qdrant-api-key-here
+# Optional (these are the defaults)
+EMBEDDING_MODEL=all-MiniLM-L6-v2
+EMBEDDING_DIM=384
+QDRANT_COLLECTION=ragcore_docs
+CHUNK_SIZE=512
+CHUNK_OVERLAP=50
+TOP_K=10
+RERANK_TOP_K=5
+DENSE_WEIGHT=0.6
+SPARSE_WEIGHT=0.4
+GEMINI_MODEL=gemini-2.0-flash
+GEMINI_RPM_LIMIT=15
+GEMINI_TEMPERATURE=0.3
+GEMINI_MAX_TOKENS=2048
+LOG_LEVEL=INFO
+MAX_FILE_SIZE_MB=10
+```
+### Step 3: Running Locally
+```bash
+# Create and activate a virtual environment
+python -m venv .venv
+source .venv/bin/activate       # On Linux/macOS
+# .venv\Scripts\activate        # On Windows
+# Install dependencies
+pip install -r requirements.txt
+# Start the server
+uvicorn app.main:app --host 0.0.0.0 --port 7860
+```
+The first startup will download two ML models (~90 MB for the embedding model, ~50 MB for the reranker). Subsequent startups use cached models.
+Once running:
+- Web UI: [http://localhost:7860/ui](http://localhost:7860/ui)
+- API docs: [http://localhost:7860/docs](http://localhost:7860/docs)
+- Health check: [http://localhost:7860/health](http://localhost:7860/health)
+### Step 4: Running with Docker
+```bash
+# Build and run
+docker compose up --build
+# Or build and run in detached mode
+docker compose up --build -d
+```
+The Docker build pre-downloads both ML models into the image layer, so container startup is faster. The app is exposed on port 8000 (mapped from container port 7860).
+Once running: [http://localhost:8000/ui](http://localhost:8000/ui)
+---
+## Deployment
+### Deploying to HuggingFace Spaces
+HuggingFace Spaces provides free hosting for Gradio and Docker-based applications. RagCore is pre-configured for deployment there.
+**Step-by-step:**
+1. **Create a HuggingFace account** at [https://huggingface.co](https://huggingface.co) if you do not have one.
+2. **Create a new Space:**
+   - Go to [https://huggingface.co/new-space](https://huggingface.co/new-space).
+   - Choose a name (e.g., `ragcore`).
+   - Select **Docker** as the SDK.
+   - Choose the **Free** CPU basic tier.
+   - Click "Create Space".
+3. **Configure secrets:**
+   - Go to your Space's Settings > Repository secrets.
+   - Add the following secrets:
+     - `GEMINI_API_KEY` -- your Google Gemini API key
+     - `QDRANT_URL` -- your Qdrant Cloud cluster URL
+     - `QDRANT_API_KEY` -- your Qdrant Cloud API key
+4. **Push the code:**
+   ```bash
+   cd ragcore
+   git remote add space https://huggingface.co/spaces/YOUR_USERNAME/ragcore
+   git push space main
+   ```
+   Alternatively, upload files via the HuggingFace web interface.
+5. **Wait for the build** -- the Docker image will be built on HuggingFace's infrastructure. The first build takes 5-10 minutes due to model downloads. The Space will show "Running" when ready.
+6. **Access your app** at `https://YOUR_USERNAME-ragcore.hf.space`.
+**Important notes:**
+- HuggingFace Spaces exposes port 7860 by default, which matches the Dockerfile's `EXPOSE 7860`.
+- The free tier has 2 vCPU and 16 GB RAM, which is sufficient for RagCore.
+- Spaces may sleep after inactivity. The first request after sleep triggers a cold start (30-60 seconds).
+---
+## Configuration Reference
+All settings are managed via environment variables, loaded from a `.env` file by `pydantic-settings`.
+| Variable | Type | Default | Description |
+|---|---|---|---|
+| `GEMINI_API_KEY` | string | `""` | **Required.** Google Gemini API key for LLM generation. |
+| `QDRANT_URL` | string | `""` | **Required.** Full URL of the Qdrant Cloud cluster (including port). |
+| `QDRANT_API_KEY` | string | `""` | **Required.** Qdrant Cloud API key for authentication. |
+| `EMBEDDING_MODEL` | string | `all-MiniLM-L6-v2` | HuggingFace model name for sentence-transformers. |
+| `EMBEDDING_DIM` | integer | `384` | Dimensionality of the embedding vectors. Must match the model. |
+| `QDRANT_COLLECTION` | string | `ragcore_docs` | Name of the Qdrant collection to use. Created automatically if missing. |
+| `CHUNK_SIZE` | integer | `512` | Maximum number of words per text chunk. |
+| `CHUNK_OVERLAP` | integer | `50` | Number of words overlapping between consecutive chunks. |
+| `TOP_K` | integer | `10` | Number of chunks retrieved by the hybrid retriever. |
+| `RERANK_TOP_K` | integer | `5` | Number of chunks kept after cross-encoder reranking. |
+| `DENSE_WEIGHT` | float | `0.6` | Weight for dense (vector) search in RRF fusion. Range: 0.0-1.0. |
+| `SPARSE_WEIGHT` | float | `0.4` | Weight for sparse (BM25) search in RRF fusion. Range: 0.0-1.0. |
+| `GEMINI_MODEL` | string | `gemini-2.0-flash` | Gemini model identifier. |
+| `GEMINI_RPM_LIMIT` | integer | `15` | Maximum requests per minute to the Gemini API. |
+| `GEMINI_TEMPERATURE` | float | `0.3` | LLM generation temperature. Lower values produce more deterministic output. |
+| `GEMINI_MAX_TOKENS` | integer | `2048` | Maximum number of output tokens per LLM generation. |
+| `LOG_LEVEL` | string | `INFO` | Logging level. Valid values: DEBUG, INFO, WARNING, ERROR, CRITICAL. |
+| `MAX_FILE_SIZE_MB` | integer | `10` | Maximum allowed file size for upload in megabytes. |
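To make the override behaviour concrete, here is a standard-library stand-in for the settings loader. It is not the `pydantic-settings` class in `app/config.py`: the `Settings` dataclass, the `load_settings` helper, and the subset of fields shown are assumptions for illustration.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above (subset only).
    gemini_api_key: str = ""
    qdrant_url: str = ""
    qdrant_api_key: str = ""
    chunk_size: int = 512
    chunk_overlap: int = 50
    top_k: int = 10
    rerank_top_k: int = 5
    dense_weight: float = 0.6
    sparse_weight: float = 0.4


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Override defaults with upper-cased environment variables, if set."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(Settings):
        raw = env.get(field.name.upper())
        if raw is not None:
            # Coerce the string to the type of the field's default value.
            overrides[field.name] = type(field.default)(raw)
    return Settings(**overrides)
```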
+---
+## How It Works End-to-End
+This section traces a complete user interaction: uploading a PDF and then asking a question about it.
+### Phase 1: Document Ingestion
+**User action:** Uploads `annual-report-2024.pdf` (2.1 MB, 45 pages) via the Gradio Documents tab.
+1. The Gradio UI reads the file and sends it as a multipart POST to `http://localhost:7860/api/ingest`.
+2. **Validation** (`ingest.py`):
+   - Filename is checked: extension `.pdf` is in `SUPPORTED_EXTENSIONS`.
+   - File size 2.1 MB is under the 10 MB limit.
+3. **Parsing** (`parsers.py`):
+   - `parse_pdf()` creates a `PdfReader` from the bytes.
+   - Iterates over all 45 pages, extracting text from each.
+   - Joins page texts with double newlines.
+   - `clean_text()` normalizes whitespace: collapses 3+ consecutive newlines to 2, collapses horizontal whitespace to single spaces, trims each line.
+   - Result: ~85,000 characters of cleaned text.
+4. **Metadata extraction** (`metadata.py`):
+   - `extract_title()` returns `"Annual Report 2024 - Acme Corporation"` (first meaningful line).
+   - `extract_dates()` finds `"2024-03-15"` in the first 2000 chars, parses it to `datetime(2024, 3, 15)`.
+   - `extract_tags()` finds frequent capitalized phrases: `["acme corporation", "revenue growth", "machine learning", ...]`.
+   - `get_page_count()` returns `45`.
+   - Final `DocumentMetadata`: source="annual-report-2024.pdf", doc_type="pdf", title="Annual Report 2024 - Acme Corporation", created_date=2024-03-15, tags=[...], page_count=45.
+5. **Chunking** (`chunker.py`):
+   - Splits the ~85,000 chars into sentences via `(?<=[.!?])\s+`.
+   - Accumulates sentences until the word count exceeds 512.
+   - Produces ~32 chunks, each with 50-word overlap with the next.
+   - Each chunk records start_char, end_char, and chunk_index.
+6. **Embedding** (`embedder.py`):
+   - `embed_texts()` encodes all 32 chunk texts in a single batch (batch_size=64).
+   - Returns 32 vectors, each of dimension 384, L2-normalized.
+7. **Vector storage** (`vectorstore.py`):
+   - `upsert_chunks()` creates 32 `PointStruct` objects with the vectors and payload.
+   - Since 32 < 100, they are uploaded in a single batch.
+   - Each point's payload includes text, document_id, chunk_index, source, doc_type, title, created_date, tags, page_count.
+8. **BM25 indexing** (`bm25.py`):
+   - `add_documents()` tokenizes each chunk (lowercase, remove stop words, remove single chars).
+   - Appends to the document list and rebuilds the full BM25Okapi index.
+9. **Response:** Returns `IngestResponse` with document_id, filename, num_chunks=32, and success message.
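The chunking step in this walkthrough can be approximated as shown below. This is a simplified sketch and not the code in `chunker.py`: it uses the sentence-split regex quoted above, but the function name `chunk_text`, the exact flush rule, and the omission of `start_char`/`end_char` tracking are assumptions.

```python
import re

SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Group sentences into chunks of at most ~chunk_size words with word overlap."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []  # words of the chunk being built
    for sentence in sentences:
        words = sentence.split()
        # Flush before adding a sentence that would push the chunk past the limit.
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            # Carry the last `overlap` words into the next chunk.
            current = current[-overlap:] if overlap > 0 else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In this sketch a single sentence longer than `chunk_size` words is kept whole, so the limit is a target and not a hard cap.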
+### Phase 2: Querying
+**User action:** Types `"What was the revenue growth last year from PDFs?"` in the Ask tab with streaming enabled.
+1. The Gradio UI sends a POST to `http://localhost:7860/api/ask` with:
+   ```json
+   {"query": "What was the revenue growth last year from PDFs?", "top_k": 10, "rerank_top_k": 5, "stream": true, "filters": {"doc_type": "pdf"}}
+   ```
+   (Note: the UI sets `doc_type` filter from the dropdown if not "All".)
+2. **Query analysis** (`query_analyzer.py`):
+   - Doc type extraction: matches "PDFs" -> `filters.doc_type = "pdf"`.
+   - Date extraction: matches "last year" -> `filters.date_from = 2025-03-17`, `filters.date_to = 2026-03-17`.
+   - Clean query: removes "last year" and "PDFs" -> `"What was the revenue growth"`.
+   - Intent: matches `^(?:what|...)` -> `"factual"`.
+   - Confidence: 0.5 + 0.1 (doc_type) + 0.1 (date) = 0.7.
+3. **Hybrid retrieval** (`retriever.py`):
+   - Embeds the clean query `"What was the revenue growth"` to a 384-dim vector.
+   - **Dense search:** Queries Qdrant with the vector, limit=20 (top_k * 2), with filters for doc_type="pdf" and date range. Returns 20 results ranked by cosine similarity.
+   - **Sparse search:** Tokenizes query to `["what", "revenue", "growth"]` (stop words removed), scores all BM25 documents, returns top 20 by BM25 score. Post-filters by doc_type="pdf".
+   - **RRF fusion:** For each chunk, computes `score = 0.6 * 1/(60+dense_rank) + 0.4 * 1/(60+sparse_rank)`. Chunks appearing in both lists get boosted scores.
+   - Deduplicates by chunk_id, takes top 10.
+4. **Reranking** (`reranker.py`):
+   - Creates passage pairs: (query, chunk_text) for all 10 retrieved chunks.
+   - The FlashRank cross-encoder scores each pair jointly.
+   - Returns the top 5 by cross-encoder score, with updated scores and ranks.
+5. **Answer generation** (`generator.py`):
+   - Builds context with numbered passages:
+     ```
+     [1] (Source: annual-report-2024.pdf)
+     Revenue increased by 23% year-over-year...
+     [2] (Source: annual-report-2024.pdf)
+     The growth was primarily driven by...
+     ```
+   - Constructs the SYSTEM_PROMPT with context and query.
+   - Calls `llm.generate_stream()` which respects the rate limit, then yields text chunks.
+6. **Streaming response** (`query.py`):
+   - Each text chunk from Gemini is wrapped as `data: {"text": "..."}\n\n` (SSE format).
+   - The Gradio UI accumulates text and renders it progressively in the answer area.
+   - Final SSE event includes `{"done": true, "sources": [...], "model": "gemini-2.0-flash", "time_ms": 3420}`.
+   - Gradio formats the sources as styled cards showing filename, score, and snippet.
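The weighted RRF formula from the hybrid retrieval step can be written out as a short function. This is a sketch under assumptions: `rrf_fuse` is a hypothetical name, inputs are plain lists of chunk IDs in ranked order, and ranks are taken as 1-based, which the README does not specify.

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str],
             dense_weight: float = 0.6, sparse_weight: float = 0.4,
             k: int = 60, top_k: int = 10) -> list[tuple[str, float]]:
    """Fuse two ranked ID lists with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for weight, ranked in ((dense_weight, dense_ids), (sparse_weight, sparse_ids)):
        # Assumption: ranks start at 1.
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    # Keying by chunk_id deduplicates; chunks in both lists accumulate both terms.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]
```

Because the score depends only on rank positions, the cosine similarities from Qdrant and the BM25 scores never need to be normalized against each other.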
+---
+## Testing
+### Running Tests
+```bash
+# Run all unit tests (excluding integration tests)
+pytest tests/ -v --ignore=tests/test_integration.py -x
+# Run a specific test file
+pytest tests/test_chunker.py -v
+# Run with coverage (install pytest-cov first)
+pytest tests/ -v --ignore=tests/test_integration.py --cov=app
+```
+### Test Coverage
+| Test File | Module Under Test | What Is Tested |
+|---|---|---|
+| `test_chunker.py` | `app.core.chunker` | Empty input, single sentence, multiple chunks, overlap behavior, chunk size limits |
+| `test_parsers.py` | `app.utils.parsers` | UTF-8 text, Latin-1 fallback, HTML tag stripping, unsupported extensions, empty files, extension-based dispatch |
+| `test_query_analyzer.py` | `app.core.query_analyzer` | Intent classification (factual, comparative, summarize, explanatory), doc type extraction, date extraction, clean query preservation |
+| `test_retrieval.py` | `app.core.retriever` | RRF fusion (basic, empty lists, single list, weighted), metadata filter application |
+| `test_api.py` | `app.main` (FastAPI) | Health endpoint returns 200 with components, root redirects to `/ui`, `/docs` page loads |
+### Test Fixtures
+Defined in `tests/conftest.py`:
+- `client` -- A FastAPI `TestClient` instance for API testing.
+- `sample_text` -- A paragraph about RAG for use in unit tests.
+**Note:** Unit tests mock or avoid external dependencies (Qdrant, Gemini). The CI pipeline sets dummy API keys via environment variables. Integration tests (if present in `tests/test_integration.py`) are excluded from the default test run.
+---
+## CI/CD
+### GitHub Actions Pipeline (`.github/workflows/ci.yml`)
+The CI pipeline runs on every push to `main` and on every pull request targeting `main`.
+**Pipeline steps:**
+| Step | Description |
+|---|---|
+| Checkout | Clones the repository using `actions/checkout@v4` |
+| Set up Python | Installs Python 3.12 via `actions/setup-python@v5` |
+| Install dependencies | Runs `pip install -r requirements.txt` |
+| Lint | Runs `ruff check .` for code style and quality |
+| Unit tests | Runs `pytest tests/ -v --ignore=tests/test_integration.py -x` |
+**Environment variables set during testing:**
+```yaml
+env:
+  GEMINI_API_KEY: "test"
+  QDRANT_URL: "http://localhost:6333"
+  QDRANT_API_KEY: "test"
+```
+These are dummy values that allow the application to initialize its settings without connecting to real services. Tests that would require live connections are either mocked or skipped.
+The `-x` flag causes pytest to stop on the first failure for faster feedback.
+---
+## Performance and Limits
+### Free Tier Limits
+| Service | Limit | Impact |
+|---|---|---|
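+| **Qdrant Cloud** (free tier) | 1 GB storage | Approximately 500,000-700,000 chunks at 384 dimensions when counting vectors alone (384 float32 values is about 1.5 KB per vector); the stored payload text lowers this figure. Still sufficient for thousands of documents. |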
+| **Google Gemini** (free tier) | 15 requests per minute | RagCore enforces this with built-in rate limiting (4-second minimum interval between calls). Each question costs 1 API call. |
+| **HuggingFace Spaces** (free tier) | 2 vCPU, 16 GB RAM | Sufficient for running the embedding model, reranker, and BM25 index concurrently. |
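The 4-second interval in the table follows from 60 seconds / 15 requests. A minimum-interval limiter of that kind could look like the sketch below; `MinIntervalLimiter` is a hypothetical class and not the implementation in `app/core/llm.py`. It is single-threaded, and the clock and sleep functions are injectable only to make it easy to test.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum gap between calls, derived from a requests-per-minute limit."""

    def __init__(self, rpm_limit: int = 15,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 60.0 / rpm_limit  # 15 RPM -> 4.0 seconds
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep until the interval since the previous call has elapsed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the call is actually allowed to proceed.
        self._last_call = now + delay
        return delay
```

Calling `limiter.wait()` immediately before each Gemini request keeps a single process under the limit; several processes sharing one API key would need a shared limiter.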
+### Expected Latency
+| Operation | Typical Latency | Notes |
+|---|---|---|
+| Document ingestion (10-page PDF) | 3-8 seconds | Dominated by embedding time on CPU |
+| Document ingestion (50-page PDF) | 10-20 seconds | Linear with number of chunks |
+| Query (hybrid retrieval only) | 100-300 ms | Embedding + Qdrant + BM25 + RRF |
+| Query (full RAG with answer) | 3-8 seconds | Dominated by Gemini API call |
+| Query (streaming, time to first token) | 1-3 seconds | Reranking + Gemini startup |
+| BM25 rebuild on startup | 50-500 ms | Depends on collection size (scrolls all points from Qdrant) |
+| Embedding model cold load | 2-5 seconds | First request only; cached thereafter |
+| Reranker model cold load | 1-3 seconds | First request only; cached thereafter |
+### Capacity Guidelines
+- **Small deployment** (< 100 documents, < 5,000 chunks): Everything runs comfortably within free tiers.
+- **Medium deployment** (100-1,000 documents, 5,000-50,000 chunks): BM25 index may use 50-500 MB RAM. Qdrant free tier still has ample space.
+- **Large deployment** (> 1,000 documents): Consider upgrading Qdrant to a paid tier and running the embedder on GPU for faster ingestion.
+---
+## Troubleshooting
+### Common Errors and Fixes
+**Error: `"Unsupported file type '.docx'"` or similar**
+Only PDF, TXT, and HTML files are supported. Convert other formats to one of these before uploading. For DOCX files, export to PDF from your word processor.
+---
+**Error: `"File too large. Maximum size is 10MB"`**
+Increase the limit by setting `MAX_FILE_SIZE_MB` in your `.env` file, or split the file into smaller parts.
+---
+**Error: `"Could not extract text from file"`**
+The PDF may be image-based (scanned document) without an embedded text layer. pypdf cannot extract text from images. Use an OCR tool (e.g., Tesseract) to add a text layer first.
+---
+**Error: Qdrant connection timeout or `"Connection refused"`**
+- Verify your `QDRANT_URL` includes the port (typically `:6333`).
+- Verify your `QDRANT_API_KEY` is correct.
+- Check that your Qdrant Cloud cluster is active (free clusters may be paused after inactivity).
+---
+**Error: `"Gemini generation failed"` or `"429 Too Many Requests"`**
+You have exceeded the Gemini API rate limit. RagCore has built-in rate limiting, but if multiple users are sharing the same API key, collisions can occur. Solutions:
+- Wait a few seconds and retry.
+- Reduce `GEMINI_RPM_LIMIT` to add more buffer between calls.
+- Upgrade to a paid Gemini plan for higher limits.
+---
+**Error: `"Embedder initialization deferred"`**
+This warning during startup means the embedding model could not be loaded immediately. This usually resolves on the first request. If it persists:
+- Check internet connectivity (the model needs to be downloaded on first use).
+- Ensure sufficient disk space (~200 MB for cached models).
+- Check if the `EMBEDDING_MODEL` name is correct.
+---
+**BM25 index shows 0 documents after restart**
+This is expected on first startup with a fresh Qdrant collection. The BM25 index rebuilds from Qdrant on startup. If Qdrant has data but BM25 shows 0, check the Qdrant connection settings.
+---
+**Gradio UI not loading or showing "Connecting..."**
+- Ensure the server is running on port 7860 (or whichever port you configured).
+- The Gradio UI communicates with the API via `http://localhost:7860`. If running in Docker, this internal URL is correct. If running behind a reverse proxy, the UI may need adjustment.
+---
+**Slow first request after startup**
+The first request triggers lazy loading of the reranker model. This is a one-time cost of 1-3 seconds. Subsequent requests are fast.
+---
+**Docker build fails at model download step**
+The Dockerfile pre-downloads ML models during build, which requires internet access during `docker build`. If you are building behind a corporate proxy, configure Docker's proxy settings. If the download fails, the build fails; retrying usually resolves transient network issues.

app/__init__.py ADDED

File without changes

app/api/__init__.py ADDED

File without changes

app/api/deps.py ADDED

	@@ -0,0 +1,53 @@

+import logging
+from functools import lru_cache
+from app.core.bm25 import BM25Index, get_bm25
+from app.core.embedder import EmbedderService, get_embedder
+from app.core.generator import AnswerGenerator
+from app.core.llm import GeminiService, get_llm
+from app.core.query_analyzer import QueryAnalyzer
+from app.core.reranker import RerankerService, get_reranker
+from app.core.retriever import HybridRetriever
+from app.core.vectorstore import VectorStoreService, get_vectorstore
+logger = logging.getLogger(__name__)
+def dep_embedder() -> EmbedderService:
+    return get_embedder()
+def dep_vectorstore() -> VectorStoreService:
+    return get_vectorstore()
+def dep_bm25() -> BM25Index:
+    return get_bm25()
+def dep_reranker() -> RerankerService:
+    return get_reranker()
+def dep_llm() -> GeminiService:
+    return get_llm()
+@lru_cache
+def dep_query_analyzer() -> QueryAnalyzer:
+    return QueryAnalyzer()
+def dep_retriever() -> HybridRetriever:
+    return HybridRetriever(
+        vectorstore=get_vectorstore(),
+        bm25=get_bm25(),
+        embedder=get_embedder(),
+    )
+def dep_generator() -> AnswerGenerator:
+    return AnswerGenerator(
+        llm=get_llm(),
+        reranker=get_reranker(),
+    )

app/api/routes/__init__.py ADDED

File without changes

app/api/routes/health.py ADDED

	@@ -0,0 +1,42 @@

+from fastapi import APIRouter
+router = APIRouter(tags=["health"])
+@router.get("/health")
+async def health_check():
+    status = {"status": "ok", "components": {}}
+    try:
+        from app.core.embedder import _embedder
+        status["components"]["embedder"] = "loaded" if _embedder else "not loaded"
+    except Exception:
+        status["components"]["embedder"] = "error"
+    try:
+        from app.core.bm25 import _bm25
+        if _bm25:
+            status["components"]["bm25"] = f"{_bm25.doc_count} documents"
+        else:
+            status["components"]["bm25"] = "not initialized"
+    except Exception:
+        status["components"]["bm25"] = "error"
+    try:
+        from app.core.vectorstore import _vectorstore
+        if _vectorstore:
+            count = _vectorstore.count()
+            status["components"]["vectorstore"] = f"connected ({count} points)"
+        else:
+            status["components"]["vectorstore"] = "not connected"
+    except Exception as e:
+        status["components"]["vectorstore"] = f"error: {e}"
+        status["status"] = "degraded"
+    try:
+        from app.core.llm import _llm
+        status["components"]["llm"] = f"ready ({_llm.model_name})" if _llm else "not initialized"
+    except Exception:
+        status["components"]["llm"] = "error"
+    return status

app/api/routes/ingest.py ADDED

	@@ -0,0 +1,155 @@

+import logging
+from fastapi import APIRouter, Depends, HTTPException, UploadFile
+from app.api.deps import dep_bm25, dep_embedder, dep_vectorstore
+from app.config import get_settings
+from app.core.bm25 import BM25Index
+from app.core.chunker import chunk_text
+from app.core.embedder import EmbedderService
+from app.core.metadata import extract_metadata
+from app.core.vectorstore import VectorStoreService
+from app.models.document import Chunk, Document, DocumentMetadata
+from app.models.schemas import IngestResponse
+from app.utils.helpers import generate_id
+from app.utils.parsers import SUPPORTED_EXTENSIONS, get_page_count, parse_document
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/api", tags=["ingest"])
+@router.post("/ingest", response_model=IngestResponse)
+async def ingest_document(
+    file: UploadFile,
+    vectorstore: VectorStoreService = Depends(dep_vectorstore),
+    embedder: EmbedderService = Depends(dep_embedder),
+    bm25: BM25Index = Depends(dep_bm25),
+):
+    settings = get_settings()
+    # Validate file extension
+    if not file.filename:
+        raise HTTPException(status_code=400, detail="Filename is required")
+    ext = "." + file.filename.rsplit(".", 1)[-1].lower() if "." in file.filename else ""
+    if ext not in SUPPORTED_EXTENSIONS:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Unsupported file type '{ext}'. Supported: {', '.join(SUPPORTED_EXTENSIONS)}",
+        )
+    # Read file
+    file_bytes = await file.read()
+    # Validate file size
+    max_size = settings.max_file_size_mb * 1024 * 1024
+    if len(file_bytes) > max_size:
+        raise HTTPException(
+            status_code=413,
+            detail=f"File too large. Maximum size is {settings.max_file_size_mb}MB",
+        )
+    # Check for duplicate document (same filename already indexed)
+    existing_docs = vectorstore.get_document_ids()
+    for doc in existing_docs:
+        if doc.get("source") == file.filename:
+            raise HTTPException(
+                status_code=409,
+                detail=f"Document '{file.filename}' is already indexed (ID: {doc['document_id'][:12]}...). "
+                       f"Delete it first if you want to re-upload.",
+            )
+    # Parse document
+    try:
+        raw_text = parse_document(file_bytes, file.filename)
+    except Exception as e:
+        logger.error(f"Failed to parse '{file.filename}': {e}")
+        raise HTTPException(status_code=422, detail=f"Failed to parse file: {e}")
+    if not raw_text or not raw_text.strip():
+        raise HTTPException(status_code=422, detail="Could not extract text from file")
+    # Extract metadata
+    page_count = get_page_count(file_bytes, file.filename)
+    metadata = extract_metadata(raw_text, file.filename, page_count=page_count)
+    # Create document
+    document_id = generate_id()
+    # Chunk text
+    chunk_dicts = chunk_text(
+        raw_text,
+        chunk_size=settings.chunk_size,
+        chunk_overlap=settings.chunk_overlap,
+    )
+    if not chunk_dicts:
+        raise HTTPException(status_code=422, detail="Document produced no text chunks")
+    chunks = [
+        Chunk(
+            chunk_id=generate_id(),
+            document_id=document_id,
+            text=c["text"],
+            metadata=metadata,
+            chunk_index=c["chunk_index"],
+            start_char=c["start_char"],
+            end_char=c["end_char"],
+        )
+        for c in chunk_dicts
+    ]
+    # Embed chunks
+    try:
+        texts = [c.text for c in chunks]
+        embeddings = embedder.embed_texts(texts)
+    except Exception as e:
+        logger.error(f"Embedding failed for '{file.filename}': {e}")
+        raise HTTPException(status_code=500, detail=f"Embedding failed: {e}")
+    # Store in Qdrant
+    try:
+        vectorstore.upsert_chunks(chunks, embeddings)
+    except Exception as e:
+        logger.error(f"Vector store upsert failed: {e}")
+        raise HTTPException(status_code=500, detail=f"Failed to store document: {e}")
+    # Add to BM25 index
+    bm25.add_documents(chunks)
+    logger.info(f"Ingested '{file.filename}': {len(chunks)} chunks")
+    return IngestResponse(
+        document_id=document_id,
+        filename=file.filename,
+        num_chunks=len(chunks),
+        message=f"Successfully ingested '{file.filename}' with {len(chunks)} chunks",
+    )
+@router.get("/documents")
+async def list_documents(
+    vectorstore: VectorStoreService = Depends(dep_vectorstore),
+):
+    try:
+        docs = vectorstore.get_document_ids()
+        return {"documents": docs, "total": len(docs)}
+    except Exception as e:
+        logger.error(f"Failed to list documents: {e}")
+        raise HTTPException(status_code=500, detail=f"Failed to list documents: {e}")
+@router.delete("/documents/{document_id}")
+async def delete_document(
+    document_id: str,
+    vectorstore: VectorStoreService = Depends(dep_vectorstore),
+    bm25: BM25Index = Depends(dep_bm25),
+):
+    try:
+        vectorstore.delete_document(document_id)
+        bm25.rebuild_from_vectorstore(vectorstore)
+        return {"message": f"Document '{document_id}' deleted successfully"}
+    except Exception as e:
+        logger.error(f"Failed to delete document '{document_id}': {e}")
+        raise HTTPException(status_code=500, detail=f"Failed to delete document: {e}")
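
The one-line extension check in `ingest_document` relies on the conditional expression binding more loosely than `+`, so it reads as `("." + ...) if "." in name else ""`. A minimal standalone sketch of the same logic, with the hypothetical helper name `file_extension`, makes the edge cases easier to see:

```python
def file_extension(filename: str) -> str:
    """Return the lowercase extension with its leading dot, or "" if there is none.

    Hypothetical helper mirroring the inline expression in ingest_document.
    """
    if "." not in filename:
        return ""
    # Only the text after the last dot counts, so "a.tar.gz" yields ".gz"
    return "." + filename.rsplit(".", 1)[-1].lower()
```

Note that a dotfile such as `.env` is treated as having the extension `.env`, which the `SUPPORTED_EXTENSIONS` check would then have to reject.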

app/api/routes/query.py ADDED

	@@ -0,0 +1,128 @@

+import json
+import logging
+import time
+from fastapi import APIRouter, Depends, HTTPException
+from fastapi.responses import StreamingResponse
+from app.api.deps import dep_generator, dep_query_analyzer, dep_retriever
+from app.core.generator import AnswerGenerator
+from app.core.query_analyzer import QueryAnalyzer
+from app.core.retriever import HybridRetriever
+from app.models.schemas import (
+    GeneratedAnswer,
+    QueryRequest,
+    SearchRequest,
+    SearchResponse,
+)
+logger = logging.getLogger(__name__)
+router = APIRouter(prefix="/api", tags=["query"])
+def _resolve_filters(request_filters, analyzed_filters):
+    """Use explicit request filters if provided, otherwise use analyzed filters only if they contain values."""
+    if request_filters and request_filters.has_filters():
+        return request_filters
+    if analyzed_filters and analyzed_filters.has_filters():
+        return analyzed_filters
+    return None
+@router.post("/search", response_model=SearchResponse)
+async def search(
+    request: SearchRequest,
+    retriever: HybridRetriever = Depends(dep_retriever),
+    analyzer: QueryAnalyzer = Depends(dep_query_analyzer),
+):
+    try:
+        start = time.perf_counter()
+        analyzed = analyzer.analyze(request.query)
+        filters = _resolve_filters(request.filters, analyzed.extracted_filters)
+        results = retriever.retrieve(
+            query=analyzed.clean_query,
+            top_k=request.top_k,
+            filters=filters,
+        )
+        elapsed = (time.perf_counter() - start) * 1000
+        return SearchResponse(
+            query=request.query,
+            results=results,
+            total_results=len(results),
+            search_time_ms=elapsed,
+        )
+    except Exception as e:
+        logger.error(f"Search failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Search failed: {e}")
+@router.post("/ask")
+async def ask(
+    request: QueryRequest,
+    retriever: HybridRetriever = Depends(dep_retriever),
+    generator: AnswerGenerator = Depends(dep_generator),
+    analyzer: QueryAnalyzer = Depends(dep_query_analyzer),
+):
+    try:
+        analyzed = analyzer.analyze(request.query)
+        filters = _resolve_filters(request.filters, analyzed.extracted_filters)
+        chunks = retriever.retrieve(
+            query=analyzed.clean_query,
+            top_k=request.top_k,
+            filters=filters,
+        )
+        if request.stream:
+            return StreamingResponse(
+                _stream_response(request.query, chunks, generator, request.rerank_top_k, analyzed.intent),
+                media_type="text/event-stream",
+            )
+        answer = generator.generate_answer(
+            query=request.query,
+            chunks=chunks,
+            rerank_top_k=request.rerank_top_k,
+            intent=analyzed.intent,
+        )
+        return answer
+    except Exception as e:
+        logger.error(f"Ask failed: {e}", exc_info=True)
+        raise HTTPException(status_code=500, detail=f"Query failed: {e}")
+async def _stream_response(
+    query: str,
+    chunks,
+    generator: AnswerGenerator,
+    rerank_top_k: int,
+    intent: str,
+):
+    try:
+        async for item in generator.generate_answer_stream(
+            query=query,
+            chunks=chunks,
+            rerank_top_k=rerank_top_k,
+            intent=intent,
+        ):
+            if isinstance(item, str):
+                yield f"data: {json.dumps({'text': item})}\n\n"
+            elif isinstance(item, GeneratedAnswer):
+                sources = [
+                    {
+                        "chunk_id": s.chunk_id,
+                        "text": s.text[:200],
+                        "source": s.metadata.source,
+                        "score": s.score,
+                    }
+                    for s in item.sources
+                ]
+                yield f"data: {json.dumps({'done': True, 'sources': sources, 'model': item.model, 'time_ms': item.generation_time_ms})}\n\n"
+    except Exception as e:
+        logger.error(f"Streaming failed: {e}", exc_info=True)
+        yield f"data: {json.dumps({'error': str(e), 'done': True})}\n\n"
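
With `stream` set, `/api/ask` emits `data: <json>` frames separated by blank lines: `text` pieces first, then one frame with `done` and `sources` (or `error`). A client could decode that stream roughly as follows. This is a sketch under the assumption that the whole body has already been read into a string; `parse_sse_events` and `collect_answer` are hypothetical names.

```python
import json


def parse_sse_events(raw: str) -> list[dict]:
    """Decode `data: <json>` frames that are separated by blank lines."""
    events = []
    for frame in raw.split("\n\n"):
        frame = frame.strip()
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events


def collect_answer(events: list[dict]) -> tuple[str, list[dict]]:
    """Join streamed text pieces and return them with the final sources list."""
    text = "".join(e["text"] for e in events if "text" in e)
    # The terminating frame carries done=True; an error frame has no sources
    sources = next((e.get("sources", []) for e in events if e.get("done")), [])
    return text, sources
```

Because the final `GeneratedAnswer` is sent with an empty `answer`, the client has to rebuild the full text from the `text` frames, as `collect_answer` does.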

app/config.py ADDED

	@@ -0,0 +1,58 @@

+import logging
+from functools import lru_cache
+from pydantic_settings import BaseSettings, SettingsConfigDict
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        env_file_encoding="utf-8",
+        extra="ignore",
+    )
+    # API Keys
+    gemini_api_key: str = ""
+    qdrant_url: str = ""
+    qdrant_api_key: str = ""
+    # Embedding
+    embedding_model: str = "all-MiniLM-L6-v2"
+    embedding_dim: int = 384
+    # Qdrant
+    qdrant_collection: str = "ragcore_docs"
+    # Chunking
+    chunk_size: int = 512
+    chunk_overlap: int = 50
+    # Retrieval
+    top_k: int = 10
+    rerank_top_k: int = 5
+    dense_weight: float = 0.6
+    sparse_weight: float = 0.4
+    # LLM
+    gemini_model: str = "gemini-2.5-flash"
+    gemini_rpm_limit: int = 15
+    gemini_temperature: float = 0.3
+    gemini_max_tokens: int = 2048
+    # App
+    log_level: str = "INFO"
+    max_file_size_mb: int = 10
+@lru_cache
+def get_settings() -> Settings:
+    return Settings()
+def setup_logging() -> None:
+    settings = get_settings()
+    logging.basicConfig(
+        level=getattr(logging, settings.log_level.upper(), logging.INFO),
+        format="%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
+        datefmt="%Y-%m-%d %H:%M:%S",
+    )

app/core/__init__.py ADDED

File without changes

app/core/bm25.py ADDED

	@@ -0,0 +1,124 @@

+import logging
+import re
+import time
+from rank_bm25 import BM25Okapi
+from app.models.document import Chunk
+logger = logging.getLogger(__name__)
+STOP_WORDS = {
+    "a", "an", "the", "is", "are", "was", "were", "be", "been", "being",
+    "have", "has", "had", "do", "does", "did", "will", "would", "could",
+    "should", "may", "might", "can", "shall", "to", "of", "in", "for",
+    "on", "with", "at", "by", "from", "as", "into", "through", "during",
+    "before", "after", "and", "but", "or", "not", "no", "if", "then",
+    "than", "that", "this", "it", "its", "he", "she", "they", "we", "you",
+}
+def tokenize(text: str) -> list[str]:
+    text = text.lower()
+    words = re.findall(r"\b\w+\b", text)
+    return [w for w in words if w not in STOP_WORDS and len(w) > 1]
+class BM25Index:
+    def __init__(self):
+        self.documents: list[dict] = []
+        self.index: BM25Okapi | None = None
+    def build_index(self, chunks: list[Chunk]) -> None:
+        self.documents = [
+            {
+                "chunk_id": chunk.chunk_id,
+                "document_id": chunk.document_id,
+                "text": chunk.text,
+                "tokens": tokenize(chunk.text),
+                "metadata": chunk.metadata.model_dump() if chunk.metadata else {},
+            }
+            for chunk in chunks
+        ]
+        if self.documents:
+            corpus = [doc["tokens"] for doc in self.documents]
+            self.index = BM25Okapi(corpus)
+        logger.info(f"Built BM25 index with {len(self.documents)} documents")
+    def add_documents(self, chunks: list[Chunk]) -> None:
+        new_docs = [
+            {
+                "chunk_id": chunk.chunk_id,
+                "document_id": chunk.document_id,
+                "text": chunk.text,
+                "tokens": tokenize(chunk.text),
+                "metadata": chunk.metadata.model_dump() if chunk.metadata else {},
+            }
+            for chunk in chunks
+        ]
+        self.documents.extend(new_docs)
+        if self.documents:
+            corpus = [doc["tokens"] for doc in self.documents]
+            self.index = BM25Okapi(corpus)
+        logger.info(f"BM25 index updated: {len(self.documents)} total documents")
+    def search(self, query: str, top_k: int = 10) -> list[dict]:
+        if not self.index or not self.documents:
+            return []
+        tokens = tokenize(query)
+        if not tokens:
+            return []
+        scores = self.index.get_scores(tokens)
+        scored_docs = [
+            (score, doc) for score, doc in zip(scores, self.documents) if score > 0
+        ]
+        scored_docs.sort(key=lambda x: x[0], reverse=True)
+        return [
+            {
+                "chunk_id": doc["chunk_id"],
+                "document_id": doc["document_id"],
+                "text": doc["text"],
+                "score": float(score),
+                "metadata": doc["metadata"],
+            }
+            for score, doc in scored_docs[:top_k]
+        ]
+    def rebuild_from_vectorstore(self, vectorstore) -> None:
+        start = time.perf_counter()
+        all_points = vectorstore.scroll_all()
+        self.documents = [
+            {
+                "chunk_id": p["chunk_id"],
+                "document_id": p["document_id"],
+                "text": p["text"],
+                "tokens": tokenize(p["text"]),
+                "metadata": p["metadata"],
+            }
+            for p in all_points
+            if p.get("text")
+        ]
+        # Reset the index when the store is empty so a stale index is not kept around
+        corpus = [doc["tokens"] for doc in self.documents]
+        self.index = BM25Okapi(corpus) if corpus else None
+        elapsed = (time.perf_counter() - start) * 1000
+        logger.info(
+            f"Rebuilt BM25 index from vectorstore: {len(self.documents)} docs in {elapsed:.0f}ms"
+        )
+    @property
+    def doc_count(self) -> int:
+        return len(self.documents)
+_bm25: BM25Index | None = None
+def get_bm25() -> BM25Index:
+    global _bm25
+    if _bm25 is None:
+        _bm25 = BM25Index()
+    return _bm25
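
The tokenizer decides what BM25 can match: it lowercases, drops stop words, and discards single-character tokens. The sketch below repeats that logic with a shortened stop-word set (an assumption for brevity; the module's `STOP_WORDS` is longer) to show why a query made only of stop words returns no BM25 results.

```python
import re

# Illustrative subset only; the real STOP_WORDS set in bm25.py is larger
STOP_WORDS = {"a", "the", "is", "of", "in", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on word boundaries, drop stop words and 1-char tokens."""
    words = re.findall(r"\b\w+\b", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]


print(tokenize("The index is rebuilt in O(n) time"))
print(tokenize("Is the a?"))
```

The second call yields an empty list, which is the case `BM25Index.search` short-circuits with `if not tokens: return []`.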

app/core/chunker.py ADDED

	@@ -0,0 +1,60 @@

+import logging
+import re
+logger = logging.getLogger(__name__)
+SENTENCE_PATTERN = re.compile(r"(?<=[.!?])\s+")
+def chunk_text(
+    text: str,
+    chunk_size: int = 512,
+    chunk_overlap: int = 50,
+) -> list[dict]:
+    if not text or not text.strip():
+        return []
+    sentences = SENTENCE_PATTERN.split(text)
+    sentences = [s.strip() for s in sentences if s.strip()]
+    if not sentences:
+        return []
+    chunks = []
+    current_words: list[str] = []
+    # Offsets are approximate positions in the whitespace-normalized text, not the raw input
+    current_start = 0
+    for sentence in sentences:
+        words = sentence.split()
+        if current_words and len(current_words) + len(words) > chunk_size:
+            chunk_text_str = " ".join(current_words)
+            chunk_end = current_start + len(chunk_text_str)
+            chunks.append({
+                "text": chunk_text_str,
+                "start_char": current_start,
+                "end_char": chunk_end,
+                "chunk_index": len(chunks),
+            })
+            # Overlap: keep last chunk_overlap words
+            overlap_words = current_words[-chunk_overlap:] if chunk_overlap > 0 else []
+            overlap_text = " ".join(overlap_words)
+            current_start = chunk_end - len(overlap_text)
+            current_words = overlap_words
+        current_words.extend(words)
+    # Last chunk
+    if current_words:
+        chunk_text_str = " ".join(current_words)
+        chunks.append({
+            "text": chunk_text_str,
+            "start_char": current_start,
+            "end_char": current_start + len(chunk_text_str),
+            "chunk_index": len(chunks),
+        })
+    logger.info(f"Chunked text into {len(chunks)} chunks (size={chunk_size}, overlap={chunk_overlap})")
+    return chunks
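
The overlap step in `chunk_text` carries the last `chunk_overlap` words of one chunk into the next. The sketch below shows only that idea in a simplified form: fixed-size word windows with no sentence awareness. `word_windows` is a hypothetical function, not the chunker itself.

```python
def word_windows(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size` that share `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    start = 0
    while start < len(words):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the input
        start += step
    return windows


print(word_windows(list("abcdefghij"), size=4, overlap=1))
```

Unlike this sketch, `chunk_text` only closes a chunk at a sentence boundary, so a chunk can end short of `chunk_size` words, and a single sentence longer than `chunk_size` is kept whole.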

app/core/embedder.py ADDED

	@@ -0,0 +1,43 @@

+import logging
+import time
+from sentence_transformers import SentenceTransformer
+from app.config import get_settings
+logger = logging.getLogger(__name__)
+class EmbedderService:
+    EMBEDDING_DIM = 384
+    def __init__(self, model_name: str):
+        start = time.perf_counter()
+        self.model = SentenceTransformer(model_name, device="cpu")
+        elapsed = (time.perf_counter() - start) * 1000
+        logger.info(f"Loaded embedding model '{model_name}' in {elapsed:.0f}ms")
+    def embed_texts(self, texts: list[str]) -> list[list[float]]:
+        if not texts:
+            return []
+        embeddings = self.model.encode(
+            texts,
+            batch_size=64,
+            show_progress_bar=False,
+            normalize_embeddings=True,
+        )
+        return embeddings.tolist()
+    def embed_query(self, query: str) -> list[float]:
+        return self.embed_texts([query])[0]
+_embedder: EmbedderService | None = None
+def get_embedder() -> EmbedderService:
+    global _embedder
+    if _embedder is None:
+        settings = get_settings()
+        _embedder = EmbedderService(settings.embedding_model)
+    return _embedder
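
`embed_texts` passes `normalize_embeddings=True`, so every vector has unit length. For unit vectors the dot product equals cosine similarity, which is why a store can compare them with a plain dot product. A small pure-Python check of that identity (the helper names are illustrative):

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```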

app/core/generator.py ADDED

	@@ -0,0 +1,128 @@

+import logging
+import time
+from collections.abc import AsyncGenerator
+from app.core.llm import GeminiService
+from app.core.reranker import RerankerService
+from app.models.schemas import GeneratedAnswer, RetrievedChunk
+logger = logging.getLogger(__name__)
+SYSTEM_PROMPT = """You are a helpful assistant answering questions based on the provided context.
+CONTEXT:
+{context}
+RULES:
+- Answer based ONLY on the provided context.
+- Cite sources using [1], [2], etc. inline after the relevant information.
+- If the context doesn't contain enough information, say "I don't have enough information in the provided documents to answer this question."
+- Be concise but thorough.
+- Use markdown formatting for readability.
+QUESTION: {query}
+ANSWER:"""
+SUMMARY_PROMPT = """You are a helpful assistant. Summarize the following context.
+CONTEXT:
+{context}
+RULES:
+- Provide a structured summary using markdown.
+- Cite sources using [1], [2], etc.
+- Cover the key points from all provided sources.
+QUESTION: {query}
+SUMMARY:"""
+class AnswerGenerator:
+    def __init__(self, llm: GeminiService, reranker: RerankerService):
+        self.llm = llm
+        self.reranker = reranker
+    def _build_context(self, chunks: list[RetrievedChunk]) -> str:
+        parts = []
+        for i, chunk in enumerate(chunks, 1):
+            source = chunk.metadata.source or "unknown"
+            header = f"[{i}] (Source: {source})"
+            parts.append(f"{header}\n{chunk.text}")
+        return "\n\n".join(parts)
+    def _build_prompt(self, query: str, chunks: list[RetrievedChunk], intent: str = "factual") -> str:
+        context = self._build_context(chunks)
+        template = SUMMARY_PROMPT if intent == "summarize" else SYSTEM_PROMPT
+        return template.format(context=context, query=query)
+    def generate_answer(
+        self,
+        query: str,
+        chunks: list[RetrievedChunk],
+        rerank_top_k: int = 5,
+        intent: str = "factual",
+    ) -> GeneratedAnswer:
+        start = time.perf_counter()
+        # Rerank
+        reranked = self.reranker.rerank(query, chunks, top_k=rerank_top_k)
+        if not reranked:
+            return GeneratedAnswer(
+                query=query,
+                answer="No relevant documents found to answer your question.",
+                sources=[],
+                generation_time_ms=0,
+                model=self.llm.model_name,
+            )
+        prompt = self._build_prompt(query, reranked, intent)
+        answer = self.llm.generate(prompt)
+        elapsed = (time.perf_counter() - start) * 1000
+        logger.info(f"Generated answer in {elapsed:.0f}ms")
+        return GeneratedAnswer(
+            query=query,
+            answer=answer,
+            sources=reranked,
+            generation_time_ms=elapsed,
+            model=self.llm.model_name,
+        )
+    async def generate_answer_stream(
+        self,
+        query: str,
+        chunks: list[RetrievedChunk],
+        rerank_top_k: int = 5,
+        intent: str = "factual",
+    ) -> AsyncGenerator[str | GeneratedAnswer, None]:
+        # Rerank
+        reranked = self.reranker.rerank(query, chunks, top_k=rerank_top_k)
+        if not reranked:
+            yield GeneratedAnswer(
+                query=query,
+                answer="No relevant documents found to answer your question.",
+                sources=[],
+                generation_time_ms=0,
+                model=self.llm.model_name,
+            )
+            return
+        prompt = self._build_prompt(query, reranked, intent)
+        start = time.perf_counter()
+        async for text_chunk in self.llm.generate_stream(prompt):
+            yield text_chunk
+        elapsed = (time.perf_counter() - start) * 1000
+        # Final message with sources
+        yield GeneratedAnswer(
+            query=query,
+            answer="",  # Full answer was streamed
+            sources=reranked,
+            generation_time_ms=elapsed,
+            model=self.llm.model_name,
+        )
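
The `[1]`, `[2]` citations the prompts ask for only make sense because `_build_context` numbers the reranked chunks in the same order as the `sources` list returned to the caller. The sketch below reproduces that numbering with plain dicts standing in for `RetrievedChunk` (an assumption made to keep it self-contained).

```python
def build_context(chunks: list[dict]) -> str:
    """Number each chunk from 1 and label it with its source, as the prompt expects."""
    parts = []
    for i, chunk in enumerate(chunks, 1):
        source = chunk.get("source") or "unknown"
        parts.append(f"[{i}] (Source: {source})\n{chunk['text']}")
    return "\n\n".join(parts)


print(build_context([
    {"source": "a.pdf", "text": "Alpha"},
    {"source": None, "text": "Beta"},
]))
```

A citation `[2]` in the generated answer therefore refers to the second entry of `sources`.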

app/core/llm.py ADDED

	@@ -0,0 +1,88 @@

+import asyncio
+import logging
+import time
+from collections.abc import AsyncGenerator
+import google.generativeai as genai
+from app.config import get_settings
+logger = logging.getLogger(__name__)
+class GeminiService:
+    def __init__(self, api_key: str, model_name: str, rpm_limit: int = 15):
+        genai.configure(api_key=api_key)
+        self.model = genai.GenerativeModel(model_name)
+        self.model_name = model_name
+        # Guard against a zero or negative limit, which would raise or disable spacing
+        self._min_interval = 60.0 / max(rpm_limit, 1)
+        self._last_call_time = 0.0
+        logger.info(f"Initialized Gemini '{model_name}' (RPM limit: {rpm_limit})")
+    def _wait_for_rate_limit(self) -> None:
+        now = time.time()
+        elapsed = now - self._last_call_time
+        if elapsed < self._min_interval:
+            wait = self._min_interval - elapsed
+            logger.debug(f"Rate limiting: waiting {wait:.1f}s")
+            time.sleep(wait)
+        self._last_call_time = time.time()
+    async def _async_wait_for_rate_limit(self) -> None:
+        now = time.time()
+        elapsed = now - self._last_call_time
+        if elapsed < self._min_interval:
+            wait = self._min_interval - elapsed
+            logger.debug(f"Rate limiting: waiting {wait:.1f}s")
+            await asyncio.sleep(wait)
+        self._last_call_time = time.time()
+    def generate(self, prompt: str, temperature: float = 0.3, max_tokens: int = 2048) -> str:
+        self._wait_for_rate_limit()
+        try:
+            response = self.model.generate_content(
+                prompt,
+                generation_config=genai.types.GenerationConfig(
+                    temperature=temperature,
+                    max_output_tokens=max_tokens,
+                ),
+            )
+            return response.text
+        except Exception as e:
+            logger.error(f"Gemini generation failed: {e}")
+            raise
+    async def generate_stream(
+        self, prompt: str, temperature: float = 0.3, max_tokens: int = 2048
+    ) -> AsyncGenerator[str, None]:
+        await self._async_wait_for_rate_limit()
+        try:
+            response = await self.model.generate_content_async(
+                prompt,
+                generation_config=genai.types.GenerationConfig(
+                    temperature=temperature,
+                    max_output_tokens=max_tokens,
+                ),
+                stream=True,
+            )
+            # Iterate asynchronously so the event loop is not blocked between chunks
+            async for chunk in response:
+                if chunk.text:
+                    yield chunk.text
+        except Exception as e:
+            logger.error(f"Gemini streaming failed: {e}")
+            raise
+_llm: GeminiService | None = None
+def get_llm() -> GeminiService:
+    global _llm
+    if _llm is None:
+        settings = get_settings()
+        _llm = GeminiService(
+            api_key=settings.gemini_api_key,
+            model_name=settings.gemini_model,
+            rpm_limit=settings.gemini_rpm_limit,
+        )
+    return _llm
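
`GeminiService` limits requests by spacing them at least `60 / rpm_limit` seconds apart, which is 4 seconds at the default of 15 RPM. The sketch below isolates that arithmetic with the clock passed in as an argument so it can be checked without sleeping. `IntervalLimiter` and `wait_needed` are hypothetical names, not part of the service.

```python
class IntervalLimiter:
    """Minimum-interval limiter: at most one call per 60 / rpm_limit seconds."""

    def __init__(self, rpm_limit: int):
        self.min_interval = 60.0 / rpm_limit
        self.last_call = 0.0

    def wait_needed(self, now: float) -> float:
        """Return how long a call arriving at `now` must wait, and record the call."""
        elapsed = now - self.last_call
        wait = max(0.0, self.min_interval - elapsed)
        # The call effectively happens once the wait is over
        self.last_call = now + wait
        return wait


limiter = IntervalLimiter(rpm_limit=15)
print(limiter.min_interval)        # 4.0
print(limiter.wait_needed(100.0))  # 0.0
print(limiter.wait_needed(101.0))  # 3.0
```

This state is held per process, so two server instances sharing one API key each allow the full rate.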

app/core/metadata.py ADDED

	@@ -0,0 +1,62 @@

+import logging
+import re
+from collections import Counter
+from datetime import datetime
+from pathlib import Path
+from app.models.document import DocumentMetadata
+logger = logging.getLogger(__name__)
+DATE_PATTERNS = [
+    re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
+    re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"),
+    re.compile(
+        r"\b((?:January|February|March|April|May|June|July|August|September|October|November|December)"
+        r"\s+\d{1,2},?\s+\d{4})\b"
+    ),
+]
+DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%B %d %Y"]
+def extract_title(text: str) -> str | None:
+    for line in text.splitlines():
+        line = line.strip()
+        if line and len(line) > 3:
+            return line[:200]
+    return None
+def extract_dates(text: str) -> datetime | None:
+    for pattern in DATE_PATTERNS:
+        match = pattern.search(text[:2000])  # Only scan beginning
+        if match:
+            date_str = match.group(1)
+            for fmt in DATE_FORMATS:
+                try:
+                    return datetime.strptime(date_str, fmt)
+                except ValueError:
+                    continue
+    return None
+def extract_tags(text: str, max_tags: int = 10) -> list[str]:
+    words = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)
+    counts = Counter(words)
+    tags = [word.lower() for word, count in counts.most_common(max_tags * 2) if count >= 2]
+    return tags[:max_tags]
+def extract_metadata(raw_text: str, filename: str, page_count: int | None = None) -> DocumentMetadata:
+    ext = Path(filename).suffix.lower().lstrip(".")
+    doc_type = ext if ext else "unknown"
+    return DocumentMetadata(
+        source=filename,
+        doc_type=doc_type,
+        title=extract_title(raw_text),
+        created_date=extract_dates(raw_text),
+        tags=extract_tags(raw_text),
+        page_count=page_count,
+    )

app/core/query_analyzer.py ADDED

	@@ -0,0 +1,127 @@

+import logging
+import re
+from datetime import datetime, timedelta
+from dateutil import parser as date_parser
+from app.models.schemas import AnalyzedQuery, SearchFilters
+logger = logging.getLogger(__name__)
+# Doc type patterns
+DOCTYPE_PATTERNS = {
+    "pdf": re.compile(r"\bpdfs?\b", re.IGNORECASE),
+    "html": re.compile(r"\bhtml\b", re.IGNORECASE),
+    "txt": re.compile(r"\btext\s+files?\b|\btxt\b", re.IGNORECASE),
+}
+# Relative date patterns
+RELATIVE_DATE_PATTERNS = [
+    (re.compile(r"\blast\s+week\b", re.IGNORECASE), lambda: (datetime.now() - timedelta(weeks=1), datetime.now())),
+    (re.compile(r"\blast\s+month\b", re.IGNORECASE), lambda: (datetime.now() - timedelta(days=30), datetime.now())),
+    (re.compile(r"\blast\s+year\b", re.IGNORECASE), lambda: (datetime.now() - timedelta(days=365), datetime.now())),
+    (re.compile(r"\bthis\s+week\b", re.IGNORECASE), lambda: (datetime.now() - timedelta(days=datetime.now().weekday()), datetime.now())),
+    (re.compile(r"\bthis\s+month\b", re.IGNORECASE), lambda: (datetime.now().replace(day=1), datetime.now())),
+    (re.compile(r"\bthis\s+year\b", re.IGNORECASE), lambda: (datetime.now().replace(month=1, day=1), datetime.now())),
+    (re.compile(r"\btoday\b", re.IGNORECASE), lambda: (datetime.now().replace(hour=0, minute=0, second=0, microsecond=0), datetime.now())),
+    (re.compile(r"\byesterday\b", re.IGNORECASE), lambda: (datetime.now() - timedelta(days=1), datetime.now())),
+]
+# Absolute date patterns
+AFTER_DATE = re.compile(r"\bafter\s+(\S+)\b", re.IGNORECASE)
+BEFORE_DATE = re.compile(r"\bbefore\s+(\S+)\b", re.IGNORECASE)
+FROM_SOURCE = re.compile(r"\bfrom\s+(\S+\.\w{2,4})\b", re.IGNORECASE)
+# Intent patterns
+# First match wins, so order matters
+INTENT_PATTERNS = [
+    ("summarize", re.compile(r"\bsummar(?:ize|y)\b|\boverview\b", re.IGNORECASE)),
+    # Covers compare(s/d), comparing, and comparison(s)
+    ("comparative", re.compile(r"\bcompar(?:e[sd]?|ing|isons?)\b|\bdifference\b|\bvs\.?\b|\bversus\b", re.IGNORECASE)),
+    ("list", re.compile(r"\blist\b|\benumerate\b|\bwhat are all\b", re.IGNORECASE)),
+    # "how many" / "how much" are excluded here so they fall through to "factual"
+    ("explanatory", re.compile(r"^(?:why|how(?!\s+(?:many|much)\b)|explain)\b", re.IGNORECASE)),
+    ("factual", re.compile(r"^(?:what|who|when|where|how many|how much)\b", re.IGNORECASE)),
+]
+class QueryAnalyzer:
+    def analyze(self, query: str) -> AnalyzedQuery:
+        filters = SearchFilters()
+        clean = query
+        confidence = 0.5
+        phrases_to_remove = []
+        # Extract doc type
+        for doc_type, pattern in DOCTYPE_PATTERNS.items():
+            match = pattern.search(clean)
+            if match:
+                filters.doc_type = doc_type
+                phrases_to_remove.append(match.group())
+                confidence += 0.1
+        # Extract relative dates
+        for pattern, date_fn in RELATIVE_DATE_PATTERNS:
+            match = pattern.search(clean)
+            if match:
+                date_from, date_to = date_fn()
+                filters.date_from = date_from
+                filters.date_to = date_to
+                phrases_to_remove.append(match.group())
+                confidence += 0.1
+                break
+        # Extract absolute dates
+        if not filters.date_from:
+            match = AFTER_DATE.search(clean)
+            if match:
+                try:
+                    filters.date_from = date_parser.parse(match.group(1))
+                    phrases_to_remove.append(match.group())
+                    confidence += 0.1
+                except (ValueError, OverflowError):
+                    pass
+        if not filters.date_to:
+            match = BEFORE_DATE.search(clean)
+            if match:
+                try:
+                    filters.date_to = date_parser.parse(match.group(1))
+                    phrases_to_remove.append(match.group())
+                    confidence += 0.1
+                except (ValueError, OverflowError):
+                    pass
+        # Extract source
+        match = FROM_SOURCE.search(clean)
+        if match:
+            filters.source = match.group(1)
+            phrases_to_remove.append(match.group())
+            confidence += 0.1
+        # Clean query by removing extracted filter phrases
+        for phrase in phrases_to_remove:
+            clean = clean.replace(phrase, "")
+        clean = re.sub(r"\s+", " ", clean).strip()
+        # Remove dangling prepositions and leading ones
+        clean = re.sub(r"\b(?:about|from|in|on)\s*$", "", clean, flags=re.IGNORECASE).strip()
+        clean = re.sub(r"^\b(?:about|from|in|on)\s+", "", clean, flags=re.IGNORECASE).strip()
+        if not clean:
+            clean = query
+        # Classify intent
+        intent = "factual"
+        for intent_name, pattern in INTENT_PATTERNS:
+            if pattern.search(query):
+                intent = intent_name
+                break
+        confidence = min(confidence, 1.0)
+        analyzed = AnalyzedQuery(
+            original_query=query,
+            clean_query=clean,
+            intent=intent,
+            extracted_filters=filters,
+            confidence=confidence,
+        )
+        logger.info(f"Query analyzed: intent={intent}, filters={filters.model_dump(exclude_none=True)}")
+        return analyzed
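
Review note: the filter extraction in `QueryAnalyzer.analyze` can be tried in isolation with a trimmed-down sketch. The function `extract_filters`, the reduced pattern tables, and the plain-dict filters below are hypothetical stand-ins for illustration, not the module's API; they only mirror the match, remove, and collapse steps shown in the diff.

```python
import re
from datetime import datetime, timedelta

# Hypothetical, reduced pattern tables; the real module defines more.
DOCTYPE_PATTERNS = {
    "pdf": re.compile(r"\bpdfs?\b", re.IGNORECASE),
    "html": re.compile(r"\bhtml\b", re.IGNORECASE),
}
LAST_WEEK = re.compile(r"\blast\s+week\b", re.IGNORECASE)


def extract_filters(query: str, now: datetime) -> tuple[str, dict]:
    """Return (clean_query, filters) using the match/remove/collapse approach."""
    filters: dict = {}
    phrases = []
    for doc_type, pattern in DOCTYPE_PATTERNS.items():
        match = pattern.search(query)
        if match:
            filters["doc_type"] = doc_type
            phrases.append(match.group())
    match = LAST_WEEK.search(query)
    if match:
        filters["date_from"] = now - timedelta(weeks=1)
        filters["date_to"] = now
        phrases.append(match.group())
    clean = query
    for phrase in phrases:
        clean = clean.replace(phrase, "")
    clean = re.sub(r"\s+", " ", clean).strip()
    # Drop a preposition left dangling at the end of the query
    clean = re.sub(r"\b(?:about|from|in|on)\s*$", "", clean).strip()
    # Fall back to the original query if everything was stripped
    return clean or query, filters


now = datetime(2024, 3, 18, 12, 0, 0)
clean, filters = extract_filters("summarize pdfs from last week", now)
print(clean)
print(filters["doc_type"], filters["date_from"].date())
```

Passing `now` explicitly keeps the sketch deterministic; the module itself calls `datetime.now()` inside each lambda.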

app/core/reranker.py ADDED Viewed

	@@ -0,0 +1,52 @@

+import logging
+import time
+from app.models.schemas import RetrievedChunk
+logger = logging.getLogger(__name__)
+class RerankerService:
+    def __init__(self):
+        start = time.perf_counter()
+        from flashrank import Ranker
+        self.ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="./flashrank_cache")
+        elapsed = (time.perf_counter() - start) * 1000
+        logger.info(f"Loaded FlashRank reranker in {elapsed:.0f}ms")
+    def rerank(
+        self, query: str, chunks: list[RetrievedChunk], top_k: int = 5
+    ) -> list[RetrievedChunk]:
+        if not chunks:
+            return []
+        from flashrank import RerankRequest
+        passages = [{"id": chunk.chunk_id, "text": chunk.text} for chunk in chunks]
+        request = RerankRequest(query=query, passages=passages)
+        results = self.ranker.rerank(request)
+        # Map reranked scores back to chunks
+        chunk_map = {chunk.chunk_id: chunk for chunk in chunks}
+        reranked = []
+        for i, result in enumerate(results[:top_k]):
+            chunk_id = result["id"]
+            if chunk_id in chunk_map:
+                chunk = chunk_map[chunk_id].model_copy()
+                chunk.score = float(result["score"])
+                chunk.rank = i
+                reranked.append(chunk)
+        logger.info(f"Reranked {len(chunks)} → top {len(reranked)} chunks")
+        return reranked
+_reranker: RerankerService | None = None
+def get_reranker() -> RerankerService:
+    global _reranker
+    if _reranker is None:
+        _reranker = RerankerService()
+    return _reranker
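
Review note: the "map reranked scores back to chunks" step in `rerank` can be sketched without FlashRank. Everything below is an assumption made for illustration: `Chunk` is a hypothetical dataclass standing in for `RetrievedChunk`, and `fake_rank` is a toy word-overlap scorer standing in for the cross-encoder. Only the id-to-chunk mapping and the copy-before-mutate pattern follow the diff.

```python
from dataclasses import dataclass, replace


@dataclass
class Chunk:  # hypothetical stand-in for RetrievedChunk
    chunk_id: str
    text: str
    score: float = 0.0
    rank: int = 0


def fake_rank(query, passages):
    """Toy stand-in for a cross-encoder: score by shared lowercase words."""
    terms = set(query.lower().split())
    scored = [
        {"id": p["id"], "score": float(len(terms & set(p["text"].lower().split())))}
        for p in passages
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


def rerank(query, chunks, top_k=5):
    passages = [{"id": c.chunk_id, "text": c.text} for c in chunks]
    chunk_map = {c.chunk_id: c for c in chunks}
    reranked = []
    for i, result in enumerate(fake_rank(query, passages)[:top_k]):
        # Copy so the caller's objects keep their retrieval score and rank
        reranked.append(replace(chunk_map[result["id"]], score=result["score"], rank=i))
    return reranked


chunks = [
    Chunk("c1", "hybrid search combines dense and sparse retrieval", score=0.9),
    Chunk("c2", "reranking reorders retrieved chunks by relevance", score=0.8),
]
top = rerank("how does reranking reorder chunks", chunks, top_k=1)
print(top[0].chunk_id, top[0].score, top[0].rank)
```

The copy matters because the reranker overwrites `score` with a value on a different scale than the fused retrieval score.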

app/core/retriever.py ADDED Viewed

	@@ -0,0 +1,126 @@

+import logging
+import time
+from collections import defaultdict
+from app.core.bm25 import BM25Index
+from app.core.embedder import EmbedderService
+from app.core.vectorstore import VectorStoreService
+from app.models.document import DocumentMetadata
+from app.models.schemas import RetrievedChunk, SearchFilters
+logger = logging.getLogger(__name__)
+class HybridRetriever:
+    def __init__(
+        self,
+        vectorstore: VectorStoreService,
+        bm25: BM25Index,
+        embedder: EmbedderService,
+    ):
+        self.vectorstore = vectorstore
+        self.bm25 = bm25
+        self.embedder = embedder
+    def retrieve(
+        self,
+        query: str,
+        top_k: int = 10,
+        filters: SearchFilters | None = None,
+        dense_weight: float = 0.6,
+        sparse_weight: float = 0.4,
+    ) -> list[RetrievedChunk]:
+        start = time.perf_counter()
+        query_vector = self.embedder.embed_query(query)
+        # Dense search via Qdrant (over-fetch 2x)
+        dense_results = self.vectorstore.search(
+            query_vector=query_vector,
+            limit=top_k * 2,
+            filters=filters,
+        )
+        # Sparse search via BM25
+        sparse_results = self.bm25.search(query, top_k=top_k * 2)
+        # Post-filter BM25 results if filters are provided
+        if filters and filters.has_filters():
+            sparse_results = self._apply_filters(sparse_results, filters)
+        # RRF fusion
+        fused = self.rrf_fuse(
+            [dense_results, sparse_results],
+            weights=[dense_weight, sparse_weight],
+        )
+        # Deduplicate by chunk_id and take top_k
+        seen = set()
+        unique = []
+        for item in fused:
+            if item["chunk_id"] not in seen:
+                seen.add(item["chunk_id"])
+                unique.append(item)
+            if len(unique) >= top_k:
+                break
+        # Convert to RetrievedChunk models
+        results = [
+            RetrievedChunk(
+                chunk_id=item["chunk_id"],
+                document_id=item.get("document_id", ""),
+                text=item["text"],
+                score=item["fused_score"],
+                metadata=DocumentMetadata(**item.get("metadata", {})),
+                rank=i,
+            )
+            for i, item in enumerate(unique)
+        ]
+        elapsed = (time.perf_counter() - start) * 1000
+        logger.info(
+            f"Hybrid retrieval: {len(dense_results)} dense + {len(sparse_results)} sparse "
+            f"→ {len(results)} results in {elapsed:.0f}ms"
+        )
+        return results
+    @staticmethod
+    def rrf_fuse(
+        result_lists: list[list[dict]],
+        k: int = 60,
+        weights: list[float] | None = None,
+    ) -> list[dict]:
+        if weights is None:
+            weights = [1.0] * len(result_lists)
+        scores: dict[str, float] = defaultdict(float)
+        docs: dict[str, dict] = {}
+        for result_list, weight in zip(result_lists, weights):
+            for rank, item in enumerate(result_list):
+                chunk_id = item["chunk_id"]
+                # rank is 0-based, so the top hit in each list contributes weight / k
+                scores[chunk_id] += weight * (1.0 / (k + rank))
+                if chunk_id not in docs:
+                    docs[chunk_id] = item
+        ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
+        return [
+            {**docs[chunk_id], "fused_score": score}
+            for chunk_id, score in ranked
+        ]
+    @staticmethod
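+    def _apply_filters(results: list[dict], filters: SearchFilters) -> list[dict]:
+        # Post-filters BM25 hits on source, doc_type, and tags only; date_from/date_to
+        # are not checked here, so date filters constrain just the dense results.
+        filtered = []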
+        for r in results:
+            meta = r.get("metadata", {})
+            if filters.source and meta.get("source") != filters.source:
+                continue
+            if filters.doc_type and meta.get("doc_type") != filters.doc_type:
+                continue
+            if filters.tags:
+                doc_tags = meta.get("tags", [])
+                if not any(t in doc_tags for t in filters.tags):
+                    continue
+            filtered.append(r)
+        return filtered
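
Review note: a small worked example of the weighted reciprocal rank fusion in `rrf_fuse`. This is a simplified sketch that fuses bare chunk ids instead of result dicts; the 0-based ranks, `k=60`, and the 0.6/0.4 weights follow the diff, while the sample lists are made up.

```python
from collections import defaultdict


def rrf_fuse(result_lists, k=60, weights=None):
    """Weighted reciprocal rank fusion over lists of chunk ids (0-based ranks)."""
    if weights is None:
        weights = [1.0] * len(result_lists)
    scores = defaultdict(float)
    for result_list, weight in zip(result_lists, weights):
        for rank, chunk_id in enumerate(result_list):
            scores[chunk_id] += weight * (1.0 / (k + rank))
    # Highest fused score first
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


dense = ["a", "b", "c"]   # made-up dense ranking
sparse = ["b", "d"]       # made-up BM25 ranking
fused = rrf_fuse([dense, sparse], weights=[0.6, 0.4])
for chunk_id, score in fused:
    print(chunk_id, round(score, 6))
```

Here `b` ends up first because it appears in both lists (0.6/61 + 0.4/60), ahead of `a`, which is the top dense hit but is absent from the sparse list (0.6/60). Since fusion already keys on chunk id, each id appears once in the output.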

app/core/vectorstore.py ADDED Viewed

	@@ -0,0 +1,219 @@

+import logging
+from qdrant_client import QdrantClient
+from qdrant_client.http.models import (
+    Distance,
+    FieldCondition,
+    Filter,
+    MatchAny,
+    MatchValue,
+    PayloadSchemaType,
+    PointStruct,
+    Range,
+    VectorParams,
+)
+from app.config import get_settings
+from app.models.document import Chunk
+from app.models.schemas import SearchFilters
+logger = logging.getLogger(__name__)
+class VectorStoreService:
+    def __init__(self, url: str, api_key: str, collection_name: str):
+        self.client = QdrantClient(url=url, api_key=api_key)
+        self.collection_name = collection_name
+        logger.info(f"Connected to Qdrant at {url}")
+    def ensure_collection(self, vector_size: int = 384) -> None:
+        collections = [c.name for c in self.client.get_collections().collections]
+        if self.collection_name not in collections:
+            self.client.create_collection(
+                collection_name=self.collection_name,
+                vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE),
+            )
+            logger.info(f"Created collection '{self.collection_name}' (dim={vector_size})")
+        else:
+            logger.info(f"Collection '{self.collection_name}' already exists")
+        # Ensure payload indexes exist for filterable fields
+        self._ensure_payload_indexes()
+    def _ensure_payload_indexes(self) -> None:
+        """Create payload indexes for fields used in filtering."""
+        index_fields = {
+            "document_id": PayloadSchemaType.KEYWORD,
+            "source": PayloadSchemaType.KEYWORD,
+            "doc_type": PayloadSchemaType.KEYWORD,
+            "tags": PayloadSchemaType.KEYWORD,
+            # ISO-8601 timestamps: a datetime index supports range filters
+            "created_date": PayloadSchemaType.DATETIME,
+        }
+        try:
+            collection_info = self.client.get_collection(self.collection_name)
+            existing_indexes = set(collection_info.payload_schema.keys()) if collection_info.payload_schema else set()
+        except Exception:
+            existing_indexes = set()
+        for field_name, field_type in index_fields.items():
+            if field_name not in existing_indexes:
+                try:
+                    self.client.create_payload_index(
+                        collection_name=self.collection_name,
+                        field_name=field_name,
+                        field_schema=field_type,
+                    )
+                    logger.info(f"Created payload index: {field_name} ({field_type})")
+                except Exception as e:
+                    logger.warning(f"Could not create index for '{field_name}': {e}")
+    def upsert_chunks(self, chunks: list[Chunk], embeddings: list[list[float]]) -> None:
+        batch_size = 100
+        for i in range(0, len(chunks), batch_size):
+            batch_chunks = chunks[i : i + batch_size]
+            batch_embeddings = embeddings[i : i + batch_size]
+            points = [
+                PointStruct(
+                    id=chunk.chunk_id,
+                    vector=embedding,
+                    payload={
+                        "text": chunk.text,
+                        "document_id": chunk.document_id,
+                        "chunk_index": chunk.chunk_index,
+                        "source": chunk.metadata.source,
+                        "doc_type": chunk.metadata.doc_type,
+                        "title": chunk.metadata.title,
+                        "created_date": chunk.metadata.created_date.isoformat()
+                        if chunk.metadata.created_date
+                        else None,
+                        "tags": chunk.metadata.tags,
+                        "page_count": chunk.metadata.page_count,
+                    },
+                )
+                for chunk, embedding in zip(batch_chunks, batch_embeddings)
+            ]
+            self.client.upsert(collection_name=self.collection_name, points=points)
+        logger.info(f"Upserted {len(chunks)} chunks to '{self.collection_name}'")
+    def search(
+        self,
+        query_vector: list[float],
+        limit: int = 10,
+        filters: SearchFilters | None = None,
+    ) -> list[dict]:
+        qdrant_filter = self._build_filter(filters) if filters and filters.has_filters() else None
+        results = self.client.query_points(
+            collection_name=self.collection_name,
+            query=query_vector,
+            limit=limit,
+            query_filter=qdrant_filter,
+        ).points
+        return [
+            {
+                "chunk_id": str(r.id),
+                "text": r.payload.get("text", ""),
+                "score": r.score,
+                "document_id": r.payload.get("document_id", ""),
+                "metadata": {
+                    "source": r.payload.get("source", ""),
+                    "doc_type": r.payload.get("doc_type", ""),
+                    "title": r.payload.get("title"),
+                    "created_date": r.payload.get("created_date"),
+                    "tags": r.payload.get("tags", []),
+                    "page_count": r.payload.get("page_count"),
+                },
+            }
+            for r in results
+        ]
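+    def delete_document(self, document_id: str) -> int:
+        doc_filter = Filter(
+            must=[FieldCondition(key="document_id", match=MatchValue(value=document_id))]
+        )
+        # Count matching points first so the caller learns how many chunks were removed
+        deleted = self.client.count(
+            collection_name=self.collection_name, count_filter=doc_filter, exact=True
+        ).count
+        self.client.delete(collection_name=self.collection_name, points_selector=doc_filter)
+        logger.info(f"Deleted document '{document_id}' ({deleted} chunks) from '{self.collection_name}'")
+        return deleted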
+    def scroll_all(self, batch_size: int = 100) -> list[dict]:
+        all_points = []
+        offset = None
+        while True:
+            results, next_offset = self.client.scroll(
+                collection_name=self.collection_name,
+                limit=batch_size,
+                offset=offset,
+                with_payload=True,
+                with_vectors=False,
+            )
+            for r in results:
+                all_points.append({
+                    "chunk_id": str(r.id),
+                    "text": r.payload.get("text", ""),
+                    "document_id": r.payload.get("document_id", ""),
+                    # Same metadata keys as search() so both result shapes match
+                    "metadata": {
+                        "source": r.payload.get("source", ""),
+                        "doc_type": r.payload.get("doc_type", ""),
+                        "title": r.payload.get("title"),
+                        "created_date": r.payload.get("created_date"),
+                        "tags": r.payload.get("tags", []),
+                        "page_count": r.payload.get("page_count"),
+                    },
+                })
+            if next_offset is None:
+                break
+            offset = next_offset
+        return all_points
+    def get_document_ids(self) -> list[dict]:
+        all_points = self.scroll_all()
+        docs: dict[str, dict] = {}
+        for p in all_points:
+            doc_id = p["document_id"]
+            if doc_id not in docs:
+                docs[doc_id] = {
+                    "document_id": doc_id,
+                    "source": p["metadata"]["source"],
+                    "title": p["metadata"].get("title"),
+                    "doc_type": p["metadata"]["doc_type"],
+                    "num_chunks": 0,
+                }
+            docs[doc_id]["num_chunks"] += 1
+        return list(docs.values())
+    def count(self) -> int:
+        info = self.client.get_collection(self.collection_name)
+        return info.points_count
+    @staticmethod
+    def _build_filter(filters: SearchFilters) -> Filter | None:
+        conditions = []
+        if filters.source:
+            conditions.append(FieldCondition(key="source", match=MatchValue(value=filters.source)))
+        if filters.doc_type:
+            conditions.append(FieldCondition(key="doc_type", match=MatchValue(value=filters.doc_type)))
+        if filters.tags:
+            conditions.append(FieldCondition(key="tags", match=MatchAny(any=filters.tags)))
+        if filters.date_from or filters.date_to:
+            # Range only accepts numbers; ISO-8601 payloads need DatetimeRange
+            from qdrant_client.http.models import DatetimeRange
+            range_params = {}
+            if filters.date_from:
+                range_params["gte"] = filters.date_from
+            if filters.date_to:
+                range_params["lte"] = filters.date_to
+            conditions.append(FieldCondition(key="created_date", range=DatetimeRange(**range_params)))
+        return Filter(must=conditions) if conditions else None
+_vectorstore: VectorStoreService | None = None
+def get_vectorstore() -> VectorStoreService:
+    global _vectorstore
+    if _vectorstore is None:
+        settings = get_settings()
+        _vectorstore = VectorStoreService(
+            url=settings.qdrant_url,
+            api_key=settings.qdrant_api_key,
+            collection_name=settings.qdrant_collection,
+        )
+        _vectorstore.ensure_collection(vector_size=settings.embedding_dim)
+    return _vectorstore

app/main.py ADDED Viewed

	@@ -0,0 +1,75 @@

+import logging
+from contextlib import asynccontextmanager
+import gradio as gr
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import RedirectResponse
+from app.api.routes import health, ingest, query
+from app.config import get_settings, setup_logging
+logger = logging.getLogger(__name__)
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    setup_logging()
+    logger.info("RagCore starting up...")
+    settings = get_settings()
+    # Initialize services that need warm-up
+    try:
+        from app.core.embedder import get_embedder
+        get_embedder()
+        logger.info("Embedder loaded")
+    except Exception as e:
+        logger.warning(f"Embedder initialization deferred: {e}")
+    try:
+        from app.core.vectorstore import get_vectorstore
+        from app.core.bm25 import get_bm25
+        vs = get_vectorstore()
+        bm25 = get_bm25()
+        bm25.rebuild_from_vectorstore(vs)
+        logger.info(f"BM25 index ready: {bm25.doc_count} documents")
+    except Exception as e:
+        logger.warning(f"Vectorstore/BM25 initialization deferred: {e}")
+    logger.info("RagCore ready!")
+    yield
+    logger.info("RagCore shutting down...")
+app = FastAPI(
+    title="RagCore",
+    description="RAG system with hybrid search and metadata filtering",
+    version="0.1.0",
+    lifespan=lifespan,
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    # Wildcard origins should not be combined with credentialed requests
+    allow_credentials=False,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+app.include_router(health.router)
+app.include_router(ingest.router)
+app.include_router(query.router)
+# Mount Gradio UI
+from app.ui.gradio_app import create_gradio_app
+gradio_app = create_gradio_app()
+app = gr.mount_gradio_app(app, gradio_app, path="/ui")
+@app.get("/", include_in_schema=False)
+async def root():
+    return RedirectResponse(url="/ui")

app/models/__init__.py ADDED Viewed

File without changes

app/models/document.py ADDED Viewed

	@@ -0,0 +1,32 @@

+from datetime import datetime
+from pydantic import BaseModel, Field
+from app.utils.helpers import generate_id
+class DocumentMetadata(BaseModel):
+    source: str = ""
+    doc_type: str = ""
+    title: str | None = None
+    created_date: datetime | None = None
+    tags: list[str] = Field(default_factory=list)
+    page_count: int | None = None
+class Chunk(BaseModel):
+    chunk_id: str = Field(default_factory=generate_id)
+    document_id: str = ""
+    text: str = ""
+    metadata: DocumentMetadata = Field(default_factory=DocumentMetadata)
+    chunk_index: int = 0
+    start_char: int = 0
+    end_char: int = 0
+class Document(BaseModel):
+    document_id: str = Field(default_factory=generate_id)
+    filename: str = ""
+    metadata: DocumentMetadata = Field(default_factory=DocumentMetadata)
+    chunks: list[Chunk] = Field(default_factory=list)
+    raw_text: str = ""

app/models/schemas.py ADDED Viewed

	@@ -0,0 +1,70 @@

+from datetime import datetime
+from pydantic import BaseModel, Field
+from app.models.document import DocumentMetadata
+class IngestResponse(BaseModel):
+    document_id: str
+    filename: str
+    num_chunks: int
+    message: str
+class SearchFilters(BaseModel):
+    source: str | None = None
+    doc_type: str | None = None
+    date_from: datetime | None = None
+    date_to: datetime | None = None
+    tags: list[str] | None = None
+    def has_filters(self) -> bool:
+        """Return True only if at least one filter field is set."""
+        return any([self.source, self.doc_type, self.date_from, self.date_to, self.tags])
+class RetrievedChunk(BaseModel):
+    chunk_id: str
+    document_id: str
+    text: str
+    score: float
+    metadata: DocumentMetadata
+    rank: int = 0
+class SearchRequest(BaseModel):
+    query: str
+    top_k: int = 10
+    filters: SearchFilters | None = None
+class SearchResponse(BaseModel):
+    query: str
+    results: list[RetrievedChunk]
+    total_results: int
+    search_time_ms: float
+class QueryRequest(BaseModel):
+    query: str
+    top_k: int = 10
+    rerank_top_k: int = 5
+    filters: SearchFilters | None = None
+    stream: bool = False
+class GeneratedAnswer(BaseModel):
+    query: str
+    answer: str
+    sources: list[RetrievedChunk] = Field(default_factory=list)
+    generation_time_ms: float = 0.0
+    model: str = ""
+class AnalyzedQuery(BaseModel):
+    original_query: str
+    clean_query: str
+    intent: str = "factual"
+    extracted_filters: SearchFilters = Field(default_factory=SearchFilters)
+    confidence: float = 0.5

app/ui/__init__.py ADDED Viewed

File without changes

app/ui/gradio_app.py ADDED Viewed

	@@ -0,0 +1,427 @@

+import json
+import logging
+import gradio as gr
+import httpx
+logger = logging.getLogger(__name__)
+API_BASE = "http://localhost:7860"
+CUSTOM_CSS = """
+.main-header {
+    text-align: center;
+    padding: 1.5rem 0 0.5rem 0;
+}
+.main-header h1 {
+    font-size: 2.4rem;
+    font-weight: 700;
+    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    margin-bottom: 0.2rem;
+}
+.main-header p {
+    color: #6b7280;
+    font-size: 1rem;
+    margin: 0;
+}
+.stat-card {
+    background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
+    border-radius: 12px;
+    padding: 1rem;
+    text-align: center;
+}
+.answer-box {
+    border-left: 4px solid #667eea;
+    padding-left: 1rem;
+    margin-top: 0.5rem;
+}
+.source-card {
+    background: #f9fafb;
+    border: 1px solid #e5e7eb;
+    border-radius: 8px;
+    padding: 0.75rem;
+    margin: 0.5rem 0;
+}
+.upload-zone {
+    border: 2px dashed #667eea !important;
+    border-radius: 12px !important;
+    background: #f8f9ff !important;
+}
+.search-bar textarea {
+    font-size: 1.1rem !important;
+    border-radius: 12px !important;
+    border: 2px solid #e5e7eb !important;
+    padding: 12px 16px !important;
+}
+.search-bar textarea:focus {
+    border-color: #667eea !important;
+    box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.15) !important;
+}
+.primary-btn {
+    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
+    border: none !important;
+    border-radius: 10px !important;
+    font-weight: 600 !important;
+    font-size: 1rem !important;
+    padding: 10px 24px !important;
+    transition: transform 0.15s, box-shadow 0.15s !important;
+}
+.primary-btn:hover {
+    transform: translateY(-1px) !important;
+    box-shadow: 0 4px 12px rgba(102, 126, 234, 0.4) !important;
+}
+.danger-btn {
+    background: linear-gradient(135deg, #ef4444 0%, #dc2626 100%) !important;
+    border: none !important;
+    border-radius: 10px !important;
+}
+.filter-row {
+    background: #f9fafb;
+    border-radius: 10px;
+    padding: 8px 12px;
+}
+.doc-table {
+    border-radius: 10px !important;
+    overflow: hidden !important;
+}
+footer { display: none !important; }
+.tab-nav button {
+    font-size: 1rem !important;
+    font-weight: 600 !important;
+    padding: 10px 20px !important;
+}
+.tab-nav button.selected {
+    border-bottom: 3px solid #667eea !important;
+    color: #667eea !important;
+}
+"""
+def upload_document(file):
+    if file is None:
+        return "Please select a file to upload."
+    try:
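+        # gr.File may hand back a path string or a tempfile-like object with .name
+        path = file if isinstance(file, str) else file.name
+        with open(path, "rb") as f:
+            files = {"file": (path.split("/")[-1].split("\\")[-1], f)}
+            response = httpx.post(f"{API_BASE}/api/ingest", files=files, timeout=120)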
+        if response.status_code == 200:
+            data = response.json()
+            return (
+                f"### Document Uploaded\n\n"
+                f"| Detail | Value |\n"
+                f"|--------|-------|\n"
+                f"| **File** | {data['filename']} |\n"
+                f"| **Chunks** | {data['num_chunks']} |\n"
+                f"| **ID** | `{data['document_id'][:12]}...` |\n"
+            )
+        else:
+            detail = response.json().get("detail", response.text)
+            return f"**Upload failed:** {detail}"
+    except Exception as e:
+        return f"**Upload failed:** {e}"
+def list_documents():
+    try:
+        response = httpx.get(f"{API_BASE}/api/documents", timeout=30)
+        if response.status_code == 200:
+            data = response.json()
+            docs = data.get("documents", [])
+            if not docs:
+                return [["—", "—", "—", "—"]]
+            return [
+                [
+                    d.get("source", ""),
+                    d.get("doc_type", "").upper(),
+                    str(d.get("num_chunks", 0)),
+                    d.get("document_id", "")[:12] + "...",
+                ]
+                for d in docs
+            ]
+        return [["Error loading", "", "", ""]]
+    except Exception as e:
+        return [[f"Error: {e}", "", "", ""]]
+def get_doc_count():
+    try:
+        response = httpx.get(f"{API_BASE}/api/documents", timeout=10)
+        if response.status_code == 200:
+            data = response.json()
+            total = data.get("total", 0)
+            docs = data.get("documents", [])
+            total_chunks = sum(d.get("num_chunks", 0) for d in docs)
+            return f"**{total}** documents | **{total_chunks}** chunks indexed"
+        return "Unable to fetch stats"
+    except Exception:
+        return "Connecting..."
+def delete_document(doc_id):
+    if not doc_id or not doc_id.strip():
+        return "Enter a document ID to delete."
+    try:
+        response = httpx.delete(f"{API_BASE}/api/documents/{doc_id.strip()}", timeout=30)
+        if response.status_code == 200:
+            return f"Document `{doc_id.strip()[:12]}...` deleted successfully."
+        return f"**Error:** {response.text}"
+    except Exception as e:
+        return f"**Delete failed:** {e}"
+def ask_question(query, doc_type_filter, stream_mode):
+    if not query or not query.strip():
+        yield "Please enter a question."
+        return
+    payload = {
+        "query": query.strip(),
+        "top_k": 10,
+        "rerank_top_k": 5,
+        "stream": stream_mode,
+    }
+    if doc_type_filter and doc_type_filter != "All":
+        payload["filters"] = {"doc_type": doc_type_filter.lower()}
+    try:
+        if stream_mode:
+            with httpx.stream(
+                "POST",
+                f"{API_BASE}/api/ask",
+                json=payload,
+                timeout=120,
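+            ) as response:
+                if response.status_code != 200:
+                    # The body must be read before .text is available on a streamed response
+                    response.read()
+                    yield f"**Error:** {response.text}"
+                    return
+                answer = ""
+                sources_text = ""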
+                for line in response.iter_lines():
+                    if line.startswith("data: "):
+                        data = json.loads(line[6:])
+                        if "text" in data:
+                            answer += data["text"]
+                            yield (
+                                f"<div class='answer-box'>\n\n{answer}\n\n</div>\n\n"
+                                f"<sub>Generating...</sub>"
+                            )
+                        if data.get("done"):
+                            sources = data.get("sources", [])
+                            sources_text = _format_sources(sources)
+                            time_ms = data.get("time_ms", 0)
+                            model = data.get("model", "")
+                            footer = f"\n\n<sub>{model} | {time_ms:.0f}ms</sub>"
+                            yield (
+                                f"<div class='answer-box'>\n\n{answer}\n\n</div>"
+                                f"{sources_text}{footer}"
+                            )
+        else:
+            response = httpx.post(
+                f"{API_BASE}/api/ask",
+                json=payload,
+                timeout=120,
+            )
+            if response.status_code == 200:
+                data = response.json()
+                answer = data.get("answer", "No answer generated.")
+                sources = data.get("sources", [])
+                sources_text = _format_sources_full(sources)
+                time_ms = data.get("generation_time_ms", 0)
+                model = data.get("model", "")
+                footer = f"\n\n<sub>{model} | {time_ms:.0f}ms</sub>"
+                yield (
+                    f"<div class='answer-box'>\n\n{answer}\n\n</div>"
+                    f"{sources_text}{footer}"
+                )
+            else:
+                yield f"**Error:** {response.text}"
+    except Exception as e:
+        yield f"**Error:** {e}"
+def _format_sources(sources):
+    if not sources:
+        return ""
+    text = "\n\n---\n#### Sources\n\n"
+    for i, s in enumerate(sources, 1):
+        source_name = s.get("source", "unknown")
+        score = s.get("score", 0)
+        snippet = s.get("text", "")[:120].replace("\n", " ")
+        text += (
+            f"<div class='source-card'>\n\n"
+            f"**[{i}]** `{source_name}` — relevance: {score:.3f}\n\n"
+            f"> {snippet}...\n\n"
+            f"</div>\n\n"
+        )
+    return text
+def _format_sources_full(sources):
+    if not sources:
+        return ""
+    text = "\n\n---\n#### Sources\n\n"
+    for i, s in enumerate(sources, 1):
+        meta = s.get("metadata", {})
+        source_name = meta.get("source", "unknown")
+        score = s.get("score", 0)
+        snippet = s.get("text", "")[:120].replace("\n", " ")
+        text += (
+            f"<div class='source-card'>\n\n"
+            f"**[{i}]** `{source_name}` — relevance: {score:.3f}\n\n"
+            f"> {snippet}...\n\n"
+            f"</div>\n\n"
+        )
+    return text
+def create_gradio_app() -> gr.Blocks:
+    with gr.Blocks(title="RagCore — Smart Document Q&A") as demo:
+        # Inject CSS via style tag since Gradio 6.x doesn't accept css in Blocks()
+        gr.HTML(f"<style>{CUSTOM_CSS}</style>")
+        # Header
+        gr.HTML(
+            """
+            <div class="main-header">
+                <h1>RagCore</h1>
+                <p>Smart Document Q&A — Hybrid Search + Gemini Flash</p>
+            </div>
+            """
+        )
+        # Stats bar
+        stats_display = gr.Markdown(value="Connecting...", elem_classes=["stat-card"])
+        demo.load(fn=get_doc_count, outputs=stats_display)
+        with gr.Tab("Ask", elem_id="ask-tab"):
+            gr.Markdown("#### Ask your documents anything")
+            with gr.Group():
+                query_input = gr.Textbox(
+                    placeholder="e.g. What are the key findings? / Summarize the report / Compare approaches...",
+                    lines=2,
+                    show_label=False,
+                    elem_classes=["search-bar"],
+                    container=False,
+                )
+            with gr.Row(elem_classes=["filter-row"]):
+                doc_type_filter = gr.Dropdown(
+                    choices=["All", "PDF", "TXT", "HTML"],
+                    value="All",
+                    label="Document Type",
+                    scale=1,
+                    min_width=120,
+                )
+                stream_toggle = gr.Checkbox(
+                    label="Stream response",
+                    value=True,
+                    scale=1,
+                )
+                ask_btn = gr.Button(
+                    "Ask",
+                    variant="primary",
+                    scale=1,
+                    min_width=120,
+                    elem_classes=["primary-btn"],
+                )
+            answer_output = gr.Markdown(
+                value="*Upload a document and ask a question to get started.*",
+            )
+            ask_btn.click(
+                fn=ask_question,
+                inputs=[query_input, doc_type_filter, stream_toggle],
+                outputs=answer_output,
+            )
+            query_input.submit(
+                fn=ask_question,
+                inputs=[query_input, doc_type_filter, stream_toggle],
+                outputs=answer_output,
+            )
+            gr.Markdown("#### Try these examples")
+            gr.Examples(
+                examples=[
+                    ["What are the key points in the uploaded documents?"],
+                    ["Summarize all documents"],
+                    ["Compare the main topics across all documents"],
+                    ["List the most important findings"],
+                ],
+                inputs=query_input,
+            )
+        with gr.Tab("Documents", elem_id="docs-tab"):
+            gr.Markdown("#### Upload & Manage Documents")
+            with gr.Row():
+                with gr.Column(scale=3):
+                    file_upload = gr.File(
+                        label="Drop your file here",
+                        file_types=[".pdf", ".txt", ".html", ".htm"],
+                        elem_classes=["upload-zone"],
+                    )
+                with gr.Column(scale=1, min_width=160):
+                    upload_btn = gr.Button(
+                        "Upload & Index",
+                        variant="primary",
+                        elem_classes=["primary-btn"],
+                        size="lg",
+                    )
+                    gr.Markdown(
+                        "<sub>Supported: PDF, TXT, HTML</sub>"
+                    )
+            upload_status = gr.Markdown()
+            upload_btn.click(
+                fn=upload_document,
+                inputs=file_upload,
+                outputs=upload_status,
+            )
+            gr.Markdown("---")
+            gr.Markdown("#### Indexed Documents")
+            doc_table = gr.Dataframe(
+                headers=["Filename", "Type", "Chunks", "Document ID"],
+                label="",
+                interactive=False,
+                wrap=True,
+                elem_classes=["doc-table"],
+            )
+            refresh_btn = gr.Button("Refresh", size="sm")
+            refresh_btn.click(fn=list_documents, outputs=doc_table)
+            gr.Markdown("---")
+            gr.Markdown("#### Delete a Document")
+            with gr.Row():
+                delete_id_input = gr.Textbox(
+                    placeholder="Paste full document ID here...",
+                    show_label=False,
+                    scale=3,
+                )
+                delete_btn = gr.Button(
+                    "Delete",
+                    variant="stop",
+                    scale=1,
+                    elem_classes=["danger-btn"],
+                )
+            delete_status = gr.Markdown()
+            delete_btn.click(
+                fn=delete_document,
+                inputs=delete_id_input,
+                outputs=delete_status,
+            )
+        # Footer
+        gr.HTML(
+            """
+            <div style="text-align:center; padding: 1rem 0 0.5rem 0; color: #9ca3af; font-size: 0.8rem;">
+                RagCore v0.1.0
+            </div>
+            """
+        )
+    return demo

app/utils/__init__.py ADDED

File without changes

app/utils/helpers.py ADDED

	@@ -0,0 +1,51 @@

+import re
+import time
+import uuid
+import logging
+from contextlib import contextmanager
+from functools import wraps
+logger = logging.getLogger(__name__)
+def generate_id() -> str:
+    return str(uuid.uuid4())
+def count_words(text: str) -> int:
+    return len(text.split())
+def clean_text(text: str) -> str:
+    text = re.sub(r"\n{3,}", "\n\n", text)
+    text = re.sub(r"[ \t]+", " ", text)
+    lines = [line.strip() for line in text.splitlines()]
+    return "\n".join(lines).strip()
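+@contextmanager
+def timer(label: str = "operation"):
+    """Time a block of code and log the total elapsed milliseconds on normal exit."""
+    start = time.perf_counter()
+    # Yield a callable so the caller can read the elapsed milliseconds mid-block.
+    yield lambda: (time.perf_counter() - start) * 1000
+    # Not reached if the block raises, so failed blocks are not logged.
+    elapsed = (time.perf_counter() - start) * 1000
+    logger.info(f"{label} completed in {elapsed:.1f}ms")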
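+def retry_with_backoff(retries: int = 3, base_delay: float = 1.0):
+    """Call the wrapped function up to `retries` times, sleeping base_delay * 2**attempt after each failure."""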
+    def decorator(func):
+        @wraps(func)
+        def wrapper(*args, **kwargs):
+            for attempt in range(retries):
+                try:
+                    return func(*args, **kwargs)
+                except Exception as e:
+                    if attempt == retries - 1:
+                        raise
+                    delay = base_delay * (2 ** attempt)
+                    logger.warning(
+                        f"{func.__name__} failed (attempt {attempt + 1}/{retries}): {e}. "
+                        f"Retrying in {delay}s..."
+                    )
+                    time.sleep(delay)
+        return wrapper
+    return decorator
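
The retry schedule used by `retry_with_backoff` above can be illustrated in isolation. The following is a minimal sketch and not part of the commit; the helper name `backoff_delays` is hypothetical. It lists the sleep durations the decorator would use between attempts, given that the final failed attempt re-raises without sleeping.

```python
def backoff_delays(retries: int = 3, base_delay: float = 1.0) -> list[float]:
    """Return the sleep durations between attempts (hypothetical helper)."""
    # The last attempt re-raises instead of sleeping, so there are
    # retries - 1 delays, each double the previous one.
    return [base_delay * (2 ** attempt) for attempt in range(max(retries - 1, 0))]


if __name__ == "__main__":
    # Default settings: three attempts in total, two sleeps in between.
    print(backoff_delays(3, 1.0))
```

With the defaults, a function that keeps failing is attempted three times and sleeps for about three seconds in total before the exception propagates.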

app/utils/parsers.py ADDED

	@@ -0,0 +1,76 @@

+import logging
+from pathlib import Path
+from app.utils.helpers import clean_text
+logger = logging.getLogger(__name__)
+SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".html", ".htm"}
+def parse_pdf(file_bytes: bytes, filename: str) -> str:
+    try:
+        from pypdf import PdfReader
+        from io import BytesIO
+        reader = PdfReader(BytesIO(file_bytes))
+        pages = []
+        for page in reader.pages:
+            text = page.extract_text()
+            if text:
+                pages.append(text)
+        raw = "\n\n".join(pages)
+        logger.info(f"Parsed PDF '{filename}': {len(reader.pages)} pages, {len(raw)} chars")
+        return clean_text(raw)
+    except Exception as e:
+        logger.error(f"Failed to parse PDF '{filename}': {e}")
+        return ""
+def parse_text(file_bytes: bytes, filename: str) -> str:
+    try:
+        text = file_bytes.decode("utf-8")
+    except UnicodeDecodeError:
+        # latin-1 maps every byte to a character, so this fallback cannot raise.
+        text = file_bytes.decode("latin-1")
+    logger.info(f"Parsed text '{filename}': {len(text)} chars")
+    return clean_text(text)
+def parse_html(file_bytes: bytes, filename: str) -> str:
+    try:
+        from bs4 import BeautifulSoup
+        soup = BeautifulSoup(file_bytes, "html.parser")
+        for tag in soup(["script", "style", "nav", "footer", "header"]):
+            tag.decompose()
+        text = soup.get_text(separator="\n")
+        logger.info(f"Parsed HTML '{filename}': {len(text)} chars")
+        return clean_text(text)
+    except Exception as e:
+        logger.error(f"Failed to parse HTML '{filename}': {e}")
+        return ""
+def parse_document(file_bytes: bytes, filename: str) -> str:
+    ext = Path(filename).suffix.lower()
+    if ext == ".pdf":
+        return parse_pdf(file_bytes, filename)
+    elif ext in (".html", ".htm"):
+        return parse_html(file_bytes, filename)
+    elif ext == ".txt":
+        return parse_text(file_bytes, filename)
+    else:
+        logger.warning(f"Unsupported file type '{ext}' for '{filename}'")
+        return ""
+def get_page_count(file_bytes: bytes, filename: str) -> int | None:
+    ext = Path(filename).suffix.lower()
+    if ext == ".pdf":
+        try:
+            from pypdf import PdfReader
+            from io import BytesIO
+            return len(PdfReader(BytesIO(file_bytes)).pages)
+        except Exception:
+            return None
+    return None
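
The decoding strategy in `parse_text` above (try UTF-8 first, then fall back to latin-1) can be tried on its own. This is a minimal sketch under that assumption only; `decode_with_fallback` is a hypothetical name, and the logging and `clean_text` steps are left out.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1 (hypothetical helper)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 assigns a character to every byte value, so this never
        # fails, although text in another encoding may come out garbled.
        return data.decode("latin-1")


if __name__ == "__main__":
    print(decode_with_fallback("Héllo wörld".encode("latin-1")))
    print(decode_with_fallback("Héllo wörld".encode("utf-8")))
```

One consequence of this design is that files in other single-byte encodings are accepted silently rather than rejected, so the returned text is not guaranteed to match what the author wrote.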

docker-compose.yml ADDED

	@@ -0,0 +1,10 @@

+version: "3.8"
+services:
+  ragcore:
+    build: .
+    ports:
+      - "8000:7860"
+    env_file:
+      - .env
+    environment:
+      - PYTHONUNBUFFERED=1

requirements.txt ADDED

	@@ -0,0 +1,18 @@

+fastapi>=0.110,<1.0
+uvicorn[standard]>=0.29
+python-dotenv>=1.0
+pydantic>=2.6
+pydantic-settings>=2.2
+sentence-transformers>=2.6
+qdrant-client>=1.8
+rank-bm25>=0.2.2
+FlashRank>=0.2
+google-generativeai>=0.5
+gradio>=4.20
+pypdf>=4.1
+beautifulsoup4>=4.12
+httpx>=0.27
+python-multipart>=0.0.9
+python-dateutil>=2.9
+ruff>=0.3
+pytest>=8.0

tests/__init__.py ADDED

File without changes

tests/conftest.py ADDED

	@@ -0,0 +1,21 @@

+import pytest
+from fastapi.testclient import TestClient
+from app.main import app
+@pytest.fixture
+def client():
+    return TestClient(app)
+@pytest.fixture
+def sample_text():
+    return (
+        "Retrieval-Augmented Generation (RAG) is a technique that combines "
+        "information retrieval with text generation. It was introduced by "
+        "Facebook AI Research in 2020. RAG systems first retrieve relevant "
+        "documents from a knowledge base, then use a language model to generate "
+        "answers based on those documents. This approach reduces hallucinations "
+        "and provides more factual responses compared to pure generation."
+    )

tests/test_api.py ADDED

	@@ -0,0 +1,23 @@

+from fastapi.testclient import TestClient
+from app.main import app
+client = TestClient(app)
+def test_health():
+    response = client.get("/health")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["status"] == "ok"
+    assert "components" in data
+def test_root_redirects():
+    response = client.get("/", follow_redirects=False)
+    assert response.status_code in (301, 302, 307, 308)
+def test_docs_page():
+    response = client.get("/docs")
+    assert response.status_code == 200

tests/test_chunker.py ADDED

	@@ -0,0 +1,42 @@

+from app.core.chunker import chunk_text
+def test_empty_text():
+    assert chunk_text("") == []
+    assert chunk_text("   ") == []
+def test_single_sentence():
+    chunks = chunk_text("This is a single sentence.", chunk_size=100)
+    assert len(chunks) == 1
+    assert chunks[0]["text"] == "This is a single sentence."
+    assert chunks[0]["chunk_index"] == 0
+def test_multiple_chunks():
+    text = "First sentence here. Second sentence here. Third sentence here. Fourth sentence here. Fifth sentence here."
+    chunks = chunk_text(text, chunk_size=5, chunk_overlap=2)
+    assert len(chunks) > 1
+    for i, chunk in enumerate(chunks):
+        assert chunk["chunk_index"] == i
+        assert chunk["text"]
+        assert chunk["start_char"] >= 0
+        assert chunk["end_char"] > chunk["start_char"]
+def test_overlap_present():
+    text = "Alpha bravo charlie delta. Echo foxtrot golf hotel. India juliet kilo lima."
+    chunks = chunk_text(text, chunk_size=4, chunk_overlap=2)
+    if len(chunks) > 1:
+        first_words = chunks[0]["text"].split()
+        second_words = chunks[1]["text"].split()
+        overlap = set(first_words[-2:]) & set(second_words[:2])
+        assert len(overlap) > 0
+def test_chunk_size_respected():
+    text = " ".join(["word"] * 100) + "."
+    chunks = chunk_text(text, chunk_size=20, chunk_overlap=5)
+    for chunk in chunks[:-1]:  # Last chunk can be smaller
+        word_count = len(chunk["text"].split())
+        assert word_count <= 25  # Allow some slack for sentence boundaries

tests/test_parsers.py ADDED

	@@ -0,0 +1,35 @@

+from app.utils.parsers import parse_document, parse_text, parse_html
+def test_parse_text_utf8():
+    content = "Hello, world! This is a test."
+    result = parse_text(content.encode("utf-8"), "test.txt")
+    assert "Hello, world" in result
+def test_parse_text_latin1():
+    content = "Héllo wörld"
+    result = parse_text(content.encode("latin-1"), "test.txt")
+    assert "rld" in result
+def test_parse_html():
+    html = b"<html><body><p>Hello world</p><script>var x=1;</script></body></html>"
+    result = parse_html(html, "test.html")
+    assert "Hello world" in result
+    assert "var x" not in result
+def test_parse_document_unsupported():
+    result = parse_document(b"data", "test.xyz")
+    assert result == ""
+def test_parse_empty_text():
+    result = parse_text(b"", "empty.txt")
+    assert result == ""
+def test_parse_document_dispatches_by_extension():
+    result = parse_document(b"Hello text file", "readme.txt")
+    assert "Hello text file" in result

tests/test_query_analyzer.py ADDED

	@@ -0,0 +1,52 @@

+from app.core.query_analyzer import QueryAnalyzer
+def test_intent_factual():
+    qa = QueryAnalyzer()
+    result = qa.analyze("what is RAG?")
+    assert result.intent == "factual"
+def test_intent_comparative():
+    qa = QueryAnalyzer()
+    result = qa.analyze("compare BM25 and dense search")
+    assert result.intent == "comparative"
+def test_intent_summarize():
+    qa = QueryAnalyzer()
+    result = qa.analyze("summarize the report")
+    assert result.intent == "summarize"
+def test_intent_explanatory():
+    qa = QueryAnalyzer()
+    result = qa.analyze("why is RAG useful?")
+    assert result.intent == "explanatory"
+def test_doctype_extraction():
+    qa = QueryAnalyzer()
+    result = qa.analyze("search PDFs about machine learning")
+    assert result.extracted_filters.doc_type == "pdf"
+def test_no_filters():
+    qa = QueryAnalyzer()
+    result = qa.analyze("what is machine learning?")
+    assert result.extracted_filters.doc_type is None
+    assert result.extracted_filters.source is None
+    assert result.clean_query == result.original_query
+def test_date_extraction_last_month():
+    qa = QueryAnalyzer()
+    result = qa.analyze("documents from last month")
+    assert result.extracted_filters.date_from is not None
+    assert result.extracted_filters.date_to is not None
+def test_clean_query_preserves_meaning():
+    qa = QueryAnalyzer()
+    result = qa.analyze("what is machine learning?")
+    assert "machine learning" in result.clean_query

tests/test_retrieval.py ADDED

	@@ -0,0 +1,56 @@

+from app.core.retriever import HybridRetriever
+def test_rrf_fusion_basic():
+    dense = [
+        {"chunk_id": "a", "text": "doc a", "score": 0.9, "metadata": {}},
+        {"chunk_id": "b", "text": "doc b", "score": 0.8, "metadata": {}},
+    ]
+    sparse = [
+        {"chunk_id": "b", "text": "doc b", "score": 5.0, "metadata": {}},
+        {"chunk_id": "c", "text": "doc c", "score": 4.0, "metadata": {}},
+    ]
+    fused = HybridRetriever.rrf_fuse([dense, sparse])
+    ids = [item["chunk_id"] for item in fused]
+    # "b" appears in both lists so should rank highest
+    assert ids[0] == "b"
+    assert len(fused) == 3
+def test_rrf_fusion_empty():
+    fused = HybridRetriever.rrf_fuse([[], []])
+    assert fused == []
+def test_rrf_fusion_single_list():
+    results = [
+        {"chunk_id": "x", "text": "x", "score": 1.0, "metadata": {}},
+    ]
+    fused = HybridRetriever.rrf_fuse([results])
+    assert len(fused) == 1
+    assert fused[0]["chunk_id"] == "x"
+def test_rrf_fusion_with_weights():
+    dense = [
+        {"chunk_id": "a", "text": "a", "score": 0.9, "metadata": {}},
+    ]
+    sparse = [
+        {"chunk_id": "b", "text": "b", "score": 5.0, "metadata": {}},
+    ]
+    fused = HybridRetriever.rrf_fuse([dense, sparse], weights=[1.0, 0.0])
+    # With weight 0 on sparse, only dense matters
+    assert fused[0]["chunk_id"] == "a"
+def test_apply_filters():
+    results = [
+        {"chunk_id": "1", "text": "t", "score": 1, "metadata": {"doc_type": "pdf", "source": "a.pdf", "tags": []}},
+        {"chunk_id": "2", "text": "t", "score": 1, "metadata": {"doc_type": "html", "source": "b.html", "tags": []}},
+    ]
+    from app.models.schemas import SearchFilters
+    filters = SearchFilters(doc_type="pdf")
+    filtered = HybridRetriever._apply_filters(results, filters)
+    assert len(filtered) == 1
+    assert filtered[0]["chunk_id"] == "1"
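
The fusion tests above expect a chunk found by both retrievers to outrank chunks found by only one, and a weight of 0 to silence a retriever. A weighted reciprocal rank fusion along the following lines would satisfy them. This is a minimal sketch, not the `HybridRetriever.rrf_fuse` implementation from the commit: the function name `rrf_fuse_sketch` is hypothetical, and the smoothing constant `k=60` is assumed as the commonly used default.

```python
from collections import defaultdict


def rrf_fuse_sketch(result_lists, weights=None, k=60):
    """Fuse ranked result lists by weighted reciprocal rank (hypothetical sketch)."""
    if weights is None:
        weights = [1.0] * len(result_lists)
    scores = defaultdict(float)
    items = {}
    for results, weight in zip(result_lists, weights):
        for rank, item in enumerate(results, start=1):
            chunk_id = item["chunk_id"]
            # Only the rank contributes; the raw dense and BM25 scores are
            # on different scales and are ignored.
            scores[chunk_id] += weight / (k + rank)
            items.setdefault(chunk_id, item)
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Return copies so the input dictionaries are left unmodified.
    return [{**items[cid], "score": scores[cid]} for cid in ranked]


if __name__ == "__main__":
    dense = [{"chunk_id": "a", "score": 0.9}, {"chunk_id": "b", "score": 0.8}]
    sparse = [{"chunk_id": "b", "score": 5.0}, {"chunk_id": "c", "score": 4.0}]
    print([r["chunk_id"] for r in rrf_fuse_sketch([dense, sparse])])
```

Because only ranks enter the sum, the cosine similarities from dense search and the unbounded BM25 scores need no normalization before fusion.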