BrejBala commited on
Commit
e63c592
·
1 Parent(s): c02c9d3

Deploy backend Docker app

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .dockerignore +12 -0
  2. Dockerfile +23 -0
  3. LICENSE +21 -0
  4. backend/.env.example +38 -0
  5. backend/README.md +441 -0
  6. backend/app/__pycache__/main.cpython-313.pyc +0 -0
  7. backend/app/core/__pycache__/cache.cpython-313.pyc +0 -0
  8. backend/app/core/__pycache__/config.cpython-313.pyc +0 -0
  9. backend/app/core/__pycache__/errors.cpython-313.pyc +0 -0
  10. backend/app/core/__pycache__/logging.cpython-313.pyc +0 -0
  11. backend/app/core/__pycache__/metrics.cpython-313.pyc +0 -0
  12. backend/app/core/__pycache__/rate_limit.cpython-313.pyc +0 -0
  13. backend/app/core/__pycache__/runtime.cpython-313.pyc +0 -0
  14. backend/app/core/__pycache__/security.cpython-313.pyc +0 -0
  15. backend/app/core/__pycache__/tracing.cpython-313.pyc +0 -0
  16. backend/app/core/cache.py +162 -0
  17. backend/app/core/config.py +118 -0
  18. backend/app/core/errors.py +83 -0
  19. backend/app/core/logging.py +19 -0
  20. backend/app/core/metrics.py +129 -0
  21. backend/app/core/rate_limit.py +58 -0
  22. backend/app/core/runtime.py +31 -0
  23. backend/app/core/security.py +116 -0
  24. backend/app/core/tracing.py +60 -0
  25. backend/app/main.py +61 -0
  26. backend/app/routers/__pycache__/chat.cpython-313.pyc +0 -0
  27. backend/app/routers/__pycache__/documents.cpython-313.pyc +0 -0
  28. backend/app/routers/__pycache__/health.cpython-313.pyc +0 -0
  29. backend/app/routers/__pycache__/ingest.cpython-313.pyc +0 -0
  30. backend/app/routers/__pycache__/metrics.cpython-313.pyc +0 -0
  31. backend/app/routers/__pycache__/search.cpython-313.pyc +0 -0
  32. backend/app/routers/chat.py +280 -0
  33. backend/app/routers/documents.py +100 -0
  34. backend/app/routers/health.py +19 -0
  35. backend/app/routers/ingest.py +194 -0
  36. backend/app/routers/metrics.py +18 -0
  37. backend/app/routers/search.py +90 -0
  38. backend/app/schemas/__pycache__/chat.cpython-313.pyc +0 -0
  39. backend/app/schemas/__pycache__/documents.cpython-313.pyc +0 -0
  40. backend/app/schemas/__pycache__/ingest.cpython-313.pyc +0 -0
  41. backend/app/schemas/__pycache__/search.cpython-313.pyc +0 -0
  42. backend/app/schemas/chat.py +128 -0
  43. backend/app/schemas/documents.py +34 -0
  44. backend/app/schemas/ingest.py +56 -0
  45. backend/app/schemas/search.py +33 -0
  46. backend/app/services/__pycache__/chunking.cpython-313.pyc +0 -0
  47. backend/app/services/__pycache__/dedupe.cpython-313.pyc +0 -0
  48. backend/app/services/__pycache__/normalize.cpython-313.pyc +0 -0
  49. backend/app/services/__pycache__/pinecone_store.cpython-313.pyc +0 -0
  50. backend/app/services/chat/__pycache__/graph.cpython-313.pyc +0 -0
.dockerignore ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ **/__pycache__/
2
+ **/*.pyc
3
+ **/.pytest_cache/
4
+ **/.mypy_cache/
5
+ **/.ruff_cache/
6
+ **/.venv/
7
+ **/venv/
8
+ **/.env
9
+ **/.env.*
10
+ .git/
11
+ .gitignore
12
+ .DS_Store
Dockerfile ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
FROM python:3.11-slim

# Don't write .pyc files; flush stdout/stderr immediately so container logs stream.
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /app

# Install dependencies
# Copy requirements first so the pip layer stays cached across code-only changes.
COPY backend/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# Copy application code
COPY backend /app

# Hugging Face Spaces typically exposes port 7860 and sets PORT dynamically.
EXPOSE 7860

# Runtime defaults; overridable via Space secrets or `docker run -e`.
ENV PINECONE_NAMESPACE=dev
ENV LOG_LEVEL=INFO

# Use the PORT environment variable if provided (e.g. on Hugging Face Spaces),
# otherwise default to 7860. Shell form allows env substitution.
CMD uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-7860}
LICENSE ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Brejesh Balakrishnan
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
backend/.env.example ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ PINECONE_API_KEY=your-pinecone-api-key
2
+ PINECONE_INDEX_NAME=your-index-name
3
+ PINECONE_HOST=https://your-index-host.pinecone.io
4
+ PINECONE_NAMESPACE=dev
5
+ PINECONE_TEXT_FIELD=chunk_text
6
+
7
+ # Groq (LLM for /chat)
8
+ GROQ_API_KEY=your-groq-api-key
9
+ GROQ_BASE_URL=https://api.groq.com/openai/v1
10
+ GROQ_MODEL=llama-3.1-8b-instant
11
+
12
+ # Tavily (optional web search fallback for /chat)
13
+ TAVILY_API_KEY=your-tavily-api-key
14
+
15
+ # Optional: LangSmith / LangChain tracing
16
+ LANGCHAIN_TRACING_V2=false
17
+ LANGCHAIN_API_KEY=your-langsmith-api-key
18
+ LANGCHAIN_PROJECT=rag-agent-workbench
19
+
20
+ # Optional: basic API protection
21
+ # When set, /ingest/*, /documents/*, /search, and /chat* require header X-API-Key
22
+ API_KEY=your-backend-api-key
23
+
24
+ # Optional: CORS
25
+ # Comma-separated list of allowed origins, e.g.
26
+ # ALLOWED_ORIGINS=http://localhost:8501,https://your-streamlit-app.streamlit.app
27
+ # When unset, defaults to "*".
28
+ ALLOWED_ORIGINS=
29
+
30
+ # Optional: rate limiting and caching toggles
31
+ # Set to "false" to disable.
32
+ RATE_LIMIT_ENABLED=true
33
+ CACHE_ENABLED=true
34
+
35
+ # Optional: override the default Wikimedia User-Agent string used for Wikipedia requests.
36
+ # If not set, a descriptive default is used to comply with Wikimedia API policy.
37
+ WIKIMEDIA_USER_AGENT="rag-agent-workbench/0.1 (+https://github.com/..; contact: ..)"
38
+ LOG_LEVEL=INFO
backend/README.md ADDED
@@ -0,0 +1,441 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # RAG Agent Workbench – Backend
2
+
3
+ Lightweight FastAPI backend for ingesting documents into Pinecone (with integrated embeddings), searching over them, and serving a production-style RAG chat endpoint.
4
+
5
+ ## Prerequisites
6
+
7
+ - Python 3.11+
8
+ - A Pinecone account and an index configured with **integrated embeddings**
9
+ - A Groq account and API key for chat
10
+ - (Optional) Tavily API key for web search fallback
11
+ - (Optional) LangSmith account + API key for tracing
12
+ - Environment variables set (see `.env.example`)
13
+
14
+ ## Setup
15
+
16
+ ```bash
17
+ cd backend
18
+ python -m venv .venv
19
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
20
+ pip install -r requirements.txt
21
+ cp .env.example .env # then edit with your Pinecone, Groq, and optional Tavily/LangSmith credentials
22
+ ```
23
+
24
+ Required `.env` values:
25
+
26
+ - `PINECONE_API_KEY` – your Pinecone API key
27
+ - `PINECONE_INDEX_NAME` – the index name (used for configuration checks)
28
+ - `PINECONE_HOST` – the index host URL (use host targeting for production)
29
+ - `PINECONE_NAMESPACE` – default namespace (e.g. `dev`)
30
+ - `PINECONE_TEXT_FIELD` – text field name used by the integrated embedding index (e.g. `chunk_text` or `content`)
31
+ - `LOG_LEVEL` – e.g. `INFO`, `DEBUG`
32
+
33
+ Required for `/chat`:
34
+
35
+ - `GROQ_API_KEY` – your Groq API key
36
+ - `GROQ_BASE_URL` – Groq OpenAI-compatible endpoint (default `https://api.groq.com/openai/v1`)
37
+ - `GROQ_MODEL` – Groq chat model name (default `llama-3.1-8b-instant`)
38
+
39
+ Optional for web search fallback:
40
+
41
+ - `TAVILY_API_KEY` – Tavily API key (enables web search in `/chat` when retrieval is weak)
42
+
43
+ Optional for LangSmith tracing:
44
+
45
+ - `LANGCHAIN_TRACING_V2` – set to `true` to enable tracing
46
+ - `LANGCHAIN_API_KEY` – your LangSmith API key
47
+ - `LANGCHAIN_PROJECT` – project name for traces (e.g. `rag-agent-workbench`)
48
+
49
+ Optional for basic API protection:
50
+
51
+ - `API_KEY` – when set, `/ingest/*`, `/documents/*`, `/search`, and `/chat*` require `X-API-Key` header.
52
+
53
+ Optional for CORS:
54
+
55
+ - `ALLOWED_ORIGINS` – comma-separated list of allowed origins.
56
+ - If unset, defaults to `"*"` (useful for local dev and quick demos).
57
+
58
+ Optional for rate limiting and caching:
59
+
60
+ - `RATE_LIMIT_ENABLED` – defaults to `true`. Set to `false` to disable SlowAPI limits.
61
+ - `CACHE_ENABLED` – defaults to `true`. Set to `false` to disable in-memory TTL caches.
62
+
63
+ Your Pinecone index **must** be configured for integrated embeddings (e.g. via `create_index_for_model` or `configure_index(embed=...)`), with a field mapping that includes the configured `PINECONE_TEXT_FIELD`.
64
+
65
+ ## Run locally
66
+
67
+ ```bash
68
+ cd backend
69
+ uvicorn app.main:app --reload --port 8000
70
+ ```
71
+
72
+ The API will be available at `http://localhost:8000`.
73
+
74
+ ## Sample endpoints
75
+
76
+ ### Health
77
+
78
+ ```bash
79
+ curl http://localhost:8000/health
80
+ ```
81
+
82
+ ### Ingest from arXiv
83
+
84
+ ```bash
85
+ curl -X POST "http://localhost:8000/ingest/arxiv" \
86
+ -H "Content-Type: application/json" \
87
+ -d '{
88
+ "query": "retrieval augmented generation",
89
+ "max_docs": 5,
90
+ "namespace": "dev",
91
+ "category": "papers"
92
+ }'
93
+ ```
94
+
95
+ ### Ingest from OpenAlex
96
+
97
+ ```bash
98
+ curl -X POST "http://localhost:8000/ingest/openalex" \
99
+ -H "Content-Type: application/json" \
100
+ -d '{
101
+ "query": "retrieval augmented generation",
102
+ "max_docs": 5,
103
+ "namespace": "dev",
104
+ "mailto": "you@example.com"
105
+ }'
106
+ ```
107
+
108
+ ### Ingest from Wikipedia
109
+
110
+ ```bash
111
+ curl -X POST "http://localhost:8000/ingest/wiki" \
112
+ -H "Content-Type: application/json" \
113
+ -d '{
114
+ "titles": ["Retrieval-augmented generation", "Vector database"],
115
+ "namespace": "dev"
116
+ }'
117
+ ```
118
+
119
+ ### Manual text upload
120
+
121
+ ```bash
122
+ curl -X POST "http://localhost:8000/documents/upload-text" \
123
+ -H "Content-Type: application/json" \
124
+ -d '{
125
+ "title": "My manual note",
126
+ "source": "manual",
127
+ "text": "This is some example text describing RAG pipelines...",
128
+ "namespace": "dev",
129
+ "metadata": {
130
+ "url": "https://example.com/my-note"
131
+ }
132
+ }'
133
+ ```
134
+
135
+ ### Search
136
+
137
+ ```bash
138
+ curl -X POST "http://localhost:8000/search" \
139
+ -H "Content-Type: application/json" \
140
+ -H "X-API-Key: $API_KEY" \ # only if API_KEY is enabled
141
+ -d '{
142
+ "query": "what is RAG",
143
+ "top_k": 5,
144
+ "namespace": "dev",
145
+ "filters": {"source": "arxiv"}
146
+ }'
147
+ ```
148
+
149
+ ### Document stats
150
+
151
+ ```bash
152
+ curl "http://localhost:8000/documents/stats?namespace=dev"
153
+ ```
154
+
155
+ ### Chat (non-streaming)
156
+
157
+ ```bash
158
+ curl -X POST "http://localhost:8000/chat" \
159
+ -H "Content-Type: application/json" \
160
+ -H "X-API-Key: $API_KEY" \ # only if API_KEY is enabled
161
+ -d '{
162
+ "query": "What is retrieval-augmented generation?",
163
+ "namespace": "dev",
164
+ "top_k": 5,
165
+ "use_web_fallback": true,
166
+ "min_score": 0.25,
167
+ "max_web_results": 5,
168
+ "chat_history": [
169
+ {"role": "user", "content": "You are helping me understand RAG."}
170
+ ]
171
+ }'
172
+ ```
173
+
174
+ Example JSON response:
175
+
176
+ ```json
177
+ {
178
+ "answer": "...",
179
+ "sources": [
180
+ {
181
+ "source": "wiki",
182
+ "title": "Retrieval-augmented generation",
183
+ "url": "https://en.wikipedia.org/wiki/...",
184
+ "score": 0.91,
185
+ "chunk_text": "..."
186
+ }
187
+ ],
188
+ "timings": {
189
+ "retrieve_ms": 35.2,
190
+ "web_ms": 0.0,
191
+ "generate_ms": 420.7,
192
+ "total_ms": 470.1
193
+ },
194
+ "trace": {
195
+ "langsmith_project": "rag-agent-workbench",
196
+ "trace_enabled": true
197
+ }
198
+ }
199
+ ```
200
+
201
+ ### Chat (SSE streaming)
202
+
203
+ ```bash
204
+ curl -N -X POST "http://localhost:8000/chat/stream" \
205
+ -H "Content-Type: application/json" \
206
+ -H "X-API-Key: $API_KEY" \ # only if API_KEY is enabled
207
+ -d '{
208
+ "query": "Summarise retrieval-augmented generation.",
209
+ "namespace": "dev",
210
+ "top_k": 5,
211
+ "use_web_fallback": true
212
+ }'
213
+ ```
214
+
215
+ - The response will be `text/event-stream`.
216
+ - Individual SSE events stream tokens (space-delimited).
217
+ - The final event (`event: end`) includes the full JSON payload as in `/chat`.
218
+
219
+ ### Metrics
220
+
221
+ ```bash
222
+ curl "http://localhost:8000/metrics"
223
+ ```
224
+
225
+ Returns JSON with:
226
+
227
+ - `requests_by_path` and `errors_by_path`
228
+ - `timings` (average and p50/p95 for `retrieve_ms`, `web_ms`, `generate_ms`, `total_ms`)
229
+ - `cache` stats
230
+ - Last 20 timing samples for chat.
231
+
232
+ ## Seeding data
233
+
234
+ A helper script is provided to seed the index with multiple arXiv and OpenAlex queries:
235
+
236
+ ```bash
237
+ python ../scripts/seed_ingest.py --base-url http://localhost:8000 --namespace dev --mailto you@example.com
238
+ ```
239
+
240
+ ## Docling integration (external script)
241
+
242
+ Docling is used via a separate script so the backend container stays small. To convert a local PDF and upload it as text:
243
+
244
+ ```bash
245
+ cd scripts
246
+ pip install docling
247
+ python docling_convert_and_upload.py \
248
+ --pdf-path /path/to/file.pdf \
249
+ --backend-url http://localhost:8000 \
250
+ --namespace dev \
251
+ --title "My PDF via Docling" \
252
+ --source docling
253
+ ```
254
+
255
+ ## Deploy Backend on Hugging Face Spaces (Docker)
256
+
257
+ 1. **Create a new Space**
258
+ - Go to Hugging Face → *New Space*.
259
+ - Choose:
260
+ - **SDK**: Docker
261
+ - **Space name**: e.g. `your-name/rag-agent-workbench-backend`.
262
+ - Point the Space to this repository and configure it to use the `backend/` subdirectory (or copy `backend/Dockerfile` to the root if you prefer).
263
+
264
+ 2. **Environment variables / secrets**
265
+
266
+ In the Space settings, configure the following (as “Secrets” where appropriate):
267
+
268
+ Required:
269
+
270
+ - `PINECONE_API_KEY`
271
+ - `PINECONE_HOST`
272
+ - `PINECONE_INDEX_NAME`
273
+ - `PINECONE_NAMESPACE`
274
+ - `PINECONE_TEXT_FIELD=content` (or your actual text field)
275
+ - `GROQ_API_KEY`
276
+ - `GROQ_BASE_URL` (optional, defaults to `https://api.groq.com/openai/v1`)
277
+ - `GROQ_MODEL` (optional, defaults to `llama-3.1-8b-instant`)
278
+
279
+ Optional:
280
+
281
+ - `TAVILY_API_KEY` (web search fallback for `/chat`)
282
+ - `LANGCHAIN_TRACING_V2`
283
+ - `LANGCHAIN_API_KEY`
284
+ - `LANGCHAIN_PROJECT`
285
+ - `API_KEY` (to protect `/ingest/*`, `/documents/*`, `/search`, `/chat*`)
286
+ - `ALLOWED_ORIGINS` (e.g. your Streamlit frontend origin)
287
+ - `RATE_LIMIT_ENABLED` and `CACHE_ENABLED` (rarely need to change from defaults)
288
+
289
+ 3. **Ports and startup**
290
+
291
+ - The Docker image exposes port **7860** by default.
292
+ - Hugging Face Spaces sets the `PORT` environment variable; the `CMD` honours it:
293
+ - `uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-7860}`
294
+ - On successful startup, logs include:
295
+ - `Starting on port=<port> hf_spaces_mode=<bool>`
296
+
297
+ 4. **Verify**
298
+
299
+ - Open your Space URL:
300
+ - `https://<your-space>.hf.space/docs` – interactive API docs.
301
+ - `https://<your-space>.hf.space/health` – health check.
302
+ - If `API_KEY` is set, test protected endpoints using `X-API-Key`.
303
+
304
+ ## Deploy Frontend on Streamlit Community Cloud
305
+
306
+ 1. **Prepare the repo**
307
+
308
+ - The minimal Streamlit frontend lives under `frontend/app.py`.
309
+ - Root `requirements.txt` includes:
310
+ - `streamlit`
311
+ - `httpx`
312
+
313
+ 2. **Create Streamlit app**
314
+
315
+ - Go to Streamlit Community Cloud and create a new app.
316
+ - Point it at this repository.
317
+ - Set the main file to `frontend/app.py`.
318
+
319
+ 3. **Configure Streamlit secrets**
320
+
321
+ - In the Streamlit app settings, configure *Secrets* (TOML, the format of `secrets.toml`):
322
+
323
+ ```toml
324
+ BACKEND_BASE_URL = "https://<your-backend-space>.hf.space"
325
+ API_KEY = "your-backend-api-key" # only if backend API_KEY is set
326
+ ```
327
+
328
+ - **Do not** commit secrets into the repo.
329
+
330
+ 4. **Verify connectivity**
331
+
332
+ - Open the Streamlit app.
333
+ - In the sidebar “Connectivity” panel:
334
+ - Confirm the backend URL is correct.
335
+ - Click “Ping /health” to verify backend connectivity.
336
+ - Use the chat panel to send a question:
337
+ - The app will call `/chat` on the backend and display answer, timings, and sources.
338
+
339
+ ## Local Test Checklist – Work Package C
340
+
341
+ 1. **Configure environment**
342
+
343
+ - Set `PINECONE_*` variables for an integrated embeddings index.
344
+ - Set `GROQ_API_KEY` (and optionally override `GROQ_BASE_URL`, `GROQ_MODEL`).
345
+ - Optionally set `TAVILY_API_KEY` for web fallback.
346
+ - Optionally enable LangSmith:
347
+ - `LANGCHAIN_TRACING_V2=true`
348
+ - `LANGCHAIN_API_KEY=...`
349
+ - `LANGCHAIN_PROJECT=rag-agent-workbench`
350
+ - Optionally set:
351
+ - `API_KEY` for basic protection.
352
+ - `ALLOWED_ORIGINS` if you are calling from a browser origin.
353
+ - `RATE_LIMIT_ENABLED` / `CACHE_ENABLED` for tuning.
354
+
355
+ 2. **Start the backend**
356
+
357
+ ```bash
358
+ cd backend
359
+ uvicorn app.main:app --reload --port 8000
360
+ ```
361
+
362
+ 3. **Ingest data**
363
+
364
+ - Quick Wikipedia smoke test (also see `scripts/smoke_chat.py`):
365
+
366
+ ```bash
367
+ python ../scripts/smoke_chat.py --backend-url http://localhost:8000 --namespace dev
368
+ ```
369
+
370
+ 4. **Test `/search`**
371
+
372
+ ```bash
373
+ curl -X POST "http://localhost:8000/search" \
374
+ -H "Content-Type: application/json" \
375
+ -H "X-API-Key: $API_KEY" \ # only if API_KEY is enabled
376
+ -d '{"query": "what is RAG", "namespace": "dev", "top_k": 5}'
377
+ ```
378
+
379
+ 5. **Test `/chat`**
380
+
381
+ - Use the curl example above or run:
382
+
383
+ ```bash
384
+ curl -X POST "http://localhost:8000/chat" \
385
+ -H "Content-Type: application/json" \
386
+ -H "X-API-Key: $API_KEY" \ # only if API_KEY is enabled
387
+ -d '{"query": "What is retrieval-augmented generation?", "namespace": "dev"}'
388
+ ```
389
+
390
+ 6. **Test `/chat` with web fallback**
391
+
392
+ - Requires `TAVILY_API_KEY`:
393
+
394
+ ```bash
395
+ python ../scripts/smoke_chat_web.py --backend-url http://localhost:8000 --namespace dev
396
+ ```
397
+
398
+ 7. **Inspect `/metrics`**
399
+
400
+ ```bash
401
+ curl "http://localhost:8000/metrics"
402
+ ```
403
+
404
+ - Confirm:
405
+ - Request counts are increasing.
406
+ - Timing stats (`average_ms`, `p50_ms`, `p95_ms`) are populated after several `/chat` calls.
407
+ - Cache hit/miss counters change when repeating identical `/search` or `/chat` requests.
408
+
409
+ 8. **Run the benchmark script**
410
+
411
+ - From the repo root:
412
+
413
+ ```bash
414
+ python scripts/bench_local.py \
415
+ --backend-url http://localhost:8000 \
416
+ --namespace dev \
417
+ --concurrency 10 \
418
+ --requests 50 \
419
+ --api-key "$API_KEY"
420
+ ```
421
+
422
+ - Review reported:
423
+ - Average latency.
424
+ - p50 / p95 latency.
425
+ - Error rate.
426
+
427
+ 9. **Optional: Test Streamlit frontend locally**
428
+
429
+ - Install root requirements:
430
+
431
+ ```bash
432
+ pip install -r requirements.txt
433
+ ```
434
+
435
+ - Run:
436
+
437
+ ```bash
438
+ streamlit run frontend/app.py
439
+ ```
440
+
441
+ - Configure `BACKEND_BASE_URL` and `API_KEY` via environment or `.streamlit/secrets.toml`, and verify chat works end-to-end.
backend/app/__pycache__/main.cpython-313.pyc ADDED
Binary file (2.45 kB). View file
 
backend/app/core/__pycache__/cache.cpython-313.pyc ADDED
Binary file (5.12 kB). View file
 
backend/app/core/__pycache__/config.cpython-313.pyc ADDED
Binary file (4.09 kB). View file
 
backend/app/core/__pycache__/errors.cpython-313.pyc ADDED
Binary file (4.89 kB). View file
 
backend/app/core/__pycache__/logging.cpython-313.pyc ADDED
Binary file (1.11 kB). View file
 
backend/app/core/__pycache__/metrics.cpython-313.pyc ADDED
Binary file (5.86 kB). View file
 
backend/app/core/__pycache__/rate_limit.cpython-313.pyc ADDED
Binary file (2.53 kB). View file
 
backend/app/core/__pycache__/runtime.cpython-313.pyc ADDED
Binary file (1.35 kB). View file
 
backend/app/core/__pycache__/security.cpython-313.pyc ADDED
Binary file (5.04 kB). View file
 
backend/app/core/__pycache__/tracing.cpython-313.pyc ADDED
Binary file (2.75 kB). View file
 
backend/app/core/cache.py ADDED
@@ -0,0 +1,162 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
import json
from threading import Lock
from typing import Any, Dict, Hashable, Optional, Tuple

from cachetools import TTLCache

from app.core.config import get_settings
from app.core.logging import get_logger

logger = get_logger(__name__)

# Resolved once at import time; getattr() tolerates Settings versions that
# predate the CACHE_ENABLED field by defaulting to enabled.
_settings = get_settings()
_CACHE_ENABLED: bool = getattr(_settings, "CACHE_ENABLED", True)

# TTLs are intentionally short and in-code defaults; no env required.
_SEARCH_TTL_SECONDS = 60
_CHAT_TTL_SECONDS = 60

# Bounded TTL caches: entries expire after the TTL; least-recently-used
# entries are evicted once maxsize is reached.
_search_cache: TTLCache = TTLCache(maxsize=1024, ttl=_SEARCH_TTL_SECONDS)
_chat_cache: TTLCache = TTLCache(maxsize=512, ttl=_CHAT_TTL_SECONDS)

# Single lock guarding both caches and the hit/miss counters below.
_lock = Lock()

# Hit/miss counters, surfaced via get_cache_stats() for the /metrics endpoint.
_search_hits: int = 0
_search_misses: int = 0
_chat_hits: int = 0
_chat_misses: int = 0
28
+
29
+
30
def cache_enabled() -> bool:
    """Return True when in-memory TTL caching is active (CACHE_ENABLED setting)."""
    return _CACHE_ENABLED
32
+
33
+
34
+ def _make_search_key(
35
+ namespace: str,
36
+ query: str,
37
+ top_k: int,
38
+ filters: Optional[Dict[str, Any]],
39
+ ) -> Hashable:
40
+ filters_json = (
41
+ json.dumps(filters, sort_keys=True, separators=(",", ":"))
42
+ if filters is not None
43
+ else ""
44
+ )
45
+ return (namespace, query, int(top_k), filters_json)
46
+
47
+
48
+ def _make_chat_key(
49
+ namespace: str,
50
+ query: str,
51
+ top_k: int,
52
+ min_score: float,
53
+ use_web_fallback: bool,
54
+ ) -> Hashable:
55
+ return (namespace, query, int(top_k), float(min_score), bool(use_web_fallback))
56
+
57
+
58
def get_search_cached(
    namespace: str,
    query: str,
    top_k: int,
    filters: Optional[Dict[str, Any]],
) -> Optional[Any]:
    """Look up a cached /search result; None signals a miss or caching disabled."""
    global _search_hits, _search_misses
    if not _CACHE_ENABLED:
        return None

    key = _make_search_key(namespace, query, top_k, filters)
    with _lock:
        # EAFP: a single lookup; TTLCache raises KeyError for absent/expired keys.
        try:
            value = _search_cache[key]
        except KeyError:
            _search_misses += 1
            logger.info(
                "Search cache miss namespace='%s' query='%s' top_k=%d",
                namespace,
                query,
                top_k,
            )
            return None
        _search_hits += 1
        logger.info(
            "Search cache hit namespace='%s' query='%s' top_k=%d",
            namespace,
            query,
            top_k,
        )
        return value
88
+
89
+
90
def set_search_cached(
    namespace: str,
    query: str,
    top_k: int,
    filters: Optional[Dict[str, Any]],
    value: Any,
) -> None:
    """Store a /search result under its derived key; no-op when caching is off."""
    if not _CACHE_ENABLED:
        return
    entry_key = _make_search_key(namespace, query, top_k, filters)
    with _lock:
        _search_cache[entry_key] = value
102
+
103
+
104
def get_chat_cached(
    namespace: str,
    query: str,
    top_k: int,
    min_score: float,
    use_web_fallback: bool,
) -> Optional[Any]:
    """Look up a cached /chat response; None signals a miss or caching disabled.

    Only used when chat_history is empty.
    """
    global _chat_hits, _chat_misses
    if not _CACHE_ENABLED:
        return None

    key = _make_chat_key(namespace, query, top_k, min_score, use_web_fallback)
    with _lock:
        # EAFP: a single lookup; TTLCache raises KeyError for absent/expired keys.
        try:
            value = _chat_cache[key]
        except KeyError:
            _chat_misses += 1
            logger.info(
                "Chat cache miss namespace='%s' query='%s' top_k=%d",
                namespace,
                query,
                top_k,
            )
            return None
        _chat_hits += 1
        logger.info(
            "Chat cache hit namespace='%s' query='%s' top_k=%d",
            namespace,
            query,
            top_k,
        )
        return value
138
+
139
+
140
def set_chat_cached(
    namespace: str,
    query: str,
    top_k: int,
    min_score: float,
    use_web_fallback: bool,
    value: Any,
) -> None:
    """Store a /chat response under its derived key; no-op when caching is off."""
    if not _CACHE_ENABLED:
        return
    entry_key = _make_chat_key(namespace, query, top_k, min_score, use_web_fallback)
    with _lock:
        _chat_cache[entry_key] = value
153
+
154
+
155
def get_cache_stats() -> Dict[str, int]:
    """Return a consistent snapshot of the cache hit/miss counters.

    The counters are mutated under ``_lock`` by the get_*_cached functions,
    so the read is taken under the same lock to guarantee the four values
    come from a single point in time (the original read them unlocked and
    could return a torn snapshot under concurrent updates).
    """
    with _lock:
        return {
            "search_hits": _search_hits,
            "search_misses": _search_misses,
            "chat_hits": _chat_hits,
            "chat_misses": _chat_misses,
        }
backend/app/core/config.py ADDED
@@ -0,0 +1,118 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from functools import lru_cache
3
+ from typing import Optional
4
+
5
+ from dotenv import load_dotenv
6
+ from pydantic import Field
7
+ from pydantic_settings import BaseSettings, SettingsConfigDict
8
+
9
+ # Load environment variables from a local .env file if present
10
+ load_dotenv()
11
+
12
+
13
class Settings(BaseSettings):
    """Application settings loaded from environment variables.

    Fields declared with ``Field(...)`` (Ellipsis) are required: constructing
    Settings without them raises a pydantic ValidationError at startup.
    Unknown environment variables are ignored (``extra="ignore"``).
    """

    # App
    APP_NAME: str = Field(default="rag-agent-workbench")
    APP_VERSION: str = Field(default="0.1.0")

    # Pinecone — API key, index name and host are required; the host is used
    # for data-plane targeting while the index name is used for config checks.
    PINECONE_API_KEY: str = Field(..., description="Pinecone API key")
    PINECONE_INDEX_NAME: str = Field(
        ..., description="Name of the Pinecone index (used for configuration checks)"
    )
    PINECONE_HOST: str = Field(
        ..., description="Pinecone index host URL for data-plane operations"
    )
    PINECONE_NAMESPACE: str = Field(
        default="dev", description="Default Pinecone namespace"
    )
    PINECONE_TEXT_FIELD: str = Field(
        default="chunk_text",
        description=(
            "Text field name used by the Pinecone integrated embedding index. "
            "For example, set to 'content' if your index field_map uses that name."
        ),
    )

    # Logging
    LOG_LEVEL: str = Field(default="INFO", description="Application log level")

    # HTTP client defaults
    HTTP_TIMEOUT_SECONDS: float = Field(
        default=10.0, description="Default timeout for outbound HTTP requests"
    )
    HTTP_MAX_RETRIES: int = Field(
        default=3, description="Max retries for outbound HTTP requests"
    )

    # Groq / LLM — optional at import time; /chat endpoints need GROQ_API_KEY.
    GROQ_API_KEY: Optional[str] = Field(
        default=None,
        description="Groq API key (required for /chat endpoints)",
    )
    GROQ_BASE_URL: str = Field(
        default="https://api.groq.com/openai/v1",
        description="Groq OpenAI-compatible base URL",
    )
    GROQ_MODEL: str = Field(
        default="llama-3.1-8b-instant",
        description="Default Groq chat model used for /chat",
    )

    # Web search / Tavily — optional; absence disables the web fallback.
    TAVILY_API_KEY: Optional[str] = Field(
        default=None,
        description="Tavily API key for web search fallback (optional)",
    )

    # RAG defaults — ge/le bounds are validated by pydantic on load.
    RAG_DEFAULT_TOP_K: int = Field(
        default=5,
        ge=1,
        le=100,
        description="Default number of documents to retrieve for RAG",
    )
    RAG_MIN_SCORE: float = Field(
        default=0.25,
        ge=0.0,
        le=1.0,
        description="Default minimum relevance score to trust retrieval without web fallback",
    )
    RAG_MAX_WEB_RESULTS: int = Field(
        default=5,
        ge=1,
        le=20,
        description="Maximum number of web search results to fetch when using Tavily",
    )

    # Operational toggles
    RATE_LIMIT_ENABLED: bool = Field(
        default=True,
        description="Enable SlowAPI rate limiting middleware when true.",
    )
    CACHE_ENABLED: bool = Field(
        default=True,
        description="Enable in-memory TTL caching for /search and /chat when true.",
    )

    # Read values from .env as well as the process environment; silently
    # ignore variables that have no matching field.
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )
105
+
106
+
107
@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Return a cached Settings instance.

    lru_cache(maxsize=1) makes this a process-wide singleton: the environment
    and .env file are read only once per process.
    """
    return Settings()  # type: ignore[call-arg]
111
+
112
+
113
def get_env_bool(name: str, default: bool = False) -> bool:
    """Parse an environment variable as a boolean flag.

    "1", "true", "yes" and "on" (case-insensitive, whitespace-tolerant) count
    as True; any other value is False. A missing variable yields *default*.
    """
    value = os.getenv(name)
    if value is None:
        return default
    normalized = value.strip().lower()
    return normalized in {"1", "true", "yes", "on"}
backend/app/core/errors.py ADDED
@@ -0,0 +1,83 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import logging
2
+ from typing import Any
3
+
4
+ from fastapi import FastAPI, HTTPException, Request
5
+ from fastapi.exceptions import RequestValidationError
6
+ from fastapi.responses import JSONResponse
7
+ from starlette import status
8
+
9
+ logger = logging.getLogger(__name__)
10
+
11
+
12
class PineconeIndexConfigError(RuntimeError):
    """Raised when the Pinecone index is not configured for integrated embeddings.

    Mapped to an HTTP 500 response by the handler registered in
    setup_exception_handlers below.
    """
14
+
15
+
16
class UpstreamServiceError(RuntimeError):
    """Raised when an upstream dependency (LLM, web search, etc.) fails.

    Mapped to an HTTP 502 response by the handler registered in
    setup_exception_handlers below.
    """

    def __init__(self, service: str, message: str) -> None:
        # Keep the failing service's name so handlers/logs can attribute the error.
        self.service = service
        super().__init__(message)
22
+
23
+
24
def setup_exception_handlers(app: FastAPI) -> None:
    """Register global exception handlers on the FastAPI app.

    Handlers map domain errors to HTTP statuses (config error -> 500,
    upstream failure -> 502), mirror FastAPI's defaults for validation and
    HTTPException, and add a catch-all 500 that hides internals from the
    client. Every response body uses the same ``{"detail": ...}`` shape.
    """

    @app.exception_handler(PineconeIndexConfigError)
    async def pinecone_index_config_error_handler(
        request: Request, exc: PineconeIndexConfigError
    ) -> JSONResponse:
        # A misconfigured index is an operator problem, not a client one -> 500.
        logger.error("Pinecone index configuration error: %s", exc)
        return JSONResponse(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            content={"detail": str(exc)},
        )

    @app.exception_handler(UpstreamServiceError)
    async def upstream_service_error_handler(
        request: Request, exc: UpstreamServiceError
    ) -> JSONResponse:
        # Failures of dependencies (LLM, web search, ...) surface as 502.
        logger.error("Upstream service error from %s: %s", exc.service, exc)
        return JSONResponse(
            status_code=status.HTTP_502_BAD_GATEWAY,
            content={"detail": str(exc)},
        )

    @app.exception_handler(RequestValidationError)
    async def validation_exception_handler(
        request: Request, exc: RequestValidationError
    ) -> JSONResponse:
        # Same 422 payload shape FastAPI emits by default, plus our logging.
        logger.warning("Request validation error: %s", exc)
        return JSONResponse(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            content={"detail": exc.errors()},
        )

    @app.exception_handler(HTTPException)
    async def http_exception_handler(
        request: Request, exc: HTTPException
    ) -> JSONResponse:
        # Let FastAPI-style HTTPException pass through with its status and detail.
        logger.warning(
            "HTTPException raised: status=%s detail=%s",
            exc.status_code,
            exc.detail,
        )
        return JSONResponse(
            status_code=exc.status_code,
            content={"detail": exc.detail},
        )

    @app.exception_handler(Exception)
    async def generic_exception_handler(
        request: Request, exc: Exception
    ) -> JSONResponse:
        # Last resort: log the full traceback but return a generic message
        # so internals never leak to clients.
        logger.exception("Unhandled error", exc_info=exc)
        return JSONResponse(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            content={"detail": "Internal server error"},
        )
81
+
82
+
83
# Public API of this module. UpstreamServiceError was previously missing here
# even though it is a public exception class defined (and handled) above.
__all__ = [
    "PineconeIndexConfigError",
    "UpstreamServiceError",
    "setup_exception_handlers",
]
backend/app/core/logging.py ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import logging
2
+ from typing import Optional
3
+
4
+
5
def configure_logging(log_level: str) -> None:
    """Configure root logging for the application.

    The level name is case-insensitive; unrecognised names fall back to
    INFO. Configuration is kept minimal while ensuring every module logs
    with the same format.
    """
    resolved_level = getattr(logging, log_level.upper(), logging.INFO)
    log_format = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"
    logging.basicConfig(level=resolved_level, format=log_format)
15
+
16
+
17
def get_logger(name: Optional[str] = None) -> logging.Logger:
    """Return the named logger, falling back to the app-wide "app" logger."""
    if name:
        return logging.getLogger(name)
    return logging.getLogger("app")
backend/app/core/metrics.py ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from collections import defaultdict, deque
2
+ from threading import Lock
3
+ from time import perf_counter
4
+ from typing import Deque, Dict, List, Mapping, MutableMapping
5
+
6
+ from fastapi import FastAPI, Request
7
+ from fastapi.responses import JSONResponse
8
+
9
+ from app.core.cache import get_cache_stats
10
+ from app.core.logging import get_logger
11
+
12
+ logger = get_logger(__name__)
13
+
14
+
15
# Request and error counters by path. Lifetime totals; never trimmed.
_request_counts: MutableMapping[str, int] = defaultdict(int)
_error_counts: MutableMapping[str, int] = defaultdict(int)

# Timing samples for chat requests: last N samples.
# Percentiles in get_metrics_snapshot() are computed over this rolling
# window only, while averages use the lifetime sums below.
_TIMING_FIELDS = ["retrieve_ms", "web_ms", "generate_ms", "total_ms"]
_TIMING_BUFFER_SIZE = 20
_timing_samples: Deque[Dict[str, float]] = deque(maxlen=_TIMING_BUFFER_SIZE)

# Aggregated sums and counts for averages.
_timing_sums: Dict[str, float] = {f: 0.0 for f in _TIMING_FIELDS}
_timing_count: int = 0

# Guards all the mutable state above (counters, samples, sums, count).
_lock = Lock()
29
+
30
+
31
async def metrics_middleware(request: Request, call_next):
    """Count every request (and every error) per path, and log the timing.

    A path is counted as an error both when the downstream handler raises
    and when it returns a 4xx/5xx status code.
    """
    route_path = request.url.path or "/"
    started = perf_counter()
    try:
        response = await call_next(request)
    except Exception:  # noqa: BLE001
        duration_ms = (perf_counter() - started) * 1000.0
        with _lock:
            _request_counts[route_path] += 1
            _error_counts[route_path] += 1
        logger.exception(
            "Unhandled error for path=%s elapsed_ms=%.2f", route_path, duration_ms
        )
        raise

    duration_ms = (perf_counter() - started) * 1000.0
    status_code = response.status_code
    with _lock:
        _request_counts[route_path] += 1
        if status_code >= 400:
            _error_counts[route_path] += 1

    logger.debug(
        "Request path=%s status=%s elapsed_ms=%.2f",
        route_path,
        status_code,
        duration_ms,
    )
    return response
56
+
57
+
58
def record_chat_timings(timings: Mapping[str, float]) -> None:
    """Record one chat request's timing sample into the shared buffers.

    Expects a mapping with keys retrieve_ms, web_ms, generate_ms and
    total_ms; any missing key is treated as 0.0.
    """
    global _timing_count
    normalized: Dict[str, float] = {}
    for field in _TIMING_FIELDS:
        normalized[field] = float(timings.get(field, 0.0))
    with _lock:
        _timing_samples.append(normalized)
        for field in _TIMING_FIELDS:
            _timing_sums[field] += normalized[field]
        _timing_count += 1
70
+
71
+
72
+ def _percentile(values: List[float], p: float) -> float:
73
+ if not values:
74
+ return 0.0
75
+ values_sorted = sorted(values)
76
+ k = max(0, min(len(values_sorted) - 1, int(round((p / 100.0) * (len(values_sorted) - 1)))))
77
+ return values_sorted[k]
78
+
79
+
80
def get_metrics_snapshot() -> Dict[str, object]:
    """Return a stable snapshot of metrics suitable for /metrics responses.

    All shared state is copied under the lock first, so the derived values
    below are computed from a single consistent point-in-time view.
    """
    with _lock:
        requests_by_path = dict(_request_counts)
        errors_by_path = dict(_error_counts)
        samples = list(_timing_samples)
        sums = dict(_timing_sums)
        count = int(_timing_count)

    # Lifetime averages over every recorded sample (not just the buffer).
    averages: Dict[str, float] = {}
    if count > 0:
        for field in _TIMING_FIELDS:
            averages[field] = sums.get(field, 0.0) / count
    else:
        for field in _TIMING_FIELDS:
            averages[field] = 0.0

    # Compute percentiles over the last N samples.
    p50: Dict[str, float] = {}
    p95: Dict[str, float] = {}
    if samples:
        for field in _TIMING_FIELDS:
            values = [s.get(field, 0.0) for s in samples]
            p50[field] = _percentile(values, 50.0)
            p95[field] = _percentile(values, 95.0)
    else:
        for field in _TIMING_FIELDS:
            p50[field] = 0.0
            p95[field] = 0.0

    # NOTE(review): fetched outside the lock, so cache stats may be slightly
    # newer than the counters above — acceptable for a metrics endpoint.
    cache_stats = get_cache_stats()

    return {
        "requests_by_path": requests_by_path,
        "errors_by_path": errors_by_path,
        "timings": {
            "average_ms": averages,
            "p50_ms": p50,
            "p95_ms": p95,
        },
        "cache": cache_stats,
        "sample_count": count,
        "samples": samples,
    }
124
+
125
+
126
def setup_metrics(app: FastAPI) -> None:
    """Register the request/error counting middleware on *app*."""
    logger.info("Metrics middleware enabled.")
    register = app.middleware("http")
    register(metrics_middleware)
backend/app/core/rate_limit.py ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any
2
+
3
+ from fastapi import FastAPI, Request
4
+ from fastapi.responses import JSONResponse
5
+ from slowapi import Limiter
6
+ from slowapi.errors import RateLimitExceeded
7
+ from slowapi.middleware import SlowAPIMiddleware
8
+ from slowapi.util import get_remote_address
9
+
10
+ from app.core.config import get_settings
11
+ from app.core.logging import get_logger
12
+
13
+ logger = get_logger(__name__)
14
+
15
+ # Global limiter instance used for decorators.
16
+ limiter = Limiter(key_func=get_remote_address)
17
+
18
+
19
+ def setup_rate_limiter(app: FastAPI) -> None:
20
+ """Configure SlowAPI rate limiting middleware and handlers.
21
+
22
+ Limits are enabled/disabled via Settings.RATE_LIMIT_ENABLED.
23
+ """
24
+ settings = get_settings()
25
+ if not getattr(settings, "RATE_LIMIT_ENABLED", True):
26
+ logger.info("Rate limiting is disabled via settings.")
27
+ return
28
+
29
+ logger.info("Rate limiting enabled with SlowAPI.")
30
+
31
+ app.state.limiter = limiter # type: ignore[attr-defined]
32
+
33
+ @app.exception_handler(RateLimitExceeded)
34
+ async def rate_limit_exceeded_handler( # type: ignore[no-redef]
35
+ request: Request,
36
+ exc: RateLimitExceeded,
37
+ ) -> JSONResponse:
38
+ retry_after: str | None = None
39
+ try:
40
+ retry_after = exc.headers.get("Retry-After") # type: ignore[assignment]
41
+ except Exception: # noqa: BLE001
42
+ retry_after = None
43
+
44
+ logger.warning(
45
+ "Rate limit exceeded path=%s client=%s limit=%s",
46
+ request.url.path,
47
+ get_remote_address(request),
48
+ exc.detail,
49
+ )
50
+ content: dict[str, Any] = {
51
+ "detail": "Rate limit exceeded. Please slow down your requests.",
52
+ }
53
+ if retry_after is not None:
54
+ content["retry_after"] = retry_after
55
+ return JSONResponse(status_code=429, content=content)
56
+
57
+ # Attach SlowAPI middleware
58
+ app.add_middleware(SlowAPIMiddleware)
backend/app/core/runtime.py ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+
3
+ from app.core.logging import get_logger
4
+
5
+ logger = get_logger(__name__)
6
+
7
+
8
+ def get_port(default: int = 7860) -> int:
9
+ """Return the port the application should bind to.
10
+
11
+ - Uses the PORT environment variable when set (e.g. on Hugging Face Spaces).
12
+ - Falls back to the provided default (7860 for Spaces compatibility).
13
+ - Logs a message indicating the chosen port and whether we appear to be
14
+ running inside a Hugging Face Spaces environment.
15
+ """
16
+ raw = os.getenv("PORT")
17
+ try:
18
+ port = int(raw) if raw else default
19
+ except (TypeError, ValueError):
20
+ port = default
21
+
22
+ # Heuristic to detect HF Spaces: SPACE_ID or SPACE_REPO_ID are usually set.
23
+ hf_spaces_mode = bool(os.getenv("SPACE_ID") or os.getenv("SPACE_REPO_ID"))
24
+
25
+ logger.info(
26
+ "Starting on port=%d hf_spaces_mode=%s",
27
+ port,
28
+ hf_spaces_mode,
29
+ )
30
+
31
+ return port
backend/app/core/security.py ADDED
@@ -0,0 +1,116 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from typing import Iterable, List, Optional
3
+
4
+ from fastapi import FastAPI, Request
5
+ from fastapi.responses import JSONResponse
6
+ from fastapi.middleware.cors import CORSMiddleware
7
+ from starlette.middleware.base import BaseHTTPMiddleware
8
+
9
+ from app.core.logging import get_logger
10
+
11
+ logger = get_logger(__name__)
12
+
13
+
14
+ def _get_allowed_origins() -> List[str]:
15
+ raw = os.getenv("ALLOWED_ORIGINS")
16
+ if not raw:
17
+ # Default: permissive for local development and simple frontends.
18
+ origins = ["*"]
19
+ else:
20
+ origins = [item.strip() for item in raw.split(",") if item.strip()]
21
+ if not origins:
22
+ origins = ["*"]
23
+ return origins
24
+
25
+
26
class APIKeyMiddleware(BaseHTTPMiddleware):
    """Optional API key protection for selected endpoints.

    When the API_KEY environment variable is set, this middleware enforces the
    presence of an `X-API-Key` header with a matching value for:

    - /ingest/*
    - /documents/*
    - /chat*
    - /search

    The following paths remain public regardless of API_KEY:

    - /health
    - /docs
    - /openapi.json
    - /redoc
    - /metrics

    When API_KEY is not set, the middleware is not installed and the API is open.
    """

    def __init__(self, app: FastAPI, api_key: str) -> None:  # type: ignore[override]
        super().__init__(app)
        # Shared secret clients must present in the X-API-Key header.
        self.api_key = api_key

        # Prefixes that require the API key.
        self._protected_prefixes: List[str] = [
            "/ingest",
            "/documents",
            "/chat",
            "/search",
        ]
        # Prefixes that stay public even when a key is configured.
        self._public_prefixes: List[str] = [
            "/health",
            "/docs",
            "/openapi.json",
            "/redoc",
            "/metrics",
        ]

    async def dispatch(self, request: Request, call_next):  # type: ignore[override]
        import hmac  # stdlib; imported here to keep module imports unchanged

        path = request.url.path or "/"

        # Public endpoints stay open.
        if any(path.startswith(prefix) for prefix in self._public_prefixes):
            return await call_next(request)

        # Only enforce for protected prefixes.
        if not any(path.startswith(prefix) for prefix in self._protected_prefixes):
            return await call_next(request)

        header_key: Optional[str] = request.headers.get("X-API-Key")
        # hmac.compare_digest performs a constant-time comparison; a plain
        # `!=` leaks key prefixes through response-timing differences.
        if not header_key or not hmac.compare_digest(
            header_key.encode("utf-8"), self.api_key.encode("utf-8")
        ):
            logger.warning("Rejected request with missing/invalid API key path=%s", path)
            return JSONResponse(
                status_code=401,
                content={
                    "detail": (
                        "Missing or invalid API key. Provide X-API-Key header with "
                        "a valid key to access this endpoint."
                    )
                },
            )

        return await call_next(request)
91
+
92
+
93
def configure_security(app: FastAPI) -> None:
    """Configure CORS and optional API key protection on the FastAPI app."""
    # CORS first: allowed origins come from the ALLOWED_ORIGINS env var.
    allowed = _get_allowed_origins()
    app.add_middleware(
        CORSMiddleware,
        allow_origins=allowed,
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    logger.info("CORS configured allow_origins=%s", allowed)

    # Optional API key middleware: only installed when API_KEY is set.
    configured_key = os.getenv("API_KEY")
    if configured_key:
        logger.info("API key protection enabled for ingest, documents, search, and chat.")
        app.add_middleware(APIKeyMiddleware, api_key=configured_key)
        return

    logger.warning(
        "API key disabled; protected endpoints are open. "
        "Set API_KEY environment variable to enable X-API-Key protection."
    )
backend/app/core/tracing.py ADDED
@@ -0,0 +1,60 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from functools import lru_cache
3
+ from typing import Any, Dict, List, Optional
4
+
5
+ from app.core.config import get_env_bool
6
+ from app.core.logging import get_logger
7
+
8
+ logger = get_logger(__name__)
9
+
10
+
11
def is_tracing_enabled() -> bool:
    """Return True if LangSmith / LangChain tracing is enabled via environment.

    Requires both a truthy LANGCHAIN_TRACING_V2 flag and a non-empty
    LANGCHAIN_API_KEY.
    """
    if not get_env_bool("LANGCHAIN_TRACING_V2", False):
        return False
    return bool(os.getenv("LANGCHAIN_API_KEY"))
16
+
17
+
18
def get_langsmith_project() -> Optional[str]:
    """Return the configured LangSmith project name, or None when unset."""
    return os.environ.get("LANGCHAIN_PROJECT")
21
+
22
+
23
@lru_cache(maxsize=1)
def get_tracing_callbacks() -> List[Any]:
    """Return LangChain callback handlers for tracing, if available.

    When LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY is set, this will
    attempt to create a LangChainTracer instance. If tracing is not enabled
    or the tracer is unavailable, an empty list is returned.

    NOTE(review): lru_cache pins whatever was decided on the first call for
    the life of the process — environment changes made afterwards are
    ignored until restart.
    """
    if not is_tracing_enabled():
        logger.info(
            "LangSmith tracing disabled (set LANGCHAIN_TRACING_V2=true and "
            "LANGCHAIN_API_KEY to enable)."
        )
        return []

    # Imported lazily so the langchain dependency stays optional.
    try:
        from langchain_core.tracers import LangChainTracer  # type: ignore[import]
    except Exception as exc:  # noqa: BLE001
        logger.warning(
            "LangSmith tracing requested but LangChainTracer is unavailable: %s", exc
        )
        return []

    project = get_langsmith_project()
    tracer = LangChainTracer(project_name=project)
    logger.info(
        "LangSmith tracing enabled for project='%s'",
        project or "(default)",
    )
    return [tracer]
53
+
54
+
55
def get_tracing_response_metadata() -> Dict[str, Any]:
    """Return trace metadata suitable for API responses."""
    metadata: Dict[str, Any] = {}
    metadata["langsmith_project"] = get_langsmith_project()
    metadata["trace_enabled"] = is_tracing_enabled()
    return metadata
backend/app/main.py ADDED
@@ -0,0 +1,61 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import FastAPI
2
+ from fastapi.responses import ORJSONResponse
3
+
4
+ from app.core.config import get_settings
5
+ from app.core.errors import PineconeIndexConfigError, setup_exception_handlers
6
+ from app.core.logging import configure_logging, get_logger
7
+ from app.core.metrics import setup_metrics
8
+ from app.core.rate_limit import setup_rate_limiter
9
+ from app.core.runtime import get_port
10
+ from app.core.security import configure_security
11
+ from app.routers.documents import router as documents_router
12
+ from app.routers.health import router as health_router
13
+ from app.routers.ingest import router as ingest_router
14
+ from app.routers.search import router as search_router
15
+ from app.routers.chat import router as chat_router
16
+ from app.routers.metrics import router as metrics_router
17
+ from app.services.pinecone_store import init_pinecone
18
+
19
settings = get_settings()
configure_logging(settings.LOG_LEVEL)
logger = get_logger(__name__)

# Log runtime port / environment context at import time for easier diagnostics.
get_port()

app = FastAPI(
    title="RAG Agent Workbench API",
    version=settings.APP_VERSION,
    default_response_class=ORJSONResponse,
    docs_url="/docs",
    redoc_url="/redoc",
    openapi_url="/openapi.json",
)

# Core app configuration: security (CORS / API key) first, then rate
# limiting, then the metrics middleware.
configure_security(app)
setup_rate_limiter(app)
setup_metrics(app)

# Register routers with tags and ensure they are included in the schema
app.include_router(health_router, tags=["health"])
app.include_router(ingest_router, tags=["ingest"])
app.include_router(search_router, tags=["search"])
app.include_router(documents_router, tags=["documents"])
app.include_router(chat_router, tags=["chat"])
app.include_router(metrics_router, tags=["metrics"])

# Register exception handlers
setup_exception_handlers(app)
50
+
51
+
52
@app.on_event("startup")
async def startup_event() -> None:
    """Application startup hook: initialise the Pinecone index connection.

    NOTE(review): @app.on_event is deprecated in recent FastAPI releases in
    favour of lifespan handlers; consider migrating when convenient.
    """
    try:
        init_pinecone(settings)
        logger.info("Pinecone initialisation completed")
    except PineconeIndexConfigError:
        # Let the exception handler and FastAPI/uvicorn deal with the error.
        # Re-raise to fail fast on misconfiguration.
        raise
backend/app/routers/__pycache__/chat.cpython-313.pyc ADDED
Binary file (11.4 kB). View file
 
backend/app/routers/__pycache__/documents.cpython-313.pyc ADDED
Binary file (4.17 kB). View file
 
backend/app/routers/__pycache__/health.cpython-313.pyc ADDED
Binary file (792 Bytes). View file
 
backend/app/routers/__pycache__/ingest.cpython-313.pyc ADDED
Binary file (8.27 kB). View file
 
backend/app/routers/__pycache__/metrics.cpython-313.pyc ADDED
Binary file (789 Bytes). View file
 
backend/app/routers/__pycache__/search.cpython-313.pyc ADDED
Binary file (3.32 kB). View file
 
backend/app/routers/chat.py ADDED
@@ -0,0 +1,280 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ from time import perf_counter
3
+ from typing import AsyncGenerator, Dict, List, Optional
4
+
5
+ from fastapi import APIRouter, Request
6
+ from fastapi.concurrency import run_in_threadpool
7
+ from fastapi.responses import StreamingResponse
8
+
9
+ from app.core.cache import cache_enabled, get_chat_cached, set_chat_cached
10
+ from app.core.config import get_settings
11
+ from app.core.logging import get_logger
12
+ from app.core.metrics import record_chat_timings
13
+ from app.core.rate_limit import limiter
14
+ from app.core.tracing import (
15
+ get_tracing_callbacks,
16
+ get_tracing_response_metadata,
17
+ )
18
+ from app.schemas.chat import (
19
+ ChatRequest,
20
+ ChatResponse,
21
+ ChatTimings,
22
+ ChatTraceMetadata,
23
+ SourceHit,
24
+ )
25
+ from app.services.chat.graph import get_chat_graph
26
+
27
+ logger = get_logger(__name__)
28
+
29
+ router = APIRouter(tags=["chat"])
30
+
31
+
32
def _build_chat_response(state: Dict) -> ChatResponse:
    """Convert graph state into a ChatResponse model.

    Missing or falsy fields default to empty strings / 0.0 so the response
    model is always fully populated. Retrieved chunks are listed before web
    results.
    """
    raw_timings = state.get("timings") or {}

    def _ms(key: str) -> float:
        return float(raw_timings.get(key) or 0.0)

    timings = ChatTimings(
        retrieve_ms=_ms("retrieve_ms"),
        web_ms=_ms("web_ms"),
        generate_ms=_ms("generate_ms"),
        total_ms=_ms("total_ms"),
    )

    combined: List[Dict] = []
    combined.extend(state.get("retrieved") or [])
    combined.extend(state.get("web_results") or [])

    sources: List[SourceHit] = []
    for hit in combined:
        sources.append(
            SourceHit(
                source=str(hit.get("source") or "unknown"),
                title=str(hit.get("title") or ""),
                url=str(hit.get("url") or ""),
                score=float(hit.get("score") or 0.0),
                chunk_text=str(hit.get("chunk_text") or ""),
            )
        )

    return ChatResponse(
        answer=str(state.get("answer") or ""),
        sources=sources,
        timings=timings,
        trace=ChatTraceMetadata(**get_tracing_response_metadata()),
    )
64
+
65
+
66
@router.post(
    "/chat",
    response_model=ChatResponse,
    summary="Production-style RAG chat endpoint",
    description=(
        "Runs an agentic RAG flow using Pinecone retrieval, optional Tavily web "
        "fallback, and a Groq-backed LLM to generate an answer. "
        "Returns the answer, source snippets, timings, and LangSmith trace metadata."
    ),
)
@limiter.limit("30/minute")
async def chat(request: Request, payload: ChatRequest) -> ChatResponse:  # noqa: ARG001
    """Answer a chat query via the RAG graph, with caching and metrics.

    `request` is unused directly but required by slowapi's limiter decorator.
    Responses are cached (and served from cache) only when the request has no
    chat history, since history changes the generated answer.
    """
    settings = get_settings()
    namespace = payload.namespace or settings.PINECONE_NAMESPACE

    logger.info(
        "Received /chat request namespace='%s' top_k=%d use_web_fallback=%s",
        namespace,
        payload.top_k,
        payload.use_web_fallback,
    )

    # Cache key covers namespace/query/top_k/min_score/web-fallback flag.
    use_cache = cache_enabled() and not payload.chat_history
    cached_response: Optional[ChatResponse] = None
    if use_cache:
        cached = get_chat_cached(
            namespace=namespace,
            query=payload.query,
            top_k=payload.top_k,
            min_score=payload.min_score,
            use_web_fallback=payload.use_web_fallback,
        )
        if cached is not None:
            logger.info(
                "Serving /chat response from cache namespace='%s' query='%s'",
                namespace,
                payload.query,
            )
            cached_response = cached

    if cached_response is not None:
        # Still record timings and metrics based on the cached response.
        record_chat_timings(
            {
                "retrieve_ms": cached_response.timings.retrieve_ms,
                "web_ms": cached_response.timings.web_ms,
                "generate_ms": cached_response.timings.generate_ms,
                "total_ms": cached_response.timings.total_ms,
            }
        )
        return cached_response

    graph = get_chat_graph()
    callbacks = get_tracing_callbacks()
    config: Dict = {}
    if callbacks:
        config["callbacks"] = callbacks

    # Initial graph state mirrors the request payload plus resolved namespace.
    initial_state = {
        "query": payload.query,
        "namespace": namespace,
        "top_k": payload.top_k,
        "use_web_fallback": payload.use_web_fallback,
        "min_score": payload.min_score,
        "max_web_results": payload.max_web_results,
        "chat_history": [
            {"role": msg.role, "content": msg.content}
            for msg in (payload.chat_history or [])
        ],
    }

    start_total = perf_counter()

    def _invoke_graph() -> Dict:
        # Synchronous graph invocation, run off the event loop below.
        return graph.invoke(initial_state, config=config)

    # Exceptions (including UpstreamServiceError) are handled by global handlers.
    state = await run_in_threadpool(_invoke_graph)

    # Overwrite total_ms with the wall-clock time measured here, which also
    # includes threadpool scheduling overhead.
    total_ms = (perf_counter() - start_total) * 1000.0
    timings = state.get("timings") or {}
    timings["total_ms"] = total_ms
    state["timings"] = timings

    web_used = bool(state.get("web_fallback_used"))
    top_score = float(state.get("top_score") or 0.0)
    logger.info(
        "Chat request completed namespace='%s' web_fallback_used=%s "
        "retrieve_ms=%.2f web_ms=%.2f generate_ms=%.2f total_ms=%.2f top_score=%.4f",
        namespace,
        web_used,
        float(timings.get("retrieve_ms") or 0.0),
        float(timings.get("web_ms") or 0.0),
        float(timings.get("generate_ms") or 0.0),
        float(timings.get("total_ms") or 0.0),
        top_score,
    )

    response_model = _build_chat_response(state)

    # Record metrics based on this response.
    record_chat_timings(
        {
            "retrieve_ms": response_model.timings.retrieve_ms,
            "web_ms": response_model.timings.web_ms,
            "generate_ms": response_model.timings.generate_ms,
            "total_ms": response_model.timings.total_ms,
        }
    )

    # Cache only when chat_history is empty.
    if use_cache:
        set_chat_cached(
            namespace=namespace,
            query=payload.query,
            top_k=payload.top_k,
            min_score=payload.min_score,
            use_web_fallback=payload.use_web_fallback,
            value=response_model,
        )

    return response_model
188
+
189
+
190
@router.post(
    "/chat/stream",
    summary="Streaming RAG chat endpoint (SSE)",
    description=(
        "Same behaviour as /chat but streams the answer over Server-Sent Events "
        "(SSE). The final event includes the full JSON payload with answer, sources, "
        "timings, and trace metadata."
    ),
)
@limiter.limit("30/minute")
async def chat_stream(request: Request, payload: ChatRequest) -> StreamingResponse:  # noqa: ARG001
    """Pseudo-streaming variant of /chat.

    NOTE(review): the full answer is computed before any bytes are sent; the
    SSE stream then replays it word-by-word. Whitespace between words is
    collapsed by the split, so clients should rely on the final `end` event
    for the exact answer text. Unlike /chat, this endpoint does not consult
    or populate the response cache. `request` is unused directly but
    required by slowapi's limiter decorator.
    """
    settings = get_settings()
    namespace = payload.namespace or settings.PINECONE_NAMESPACE

    logger.info(
        "Received /chat/stream request namespace='%s' top_k=%d use_web_fallback=%s",
        namespace,
        payload.top_k,
        payload.use_web_fallback,
    )

    graph = get_chat_graph()
    callbacks = get_tracing_callbacks()
    config: Dict = {}
    if callbacks:
        config["callbacks"] = callbacks

    # Initial graph state mirrors the request payload plus resolved namespace.
    initial_state = {
        "query": payload.query,
        "namespace": namespace,
        "top_k": payload.top_k,
        "use_web_fallback": payload.use_web_fallback,
        "min_score": payload.min_score,
        "max_web_results": payload.max_web_results,
        "chat_history": [
            {"role": msg.role, "content": msg.content}
            for msg in (payload.chat_history or [])
        ],
    }

    start_total = perf_counter()

    def _invoke_graph() -> Dict:
        # Synchronous graph invocation, run off the event loop below.
        return graph.invoke(initial_state, config=config)

    # Exceptions (including UpstreamServiceError) are handled by global handlers.
    state = await run_in_threadpool(_invoke_graph)

    # Overwrite total_ms with the wall-clock time measured here.
    total_ms = (perf_counter() - start_total) * 1000.0
    timings = state.get("timings") or {}
    timings["total_ms"] = total_ms
    state["timings"] = timings

    web_used = bool(state.get("web_fallback_used"))
    top_score = float(state.get("top_score") or 0.0)
    logger.info(
        "Streaming chat completed namespace='%s' web_fallback_used=%s "
        "retrieve_ms=%.2f web_ms=%.2f generate_ms=%.2f total_ms=%.2f top_score=%.4f",
        namespace,
        web_used,
        float(timings.get("retrieve_ms") or 0.0),
        float(timings.get("web_ms") or 0.0),
        float(timings.get("generate_ms") or 0.0),
        float(timings.get("total_ms") or 0.0),
        top_score,
    )

    response_model = _build_chat_response(state)
    answer_text = response_model.answer

    # Record metrics based on this response as well.
    record_chat_timings(
        {
            "retrieve_ms": response_model.timings.retrieve_ms,
            "web_ms": response_model.timings.web_ms,
            "generate_ms": response_model.timings.generate_ms,
            "total_ms": response_model.timings.total_ms,
        }
    )

    async def event_generator() -> AsyncGenerator[str, None]:
        # Stream the answer token-by-token (space-delimited) as simple SSE events.
        for token in answer_text.split():
            yield f"data: {token}\n\n"

        # Send a final event containing the full JSON payload for clients that
        # want metadata and sources.
        final_payload = response_model.model_dump()
        yield f"event: end\ndata: {json.dumps(final_payload)}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
backend/app/routers/documents.py ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, List
2
+
3
+ from fastapi import APIRouter, Query
4
+ from fastapi.concurrency import run_in_threadpool
5
+ from langchain_core.documents import Document
6
+
7
+ from app.core.config import get_settings
8
+ from app.core.logging import get_logger
9
+ from app.schemas.documents import (
10
+ DocumentsStatsResponse,
11
+ NamespaceStat,
12
+ UploadTextRequest,
13
+ UploadTextResponse,
14
+ )
15
+ from app.services import chunking as chunking_service
16
+ from app.services import dedupe as dedupe_service
17
+ from app.services.normalize import make_doc_id, normalize_text, is_valid_document
18
+ from app.services.pinecone_store import describe_index_stats, upsert_records
19
+
20
+ logger = get_logger(__name__)
21
+
22
+ router = APIRouter(prefix="/documents", tags=["documents"])
23
+
24
+
25
@router.post(
    "/upload-text",
    response_model=UploadTextResponse,
    summary="Upload raw text or Docling output",
    description=(
        "Accepts manual text uploads or Docling-converted content, normalizes and "
        "chunks the text, and upserts it into Pinecone."
    ),
)
async def upload_text(payload: UploadTextRequest) -> UploadTextResponse:
    """Normalise, chunk, dedupe and upsert a single uploaded document.

    Documents that are too short after normalisation are skipped and
    reported with zero ingested counts rather than raising an error.
    """
    settings = get_settings()
    namespace = payload.namespace or settings.PINECONE_NAMESPACE

    normalized = normalize_text(payload.text)
    if not is_valid_document(normalized):
        logger.info(
            "Skipping manual upload for title='%s' due to insufficient length (len=%d)",
            payload.title,
            len(normalized),
        )
        return UploadTextResponse(
            namespace=namespace,
            source=payload.source,
            ingested_documents=0,
            ingested_chunks=0,
        )

    # Copy caller-supplied metadata so the payload object is not mutated.
    metadata: Dict[str, Any] = payload.metadata.copy() if payload.metadata else {}
    url = metadata.get("url", "")
    published = metadata.get("published", "")

    # Stable document id derived from source/title/url for dedupe purposes.
    doc_id = make_doc_id(source=payload.source, title=payload.title, url=url)
    metadata.update(
        {
            "title": payload.title,
            "source": payload.source,
            "url": url,
            "published": published,
            "doc_id": doc_id,
        }
    )

    document = Document(page_content=normalized, metadata=metadata)
    records = chunking_service.documents_to_records([document])
    records = dedupe_service.dedupe_records(records)

    # Pinecone upsert is synchronous; run it off the event loop.
    total_upserted = await run_in_threadpool(upsert_records, namespace, records)

    return UploadTextResponse(
        namespace=namespace,
        source=payload.source,
        ingested_documents=1,
        ingested_chunks=total_upserted,
    )
79
+
80
+
81
@router.get(
    "/stats",
    response_model=DocumentsStatsResponse,
    summary="Get document statistics",
    description="Returns vector counts per namespace from the configured Pinecone index.",
)
async def documents_stats(
    namespace: str | None = Query(
        default=None,
        description="Optional namespace filter; if omitted, stats for all namespaces are returned",
    ),
) -> DocumentsStatsResponse:
    """Fetch per-namespace vector counts from Pinecone off the event loop."""
    raw_stats = await run_in_threadpool(describe_index_stats, namespace)

    namespaces: Dict[str, NamespaceStat] = {}
    for name, info in raw_stats.items():
        namespaces[name] = NamespaceStat(vector_count=info.get("vector_count", 0))

    return DocumentsStatsResponse(namespaces=namespaces)
backend/app/routers/health.py ADDED
@@ -0,0 +1,19 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter
2
+
3
+ from app.core.config import get_settings
4
+
5
+ router = APIRouter(tags=["health"])
6
+
7
+
8
@router.get(
    "/health",
    summary="Health check",
    description="Returns service status, name, and version.",
)
async def health() -> dict:
    """Report liveness plus the configured service name and version."""
    cfg = get_settings()
    payload = {
        "status": "ok",
        "service": cfg.APP_NAME,
        "version": cfg.APP_VERSION,
    }
    return payload
backend/app/routers/ingest.py ADDED
@@ -0,0 +1,194 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, List
2
+ from collections import Counter
3
+
4
+ import httpx
5
+ from fastapi import APIRouter, HTTPException, Request
6
+ from fastapi.concurrency import run_in_threadpool
7
+ from langchain_core.documents import Document
8
+
9
+ from app.core.config import get_settings
10
+ from app.core.logging import get_logger
11
+ from app.core.rate_limit import limiter
12
+ from app.schemas.ingest import (
13
+ ArxivIngestRequest,
14
+ IngestResponse,
15
+ OpenAlexIngestRequest,
16
+ WikiIngestRequest,
17
+ )
18
+ from app.services import dedupe as dedupe_service
19
+ from app.services import chunking as chunking_service
20
+ from app.services.ingestors.arxiv import fetch_arxiv_documents
21
+ from app.services.ingestors.openalex import fetch_openalex_documents
22
+ from app.services.ingestors.wiki import fetch_wiki_documents
23
+ from app.services.pinecone_store import upsert_records
24
+
25
+ logger = get_logger(__name__)
26
+
27
+ router = APIRouter(prefix="/ingest", tags=["ingest"])
28
+
29
+
30
async def _process_and_upsert(
    documents: List[Document],
    namespace: str,
    source: str,
    details: dict | None = None,
) -> IngestResponse:
    """Shared helper to chunk, dedupe and upsert documents."""
    # No documents after upstream filtering: report an empty (but explicit) result.
    if not documents:
        return IngestResponse(
            namespace=namespace,
            source=source,
            ingested_documents=0,
            ingested_chunks=0,
            skipped_documents=0,
            details=details or {"reason": "no_documents_after_filtering"},
        )

    chunk_records = chunking_service.documents_to_records(documents)
    chunk_records = dedupe_service.dedupe_records(chunk_records)

    # Pinecone upsert is blocking; run it off the event loop.
    upserted_count = await run_in_threadpool(upsert_records, namespace, chunk_records)

    return IngestResponse(
        namespace=namespace,
        source=source,
        ingested_documents=len(documents),
        ingested_chunks=upserted_count,
        skipped_documents=0,
        details=details,
    )
60
+
61
+
62
@router.post(
    "/arxiv",
    response_model=IngestResponse,
    summary="Ingest documents from arXiv",
    description="Fetches recent arXiv entries for a query and upserts them into Pinecone.",
)
@limiter.limit("10/minute")
async def ingest_arxiv(request: Request, payload: ArxivIngestRequest) -> IngestResponse:  # noqa: ARG001
    """Fetch arXiv entries for ``payload.query`` and upsert them into Pinecone.

    Raises:
        HTTPException: 502 when the upstream arXiv request fails.
    """
    settings = get_settings()
    namespace = payload.namespace or settings.PINECONE_NAMESPACE
    # Defensive cap in addition to the schema-level `le=20` constraint.
    max_docs = min(payload.max_docs, 20)

    logger.info(
        "Starting arXiv ingestion query='%s' max_docs=%d namespace='%s'",
        payload.query,
        max_docs,
        namespace,
    )

    try:
        documents = await fetch_arxiv_documents(
            query=payload.query,
            max_results=max_docs,
            category=payload.category,
        )
    except (httpx.HTTPStatusError, httpx.RequestError) as exc:
        status = None
        reason = ""
        url = None
        if isinstance(exc, httpx.HTTPStatusError) and exc.response is not None:
            status = exc.response.status_code
            reason = exc.response.reason_phrase
            url = str(exc.response.url)
        if url is None:
            # BUGFIX: httpx exposes ``exc.request`` as a property that raises
            # RuntimeError (not AttributeError) when no request is attached,
            # so ``hasattr``/``getattr`` would let that escape the handler and
            # mask the intended 502. Guard the whole lookup instead.
            try:
                url = str(exc.request.url)
            except Exception:  # noqa: BLE001
                url = None

        logger.error(
            "Upstream arXiv error (url=%s): %s",
            url or "unknown",
            exc,
        )
        status_display = status if status is not None else "unknown"
        detail = f"Upstream arXiv error: {status_display} {reason}".strip()
        raise HTTPException(
            status_code=502,
            detail=detail,
        ) from exc

    return await _process_and_upsert(documents, namespace=namespace, source="arxiv")
115
+
116
+
117
@router.post(
    "/openalex",
    response_model=IngestResponse,
    summary="Ingest documents from OpenAlex",
    description="Fetches works from OpenAlex for a query and upserts them into Pinecone.",
)
@limiter.limit("10/minute")
async def ingest_openalex(request: Request, payload: OpenAlexIngestRequest) -> IngestResponse:  # noqa: ARG001
    """Fetch OpenAlex works for the query and upsert them into Pinecone."""
    cfg = get_settings()
    target_namespace = payload.namespace or cfg.PINECONE_NAMESPACE
    # Defensive cap in addition to the schema-level limit.
    doc_limit = min(payload.max_docs, 20)

    logger.info(
        "Starting OpenAlex ingestion query='%s' max_docs=%d namespace='%s'",
        payload.query,
        doc_limit,
        target_namespace,
    )

    try:
        documents = await fetch_openalex_documents(
            query=payload.query,
            max_results=doc_limit,
            mailto=payload.mailto,
        )
    except (httpx.HTTPStatusError, httpx.RequestError) as exc:
        # Surface upstream failures as a 502 rather than a raw 500.
        logger.error("Upstream OpenAlex error: %s", exc)
        raise HTTPException(
            status_code=502,
            detail="Upstream OpenAlex error: unable to retrieve content. "
            "Try again later.",
        ) from exc

    return await _process_and_upsert(
        documents, namespace=target_namespace, source="openalex"
    )
151
+
152
+
153
@router.post(
    "/wiki",
    response_model=IngestResponse,
    summary="Ingest documents from Wikipedia",
    description=(
        "Fetches articles from Wikipedia using the REST API with Action API fallback "
        "and upserts them into Pinecone."
    ),
)
@limiter.limit("10/minute")
async def ingest_wiki(request: Request, payload: WikiIngestRequest) -> IngestResponse:  # noqa: ARG001
    """Fetch up to 20 Wikipedia pages by title and upsert them into Pinecone."""
    cfg = get_settings()
    target_namespace = payload.namespace or cfg.PINECONE_NAMESPACE

    # Only the first 20 titles are honoured, matching the schema's documentation.
    page_titles = payload.titles[:20]
    logger.info(
        "Starting Wikipedia ingestion titles=%d namespace='%s'",
        len(page_titles),
        target_namespace,
    )

    try:
        documents = await fetch_wiki_documents(titles=page_titles)
    except (httpx.HTTPStatusError, httpx.RequestError) as exc:
        logger.error("Upstream Wikimedia error: %s", exc)
        raise HTTPException(
            status_code=502,
            detail=(
                "Upstream Wikimedia error: unable to retrieve content. "
                "Try again later or use Action API fallback."
            ),
        ) from exc

    # Summarise which backend was used (REST vs Action API) for debugging.
    backend_tally: Dict[str, int] = Counter(
        doc.metadata.get("wikimedia_backend", "unknown") for doc in documents
    )
    extra_details: Dict[str, Any] = {"wikimedia_backend_counts": dict(backend_tally)}

    return await _process_and_upsert(
        documents, namespace=target_namespace, source="wiki", details=extra_details
    )
backend/app/routers/metrics.py ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi import APIRouter
2
+
3
+ from app.core.metrics import get_metrics_snapshot
4
+
5
+ router = APIRouter(tags=["metrics"])
6
+
7
+
8
@router.get(
    "/metrics",
    summary="In-memory metrics snapshot",
    description=(
        "Returns request and error counts by path, timing statistics for chat "
        "requests (average and p50/p95), cache hit/miss counters, and the last "
        "20 timing samples."
    ),
)
async def metrics() -> dict:
    """Return the current in-process metrics snapshot as a plain dict."""
    snapshot = get_metrics_snapshot()
    return snapshot
backend/app/routers/search.py ADDED
@@ -0,0 +1,90 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, List
2
+
3
+ from fastapi import APIRouter, Request
4
+ from fastapi.concurrency import run_in_threadpool
5
+
6
+ from app.core.cache import get_search_cached, set_search_cached
7
+ from app.core.config import get_settings
8
+ from app.core.logging import get_logger
9
+ from app.core.rate_limit import limiter
10
+ from app.schemas.search import SearchHit, SearchRequest, SearchResponse
11
+ from app.services.pinecone_store import search as pinecone_search
12
+
13
+ logger = get_logger(__name__)
14
+
15
+ router = APIRouter(tags=["search"])
16
+
17
+
18
@router.post(
    "/search",
    response_model=SearchResponse,
    summary="Semantic search over ingested documents",
    description=(
        "Performs integrated embedding search over documents stored in Pinecone and "
        "returns the top matching chunks."
    ),
)
@limiter.limit("60/minute")
async def search(request: Request, payload: SearchRequest) -> SearchResponse:  # noqa: ARG001
    """Run a cached semantic search against Pinecone and normalise the hits."""
    settings = get_settings()
    namespace = payload.namespace or settings.PINECONE_NAMESPACE
    text_field = settings.PINECONE_TEXT_FIELD

    logger.info(
        "Received search request namespace='%s' top_k=%d",
        namespace,
        payload.top_k,
    )

    # Try the in-memory cache first; on a miss, query Pinecone (blocking call,
    # so it runs in a threadpool) and populate the cache for next time.
    hits_raw: List[Dict[str, Any]] | None = get_search_cached(
        namespace=namespace,
        query=payload.query,
        top_k=payload.top_k,
        filters=payload.filters,
    )
    if hits_raw is None:
        hits_raw = await run_in_threadpool(
            pinecone_search,
            namespace,
            payload.query,
            payload.top_k,
            payload.filters,
            None,
        )
        set_search_cached(
            namespace=namespace,
            query=payload.query,
            top_k=payload.top_k,
            filters=payload.filters,
            value=hits_raw,
        )

    hits: List[SearchHit] = []
    for raw_hit in hits_raw:
        hit_id = raw_hit.get("_id") or raw_hit.get("id") or ""
        score = float(raw_hit.get("_score") or raw_hit.get("score") or 0.0)
        raw_fields: Dict[str, Any] = raw_hit.get("fields") or {}

        # Map the configured Pinecone text field back to a stable 'chunk_text' key
        text_value = raw_fields.get(text_field, "")
        normalized_fields: Dict[str, Any] = dict(raw_fields)
        if text_field != "chunk_text":
            normalized_fields.pop(text_field, None)
        normalized_fields["chunk_text"] = text_value

        hits.append(SearchHit(id=hit_id, score=score, fields=normalized_fields))

    return SearchResponse(
        namespace=namespace,
        query=payload.query,
        top_k=payload.top_k,
        hits=hits,
    )
backend/app/schemas/__pycache__/chat.cpython-313.pyc ADDED
Binary file (4.89 kB). View file
 
backend/app/schemas/__pycache__/documents.cpython-313.pyc ADDED
Binary file (1.97 kB). View file
 
backend/app/schemas/__pycache__/ingest.cpython-313.pyc ADDED
Binary file (2.57 kB). View file
 
backend/app/schemas/__pycache__/search.cpython-313.pyc ADDED
Binary file (1.7 kB). View file
 
backend/app/schemas/chat.py ADDED
@@ -0,0 +1,128 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import List, Literal, Optional
2
+
3
+ from pydantic import BaseModel, Field
4
+
5
+
6
class ChatMessage(BaseModel):
    """A single turn of prior conversation history."""

    role: Literal["user", "assistant"] = Field(
        ...,
        description="Role of the message author (user or assistant).",
    )
    content: str = Field(..., description="Message text content.")


class ChatRequest(BaseModel):
    """Request body for the chat endpoint: the query plus retrieval and fallback options."""

    query: str = Field(..., description="User query to be answered.")
    namespace: Optional[str] = Field(
        default=None,
        description=(
            "Target Pinecone namespace. Defaults to the configured "
            "PINECONE_NAMESPACE when omitted."
        ),
    )
    top_k: int = Field(
        default=5,
        ge=1,
        le=100,
        description="Maximum number of retrieved document chunks.",
    )
    use_web_fallback: bool = Field(
        default=True,
        description=(
            "Whether to fall back to web search when retrieval is weak. "
            "Requires a configured Tavily API key."
        ),
    )
    min_score: float = Field(
        default=0.25,
        ge=0.0,
        le=1.0,
        description=(
            "If the top retrieval score is below this threshold and "
            "use_web_fallback is true, a web search will be attempted."
        ),
    )
    max_web_results: int = Field(
        default=5,
        ge=1,
        le=20,
        description="Maximum number of web search results to fetch when enabled.",
    )
    chat_history: Optional[List[ChatMessage]] = Field(
        default=None,
        description=(
            "Optional prior conversation history. "
            "Messages with role='user' or 'assistant' are supported."
        ),
    )


class SourceHit(BaseModel):
    """One retrieved snippet (vector-store chunk or web result) used as context."""

    source: str = Field(
        ...,
        description="Origin of the snippet (e.g. wiki, openalex, arxiv, web).",
    )
    title: str = Field(
        ...,
        description="Title of the underlying document or web page.",
    )
    url: str = Field(
        "",
        description="URL associated with the source, when available.",
    )
    score: float = Field(
        0.0,
        description=(
            "Relevance score from the vector store or a synthetic score for web search."
        ),
    )
    chunk_text: str = Field(
        ...,
        description="Text content of the retrieved chunk or web snippet.",
    )


class ChatTimings(BaseModel):
    """Wall-clock timings (milliseconds) for the main phases of a chat request."""

    retrieve_ms: float = Field(
        0.0,
        description="Time spent retrieving from Pinecone, in milliseconds.",
    )
    web_ms: float = Field(
        0.0,
        description="Time spent calling web search tools, in milliseconds.",
    )
    generate_ms: float = Field(
        0.0,
        description="Time spent generating the answer with the LLM, in milliseconds.",
    )
    total_ms: float = Field(
        0.0,
        description="End-to-end time from request receipt to response, in milliseconds.",
    )


class ChatTraceMetadata(BaseModel):
    """Observability metadata describing whether/where this call was traced."""

    langsmith_project: Optional[str] = Field(
        default=None,
        description="LangSmith project name associated with this trace, if any.",
    )
    trace_enabled: bool = Field(
        default=False,
        description="Whether LangSmith / LangChain tracing was enabled for this call.",
    )


class ChatResponse(BaseModel):
    """Response body for the chat endpoint: answer, supporting sources, and telemetry."""

    answer: str = Field(..., description="Generated answer text.")
    sources: List[SourceHit] = Field(
        default_factory=list,
        description="List of document or web snippets used as context.",
    )
    timings: ChatTimings = Field(
        default_factory=ChatTimings,
        description="Timing information for key phases of the pipeline.",
    )
    trace: ChatTraceMetadata = Field(
        ...,
        description="Tracing configuration metadata for observability.",
    )
backend/app/schemas/documents.py ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, Optional
2
+
3
+ from pydantic import BaseModel, Field
4
+
5
+
6
class UploadTextRequest(BaseModel):
    """Payload for uploading a single raw-text document for ingestion."""

    title: str = Field(..., description="Document title")
    source: str = Field(
        default="manual",
        description="Source label for the document (e.g. manual, docling)",
    )
    text: str = Field(..., description="Full text content of the document")
    namespace: Optional[str] = Field(
        default=None, description="Target Pinecone namespace (defaults to env)"
    )
    metadata: Optional[Dict[str, Any]] = Field(
        default=None,
        description="Additional metadata fields to store alongside the document",
    )


class UploadTextResponse(BaseModel):
    """Summary returned after a text upload has been chunked and upserted."""

    # Namespace the chunks were written to.
    namespace: str
    # Source label echoed back from the request.
    source: str
    # Number of input documents processed.
    ingested_documents: int
    # Number of chunks upserted into the vector store.
    ingested_chunks: int


class NamespaceStat(BaseModel):
    """Vector count for a single Pinecone namespace."""

    vector_count: int


class DocumentsStatsResponse(BaseModel):
    """Per-namespace index statistics, keyed by namespace name."""

    namespaces: Dict[str, NamespaceStat]
backend/app/schemas/ingest.py ADDED
@@ -0,0 +1,56 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, List, Optional
2
+
3
+ from pydantic import BaseModel, Field
4
+
5
+
6
class ArxivIngestRequest(BaseModel):
    """Request body for the /ingest/arxiv endpoint."""

    query: str = Field(..., description="Search query for arXiv")
    max_docs: int = Field(
        default=10,
        ge=1,
        le=20,
        description="Maximum number of documents to fetch (capped at 20)",
    )
    namespace: Optional[str] = Field(
        default=None, description="Target Pinecone namespace (defaults to env)"
    )
    category: Optional[str] = Field(
        default=None,
        description="Optional category label for ingested papers",
    )


class OpenAlexIngestRequest(BaseModel):
    """Request body for the /ingest/openalex endpoint."""

    query: str = Field(..., description="Search query for OpenAlex works")
    max_docs: int = Field(
        default=10,
        ge=1,
        le=20,
        description="Maximum number of documents to fetch (capped at 20)",
    )
    namespace: Optional[str] = Field(
        default=None, description="Target Pinecone namespace (defaults to env)"
    )
    mailto: str = Field(
        ...,
        description="Contact email passed to OpenAlex via the mailto query parameter",
    )


class WikiIngestRequest(BaseModel):
    """Request body for the /ingest/wiki endpoint."""

    titles: List[str] = Field(
        ...,
        description="List of Wikipedia page titles (first 20 will be used)",
    )
    namespace: Optional[str] = Field(
        default=None, description="Target Pinecone namespace (defaults to env)"
    )


class IngestResponse(BaseModel):
    """Common summary returned by every ingestion endpoint."""

    # Namespace the chunks were written to.
    namespace: str
    # Which ingestor produced the documents (arxiv, openalex, wiki, ...).
    source: str
    # Number of source documents processed.
    ingested_documents: int
    # Number of chunks upserted into the vector store.
    ingested_chunks: int
    # Number of documents skipped during processing.
    skipped_documents: int
    # Optional extra diagnostics (e.g. per-backend counts, skip reasons).
    details: Optional[Dict[str, Any]] = None
backend/app/schemas/search.py ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import Any, Dict, List, Optional
2
+
3
+ from pydantic import BaseModel, Field
4
+
5
+
6
class SearchRequest(BaseModel):
    """Request body for the /search endpoint."""

    query: str = Field(..., description="User query text")
    top_k: int = Field(
        default=5,
        ge=1,
        le=100,
        description="Number of results to return",
    )
    namespace: Optional[str] = Field(
        default=None, description="Target Pinecone namespace (defaults to env)"
    )
    filters: Optional[Dict[str, Any]] = Field(
        default=None,
        description="Optional metadata filters passed directly to Pinecone search",
    )


class SearchHit(BaseModel):
    """One matching chunk returned from the vector store."""

    # Vector/record identifier from Pinecone.
    id: str
    # Relevance score of the match.
    score: float
    # Stored fields for the chunk (text is normalised under 'chunk_text').
    fields: Dict[str, Any]


class SearchResponse(BaseModel):
    """Response body for the /search endpoint: the echoed query plus hits."""

    namespace: str
    query: str
    top_k: int
    hits: List[SearchHit]
backend/app/services/__pycache__/chunking.cpython-313.pyc ADDED
Binary file (3.1 kB). View file
 
backend/app/services/__pycache__/dedupe.cpython-313.pyc ADDED
Binary file (1.16 kB). View file
 
backend/app/services/__pycache__/normalize.cpython-313.pyc ADDED
Binary file (1.48 kB). View file
 
backend/app/services/__pycache__/pinecone_store.cpython-313.pyc ADDED
Binary file (7.45 kB). View file
 
backend/app/services/chat/__pycache__/graph.cpython-313.pyc ADDED
Binary file (12.7 kB). View file