| # Backend API Testing Guide | |
| End-to-end validation of the Agentic-RAG backend using [HTTPie](https://httpie.io/cli) (`http` command). | |
| For ad-hoc **Azure OpenAI** and **TruLens** troubleshooting scripts run inside the Docker backend, see [test-scripts-troubleshooting.md](./test-scripts-troubleshooting.md). | |
| --- | |
| ## Prerequisites | |
| ### Install HTTPie | |
| ```bash | |
| # macOS | |
| brew install httpie | |
| # Linux / WSL | |
| pip install httpie | |
| # Windows (PowerShell) | |
| winget install httpie.httpie | |
| ``` | |
| Verify: `http --version` → should print `3.x.x`. | |
| ### Base URL | |
| All examples use the backend running at `http://localhost:8000`. | |
| Set a shell variable for convenience: | |
| ```bash | |
| BASE=http://localhost:8000/api/v1 | |
| ``` | |
| ### Test accounts (pre-seeded) | |
| | Email | Password | Role | | |
| | ----------------------------- | --------------- | ---------- | | |
| | `researcher@example.com` | `researcher123` | researcher | | |
| | `test_researcher@example.com` | `wrong123` | researcher | | |
| > **Researcher** role has access to 55 documents / 5 000 chunks in the vector store. | |
| --- | |
| ## 1. Health Check | |
| ```bash | |
| http GET $BASE/health | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 200 OK | |
| Content-Type: application/json | |
| { | |
| "status": "ok" | |
| } | |
| ``` | |
| --- | |
| ## 2. Authentication | |
| ### 2.1 Register — success (201) | |
| ```bash | |
| http POST $BASE/auth/register \ | |
| email="newuser@example.com" \ | |
| password="secret123" \ | |
| display_name="New User" \ | |
| role="researcher" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 201 Created | |
| Content-Type: application/json | |
| { | |
| "id": "<uuid>", | |
| "email": "newuser@example.com", | |
| "display_name": "New User", | |
| "is_active": true, | |
| "created_at": "2026-03-20T15:00:00Z" | |
| } | |
| ``` | |
| ### 2.2 Register — duplicate email (400) | |
| ```bash | |
| http POST $BASE/auth/register \ | |
| email="researcher@example.com" \ | |
| password="anything" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 400 Bad Request | |
| { | |
| "detail": "Email already registered" | |
| } | |
| ``` | |
| ### 2.3 Register — invalid email format (422) | |
| ```bash | |
| http POST $BASE/auth/register \ | |
| email="not-an-email" \ | |
| password="secret123" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 422 Unprocessable Entity | |
| { | |
| "detail": [ | |
| { | |
| "type": "value_error", | |
| "loc": ["body", "email"], | |
| "msg": "value is not a valid email address" | |
| } | |
| ] | |
| } | |
| ``` | |
| --- | |
| ### 2.4 Login — correct credentials (200) | |
| > Login uses `application/x-www-form-urlencoded` (OAuth2 password flow), so pass `-f`. | |
| ```bash | |
| http -f POST $BASE/auth/login \ | |
| username="researcher@example.com" \ | |
| password="researcher123" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "access_token": "eyJhbGci...", | |
| "token_type": "bearer" | |
| } | |
| ``` | |
| **Capture token for subsequent requests** | |
| ```bash | |
| TOKEN=$(http -f POST $BASE/auth/login \ | |
| username="researcher@example.com" \ | |
| password="researcher123" \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['access_token'])") | |
| echo $TOKEN | |
| ``` | |
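
When a later request unexpectedly returns 401, it can help to look inside the captured token. The sketch below decodes the payload segment of a JWT without verifying its signature. It assumes the backend issues standard three-segment JWTs (the `eyJ...` prefix above suggests this); which claims the payload contains depends on the backend and is not specified in this guide. The helper name `decode_jwt_payload` is illustrative.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the unverified payload of a header.payload.signature JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling `decode_jwt_payload` with the value of `$TOKEN` returns the claims as a dictionary. The malformed token used in §2.9 (`not.a.valid.jwt`) has four segments, so this helper rejects it with `ValueError`. Because the signature is not checked, this is a debugging aid only.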
| ### 2.5 Login — wrong password (401) | |
| ```bash | |
| http -f POST $BASE/auth/login \ | |
| username="researcher@example.com" \ | |
| password="wrongpassword" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Incorrect email or password" | |
| } | |
| ``` | |
| ### 2.6 Login — unknown user (401) | |
| ```bash | |
| http -f POST $BASE/auth/login \ | |
| username="nobody@example.com" \ | |
| password="anything" | |
| ``` | |
| **Expected** — same 401 response (no user enumeration). | |
| --- | |
| ### 2.7 Get Current User — valid token (200) | |
| ```bash | |
| http GET $BASE/auth/me \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "id": "<uuid>", | |
| "email": "researcher@example.com", | |
| "display_name": "Test Researcher", | |
| "is_active": true, | |
| "created_at": "..." | |
| } | |
| ``` | |
| ### 2.8 Get Current User — no token (401) | |
| ```bash | |
| http GET $BASE/auth/me | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Not authenticated" | |
| } | |
| ``` | |
| ### 2.9 Get Current User — malformed token (401) | |
| ```bash | |
| http GET $BASE/auth/me \ | |
| "Authorization: Bearer not.a.valid.jwt" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Could not validate credentials" | |
| } | |
| ``` | |
| --- | |
| ## 3. Query | |
| All query endpoints require a valid JWT. Run the token capture from §2.4 first. | |
| ### 3.1 Submit Query — success (200) | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="What are the key differences between BERT and GPT architectures?" | |
| ``` | |
| **Expected** (~60–120 s — the agent embeds, searches pgvector, reranks with CrossEncoder, then calls the LLM) | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "id": "<uuid>", | |
| "query": "What are the key differences between BERT and GPT architectures?", | |
| "answer": "BERT is an encoder-only Transformer ... GPT is decoder-only ...", | |
| "citations": [ | |
| { | |
| "index": 1, | |
| "title": "Attention Is All You Need", | |
| "source_url": "https://arxiv.org/abs/1706.03762", | |
| "full_citation": "Vaswani et al. arXiv:1706.03762, 2017." | |
| } | |
| ], | |
| "chart_data": null, | |
| "model_provider": "openai", | |
| "agent_steps": 5, | |
| "created_at": "..." | |
| } | |
| ``` | |
| **Validation checklist** | |
| - [ ] `answer` is non-empty and references the query topic | |
| - [ ] `citations` list has ≥ 1 entry with valid `source_url` starting with `https://arxiv.org/` | |
| - [ ] `agent_steps` is 1–5 | |
| - [ ] `model_provider` matches `MODEL_PROVIDER` in `.env` | |
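
The mechanical parts of this checklist can be scripted. The following is a minimal sketch, not part of the backend: the function name `check_query_response` is hypothetical, and `expected_provider` is whatever `MODEL_PROVIDER` is set to in your `.env`. Whether the answer actually addresses the query topic still needs a human read.

```python
def check_query_response(body: dict, expected_provider: str) -> list:
    """Return a list of failed checks; an empty list means all checks passed."""
    failures = []
    if not str(body.get("answer") or "").strip():
        failures.append("answer is empty")
    citations = body.get("citations") or []
    # At least one citation must point at arXiv.
    if not any(str(c.get("source_url", "")).startswith("https://arxiv.org/")
               for c in citations):
        failures.append("no citation with an https://arxiv.org/ source_url")
    steps = body.get("agent_steps")
    if not (isinstance(steps, int) and 1 <= steps <= 5):
        failures.append(f"agent_steps out of range: {steps!r}")
    if body.get("model_provider") != expected_provider:
        failures.append(f"model_provider is {body.get('model_provider')!r}")
    return failures
```

One way to use it is to save the response body to a file, load it with `json.load`, and pass the resulting dictionary in together with the provider name.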
| --- | |
| ### 3.2 Submit Query — numerical data triggers chart_data | |
| Ask a question whose answer contains benchmark numbers so the agent populates `chart_data`: | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="Compare the accuracy scores of LoRA vs full fine-tuning on common NLP benchmarks" | |
| ``` | |
| **Expected** — `chart_data` is a Plotly spec (not `null`): | |
| ```json | |
| { | |
| "chart_data": { | |
| "data": [ | |
| { | |
| "type": "bar", | |
| "x": ["MNLI", "SST-2", "MRPC"], | |
| "y": [91.7, 96.2, 90.1] | |
| } | |
| ], | |
| "layout": { | |
| "title": "LoRA vs Full Fine-Tuning Benchmark Scores" | |
| } | |
| } | |
| } | |
| ``` | |
| --- | |
| ### 3.3 Submit Query — empty query string (422) | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query=" " | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 422 Unprocessable Entity | |
| { | |
| "detail": "Query must not be empty" | |
| } | |
| ``` | |
| ### 3.4 Submit Query — missing `query` field (422) | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| Content-Type:application/json \ | |
| <<< '{}' | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 422 Unprocessable Entity | |
| { | |
| "detail": [ | |
| { | |
| "type": "missing", | |
| "loc": ["body", "query"], | |
| "msg": "Field required" | |
| } | |
| ] | |
| } | |
| ``` | |
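
The 422 responses in §3.3 and §3.4 use two shapes for `detail`: a plain string and a list of error objects. When scripting these tests, a small helper that normalises both into readable lines keeps assertions simple. This is a sketch based only on the two shapes shown in this guide; the name `summarize_detail` is made up for illustration.

```python
def summarize_detail(body: dict) -> list:
    """Flatten an error body's `detail` into a list of readable strings."""
    detail = body.get("detail")
    if isinstance(detail, str):
        return [detail]
    lines = []
    for err in detail or []:
        # `loc` is a path such as ["body", "query"]; join it with dots.
        loc = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{loc}: {err.get('msg', '')} ({err.get('type', 'unknown')})")
    return lines
```

For the missing-field response above, this yields a single line naming `body.query` and the `Field required` message.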
| ### 3.5 Submit Query — no authentication (401) | |
| ```bash | |
| http POST $BASE/query \ | |
| query="What is attention mechanism?" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Not authenticated" | |
| } | |
| ``` | |
| --- | |
| ## 4. Query History | |
| ### 4.1 List history — returns entries in reverse-chronological order (200) | |
| ```bash | |
| http GET $BASE/query/history \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 200 OK | |
| [ | |
| { | |
| "id": "<uuid>", | |
| "query_text": "What are the key differences between BERT and GPT architectures?", | |
| "response_text": "BERT is an encoder-only ...", | |
| "model_provider": "openai", | |
| "agent_steps": 5, | |
| "created_at": "..." | |
| } | |
| ] | |
| ``` | |
| ### 4.2 Pagination — limit and offset | |
| ```bash | |
| # First page: 2 items | |
| http GET "$BASE/query/history?limit=2&offset=0" \ | |
| "Authorization: Bearer $TOKEN" | |
| # Second page: next 2 | |
| http GET "$BASE/query/history?limit=2&offset=2" \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Validation checklist** | |
| - [ ] `limit` controls list length (≤ N items returned) | |
| - [ ] `offset` skips earlier entries | |
| - [ ] Items are ordered newest-first | |
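
The three pagination checks can be expressed as a small function over two consecutive pages. This sketch assumes `created_at` is an ISO 8601 timestamp like the `2026-03-20T15:00:00Z` shown in §2.1, and that no new queries are submitted between the two requests; `check_pages` is a hypothetical helper name.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_pages(page1: list, page2: list, limit: int) -> list:
    """Return a list of failed pagination checks (empty when all pass)."""
    failures = []
    for name, page in (("page 1", page1), ("page 2", page2)):
        if len(page) > limit:
            failures.append(f"{name} has {len(page)} items, limit is {limit}")
        stamps = [parse_ts(item["created_at"]) for item in page]
        if stamps != sorted(stamps, reverse=True):
            failures.append(f"{name} is not newest-first")
    # offset must skip the entries already returned on the first page.
    if {i["id"] for i in page1} & {i["id"] for i in page2}:
        failures.append("pages overlap")
    if page1 and page2 and parse_ts(page2[0]["created_at"]) > parse_ts(page1[-1]["created_at"]):
        failures.append("page 2 starts with a newer entry than page 1 ends with")
    return failures
```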
| ### 4.3 New user sees empty history | |
| ```bash | |
| # Register a fresh user | |
| http POST $BASE/auth/register \ | |
| email="freshuser@example.com" \ | |
| password="pass1234" | |
| # Login | |
| FRESH_TOKEN=$(http -f POST $BASE/auth/login \ | |
| username="freshuser@example.com" \ | |
| password="pass1234" \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['access_token'])") | |
| # History should be empty | |
| http GET $BASE/query/history \ | |
| "Authorization: Bearer $FRESH_TOKEN" | |
| ``` | |
| **Expected** — `[]` (empty array, HTTP 200). | |
| --- | |
| ## 5. On-Demand Visualization | |
| Visualization is generated on demand when a query is submitted with `include_visualization: true` in the POST body. The frontend then polls the visualization endpoint every 2 s for up to **180 s** (3 minutes); these tests verify the full flow manually. | |
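
The polling behaviour described here (every 2 s, up to 180 s) can be reproduced outside the frontend. The sketch below is an assumption about how such a loop might look, not the frontend's actual code: `poll_visualization` and `httpie_fetch` are hypothetical names, and the loop treats any status other than `"pending"` as final.

```python
import json
import subprocess
import time


def httpie_fetch(url: str, token: str) -> dict:
    """Fetch one JSON body via the HTTPie CLI (body only when stdout is piped)."""
    result = subprocess.run(
        ["http", "--ignore-stdin", "GET", url, f"Authorization: Bearer {token}"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def poll_visualization(fetch, interval=2.0, timeout=180.0,
                       sleep=time.sleep, clock=time.monotonic) -> dict:
    """Call fetch() until the status is no longer 'pending' or time runs out."""
    deadline = clock() + timeout
    while True:
        body = fetch()
        if body.get("status") != "pending":
            return body  # complete, or an error body such as a 404 detail
        if clock() + interval > deadline:
            raise TimeoutError(f"still pending after {timeout} s")
        sleep(interval)
```

With a query ID captured as in §5.1, `fetch` could be a lambda that calls `httpie_fetch` with the `/query/<id>/visualization` URL and the token. Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.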
| ### 5.1 Submit query with visualization enabled | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="Compare the accuracy scores of LoRA vs full fine-tuning on common NLP benchmarks" \ | |
| include_visualization:=true | |
| ``` | |
| **Expected** — same response shape as §3.1; returns immediately (text response is NOT delayed by viz): | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "id": "<uuid>", | |
| "query": "...", | |
| "answer": "...", | |
| "citations": [...], | |
| "chart_data": null, | |
| "model_provider": "openai", | |
| "agent_steps": 5, | |
| "created_at": "..." | |
| } | |
| ``` | |
| **Capture the query ID:** | |
| ```bash | |
| QUERY_ID=$(http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="Compare LoRA vs full fine-tuning accuracy" \ | |
| include_visualization:=true \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['id'])") | |
| echo $QUERY_ID | |
| ``` | |
| --- | |
| ### 5.2 Poll visualization — pending | |
| Immediately after submitting (before the viz agent finishes): | |
| ```bash | |
| http GET $BASE/query/$QUERY_ID/visualization \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected:** | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "status": "pending", | |
| "chart_data": null, | |
| "error": null | |
| } | |
| ``` | |
| --- | |
| ### 5.3 Poll visualization — complete | |
| After waiting ~10–60 seconds for the VizCodeAgent to finish (complex charts like sunbursts take longer): | |
| ```bash | |
| http GET $BASE/query/$QUERY_ID/visualization \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected:** | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "status": "complete", | |
| "chart_data": { | |
| "data": [ | |
| { | |
| "type": "bar", | |
| "x": ["MNLI", "SST-2", "MRPC"], | |
| "y": [91.7, 96.2, 90.1], | |
| "name": "LoRA" | |
| }, | |
| { | |
| "type": "bar", | |
| "x": ["MNLI", "SST-2", "MRPC"], | |
| "y": [92.1, 96.8, 90.9], | |
| "name": "Full Fine-Tuning" | |
| } | |
| ], | |
| "layout": { | |
| "title": "LoRA vs Full Fine-Tuning Accuracy", | |
| "barmode": "group" | |
| } | |
| }, | |
| "error": null | |
| } | |
| ``` | |
| **Validation checklist:** | |
| - [ ] `status` is `"complete"` | |
| - [ ] `chart_data` is a valid Plotly spec with `data` (array) and `layout` (object) keys | |
| - [ ] `data[*].type` is a recognised Plotly trace type (`bar`, `scatter`, `pie`, etc.; line charts are `scatter` traces with `mode: "lines"`) | |
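
A structural check of `chart_data` can be scripted along these lines. This is a sketch under stated assumptions: `check_chart_data` is a hypothetical helper, and `KNOWN_TRACE_TYPES` is a deliberately small, non-exhaustive sample of Plotly trace types that should be extended to whatever the viz agent is expected to produce.

```python
# Non-exhaustive sample of Plotly trace types; extend as needed.
KNOWN_TRACE_TYPES = {"bar", "scatter", "pie", "heatmap", "histogram", "box", "sunburst"}


def check_chart_data(chart_data) -> list:
    """Return a list of structural problems in a Plotly figure spec."""
    if not isinstance(chart_data, dict):
        return ["chart_data is not an object"]
    failures = []
    data = chart_data.get("data")
    if not isinstance(data, list):
        failures.append("data is not an array")
        data = []
    if not isinstance(chart_data.get("layout"), dict):
        failures.append("layout is not an object")
    for i, trace in enumerate(data):
        trace_type = trace.get("type") if isinstance(trace, dict) else None
        if trace_type not in KNOWN_TRACE_TYPES:
            failures.append(f"data[{i}].type not recognised: {trace_type!r}")
    return failures
```

This only validates shape; it does not confirm that the plotted numbers match the answer text.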
| --- | |
| ### 5.4 Poll visualization — query submitted without viz flag returns 404 | |
| ```bash | |
| # Submit without include_visualization | |
| PLAIN_ID=$(http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="What is BERT?" \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['id'])") | |
| http GET $BASE/query/$PLAIN_ID/visualization \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected:** | |
| ```http | |
| HTTP/1.1 404 Not Found | |
| { | |
| "detail": "Visualization not found — not requested, not yet started, or expired" | |
| } | |
| ``` | |
| --- | |
| ### 5.5 Poll visualization — no authentication (401) | |
| ```bash | |
| http GET $BASE/query/$QUERY_ID/visualization | |
| ``` | |
| **Expected:** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Not authenticated" | |
| } | |
| ``` | |
| --- | |
| ### 5.6 Submit query with viz flag — factual query with no data (NO_CHART) | |
| Some queries produce an answer without numerical data. The viz agent should return `NO_CHART`: | |
| ```bash | |
| # Capture the ID of this query so the poll below targets it, not the query from §5.1 | |
| NOCHART_ID=$(http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="What is the intuition behind the attention mechanism?" \ | |
| include_visualization:=true \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['id'])") | |
| ``` | |
| After the viz agent completes (~5–15 s), poll: | |
| ```bash | |
| http GET $BASE/query/$NOCHART_ID/visualization \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected** — status complete but no chart: | |
| ```json | |
| { | |
| "status": "complete", | |
| "chart_data": null, | |
| "error": null | |
| } | |
| ``` | |
| --- | |
| ## 6. Settings API | |
| The settings endpoints let authenticated users read and update application configuration at runtime. Changes are written to the `.env` file inside the backend container and take effect immediately via `get_settings.cache_clear()` — no container restart required. | |
| > Sensitive keys (API keys) are **partially masked** in all GET responses (`sk-ab****`). Any value sent to PUT that ends with `****` is treated as a no-op sentinel — the existing key is preserved. | |
| ### 6.1 Get current settings (200) | |
| ```bash | |
| http GET $BASE/settings \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "model_provider": "openai", | |
| "openai_api_key": "sk-p****", | |
| "openai_model": "gpt-4o", | |
| "azure_openai_api_key": "", | |
| "azure_openai_endpoint": "", | |
| "azure_openai_deployment": "gpt-4.1-mini-2025-04-14", | |
| "azure_openai_api_version": "2025-04-14", | |
| "google_api_key": "", | |
| "gemini_model": "gemini/gemini-flash-lite-latest", | |
| "trulens_provider": "openai", | |
| "trulens_strategy": "async", | |
| "trulens_sample_rate": 1, | |
| "trulens_feedback_timeout": 180.0, | |
| "viz_model_provider": "openai", | |
| "viz_model_name": "gpt-4o-mini", | |
| "viz_azure_deployment": "", | |
| "viz_azure_api_version": "" | |
| } | |
| ``` | |
| **Validation checklist** | |
| - [ ] API keys that are set show as `"<prefix>****"` (never the full value) | |
| - [ ] API keys that are not set show as `""` (empty string) | |
| - [ ] `model_provider` matches `MODEL_PROVIDER` in `.env` | |
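
The masking checks lend themselves to a quick script. The sketch below assumes that every secret field name ends in `_api_key`, which holds for the three keys in the response above but is not stated by the API; `check_masked_keys` is a hypothetical helper. It avoids echoing the offending value so a leaked key does not end up in test logs.

```python
def check_masked_keys(settings: dict) -> list:
    """Flag any *_api_key field that is neither empty nor masked with '****'."""
    failures = []
    for name, value in settings.items():
        if not name.endswith("_api_key"):
            continue
        masked = isinstance(value, str) and value.endswith("****")
        if value != "" and not masked:
            failures.append(f"{name} looks unmasked")
    return failures
```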
| --- | |
| ### 6.2 Update provider and model (200) | |
| ```bash | |
| http PUT $BASE/settings \ | |
| "Authorization: Bearer $TOKEN" \ | |
| model_provider="gemini" \ | |
| google_api_key="AIzaSy..." \ | |
| gemini_model="gemini/gemini-2.0-flash" | |
| ``` | |
| **Expected** — returns the updated settings with the key masked: | |
| ```http | |
| HTTP/1.1 200 OK | |
| { | |
| "model_provider": "gemini", | |
| "google_api_key": "AIza****", | |
| ... | |
| } | |
| ``` | |
| **Validation checklist** | |
| - [ ] `model_provider` reflects the new value | |
| - [ ] `google_api_key` is now masked (not empty) | |
| - [ ] Immediately submit a query to confirm the new provider is active (no restart needed) | |
| --- | |
| ### 6.3 Masked-key sentinel — existing key is preserved | |
| Send back the masked placeholder unchanged; the backend should not overwrite the key: | |
| ```bash | |
| # 1. Capture the current masked value | |
| MASKED=$(http GET $BASE/settings "Authorization: Bearer $TOKEN" \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['openai_api_key'])") | |
| # 2. PUT with the masked value — key should be unchanged | |
| http PUT $BASE/settings \ | |
| "Authorization: Bearer $TOKEN" \ | |
| openai_api_key="$MASKED" \ | |
| openai_model="gpt-4o-mini" | |
| ``` | |
| **Expected** — 200 OK, `openai_api_key` still masked (not cleared), `openai_model` updated. | |
| --- | |
| ### 6.4 Get settings — no authentication (401) | |
| ```bash | |
| http GET $BASE/settings | |
| ``` | |
| **Expected** | |
| ```http | |
| HTTP/1.1 401 Unauthorized | |
| { | |
| "detail": "Not authenticated" | |
| } | |
| ``` | |
| --- | |
| ## 7. RBAC Verification | |
| The vector search is filtered at the SQL level by the user's roles. A `guest` user with no role-document mappings should get "No relevant documents found" from the retriever. | |
| ### 7.1 Register a guest user (default role) | |
| ```bash | |
| http POST $BASE/auth/register \ | |
| email="guest@example.com" \ | |
| password="guest123" | |
| # role omitted → defaults to "guest" | |
| ``` | |
| ### 7.2 Login as guest | |
| ```bash | |
| GUEST_TOKEN=$(http -f POST $BASE/auth/login \ | |
| username="guest@example.com" \ | |
| password="guest123" \ | |
| | python -c "import sys,json; print(json.load(sys.stdin)['access_token'])") | |
| ``` | |
| ### 7.3 Query as guest — expects no documents | |
| ```bash | |
| http POST $BASE/query \ | |
| "Authorization: Bearer $GUEST_TOKEN" \ | |
| query="What is BERT?" | |
| ``` | |
| **Expected** — agent returns an answer noting no documents were found (HTTP 200, but answer states retriever returned empty context). The `citations` list will be empty (`[]`). | |
| --- | |
| ## 8. Error Summary Table | |
| | # | Method | Endpoint | Scenario | Expected HTTP | | |
| | --- | ------ | --------------------------------- | -------------------------------- | ------------- | | |
| | 1 | GET | `/health` | normal | 200 | | |
| | 2 | POST | `/auth/register` | new user | 201 | | |
| | 3 | POST | `/auth/register` | duplicate email | 400 | | |
| | 4 | POST | `/auth/register` | invalid email | 422 | | |
| | 5 | POST | `/auth/login` | correct credentials | 200 | | |
| | 6 | POST | `/auth/login` | wrong password | 401 | | |
| | 7 | GET | `/auth/me` | valid token | 200 | | |
| | 8 | GET | `/auth/me` | no token | 401 | | |
| | 9 | GET | `/auth/me` | malformed token | 401 | | |
| | 10 | POST | `/query` | valid query | 200 | | |
| | 11 | POST | `/query` | empty string | 422 | | |
| | 12 | POST | `/query` | missing field | 422 | | |
| | 13 | POST | `/query` | no auth | 401 | | |
| | 14 | GET | `/query/history` | with auth | 200 | | |
| | 15 | GET | `/query/history` | no auth | 401 | | |
| | 16 | POST | `/query` | `include_visualization: true` | 200 (immediate text response) | | |
| | 17 | GET | `/query/{id}/visualization` | pending (viz in progress) | 200 `{status:"pending"}` | | |
| | 18 | GET | `/query/{id}/visualization` | complete | 200 `{status:"complete", chart_data:{...}}` | | |
| | 19 | GET | `/query/{id}/visualization` | viz not requested (no flag) | 404 | | |
| | 20 | GET | `/query/{id}/visualization` | no auth | 401 | | |
| | 21 | GET | `/settings` | authenticated | 200 | | |
| | 22 | GET | `/settings` | no auth | 401 | | |
| | 23 | PUT | `/settings` | valid payload | 200 | | |
| | 24 | PUT | `/settings` | masked-key sentinel | 200 (key unchanged) | | |
| | 25 | PUT | `/settings` | no auth | 401 | | |
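
The summary table can double as a fixture for a scripted run. The sketch below transcribes the 25 rows into a dictionary and compares them with observed status codes; how the observed codes are collected (by hand, or by a script that issues each request) is left open, and the name `compare_statuses` is illustrative.

```python
VIZ = "/query/{id}/visualization"

# (method, endpoint, scenario) -> expected HTTP status, in table order.
EXPECTED_STATUS = {
    ("GET", "/health", "normal"): 200,
    ("POST", "/auth/register", "new user"): 201,
    ("POST", "/auth/register", "duplicate email"): 400,
    ("POST", "/auth/register", "invalid email"): 422,
    ("POST", "/auth/login", "correct credentials"): 200,
    ("POST", "/auth/login", "wrong password"): 401,
    ("GET", "/auth/me", "valid token"): 200,
    ("GET", "/auth/me", "no token"): 401,
    ("GET", "/auth/me", "malformed token"): 401,
    ("POST", "/query", "valid query"): 200,
    ("POST", "/query", "empty string"): 422,
    ("POST", "/query", "missing field"): 422,
    ("POST", "/query", "no auth"): 401,
    ("GET", "/query/history", "with auth"): 200,
    ("GET", "/query/history", "no auth"): 401,
    ("POST", "/query", "include_visualization: true"): 200,
    ("GET", VIZ, "pending (viz in progress)"): 200,
    ("GET", VIZ, "complete"): 200,
    ("GET", VIZ, "viz not requested (no flag)"): 404,
    ("GET", VIZ, "no auth"): 401,
    ("GET", "/settings", "authenticated"): 200,
    ("GET", "/settings", "no auth"): 401,
    ("PUT", "/settings", "valid payload"): 200,
    ("PUT", "/settings", "masked-key sentinel"): 200,
    ("PUT", "/settings", "no auth"): 401,
}


def compare_statuses(observed: dict) -> list:
    """List scenarios that were not run or returned an unexpected status."""
    problems = []
    for key, expected in EXPECTED_STATUS.items():
        got = observed.get(key)
        if got is None:
            problems.append(f"{key}: not run")
        elif got != expected:
            problems.append(f"{key}: expected {expected}, got {got}")
    return problems
```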
| --- | |
| ## 9. Verbose Mode & Inspecting Headers | |
| Add `-v` to print the full request and response, headers and bodies included, which is useful for debugging CORS or auth issues: | |
| ```bash | |
| http -v GET $BASE/auth/me \ | |
| "Authorization: Bearer $TOKEN" | |
| ``` | |
| Check only response headers (no body): | |
| ```bash | |
| http -h POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="test" | |
| ``` | |
| --- | |
| ## 10. Interactive API Docs | |
| FastAPI serves two interactive UIs and the raw OpenAPI schema automatically: | |
| | UI | URL | | |
| | ---------------- | ---------------------------------- | | |
| | Swagger UI | http://localhost:8000/docs | | |
| | ReDoc | http://localhost:8000/redoc | | |
| | Raw OpenAPI JSON | http://localhost:8000/openapi.json | | |
| The saved snapshot is at `docs/openapi.json`. | |
| --- | |
| ## 11. TruLens Evaluation Verification | |
| After running a query, TruLens scores are persisted asynchronously to the `evaluation_results` table. Check them directly: | |
| ```bash | |
| docker exec agentic-rag-db-1 psql -U postgres -d agentic_rag -c " | |
| SELECT | |
| ql.query_text, | |
| er.relevance_score, | |
| er.groundedness_score, | |
| er.answer_relevance_score, | |
| er.created_at | |
| FROM evaluation_results er | |
| JOIN query_logs ql ON ql.id = er.query_log_id | |
| ORDER BY er.created_at DESC | |
| LIMIT 5;" | |
| ``` | |
| **Target scores** (from SLA): | |
| | Metric | Target | | |
| | ----------------- | ------ | | |
| | Context Relevance | > 0.85 | | |
| | Groundedness | > 0.90 | | |
| | Answer Relevance | > 0.85 | | |
| > Scores are written in a **background thread** after the HTTP response. The worker waits up to **`TRULENS_FEEDBACK_TIMEOUT`** seconds (default **180**) for TruLens to finish all three judge calls via `retrieve_feedback_results` before persisting to `evaluation_results`. Slow LLM proxies may need a higher timeout in `.env`. | |
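
Rows returned by the SQL query above can be compared against the targets with a few lines of code. This sketch assumes the `relevance_score` column holds the Context Relevance metric (the guide does not state the mapping explicitly), and it treats a missing score as not meeting the target, since the background worker may not have written it yet. `below_target` is a hypothetical helper name.

```python
# Column name -> target; a score must be strictly greater than its target.
TARGETS = {
    "relevance_score": 0.85,         # assumed to be Context Relevance
    "groundedness_score": 0.90,
    "answer_relevance_score": 0.85,
}


def below_target(row: dict) -> dict:
    """Return {column: score} for every score that does not exceed its target."""
    misses = {}
    for column, target in TARGETS.items():
        score = row.get(column)
        if score is None or score <= target:
            misses[column] = score
    return misses
```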
| --- | |
| ## 12. Performance Targets | |
| The pipeline SLA from the spec: | |
| | Stage | Target | | |
| | ----------------------------------- | -------- | | |
| | Full pipeline (end-to-end) | < 6 s | | |
| | LLM generation | < 3 s | | |
| | CrossEncoder reranking (20 pairs) | < 500 ms | | |
| | pgvector HNSW search (100k vectors) | < 50 ms | | |
| > **Note:** The CrossEncoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`, ~85 MB) is loaded once at first request and cached as a process singleton. First-query latency on cold start (~30 s on CPU) is expected; subsequent queries meet the SLA. | |
| Measure end-to-end latency with HTTPie's `--print=h` and the response `Date` header, or use `time`: | |
| ```bash | |
| time http POST $BASE/query \ | |
| "Authorization: Bearer $TOKEN" \ | |
| query="What is retrieval-augmented generation?" | |
| ``` | |
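
A single `time` run is noisy, and the first request includes the CrossEncoder cold start noted above. One possible approach, sketched here with hypothetical helper names, is to time several runs and judge only the warm ones against the 6 s end-to-end target. The callable `fn` is whatever issues the request, for example a function that shells out to the `http` command shown above.

```python
import time


def time_call(fn, runs=3, clock=time.perf_counter) -> list:
    """Call fn() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        fn()
        durations.append(clock() - start)
    return durations


def meets_sla(durations, limit_s=6.0, skip_first=True) -> bool:
    """True if every (warm) run finished in under limit_s seconds."""
    # Drop the first run when asked, since it may include model loading.
    warm = durations[1:] if skip_first and len(durations) > 1 else durations
    return all(d < limit_s for d in warm)
```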