Backend API Testing Guide
End-to-end validation of the Agentic-RAG backend using HTTPie (http command).
For ad-hoc Azure OpenAI and TruLens troubleshooting scripts run inside the Docker backend, see test-scripts-troubleshooting.md.
Prerequisites
Install HTTPie
# macOS
brew install httpie
# Linux / WSL
pip install httpie
# Windows (PowerShell)
winget install httpie.httpie
Verify: http --version should print 3.x.x.
Base URL
All examples use the backend running at http://localhost:8000.
Set a shell variable for convenience:
BASE=http://localhost:8000/api/v1
Test accounts (pre-seeded)
| Email | Password | Role |
|---|---|---|
| researcher@example.com | researcher123 | researcher |
| test_researcher@example.com | wrong123 | researcher |
Researcher role has access to 55 documents / 5 000 chunks in the vector store.
1. Health Check
http GET $BASE/health
Expected
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "ok"
}
2. Authentication
2.1 Register - success (201)
http POST $BASE/auth/register \
email="newuser@example.com" \
password="secret123" \
display_name="New User" \
role="researcher"
Expected
HTTP/1.1 201 Created
Content-Type: application/json
{
"id": "<uuid>",
"email": "newuser@example.com",
"display_name": "New User",
"is_active": true,
"created_at": "2026-03-20T15:00:00Z"
}
2.2 Register - duplicate email (400)
http POST $BASE/auth/register \
email="researcher@example.com" \
password="anything"
Expected
HTTP/1.1 400 Bad Request
{
"detail": "Email already registered"
}
2.3 Register - invalid email format (422)
http POST $BASE/auth/register \
email="not-an-email" \
password="secret123"
Expected
HTTP/1.1 422 Unprocessable Entity
{
"detail": [
{
"type": "value_error",
"loc": ["body", "email"],
"msg": "value is not a valid email address"
}
]
}
2.4 Login - correct credentials (200)
Login uses application/x-www-form-urlencoded (OAuth2 password flow), so pass -f.
http -f POST $BASE/auth/login \
username="researcher@example.com" \
password="researcher123"
Expected
HTTP/1.1 200 OK
{
"access_token": "eyJhbGci...",
"token_type": "bearer"
}
Capture token for subsequent requests
TOKEN=$(http -f POST $BASE/auth/login \
username="researcher@example.com" \
password="researcher123" \
| python -c "import sys,json; print(json.load(sys.stdin)['access_token'])")
echo $TOKEN
2.5 Login - wrong password (401)
http -f POST $BASE/auth/login \
username="researcher@example.com" \
password="wrongpassword"
Expected
HTTP/1.1 401 Unauthorized
{
"detail": "Incorrect email or password"
}
2.6 Login - unknown user (401)
http -f POST $BASE/auth/login \
username="nobody@example.com" \
password="anything"
Expected: the same 401 response and body as for a wrong password, so the API does not reveal whether an account exists (no user enumeration).
2.7 Get Current User - valid token (200)
http GET $BASE/auth/me \
"Authorization: Bearer $TOKEN"
Expected
HTTP/1.1 200 OK
{
"id": "<uuid>",
"email": "researcher@example.com",
"display_name": "Test Researcher",
"is_active": true,
"created_at": "..."
}
2.8 Get Current User - no token (401)
http GET $BASE/auth/me
Expected
HTTP/1.1 401 Unauthorized
{
"detail": "Not authenticated"
}
2.9 Get Current User - malformed token (401)
http GET $BASE/auth/me \
"Authorization: Bearer not.a.valid.jwt"
Expected
HTTP/1.1 401 Unauthorized
{
"detail": "Could not validate credentials"
}
3. Query
All query endpoints require a valid JWT. Run the token capture from §2.4 first.
3.1 Submit Query - success (200)
http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="What are the key differences between BERT and GPT architectures?"
Expected (~60–120 s: the agent embeds, searches pgvector, reranks with CrossEncoder, then calls the LLM)
HTTP/1.1 200 OK
{
"id": "<uuid>",
"query": "What are the key differences between BERT and GPT architectures?",
"answer": "BERT is an encoder-only Transformer ... GPT is decoder-only ...",
"citations": [
{
"index": 1,
"title": "Attention Is All You Need",
"source_url": "https://arxiv.org/abs/1706.03762",
"full_citation": "Vaswani et al. arXiv:1706.03762, 2017."
}
],
"chart_data": null,
"model_provider": "openai",
"agent_steps": 5,
"created_at": "..."
}
Validation checklist
- answer is non-empty and references the query topic
- citations list has ≥ 1 entry with a valid source_url starting with https://arxiv.org/
- agent_steps is 1–5
- model_provider matches MODEL_PROVIDER in .env
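The checklist above can be automated once the response body has been parsed. The following is a minimal sketch, not part of the backend: the function name check_query_response and the expected_provider argument are hypothetical, and the "references the query topic" check is left to a human reader.

```python
from typing import Any


def check_query_response(body: dict[str, Any], expected_provider: str) -> list[str]:
    """Return a list of checklist failures; an empty list means every check passed."""
    problems: list[str] = []

    # answer must be a non-empty string (topic relevance still needs a human eye)
    answer = body.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        problems.append("answer is empty")

    # at least one citation, each pointing at arxiv.org over https
    citations = body.get("citations") or []
    if not citations:
        problems.append("citations list is empty")
    for c in citations:
        if not str(c.get("source_url", "")).startswith("https://arxiv.org/"):
            problems.append(f"citation {c.get('index')} has unexpected source_url")

    # agent_steps must fall in the documented 1..5 range
    steps = body.get("agent_steps")
    if not isinstance(steps, int) or not 1 <= steps <= 5:
        problems.append("agent_steps is outside 1-5")

    # model_provider should equal MODEL_PROVIDER from .env (supplied by the caller)
    if body.get("model_provider") != expected_provider:
        problems.append("model_provider does not match expected provider")

    return problems
```

To use it, parse the HTTPie output with json.load(sys.stdin), as the token-capture one-liner does, and pass the resulting dict in.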
3.2 Submit Query - numerical data triggers chart_data
Ask a question whose answer contains benchmark numbers so the agent populates chart_data:
http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="Compare the accuracy scores of LoRA vs full fine-tuning on common NLP benchmarks"
Expected: chart_data is a Plotly spec (not null):
{
"chart_data": {
"data": [
{
"type": "bar",
"x": ["MNLI", "SST-2", "MRPC"],
"y": [91.7, 96.2, 90.1]
}
],
"layout": {
"title": "LoRA vs Full Fine-Tuning Benchmark Scores"
}
}
}
3.3 Submit Query - empty query string (422)
http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query=" "
Expected
HTTP/1.1 422 Unprocessable Entity
{
"detail": "Query must not be empty"
}
3.4 Submit Query - missing query field (422)
http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
Content-Type:application/json \
<<< '{}'
Expected
HTTP/1.1 422 Unprocessable Entity
{
"detail": [
{
"type": "missing",
"loc": ["body", "query"],
"msg": "Field required"
}
]
}
3.5 Submit Query - no authentication (401)
http POST $BASE/query \
query="What is attention mechanism?"
Expected
HTTP/1.1 401 Unauthorized
{
"detail": "Not authenticated"
}
4. Query History
4.1 List history - returns entries in reverse-chronological order (200)
http GET $BASE/query/history \
"Authorization: Bearer $TOKEN"
Expected
HTTP/1.1 200 OK
[
{
"id": "<uuid>",
"query_text": "What are the key differences between BERT and GPT architectures?",
"response_text": "BERT is an encoder-only ...",
"model_provider": "openai",
"agent_steps": 5,
"created_at": "..."
}
]
4.2 Pagination - limit and offset
# First page: 2 items
http GET "$BASE/query/history?limit=2&offset=0" \
"Authorization: Bearer $TOKEN"
# Second page: next 2
http GET "$BASE/query/history?limit=2&offset=2" \
"Authorization: Bearer $TOKEN"
Validation checklist
- limit controls list length (≤ N items returned)
- offset skips earlier entries
- Items are ordered newest-first
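The three pagination checks can be expressed over two consecutive pages of parsed JSON. This is a sketch under assumptions: check_pages is a hypothetical helper, and created_at is assumed to be an ISO 8601 timestamp such as the 2026-03-20T15:00:00Z value shown in the register response.

```python
from datetime import datetime
from typing import Any


def _parse(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def check_pages(page1: list[dict[str, Any]], page2: list[dict[str, Any]], limit: int) -> list[str]:
    """Check limit, offset, and ordering across two consecutive history pages."""
    problems: list[str] = []

    # limit: neither page may hold more than `limit` items
    if len(page1) > limit or len(page2) > limit:
        problems.append("a page exceeds limit")

    # offset: the second page must not repeat ids from the first
    ids1 = {item["id"] for item in page1}
    if any(item["id"] in ids1 for item in page2):
        problems.append("pages overlap")

    # ordering: timestamps must never increase when reading both pages in sequence
    stamps = [_parse(item["created_at"]) for item in page1 + page2]
    if any(earlier < later for earlier, later in zip(stamps, stamps[1:])):
        problems.append("not newest-first")

    return problems
```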
4.3 New user sees empty history
# Register a fresh user
http POST $BASE/auth/register \
email="freshuser@example.com" \
password="pass1234"
# Login
FRESH_TOKEN=$(http -f POST $BASE/auth/login \
username="freshuser@example.com" \
password="pass1234" \
| python -c "import sys,json; print(json.load(sys.stdin)['access_token'])")
# History should be empty
http GET $BASE/query/history \
"Authorization: Bearer $FRESH_TOKEN"
Expected: [] (empty array, HTTP 200).
5. On-Demand Visualization
The visualization endpoint is driven by include_visualization: true on the POST body. The frontend polls every 2 s for up to 180 s (3 minutes); these tests verify the full flow manually.
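The polling behaviour described above (every 2 s, up to 180 s) can be reproduced in a script. The sketch below is one possible shape, not the frontend's implementation: poll_visualization and its fetch argument are hypothetical, and fetch is assumed to perform the GET on the visualization endpoint and return the parsed JSON body.

```python
import time
from typing import Any, Callable


def poll_visualization(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch() every `interval` seconds until status is no longer "pending"."""
    deadline = clock() + timeout
    while True:
        body = fetch()
        if body.get("status") != "pending":
            return body
        # stop if the next wait would run past the overall deadline
        if clock() + interval > deadline:
            raise TimeoutError(f"visualization still pending after {timeout} s")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.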
5.1 Submit query with visualization enabled
http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="Compare the accuracy scores of LoRA vs full fine-tuning on common NLP benchmarks" \
include_visualization:=true
Expected: same response shape as §3.1; returns immediately (the text response is NOT delayed by viz):
HTTP/1.1 200 OK
{
"id": "<uuid>",
"query": "...",
"answer": "...",
"citations": [...],
"chart_data": null,
"model_provider": "openai",
"agent_steps": 5,
"created_at": "..."
}
Capture the query ID:
QUERY_ID=$(http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="Compare LoRA vs full fine-tuning accuracy" \
include_visualization:=true \
| python -c "import sys,json; print(json.load(sys.stdin)['id'])")
echo $QUERY_ID
5.2 Poll visualization - pending
Immediately after submitting (before the viz agent finishes):
http GET $BASE/query/$QUERY_ID/visualization \
"Authorization: Bearer $TOKEN"
Expected:
HTTP/1.1 200 OK
{
"status": "pending",
"chart_data": null,
"error": null
}
5.3 Poll visualization - complete
After waiting ~10–60 seconds for the VizCodeAgent to finish (complex charts like sunbursts take longer):
http GET $BASE/query/$QUERY_ID/visualization \
"Authorization: Bearer $TOKEN"
Expected:
HTTP/1.1 200 OK
{
"status": "complete",
"chart_data": {
"data": [
{
"type": "bar",
"x": ["MNLI", "SST-2", "MRPC"],
"y": [91.7, 96.2, 90.1],
"name": "LoRA"
},
{
"type": "bar",
"x": ["MNLI", "SST-2", "MRPC"],
"y": [92.1, 96.8, 90.9],
"name": "Full Fine-Tuning"
}
],
"layout": {
"title": "LoRA vs Full Fine-Tuning Accuracy",
"barmode": "group"
}
},
"error": null
}
Validation checklist:
- status is "complete"
- chart_data is a valid Plotly spec with data (array) and layout (object) keys
- data[*].type is a recognised Plotly trace type (bar, scatter, pie, etc.); line charts are scatter traces with mode set to lines
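The structural checks on chart_data can be scripted as well. This is a minimal sketch with a hypothetical helper name; KNOWN_TRACE_TYPES is an illustrative subset of Plotly trace types chosen for this example, not the full list and not something the backend defines.

```python
from typing import Any

# Illustrative subset of Plotly trace types; extend it to match the charts you expect
KNOWN_TRACE_TYPES = {"bar", "scatter", "pie", "histogram", "box", "heatmap", "sunburst"}


def check_chart_data(chart_data: Any) -> list[str]:
    """Return structural problems found in a Plotly-style spec (empty list means OK)."""
    if not isinstance(chart_data, dict):
        return ["chart_data is not an object"]

    problems: list[str] = []
    data = chart_data.get("data")
    if not isinstance(data, list):
        problems.append("data is not an array")
        data = []
    if not isinstance(chart_data.get("layout"), dict):
        problems.append("layout is not an object")

    # every trace must be an object whose type is in the known set
    for i, trace in enumerate(data):
        trace_type = trace.get("type") if isinstance(trace, dict) else None
        if trace_type not in KNOWN_TRACE_TYPES:
            problems.append(f"data[{i}].type is not a known trace type")

    return problems
```

This only checks shape; whether the chart renders correctly still has to be confirmed in the frontend.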
5.4 Poll visualization - query submitted without viz flag returns 404
# Submit without include_visualization
PLAIN_ID=$(http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="What is BERT?" \
| python -c "import sys,json; print(json.load(sys.stdin)['id'])")
http GET $BASE/query/$PLAIN_ID/visualization \
"Authorization: Bearer $TOKEN"
Expected:
HTTP/1.1 404 Not Found
{
"detail": "Visualization not found - not requested, not yet started, or expired"
}
5.5 Poll visualization - no authentication (401)
http GET $BASE/query/$QUERY_ID/visualization
Expected:
HTTP/1.1 401 Unauthorized
{
"detail": "Not authenticated"
}
5.6 Submit query with viz flag - factual query with no data (NO_CHART)
Some queries produce an answer without numerical data. The viz agent should return NO_CHART:
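# Capture the ID of this query so the poll below targets it, not the query from 5.1
NOCHART_ID=$(http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="What is the intuition behind the attention mechanism?" \
include_visualization:=true \
| python -c "import sys,json; print(json.load(sys.stdin)['id'])")
After the viz agent completes (~5–15 s), poll:
http GET $BASE/query/$NOCHART_ID/visualization \
"Authorization: Bearer $TOKEN"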
Expected: status is complete but there is no chart:
{
"status": "complete",
"chart_data": null,
"error": null
}
6. Settings API
The settings endpoints let authenticated users read and update application configuration at runtime. Changes are written to the .env file inside the backend container and take effect immediately via get_settings.cache_clear(); no container restart is required.
Sensitive keys (API keys) are partially masked in all GET responses (sk-ab****). Any value sent to PUT that ends with **** is treated as a no-op sentinel: the existing key is preserved.
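The masking and sentinel rules can be illustrated with two small functions. This is a sketch of the described behaviour, not the backend's code: mask_key and apply_key_update are hypothetical names, and the visible prefix length of 4 is an assumption (the examples in this guide show prefixes of both 4 and 5 characters).

```python
MASK_SUFFIX = "****"


def mask_key(value: str, visible: int = 4) -> str:
    """Mask a secret as "<prefix>****"; an unset key stays an empty string."""
    if not value:
        return ""
    return value[:visible] + MASK_SUFFIX


def apply_key_update(existing: str, incoming: str) -> str:
    """Return the key to store: a value ending in **** is a no-op sentinel."""
    if incoming.endswith(MASK_SUFFIX):
        return existing  # client echoed the masked placeholder, keep the real key
    return incoming
```

The sentinel lets a client send back the whole settings object it received from GET without wiping the stored secrets.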
6.1 Get current settings (200)
http GET $BASE/settings \
"Authorization: Bearer $TOKEN"
Expected
HTTP/1.1 200 OK
{
"model_provider": "openai",
"openai_api_key": "sk-p****",
"openai_model": "gpt-4o",
"azure_openai_api_key": "",
"azure_openai_endpoint": "",
"azure_openai_deployment": "gpt-4.1-mini-2025-04-14",
"azure_openai_api_version": "2025-04-14",
"google_api_key": "",
"gemini_model": "gemini/gemini-flash-lite-latest",
"trulens_provider": "openai",
"trulens_strategy": "async",
"trulens_sample_rate": 1,
"trulens_feedback_timeout": 180.0,
"viz_model_provider": "openai",
"viz_model_name": "gpt-4o-mini",
"viz_azure_deployment": "",
"viz_azure_api_version": ""
}
Validation checklist
- API keys that are set show as "<prefix>****" (never the full value)
- API keys that are not set show as "" (empty string)
- model_provider matches MODEL_PROVIDER in .env
6.2 Update provider and model (200)
http PUT $BASE/settings \
"Authorization: Bearer $TOKEN" \
model_provider="gemini" \
google_api_key="AIzaSy..." \
gemini_model="gemini/gemini-2.0-flash"
Expected: returns the updated settings with the key masked:
HTTP/1.1 200 OK
{
"model_provider": "gemini",
"google_api_key": "AIza****",
...
}
Validation checklist
- model_provider reflects the new value
- google_api_key is now masked (not empty)
- Immediately submit a query to confirm the new provider is active (no restart needed)
6.3 Masked-key sentinel - existing key is preserved
Send back the masked placeholder unchanged; the backend should not overwrite the key:
# 1. Capture the current masked value
MASKED=$(http GET $BASE/settings "Authorization: Bearer $TOKEN" \
| python -c "import sys,json; print(json.load(sys.stdin)['openai_api_key'])")
# 2. PUT with the masked value; the key should be unchanged
http PUT $BASE/settings \
"Authorization: Bearer $TOKEN" \
openai_api_key="$MASKED" \
openai_model="gpt-4o-mini"
Expected: 200 OK, openai_api_key still masked (not cleared), openai_model updated.
6.4 Get settings - no authentication (401)
http GET $BASE/settings
Expected
HTTP/1.1 401 Unauthorized
{
"detail": "Not authenticated"
}
7. RBAC Verification
The vector search is filtered at the SQL level by the user's roles. A guest user with no role-document mappings should get "No relevant documents found" from the retriever.
7.1 Register a guest user (default role)
http POST $BASE/auth/register \
email="guest@example.com" \
password="guest123"
# role omitted; defaults to "guest"
7.2 Login as guest
GUEST_TOKEN=$(http -f POST $BASE/auth/login \
username="guest@example.com" \
password="guest123" \
| python -c "import sys,json; print(json.load(sys.stdin)['access_token'])")
7.3 Query as guest - expects no documents
http POST $BASE/query \
"Authorization: Bearer $GUEST_TOKEN" \
query="What is BERT?"
Expected: HTTP 200 with an answer stating that the retriever returned no documents. The citations list will be empty ([]).
8. Error Summary Table
| # | Method | Endpoint | Scenario | Expected HTTP |
|---|---|---|---|---|
| 1 | GET | /health | normal | 200 |
| 2 | POST | /auth/register | new user | 201 |
| 3 | POST | /auth/register | duplicate email | 400 |
| 4 | POST | /auth/register | invalid email | 422 |
| 5 | POST | /auth/login | correct credentials | 200 |
| 6 | POST | /auth/login | wrong password | 401 |
| 7 | GET | /auth/me | valid token | 200 |
| 8 | GET | /auth/me | no token | 401 |
| 9 | GET | /auth/me | malformed token | 401 |
| 10 | POST | /query | valid query | 200 |
| 11 | POST | /query | empty string | 422 |
| 12 | POST | /query | missing field | 422 |
| 13 | POST | /query | no auth | 401 |
| 14 | GET | /query/history | with auth | 200 |
| 15 | GET | /query/history | no auth | 401 |
| 16 | POST | /query | include_visualization: true | 200 (immediate text response) |
| 17 | GET | /query/{id}/visualization | pending (viz in progress) | 200 {status:"pending"} |
| 18 | GET | /query/{id}/visualization | complete | 200 {status:"complete", chart_data:{...}} |
| 19 | GET | /query/{id}/visualization | viz not requested (no flag) | 404 |
| 20 | GET | /query/{id}/visualization | no auth | 401 |
| 21 | GET | /settings | authenticated | 200 |
| 22 | GET | /settings | no auth | 401 |
| 23 | PUT | /settings | valid payload | 200 |
| 24 | PUT | /settings | masked-key sentinel | 200 (key unchanged) |
| 25 | PUT | /settings | no auth | 401 |
9. Verbose Mode & Inspecting Headers
Add -v to see full request/response headers, useful for debugging CORS or auth issues:
http -v GET $BASE/auth/me \
"Authorization: Bearer $TOKEN"
Check only response headers (no body):
http -h POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="test"
10. Interactive API Docs
FastAPI serves two interactive UIs automatically, along with the raw OpenAPI schema:
| UI | URL |
|---|---|
| Swagger UI | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |
| Raw OpenAPI JSON | http://localhost:8000/openapi.json |
The saved snapshot is at docs/openapi.json.
11. TruLens Evaluation Verification
After running a query, TruLens scores are persisted asynchronously to the evaluation_results table. Check them directly:
docker exec agentic-rag-db-1 psql -U postgres -d agentic_rag -c "
SELECT
ql.query_text,
er.relevance_score,
er.groundedness_score,
er.answer_relevance_score,
er.created_at
FROM evaluation_results er
JOIN query_logs ql ON ql.id = er.query_log_id
ORDER BY er.created_at DESC
LIMIT 5;"
Target scores (from SLA):
| Metric | Target |
|---|---|
| Context Relevance | > 0.85 |
| Groundedness | > 0.90 |
| Answer Relevance | > 0.85 |
Scores are written in a background thread after the HTTP response. The worker waits up to TRULENS_FEEDBACK_TIMEOUT seconds (default 180) for TruLens to finish all three judge calls via retrieve_feedback_results before persisting to evaluation_results. Slow LLM proxies may need a higher timeout in .env.
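Rows returned by the SQL query above can be compared against the target table in a few lines. This is a sketch under assumptions: below_target is a hypothetical helper, and the mapping of Context Relevance to the relevance_score column is inferred from the column names rather than confirmed by the schema.

```python
from typing import Optional

# Targets from the table above, keyed by the column names used in the SQL query.
# Assumption: "Context Relevance" is stored in relevance_score.
TARGETS = {
    "relevance_score": 0.85,
    "groundedness_score": 0.90,
    "answer_relevance_score": 0.85,
}


def below_target(row: dict[str, Optional[float]]) -> list[str]:
    """Return the metrics that are missing or do not strictly exceed their target."""
    failing = []
    for name, target in TARGETS.items():
        score = row.get(name)
        # a NULL score usually means the background evaluation has not finished yet
        if score is None or score <= target:
            failing.append(name)
    return failing
```

Because the targets are strict (> 0.85, > 0.90), a score exactly equal to the threshold is reported as failing.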
12. Performance Targets
The pipeline SLA from the spec:
| Stage | Target |
|---|---|
| Full pipeline (end-to-end) | < 6 s |
| LLM generation | < 3 s |
| CrossEncoder reranking (20 pairs) | < 500 ms |
| pgvector HNSW search (100k vectors) | < 50 ms |
Note: The CrossEncoder model (cross-encoder/ms-marco-MiniLM-L-6-v2, 85 MB) is loaded once at first request and cached as a process singleton. Higher first-query latency on cold start (30 s on CPU) is expected; subsequent queries meet the SLA.
Measure end-to-end latency with HTTPie's --print=h and the response Date header, or use time:
time http POST $BASE/query \
"Authorization: Bearer $TOKEN" \
query="What is retrieval-augmented generation?"
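For repeated measurements, a small timing harness separates the cold-start run from warm runs. The sketch below uses hypothetical names throughout: call stands for any function that submits one query (for example, by invoking the http command above through subprocess), and the 6 s default mirrors the end-to-end target in the table.

```python
import time
from typing import Callable


def measure(call: Callable[[], object], runs: int = 3) -> list[float]:
    """Run call() `runs` times and return the wall-clock duration of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings


def warm_runs_within(timings: list[float], limit_s: float = 6.0) -> bool:
    """Ignore the first (cold-start) run and check that the rest are under the limit."""
    return all(t < limit_s for t in timings[1:])
```

Skipping the first timing reflects the note above that the CrossEncoder load makes the first query slower than the SLA.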