RAG Agent Workbench – Context and Design
Project Purpose
RAG Agent Workbench is a lightweight experimentation backend for retrieval-augmented generation (RAG). It focuses on:
- Fast ingestion of documents into a Pinecone index with integrated embeddings.
- Simple, production-style APIs for search and chat-style question answering.
- Keeping the backend slim: no local embedding or LLM models, relying instead on managed services.
Current Architecture
Client(s)
- Any HTTP client (curl, scripts in `scripts/`, future UI) talks to the FastAPI backend.
Backend (FastAPI, `backend/app`)
- `routers/`
  - `health.py` → service status.
  - `ingest.py` → `/ingest/wiki`, `/ingest/openalex`, `/ingest/arxiv`.
  - `documents.py` → manual uploads and stats.
  - `search.py` → semantic search over Pinecone.
  - `chat.py` → agentic RAG chat using LangGraph + LangChain.
- `services/`
  - `ingestors/` → fetch content from arXiv, OpenAlex, Wikipedia.
  - `chunking.py` → chunk documents into Pinecone-ready records.
  - `dedupe.py` → in-memory duplicate record removal.
  - `normalize.py` → text normalisation and doc id generation.
  - `pinecone_store.py` → Pinecone init, search, upsert, stats.
  - `llm/groq_llm.py` → Groq-backed chat model wrapper.
  - `tools/tavily_tool.py` → Tavily web search integration.
  - `prompts/rag_prompt.py` → RAG system + user prompts.
  - `chat/graph.py` → LangGraph state graph for `/chat`.
- `core/`
  - `config.py` → env-driven configuration.
  - `errors.py` → app-specific exceptions + handlers.
  - `logging.py` → basic logging setup.
  - `tracing.py` → LangSmith / LangChain tracing helper.
- `schemas/` → Pydantic models for all endpoints.
Vector Store
- Pinecone index with integrated embeddings.
- Text field configurable via `PINECONE_TEXT_FIELD`.
LLM and Tools
- Groq OpenAI-compatible chat model via `langchain-openai`.
- Tavily web search via a `langchain-community` tool (optional).
- LangGraph orchestrates retrieval → routing → web search → generation.
Implemented Endpoints
| HTTP Method | Path | Description |
|---|---|---|
| GET | /health | Health check with service name and version. |
| POST | /ingest/arxiv | Ingest recent arXiv entries matching a query. |
| POST | /ingest/openalex | Ingest OpenAlex works matching a query. |
| POST | /ingest/wiki | Ingest Wikipedia pages by title. |
| POST | /documents/upload-text | Upload raw/manual text or Docling-converted content. |
| GET | /documents/stats | Get vector counts per namespace from Pinecone. |
| POST | /search | Semantic search over Pinecone using integrated embeddings. |
| POST | /chat | Production-style RAG chat using LangGraph + Groq + Pinecone. |
| POST | /chat/stream | SSE streaming variant of /chat. |
Key Design Decisions
Integrated embeddings only
- No local embedding models; Pinecone is configured with integrated embeddings.
- Backend stays light and easy to deploy in constrained environments.
OpenAI-compatible LLM interface
- Groq is accessed via the OpenAI-compatible API (`langchain-openai`).
- Avoids additional provider-specific SDKs and keeps the integration simple.
Agentic RAG flow using LangGraph
- The chat pipeline is modelled as a state graph:
  - `normalize_input` → set defaults, normalise chat history.
  - `retrieve_context` → Pinecone retrieval.
  - `decide_next` → route to web search or generation.
  - `web_search` → Tavily search (optional).
  - `generate_answer` → Groq LLM with RAG prompts.
  - `format_response` → reserved for post-processing.
- This makes the flow explicit and easy to extend.
Web search as a conditional fallback
- Tavily web search is used only when `use_web_fallback=true` and `TAVILY_API_KEY` is configured, and either:
  - Retrieval returns no hits, or
  - The top retrieval score is below a threshold (`min_score`).
- When Tavily is not configured, the system degrades gracefully to retrieval-only.
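The fallback condition can be written as a single predicate. A minimal sketch (function name and the dict shape of `hits` are assumptions, not the actual `decide_next` implementation):

```python
def should_use_web(hits, min_score, use_web_fallback, tavily_configured):
    """Web search runs only when fallback is requested AND Tavily is
    configured AND retrieval came back empty or too weak."""
    if not (use_web_fallback and tavily_configured):
        return False
    if not hits:
        return True
    top_score = max(h["score"] for h in hits)
    return top_score < min_score
```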
LangSmith tracing via environment flags
- Tracing is enabled purely via the environment:
  - `LANGCHAIN_TRACING_V2=true`
  - `LANGCHAIN_API_KEY` set
  - Optional: `LANGCHAIN_PROJECT`
- `core/tracing.py` exposes helper functions that:
  - Check if tracing is enabled.
  - Construct callback handlers (`LangChainTracer`) for LangGraph/LangChain.
  - Expose trace metadata in API responses.
Error handling boundary
- External dependencies (Pinecone, Groq, Tavily) are wrapped so that:
  - Configuration errors return 500s with clear messages.
  - Upstream service failures raise `UpstreamServiceError` and surface as HTTP 502.
- This keeps failure modes explicit for clients.
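The boundary pattern can be sketched as an exception type plus a wrapper that re-raises anything an external client throws. This is a hedged sketch: the real `core/errors.py` may structure the class and wrapping differently, and `wrap_upstream` / `flaky_search` are hypothetical names.

```python
import functools

class UpstreamServiceError(Exception):
    """Raised when an external dependency (Pinecone/Groq/Tavily) fails."""
    def __init__(self, service, detail):
        self.service = service
        self.detail = detail
        super().__init__(f"{service}: {detail}")

def wrap_upstream(service):
    """Decorator converting arbitrary client errors into UpstreamServiceError,
    which a centralised FastAPI handler would then map to HTTP 502."""
    def deco(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except UpstreamServiceError:
                raise
            except Exception as exc:
                raise UpstreamServiceError(service, str(exc)) from exc
        return inner
    return deco

@wrap_upstream("pinecone")
def flaky_search(query):
    # Illustrative failure: a real call might time out like this.
    raise TimeoutError("connection timed out")
```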
Work Package History
Work Package A
- Scope
  - Initial backend setup with FastAPI, Pinecone integration, and ingestion/search endpoints.
- Highlights
  - `/ingest/wiki`, `/ingest/openalex`, `/ingest/arxiv` for sourcing content.
  - `/documents/upload-text` for manual/Docling-based uploads.
  - `/search` and `/documents/stats` endpoints to query and inspect the index.
- How to test
  - Use `scripts/seed_ingest.py` and `scripts/smoke_arxiv.py` to seed and smoke-test ingestion.
Work Package B (this change)
Scope
- Add a production-style `/chat` RAG endpoint using LangGraph and LangChain.
- Integrate Groq as the LLM and Tavily as an optional web search fallback.
- Introduce LangSmith tracing hooks and update documentation and smoke tests.
Functional changes
New router: `backend/app/routers/chat.py`
- `POST /chat`
  - Runs a LangGraph state graph:
    - Normalises inputs and defaults.
    - Retrieves context from Pinecone.
    - Decides whether to invoke web search.
    - Runs Tavily web search when enabled and needed.
    - Calls the Groq LLM with a RAG prompt to generate the answer.
  - Returns answer, sources, timings, and trace metadata.
- `POST /chat/stream`
  - Same pipeline as `/chat` but returns Server-Sent Events (SSE).
  - Streams tokens from the final answer plus a terminating event with the full JSON payload.
New schemas: `backend/app/schemas/chat.py`
- `ChatRequest` with `query`, `namespace`, `top_k`, `use_web_fallback`, `min_score`, `max_web_results`, and `chat_history`.
- `SourceHit` representing document/web snippets.
- `ChatTimings` and `ChatTraceMetadata` for timings and LangSmith info.
- `ChatResponse` combining answer, sources, timings, and trace metadata.
New services:
- `backend/app/services/llm/groq_llm.py`
  - `get_llm()` returns a Groq-backed `ChatOpenAI` with:
    - `base_url=GROQ_BASE_URL` (default `https://api.groq.com/openai/v1`).
    - `model=GROQ_MODEL` (default `llama-3.1-8b-instant`).
    - Timeouts and retries from HTTP settings.
  - Raises a configuration error if `GROQ_API_KEY` is missing.
- `backend/app/services/tools/tavily_tool.py`
  - `is_tavily_configured()` checks `TAVILY_API_KEY`.
  - `get_tavily_tool(max_results)` wraps `TavilySearchResults` from `langchain-community`.
  - Logs a warning and returns `None` when Tavily is not configured, disabling web fallback gracefully.
- `backend/app/services/prompts/rag_prompt.py`
  - Defines RAG system and user prompts.
  - `build_rag_messages(chat_history, question, sources)` builds LangChain messages that:
    - Use only the supplied context.
    - Label context snippets as `[1]`, `[2]`, etc., and instruct the model to cite them inline.
- `backend/app/services/chat/graph.py`
  - Implements the LangGraph `ChatState` and state graph with nodes: `normalize_input`, `retrieve_context`, `decide_next`, `web_search`, `generate_answer`, `format_response`.
  - Uses Pinecone search for retrieval and Tavily for optional web search.
  - Calls the Groq LLM via `get_llm()` with a LangChain Runnable config (callbacks) so LangSmith traces are collected when enabled.
  - Records `retrieve_ms`, `web_ms`, and `generate_ms` in `timings`.
New core utility: `backend/app/core/tracing.py`
- `is_tracing_enabled()` checks `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY`.
- `get_tracing_callbacks()` returns a `LangChainTracer` callback list when enabled.
- `get_tracing_response_metadata()` returns `{langsmith_project, trace_enabled}`.
Configuration changes:
- `backend/app/core/config.py` adds:
  - `GROQ_API_KEY`, `GROQ_BASE_URL`, `GROQ_MODEL`.
  - `TAVILY_API_KEY`.
  - `RAG_DEFAULT_TOP_K`, `RAG_MIN_SCORE`, `RAG_MAX_WEB_RESULTS`.
- `backend/.env.example` updated with the new env vars, including LangSmith options.
Error handling:
- `backend/app/core/errors.py` introduces `UpstreamServiceError`.
- A centralised handler converts `UpstreamServiceError` into HTTP 502 responses.
Documentation and scripts:
- `backend/README.md` updated with `/chat` and `/chat/stream` usage, env vars, and a local test checklist.
- New scripts:
  - `scripts/smoke_chat.py` → uses `/ingest/wiki` and `/chat` for a local smoke test.
  - `scripts/smoke_chat_web.py` → tests `/chat` with `use_web_fallback=true` and a query that should trigger web search.
How to test
- Start the backend:
  - `cd backend && uvicorn app.main:app --reload --port 8000`
- Ingest some Wikipedia pages:
  - `python ../scripts/smoke_chat.py --backend-url http://localhost:8000 --namespace dev`
- Test web fallback (requires `TAVILY_API_KEY`):
  - `python ../scripts/smoke_chat_web.py --backend-url http://localhost:8000 --namespace dev`
- Verify LangSmith traces:
  - Set `LANGCHAIN_TRACING_V2=true`, `LANGCHAIN_API_KEY`, and optionally `LANGCHAIN_PROJECT`.
  - Run `/chat` again and confirm traces appear in LangSmith.
Known Issues / Limits
No local models
- The backend intentionally does not host local embedding or LLM models.
- All intelligence is delegated to Pinecone (integrated embeddings), Groq, and Tavily.
Retrieval quality depends on ingestion
- The usefulness of `/chat` depends heavily on the quality and coverage of the ingested documents.
- For some queries, even the best matching chunks may not be sufficient to answer without web fallback.
Best-effort web search
- Tavily integration is optional and depends on the external Tavily API.
- When Tavily is unavailable or misconfigured, the backend falls back to retrieval-only answers.
Simple SSE streaming
- `/chat/stream` streams tokens derived from the final answer string rather than streaming directly from the LLM.
- This keeps the implementation simple while still providing a streaming interface.
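The "chunk the final answer" approach can be sketched as a generator yielding SSE frames. The chunk size and the `done` event name are illustrative assumptions, not the backend's exact wire format:

```python
import json

def sse_events(answer, payload, chunk_size=8):
    """Chunk an already-generated answer into SSE token events, then emit
    a terminating event carrying the full JSON payload."""
    for i in range(0, len(answer), chunk_size):
        yield f"data: {answer[i:i + chunk_size]}\n\n"
    yield f"event: done\ndata: {json.dumps(payload)}\n\n"
```

In FastAPI, such a generator would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.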
Work Package C
Scope
- Make the backend deploy-ready on Hugging Face Spaces using Docker.
- Add a minimal Streamlit frontend suitable for Streamlit Community Cloud (no Docker).
- Add production polish: basic API protection, rate limiting, caching, metrics, and a small benchmarking script.
- Keep configuration sane by default, with environment variables as overrides rather than hard requirements.
Backend changes (HF Spaces deploy + runtime)
- Docker / port behaviour
  - `backend/Dockerfile` now:
    - Exposes port 7860 (the default for many Hugging Face Spaces deployments).
    - Uses a shell-form `CMD` so `PORT` can be honoured when set:
      `uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-7860}`
- New helper: `backend/app/core/runtime.py`
  - `get_port()`:
    - Reads `PORT` from the environment.
    - Defaults to `7860` when unset or invalid.
    - Logs `Starting on port=<port> hf_spaces_mode=<bool>` using a simple heuristic (`SPACE_ID` / `SPACE_REPO_ID` env vars).
  - Called from `app.main` at import time so the log line is visible in container logs during startup.
API key protection and CORS
API key protection
- New module: `backend/app/core/auth.py`
  - Defines the `require_api_key` FastAPI dependency using `APIKeyHeader` (`X-API-Key`).
  - `validate_api_key_configuration()` runs at startup and enforces:
    - In production-like environments (`ENV=production` or on Hugging Face Spaces via `SPACE_ID` / `HF_HOME`):
      - `API_KEY` must be set or the backend fails fast with a clear error.
    - In local development:
      - If `API_KEY` is missing, the backend runs open but logs a prominent warning.
  - `require_api_key` behaviour:
    - If `API_KEY` is not configured (dev mode), the dependency is a no-op.
    - If `API_KEY` is configured, a missing or mismatched `X-API-Key` results in HTTP 403.
- Wiring:
  - All routers except `/health` are registered with `dependencies=[Depends(require_api_key)]`.
  - Docs and OpenAPI endpoints are explicitly secured:
    - `GET /openapi.json` → returns `app.openapi()`, protected by `require_api_key`.
    - `GET /docs` → Swagger UI via `get_swagger_ui_html`, protected by `require_api_key`.
    - `GET /redoc` → ReDoc UI via `get_redoc_html`, protected by `require_api_key`.
- Effect:
  - In HF Spaces / production:
    - `/docs`, `/redoc`, `/openapi.json`, `/chat`, `/search`, `/documents/*`, `/ingest/*`, `/metrics` all require `X-API-Key`.
    - `/health` remains public for simple uptime checks.
  - In local dev with no `API_KEY`:
    - All endpoints (including docs) are accessible without a key for convenience.
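The dev-mode/enforced-mode behaviour can be sketched as a framework-free decision function (a stand-in for the FastAPI dependency; the function name and returning a status code instead of raising `HTTPException` are simplifications):

```python
def check_api_key(configured_key, provided_key):
    """Sketch of require_api_key's decision logic.

    An empty/None configured key means dev mode: no enforcement.
    Otherwise the X-API-Key header must match exactly."""
    if not configured_key:
        return 200  # dev mode: dependency is a no-op
    if provided_key == configured_key:
        return 200
    return 403  # missing or mismatched X-API-Key
```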
CORS configuration
- `backend/app/core/security.py` now focuses solely on CORS:
  - Reads the `ALLOWED_ORIGINS` env var as a comma-separated list.
  - If unset or empty, defaults to `["*"]` (permissive, useful for local dev and quick demos).
  - Applies FastAPI `CORSMiddleware` with:
    - `allow_origins=origins`
    - `allow_methods=["*"]`
    - `allow_headers=["*"]`
- API key enforcement is handled entirely via `core/auth.py` and router/dependency wiring.
Rate limiting (SlowAPI)
New module: `backend/app/core/rate_limit.py`
- Uses `slowapi.Limiter` with `get_remote_address` as the key function.
- `setup_rate_limiter(app)`:
  - Reads `RATE_LIMIT_ENABLED` from `Settings` (defaults to `True`).
  - If disabled:
    - Logs `"Rate limiting is disabled via settings."`
    - Does not attach middleware (decorators become no-ops at runtime).
  - If enabled:
    - Attaches SlowAPI middleware: `app.middleware("http")(limiter.middleware)`.
    - Registers a custom `RateLimitExceeded` handler returning JSON:
      - HTTP 429.
      - Body: `{"detail": "Rate limit exceeded. Please slow down your requests.", "retry_after": ...}` when available.
    - Logs violations with client IP and path.

Endpoint-specific limits (per IP):
- `/chat` and `/chat/stream`: decorated with `@limiter.limit("30/minute")`.
- `/ingest` endpoints (`/ingest/arxiv`, `/ingest/openalex`, `/ingest/wiki`): `@limiter.limit("10/minute")`.
- `/search`: `@limiter.limit("60/minute")`.

Operational toggle:
- New config flag in `Settings`: `RATE_LIMIT_ENABLED: bool = True`.
- `.env.example`: `RATE_LIMIT_ENABLED=true` (set to `false` to disable entirely).
Caching (cachetools, in-memory)
New module: `backend/app/core/cache.py`
- Uses `cachetools.TTLCache` with short in-memory TTLs (no external store):
  - Search cache: TTL = 60s, maxsize = 1024.
    - Keys: `(namespace, query, top_k, filters_json)`, where `filters_json` is a JSON-serialised, sorted representation of the `filters` dict.
  - Chat cache: TTL = 60s, maxsize = 512.
    - Keys: `(namespace, query, top_k, min_score, use_web_fallback)`.
    - Only used when no chat history is provided.
- API:
  - `cache_enabled() -> bool` (reads `CACHE_ENABLED` from settings, default `True`).
  - `get_search_cached(...)` / `set_search_cached(...)`.
  - `get_chat_cached(...)` / `set_chat_cached(...)`.
  - `get_cache_stats()` returns hit/miss counters: `search_hits`, `search_misses`, `chat_hits`, `chat_misses`.
- Hit/miss logging:
  - Each cache lookup logs a hit or miss with namespace and query for observability.
Integration into endpoints:
- `/search` (`backend/app/routers/search.py`):
  - On each request:
    - Check `get_search_cached(...)`.
    - If hit: use the cached `hits_raw` list.
    - If miss: call Pinecone search and then `set_search_cached(...)`.
  - Response construction (mapping the text field to `chunk_text`) remains unchanged.
- `/chat` (`backend/app/routers/chat.py`):
  - Caching is only considered when `chat_history` is empty and caching is enabled.
  - Flow:
    - Test `cache_enabled()` and `not payload.chat_history`.
    - Attempt `get_chat_cached(...)`.
    - On hit:
      - Log and return the cached `ChatResponse`.
      - Still call `record_chat_timings(...)` so `/metrics` reflects cached responses.
    - On miss:
      - Run the LangGraph pipeline as before.
      - Record timings via `record_chat_timings(...)`.
      - Store the `ChatResponse` in the chat cache via `set_chat_cached(...)`.

Operational toggle:
- New config flag in `Settings`: `CACHE_ENABLED: bool = True`.
- `.env.example`: `CACHE_ENABLED=true` (set to `false` to fully disable caching).
Metrics and observability
New module: `backend/app/core/metrics.py`
- In-memory metrics only, with a small footprint and no dependencies beyond the stdlib.
- Tracks:
  - Request counts by path: `_request_counts[path]`, incremented for every request via `metrics_middleware`.
  - Error counts by path: `_error_counts[path]`, incremented for any response with `status_code >= 400` or for unhandled exceptions.
  - Chat timing metrics:
    - Focused on `/chat` and `/chat/stream`.
    - Expected fields: `retrieve_ms`, `web_ms`, `generate_ms`, `total_ms`.
    - Stored in:
      - `_timing_samples`: a `deque(maxlen=20)` holding the last 20 samples.
      - `_timing_sums` and `_timing_count` for averages.
- Middleware:
  - `metrics_middleware(request, call_next)`:
    - Records per-path request and error counts.
    - Logs debug-level timing for each request.
- API functions:
  - `record_chat_timings(timings: Mapping[str, float])`:
    - Updates sums, counts, and the ring buffer.
    - Called from both `/chat` and `/chat/stream` after timings are known.
  - `get_metrics_snapshot()` builds a snapshot dictionary containing:
    - `requests_by_path`
    - `errors_by_path`
    - `timings`:
      - `average_ms` for each timing field.
      - `p50_ms` and `p95_ms` based on the last 20 samples.
    - `cache`: `search_hits`, `search_misses`, `chat_hits`, `chat_misses` from `core.cache`.
    - `sample_count` and `samples` (the last 20 timing entries).
- `/metrics` endpoint:
  - New router: `backend/app/routers/metrics.py`.
  - `GET /metrics` returns `get_metrics_snapshot()` as JSON.
  - Registered in `app.main` with tag `["metrics"]`.
  - Left public (not behind the API key) to simplify monitoring and demos.
App wiring (`backend/app/main.py`)
- After creating the FastAPI app:
  - `configure_security(app)` → CORS + optional API key.
  - `setup_rate_limiter(app)` → SlowAPI middleware when enabled.
  - `setup_metrics(app)` → metrics middleware.
- Routers: `health`, `ingest`, `search`, `documents`, `chat`, `metrics` are all included.
- Exception handlers: still configured via `setup_exception_handlers(app)`.
Benchmarking script
- New script: `scripts/bench_local.py`
  - Purpose:
    - Provide a simple, cross-platform (including Windows) asyncio load tester for the backend.
    - Focused on `/chat`, with optional `/search` benchmarking.
  - Implementation:
    - Uses `httpx.AsyncClient` and `asyncio`.
    - Command-line arguments:
      - `--backend-url` (default: `http://localhost:8000`)
      - `--namespace` (default: `dev`)
      - `--concurrency` (default: `10`)
      - `--requests` (default: `50`)
      - `--include-search` (optional flag to also benchmark `/search`)
      - `--api-key` (optional `X-API-Key` value)
    - For each benchmark:
      - Issues the specified number of requests with the provided concurrency.
      - Records per-request latency (ms) and whether an error occurred.
    - Outputs:
      - Total requests, successes, errors, and error rate.
      - Average latency.
      - p50 and p95 latencies.
  - Entrypoint:
    - `python scripts/bench_local.py --backend-url http://localhost:8000 --namespace dev --concurrency 10 --requests 50`
Streamlit frontend (Streamlit Community Cloud)
New directory: `frontend/`
- Main app: `frontend/app.py`
- Dependencies: `streamlit`, `httpx`.
- Backend configuration:
  - Reads `BACKEND_BASE_URL` from `st.secrets["BACKEND_BASE_URL"]` or the `BACKEND_BASE_URL` environment variable.
  - Reads `API_KEY` from `st.secrets["API_KEY"]` or the `API_KEY` environment variable.
- Sidebar ("Backend" + settings):
  - Shows the backend URL and API key status.
  - "Ping /health" button that calls the backend and shows the JSON response.
  - `top_k` slider, `min_score` slider, `use_web_fallback` checkbox.
  - "Show sources" toggle and "Clear chat" button.
  - "Recent uploads" section with quick actions:
    - For each recent upload, displays title, namespace, timestamp.
    - A "Search this document" button pre-fills the chat input with a prompt such as `Summarize: <title>`.
- Chatbot UI:
  - Uses `st.chat_message` and `st.chat_input`, with the conversation stored in `st.session_state.messages`.
  - When the user sends a message:
    - Appends it to history and displays it.
    - Calls `/chat/stream` with `X-API-Key` (if available) and streams tokens into the UI.
    - If `/chat/stream` is unavailable (e.g. 404), falls back to `/chat`.
  - Assistant messages:
    - Display the answer text.
    - Optionally show sources in an expandable "Sources" section with titles, URLs, scores, and truncated snippets.
  - If `API_KEY` is not configured in secrets or the environment, the app warns and disables sending messages to the protected backend.
- UI document upload:
  - A top-level "Upload Document" button opens a `@st.dialog` modal.
  - Inside the dialog:
    - `st.file_uploader` for `.pdf`, `.md`, `.txt`, `.docx`, `.pptx`, `.xlsx`, `.html`, `.htm`.
    - Inputs for title (defaulting to the filename), namespace, source label, tags, and notes.
    - A checkbox to allow uploading even when the extracted text is very short.
  - On submit:
    - The frontend converts the file to text/markdown (using Docling when installed, or raw text for `.md`/`.txt`).
    - Calls backend `POST /documents/upload-text` with `X-API-Key`.
    - On success, records the upload in `st.session_state.recent_uploads` and triggers a rerun to close the dialog.

Root-level `requirements.txt`
- Added to support Streamlit Community Cloud, where the root requirements file is used: `streamlit`, `httpx`.
- The backend Docker image continues to use `backend/requirements.txt`, keeping the backend container small and independent.
Operational Runbook
Rotating keys and secrets
Backend (Hugging Face Spaces or other container hosts)
- Update environment variables / secrets:
  - `PINECONE_API_KEY`, `PINECONE_HOST`, `PINECONE_INDEX_NAME`, `PINECONE_NAMESPACE`, `PINECONE_TEXT_FIELD`
  - `GROQ_API_KEY`, `GROQ_BASE_URL`, `GROQ_MODEL`
  - `TAVILY_API_KEY`
  - `LANGCHAIN_API_KEY`, `LANGCHAIN_TRACING_V2`, `LANGCHAIN_PROJECT`
  - `API_KEY` for HTTP clients
- Redeploy or restart the Space to apply changes.
- Verify:
  - `GET /health` returns `status: ok`.
  - `/chat` and `/search` work as expected.
  - `/metrics` shows traffic and cache counters updating.

Frontend (Streamlit Community Cloud)
- Use the Streamlit Secrets manager (no secrets in the repo):
  - `BACKEND_BASE_URL` → full URL of the backend (e.g. the HF Spaces URL).
  - `API_KEY` → must match the backend `API_KEY` if API protection is enabled.
- After rotating backend keys:
  - If `API_KEY` changed, update it in Streamlit secrets.
  - No code changes required.
Disabling rate limiting and caching
Rate limiting
- Set `RATE_LIMIT_ENABLED=false` in the backend environment (or `.env` for local).
- Restart the backend.
- The SlowAPI middleware will not be attached; `@limiter.limit(...)` decorators become effectively no-ops for enforcement.
- `/metrics` will still track request counts and errors.
Caching
- Set `CACHE_ENABLED=false` in the backend environment.
- Restart the backend.
- Search and chat endpoints will bypass the in-memory TTL caches entirely.
- `get_cache_stats()` will still report counters, which will stop increasing.
Diagnosing common deployment issues
Symptom: 404 / connection errors on Hugging Face Spaces
- Check:
  - The Space is configured as Docker and points to the `backend/` subdirectory (or uses the provided `backend/Dockerfile`).
  - Logs show the startup message: `"Starting on port=... hf_spaces_mode=..."`.
  - HF Spaces sets `PORT` automatically; the Docker `CMD` will honour it.
- Verify:
  - Open `/docs` and `/health` in the browser using the Space URL.
  - If 404/500 persists:
    - Ensure `PINECONE_*` and `GROQ_API_KEY` are set.
    - Check logs for `PineconeIndexConfigError` or missing LLM configuration.
Symptom: 401/403 auth errors from frontend
- Ensure:
  - The backend `API_KEY` is set and matches the `API_KEY` in Streamlit secrets.
  - Requests include the `X-API-Key` header (the Streamlit app does this automatically when `API_KEY` is present).
- Confirm `/health` is still reachable without a key (by design).
Symptom: 429 Too Many Requests
- Indicates SlowAPI rate limiting is active.
- Options:
  - Reduce load (e.g. from `bench_local.py`).
  - Temporarily set `RATE_LIMIT_ENABLED=false` for heavy local testing.
- Inspect `/metrics`:
  - Check request counts and error counts for the affected paths.
Symptom: Stale results after ingestion
- By default, caches are short-lived (60 seconds) but may briefly serve stale results:
  - When ingesting new documents, `/search` or `/chat` responses may not immediately reflect the new content.
- Workarounds:
  - Wait a minute for TTL expiry.
  - For strict freshness, disable caching with `CACHE_ENABLED=false`.
Symptom: Streamlit frontend cannot reach backend
- Verify:
  - `BACKEND_BASE_URL` in Streamlit secrets is correct and publicly reachable.
  - CORS config on the backend:
    - For debugging, keep `ALLOWED_ORIGINS` unset (defaults to `"*"`).
    - For a locked-down deployment, ensure the Streamlit app origin is included.
- Use the Connectivity panel:
  - Click "Ping /health" and inspect the response or error message.