# Interfaces
> Reference document listing every external API the system depends on, and every important module-to-module contract inside the codebase. Use this when changing an integration or refactoring an internal module to understand what might break.
## External interfaces (the system calls out)
### OpenAI Chat Completions API
| Aspect | Value |
|---|---|
| Endpoint | `https://api.openai.com/v1/chat/completions` |
| Auth | `Authorization: Bearer ${OPENAI_API_KEY}` |
| Wrapper | `src/llm/adapters.py` (via `OpenAIClient` + `LLMRegistry`) |
| Models in use | `gpt-4o-mini` (cost-efficient: query rewriting, answer generation), `gpt-4.1` (strong: query analysis) |
| Configured at | `src/config/settings.yaml::reader.OPENAI` and `reader.OPENAI_STRONG` |
| Called from | `BaseMultiAgentChatbot._analyze_query_context`, `_rewrite_query_for_rag`; `MultiAgentRAGChatbot._generate_conversational_response*` |
| Failure mode | Caught at every call site; returns a sensible fallback (original query, generic error message) |
| Latency budget | 1-5 s per call typical; tolerated up to 30 s before the user sees a "thinking" timeout (Streamlit default) |
| Rate limits | OpenAI's standard rate limits per account/key tier; not actively monitored by the app |
### Qdrant Cloud API
| Aspect | Value |
|---|---|
| Endpoint | `${QDRANT_URL}` (set as HF Space secret) |
| Auth | `api-key: ${QDRANT_API_KEY}` header |
| Protocol | gRPC preferred (configurable via `prefer_grpc: true` in settings); HTTPS fallback |
| Wrapper | `src/vectorstore.py::VectorStoreManager` (uses `langchain_qdrant.Qdrant`) |
| Collection | `BAAI-bge-m3-full` (configured in `settings.yaml::qdrant.collection_name`) |
| Operations called | `similarity_search_with_score` (vector search), `count` (pre-validation), `scroll` (metadata cache rebuild), `create_payload_index` (one-off setup) |
| Called from | `src/retrieval/context.py`, `src/retrieval/filter.py` (`MetadataCache._fetch_from_qdrant`), `src/agents/agent_filtering.py` (`_prevalidate_filters`) |
| Failure mode | Connection failure at startup → chatbot init fails, app shows error banner. Query failure → caught, returns empty results. |
| Latency budget | `similarity_search`: ~200-500 ms typical. `count`: <100 ms typical. `scroll`: ~80 s on full collection (only on cold start without disk cache). |
### Hugging Face Hub: model file downloads
| Aspect | Value |
|---|---|
| Endpoint | `https://huggingface.co/<model>/resolve/main/*` |
| Auth | Public read, no auth needed for our models |
| Wrapper | `transformers` library (transitive via `sentence-transformers` and `langchain_huggingface`) |
| Models | `BAAI/bge-m3` (embeddings), `BAAI/bge-reranker-v2-m3` (reranker) |
| When called | Only at **Docker build time** (`download_models.py`); pre-populated cache in image avoids runtime downloads |
| Failure mode | Build fails; deploys blocked until HF Hub is reachable |
### Hugging Face Hub: dataset push (logging)
| Aspect | Value |
|---|---|
| Endpoint | `https://huggingface.co/api/datasets/GIZ/spaces_logs/*` |
| Auth | `Bearer ${SPACES_LOG}` (write token, set as HF Space secret) |
| Wrapper | `src/logging.py` (via `huggingface_hub.HfApi`) |
| What's pushed | Conversational JSON logs (audit trail) |
| Called from | `BaseMultiAgentChatbot.chat()` after each turn |
| Failure mode | Caught silently; logs an error but doesn't fail the user request |
### Ollama (optional, local development only)
| Aspect | Value |
|---|---|
| Endpoint | `${OLLAMA_BASE_URL}` (e.g. `http://localhost:11434/`) |
| Auth | None |
| Wrapper | `src/llm/adapters.py` (via `langchain_ollama.OllamaLLM`) |
| Status | **Not used in production**. Available for local dev where running OpenAI calls would be expensive or impossible offline. |
## Internal interfaces (module to module within the codebase)
### `app.py` → `BaseMultiAgentChatbot.chat()`
The **only** call from the Streamlit layer into the agent layer.
| Aspect | Value |
|---|---|
| Signature | `chat(user_input: str, conversation_id: str = "default") -> Dict[str, Any]` |
| Input | `user_input` may include a `FILTER CONTEXT:` preamble with sidebar selections; `conversation_id` is the per-Streamlit-session UUID |
| Output | Dict with keys: `response` (str), `rag_result` (PipelineResult), `agent_logs` (list), `relaxation_notes` (list), `gap_follow_up` (str or None) |
| Stability contract | This signature should be considered stable. Streamlit, tests, and any future front-ends depend on it. |
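
As a rough sketch of how a caller might consume the returned dict, the keys from the table can be written down as a `TypedDict`. The names `ChatResult` and `render_turn` are hypothetical, and the element types of `agent_logs` and `relaxation_notes` are assumptions, since the table only says they are lists.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ChatResult(TypedDict):
    """Hypothetical name for the dict returned by chat(); keys follow the table above."""
    response: str
    rag_result: Any  # PipelineResult in the real code
    agent_logs: List[Any]  # element type is an assumption
    relaxation_notes: List[Any]  # element type is an assumption
    gap_follow_up: Optional[str]


def render_turn(result: Dict[str, Any]) -> str:
    """Build the text a front-end could show for one turn."""
    lines = [result["response"]]
    for note in result.get("relaxation_notes") or []:
        lines.append(f"Note: {note}")
    if result.get("gap_follow_up"):
        lines.append(result["gap_follow_up"])
    return "\n".join(lines)


example: ChatResult = {
    "response": "Three reports mention road maintenance.",
    "rag_result": None,
    "agent_logs": [],
    "relaxation_notes": ["year filter relaxed"],
    "gap_follow_up": None,
}
print(render_turn(example))
```
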
### `MultiAgentRAGChatbot._perform_retrieval()` → `PipelineManager.run()`
How the agent triggers retrieval.
| Aspect | Value |
|---|---|
| Caller | `MultiAgentRAGChatbot._perform_retrieval` (subclass implementation of an abstract method) |
| Callee | `src/pipeline.py::PipelineManager.run(query, sources, auto_infer_filters, filters, skip_answer, ...)` |
| Key call-site arguments | `auto_infer_filters=False` (we did filter inference upstream); `skip_answer=True` (we do answer generation in the agent, not the pipeline) |
| Returns | `PipelineResult` with `.sources` (List[Document]), `.answer` (always empty when `skip_answer=True`), `.metadata` |
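
The call-site arguments above can be illustrated with a stub. Everything here is a sketch under assumptions: `StubPipelineManager` and `perform_retrieval` are hypothetical, the real `run()` takes more arguments than shown, and the parameter defaults are guesses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PipelineResult:
    """Stand-in exposing the three attributes listed in the table above."""
    sources: List[Any] = field(default_factory=list)
    answer: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


class StubPipelineManager:
    """Hypothetical stub; the real run() accepts more arguments than shown."""

    def run(self, query: str, filters: Optional[Dict[str, Any]] = None,
            auto_infer_filters: bool = True, skip_answer: bool = False) -> PipelineResult:
        docs = [f"chunk for {query!r}"]
        answer = "" if skip_answer else "pipeline-generated answer"
        return PipelineResult(
            sources=docs,
            answer=answer,
            metadata={"filters": filters, "auto_infer_filters": auto_infer_filters},
        )


def perform_retrieval(pipeline: StubPipelineManager, query: str,
                      filters: Dict[str, Any]) -> PipelineResult:
    # Filters were inferred upstream and the agent writes the answer itself,
    # so both pipeline features are switched off at the call site.
    return pipeline.run(query=query, filters=filters,
                        auto_infer_filters=False, skip_answer=True)


result = perform_retrieval(StubPipelineManager(), "road maintenance spending", {"year": ["2022"]})
print(result.answer == "", result.metadata)
```
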
### `PipelineManager.run()` → `ContextRetriever.retrieve_context()`
How the pipeline triggers actual vector search.
| Aspect | Value |
|---|---|
| Caller | `src/pipeline.py::PipelineManager.run` |
| Callee | `src/retrieval/context.py::ContextRetriever.retrieve_context(query, reports, sources, subtype, year, district, filenames, entity_type, use_reranking, top_k, ...)` |
| Returns | `List[Document]` with metadata fields `original_score`, `reranked_score` (if reranker applied), `reranking_applied`, plus the underlying Qdrant payload (year, district, source, filename, page, etc.) |
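
A consumer of the returned documents might read the score fields as in the sketch below. The `Document` class is a minimal stand-in for LangChain's, and `effective_score` and `rank_documents` are hypothetical helpers. It assumes that a higher score means a better match and that `reranked_score` should take precedence whenever `reranking_applied` is true.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def effective_score(doc: Document) -> float:
    """Prefer the reranker score when reranking was applied, else the original score."""
    meta = doc.metadata
    if meta.get("reranking_applied") and "reranked_score" in meta:
        return float(meta["reranked_score"])
    return float(meta.get("original_score", 0.0))


def rank_documents(docs: List[Document]) -> List[Document]:
    """Sort best-first (assumes higher scores are better)."""
    return sorted(docs, key=effective_score, reverse=True)


docs = [
    Document("a", {"original_score": 0.62, "reranked_score": 0.30, "reranking_applied": True}),
    Document("b", {"original_score": 0.55, "reranked_score": 0.91, "reranking_applied": True}),
    Document("c", {"original_score": 0.70, "reranked_score": 0.12, "reranking_applied": True}),
]
print([d.page_content for d in rank_documents(docs)])
```
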
### Mixin contracts (intra-`src/agents/`)
| Mixin | Public attributes it sets on `self` | Methods it exposes |
|---|---|---|
| `_MetadataMixin` (`metadata.py`) | `self.year_whitelist`, `self.source_whitelist`, `self.district_whitelist`, `self.db_metadata_context`, `self.district_doc_counts`, `self.current_year`, `self.latest_data_year`, `self.earliest_data_year`, `self.UGANDA_REGIONS` | `_load_dynamic_data()`, `_load_db_metadata(vectorstore)`, `_normalize_district_name(s)` |
| `_FiltersMixin` (`agent_filtering.py`) | None | `_best_score(sources)` (static), `_prevalidate_filters(filters, anchored_keys)`, `_post_relaxation_relevance_check(sources, anchored_keys, original_filters)`, `_normalize_source_name(raw)`, `_llm_overrides_ui(...)` (static), `_validate_filter_values(filters)` |
| `_ConversationHistoryMixin` (`conversation_history.py`) | None | `_load_conversation(file_path)`, `_save_conversation(file_path, conversation)` |
All mixin methods are called via `self.X()` in `BaseMultiAgentChatbot`. The orchestrator's call sites stay the same regardless of which file a method lives in, because Python's method resolution order (MRO) handles the dispatch.
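
The dispatch can be seen in a stripped-down sketch. The class and method names mirror the table, but the method bodies, the sample values, and the `summarize` helper are purely illustrative assumptions and not the real implementations.

```python
class _MetadataMixin:
    def _load_dynamic_data(self) -> None:
        # Sets attributes on self that the orchestrator reads later.
        self.year_whitelist = ["2021", "2022"]  # illustrative values only


class _FiltersMixin:
    @staticmethod
    def _best_score(sources):
        # Illustrative body: highest score among the sources, 0.0 if there are none.
        return max((s["score"] for s in sources), default=0.0)


class BaseMultiAgentChatbot(_MetadataMixin, _FiltersMixin):
    def __init__(self) -> None:
        self._load_dynamic_data()  # resolved on _MetadataMixin via the MRO

    def summarize(self, sources):
        # Hypothetical helper: same self.X() convention for every mixin method.
        return self.year_whitelist, self._best_score(sources)


bot = BaseMultiAgentChatbot()
print(bot.summarize([{"score": 0.4}, {"score": 0.7}]))
print([cls.__name__ for cls in BaseMultiAgentChatbot.__mro__])
```

Moving a method from one mixin file to another changes the MRO entry that supplies it, but no call site in the orchestrator.
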
### Filter-construction APIs in `src/retrieval/filter.py`
Two filter constructors coexist, each with different consumers (see [ADR 002](architecture/adrs/002-dense-only-retrieval-hybrid-disabled.md) and DEFERRED #1 for the rationale):
| Function | Signature | Used by |
|---|---|---|
| `build_qdrant_filter_from_dict(filters: dict)` | Dict-based, returns `Optional[Filter]` | `_FiltersMixin._prevalidate_filters` (for cheap `count()` queries before retrieval) |
| `create_filter(reports=, sources=, subtype=, year=, district=, filenames=, entity_type=)` | Kwarg-based, returns `Filter`; handles filename mutual-exclusivity | `src/retrieval/hybrid.py`, `src/retrieval/context.py` (for actual vector-search queries) |
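
To make the dict-based style concrete, here is a sketch that turns a filter dict into the JSON shape Qdrant accepts for filters (`must` conditions with `match.value` or `match.any`). It is not the project's `build_qdrant_filter_from_dict`, which returns a `Filter` object; `build_filter_json` is a hypothetical name, and the sketch assumes payload keys equal the filter-dict keys.

```python
from typing import Any, Dict, List, Optional


def build_filter_json(filters: Dict[str, Any]) -> Optional[Dict[str, List[dict]]]:
    """Return a Qdrant-style filter dict, or None when no condition applies."""
    must: List[dict] = []
    for key, value in filters.items():
        if value is None or value == "":
            continue  # unset filter, nothing to match
        if isinstance(value, (list, tuple)):
            if not value:
                continue  # an empty selection means "no restriction"
            must.append({"key": key, "match": {"any": list(value)}})
        else:
            must.append({"key": key, "match": {"value": value}})
    return {"must": must} if must else None


print(build_filter_json({"year": ["2021", "2022"], "district": "Gulu", "source": []}))
print(build_filter_json({}))
```

Returning `None` for an empty dict mirrors the `Optional[Filter]` return type in the table: a `count()` call without a filter then covers the whole collection.
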
### LangGraph state contract (`MultiAgentState`)
Defined in `src/agents/state.py`. Every agent node reads from and writes to a shared `MultiAgentState` (a TypedDict). Keys that travel between nodes:
| Key | Written by | Read by |
|---|---|---|
| `current_query` | `chat()` | All agents |
| `messages` | `chat()` (after a turn completes) | `_main_agent` (for context), `_rewrite_query_for_rag` |
| `query_context` | `_main_agent._analyze_query_context` | `_route_after_main`, `_rag_agent`, `_response_agent` |
| `rag_filters` | `_rag_agent._build_filters` | `_response_agent` |
| `anchored_filter_keys` | `_rag_agent._build_filters` | `_response_agent` (relaxation logic) |
| `rag_query` | `_response_agent._rewrite_query_for_rag` (after prevalidation succeeds) | `_response_agent._perform_retrieval` |
| `retrieved_documents` | `_response_agent` (after retrieval) | `_response_agent._generate_conversational_response` |
| `final_response` | `_response_agent`, or pre-validation early-exit | `chat()` (returned to caller) |
| `agent_logs` | All agents (append-only) | `chat()` (returned to caller for debugging) |
| `relaxation_notes` | `_response_agent` (during/after relaxation) | `chat()` (returned to caller) |
| `gap_follow_up` | `_response_agent` (when pre-validation finds an all-anchored gap) | `chat()` |
| `conversation_context["last_filters"]` | `_main_agent` (end of turn) | `_main_agent._analyze_query_context` (next turn's filter carryover) |
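
The keys in the table could be declared roughly as follows. This is a sketch and not the contents of `src/agents/state.py`: the value types are assumptions, and `MultiAgentStateSketch` and `log_step` are hypothetical names. `log_step` shows one way to keep `agent_logs` append-only by returning a new state instead of mutating the old one.

```python
from typing import Any, Dict, List, Optional, TypedDict


class MultiAgentStateSketch(TypedDict, total=False):
    """Keys follow the table above; every value type here is an assumption."""
    current_query: str
    messages: List[Any]
    query_context: Any
    rag_filters: Dict[str, Any]
    anchored_filter_keys: List[str]
    rag_query: str
    retrieved_documents: List[Any]
    final_response: str
    agent_logs: List[Any]
    relaxation_notes: List[Any]
    gap_follow_up: Optional[str]
    conversation_context: Dict[str, Any]  # holds "last_filters" between turns


def log_step(state: MultiAgentStateSketch, entry: Any) -> MultiAgentStateSketch:
    """Return a copy of the state with `entry` appended to agent_logs."""
    return {**state, "agent_logs": [*state.get("agent_logs", []), entry]}


s0: MultiAgentStateSketch = {"current_query": "audit findings for Gulu"}
s1 = log_step(s0, "main_agent: analyzed query")
s2 = log_step(s1, "rag_agent: built filters")
print(s2["agent_logs"])
```
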
## Versioning and stability
- **External APIs**: OpenAI and Qdrant are vendor-controlled; we depend on their API stability. Both have versioning policies; we're using stable public versions.
- **Internal APIs**: no formal versioning; changes to mixin signatures or agent state keys are coordinated within the same PR. The code is small enough that this works without overhead.
---
*Related:* [`docs/system-requirements.md`](system-requirements.md) details what external services must be available for the system to function. [`docs/architecture/05-deployment-view.md`](architecture/05-deployment-view.md) shows the deployment topology of these interfaces.