Interfaces

Reference document listing every external API the system depends on, and every important module-to-module contract inside the codebase. Use this when changing an integration or refactoring an internal module to understand what might break.

External interfaces (the system calls out)

OpenAI Chat Completions API

| Aspect | Value |
|---|---|
| Endpoint | https://api.openai.com/v1/chat/completions |
| Auth | Authorization: Bearer ${OPENAI_API_KEY} |
| Wrapper | src/llm/adapters.py (via OpenAIClient + LLMRegistry) |
| Models in use | gpt-4o-mini (cost-efficient: query rewriting, answer generation), gpt-4.1 (strong: query analysis) |
| Configured at | src/config/settings.yaml::reader.OPENAI and reader.OPENAI_STRONG |
| Called from | BaseMultiAgentChatbot._analyze_query_context, _rewrite_query_for_rag; MultiAgentRAGChatbot._generate_conversational_response* |
| Failure mode | Caught at every call site; returns a sensible fallback (original query, generic error message) |
| Latency budget | 1-5 s per call typical; tolerated up to 30 s before the user sees a "thinking" timeout (Streamlit default) |
| Rate limits | OpenAI's standard rate limits per account/key tier; not actively monitored by the app |
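
The failure mode listed above (catch at the call site, fall back to the original query) might look roughly like the following. This is a minimal sketch, not the project's code: `rewrite_with_fallback` and its `llm_call` parameter are hypothetical names, and the real call sites go through OpenAIClient and LLMRegistry.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def rewrite_with_fallback(llm_call: Callable[[str], str], query: str) -> str:
    """Return the LLM's rewrite of `query`, or `query` itself on any failure.

    `llm_call` is a hypothetical stand-in for whatever wrapper issues the
    Chat Completions request.
    """
    try:
        rewritten = llm_call(query)
    except Exception:  # network errors, rate limits, timeouts, ...
        logger.exception("Query rewrite failed; falling back to original query")
        return query
    # Assumption: an empty or whitespace-only reply is treated like a failure.
    return rewritten.strip() or query


def _always_fails(_: str) -> str:
    # Simulates an upstream error without touching the network.
    raise TimeoutError("simulated upstream timeout")


fallback_result = rewrite_with_fallback(_always_fails, "audit findings 2022")
ok_result = rewrite_with_fallback(lambda q: q.upper(), "audit findings 2022")
```

Passing the LLM call in as a parameter keeps the fallback logic testable offline; the same shape would apply to answer generation with a generic error message as the fallback.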

Qdrant Cloud API

| Aspect | Value |
|---|---|
| Endpoint | ${QDRANT_URL} (set as HF Space secret) |
| Auth | api-key: ${QDRANT_API_KEY} header |
| Protocol | gRPC preferred (configurable via prefer_grpc: true in settings); HTTPS fallback |
| Wrapper | src/vectorstore.py::VectorStoreManager (uses langchain_qdrant.Qdrant) |
| Collection | BAAI-bge-m3-full (configured in settings.yaml::qdrant.collection_name) |
| Operations called | similarity_search_with_score (vector search), count (pre-validation), scroll (metadata cache rebuild), create_payload_index (one-off setup) |
| Called from | src/retrieval/context.py, src/retrieval/filter.py (MetadataCache._fetch_from_qdrant), src/agents/agent_filtering.py (_prevalidate_filters) |
| Failure mode | Connection failure at startup → chatbot init fails, app shows error banner. Query failure → caught, returns empty results. |
| Latency budget | similarity_search: ~200-500 ms typical. count: <100 ms typical. scroll: ~80 s on full collection (only on cold start without disk cache). |
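
As an illustration of how the endpoint, auth, and protocol rows fit together, the sketch below assembles client keyword arguments from the two secrets and the settings file. The function name is hypothetical, and placing `prefer_grpc` under a `qdrant` section (defaulting to true) is an assumption; the document only says it is configurable in settings.

```python
import os
from typing import Any, Dict, Mapping


def qdrant_connection_kwargs(
    settings: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Qdrant client (hypothetical helper)."""
    url = env.get("QDRANT_URL")
    api_key = env.get("QDRANT_API_KEY")
    if not url or not api_key:
        # Mirrors the documented behaviour: a bad connection setup fails at startup.
        raise RuntimeError("QDRANT_URL and QDRANT_API_KEY must both be set")
    qdrant_cfg = settings.get("qdrant", {})  # assumed location of prefer_grpc
    return {
        "url": url,
        "api_key": api_key,
        "prefer_grpc": bool(qdrant_cfg.get("prefer_grpc", True)),
    }


example_kwargs = qdrant_connection_kwargs(
    {"qdrant": {"collection_name": "BAAI-bge-m3-full", "prefer_grpc": True}},
    env={"QDRANT_URL": "https://example.invalid:6333", "QDRANT_API_KEY": "dummy"},
)
```

The resulting dictionary uses the `url`, `api_key`, and `prefer_grpc` argument names accepted by `qdrant_client.QdrantClient`; how VectorStoreManager actually builds its client is not shown in this document.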

Hugging Face Hub: model file downloads

| Aspect | Value |
|---|---|
| Endpoint | https://huggingface.co/<model>/resolve/main/* |
| Auth | Public read, no auth needed for our models |
| Wrapper | transformers library (transitive via sentence-transformers and langchain_huggingface) |
| Models | BAAI/bge-m3 (embeddings), BAAI/bge-reranker-v2-m3 (reranker) |
| When called | Only at Docker build time (download_models.py); pre-populated cache in image avoids runtime downloads |
| Failure mode | Build fails; deploys blocked until HF Hub is reachable |

Hugging Face Hub: dataset push (logging)

| Aspect | Value |
|---|---|
| Endpoint | https://huggingface.co/api/datasets/GIZ/spaces_logs/* |
| Auth | Bearer ${SPACES_LOG} (write token, set as HF Space secret) |
| Wrapper | src/logging.py (via huggingface_hub.HfApi) |
| What's pushed | Conversational JSON logs (audit trail) |
| Called from | BaseMultiAgentChatbot.chat() after each turn |
| Failure mode | Caught silently; logs an error but doesn't fail the user request |

Ollama (optional, local development only)

| Aspect | Value |
|---|---|
| Endpoint | ${OLLAMA_BASE_URL} (e.g. http://localhost:11434/) |
| Auth | None |
| Wrapper | src/llm/adapters.py (via langchain_ollama.OllamaLLM) |
| Status | Not used in production. Available for local dev where running OpenAI calls would be expensive or impossible offline. |

Internal interfaces (module to module within the codebase)

app.py → BaseMultiAgentChatbot.chat()

The only call from the Streamlit layer into the agent layer.

| Aspect | Value |
|---|---|
| Signature | chat(user_input: str, conversation_id: str = "default") -> Dict[str, Any] |
| Input | user_input may include a FILTER CONTEXT: preamble with sidebar selections; conversation_id is the per-Streamlit-session UUID |
| Output | Dict with keys: response (str), rag_result (PipelineResult), agent_logs (list), relaxation_notes (list), gap_follow_up (str or None) |
| Stability contract | This signature should be considered stable. Streamlit, tests, and any future front-ends depend on it. |
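
One way to keep the output contract above checkable is to mirror it in a TypedDict and assert on the key set in tests. The sketch below is illustrative only: `ChatResult` and `missing_chat_keys` are hypothetical names, and `rag_result` is typed as `Any` because PipelineResult is defined elsewhere in the codebase.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ChatResult(TypedDict):
    """Shape of the dict returned by chat(), per the table above."""
    response: str
    rag_result: Any  # PipelineResult in the real code
    agent_logs: List[Any]
    relaxation_notes: List[Any]
    gap_follow_up: Optional[str]


EXPECTED_KEYS = frozenset(ChatResult.__annotations__)


def missing_chat_keys(result: Dict[str, Any]) -> List[str]:
    """Return the contract keys absent from `result`, sorted for stable output."""
    return sorted(EXPECTED_KEYS - set(result))


sample_result = {
    "response": "No reports match those filters.",
    "rag_result": None,
    "agent_logs": [],
    "relaxation_notes": [],
    "gap_follow_up": None,
}
```

A front-end or test could call `missing_chat_keys(chatbot.chat(...))` and fail fast if the list is non-empty.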

MultiAgentRAGChatbot._perform_retrieval() → PipelineManager.run()

How the agent triggers retrieval.

| Aspect | Value |
|---|---|
| Caller | MultiAgentRAGChatbot._perform_retrieval (subclass implementation of an abstract method) |
| Callee | src/pipeline.py::PipelineManager.run(query, sources, auto_infer_filters, filters, skip_answer, ...) |
| Key call-site arguments | auto_infer_filters=False (we did filter inference upstream); skip_answer=True (we do answer generation in the agent, not the pipeline) |
| Returns | PipelineResult with .sources (List[Document]), .answer (always empty when skip_answer=True), .metadata |
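
The call-site convention can be illustrated with a stub. In this sketch only the argument names and the three result attributes come from the table above; `StubPipelineManager`, `perform_retrieval`, the default values, and the stub's behaviour are hypothetical stand-ins for the real classes in src/pipeline.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PipelineResult:
    """Stand-in carrying the three attributes the contract names."""
    sources: List[Any] = field(default_factory=list)
    answer: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


class StubPipelineManager:
    """Hypothetical stub; real retrieval and answer generation are omitted."""

    def run(
        self,
        query: str,
        sources: Optional[List[str]] = None,
        auto_infer_filters: bool = True,
        filters: Optional[Dict[str, Any]] = None,
        skip_answer: bool = False,
    ) -> PipelineResult:
        docs = [f"doc matching {query!r}"]
        # With skip_answer=True the pipeline leaves the answer empty.
        answer = "" if skip_answer else "generated answer"
        return PipelineResult(sources=docs, answer=answer, metadata={"filters": filters or {}})


def perform_retrieval(pipeline: StubPipelineManager, query: str, filters: Dict[str, Any]) -> PipelineResult:
    """Call the pipeline the way the agent does: filters decided upstream, no answer."""
    return pipeline.run(
        query=query,
        sources=None,
        auto_infer_filters=False,
        filters=filters,
        skip_answer=True,
    )


retrieval_result = perform_retrieval(StubPipelineManager(), "payroll irregularities", {"year": ["2022"]})
```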

PipelineManager.run() → ContextRetriever.retrieve_context()

How the pipeline triggers actual vector search.

| Aspect | Value |
|---|---|
| Caller | src/pipeline.py::PipelineManager.run |
| Callee | src/retrieval/context.py::ContextRetriever.retrieve_context(query, reports, sources, subtype, year, district, filenames, entity_type, use_reranking, top_k, ...) |
| Returns | List[Document] with metadata fields original_score, reranked_score (if reranker applied), reranking_applied, plus the underlying Qdrant payload (year, district, source, filename, page, etc.) |
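
Consumers of the returned documents need to decide which score to trust. The sketch below shows one possible reading of the metadata fields listed above; the `Document` dataclass is a minimal stand-in for the LangChain class, and `effective_score` and `rank_documents` are hypothetical helpers, not functions from the codebase.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for a LangChain Document (content plus metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def effective_score(doc: Document) -> float:
    """Prefer the reranker's score when it was applied, else the vector score."""
    meta = doc.metadata
    if meta.get("reranking_applied") and "reranked_score" in meta:
        return float(meta["reranked_score"])
    return float(meta.get("original_score", 0.0))


def rank_documents(docs: List[Document]) -> List[Document]:
    """Sort documents from most to least relevant by effective score."""
    return sorted(docs, key=effective_score, reverse=True)


doc_a = Document("A", {"original_score": 0.9, "reranked_score": 0.2, "reranking_applied": True, "year": "2022"})
doc_b = Document("B", {"original_score": 0.4, "reranked_score": 0.95, "reranking_applied": True, "year": "2021"})
doc_c = Document("C", {"original_score": 0.7, "reranking_applied": False})
ranked = rank_documents([doc_a, doc_b])
```

Vector-similarity and reranker scores are generally on different scales, so a real consumer would likely avoid comparing documents scored by different methods in one list.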

Mixin contracts (intra-src/agents/)

| Mixin | Public attributes it sets on self | Methods it exposes |
|---|---|---|
| _MetadataMixin (metadata.py) | self.year_whitelist, self.source_whitelist, self.district_whitelist, self.db_metadata_context, self.district_doc_counts, self.current_year, self.latest_data_year, self.earliest_data_year, self.UGANDA_REGIONS | _load_dynamic_data(), _load_db_metadata(vectorstore), _normalize_district_name(s) |
| _FiltersMixin (agent_filtering.py) | None | _best_score(sources) (static), _prevalidate_filters(filters, anchored_keys), _post_relaxation_relevance_check(sources, anchored_keys, original_filters), _normalize_source_name(raw), _llm_overrides_ui(...) (static), _validate_filter_values(filters) |
| _ConversationHistoryMixin (conversation_history.py) | None | _load_conversation(file_path), _save_conversation(file_path, conversation) |

All mixin methods are called via self.X() in BaseMultiAgentChatbot. Call sites in the orchestrator look the same regardless of which file defines a method, because Python's method resolution order (MRO) handles the dispatch.
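
A toy version of this arrangement shows why the call sites do not care where a method lives. The mixin class names are borrowed from the table above, but the method bodies, the placeholder values, and `BaseChatbotSketch` are hypothetical.

```python
from typing import Any, Dict, List


class _MetadataMixin:
    def _load_dynamic_data(self) -> None:
        # Sets a public attribute on self, as the contract table describes.
        self.year_whitelist = ["2021", "2022"]  # placeholder values


class _FiltersMixin:
    @staticmethod
    def _best_score(sources: List[Dict[str, Any]]) -> float:
        # Placeholder logic: highest "score" among the sources, 0.0 if none.
        return max((s["score"] for s in sources), default=0.0)


class BaseChatbotSketch(_MetadataMixin, _FiltersMixin):
    """Orchestrator stand-in: every mixin method is reached through self."""

    def __init__(self) -> None:
        self._load_dynamic_data()

    def summarize(self, sources: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {"years": self.year_whitelist, "best": self._best_score(sources)}


bot = BaseChatbotSketch()
summary = bot.summarize([{"score": 0.3}, {"score": 0.8}])
mro_names = [cls.__name__ for cls in BaseChatbotSketch.__mro__]
```

Moving `_best_score` to another mixin or file would leave `summarize` unchanged, as long as the class providing it still appears in the MRO.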

Filter-construction APIs in src/retrieval/filter.py

Two filter constructors coexist, each with different consumers (see ADR 002 and DEFERRED #1 for the rationale):

| Function | Signature | Used by |
|---|---|---|
| build_qdrant_filter_from_dict(filters: dict) | Dict-based, returns Optional[Filter] | _FiltersMixin._prevalidate_filters (for cheap count() queries before retrieval) |
| create_filter(reports=, sources=, subtype=, year=, district=, filenames=, entity_type=) | Kwarg-based, returns Filter; handles filename mutual-exclusivity | src/retrieval/hybrid.py, src/retrieval/context.py (for actual vector-search queries) |

LangGraph state contract (MultiAgentState)

Defined in src/agents/state.py. Every agent node reads from and writes to a shared MultiAgentState (a TypedDict). Keys that travel between nodes:

| Key | Written by | Read by |
|---|---|---|
| current_query | chat() | All agents |
| messages | chat() (after a turn completes) | _main_agent (for context), _rewrite_query_for_rag |
| query_context | _main_agent._analyze_query_context | _route_after_main, _rag_agent, _response_agent |
| rag_filters | _rag_agent._build_filters | _response_agent |
| anchored_filter_keys | _rag_agent._build_filters | _response_agent (relaxation logic) |
| rag_query | _response_agent._rewrite_query_for_rag (after prevalidation succeeds) | _response_agent._perform_retrieval |
| retrieved_documents | _response_agent (after retrieval) | _response_agent._generate_conversational_response |
| final_response | _response_agent, or pre-validation early-exit | chat() (returned to caller) |
| agent_logs | All agents (append-only) | chat() (returned to caller for debugging) |
| relaxation_notes | _response_agent (during/after relaxation) | chat() (returned to caller) |
| gap_follow_up | _response_agent (when pre-validation finds an all-anchored gap) | chat() |
| conversation_context["last_filters"] | _main_agent (end of turn) | _main_agent._analyze_query_context (next turn's filter carryover) |
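
For orientation, the state keys above could be declared along the following lines. This is a sketch under assumptions: the key names come from the table, but every value type is a guess, `rag_agent_node` is a hypothetical node, and the `filters` entry inside `query_context` is invented for illustration. The real definition lives in src/agents/state.py.

```python
from typing import Any, Dict, List, Optional, TypedDict


class MultiAgentStateSketch(TypedDict, total=False):
    """Key names from the state contract; value types are assumptions."""
    current_query: str
    messages: List[Any]
    query_context: Dict[str, Any]
    rag_filters: Dict[str, Any]
    anchored_filter_keys: List[str]
    rag_query: str
    retrieved_documents: List[Any]
    final_response: str
    agent_logs: List[str]
    relaxation_notes: List[str]
    gap_follow_up: Optional[str]
    conversation_context: Dict[str, Any]


def rag_agent_node(state: MultiAgentStateSketch) -> MultiAgentStateSketch:
    """Hypothetical node: writes rag_filters and anchored_filter_keys, appends a log."""
    # Assumption: query_context carries the inferred filters under "filters".
    filters = dict(state.get("query_context", {}).get("filters", {}))
    return {
        "rag_filters": filters,
        "anchored_filter_keys": sorted(filters),
        # Append-only: build a new list rather than mutating the incoming one.
        "agent_logs": state.get("agent_logs", []) + ["rag_agent: built filters"],
    }


initial_state: MultiAgentStateSketch = {
    "current_query": "district audits 2022",
    "query_context": {"filters": {"year": ["2022"], "district": ["Gulu"]}},
    "agent_logs": ["main_agent: analyzed query"],
}
state_update = rag_agent_node(initial_state)
```

Returning only the keys a node writes keeps the "Written by" column easy to audit against the code.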

Versioning and stability

  • External APIs: OpenAI and Qdrant are vendor-controlled; we depend on their API stability. Both have versioning policies; we're using stable public versions.
  • Internal APIs: no formal versioning; changes to mixin signatures or agent state keys are coordinated within the same PR. The code is small enough that this works without overhead.

Related: docs/system-requirements.md details what external services must be available for the system to function. docs/architecture/05-deployment-view.md shows the deployment topology of these interfaces.