# Interfaces

> Reference document listing every external API the system depends on, and every important module-to-module contract inside the codebase. Use it when changing an integration or refactoring an internal module to understand what might break.

## External interfaces (the system calls out)

### OpenAI Chat Completions API

| Aspect | Value |
|---|---|
| Endpoint | `https://api.openai.com/v1/chat/completions` |
| Auth | `Authorization: Bearer ${OPENAI_API_KEY}` |
| Wrapper | `src/llm/adapters.py` (via `OpenAIClient` + `LLMRegistry`) |
| Models in use | `gpt-4o-mini` (cost-efficient: query rewriting, answer generation), `gpt-4.1` (strong: query analysis) |
| Configured at | `src/config/settings.yaml::reader.OPENAI` and `reader.OPENAI_STRONG` |
| Called from | `BaseMultiAgentChatbot._analyze_query_context`, `_rewrite_query_for_rag`; `MultiAgentRAGChatbot._generate_conversational_response*` |
| Failure mode | Caught at every call site; returns a sensible fallback (original query, generic error message) |
| Latency budget | 1-5 s per call typical; tolerated up to 30 s before the user sees a "thinking" timeout (Streamlit default) |
| Rate limits | OpenAI's standard rate limits per account/key tier; not actively monitored by the app |

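The "caught at every call site" failure mode can be shown with a minimal sketch. The helper name `rewrite_query_with_fallback` and the injected `llm_call` callable are hypothetical and do not describe the real `OpenAIClient` API. The sketch only demonstrates the rule: any exception or empty completion from the LLM degrades to the original query instead of failing the turn.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def rewrite_query_with_fallback(query: str, llm_call: Callable[[str], Optional[str]]) -> str:
    """Ask the LLM to rewrite `query`; fall back to the original query on any failure.

    `llm_call` is a hypothetical injected callable (prompt in, completion text out),
    standing in for whatever `src/llm/adapters.py` actually exposes.
    """
    try:
        rewritten = llm_call(query)
    except Exception:  # network errors, rate limits, timeouts, malformed responses
        logger.exception("Query rewrite failed; falling back to the original query")
        return query
    # Completion content can be None, and an empty rewrite is useless for retrieval.
    return (rewritten or "").strip() or query
```

With the official `openai` Python SDK (v1 or later), `llm_call` could wrap `client.chat.completions.create(model="gpt-4o-mini", messages=[...])` and return `response.choices[0].message.content`. Injecting the call keeps the fallback rule testable without network access.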
### Qdrant Cloud API

| Aspect | Value |
|---|---|
| Endpoint | `${QDRANT_URL}` (set as HF Space secret) |
| Auth | `api-key: ${QDRANT_API_KEY}` header |
| Protocol | gRPC preferred (configurable via `prefer_grpc: true` in settings); HTTPS fallback |
| Wrapper | `src/vectorstore.py::VectorStoreManager` (uses `langchain_qdrant.Qdrant`) |
| Collection | `BAAI-bge-m3-full` (configured in `settings.yaml::qdrant.collection_name`) |
| Operations called | `similarity_search_with_score` (vector search), `count` (pre-validation), `scroll` (metadata cache rebuild), `create_payload_index` (one-off setup) |
| Called from | `src/retrieval/context.py`, `src/retrieval/filter.py` (`MetadataCache._fetch_from_qdrant`), `src/agents/agent_filtering.py` (`_prevalidate_filters`) |
| Failure mode | Connection failure at startup → chatbot initialization fails and the app shows an error banner. Query failure → caught; returns empty results. |
| Latency budget | `similarity_search_with_score`: ~200-500 ms typical. `count`: <100 ms typical. `scroll`: ~80 s on the full collection (only on a cold start without the disk cache). |

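The latency row explains why the metadata cache matters. A full `scroll` is slow, so it should run only when no usable disk cache exists. Below is a minimal sketch of that cache-or-rebuild pattern. The function name, the JSON cache format, and the injected `fetch` callable are all assumptions made for illustration. `fetch` stands in for the Qdrant scroll loop inside `MetadataCache._fetch_from_qdrant`.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_metadata_cache(cache_path: Path, fetch: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Return cached metadata from disk, or rebuild it with `fetch` and persist it."""
    if cache_path.exists():
        try:
            return json.loads(cache_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            pass  # unreadable or corrupt cache: fall through and rebuild
    metadata = fetch()  # slow path: full-collection scroll on a cold start
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(metadata), encoding="utf-8")
    return metadata
```

With `qdrant_client`, `fetch` would typically page through `client.scroll(collection_name=..., limit=..., offset=next_offset, with_payload=True, with_vectors=False)`. That call returns a `(points, next_page_offset)` tuple, and the loop continues until the offset comes back as `None`. Skipping vectors keeps each page small.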
### Hugging Face Hub — model file downloads

| Aspect | Value |
|---|---|
| Endpoint | `https://huggingface.co/<model>/resolve/main/*` |
| Auth | Public read; no auth needed for our models |
| Wrapper | `transformers` library (transitive via `sentence-transformers` and `langchain_huggingface`) |
| Models | `BAAI/bge-m3` (embeddings), `BAAI/bge-reranker-v2-m3` (reranker) |
| When called | Only at Docker build time (`download_models.py`); the pre-populated cache in the image avoids runtime downloads |
| Failure mode | Build fails; deploys are blocked until HF Hub is reachable |

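A build-time prefetch step of the kind `download_models.py` performs can be sketched with `huggingface_hub.snapshot_download`. This is an illustration under assumptions. The real script may load the models through `sentence-transformers` or `transformers` instead, and the `MODEL_IDS` constant and `prefetch_models` function are hypothetical.

```python
from typing import List, Optional, Sequence

# Model repositories listed in the table above.
MODEL_IDS: List[str] = ["BAAI/bge-m3", "BAAI/bge-reranker-v2-m3"]


def prefetch_models(model_ids: Sequence[str] = MODEL_IDS, cache_dir: Optional[str] = None) -> List[str]:
    """Download each model repo into the Hugging Face cache and return the local paths.

    Intended to run during the Docker build, so the image ships with a warm cache.
    """
    # Imported lazily so this module can be imported without the dependency installed.
    from huggingface_hub import snapshot_download

    return [snapshot_download(repo_id=model_id, cache_dir=cache_dir) for model_id in model_ids]
```

At runtime, setting the environment variable `HF_HUB_OFFLINE=1` makes `huggingface_hub` use only cached files and make no HTTP calls. This is one way to confirm that the image's cache really is complete.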
### Hugging Face Hub — dataset push (logging)

| Aspect | Value |
|---|---|
| Endpoint | `https://huggingface.co/api/datasets/GIZ/spaces_logs/*` |
| Auth | `Bearer ${SPACES_LOG}` (write token, set as HF Space secret) |
| Wrapper | `src/logging.py` (via `huggingface_hub.HfApi`) |
| What's pushed | Conversational JSON logs (audit trail) |
| Called from | `BaseMultiAgentChatbot.chat()` after each turn |
| Failure mode | Caught and logged as an error; never fails the user request |

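The "caught, logged, never fails the request" contract can be sketched as a best-effort upload through `huggingface_hub.HfApi.upload_file`. The function name, the record shape, and the `logs/<timestamp>.json` path layout are hypothetical. The real `src/logging.py` may batch, name, or format files differently. The `upload` parameter exists only so the failure path can be exercised without network access.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

LOG_REPO_ID = "GIZ/spaces_logs"  # dataset repo from the table above


def push_turn_log(record: Dict[str, Any], token: Optional[str] = None,
                  upload: Optional[Callable[..., Any]] = None) -> bool:
    """Best-effort upload of one conversation-turn record. Never raises."""
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path_in_repo = f"logs/{stamp}.json"  # hypothetical file layout
    try:
        if upload is None:
            from huggingface_hub import HfApi  # ImportError is also caught below
            upload = HfApi(token=token).upload_file
        upload(path_or_fileobj=payload, path_in_repo=path_in_repo,
               repo_id=LOG_REPO_ID, repo_type="dataset")
        return True
    except Exception:
        logger.exception("Log push to %s failed; continuing without it", LOG_REPO_ID)
        return False
```

In the Space, `token` would come from the `SPACES_LOG` secret, for example `os.environ.get("SPACES_LOG")`.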
### Ollama (optional, local development only)

| Aspect | Value |
|---|---|
| Endpoint | `${OLLAMA_BASE_URL}` (e.g. `http://localhost:11434/`) |
| Auth | None |
| Wrapper | `src/llm/adapters.py` (via `langchain_ollama.OllamaLLM`) |
| Status | Not used in production. Available for local development where OpenAI calls would be expensive, or impossible offline. |

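This document does not describe how the backend is chosen. The sketch below switches on whether `OLLAMA_BASE_URL` is set, purely for illustration. The environment-variable rule and both function names are assumptions. `OllamaLLM(model=..., base_url=...)` and `ChatOpenAI(model=...)` are the real LangChain constructors.

```python
import os
from typing import Mapping, Optional


def select_llm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ollama" when OLLAMA_BASE_URL is set to a non-blank value, else "openai"."""
    env = os.environ if env is None else env
    return "ollama" if env.get("OLLAMA_BASE_URL", "").strip() else "openai"


def build_llm(model: str, env: Optional[Mapping[str, str]] = None):
    """Construct a LangChain model for the selected backend (imports are lazy)."""
    env = os.environ if env is None else env
    if select_llm_backend(env) == "ollama":
        from langchain_ollama import OllamaLLM
        return OllamaLLM(model=model, base_url=env["OLLAMA_BASE_URL"].strip())
    from langchain_openai import ChatOpenAI
    return ChatOpenAI(model=model)  # reads OPENAI_API_KEY from the environment
```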
## Internal interfaces (module-to-module within the codebase)

### `app.py` → `BaseMultiAgentChatbot.chat()`

The only call from the Streamlit layer into the agent layer.

| Aspect | Value |
|---|---|
| Signature | `chat(user_input: str, conversation_id: str = "default") -> Dict[str, Any]` |
| Input | `user_input` may include a `FILTER CONTEXT:` preamble with sidebar selections; `conversation_id` is the per-Streamlit-session UUID |
| Output | Dict with keys: `response` (str), `rag_result` (PipelineResult), `agent_logs` (list), `relaxation_notes` (list), `gap_follow_up` (str or None) |
| Stability contract | Treat this signature as stable. Streamlit, tests, and any future front-ends depend on it. |

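The output row can be restated as a `TypedDict`, which makes the stability contract checkable in tests. `ChatResult` and `check_chat_result` are hypothetical names; `rag_result` is typed as `Any` so the sketch does not need to import `PipelineResult`.

```python
from typing import Any, Dict, List, Optional, TypedDict, cast


class ChatResult(TypedDict):
    # Keys and types as listed in the Output row above.
    response: str
    rag_result: Any  # a PipelineResult; typed loosely to avoid importing src/
    agent_logs: List[Any]
    relaxation_notes: List[Any]
    gap_follow_up: Optional[str]


REQUIRED_KEYS = frozenset(ChatResult.__annotations__)


def check_chat_result(result: Dict[str, Any]) -> ChatResult:
    """Fail fast if a refactor drops or renames a key that front-ends rely on."""
    missing = REQUIRED_KEYS.difference(result)
    if missing:
        raise KeyError(f"chat() result is missing keys: {sorted(missing)}")
    if not isinstance(result["response"], str):
        raise TypeError("`response` must be a str")
    return cast(ChatResult, result)
```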
### `MultiAgentRAGChatbot._perform_retrieval()` → `PipelineManager.run()`

How the agent triggers retrieval.

| Aspect | Value |
|---|---|
| Caller | `MultiAgentRAGChatbot._perform_retrieval` (subclass implementation of an abstract method) |
| Callee | `src/pipeline.py::PipelineManager.run(query, sources, auto_infer_filters, filters, skip_answer, ...)` |
| Key call-site arguments | `auto_infer_filters=False` (filter inference has already run upstream); `skip_answer=True` (the agent, not the pipeline, generates the answer) |
| Returns | `PipelineResult` with `.sources` (List[Document]), `.answer` (always empty when `skip_answer=True`), `.metadata` |

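The call-site contract can be pinned down with a stand-in `PipelineResult` and a keyword-argument call. Both the dataclass and the `perform_retrieval` function are hypothetical simplifications. Only the argument names `query`, `filters`, `auto_infer_filters` and `skip_answer` come from the table. Other arguments of the real `run()` (such as `sources`) are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PipelineResult:
    """Stand-in exposing the three attributes listed in the table above."""
    sources: List[Any] = field(default_factory=list)
    answer: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def perform_retrieval(pipeline: Any, rag_query: str,
                      rag_filters: Optional[Dict[str, Any]]) -> PipelineResult:
    """Agent-side call: filters are already inferred and the agent writes the answer."""
    return pipeline.run(
        query=rag_query,
        filters=rag_filters,
        auto_infer_filters=False,  # do not re-infer filters inside the pipeline
        skip_answer=True,          # retrieval only; answer generation happens in the agent
    )
```

Passing everything by keyword keeps the call robust if the positional order of `run()`'s parameters changes.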
### `PipelineManager.run()` → `ContextRetriever.retrieve_context()`

How the pipeline triggers the actual vector search.

| Aspect | Value |
|---|---|
| Caller | `src/pipeline.py::PipelineManager.run` |
| Callee | `src/retrieval/context.py::ContextRetriever.retrieve_context(query, reports, sources, subtype, year, district, filenames, entity_type, use_reranking, top_k, ...)` |
| Returns | `List[Document]` with metadata fields `original_score`, `reranked_score` (if the reranker was applied), `reranking_applied`, plus the underlying Qdrant payload (year, district, source, filename, page, etc.) |

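A consumer of these documents has to decide which score to read. Below is a minimal sketch that prefers `reranked_score` whenever `reranking_applied` is true and otherwise falls back to `original_score`. `Doc` is a stand-in for LangChain's `Document`. `effective_score` and `best_score` are hypothetical helpers; the real `_FiltersMixin._best_score` may use different logic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def effective_score(doc: Doc) -> Optional[float]:
    """Prefer the reranker score when reranking ran, else the vector-search score."""
    md = doc.metadata
    if md.get("reranking_applied") and md.get("reranked_score") is not None:
        return float(md["reranked_score"])
    score = md.get("original_score")
    return None if score is None else float(score)


def best_score(docs: List[Doc]) -> Optional[float]:
    """Highest effective score across the documents (None if none carry a score)."""
    scores = [s for s in map(effective_score, docs) if s is not None]
    return max(scores, default=None)
```

Cross-encoder reranker scores and vector similarity scores use different scales. A threshold calibrated on one should therefore not be applied to the other, and checking `reranking_applied` first avoids mixing them.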
### Mixin contracts (intra-`src/agents/`)

| Mixin | Public attributes it sets on `self` | Methods it exposes |
|---|---|---|
| `_MetadataMixin` (`metadata.py`) | `self.year_whitelist`, `self.source_whitelist`, `self.district_whitelist`, `self.db_metadata_context`, `self.district_doc_counts`, `self.current_year`, `self.latest_data_year`, `self.earliest_data_year`, `self.UGANDA_REGIONS` | `_load_dynamic_data()`, `_load_db_metadata(vectorstore)`, `_normalize_district_name(s)` |
| `_FiltersMixin` (`agent_filtering.py`) | None | `_best_score(sources)` (static), `_prevalidate_filters(filters, anchored_keys)`, `_post_relaxation_relevance_check(sources, anchored_keys, original_filters)`, `_normalize_source_name(raw)`, `_llm_overrides_ui(...)` (static), `_validate_filter_values(filters)` |
| `_ConversationHistoryMixin` (`conversation_history.py`) | None | `_load_conversation(file_path)`, `_save_conversation(file_path, conversation)` |

All mixin methods are called as `self.X()` from `BaseMultiAgentChatbot`. The orchestrator uses the same call-site convention regardless of which file a method lives in; Python's method resolution order (MRO) handles the dispatch.

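The sketch below is heavily simplified and shows why renaming a mixin attribute is a breaking change. One mixin sets state on `self`, another reads it, and the composed class resolves both through the MRO. The method bodies and the district values are hypothetical. Whether the real `_validate_filter_values` reads `district_whitelist` is an assumption.

```python
class _MetadataMixin:
    def _load_dynamic_data(self) -> None:
        # Hypothetical values; the real mixin derives whitelists from the vector store.
        self.district_whitelist = {"Gulu", "Kampala"}

    def _normalize_district_name(self, s: str) -> str:
        return s.strip().title()


class _FiltersMixin:
    def _validate_filter_values(self, filters: dict) -> dict:
        # Reads state set by _MetadataMixin: an implicit cross-mixin contract.
        if "district" not in filters:
            return dict(filters)
        districts = [self._normalize_district_name(d) for d in filters["district"]]
        return {**filters, "district": [d for d in districts if d in self.district_whitelist]}


class BaseChatbot(_MetadataMixin, _FiltersMixin):
    def __init__(self) -> None:
        self._load_dynamic_data()
```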
### Filter-construction APIs in `src/retrieval/filter.py`

Two coexisting filter constructors, with different consumers (see [ADR 002](architecture/adrs/002-dense-only-retrieval-hybrid-disabled.md) and DEFERRED #1 for the rationale):

| Function | Signature | Used by |
|---|---|---|
| `build_qdrant_filter_from_dict(filters: dict)` | Dict-based, returns `Optional[Filter]` | `_FiltersMixin._prevalidate_filters` (for cheap `count()` queries before retrieval) |
| `create_filter(reports=, sources=, subtype=, year=, district=, filenames=, entity_type=)` | Kwarg-based, returns `Filter`; handles filename mutual-exclusivity | `src/retrieval/hybrid.py`, `src/retrieval/context.py` (for actual vector-search queries) |

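A dict-based builder such as `build_qdrant_filter_from_dict` maps each filter key to a match condition. The sketch below produces Qdrant's JSON filter syntax, a `must` list of `{"key": ..., "match": {"value" | "any": ...}}` conditions. It builds plain dicts rather than `qdrant_client.models` objects, so it runs without the client installed. The function name and the "one value → `value`, several → `any`" rule are assumptions. The `metadata.` prefix in the test reflects LangChain's default Qdrant payload layout, which may or may not apply to this collection.

```python
from typing import Any, Dict, List, Optional


def filters_to_qdrant_json(filters: Dict[str, Any],
                           key_prefix: str = "") -> Optional[Dict[str, List[Dict[str, Any]]]]:
    """Translate {"field": value | [values]} into a Qdrant `must` filter (REST JSON form).

    Empty values are skipped; returns None when nothing is left to filter on.
    """
    must = []
    for key, value in filters.items():
        values = list(value) if isinstance(value, (list, tuple, set)) else [value]
        values = [v for v in values if v not in (None, "")]
        if not values:
            continue
        field = f"{key_prefix}{key}"
        if len(values) == 1:
            must.append({"key": field, "match": {"value": values[0]}})
        else:
            must.append({"key": field, "match": {"any": values}})
    return {"must": must} if must else None
```

With `qdrant_client`, the same conditions map onto `models.Filter(must=[models.FieldCondition(key=..., match=models.MatchAny(any=[...]))])`, or onto `models.MatchValue(value=...)` for a single value. That filter can be passed as `count_filter` to `client.count(collection_name=..., count_filter=..., exact=True)` for the cheap pre-validation check.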
### LangGraph state contract (`MultiAgentState`)

Defined in `src/agents/state.py`. Every agent node reads from and writes to a shared `MultiAgentState` (a TypedDict). Keys that travel between nodes:

| Key | Written by | Read by |
|---|---|---|
| `current_query` | `chat()` | All agents |
| `messages` | `chat()` (after a turn completes) | `_main_agent` (for context), `_rewrite_query_for_rag` |
| `query_context` | `_main_agent._analyze_query_context` | `_route_after_main`, `_rag_agent`, `_response_agent` |
| `rag_filters` | `_rag_agent._build_filters` | `_response_agent` |
| `anchored_filter_keys` | `_rag_agent._build_filters` | `_response_agent` (relaxation logic) |
| `rag_query` | `_response_agent._rewrite_query_for_rag` (after pre-validation succeeds) | `_response_agent._perform_retrieval` |
| `retrieved_documents` | `_response_agent` (after retrieval) | `_response_agent._generate_conversational_response` |
| `final_response` | `_response_agent`, or the pre-validation early exit | `chat()` (returned to caller) |
| `agent_logs` | All agents (append-only) | `chat()` (returned to caller for debugging) |
| `relaxation_notes` | `_response_agent` (during/after relaxation) | `chat()` (returned to caller) |
| `gap_follow_up` | `_response_agent` (when pre-validation finds an all-anchored gap) | `chat()` |
| `conversation_context["last_filters"]` | `_main_agent` (end of turn) | `_main_agent._analyze_query_context` (next turn's filter carryover) |

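LangGraph lets a state key declare a reducer with `Annotated[type, reducer]`, and append-only keys such as `agent_logs` are commonly modelled that way. This document does not say whether `src/agents/state.py` uses reducers or has nodes extend the lists themselves, so treat the reducer design as an assumption. The sketch uses only the standard library and imitates the merge rule rather than calling LangGraph. Keys with a reducer are combined; all other keys are overwritten by the latest node's update.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class MultiAgentStateSketch(TypedDict, total=False):
    # Simplified subset of the keys in the table above; not the real definition.
    current_query: str
    rag_filters: Dict[str, Any]
    rag_query: str
    final_response: Optional[str]
    agent_logs: Annotated[List[str], operator.add]        # append-only
    relaxation_notes: Annotated[List[str], operator.add]  # append-only


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update the way a LangGraph-style reducer would."""
    hints = get_type_hints(MultiAgentStateSketch, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(merged[key], value) if reducer and key in merged else value
    return merged
```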
## Versioning and stability

- External APIs: OpenAI and Qdrant are vendor-controlled, so we depend on their API stability. Both publish versioning policies, and we use their stable public versions.
- Internal APIs: no formal versioning. Changes to mixin signatures or agent state keys are coordinated within the same PR; the codebase is small enough that this works without extra overhead.

---

Related: [`docs/system-requirements.md`](system-requirements.md) details which external services must be available for the system to function. [`docs/architecture/05-deployment-view.md`](architecture/05-deployment-view.md) shows the deployment topology of these interfaces.
