Agent-Tool-State Contract Registry
Status: Canonical Source of Truth
Last Updated: 2025-12-06
Purpose: Developer reference for multi-agent coordination
This document defines the exact contracts between agents, tools, and shared state. Use this when:
- Adding new agents or tools
- Modifying agent behavior
- Debugging coordination issues
- Understanding "if I change X, what breaks?"
Table of Contents
- System Overview
- Agent Contracts
- Judge Decision Criteria
- Shared State (ResearchMemory)
- Tool Contracts
- Event Flow
- Break Conditions
- Dependency Matrix
System Overview
┌─────────────────────────────────────────────────────────────────────┐
│                 ORCHESTRATOR (AdvancedOrchestrator)                 │
│                                                                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐              │
│  │   Manager   │───▶│   Agents    │───▶│   Memory    │              │
│  │ (Magentic)  │    │ (ChatAgent) │    │(ResearchMem)│              │
│  └─────────────┘    └─────────────┘    └─────────────┘              │
│         │                  │                  │                     │
│         │                  ▼                  ▼                     │
│         │          ┌───────────────┐   ┌─────────────┐              │
│         └─────────▶│     Tools     │──▶│ Embeddings  │              │
│                    │(@ai_function) │   │ (ChromaDB)  │              │
│                    └───────────────┘   └─────────────┘              │
└─────────────────────────────────────────────────────────────────────┘
Agent Inventory
| Agent | File | Role | Tools |
|---|---|---|---|
| SearchAgent | magentic_agents.py | Evidence gathering | search_pubmed, search_clinical_trials, search_preprints |
| JudgeAgent | magentic_agents.py | Evidence evaluation | None (LLM only) |
| HypothesisAgent | magentic_agents.py | Mechanism generation | None (LLM only) |
| ReportAgent | magentic_agents.py | Report synthesis | get_bibliography |
| RetrievalAgent | retrieval_agent.py | Web search | search_web |
⚠️ Dead Code Warning: RetrievalAgent is implemented but NOT wired into magentic_agents.py.
The orchestrator uses only SearchAgent (PubMed, ClinicalTrials.gov, Europe PMC), not web search.
See GitHub issue #134 for the decision on whether to delete it or wire it in.
Agent Contracts
SearchAgent
Factory: create_search_agent(chat_client, domain, api_key) -> ChatAgent
Input
"Search for testosterone and libido mechanisms in peer-reviewed literature"
Output
message.text = """
Found 15 sources (12 new added to context):
- [Title 1](url): Abstract excerpt...
- [Title 2](url): Abstract excerpt...
"""
message.additional_properties = {
"evidence": [Evidence.model_dump(), ...]
}
State Access
| Operation | Key | Type | Description |
|---|---|---|---|
| READ | memory.query | str | Current research question |
| READ | memory.evidence_ids | list[str] | Existing evidence URLs |
| WRITE | memory._evidence_cache | dict[str, Evidence] | Caches Evidence objects |
| WRITE | memory.evidence_ids | list[str] | Appends new URLs |
| WRITE | embedding_service | VectorDB | Stores embeddings |
Side Effects
- Calls external APIs (PubMed, ClinicalTrials, Europe PMC)
- Deduplicates via semantic similarity (0.9 threshold)
- Stores in vector database
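The deduplication side effect can be pictured as a cosine-similarity check against embeddings that are already stored. The following is only a minimal sketch: it assumes embeddings are plain lists of floats and that a similarity of at least 0.9 counts as a duplicate (the source gives the threshold but not the comparison operator). `cosine_similarity` and `is_duplicate` are hypothetical helpers, not the project's EmbeddingService API.

```python
import math

DEDUP_THRESHOLD = 0.9  # value from this document; ">=" is an assumption


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(candidate: list[float], stored: list[list[float]]) -> bool:
    """Hypothetical check: True if any stored embedding meets the threshold."""
    return any(cosine_similarity(candidate, s) >= DEDUP_THRESHOLD for s in stored)


stored = [[1.0, 0.0], [0.0, 1.0]]
near_copy = [0.99, 0.05]  # almost parallel to the first stored vector
unrelated = [0.7, 0.7]    # about 0.71 similar to each stored vector
```

With these toy vectors, `near_copy` would be dropped as a duplicate while `unrelated` would be stored as new evidence.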
Error Behavior
- API failure → Returns "No results found for: {query}"
- Rate limit → Raises RateLimitError (caught by orchestrator)
JudgeAgent
Factory: create_judge_agent(chat_client, domain, api_key) -> ChatAgent
Input
"Evaluate if we have sufficient evidence to answer: {query}"
Output
message.text = """
## Assessment
✅ SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING.
### Scores
- Mechanism: 8/10
- Clinical: 7/10
### Reasoning
Strong evidence for testosterone-AR pathway...
"""
message.additional_properties = {
"assessment": JudgeAssessment.model_dump()
}
State Access
| Operation | Key | Type | Description |
|---|---|---|---|
| READ | Evidence from context | list[Evidence] | Passed by Manager |
| WRITE | None | - | Read-only evaluation |
Side Effects
Critical Output Signal
- "✅ SUFFICIENT EVIDENCE" → Manager delegates to ReportAgent
- "❌ INSUFFICIENT" → Manager calls SearchAgent with suggested queries
HypothesisAgent
Factory: create_hypothesis_agent(chat_client, domain, api_key) -> ChatAgent
Input
"Generate mechanistic hypotheses for: {query}"
Output
message.text = """
## Hypothesis 1 (Confidence: 75%)
**Mechanism**: Testosterone → Androgen Receptor → BDNF → Libido
**Suggested searches**: testosterone BDNF, androgen receptor signaling
## Primary Hypothesis
Testosterone → AR → dopamine release → reward pathway
## Knowledge Gaps
- Dose-response relationship unclear
"""
message.additional_properties = {
"assessment": HypothesisAssessment.model_dump()
}
State Access
| Operation | Key | Type | Description |
|---|---|---|---|
| READ | memory.query | str | Research question |
| READ | Evidence from context | list[Evidence] | Current evidence |
| WRITE | evidence_store["hypotheses"] | list | Appends hypotheses |
ReportAgent
Factory: create_report_agent(chat_client, domain, api_key) -> ChatAgent
Input
"Generate final research report for: {query}"
Output
message.text = ResearchReport.to_markdown()
message.additional_properties = {
"report": ResearchReport.model_dump()
}
State Access
| Operation | Key | Type | Description |
|---|---|---|---|
| READ | memory.get_all_evidence() | list[Evidence] | All collected evidence |
| READ | evidence_store["hypotheses"] | list | Generated hypotheses |
| READ | evidence_store["last_assessment"] | JudgeAssessment | Final assessment |
| WRITE | evidence_store["final_report"] | ResearchReport | Stores report |
Tool: get_bibliography()
@ai_function
def get_bibliography() -> str:
    """Returns formatted reference list from all evidence."""
    evidence = state.memory.get_all_evidence()
    return format_as_references(evidence)
Judge Decision Criteria
Scoring Dimensions
Mechanism Score (0-10)
| Score | Meaning |
|---|---|
| 0-3 | Minimal mechanism understanding |
| 4-5 | Partial mechanism (some targets identified) |
| 6-7 | Clear mechanism (targets + pathways) |
| 8-9 | Comprehensive (multiple pathways, regulation) |
| 10 | Complete understanding |
Clinical Evidence Score (0-10)
| Score | Meaning |
|---|---|
| 0-3 | Preclinical only or weak human evidence |
| 4-5 | Some human evidence (small trials, case reports) |
| 6-7 | Strong human evidence (RCTs) |
| 8-9 | Robust (meta-analysis, large RCTs) |
| 10 | Definitive clinical proof |
Sufficiency Decision
if (
    confidence >= 0.7
    and mechanism_score >= 6
    and clinical_evidence_score >= 6
):
    sufficient = True
    recommendation = "synthesize"
else:
    sufficient = False
    recommendation = "continue"
    next_search_queries = ["suggested query 1", "suggested query 2"]
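To make the boundary behavior of the rule above concrete, here is a small runnable sketch. `judge_decision` is a hypothetical wrapper name; the thresholds and the `>=` comparisons are taken directly from the snippet above, so a score sitting exactly on a threshold passes.

```python
def judge_decision(
    confidence: float, mechanism_score: int, clinical_evidence_score: int
) -> tuple[bool, str]:
    """Apply the sufficiency rule and return (sufficient, recommendation)."""
    sufficient = (
        confidence >= 0.7
        and mechanism_score >= 6
        and clinical_evidence_score >= 6
    )
    return sufficient, ("synthesize" if sufficient else "continue")


strong = judge_decision(0.85, 8, 7)         # all three criteria met
on_threshold = judge_decision(0.7, 6, 6)    # exactly on every threshold
weak_clinical = judge_decision(0.9, 8, 5)   # one failing score is enough to continue
```

Because the three conditions are joined with `and`, high confidence cannot compensate for a clinical score below 6.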
JudgeAssessment Model
class JudgeAssessment(BaseModel):
    details: AssessmentDetails
    mechanism_score: int
    mechanism_reasoning: str
    clinical_evidence_score: int
    clinical_reasoning: str
    drug_candidates: list[str]
    key_findings: list[str]
    sufficient: bool
    confidence: float
    recommendation: Literal["continue", "synthesize"]
    next_search_queries: list[str]
    reasoning: str
Shared State (ResearchMemory)
Initialization
state = init_magentic_state(query, embedding_service)
Memory Structure
class ResearchMemory:
    query: str
    hypotheses: list[Hypothesis]
    conflicts: list[Conflict]
    evidence_ids: list[str]
    _evidence_cache: dict[str, Evidence]
    iteration_count: int
    _embedding_service: EmbeddingServiceProtocol
Key Methods
| Method | Returns | Description |
|---|---|---|
| store_evidence(evidence) | list[str] | Store with dedup, return new IDs |
| get_all_evidence() | list[Evidence] | All accumulated evidence |
| get_relevant_evidence(n) | list[Evidence] | Top N by semantic similarity |
| get_context_summary() | str | Markdown summary for fallback |
| add_hypothesis(h) | None | Append hypothesis |
| get_confirmed_hypotheses() | list[Hypothesis] | Confidence > 0.8 |
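The return-value contracts in the table can be illustrated with a toy stand-in. This is a sketch under loud assumptions: `ToyMemory` is hypothetical, it deduplicates by exact URL rather than by embeddings, it stores URL strings instead of Evidence objects, and the `Hypothesis` fields (`statement`, `confidence`) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str     # assumed field name
    confidence: float  # assumed field name


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in for ResearchMemory (no embeddings)."""

    evidence_ids: list[str] = field(default_factory=list)
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def store_evidence(self, urls: list[str]) -> list[str]:
        """Store unseen URLs and return only the newly added IDs."""
        new_ids = []
        for url in urls:
            if url not in self.evidence_ids:
                self.evidence_ids.append(url)
                new_ids.append(url)
        return new_ids

    def add_hypothesis(self, h: Hypothesis) -> None:
        self.hypotheses.append(h)

    def get_confirmed_hypotheses(self) -> list[Hypothesis]:
        """Strictly greater than 0.8, matching the table above."""
        return [h for h in self.hypotheses if h.confidence > 0.8]


memory = ToyMemory()
first = memory.store_evidence(["https://example.org/a", "https://example.org/b"])
second = memory.store_evidence(["https://example.org/b", "https://example.org/c"])
memory.add_hypothesis(Hypothesis("AR pathway", 0.8))
memory.add_hypothesis(Hypothesis("Dopamine release", 0.85))
confirmed = memory.get_confirmed_hypotheses()
```

Callers that report "12 new added to context" would rely on the returned list of new IDs, not on the size of the input batch.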
State Flow
User Query
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ ResearchMemory initialized (empty)                          │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
SearchAgent ──▶ store_evidence([Evidence]) ──▶ evidence_ids grows
    │
    ▼
JudgeAgent ──▶ reads evidence from context ──▶ returns assessment
    │
    ├─── INSUFFICIENT ──▶ SearchAgent (with next_search_queries)
    │
    └─── SUFFICIENT ──▶ ReportAgent
                            │
                            ▼
              get_all_evidence() ──▶ ResearchReport
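The branch after JudgeAgent in the flow above can be sketched as a small text router. `next_agent` is a hypothetical helper, not the Manager's actual logic. One detail worth noticing: the string "INSUFFICIENT" contains "SUFFICIENT", so if the negative signal were ever phrased "INSUFFICIENT EVIDENCE", a plain substring test for "SUFFICIENT EVIDENCE" would also match it. The sketch therefore checks the negative marker first.

```python
def next_agent(judge_text: str) -> str:
    """Hypothetical router: pick the next agent from the judge's text signal."""
    # Check the negative marker first because "INSUFFICIENT" contains "SUFFICIENT".
    if "INSUFFICIENT" in judge_text:
        return "SearchAgent"
    if "SUFFICIENT EVIDENCE" in judge_text:
        return "ReportAgent"
    # Assumption: an unrecognized response is treated as "keep searching".
    return "SearchAgent"


approved = next_agent("SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING.")
rejected = next_agent("INSUFFICIENT EVIDENCE. Try: testosterone BDNF")
```

Routing on the structured `assessment.sufficient` field in `additional_properties` would avoid the substring issue entirely, if the orchestrator has access to it.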
Tool Contracts
search_pubmed
File: src/agents/tools.py
@ai_function
async def search_pubmed(query: str, max_results: int = 10) -> str:
    """Search PubMed for biomedical research papers."""
| Aspect | Value |
|---|---|
| External API | NCBI E-utilities |
| Rate Limit | 3 requests/sec (10 requests/sec with NCBI_API_KEY) |
| Output | Formatted string with titles/abstracts |
| Side Effect | Stores Evidence in memory |
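One way a caller might stay under the PubMed rate limit is to space requests by a minimum interval. This is a sketch only: `pubmed_min_interval` and `MinIntervalLimiter` are hypothetical names, and nothing here is taken from the actual tool implementation. Each call reserves its time slot before awaiting, so concurrent callers on one event loop do not need a lock.

```python
import asyncio
import time


def pubmed_min_interval(has_api_key: bool) -> float:
    """Seconds between requests: 3/sec without a key, 10/sec with NCBI_API_KEY."""
    return 1.0 / (10 if has_api_key else 3)


class MinIntervalLimiter:
    """Hypothetical limiter: spaces calls at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._next_slot = 0.0

    async def wait(self) -> None:
        now = time.monotonic()
        slot = max(now, self._next_slot)
        # Reserve the following slot before awaiting (no await in between).
        self._next_slot = slot + self.interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def demo() -> float:
    """Make three spaced 'requests' and return the elapsed time in seconds."""
    limiter = MinIntervalLimiter(pubmed_min_interval(has_api_key=True))
    start = time.monotonic()
    for _ in range(3):
        await limiter.wait()  # a real caller would issue the HTTP request here
    return time.monotonic() - start
```

Running `asyncio.run(demo())` should take roughly 0.2 seconds: the first call proceeds immediately and the next two each wait about 0.1 seconds.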
search_clinical_trials
@ai_function
async def search_clinical_trials(query: str, max_results: int = 10) -> str:
    """Search ClinicalTrials.gov for clinical studies."""
| Aspect | Value |
|---|---|
| External API | ClinicalTrials.gov (uses requests, not httpx) |
| Rate Limit | Standard HTTP limits |
| Output | Trial status, conditions, interventions |
| Side Effect | Stores Evidence in memory |
search_preprints
@ai_function
async def search_preprints(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
| Aspect | Value |
|---|---|
| External API | Europe PMC REST API |
| Output | Papers with PMIDs, DOIs |
| Side Effect | Stores Evidence in memory |
get_bibliography
@ai_function
def get_bibliography() -> str:
    """Get formatted reference list from all collected evidence."""
| Aspect | Value |
|---|---|
| External API | None |
| Reads | memory.get_all_evidence() |
| Output | Numbered reference list |
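The "numbered reference list" output could look something like the sketch below. The Evidence model is not shown in this document, so the example assumes each item is a dict with hypothetical `title` and `url` keys. This `format_as_references` is an illustrative stand-in, not the project's implementation, and the line format is an assumption.

```python
def format_as_references(evidence: list[dict]) -> str:
    """Hypothetical formatter: one numbered line per evidence item."""
    lines = [
        f"{i}. {item['title']}. {item['url']}"
        for i, item in enumerate(evidence, start=1)
    ]
    return "\n".join(lines)


sample = [
    {"title": "Title 1", "url": "https://example.org/1"},
    {"title": "Title 2", "url": "https://example.org/2"},
]
references = format_as_references(sample)
```

Numbering starts at 1 so that in-text citations such as [1] in the report can map directly onto list positions.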
search_web
@ai_function
async def search_web(query: str, max_results: int = 10) -> str:
    """Search web using DuckDuckGo."""
| Aspect | Value |
|---|---|
| External API | DuckDuckGo |
| Output | Web results with URLs |
| Side Effect | Stores Evidence in memory |
Event Flow
AgentEvent Types
| Type | When Emitted | Data |
|---|---|---|
| started | Workflow begins | None |
| thinking | Before first agent event | None |
| searching | SearchAgent active | agent_id |
| search_complete | SearchAgent done | evidence count |
| judging | JudgeAgent active | agent_id |
| judge_complete | JudgeAgent done | assessment |
| hypothesizing | HypothesisAgent active | agent_id |
| synthesizing | ReportAgent active | agent_id |
| streaming | Real-time text | text, agent_id |
| complete | Workflow done | report, iterations |
| error | Error occurred | error message |
| progress | Status update | status message |
Typical Sequence
1. started → "Starting research..."
2. progress → "Loading embedding service..."
3. thinking → "Multi-agent reasoning..."
4. streaming (searcher) → "Found 15 sources..."
5. streaming (judge) → "✅ SUFFICIENT..."
6. streaming (reporter) → "## Research Report..."
7. complete → Final report
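A consumer of this event stream might look like the sketch below. The event type names come from the table above; everything else is an assumption for illustration, including the `AgentEvent` field names (`type`, `message`, `data`) and the `final_report` helper.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Event type names as listed in the AgentEvent table.
EVENT_TYPES = {
    "started", "thinking", "searching", "search_complete", "judging",
    "judge_complete", "hypothesizing", "synthesizing", "streaming",
    "complete", "error", "progress",
}


@dataclass
class AgentEvent:
    """Hypothetical event shape; the field names are assumptions."""

    type: str
    message: str = ""
    data: Optional[Any] = None

    def __post_init__(self) -> None:
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type!r}")


def final_report(events: list[AgentEvent]) -> Optional[str]:
    """Return the message of the first `complete` event, or None if absent."""
    for event in events:
        if event.type == "complete":
            return event.message
    return None


run = [
    AgentEvent("started", "Starting research..."),
    AgentEvent("streaming", "Found 15 sources...", {"agent_id": "searcher"}),
    AgentEvent("complete", "## Research Report..."),
]
report = final_report(run)
```

Validating the type at construction time means a renamed or misspelled event fails where it is emitted, not silently in the UI.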
Break Conditions
The orchestrator exits when ANY of these occur:
1. Judge Approval ✅
if "SUFFICIENT EVIDENCE" in judge_response:
2. Max Rounds Reached
max_round_count = 5
if not reporter_ran:
    async for event in _synthesize_fallback(iteration, "max_rounds"):
        yield event
3. Timeout ⏱️
try:
    async with asyncio.timeout(settings.advanced_timeout):
        async for event in workflow.run_stream(task):
            yield event
except TimeoutError:
    async for event in _synthesize_fallback(iteration, "timeout"):
        yield event
4. Token Budget 💾
Dependency Matrix
"If I change X, what breaks?"
| Changed Component | Affected Components | Impact |
|---|---|---|
| Evidence model | All agents, Memory, Tools | HIGH - Core data type |
| JudgeAssessment | Judge, Orchestrator | HIGH - Decision flow |
| ResearchMemory | All agents | HIGH - Shared state |
| search_pubmed | SearchAgent | MEDIUM - One tool |
| get_bibliography | ReportAgent | MEDIUM - References |
| AgentEvent | Orchestrator, UI | MEDIUM - Streaming |
| EmbeddingService | Memory, Dedup | MEDIUM - Similarity |
| Judge thresholds | Workflow loop count | LOW - Tuning |
| System prompts | Agent behavior | LOW - Prompt eng |
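For quick lookups, the matrix above can be kept as plain data. The sketch below copies the rows verbatim. The helper names `what_breaks` and `components_with_impact` are hypothetical, and nothing like this is claimed to exist in the codebase.

```python
# Rows copied from the dependency matrix: component -> (affected, impact).
DEPENDENCY_MATRIX: dict[str, tuple[list[str], str]] = {
    "Evidence model": (["All agents", "Memory", "Tools"], "HIGH"),
    "JudgeAssessment": (["Judge", "Orchestrator"], "HIGH"),
    "ResearchMemory": (["All agents"], "HIGH"),
    "search_pubmed": (["SearchAgent"], "MEDIUM"),
    "get_bibliography": (["ReportAgent"], "MEDIUM"),
    "AgentEvent": (["Orchestrator", "UI"], "MEDIUM"),
    "EmbeddingService": (["Memory", "Dedup"], "MEDIUM"),
    "Judge thresholds": (["Workflow loop count"], "LOW"),
    "System prompts": (["Agent behavior"], "LOW"),
}


def what_breaks(component: str) -> str:
    """Hypothetical helper: summarize what changing one component affects."""
    affected, impact = DEPENDENCY_MATRIX[component]
    return f"{component}: {impact} impact on {', '.join(affected)}"


def components_with_impact(level: str) -> list[str]:
    """List components whose change has the given impact level."""
    return [
        name for name, (_, impact) in DEPENDENCY_MATRIX.items() if impact == level
    ]


summary = what_breaks("JudgeAssessment")
high_impact = components_with_impact("HIGH")
```

A lookup like this could back a pre-merge check that flags pull requests touching HIGH-impact components.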
Agent Dependencies
SearchAgent
├── REQUIRES: MagenticState, EmbeddingService
├── WRITES TO: ResearchMemory (evidence)
└── NO DEPS ON: Other agents
JudgeAgent
├── REQUIRES: Evidence context (from Manager)
├── WRITES TO: Nothing
└── CONTROLS: SearchAgent (continue) or ReportAgent (synthesize)
HypothesisAgent
├── REQUIRES: Evidence context
├── WRITES TO: evidence_store["hypotheses"]
└── NO DEPS ON: Other agents
ReportAgent
├── REQUIRES: ResearchMemory, hypotheses, assessment
├── READS FROM: All prior state
└── WRITES TO: evidence_store["final_report"]
Critical Thresholds
| Threshold | Value | Location | Impact |
|---|---|---|---|
| Confidence threshold | 0.7 (70%) | JudgeAssessment | Sufficiency decision |
| Mechanism score threshold | 6 | Judge criteria | Sufficiency decision |
| Clinical score threshold | 6 | Judge criteria | Sufficiency decision |
| Max manager rounds | 5 | AdvancedOrchestrator | Loop termination |
| Max stall count | 3 | MagenticBuilder | Stall detection |
| Dedup similarity | 0.9 | EmbeddingService | Evidence dedup |
| Max evidence for judge | 30 | prompts/judge.py | Context limit |
| Confirmed hypothesis | 0.8 | ResearchMemory | High-confidence filter |
| Timeout | 600s | settings.advanced_timeout | Workflow timeout |
Developer Checklist
When modifying agents:
When adding new agents:
When changing Judge criteria:
This document is the source of truth for multi-agent coordination.