# Agent-Tool-State Contract Registry
> **Status**: Canonical Source of Truth
> **Last Updated**: 2025-12-06
> **Purpose**: Developer reference for multi-agent coordination
This document defines the exact contracts between agents, tools, and shared state. Use this when:
- Adding new agents or tools
- Modifying agent behavior
- Debugging coordination issues
- Understanding "if I change X, what breaks?"
---
## Table of Contents
1. [System Overview](#system-overview)
2. [Agent Contracts](#agent-contracts)
3. [Judge Decision Criteria](#judge-decision-criteria)
4. [Shared State (ResearchMemory)](#shared-state-researchmemory)
5. [Tool Contracts](#tool-contracts)
6. [Event Flow](#event-flow)
7. [Break Conditions](#break-conditions)
8. [Dependency Matrix](#dependency-matrix)
9. [Critical Thresholds](#critical-thresholds)
10. [Developer Checklist](#developer-checklist)
---
## System Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│                 ORCHESTRATOR (AdvancedOrchestrator)                 │
│                                                                     │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────┐             │
│  │   Manager   │───▶│    Agents    │───▶│   Memory    │             │
│  │ (Magentic)  │    │ (ChatAgent)  │    │(ResearchMem)│             │
│  └─────────────┘    └──────────────┘    └─────────────┘             │
│         │                  │                   │                    │
│         │                  ▼                   ▼                    │
│         │           ┌──────────────┐    ┌─────────────┐             │
│         └──────────▶│    Tools     │───▶│ Embeddings  │             │
│                     │(@ai_function)│    │ (ChromaDB)  │             │
│                     └──────────────┘    └─────────────┘             │
└─────────────────────────────────────────────────────────────────────┘
```
### Agent Inventory
| Agent | File | Role | Tools |
|-------|------|------|-------|
| **SearchAgent** | `magentic_agents.py` | Evidence gathering | search_pubmed, search_clinical_trials, search_preprints |
| **JudgeAgent** | `magentic_agents.py` | Evidence evaluation | None (LLM only) |
| **HypothesisAgent** | `magentic_agents.py` | Mechanism generation | None (LLM only) |
| **ReportAgent** | `magentic_agents.py` | Report synthesis | get_bibliography |
| **RetrievalAgent** | `retrieval_agent.py` | Web search | search_web |
> **⚠️ Dead Code Warning:** RetrievalAgent is implemented but NOT wired into `magentic_agents.py`.
> The orchestrator only uses SearchAgent (PubMed, ClinicalTrials, Europe PMC), not web search.
> See GitHub issue #134 for decision to delete or wire in.
---
## Agent Contracts
### SearchAgent
**Factory**: `create_search_agent(chat_client, domain, api_key) -> ChatAgent`
#### Input
```python
# Manager instruction (string)
"Search for testosterone and libido mechanisms in peer-reviewed literature"
```
#### Output
```python
# ChatMessage with:
message.text = """
Found 15 sources (12 new added to context):
- [Title 1](url): Abstract excerpt...
- [Title 2](url): Abstract excerpt...
"""
message.additional_properties = {
    "evidence": [Evidence.model_dump(), ...]
}
```
#### State Access
| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.query` | str | Current research question |
| **READ** | `memory.evidence_ids` | list[str] | Existing evidence URLs |
| **WRITE** | `memory._evidence_cache` | dict[str, Evidence] | Caches Evidence objects |
| **WRITE** | `memory.evidence_ids` | list[str] | Appends new URLs |
| **WRITE** | `embedding_service` | VectorDB | Stores embeddings |
#### Side Effects
1. Calls external APIs (PubMed, ClinicalTrials, Europe PMC)
2. Deduplicates via semantic similarity (0.9 threshold)
3. Stores in vector database
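
The semantic deduplication in step 2 can be pictured as a cosine-similarity check against everything already stored. The sketch below is an illustration under stated assumptions, not the project's implementation: embeddings are plain Python lists, a similarity at or above 0.9 counts as a duplicate (whether the real comparison is inclusive is not stated here), and the names `cosine_similarity` and `filter_new` are hypothetical.

```python
import math

DEDUP_THRESHOLD = 0.9  # value taken from this document; the rest is illustrative


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_new(candidates: dict[str, list[float]],
               stored: dict[str, list[float]],
               threshold: float = DEDUP_THRESHOLD) -> list[str]:
    """Return candidate URLs that are not near-duplicates of anything kept so far."""
    kept = dict(stored)  # shallow copy so the caller's store is not mutated
    new_urls: list[str] = []
    for url, vec in candidates.items():
        if url in kept:
            continue  # exact URL already known
        if any(cosine_similarity(vec, other) >= threshold for other in kept.values()):
            continue  # semantically too close to existing evidence
        kept[url] = vec  # later candidates are also compared against this one
        new_urls.append(url)
    return new_urls
```

Because accepted candidates are added to `kept` as the loop runs, two near-identical results arriving in the same batch are also collapsed into one.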
#### Error Behavior
- API failure → Returns "No results found for: {query}"
- Rate limit → Raises `RateLimitError` (caught by orchestrator)
---
### JudgeAgent
**Factory**: `create_judge_agent(chat_client, domain, api_key) -> ChatAgent`
#### Input
```python
# Manager instruction with evidence context
"Evaluate if we have sufficient evidence to answer: {query}"
# + Evidence list in context
```
#### Output
```python
# ChatMessage with:
message.text = """
## Assessment
✅ SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING.
### Scores
- Mechanism: 8/10
- Clinical: 7/10
### Reasoning
Strong evidence for testosterone-AR pathway...
"""
message.additional_properties = {
    "assessment": JudgeAssessment.model_dump()
}
```
#### State Access
| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | Evidence from context | list[Evidence] | Passed by Manager |
| **WRITE** | None | - | Read-only evaluation |
#### Side Effects
- None (pure evaluation)
#### Critical Output Signal
- `"✅ SUFFICIENT EVIDENCE"` → Manager delegates to ReportAgent
- `"❌ INSUFFICIENT"` → Manager calls SearchAgent with suggested queries
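
One detail matters if these signals are ever matched programmatically: the string `SUFFICIENT EVIDENCE` is a substring of `INSUFFICIENT EVIDENCE`, so a plain `in` check would treat a negative verdict phrased that way as approval. The sketch below shows one way to avoid that with word boundaries. The function name `next_agent` and the choice to let a negative signal win are assumptions made for illustration, not the Manager's actual logic.

```python
import re
from typing import Optional

# \b stops "SUFFICIENT" from matching inside "INSUFFICIENT".
_SUFFICIENT = re.compile(r"\bSUFFICIENT EVIDENCE\b")
_INSUFFICIENT = re.compile(r"\bINSUFFICIENT\b")


def next_agent(judge_text: str) -> Optional[str]:
    """Map a judge message to the agent that should run next (hypothetical helper)."""
    if _INSUFFICIENT.search(judge_text):
        return "SearchAgent"  # negative signal wins if both appear
    if _SUFFICIENT.search(judge_text):
        return "ReportAgent"
    return None  # no recognizable verdict
```

Where possible, reading `sufficient` or `recommendation` from the structured assessment in `additional_properties` sidesteps text matching altogether.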
---
### HypothesisAgent
**Factory**: `create_hypothesis_agent(chat_client, domain, api_key) -> ChatAgent`
#### Input
```python
# Manager instruction
"Generate mechanistic hypotheses for: {query}"
```
#### Output
```python
# ChatMessage with:
message.text = """
## Hypothesis 1 (Confidence: 75%)
**Mechanism**: Testosterone → Androgen Receptor → BDNF → Libido
**Suggested searches**: testosterone BDNF, androgen receptor signaling
## Primary Hypothesis
Testosterone → AR → dopamine release → reward pathway
## Knowledge Gaps
- Dose-response relationship unclear
"""
message.additional_properties = {
    "assessment": HypothesisAssessment.model_dump()
}
```
#### State Access
| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.query` | str | Research question |
| **READ** | Evidence from context | list[Evidence] | Current evidence |
| **WRITE** | `evidence_store["hypotheses"]` | list | Appends hypotheses |
---
### ReportAgent
**Factory**: `create_report_agent(chat_client, domain, api_key) -> ChatAgent`
#### Input
```python
# Manager instruction
"Generate final research report for: {query}"
```
#### Output
```python
# ChatMessage with:
message.text = ResearchReport.to_markdown() # Full markdown report
message.additional_properties = {
    "report": ResearchReport.model_dump()
}
```
#### State Access
| Operation | Key | Type | Description |
|-----------|-----|------|-------------|
| **READ** | `memory.get_all_evidence()` | list[Evidence] | All collected evidence |
| **READ** | `evidence_store["hypotheses"]` | list | Generated hypotheses |
| **READ** | `evidence_store["last_assessment"]` | JudgeAssessment | Final assessment |
| **WRITE** | `evidence_store["final_report"]` | ResearchReport | Stores report |
#### Tool: get_bibliography()
```python
@ai_function
def get_bibliography() -> str:
    """Returns formatted reference list from all evidence."""
    evidence = state.memory.get_all_evidence()
    return format_as_references(evidence)
```
---
## Judge Decision Criteria
### Scoring Dimensions
**Mechanism Score (0-10)**
| Score | Meaning |
|-------|---------|
| 0-3 | Minimal mechanism understanding |
| 4-5 | Partial mechanism (some targets identified) |
| 6-7 | Clear mechanism (targets + pathways) |
| 8-9 | Comprehensive (multiple pathways, regulation) |
| 10 | Complete understanding |
**Clinical Evidence Score (0-10)**
| Score | Meaning |
|-------|---------|
| 0-3 | Preclinical only or weak human evidence |
| 4-5 | Some human evidence (small trials, case reports) |
| 6-7 | Strong human evidence (RCTs) |
| 8-9 | Robust (meta-analysis, large RCTs) |
| 10 | Definitive clinical proof |
### Sufficiency Decision
```python
# SUFFICIENT (recommendation="synthesize")
if (
    confidence >= 0.7  # 70%
    and mechanism_score >= 6
    and clinical_evidence_score >= 6
):
    sufficient = True
    recommendation = "synthesize"
# INSUFFICIENT (recommendation="continue")
else:
    sufficient = False
    recommendation = "continue"
    next_search_queries = ["suggested query 1", "suggested query 2"]
```
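
As a worked example of the rule above, the following sketch wraps the three conditions in a function and evaluates a few boundary cases. The names `Decision` and `decide` are hypothetical. Only the thresholds (0.7, 6, 6) come from this document.

```python
from typing import Literal, NamedTuple


class Decision(NamedTuple):
    sufficient: bool
    recommendation: Literal["continue", "synthesize"]


def decide(confidence: float, mechanism_score: int,
           clinical_evidence_score: int) -> Decision:
    """All three thresholds must be met; failing any one means 'continue'."""
    ok = (
        confidence >= 0.7
        and mechanism_score >= 6
        and clinical_evidence_score >= 6
    )
    return Decision(ok, "synthesize" if ok else "continue")


print(decide(0.85, 8, 7))  # sufficient=True, recommendation='synthesize'
print(decide(0.90, 9, 5))  # sufficient=False: clinical score below 6
print(decide(0.69, 10, 10))  # sufficient=False: confidence below 0.7
```

Note that the conditions are conjunctive: a perfect mechanism score cannot compensate for weak clinical evidence or low confidence.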
### JudgeAssessment Model
```python
class JudgeAssessment(BaseModel):
    details: AssessmentDetails
    mechanism_score: int            # 0-10
    mechanism_reasoning: str        # min 10 chars
    clinical_evidence_score: int    # 0-10
    clinical_reasoning: str         # min 10 chars
    drug_candidates: list[str]
    key_findings: list[str]
    sufficient: bool                # Ready for synthesis?
    confidence: float               # 0.0-1.0
    recommendation: Literal["continue", "synthesize"]
    next_search_queries: list[str]  # If continue
    reasoning: str                  # min 20 chars
```
---
## Shared State (ResearchMemory)
### Initialization
```python
# Per-query isolation via ContextVar
state = init_magentic_state(query, embedding_service)
# Returns MagenticState wrapping ResearchMemory
```
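
To illustrate what per-query isolation via `ContextVar` buys, here is a minimal standard-library sketch. `DemoState`, `init_state`, and `run_query` are hypothetical stand-ins (the real `init_magentic_state` also takes an embedding service). The point is that each asyncio task gets its own copy of the context, so two concurrent queries never see each other's state.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class DemoState:  # hypothetical stand-in for MagenticState
    query: str
    evidence_ids: list[str] = field(default_factory=list)


_state: ContextVar[DemoState] = ContextVar("demo_state")


def init_state(query: str) -> DemoState:
    """Bind a fresh state object to the current context."""
    state = DemoState(query)
    _state.set(state)
    return state


async def run_query(query: str) -> tuple[str, list[str]]:
    init_state(query)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    _state.get().evidence_ids.append(f"url-for-{query}")
    await asyncio.sleep(0)
    state = _state.get()
    return state.query, state.evidence_ids


async def main() -> list[tuple[str, list[str]]]:
    # gather() wraps each coroutine in a Task, and each Task copies the context.
    return await asyncio.gather(run_query("a"), run_query("b"))


results = asyncio.run(main())
print(results)  # [('a', ['url-for-a']), ('b', ['url-for-b'])]
```

A module-level global in place of the `ContextVar` would let the second `init_state` call overwrite the first query's state mid-flight.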
### Memory Structure
```python
class ResearchMemory:
    query: str                            # Research question
    hypotheses: list[Hypothesis]          # Generated hypotheses
    conflicts: list[Conflict]             # Detected conflicts
    evidence_ids: list[str]               # URLs (unique keys)
    _evidence_cache: dict[str, Evidence]  # URL -> Evidence
    iteration_count: int                  # Current iteration
    _embedding_service: EmbeddingServiceProtocol
```
### Key Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `store_evidence(evidence)` | `list[str]` | Store with dedup, return new IDs |
| `get_all_evidence()` | `list[Evidence]` | All accumulated evidence |
| `get_relevant_evidence(n)` | `list[Evidence]` | Top N by semantic similarity |
| `get_context_summary()` | `str` | Markdown summary for fallback |
| `add_hypothesis(h)` | `None` | Append hypothesis |
| `get_confirmed_hypotheses()` | `list[Hypothesis]` | Confidence > 0.8 |
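
When unit-testing an agent it can help to have a tiny in-memory stand-in that honours the same method contracts. The sketch below is one possible shape, with loud assumptions: `FakeEvidence` and `FakeHypothesis` are hypothetical minimal models, the methods are synchronous (the real ones may not be), and deduplication is by exact URL only, with no embedding service involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEvidence:  # hypothetical; the real Evidence model has more fields
    url: str
    title: str


@dataclass
class FakeHypothesis:  # hypothetical
    statement: str
    confidence: float


@dataclass
class InMemoryResearchMemory:
    query: str
    evidence_ids: list[str] = field(default_factory=list)
    hypotheses: list[FakeHypothesis] = field(default_factory=list)
    _evidence_cache: dict[str, FakeEvidence] = field(default_factory=dict)

    def store_evidence(self, evidence: list[FakeEvidence]) -> list[str]:
        """Store unseen items and return only the newly added IDs (URLs)."""
        new_ids: list[str] = []
        for item in evidence:
            if item.url in self._evidence_cache:
                continue
            self._evidence_cache[item.url] = item
            self.evidence_ids.append(item.url)
            new_ids.append(item.url)
        return new_ids

    def get_all_evidence(self) -> list[FakeEvidence]:
        return [self._evidence_cache[url] for url in self.evidence_ids]

    def add_hypothesis(self, hypothesis: FakeHypothesis) -> None:
        self.hypotheses.append(hypothesis)

    def get_confirmed_hypotheses(self) -> list[FakeHypothesis]:
        # Strictly greater than 0.8, matching the table above.
        return [h for h in self.hypotheses if h.confidence > 0.8]
```

The return value of `store_evidence` is what lets SearchAgent report "15 sources (12 new added to context)": the total comes from the search, the new count from the returned list.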
### State Flow
```
User Query
     │
     ▼
┌─────────────────────────────────────────────────────────────┐
│  ResearchMemory initialized (empty)                         │
└─────────────────────────────────────────────────────────────┘
     │
     ▼
SearchAgent ──▶ store_evidence([Evidence]) ──▶ evidence_ids grows
     │
     ▼
JudgeAgent ──▶ reads evidence from context ──▶ returns assessment
     │
     ├─── INSUFFICIENT ──▶ SearchAgent (with next_search_queries)
     │
     └─── SUFFICIENT ──▶ ReportAgent
                              │
                              ▼
                    get_all_evidence() ──▶ ResearchReport
```
---
## Tool Contracts
### search_pubmed
**File**: `src/agents/tools.py`
```python
@ai_function
async def search_pubmed(query: str, max_results: int = 10) -> str:
    """Search PubMed for biomedical research papers."""
```
| Aspect | Value |
|--------|-------|
| External API | NCBI E-utilities |
| Rate Limit | 3/sec (10/sec with NCBI_API_KEY) |
| Output | Formatted string with titles/abstracts |
| Side Effect | Stores Evidence in memory |
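
The rate limits in the table can be respected client-side by spacing requests evenly. The following is a minimal sketch of that idea using only the standard library. `MinIntervalLimiter` and `pubmed_rate` are hypothetical names, and this is not a description of how `search_pubmed` actually throttles itself.

```python
import asyncio
import time


def pubmed_rate(env: dict) -> float:
    """Requests per second from the table: 10 with NCBI_API_KEY set, else 3."""
    return 10.0 if env.get("NCBI_API_KEY") else 3.0


class MinIntervalLimiter:
    """Allow at most `rate` acquisitions per second by spacing them evenly."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time at which the next call may start

    async def acquire(self) -> None:
        async with self._lock:  # serialize waiters so slots are handed out in order
            wait = self._next_slot - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._next_slot = time.monotonic() + self._interval


async def demo() -> float:
    limiter = MinIntervalLimiter(rate=50)  # fast rate so the demo finishes quickly
    start = time.monotonic()
    for _ in range(3):
        await limiter.acquire()  # a real caller would issue its HTTP request here
    return time.monotonic() - start


elapsed = asyncio.run(demo())
```

With `rate=pubmed_rate(os.environ)` the same limiter would space calls roughly 333 ms apart without a key and 100 ms apart with one.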
### search_clinical_trials
```python
@ai_function
async def search_clinical_trials(query: str, max_results: int = 10) -> str:
    """Search ClinicalTrials.gov for clinical studies."""
```
| Aspect | Value |
|--------|-------|
| External API | ClinicalTrials.gov (uses `requests` not httpx) |
| Rate Limit | Standard HTTP limits |
| Output | Trial status, conditions, interventions |
| Side Effect | Stores Evidence in memory |
### search_preprints
```python
@ai_function
async def search_preprints(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
```
| Aspect | Value |
|--------|-------|
| External API | Europe PMC REST API |
| Output | Papers with PMIDs, DOIs |
| Side Effect | Stores Evidence in memory |
### get_bibliography
```python
@ai_function
def get_bibliography() -> str:
    """Get formatted reference list from all collected evidence."""
```
| Aspect | Value |
|--------|-------|
| External API | None |
| Reads | `memory.get_all_evidence()` |
| Output | Numbered reference list |
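
For orientation, a numbered reference list could be produced along these lines. This is a sketch only: `Ref` is a hypothetical two-field view of Evidence, the exact line format is an assumption, and the empty-list message is invented for the example. The real `format_as_references` helper is not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class Ref:  # hypothetical minimal view of an Evidence item
    title: str
    url: str


def format_as_references(evidence: list[Ref]) -> str:
    """Render evidence as a 1-based numbered list, one reference per line."""
    if not evidence:
        return "No references collected."  # assumed placeholder text
    return "\n".join(
        f"{number}. {item.title}. {item.url}"
        for number, item in enumerate(evidence, start=1)
    )


print(format_as_references([Ref("Paper A", "https://example.org/a"),
                            Ref("Paper B", "https://example.org/b")]))
# 1. Paper A. https://example.org/a
# 2. Paper B. https://example.org/b
```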
### search_web
```python
@ai_function
async def search_web(query: str, max_results: int = 10) -> str:
    """Search web using DuckDuckGo."""
```
| Aspect | Value |
|--------|-------|
| External API | DuckDuckGo |
| Output | Web results with URLs |
| Side Effect | Stores Evidence in memory |
---
## Event Flow
### AgentEvent Types
| Type | When Emitted | Data |
|------|--------------|------|
| `started` | Workflow begins | None |
| `thinking` | Before first agent event | None |
| `searching` | SearchAgent active | agent_id |
| `search_complete` | SearchAgent done | evidence count |
| `judging` | JudgeAgent active | agent_id |
| `judge_complete` | JudgeAgent done | assessment |
| `hypothesizing` | HypothesisAgent active | agent_id |
| `synthesizing` | ReportAgent active | agent_id |
| `streaming` | Real-time text | text, agent_id |
| `complete` | Workflow done | report, iterations |
| `error` | Error occurred | error message |
| `progress` | Status update | status message |
### Typical Sequence
```
1. started → "Starting research..."
2. progress → "Loading embedding service..."
3. thinking → "Multi-agent reasoning..."
4. streaming (searcher) → "Found 15 sources..."
5. streaming (judge) → "✅ SUFFICIENT..."
6. streaming (reporter) → "## Research Report..."
7. complete → Final report
```
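
A consumer of this stream (a UI, for instance) mostly needs to dispatch on the event type and stop at a terminal event. The sketch below shows that loop against a fake workflow. `DemoEvent`, `fake_workflow`, and `collect_report` are hypothetical, and the real `AgentEvent` class may carry its payload differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Optional


@dataclass
class DemoEvent:  # hypothetical shape, loosely following the table above
    type: str
    message: str = ""
    data: Optional[Any] = None


async def fake_workflow() -> AsyncIterator[DemoEvent]:
    yield DemoEvent("started", "Starting research...")
    yield DemoEvent("streaming", "Found 15 sources...", {"agent_id": "searcher"})
    yield DemoEvent("complete", "Final report", {"iterations": 1})


async def collect_report(events: AsyncIterator[DemoEvent]) -> str:
    """Consume events until a terminal one; return the report or raise."""
    async for event in events:
        if event.type == "error":
            raise RuntimeError(event.message)
        if event.type == "complete":
            return event.message
        # Non-terminal events (started, progress, streaming, ...) would be
        # rendered or logged here.
    raise RuntimeError("stream ended without a 'complete' event")


report = asyncio.run(collect_report(fake_workflow()))
```

Treating a stream that ends without `complete` as an error makes a silently dropped workflow visible to the caller.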
---
## Break Conditions
The orchestrator exits when ANY of these occur:
### 1. Judge Approval ✅
```python
if "SUFFICIENT EVIDENCE" in judge_response:
    # Manager delegates to ReportAgent
    # ReportAgent completes → Workflow ends
    pass
```
### 2. Max Rounds Reached
```python
# MagenticBuilder config
max_round_count = 5  # Default
# After 5 manager rounds:
if not reporter_ran:
    # Force fallback synthesis
    async for event in _synthesize_fallback(iteration, "max_rounds"):
        yield event
```
### 3. Timeout ⏱️
```python
try:
    async with asyncio.timeout(settings.advanced_timeout):  # 600s default
        async for event in workflow.run_stream(task):
            yield event
except TimeoutError:
    async for event in _synthesize_fallback(iteration, "timeout"):
        yield event
```
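
`asyncio.timeout` was added in Python 3.11. For readers experimenting on an older interpreter, the same "stream until the deadline, then fall back" behaviour can be approximated with `asyncio.wait_for` on each item. The sketch below is a simplified illustration: `stream_with_deadline` and `slow_source` are hypothetical, and a single fallback string stands in for the events that `_synthesize_fallback` would yield.

```python
import asyncio
import time
from typing import AsyncIterator


async def stream_with_deadline(source: AsyncIterator[str], timeout_s: float,
                               fallback: str) -> AsyncIterator[str]:
    """Re-yield items from `source`; past the overall deadline, yield `fallback`."""
    deadline = time.monotonic() + timeout_s
    iterator = source.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        try:
            if remaining <= 0:
                raise asyncio.TimeoutError
            item = await asyncio.wait_for(iterator.__anext__(), remaining)
        except StopAsyncIteration:
            return  # source finished in time
        except asyncio.TimeoutError:
            yield fallback  # one deadline for the whole stream, not per item
            return
        yield item


async def slow_source() -> AsyncIterator[str]:
    yield "first"
    await asyncio.sleep(1.0)  # simulates a stalled agent
    yield "never reached"


async def demo() -> list[str]:
    return [x async for x in stream_with_deadline(slow_source(), 0.05, "fallback")]


events = asyncio.run(demo())
print(events)  # ['first', 'fallback']
```

The deadline is computed once up front, so a workflow that emits many quick events cannot extend its own budget.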
### 4. Token Budget 💾
```python
# Implicit via PydanticAI/LLM client
# ~50K tokens per query (from settings)
# Individual agent calls handle retries
```
---
## Dependency Matrix
### "If I change X, what breaks?"
| Changed Component | Affected Components | Impact |
|-------------------|---------------------|--------|
| **Evidence model** | All agents, Memory, Tools | HIGH - Core data type |
| **JudgeAssessment** | Judge, Orchestrator | HIGH - Decision flow |
| **ResearchMemory** | All agents | HIGH - Shared state |
| **search_pubmed** | SearchAgent | MEDIUM - One tool |
| **get_bibliography** | ReportAgent | MEDIUM - References |
| **AgentEvent** | Orchestrator, UI | MEDIUM - Streaming |
| **EmbeddingService** | Memory, Dedup | MEDIUM - Similarity |
| **Judge thresholds** | Workflow loop count | LOW - Tuning |
| **System prompts** | Agent behavior | LOW - Prompt eng |
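
The matrix is easy to encode if a review script or CI step should flag risky changes. The sketch below copies the impact levels from the table into a dictionary. The helper `highest_impact` and the idea of gating on it are assumptions for illustration, not existing tooling.

```python
# Impact levels copied from the table above.
IMPACT = {
    "Evidence model": "HIGH",
    "JudgeAssessment": "HIGH",
    "ResearchMemory": "HIGH",
    "search_pubmed": "MEDIUM",
    "get_bibliography": "MEDIUM",
    "AgentEvent": "MEDIUM",
    "EmbeddingService": "MEDIUM",
    "Judge thresholds": "LOW",
    "System prompts": "LOW",
}
_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def highest_impact(changed: list[str]) -> str:
    """Return the most severe impact level among the changed components."""
    unknown = [name for name in changed if name not in IMPACT]
    if unknown:
        raise KeyError(f"not in the dependency matrix: {unknown}")
    return max((IMPACT[name] for name in changed),
               key=_ORDER.__getitem__, default="LOW")


print(highest_impact(["System prompts", "search_pubmed"]))  # MEDIUM
print(highest_impact(["Evidence model", "System prompts"]))  # HIGH
```

Raising on unknown names is deliberate: a component missing from the matrix is itself a sign that this document needs updating.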
### Agent Dependencies
```
SearchAgent
├── REQUIRES: MagenticState, EmbeddingService
├── WRITES TO: ResearchMemory (evidence)
└── NO DEPS ON: Other agents

JudgeAgent
├── REQUIRES: Evidence context (from Manager)
├── WRITES TO: Nothing
└── CONTROLS: SearchAgent (continue) or ReportAgent (synthesize)

HypothesisAgent
├── REQUIRES: Evidence context
├── WRITES TO: evidence_store["hypotheses"]
└── NO DEPS ON: Other agents

ReportAgent
├── REQUIRES: ResearchMemory, hypotheses, assessment
├── READS FROM: All prior state
└── WRITES TO: evidence_store["final_report"]
```
---
## Critical Thresholds
| Threshold | Value | Location | Impact |
|-----------|-------|----------|--------|
| Confidence threshold | 0.7 (70%) | JudgeAssessment | Sufficiency decision |
| Mechanism score threshold | 6 | Judge criteria | Sufficiency decision |
| Clinical score threshold | 6 | Judge criteria | Sufficiency decision |
| Max manager rounds | 5 | AdvancedOrchestrator | Loop termination |
| Max stall count | 3 | MagenticBuilder | Stall detection |
| Dedup similarity | 0.9 | EmbeddingService | Evidence dedup |
| Max evidence for judge | 30 | prompts/judge.py | Context limit |
| Confirmed hypothesis | 0.8 | ResearchMemory | High-confidence filter |
| Timeout | 600s | settings.advanced_timeout | Workflow timeout |
---
## Developer Checklist
When modifying agents:
- [ ] Update this document if contracts change
- [ ] Verify state access (read/write) is correct
- [ ] Check tool side effects
- [ ] Test with `make check`
- [ ] Verify event emission
When adding new agents:
- [ ] Create factory function in `magentic_agents.py`
- [ ] Define input/output contract
- [ ] Document state access
- [ ] Add to Agent Inventory table
- [ ] Update Dependency Matrix
When changing Judge criteria:
- [ ] Update JudgeAssessment model
- [ ] Update Critical Thresholds table
- [ ] Test workflow loop behavior
- [ ] Verify fallback synthesis triggers correctly
---
*This document is the source of truth for multi-agent coordination.*