VibecoderMcSwaggins committed on
Commit
cb24279
·
1 Parent(s): 4b245e3

docs: add SPEC_10 for domain-agnostic refactor

Addresses #75 (Domain Identity Crisis) and #76 (Hardcoded Prompts).

Creates ironclad specification for:
- Centralized DomainConfig in src/config/domain.py
- 15 hardcoded locations to update
- Testing strategy and migration path
- Backwards-compatible rollout

Key insight: Make agent GENERAL by default, with domain presets
for specialization (drug_repurposing, sexual_health, etc.)

docs/specs/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md ADDED
@@ -0,0 +1,442 @@
+ # SPEC_10: Domain-Agnostic Refactor
+
+ **Status**: DRAFT
+ **Priority**: P1
+ **Effort**: Medium (2-3 hours)
+ **Related Issues**: #75, #76
+
+ ## Problem Statement
+
+ The codebase has "drug repurposing" hardcoded in **15 locations**:
+
+ ```
+ src/prompts/report.py:11 - SYSTEM_PROMPT
+ src/prompts/judge.py:5 - SYSTEM_PROMPT
+ src/prompts/judge.py:140 - Evidence scoring prompt
+ src/prompts/hypothesis.py:11 - SYSTEM_PROMPT
+ src/orchestrators/simple.py:476 - Report header
+ src/orchestrators/simple.py:564 - Report header
+ src/orchestrators/advanced.py:159 - Task prompt
+ src/agents/magentic_agents.py:33 - Agent description
+ src/agents/magentic_agents.py:108 - Agent description
+ src/agents/search_agent.py:31 - Tool description
+ src/agents/tools.py:85 - Tool docstring
+ src/mcp_tools.py:27 - Example query
+ src/mcp_tools.py:116 - Docstring
+ src/mcp_tools.py:164 - Function docstring
+ src/mcp_tools.py:167 - Docstring
+ ```
+
+ This violates:
+ - **DRY** - Same concept repeated 15 times
+ - **Open/Closed** - Can't add domains without modifying multiple files
+ - **Flexibility** - Agent is locked to one domain
+
+ ## Solution: Centralized Domain Configuration
+
+ ### 1. Create Domain Config Module
+
+ **File**: `src/config/domain.py`
+
+ ```python
+ """Centralized domain configuration for research agents.
+
+ This module defines research domains and their associated prompts,
+ allowing the agent to operate in domain-agnostic or domain-specific modes.
+
+ Usage:
+     from src.config.domain import get_domain_config, ResearchDomain
+
+     # Get default (general) config
+     config = get_domain_config()
+
+     # Get specific domain
+     config = get_domain_config(ResearchDomain.SEXUAL_HEALTH)
+
+     # Use in prompts
+     system_prompt = config.judge_system_prompt
+ """
+
+ from enum import Enum
+
+ from pydantic import BaseModel
+
+
+ class ResearchDomain(str, Enum):
+     """Available research domains."""
+
+     GENERAL = "general"
+     DRUG_REPURPOSING = "drug_repurposing"
+     SEXUAL_HEALTH = "sexual_health"
+
+
+ class DomainConfig(BaseModel):
+     """Configuration for a research domain.
+
+     Contains all domain-specific text used across the codebase,
+     ensuring consistency and single-source-of-truth.
+     """
+
+     # Identity
+     name: str
+     description: str
+
+     # Report generation
+     report_title: str
+     report_focus: str
+
+     # Judge prompts
+     judge_system_prompt: str
+     judge_scoring_prompt: str
+
+     # Hypothesis prompts
+     hypothesis_system_prompt: str
+
+     # Report writer prompts
+     report_system_prompt: str
+
+     # Search context
+     search_description: str
+     search_example_query: str
+
+     # Agent descriptions (for Magentic mode)
+     search_agent_description: str
+     hypothesis_agent_description: str
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # Domain Definitions
+ # ─────────────────────────────────────────────────────────────────
+
+ GENERAL_CONFIG = DomainConfig(
+     name="General Research",
+     description="General-purpose biomedical research agent",
+
+     report_title="## Research Analysis",
+     report_focus="comprehensive research synthesis",
+
+     judge_system_prompt="""You are an expert research judge.
+ Your role is to evaluate evidence quality, assess relevance to the research query,
+ and determine if sufficient evidence exists to synthesize findings.""",
+
+     judge_scoring_prompt="""Score this evidence for research relevance.
+ Provide ONLY scores and extracted data.""",
+
+     hypothesis_system_prompt="""You are a biomedical research scientist.
+ Your role is to generate evidence-based hypotheses from the literature,
+ identifying key mechanisms, targets, and potential therapeutic implications.""",
+
+     report_system_prompt="""You are a scientific writer specializing in research reports.
+ Your role is to synthesize evidence into clear, well-structured reports with
+ proper citations and evidence-based conclusions.""",
+
+     search_description="Searches biomedical literature for relevant evidence",
+     search_example_query="metformin aging mechanisms",
+
+     search_agent_description="Searches PubMed, ClinicalTrials.gov, and Europe PMC for evidence",
+     hypothesis_agent_description="Generates mechanistic hypotheses from evidence",
+ )
+
+ DRUG_REPURPOSING_CONFIG = DomainConfig(
+     name="Drug Repurposing",
+     description="Drug repurposing research specialist",
+
+     report_title="## Drug Repurposing Analysis",
+     report_focus="drug repurposing opportunities",
+
+     judge_system_prompt="""You are an expert drug repurposing research judge.
+ Your role is to evaluate evidence for drug repurposing potential, assess
+ mechanism plausibility, and determine if compounds warrant further investigation.""",
+
+     judge_scoring_prompt="""Score this evidence for drug repurposing potential.
+ Provide ONLY scores and extracted data.""",
+
+     hypothesis_system_prompt="""You are a biomedical research scientist specializing in drug repurposing.
+ Your role is to generate mechanistic hypotheses for how existing drugs might
+ treat new indications, based on shared pathways and targets.""",
+
+     report_system_prompt="""You are a scientific writer specializing in drug repurposing research reports.
+ Your role is to synthesize evidence into actionable drug repurposing recommendations
+ with clear mechanistic rationale and clinical translation potential.""",
+
+     search_description="Searches biomedical literature for drug repurposing evidence",
+     search_example_query="metformin alzheimer repurposing",
+
+     search_agent_description="Searches PubMed for drug repurposing evidence",
+     hypothesis_agent_description="Generates mechanistic hypotheses for drug repurposing",
+ )
+
+ SEXUAL_HEALTH_CONFIG = DomainConfig(
+     name="Sexual Health Research",
+     description="Sexual health and wellness research specialist",
+
+     report_title="## Sexual Health Analysis",
+     report_focus="sexual health and wellness interventions",
+
+     judge_system_prompt="""You are an expert sexual health research judge.
+ Your role is to evaluate evidence for sexual health interventions, assess
+ efficacy and safety data, and determine clinical applicability.""",
+
+     judge_scoring_prompt="""Score this evidence for sexual health relevance.
+ Provide ONLY scores and extracted data.""",
+
+     hypothesis_system_prompt="""You are a biomedical research scientist specializing in sexual health.
+ Your role is to generate evidence-based hypotheses for sexual health interventions,
+ identifying mechanisms of action and potential therapeutic applications.""",
+
+     report_system_prompt="""You are a scientific writer specializing in sexual health research reports.
+ Your role is to synthesize evidence into clear recommendations for sexual health
+ interventions with proper safety considerations.""",
+
+     search_description="Searches biomedical literature for sexual health evidence",
+     search_example_query="testosterone therapy female libido",
+
+     search_agent_description="Searches PubMed for sexual health evidence",
+     hypothesis_agent_description="Generates hypotheses for sexual health interventions",
+ )
+
+ # ─────────────────────────────────────────────────────────────────
+ # Domain Registry
+ # ─────────────────────────────────────────────────────────────────
+
+ DOMAIN_CONFIGS: dict[ResearchDomain, DomainConfig] = {
+     ResearchDomain.GENERAL: GENERAL_CONFIG,
+     ResearchDomain.DRUG_REPURPOSING: DRUG_REPURPOSING_CONFIG,
+     ResearchDomain.SEXUAL_HEALTH: SEXUAL_HEALTH_CONFIG,
+ }
+
+ # Default domain
+ DEFAULT_DOMAIN = ResearchDomain.GENERAL
+
+
+ def get_domain_config(domain: ResearchDomain | None = None) -> DomainConfig:
+     """Get configuration for a research domain.
+
+     Args:
+         domain: The research domain. Defaults to GENERAL if None.
+
+     Returns:
+         DomainConfig for the specified domain.
+     """
+     if domain is None:
+         domain = DEFAULT_DOMAIN
+     return DOMAIN_CONFIGS[domain]
+ ```
+
+ ### 2. Update Settings to Include Domain
+
+ **File**: `src/utils/config.py` (add to Settings class)
+
+ ```python
+ from src.config.domain import ResearchDomain
+
+ class Settings(BaseSettings):
+     # ... existing fields ...
+
+     # Domain configuration
+     research_domain: ResearchDomain = ResearchDomain.GENERAL
+ ```
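The spec adds `research_domain` to `Settings`, while `get_domain_config(None)` falls back to `DEFAULT_DOMAIN` rather than to that setting. One possible way to connect the two is a small precedence helper, sketched below. `resolve_domain` and its `configured` parameter are hypothetical names that do not appear in the spec, and `ResearchDomain` is redeclared only so the snippet runs on its own.

```python
from enum import Enum
from typing import Optional


class ResearchDomain(str, Enum):
    """Mirror of the enum in src/config/domain.py, redeclared for a standalone example."""

    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


def resolve_domain(
    explicit: Optional[ResearchDomain] = None,
    configured: Optional[ResearchDomain] = None,
) -> ResearchDomain:
    """Pick a domain: explicit argument first, then the configured value, then GENERAL.

    Hypothetical helper; `configured` stands in for settings.research_domain.
    """
    if explicit is not None:
        return explicit
    if configured is not None:
        return configured
    return ResearchDomain.GENERAL
```

With this ordering, a per-call argument (for example from the UI) would override the process-wide setting, and the setting would override the built-in default.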
+
+ ### 3. Update All 15 Hardcoded Locations
+
+ #### 3.1 Prompts Module
+
+ **`src/prompts/report.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ def get_system_prompt(domain=None):
+     config = get_domain_config(domain)
+     return config.report_system_prompt
+
+ # Keep SYSTEM_PROMPT for backwards compatibility
+ SYSTEM_PROMPT = get_system_prompt()
+ ```
+
+ **`src/prompts/judge.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ def get_system_prompt(domain=None):
+     config = get_domain_config(domain)
+     return config.judge_system_prompt
+
+ def get_scoring_prompt(domain=None):
+     config = get_domain_config(domain)
+     return config.judge_scoring_prompt
+
+ SYSTEM_PROMPT = get_system_prompt()
+ ```
+
+ **`src/prompts/hypothesis.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ def get_system_prompt(domain=None):
+     config = get_domain_config(domain)
+     return config.hypothesis_system_prompt
+
+ SYSTEM_PROMPT = get_system_prompt()
+ ```
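One consequence of keeping `SYSTEM_PROMPT` as a module-level constant is that it is computed once at import time, so it always holds the default domain's prompt. The standalone sketch below illustrates this with a shortened stand-in for the prompt text; `_JUDGE_PROMPTS` is an illustrative name, not something defined in the spec.

```python
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"


# Stand-in for the judge_system_prompt field of each DomainConfig (shortened text).
_JUDGE_PROMPTS = {
    ResearchDomain.GENERAL: "You are an expert research judge.",
    ResearchDomain.DRUG_REPURPOSING: "You are an expert drug repurposing research judge.",
}


def get_system_prompt(domain=None):
    """Return the judge system prompt for `domain`, defaulting to GENERAL."""
    return _JUDGE_PROMPTS[domain or ResearchDomain.GENERAL]


# Evaluated once, when the module is imported: always the default domain's prompt.
SYSTEM_PROMPT = get_system_prompt()

# Callers that need a specific domain have to call the function instead.
drug_prompt = get_system_prompt(ResearchDomain.DRUG_REPURPOSING)
```

Existing imports of `SYSTEM_PROMPT` keep working, but any caller that should respect a selected domain would need to switch to `get_system_prompt(domain)`.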
+
+ #### 3.2 Orchestrators
+
+ **`src/orchestrators/simple.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ class SimpleOrchestrator:
+     def __init__(self, domain=None, ...):
+         self.domain_config = get_domain_config(domain)
+
+     def _format_report(self, ...):
+         return f"""{self.domain_config.report_title}
+ Query: {query}
+ ...
+ """
+ ```
+
+ **`src/orchestrators/advanced.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ async def run_research(..., domain=None):
+     config = get_domain_config(domain)
+     task = f"""Research {config.report_focus} for: {query}
+ ...
+ """
+ ```
+
+ #### 3.3 Agents
+
+ **`src/agents/magentic_agents.py`**:
+ ```python
+ from src.config.domain import get_domain_config
+
+ def create_search_agent(domain=None):
+     config = get_domain_config(domain)
+     return Agent(
+         description=config.search_agent_description,
+         ...
+     )
+ ```
+
+ **`src/agents/search_agent.py`** and **`src/agents/tools.py`**:
+ Similar pattern - inject domain config.
+
+ #### 3.4 MCP Tools
+
+ **`src/mcp_tools.py`**:
+ ```python
+ from src.config.domain import get_domain_config, ResearchDomain
+
+ @mcp.tool
+ async def search_pubmed(query: str, domain: str = "general"):
+     """Search PubMed for biomedical literature.
+
+     Args:
+         query: Search query (e.g., "metformin alzheimer")
+         domain: Research domain (general, drug_repurposing, sexual_health)
+     """
+     config = get_domain_config(ResearchDomain(domain))
+     # Use config.search_description in responses
+ ```
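Because the MCP tool receives `domain` as a free-form string, `ResearchDomain(domain)` raises `ValueError` for any value that is not one of the enum's values. A minimal sketch of a tolerant parser follows. `parse_domain` is a hypothetical helper, and falling back to GENERAL is an assumption; returning a descriptive error to the caller would be an equally reasonable policy.

```python
from enum import Enum
from typing import Optional


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


def parse_domain(raw: Optional[str]) -> ResearchDomain:
    """Convert a user-supplied string to a ResearchDomain, falling back to GENERAL.

    Hypothetical helper: the fallback policy is an assumption, not part of the spec.
    """
    if raw is None:
        return ResearchDomain.GENERAL
    try:
        # Enum lookup by value; raises ValueError for unknown strings.
        return ResearchDomain(raw.strip().lower())
    except ValueError:
        return ResearchDomain.GENERAL
```

The tool body could then call `get_domain_config(parse_domain(domain))` instead of constructing the enum directly.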
+
+ ### 4. Update Gradio UI
+
+ **`src/app.py`** - Add domain selector:
+
+ ```python
+ from src.config.domain import ResearchDomain, DOMAIN_CONFIGS
+
+ domain_dropdown = gr.Dropdown(
+     choices=[d.value for d in ResearchDomain],
+     value="general",
+     label="Research Domain",
+     info="Select research focus area"
+ )
+ ```
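The snippet above imports `DOMAIN_CONFIGS` without using it. One plausible use is to show each domain's human-readable `name` in the dropdown while keeping the enum value as the submitted value. The sketch below only builds the choices list and assumes the installed Gradio version accepts `(label, value)` tuples for `choices`. `DOMAIN_NAMES` and `dropdown_choices` are illustrative stand-ins, not names from the spec.

```python
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


# Stand-in for DOMAIN_CONFIGS: only the `name` field of each DomainConfig is kept.
DOMAIN_NAMES = {
    ResearchDomain.GENERAL: "General Research",
    ResearchDomain.DRUG_REPURPOSING: "Drug Repurposing",
    ResearchDomain.SEXUAL_HEALTH: "Sexual Health Research",
}


def dropdown_choices():
    """Build (label, value) pairs in enum declaration order."""
    return [(DOMAIN_NAMES[d], d.value) for d in ResearchDomain]


choices = dropdown_choices()
```

The resulting list could be passed as `choices=` to the dropdown, with `value=ResearchDomain.GENERAL.value` as the initial selection.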
+
+ ## Implementation Checklist
+
+ - [ ] Create `src/config/domain.py` with DomainConfig
+ - [ ] Add `research_domain` to Settings
+ - [ ] Update `src/prompts/report.py`
+ - [ ] Update `src/prompts/judge.py`
+ - [ ] Update `src/prompts/hypothesis.py`
+ - [ ] Update `src/orchestrators/simple.py`
+ - [ ] Update `src/orchestrators/advanced.py`
+ - [ ] Update `src/agents/magentic_agents.py`
+ - [ ] Update `src/agents/search_agent.py`
+ - [ ] Update `src/agents/tools.py`
+ - [ ] Update `src/mcp_tools.py`
+ - [ ] Add domain selector to Gradio UI
+ - [ ] Write unit tests for domain config
+ - [ ] Update CLAUDE.md, AGENTS.md, GEMINI.md
+
+ ## Testing Strategy
+
+ ### Unit Tests
+
+ ```python
+ # tests/unit/config/test_domain.py
+
+ from src.config.domain import ResearchDomain, get_domain_config
+
+
+ def test_get_domain_config_default():
+     config = get_domain_config()
+     assert config.name == "General Research"
+
+ def test_get_domain_config_drug_repurposing():
+     config = get_domain_config(ResearchDomain.DRUG_REPURPOSING)
+     assert "drug repurposing" in config.judge_system_prompt.lower()
+
+ def test_all_domains_have_required_fields():
+     for domain in ResearchDomain:
+         config = get_domain_config(domain)
+         assert config.report_title
+         assert config.judge_system_prompt
+         assert config.hypothesis_system_prompt
+ ```
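Since `get_domain_config` indexes `DOMAIN_CONFIGS` directly, an enum member without a registry entry would surface as a `KeyError` at call time. A complementary check could compare the enum against the registry, as in the sketch below. `missing_domains` is a hypothetical helper, and the registries use placeholder objects instead of real `DomainConfig` instances.

```python
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


def missing_domains(registry):
    """Return enum members that have no entry in `registry`, in declaration order."""
    return [d for d in ResearchDomain if d not in registry]


# Stand-in registries; real values would be DomainConfig instances.
complete = {d: object() for d in ResearchDomain}
incomplete = {ResearchDomain.GENERAL: object()}

assert missing_domains(complete) == []
```

In the real test module, the assertion would presumably be `assert missing_domains(DOMAIN_CONFIGS) == []`.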
+
+ ### Integration Tests
+
+ ```python
+ # tests/integration/test_domain_switching.py
+
+ import pytest
+
+ from src.config.domain import ResearchDomain
+
+ # run_simple_mode is assumed to be an existing helper; its module path
+ # is not given here, so its import is omitted.
+
+
+ @pytest.mark.integration
+ async def test_simple_mode_respects_domain():
+     result = await run_simple_mode(
+         "metformin aging",
+         domain=ResearchDomain.GENERAL
+     )
+     assert "## Research Analysis" in result
+
+     result = await run_simple_mode(
+         "metformin aging",
+         domain=ResearchDomain.DRUG_REPURPOSING
+     )
+     assert "## Drug Repurposing Analysis" in result
+ ```
+
+ ## Migration Path
+
+ 1. **Phase 1**: Create domain config, add to Settings (no breaking changes)
+ 2. **Phase 2**: Update prompts module to use config (backwards compatible)
+ 3. **Phase 3**: Update orchestrators (backwards compatible via defaults)
+ 4. **Phase 4**: Update UI with domain selector
+ 5. **Phase 5**: Update docs and examples
+
+ ## Success Criteria
+
+ - [ ] Zero hardcoded "drug repurposing" strings in `src/` outside `src/config/domain.py`
+ - [ ] `grep -ri "drug repurposing" src/` returns matches only in `src/config/domain.py`
+ - [ ] All existing tests pass
+ - [ ] New domain can be added by only modifying `domain.py`
+ - [ ] Default behavior unchanged (general domain)
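The grep-based criterion could also be enforced from a test. The sketch below walks a directory tree and reports Python files, other than an allow-listed `domain.py`, that still contain the phrase. `find_hardcoded` and its parameters are hypothetical, and the match is case-insensitive on the assumption that title-cased occurrences should count as well.

```python
from pathlib import Path


def find_hardcoded(root, phrase="drug repurposing", allowed=("domain.py",)):
    """Return (path, line_number) pairs where `phrase` appears outside allowed files.

    The match is case-insensitive; `allowed` holds file names that may contain the phrase.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        if path.name in allowed:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if phrase.lower() in line.lower():
                hits.append((str(path), lineno))
    return hits
```

A test could then assert that `find_hardcoded("src")` returns an empty list.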
+
+ ## Rollback Plan
+
+ All changes are backwards compatible:
+ - Default domain = GENERAL (similar to current behavior)
+ - Existing APIs unchanged (domain is optional parameter)
+ - No database migrations required