Spaces:

openenv-community
/

replicalab

Running

App Files Files Community

replicalab / docs /map /agents.md

maxxie114

Initial HF Spaces deployment

80d8c84 2 days ago

preview code

raw

history blame contribute delete

10.9 kB

	# Agents Map — `replicalab/agents/`

	> Deterministic policy helpers for Scientist and Lab Manager agents.
	> No LLM calls in this module — the LLM backend is injected via `GenerateFn`.
	>
	> Tasks implemented: AGT 01-07, 11

	## Exports — `__init__.py`

	```python
	# From lab_manager_policy
	AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange
	check_feasibility, compose_lab_manager_response, suggest_alternative

	# From scientist_policy
	RetryMetadata, ScientistCallResult, ScientistOutputParseError
	build_baseline_scientist_action, build_scientist_system_prompt
	call_scientist_with_retry, format_scientist_observation, parse_scientist_output
	```

	---

	## Scientist Policy — `scientist_policy.py`

	### Pipeline Flow

	```
	scenario → build_scientist_system_prompt() → system_prompt
	↓
	observation → format_scientist_observation() → user_message
	↓
	call_scientist_with_retry(generate_fn, system_prompt, obs)
	↓ calls generate_fn(messages)
	↓ calls parse_scientist_output(raw_text)
	↓ on failure: _build_correction_prompt(error)
	↓ retries up to max_retries times
	→ ScientistCallResult(action, metadata)
	```

	### Public Functions

	#### `build_scientist_system_prompt(scenario) -> str` — AGT 01
	Builds a domain-neutral system prompt from a `NormalizedScenarioPack`.

	Sections rendered (in order):
	1. Role statement ("You are the Scientist agent in ReplicaLab")
	2. Job description (negotiate strongest feasible plan)
	3. Domain ID
	4. Task summary
	5. Success criteria (bulleted)
	6. Constraints (with hard/soft labels, quantities, comparators)
	7. Available resources (with availability status)
	8. Allowed substitutions (original → alternative with conditions)
	9. Output contract (exactly one JSON, no extra keys)
	10. Allowed action_type values
	11. Action-specific field requirements

	#### `format_scientist_observation(obs: ScientistObservation) -> str` — AGT 02
	Converts a per-turn observation into the user message string.

	Sections (fixed order, tested):
	1. Round status: `"Round {n} of {max}"`
	2. Paper summary: title, hypothesis, method, key finding, goal
	3. Conversation history or "No conversation history yet"
	4. Current protocol or "No protocol has been proposed yet"
	5. ScientistAction schema reminder (field list, action_type values)
	6. Closing instruction: "Respond with exactly one JSON object"

	#### `parse_scientist_output(raw_text: str) -> ScientistAction` — MOD 09
	Strict parser from raw model text into validated `ScientistAction`.

	Accepts:
	- Plain JSON objects
	- `\`\`\`json` fenced blocks
	- Prose containing one JSON object

	Error codes:
	\| Code \| Meaning \|
	\|------\|---------\|
	\| `no_json` \| No JSON object found in output \|
	\| `invalid_json` \| JSON syntax error (trailing comma, etc.) \|
	\| `invalid_action` \| Valid JSON but fails ScientistAction validation \|

	#### `call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult` — AGT 03
	Retry loop with error-specific correction prompts.

	Behavior:
	1. Builds messages: `[system, user]`
	2. Calls `generate_fn(messages)` → raw text
	3. Calls `parse_scientist_output(raw_text)`
	4. On success: returns `ScientistCallResult(action, metadata)`
	5. On failure: appends `[assistant(bad_output), user(correction)]` to messages, retries
	6. After `max_retries` failures: raises last `ScientistOutputParseError`

	Correction prompts (`_build_correction_prompt`):
	- `no_json`: "Your previous response did not contain a JSON object..."
	- `invalid_json`: "Your previous response contained malformed JSON: {error}..."
	- `invalid_action`: "...failed ScientistAction validation: {detail}. Fix the validation error..."

	#### `build_baseline_scientist_action(observation) -> ScientistAction` — AGT 04
	Deterministic non-LLM action for smoke tests. No API calls.

	Decision tree:
	1. If protocol exists AND at max rounds → `accept`
	2. If protocol exists AND latest lab_manager feedback indicates blocker → `revise_protocol` (halve sample, reduce duration)
	3. If protocol exists AND no blocker → `accept`
	4. If no protocol → `propose_protocol` (domain-inferred defaults)

	Domain inference (`_infer_domain`):
	- Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) → `machine_learning`
	- Checks for finance hints (backtest, sharpe, trading...) → `finance_trading`
	- Default → `mathematics`

	Blocker detection (`_feedback_indicates_blocker`):
	- Returns `False` if action_type is `accept` or `report_feasibility`
	- Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc.

	### Classes

	#### `ScientistOutputParseError(ValueError)`
	\| Attribute \| Type \| Purpose \|
	\|-----------\|------\|---------\|
	\| `code` \| `Literal["no_json", "invalid_json", "invalid_action"]` \| Machine-readable error type \|
	\| `message` \| `str` \| Human-readable detail \|
	\| `raw_text` \| `str` \| Original model output \|
	\| `parsed_payload` \| `dict \\| None` \| Decoded JSON if parsing succeeded \|

	#### `RetryMetadata(BaseModel)` — `extra="forbid"`
	\| Field \| Type \| Purpose \|
	\|-------\|------\|---------\|
	\| `attempt_count` \| `int` \| Total attempts (1 = success on first try) \|
	\| `retry_count` \| `int` \| `attempt_count - 1` \|
	\| `last_error_code` \| `str \\| None` \| Error code from last failure \|
	\| `last_error_message` \| `str \\| None` \| Error message from last failure \|

	#### `ScientistCallResult(BaseModel)` — `extra="forbid"`
	\| Field \| Type \|
	\|-------\|------\|
	\| `action` \| `ScientistAction` \|
	\| `metadata` \| `RetryMetadata` \|

	### Type Aliases

	```python
	GenerateFn = Callable[[list[dict[str, str]]], str]
	```

	### Constants

	```python
	_ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...)
	_FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...)
	_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...)
	```

	---

	## Lab Manager Policy — `lab_manager_policy.py`

	### Pipeline Flow

	```
	protocol + scenario → check_feasibility()
	↓
	FeasibilityCheckResult (7 dimensions)
	↓
	suggest_alternative(protocol, check, scenario)
	↓
	AlternativeSuggestion \| None
	↓
	compose_lab_manager_response(check, suggestion)
	↓
	LabManagerAction (typed, with explanation)
	```

	### Public Functions

	#### `check_feasibility(protocol, scenario) -> FeasibilityCheckResult` — AGT 05
	Runs 7 deterministic dimension checks. No LLM calls.

	Checks performed:
	\| Dimension \| Function \| What it checks \|
	\|-----------\|----------\|---------------\|
	\| `protocol` \| `_build_protocol_check` \| Wraps `validate_protocol()` from MOD 05 \|
	\| `budget` \| `_check_budget` \| `_estimate_protocol_cost()` vs `budget_remaining` \|
	\| `equipment` \| `_check_equipment` \| Items available/booked, finds substitutions \|
	\| `reagents` \| `_check_reagents` \| Items in-stock/out-of-stock, finds substitutions \|
	\| `schedule` \| `_check_schedule` \| `duration_days` vs `time_limit_days` \|
	\| `staff` \| `_check_staff` \| `_estimate_staff_load()` vs `staff_count` \|
	\| `policy` \| `_check_policy` \| Safety restrictions (e.g., offline-only execution) \|

	Cost estimation (`_estimate_protocol_cost`):
	```
	base = sample_size * 10
	+ duration_days * 50
	+ len(controls) * 25
	+ len(required_equipment) * 100
	+ len(required_reagents) * 75
	```

	Staff estimation (`_estimate_staff_load`):
	```
	base = 1
	+ (1 if sample_size > 20)
	+ (1 if len(controls) > 2)
	+ (1 if duration_days > 5)
	+ (1 if len(required_equipment) > 2)
	```

	#### `suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion \| None` — AGT 06
	Deterministic revision engine. Returns `None` if already feasible.

	Fix order (deterministic):
	1. Equipment substitutions — replace booked items with alternatives
	2. Reagent substitutions — replace out-of-stock items with alternatives
	3. Duration clamp — reduce to `time_limit_days` if over
	4. Sample size reduction — iterative halving until budget fits (max 10 iterations)

	Post-fix recheck: runs `check_feasibility()` on revised protocol.
	Returns: revised protocol, list of changes, remaining failures, pre/post checks.

	#### `compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction` — AGT 07
	Converts grounded results into a typed `LabManagerAction`.

	Action type selection (`_select_lab_manager_action_type`):
	\| Condition \| Action \|
	\|-----------\|--------\|
	\| All 7 dimensions pass \| `ACCEPT` \|
	\| Suggestion exists AND improved AND only non-lab failures remain \| `SUGGEST_ALTERNATIVE` \|
	\| Lab constraints fail AND no suggestion \| `REJECT` \|
	\| Only policy/protocol fail (not lab constraints) \| `REPORT_FEASIBILITY` \|
	\| Suggestion exists but didn't improve \| `REJECT` \|

	Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).

	### Classes

	#### `DimensionCheck(BaseModel)` — `extra="forbid"`
	\| Field \| Type \| Default \|
	\|-------\|------\|---------\|
	\| `ok` \| `bool` \| `True` \|
	\| `reasons` \| `list[str]` \| `[]` \|

	#### `FeasibilityCheckResult(BaseModel)` — `extra="forbid"`
	\| Field \| Type \|
	\|-------\|------\|
	\| `protocol` \| `DimensionCheck` \|
	\| `budget` \| `DimensionCheck` \|
	\| `equipment` \| `DimensionCheck` \|
	\| `reagents` \| `DimensionCheck` \|
	\| `schedule` \| `DimensionCheck` \|
	\| `staff` \| `DimensionCheck` \|
	\| `policy` \| `DimensionCheck` \|
	\| `estimated_cost` \| `float` \|
	\| `required_staff` \| `int` \|
	\| `substitution_options` \| `dict[str, list[str]]` \|
	\| `validation_result` \| `ValidationResult` \|

	Computed properties: `protocol_ok`, `budget_ok`, `equipment_ok`, `reagents_ok`, `schedule_ok`, `staff_ok`, `feasible`, `summary`

	#### `SuggestionChange(BaseModel)` — `extra="forbid"`
	\| Field \| Type \| Purpose \|
	\|-------\|------\|---------\|
	\| `field` \| `str` \| Which protocol field was changed \|
	\| `original` \| `str` \| Original value (stringified) \|
	\| `revised` \| `str` \| New value (stringified) \|
	\| `reason` \| `str` \| Why it was changed \|
	\| `tradeoff` \| `str` \| What is lost \|

	#### `AlternativeSuggestion(BaseModel)` — `extra="forbid"`
	\| Field \| Type \|
	\|-------\|------\|
	\| `revised_protocol` \| `Protocol` \|
	\| `applied_changes` \| `list[SuggestionChange]` \|
	\| `remaining_failures` \| `list[str]` \|
	\| `improved` \| `bool` \|
	\| `pre_check` \| `FeasibilityCheckResult` \|
	\| `post_check` \| `FeasibilityCheckResult` \|

	### Type Aliases

	```python
	ExplanationRenderer = Callable[
	[LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]],
	str,
	]
	```