Spaces:
Running
Running
| # Agents Map β `replicalab/agents/` | |
| > Deterministic policy helpers for Scientist and Lab Manager agents. | |
| > No LLM calls in this module β the LLM backend is injected via `GenerateFn`. | |
| > | |
| > **Tasks implemented:** AGT 01-07, 11 | |
| ## Exports β `__init__.py` | |
| ```python | |
| # From lab_manager_policy | |
| AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange | |
| check_feasibility, compose_lab_manager_response, suggest_alternative | |
| # From scientist_policy | |
| RetryMetadata, ScientistCallResult, ScientistOutputParseError | |
| build_baseline_scientist_action, build_scientist_system_prompt | |
| call_scientist_with_retry, format_scientist_observation, parse_scientist_output | |
| ``` | |
| --- | |
| ## Scientist Policy β `scientist_policy.py` | |
| ### Pipeline Flow | |
| ``` | |
| scenario β build_scientist_system_prompt() β system_prompt | |
| β | |
| observation β format_scientist_observation() β user_message | |
| β | |
| call_scientist_with_retry(generate_fn, system_prompt, obs) | |
| β calls generate_fn(messages) | |
| β calls parse_scientist_output(raw_text) | |
| β on failure: _build_correction_prompt(error) | |
| β retries up to max_retries times | |
| β ScientistCallResult(action, metadata) | |
| ``` | |
| ### Public Functions | |
| #### `build_scientist_system_prompt(scenario) -> str` β AGT 01 | |
| Builds a domain-neutral system prompt from a `NormalizedScenarioPack`. | |
| **Sections rendered (in order):** | |
| 1. Role statement ("You are the Scientist agent in ReplicaLab") | |
| 2. Job description (negotiate strongest feasible plan) | |
| 3. Domain ID | |
| 4. Task summary | |
| 5. Success criteria (bulleted) | |
| 6. Constraints (with hard/soft labels, quantities, comparators) | |
| 7. Available resources (with availability status) | |
| 8. Allowed substitutions (original β alternative with conditions) | |
| 9. Output contract (exactly one JSON, no extra keys) | |
| 10. Allowed action_type values | |
| 11. Action-specific field requirements | |
| #### `format_scientist_observation(obs: ScientistObservation) -> str` β AGT 02 | |
| Converts a per-turn observation into the user message string. | |
| **Sections (fixed order, tested):** | |
| 1. Round status: `"Round {n} of {max}"` | |
| 2. Paper summary: title, hypothesis, method, key finding, goal | |
| 3. Conversation history or "No conversation history yet" | |
| 4. Current protocol or "No protocol has been proposed yet" | |
| 5. ScientistAction schema reminder (field list, action_type values) | |
| 6. Closing instruction: "Respond with exactly one JSON object" | |
| #### `parse_scientist_output(raw_text: str) -> ScientistAction` β MOD 09 | |
| Strict parser from raw model text into validated `ScientistAction`. | |
| **Accepts:** | |
| - Plain JSON objects | |
| - `\`\`\`json` fenced blocks | |
| - Prose containing one JSON object | |
| **Error codes:** | |
| | Code | Meaning | | |
| |------|---------| | |
| | `no_json` | No JSON object found in output | | |
| | `invalid_json` | JSON syntax error (trailing comma, etc.) | | |
| | `invalid_action` | Valid JSON but fails ScientistAction validation | | |
| #### `call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult` β AGT 03 | |
| Retry loop with error-specific correction prompts. | |
| **Behavior:** | |
| 1. Builds messages: `[system, user]` | |
| 2. Calls `generate_fn(messages)` β raw text | |
| 3. Calls `parse_scientist_output(raw_text)` | |
| 4. On success: returns `ScientistCallResult(action, metadata)` | |
| 5. On failure: appends `[assistant(bad_output), user(correction)]` to messages, retries | |
| 6. After `max_retries` failures: raises last `ScientistOutputParseError` | |
| **Correction prompts (`_build_correction_prompt`):** | |
| - `no_json`: "Your previous response did not contain a JSON object..." | |
| - `invalid_json`: "Your previous response contained malformed JSON: {error}..." | |
| - `invalid_action`: "...failed ScientistAction validation: {detail}. Fix the validation error..." | |
| #### `build_baseline_scientist_action(observation) -> ScientistAction` β AGT 04 | |
| Deterministic non-LLM action for smoke tests. No API calls. | |
| **Decision tree:** | |
| 1. If protocol exists AND at max rounds β `accept` | |
| 2. If protocol exists AND latest lab_manager feedback indicates blocker β `revise_protocol` (halve sample, reduce duration) | |
| 3. If protocol exists AND no blocker β `accept` | |
| 4. If no protocol β `propose_protocol` (domain-inferred defaults) | |
| **Domain inference (`_infer_domain`):** | |
| - Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) β `machine_learning` | |
| - Checks for finance hints (backtest, sharpe, trading...) β `finance_trading` | |
| - Default β `mathematics` | |
| **Blocker detection (`_feedback_indicates_blocker`):** | |
| - Returns `False` if action_type is `accept` or `report_feasibility` | |
| - Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc. | |
| ### Classes | |
| #### `ScientistOutputParseError(ValueError)` | |
| | Attribute | Type | Purpose | | |
| |-----------|------|---------| | |
| | `code` | `Literal["no_json", "invalid_json", "invalid_action"]` | Machine-readable error type | | |
| | `message` | `str` | Human-readable detail | | |
| | `raw_text` | `str` | Original model output | | |
| | `parsed_payload` | `dict \| None` | Decoded JSON if parsing succeeded | | |
| #### `RetryMetadata(BaseModel)` β `extra="forbid"` | |
| | Field | Type | Purpose | | |
| |-------|------|---------| | |
| | `attempt_count` | `int` | Total attempts (1 = success on first try) | | |
| | `retry_count` | `int` | `attempt_count - 1` | | |
| | `last_error_code` | `str \| None` | Error code from last failure | | |
| | `last_error_message` | `str \| None` | Error message from last failure | | |
| #### `ScientistCallResult(BaseModel)` β `extra="forbid"` | |
| | Field | Type | | |
| |-------|------| | |
| | `action` | `ScientistAction` | | |
| | `metadata` | `RetryMetadata` | | |
| ### Type Aliases | |
| ```python | |
| GenerateFn = Callable[[list[dict[str, str]]], str] | |
| ``` | |
| ### Constants | |
| ```python | |
| _ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...) | |
| _FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...) | |
| _BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...) | |
| ``` | |
| --- | |
| ## Lab Manager Policy β `lab_manager_policy.py` | |
| ### Pipeline Flow | |
| ``` | |
| protocol + scenario β check_feasibility() | |
| β | |
| FeasibilityCheckResult (7 dimensions) | |
| β | |
| suggest_alternative(protocol, check, scenario) | |
| β | |
| AlternativeSuggestion | None | |
| β | |
| compose_lab_manager_response(check, suggestion) | |
| β | |
| LabManagerAction (typed, with explanation) | |
| ``` | |
| ### Public Functions | |
| #### `check_feasibility(protocol, scenario) -> FeasibilityCheckResult` β AGT 05 | |
| Runs 7 deterministic dimension checks. No LLM calls. | |
| **Checks performed:** | |
| | Dimension | Function | What it checks | | |
| |-----------|----------|---------------| | |
| | `protocol` | `_build_protocol_check` | Wraps `validate_protocol()` from MOD 05 | | |
| | `budget` | `_check_budget` | `_estimate_protocol_cost()` vs `budget_remaining` | | |
| | `equipment` | `_check_equipment` | Items available/booked, finds substitutions | | |
| | `reagents` | `_check_reagents` | Items in-stock/out-of-stock, finds substitutions | | |
| | `schedule` | `_check_schedule` | `duration_days` vs `time_limit_days` | | |
| | `staff` | `_check_staff` | `_estimate_staff_load()` vs `staff_count` | | |
| | `policy` | `_check_policy` | Safety restrictions (e.g., offline-only execution) | | |
| **Cost estimation (`_estimate_protocol_cost`):** | |
| ``` | |
| base = sample_size * 10 | |
| + duration_days * 50 | |
| + len(controls) * 25 | |
| + len(required_equipment) * 100 | |
| + len(required_reagents) * 75 | |
| ``` | |
| **Staff estimation (`_estimate_staff_load`):** | |
| ``` | |
| base = 1 | |
| + (1 if sample_size > 20) | |
| + (1 if len(controls) > 2) | |
| + (1 if duration_days > 5) | |
| + (1 if len(required_equipment) > 2) | |
| ``` | |
| #### `suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion | None` β AGT 06 | |
| Deterministic revision engine. Returns `None` if already feasible. | |
| **Fix order (deterministic):** | |
| 1. Equipment substitutions β replace booked items with alternatives | |
| 2. Reagent substitutions β replace out-of-stock items with alternatives | |
| 3. Duration clamp β reduce to `time_limit_days` if over | |
| 4. Sample size reduction β iterative halving until budget fits (max 10 iterations) | |
| **Post-fix recheck:** runs `check_feasibility()` on revised protocol. | |
| **Returns:** revised protocol, list of changes, remaining failures, pre/post checks. | |
| #### `compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction` β AGT 07 | |
| Converts grounded results into a typed `LabManagerAction`. | |
| **Action type selection (`_select_lab_manager_action_type`):** | |
| | Condition | Action | | |
| |-----------|--------| | |
| | All 7 dimensions pass | `ACCEPT` | | |
| | Suggestion exists AND improved AND only non-lab failures remain | `SUGGEST_ALTERNATIVE` | | |
| | Lab constraints fail AND no suggestion | `REJECT` | | |
| | Only policy/protocol fail (not lab constraints) | `REPORT_FEASIBILITY` | | |
| | Suggestion exists but didn't improve | `REJECT` | | |
| **Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).** | |
| ### Classes | |
| #### `DimensionCheck(BaseModel)` β `extra="forbid"` | |
| | Field | Type | Default | | |
| |-------|------|---------| | |
| | `ok` | `bool` | `True` | | |
| | `reasons` | `list[str]` | `[]` | | |
| #### `FeasibilityCheckResult(BaseModel)` β `extra="forbid"` | |
| | Field | Type | | |
| |-------|------| | |
| | `protocol` | `DimensionCheck` | | |
| | `budget` | `DimensionCheck` | | |
| | `equipment` | `DimensionCheck` | | |
| | `reagents` | `DimensionCheck` | | |
| | `schedule` | `DimensionCheck` | | |
| | `staff` | `DimensionCheck` | | |
| | `policy` | `DimensionCheck` | | |
| | `estimated_cost` | `float` | | |
| | `required_staff` | `int` | | |
| | `substitution_options` | `dict[str, list[str]]` | | |
| | `validation_result` | `ValidationResult` | | |
| **Computed properties:** `protocol_ok`, `budget_ok`, `equipment_ok`, `reagents_ok`, `schedule_ok`, `staff_ok`, `feasible`, `summary` | |
| #### `SuggestionChange(BaseModel)` β `extra="forbid"` | |
| | Field | Type | Purpose | | |
| |-------|------|---------| | |
| | `field` | `str` | Which protocol field was changed | | |
| | `original` | `str` | Original value (stringified) | | |
| | `revised` | `str` | New value (stringified) | | |
| | `reason` | `str` | Why it was changed | | |
| | `tradeoff` | `str` | What is lost | | |
| #### `AlternativeSuggestion(BaseModel)` β `extra="forbid"` | |
| | Field | Type | | |
| |-------|------| | |
| | `revised_protocol` | `Protocol` | | |
| | `applied_changes` | `list[SuggestionChange]` | | |
| | `remaining_failures` | `list[str]` | | |
| | `improved` | `bool` | | |
| | `pre_check` | `FeasibilityCheckResult` | | |
| | `post_check` | `FeasibilityCheckResult` | | |
| ### Type Aliases | |
| ```python | |
| ExplanationRenderer = Callable[ | |
| [LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]], | |
| str, | |
| ] | |
| ``` | |