# Agents Map – `replicalab/agents/`
> Deterministic policy helpers for Scientist and Lab Manager agents.
> No LLM calls in this module – the LLM backend is injected via `GenerateFn`.
>
> **Tasks implemented:** AGT 01-07, 11
## Exports – `__init__.py`
```python
# From lab_manager_policy
AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange
check_feasibility, compose_lab_manager_response, suggest_alternative
# From scientist_policy
RetryMetadata, ScientistCallResult, ScientistOutputParseError
build_baseline_scientist_action, build_scientist_system_prompt
call_scientist_with_retry, format_scientist_observation, parse_scientist_output
```
---
## Scientist Policy – `scientist_policy.py`
### Pipeline Flow
```
scenario → build_scientist_system_prompt() → system_prompt
        ↓
observation → format_scientist_observation() → user_message
        ↓
call_scientist_with_retry(generate_fn, system_prompt, obs)
  ├─ calls generate_fn(messages)
  ├─ calls parse_scientist_output(raw_text)
  ├─ on failure: _build_correction_prompt(error)
  ├─ retries up to max_retries times
  └─ returns ScientistCallResult(action, metadata)
```
### Public Functions
#### `build_scientist_system_prompt(scenario) -> str` – AGT 01
Builds a domain-neutral system prompt from a `NormalizedScenarioPack`.
**Sections rendered (in order):**
1. Role statement ("You are the Scientist agent in ReplicaLab")
2. Job description (negotiate strongest feasible plan)
3. Domain ID
4. Task summary
5. Success criteria (bulleted)
6. Constraints (with hard/soft labels, quantities, comparators)
7. Available resources (with availability status)
8. Allowed substitutions (original → alternative with conditions)
9. Output contract (exactly one JSON, no extra keys)
10. Allowed action_type values
11. Action-specific field requirements
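The section-joining pattern above can be sketched as follows. This is a minimal illustration over a plain dict; the field names on the scenario are hypothetical stand-ins, not the real `NormalizedScenarioPack` schema:

```python
# Sketch of prompt assembly from ordered sections; the dict fields below
# are illustrative placeholders for NormalizedScenarioPack attributes.
def build_system_prompt_sketch(scenario: dict) -> str:
    sections = [
        "You are the Scientist agent in ReplicaLab.",
        "Your job: negotiate the strongest feasible plan.",
        f"Domain: {scenario['domain_id']}",
        f"Task: {scenario['task_summary']}",
        "Success criteria:\n"
        + "\n".join(f"- {c}" for c in scenario["success_criteria"]),
        "Respond with exactly one JSON object and no extra keys.",
    ]
    return "\n\n".join(sections)

prompt = build_system_prompt_sketch({
    "domain_id": "mathematics",
    "task_summary": "Replicate the paper's numerical check.",
    "success_criteria": ["Matches the reported bound", "Runs offline"],
})
```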
#### `format_scientist_observation(obs: ScientistObservation) -> str` – AGT 02
Converts a per-turn observation into the user message string.
**Sections (fixed order, tested):**
1. Round status: `"Round {n} of {max}"`
2. Paper summary: title, hypothesis, method, key finding, goal
3. Conversation history or "No conversation history yet"
4. Current protocol or "No protocol has been proposed yet"
5. ScientistAction schema reminder (field list, action_type values)
6. Closing instruction: "Respond with exactly one JSON object"
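The fixed ordering and the documented fallback strings can be sketched like this; the real function takes a `ScientistObservation` model rather than loose arguments:

```python
# Minimal sketch of the fixed section order with its documented fallbacks.
def format_observation_sketch(round_num, max_rounds, history, protocol):
    parts = [
        f"Round {round_num} of {max_rounds}",
        "\n".join(history) if history else "No conversation history yet",
        str(protocol) if protocol else "No protocol has been proposed yet",
        "Respond with exactly one JSON object",
    ]
    return "\n\n".join(parts)

msg = format_observation_sketch(1, 5, history=[], protocol=None)
```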
#### `parse_scientist_output(raw_text: str) -> ScientistAction` – MOD 09
Strict parser from raw model text into validated `ScientistAction`.
**Accepts:**
- Plain JSON objects
- `` ```json `` fenced blocks
- Prose containing one JSON object
**Error codes:**
| Code | Meaning |
|------|---------|
| `no_json` | No JSON object found in output |
| `invalid_json` | JSON syntax error (trailing comma, etc.) |
| `invalid_action` | Valid JSON but fails ScientistAction validation |
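A simplified sketch of this extraction-then-validation pipeline, using a stand-in error class and a single required key in place of full `ScientistAction` validation:

```python
import json
import re

class ParseError(ValueError):
    # Simplified stand-in for ScientistOutputParseError.
    def __init__(self, code, message, raw_text, parsed_payload=None):
        super().__init__(message)
        self.code = code
        self.raw_text = raw_text
        self.parsed_payload = parsed_payload

def parse_output_sketch(raw_text: str) -> dict:
    # Prefer a ```json fenced block, else fall back to the first {...} span.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw_text, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        braced = re.search(r"\{.*\}", raw_text, re.DOTALL)
        if braced is None:
            raise ParseError("no_json", "no JSON object found", raw_text)
        candidate = braced.group(0)
    try:
        payload = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ParseError("invalid_json", str(exc), raw_text)
    if "action_type" not in payload:  # stand-in for ScientistAction validation
        raise ParseError("invalid_action", "missing action_type", raw_text, payload)
    return payload
```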
#### `call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult` – AGT 03
Retry loop with error-specific correction prompts.
**Behavior:**
1. Builds messages: `[system, user]`
2. Calls `generate_fn(messages)` β raw text
3. Calls `parse_scientist_output(raw_text)`
4. On success: returns `ScientistCallResult(action, metadata)`
5. On failure: appends `[assistant(bad_output), user(correction)]` to messages, retries
6. After `max_retries` failures: raises last `ScientistOutputParseError`
**Correction prompts (`_build_correction_prompt`):**
- `no_json`: "Your previous response did not contain a JSON object..."
- `invalid_json`: "Your previous response contained malformed JSON: {error}..."
- `invalid_action`: "...failed ScientistAction validation: {detail}. Fix the validation error..."
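The loop above can be sketched as follows; here `json.loads` stands in for `parse_scientist_output`, and a plain tuple stands in for `ScientistCallResult`:

```python
import json

# Sketch of the retry loop; the correction message is a simplified
# placeholder for the error-specific _build_correction_prompt output.
def call_with_retry_sketch(generate_fn, system_prompt, observation, max_retries=2):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": observation},
    ]
    last_error = None
    for attempt in range(1, max_retries + 2):  # initial call + max_retries retries
        raw = generate_fn(messages)
        try:
            return json.loads(raw), attempt  # (action, attempt_count)
        except ValueError as exc:
            last_error = exc
            # Append the bad output plus a correction prompt, then retry.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Malformed output ({exc}); reply with one JSON object.",
            })
    raise last_error

# Stub GenerateFn backend: fails once, then produces valid JSON.
replies = iter(["not json", '{"action_type": "accept"}'])
action, attempts = call_with_retry_sketch(lambda msgs: next(replies), "sys", "obs")
```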
#### `build_baseline_scientist_action(observation) -> ScientistAction` – AGT 04
Deterministic non-LLM action for smoke tests. No API calls.
**Decision tree:**
1. If protocol exists AND at max rounds → `accept`
2. If protocol exists AND latest lab_manager feedback indicates blocker → `revise_protocol` (halve sample, reduce duration)
3. If protocol exists AND no blocker → `accept`
4. If no protocol → `propose_protocol` (domain-inferred defaults)
**Domain inference (`_infer_domain`):**
- Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) → `machine_learning`
- Checks for finance hints (backtest, sharpe, trading...) → `finance_trading`
- Default → `mathematics`
**Blocker detection (`_feedback_indicates_blocker`):**
- Returns `False` if action_type is `accept` or `report_feasibility`
- Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc.
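Both helpers reduce to keyword scans, sketched below with abridged hint tuples (the module's real constants are longer):

```python
# Abridged hint tuples; the real module's constants contain more entries.
_ML_HINTS = ("benchmark", "dataset", "gpu")
_FINANCE_HINTS = ("backtest", "sharpe", "trading")
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost")

def infer_domain_sketch(paper_text: str) -> str:
    # First matching hint family wins; mathematics is the fallback.
    text = paper_text.lower()
    if any(h in text for h in _ML_HINTS):
        return "machine_learning"
    if any(h in text for h in _FINANCE_HINTS):
        return "finance_trading"
    return "mathematics"

def feedback_indicates_blocker_sketch(action_type: str, message: str) -> bool:
    # Accept / report_feasibility never count as blockers.
    if action_type in ("accept", "report_feasibility"):
        return False
    return any(h in message.lower() for h in _BLOCKER_HINTS)
```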
### Classes
#### `ScientistOutputParseError(ValueError)`
| Attribute | Type | Purpose |
|-----------|------|---------|
| `code` | `Literal["no_json", "invalid_json", "invalid_action"]` | Machine-readable error type |
| `message` | `str` | Human-readable detail |
| `raw_text` | `str` | Original model output |
| `parsed_payload` | `dict \| None` | Decoded JSON if parsing succeeded |
#### `RetryMetadata(BaseModel)` – `extra="forbid"`
| Field | Type | Purpose |
|-------|------|---------|
| `attempt_count` | `int` | Total attempts (1 = success on first try) |
| `retry_count` | `int` | `attempt_count - 1` |
| `last_error_code` | `str \| None` | Error code from last failure |
| `last_error_message` | `str \| None` | Error message from last failure |
#### `ScientistCallResult(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `action` | `ScientistAction` |
| `metadata` | `RetryMetadata` |
### Type Aliases
```python
GenerateFn = Callable[[list[dict[str, str]]], str]
```
### Constants
```python
_ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...)
_FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...)
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...)
```
---
## Lab Manager Policy – `lab_manager_policy.py`
### Pipeline Flow
```
protocol + scenario → check_feasibility()
        ↓
FeasibilityCheckResult (7 dimensions)
        ↓
suggest_alternative(protocol, check, scenario)
        ↓
AlternativeSuggestion | None
        ↓
compose_lab_manager_response(check, suggestion)
        ↓
LabManagerAction (typed, with explanation)
```
### Public Functions
#### `check_feasibility(protocol, scenario) -> FeasibilityCheckResult` – AGT 05
Runs 7 deterministic dimension checks. No LLM calls.
**Checks performed:**
| Dimension | Function | What it checks |
|-----------|----------|---------------|
| `protocol` | `_build_protocol_check` | Wraps `validate_protocol()` from MOD 05 |
| `budget` | `_check_budget` | `_estimate_protocol_cost()` vs `budget_remaining` |
| `equipment` | `_check_equipment` | Items available/booked, finds substitutions |
| `reagents` | `_check_reagents` | Items in-stock/out-of-stock, finds substitutions |
| `schedule` | `_check_schedule` | `duration_days` vs `time_limit_days` |
| `staff` | `_check_staff` | `_estimate_staff_load()` vs `staff_count` |
| `policy` | `_check_policy` | Safety restrictions (e.g., offline-only execution) |
**Cost estimation (`_estimate_protocol_cost`):**
```
base = sample_size * 10
+ duration_days * 50
+ len(controls) * 25
+ len(required_equipment) * 100
+ len(required_reagents) * 75
```
**Staff estimation (`_estimate_staff_load`):**
```
base = 1
+ (1 if sample_size > 20)
+ (1 if len(controls) > 2)
+ (1 if duration_days > 5)
+ (1 if len(required_equipment) > 2)
```
#### `suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion | None` – AGT 06
Deterministic revision engine. Returns `None` if already feasible.
**Fix order (deterministic):**
1. Equipment substitutions → replace booked items with alternatives
2. Reagent substitutions → replace out-of-stock items with alternatives
3. Duration clamp → reduce to `time_limit_days` if over
4. Sample size reduction → iterative halving until budget fits (max 10 iterations)
**Post-fix recheck:** runs `check_feasibility()` on revised protocol.
**Returns:** revised protocol, list of changes, remaining failures, pre/post checks.
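Step 4's bounded halving can be sketched in isolation; the cost function here uses only the per-sample and per-day weights from the estimator above, not the full formula:

```python
# Bounded iterative halving of sample_size until the estimated cost fits
# the budget; simplified cost uses only the per-sample and per-day weights.
def halve_until_affordable_sketch(sample_size, duration_days, budget, max_iters=10):
    def cost(n):
        return n * 10 + duration_days * 50
    for _ in range(max_iters):
        if cost(sample_size) <= budget:
            break
        sample_size = max(1, sample_size // 2)  # never drop below one sample
    return sample_size

# 100 samples over 4 days: 1200 > 500, halve to 50 (700 > 500), halve to 25 (450 <= 500).
n = halve_until_affordable_sketch(sample_size=100, duration_days=4, budget=500)
```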
#### `compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction` – AGT 07
Converts grounded results into a typed `LabManagerAction`.
**Action type selection (`_select_lab_manager_action_type`):**
| Condition | Action |
|-----------|--------|
| All 7 dimensions pass | `ACCEPT` |
| Suggestion exists AND improved AND only non-lab failures remain | `SUGGEST_ALTERNATIVE` |
| Lab constraints fail AND no suggestion | `REJECT` |
| Only policy/protocol fail (not lab constraints) | `REPORT_FEASIBILITY` |
| Suggestion exists but didn't improve | `REJECT` |
**Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).**
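One possible ordering of the selection rules above, sketched over a plain pass/fail map (the real function reads a `FeasibilityCheckResult` and an `AlternativeSuggestion`):

```python
# Lab constraints per the table: budget, equipment, reagents, schedule, staff.
LAB_DIMS = ("budget", "equipment", "reagents", "schedule", "staff")

def select_action_type_sketch(dims: dict, suggestion_improved=None) -> str:
    # dims maps dimension name -> passed; suggestion_improved is None
    # when no suggestion exists. One sketch ordering of the table's rules.
    if all(dims.values()):
        return "ACCEPT"
    lab_failure = any(not dims[d] for d in LAB_DIMS)
    if suggestion_improved is True and not lab_failure:
        return "SUGGEST_ALTERNATIVE"
    if suggestion_improved is False:
        return "REJECT"
    if lab_failure:
        return "REJECT"
    return "REPORT_FEASIBILITY"  # only policy/protocol failed

all_ok = {d: True for d in ("protocol", *LAB_DIMS, "policy")}
```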
### Classes
#### `DimensionCheck(BaseModel)` – `extra="forbid"`
| Field | Type | Default |
|-------|------|---------|
| `ok` | `bool` | `True` |
| `reasons` | `list[str]` | `[]` |
#### `FeasibilityCheckResult(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `protocol` | `DimensionCheck` |
| `budget` | `DimensionCheck` |
| `equipment` | `DimensionCheck` |
| `reagents` | `DimensionCheck` |
| `schedule` | `DimensionCheck` |
| `staff` | `DimensionCheck` |
| `policy` | `DimensionCheck` |
| `estimated_cost` | `float` |
| `required_staff` | `int` |
| `substitution_options` | `dict[str, list[str]]` |
| `validation_result` | `ValidationResult` |
**Computed properties:** `protocol_ok`, `budget_ok`, `equipment_ok`, `reagents_ok`, `schedule_ok`, `staff_ok`, `feasible`, `summary`
#### `SuggestionChange(BaseModel)` – `extra="forbid"`
| Field | Type | Purpose |
|-------|------|---------|
| `field` | `str` | Which protocol field was changed |
| `original` | `str` | Original value (stringified) |
| `revised` | `str` | New value (stringified) |
| `reason` | `str` | Why it was changed |
| `tradeoff` | `str` | What is lost |
#### `AlternativeSuggestion(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `revised_protocol` | `Protocol` |
| `applied_changes` | `list[SuggestionChange]` |
| `remaining_failures` | `list[str]` |
| `improved` | `bool` |
| `pre_check` | `FeasibilityCheckResult` |
| `post_check` | `FeasibilityCheckResult` |
### Type Aliases
```python
ExplanationRenderer = Callable[
[LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]],
str,
]
```