# Agents Map – `replicalab/agents/`
> Deterministic policy helpers for Scientist and Lab Manager agents.
> No LLM calls in this module – the LLM backend is injected via `GenerateFn`.
>
> **Tasks implemented:** AGT 01-07, 11
## Exports – `__init__.py`
```python
# From lab_manager_policy
AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange
check_feasibility, compose_lab_manager_response, suggest_alternative
# From scientist_policy
RetryMetadata, ScientistCallResult, ScientistOutputParseError
build_baseline_scientist_action, build_scientist_system_prompt
call_scientist_with_retry, format_scientist_observation, parse_scientist_output
```
---
## Scientist Policy – `scientist_policy.py`
### Pipeline Flow
```
scenario → build_scientist_system_prompt() → system_prompt
        ↓
observation → format_scientist_observation() → user_message
        ↓
call_scientist_with_retry(generate_fn, system_prompt, obs)
  ├─ calls generate_fn(messages)
  ├─ calls parse_scientist_output(raw_text)
  ├─ on failure: _build_correction_prompt(error)
  ├─ retries up to max_retries times
  └─ returns ScientistCallResult(action, metadata)
```
### Public Functions
#### `build_scientist_system_prompt(scenario) -> str` – AGT 01
Builds a domain-neutral system prompt from a `NormalizedScenarioPack`.
**Sections rendered (in order):**
1. Role statement ("You are the Scientist agent in ReplicaLab")
2. Job description (negotiate strongest feasible plan)
3. Domain ID
4. Task summary
5. Success criteria (bulleted)
6. Constraints (with hard/soft labels, quantities, comparators)
7. Available resources (with availability status)
8. Allowed substitutions (original → alternative with conditions)
9. Output contract (exactly one JSON, no extra keys)
10. Allowed action_type values
11. Action-specific field requirements
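The section-joining pattern above can be sketched as follows. This is a minimal illustration over a plain dict; the field names on the scenario are hypothetical stand-ins, not the real `NormalizedScenarioPack` schema:

```python
# Sketch of prompt assembly from ordered sections; the dict fields below
# are illustrative placeholders for NormalizedScenarioPack attributes.
def build_system_prompt_sketch(scenario: dict) -> str:
    sections = [
        "You are the Scientist agent in ReplicaLab.",
        "Your job: negotiate the strongest feasible plan.",
        f"Domain: {scenario['domain_id']}",
        f"Task: {scenario['task_summary']}",
        "Success criteria:\n"
        + "\n".join(f"- {c}" for c in scenario["success_criteria"]),
        "Respond with exactly one JSON object and no extra keys.",
    ]
    return "\n\n".join(sections)

prompt = build_system_prompt_sketch({
    "domain_id": "mathematics",
    "task_summary": "Replicate the paper's numerical check.",
    "success_criteria": ["Matches the reported bound", "Runs offline"],
})
```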
#### `format_scientist_observation(obs: ScientistObservation) -> str` – AGT 02
Converts a per-turn observation into the user message string.
**Sections (fixed order, tested):**
1. Round status: `"Round {n} of {max}"`
2. Paper summary: title, hypothesis, method, key finding, goal
3. Conversation history or "No conversation history yet"
4. Current protocol or "No protocol has been proposed yet"
5. ScientistAction schema reminder (field list, action_type values)
6. Closing instruction: "Respond with exactly one JSON object"
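The fixed ordering and the documented fallback strings can be sketched like this; the real function takes a `ScientistObservation` model rather than loose arguments:

```python
# Minimal sketch of the fixed section order with its documented fallbacks.
def format_observation_sketch(round_num, max_rounds, history, protocol):
    parts = [
        f"Round {round_num} of {max_rounds}",
        "\n".join(history) if history else "No conversation history yet",
        str(protocol) if protocol else "No protocol has been proposed yet",
        "Respond with exactly one JSON object",
    ]
    return "\n\n".join(parts)

msg = format_observation_sketch(1, 5, history=[], protocol=None)
```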
#### `parse_scientist_output(raw_text: str) -> ScientistAction` – MOD 09
Strict parser from raw model text into validated `ScientistAction`.
**Accepts:**
- Plain JSON objects
- `` ```json `` fenced blocks
- Prose containing one JSON object
**Error codes:**
| Code | Meaning |
|------|---------|
| `no_json` | No JSON object found in output |
| `invalid_json` | JSON syntax error (trailing comma, etc.) |
| `invalid_action` | Valid JSON but fails ScientistAction validation |
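A simplified sketch of this extraction-then-validation pipeline, using a stand-in error class and a single required key in place of full `ScientistAction` validation:

```python
import json
import re

class ParseError(ValueError):
    # Simplified stand-in for ScientistOutputParseError.
    def __init__(self, code, message, raw_text, parsed_payload=None):
        super().__init__(message)
        self.code = code
        self.raw_text = raw_text
        self.parsed_payload = parsed_payload

def parse_output_sketch(raw_text: str) -> dict:
    # Prefer a ```json fenced block, else fall back to the first {...} span.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw_text, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        braced = re.search(r"\{.*\}", raw_text, re.DOTALL)
        if braced is None:
            raise ParseError("no_json", "no JSON object found", raw_text)
        candidate = braced.group(0)
    try:
        payload = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ParseError("invalid_json", str(exc), raw_text)
    if "action_type" not in payload:  # stand-in for ScientistAction validation
        raise ParseError("invalid_action", "missing action_type", raw_text, payload)
    return payload
```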
#### `call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult` – AGT 03
Retry loop with error-specific correction prompts.
**Behavior:**
1. Builds messages: `[system, user]`
2. Calls `generate_fn(messages)` β raw text
3. Calls `parse_scientist_output(raw_text)`
4. On success: returns `ScientistCallResult(action, metadata)`
5. On failure: appends `[assistant(bad_output), user(correction)]` to messages, retries
6. After `max_retries` failures: raises last `ScientistOutputParseError`
**Correction prompts (`_build_correction_prompt`):**
- `no_json`: "Your previous response did not contain a JSON object..."
- `invalid_json`: "Your previous response contained malformed JSON: {error}..."
- `invalid_action`: "...failed ScientistAction validation: {detail}. Fix the validation error..."
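The loop above can be sketched as follows; here `json.loads` stands in for `parse_scientist_output`, and a plain tuple stands in for `ScientistCallResult`:

```python
import json

# Sketch of the retry loop; the correction message is a simplified
# placeholder for the error-specific _build_correction_prompt output.
def call_with_retry_sketch(generate_fn, system_prompt, observation, max_retries=2):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": observation},
    ]
    last_error = None
    for attempt in range(1, max_retries + 2):  # initial call + max_retries retries
        raw = generate_fn(messages)
        try:
            return json.loads(raw), attempt  # (action, attempt_count)
        except ValueError as exc:
            last_error = exc
            # Append the bad output plus a correction prompt, then retry.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Malformed output ({exc}); reply with one JSON object.",
            })
    raise last_error

# Stub GenerateFn backend: fails once, then produces valid JSON.
replies = iter(["not json", '{"action_type": "accept"}'])
action, attempts = call_with_retry_sketch(lambda msgs: next(replies), "sys", "obs")
```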
#### `build_baseline_scientist_action(observation) -> ScientistAction` – AGT 04
Deterministic non-LLM action for smoke tests. No API calls.
**Decision tree:**
1. If protocol exists AND at max rounds → `accept`
2. If protocol exists AND latest lab_manager feedback indicates blocker → `revise_protocol` (halve sample, reduce duration)
3. If protocol exists AND no blocker → `accept`
4. If no protocol → `propose_protocol` (domain-inferred defaults)
**Domain inference (`_infer_domain`):**
- Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) → `machine_learning`
- Checks for finance hints (backtest, sharpe, trading...) → `finance_trading`
- Default → `mathematics`
**Blocker detection (`_feedback_indicates_blocker`):**
- Returns `False` if action_type is `accept` or `report_feasibility`
- Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc.
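Both helpers reduce to keyword scans, sketched below with abridged hint tuples (the module's real constants are longer):

```python
# Abridged hint tuples; the real module's constants contain more entries.
_ML_HINTS = ("benchmark", "dataset", "gpu")
_FINANCE_HINTS = ("backtest", "sharpe", "trading")
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost")

def infer_domain_sketch(paper_text: str) -> str:
    # First matching hint family wins; mathematics is the fallback.
    text = paper_text.lower()
    if any(h in text for h in _ML_HINTS):
        return "machine_learning"
    if any(h in text for h in _FINANCE_HINTS):
        return "finance_trading"
    return "mathematics"

def feedback_indicates_blocker_sketch(action_type: str, message: str) -> bool:
    # Accept / report_feasibility never count as blockers.
    if action_type in ("accept", "report_feasibility"):
        return False
    return any(h in message.lower() for h in _BLOCKER_HINTS)
```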
### Classes
#### `ScientistOutputParseError(ValueError)`
| Attribute | Type | Purpose |
|-----------|------|---------|
| `code` | `Literal["no_json", "invalid_json", "invalid_action"]` | Machine-readable error type |
| `message` | `str` | Human-readable detail |
| `raw_text` | `str` | Original model output |
| `parsed_payload` | `dict \| None` | Decoded JSON if parsing succeeded |
#### `RetryMetadata(BaseModel)` – `extra="forbid"`
| Field | Type | Purpose |
|-------|------|---------|
| `attempt_count` | `int` | Total attempts (1 = success on first try) |
| `retry_count` | `int` | `attempt_count - 1` |
| `last_error_code` | `str \| None` | Error code from last failure |
| `last_error_message` | `str \| None` | Error message from last failure |
#### `ScientistCallResult(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `action` | `ScientistAction` |
| `metadata` | `RetryMetadata` |
### Type Aliases
```python
GenerateFn = Callable[[list[dict[str, str]]], str]
```
### Constants
```python
_ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...)
_FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...)
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...)
```
---
## Lab Manager Policy – `lab_manager_policy.py`
### Pipeline Flow
```
protocol + scenario → check_feasibility()
        ↓
FeasibilityCheckResult (7 dimensions)
        ↓
suggest_alternative(protocol, check, scenario)
        ↓
AlternativeSuggestion | None
        ↓
compose_lab_manager_response(check, suggestion)
        ↓
LabManagerAction (typed, with explanation)
```
### Public Functions
#### `check_feasibility(protocol, scenario) -> FeasibilityCheckResult` – AGT 05
Runs 7 deterministic dimension checks. No LLM calls.
**Checks performed:**
| Dimension | Function | What it checks |
|-----------|----------|---------------|
| `protocol` | `_build_protocol_check` | Wraps `validate_protocol()` from MOD 05 |
| `budget` | `_check_budget` | `_estimate_protocol_cost()` vs `budget_remaining` |
| `equipment` | `_check_equipment` | Items available/booked, finds substitutions |
| `reagents` | `_check_reagents` | Items in-stock/out-of-stock, finds substitutions |
| `schedule` | `_check_schedule` | `duration_days` vs `time_limit_days` |
| `staff` | `_check_staff` | `_estimate_staff_load()` vs `staff_count` |
| `policy` | `_check_policy` | Safety restrictions (e.g., offline-only execution) |
**Cost estimation (`_estimate_protocol_cost`):**
```
base = sample_size * 10
+ duration_days * 50
+ len(controls) * 25
+ len(required_equipment) * 100
+ len(required_reagents) * 75
```
**Staff estimation (`_estimate_staff_load`):**
```
base = 1
+ (1 if sample_size > 20)
+ (1 if len(controls) > 2)
+ (1 if duration_days > 5)
+ (1 if len(required_equipment) > 2)
```
#### `suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion | None` – AGT 06
Deterministic revision engine. Returns `None` if already feasible.
**Fix order (deterministic):**
1. Equipment substitutions → replace booked items with alternatives
2. Reagent substitutions → replace out-of-stock items with alternatives
3. Duration clamp → reduce to `time_limit_days` if over
4. Sample size reduction → iterative halving until budget fits (max 10 iterations)
**Post-fix recheck:** runs `check_feasibility()` on revised protocol.
**Returns:** revised protocol, list of changes, remaining failures, pre/post checks.
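Step 4's bounded halving can be sketched in isolation; the cost function here uses only the per-sample and per-day weights from the estimator above, not the full formula:

```python
# Bounded iterative halving of sample_size until the estimated cost fits
# the budget; simplified cost uses only the per-sample and per-day weights.
def halve_until_affordable_sketch(sample_size, duration_days, budget, max_iters=10):
    def cost(n):
        return n * 10 + duration_days * 50
    for _ in range(max_iters):
        if cost(sample_size) <= budget:
            break
        sample_size = max(1, sample_size // 2)  # never drop below one sample
    return sample_size

# 100 samples over 4 days: 1200 > 500, halve to 50 (700 > 500), halve to 25 (450 <= 500).
n = halve_until_affordable_sketch(sample_size=100, duration_days=4, budget=500)
```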
#### `compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction` – AGT 07
Converts grounded results into a typed `LabManagerAction`.
**Action type selection (`_select_lab_manager_action_type`):**
| Condition | Action |
|-----------|--------|
| All 7 dimensions pass | `ACCEPT` |
| Suggestion exists AND improved AND only non-lab failures remain | `SUGGEST_ALTERNATIVE` |
| Lab constraints fail AND no suggestion | `REJECT` |
| Only policy/protocol fail (not lab constraints) | `REPORT_FEASIBILITY` |
| Suggestion exists but didn't improve | `REJECT` |
**Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).**
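One possible ordering of the selection rules above, sketched over a plain pass/fail map (the real function reads a `FeasibilityCheckResult` and an `AlternativeSuggestion`):

```python
# Lab constraints per the table: budget, equipment, reagents, schedule, staff.
LAB_DIMS = ("budget", "equipment", "reagents", "schedule", "staff")

def select_action_type_sketch(dims: dict, suggestion_improved=None) -> str:
    # dims maps dimension name -> passed; suggestion_improved is None
    # when no suggestion exists. One sketch ordering of the table's rules.
    if all(dims.values()):
        return "ACCEPT"
    lab_failure = any(not dims[d] for d in LAB_DIMS)
    if suggestion_improved is True and not lab_failure:
        return "SUGGEST_ALTERNATIVE"
    if suggestion_improved is False:
        return "REJECT"
    if lab_failure:
        return "REJECT"
    return "REPORT_FEASIBILITY"  # only policy/protocol failed

all_ok = {d: True for d in ("protocol", *LAB_DIMS, "policy")}
```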
### Classes
#### `DimensionCheck(BaseModel)` – `extra="forbid"`
| Field | Type | Default |
|-------|------|---------|
| `ok` | `bool` | `True` |
| `reasons` | `list[str]` | `[]` |
#### `FeasibilityCheckResult(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `protocol` | `DimensionCheck` |
| `budget` | `DimensionCheck` |
| `equipment` | `DimensionCheck` |
| `reagents` | `DimensionCheck` |
| `schedule` | `DimensionCheck` |
| `staff` | `DimensionCheck` |
| `policy` | `DimensionCheck` |
| `estimated_cost` | `float` |
| `required_staff` | `int` |
| `substitution_options` | `dict[str, list[str]]` |
| `validation_result` | `ValidationResult` |
**Computed properties:** `protocol_ok`, `budget_ok`, `equipment_ok`, `reagents_ok`, `schedule_ok`, `staff_ok`, `feasible`, `summary`
#### `SuggestionChange(BaseModel)` – `extra="forbid"`
| Field | Type | Purpose |
|-------|------|---------|
| `field` | `str` | Which protocol field was changed |
| `original` | `str` | Original value (stringified) |
| `revised` | `str` | New value (stringified) |
| `reason` | `str` | Why it was changed |
| `tradeoff` | `str` | What is lost |
#### `AlternativeSuggestion(BaseModel)` – `extra="forbid"`
| Field | Type |
|-------|------|
| `revised_protocol` | `Protocol` |
| `applied_changes` | `list[SuggestionChange]` |
| `remaining_failures` | `list[str]` |
| `improved` | `bool` |
| `pre_check` | `FeasibilityCheckResult` |
| `post_check` | `FeasibilityCheckResult` |
### Type Aliases
```python
ExplanationRenderer = Callable[
[LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]],
str,
]
```