# Scenarios Map: `replicalab/scenarios/`

> Normalized scenario generation across 3 domains with seeded determinism.
>
> **Tasks implemented:** SCN 01-12

## Entry Point

### `generate_scenario(seed, template, difficulty) -> NormalizedScenarioPack`

Located in `templates.py`. This is the main public API.

**Flow:**

1. `seed_rng(seed)` → deterministic `random.Random` instance
2. `load_template(template)` → picks the template builder function
3. `builder(rng)` → raw draft dict (randomly selects one of 2 cases per domain)
4. `apply_difficulty(draft, difficulty, rng)` → scales budget, time, staff, resources
5. `_build_pack(seed, template, draft)` → constructs `NormalizedScenarioPack`

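
The flow above can be sketched end to end as follows. This is a minimal, self-contained illustration, not the code in `templates.py`: the `demo` template, the stub builder, the plain dict returned in place of `NormalizedScenarioPack`, and the body of every helper are assumptions. Only the function names and the call order come from the list above.

```python
import random
from typing import Any, Callable


def _stub_builder(rng: random.Random) -> dict[str, Any]:
    """Hypothetical builder: picks one of two hard-coded cases."""
    cases = [
        {"domain_id": "demo", "task_summary": "case A", "budget_total": 1000.0},
        {"domain_id": "demo", "task_summary": "case B", "budget_total": 2000.0},
    ]
    return dict(rng.choice(cases))


# Hypothetical registry; the real module maps its own template names.
_BUILDERS: dict[str, Callable[[random.Random], dict[str, Any]]] = {
    "demo": _stub_builder,
}


def seed_rng(seed: int) -> random.Random:
    # Assumption: seeds the generator directly, with no namespace derivation.
    return random.Random(seed)


def load_template(template: str) -> Callable[[random.Random], dict[str, Any]]:
    return _BUILDERS[template]


def apply_difficulty(draft: dict[str, Any], difficulty: str, rng: random.Random) -> dict[str, Any]:
    # Simplified: only the budget is scaled here; rng is unused in this stub.
    scale = {"easy": 1.15, "medium": 0.95, "hard": 0.80}[difficulty]
    scaled = dict(draft)
    scaled["budget_total"] = round(draft["budget_total"] * scale, 2)
    scaled["difficulty"] = difficulty  # assumption: the draft carries the difficulty
    return scaled


def _build_pack(seed: int, template: str, draft: dict[str, Any]) -> dict[str, Any]:
    # A plain dict stands in for NormalizedScenarioPack.
    return {"scenario_id": f"{template}_{seed}", "template": template, "seed": seed, **draft}


def generate_scenario(seed: int, template: str, difficulty: str) -> dict[str, Any]:
    rng = seed_rng(seed)
    builder = load_template(template)
    draft = builder(rng)
    draft = apply_difficulty(draft, difficulty, rng)
    return _build_pack(seed, template, draft)
```

Because every random choice flows through the one seeded `random.Random` instance, calling `generate_scenario` twice with the same arguments yields identical output in this sketch.
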
### `available_scenario_families() -> list[dict]`

Returns `[{"family": name, "difficulties": ["easy", "medium", "hard"]}]` for each template.

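
One way to produce that shape is to derive it from the `Difficulty` and `TemplateName` literals this module defines. The sketch below assumes that derivation; the real function may build the list from a different source, such as a builder registry.

```python
from typing import Literal, get_args

Difficulty = Literal["easy", "medium", "hard"]
TemplateName = Literal["math_reasoning", "ml_benchmark", "finance_trading"]


def available_scenario_families() -> list[dict]:
    # get_args returns the literal values in declaration order.
    difficulties = list(get_args(Difficulty))
    return [
        {"family": name, "difficulties": list(difficulties)}
        for name in get_args(TemplateName)
    ]
```
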
## Core Data Classes (all in `templates.py`)

### `NormalizedScenarioPack(BaseModel)` (`extra="forbid"`)

The complete scenario definition. Every downstream consumer uses this.

| Field | Type | Source |
|-------|------|--------|
| `scenario_id` | `str` | `"{template}_{seed}"` |
| `template` | `TemplateName` | input param |
| `domain_id` | `str` | from template case |
| `difficulty` | `Difficulty` | input param |
| `seed` | `int` | input param |
| `task_summary` | `str` | from template case |
| `success_criteria` | `list[str]` | from template case |
| `constraints` | `list[ScenarioConstraint]` | from template + difficulty scaling |
| `resources` | `list[ScenarioResource]` | from template + difficulty scaling |
| `allowed_substitutions` | `list[AllowedSubstitution]` | from template case |
| `hidden_reference_spec` | `HiddenReferenceSpec` | from template case |
| `scientist_observation` | `ScientistObservation` | built from case fields |
| `lab_manager_observation` | `LabManagerObservation` | built from case fields |

### `ScenarioConstraint(BaseModel)`

| Field | Type | Example |
|-------|------|---------|
| `key` | `str` | `"gpu_hours"` |
| `label` | `str` | `"Maximum GPU budget"` |
| `quantity` | `float \| int \| None` | `8` |
| `unit` | `str \| None` | `"gpu_hours"` |
| `comparator` | `Literal["<=", ">=", "="]` | `"<="` |
| `hard` | `bool` | `True` |
| `details` | `str` | `"The full run must fit within eight GPU-hours."` |

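
To make the `comparator` and `quantity` fields concrete, here is a hedged sketch of how a consumer might test a proposed value against a constraint. The dataclass is a simplified stand-in for the Pydantic model, and `is_satisfied` is a hypothetical helper that does not appear in the source. Treating a missing `quantity` as "nothing to compare" is also an assumption.

```python
import operator
from dataclasses import dataclass
from typing import Literal, Optional, Union


@dataclass(frozen=True)
class ConstraintSketch:
    """Simplified stand-in for ScenarioConstraint (same field names)."""
    key: str
    label: str
    quantity: Union[float, int, None]
    unit: Optional[str]
    comparator: Literal["<=", ">=", "="]
    hard: bool
    details: str


# Map each comparator string to the matching Python operator.
_OPS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq}


def is_satisfied(constraint: ConstraintSketch, proposed: float) -> bool:
    """Return True if `proposed` respects the constraint's bound."""
    if constraint.quantity is None:
        # Assumption: a constraint without a quantity is qualitative only.
        return True
    return _OPS[constraint.comparator](proposed, constraint.quantity)


gpu_budget = ConstraintSketch(
    key="gpu_hours",
    label="Maximum GPU budget",
    quantity=8,
    unit="gpu_hours",
    comparator="<=",
    hard=True,
    details="The full run must fit within eight GPU-hours.",
)
```

With the example values from the table, `is_satisfied(gpu_budget, 6)` holds and `is_satisfied(gpu_budget, 10)` does not. A caller could then consult `hard` to decide whether a violation rejects the plan or only penalizes it.
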
### `ScenarioResource(BaseModel)`

| Field | Type | Example |
|-------|------|---------|
| `key` | `str` | `"gpu_node"` |
| `label` | `str` | `"A100 GPU node"` |
| `quantity` | `float \| int \| None` | `1` |
| `unit` | `str \| None` | `"node"` |
| `available` | `bool` | `True` |
| `category` | `str` | `"compute"` |
| `details` | `str` | `"Reserved for one benchmark run at a time."` |

### `AllowedSubstitution(BaseModel)`

| Field | Type | Example |
|-------|------|---------|
| `original` | `str` | `"A100 GPU node"` |
| `alternative` | `str` | `"V100 GPU node"` |
| `condition` | `str` | `"Use if A100 is booked."` |
| `tradeoff` | `str` | `"V100 is slower; extend training by ~30%."` |

### `HiddenReferenceSpec(BaseModel)`

Ground truth the judge uses to score fidelity. The scientist never sees this.

| Field | Type | Example |
|-------|------|---------|
| `summary` | `str` | `"A valid plan keeps the published split..."` |
| `required_elements` | `list[str]` | `["published data split", "held-out accuracy evaluation"]` |
| `flexible_elements` | `list[str]` | `["batch size", "learning-rate schedule"]` |
| `target_metric` | `str` | `"held_out_accuracy"` |
| `target_value` | `str` | `"within one point of the reported baseline"` |

## Template Builders

Each returns a raw `dict[str, Any]` with one randomly selected case.

### `build_math_reasoning_template(rng)` (`math_reasoning.py`)

- **Domain:** `mathematics`
- **Case A:** Cauchy-Schwarz inequality: structured proof verification
- **Case B:** Jensen's inequality: convexity-based proof
- **Equipment:** Structured proof notebook, Automated proof checker
- **Reagents:** Graduate reviewer, Reference textbook
- **Substitutions:** Graduate reviewer → self-check rubric

### `build_ml_benchmark_template(rng)` (`ml_benchmark.py`)

- **Domain:** `machine_learning`
- **Case A:** AG News TinyBERT: text classification replication
- **Case B:** CIFAR-10 ResNet-18: image classification replication
- **Equipment:** A100 GPU node, Dataset mirror, Experiment tracker
- **Reagents:** Pre-trained checkpoint, Evaluation harness
- **Substitutions:** A100 → V100 (slower), full dataset → stratified sample

### `build_finance_trading_template(rng)` (`finance_trading.py`)

- **Domain:** `finance_trading`
- **Case A:** SPY/QQQ mean-reversion: pairs trading backtest
- **Case B:** Momentum futures: trend-following strategy
- **Equipment:** Backtest engine, Historical daily bar dataset
- **Reagents:** Risk reviewer, Compliance packet
- **Substitutions:** Daily bars → weekly bars, risk reviewer → automated risk check
- **Safety restrictions:** offline-only execution policy

## Difficulty Scaling: `apply_difficulty(draft, difficulty, rng)`

| Parameter | Easy | Medium | Hard |
|-----------|------|--------|------|
| `budget_total` | ×1.15 | ×0.95 | ×0.80 |
| `time_limit_days` | unchanged | −1 day | −1 day |
| `staff_count` | unchanged | unchanged | −1 person |
| Resources tightened | 0 | 1 | 2 |
| Conflict constraint | no | yes (1) | yes (1) |

**`_tighten_one_resource`**: picks a random resource and sets `available=False`.

**`_append_conflict_constraint`**: adds a soft constraint noting the resource conflict.

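
The table translates into a small amount of arithmetic, sketched below on a hypothetical draft dict. The keys `budget_total`, `time_limit_days`, and `staff_count` come from the table. The flat dict layout, the shape of the `resources` list, the floor of 1 on days and staff, the choice to tighten only resources that are still available, and the `resource_conflict` key are all assumptions.

```python
import copy
import random
from typing import Any

_BUDGET_SCALE = {"easy": 1.15, "medium": 0.95, "hard": 0.80}
_DAYS_REMOVED = {"easy": 0, "medium": 1, "hard": 1}
_STAFF_REMOVED = {"easy": 0, "medium": 0, "hard": 1}
_RESOURCES_TIGHTENED = {"easy": 0, "medium": 1, "hard": 2}


def apply_difficulty_sketch(
    draft: dict[str, Any], difficulty: str, rng: random.Random
) -> dict[str, Any]:
    scaled = copy.deepcopy(draft)  # leave the caller's draft untouched
    scaled["budget_total"] = round(draft["budget_total"] * _BUDGET_SCALE[difficulty], 2)
    scaled["time_limit_days"] = max(1, draft["time_limit_days"] - _DAYS_REMOVED[difficulty])
    scaled["staff_count"] = max(1, draft["staff_count"] - _STAFF_REMOVED[difficulty])

    # Tighten N resources by marking them unavailable.
    for _ in range(_RESOURCES_TIGHTENED[difficulty]):
        candidates = [r for r in scaled["resources"] if r["available"]]
        if not candidates:
            break
        rng.choice(candidates)["available"] = False

    # Medium and hard each gain one soft (hard=False) conflict constraint.
    if difficulty != "easy":
        scaled.setdefault("constraints", []).append(
            {"key": "resource_conflict", "hard": False}
        )
    return scaled


demo_draft = {
    "budget_total": 1000.0,
    "time_limit_days": 5,
    "staff_count": 3,
    "resources": [
        {"key": "a", "available": True},
        {"key": "b", "available": True},
        {"key": "c", "available": True},
    ],
}
hard_draft = apply_difficulty_sketch(demo_draft, "hard", random.Random(0))
```

For the demo draft, the hard setting gives a budget of 800.0, 4 days, 2 staff, two unavailable resources, and one soft constraint. Which two resources are tightened depends on the seed.
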
## Utility: `replicalab/utils/seed.py`

| Function | Purpose |
|----------|---------|
| `get_deterministic_seed(seed, namespace)` | SHA256-based child seed derivation |
| `seed_rng(seed, namespace)` | Returns `random.Random(derived_seed)` |

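
A minimal sketch of SHA-256-based child seed derivation follows. The function names and parameters match the table, but the hashing layout (the `"{seed}:{namespace}"` string and the use of the first 8 digest bytes) and the empty default namespace are assumptions, so the values it produces will not necessarily match the real module.

```python
import hashlib
import random


def get_deterministic_seed(seed: int, namespace: str = "") -> int:
    """Derive a child seed from a base seed and a namespace label."""
    payload = f"{seed}:{namespace}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Assumption: keep the first 8 bytes as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


def seed_rng(seed: int, namespace: str = "") -> random.Random:
    """Return a generator seeded with the derived child seed."""
    return random.Random(get_deterministic_seed(seed, namespace))
```

Hashing the namespace together with the seed gives each consumer its own reproducible random stream, so adding draws in one component does not shift the values seen by another.
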
## Type Aliases

```python
from typing import Any, Callable, Literal

Difficulty = Literal["easy", "medium", "hard"]
TemplateName = Literal["math_reasoning", "ml_benchmark", "finance_trading"]
TemplateBuilder = Callable[[Any], dict[str, Any]]  # takes the seeded rng, returns a raw draft
```

## Constants

```python
from pathlib import Path
GOLDEN_SCENARIO_SPECS_PATH = Path("tests/fixtures/golden_scenarios.json")
```

## Who Consumes This

- **`validation.py`**: reads constraints, resources, substitutions, hidden_reference_spec
- **`lab_manager_policy.py`**: reads lab_manager_observation, substitutions, constraints
- **`scientist_policy.py`**: reads scenario pack for system prompt generation
- **`server/app.py`**: calls `generate_scenario()` on reset, stores pack for lab manager
- **`scoring/`** (future): will read hidden_reference_spec for fidelity scoring