# Scenarios Map – `replicalab/scenarios/`
> Normalized scenario generation across 3 domains with seeded determinism.
>
> **Tasks implemented:** SCN 01-12
## Entry Point
### `generate_scenario(seed, template, difficulty) -> NormalizedScenarioPack`
Located in `templates.py`. The main public API.
**Flow:**
1. `seed_rng(seed)` → deterministic `random.Random` instance
2. `load_template(template)` → picks the template builder function
3. `builder(rng)` → raw draft dict (randomly selects one of two cases per domain)
4. `apply_difficulty(draft, difficulty, rng)` → scales budget, time, staff, resources
5. `_build_pack(seed, template, draft)` → constructs `NormalizedScenarioPack`
### `available_scenario_families() -> list[dict]`
Returns `[{"family": name, "difficulties": ["easy", "medium", "hard"]}]` for each template.
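The five-step flow above can be sketched as follows. This is a minimal, self-contained illustration; every helper body here is a simplified stand-in for the real implementation in `templates.py`, and the draft contents are abbreviated.

```python
# Sketch of the generate_scenario flow. Helper bodies are simplified
# stand-ins, not the actual replicalab implementation.
import random
from typing import Any, Callable

def seed_rng(seed: int) -> random.Random:
    # Step 1: deterministic RNG from the seed.
    return random.Random(seed)

def load_template(template: str) -> Callable[[random.Random], dict[str, Any]]:
    # Step 2: map the template name to its builder (one shown here).
    builders = {
        "math_reasoning": lambda rng: {"domain_id": "mathematics",
                                       "budget_total": 100.0},
    }
    return builders[template]

def apply_difficulty(draft: dict[str, Any], difficulty: str,
                     rng: random.Random) -> None:
    # Step 4: scale the budget (multipliers from the difficulty table).
    scale = {"easy": 1.15, "medium": 0.95, "hard": 0.80}[difficulty]
    draft["budget_total"] *= scale

def _build_pack(seed: int, template: str,
                draft: dict[str, Any]) -> dict[str, Any]:
    # Step 5: assemble the final pack; scenario_id is "{template}_{seed}".
    return {"scenario_id": f"{template}_{seed}", **draft}

def generate_scenario(seed: int, template: str,
                      difficulty: str) -> dict[str, Any]:
    rng = seed_rng(seed)                       # 1
    builder = load_template(template)          # 2
    draft = builder(rng)                       # 3
    apply_difficulty(draft, difficulty, rng)   # 4
    return _build_pack(seed, template, draft)  # 5

pack = generate_scenario(7, "math_reasoning", "hard")
```

Because every random draw flows through the seeded `Random` instance, the same `(seed, template, difficulty)` triple always yields an identical pack.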
## Core Data Classes (all in `templates.py`)
### `NormalizedScenarioPack(BaseModel)` – `extra="forbid"`
The complete scenario definition. Every downstream consumer uses this.
| Field | Type | Source |
|-------|------|--------|
| `scenario_id` | `str` | `"{template}_{seed}"` |
| `template` | `TemplateName` | input param |
| `domain_id` | `str` | from template case |
| `difficulty` | `Difficulty` | input param |
| `seed` | `int` | input param |
| `task_summary` | `str` | from template case |
| `success_criteria` | `list[str]` | from template case |
| `constraints` | `list[ScenarioConstraint]` | from template + difficulty scaling |
| `resources` | `list[ScenarioResource]` | from template + difficulty scaling |
| `allowed_substitutions` | `list[AllowedSubstitution]` | from template case |
| `hidden_reference_spec` | `HiddenReferenceSpec` | from template case |
| `scientist_observation` | `ScientistObservation` | built from case fields |
| `lab_manager_observation` | `LabManagerObservation` | built from case fields |
### `ScenarioConstraint(BaseModel)`
| Field | Type | Example |
|-------|------|---------|
| `key` | `str` | `"gpu_hours"` |
| `label` | `str` | `"Maximum GPU budget"` |
| `quantity` | `float \| int \| None` | `8` |
| `unit` | `str \| None` | `"gpu_hours"` |
| `comparator` | `Literal["<=", ">=", "="]` | `"<="` |
| `hard` | `bool` | `True` |
| `details` | `str` | `"The full run must fit within eight GPU-hours."` |
### `ScenarioResource(BaseModel)`
| Field | Type | Example |
|-------|------|---------|
| `key` | `str` | `"gpu_node"` |
| `label` | `str` | `"A100 GPU node"` |
| `quantity` | `float \| int \| None` | `1` |
| `unit` | `str \| None` | `"node"` |
| `available` | `bool` | `True` |
| `category` | `str` | `"compute"` |
| `details` | `str` | `"Reserved for one benchmark run at a time."` |
### `AllowedSubstitution(BaseModel)`
| Field | Type | Example |
|-------|------|---------|
| `original` | `str` | `"A100 GPU node"` |
| `alternative` | `str` | `"V100 GPU node"` |
| `condition` | `str` | `"Use if A100 is booked."` |
| `tradeoff` | `str` | `"V100 is slower; extend training by ~30%."` |
### `HiddenReferenceSpec(BaseModel)`
Ground truth the judge uses to score fidelity. The scientist never sees this.
| Field | Type | Example |
|-------|------|---------|
| `summary` | `str` | `"A valid plan keeps the published split..."` |
| `required_elements` | `list[str]` | `["published data split", "held-out accuracy evaluation"]` |
| `flexible_elements` | `list[str]` | `["batch size", "learning-rate schedule"]` |
| `target_metric` | `str` | `"held_out_accuracy"` |
| `target_value` | `str` | `"within one point of the reported baseline"` |
## Template Builders
Each returns a raw `dict[str, Any]` with one randomly selected case.
### `build_math_reasoning_template(rng)` – `math_reasoning.py`
- **Domain:** `mathematics`
- **Case A:** Cauchy-Schwarz inequality → structured proof verification
- **Case B:** Jensen's inequality → convexity-based proof
- **Equipment:** Structured proof notebook, Automated proof checker
- **Reagents:** Graduate reviewer, Reference textbook
- **Substitutions:** Graduate reviewer → self-check rubric
### `build_ml_benchmark_template(rng)` – `ml_benchmark.py`
- **Domain:** `machine_learning`
- **Case A:** AG News TinyBERT → text classification replication
- **Case B:** CIFAR-10 ResNet-18 → image classification replication
- **Equipment:** A100 GPU node, Dataset mirror, Experiment tracker
- **Reagents:** Pre-trained checkpoint, Evaluation harness
- **Substitutions:** A100 → V100 (slower), full dataset → stratified sample
### `build_finance_trading_template(rng)` – `finance_trading.py`
- **Domain:** `finance_trading`
- **Case A:** SPY/QQQ mean-reversion → pairs trading backtest
- **Case B:** Momentum futures → trend-following strategy
- **Equipment:** Backtest engine, Historical daily bar dataset
- **Reagents:** Risk reviewer, Compliance packet
- **Substitutions:** Daily bars → weekly bars, risk reviewer → automated risk check
- **Safety restrictions:** offline-only execution policy
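All three builders share the same shape: choose one of two cases with the provided RNG and return a raw draft dict. A hypothetical sketch, with case contents abbreviated from the descriptions above:

```python
# Hypothetical shape of a template builder; the real builders return a
# much richer draft dict (equipment, reagents, substitutions, etc.).
import random
from typing import Any

def build_math_reasoning_template(rng: random.Random) -> dict[str, Any]:
    cases = [
        {"task_summary": "Verify a structured proof of the "
                         "Cauchy-Schwarz inequality."},
        {"task_summary": "Verify a convexity-based proof of "
                         "Jensen's inequality."},
    ]
    case = rng.choice(cases)  # randomly selects one of two cases
    return {"domain_id": "mathematics", **case}

draft = build_math_reasoning_template(random.Random(0))
```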
## Difficulty Scaling – `apply_difficulty(draft, difficulty, rng)`
| Parameter | Easy | Medium | Hard |
|-----------|------|--------|------|
| `budget_total` | ×1.15 | ×0.95 | ×0.80 |
| `time_limit_days` | unchanged | −1 day | −1 day |
| `staff_count` | unchanged | unchanged | −1 person |
| Resources tightened | 0 | 1 | 2 |
| Conflict constraint | no | yes (1) | yes (1) |
**`_tighten_one_resource`**: picks a random resource, sets `available=False`.
**`_append_conflict_constraint`**: adds a soft constraint noting resource conflict.
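The two helpers can be sketched as below. Signatures and the conflict-constraint wording are assumptions; only the behavior (random resource marked unavailable, soft constraint appended) comes from the description above.

```python
# Sketch of the two difficulty helpers; details are assumptions.
import random
from typing import Any

def _tighten_one_resource(draft: dict[str, Any],
                          rng: random.Random) -> None:
    # Pick one resource at random and mark it unavailable.
    resource = rng.choice(draft["resources"])
    resource["available"] = False

def _append_conflict_constraint(draft: dict[str, Any]) -> None:
    # Add a soft (hard=False) constraint noting a resource conflict.
    draft["constraints"].append({
        "key": "resource_conflict",
        "hard": False,
        "details": "A competing project has booked shared equipment.",
    })

draft = {"resources": [{"key": "gpu_node", "available": True}],
         "constraints": []}
_tighten_one_resource(draft, random.Random(1))
_append_conflict_constraint(draft)
```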
## Utility – `replicalab/utils/seed.py`
| Function | Purpose |
|----------|---------|
| `get_deterministic_seed(seed, namespace)` | SHA256-based child seed derivation |
| `seed_rng(seed, namespace)` | Returns `random.Random(derived_seed)` |
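A plausible sketch of the SHA256-based derivation; the exact hashing scheme (separator, byte width) in `seed.py` may differ, but the idea is that each `(seed, namespace)` pair maps to an independent, reproducible child seed.

```python
# Plausible SHA256-based child-seed derivation; exact details
# (separator, byte width) are assumptions about seed.py.
import hashlib
import random

def get_deterministic_seed(seed: int, namespace: str = "") -> int:
    digest = hashlib.sha256(f"{seed}:{namespace}".encode()).digest()
    return int.from_bytes(digest[:8], "big")  # fold to a 64-bit int

def seed_rng(seed: int, namespace: str = "") -> random.Random:
    return random.Random(get_deterministic_seed(seed, namespace))

# The same (seed, namespace) pair always yields the same stream:
a = seed_rng(42, "scenario").random()
b = seed_rng(42, "scenario").random()
```

Namespacing keeps streams independent: two components seeded from the same run seed but different namespaces draw uncorrelated values.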
## Type Aliases
```python
Difficulty = Literal["easy", "medium", "hard"]
TemplateName = Literal["math_reasoning", "ml_benchmark", "finance_trading"]
TemplateBuilder = Callable[[Any], dict[str, Any]]
```
## Constants
```python
GOLDEN_SCENARIO_SPECS_PATH = Path("tests/fixtures/golden_scenarios.json")
```
## Who Consumes This
- **`validation.py`** β reads constraints, resources, substitutions, hidden_reference_spec
- **`lab_manager_policy.py`** β reads lab_manager_observation, substitutions, constraints
- **`scientist_policy.py`** β reads scenario pack for system prompt generation
- **`server/app.py`** β calls `generate_scenario()` on reset, stores pack for lab manager
- **`scoring/`** (future) β will read hidden_reference_spec for fidelity scoring