
Scenarios Map — replicalab/scenarios/

Normalized scenario generation across three domains (mathematics, machine learning, finance trading) with seeded determinism.

Tasks implemented: SCN 01-12

Entry Point

generate_scenario(seed, template, difficulty) -> NormalizedScenarioPack

Located in templates.py. The main public API.

Flow:

  1. seed_rng(seed) → deterministic random.Random instance
  2. load_template(template) → picks the template builder function
  3. builder(rng) → raw draft dict (randomly selects one of two cases per domain)
  4. apply_difficulty(draft, difficulty, rng) → scales budget, time, staff, resources
  5. _build_pack(seed, template, draft) → constructs the NormalizedScenarioPack
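The five steps above can be sketched end to end with a stdlib-only toy. Everything here is an illustrative stand-in (the template name, the builder, and the simplified pack are hypothetical); only the budget multipliers come from the difficulty table in this document:

```python
import random
from typing import Any, Callable

def seed_rng(seed: int) -> random.Random:
    # Deterministic RNG; the real seed_rng derives a child seed via SHA-256.
    return random.Random(seed)

def _toy_builder(rng: random.Random) -> dict[str, Any]:
    # Stand-in for a real template builder: picks one of two cases.
    case = rng.choice(["case_a", "case_b"])
    return {"case": case, "budget_total": 100.0, "time_limit_days": 5}

# Hypothetical registry; the real code dispatches via load_template().
TEMPLATE_BUILDERS: dict[str, Callable[[random.Random], dict[str, Any]]] = {
    "toy_template": _toy_builder,
}

def apply_difficulty(draft: dict[str, Any], difficulty: str,
                     rng: random.Random) -> dict[str, Any]:
    # Budget multipliers from the difficulty table; other scaling omitted.
    scale = {"easy": 1.15, "medium": 0.95, "hard": 0.80}[difficulty]
    draft["budget_total"] *= scale
    return draft

def generate_scenario_sketch(seed: int, template: str,
                             difficulty: str) -> dict[str, Any]:
    rng = seed_rng(seed)                               # step 1
    builder = TEMPLATE_BUILDERS[template]              # step 2
    draft = builder(rng)                               # step 3
    draft = apply_difficulty(draft, difficulty, rng)   # step 4
    draft["scenario_id"] = f"{template}_{seed}"        # step 5 (simplified)
    return draft
```

Determinism falls out of step 1: the same seed always yields the same case selection and the same scaled draft.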

available_scenario_families() -> list[dict]

Returns [{"family": name, "difficulties": ["easy", "medium", "hard"]}] for each template.
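Given the three TemplateName values, a minimal sketch of this function is:

```python
def available_scenario_families() -> list[dict]:
    # Illustrative re-implementation: one entry per template, each offering
    # the same three difficulty levels.
    templates = ("math_reasoning", "ml_benchmark", "finance_trading")
    return [
        {"family": name, "difficulties": ["easy", "medium", "hard"]}
        for name in templates
    ]
```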

Core Data Classes (all in templates.py)

NormalizedScenarioPack(BaseModel) — extra="forbid"

The complete scenario definition. Every downstream consumer uses this.

| Field | Type | Source |
|---|---|---|
| scenario_id | str | "{template}_{seed}" |
| template | TemplateName | input param |
| domain_id | str | from template case |
| difficulty | Difficulty | input param |
| seed | int | input param |
| task_summary | str | from template case |
| success_criteria | list[str] | from template case |
| constraints | list[ScenarioConstraint] | from template + difficulty scaling |
| resources | list[ScenarioResource] | from template + difficulty scaling |
| allowed_substitutions | list[AllowedSubstitution] | from template case |
| hidden_reference_spec | HiddenReferenceSpec | from template case |
| scientist_observation | ScientistObservation | built from case fields |
| lab_manager_observation | LabManagerObservation | built from case fields |

ScenarioConstraint(BaseModel)

| Field | Type | Example |
|---|---|---|
| key | str | "gpu_hours" |
| label | str | "Maximum GPU budget" |
| quantity | float \| int \| None | 8 |
| unit | str \| None | "gpu_hours" |
| comparator | Literal["<=", ">=", "="] | "<=" |
| hard | bool | True |
| details | str | "The full run must fit within eight GPU-hours." |
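The example column above can be written out as an instance. A stdlib dataclass stands in for the real Pydantic BaseModel here; the field names and values are taken from the table:

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

@dataclass
class ScenarioConstraint:
    # Dataclass sketch; the real model is a Pydantic BaseModel.
    key: str
    label: str
    quantity: Union[float, int, None]
    unit: Optional[str]
    comparator: Literal["<=", ">=", "="]
    hard: bool
    details: str

gpu_budget = ScenarioConstraint(
    key="gpu_hours",
    label="Maximum GPU budget",
    quantity=8,
    unit="gpu_hours",
    comparator="<=",
    hard=True,
    details="The full run must fit within eight GPU-hours.",
)
```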

ScenarioResource(BaseModel)

| Field | Type | Example |
|---|---|---|
| key | str | "gpu_node" |
| label | str | "A100 GPU node" |
| quantity | float \| int \| None | 1 |
| unit | str \| None | "node" |
| available | bool | True |
| category | str | "compute" |
| details | str | "Reserved for one benchmark run at a time." |

AllowedSubstitution(BaseModel)

| Field | Type | Example |
|---|---|---|
| original | str | "A100 GPU node" |
| alternative | str | "V100 GPU node" |
| condition | str | "Use if A100 is booked." |
| tradeoff | str | "V100 is slower; extend training by ~30%." |

HiddenReferenceSpec(BaseModel)

Ground truth the judge uses to score fidelity. The scientist never sees this.

| Field | Type | Example |
|---|---|---|
| summary | str | "A valid plan keeps the published split..." |
| required_elements | list[str] | ["published data split", "held-out accuracy evaluation"] |
| flexible_elements | list[str] | ["batch size", "learning-rate schedule"] |
| target_metric | str | "held_out_accuracy" |
| target_value | str | "within one point of the reported baseline" |
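Since required_elements is effectively the judge's checklist, the simplest possible fidelity check is substring matching. This is a purely hypothetical sketch (scoring is future work), using the example values from the table:

```python
def covers_required_elements(plan_text: str, required_elements: list[str]) -> bool:
    # Hypothetical fidelity check: every required element must appear
    # (case-insensitively) somewhere in the scientist's plan.
    text = plan_text.lower()
    return all(element.lower() in text for element in required_elements)
```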

Template Builders

Each returns a raw dict[str, Any] with one randomly selected case.
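The builder contract can be sketched as follows. The domain and case contents are placeholders, not the real templates; the field names follow the pack table above:

```python
import random
from typing import Any

def build_example_template(rng: random.Random) -> dict[str, Any]:
    # Illustrative builder: each real builder returns a raw draft dict
    # for one randomly selected case.
    cases = [
        {
            "domain_id": "example_domain",
            "task_summary": "Replicate result A.",
            "success_criteria": ["criterion A1", "criterion A2"],
        },
        {
            "domain_id": "example_domain",
            "task_summary": "Replicate result B.",
            "success_criteria": ["criterion B1"],
        },
    ]
    return rng.choice(cases)
```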

build_math_reasoning_template(rng) — math_reasoning.py

  - Domain: mathematics
  - Case A: Cauchy-Schwarz inequality — structured proof verification
  - Case B: Jensen's inequality — convexity-based proof
  - Equipment: Structured proof notebook, Automated proof checker
  - Reagents: Graduate reviewer, Reference textbook
  - Substitutions: Graduate reviewer → self-check rubric

build_ml_benchmark_template(rng) — ml_benchmark.py

  - Domain: machine_learning
  - Case A: AG News TinyBERT — text classification replication
  - Case B: CIFAR-10 ResNet-18 — image classification replication
  - Equipment: A100 GPU node, Dataset mirror, Experiment tracker
  - Reagents: Pre-trained checkpoint, Evaluation harness
  - Substitutions: A100 → V100 (slower), full dataset → stratified sample

build_finance_trading_template(rng) — finance_trading.py

  - Domain: finance_trading
  - Case A: SPY/QQQ mean-reversion — pairs trading backtest
  - Case B: Momentum futures — trend-following strategy
  - Equipment: Backtest engine, Historical daily bar dataset
  - Reagents: Risk reviewer, Compliance packet
  - Substitutions: Daily bars → weekly bars, risk reviewer → automated risk check
  - Safety restrictions: offline-only execution policy

Difficulty Scaling — apply_difficulty(draft, difficulty, rng)

| Parameter | Easy | Medium | Hard |
|---|---|---|---|
| budget_total | ×1.15 | ×0.95 | ×0.80 |
| time_limit_days | unchanged | −1 day | −1 day |
| staff_count | unchanged | unchanged | −1 person |
| Resources tightened | 0 | 1 | 2 |
| Conflict constraint | no | yes (1) | yes (1) |

  - _tighten_one_resource: picks a random resource and sets available=False.
  - _append_conflict_constraint: adds a soft constraint noting the resource conflict.

Utility — replicalab/utils/seed.py

| Function | Purpose |
|---|---|
| get_deterministic_seed(seed, namespace) | SHA-256-based child seed derivation |
| seed_rng(seed, namespace) | Returns random.Random(derived_seed) |
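A plausible shape for the two utilities. The exact hash-to-int scheme is an assumption; only the SHA-256 basis and the signatures come from this document:

```python
import hashlib
import random

def get_deterministic_seed(seed: int, namespace: str = "") -> int:
    # Assumed derivation scheme: hash "seed:namespace" with SHA-256 and
    # take the first 8 bytes as an integer child seed.
    digest = hashlib.sha256(f"{seed}:{namespace}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def seed_rng(seed: int, namespace: str = "") -> random.Random:
    # Same (seed, namespace) pair always yields the same RNG stream.
    return random.Random(get_deterministic_seed(seed, namespace))
```

Namespacing lets independent components (e.g. different builders) draw from uncorrelated streams derived from one top-level seed.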

Type Aliases

```python
Difficulty = Literal["easy", "medium", "hard"]
TemplateName = Literal["math_reasoning", "ml_benchmark", "finance_trading"]
TemplateBuilder = Callable[[Any], dict[str, Any]]
```

Constants

```python
GOLDEN_SCENARIO_SPECS_PATH = Path("tests/fixtures/golden_scenarios.json")
```

Who Consumes This

  • validation.py β€” reads constraints, resources, substitutions, hidden_reference_spec
  • lab_manager_policy.py β€” reads lab_manager_observation, substitutions, constraints
  • scientist_policy.py β€” reads scenario pack for system prompt generation
  • server/app.py β€” calls generate_scenario() on reset, stores pack for lab manager
  • scoring/ (future) β€” will read hidden_reference_spec for fidelity scoring