# ReplicaLab Project Map

> Living reference of every module, class, function, and relationship.
> Updated after each implementation session.
>
> **Last updated:** 2026-03-07 (JDG 01-03 scoring implemented)

## Module Index

| File | What it covers |
|------|---------------|
| [models.md](models.md) | Data contracts: actions, observations, protocol, reward, episode state |
| [scenarios.md](scenarios.md) | Scenario generation: templates, constraints, resources, hidden specs |
| [agents.md](agents.md) | Agent policies: scientist prompt/parse/retry, lab manager feasibility/suggest/compose |
| [validation.md](validation.md) | Protocol validation: deterministic checks against scenario constraints |
| [scoring.md](scoring.md) | Judge scoring: rigor, feasibility, fidelity |
| [server.md](server.md) | FastAPI server: REST + WebSocket endpoints, stub environment |
| [frontend.md](frontend.md) | React UI: dashboard, episode viewer, components |
| [config.md](config.md) | Shared constants: rounds, budget, timeouts |
| [tests.md](tests.md) | Test coverage: 93 tests across 8 files |

## Dependency Graph

```
server/app.py
 ├── replicalab.config
 ├── replicalab.models
 ├── replicalab.scenarios (generate_scenario, available_scenario_families)
 └── replicalab.agents (check_feasibility, suggest_alternative, compose_lab_manager_response)

replicalab/agents/scientist_policy.py
 ├── replicalab.models (ScientistAction, ScientistObservation, Protocol, ConversationEntry)
 └── replicalab.scenarios (NormalizedScenarioPack)

replicalab/agents/lab_manager_policy.py
 ├── replicalab.models (LabManagerAction, LabManagerActionType, Protocol)
 ├── replicalab.scenarios (NormalizedScenarioPack)
 └── replicalab.utils.validation (ValidationResult, validate_protocol)

replicalab/scenarios/templates.py
 ├── replicalab.config (MAX_BUDGET, MAX_ROUNDS)
 ├── replicalab.models (ScientistObservation, LabManagerObservation)
 ├── replicalab.scenarios.{math_reasoning, ml_benchmark, finance_trading}
 └── replicalab.utils.seed (seed_rng)

replicalab/utils/validation.py
 ├── replicalab.models (Protocol)
 └── replicalab.scenarios.templates (NormalizedScenarioPack)

replicalab/scoring/
 ├── replicalab.models (Protocol, RewardBreakdown)
 ├── replicalab.scenarios (NormalizedScenarioPack, HiddenReferenceSpec)
 ├── replicalab.agents.lab_manager_policy (check_feasibility, FeasibilityCheckResult)
 └── replicalab.utils.text (element_tokens, normalize_label)
```
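
One way to sanity-check a graph like this is to transcribe it into a dictionary and ask the standard library for a topological order, which fails loudly if a circular import has crept in. The sketch below is illustrative rather than part of the repository. The edges are copied from the graph above, and the two package-level entries (`replicalab.agents`, `replicalab.scenarios`) assume the packages simply re-export from the submodules named in the file tree. The `import_order` helper is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Module -> set of modules it imports, transcribed by hand from the graph.
# Package entries model the __init__.py re-exports (an assumption).
DEPENDENCIES = {
    "server.app": {
        "replicalab.config", "replicalab.models",
        "replicalab.scenarios", "replicalab.agents",
    },
    "replicalab.agents": {
        "replicalab.agents.scientist_policy",
        "replicalab.agents.lab_manager_policy",
    },
    "replicalab.agents.scientist_policy": {
        "replicalab.models", "replicalab.scenarios",
    },
    "replicalab.agents.lab_manager_policy": {
        "replicalab.models", "replicalab.scenarios",
        "replicalab.utils.validation",
    },
    "replicalab.scenarios": {"replicalab.scenarios.templates"},
    "replicalab.scenarios.templates": {
        "replicalab.config", "replicalab.models",
        "replicalab.scenarios.math_reasoning",
        "replicalab.scenarios.ml_benchmark",
        "replicalab.scenarios.finance_trading",
        "replicalab.utils.seed",
    },
    "replicalab.utils.validation": {
        "replicalab.models", "replicalab.scenarios.templates",
    },
    "replicalab.scoring": {
        "replicalab.models", "replicalab.scenarios",
        "replicalab.agents.lab_manager_policy", "replicalab.utils.text",
    },
}


def import_order(dependencies):
    """Return modules with dependencies first; raises CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for module in import_order(DEPENDENCIES):
        print(module)
```

This is a graph-level check only: it ignores the fact that importing a submodule also runs its package's `__init__.py`, so it can miss cycles that exist only through those package imports.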

## File Tree (implemented only)

```
replicalab/
 ├── __init__.py              (empty)
 ├── config.py                (shared constants)
 ├── models.py                (25 classes, all data contracts)
 ├── agents/
 │   ├── __init__.py          (re-exports from submodules)
 │   ├── scientist_policy.py  (AGT 01-04: prompt, formatter, parser, retry, baseline)
 │   └── lab_manager_policy.py (AGT 05-07: feasibility, suggest, compose)
 ├── scenarios/
 │   ├── __init__.py          (re-exports from templates)
 │   ├── templates.py         (NormalizedScenarioPack, generate_scenario, apply_difficulty)
 │   ├── math_reasoning.py    (2 cases: Cauchy-Schwarz, Jensen's inequality)
 │   ├── ml_benchmark.py      (2 cases: AG News TinyBERT, CIFAR-10 ResNet-18)
 │   └── finance_trading.py   (2 cases: SPY/QQQ mean-reversion, momentum futures)
 ├── scoring/
 │   ├── __init__.py          (exports score_rigor, score_feasibility, score_fidelity)
 │   ├── rigor.py             (JDG 01: structural quality + criteria coverage)
 │   ├── feasibility.py       (JDG 02: wraps FeasibilityCheckResult with partial credit)
 │   └── fidelity.py          (JDG 03: substitution-aware hidden spec alignment)
 └── utils/
     ├── seed.py              (deterministic RNG from SHA256)
     ├── text.py              (shared token matching: normalize_label, element_tokens)
     └── validation.py        (MOD 05: protocol validation, 5 checks)

server/
 └── app.py                   (FastAPI + WebSocket + _StubEnv)

frontend/
 ├── package.json             (React 19, Three.js, Framer Motion, Recharts, Tailwind)
 ├── src/
 │   ├── App.tsx              (router: /, /episode, /episode/:id)
 │   ├── types/index.ts       (TypeScript interfaces mirroring Python models)
 │   ├── lib/
 │   │   ├── api.ts           (REST + WebSocket client + mock data generators)
 │   │   ├── audio.ts         (audio utilities)
 │   │   └── utils.ts         (shared helpers)
 │   ├── components/          (15 React components)
 │   └── pages/               (DashboardPage, EpisodePage)
 └── vite.config.ts

tests/
 ├── test_config.py           (3 tests)
 ├── test_models.py           (15 tests)
 ├── test_scenarios.py        (8 tests)
 ├── test_validation.py       (13 tests)
 ├── test_scientist_policy.py (18 tests)
 ├── test_lab_manager_policy.py (13 tests)
 ├── test_reward.py           (18 tests, JDG 01-03 scoring)
 └── test_server.py           (5 tests, API endpoints)
```
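
The tree describes `utils/seed.py` as a deterministic RNG derived from SHA256. As a rough illustration of that idea, the sketch below hashes a seed together with a label and feeds the digest into `random.Random`. The real `seed_rng` signature is not documented here, so the name `derive_rng`, its parameters, and the `"label:seed"` key format are all hypothetical.

```python
import hashlib
import random


def derive_rng(seed: int, label: str = "") -> random.Random:
    """Hypothetical helper: build a reproducible RNG from a seed and a label.

    SHA256 is stable across processes and platforms, unlike Python's
    built-in hash() on strings, which is randomized per process.
    """
    key = f"{label}:{seed}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as a 64-bit integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


if __name__ == "__main__":
    first = derive_rng(42, "ml_benchmark")
    second = derive_rng(42, "ml_benchmark")
    # Same inputs give the same stream of values.
    print(first.random() == second.random())
```

Keying on a label as well as the seed would let each scenario family draw from its own reproducible stream. Whether the repository does this is not stated in this map.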

## Task Completion Status

| Area | Done | Remaining | Key gaps |
|------|------|-----------|----------|
| Models (MOD) | MOD 01-05, 09, 11-12 | MOD 06 | Semantic validators for impossible plans |
| Scenarios (SCN) | SCN 01-12 | SCN 13 | Booking/scheduling data model |
| Agents (AGT) | AGT 01-07, 11 | AGT 08-10 | LLM-backed scientist, model selection |
| Judge (JDG) | JDG 01-03 | JDG 04-08 | Reward composition, bonuses, penalties |
| Environment (ENV) | None | ENV 01-11 | Entire real environment |
| Server (API) | API 01-04, 06 (partial) | API 05, 07-10 | Replay, auth, rate limiting |
| Frontend (FND) | FND 01-10 | None | Complete |
| Training (TRN) | None | TRN 01-18 | Entire RL pipeline |