ReplicaLab Project Map

Living reference of every module, class, function, and relationship. Updated after each implementation session.

Last updated: 2026-03-07 (JDG 01-03 scoring implemented)

Module Index

| File | What it covers |
| --- | --- |
| models.md | Data contracts: actions, observations, protocol, reward, episode state |
| scenarios.md | Scenario generation: templates, constraints, resources, hidden specs |
| agents.md | Agent policies: scientist prompt/parse/retry, lab manager feasibility/suggest/compose |
| validation.md | Protocol validation: deterministic checks against scenario constraints |
| scoring.md | Judge scoring: rigor, feasibility, fidelity |
| server.md | FastAPI server: REST + WebSocket endpoints, stub environment |
| frontend.md | React UI: dashboard, episode viewer, components |
| config.md | Shared constants: rounds, budget, timeouts |
| tests.md | Test coverage: 87 tests across 6 files |

Dependency Graph

server/app.py
 ├── replicalab.config
 ├── replicalab.models
 ├── replicalab.scenarios (generate_scenario, available_scenario_families)
 └── replicalab.agents (check_feasibility, suggest_alternative, compose_lab_manager_response)

replicalab/agents/scientist_policy.py
 ├── replicalab.models (ScientistAction, ScientistObservation, Protocol, ConversationEntry)
 └── replicalab.scenarios (NormalizedScenarioPack)

replicalab/agents/lab_manager_policy.py
 ├── replicalab.models (LabManagerAction, LabManagerActionType, Protocol)
 ├── replicalab.scenarios (NormalizedScenarioPack)
 └── replicalab.utils.validation (ValidationResult, validate_protocol)

replicalab/scenarios/templates.py
 ├── replicalab.config (MAX_BUDGET, MAX_ROUNDS)
 ├── replicalab.models (ScientistObservation, LabManagerObservation)
 ├── replicalab.scenarios.{math_reasoning, ml_benchmark, finance_trading}
 └── replicalab.utils.seed (seed_rng)

replicalab/utils/validation.py
 ├── replicalab.models (Protocol)
 └── replicalab.scenarios.templates (NormalizedScenarioPack)

replicalab/scoring/
 ├── replicalab.models (Protocol, RewardBreakdown)
 ├── replicalab.scenarios (NormalizedScenarioPack, HiddenReferenceSpec)
 ├── replicalab.agents.lab_manager_policy (check_feasibility, FeasibilityCheckResult)
 └── replicalab.utils.text (element_tokens, normalize_label)
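
The graph above can be checked mechanically. The following is a minimal sketch, not code from the repository: it encodes the edges at package level (collapsing `replicalab.agents.*` and `replicalab.scenarios.*` into their packages, which is a simplifying assumption) and uses the standard-library `graphlib.TopologicalSorter` to derive an order in which every module comes after its dependencies. The names `DEPENDENCIES` and `import_order` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Package-level edges transcribed from the dependency graph.
# Each key depends on every module in its value set.
# Assumption: submodules are collapsed into their parent package.
DEPENDENCIES = {
    "server.app": {
        "replicalab.config",
        "replicalab.models",
        "replicalab.scenarios",
        "replicalab.agents",
    },
    "replicalab.agents": {
        "replicalab.models",
        "replicalab.scenarios",
        "replicalab.utils.validation",
    },
    "replicalab.scenarios": {
        "replicalab.config",
        "replicalab.models",
        "replicalab.utils.seed",
    },
    "replicalab.utils.validation": {
        "replicalab.models",
        "replicalab.scenarios",
    },
    "replicalab.scoring": {
        "replicalab.models",
        "replicalab.scenarios",
        "replicalab.agents",
        "replicalab.utils.text",
    },
}


def import_order(deps: dict[str, set[str]]) -> list[str]:
    """Return modules ordered so that dependencies come first.

    Raises graphlib.CycleError if the graph contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    try:
        for name in import_order(DEPENDENCIES):
            print(name)
    except CycleError as exc:
        print("circular dependency:", exc.args[1])
```

Re-running a check like this after adding an edge (for example, a new import from `replicalab.scoring`) would flag a circular import before it shows up at runtime.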

File Tree (implemented only)

replicalab/
 ├── __init__.py              (empty)
 ├── config.py                (shared constants)
 ├── models.py                (25 classes: all data contracts)
 ├── agents/
 │   ├── __init__.py          (re-exports from submodules)
 │   ├── scientist_policy.py  (AGT 01-04: prompt, formatter, parser, retry, baseline)
 │   └── lab_manager_policy.py (AGT 05-07: feasibility, suggest, compose)
 ├── scenarios/
 │   ├── __init__.py          (re-exports from templates)
 │   ├── templates.py         (NormalizedScenarioPack, generate_scenario, apply_difficulty)
 │   ├── math_reasoning.py    (2 cases: Cauchy-Schwarz, Jensen's inequality)
 │   ├── ml_benchmark.py      (2 cases: AG News TinyBERT, CIFAR-10 ResNet-18)
 │   └── finance_trading.py   (2 cases: SPY/QQQ mean-reversion, momentum futures)
 ├── scoring/
 │   ├── __init__.py          (exports score_rigor, score_feasibility, score_fidelity)
 │   ├── rigor.py             (JDG 01: structural quality + criteria coverage)
 │   ├── feasibility.py       (JDG 02: wraps FeasibilityCheckResult with partial credit)
 │   └── fidelity.py          (JDG 03: substitution-aware hidden spec alignment)
 └── utils/
     ├── seed.py              (deterministic RNG from SHA256)
     ├── text.py              (shared token matching: normalize_label, element_tokens)
     └── validation.py        (MOD 05: protocol validation, 5 checks)

server/
 └── app.py                   (FastAPI + WebSocket + _StubEnv)

frontend/
 ├── package.json             (React 19, Three.js, Framer Motion, Recharts, Tailwind)
 ├── src/
 │   ├── App.tsx              (router: /, /episode, /episode/:id)
 │   ├── types/index.ts       (TypeScript interfaces mirroring Python models)
 │   ├── lib/
 │   │   ├── api.ts           (REST + WebSocket client + mock data generators)
 │   │   ├── audio.ts         (audio utilities)
 │   │   └── utils.ts         (shared helpers)
 │   ├── components/          (15 React components)
 │   └── pages/               (DashboardPage, EpisodePage)
 └── vite.config.ts

tests/
 ├── test_config.py           (3 tests)
 ├── test_models.py           (15 tests)
 ├── test_scenarios.py        (8 tests)
 ├── test_validation.py       (13 tests)
 ├── test_scientist_policy.py (18 tests)
 ├── test_lab_manager_policy.py (13 tests)
 ├── test_reward.py           (18 tests: JDG 01-03 scoring)
 └── test_server.py           (5 tests: API endpoints)
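
The tree lists `utils/seed.py` as a "deterministic RNG from SHA256". One way such a helper could work is sketched below. This is an illustration under stated assumptions, not the repository's implementation: the `seed_rng` name comes from the dependency graph, but its signature, the `namespace` parameter, and the choice to take the first 8 bytes of the digest are all hypothetical.

```python
import hashlib
import random


def seed_rng(seed: int, namespace: str = "") -> random.Random:
    """Build a reproducible RNG from a seed and an optional namespace.

    Hypothetical signature: the real seed_rng may take different arguments.
    """
    # Hash the inputs so that nearby seeds yield unrelated RNG states.
    digest = hashlib.sha256(f"{namespace}:{seed}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer seed (an assumption).
    return random.Random(int.from_bytes(digest[:8], "big"))


if __name__ == "__main__":
    rng_a = seed_rng(7, "math_reasoning")
    rng_b = seed_rng(7, "math_reasoning")
    # Same inputs give the same stream of draws.
    print(rng_a.random() == rng_b.random())
```

Returning a dedicated `random.Random` instance, rather than calling `random.seed`, keeps scenario generation from disturbing any other code that relies on the global RNG.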

Task Completion Status

| Area | Done | Remaining | Key gaps |
| --- | --- | --- | --- |
| Models (MOD) | MOD 01-05, 09, 11-12 | MOD 06 | Semantic validators for impossible plans |
| Scenarios (SCN) | SCN 01-12 | SCN 13 | Booking/scheduling data model |
| Agents (AGT) | AGT 01-07, 11 | AGT 08-10 | LLM-backed scientist, model selection |
| Judge (JDG) | JDG 01-03 | JDG 04-08 | Reward composition, bonuses, penalties |
| Environment (ENV) | none | ENV 01-11 | Entire real environment |
| Server (API) | API 01-04, 06 (partial) | API 05, 07-10 | Replay, auth, rate limiting |
| Frontend (FND) | FND 01-10 | none | Complete |
| Training (TRN) | none | TRN 01-18 | Entire RL pipeline |