DoAtlas Testing Guide
This document describes the test layers, how to run them locally, and the production-grade gating strategy for the DoAtlas monorepo.
TL;DR
# Fast, hermetic: runs in CI on every push
bash scripts/test-gate.sh
# Same battery + coverage reports under coverage/
bash scripts/test-gate.sh --coverage
# Add the live-stack Playwright suite (requires the running stack +
# E2E_LIVE_ADMIN_USERNAME / E2E_LIVE_ADMIN_PASSWORD secrets)
E2E_LIVE=1 bash scripts/test-gate.sh
Layers
| Layer | Where | What it covers | Runtime |
|---|---|---|---|
| Node unit/integration | artifacts/api-server/src/**/__tests__/*.test.ts | Express routes, drug-store queries, agent supervisor, codex CLI helpers, latency-alert, HMAC research bridge, full register/login/role-gate flow | ~25 s |
| Memory unit + integration | artifacts/api-server/src/lib/__tests__/memory*.test.mjs | Long-term memory store contract | ~5 s |
| Python pytest | artifacts/research-engine/tests/*.py | Engine HMAC contract, drug network store, networks roundtrip, smoke pipelines | ~10 s |
| Playwright (mocked) | artifacts/doatlas-web/tests/e2e/ | UI flows against a mocked API | ~30 s |
| Playwright (live) | artifacts/doatlas-web/tests/e2e-live/ | End-to-end against the already-running stack; opt-in only. Coverage is currently partial: only auth-flow.spec.ts is fully implemented; tool-using-chat, target-discovery, interrupt-and-refresh, admin-panels, and logout are scaffolded as test.fixme and tracked under follow-up #144. | minutes |
Running individual layers
api-server (Node)
pnpm --filter @workspace/api-server run test # full suite
pnpm --filter @workspace/api-server run test:coverage # under c8
pnpm --filter @workspace/api-server run test:agent # focused
The auth-flow.test.ts, research-bridge.test.ts,
shadow-budget.integration.test.ts, and
shadow-sampling-load.test.ts (Task #190) suites spin up the real
Express app and/or open a per-test PostgreSQL schema (via the
options=-c search_path=... URL parameter). They require
DATABASE_URL to be set; on dev machines without a local Postgres they
skip themselves cleanly. In CI and the test gate they must not skip; see
"CI requirements" below.
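To illustrate the per-test schema mechanism mentioned above, here is a minimal TypeScript sketch that appends an options=-c search_path=... parameter to a DSN. The helper name withSearchPath, the example DSN, and the schema name are hypothetical and not taken from the api-server suites; the sketch also assumes the Postgres client in use honors the libpq-style options parameter.

```typescript
// Hypothetical helper: not part of the DoAtlas codebase.
// Returns a copy of `databaseUrl` whose `options` parameter pins
// search_path to `schema`, so a test only sees its own tables.
function withSearchPath(databaseUrl: string, schema: string): string {
  // Restrict schema names to a conservative identifier alphabet so the
  // value can be embedded without quoting.
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  // Append to an existing query string if there is one.
  const separator = databaseUrl.includes("?") ? "&" : "?";
  const options = encodeURIComponent(`-c search_path=${schema}`);
  return `${databaseUrl}${separator}options=${options}`;
}

// Placeholder DSN and schema name, for illustration only.
const exampleUrl = withSearchPath(
  "postgres://user:pw@localhost:5432/doatlas",
  "test_auth_1",
);
console.log(exampleUrl);
// postgres://user:pw@localhost:5432/doatlas?options=-c%20search_path%3Dtest_auth_1
```

The sketch does not handle a DSN that already carries an options parameter; a real helper would need to merge the two values.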
Auth model under test. The api-server is a stateless Bearer-token
service: /api/auth/login and /api/auth/register return a
{ access_token, token_type: "Bearer" } JSON body, and protected routes
read Authorization: Bearer <token>. There is no server-side session
table, so /api/auth/logout is intentionally a stateless 204: it
exists for client symmetry, and the client is responsible for dropping
the token. The auth-flow.test.ts contract reflects this design: it
asserts token issue → /api/auth/me round-trip with the bearer header
→ admin role-gate → stateless logout. If the auth model is ever changed
to cookie/session, both the routes and the tests must be updated
together (tracked under the auth follow-up).
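The Bearer contract described above can be sketched as two small pure helpers. This is an illustration only: the names LoginResponse, authHeader, and extractBearerToken are hypothetical, and the real api-server middleware may be structured quite differently.

```typescript
// Hypothetical helpers illustrating the Bearer contract; they are not
// taken from the api-server source.
interface LoginResponse {
  access_token: string;
  token_type: "Bearer";
}

// Build the header a client would send after login or register.
function authHeader(login: LoginResponse): Record<string, string> {
  return { Authorization: `${login.token_type} ${login.access_token}` };
}

// Extract the token from an Authorization header value, or return null
// when the header is absent or uses a different scheme.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  // The scheme name is matched case-insensitively.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

const exampleHeaders = authHeader({
  access_token: "example-token",
  token_type: "Bearer",
});
console.log(extractBearerToken(exampleHeaders.Authorization)); // example-token
```

Because no session state lives on the server, "logging out" in this model amounts to the client discarding the token it holds.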
research-engine (Python)
pnpm --filter @workspace/research-engine run test
pnpm --filter @workspace/research-engine run test:coverage
Coverage HTML lands in coverage/research-engine/.
doatlas-web (Playwright)
Mocked, hermetic suite (default):
pnpm --filter @workspace/doatlas-web exec playwright test
Live-stack suite (against the running api-server + research-engine +
Postgres in this Replit env, or any deployment behind
E2E_LIVE_BASE_URL):
export E2E_LIVE=1
export E2E_LIVE_ADMIN_USERNAME=... # pre-seeded admin
export E2E_LIVE_ADMIN_PASSWORD=...
pnpm --filter @workspace/doatlas-web exec playwright test \
--config=playwright.live.config.ts
The live-stack model is pinned to Anthropic Sonnet via the
replit-integration provider so latency / cost stay predictable.
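Before starting the live suite it can help to check that the credentials listed above are actually present. The following is a minimal sketch of such a pre-flight check; the variable names come from this guide, but the function missingLiveEnv is hypothetical, and treating only the exact value "1" as enabling the live suite is an assumption.

```typescript
// Hypothetical pre-flight check; this function does not exist in the repo.
type LiveEnv = Record<string, string | undefined>;

const REQUIRED_LIVE_VARS = [
  "E2E_LIVE_ADMIN_USERNAME",
  "E2E_LIVE_ADMIN_PASSWORD",
];

// Returns the names of required variables that are unset or empty.
// An empty array means either the live suite was not requested or
// everything it needs is present.
function missingLiveEnv(env: LiveEnv): string[] {
  if (env.E2E_LIVE !== "1") return []; // assumption: "1" is the enabling value
  return REQUIRED_LIVE_VARS.filter((name) => !env[name]);
}

console.log(missingLiveEnv({ E2E_LIVE: "1", E2E_LIVE_ADMIN_USERNAME: "admin" }));
```

In a Node script the function would typically be called with process.env, exiting early with a readable message when the returned list is non-empty.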
Workflow gates in the Replit environment
Two workflows enforce the test gate inside this Repl:
- agent-loop-tests: runs pnpm --filter @workspace/api-server test on every workflow restart. Flagged as a validation step.
- Provider Smoke Check: runs pnpm run smoke:providers against the live LLM adapters. Treated as informational because outcomes depend on third-party availability and credential coverage.
The dataset bootstrap workflow (Dataset Download) is wrapped by
scripts/dataset-download.sh, which short-circuits when the DuckDB
snapshot has already been extracted under
artifacts/research-engine/data/snapshots/SNAP:*.duckdb. Likewise the
research-engine artifact's dev command is wrapped by
scripts/start-research-engine.sh, which no-ops (and runs sleep infinity)
when a healthy instance is already serving on port 8011. This stops
the workflow manager from flapping between FAILED and RUNNING during double-starts.
CI requirements
scripts/test-gate.sh is the single entry point for the pre-merge
gate (invoked by the Test Gate workflow in .replit and by any
external CI runner that wraps pnpm run test:gate). It enforces two
non-obvious environmental contracts that future contributors must
preserve when refactoring the gate or porting it to a new runner:
- DATABASE_URL must be set. The gate hard-fails (exit 65) if DATABASE_URL is unset, because several api-server suites self-skip without a Postgres DSN: the auth flow, research-bridge HMAC contract, shadow-budget integration, and especially the shadow A/B sampling load test (shadow-sampling-load.test.ts, Task #190). Without this check they would silently disappear in CI, masking regressions in the shadow fire-and-forget path. Pre-merge runs must therefore provision a Postgres (any per-job ephemeral instance is fine; each test creates its own search_path schema).
- TEST_GATE=1 is exported by the gate. Tests that self-skip on missing infrastructure should treat TEST_GATE=1 (or the standard CI=1) as "no skipping allowed" and turn the skip into a hard failure. The shadow-sampling load test already does this. New Postgres-dependent suites should follow the same pattern so the gate cannot silently lose coverage.
In other words: if you are running the gate from a fresh runner, set
DATABASE_URL first. If you are adding a new test that needs
infrastructure, gate the skip on !process.env.TEST_GATE && !process.env.CI.
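The skip rule above can be expressed as a small decision function. This is a sketch under stated assumptions, not the code used by the shadow-sampling load test: the names GateEnv, InfraDecision, and decideInfraTest are hypothetical, and the environment is passed in as a plain object so the logic can be exercised without touching process.env.

```typescript
// Hypothetical helper showing the skip policy; names are illustrative.
type GateEnv = Record<string, string | undefined>;
type InfraDecision = "run" | "skip" | "fail";

function decideInfraTest(env: GateEnv): InfraDecision {
  // Infrastructure is available: run the suite normally.
  if (env.DATABASE_URL) return "run";
  // Skipping is allowed only outside the gate and outside CI.
  const skipAllowed = !env.TEST_GATE && !env.CI;
  return skipAllowed ? "skip" : "fail";
}

console.log(decideInfraTest({})); // skip
console.log(decideInfraTest({ TEST_GATE: "1" })); // fail
console.log(decideInfraTest({ DATABASE_URL: "postgres://localhost/test" })); // run
```

In a real suite the function would be called with process.env, mapping "skip" to the test runner's skip mechanism and "fail" to a thrown error so the gate reports the missing infrastructure.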
Coverage
Pass --coverage to scripts/test-gate.sh (or invoke the per-package
test:coverage scripts directly). Reports are written to:
- coverage/api-server/index.html (c8, HTML)
- coverage/research-engine/index.html (pytest-cov, HTML)

The coverage targets are deliberately not enforced as gates; they exist so PR authors can spot regressions in the modules they touched.
What is intentionally NOT in the gate
- Provider smoke check: surfaces 4xx/5xx from upstream model providers as a warning, never a hard failure (a transient network flake would otherwise break every CI run).
- E2E_LIVE Playwright: needs network egress, real LLM calls, and an admin login. Run on demand or in a dedicated nightly job.
Troubleshooting
- column "..." does not exist from the auth tests: your test fixture is out of sync with lib/db/src/schema/. Add the missing column to the CREATE TABLE blocks at the top of auth-flow.test.ts.
- Address already in use on port 8011: another python3 -m app.server is still running. The wrapper script will detect a healthy instance and skip; if it's a crashed instance holding the port, pkill -f "app.server" and restart.
- Playwright webServer timeout: the mocked suite spawns its own Vite preview on E2E_PORT (default 5179). Make sure nothing else is bound to it.