DoAtlas Testing Guide

This document describes the test layers, how to run them locally, and the production-grade gating strategy for the DoAtlas monorepo.

TL;DR

# Fast, hermetic — runs in CI on every push
bash scripts/test-gate.sh

# Same battery + coverage reports under coverage/
bash scripts/test-gate.sh --coverage

# Add the live-stack Playwright suite (requires the running stack +
# E2E_LIVE_ADMIN_USERNAME / E2E_LIVE_ADMIN_PASSWORD secrets)
E2E_LIVE=1 bash scripts/test-gate.sh

Layers

Layer	Where	What it covers	Runtime
Node unit/integration	`artifacts/api-server/src/**/__tests__/*.test.ts`	Express routes, drug-store queries, agent supervisor, codex CLI helpers, latency-alert, HMAC research bridge, full register/login/role-gate flow	~25 s
Memory unit + integration	`artifacts/api-server/src/lib/__tests__/memory*.test.mjs`	Long-term memory store contract	~5 s
Python pytest	`artifacts/research-engine/tests/*.py`	Engine HMAC contract, drug network store, networks roundtrip, smoke pipelines	~10 s
Playwright (mocked)	`artifacts/doatlas-web/tests/e2e/`	UI flows against a mocked API	~30 s
Playwright (live)	`artifacts/doatlas-web/tests/e2e-live/`	End-to-end against the already-running stack — opt-in only. Coverage is currently partial: only `auth-flow.spec.ts` is fully implemented; `tool-using-chat`, `target-discovery`, `interrupt-and-refresh`, `admin-panels`, and `logout` are scaffolded as `test.fixme` and tracked under follow-up #144.	minutes

Running individual layers

api-server (Node)

pnpm --filter @workspace/api-server run test            # full suite
pnpm --filter @workspace/api-server run test:coverage   # under c8
pnpm --filter @workspace/api-server run test:agent      # focused

The auth-flow.test.ts, research-bridge.test.ts, shadow-budget.integration.test.ts, and shadow-sampling-load.test.ts (Task #190) suites spin up the real Express app and/or open a per-test PostgreSQL schema (via the options=-c search_path=... URL parameter). They require DATABASE_URL to be set. On dev machines without a local Postgres, where DATABASE_URL is unset, they skip themselves cleanly. In CI / the test gate they must not skip; see "CI requirements" below.
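
As a rough illustration of the per-test schema mechanism described above, the following sketch derives a DSN that pins search_path through the options URL parameter. The helper name withSearchPath, the schema name, and the example base URL are all hypothetical and not taken from the repository. Whether a given Postgres client honors options in the URL depends on the driver.

```typescript
// Hypothetical helper (not from the repo): derive a per-test DSN from a base URL.
function withSearchPath(baseUrl: string, schema: string): string {
  // Accept only simple identifiers so the schema name never needs quoting.
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  // The "options" parameter carries "-c search_path=<schema>"; the space and
  // "=" are percent-encoded so the value survives URL parsing.
  const options = encodeURIComponent(`-c search_path=${schema}`);
  const separator = baseUrl.includes("?") ? "&" : "?";
  return `${baseUrl}${separator}options=${options}`;
}

// Example values are placeholders, assumed for illustration only.
const dsn = withSearchPath(
  "postgres://user:pw@localhost:5432/doatlas",
  "test_auth_flow_1",
);
console.log(dsn);
// → postgres://user:pw@localhost:5432/doatlas?options=-c%20search_path%3Dtest_auth_flow_1
```

Giving each test its own schema name is what lets one ephemeral Postgres instance serve many suites without their tables colliding.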

Auth model under test. The api-server is a stateless Bearer-token service: /api/auth/login and /api/auth/register return a { access_token, token_type: "Bearer" } JSON body, and protected routes read Authorization: Bearer <token>. There is no server-side session table, so /api/auth/logout is intentionally a stateless 204 — it exists for client symmetry, and the client is responsible for dropping the token. The auth-flow.test.ts contract reflects this design: it asserts token issue → /api/auth/me round-trip with the bearer header → admin role-gate → stateless logout. If the auth model is ever changed to cookie/session, both the routes and the tests must be updated together (tracked under the auth follow-up).
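
The Bearer-token contract above can be sketched with two small helpers. This is a minimal illustration, not the api-server's actual code: the names authHeaders and extractBearerToken are hypothetical, and only the response shape and header format come from the description above.

```typescript
// Response shape returned by /api/auth/login and /api/auth/register, per the guide.
interface TokenResponse {
  access_token: string;
  token_type: "Bearer";
}

// Hypothetical client-side helper: build the header sent to protected routes.
function authHeaders(res: TokenResponse): { Authorization: string } {
  return { Authorization: `${res.token_type} ${res.access_token}` };
}

// Hypothetical server-side helper: return the token, or null when the header
// is missing or is not a well-formed Bearer credential.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Round trip with a placeholder token value.
const headers = authHeaders({ access_token: "abc.def.ghi", token_type: "Bearer" });
const token = extractBearerToken(headers.Authorization);
console.log(token);
```

Because no session state lives on the server, "logging out" in this model amounts to the client discarding its stored token after calling /api/auth/logout.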

research-engine (Python)

pnpm --filter @workspace/research-engine run test
pnpm --filter @workspace/research-engine run test:coverage

Coverage HTML lands in coverage/research-engine/.

doatlas-web (Playwright)

Mocked, hermetic suite (default):

pnpm --filter @workspace/doatlas-web exec playwright test

Live-stack suite (against the running api-server + research-engine + Postgres in this Replit env, or any deployment behind E2E_LIVE_BASE_URL):

export E2E_LIVE=1
export E2E_LIVE_ADMIN_USERNAME=...   # pre-seeded admin
export E2E_LIVE_ADMIN_PASSWORD=...
pnpm --filter @workspace/doatlas-web exec playwright test \
  --config=playwright.live.config.ts

The live-stack model is pinned to Anthropic Sonnet via the replit-integration provider so latency / cost stay predictable.
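
Before launching the live suite it can help to confirm that the environment variables exported above are actually present. The sketch below is one way to do that. The function missingLiveVars is hypothetical, and only the variable names come from this guide.

```typescript
type Env = Record<string, string | undefined>;

// Variable names are from this guide; the check itself is an assumed helper.
const REQUIRED_LIVE_VARS = ["E2E_LIVE_ADMIN_USERNAME", "E2E_LIVE_ADMIN_PASSWORD"];

// Returns the names of required variables that are unset or empty.
// When the live suite is not requested (E2E_LIVE !== "1"), nothing is required.
function missingLiveVars(env: Env): string[] {
  if (env["E2E_LIVE"] !== "1") return [];
  return REQUIRED_LIVE_VARS.filter((name) => !env[name]);
}

// Example with a placeholder environment object.
const missing = missingLiveVars({ E2E_LIVE: "1", E2E_LIVE_ADMIN_USERNAME: "admin" });
console.log(missing);
```

In a real setup the function would be called with process.env, and a non-empty result would abort the run with a message naming the missing secrets.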

Workflow gates in the Replit environment

Two workflows enforce the test gate inside this Repl:

agent-loop-tests — runs pnpm --filter @workspace/api-server test on every workflow restart. Flagged as a validation step.
Provider Smoke Check — runs pnpm run smoke:providers against the live LLM adapters. Treated as informational because outcomes depend on third-party availability and credential coverage.

The dataset bootstrap workflow (Dataset Download) is wrapped by scripts/dataset-download.sh, which short-circuits when the DuckDB snapshot has already been extracted under artifacts/research-engine/data/snapshots/SNAP:*.duckdb. Likewise, the research-engine artifact's dev command is wrapped by scripts/start-research-engine.sh, which skips the start (and just runs sleep infinity) when a healthy instance is already serving on port 8011. This stops the workflow manager from flapping FAILED → RUNNING during double-starts.

CI requirements

scripts/test-gate.sh is the single entry point for the pre-merge gate (invoked by the Test Gate workflow in .replit and by any external CI runner that wraps pnpm run test:gate). It enforces two non-obvious environmental contracts that future contributors must preserve when refactoring the gate or porting it to a new runner:

DATABASE_URL must be set. The gate hard-fails (exit 65) if DATABASE_URL is unset, because several api-server suites self-skip without a Postgres DSN: the auth flow, the research-bridge HMAC contract, the shadow-budget integration, and especially the shadow A/B sampling load test (shadow-sampling-load.test.ts, Task #190). Without this check they would silently disappear in CI, masking regressions in the shadow fire-and-forget path. Pre-merge runs must therefore provision a Postgres instance (any per-job ephemeral instance is fine; each test creates its own search_path schema).
TEST_GATE=1 is exported by the gate. Tests that self-skip on missing infrastructure should treat TEST_GATE=1 (or the standard CI=1) as "no skipping allowed" and turn the skip into a hard failure. The shadow-sampling load test already does this. New Postgres-dependent suites should follow the same pattern so the gate cannot silently lose coverage.

In other words: if you are running the gate from a fresh runner, set DATABASE_URL first. If you are adding a new test that needs infrastructure, gate the skip on !process.env.TEST_GATE && !process.env.CI.
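
The skip rule above can be expressed as a small decision function. This is a sketch under stated assumptions: decideInfraTest and the InfraDecision type are hypothetical names, and the environment is passed in as a plain object so the logic is easy to exercise. Only DATABASE_URL, TEST_GATE, and CI come from this guide.

```typescript
type Env = Record<string, string | undefined>;
type InfraDecision = "run" | "skip" | "fail";

// Hypothetical helper: decide what a Postgres-dependent suite should do.
function decideInfraTest(env: Env): InfraDecision {
  // Infrastructure is available: always run.
  if (env["DATABASE_URL"]) return "run";
  // Mirrors the documented condition !process.env.TEST_GATE && !process.env.CI.
  // Any non-empty value counts as set, exactly as with the raw env check.
  const skippingAllowed = !env["TEST_GATE"] && !env["CI"];
  // Outside the gate a missing DSN is a clean skip; inside it is a hard failure.
  return skippingAllowed ? "skip" : "fail";
}

console.log(decideInfraTest({}));                 // → skip
console.log(decideInfraTest({ TEST_GATE: "1" })); // → fail
```

A suite would typically call this once with process.env at the top of the file, throw on "fail", and register its tests as skipped on "skip", so that a missing DSN can never pass silently under the gate.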

Coverage

Pass --coverage to scripts/test-gate.sh (or invoke the per-package test:coverage scripts directly). Reports are written to:

coverage/api-server/index.html (c8, HTML)
coverage/research-engine/index.html (pytest-cov, HTML)

The coverage targets are deliberately not enforced as gates — they exist so PR authors can spot regressions in the modules they touched.

What is intentionally NOT in the gate

Provider smoke check — surfaces 4xx/5xx responses from upstream model providers as a warning, never a hard failure (otherwise a transient network flake or provider outage would break unrelated CI runs).
E2E_LIVE Playwright — needs network egress, real LLM calls, and an admin login. Run on demand or in a dedicated nightly job.

Troubleshooting

column "..." does not exist from the auth tests — your test fixture is out of sync with lib/db/src/schema/. Add the missing column to the CREATE TABLE blocks at the top of auth-flow.test.ts.
Address already in use on port 8011 — another python3 -m app.server is still running. The wrapper script will detect a healthy instance and skip; if it's a crashed instance holding the port, pkill -f "app.server" and restart.
Playwright webServer timeout — the mocked suite spawns its own Vite preview on E2E_PORT (default 5179). Make sure nothing else is bound to it.