	# DoAtlas Testing Guide

	This document describes the test layers, how to run them locally, and the
	production-grade gating strategy for the DoAtlas monorepo.

	## TL;DR

	```bash
	# Fast, hermetic — runs in CI on every push
	bash scripts/test-gate.sh

	# Same battery + coverage reports under coverage/
	bash scripts/test-gate.sh --coverage

	# Add the live-stack Playwright suite (requires the running stack +
	# E2E_LIVE_ADMIN_USERNAME / E2E_LIVE_ADMIN_PASSWORD secrets)
	E2E_LIVE=1 bash scripts/test-gate.sh
	```

	## Layers

	| Layer | Where | What it covers | Runtime |
	| --- | --- | --- | --- |
	| Node unit/integration | `artifacts/api-server/src/**/__tests__/*.test.ts` | Express routes, drug-store queries, agent supervisor, codex CLI helpers, latency-alert, HMAC research bridge, full register/login/role-gate flow | ~25 s |
	| Memory unit + integration | `artifacts/api-server/src/lib/__tests__/memory*.test.mjs` | Long-term memory store contract | ~5 s |
	| Python pytest | `artifacts/research-engine/tests/*.py` | Engine HMAC contract, drug network store, networks roundtrip, smoke pipelines | ~10 s |
	| Playwright (mocked) | `artifacts/doatlas-web/tests/e2e/` | UI flows against a mocked API | ~30 s |
	| Playwright (live) | `artifacts/doatlas-web/tests/e2e-live/` | End-to-end against the already-running stack; opt-in only. Coverage is currently partial: only `auth-flow.spec.ts` is fully implemented; `tool-using-chat`, `target-discovery`, `interrupt-and-refresh`, `admin-panels`, and `logout` are scaffolded as `test.fixme` and tracked under follow-up #144. | minutes |

	## Running individual layers

	### api-server (Node)

	```bash
	pnpm --filter @workspace/api-server run test # full suite
	pnpm --filter @workspace/api-server run test:coverage # under c8
	pnpm --filter @workspace/api-server run test:agent # focused
	```

	The `auth-flow.test.ts`, `research-bridge.test.ts`,
	`shadow-budget.integration.test.ts`, and
	`shadow-sampling-load.test.ts` (Task #190) suites spin up the real
	Express app and/or open a per-test PostgreSQL schema (via the
	`options=-c search_path=...` URL parameter). They require
	`DATABASE_URL` to be set. On dev machines without a local Postgres,
	where it is unset, they skip themselves cleanly. **In CI / the test
	gate they must not skip**; see "CI requirements" below.
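
As an illustration of the per-test schema mechanism, the sketch below appends a `search_path` option to a Postgres connection string. The helper name `withSearchPath` and the schema-name rule are assumptions made for this example; the setup code the suites actually use is not shown in this guide.

```typescript
// Hypothetical helper: returns a copy of `databaseUrl` whose `options`
// parameter pins the session to a single schema.
export function withSearchPath(databaseUrl: string, schema: string): string {
  // Restrict schema names to plain identifiers so nothing unexpected
  // ends up in the connection options (an assumption of this sketch).
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  // Append to an existing query string if there is one.
  const separator = databaseUrl.includes("?") ? "&" : "?";
  // Percent-encode the option value: the space becomes %20, "=" becomes %3D.
  const options = encodeURIComponent(`-c search_path=${schema}`);
  return `${databaseUrl}${separator}options=${options}`;
}
```

A suite could call this once per test with a unique schema name and drop the schema in teardown. The sketch does not handle a URL that already carries an `options` parameter.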

	**Auth model under test.** The api-server is a stateless Bearer-token
	service: `/api/auth/login` and `/api/auth/register` return a
	`{ access_token, token_type: "Bearer" }` JSON body, and protected routes
	read `Authorization: Bearer <token>`. There is no server-side session
	table, so `/api/auth/logout` is intentionally a stateless `204` — it
	exists for client symmetry, and the client is responsible for dropping
	the token. The `auth-flow.test.ts` contract reflects this design: it
	asserts token issue → `/api/auth/me` round-trip with the bearer header
	→ admin role-gate → stateless logout. If the auth model is ever changed
	to cookie/session, both the routes and the tests must be updated
	together (tracked under the auth follow-up).
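
The client-side half of this contract can be sketched as follows. This is a minimal illustration, not code from the repository: `isLoginBody` and `bearerHeaderFromLogin` are hypothetical names, and only the response shape and header format come from the description above.

```typescript
// Shape of the JSON body returned by /api/auth/login and /api/auth/register,
// as described in this guide.
export interface LoginBody {
  access_token: string;
  token_type: "Bearer";
}

// Runtime check, since a parsed JSON response is untyped.
export function isLoginBody(value: unknown): value is LoginBody {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const token = candidate["access_token"];
  return (
    typeof token === "string" &&
    token.length > 0 &&
    candidate["token_type"] === "Bearer"
  );
}

// Builds the header that protected routes read.
export function bearerHeaderFromLogin(body: unknown): { Authorization: string } {
  if (!isLoginBody(body)) {
    throw new Error("unexpected login response shape");
  }
  return { Authorization: `Bearer ${body.access_token}` };
}
```

Because logout is stateless, "logging out" on the client amounts to discarding the stored token and no longer sending this header.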

	### research-engine (Python)

	```bash
	pnpm --filter @workspace/research-engine run test
	pnpm --filter @workspace/research-engine run test:coverage
	```

	Coverage HTML lands in `coverage/research-engine/`.

	### doatlas-web (Playwright)

	Mocked, hermetic suite (default):

	```bash
	pnpm --filter @workspace/doatlas-web exec playwright test
	```

	Live-stack suite (against the running api-server + research-engine +
	Postgres in this Replit env, or any deployment behind
	`E2E_LIVE_BASE_URL`):

	```bash
	export E2E_LIVE=1
	export E2E_LIVE_ADMIN_USERNAME=... # pre-seeded admin
	export E2E_LIVE_ADMIN_PASSWORD=...
	pnpm --filter @workspace/doatlas-web exec playwright test \
	  --config=playwright.live.config.ts
	```

	The live-stack model is pinned to Anthropic Sonnet via the
	`replit-integration` provider so latency / cost stay predictable.

	## Workflow gates in the Replit environment

	Two workflows enforce the test gate inside this Repl:

	- `agent-loop-tests` — runs `pnpm --filter @workspace/api-server test`
	on every workflow restart. Flagged as a validation step.
	- `Provider Smoke Check` — runs `pnpm run smoke:providers` against
	the live LLM adapters. Treated as informational because outcomes
	depend on third-party availability and credential coverage.

	The dataset bootstrap workflow (`Dataset Download`) is wrapped by
	`scripts/dataset-download.sh`, which short-circuits when the DuckDB
	snapshot has already been extracted under
	`artifacts/research-engine/data/snapshots/SNAP:*.duckdb`. Likewise the
	research-engine artifact's dev command is wrapped by
	`scripts/start-research-engine.sh`, which no-ops (and `sleep infinity`s)
	when a healthy instance is already serving on port 8011 — this stops
	the workflow manager from flapping FAILED → RUNNING during double-starts.

	## CI requirements

	`scripts/test-gate.sh` is the single entry point for the pre-merge
	gate (invoked by the `Test Gate` workflow in `.replit` and by any
	external CI runner that wraps `pnpm run test:gate`). It enforces two
	non-obvious environmental contracts that future contributors must
	preserve when refactoring the gate or porting it to a new runner:

	- `DATABASE_URL` must be set. The gate hard-fails (exit 65) if
	`DATABASE_URL` is unset, because several api-server suites — the
	auth flow, research-bridge HMAC contract, shadow-budget
	integration, and especially the shadow A/B sampling load test
	(`shadow-sampling-load.test.ts`, Task #190) — self-skip without a
	Postgres DSN. Without this check they would silently drop out of
	CI runs, masking regressions in the shadow fire-and-forget path.
	Pre-merge runs must therefore provision a Postgres (any per-job
	ephemeral instance is fine; each test creates its own
	`search_path` schema).

	- `TEST_GATE=1` is exported by the gate. Tests that self-skip on
	missing infrastructure should treat `TEST_GATE=1` (or the standard
	`CI=1`) as "no skipping allowed" and turn the skip into a hard
	failure. The shadow-sampling load test already does this. New
	Postgres-dependent suites should follow the same pattern so the
	gate cannot silently lose coverage.

	In other words: if you are running the gate from a fresh runner, set
	`DATABASE_URL` first. If you are adding a new test that needs
	infrastructure, gate the skip on `!process.env.TEST_GATE && !process.env.CI`.
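
One way to express that rule is a small pure function that a suite consults before deciding to run, skip, or fail. This is a sketch under stated assumptions: `decideInfraTest` and the `InfraDecision` type are hypothetical names, and the shadow-sampling load test may implement the same rule differently.

```typescript
type Env = Record<string, string | undefined>;

export type InfraDecision = "run" | "skip" | "fail";

// Decides what a Postgres-dependent suite should do for a given environment.
// `requiredVar` defaults to DATABASE_URL, the variable the gate checks.
export function decideInfraTest(
  env: Env,
  requiredVar: string = "DATABASE_URL",
): InfraDecision {
  // Infrastructure is available: run normally.
  if (env[requiredVar]) return "run";
  // Skipping is only acceptable outside the gate and outside CI.
  const skippingAllowed = !env["TEST_GATE"] && !env["CI"];
  return skippingAllowed ? "skip" : "fail";
}
```

In a test file this would be called with `process.env`: a `"skip"` result maps to the runner's skip mechanism, and a `"fail"` result should throw so the gate reports the missing infrastructure instead of losing coverage.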

	## Coverage

	Pass `--coverage` to `scripts/test-gate.sh` (or invoke the per-package
	`test:coverage` scripts directly). Reports are written to:

	- `coverage/api-server/index.html` (c8, HTML)
	- `coverage/research-engine/index.html` (pytest-cov, HTML)

	The coverage targets are deliberately not enforced as gates — they
	exist so PR authors can spot regressions in the modules they touched.

	## What is intentionally NOT in the gate

	- Provider smoke check — surfaces 4xx/5xx from upstream model
	providers as a warning, never a hard failure (a transient network
	flake would otherwise break every CI run).
	- `E2E_LIVE` Playwright — needs network egress, real LLM calls,
	and an admin login. Run on demand or in a dedicated nightly job.

	## Troubleshooting

	- `column "..." does not exist` from the auth tests — your test
	fixture is out of sync with `lib/db/src/schema/`. Add the missing
	column to the `CREATE TABLE` blocks at the top of
	`auth-flow.test.ts`.
	- `Address already in use` on port 8011 — another `python3 -m
	app.server` is still running. The wrapper script will detect a
	healthy instance and skip; if a crashed instance is holding the
	port, run `pkill -f "app.server"` and restart.
	- Playwright `webServer` timeout — the mocked suite spawns its own
	Vite preview on `E2E_PORT` (default 5179). Make sure nothing else is
	bound to it.