| # DoAtlas Testing Guide |
|
|
| This document describes the test layers, how to run them locally, and the |
| production-grade gating strategy for the DoAtlas monorepo. |
|
|
| ## TL;DR |
|
|
| ```bash |
| # Fast, hermetic: runs in CI on every push |
| bash scripts/test-gate.sh |
| |
| # Same battery + coverage reports under coverage/ |
| bash scripts/test-gate.sh --coverage |
| |
| # Add the live-stack Playwright suite (requires the running stack + |
| # E2E_LIVE_ADMIN_USERNAME / E2E_LIVE_ADMIN_PASSWORD secrets) |
| E2E_LIVE=1 bash scripts/test-gate.sh |
| ``` |
|
|
| ## Layers |
|
|
| | Layer | Where | What it covers | Runtime | |
| | --- | --- | --- | --- | |
| | Node unit/integration | `artifacts/api-server/src/**/__tests__/*.test.ts` | Express routes, drug-store queries, agent supervisor, codex CLI helpers, latency-alert, HMAC research bridge, full register/login/role-gate flow | ~25 s | |
| | Memory unit + integration | `artifacts/api-server/src/lib/__tests__/memory*.test.mjs` | Long-term memory store contract | ~5 s | |
| | Python pytest | `artifacts/research-engine/tests/*.py` | Engine HMAC contract, drug network store, networks roundtrip, smoke pipelines | ~10 s | |
| | Playwright (mocked) | `artifacts/doatlas-web/tests/e2e/` | UI flows against a mocked API | ~30 s | |
| | Playwright (live) | `artifacts/doatlas-web/tests/e2e-live/` | End-to-end against the **already-running** stack; opt-in only. **Coverage is currently partial:** only `auth-flow.spec.ts` is fully implemented; `tool-using-chat`, `target-discovery`, `interrupt-and-refresh`, `admin-panels`, and `logout` are scaffolded as `test.fixme` and tracked under follow-up #144. | minutes | |
|
|
| ## Running individual layers |
|
|
| ### api-server (Node) |
|
|
| ```bash |
| pnpm --filter @workspace/api-server run test # full suite |
| pnpm --filter @workspace/api-server run test:coverage # under c8 |
| pnpm --filter @workspace/api-server run test:agent # focused |
| ``` |
|
|
| The `auth-flow.test.ts`, `research-bridge.test.ts`, |
| `shadow-budget.integration.test.ts`, and |
| `shadow-sampling-load.test.ts` (Task #190) suites spin up the real |
| Express app and/or open a per-test PostgreSQL schema (via the |
| `options=-c search_path=...` URL parameter). They require |
| `DATABASE_URL` to be set; on dev machines without a local Postgres |
| they skip themselves cleanly. **In CI / the test gate they must |
| not skip**; see "CI requirements" below. |
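
A per-test schema DSN of the kind described above could be assembled roughly as follows. This is a minimal sketch, not the suites' actual helper: `withSearchPath` and its schema-name rule are hypothetical, and it assumes the Postgres driver in use honors the libpq-style `options` URL parameter.

```typescript
// Hypothetical helper: appends a libpq-style `options=-c search_path=...`
// parameter to a Postgres URL so each test can run in its own schema.
function withSearchPath(databaseUrl: string, schema: string): string {
  // Restrict schema names to plain identifiers so nothing needs quoting.
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  // Append to an existing query string if the URL already has one.
  const separator = databaseUrl.indexOf("?") === -1 ? "?" : "&";
  const options = encodeURIComponent(`-c search_path=${schema}`);
  return `${databaseUrl}${separator}options=${options}`;
}
```

A suite would typically call this with `process.env.DATABASE_URL` and a unique schema name, create the schema before the tests, and drop it afterwards.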
|
|
| **Auth model under test.** The api-server is a stateless Bearer-token |
| service: `/api/auth/login` and `/api/auth/register` return a |
| `{ access_token, token_type: "Bearer" }` JSON body, and protected routes |
| read `Authorization: Bearer <token>`. There is no server-side session |
| table, so `/api/auth/logout` is intentionally a stateless `204`: it |
| exists for client symmetry, and the client is responsible for dropping |
| the token. The `auth-flow.test.ts` contract reflects this design: it |
| asserts token issue → `/api/auth/me` round-trip with the bearer header |
| → admin role-gate → stateless logout. If the auth model is ever changed |
| to cookie/session, both the routes and the tests must be updated |
| together (tracked under the auth follow-up). |
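
The token contract above can be sketched from both ends of the wire. This is illustrative only: the `LoginResponse` type name and both helper functions are hypothetical. Only the `{ access_token, token_type: "Bearer" }` body shape and the `Authorization: Bearer <token>` header come from the description above.

```typescript
// Shape of the login/register response body described above.
interface LoginResponse {
  access_token: string;
  token_type: "Bearer";
}

// Hypothetical client helper: builds the header for protected routes.
function authHeader(login: LoginResponse): { Authorization: string } {
  return { Authorization: `${login.token_type} ${login.access_token}` };
}

// Hypothetical server-side counterpart: extracts the token, or returns
// null when the header is missing or does not use the Bearer scheme.
function parseBearer(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer ([^\s]+)$/.exec(header);
  return match ? match[1] : null;
}
```

Because logout is stateless, "logging out" on the client amounts to discarding the stored `LoginResponse`, so no helper is needed for it.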
|
|
| ### research-engine (Python) |
|
|
| ```bash |
| pnpm --filter @workspace/research-engine run test |
| pnpm --filter @workspace/research-engine run test:coverage |
| ``` |
|
|
| Coverage HTML lands in `coverage/research-engine/`. |
|
|
| ### doatlas-web (Playwright) |
|
|
| Mocked, hermetic suite (default): |
|
|
| ```bash |
| pnpm --filter @workspace/doatlas-web exec playwright test |
| ``` |
|
|
| Live-stack suite (against the running api-server + research-engine + |
| Postgres in this Replit env, or any deployment behind |
| `E2E_LIVE_BASE_URL`): |
|
|
| ```bash |
| export E2E_LIVE=1 |
| export E2E_LIVE_ADMIN_USERNAME=... # pre-seeded admin |
| export E2E_LIVE_ADMIN_PASSWORD=... |
| pnpm --filter @workspace/doatlas-web exec playwright test \ |
| --config=playwright.live.config.ts |
| ``` |
|
|
| The live-stack model is pinned to Anthropic Sonnet via the |
| `replit-integration` provider so latency / cost stay predictable. |
|
|
| ## Workflow gates in the Replit environment |
|
|
| Two workflows enforce the test gate inside this Repl: |
|
|
| - **`agent-loop-tests`**: runs `pnpm --filter @workspace/api-server test` |
| on every workflow restart. Flagged as a validation step. |
| - **`Provider Smoke Check`**: runs `pnpm run smoke:providers` against |
| the live LLM adapters. Treated as informational because outcomes |
| depend on third-party availability and credential coverage. |
|
|
| The dataset bootstrap workflow (`Dataset Download`) is wrapped by |
| `scripts/dataset-download.sh`, which short-circuits when the DuckDB |
| snapshot has already been extracted under |
| `artifacts/research-engine/data/snapshots/SNAP:*.duckdb`. Likewise the |
| research-engine artifact's dev command is wrapped by |
| `scripts/start-research-engine.sh`, which no-ops (and runs `sleep infinity`) |
| when a healthy instance is already serving on port 8011. This stops |
| the workflow manager from flapping between FAILED and RUNNING during double-starts. |
|
|
| ## CI requirements |
|
|
| `scripts/test-gate.sh` is the single entry point for the pre-merge |
| gate (invoked by the `Test Gate` workflow in `.replit` and by any |
| external CI runner that wraps `pnpm run test:gate`). It enforces two |
| non-obvious environmental contracts that future contributors must |
| preserve when refactoring the gate or porting it to a new runner: |
|
|
| - **`DATABASE_URL` must be set.** The gate hard-fails (exit 65) if |
| `DATABASE_URL` is unset, because several api-server suites self-skip |
| without a Postgres DSN: the auth flow, the research-bridge HMAC |
| contract, the shadow-budget integration, and especially the **shadow |
| A/B sampling load test** (`shadow-sampling-load.test.ts`, Task #190). |
| Without this check they would silently disappear in CI, masking |
| regressions in the shadow fire-and-forget path. |
| Pre-merge runs must therefore provision a Postgres (any per-job |
| ephemeral instance is fine; each test creates its own |
| `search_path` schema). |
|
|
| - **`TEST_GATE=1` is exported by the gate.** Tests that self-skip on |
| missing infrastructure should treat `TEST_GATE=1` (or the standard |
| `CI=1`) as "no skipping allowed" and turn the skip into a hard |
| failure. The shadow-sampling load test already does this. New |
| Postgres-dependent suites should follow the same pattern so the |
| gate cannot silently lose coverage. |
| |
| In other words: if you are running the gate from a fresh runner, set |
| `DATABASE_URL` first. If you are adding a new test that needs |
| infrastructure, gate the skip on `!process.env.TEST_GATE && !process.env.CI`. |
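
The skip rule above can be factored into a small helper. This is a sketch under assumptions: `infraAction` and its return values are hypothetical names, not the shadow-sampling test's actual code. Only the `TEST_GATE`, `CI`, and `DATABASE_URL` variables come from this guide.

```typescript
type Env = { [name: string]: string | undefined };
type InfraAction = "run" | "skip" | "fail";

// Hypothetical helper: decides what a suite should do given the
// environment variables it requires. Missing infrastructure is a skip
// on dev machines but a hard failure when TEST_GATE or CI is set.
function infraAction(env: Env, required: string[]): InfraAction {
  const missing = required.filter((name) => !env[name]);
  if (missing.length === 0) return "run";
  const skipAllowed = !env["TEST_GATE"] && !env["CI"];
  return skipAllowed ? "skip" : "fail";
}
```

A suite might call `infraAction(process.env, ["DATABASE_URL"])` at the top of the file and throw on `"fail"` instead of registering a skipped test, so the gate reports the missing infrastructure rather than losing coverage.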
| |
| ## Coverage |
| |
| Pass `--coverage` to `scripts/test-gate.sh` (or invoke the per-package |
| `test:coverage` scripts directly). Reports are written to: |
| |
| - `coverage/api-server/index.html` (c8, HTML) |
| - `coverage/research-engine/index.html` (pytest-cov, HTML) |
| |
| The coverage targets are deliberately not enforced as gates; they |
| exist so PR authors can spot regressions in the modules they touched. |
| |
| ## What is intentionally NOT in the gate |
| |
| - **Provider smoke check**: surfaces 4xx/5xx from upstream model |
| providers as a warning, never a hard failure (otherwise a transient |
| network flake would break every CI run). |
| - **`E2E_LIVE` Playwright**: needs network egress, real LLM calls, |
| and an admin login. Run on demand or in a dedicated nightly job. |
| |
| ## Troubleshooting |
| |
| - *`column "..." does not exist` from the auth tests*: your test |
| fixture is out of sync with `lib/db/src/schema/`. Add the missing |
| column to the `CREATE TABLE` blocks at the top of |
| `auth-flow.test.ts`. |
| - *`Address already in use` on port 8011*: another |
| `python3 -m app.server` is still running. The wrapper script will |
| detect a healthy instance and skip; if a *crashed* instance is holding |
| the port, run `pkill -f "app.server"` and restart. |
| - *Playwright `webServer` timeout*: the mocked suite spawns its own |
| Vite preview on `E2E_PORT` (default 5179). Make sure nothing else is |
| bound to it. |
| |