# DoAtlas Testing Guide
This document describes the test layers, how to run them locally, and the
production-grade gating strategy for the DoAtlas monorepo.
## TL;DR
```bash
# Fast, hermetic; runs in CI on every push
bash scripts/test-gate.sh
# Same battery + coverage reports under coverage/
bash scripts/test-gate.sh --coverage
# Add the live-stack Playwright suite (requires the running stack +
# E2E_LIVE_ADMIN_USERNAME / E2E_LIVE_ADMIN_PASSWORD secrets)
E2E_LIVE=1 bash scripts/test-gate.sh
```
## Layers
| Layer | Where | What it covers | Runtime |
| --- | --- | --- | --- |
| Node unit/integration | `artifacts/api-server/src/**/__tests__/*.test.ts` | Express routes, drug-store queries, agent supervisor, codex CLI helpers, latency-alert, HMAC research bridge, full register/login/role-gate flow | ~25 s |
| Memory unit + integration | `artifacts/api-server/src/lib/__tests__/memory*.test.mjs` | Long-term memory store contract | ~5 s |
| Python pytest | `artifacts/research-engine/tests/*.py` | Engine HMAC contract, drug network store, networks roundtrip, smoke pipelines | ~10 s |
| Playwright (mocked) | `artifacts/doatlas-web/tests/e2e/` | UI flows against a mocked API | ~30 s |
| Playwright (live) | `artifacts/doatlas-web/tests/e2e-live/` | End-to-end against the **already-running** stack; opt-in only. **Coverage is currently partial:** only `auth-flow.spec.ts` is fully implemented; `tool-using-chat`, `target-discovery`, `interrupt-and-refresh`, `admin-panels`, and `logout` are scaffolded as `test.fixme` and tracked under follow-up #144. | minutes |
## Running individual layers
### api-server (Node)
```bash
pnpm --filter @workspace/api-server run test # full suite
pnpm --filter @workspace/api-server run test:coverage # under c8
pnpm --filter @workspace/api-server run test:agent # focused
```
The `auth-flow.test.ts`, `research-bridge.test.ts`,
`shadow-budget.integration.test.ts`, and
`shadow-sampling-load.test.ts` (Task #190) suites spin up the real
Express app and/or open a per-test PostgreSQL schema (via the
`options=-c search_path=...` URL parameter). They require
`DATABASE_URL` to be set; on dev machines without a local Postgres
they skip themselves cleanly. **In CI / the test gate they must
not skip** (see "CI requirements" below).
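
As a rough illustration of the per-test schema mechanism, the sketch below derives a test DSN from a base `DATABASE_URL` by setting the `options` URL parameter. The helper name `withSearchPath`, the example DSN, and the schema name are hypothetical and not taken from the suites; the real tests may build the URL differently.

```typescript
// Hypothetical helper, not taken from the DoAtlas suites: derive a per-test
// DSN by pinning `search_path` through the `options` URL parameter.
function withSearchPath(baseUrl: string, schema: string): string {
  // Restrict the schema to a plain identifier so it needs no quoting.
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`unsafe schema name: ${schema}`);
  }
  const url = new URL(baseUrl);
  const params = new URLSearchParams(url.search);
  params.set("options", `-c search_path=${schema}`);
  // URLSearchParams encodes spaces as "+"; emit "%20" instead so that
  // parsers which only percent-decode still see a space.
  url.search = params.toString().replace(/\+/g, "%20");
  return url.toString();
}

// Example with a made-up local DSN and schema name.
const exampleDsn = withSearchPath(
  "postgres://user:secret@localhost:5432/doatlas",
  "t_auth_flow_1",
);
console.log(exampleDsn);
```

Existing query parameters on the base URL are preserved, so a DSN that already carries, say, an SSL setting keeps it.
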
**Auth model under test.** The api-server is a stateless Bearer-token
service: `/api/auth/login` and `/api/auth/register` return a
`{ access_token, token_type: "Bearer" }` JSON body, and protected routes
read `Authorization: Bearer <token>`. There is no server-side session
table, so `/api/auth/logout` is intentionally a stateless `204`; it
exists for client symmetry, and the client is responsible for dropping
the token. The `auth-flow.test.ts` contract reflects this design: it
asserts token issue → `/api/auth/me` round-trip with the bearer header
→ admin role-gate → stateless logout. If the auth model is ever changed
to cookie/session, both the routes and the tests must be updated
together (tracked under the auth follow-up).
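
To make the stateless contract concrete, here is a minimal sketch of both sides of it. Only the response shape and the `Authorization: Bearer <token>` header come from the description above; `TokenHolder` and `parseBearer` are hypothetical names for illustration and are not the api-server's actual code.

```typescript
// Response shape described above for /api/auth/login and /api/auth/register.
interface TokenResponse {
  access_token: string;
  token_type: "Bearer";
}

// Hypothetical client-side holder: the server keeps no session, so the
// client alone decides when a token stops being sent.
class TokenHolder {
  private token: string | null = null;

  store(res: TokenResponse): void {
    this.token = res.access_token;
  }

  // Headers for a protected route such as /api/auth/me.
  authHeaders(): Record<string, string> {
    if (this.token === null) return {};
    return { Authorization: `Bearer ${this.token}` };
  }

  // Mirrors the stateless logout: there is nothing to revoke server-side,
  // so logging out amounts to forgetting the token.
  drop(): void {
    this.token = null;
  }
}

// Hypothetical server-side counterpart: pull the token out of a header value.
function parseBearer(header: string | undefined): string | null {
  const match = /^Bearer (\S+)$/.exec(header ?? "");
  return match?.[1] ?? null;
}
```

One consequence of this design is that a token dropped by the client is not invalidated on the server, which is why the logout assertion in the contract only checks for the `204`.
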
### research-engine (Python)
```bash
pnpm --filter @workspace/research-engine run test
pnpm --filter @workspace/research-engine run test:coverage
```
Coverage HTML lands in `coverage/research-engine/`.
### doatlas-web (Playwright)
Mocked, hermetic suite (default):
```bash
pnpm --filter @workspace/doatlas-web exec playwright test
```
Live-stack suite (against the running api-server + research-engine +
Postgres in this Replit env, or any deployment behind
`E2E_LIVE_BASE_URL`):
```bash
export E2E_LIVE=1
export E2E_LIVE_ADMIN_USERNAME=... # pre-seeded admin
export E2E_LIVE_ADMIN_PASSWORD=...
pnpm --filter @workspace/doatlas-web exec playwright test \
--config=playwright.live.config.ts
```
The live-stack model is pinned to Anthropic Sonnet via the
`replit-integration` provider so latency / cost stay predictable.
## Workflow gates in the Replit environment
Two workflows enforce the test gate inside this Repl:
- **`agent-loop-tests`**: runs `pnpm --filter @workspace/api-server test`
on every workflow restart. Flagged as a validation step.
- **`Provider Smoke Check`**: runs `pnpm run smoke:providers` against
the live LLM adapters. Treated as informational because outcomes
depend on third-party availability and credential coverage.
The dataset bootstrap workflow (`Dataset Download`) is wrapped by
`scripts/dataset-download.sh`, which short-circuits when the DuckDB
snapshot has already been extracted under
`artifacts/research-engine/data/snapshots/SNAP:*.duckdb`. Likewise the
research-engine artifact's dev command is wrapped by
`scripts/start-research-engine.sh`, which no-ops (and `sleep infinity`s)
when a healthy instance is already serving on port 8011. This stops
the workflow manager from flapping FAILED → RUNNING during double-starts.
## CI requirements
`scripts/test-gate.sh` is the single entry point for the pre-merge
gate (invoked by the `Test Gate` workflow in `.replit` and by any
external CI runner that wraps `pnpm run test:gate`). It enforces two
non-obvious environmental contracts that future contributors must
preserve when refactoring the gate or porting it to a new runner:
- **`DATABASE_URL` must be set.** The gate hard-fails (exit 65) if
`DATABASE_URL` is unset, because several api-server suites (the
auth flow, research-bridge HMAC contract, shadow-budget
integration, and especially the **shadow A/B sampling load test**,
`shadow-sampling-load.test.ts`, Task #190) self-skip without a
Postgres DSN. Without this gate they would silently disappear in
CI, masking regressions in the shadow fire-and-forget path.
Pre-merge runs must therefore provision a Postgres (any per-job
ephemeral instance is fine; each test creates its own
`search_path` schema).
- **`TEST_GATE=1` is exported by the gate.** Tests that self-skip on
missing infrastructure should treat `TEST_GATE=1` (or the standard
`CI=1`) as "no skipping allowed" and turn the skip into a hard
failure. The shadow-sampling load test already does this. New
Postgres-dependent suites should follow the same pattern so the
gate cannot silently lose coverage.
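
For anyone porting the gate to a runner that is not bash, the two contracts above can be restated roughly as follows. This is a sketch under assumptions, not a transcription of `scripts/test-gate.sh`: `gatePreflight`, `GateEnv`, and `PreflightResult` are hypothetical names, and only the exit code 65 and the `TEST_GATE=1` export come from the description above.

```typescript
type GateEnv = Record<string, string | undefined>;

interface PreflightResult {
  exitCode: number; // 0 = proceed, 65 = DATABASE_URL missing
  env: GateEnv; // environment to hand to the test processes
  message?: string;
}

// Hypothetical restatement of the gate's two contracts.
function gatePreflight(env: GateEnv): PreflightResult {
  // Contract 1: refuse to run when the Postgres DSN is absent.
  if (!env["DATABASE_URL"]) {
    return {
      exitCode: 65,
      env,
      message: "DATABASE_URL is unset; suites would self-skip",
    };
  }
  // Contract 2: mark the run so infrastructure-dependent suites fail
  // instead of skipping.
  return { exitCode: 0, env: { ...env, TEST_GATE: "1" } };
}
```

A wrapping runner would call this before spawning any test process and exit with `exitCode` when it is non-zero.
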
In other words: if you are running the gate from a fresh runner, set
`DATABASE_URL` first. If you are adding a new test that needs
infrastructure, gate the skip on `!process.env.TEST_GATE && !process.env.CI`.
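
The skip rule can be sketched as a small pure function. The condition is the one quoted above; the function name `decideInfraTest` and the three-way result are hypothetical, and how a suite acts on the result depends on its test runner.

```typescript
type Env = Record<string, string | undefined>;

type InfraDecision = "run" | "skip" | "fail";

// Decide what an infrastructure-dependent suite should do:
//   infrastructure present            -> run
//   missing, local dev machine        -> skip
//   missing, under TEST_GATE or CI    -> fail
function decideInfraTest(env: Env, requiredVar = "DATABASE_URL"): InfraDecision {
  if (env[requiredVar]) return "run";
  // Same condition as in the text: skipping is allowed only when neither
  // TEST_GATE nor CI is set.
  const maySkip = !env["TEST_GATE"] && !env["CI"];
  return maySkip ? "skip" : "fail";
}

// In a suite (sketch only):
//   const decision = decideInfraTest(process.env);
//   if (decision === "fail") {
//     throw new Error("DATABASE_URL is required under TEST_GATE/CI");
//   }
```

Taking the environment as a parameter keeps the rule easy to unit-test without mutating `process.env`.
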
## Coverage
Pass `--coverage` to `scripts/test-gate.sh` (or invoke the per-package
`test:coverage` scripts directly). Reports are written to:
- `coverage/api-server/index.html` (c8, HTML)
- `coverage/research-engine/index.html` (pytest-cov, HTML)
The coverage targets are deliberately not enforced as gates; they
exist so PR authors can spot regressions in the modules they touched.
## What is intentionally NOT in the gate
- **Provider smoke check**: surfaces 4xx/5xx from upstream model
providers as a warning, never a hard failure (otherwise a transient
network flake would break every CI run).
- **`E2E_LIVE` Playwright**: needs network egress, real LLM calls,
and an admin login. Run on demand or in a dedicated nightly job.
## Troubleshooting
- *`column "..." does not exist` from the auth tests*: your test
fixture is out of sync with `lib/db/src/schema/`. Add the missing
column to the `CREATE TABLE` blocks at the top of
`auth-flow.test.ts`.
- *`Address already in use` on port 8011*: another `python3 -m
app.server` is still running. The wrapper script will detect a
healthy instance and skip; if a *crashed* instance is holding the
port, run `pkill -f "app.server"` and restart.
- *Playwright `webServer` timeout*: the mocked suite spawns its own
Vite preview on `E2E_PORT` (default 5179). Make sure nothing else is
bound to it.