name: Audit every new module 3x from different angles
description: >-
User-stated policy — every new module/script must be audited 3x from different
angles before declaring done; never claim "tested" after a single happy-path
smoke test
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
Every new module, script, or non-trivial code change MUST be audited 3 times from different angles before declaring it done. A single happy-path smoke test is NOT enough: past audits have caught critical bugs that one easy run would have missed (e.g. v7 max_length=4096 silently masked 94.5% of training rows; Track 1 was a soft ground-truth (GT) leak).
Why: "Smoke-tested" without explicit angle-diversity has historically meant "ran the easy case once" — which misses pre-existing bugs (max_length truncation), distribution-mismatch concerns (oracle vs deployable), invariant violations (row count drift, GT leak), and edge-case crashes.
How to apply: Before reporting a new module/script as done, run AT LEAST 3 audits hitting different concerns:
- Happy path / output structure — Does it produce the expected output format on a representative input? Field types, counts, alignment.
- Edge case / failure mode — Empty input, malformed JSON, missing keys, max_length boundaries, wrong-mode flags, mutex violations.
- Invariant / safety check — Row count preserved? No GT leak? IDs aligned with upstream/downstream? Idempotency? Train/test consistency?
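As a rough illustration of the three angles above, here is a minimal sketch in Python. The module under audit (`parse_rows`) and the audit function names are hypothetical and exist only for this example; a real audit would target the actual module and its real inputs.

```python
import json

def parse_rows(lines):
    """Hypothetical module under audit: parse JSONL lines into {'id', 'text'} dicts."""
    rows = []
    for line in lines:
        obj = json.loads(line)
        rows.append({"id": obj["id"], "text": str(obj["text"]).strip()})
    return rows

def audit_happy_path():
    # Angle 1: output structure on a representative input (types, counts, values).
    out = parse_rows(['{"id": 1, "text": " a "}', '{"id": 2, "text": "b"}'])
    assert len(out) == 2
    assert [type(r["id"]) for r in out] == [int, int]
    assert [r["text"] for r in out] == ["a", "b"]
    return "pass"

def audit_edge_cases():
    # Angle 2: empty input must work; missing keys and malformed JSON must fail loudly.
    assert parse_rows([]) == []
    for bad in ['{"id": 1}', "not json"]:
        try:
            parse_rows([bad])
        except (KeyError, json.JSONDecodeError):
            continue
        raise AssertionError(f"bad input was silently accepted: {bad!r}")
    return "pass"

def audit_invariants():
    # Angle 3: row count preserved, IDs aligned with the input, repeat runs identical.
    lines = [json.dumps({"id": i, "text": f"t{i}"}) for i in range(50)]
    out = parse_rows(lines)
    assert len(out) == len(lines)
    assert [r["id"] for r in out] == list(range(50))
    assert parse_rows(lines) == out
    return "pass"

# Collect per-angle evidence so it can be pasted into the commit message.
results = {f.__name__: f() for f in (audit_happy_path, audit_edge_cases, audit_invariants)}
```

Keeping each angle in its own function makes it easy to record a separate pass/fail line per angle rather than a single "tested" claim.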
For trainers specifically also check: token-length distribution post-injection, supervision masking actually supervises non-zero tokens, gold conditions populated, no silent truncation losing the supervised span.
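One way to check for silent truncation of the supervised span is sketched below. It assumes labels are plain Python lists that use the common `-100` ignore-index convention for unsupervised positions; the function name `truncation_report` and the toy rows are hypothetical, not taken from the actual trainer.

```python
IGNORE_INDEX = -100  # assumption: unsupervised positions are labeled -100

def truncation_report(label_rows, max_length):
    """Count rows that lose part or all of their supervised tokens when cut to max_length."""
    fully_masked = 0    # had supervised tokens, none survive truncation
    partially_lost = 0  # some supervised tokens survive, but not all
    for labels in label_rows:
        before = sum(1 for t in labels if t != IGNORE_INDEX)
        after = sum(1 for t in labels[:max_length] if t != IGNORE_INDEX)
        if before > 0 and after == 0:
            fully_masked += 1
        elif after < before:
            partially_lost += 1
    total = len(label_rows)
    return {
        "rows": total,
        "fully_masked": fully_masked,
        "partially_lost": partially_lost,
        "fully_masked_frac": fully_masked / total if total else 0.0,
    }

# Toy rows: a masked prompt followed by a supervised answer span.
rows = [
    [IGNORE_INDEX] * 6 + [11, 12],      # answer starts past max_length: fully masked
    [IGNORE_INDEX] * 3 + [21, 22],      # fits entirely
    [IGNORE_INDEX] * 4 + [31, 32, 33],  # answer is cut short
]
report = truncation_report(rows, max_length=5)
```

Running a report like this on the real tokenized dataset, before training, turns "no silent truncation" from an assumption into a number that can be recorded with the audit.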
Document audit results inline (each angle + pass/fail evidence) when committing. If a check fails, the fix is itself a new change that must be audited 3x.
Lesson (2026-05-02): I audited path-string construction for the
build_motif_cache.py rename in isolation (Path.with_suffix vs
str-concat) and pronounced it correct. The actual run still crashed
because np.save(path_str, arr) auto-APPENDS .npy when the path
doesn't end in .npy, so the file the rename expects is never created
at that path.
Unit-testing the parts I changed missed the integration with NumPy's
behavior. Going forward, every new module's audit MUST include an
end-to-end run on a small input + read-back of the produced output,
not just isolated unit tests of the lines I changed.
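A minimal sketch of such an end-to-end check follows, using only the standard library. The writer `save_appending_suffix` is a hypothetical stand-in that mimics the suffix-appending behavior described above (it is not NumPy itself), and the file names are made up for illustration. In a real audit the actual writer, such as np.save, would be passed in instead, with a matching read-back.

```python
import os
import pickle
import tempfile

def save_appending_suffix(path_str, obj, suffix=".npy"):
    """Hypothetical stand-in writer: appends `suffix` when the path lacks it."""
    if not path_str.endswith(suffix):
        path_str += suffix
    with open(path_str, "wb") as fh:
        pickle.dump(obj, fh)

def end_to_end_check(write_fn, filename, obj):
    """Run the writer in a fresh temp dir, then report what actually landed on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        intended = os.path.join(tmp, filename)
        write_fn(intended, obj)
        exists = os.path.exists(intended)
        round_trip = False
        if exists:
            # Read-back: the produced file must contain what was written.
            with open(intended, "rb") as fh:
                round_trip = pickle.load(fh) == obj
        return {
            "intended_exists": exists,
            "round_trip": round_trip,
            "files": sorted(os.listdir(tmp)),
        }

# A temp-style name without the expected suffix: the file lands somewhere else.
report_tmp = end_to_end_check(save_appending_suffix, "cache.npy.tmp", [1, 2, 3])
# A name that already ends in the suffix: written exactly where intended.
report_ok = end_to_end_check(save_appending_suffix, "cache.npy", [1, 2, 3])
```

Listing the directory contents, rather than only checking the intended path, is what exposes the surprise file name; a unit test of the path-building logic alone would not.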