phase8_rl / _claude_memory /feedback_audit_3x_rule.md

Upload _claude_memory/feedback_audit_3x_rule.md with huggingface_hub

c561f0b verified 25 days ago

2.46 kB

name: Audit every new module 3x from different angles
description: >-
  User-stated policy — every new module/script must be audited 3x from different
  angles before declaring done; never claim "tested" after a single happy-path
  smoke
type: feedback
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b

Every new module, script, or non-trivial code change MUST be audited 3 times from different angles before declaring it done. A single happy-path smoke test is NOT enough — past audits have caught critical bugs (e.g. v7 max_length=4096 silently masked 94.5% of training rows; Track 1 was a soft GT-leak that a single happy-path test would have missed).

Why: "Smoke-tested" without explicit angle-diversity has historically meant "ran the easy case once" — which misses pre-existing bugs (max_length truncation), distribution-mismatch concerns (oracle vs deployable), invariant violations (row count drift, GT leak), and edge-case crashes.

How to apply: Before reporting a new module/script as done, run AT LEAST 3 audits hitting different concerns:

Happy path / output structure — Does it produce the expected output format on a representative input? Field types, counts, alignment.
Edge case / failure mode — Empty input, malformed JSON, missing keys, max_length boundaries, wrong-mode flags, mutex violations.
Invariant / safety check — Row count preserved? No GT leak? IDs aligned with upstream/downstream? Idempotency? Train/test consistency?

For trainers specifically also check: token-length distribution post-injection, supervision masking actually supervises non-zero tokens, gold conditions populated, no silent truncation losing the supervised span.

Document audit results inline (each angle + pass/fail evidence) when committing. If a check fails, the fix is itself a new module that must be audited 3x.

Lesson (2026-05-02): I audited path-string construction for the build_motif_cache.py rename in isolation (Path.with_suffix vs str-concat) and pronounced it correct. The actual run still crashed because np.save(path_str, arr) auto-APPENDS .npy when the path doesn't end in .npy — the rename target then doesn't exist. Unit-testing the parts I changed missed the integration with stdlib behavior. Going forward, every new module's audit MUST include an end-to-end run on a small input + read-back of the produced output, not just isolated unit tests of the lines I changed.