explcre
/

phase8_rl


explcre committed on May 7

Commit

c561f0b

verified ·

1 Parent(s): f3bb917

Upload _claude_memory/feedback_audit_3x_rule.md with huggingface_hub

Files changed (1)

_claude_memory/feedback_audit_3x_rule.md +29 -0


	@@ -0,0 +1,29 @@

+---
+name: Audit every new module 3x from different angles
+description: User-stated policy — every new module/script must be audited 3x from different angles before declaring done; never claim "tested" after a single happy-path smoke
+type: feedback
+originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
+---
+Every new module, script, or non-trivial code change MUST be audited 3 times from **different angles** before declaring it done. A single happy-path smoke test is NOT enough — past audits have caught critical bugs (e.g. v7 max_length=4096 silently masked 94.5% of training rows; Track 1 was a soft GT-leak that a single happy-path test would have missed).
+**Why:** "Smoke-tested" without explicit angle-diversity has historically meant "ran the easy case once" — which misses pre-existing bugs (max_length truncation), distribution-mismatch concerns (oracle vs deployable), invariant violations (row count drift, GT leak), and edge-case crashes.
+**How to apply:** Before reporting a new module/script as done, run AT LEAST 3 audits hitting different concerns:
+1. **Happy path / output structure** — Does it produce the expected output format on a representative input? Field types, counts, alignment.
+2. **Edge case / failure mode** — Empty input, malformed JSON, missing keys, max_length boundaries, wrong-mode flags, mutex violations.
+3. **Invariant / safety check** — Row count preserved? No GT leak? IDs aligned with upstream/downstream? Idempotency? Train/test consistency?
+For trainers specifically, also check: the token-length distribution post-injection, that supervision masking leaves a non-zero number of supervised tokens, that gold conditions are populated, and that no silent truncation drops the supervised span.
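The trainer check for silent truncation can be sketched as below. This is a minimal illustration, not the repository's trainer code: the `-100` ignore index, the helper names `supervised_token_count` and `fraction_fully_masked`, and the toy rows are all assumptions made for the example.

```python
IGNORE_INDEX = -100  # assumption: labels mark unsupervised positions with -100


def supervised_token_count(labels, max_length):
    """Count supervised tokens that survive truncation to max_length."""
    return sum(1 for tok in labels[:max_length] if tok != IGNORE_INDEX)


def fraction_fully_masked(rows, max_length):
    """Fraction of rows left with zero supervised tokens after truncation."""
    if not rows:
        raise ValueError("no rows to audit")
    masked = sum(
        1 for labels in rows if supervised_token_count(labels, max_length) == 0
    )
    return masked / len(rows)


# Toy label rows: an unsupervised prompt followed by a short supervised answer.
rows = [
    [IGNORE_INDEX] * 6 + [11, 12],  # answer starts at position 6
    [IGNORE_INDEX] * 2 + [21, 22],  # answer starts at position 2
]

print(fraction_fully_masked(rows, max_length=4))  # first row loses its whole answer
print(fraction_fully_masked(rows, max_length=8))  # nothing is truncated
```

Running a check of this shape on the real tokenized dataset, before training starts, is one way to surface a `max_length` setting that masks most rows.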
+Document audit results inline (each angle + pass/fail evidence) when committing. If a check fails, the fix is itself a new module that must be audited 3x.
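The three audit angles above could be organized roughly as follows. This is a sketch under stated assumptions: `parse_rows` is a hypothetical stand-in for the module under audit, and the audit function names and the pass/fail report format are illustrative, not taken from the repository.

```python
import json


def parse_rows(lines):
    """Hypothetical module under audit: parse JSONL lines into dicts with an 'id'."""
    rows = []
    for n, line in enumerate(lines):
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {n}: malformed JSON") from exc
        if not isinstance(row, dict) or "id" not in row:
            raise ValueError(f"line {n}: missing 'id'")
        rows.append(row)
    return rows


def audit_happy_path():
    """Angle 1: expected structure, field types, and counts on a normal input."""
    rows = parse_rows(['{"id": 1, "text": "a"}', '{"id": 2, "text": "b"}'])
    return len(rows) == 2 and all(isinstance(r["id"], int) for r in rows)


def audit_edge_cases():
    """Angle 2: empty input, malformed JSON, missing keys."""
    if parse_rows([]) != []:
        return False
    for bad in ['{"id": 1', '{"text": "no id"}']:
        try:
            parse_rows([bad])
        except ValueError:
            continue  # rejected as expected
        return False  # bad input was silently accepted
    return True


def audit_invariants():
    """Angle 3: row count preserved, IDs stay aligned, repeat runs agree."""
    lines = ['{"id": 3}', '{"id": 1}', '{"id": 2}']
    first, second = parse_rows(lines), parse_rows(lines)
    return (
        len(first) == len(lines)
        and [r["id"] for r in first] == [3, 1, 2]
        and first == second
    )


def run_audits():
    """Run each angle and record pass/fail evidence for the commit message."""
    angles = {
        "happy_path": audit_happy_path,
        "edge_cases": audit_edge_cases,
        "invariants": audit_invariants,
    }
    return {name: ("pass" if check() else "fail") for name, check in angles.items()}


for name, result in run_audits().items():
    print(f"{name}: {result}")
```

Keeping each angle in its own function makes it easy to paste the per-angle results into the commit, and to rerun all three after any fix.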
+**Lesson (2026-05-02)**: I audited path-string construction for the
+build_motif_cache.py rename in isolation (Path.with_suffix vs
+str-concat) and pronounced it correct. The actual run still crashed
+because `np.save(path_str, arr)` auto-APPENDS `.npy` when the path
+doesn't end in `.npy`, so the file was written under a different name
+and the path the rename expected did not exist.
+**Unit-testing the parts I changed missed the integration with NumPy
+behavior.** Going forward, every new module's audit MUST include an
+end-to-end run on a small input + read-back of the produced output,
+not just isolated unit tests of the lines I changed.
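The `np.save` suffix behavior and an end-to-end read-back audit can be illustrated with the sketch below. It is not the code from build_motif_cache.py: the helper `save_atomically`, the temporary-name scheme, and the file names are assumptions chosen for the example. It relies only on the documented NumPy behavior that `np.save` appends `.npy` when the given path does not already end with it.

```python
import os
import tempfile
from pathlib import Path

import numpy as np


def save_atomically(arr, final_path):
    """Write to a temp name, then rename into place (hypothetical helper).

    The temp name ends in `.npy`, so np.save does not append a suffix and
    the file lands exactly where os.replace expects to find it.
    """
    final_path = Path(final_path)
    tmp_path = final_path.with_name(final_path.name + ".tmp.npy")
    np.save(tmp_path, arr)
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as d:
    arr = np.arange(6).reshape(2, 3)

    # The pitfall: a path without `.npy` is not the path that gets written.
    bare = Path(d) / "cache.tmp"
    np.save(bare, arr)
    bare_exists = bare.exists()
    suffixed_exists = Path(str(bare) + ".npy").exists()

    # End-to-end audit: run the real save on a small input, then read it back.
    out = save_atomically(arr, Path(d) / "motif_cache.npy")
    loaded = np.load(out)
    round_trip_ok = out.exists() and np.array_equal(loaded, arr)

print(bare_exists, suffixed_exists, round_trip_ok)  # False True True
```

A unit test of the path arithmetic alone would pass here; only the run that actually writes the file and reads it back exposes the renamed output.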