Upload _claude_memory/feedback_audit_3x_rule.md with huggingface_hub
Browse files
_claude_memory/feedback_audit_3x_rule.md
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
name: Audit every new module 3x from different angles
|
| 3 |
+
description: User-stated policy — every new module/script must be audited 3x from different angles before declaring done; never claim "tested" after a single happy-path smoke
|
| 4 |
+
type: feedback
|
| 5 |
+
originSessionId: 4037f43b-2133-46c6-84bd-02f7d454ec8b
|
| 6 |
+
---
|
| 7 |
+
Every new module, script, or non-trivial code change MUST be audited 3 times from **different angles** before declaring it done. A single happy-path smoke test is NOT enough — past audits have caught critical bugs (e.g. v7 max_length=4096 silently masked 94.5% of training rows; Track 1 was a soft GT-leak that a single happy-path test would have missed).
|
| 8 |
+
|
| 9 |
+
**Why:** "Smoke-tested" without explicit angle-diversity has historically meant "ran the easy case once" — which misses pre-existing bugs (max_length truncation), distribution-mismatch concerns (oracle vs deployable), invariant violations (row count drift, GT leak), and edge-case crashes.
|
| 10 |
+
|
| 11 |
+
**How to apply:** Before reporting a new module/script as done, run AT LEAST 3 audits hitting different concerns:
|
| 12 |
+
|
| 13 |
+
1. **Happy path / output structure** — Does it produce the expected output format on a representative input? Field types, counts, alignment.
|
| 14 |
+
2. **Edge case / failure mode** — Empty input, malformed JSON, missing keys, max_length boundaries, wrong-mode flags, mutex violations.
|
| 15 |
+
3. **Invariant / safety check** — Row count preserved? No GT leak? IDs aligned with upstream/downstream? Idempotency? Train/test consistency?
|
| 16 |
+
|
| 17 |
+
For trainers specifically also check: token-length distribution post-injection, supervision masking actually supervises non-zero tokens, gold conditions populated, no silent truncation losing the supervised span.
|
| 18 |
+
|
| 19 |
+
Document audit results inline (each angle + pass/fail evidence) when committing. If a check fails, the fix is itself a new module that must be audited 3x.
|
| 20 |
+
|
| 21 |
+
**Lesson (2026-05-02)**: I audited path-string construction for the
|
| 22 |
+
build_motif_cache.py rename in isolation (Path.with_suffix vs
|
| 23 |
+
str-concat) and pronounced it correct. The actual run still crashed
|
| 24 |
+
because `np.save(path_str, arr)` auto-APPENDS `.npy` when the path
|
| 25 |
+
doesn't end in `.npy` — the rename target then doesn't exist.
|
| 26 |
+
**Unit-testing the parts I changed missed the integration with stdlib
|
| 27 |
+
behavior.** Going forward, every new module's audit MUST include an
|
| 28 |
+
end-to-end run on a small input + read-back of the produced output,
|
| 29 |
+
not just isolated unit tests of the lines I changed.
|