Chaeyoon and Claude Opus 4.8 committed
Commit · d43213b
1 Parent(s): 84981a4
Declutter Claude tooling: remove eval skill, settings.local, changelog, post-edit hook
- Delete .claude/skills/run-evaluation/ and .claude/settings.local.json
- Delete CHANGELOG.md
- Remove the PostToolUse hook script (scripts/post_edit_check.py) and unwire it from
.claude/settings.json. It is Claude-only dev tooling: not imported by any product code,
not needed to build/run, and doesn't belong in the src/ package.
Kept .gitkeep: data/ and output/ are gitignored, so these files are the only things keeping those
RAP structure directories present in a fresh clone (they are created at runtime otherwise).
Verified: ruff clean; 23 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- .claude/settings.json +0 -13
- .claude/skills/run-evaluation/SKILL.md +0 -24
- CHANGELOG.md +0 -25
- scripts/post_edit_check.py +0 -59
.claude/settings.json
CHANGED
@@ -23,18 +23,5 @@
       "Write(data/raw/**)",
       "Edit(data/raw/**)"
     ]
-  },
-  "hooks": {
-    "PostToolUse": [
-      {
-        "matcher": "Write|Edit",
-        "hooks": [
-          {
-            "type": "command",
-            "command": "python scripts/post_edit_check.py"
-          }
-        ]
-      }
-    ]
   }
 }
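A quick way to confirm that no hook still points at the deleted script is to list every hook command in the settings file. The following is a minimal sketch, assuming the settings layout matches the removed `hooks` block in this diff; the helper name `hook_commands` is hypothetical and not part of the repository.

```python
import json


def hook_commands(settings: dict) -> list[str]:
    """Collect every command string from a settings dict shaped like the removed block."""
    commands = []
    # "hooks" maps an event name (e.g. "PostToolUse") to a list of matcher entries.
    for entries in (settings.get("hooks") or {}).values():
        for entry in entries:
            for hook in entry.get("hooks", []):
                if hook.get("type") == "command":
                    commands.append(hook.get("command", ""))
    return commands


# Settings as they looked before this commit (only the hooks block is reproduced).
before = json.loads("""
{"hooks": {"PostToolUse": [{"matcher": "Write|Edit",
  "hooks": [{"type": "command", "command": "python scripts/post_edit_check.py"}]}]}}
""")
# After this commit the "hooks" key is gone; other keys are omitted in this sketch.
after = {}

print(hook_commands(before))  # ['python scripts/post_edit_check.py']
print(hook_commands(after))   # []
```

An empty list for the current settings file would indicate the hook was fully unwired.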
.claude/skills/run-evaluation/SKILL.md
DELETED
@@ -1,24 +0,0 @@
----
-name: run-evaluation
-description: How to run the NoteGuard evaluation (detection P/R/F1 + residual leakage) and read it
----
-# Running the evaluation
-
-The eval is the project's pass/fail signal — it proves sanitisation actually removes PII, with numbers.
-
-1. Data: either `NOTEGUARD_DATA_DIR=<folder with the 3 CSVs>` (offline) or let it auto-download from HF.
-2. Run `python tests/run_eval.py --compare --limit 300` (use a larger `--limit` for the headline;
-   `--method pseudonym` to measure leakage under pseudonymisation). Writes `output/results.json`.
-3. It joins each note to its patient/admission record (the EVAL-ONLY oracle) to get ground truth, then
-   reports, per detector:
-   - **detection P / R / F1** per entity type (precision is a conservative lower bound — removing PII
-     that isn't in the tables, e.g. clinician names, counts as a false positive).
-   - **residual leakage** = known identifiers still present after sanitisation. This is the headline.
-
-## How to read it
-- `--compare` prints two rows: **rules** → **presidio+rules** (the shipping detector). The leakage
-  should drop sharply between them.
-- Watch residual leakage as the headline. If it regresses after a change to `src/recognisers.py`,
-  `detect.py`, or `transform.py`, fix it before continuing.
-
-Log anything that didn't work in `experiments/FAILED.md`.
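The two metrics named in the deleted skill file can be illustrated with a small sketch. This is not the code in `tests/run_eval.py`, which is not shown in this commit; the function names `prf1` and `residual_leakage` are hypothetical, and case-insensitive substring matching is an assumption about how "still present" might be checked.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def residual_leakage(sanitised_text: str, known_identifiers: list[str]) -> list[str]:
    """Return the known identifiers that still appear in the sanitised text."""
    lowered = sanitised_text.lower()
    return [ident for ident in known_identifiers if ident and ident.lower() in lowered]


p, r, f = prf1(tp=8, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.8 0.8 0.8

# Made-up note and identifiers, for illustration only.
note = "Seen by Dr Lee. Patient <NAME> born <DATE>, MRN 12345."
print(residual_leakage(note, ["Jane Doe", "12345"]))  # ['12345']
```

Under this reading, a redaction of "Dr Lee" would count as a false positive because clinician names are not in the oracle tables, which is why the skill file calls precision a conservative lower bound.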
CHANGELOG.md
DELETED
@@ -1,25 +0,0 @@
-# Changelog
-
-All notable changes are documented here. Format follows [Keep a Changelog](https://keepachangelog.com);
-the project uses [semantic versioning](https://semver.org).
-
-## [1.0.0] — 2026-06-20
-
-Gold-RAP restructure ("analysis as a product").
-
-### Added
-- Standard RAP directory layout: `src/` (package), `tests/` (unit tests + `run_eval.py`),
-  `docs/`, `data/` (inputs), `output/` (generated artifacts).
-- `pyproject.toml` — the project is now pip-installable (`pip install -e .`).
-- Continuous integration (`.github/workflows/ci.yml`) running `ruff` + `pytest` on every push/PR.
-- `ruff` lint configuration and `logging` in the evaluation entry point.
-- This changelog.
-
-### Changed
-- Renamed the package `noteguard/` → `src/`; all imports updated to `src.*`.
-- Generated outputs (metrics, two-Trust artifacts) now write to `output/` instead of `data/out/`.
-- `run_eval.py` moved under `tests/` as the evaluation entry point.
-
-### Removed
-- Decluttered `experiments/` (failure log moved to `docs/failed_experiments.md`).
-- Removed the committed `results.json` artifact (now regenerated into `output/`, gitignored).
scripts/post_edit_check.py
DELETED
@@ -1,59 +0,0 @@
-"""PostToolUse hook: re-run the de-identification tests after edits to the scrubbing logic.
-
-Wired in `.claude/settings.json`. The team guide's lesson is "verify everything" — recognisers and
-anonymisation operators are exactly where a silent regression would let PII leak, so we re-check them
-on every edit.
-
-Safe by design:
-- Only acts on edits to the scrubbing modules in `src/`; otherwise exits 0 silently.
-- Gated behind the `PII_ENABLE_HOOK=1` env var so it never disrupts an unrelated session or a
-  half-installed environment. Turn it on once the venv + tests exist.
-- Exits 0 (never blocks) if the venv or pytest is missing.
-"""
-from __future__ import annotations
-
-import json
-import os
-import subprocess
-import sys
-from pathlib import Path
-
-REPO = Path(__file__).resolve().parent.parent
-WATCHED = ("src/recognisers.py", "src/detect.py", "src/transform.py")
-
-
-def main() -> int:
-    if os.environ.get("PII_ENABLE_HOOK") != "1":
-        return 0
-    try:
-        payload = json.load(sys.stdin)
-    except Exception:
-        return 0
-
-    file_path = (payload.get("tool_input") or {}).get("file_path", "")
-    rel = file_path.replace("\\", "/")
-    if not any(w in rel for w in WATCHED):
-        return 0
-
-    venv_python = REPO / ".venv" / "Scripts" / "python.exe"
-    python = str(venv_python) if venv_python.exists() else sys.executable
-    tests_dir = REPO / "tests"
-    if not tests_dir.exists():
-        return 0
-
-    result = subprocess.run(
-        [python, "-m", "pytest", str(tests_dir), "-q"],
-        cwd=str(REPO),
-        capture_output=True,
-        text=True,
-    )
-    if result.returncode != 0:
-        # Surface failures to Claude via stderr; exit 2 asks the model to address them.
-        sys.stderr.write("Leakage tests FAILED after edit — PII may be leaking:\n")
-        sys.stderr.write(result.stdout[-2000:] + "\n" + result.stderr[-1000:])
-        return 2
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
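For readers who want to see what the removed hook reacted to, its path filter can be isolated as a pure function. This is a sketch rather than code from the repository: the helper name `is_watched` is hypothetical, and the `isinstance` guard and `or ""` fallback are small additions that the deleted script did not have.

```python
import json

# Same tuple as in the deleted script.
WATCHED = ("src/recognisers.py", "src/detect.py", "src/transform.py")


def is_watched(payload_text: str) -> bool:
    """Return True if the hook payload names one of the watched scrubbing modules."""
    try:
        payload = json.loads(payload_text)
    except Exception:
        return False  # the deleted script exited 0 on unreadable input
    if not isinstance(payload, dict):
        return False  # added guard, not in the original
    file_path = (payload.get("tool_input") or {}).get("file_path") or ""
    rel = file_path.replace("\\", "/")  # normalise Windows separators
    return any(w in rel for w in WATCHED)


windows_edit = json.dumps({"tool_input": {"file_path": r"C:\repo\src\detect.py"}})
readme_edit = json.dumps({"tool_input": {"file_path": "README.md"}})

print(is_watched(windows_edit))  # True
print(is_watched(readme_edit))   # False
print(is_watched("not json"))    # False
```

Because the match is a substring test, a path such as `backup/src/detect.py.bak` would also trigger the check; the deleted script behaved the same way.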