Spaces: chaeyoona/noteguard

Chaeyoon and Claude Opus 4.8 committed 15 days ago

Commit d43213b (1 parent: 84981a4)

Declutter Claude tooling: remove eval skill, settings.local, changelog, post-edit hook

- Delete .claude/skills/run-evaluation/ and .claude/settings.local.json
- Delete CHANGELOG.md
- Remove the PostToolUse hook script (scripts/post_edit_check.py) and unwire it from
.claude/settings.json. It is Claude-only dev tooling: not imported by any product code,
not needed to build/run, and doesn't belong in the src/ package.

Kept the .gitkeep files: data/ and output/ are gitignored, so these files are the only thing
keeping those RAP structure directories present in a fresh clone (they are otherwise created at runtime).

Verified: ruff clean; 23 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files changed (4)

.claude/settings.json +0 -13
.claude/skills/run-evaluation/SKILL.md +0 -24
CHANGELOG.md +0 -25
scripts/post_edit_check.py +0 -59

.claude/settings.json CHANGED

@@ -23,18 +23,5 @@
       "Write(data/raw/**)",
       "Edit(data/raw/**)"
     ]
-  },
-  "hooks": {
-    "PostToolUse": [
-      {
-        "matcher": "Write|Edit",
-        "hooks": [
-          {
-            "type": "command",
-            "command": "python scripts/post_edit_check.py"
-          }
-        ]
-      }
-    ]
   }
 }

.claude/skills/run-evaluation/SKILL.md DELETED

@@ -1,24 +0,0 @@
----
-name: run-evaluation
-description: How to run the NoteGuard evaluation (detection P/R/F1 + residual leakage) and read it
----
-# Running the evaluation
-The eval is the project's pass/fail signal — it proves sanitisation actually removes PII, with numbers.
-1. Data: either `NOTEGUARD_DATA_DIR=<folder with the 3 CSVs>` (offline) or let it auto-download from HF.
-2. Run `python tests/run_eval.py --compare --limit 300` (use a larger `--limit` for the headline;
-   `--method pseudonym` to measure leakage under pseudonymisation). Writes `output/results.json`.
-3. It joins each note to its patient/admission record (the EVAL-ONLY oracle) to get ground truth, then
-   reports, per detector:
-   - **detection P / R / F1** per entity type (precision is a conservative lower bound — removing PII
-     that isn't in the tables, e.g. clinician names, counts as a false positive).
-   - **residual leakage** = known identifiers still present after sanitisation. This is the headline.
-## How to read it
-- `--compare` prints two rows: **rules** → **presidio+rules** (the shipping detector). The leakage
-  should drop sharply between them.
-- Watch residual leakage as the headline. If it regresses after a change to `src/recognisers.py`,
-  `detect.py`, or `transform.py`, fix it before continuing.
-Log anything that didn't work in `experiments/FAILED.md`.
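
For readers unfamiliar with the two metrics the deleted skill describes, here is a minimal sketch of how detection P/R/F1 and residual leakage could be computed for a single note. It is not the code in `tests/run_eval.py`: the function names `prf1` and `residual_leakage`, the set-of-strings representation, and the sample values are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def prf1(predicted: set[str], truth: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 for detected vs. ground-truth identifiers."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def residual_leakage(sanitised: str, known_identifiers: set[str]) -> set[str]:
    """Known identifiers still present (case-insensitively) after sanitisation."""
    lowered = sanitised.lower()
    return {ident for ident in known_identifiers if ident.lower() in lowered}


# Hypothetical ground truth taken from the patient/admission record.
truth = {"Jane Doe", "1984-03-12"}
# The detector also removed a clinician name that is not in the tables,
# so it counts as a false positive and pulls precision down.
predicted = {"Jane Doe", "Dr Smith"}

p, r, f1 = prf1(predicted, truth)
leaked = residual_leakage("Patient [NAME], born 1984-03-12, seen by [NAME].", truth)
print(p, r, f1, leaked)
```

In this toy case the missed date of birth lowers recall and also shows up as leakage, which is why the skill treats residual leakage as the headline number.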

CHANGELOG.md DELETED

@@ -1,25 +0,0 @@
-# Changelog
-All notable changes are documented here. Format follows [Keep a Changelog](https://keepachangelog.com);
-the project uses [semantic versioning](https://semver.org).
-## [1.0.0] — 2026-06-20
-Gold-RAP restructure ("analysis as a product").
-### Added
-- Standard RAP directory layout: `src/` (package), `tests/` (unit tests + `run_eval.py`),
-  `docs/`, `data/` (inputs), `output/` (generated artifacts).
-- `pyproject.toml` — the project is now pip-installable (`pip install -e .`).
-- Continuous integration (`.github/workflows/ci.yml`) running `ruff` + `pytest` on every push/PR.
-- `ruff` lint configuration and `logging` in the evaluation entry point.
-- This changelog.
-### Changed
-- Renamed the package `noteguard/` → `src/`; all imports updated to `src.*`.
-- Generated outputs (metrics, two-Trust artifacts) now write to `output/` instead of `data/out/`.
-- `run_eval.py` moved under `tests/` as the evaluation entry point.
-### Removed
-- Decluttered `experiments/` (failure log moved to `docs/failed_experiments.md`).
-- Removed the committed `results.json` artifact (now regenerated into `output/`, gitignored).

scripts/post_edit_check.py DELETED

@@ -1,59 +0,0 @@
-"""PostToolUse hook: re-run the de-identification tests after edits to the scrubbing logic.
-Wired in `.claude/settings.json`. The team guide's lesson is "verify everything" — recognisers and
-anonymisation operators are exactly where a silent regression would let PII leak, so we re-check them
-on every edit.
-Safe by design:
-- Only acts on edits to the scrubbing modules in `src/`; otherwise exits 0 silently.
-- Gated behind the `PII_ENABLE_HOOK=1` env var so it never disrupts an unrelated session or a
-  half-installed environment. Turn it on once the venv + tests exist.
-- Exits 0 (never blocks) if the venv or pytest is missing.
-"""
-from __future__ import annotations
-import json
-import os
-import subprocess
-import sys
-from pathlib import Path
-REPO = Path(__file__).resolve().parent.parent
-WATCHED = ("src/recognisers.py", "src/detect.py", "src/transform.py")
-def main() -> int:
-    if os.environ.get("PII_ENABLE_HOOK") != "1":
-        return 0
-    try:
-        payload = json.load(sys.stdin)
-    except Exception:
-        return 0
-    file_path = (payload.get("tool_input") or {}).get("file_path", "")
-    rel = file_path.replace("\\", "/")
-    if not any(w in rel for w in WATCHED):
-        return 0
-    venv_python = REPO / ".venv" / "Scripts" / "python.exe"
-    python = str(venv_python) if venv_python.exists() else sys.executable
-    tests_dir = REPO / "tests"
-    if not tests_dir.exists():
-        return 0
-    result = subprocess.run(
-        [python, "-m", "pytest", str(tests_dir), "-q"],
-        cwd=str(REPO),
-        capture_output=True,
-        text=True,
-    )
-    if result.returncode != 0:
-        # Surface failures to Claude via stderr; exit 2 asks the model to address them.
-        sys.stderr.write("Leakage tests FAILED after edit — PII may be leaking:\n")
-        sys.stderr.write(result.stdout[-2000:] + "\n" + result.stderr[-1000:])
-        return 2
-    return 0
-if __name__ == "__main__":
-    sys.exit(main())
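
Two details of the removed script are worth noting in case a similar hook is reintroduced. The substring test `any(w in rel for w in WATCHED)` also matches paths such as `src/detect.py.bak` or `vendor/src/detect.py`, and the venv lookup only covers the Windows layout. The sketch below shows one possible stricter variant. The helpers `is_watched` and `venv_python` are hypothetical names, and the POSIX `.venv/bin/python` path is an assumption about the environment, not something the repository states.

```python
from __future__ import annotations

from pathlib import Path

WATCHED = ("src/recognisers.py", "src/detect.py", "src/transform.py")


def is_watched(file_path: str, repo: Path) -> bool:
    """True only if file_path is exactly one of the WATCHED files inside repo."""
    path = Path(file_path.replace("\\", "/"))
    if not path.is_absolute():
        path = repo / path
    try:
        rel = path.resolve().relative_to(repo.resolve())
    except ValueError:
        # The edited file lives outside the repository.
        return False
    return rel.as_posix() in WATCHED


def venv_python(repo: Path) -> Path | None:
    """Return the venv interpreter for either layout, or None if no venv exists."""
    candidates = (
        repo / ".venv" / "Scripts" / "python.exe",  # Windows layout, as in the removed script
        repo / ".venv" / "bin" / "python",          # POSIX layout (assumption)
    )
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

A caller would fall back to `sys.executable` when `venv_python` returns `None`, which matches the behaviour of the removed script.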