Baladithya Balamurugan
Wave 21: deep-read critical review — 8 source clusters re-read, findings verified
2a16b30
|
Raw
History Blame Contribute Delete
25.3 kB

Grounding Map: Dataset-Generation Pipeline — What Exists vs What Is Claimed vs What Is Envisioned

Agent: REPO-GROUNDING
Date: 2026-06-09
Scope: composer_replication/datagen/*.py, teacher_replay.py, trainer/composer_trainer.py, loss.py, hint_generator.py, docs/adrs/ADR-010 + ADR-002 + ADR-013, research/design-F1..F5, research/notes/final_report_socratic-mcts-swe-worldmodel-8f6dea.md, docs/COMPOSER_RECIPE_MAPPING.md, docs/BACKLOG_RESOLUTION_2026-06-09.md


(1) Exact Current Dataset-Generation Capability

FeatureDeletionTask schema (datagen/schema.py)

Six load-bearing fields and what produces each today:

Field Type Producer today Notes
task_id str SweBenchAdapter.to_task() — copied from instance["instance_id"] or instance["task_id"] "unknown" if missing
repo str instance["repo"] via SweBenchAdapter.to_task() e.g. "getmoto/moto"
base_commit str instance["base_commit"] no code to git checkout this commit exists today
broken_image str SweBenchAdapter.image_for(instance) — either instance["docker_image"] (SWE-rebench) or the conventional swebench/sweb.eval.x86_64.{iid}:latest This tag is a pre-built SWE-bench eval image; no code in the repo pulls or builds these images
fail_to_pass tuple[str,...] _as_tuple(instance["FAIL_TO_PASS"]) — handles JSON-encoded string OR list validated non-empty in __post_init__
pass_to_pass tuple[str,...] _as_tuple(instance["PASS_TO_PASS"]) may be empty
test_command str SweBenchAdapter.default_test_command = "python -m pytest -q" hardcoded; not read from instance
deleted_symbols tuple[str,...] never populated by SweBenchAdapter — hardcoded () in every substrate inversion the monitor can't do symbol-provenance checks without this
golden_diff str instance["patch"] held out of repr; used only by validator
granularity str hardcoded "feature" in SweBenchAdapter.to_task() CREATE-half escalation (function→file→feature) not wired to anything
difficulty_prior float instance["difficulty"] if present (SWE-rebench) else 0.5
upstream_license str instance["license_name"] copyleft filter in is_redistributable() strips GPL/AGPL/LGPL

What SweBenchAdapter actually does and does NOT do

SweBenchAdapter.to_task(instance: dict) is a pure schema inversion — it takes one SWE-bench-shaped dict and maps it to a FeatureDeletionTask. It does NOT:

  • Pull or build a Docker image
  • Apply the gold patch in reverse (git apply -R)
  • Run any tests
  • Discover test node IDs
  • Populate deleted_symbols (always empty)
  • Escalate granularity beyond the static "feature"

The broken-repo Docker image is assumed to exist pre-built (the SWE-bench project publishes these images; SWE-rebench carries its own docker_image field). The full pipeline step "revert gold patch → scrub caches → freeze image" is the documented [~] gate in ADR-010 — implemented in concept (the 4-gate validator interface exists, scrub_tree is built, LocalSubprocessSandbox and DockerSandbox are built) but there is no code in the repo that actually clones a repo, runs git apply -R <gold_patch>, builds a Docker image, and pushes it to a registry.

What FeatureDeletionEnv does during training (datagen/env.py)

  • reset(task) — boots the sandbox (by image tag), returns a text prompt listing failing tests. The prompt exposes task.repo, task.fail_to_pass, task.test_command but NEVER golden_diff or deleted_symbols.
  • step(action) — delegates to sandbox.exec(action), returning observation text; grades on submit or turn limit.
  • _grade() — runs sandbox.run_tests(test_command, fail_to_pass + pass_to_pass), computes pass-fraction over fail_to_pass, gates to 0 if pass_to_pass guard is broken OR HackMonitor.flag() fires.
  • reward_fn(prompts, completions, *, task_id, **kwargs) — TRL RewardFunc face; dispatches through reset/step; feeds fractional credit (not binary) to DifficultyCurriculum.update.

Safeguards implemented

  • scrub_tree(workdir) — physically removes __pycache__, .mypy_cache, .pytest_cache, .git, .hg, *.pyc/.pyo/.class before episode start. This is the PRIMARY control (added in Wave 2; was absent before).
  • SANDBOX_DENYLIST — blocks find, strings, unzip, jar, javap, decompilers, git. First-token-only check; bypassable via sh -c "...". Documented as defense-in-depth, NOT the wall.
  • HackMonitor.flag() — layer 1: substring scan for cache/decompiler signatures in trajectory actions (not in submit_patch). Layer 2: patch-provenance — if a deleted symbol reappears verbatim in the patch AND the trajectory shows a cache/bytecode artifact being read (normalized to defeat "__py"+"cache__" obfuscation), flags the trajectory.
  • DockerSandboxnetwork_mode='none', read_only=True, cap_drop=['ALL'], no-new-privileges, pids_limit=256, mem_limit=1g, optional gVisor runtime='runsc'.

What ingestion/claude_code.py can ingest today

ClaudeCodeIngester.ingest(path: Path) -> Iterator[TraceState]:

  • Input: Claude Code session JSONL at ~/.claude/projects/<encoded>/<sessionId>.jsonl
  • Output: one TraceState per assistant TURN (state_id, messages, student_action)
  • Skips: subagent files (agent- prefix), sidechain records (isSidechain: True), summary / attachment / queue-operation / file-history-snapshot records
  • student_action: JSON-serialized list of text + tool_use + thinking blocks (thinking KEPT in student_action, STRIPPED from teacher-facing messages if strip_thinking=True)
  • tool_error flag: structurally set on user messages where any tool_result block has is_error: true — this is the SDPO error-site detection signal
  • state_id: f"{path.stem}::{state_idx:04d}"
  • Does NOT handle: OpenHands traces, SWE-smith trajectories, any format other than Claude Code JSONL

(2) Envisioned Pipeline End-to-End (S3 Contract Prefixes, Tree Controller, Outer Loop)

From research/design-F1-systems-framing.md, research/design-F2-aws-datagen.md, and research/notes/final_report_socratic-mcts-swe-worldmodel-8f6dea.md §5/§8/§10.

  1. Seed trace ingestion (Stage a): ClaudeCodeIngester.ingest() over s3://composer-datagen-386931836011-us-west-2/raw/claude_code/**/*.jsonl → Parquet at traces/v1/run_id=<id>/part-*.parquet via AWS Glue 5.0 Spark ETL job (glue_ingest_job.py, ~80 LOC, NOT YET BUILT).
  2. Schema inversion (Stage c1): SweBenchAdapter.to_task() per SWE-bench row → FeatureDeletionTask JSONL at tasks/v1/run_id=<id>/manifest.jsonl (one task per line, array index = line number). Pure CPU; runs inside the Glue job or a Lambda. License gate (is_redistributable()) applied here.
  3. N-teacher replay (Stage b): teacher_replay.replay_trace() generalized from flat OpenRouter to BedrockBatchTeacherPool — write one shared replay/v1/run_id=<id>/input/states.jsonl, submit one CreateModelInvocationJob per teacher, write .jsonl.out per teacher to replay/v1/.../teacher=<slug>/. An EMR Serverless aggregation step joins all N outputs by state_idlist[TeacherCallResult]. (teacher_replay_bedrock.py, ~180 LOC, NOT YET BUILT).
  4. Multi-model tree expansion (the core delta — NOT BUILT): A tree_controller.py (~250–350 LOC, design-only) that, for each TraceState node, fires N models, applies each candidate action through FeatureDeletionEnv.step() to get a real next observation, branches again from the new state, grades leaves with _grade(). Expansion is gated on pre-expansion divergence between sibling next-action distributions (to avoid O(N^D) explosion). Emits six typed S3 prefixes (see step 8).
  5. Sandbox materialization + 4-gate validation (Stage c2): AWS Batch array jobs on EC2 Spot, one child per task. Each child reads AWS_BATCH_JOB_ARRAY_INDEX, looks up its task in the S3 manifest, boots DockerSandbox/LocalSubprocessSandbox, runs validator.validate_task() (4 gates), writes task_grades/v1/run_id=<id>/<task_id>.json. (datagen/aws/batch_validate.py, ~120 LOC, NOT YET BUILT).
  6. DPO pair extraction + normalization (Stage d): extract_dpo_pairs() (already built in teacher_replay.py) on the fan-in of teacher outputs → DPOPair rows → DJNormalizer data-juicer op-graph → EMR Serverless Spark for cross-partition dedup → corpus/v1/run_id=<id>/dpo/part-*.parquet and corpus/sft/part-*.parquet. (replaysim/emr_normalize_job.py, ~100 LOC, NOT YET BUILT).
  7. Orchestration: AWS Step Functions Standard Workflow: Ingest(Glue) → InvertSchema(Lambda) → [Bedrock batch ×N (Map)] → FanIn(EMR-Serverless) → ExtractDPO+SynthTasks → SandboxValidate(Batch array, .sync) → Normalize(EMR-Serverless) → WriteManifest(Lambda). (infra/datagen_stepfunctions.json + infra/datagen_stack.py, ~250 LOC IaC, NOT YET BUILT).
  8. S3 typed dataset contract (full set):
    • raw/claude_code/**/*.jsonl — input seed traces
    • traces/v1/run_id=<id>/part-*.parquet — TraceState rows (Stage a output)
    • tasks/v1/run_id=<id>/manifest.jsonl — FeatureDeletionTask rows (Stage c1 output)
    • tasks/golden/run_id=<id>/ — golden_diff ACL-isolated prefix (deny-by-default; NEVER co-located with policy-visible tasks/)
    • replay/v1/run_id=<id>/input/states.jsonl — shared Bedrock batch input
    • replay/v1/run_id=<id>/teacher=<slug>/*.jsonl.out — per-teacher Bedrock batch output
    • task_grades/v1/run_id=<id>/<task_id>.json — validator + _grade() results
    • corpus/v1/run_id=<id>/sft/part-*.parquet — clean winning trajectories (SFT-first floor)
    • corpus/v1/run_id=<id>/dpo/part-*.parquet — DPO pairs (normalized DPOPair)
    • dpo_pairs/ — divergence-derived DPO pairs from the tree (sibling winners vs losers)
    • rl_task_pool/ — FeatureDeletionTask registry + DifficultyCurriculum priors
    • divergence_pairs/ — divergence-annotated nodes (where sibling next-action distributions forked)
    • wm_tuples/ — (state, action, next_state, outcome) for ALL branches incl. failures (world-model training target)
    • holdout/ — disjoint held-out eval anchor (HeldoutSplit; NEVER fed back)
    • diloco/rendezvous/round_<NNNNNN>/rank_<RRRR>.pt — DiLoCo outer-sync (already used by existing allreduce.py)
    • manifests/run_id=<id>.json — run-level manifest (counts, cost, lineage, schema_version, parent_run_id for flywheel)
  9. SFT-first stage: Read sft_corpus/ (clean _grade() gate-1 passing trajectories), run compose_loss with alpha_sdpo=0, beta_replay=0 (reduces to _lm_response_ce — next-token CE masked to response tokens), write ckpt_sft/. (pipeline/sft_floor.py, ~60 LOC, NOT YET BUILT).
  10. Inner RL loop: ComposerReplicationTrainer (trl.GRPOTrainer subclass) on rl_task_pool/ with FeatureDeletionEnv.reward_fn; total = grpo + α·sdpo + β·trace_replay_dpo; DiLoCo outer-sync via S3; HeldOutGuard kill-switch now wired (Wave 3).
  11. Flywheel: Improved student generates next outer loop's seed traces; learned deliberation-confidence becomes the next round's divergence gate.

(3) Unbuilt Components the Vision Depends On

Every item below is design-only or a skeleton; none has real production code.

Component File Estimate Source Status
datagen/tree_controller.py — the core delta: env-step between branches, _grade() at leaves, divergence-gated expansion, six typed S3 prefix writes ~250–350 LOC design-F1, final_report §1/§5/§6 0% built — no file exists
SiblingBootstrapGenerator in hint_generator.py — select max-reward sibling → emit "a working approach looks like: …" → feed ctx_teacher splice ~60 LOC design-F5 Tier 1 / final_report §1/§6 0% built — not a class in hint_generator.py at all
pipeline/s3_layout.py — typed writers for all six S3 dataset prefixes; the OUTER→INNER contract ~80 LOC design-F1 §4 0% built — no pipeline/ directory exists
pipeline/sft_floor.py — SFT-first driver: read sft_corpus/, run TRL SFTTrainer or compose_loss alpha=beta=0, write ckpt_sft/ ~60 LOC design-F1 §2 / design-F5 d 0% built
teacher_replay_bedrock.pyBedrockBatchTeacherPool: submit one Bedrock CreateModelInvocationJob per teacher, poll, parse .jsonl.out back into list[TeacherCallResult] ~180 LOC design-F2 §b 0% built
datagen/aws/batch_validate.py — AWS Batch array-child entrypoint: read BATCH_JOB_ARRAY_INDEX → manifest line → DockerSandbox + validator + _grade() → write task_grades/ ~120 LOC design-F2 §c2 0% builtdatagen/aws/ subdirectory does not exist
datagen/aws/glue_ingest_job.py — Glue Spark entrypoint wrapping ClaudeCodeIngester.ingest in mapPartitions; write traces/ Parquet ~80 LOC design-F2 §a 0% built
replaysim/emr_normalize_job.py — EMR Serverless Spark entrypoint wrapping DJNormalizer per partition + Spark cross-partition dedup; write corpus/dpo/ + corpus/sft/ Parquet ~100 LOC design-F2 §d 0% built
datagen/aws/s3_contract.py — S3 layout constants, RunManifest dataclass, Parquet/JSONL serializers, recordId==state_id join helpers, schema_version/split column injection ~120 LOC design-F2 §contract 0% built
infra/datagen_stepfunctions.json + infra/datagen_stack.py — Step Functions state machine + IAM roles (Bedrock batch service role, Batch Spot compute env, EMR Serverless, Glue) ~250 LOC IaC design-F2 §orchestration 0% builtinfra/ directory does not exist
trainer/composer_trainer.py world-model head — parameter-isolated next-state adapter + <deliberate> token as second SDPO mode ~40 LOC delta design-F1 §4 / final_report §2 0% built — grep confirms no world_model/WorldModel/next_state_head/<deliberate> anywhere in composer_replication/
Broken-repo image builder — code to clone a repo at base_commit, apply git apply -R <golden_diff>, run scrub_tree, build and push a Docker image to ECR unspecified ADR-010 §decision / design-F2 §c2 0% built — there is NO code anywhere in the repo that manufactures a broken-repo Docker image from scratch
EKSExecutor (now skeleton-built in Wave 2) + Argo Workflows controller for outer loop Wave-2 executor skeleton built; Argo controller design-only design-F1 §AWS / final_report §8 skeleton builteks.py is a functional executor (IndexedJob dispatch) but the Argo outer-loop controller is 0%
verl AsyncServer backend for tool-heavy tree final_report §8 0% built — design note only
Offline LLM-judge hack monitor (EvilGenie-style, Bedrock) design-F5 §Tier 4 0% built

(4) Seams Where "Point at an Arbitrary OSS Repo" Breaks the Current Code

The SweBenchAdapter is designed to consume pre-packaged SWE-bench-shaped datasets, not arbitrary GitHub repos. The breaks are structural:

Break 1: broken_image assumes a pre-built SWE-bench image exists

SweBenchAdapter.image_for() returns either instance["docker_image"] (SWE-rebench) or the convention swebench/sweb.eval.x86_64.{iid}:latest. For an arbitrary OSS repo there is no such image. A fresh repo would need:

  • Clone at base_commit
  • Install the project's Python/Java/etc. toolchain
  • Apply git apply -R <golden_diff> to manufacture the broken state
  • Run scrub_tree() to strip caches
  • Build a Docker image that encapsulates this broken state
  • Push the image to a registry accessible by DockerSandbox.boot()

None of this code exists. DockerSandbox.boot(image) raises RuntimeError("DockerSandbox.boot: image {image!r} not found locally and could not be pulled (the container is --network none). Pull it on the host first.") if the image is absent.

Break 2: test_command is hardcoded

SweBenchAdapter.default_test_command = "python -m pytest -q". A fresh repo may use make test, npm test, cargo test, mvn verify, or any other test runner. There is no test-discovery logic anywhere in the repo.

Break 3: fail_to_pass and pass_to_pass require pre-existing test labels

SWE-bench instances ship with FAIL_TO_PASS and PASS_TO_PASS as pre-identified pytest node IDs. For an arbitrary repo the mapping from "the code change" to "which tests exercise the deleted symbols" must be derived — e.g., via coverage analysis or AST-reachability. FeatureDeletionTask.__post_init__ raises ValueError if fail_to_pass is empty. The 4-gate validator's Gate 2 (deletion breaks the feature) cannot be verified without pre-identified test node IDs.

Break 4: deleted_symbols is never populated

SweBenchAdapter hardcodes deleted_symbols=(). The HackMonitor._patch_provenance_hack() check (monitor.py:157-182) skips the symbol-reappearance test if deleted_symbols is empty — so the provenance layer of the hack monitor is effectively a no-op on all SweBenchAdapter-derived tasks. For a fresh repo, AST analysis to identify the deleted symbols would be required.

Break 5: No copyleft scrub for arbitrary repos

is_redistributable() reads upstream_license from instance["license_name"]. For a fresh GitHub repo there is no pre-populated license field; the repo license must be detected (e.g., via SPDX scanning) before the copyleft filter can be applied.

Break 6: No env setup for non-Python repos

LocalSubprocessSandbox.run_tests runs subprocess.run(cmd, shell=True, ...) against the working tree with a hard-coded 600s timeout. It has no virtualenv creation, no dependency installation, no multi-language support. DockerSandbox depends on a pre-baked image that already has the environment. A fresh Python repo would need pip install -e . run inside the image, and a non-Python repo would need a completely different image and test runner.


(5) What ingestion/claude_code.py Can Ingest Today

ClaudeCodeIngester.ingest(path) handles exactly one format: Claude Code session JSONL at ~/.claude/projects/<encoded>/<sessionId>.jsonl.

Supported record types handled:

  • type="user" — string content or list of text/tool_result blocks → OpenAI-style user message; tool_error structural flag set if any tool_result block has is_error: true
  • type="assistant" — list of text/thinking/tool_use blocks → one TraceState with student_action (full blocks including thinking) and messages (history, optionally with thinking stripped)

Record types silently skipped:

  • type="summary" — Claude Code conversation summary records
  • type="attachment", "queue-operation", "file-history-snapshot", "last-prompt", "system" — auxiliary records
  • isSidechain: True records — subagent traces (skipped in v0.1 per ADR-002)
  • Files starting with agent- — subagent session files by naming convention

Structural features:

  • state_id = f"{path.stem}::{state_idx:04d}" — stable within-session identifier
  • strip_thinking flag (default True) — strips [THINKING] ... lines from the teacher-facing messages history but keeps them in student_action
  • Injects synthetic system prompt at messages[0] ("You are a senior software engineer...")
  • Version check: warns on schema version outside 2.x.x

NOT handled by this ingester:

  • OpenHands trajectory format (planned for v0.2 per ADR-002)
  • SWE-smith trajectories (planned for v0.2)
  • Cline VS Code export
  • Aider chat history
  • SWE-bench leaderboard trajectory submissions
  • Any binary or non-JSONL format

Critical Cross-Checks: What the Repo Claims vs What Exists

Claim 1: "Feature Deletion generator" (Composer 2.5 blog says "point at a repo")

What the blog says (COMPOSER_RECIPE_MAPPING.md): "take a repo with passing tests, delete some code, ask the agent to reimplement to pass tests." What the repo does: Inverts existing SWE-bench-shaped instances — reverts their gold patch. There is NO code that: (a) points at an arbitrary OSS repo, (b) identifies deletable symbols, (c) synthesizes a broken state beyond SWE-bench's pre-packaged ones. The ADR correctly scopes this as "Option A — invert OSS substrates" vs "Option B — greenfield repo scraping." The blog's "point at a repo" vision is Option B, which was explicitly rejected.

Claim 2: "25× synthetic data"

What the blog says: Composer 2.5 uses 25× more synthetic tasks than Composer 2 (COMPOSER_RECIPE_MAPPING.md §2). What the repo has: A schema adapter for 5 existing OSS datasets (SWE-bench-Lite ~300, SWE-Gym ~2.4k, R2E-Gym ~8.1k, SWE-rebench ~21.3k, OpenHands/Nemotron ~59k). ADR-010 notes ~15 node-days to invert all SWE-rebench tasks. No actual inverted task corpus has been generated. The 25× claim refers to the training run; the repo has the generation machinery for the inversion shape but not the greenfield synthesis needed for genuine novel task minting.

Claim 3: "Dynamic difficulty curriculum — select for AND create harder tasks"

What Composer 2.5 says: "We both select for and create harder tasks dynamically throughout the run." What the repo has: The SELECT-FOR half: DifficultyCurriculum with p̂(1−p̂) frontier weighting, retire/quarantine thresholds, and effort tilt on turns/think-tokens (Wave 20). The CREATE half (escalating deletion span, coupling complexity, multi-feature targets during the run) is explicitly listed as MISSING in design-F5 row b2. granularity is set statically to "feature" for all SweBenchAdapter tasks; no escalation logic exists.

Claim 4: deleted_symbols enables AST-provenance monitoring

What ADR-010 says: "signature + patch-provenance monitor" that detects if deleted symbols reappear via cache reads. Reality: deleted_symbols=() on every SweBenchAdapter-derived task (line 81 in substrates.py: hardcoded empty tuple). HackMonitor._patch_provenance_hack() returns False immediately when deleted_symbols is empty (reappeared = [s for s in deleted_symbols if s and s in patch] → empty list). The provenance layer of the monitor is a dead code path for all currently-generable tasks.

Claim 5: The tree controller and world-model head are part of the system

What design docs say: "roughly nine-tenths of it" is reuse (final_report §6 reuse-vs-build table). Reality: The tree controller is 0/0 — no file, no function, no class. Confirmed by exhaustive grep: no SiblingBootstrap, world_model, WorldModel, next_state_head, tree_controller, MCTS, deliberate_token anywhere in composer_replication/. The "nine-tenths reuse" claim is accurate for the Composer recipe replication; the tree itself (the framework's own addition) is entirely design.

Claim 6: The broken-repo image is manufactured by the pipeline

What design-F2 says: Step c2 involves "pull the substrate's frozen image, git apply -R the gold patch, scrub_tree(), run the test command, confirm FAIL_TO_PASS actually fails." Reality: This describes what SHOULD happen in the Batch array child. No such code is written. SweBenchAdapter.image_for() returns a string tag; that tag must be pre-pulled on the host before DockerSandbox.boot() can use it (RuntimeError on image-not-found). The full broken-image manufacture pipeline (clone → revert → scrub → build → push) is a gap.


Summary of Unbuilt vs Built

BUILT and tested (production-ready CPU, Docker-gated where noted):

  • FeatureDeletionTask schema + FeatureDeletionEnv (reset/step/_grade/reward_fn)
  • SweBenchAdapter schema inversion (pure dict transform)
  • FakeSandbox, LocalSubprocessSandbox, DockerSandbox (hardware-gated e2e green in Wave 1/2)
  • scrub_tree() primary reward-hack control
  • HackMonitor (signature + patch-provenance, obfuscation-resistant)
  • DifficultyCurriculum (SELECT-FOR half + effort tilt)
  • validate_task() 4-gate solvability validator
  • ClaudeCodeIngester (Claude Code JSONL only)
  • behavior_rewards.pyc_length, EffortWeights, LengthEffortPenalty, UnfinishedTodoPenalty, LeftoverCoTPenalty, CommunicationReward (Wave 20)
  • kl_in_reward.py — k1-in-reward path opt-in (Wave 20)
  • HeldOutGuard + HeldoutSplit + wired into trainer (Wave 2/3)
  • EKSExecutor skeleton + SageMakerExecutor skeleton (Wave 2)

DESIGN-ONLY (no code):

  • Tree controller (datagen/tree_controller.py)
  • SiblingBootstrapGenerator in hint_generator.py
  • pipeline/s3_layout.py, pipeline/sft_floor.py
  • teacher_replay_bedrock.py (BedrockBatchTeacherPool)
  • datagen/aws/batch_validate.py, datagen/aws/glue_ingest_job.py, datagen/aws/s3_contract.py
  • replaysim/emr_normalize_job.py
  • infra/datagen_stepfunctions.json, infra/datagen_stack.py
  • World-model next-state head in trainer
  • Argo Workflows outer-loop controller
  • Broken-repo image builder (clone → git apply -R → build → push)
  • CREATE half of difficulty curriculum (mint harder tasks during run)
  • SFT-first training stage
  • Offline LLM-judge hack monitor