Spaces: SaiManish123/adaptshield


SaiManish123 committed on Apr 26

Commit c1060df (verified) · 1 parent: 304b5df

Initial deploy of AdaptShield two-phase cybersecurity environment


Files changed (38)

.gitignore +28 -0
.python-version +1 -0
Dockerfile +47 -0
README.md +428 -5
__init__.py +23 -0
assets/_make_headline_chart.py +79 -0
assets/architecture_overview.svg +95 -0
assets/headline_results.png +0 -0
assets/training_pipeline.svg +87 -0
baseline.py +204 -0
build_benchmark_table.py +103 -0
client.py +107 -0
eval_tasks.py +46 -0
generate_sft_data.py +197 -0
inference.py +351 -0
launch_hf_grpo_job.py +403 -0
launch_hf_sft_job.py +363 -0
models.py +243 -0
openenv.yaml +26 -0
plot_sft_checkpoint_curve.py +93 -0
plot_training.py +106 -0
pyproject.toml +31 -0
server/Dockerfile +47 -0
server/__init__.py +11 -0
server/adaptshield_environment.py +1324 -0
server/app.py +173 -0
server/attacker.py +361 -0
server/grader.py +501 -0
server/requirements.txt +6 -0
server/scenarios.py +264 -0
smoke_test.py +59 -0
soc_tools.py +239 -0
tests/test_regression.py +352 -0
tool_baseline.py +128 -0
train.py +1332 -0
train_sft.py +611 -0
train_smoke.py +250 -0
uv.lock +0 -0

.gitignore ADDED

	@@ -0,0 +1,28 @@

+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+.venv
+venv/
+ENV/
+env.bak/
+venv.bak/
+.DS_Store

.python-version ADDED

	@@ -0,0 +1 @@


+3.11.9

Dockerfile ADDED

	@@ -0,0 +1,47 @@

+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git curl && \
+    rm -rf /var/lib/apt/lists/*
+COPY . /app/env
+WORKDIR /app/env
+RUN if ! command -v uv >/dev/null 2>&1; then \
+    curl -LsSf https://astral.sh/uv/install.sh | sh && \
+    mv /root/.local/bin/uv /usr/local/bin/uv && \
+    mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+    uv sync --frozen --no-install-project --no-editable; \
+    else \
+    uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+    uv sync --frozen --no-editable; \
+    else \
+    uv sync --no-editable; \
+    fi
+FROM ${BASE_IMAGE}
+WORKDIR /app
+COPY --from=builder /app/env/.venv /app/.venv
+COPY --from=builder /app/env /app/env
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+EXPOSE 7860
+HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
+    CMD curl -f http://localhost:7860/health || exit 1
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 7860"]
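
The `HEALTHCHECK` above probes `http://localhost:7860/health` with `curl`, retrying 3 times at 30-second intervals. A client that waits for the container before starting an episode could mirror that from Python. This is a minimal sketch, assuming `/health` answers HTTP 200 once the server is ready; `wait_for_health`, `_http_ok`, and the `probe` parameter are hypothetical names, not part of the repo.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def _http_ok(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_for_health(
    url: str = "http://localhost:7860/health",
    retries: int = 3,
    interval_s: float = 30.0,
    probe: Callable[[str], bool] = _http_ok,
) -> bool:
    """Poll `url` up to `retries` times, sleeping `interval_s` between tries."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            time.sleep(interval_s)
    return False
```

The `probe` argument exists only so the polling loop can be exercised without a running server.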

README.md CHANGED

@@ -1,10 +1,433 @@
 ---
-title: Adaptshield
-emoji: 👀
-colorFrom: red
-colorTo: purple
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Janus (AdaptShield)
+emoji: 🛡️
+colorFrom: blue
+colorTo: red
 sdk: docker
 pinned: false
+license: mit
+tags:
+  - openenv
+  - security
+  - reinforcement-learning
+  - cybersecurity
+short_description: Two-phase adaptive cybersecurity benchmark for LLMs
 ---
+# Janus (AdaptShield) — Two-Phase Adaptive Cybersecurity Benchmark
+**AdaptShield** is the environment: a two-phase agentic cybersecurity
+simulator where an LLM defends a 4-node enterprise network against an
+adversary that shifts strategy mid-episode. **Janus** is the model we
+trained on it — a Qwen2.5-1.5B LoRA, supervised then refined with GRPO.
+On the hardest task Janus scores 0.90 on a held-out world family it
+never saw during training; a tool-aware heuristic baseline scores 0.18
+on the same task.
+The skill being tested is narrow on purpose. Not threat classification.
+Not generic tool calling. The benchmark targets one thing: real-time
+adaptation when the attacker's playbook changes mid-incident. The
+[Why this matters](#why-this-matters) section explains why we think
+that's the gap, and the [Results](#results) section shows how far Janus
+closes it.
+## Project Links
+- **HF Space (live env):** `TODO`
+- **Colab notebook (SFT + GRPO reproducer, free T4):** `TODO`
+- **Artifacts / model repo:** [`SaiManish123/Janus`](https://huggingface.co/SaiManish123/Janus)
+- **Demo video:** `TODO`
+- **Blog / writeup:** `TODO`
+---
+## Why this matters
+Most cyber-agent demos test threat classification or generic tool
+calling. Real production breaches don't look like that. They look like
+this:
+In April 2026 attackers compromised Context.ai, used its OAuth
+integration into a Vercel employee's Google Workspace, and pivoted from
+shadow AI through identity into Vercel's internal systems, where they
+enumerated and decrypted customer environment variables. The same week,
+a Broken Object Level Authorization flaw in Lovable.dev let any
+free-tier account read source code, Supabase credentials, Stripe keys
+and AI chat histories from other tenants — including projects built by
+AI itself. Eight months earlier, the Tea dating app left a Firebase
+bucket open and 72,000 verification selfies and driver's licenses of
+women on a safety app were scraped to 4chan within hours.
+Three different failure modes — identity hijack via shadow AI, broken
+authz in vibe-coded apps, classic cloud misconfig — but the same
+underlying problem for the defender's agent. The environment is shifting
+faster than any static training distribution can keep up with, and the
+real attacker doesn't sit still while you classify them.
+AdaptShield is built around that pressure. The environment forces the
+agent to (1) act on partial evidence, (2) hand judgment across two
+roles with an information bottleneck between them, (3) trade security
+correctness against operational blast radius, and (4) re-plan when the
+adversary's playbook changes mid-episode. Each of those is a separate
+failure mode in production SOC tooling, and the benchmark scores all
+four at once.
+---
+## Results
+Numbers below come from the production run on Hugging Face L4 Jobs,
+training Qwen2.5-1.5B-Instruct with a LoRA adapter. Eval is 50
+deterministic seeds per task, evaluated on a held-out world family
+the policy never saw during training.
+![AdaptShield held-out benchmark — tool-aware baseline vs SFT vs GRPO](assets/headline_results.png)
+On the hard task (`polymorphic-zero-day`) the tool-aware heuristic
+baseline scores 0.18 and Janus holds 0.90 on the held-out family. On
+the easier tasks the lift is smaller because the rule baseline is
+already near the ceiling; the benchmark is shaped so adaptation only
+matters where it should.
+### Benchmark comparison (full table)
+| Task | No-tool baseline | Tool-aware baseline | SFT (train family) | SFT (held-out) | GRPO (train) | GRPO (held-out) |
+|------|-----------------:|-------------------:|-------------------:|---------------:|-------------:|----------------:|
+| `direct-triage`        | 0.860 | 0.990 | 0.990 | 0.990 | 0.990 | 0.990 |
+| `dual-pivot`           | 0.650 | 0.640 | 0.825 | 0.825 | 0.825 | 0.825 |
+| `polymorphic-zero-day` | 0.380 | 0.180 | 0.960 | 0.930 | **0.883** | **0.902** |
+Two things in this table are worth flagging.
+The tool-aware baseline scores 0.18 on the hard task — worse than the
+no-tool baseline at 0.38. That isn't a bug in the baseline; it's that
+bolting tools onto a heuristic without learning when to trust them
+makes the agent over-trigger on injected false positives. You see the
+same pattern in production with rule-based SOAR playbooks against
+adaptive adversaries.
+Held-out GRPO (0.902) actually edges out train-family GRPO (0.883). That
+is evidence the policy is generalizing across world templates rather
+than memorizing them. Without splitting the eval by world family this
+finding wouldn't be visible — same-seed evaluation would have credited
+the model for memorization it didn't do.
+### SFT — loss and held-out reward
+![SFT loss curve](https://huggingface.co/SaiManish123/Janus/resolve/main/sft_worldsplit_1_5b/loss_curve.png)
+![SFT held-out reward curve](https://huggingface.co/SaiManish123/Janus/resolve/main/sft_worldsplit_1_5b/reward_curve.png)
+### GRPO — refinement on the polymorphic adversary
+![GRPO reward curve, polymorphic-zero-day](https://huggingface.co/SaiManish123/Janus/resolve/main/grpo_polymorphic_zero_day_1_5b/reward_curve.png)
+### Training runs
+Three production runs on Hugging Face Jobs produced the artifacts in this
+README. Stdout logs are public and the per-step / per-episode metrics
+files are next to the adapters.
+| Run | Trainer | GPU | Steps / Episodes | Train wall-clock | Logs | Metrics |
+|-----|---------|-----|------------------|------------------|------|---------|
+| [`sft_worldsplit_1_5b`](https://huggingface.co/SaiManish123/Janus/tree/main/sft_worldsplit_1_5b) | SFT (LoRA) | L4 ×1 | 378 steps | 9m 49s | [stdout](https://huggingface.co/SaiManish123/Janus/blob/main/logs/sft_worldsplit_1_5b.log) | [trainer_state](https://huggingface.co/SaiManish123/Janus/blob/main/sft_worldsplit_1_5b/checkpoint-378/trainer_state.json) |
+| [`grpo_worldsplit_1_5b`](https://huggingface.co/SaiManish123/Janus/tree/main/grpo_worldsplit_1_5b) | GRPO, mixed curriculum | L4 ×1 | 1,628 episodes | 1h 26m | [stdout](https://huggingface.co/SaiManish123/Janus/blob/main/logs/grpo_worldsplit_1_5b.log) | [per-episode](https://huggingface.co/SaiManish123/Janus/blob/main/grpo_worldsplit_1_5b/metrics.json) |
+| [`grpo_polymorphic_zero_day_1_5b`](https://huggingface.co/SaiManish123/Janus/tree/main/grpo_polymorphic_zero_day_1_5b) | GRPO, hard-task focus | L4 ×1 | 4,357 episodes | 3h 17m | [stdout](https://huggingface.co/SaiManish123/Janus/blob/main/logs/grpo_polymorphic_zero_day_1_5b.log) | [per-episode](https://huggingface.co/SaiManish123/Janus/blob/main/grpo_polymorphic_zero_day_1_5b/metrics.json) |
+The curriculum run mixes all three tasks (weights `direct-triage: 0.3 /
+dual-pivot: 0.4 / polymorphic-zero-day: 0.3`). The polymorphic run
+trains exclusively on the hard task to push hard-task performance
+without distraction from saturated tiers. Per-episode reward in both
+runs stabilizes within the first ~500 episodes and stays there for the
+rest of the schedule.
+---
+## Architecture
+![AdaptShield architecture overview](assets/architecture_overview.svg)
+Each episode runs against a sampled mission profile, world-family
+template, and latent operational mode. The Threat Analyst investigates
+raw enterprise evidence through SOC tools and emits a structured
+handoff. The Tactical Executor sees only that handoff (not the raw
+state) and chooses the mitigation. A deterministic Python grader scores
+security correctness, business impact, dependency blast radius, and
+mission alignment. There is no LLM-as-judge anywhere in the loop.
+## Training Pipeline
+![Janus training pipeline](assets/training_pipeline.svg)
+Five steps, each reproducible from the repo:
+1. Generate SFT demonstrations by rolling AdaptShield episodes with a
+   rule-based Phase 1 expert and a tool-aware Phase 2 expert.
+2. Train a LoRA adapter on Qwen2.5-1.5B (or 0.5B for the Colab
+   reproducer) with supervised fine-tuning on those demos.
+3. Evaluate on both train-family and held-out-family worlds. The split
+   is by world template, not by seed, so memorizing a template doesn't
+   transfer across the split.
+4. Refine the SFT adapter with GRPO on a curriculum weighted toward
+   `polymorphic-zero-day`. The deterministic grader is the reward.
+5. Publish adapters, curves, metrics, and benchmark tables to
+   [`SaiManish123/Janus`](https://huggingface.co/SaiManish123/Janus).
+A free-tier Colab notebook reproduces steps 1–4 end-to-end on a T4 in
+roughly 35 minutes using Qwen2.5-0.5B and reduced episode budgets. The
+numbers in this README come from the 1.5B run on a Hugging Face L4 Job.
+---
+## Environment Description
+The agent defends a 4-node enterprise network (`auth_service`,
+`payment_service`, `database`, `api_gateway`). Each turn has two phases:
+**Phase 1 — Threat Analyst.** Agent reads SIEM metrics, can call SOC tools
+(log search, network telemetry, threat intel lookup), and emits a
+structured `Phase1Action` with threat type, target node, confidence and a
+recommended action.
+**Phase 2 — Tactical Executor.** Agent receives only the Phase 1
+assessment (blind to raw state) and emits a `Phase2Action`. The analyst
+has to communicate clearly because the executor cannot double-check the
+network.
+The attacker escalates through `recon → exploit → exfiltration` if the
+agent fails to respond correctly. On the hard task, the attacker shifts
+strategy mid-episode and seeds false-positive noise that looks like a
+real attack but isn't — punishing reflexive isolation.
+### Observation Space
+```json
+{
+  "phase": "1 or 2",
+  "network_nodes": {
+    "auth_service": {"status": "...", "request_rate": 0, "error_rate": 0.0, "cpu": 0}
+  },
+  "active_alerts": ["raw metric alert strings — no MITRE codes"],
+  "attack_stage": "recon | exploit | exfiltration | none",
+  "history": [{"turn": "1", "p1": "classified:brute_force", "p2": "rate_limit→auth_service"}],
+  "phase1_assessment": {"threat_type": "...", "confidence": 0.9, "target_node": "..."},
+  "metadata": {"normalized_score": 0.72}
+}
+```
+Phase 2 observations have empty `network_nodes` and `active_alerts` — the
+executor only sees the analyst's handoff.
+### Action Space
+**Phase 1 (`Phase1Action`):**
+```json
+{"threat_type": "brute_force", "confidence": 0.9, "target_node": "auth_service", "recommended_action": "rate_limit", "reasoning": "..."}
+```
+**Phase 2 (`Phase2Action`):**
+```json
+{"action": "rate_limit", "target_node": "auth_service", "reasoning": "..."}
+```
+Valid actions: `rate_limit`, `isolate`, `honeypot`, `patch`, `monitor`.
+### Tasks
+| Task | Difficulty | Description | Rule baseline |
+|------|-----------|-------------|--------------:|
+| `direct-triage` | Easy | Single fixed strategy | ~0.87 |
+| `dual-pivot` | Medium | Two alternating strategies | ~0.76 |
+| `polymorphic-zero-day` | Hard | All four + mid-episode shift + noise | ~0.52 |
+### Reward Function
+| Outcome | Reward |
+|---------|-------:|
+| Phase 1 threat type correct | +0.15 |
+| Phase 1 target node correct | +0.10 |
+| Phase 2 optimal action + correct target | +0.39 |
+| Phase 2 heavy-handed but effective | +0.18 |
+| Phase 2 wrong action | -0.25 |
+| False positive on benign event | -0.39 |
+| Catastrophic: database exfiltrated | -0.49, `done=True` |
+Scores are clipped to the closed range `[0.01, 0.99]`, so the grader never
+emits exactly 0 or 1, which keeps GRPO advantages well-defined.
+### Operational Impact Layer
+AdaptShield also scores business impact, so the agent is rewarded for
+stopping the attack without ignoring operational blast radius. Each
+service has a criticality weight and a dependency fan-out:
+| Service | Criticality | Downstream dependency risk |
+|---------|------------:|----------------------------|
+| `auth_service` | 0.70 | `payment_service` |
+| `payment_service` | 0.90 | `api_gateway` |
+| `database` | 1.00 | `payment_service`, `api_gateway` |
+| `api_gateway` | 0.80 | `auth_service`, `payment_service`, `database` |
+Actions have bounded disruption costs (`monitor` = none, `isolate` =
+highest). The grader emits `business_impact`, `availability_impact`,
+`security_risk`, `dependency_blast_radius`, and `operational_penalty`
+inside `score_breakdown`. The reward adjustment is capped at `±0.05` per
+turn, which keeps the training signal stable while leaving the replay
+detailed enough to explain whether the agent stopped the attack cleanly
+or caused unnecessary business disruption getting there.
+### Mission-Aware Objectives
+Each task carries a mission profile, visible in observation metadata and
+appended to the system prompt:
+| Task | Mission | Primary Asset | SLA Priority | Risk Tolerance |
+|------|---------|---------------|--------------|----------------|
+| `direct-triage` | `login_stability` | `auth_service` | availability | medium |
+| `dual-pivot` | `checkout_continuity` | `payment_service` | availability | medium |
+| `polymorphic-zero-day` | `breach_containment` | `database` | containment | low |
+The grader emits `mission_alignment` and `mission_adjustment`, capped at
+`±0.04` per turn. This makes the agent optimize for the operational
+mission, not just the threat label. Availability-priority missions
+discourage unnecessary isolation of the primary asset; containment
+missions reward decisive correct containment of the crown-jewel
+database.
+### Design choices that aren't obvious
+A few decisions in the environment that look like details but matter
+for what the benchmark actually measures:
+- **Information bottleneck between phases.** Phase 2's observation has
+  empty `network_nodes` and `active_alerts`. The executor only sees
+  Phase 1's structured handoff. If Phase 1 can't communicate clearly,
+  Phase 2 fails — and you see it in the score, not in a separate metric.
+  This is what makes the env actually test cross-role coordination
+  rather than just two independent policies stitched together.
+- **Train/eval split by world family, not by seed.** The world templates
+  used for training are disjoint from the ones used for held-out
+  evaluation. A model that overfits to a specific service-name pattern
+  or a specific alert distribution will pass train evals and fail
+  held-out. Same-seed evaluation would have hidden this.
+- **Bounded scoring range `[0.01, 0.99]`.** The grader never emits
+  exactly 0 or 1. This keeps GRPO advantage estimates well-defined:
+  saturating rewards collapse the variance the algorithm needs.
+- **Bounded auxiliary signals.** Operational impact is capped at `±0.05`
+  per turn and mission alignment at `±0.04`. They steer the policy
+  without dominating the security signal, so the training curve doesn't
+  get hijacked by a single side-objective.
+- **Deterministic Python grader, no LLM-as-judge.** Rewards come from
+  strategy matching against a fixed ground-truth attacker, not from a
+  judge model. The benchmark cannot be gamed by a more eloquent policy.
+- **Phase-1 alerts are raw metric strings, not MITRE codes.** The agent
+  has to do the classification, not match a label to a label. This is
+  what makes the tool-aware baseline collapse on the hard task: heuristic
+  classification doesn't survive injected noise.
+---
+## Reproduce it
+### Free-tier Colab (recommended for judges)
+Open the Colab notebook linked above and run top-to-bottom. It will:
+- install the exact pinned dependency stack used in the HF Job
+- generate SFT demos from the environment
+- train an SFT LoRA on Qwen2.5-0.5B (T4-friendly)
+- run GRPO refinement on top of that SFT adapter
+- print the benchmark table and inline the production training curves
+  from `SaiManish123/Janus` so you can compare scaled-down vs. full runs
+End-to-end runtime on a Colab T4 is roughly 35 minutes.
+### Local setup
+```bash
+pip install openenv-core
+git clone https://github.com/SaiManish123/adaptshield
+cd adaptshield
+uvicorn server.app:app --host 0.0.0.0 --port 7860
+```
+### Run inference against the live environment
+```bash
+export HF_TOKEN=your_token
+export ADAPTSHIELD_TASK=direct-triage     # or dual-pivot / polymorphic-zero-day
+export ENV_BASE_URL=http://localhost:7860
+python inference.py                        # run from the repo root
+```
+`inference.py` honors the evaluator contract: `[START]`, `[STEP]`, `[END]`
+stdout markers and credentials read only from environment variables.
+### Smoke test
+```bash
+python smoke_test.py
+```
+Spins the env up in-process and walks one episode of each task with a
+deterministic policy. Should finish in <10 seconds.
+### Regression tests
+```bash
+python -m unittest tests.test_regression -v    # run from the repo root
+```
+### Baseline scores
+With `ADAPTSHIELD_SEED=42`, the deterministic rule baseline produces:
+| Task | Score | Steps | Status |
+|------|------:|------:|--------|
+| `direct-triage` | 0.870 | 10 | PASS |
+| `dual-pivot` | 0.760 | 12 | PASS |
+| `polymorphic-zero-day` | 0.520 | 16 | PASS |
+Difficulty staircase: **PASS**.
+---
+## Repository layout
+```
+adaptshield/
+├── server/                  # FastAPI server (OpenEnv-compatible)
+├── client.py                # OpenEnv client (no server-internal imports)
+├── models.py                # Phase1Action / Phase2Action schemas
+├── soc_tools.py             # SIEM, log search, threat intel SOC tools
+├── eval_tasks.py            # task definitions + difficulty staircase
+├── baseline.py              # deterministic rule baseline
+├── tool_baseline.py         # tool-aware heuristic baseline
+├── generate_sft_data.py     # rolls episodes → SFT JSONL
+├── train_sft.py             # LoRA SFT trainer (Unsloth + TRL)
+├── train.py                 # GRPO trainer (Unsloth + TRL)
+├── plot_training.py         # reward / loss curve plotting
+├── build_benchmark_table.py # eval matrix builder
+├── inference.py             # judge-facing entry point
+├── smoke_test.py            # one-shot in-process smoke test
+├── tests/test_regression.py # determinism + reward regression tests
+├── openenv.yaml             # OpenEnv manifest
+└── Dockerfile               # HF Space container
+```
+## Engineering notes
+`AdaptShieldEnvironment` extends OpenEnv's `Environment` base class and
+follows the Gym-style API (`reset`, `step`, `state`). The client in
+`client.py` talks to the server only through HTTP — no shared imports,
+no leaking of server internals. None of the SOC tools are named
+`reset`, `step`, `state`, or `close`, so they don't collide with the
+reserved MCP tool names. Grading is deterministic Python; the reward
+signal and the benchmark scores both come from strategy matching
+against a fixed ground-truth attacker, never from an LLM judge.
+All adapters, curves, metrics, and benchmark tables for the 1.5B run
+are public on [`SaiManish123/Janus`](https://huggingface.co/SaiManish123/Janus).
+## License
+MIT.
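
The README's reward table and its two capped auxiliary signals can be sketched as a small lookup plus clamping. This is only an illustration: the outcome key names and the helpers `clamp`, `turn_reward`, and `clip_score` are hypothetical, and summing the terms is an assumption. The actual composition lives in `server/grader.py`, which is not shown here.

```python
from typing import Iterable

# Outcome rewards copied from the README's reward table (key names are made up).
OUTCOME_REWARD = {
    "p1_threat_type_correct": 0.15,
    "p1_target_node_correct": 0.10,
    "p2_optimal": 0.39,
    "p2_heavy_handed": 0.18,
    "p2_wrong_action": -0.25,
    "false_positive": -0.39,
    "database_exfiltrated": -0.49,
}


def clamp(value: float, low: float, high: float) -> float:
    """Restrict `value` to the closed range [low, high]."""
    return max(low, min(high, value))


def turn_reward(outcomes: Iterable[str], operational: float = 0.0, mission: float = 0.0) -> float:
    """Sum outcome rewards plus the capped adjustments (assumed additive)."""
    base = sum(OUTCOME_REWARD[name] for name in outcomes)
    # README caps: operational impact at +/-0.05, mission alignment at +/-0.04.
    return base + clamp(operational, -0.05, 0.05) + clamp(mission, -0.04, 0.04)


def clip_score(score: float) -> float:
    """Keep a reported score inside the README's 0.01..0.99 band."""
    return clamp(score, 0.01, 0.99)
```

With these caps, an oversized operational penalty of -0.2 contributes only -0.05, so the auxiliary terms can shift a turn's reward by at most 0.09 in either direction.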

__init__.py ADDED

	@@ -0,0 +1,23 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""AdaptShield environment package."""
+from client import AdaptshieldEnv
+from models import (
+    AdaptShieldAction,
+    AdaptShieldObservation,
+    AdaptshieldAction,
+    AdaptshieldObservation,
+)
+__all__ = [
+    "AdaptShieldAction",
+    "AdaptShieldObservation",
+    "AdaptshieldAction",
+    "AdaptshieldObservation",
+    "AdaptshieldEnv",
+]

assets/_make_headline_chart.py ADDED

	@@ -0,0 +1,79 @@

+"""Render the headline benchmark chart for README.
+Produces a clean grouped bar chart of held-out evaluation scores
+(tool-aware baseline / SFT / GRPO) across the three difficulty tiers.
+Numbers are hard-coded below, copied from the
+https://huggingface.co/SaiManish123/Janus benchmark tables, and match
+the values in README.md so the figure stays in sync.
+Run: python assets/_make_headline_chart.py
+"""
+from __future__ import annotations
+import os
+from pathlib import Path
+os.environ.setdefault("MPLCONFIGDIR", "/tmp/mpl-adaptshield")
+import matplotlib.pyplot as plt
+import numpy as np
+OUT = Path(__file__).parent / "headline_results.png"
+tasks = ["direct-triage\n(easy)", "dual-pivot\n(medium)", "polymorphic-zero-day\n(hard)"]
+tool_baseline = [0.990, 0.640, 0.180]
+sft_heldout = [0.990, 0.825, 0.930]
+grpo_heldout = [0.990, 0.825, 0.902]
+x = np.arange(len(tasks))
+width = 0.26
+fig, ax = plt.subplots(figsize=(9.5, 4.6), dpi=150)
+c_tool = "#9aa0a6"
+c_sft = "#1f6feb"
+c_grpo = "#d63b2f"
+b1 = ax.bar(x - width, tool_baseline, width, label="Tool-aware baseline", color=c_tool, edgecolor="white", linewidth=0.6)
+b2 = ax.bar(x,         sft_heldout,  width, label="SFT (held-out)",      color=c_sft,  edgecolor="white", linewidth=0.6)
+b3 = ax.bar(x + width, grpo_heldout, width, label="GRPO (held-out)",     color=c_grpo, edgecolor="white", linewidth=0.6)
+for bars in (b1, b2, b3):
+    ax.bar_label(bars, fmt="%.2f", padding=3, fontsize=9, color="#333")
+ax.set_ylim(0, 1.08)
+ax.set_yticks(np.arange(0, 1.01, 0.2))
+ax.set_ylabel("Mean score (0.01–0.99 grader)", fontsize=10)
+ax.set_xticks(x)
+ax.set_xticklabels(tasks, fontsize=10)
+ax.set_title(
+    "AdaptShield held-out evaluation · Qwen2.5-1.5B · 50 deterministic seeds / task",
+    fontsize=11.5, pad=12, color="#222",
+)
+ax.spines["top"].set_visible(False)
+ax.spines["right"].set_visible(False)
+ax.spines["left"].set_color("#cccccc")
+ax.spines["bottom"].set_color("#cccccc")
+ax.tick_params(colors="#555")
+ax.yaxis.grid(True, color="#eeeeee", linewidth=0.8)
+ax.set_axisbelow(True)
+ax.annotate(
+    "5.0× lift on the only task that\nactually requires adaptation",
+    xy=(2 + width, grpo_heldout[2]),
+    xytext=(2 - 0.15, 0.45),
+    fontsize=9, color="#444",
+    arrowprops=dict(arrowstyle="->", color="#888", lw=0.9, connectionstyle="arc3,rad=-0.2"),
+)
+ax.legend(
+    loc="lower left", frameon=False, fontsize=9.5, ncol=3,
+    bbox_to_anchor=(0.0, -0.22),
+)
+plt.tight_layout()
+fig.savefig(OUT, bbox_inches="tight", facecolor="white")
+print(f"wrote {OUT}")
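
The "5.0× lift" annotation in the chart script is a hard-coded string, so it can drift if the score lists change. A quick check, using the hard-task held-out values from the script, recomputes it. The `lift` helper and variable names are hypothetical.

```python
# Held-out scores on polymorphic-zero-day, as hard-coded in the chart script.
tool_baseline_hard = 0.180
sft_heldout_hard = 0.930
grpo_heldout_hard = 0.902


def lift(model_score: float, baseline_score: float) -> float:
    """Ratio of a model's score to the baseline's score."""
    return model_score / baseline_score


print(f"GRPO lift: {lift(grpo_heldout_hard, tool_baseline_hard):.1f}x")  # GRPO lift: 5.0x
print(f"SFT lift:  {lift(sft_heldout_hard, tool_baseline_hard):.1f}x")   # SFT lift:  5.2x
```

Building the annotation text from `lift(...)` instead of a literal would keep the figure consistent with the bars.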

assets/architecture_overview.svg ADDED

assets/headline_results.png ADDED

assets/training_pipeline.svg ADDED

baseline.py ADDED

	@@ -0,0 +1,204 @@

+#!/usr/bin/env python3
+"""Rule-based AdaptShield baseline with evaluator-style stdout."""
+from __future__ import annotations
+import argparse
+import json
+import sys
+from pathlib import Path
+from typing import Any, Dict, List
+REPO_ROOT = Path(__file__).resolve().parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+TASKS = ["direct-triage", "dual-pivot", "polymorphic-zero-day"]
+BENCHMARK = "adaptshield"
+MODEL_NAME = "rule-baseline"
+MAX_STEPS = 30
+POLICY = {
+    "brute_force": ("auth_service", "rate_limit"),
+    "lateral_movement": ("payment_service", "isolate"),
+    "exfiltration": ("database", "honeypot"),
+    "supply_chain": ("api_gateway", "patch"),
+    "benign": ("api_gateway", "monitor"),
+}
+def log_start(task: str) -> None:
+    print(f"[START] task={task} env={BENCHMARK} model={MODEL_NAME}", flush=True)
+def log_step(step: int, action: Dict[str, Any], reward: float, done: bool) -> None:
+    action_str = json.dumps(action, separators=(",", ":"))
+    if len(action_str) > 100:
+        action_str = action_str[:97] + "..."
+    print(
+        f"[STEP] step={step} action={action_str} "
+        f"reward={reward:.2f} done={str(done).lower()} error=null",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    reward_str = ",".join(f"{reward:.2f}" for reward in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} "
+        f"score={score:.3f} rewards={reward_str}",
+        flush=True,
+    )
+def print_replay(task: str, metadata: Dict[str, Any]) -> None:
+    replay = metadata.get("episode_replay") or []
+    print()
+    print(f"Replay: {task}")
+    if not replay:
+        print("  No replay metadata available.")
+        return
+    for row in replay:
+        shift = " STRATEGY SHIFT:" if row.get("shift") else ""
+        foothold = " FOOTHOLD:" if row.get("foothold_transition") else ""
+        blast_radius = row.get("blast_radius") or []
+        blast = ",".join(blast_radius) if blast_radius else "none"
+        active_defenses = row.get("active_defenses") or []
+        print(
+            f"  Turn {row.get('turn')}:"
+            f"{shift}{foothold} {row.get('p1')} -> {row.get('p2_action')}"
+            f"({row.get('target', 'unknown')}) -> {row.get('result')}"
+            f" | impact={float(row.get('impact', 0.0)):.2f}"
+            f" blast_radius={blast}"
+            f" active_controls={len(active_defenses)}"
+            f" mission={row.get('mission_alignment', 'neutral')}"
+        )
+def classify_from_metrics(network_nodes: Dict[str, Dict[str, Any]]) -> str:
+    auth = network_nodes.get("auth_service", {})
+    payment = network_nodes.get("payment_service", {})
+    database = network_nodes.get("database", {})
+    gateway = network_nodes.get("api_gateway", {})
+    if float(auth.get("error_rate", 0.0)) >= 0.10:
+        return "brute_force"
+    if payment.get("status") == "suspicious" or float(payment.get("cpu", 0)) >= 55:
+        return "lateral_movement"
+    if float(database.get("outbound_mb", 0)) >= 50:
+        return "exfiltration"
+    if gateway.get("status") == "suspicious":
+        return "supply_chain"
+    return "benign"
+def phase1_payload(obs) -> Dict[str, Any]:
+    threat_type = classify_from_metrics(obs.network_nodes)
+    target_node, action = POLICY[threat_type]
+    return {
+        "threat_type": threat_type,
+        "confidence": 0.90,
+        "target_node": target_node,
+        "recommended_action": action,
+        "reasoning": "rule-based metric classifier",
+    }
+def phase2_payload(obs) -> Dict[str, Any]:
+    assessment = obs.phase1_assessment or {}
+    threat_type = str(assessment.get("threat_type", "benign"))
+    fallback_target, fallback_action = POLICY.get(threat_type, POLICY["benign"])
+    action = str(assessment.get("recommended_action") or fallback_action)
+    target_node = str(assessment.get("target_node") or fallback_target)
+    return {
+        "action": action,
+        "target_node": target_node,
+        "reasoning": "execute analyst recommendation",
+    }
+def action_from_payload(payload: Dict[str, Any]) -> AdaptShieldAction:
+    return AdaptShieldAction(**payload)
+def run_task(task: str, emit_logs: bool = True) -> Dict[str, Any]:
+    env = AdaptShieldEnvironment(task_name=task)
+    obs = env.reset()
+    rewards: List[float] = []
+    steps = 0
+    if emit_logs:
+        log_start(task)
+    while not obs.done and steps < MAX_STEPS:
+        if obs.phase == 1:
+            payload = phase1_payload(obs)
+        else:
+            payload = phase2_payload(obs)
+        obs = env.step(action_from_payload(payload))
+        reward = float(obs.reward)
+        rewards.append(reward)
+        steps += 1
+        if emit_logs:
+            log_step(steps, payload, reward, obs.done)
+    metadata = obs.metadata if isinstance(obs.metadata, dict) else {}
+    score = float(metadata.get("normalized_score", 0.01))
+    success = obs.done and 0.01 <= score <= 0.99
+    if emit_logs:
+        log_end(success, steps, score, rewards)
+    return {
+        "task": task,
+        "score": score,
+        "steps": steps,
+        "done": bool(obs.done),
+        "rewards": rewards,
+        "metadata": metadata,
+        "normalized_score_present": "normalized_score" in metadata,
+        "success": success,
+    }
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Run AdaptShield rule baseline.")
+    parser.add_argument(
+        "--task",
+        default="direct-triage",
+        choices=TASKS + ["all"],
+        help="Task to run, or 'all' for every task.",
+    )
+    parser.add_argument(
+        "--replay",
+        action="store_true",
+        help="Print a human-readable final episode replay.",
+    )
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    tasks = TASKS if args.task == "all" else [args.task]
+    for index, task in enumerate(tasks):
+        if index:
+            print()
+        result = run_task(task, emit_logs=True)
+        if args.replay:
+            print_replay(task, result["metadata"])
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
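
The loop in `run_task` alternates between a phase-1 (classification) payload and a phase-2 (response) payload until the environment reports `done`. The sketch below reproduces that control flow against a stand-in environment. `StubObs`, `StubEnv`, the payload contents, and the `MAX_STEPS` value are hypothetical and exist only for illustration; the real `AdaptShieldEnvironment` has a richer observation and its own episode length.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

MAX_STEPS = 8  # hypothetical cap for this sketch


@dataclass
class StubObs:
    """Stand-in observation; the real observation model has more fields."""
    phase: int = 1
    done: bool = False
    reward: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)


class StubEnv:
    """Hypothetical environment: two turns, each with phase 1 then phase 2."""

    def __init__(self) -> None:
        self.calls = 0

    def reset(self) -> StubObs:
        self.calls = 0
        return StubObs(phase=1)

    def step(self, payload: Dict[str, Any]) -> StubObs:
        self.calls += 1
        done = self.calls >= 4
        next_phase = 2 if self.calls % 2 == 1 else 1
        meta = {"normalized_score": 0.75} if done else {}
        return StubObs(phase=next_phase, done=done, reward=0.5, metadata=meta)


def run_stub_episode() -> Dict[str, Any]:
    env = StubEnv()
    obs = env.reset()
    rewards: List[float] = []
    phases: List[int] = []
    while not obs.done and len(rewards) < MAX_STEPS:
        # Phase 1 classifies the threat; phase 2 chooses the defensive action.
        if obs.phase == 1:
            payload = {"threat_type": "brute_force", "confidence": 0.9}
        else:
            payload = {"action": "rate_limit", "target_node": "auth_service"}
        phases.append(obs.phase)
        obs = env.step(payload)
        rewards.append(float(obs.reward))
    # The final score is read from the last observation's metadata.
    score = float(obs.metadata.get("normalized_score", 0.01))
    return {"phases": phases, "steps": len(rewards), "score": score, "done": obs.done}


result = run_stub_episode()
print(result)
```

As in `run_task`, a missing `normalized_score` falls back to `0.01`, so an episode that ends without a grader score still produces a number inside the accepted range.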

build_benchmark_table.py ADDED

	@@ -0,0 +1,103 @@

+#!/usr/bin/env python3
+"""Build a README-friendly benchmark table from baselines and training metrics."""
+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+from typing import Any, Dict, List
+from baseline import TASKS, run_task as run_no_tool_task
+from tool_baseline import run_task as run_tool_task
+def rows_to_map(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
+    return {str(row["task"]): row for row in rows}
+def load_metrics(path: Path) -> Dict[str, Any]:
+    return json.loads(path.read_text(encoding="utf-8"))
+def markdown_table(headers: List[str], rows: List[List[str]]) -> str:
+    lines = [
+        "| " + " | ".join(headers) + " |",
+        "| " + " | ".join(["---"] * len(headers)) + " |",
+    ]
+    for row in rows:
+        lines.append("| " + " | ".join(row) + " |")
+    return "\n".join(lines)
+def fmt(value: float | None) -> str:
+    if value is None:
+        return "-"
+    return f"{float(value):.3f}"
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Build AdaptShield benchmark comparison table.")
+    parser.add_argument("--sft-metrics", required=True, help="Path to sft_metrics.json")
+    parser.add_argument("--grpo-metrics", default="", help="Optional path to GRPO metrics.json")
+    parser.add_argument("--output", default="artifacts/benchmark_table.md")
+    args = parser.parse_args()
+    sft_metrics = load_metrics(Path(args.sft_metrics))
+    grpo_metrics = load_metrics(Path(args.grpo_metrics)) if args.grpo_metrics else {}
+    no_tool_rows = {task: run_no_tool_task(task, emit_logs=False) for task in TASKS}
+    tool_rows = {task: run_tool_task(task, emit_logs=False) for task in TASKS}
+    sft_eval = rows_to_map(sft_metrics.get("evaluation_rows", []))
+    sft_heldout = rows_to_map(sft_metrics.get("heldout_evaluation_rows", []))
+    grpo_eval = rows_to_map(grpo_metrics.get("evaluation_rows", [])) if grpo_metrics else {}
+    grpo_heldout = rows_to_map(grpo_metrics.get("heldout_evaluation_rows", [])) if grpo_metrics else {}
+    rows: List[List[str]] = []
+    for task in TASKS:
+        rows.append([
+            task,
+            fmt(no_tool_rows[task]["score"]),
+            fmt(tool_rows[task]["score"]),
+            fmt(sft_eval.get(task, {}).get("score")),
+            fmt(sft_heldout.get(task, {}).get("score")),
+            fmt(grpo_eval.get(task, {}).get("score") if grpo_eval else None),
+            fmt(grpo_heldout.get(task, {}).get("score") if grpo_heldout else None),
+        ])
+    md = markdown_table(
+        headers=[
+            "Task",
+            "No-tool baseline",
+            "Tool-aware baseline",
+            "SFT (train family)",
+            "SFT (held-out family)",
+            "GRPO (train family)",
+            "GRPO (held-out family)",
+        ],
+        rows=rows,
+    )
+    summary = {
+        "no_tool_baseline": {task: no_tool_rows[task]["score"] for task in TASKS},
+        "tool_baseline": {task: tool_rows[task]["score"] for task in TASKS},
+        "sft_train_family": {task: sft_eval.get(task, {}).get("score") for task in TASKS},
+        "sft_heldout_family": {task: sft_heldout.get(task, {}).get("score") for task in TASKS},
+        "grpo_train_family": {task: grpo_eval.get(task, {}).get("score") for task in TASKS} if grpo_eval else {},
+        "grpo_heldout_family": {task: grpo_heldout.get(task, {}).get("score") for task in TASKS} if grpo_heldout else {},
+    }
+    output_path = Path(args.output)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text(md + "\n", encoding="utf-8")
+    output_path.with_suffix(".json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
+    print(md)
+    print()
+    print(f"Saved markdown table to: {output_path}")
+    print(f"Saved JSON summary to: {output_path.with_suffix('.json')}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
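
To show what `markdown_table` and `fmt` emit, here is a small self-contained sketch of the same two helpers. The scores are made-up placeholders, not results from the repository, and the column set is shortened for readability.

```python
from typing import List, Optional


def fmt(value: Optional[float]) -> str:
    """Render a score with three decimals, or '-' when it is missing."""
    return "-" if value is None else f"{float(value):.3f}"


def markdown_table(headers: List[str], rows: List[List[str]]) -> str:
    """Build a pipe-delimited Markdown table from headers and string rows."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join(["---"] * len(headers)) + " |",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in rows)
    return "\n".join(lines)


# Placeholder values: 0.75 is illustrative, None models a missing GRPO run.
table = markdown_table(
    headers=["Task", "No-tool baseline", "GRPO (train family)"],
    rows=[["direct-triage", fmt(0.75), fmt(None)]],
)
print(table)
```

A missing metric renders as `-`, which is why the GRPO columns stay readable when `--grpo-metrics` is omitted.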

client.py ADDED

	@@ -0,0 +1,107 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""AdaptShield environment client."""
+from typing import Any, Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from openenv.core.env_server.types import State
+from models import AdaptShieldAction, AdaptShieldObservation
+class AdaptshieldEnv(
+    EnvClient[AdaptShieldAction, AdaptShieldObservation, State]
+):
+    """
+    Client for the AdaptShield environment.
+    This client maintains a persistent WebSocket connection to the environment server,
+    enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+    Example:
+        >>> # Connect to a running server
+        >>> with AdaptshieldEnv(base_url="http://localhost:7860") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.phase)
+        ...
+        ...     result = client.step(AdaptShieldAction(
+        ...         threat_type="brute_force",
+        ...         confidence=0.9,
+        ...         target_node="auth_service",
+        ...         recommended_action="rate_limit",
+        ...     ))
+        ...     print(result.observation.phase1_assessment)
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = AdaptshieldEnv.from_docker_image("adaptshield-env:latest")
+        >>> try:
+        ...     result = client.reset()
+        ...     result = client.step(AdaptShieldAction(
+        ...         threat_type="benign",
+        ...         confidence=0.8,
+        ...         target_node="auth_service",
+        ...         recommended_action="monitor",
+        ...     ))
+        ... finally:
+        ...     client.close()
+    """
+    def _step_payload(self, action: AdaptShieldAction) -> Dict[str, Any]:
+        """
+        Convert AdaptShieldAction to a JSON-safe payload.
+        Args:
+            action: AdaptShieldAction instance
+        Returns:
+            Dictionary representation suitable for JSON encoding
+        """
+        return action.model_dump(
+            mode="json",
+            exclude_none=True,
+            exclude_defaults=True,
+        )
+    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[AdaptShieldObservation]:
+        """
+        Parse server response into StepResult[AdaptShieldObservation].
+        Args:
+            payload: JSON response data from server
+        Returns:
+            StepResult with AdaptShieldObservation
+        """
+        obs_data = dict(payload.get("observation", {}))
+        obs_data.setdefault("done", payload.get("done", False))
+        obs_data.setdefault("reward", payload.get("reward", 0.0))
+        observation = AdaptShieldObservation(**obs_data)
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse server response into State object.
+        Args:
+            payload: JSON response from state request
+        Returns:
+            State object with episode_id and step_count
+        """
+        return State(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
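
`_parse_result` copies the top-level `done` and `reward` into the observation only when the observation does not already carry them. The sketch below isolates that merge on plain dictionaries. `merge_observation` is a hypothetical helper name and the payloads are invented examples, not captured server responses.

```python
from typing import Any, Dict


def merge_observation(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in done/reward from the envelope unless the observation sets them."""
    obs_data = dict(payload.get("observation", {}))
    obs_data.setdefault("done", payload.get("done", False))
    obs_data.setdefault("reward", payload.get("reward", 0.0))
    return obs_data


# Envelope values are used because the observation lacks them.
merged = merge_observation({"observation": {"phase": 2}, "reward": 0.4, "done": True})

# The observation's own reward wins over the envelope's reward.
kept = merge_observation({"observation": {"phase": 1, "reward": 0.9}, "reward": 0.1})

# An empty payload still yields the two required keys.
empty = merge_observation({})

print(merged, kept, empty)
```

Because `setdefault` never overwrites, a reward embedded in the observation takes precedence over the envelope value, while the returned `StepResult.reward` in the client still comes from the envelope.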

eval_tasks.py ADDED

	@@ -0,0 +1,46 @@

+#!/usr/bin/env python3
+"""Run all AdaptShield tasks with the local rule baseline."""
+from __future__ import annotations
+from baseline import TASKS, run_task
+def status_for(result: dict) -> str:
+    score = result["score"]
+    passed = (
+        result["done"] and
+        result["normalized_score_present"] and
+        0.01 <= score <= 0.99
+    )
+    return "PASS" if passed else "FAIL"
+def main() -> int:
+    results = [run_task(task, emit_logs=False) for task in TASKS]
+    print("AdaptShield Evaluation")
+    print()
+    print(f"{'Task':<24} {'Score':>7} {'Steps':>5} {'normalized_score':>18} {'Status':>8}")
+    print("-" * 68)
+    for result in results:
+        normalized = "yes" if result["normalized_score_present"] else "no"
+        print(
+            f"{result['task']:<24} "
+            f"{result['score']:>7.3f} "
+            f"{result['steps']:>5} "
+            f"{normalized:>18} "
+            f"{status_for(result):>8}"
+        )
+    scores = [result["score"] for result in results]
+    staircase = all(left > right for left, right in zip(scores, scores[1:]))
+    print()
+    print(f"Difficulty staircase: {'PASS' if staircase else 'FAIL'}")
+    return 0 if all(status_for(result) == "PASS" for result in results) else 1
+if __name__ == "__main__":
+    raise SystemExit(main())
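
The "difficulty staircase" check passes only when every task scores strictly higher than the task after it, which presumes `TASKS` is ordered from easiest to hardest (an assumption; the ordering is defined elsewhere in the repository). A minimal sketch with invented scores; `is_staircase` is a hypothetical name for the inline expression:

```python
from typing import List


def is_staircase(scores: List[float]) -> bool:
    """True when each score is strictly greater than the next one."""
    return all(left > right for left, right in zip(scores, scores[1:]))


# Invented scores, listed easy -> hard.
strictly_decreasing = is_staircase([0.9, 0.7, 0.4])
tie_between_tasks = is_staircase([0.9, 0.9, 0.4])
single_task = is_staircase([0.5])

print(strictly_decreasing, tie_between_tasks, single_task)
```

A tie counts as a failure because the comparison is strict, and a single task passes trivially since there are no adjacent pairs to compare. Note that `main` only prints the staircase result; the exit code depends on the per-task status alone.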

generate_sft_data.py ADDED

	@@ -0,0 +1,197 @@

+#!/usr/bin/env python3
+"""Generate supervised fine-tuning data directly from AdaptShield rollouts."""
+from __future__ import annotations
+import argparse
+import json
+import random
+from pathlib import Path
+from typing import Any, Dict, List
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+from train import (
+    TASKS,
+    _current_reference,
+    _teacher_payload,
+    build_messages,
+    obs_to_dict,
+    render_messages,
+    task_for_episode,
+)
+from soc_tools import attach_tool_results, investigate_local_with_depth
+def build_dataset(
+    selected_task: str,
+    curriculum: bool,
+    use_tools: bool,
+    rollout_episodes: int,
+    max_steps: int,
+    seed: int,
+    world_split: str,
+    world_family: str | None,
+) -> List[Dict[str, Any]]:
+    random.seed(seed)
+    rows: List[Dict[str, Any]] = []
+    for episode in range(1, rollout_episodes + 1):
+        task, stage = task_for_episode(
+            episode=episode,
+            total_episodes=rollout_episodes,
+            selected_task=selected_task,
+            curriculum=curriculum,
+        )
+        env = AdaptShieldEnvironment(
+            task_name=task,
+            world_split=world_split,
+            world_family=world_family,
+        )
+        obs = env.reset()
+        step_count = 0
+        while not obs.done and step_count < max_steps:
+            phase = int(getattr(obs, "phase", 1))
+            tool_results = investigate_local_with_depth(
+                env,
+                obs,
+                use_tools=use_tools,
+                thorough=(task == "polymorphic-zero-day"),
+            )
+            obs_dict = attach_tool_results(obs_to_dict(obs), tool_results)
+            messages = build_messages(obs_dict)
+            reference = _current_reference(env)
+            teacher_payload = _teacher_payload(phase, reference)
+            response_text = json.dumps(teacher_payload, separators=(",", ":"))
+            rows.append({
+                "task": task,
+                "stage": stage,
+                "episode": episode,
+                "turn": int(getattr(obs, "turn", 0) or 0),
+                "phase": phase,
+                "attack_stage": reference["stage"],
+                "world_split": getattr(env, "_world_split", world_split),
+                "world_family": getattr(env, "_world_family", world_family or ""),
+                "operational_mode": getattr(env, "_operational_mode", ""),
+                "is_benign": bool(reference["is_benign"]),
+                "expected_threat_type": reference["threat_type"],
+                "expected_target_node": reference["target_node"],
+                "expected_action": reference["expected_action"],
+                "tool_calls": len(tool_results),
+                "messages": messages,
+                "response": response_text,
+                "text": f"{render_messages(messages)}\n\nASSISTANT:\n{response_text}",
+            })
+            obs = env.step(AdaptShieldAction(**teacher_payload))
+            step_count += 1
+    return rows
+def summarize_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
+    by_task = {task: 0 for task in TASKS}
+    by_phase = {1: 0, 2: 0}
+    with_tools = 0
+    for row in rows:
+        task = str(row.get("task", ""))
+        phase = int(row.get("phase", 1) or 1)
+        if task in by_task:
+            by_task[task] += 1
+        by_phase[phase] = by_phase.get(phase, 0) + 1
+        if int(row.get("tool_calls", 0) or 0) > 0:
+            with_tools += 1
+    return {
+        "rows": len(rows),
+        "task_counts": by_task,
+        "phase_counts": by_phase,
+        "rows_with_tool_calls": with_tools,
+    }
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Generate AdaptShield SFT JSONL data")
+    parser.add_argument(
+        "--task",
+        default="all",
+        choices=["all", *TASKS],
+        help="Task to sample. Use 'all' with --curriculum for mixed data.",
+    )
+    parser.add_argument(
+        "--episodes",
+        type=int,
+        default=120,
+        help="Number of rollout episodes to sample.",
+    )
+    parser.add_argument(
+        "--max-steps",
+        type=int,
+        default=20,
+        help="Maximum env steps per episode.",
+    )
+    parser.add_argument(
+        "--seed",
+        type=int,
+        default=42,
+        help="Dataset generation seed.",
+    )
+    parser.add_argument(
+        "--curriculum",
+        action="store_true",
+        help="Use easy->medium->hard sampling schedule.",
+    )
+    parser.add_argument(
+        "--use-tools",
+        action="store_true",
+        help="Include SOC tool evidence in prompts where applicable.",
+    )
+    parser.add_argument(
+        "--output",
+        default="data/adaptshield_sft.jsonl",
+        help="Where to write the JSONL dataset.",
+    )
+    parser.add_argument(
+        "--world-split",
+        default="train",
+        choices=["train", "eval"],
+        help="World-family split used to generate the dataset.",
+    )
+    parser.add_argument(
+        "--world-family",
+        default=None,
+        help="Optional fixed world family override (e.g. train-a, eval-x).",
+    )
+    args = parser.parse_args()
+    rows = build_dataset(
+        selected_task=args.task,
+        curriculum=args.curriculum,
+        use_tools=args.use_tools,
+        rollout_episodes=args.episodes,
+        max_steps=args.max_steps,
+        seed=args.seed,
+        world_split=args.world_split,
+        world_family=args.world_family,
+    )
+    output_path = Path(args.output)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    with output_path.open("w", encoding="utf-8") as handle:
+        for row in rows:
+            handle.write(json.dumps(row, ensure_ascii=True) + "\n")
+    summary = summarize_rows(rows)
+    summary_path = output_path.with_suffix(".summary.json")
+    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
+    print(f"Wrote {len(rows)} rows to {output_path}")
+    print(f"Summary saved to {summary_path}")
+    print(json.dumps(summary, indent=2))
+if __name__ == "__main__":
+    main()
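
The summary written next to the JSONL file counts rows per task, per phase, and with at least one tool call. The sketch below runs the same counting logic on three invented rows. The two-entry `TASKS` list is an assumed subset; the real list is imported from `train.py`.

```python
import json
from typing import Any, Dict, List

# Assumed subset of task names for illustration only.
TASKS = ["direct-triage", "polymorphic-zero-day"]


def summarize_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    by_task = {task: 0 for task in TASKS}
    by_phase = {1: 0, 2: 0}
    with_tools = 0
    for row in rows:
        task = str(row.get("task", ""))
        phase = int(row.get("phase", 1) or 1)
        if task in by_task:
            by_task[task] += 1
        by_phase[phase] = by_phase.get(phase, 0) + 1
        if int(row.get("tool_calls", 0) or 0) > 0:
            with_tools += 1
    return {
        "rows": len(rows),
        "task_counts": by_task,
        "phase_counts": by_phase,
        "rows_with_tool_calls": with_tools,
    }


# Invented rows carrying only the fields the summary reads.
rows = [
    {"task": "direct-triage", "phase": 1, "tool_calls": 0},
    {"task": "direct-triage", "phase": 2, "tool_calls": 0},
    {"task": "polymorphic-zero-day", "phase": 1, "tool_calls": 3},
]
summary = summarize_rows(rows)

# JSON object keys are always strings, so phase keys change type on disk.
reloaded = json.loads(json.dumps(summary))
print(summary["phase_counts"], reloaded["phase_counts"])
```

Anything that reads the `.summary.json` file back should index `phase_counts` with `"1"` and `"2"` rather than the integers used in memory.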

inference.py ADDED

	@@ -0,0 +1,351 @@

+"""
+AdaptShield Inference Script
+Single task per run. Emits mandatory [START]/[STEP]/[END] stdout format.
+All credentials read from environment — never hardcoded.
+Required env vars (injected by evaluator):
+    API_KEY:          Evaluator's LiteLLM proxy key (checked first)
+    API_BASE_URL:     LLM endpoint
+    MODEL_NAME:       Model identifier
+Optional env vars:
+    HF_TOKEN:         Fallback if API_KEY not set
+    ADAPTSHIELD_TASK: Task name (default: direct-triage)
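+    ENV_BASE_URL:     Environment server URL (default: http://localhost:7860)
+    ADAPTSHIELD_USE_TOOLS: auto (default; tools only for polymorphic-zero-day), 1/true/yes/on, or 0/false/no/off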
+"""
+import json
+import os
+import sys
+import textwrap
+from typing import Any, Dict, List, Optional
+import urllib.request
+import urllib.error
+from openai import OpenAI
+from client import AdaptshieldEnv
+from models import AdaptShieldAction
+from soc_tools import attach_tool_results, investigate_http, summarize_tool_results
+# ── Configuration — read from env, NEVER hardcode ──────────────────────────
+API_KEY      = os.environ.get("API_KEY") or os.environ.get("HF_TOKEN", "")
+API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME   = os.environ.get("MODEL_NAME",   "Qwen/Qwen2.5-72B-Instruct")
+TASK_NAME    = os.environ.get("ADAPTSHIELD_TASK", "direct-triage")
+ENV_BASE_URL = os.environ.get("ENV_BASE_URL",  "http://localhost:7860").rstrip("/")
+BENCHMARK    = "adaptshield"
+MAX_STEPS    = 25
+SUCCESS_THRESHOLD = 0.50
+USE_TOOLS_SETTING = os.environ.get("ADAPTSHIELD_USE_TOOLS", "auto").lower()
+# ── Mandatory stdout format ────────────────────────────────────────────────
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float,
+             done: bool, error: Optional[str]) -> None:
+    ev = error if error else "null"
+    print(
+        f"[STEP] step={step} action={action} "
+        f"reward={reward:.2f} done={str(done).lower()} error={ev}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float,
+            rewards: List[float]) -> None:
+    rs = ",".join(f"{r:.2f}" for r in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} "
+        f"score={score:.3f} rewards={rs}",
+        flush=True,
+    )
+# ── Environment calls ──────────────────────────────────────────────────────
+def env_post(path: str, data: Dict) -> Dict:
+    url  = f"{ENV_BASE_URL}{path}"
+    body = json.dumps(data).encode()
+    req  = urllib.request.Request(
+        url, data=body, headers={"Content-Type": "application/json"}
+    )
+    with urllib.request.urlopen(req, timeout=60) as r:
+        return json.loads(r.read())
+def obs_to_dict(obs: Any) -> Dict[str, Any]:
+    """Convert Pydantic observations from the persistent client to JSON dicts."""
+    if hasattr(obs, "model_dump"):
+        return obs.model_dump(mode="json")
+    return dict(obs)
+def build_env_action(parsed: Dict[str, Any], phase: int) -> AdaptShieldAction:
+    """Validate model output and fall back to a phase-correct safe action."""
+    try:
+        return AdaptShieldAction(**parsed)
+    except Exception:
+        if phase == 1:
+            return AdaptShieldAction(
+                threat_type="brute_force",
+                confidence=0.5,
+                target_node="auth_service",
+                recommended_action="monitor",
+                reasoning="validated fallback",
+            )
+        return AdaptShieldAction(
+            action="monitor",
+            target_node="auth_service",
+            reasoning="validated fallback",
+        )
+# ── Score computation, clamped to [0.01, 0.99] ────────────────────────────
+def safe_score(rewards: List[float], meta: Dict) -> float:
+    if "normalized_score" in meta:
+        raw = float(meta["normalized_score"])
+    elif rewards:
+        pos  = sum(r for r in rewards if r > 0.50)
+        maxp = len(rewards) * 0.99
+        raw  = pos / maxp if maxp > 0 else 0.50
+    else:
+        raw = 0.50
+    return max(0.01, min(0.99, raw))
+# ── System prompts ─────────────────────────────────────────────────────────
+PHASE1_SYS = textwrap.dedent("""
+    You are a Threat Analyst for a 4-node enterprise network.
+    Analyze the SIEM metrics and alerts. Identify the threat type.
+    Attack strategies: brute_force, lateral_movement, exfiltration, supply_chain, benign
+    If SOC tool evidence is provided, use it to update your belief before classifying.
+    Respond ONLY with valid JSON:
+    {"threat_type":"...","confidence":0.0,"target_node":"...","recommended_action":"...","reasoning":"..."}
+    Nodes: auth_service, payment_service, database, api_gateway
+    Actions: rate_limit, isolate, honeypot, patch, monitor
+""").strip()
+PHASE2_SYS = textwrap.dedent("""
+    You are a Tactical Executor. Act on the threat assessment provided.
+    You cannot see raw network data. Use the analyst assessment plus any SOC tool trace.
+    rate_limit=throttle traffic, isolate=take offline, honeypot=redirect attacker,
+    patch=fix vulnerability, monitor=observe only
+    Respond ONLY with valid JSON:
+    {"action":"...","target_node":"...","reasoning":"..."}
+    Nodes: auth_service, payment_service, database, api_gateway
+""").strip()
+def get_action(client: OpenAI, obs: Dict) -> Dict[str, Any]:
+    """Call LLM for current phase. Falls back gracefully on parse error."""
+    phase = obs.get("phase", 1)
+    if phase == 1:
+        sys_msg = PHASE1_SYS
+        user_msg = "\n".join([
+            "Network nodes:",
+            json.dumps(obs.get("network_nodes", {}), indent=2),
+            "\nAlerts:",
+            "\n".join(obs.get("active_alerts", [])),
+            "\nSOC tool evidence:",
+            summarize_tool_results(obs.get("tool_results", [])),
+            "\nHistory:",
+            json.dumps(obs.get("history", []), indent=2),
+            "\nClassify the threat:",
+        ])
+        fallback = {
+            "threat_type": "brute_force", "confidence": 0.5,
+            "target_node": "auth_service", "recommended_action": "monitor",
+            "reasoning": "fallback",
+        }
+    else:
+        sys_msg = PHASE2_SYS
+        metadata = obs.get("metadata", {}) if isinstance(obs.get("metadata", {}), dict) else {}
+        current_turn = int(obs.get("turn", 0) or 0)
+        tool_trace = [
+            row for row in metadata.get("tool_trace", [])
+            if int(row.get("turn", -1)) == current_turn
+        ]
+        user_msg = "\n".join([
+            "Threat assessment from analyst:",
+            json.dumps(obs.get("phase1_assessment", {}), indent=2),
+            "\nSOC tool trace for this turn:",
+            json.dumps(tool_trace, indent=2),
+            "\nChoose your defensive action:",
+        ])
+        fallback = {
+            "action": "monitor",
+            "target_node": "auth_service",
+            "reasoning": "fallback",
+        }
+    try:
+        resp = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=[
+                {"role": "system",  "content": sys_msg},
+                {"role": "user",    "content": user_msg},
+            ],
+            temperature=0.1,
+            max_tokens=300,
+            stream=False,
+        )
+        text = (resp.choices[0].message.content or "").strip()
+        # Strip markdown fences
+        if "```" in text:
+            for part in text.split("```"):
+                if "{" in part:
+                    # removeprefix drops only a leading "json" tag; lstrip("json") strips a character set.
+                    text = part.strip().removeprefix("json").strip()
+                    break
+        return json.loads(text)
+    except Exception as exc:
+        print(f"[DEBUG] phase={phase} parse error: {exc}", flush=True)
+        return fallback
+def should_use_tools(task_name: str) -> bool:
+    if USE_TOOLS_SETTING in ("1", "true", "yes", "on"):
+        return True
+    if USE_TOOLS_SETTING in ("0", "false", "no", "off"):
+        return False
+    return task_name == "polymorphic-zero-day"
+def run_soc_episode(client: OpenAI, use_tools: bool) -> tuple[List[float], int, Dict[str, Any]]:
+    rewards: List[float] = []
+    steps_taken = 0
+    reset = env_post("/soc/reset", {"task": TASK_NAME})
+    session_id = str(reset.get("session_id", ""))
+    obs = dict(reset.get("observation", {}))
+    done = bool(obs.get("done", False))
+    for step in range(1, MAX_STEPS + 1):
+        if done:
+            break
+        tool_results = investigate_http(
+            env_base_url=ENV_BASE_URL,
+            session_id=session_id,
+            obs=obs,
+            use_tools=use_tools,
+            thorough=True,
+        )
+        obs_for_model = attach_tool_results(obs, tool_results)
+        parsed = get_action(client, obs_for_model)
+        action_str = json.dumps(parsed, separators=(",", ":"))
+        if len(action_str) > 100:
+            action_str = action_str[:97] + "..."
+        try:
+            action = build_env_action(parsed, phase=int(obs.get("phase", 1)))
+            action_payload = action.model_dump(
+                mode="json",
+                exclude_none=True,
+                exclude_defaults=True,
+            )
+            result = env_post("/soc/step", {"session_id": session_id, "action": action_payload})
+            obs = dict(result.get("observation", {}))
+            reward = float(result.get("reward", obs.get("reward", 0.0)))
+            done = bool(result.get("done", obs.get("done", False)))
+            error = None
+        except Exception as exc:
+            reward = 0.0
+            done = True
+            error = str(exc)[:80]
+        rewards.append(reward)
+        steps_taken = step
+        log_step(step=step, action=action_str, reward=reward, done=done, error=error)
+        if done:
+            break
+    return rewards, steps_taken, obs
+def run_openenv_episode(client: OpenAI) -> tuple[List[float], int, Dict[str, Any]]:
+    rewards: List[float] = []
+    steps_taken = 0
+    obs: Dict[str, Any] = {}
+    env = AdaptshieldEnv(base_url=ENV_BASE_URL).sync()
+    with env:
+        result = env.reset(task_name=TASK_NAME)
+        obs = obs_to_dict(result.observation)
+        done = bool(result.done or obs.get("done", False))
+        for step in range(1, MAX_STEPS + 1):
+            if done:
+                break
+            parsed = get_action(client, obs)
+            action_str = json.dumps(parsed, separators=(",", ":"))
+            if len(action_str) > 100:
+                action_str = action_str[:97] + "..."
+            try:
+                action = build_env_action(parsed, phase=int(obs.get("phase", 1)))
+                sr = env.step(action)
+                obs = obs_to_dict(sr.observation)
+                reward = float(sr.reward if sr.reward is not None else obs.get("reward", 0.0))
+                done = bool(sr.done or obs.get("done", False))
+                error = None
+            except Exception as exc:
+                reward = 0.0
+                done = True
+                error = str(exc)[:80]
+            rewards.append(reward)
+            steps_taken = step
+            log_step(step=step, action=action_str, reward=reward,
+                     done=done, error=error)
+            if done:
+                break
+    return rewards, steps_taken, obs
+def main() -> None:
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    rewards:     List[float] = []
+    steps_taken: int         = 0
+    score:       float       = 0.50
+    success:     bool        = False
+    obs:         Dict        = {}
+    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        if should_use_tools(TASK_NAME):
+            rewards, steps_taken, obs = run_soc_episode(client, use_tools=True)
+        else:
+            rewards, steps_taken, obs = run_openenv_episode(client)
+        score   = safe_score(rewards, obs.get("metadata", {}))
+        success = score >= SUCCESS_THRESHOLD
+    except Exception as exc:
+        print(f"[DEBUG] episode error: {exc}", flush=True)
+        score = 0.10
+    log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+if __name__ == "__main__":
+    main()
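
`get_action` has to cope with model replies that wrap the JSON object in a Markdown code fence. The sketch below isolates that step as a standalone function. `extract_json` is a hypothetical name, the replies are invented, and `str.removeprefix` (Python 3.9+) is used here to drop a leading `json` language tag. The fence marker is built with `"`" * 3` only so the example can itself be shown inside a fenced block.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def extract_json(text: str, fallback: Dict[str, Any]) -> Dict[str, Any]:
    """Parse a JSON object from a model reply, tolerating Markdown fences."""
    text = text.strip()
    if FENCE in text:
        # Take the first fenced segment that looks like it holds an object.
        for part in text.split(FENCE):
            if "{" in part:
                text = part.strip().removeprefix("json").strip()
                break
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return fallback
    return parsed if isinstance(parsed, dict) else fallback


fallback = {"action": "monitor", "target_node": "auth_service", "reasoning": "fallback"}

fenced_reply = FENCE + 'json\n{"action": "isolate", "target_node": "database"}\n' + FENCE
plain_reply = '{"action": "patch", "target_node": "api_gateway"}'
broken_reply = "I think we should isolate the database."

from_fenced = extract_json(fenced_reply, fallback)
from_plain = extract_json(plain_reply, fallback)
from_broken = extract_json(broken_reply, fallback)
print(from_fenced, from_plain, from_broken)
```

Returning the fallback for non-object JSON (for example a bare list) is a choice made in this sketch; the original function returns whatever `json.loads` produces and relies on `build_env_action` to reject invalid payloads.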

launch_hf_grpo_job.py ADDED

	@@ -0,0 +1,403 @@

+#!/usr/bin/env python3
+"""Launch AdaptShield GRPO refinement on Hugging Face Jobs."""
+from __future__ import annotations
+import argparse
+import shlex
+import subprocess
+import time
+from pathlib import Path
+from huggingface_hub import HfApi, get_token, run_job
+from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError
+from train import MODEL_CHOICES, TASKS
+REPO_ROOT = Path(__file__).resolve().parent
+DEFAULT_IMAGE = "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel"
+def _should_retry_hf(exc: Exception) -> bool:
+    response = getattr(exc, "response", None)
+    status_code = getattr(response, "status_code", None)
+    return status_code == 429 or (isinstance(status_code, int) and 500 <= status_code < 600)
+def _retry_hf_call(fn, *args, retries: int = 4, delay_s: float = 2.0, **kwargs):
+    last_exc = None
+    for attempt in range(retries):
+        try:
+            return fn(*args, **kwargs)
+        except Exception as exc:
+            last_exc = exc
+            if not _should_retry_hf(exc) or attempt == retries - 1:
+                raise
+            sleep_for = delay_s * (2 ** attempt)
+            print(f"Retrying HF API call after transient error ({exc}); sleeping {sleep_for:.1f}s")
+            time.sleep(sleep_for)
+    raise last_exc  # pragma: no cover
+def infer_repo_url() -> str:
+    result = subprocess.run(
+        ["git", "config", "--get", "remote.origin.url"],
+        cwd=REPO_ROOT,
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    repo_url = result.stdout.strip()
+    if not repo_url:
+        raise RuntimeError("Could not infer git remote.origin.url")
+    return repo_url
+def repo_namespace(repo_id: str) -> str:
+    if "/" not in repo_id:
+        raise RuntimeError(f"Invalid repo id: {repo_id}. Expected namespace/name.")
+    return repo_id.split("/", 1)[0]
+def authenticated_username(api: HfApi) -> str | None:
+    try:
+        info = api.whoami(cache=True)
+    except Exception:
+        return None
+    if isinstance(info, dict):
+        for key in ("name", "fullname", "user"):
+            value = info.get(key)
+            if isinstance(value, str) and value:
+                return value
+    return None
+def validate_repo_access(
+    api: HfApi,
+    repo_id: str,
+    repo_type: str,
+    skip_create: bool,
+    allow_cross_namespace: bool,
+) -> None:
+    owner = repo_namespace(repo_id)
+    username = authenticated_username(api)
+    if username and owner != username:
+        message = (
+            f"Authenticated HF account appears to be '{username}', but target repo is under '{owner}'. "
+            "Use a repo under the same namespace or pass --allow-cross-namespace only if you are certain "
+            "this token has write access there."
+        )
+        if not allow_cross_namespace:
+            raise RuntimeError(message)
+        print(f"Warning: {message}")
+    if skip_create or repo_type == "model":
+        try:
+            _retry_hf_call(api.repo_info, repo_id=repo_id, repo_type=repo_type)
+        except RepositoryNotFoundError as exc:
+            raise RuntimeError(
+                f"Repo '{repo_id}' ({repo_type}) was not found or is not accessible with the current token."
+            ) from exc
+        except HfHubHTTPError as exc:
+            raise RuntimeError(f"Could not verify repo '{repo_id}' ({repo_type}): {exc}") from exc
+def validate_source_artifacts(
+    api: HfApi,
+    repo_id: str,
+    repo_type: str,
+    subdir: str,
+) -> None:
+    try:
+        files = set(_retry_hf_call(api.list_repo_files, repo_id=repo_id, repo_type=repo_type))
+    except Exception as exc:
+        raise RuntimeError(f"Could not list files for source repo '{repo_id}' ({repo_type}): {exc}") from exc
+    required = {
+        f"{subdir}/final/adapter_config.json",
+        f"{subdir}/sft_metrics.json",
+    }
+    missing = sorted(path for path in required if path not in files)
+    if missing:
+        raise RuntimeError(
+            "Source repo is missing required SFT artifacts: " + ", ".join(missing)
+        )
+def build_command(args: argparse.Namespace, repo_url: str, output_subdir: str) -> str:
+    output_path = f"/workspace/adaptshield/checkpoints/{output_subdir}"
+    return f"""
+set -euo pipefail
+export TRANSFORMERS_NO_ADVISORY_WARNINGS=1
+export PYTHONWARNINGS="ignore::FutureWarning"
+export HF_HUB_ENABLE_HF_TRANSFER=1
+export PIP_DISABLE_PIP_VERSION_CHECK=1
+python - <<'PY'
+import torch
+print(f"baseline torch={{torch.__version__}}, cuda={{torch.version.cuda}}")
+PY
+apt-get update -qq
+apt-get install -y -qq git
+if [ ! -d /workspace/adaptshield/.git ]; then
+  rm -rf /workspace/adaptshield
+  git clone --depth 1 {shlex.quote(repo_url)} /workspace/adaptshield
+fi
+cd /workspace/adaptshield
+python -m pip install --upgrade pip wheel setuptools
+# ninja+packaging let any source-built dep that DOES sneak in compile cleanly.
+pip install --upgrade ninja packaging
+pip install -e .
+pip uninstall -y torchaudio || true
+# Unsloth ships CUDA/torch-pinned extras (cu124 + torch 2.6.0 + xformers+triton wheels).
+# We deliberately use `cu124-torch260` (NOT the `ampere` variant) because:
+#   * cu124-torch260 pins torch 2.6 + xformers + triton via prebuilt wheels (no source builds).
+#   * cu124-ampere-torch260 ALSO tries to install flash-attn; if its prebuilt wheel URL doesn't
+#     match the image's python/cxx11abi exactly, pip falls through to source-building flash-attn
+#     (10-30 min, often fails with "ModuleNotFoundError: No module named 'torch'" because PEP 517
+#     build isolation hides torch). Unsloth's xformers/triton attention is plenty fast on L4.
+# `unsloth[cu124-torch260]` transitively installs `unsloth[huggingface]` which pins ALL of
+# transformers / trl / peft / accelerate / datasets / bitsandbytes / tokenizers / safetensors
+# to versions Unsloth has tested together. Do NOT add a `--no-deps` override on top of this —
+# previous attempts to do so downgraded peft/trl below what Unsloth requires.
+# --no-build-isolation lets any incidental source build (e.g. a stray dep) see system torch.
+pip install --upgrade --no-build-isolation "unsloth[cu124-torch260]"
+# Pin transformers to a single known-good version. Why this is necessary:
+#   Unsloth's pyproject allows transformers >=4.51.3 ... <=5.5.0. Pip prefers the latest, so it
+#   picks 5.5.0 by default. But transformers 4.x requires huggingface-hub<1.0 while 5.x requires
+#   hub>=1.5,<2.0 — and unsloth's pyproject does NOT bound hub. So a separate `pip install hub<1.0`
+#   silently breaks transformers 5.x (and a separate `pip install hub>=1.5` silently breaks 4.x).
+#   The only robust fix is to pin transformers and let pip select the matching hub in the SAME
+#   resolution step. We pick 4.57.6 because:
+#     * latest 4.x release on PyPI (so qwen3, etc. are supported);
+#     * not on Unsloth's blocklist (4.57.0/.4/.5 are; 4.57.6 is fine);
+#     * pulls huggingface-hub<1.0 automatically (no separate hub pin needed).
+pip install "transformers==4.57.6"
+# torchao comes preinstalled in the base image at a version that requires torch 2.7+
+# (it calls torch.utils._pytree.register_constant which doesn't exist in torch 2.6, so
+# `import torchao` crashes with AttributeError). transformers' quantizer registry imports
+# torchao unconditionally if it's installed (`is_torchao_available()` only checks package
+# metadata, not import-ability). With torchao GONE, that check returns False and transformers
+# skips torchao cleanly. We don't use torchao quantization anyway — we use bitsandbytes 4-bit.
+pip uninstall -y torchao || true
+# Optional helpers we use directly (matplotlib for plots, hf_transfer for fast download/upload).
+pip install --upgrade matplotlib hf_transfer
+# Hard guard: if torch was upgraded, bitsandbytes will fail at import; fail FAST with a clear log.
+python - <<'PY'
+import sys, torch
+if not torch.__version__.startswith("2.6."):
+    print(f"FATAL: torch was upgraded to {{torch.__version__}}; aborting before training.")
+    sys.exit(2)
+print(f"torch ok: {{torch.__version__}} cuda={{torch.version.cuda}}")
+PY
+# Smoke-test the actual modules we use. unsloth MUST import before transformers/trl
+# per its own warning. Importing transformers also triggers its OWN runtime version check on
+# huggingface_hub and tokenizers, AND eagerly imports any installed quantizer backend
+# (torchao, bnb, etc.) — so if anything is mis-pinned this line raises a clear error before
+# training starts.
+python - <<'PY'
+import sys, importlib.util
+# Pre-flight: torchao must be GONE (preinstalled version requires torch>=2.7 and crashes
+# `import torchao` on torch 2.6). If it leaked back in, fail with a precise message.
+if importlib.util.find_spec("torchao") is not None:
+    print("FATAL: torchao is installed; on torch 2.6 it crashes transformers at import. "
+          "Run `pip uninstall -y torchao` and rebuild.")
+    sys.exit(2)
+import unsloth  # noqa: F401  (must be first)
+import torch, transformers, trl, peft, datasets, bitsandbytes, huggingface_hub
+print(
+    f"unsloth={{unsloth.__version__}} transformers={{transformers.__version__}} "
+    f"trl={{trl.__version__}} peft={{peft.__version__}} bnb={{bitsandbytes.__version__}} "
+    f"hub={{huggingface_hub.__version__}} datasets={{datasets.__version__}}"
+)
+expected_transformers = "4.57.6"
+if transformers.__version__ != expected_transformers:
+    print(
+        f"FATAL: transformers={{transformers.__version__}} but pinned to {{expected_transformers}}. "
+        f"Pip resolution drifted; aborting before training."
+    )
+    sys.exit(2)
+import train, build_benchmark_table  # noqa: F401
+print("Dependency smoke check passed.")
+PY
+python - <<'PY'
+from huggingface_hub import snapshot_download
+from pathlib import Path
+repo_id = {args.source_repo!r}
+repo_type = {args.source_repo_type!r}
+subdir = {args.source_subdir!r}
+local_dir = snapshot_download(repo_id=repo_id, repo_type=repo_type)
+adapter_path = Path(local_dir) / subdir / "final"
+sft_metrics_path = Path(local_dir) / subdir / "sft_metrics.json"
+if not adapter_path.exists():
+    raise RuntimeError(f"SFT adapter path not found: {{adapter_path}}")
+if not sft_metrics_path.exists():
+    raise RuntimeError(f"SFT metrics path not found: {{sft_metrics_path}}")
+print(adapter_path)
+Path("/workspace/adaptshield/.grpo_adapter_path.txt").write_text(str(adapter_path), encoding="utf-8")
+Path("/workspace/adaptshield/.grpo_sft_metrics_path.txt").write_text(str(sft_metrics_path), encoding="utf-8")
+PY
+ADAPTER_PATH=$(cat /workspace/adaptshield/.grpo_adapter_path.txt)
+SFT_METRICS_PATH=$(cat /workspace/adaptshield/.grpo_sft_metrics_path.txt)
+python train.py \\
+  --trainer grpo \\
+  --task {args.task} \\
+  --curriculum \\
+  --use-tools \\
+  --model {args.model} \\
+  --model-path "$ADAPTER_PATH" \\
+  --lr {args.lr} \\
+  --prompt-bank-episodes {args.prompt_bank_episodes} \\
+  --max-steps {args.max_steps} \\
+  --prompt-bank-hard-multiplier {args.prompt_bank_hard_multiplier} \\
+  --prompt-bank-borderline-bonus {args.prompt_bank_borderline_bonus} \\
+  --grpo-epochs {args.grpo_epochs} \\
+  --num-generations {args.num_generations} \\
+  --per-device-batch-size {args.per_device_batch_size} \\
+  --gradient-accumulation-steps {args.gradient_accumulation_steps} \\
+  --save-every {args.save_every} \\
+  --eval-episodes {args.eval_episodes} \\
+  --train-world-split train \\
+  --heldout-world-split eval \\
+  --heldout-seed {args.heldout_seed} \\
+  --output {shlex.quote(output_path)} \\
+  --plot
+if ! python build_benchmark_table.py \\
+  --sft-metrics "$SFT_METRICS_PATH" \\
+  --grpo-metrics {shlex.quote(output_path)}/metrics.json \\
+  --output {shlex.quote(output_path)}/benchmark_table.md; then
+  echo "Benchmark table generation failed; continuing with core artifacts."
+fi
+python - <<'PY'
+import os
+import time
+from huggingface_hub import HfApi
+api = HfApi(token=os.environ["HF_TOKEN"])
+repo_id = os.environ["RUNS_REPO"]
+repo_type = os.environ["RUNS_REPO_TYPE"]
+output_dir = {output_path!r}
+subdir = {output_subdir!r}
+last_exc = None
+for attempt in range(4):
+    try:
+        api.upload_folder(
+            repo_id=repo_id,
+            repo_type=repo_type,
+            folder_path=output_dir,
+            path_in_repo=subdir,
+        )
+        last_exc = None
+        break
+    except Exception as exc:
+        last_exc = exc
+        response = getattr(exc, "response", None)
+        status_code = getattr(response, "status_code", None)
+        if status_code == 429 or (isinstance(status_code, int) and 500 <= status_code < 600):
+            sleep_for = 2 ** attempt
+            print(f"Transient upload error: {{exc}}; retrying in {{sleep_for}}s")
+            time.sleep(sleep_for)
+            continue
+        raise
+if last_exc is not None:
+    raise last_exc
+print("Uploaded artifacts to", repo_id)
+PY
+"""
+def default_output_subdir(task: str, model: str) -> str:
+    model_slug = model.replace(".", "_")
+    if task == "all":
+        return f"grpo_worldsplit_{model_slug}"
+    task_slug = task.replace("-", "_")
+    return f"grpo_{task_slug}_{model_slug}"
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Launch AdaptShield GRPO refinement on Hugging Face Jobs")
+    parser.add_argument("--runs-repo", required=True)
+    parser.add_argument("--runs-repo-type", default="model", choices=["dataset", "model"])
+    parser.add_argument("--skip-create", action="store_true")
+    parser.add_argument("--allow-cross-namespace", action="store_true")
+    parser.add_argument("--repo-url", default=None)
+    parser.add_argument("--source-repo", required=True, help="Repo containing SFT artifacts.")
+    parser.add_argument("--source-repo-type", default="model", choices=["dataset", "model"])
+    parser.add_argument("--source-subdir", default="sft_worldsplit_1_5b", help="Subdirectory containing the SFT output.")
+    parser.add_argument("--task", default="all", choices=TASKS + ["all"])
+    parser.add_argument("--model", default="1.5b", choices=list(MODEL_CHOICES))
+    parser.add_argument("--flavor", default="l4x1")
+    parser.add_argument("--timeout", default="6h")
+    parser.add_argument("--lr", type=float, default=1e-5)
+    parser.add_argument("--prompt-bank-episodes", type=int, default=120)
+    parser.add_argument("--max-steps", type=int, default=20)
+    parser.add_argument("--prompt-bank-hard-multiplier", type=int, default=3)
+    parser.add_argument("--prompt-bank-borderline-bonus", type=int, default=2)
+    parser.add_argument("--grpo-epochs", type=int, default=1)
+    parser.add_argument("--num-generations", type=int, default=2)
+    parser.add_argument("--per-device-batch-size", type=int, default=1)
+    parser.add_argument("--gradient-accumulation-steps", type=int, default=4)
+    parser.add_argument("--save-every", type=int, default=0)
+    parser.add_argument("--eval-episodes", type=int, default=2)
+    parser.add_argument("--heldout-seed", type=int, default=314)
+    parser.add_argument("--output-subdir", default="")
+    args = parser.parse_args()
+    if not args.output_subdir:
+        args.output_subdir = default_output_subdir(args.task, args.model)
+    token = get_token()
+    if not token:
+        raise RuntimeError("No Hugging Face token found. Run `hf auth login` first.")
+    repo_url = args.repo_url or infer_repo_url()
+    api = HfApi(token=token)
+    validate_repo_access(api, args.runs_repo, args.runs_repo_type, args.skip_create, args.allow_cross_namespace)
+    validate_repo_access(api, args.source_repo, args.source_repo_type, True, args.allow_cross_namespace)
+    validate_source_artifacts(api, args.source_repo, args.source_repo_type, args.source_subdir)
+    if not args.skip_create:
+        _retry_hf_call(api.create_repo, repo_id=args.runs_repo, repo_type=args.runs_repo_type, private=True, exist_ok=True)
+    command = build_command(args=args, repo_url=repo_url, output_subdir=args.output_subdir)
+    job = _retry_hf_call(
+        run_job,
+        image=DEFAULT_IMAGE,
+        command=["bash", "-lc", command],
+        flavor=args.flavor,
+        timeout=args.timeout,
+        namespace=repo_namespace(args.runs_repo),
+        env={
+            "RUNS_REPO": args.runs_repo,
+            "RUNS_REPO_TYPE": args.runs_repo_type,
+        },
+        secrets={"HF_TOKEN": token},
+    )
+    print("Job launched successfully.")
+    print(f"Job ID: {job.id}")
+    print(f"Job URL: {job.url}")
+    print(f"Artifacts repo: {args.runs_repo}")
+    print(f"Artifacts path: {args.output_subdir}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
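
Reviewer note on `build_command` in launch_hf_grpo_job.py: the template mixes two kinds of braces, which is easy to misread. Single braces such as `{args.source_repo!r}` are filled in by the launcher on the local machine, while doubled braces such as `{{torch.__version__}}` survive as single braces and are evaluated later inside the job. The sketch below illustrates this under stated assumptions: `render_snippet`, the URL, and the repo id are hypothetical and are not part of the commit.

```python
import shlex


def render_snippet(repo_url: str, source_repo: str) -> str:
    """Render a tiny job script the same way build_command does (illustrative only)."""
    return f"""
git clone --depth 1 {shlex.quote(repo_url)} /workspace/demo
python - <<'PY'
import torch
repo_id = {source_repo!r}
print(f"torch={{torch.__version__}} repo={{repo_id}}")
PY
"""


# Hypothetical inputs; the space in the URL forces shlex.quote to add quotes.
script = render_snippet("https://example.com/my repo.git", "user/sft-runs")
print(script)
```

In the rendered text, the clone URL is shell-quoted, `repo_id` is a Python string literal produced by `!r`, and the inner f-string still contains `{torch.__version__}` for the remote interpreter to resolve. The quoted heredoc delimiter (`<<'PY'`) keeps bash from expanding anything inside the embedded Python.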

launch_hf_sft_job.py ADDED

	@@ -0,0 +1,363 @@

+#!/usr/bin/env python3
+"""Launch an AdaptShield SFT training run on Hugging Face Jobs."""
+from __future__ import annotations
+import argparse
+import shlex
+import subprocess
+import time
+from pathlib import Path
+from huggingface_hub import HfApi, get_token, run_job
+from huggingface_hub.errors import HfHubHTTPError, RepositoryNotFoundError
+from train import MODEL_CHOICES
+REPO_ROOT = Path(__file__).resolve().parent
+DEFAULT_IMAGE = "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel"
+def _should_retry_hf(exc: Exception) -> bool:
+    response = getattr(exc, "response", None)
+    status_code = getattr(response, "status_code", None)
+    return status_code == 429 or (isinstance(status_code, int) and 500 <= status_code < 600)
+def _retry_hf_call(fn, *args, retries: int = 4, delay_s: float = 2.0, **kwargs):
+    last_exc = None
+    for attempt in range(retries):
+        try:
+            return fn(*args, **kwargs)
+        except Exception as exc:
+            last_exc = exc
+            if not _should_retry_hf(exc) or attempt == retries - 1:
+                raise
+            sleep_for = delay_s * (2 ** attempt)
+            print(f"Retrying HF API call after transient error ({exc}); sleeping {sleep_for:.1f}s")
+            time.sleep(sleep_for)
+    raise last_exc  # pragma: no cover
+def repo_namespace(repo_id: str) -> str:
+    if "/" not in repo_id:
+        raise RuntimeError(f"Invalid repo id: {repo_id}. Expected namespace/name.")
+    return repo_id.split("/", 1)[0]
+def authenticated_username(api: HfApi) -> str | None:
+    try:
+        info = api.whoami(cache=True)
+    except Exception:
+        return None
+    if isinstance(info, dict):
+        for key in ("name", "fullname", "user"):
+            value = info.get(key)
+            if isinstance(value, str) and value:
+                return value
+    return None
+def validate_artifact_repo(
+    api: HfApi,
+    repo_id: str,
+    repo_type: str,
+    skip_create: bool,
+    allow_cross_namespace: bool,
+) -> None:
+    owner = repo_namespace(repo_id)
+    username = authenticated_username(api)
+    if username and owner != username:
+        message = (
+            f"Authenticated HF account appears to be '{username}', but artifacts repo is under '{owner}'. "
+            "Use a repo under the same namespace or pass --allow-cross-namespace only if you are certain "
+            "this token has write access there."
+        )
+        if not allow_cross_namespace:
+            raise RuntimeError(message)
+        print(f"Warning: {message}")
+    if skip_create:
+        try:
+            _retry_hf_call(api.repo_info, repo_id=repo_id, repo_type=repo_type)
+        except RepositoryNotFoundError as exc:
+            raise RuntimeError(
+                f"Artifacts repo '{repo_id}' ({repo_type}) was not found or is not accessible "
+                "with the current token. Create it manually under the correct namespace or use "
+                "a repo you definitely own before launching the job."
+            ) from exc
+        except HfHubHTTPError as exc:
+            raise RuntimeError(
+                f"Could not verify artifacts repo '{repo_id}' ({repo_type}) before launch: {exc}"
+            ) from exc
+def infer_repo_url() -> str:
+    result = subprocess.run(
+        ["git", "config", "--get", "remote.origin.url"],
+        cwd=REPO_ROOT,
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    repo_url = result.stdout.strip()
+    if not repo_url:
+        raise RuntimeError("Could not infer git remote.origin.url")
+    return repo_url
+def build_command(args: argparse.Namespace, repo_url: str, output_subdir: str) -> str:
+    dataset_path = "/workspace/adaptshield/data/adaptshield_sft_worldsplit.jsonl"
+    output_path = f"/workspace/adaptshield/checkpoints/{output_subdir}"
+    summary_path = "/workspace/adaptshield/data/adaptshield_sft_worldsplit.summary.json"
+    extra_train_flags = "--skip-reward-curve" if args.skip_reward_curve else ""
+    return f"""
+set -euo pipefail
+export TRANSFORMERS_NO_ADVISORY_WARNINGS=1
+export PYTHONWARNINGS="ignore::FutureWarning"
+export HF_HUB_ENABLE_HF_TRANSFER=1
+export PIP_DISABLE_PIP_VERSION_CHECK=1
+python - <<'PY'
+import torch
+print(f"baseline torch={{torch.__version__}}, cuda={{torch.version.cuda}}")
+PY
+apt-get update -qq
+apt-get install -y -qq git
+if [ ! -d /workspace/adaptshield/.git ]; then
+  rm -rf /workspace/adaptshield
+  git clone --depth 1 {shlex.quote(repo_url)} /workspace/adaptshield
+fi
+cd /workspace/adaptshield
+python -m pip install --upgrade pip wheel setuptools
+# ninja+packaging let any source-built dep that DOES sneak in compile cleanly.
+pip install --upgrade ninja packaging
+pip install -e .
+pip uninstall -y torchaudio || true
+# Unsloth ships CUDA/torch-pinned extras (cu124 + torch 2.6.0 + xformers+triton wheels).
+# We deliberately use `cu124-torch260` (NOT the `ampere` variant) because:
+#   * cu124-torch260 pins torch 2.6 + xformers + triton via prebuilt wheels (no source builds).
+#   * cu124-ampere-torch260 ALSO tries to install flash-attn; if its prebuilt wheel URL doesn't
+#     match the image's python/cxx11abi exactly, pip falls through to source-building flash-attn
+#     (10-30 min, often fails with "ModuleNotFoundError: No module named 'torch'" because PEP 517
+#     build isolation hides torch). Unsloth's xformers/triton attention is plenty fast on L4.
+# `unsloth[cu124-torch260]` transitively installs `unsloth[huggingface]` which pins ALL of
+# transformers / trl / peft / accelerate / datasets / bitsandbytes / tokenizers / safetensors
+# to versions Unsloth has tested together. Do NOT add a `--no-deps` override on top of this —
+# previous attempts to do so downgraded peft/trl below what Unsloth requires.
+# --no-build-isolation lets any incidental source build (e.g. a stray dep) see system torch.
+pip install --upgrade --no-build-isolation "unsloth[cu124-torch260]"
+# Pin transformers to a single known-good version. Why this is necessary:
+#   Unsloth's pyproject allows transformers >=4.51.3 ... <=5.5.0. Pip prefers the latest, so it
+#   picks 5.5.0 by default. But transformers 4.x requires huggingface-hub<1.0 while 5.x requires
+#   hub>=1.5,<2.0 — and unsloth's pyproject does NOT bound hub. So a separate `pip install hub<1.0`
+#   silently breaks transformers 5.x (and a separate `pip install hub>=1.5` silently breaks 4.x).
+#   The only robust fix is to pin transformers and let pip select the matching hub in the SAME
+#   resolution step. We pick 4.57.6 because:
+#     * latest 4.x release on PyPI (so qwen3, etc. are supported);
+#     * not on Unsloth's blocklist (4.57.0/.4/.5 are; 4.57.6 is fine);
+#     * pulls huggingface-hub<1.0 automatically (no separate hub pin needed).
+pip install "transformers==4.57.6"
+# torchao comes preinstalled in the base image at a version that requires torch 2.7+
+# (it calls torch.utils._pytree.register_constant which doesn't exist in torch 2.6, so
+# `import torchao` crashes with AttributeError). transformers' quantizer registry imports
+# torchao unconditionally if it's installed (`is_torchao_available()` only checks package
+# metadata, not import-ability). With torchao GONE, that check returns False and transformers
+# skips torchao cleanly. We don't use torchao quantization anyway — we use bitsandbytes 4-bit.
+pip uninstall -y torchao || true
+# Optional helpers we use directly (matplotlib for plots, hf_transfer for fast download/upload).
+pip install --upgrade matplotlib hf_transfer
+# Hard guard: if torch was upgraded, bitsandbytes will fail at import; fail FAST with a clear log.
+python - <<'PY'
+import sys, torch
+if not torch.__version__.startswith("2.6."):
+    print(f"FATAL: torch was upgraded to {{torch.__version__}}; aborting before training.")
+    sys.exit(2)
+print(f"torch ok: {{torch.__version__}} cuda={{torch.version.cuda}}")
+PY
+# Smoke-test the actual modules we use. unsloth MUST import before transformers/trl
+# per its own warning. Importing transformers also triggers its OWN runtime version check on
+# huggingface_hub and tokenizers, AND eagerly imports any installed quantizer backend
+# (torchao, bnb, etc.) — so if anything is mis-pinned this line raises a clear error before
+# training starts.
+python - <<'PY'
+import sys, importlib.util
+# Pre-flight: torchao must be GONE (preinstalled version requires torch>=2.7 and crashes
+# `import torchao` on torch 2.6). If it leaked back in, fail with a precise message.
+if importlib.util.find_spec("torchao") is not None:
+    print("FATAL: torchao is installed; on torch 2.6 it crashes transformers at import. "
+          "Run `pip uninstall -y torchao` and rebuild.")
+    sys.exit(2)
+import unsloth  # noqa: F401  (must be first)
+import torch, transformers, trl, peft, datasets, bitsandbytes, huggingface_hub
+print(
+    f"unsloth={{unsloth.__version__}} transformers={{transformers.__version__}} "
+    f"trl={{trl.__version__}} peft={{peft.__version__}} bnb={{bitsandbytes.__version__}} "
+    f"hub={{huggingface_hub.__version__}} datasets={{datasets.__version__}}"
+)
+expected_transformers = "4.57.6"
+if transformers.__version__ != expected_transformers:
+    print(
+        f"FATAL: transformers={{transformers.__version__}} but pinned to {{expected_transformers}}. "
+        f"Pip resolution drifted; aborting before training."
+    )
+    sys.exit(2)
+import train, train_sft, generate_sft_data  # noqa: F401
+print("Dependency smoke check passed.")
+PY
+python generate_sft_data.py \\
+  --task all \\
+  --curriculum \\
+  --use-tools \\
+  --episodes {args.dataset_episodes} \\
+  --max-steps {args.max_steps} \\
+  --seed {args.seed} \\
+  --world-split train \\
+  --output {dataset_path}
+python train_sft.py \\
+  --dataset {dataset_path} \\
+  --model {args.model} \\
+  --epochs {args.epochs} \\
+  --lr {args.lr} \\
+  --per-device-batch-size {args.per_device_batch_size} \\
+  --gradient-accumulation-steps {args.gradient_accumulation_steps} \\
+  --save-steps {args.save_steps} \\
+  --heldout-seed {args.heldout_seed} \\
+  --train-world-split train \\
+  --heldout-world-split eval \\
+  --eval-task all \\
+  --eval-episodes {args.eval_episodes} \\
+  --use-tools \\
+  --output {shlex.quote(output_path)} \\
+  {extra_train_flags}
+python - <<'PY'
+import os
+import time
+from huggingface_hub import HfApi
+api = HfApi(token=os.environ["HF_TOKEN"])
+repo_id = os.environ["RUNS_REPO"]
+repo_type = os.environ["RUNS_REPO_TYPE"]
+output_dir = {output_path!r}
+summary_path = {summary_path!r}
+subdir = {output_subdir!r}
+last_exc = None
+for attempt in range(4):
+    try:
+        api.upload_folder(
+            repo_id=repo_id,
+            repo_type=repo_type,
+            folder_path=output_dir,
+            path_in_repo=subdir,
+        )
+        api.upload_file(
+            repo_id=repo_id,
+            repo_type=repo_type,
+            path_or_fileobj=summary_path,
+            path_in_repo=f"{{subdir}}/adaptshield_sft_worldsplit.summary.json",
+        )
+        last_exc = None
+        break
+    except Exception as exc:
+        last_exc = exc
+        response = getattr(exc, "response", None)
+        status_code = getattr(response, "status_code", None)
+        if status_code == 429 or (isinstance(status_code, int) and 500 <= status_code < 600):
+            sleep_for = 2 ** attempt
+            print(f"Transient upload error: {{exc}}; retrying in {{sleep_for}}s")
+            time.sleep(sleep_for)
+            continue
+        raise
+if last_exc is not None:
+    raise last_exc
+print("Uploaded artifacts to", repo_id)
+PY
+"""
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Launch AdaptShield SFT training on Hugging Face Jobs")
+    parser.add_argument("--runs-repo", required=True, help="Artifact repo to upload outputs to, e.g. username/adaptshield-runs")
+    parser.add_argument("--runs-repo-type", default="dataset", choices=["dataset", "model"], help="Repo type used to store training artifacts.")
+    parser.add_argument("--skip-create", action="store_true", help="Skip repo creation and assume the artifacts repo already exists.")
+    parser.add_argument("--allow-cross-namespace", action="store_true", help="Allow uploads to a repo owned by a different namespace than the authenticated account.")
+    parser.add_argument("--repo-url", default=None, help="Git repo URL to clone inside the HF Job. Defaults to remote.origin.url")
+    parser.add_argument("--model", default="1.5b", choices=list(MODEL_CHOICES))
+    parser.add_argument("--flavor", default="l4x1", help="HF Jobs hardware flavor, e.g. l4x1, a10g-small, a100-large")
+    parser.add_argument("--timeout", default="6h", help="HF Jobs timeout, e.g. 6h")
+    parser.add_argument("--dataset-episodes", type=int, default=240)
+    parser.add_argument("--max-steps", type=int, default=20)
+    parser.add_argument("--epochs", type=float, default=1.0)
+    parser.add_argument("--lr", type=float, default=2e-4)
+    parser.add_argument("--per-device-batch-size", type=int, default=2)
+    parser.add_argument("--gradient-accumulation-steps", type=int, default=4)
+    parser.add_argument("--save-steps", type=int, default=40)
+    parser.add_argument("--eval-episodes", type=int, default=2)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--heldout-seed", type=int, default=314)
+    parser.add_argument(
+        "--skip-reward-curve",
+        action="store_true",
+        help="Skip the per-checkpoint held-out reward sweep inside train_sft.py.",
+    )
+    parser.add_argument("--output-subdir", default=None, help="Optional output folder name in the runs repo")
+    args = parser.parse_args()
+    token = get_token()
+    if not token:
+        raise RuntimeError("No Hugging Face token found. Run `hf auth login` first.")
+    repo_url = args.repo_url or infer_repo_url()
+    output_subdir = args.output_subdir or f"sft_worldsplit_{args.model.replace('.', '_')}"
+    api = HfApi(token=token)
+    validate_artifact_repo(
+        api,
+        args.runs_repo,
+        args.runs_repo_type,
+        args.skip_create,
+        args.allow_cross_namespace,
+    )
+    if not args.skip_create:
+        _retry_hf_call(api.create_repo, repo_id=args.runs_repo, repo_type=args.runs_repo_type, private=True, exist_ok=True)
+    command = build_command(args=args, repo_url=repo_url, output_subdir=output_subdir)
+    job = _retry_hf_call(
+        run_job,
+        image=DEFAULT_IMAGE,
+        command=["bash", "-lc", command],
+        flavor=args.flavor,
+        timeout=args.timeout,
+        namespace=repo_namespace(args.runs_repo),
+        env={
+            "RUNS_REPO": args.runs_repo,
+            "RUNS_REPO_TYPE": args.runs_repo_type,
+        },
+        secrets={"HF_TOKEN": token},
+    )
+    print("Job launched successfully.")
+    print(f"Job ID: {job.id}")
+    print(f"Job URL: {job.url}")
+    print(f"Artifacts repo: {args.runs_repo}")
+    print(f"Artifacts path: {output_subdir}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
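
Reviewer note on `_retry_hf_call` in launch_hf_sft_job.py: the helper retries only on HTTP 429 and 5xx responses and doubles the wait after each failure. The sketch below restates that behaviour in a self-contained form so the backoff schedule can be checked without touching the network. `FakeHTTPError`, `retry_call`, and the injectable `sleep` argument are assumptions made for illustration and do not appear in the commit.

```python
import time
from types import SimpleNamespace


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a .response.status_code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.response = SimpleNamespace(status_code=status_code)


def is_transient(exc: Exception) -> bool:
    # Same rule as _should_retry_hf: rate limiting (429) or any 5xx.
    status = getattr(getattr(exc, "response", None), "status_code", None)
    return status == 429 or (isinstance(status, int) and 500 <= status < 600)


def retry_call(fn, retries: int = 4, delay_s: float = 2.0, sleep=time.sleep):
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            # Give up on permanent errors or once the last attempt has failed.
            if not is_transient(exc) or attempt == retries - 1:
                raise
            sleep(delay_s * (2 ** attempt))


calls = {"n": 0}


def flaky():
    # Fails twice with a 503, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeHTTPError(503)
    return "ok"


waits = []
result = retry_call(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Passing `sleep=waits.append` records the delays rather than waiting, which makes the doubling schedule visible. A permanent error such as a 404 is re-raised on the first attempt with no delay.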

models.py ADDED

	@@ -0,0 +1,243 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# AdaptShield — Pydantic Data Models
+#
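+# CRITICAL DESIGN DECISION: Phase1Action and Phase2Action are SEPARATE classes.
+# A naive combined class with optional fields causes 500 errors when the
+# evaluator sends a Phase 2 payload and Pydantic tries to validate Phase 1 fields.
+# The unified AdaptShieldAction below exists only for the HTTP transport layer
+# and validates that each payload is exactly one phase, never a mix of both.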
+from enum import Enum
+from typing import Any, Dict, List, Optional
+from openenv.core.env_server.types import Action, Observation
+from pydantic import Field, model_validator
+class DefenseAction(str, Enum):
+    """
+    Strict action space for the Tactical Executor (Phase 2).
+    Using Enum prevents LLM hallucination from reaching the grader.
+    """
+    RATE_LIMIT = "rate_limit"  # Light — throttles traffic, keeps service online
+    ISOLATE    = "isolate"     # Heavy — takes node offline, stops spread
+    HONEYPOT   = "honeypot"    # Strategic — redirects attacker to decoy
+    PATCH      = "patch"       # Targeted — fixes supply chain vulnerability
+    MONITOR    = "monitor"     # Passive — gather info, risk escalation
+class ThreatType(str, Enum):
+    """Known attack strategies the Threat Analyst can classify."""
+    BRUTE_FORCE      = "brute_force"
+    LATERAL_MOVEMENT = "lateral_movement"
+    EXFILTRATION     = "exfiltration"
+    SUPPLY_CHAIN     = "supply_chain"
+    BENIGN           = "benign"
+class Phase1Action(Action):
+    """
+    Threat Analyst output — pure reasoning, no defensive action.
+    The agent reads raw network state and produces a structured
+    threat assessment. This is graded independently for classification
+    accuracy before Phase 2 acts on it.
+    """
+    threat_type:        str             = Field(
+        ...,
+        description="Identified attack strategy: brute_force, lateral_movement, "
+                    "exfiltration, supply_chain, or benign",
+    )
+    confidence:         float           = Field(
+        ...,
+        ge=0.0,
+        le=1.0,
+        description="Confidence in the threat classification (0.0 to 1.0)",
+    )
+    target_node:        str             = Field(
+        ...,
+        description="Primary affected node: auth_service, payment_service, "
+                    "database, or api_gateway",
+    )
+    recommended_action: DefenseAction   = Field(
+        ...,
+        description="Recommended defense action for Phase 2 to execute",
+    )
+    reasoning:          Optional[str]   = Field(
+        default=None,
+        description="Chain of thought. Not graded. Helps training stability.",
+    )
+class Phase2Action(Action):
+    """
+    Tactical Executor output — defensive action based ONLY on Phase 1 assessment.
+    Phase 2 agent is deliberately blind to raw network state.
+    It receives only the Phase 1 threat assessment and must act on it.
+    """
+    action:      DefenseAction  = Field(
+        ...,
+        description="Defense action to execute",
+    )
+    target_node: str            = Field(
+        ...,
+        description="Node to apply action to: auth_service, payment_service, "
+                    "database, or api_gateway",
+    )
+    reasoning:   Optional[str]  = Field(
+        default=None,
+        description="Chain of thought. Not graded.",
+    )
+class AdaptShieldAction(Action):
+    """
+    Unified action model accepted by the OpenEnv HTTP server.
+    The environment alternates between two phases, so the transport layer must
+    accept either a Threat Analyst payload or a Tactical Executor payload.
+    Validation keeps those shapes distinct while still fitting the single
+    action model expected by `create_app`.
+    """
+    threat_type: Optional[str] = Field(
+        default=None,
+        description="Phase 1 only: identified attack strategy",
+    )
+    confidence: Optional[float] = Field(
+        default=None,
+        ge=0.0,
+        le=1.0,
+        description="Phase 1 only: confidence in the threat classification",
+    )
+    target_node: Optional[str] = Field(
+        default=None,
+        description="Target node for either phase",
+    )
+    recommended_action: Optional[DefenseAction] = Field(
+        default=None,
+        description="Phase 1 only: recommended follow-up action",
+    )
+    action: Optional[DefenseAction] = Field(
+        default=None,
+        description="Phase 2 only: defensive action to execute",
+    )
+    reasoning: Optional[str] = Field(
+        default=None,
+        description="Optional one-sentence rationale",
+    )
+    @model_validator(mode="after")
+    def validate_phase_shape(self) -> "AdaptShieldAction":
+        phase1_present = any(
+            value is not None
+            for value in (self.threat_type, self.confidence, self.recommended_action)
+        )
+        phase2_present = self.action is not None
+        if phase1_present and phase2_present:
+            raise ValueError(
+                "Action payload must be either Phase 1 or Phase 2, not both."
+            )
+        if not phase1_present and not phase2_present:
+            raise ValueError(
+                "Action payload must contain Phase 1 fields or a Phase 2 action."
+            )
+        if phase1_present:
+            missing = [
+                field_name
+                for field_name, value in (
+                    ("threat_type", self.threat_type),
+                    ("confidence", self.confidence),
+                    ("target_node", self.target_node),
+                    ("recommended_action", self.recommended_action),
+                )
+                if value is None
+            ]
+        else:
+            missing = [
+                field_name
+                for field_name, value in (
+                    ("action", self.action),
+                    ("target_node", self.target_node),
+                )
+                if value is None
+            ]
+        if missing:
+            raise ValueError(
+                f"Missing required fields for this phase: {', '.join(missing)}"
+            )
+        return self
+class AdaptShieldObservation(Observation):
+    """
+    Observation returned after each step.
+    Phase 1 observation: contains full network state (network_nodes, active_alerts).
+    Phase 2 observation: network_nodes and active_alerts are EMPTY.
+                         phase1_assessment contains the Phase 1 output.
+    Episode number is NEVER included — agent must rely on signals only.
+    """
+    # Identity
+    scenario_id:    str             = Field(default="")
+    task_name:      str             = Field(default="")
+    phase:          int             = Field(default=1,
+        description="1 = Threat Analyst turn, 2 = Tactical Executor turn")
+    turn:           int             = Field(default=0)
+    max_turns:      int             = Field(default=5)
+    # Network state — populated in Phase 1, EMPTY in Phase 2
+    network_nodes:  Dict[str, Any]  = Field(default_factory=dict)
+    active_alerts:  List[str]       = Field(default_factory=list)
+    attack_stage:   str             = Field(
+        default="none",
+        description="Current attack progression stage: recon, exploit, exfiltration, none",
+    )
+    # Rolling history of last 3 turns
+    history:        List[Dict[str, str]] = Field(default_factory=list)
+    # Phase 2 only — Phase 1 output passed to executor
+    phase1_assessment: Optional[Dict[str, Any]] = Field(
+        default=None,
+        description="Populated only in Phase 2. Phase 2 agent sees ONLY this.",
+    )
+    # Context
+    system_context:    str          = Field(default="")
+    available_actions: List[str]    = Field(default_factory=list)
+    # Feedback
+    last_action_result: Optional[str] = Field(default=None)
+    reward:             float          = Field(default=0.0)
+    done:               bool           = Field(default=False)
+    metadata:           Dict[str, Any] = Field(default_factory=dict)
+    def model_dump(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
+        """
+        Keep metadata in OpenEnv HTTP observation payloads.
+        OpenEnv's serializer excludes metadata from the nested observation by
+        default. AdaptShield exposes normalized_score there, so we remove only
+        that exclusion while preserving the serializer's reward/done handling.
+        """
+        exclude = kwargs.get("exclude")
+        if isinstance(exclude, set) and "metadata" in exclude:
+            kwargs["exclude"] = set(exclude) - {"metadata"}
+        elif isinstance(exclude, dict) and "metadata" in exclude:
+            kwargs["exclude"] = {
+                key: value for key, value in exclude.items() if key != "metadata"
+            }
+        return super().model_dump(*args, **kwargs)
+# Backward-compatible aliases for earlier package names.
+AdaptshieldAction = AdaptShieldAction
+AdaptshieldObservation = AdaptShieldObservation
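
The either/or rule enforced by `validate_phase_shape` can be tried out without Pydantic. Below is a minimal, dependency-free sketch of the same presence checks applied to a plain dict. The helper name `check_phase_shape`, the module-level tuples, and the string return value are illustrative assumptions and are not part of models.py. The `ge`/`le` range check on `confidence` is not reproduced here.

```python
from typing import Any, Dict

# Field groups mirrored from AdaptShieldAction (names here are hypothetical).
PHASE1_MARKERS = ("threat_type", "confidence", "recommended_action")
PHASE1_REQUIRED = ("threat_type", "confidence", "target_node", "recommended_action")
PHASE2_REQUIRED = ("action", "target_node")


def check_phase_shape(payload: Dict[str, Any]) -> str:
    """Return "phase1" or "phase2" for a valid payload, else raise ValueError."""
    phase1_present = any(payload.get(name) is not None for name in PHASE1_MARKERS)
    phase2_present = payload.get("action") is not None

    # A payload may describe exactly one phase.
    if phase1_present and phase2_present:
        raise ValueError("Action payload must be either Phase 1 or Phase 2, not both.")
    if not phase1_present and not phase2_present:
        raise ValueError("Action payload must contain Phase 1 fields or a Phase 2 action.")

    # Once the phase is known, every field required by that phase must be set.
    required = PHASE1_REQUIRED if phase1_present else PHASE2_REQUIRED
    missing = [name for name in required if payload.get(name) is None]
    if missing:
        raise ValueError(f"Missing required fields for this phase: {', '.join(missing)}")
    return "phase1" if phase1_present else "phase2"
```

Note that `target_node` alone does not select a phase. It is shared by both shapes, so it only counts toward the "missing fields" check after the phase has been determined from the other fields.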

openenv.yaml ADDED

	@@ -0,0 +1,26 @@

+spec_version: 1
+name: adaptshield
+type: space
+runtime: fastapi
+app: server.app:app
+port: 7860
+description: >
+  AdaptShield is a two-phase agentic cybersecurity environment that trains
+  LLMs to adapt to polymorphic adversarial strategies. An agent acts as
+  Threat Analyst (Phase 1) then Tactical Executor (Phase 2), defending a
+  simulated 4-node enterprise network against a scripted attacker that progresses
+  through attack stages and shifts strategy mid-episode. Grading is fully
+  deterministic via Python strategy matching. No LLM-as-judge components.
+tasks:
+  - name: direct-triage
+    difficulty: easy
+    description: Single fixed strategy. Agent learns baseline threat response.
+    max_steps: 5
+  - name: dual-pivot
+    difficulty: medium
+    description: Two strategies alternating every 20 episodes. Detect and adapt.
+    max_steps: 6
+  - name: polymorphic-zero-day
+    difficulty: hard
+    description: All four strategies with mid-episode shift and false-positive noise.
+    max_steps: 8
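
For quick reference in scripts, the three tasks declared above can be mirrored as a Python mapping. This is only an illustrative sketch: the `TASKS` constant and the `tasks_up_to` helper are hypothetical names, and the values are copied by hand from the manifest rather than parsed from it.

```python
from typing import Dict, List

# Hand-copied from the tasks section of openenv.yaml (not parsed from the file).
TASKS: Dict[str, Dict[str, object]] = {
    "direct-triage": {"difficulty": "easy", "max_steps": 5},
    "dual-pivot": {"difficulty": "medium", "max_steps": 6},
    "polymorphic-zero-day": {"difficulty": "hard", "max_steps": 8},
}
DIFFICULTY_ORDER = ["easy", "medium", "hard"]


def tasks_up_to(difficulty: str) -> List[str]:
    """Hypothetical helper: task names at or below the given difficulty."""
    limit = DIFFICULTY_ORDER.index(difficulty)
    return [
        name
        for name, spec in TASKS.items()
        if DIFFICULTY_ORDER.index(str(spec["difficulty"])) <= limit
    ]
```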

plot_sft_checkpoint_curve.py ADDED

	@@ -0,0 +1,93 @@

+#!/usr/bin/env python3
+"""Plot an SFT checkpoint curve with an optional honest baseline start point."""
+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+from typing import Any
+def load_json(path: Path) -> dict[str, Any]:
+    return json.loads(path.read_text(encoding="utf-8"))
+def mean_baseline(benchmark: dict[str, Any], key: str) -> float:
+    values = benchmark.get(key, {})
+    numeric = [float(value) for value in values.values() if value is not None]
+    if not numeric:
+        raise ValueError(f"No numeric values found under benchmark key '{key}'")
+    return sum(numeric) / len(numeric)
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Plot SFT checkpoint learning curve with optional baseline point.")
+    parser.add_argument("--metrics", required=True, help="Path to sft_metrics.json")
+    parser.add_argument("--output", required=True, help="Where to write the PNG")
+    parser.add_argument(
+        "--baseline-json",
+        default="",
+        help="Optional benchmark_table.json path used to prepend a real baseline point.",
+    )
+    parser.add_argument(
+        "--baseline-key",
+        default="tool_baseline",
+        choices=["tool_baseline", "no_tool_baseline"],
+        help="Which benchmark JSON field to average for the prepended baseline point.",
+    )
+    parser.add_argument(
+        "--baseline-label",
+        default="baseline",
+        help="X-axis label for the prepended baseline point.",
+    )
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    metrics = load_json(Path(args.metrics))
+    rows = metrics.get("reward_curve_rows", []) or []
+    if not rows:
+        raise SystemExit("No reward_curve_rows found in the provided SFT metrics file.")
+    labels = [str(row["checkpoint"]) for row in rows]
+    train_scores = [float(row["in_distribution_score"]) for row in rows]
+    heldout_scores = [float(row["heldout_score"]) for row in rows]
+    if args.baseline_json:
+        benchmark = load_json(Path(args.baseline_json))
+        baseline_value = mean_baseline(benchmark, args.baseline_key)
+        labels = [args.baseline_label] + labels
+        train_scores = [baseline_value] + train_scores
+        heldout_scores = [baseline_value] + heldout_scores
+    try:
+        import matplotlib
+        matplotlib.use("Agg")
+        import matplotlib.pyplot as plt
+    except ImportError as exc:
+        raise SystemExit(f"matplotlib is required to plot this curve: {exc}") from exc
+    output_path = Path(args.output)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    plt.figure(figsize=(11, 5))
+    plt.plot(labels, train_scores, marker="o", linewidth=2, color="#174c7a", label="train family")
+    plt.plot(labels, heldout_scores, marker="s", linewidth=2, color="#6d4acb", label="held-out family")
+    plt.title("AdaptShield SFT Checkpoint Learning Curve")
+    plt.xlabel("Checkpoint")
+    plt.ylabel("normalized_score")
+    plt.ylim(0.0, 1.0)
+    plt.grid(alpha=0.25)
+    plt.legend()
+    plt.xticks(rotation=30, ha="right")
+    plt.tight_layout()
+    plt.savefig(output_path, dpi=160)
+    print(output_path)
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
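
The baseline point prepended by `--baseline-json` is the mean of the non-null values under the chosen key. The sketch below repeats `mean_baseline` from the script and applies it to a made-up benchmark dict. The task keys and scores in `example_benchmark` are invented for illustration and are not results from this repository.

```python
from typing import Any


def mean_baseline(benchmark: dict[str, Any], key: str) -> float:
    """Average the non-null numeric values stored under benchmark[key]."""
    values = benchmark.get(key, {})
    numeric = [float(value) for value in values.values() if value is not None]
    if not numeric:
        raise ValueError(f"No numeric values found under benchmark key '{key}'")
    return sum(numeric) / len(numeric)


# Hypothetical input: the real benchmark_table.json layout may differ.
example_benchmark = {
    "tool_baseline": {"task_a": 0.2, "task_b": None, "task_c": 0.4},
    "no_tool_baseline": {},
}

# None entries are skipped, so only 0.2 and 0.4 are averaged.
baseline_value = mean_baseline(example_benchmark, "tool_baseline")
```

Because the same `baseline_value` is prepended to both the train-family and held-out series, the two curves always start from a shared point when a baseline file is supplied.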

plot_training.py ADDED

	@@ -0,0 +1,106 @@

+#!/usr/bin/env python3
+"""Plot AdaptShield training CSV or metrics JSON."""
+from __future__ import annotations
+import argparse
+import csv
+import json
+from pathlib import Path
+from typing import List, Tuple
+def load_scores(path: Path) -> Tuple[List[int], List[float], str, List[str]]:
+    if path.suffix == ".json":
+        data = json.loads(path.read_text(encoding="utf-8"))
+        rows = data.get("rows", []) or data.get("evaluation_rows", [])
+        episodes = [int(row["episode"]) for row in rows]
+        scores = [float(row["score"]) for row in rows]
+        stages = [str(row.get("stage", row.get("task", ""))) for row in rows]
+        return episodes, scores, str(data.get("model", "adaptshield")), stages
+    with path.open(newline="", encoding="utf-8") as handle:
+        rows = list(csv.DictReader(handle))
+    episodes = [int(row["episode"]) for row in rows]
+    scores = [float(row["score"]) for row in rows]
+    stages = [str(row.get("stage", row.get("task", ""))) for row in rows]
+    return episodes, scores, "adaptshield-smoke", stages
+def moving_average(values: List[float], window: int) -> List[float]:
+    smoothed = []
+    for index in range(len(values)):
+        start = max(0, index - window + 1)
+        chunk = values[start:index + 1]
+        smoothed.append(sum(chunk) / len(chunk))
+    return smoothed
+def plot(path: Path, output: Path) -> None:
+    episodes, scores, label, stages = load_scores(path)
+    if not scores:
+        raise SystemExit("No scores found to plot.")
+    try:
+        import matplotlib
+        matplotlib.use("Agg")
+        import matplotlib.pyplot as plt
+    except ImportError:
+        span = max(1, len(scores) // 5)
+        first = sum(scores[:span]) / span
+        last = sum(scores[-span:]) / span
+        print("matplotlib is not installed; skipping PNG generation.")
+        print(f"Episodes: {len(scores)}")
+        print(f"First-window avg: {first:.3f}")
+        print(f"Last-window avg:  {last:.3f}")
+        print(f"Delta:            {last - first:+.3f}")
+        return
+    window = max(1, min(10, len(scores) // 5))
+    smoothed = moving_average(scores, window)
+    output.parent.mkdir(parents=True, exist_ok=True)
+    fig, ax = plt.subplots(figsize=(10, 5))
+    ax.plot(episodes, scores, color="#6b8fbf", alpha=0.35, label="raw score")
+    ax.plot(episodes, smoothed, color="#123c69", linewidth=2.5, label=f"{window}-episode avg")
+    for episode, stage in stage_boundaries(episodes, stages):
+        ax.axvline(episode, color="#c44e52", linestyle="--", alpha=0.45)
+        ax.text(episode, 0.04, stage.replace("curriculum:", ""), rotation=90, fontsize=8, color="#7a1f24")
+    ax.set_title(f"AdaptShield Training Curve ({label})")
+    ax.set_xlabel("Episode")
+    ax.set_ylabel("normalized_score")
+    ax.set_ylim(0.0, 1.0)
+    ax.grid(alpha=0.25)
+    ax.legend()
+    fig.tight_layout()
+    fig.savefig(output, dpi=160)
+    print(f"Saved plot: {output}")
+def stage_boundaries(episodes: List[int], stages: List[str]) -> List[Tuple[int, str]]:
+    if not stages:
+        return []
+    boundaries = []
+    previous = stages[0]
+    for episode, stage in zip(episodes, stages):
+        if stage != previous:
+            boundaries.append((episode, stage))
+            previous = stage
+    return boundaries
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Plot AdaptShield training output.")
+    parser.add_argument("--input", default="training_runs/train_smoke.csv")
+    parser.add_argument("--output", default="training_runs/reward_curve.png")
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    plot(Path(args.input), Path(args.output))
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
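
The two helpers that shape the training plot are easy to check by hand. The sketch below repeats `moving_average` and `stage_boundaries` from the script and runs them on small invented inputs. The sample scores and stage labels are illustrative only.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing average: each point averages up to `window` values ending at it."""
    smoothed = []
    for index in range(len(values)):
        start = max(0, index - window + 1)
        chunk = values[start:index + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def stage_boundaries(episodes: List[int], stages: List[str]) -> List[Tuple[int, str]]:
    """Return (episode, new_stage) for every episode where the stage label changes."""
    if not stages:
        return []
    boundaries = []
    previous = stages[0]
    for episode, stage in zip(episodes, stages):
        if stage != previous:
            boundaries.append((episode, stage))
            previous = stage
    return boundaries


# Invented sample data for illustration.
smoothed = moving_average([0.0, 1.0, 1.0, 0.0], window=2)
# The first point has no predecessor, so it is averaged over a single value.
boundaries = stage_boundaries([1, 2, 3, 4], ["easy", "easy", "hard", "hard"])
```

The average is trailing rather than centered, so early points use a shorter window instead of being dropped. This keeps the smoothed line the same length as the raw score line.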

pyproject.toml ADDED

	@@ -0,0 +1,31 @@

+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-adaptshield"
+version = "0.1.0"
+description = "AdaptShield: Two-Phase Adaptive Cybersecurity RL Environment"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core[core]>=0.2.2",
+    "fastapi>=0.111.0",
+    "openai>=1.0.0",
+    "uvicorn>=0.24.0",
+    "pydantic>=2.0.0",
+    "python-dotenv>=1.0.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+server = "adaptshield.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["adaptshield", "adaptshield.server"]
+package-dir = { "adaptshield" = ".", "adaptshield.server" = "server" }

server/Dockerfile ADDED

	@@ -0,0 +1,47 @@

+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git curl && \
+    rm -rf /var/lib/apt/lists/*
+COPY . /app/env
+WORKDIR /app/env
+RUN if ! command -v uv >/dev/null 2>&1; then \
+    curl -LsSf https://astral.sh/uv/install.sh | sh && \
+    mv /root/.local/bin/uv /usr/local/bin/uv && \
+    mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+    uv sync --frozen --no-install-project --no-editable; \
+    else \
+    uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+    uv sync --frozen --no-editable; \
+    else \
+    uv sync --no-editable; \
+    fi
+FROM ${BASE_IMAGE}
+WORKDIR /app
+COPY --from=builder /app/env/.venv /app/.venv
+COPY --from=builder /app/env /app/env
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+EXPOSE 7860
+HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
+    CMD curl -f http://localhost:7860/health || exit 1
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 7860"]

server/__init__.py ADDED

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""AdaptShield environment server components."""
+from server.adaptshield_environment import AdaptShieldEnvironment
+__all__ = ["AdaptShieldEnvironment"]

server/adaptshield_environment.py ADDED

	@@ -0,0 +1,1324 @@

+"""
+AdaptShield Environment
+Two-phase agentic cybersecurity environment implementing full OpenEnv spec.
+Phase 1 (Threat Analyst): Agent reads raw SIEM state, outputs threat assessment.
+Phase 2 (Tactical Executor): Agent reads ONLY Phase 1 output, executes defense.
+The attacker progresses through stages (recon→exploit→exfiltration) if agent
+fails to act. On the hard task, strategy shifts mid-episode after turn 3.
+OpenEnv compliance:
+- reset() returns initial observation
+- step() returns an observation carrying reward, done, and metadata
+- state property returns current State
+- SUPPORTS_CONCURRENT_SESSIONS = True
+- normalized_score ALWAYS present in metadata
+"""
+import os
+import sys
+from enum import Enum
+from typing import Any, Dict, List, Optional
+from uuid import uuid4
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+from models import AdaptShieldAction, Phase1Action, Phase2Action, AdaptShieldObservation
+from server.attacker import AttackerEngine
+from server.grader import grade_step, normalize_episode_score, _clamp
+from server.scenarios import (
+    TASK_CONFIGS,
+    build_phase1_obs,
+    build_phase2_obs,
+    choose_operational_mode,
+    choose_world_family,
+    mission_profile_for,
+)
+DEFENSE_TTL = {
+    "rate_limit": 2,
+    "isolate":    2,
+    "honeypot":   3,
+    "patch":      4,
+}
+DEFENSE_SIDE_EFFECT = {
+    "rate_limit": "login_latency",
+    "isolate":    "service_downtime",
+    "honeypot":   "attacker_redirection",
+    "patch":      "temporary_restart",
+}
+AVAILABLE_SOC_TOOLS = [
+    {
+        "name": "log_search",
+        "endpoint": "/tools/log_search",
+        "description": "Search recent SIEM/application logs for a node and time window.",
+    },
+    {
+        "name": "cmdb_lookup",
+        "endpoint": "/tools/cmdb_lookup",
+        "description": "Inspect service ownership, criticality, dependencies, and blast radius.",
+    },
+    {
+        "name": "edr_status",
+        "endpoint": "/tools/edr_status",
+        "description": "Check endpoint containment, persistence, beaconing, and active controls.",
+    },
+    {
+        "name": "vuln_lookup",
+        "endpoint": "/tools/vuln_lookup",
+        "description": "Query internal package/advisory risk for supply-chain investigations.",
+    },
+    {
+        "name": "identity_lookup",
+        "endpoint": "/tools/identity_lookup",
+        "description": "Inspect account type, privilege level, normal host affinity, and anomalous identity use.",
+    },
+    {
+        "name": "change_calendar_lookup",
+        "endpoint": "/tools/change_calendar_lookup",
+        "description": "Check whether maintenance, deploys, or patch windows were scheduled for the target service.",
+    },
+    {
+        "name": "netflow_lookup",
+        "endpoint": "/tools/netflow_lookup",
+        "description": "Inspect east-west and outbound traffic summaries for enterprise network pivots and data movement.",
+    },
+]
+SERVICE_OWNERS = {
+    "auth_service": "identity-platform",
+    "payment_service": "checkout-platform",
+    "database": "data-platform",
+    "api_gateway": "edge-platform",
+}
+IDENTITY_CONTEXT = {
+    "auth_service": {
+        "account": "svc_auth_frontend",
+        "account_type": "service_account",
+        "privilege_level": "medium",
+        "normal_hosts": ["auth_service", "api_gateway"],
+    },
+    "payment_service": {
+        "account": "svc_checkout",
+        "account_type": "service_account",
+        "privilege_level": "high",
+        "normal_hosts": ["payment_service"],
+    },
+    "database": {
+        "account": "svc_data_sync",
+        "account_type": "service_account",
+        "privilege_level": "high",
+        "normal_hosts": ["database", "payment_service"],
+    },
+    "api_gateway": {
+        "account": "deploy_bot",
+        "account_type": "automation",
+        "privilege_level": "medium",
+        "normal_hosts": ["api_gateway"],
+    },
+}
+CHANGE_CALENDAR = {
+    "auth_service": {
+        "window": "03:00-03:20Z",
+        "change_type": "auth policy sync",
+        "expected_actor": "svc_auth_frontend",
+    },
+    "payment_service": {
+        "window": "02:30-02:45Z",
+        "change_type": "checkout rollout",
+        "expected_actor": "svc_checkout",
+    },
+    "database": {
+        "window": "04:00-04:30Z",
+        "change_type": "backup and index maintenance",
+        "expected_actor": "svc_data_sync",
+    },
+    "api_gateway": {
+        "window": "03:10-03:25Z",
+        "change_type": "gateway deploy",
+        "expected_actor": "deploy_bot",
+    },
+}
+class AdaptShieldEnvironment(Environment):
+    """
+    AdaptShield: Two-Phase Adaptive Cybersecurity RL Environment.
+    Example:
+        >>> env = AdaptShieldEnvironment(task_name="direct-triage")
+        >>> obs = env.reset()
+        >>> # Phase 1 — classify the threat
+        >>> obs2 = env.step(Phase1Action(
+        ...     threat_type="brute_force", confidence=0.9,
+        ...     target_node="auth_service", recommended_action="rate_limit"
+        ... ))
+        >>> print(obs2.phase)  # 2
+        >>> # Phase 2 — execute the defense
+        >>> obs3 = env.step(Phase2Action(
+        ...     action="rate_limit", target_node="auth_service"
+        ... ))
+        >>> print(obs3.reward)  # reward signal
+    """
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(
+        self,
+        task_name: str = "direct-triage",
+        world_split: str | None = None,
+        world_family: str | None = None,
+        operational_mode: str | None = None,
+    ):
+        if task_name not in TASK_CONFIGS:
+            task_name = "direct-triage"
+        self._task_name       = task_name
+        self._config          = TASK_CONFIGS[task_name]
+        self._world_split     = self._sanitize_world_split(world_split or os.environ.get("ADAPTSHIELD_WORLD_SPLIT", "train"))
+        self._requested_world_family = world_family or os.environ.get("ADAPTSHIELD_WORLD_FAMILY")
+        self._requested_operational_mode = operational_mode or os.environ.get("ADAPTSHIELD_OPERATIONAL_MODE")
+        self._world_family    = choose_world_family(self._world_split, self._requested_world_family)
+        self._operational_mode = choose_operational_mode(task_name, self._requested_operational_mode)
+        self._mission_profile = mission_profile_for(task_name, self._operational_mode, self._world_family)
+        self._attacker        = AttackerEngine(task_name, world_family=self._world_family)
+        self._state           = State(episode_id=str(uuid4()), step_count=0)
+        # Episode state
+        self._turn:              int                      = 0
+        self._phase:             int                      = 1
+        self._rewards:           List[float]              = []
+        self._done:              bool                     = False
+        self._last_reward:       float                    = 0.0
+        self._history:           List[Dict[str, str]]     = []
+        self._phase1_output:     Optional[Dict[str, Any]] = None
+        self._phase1_grading_output: Optional[Dict[str, Any]] = None
+        self._turn_config:       Optional[Dict[str, Any]] = None
+        self._consecutive_wrong: int                      = 0
+        self._last_obs:          Optional[AdaptShieldObservation] = None
+        self._episode_replay:    List[Dict[str, Any]]     = []
+        self._last_replay_strategy: Optional[str]          = None
+        self._active_defenses:   List[Dict[str, Any]]      = []
+        self._foothold_established: bool                   = False
+        self._tool_trace:        List[Dict[str, Any]]      = []
+        self._turn_tool_evidence: Dict[int, List[Dict[str, Any]]] = {}
+        self._turn_tool_results: Dict[int, List[Dict[str, Any]]] = {}
+    # ── OpenEnv interface ──────────────────────────────────────────────────
+    def reset(self, task_name: Optional[str] = None) -> AdaptShieldObservation:
+        """
+        Reset environment. Optionally switch task via task_name.
+        Always returns Phase 1 observation (Threat Analyst turn).
+        """
+        if task_name and task_name in TASK_CONFIGS:
+            self._task_name = task_name
+            self._config    = TASK_CONFIGS[task_name]
+        self._world_family = choose_world_family(self._world_split, self._requested_world_family)
+        self._operational_mode = choose_operational_mode(self._task_name, self._requested_operational_mode)
+        self._mission_profile = mission_profile_for(self._task_name, self._operational_mode, self._world_family)
+        self._attacker  = AttackerEngine(self._task_name, world_family=self._world_family)
+        self._state              = State(episode_id=str(uuid4()), step_count=0)
+        self._turn               = 1
+        self._phase              = 1
+        self._rewards            = []
+        self._done               = False
+        self._last_reward        = 0.0
+        self._history            = []
+        self._phase1_output      = None
+        self._phase1_grading_output = None
+        self._consecutive_wrong  = 0
+        self._episode_replay     = []
+        self._last_replay_strategy = None
+        self._active_defenses    = []
+        self._foothold_established = False
+        self._tool_trace         = []
+        self._turn_tool_evidence = {}
+        self._turn_tool_results  = {}
+        self._attacker.reset_episode()
+        self._turn_config = self._prepare_turn_config(self._attacker.build_observation())
+        obs_dict = build_phase1_obs(
+            turn_config=self._turn_config,
+            history=self._history,
+            task_name=self._task_name,
+            turn=self._turn,
+            max_turns=self._config["max_turns"],
+            episode_id=self._state.episode_id,
+            mission_profile=self._mission_profile,
+        )
+        obs = self._to_obs(obs_dict)
+        obs.metadata = self._metadata_with_defenses(obs.metadata)
+        self._last_obs = obs
+        return obs
+    def step(
+        self, action: AdaptShieldAction | Phase1Action | Phase2Action
+    ) -> AdaptShieldObservation:  # type: ignore[override]
+        """
+        Execute one step.
+        Accepts either Phase1Action or Phase2Action.
+        Phase 1 → transitions to Phase 2 (no reward yet).
+        Phase 2 → grades action, advances turn, returns to Phase 1.
+        """
+        if self._done:
+            return self._last_obs or self._error_observation(
+                "Episode already completed."
+            )
+        try:
+            self._state.step_count += 1
+            # ── Phase 1 → Phase 2 transition ──────────────────────────────
+            if self._phase == 1:
+                phase1_output = {
+                    "threat_type":        _action_value(getattr(action, "threat_type", None), "unknown"),
+                    "confidence":         _action_float(getattr(action, "confidence", None), 0.5),
+                    "target_node":        _action_value(getattr(action, "target_node", None), "unknown"),
+                    "recommended_action": _action_value(getattr(action, "recommended_action", None), "monitor"),
+                    "reasoning":          str(getattr(action, "reasoning", "") or ""),
+                }
+                self._phase1_grading_output = dict(phase1_output)
+                self._phase1_output = _degrade_handoff(
+                    phase1_output=phase1_output,
+                    turn_config=self._turn_config or {},
+                    task_name=self._task_name,
+                    turn=self._turn,
+                )
+                self._phase = 2
+                current_score = normalize_episode_score(self._rewards)
+                obs_dict = build_phase2_obs(
+                    phase1_output=self._phase1_output,
+                    history=self._history,
+                    task_name=self._task_name,
+                    turn=self._turn,
+                    max_turns=self._config["max_turns"],
+                    episode_id=self._state.episode_id,
+                    current_score=current_score,
+                    mission_profile=self._mission_profile,
+                )
+                obs = self._to_obs(obs_dict)
+                obs.reward = _clamp(self._last_reward if self._last_reward > 0 else 0.01)
+                obs.metadata = self._metadata_with_defenses({
+                    "episode_id": self._state.episode_id,
+                    "normalized_score": float(current_score),
+                    "mission_profile": self._mission_profile,
+                })
+                self._last_obs = obs
+                return obs
+            # ── Phase 2 — grade and advance turn ──────────────────────────
+            p2 = {
+                "action":      _action_value(getattr(action, "action", None), "monitor"),
+                "target_node": _action_value(getattr(action, "target_node", None), "unknown"),
+                "reasoning":   str(getattr(action, "reasoning", "") or ""),
+            }
+            current_stage = self._attacker.current_stage()
+            foothold_before = self._foothold_established
+            reward, catastrophic, info = grade_step(
+                phase1_action=self._phase1_grading_output or self._phase1_output or {},
+                phase2_action=p2,
+                turn_config=self._turn_config or {},
+                stage=current_stage,
+                consecutive_wrong=self._consecutive_wrong,
+                task_name=self._task_name,
+                foothold_established=foothold_before,
+                mission_profile=self._mission_profile,
+                tool_context=self._tool_context_for_turn(),
+            )
+            reward = _clamp(_action_float(reward, 0.01))
+            self._register_active_defense(p2)
+            foothold_transition = self._update_foothold_state(
+                p2=p2,
+                info=info,
+                stage=current_stage,
+            )
+            info["foothold_established"] = self._foothold_established
+            info["foothold_transition"] = foothold_transition
+            # Track consecutive wrong actions for stage escalation
+            if info.get("acted_correctly", False):
+                self._consecutive_wrong = 0
+            else:
+                self._consecutive_wrong += 1
+            self._rewards.append(reward)
+            self._last_reward = reward
+            # Update history
+            replay_strategy = self._attacker.current_strategy()
+            strategy_shift = (
+                self._last_replay_strategy is not None and
+                replay_strategy != self._last_replay_strategy
+            )
+            self._last_replay_strategy = replay_strategy
+            self._episode_replay.append({
+                "turn":        self._turn,
+                "p1":          (self._phase1_output or {}).get("threat_type", "unknown"),
+                "p2_action":   p2["action"],
+                "target":      p2["target_node"],
+                "result":      _replay_result(info),
+                "shift":       strategy_shift,
+                "impact":      float(info.get("business_impact", 0.0)),
+                "blast_radius": info.get("dependency_blast_radius", []),
+                "active_defenses": self._active_defense_snapshot(),
+                "foothold_established": self._foothold_established,
+                "foothold_transition": foothold_transition,
+                "mission_alignment": info.get("mission_alignment", "neutral"),
+                "tool_calls": info.get("tool_count", 0),
+                "tool_evidence_found": info.get("tool_evidence_found", False),
+            })
+            self._history.append({
+                "turn":   str(self._turn),
+                "p1":     f"classified:{(self._phase1_output or {}).get('threat_type','?')}",
+                "p2":     f"{p2['action']}→{p2['target_node']}",
+                "result": info.get("score_reason", "")[:80],
+                "reward": f"{reward:.2f}",
+            })
+            # Advance attacker
+            self._attacker.advance_turn(
+                agent_acted_correctly=info.get("acted_correctly", False)
+            )
+            self._decay_active_defenses()
+            # Advance turn
+            self._turn  += 1
+            self._phase  = 1
+            self._phase1_output = None
+            self._phase1_grading_output = None
+            episode_done = catastrophic or (self._turn > self._config["max_turns"])
+            self._done   = episode_done
+            # Compute normalized score — ALWAYS present
+            norm_score = normalize_episode_score(self._rewards)
+            if not episode_done:
+                self._turn_config = self._prepare_turn_config(self._attacker.build_observation())
+                obs_dict = build_phase1_obs(
+                    turn_config=self._turn_config,
+                    history=self._history,
+                    task_name=self._task_name,
+                    turn=self._turn,
+                    max_turns=self._config["max_turns"],
+                    episode_id=self._state.episode_id,
+                    mission_profile=self._mission_profile,
+                )
+                obs = self._to_obs(obs_dict)
+                obs.reward             = reward
+                obs.done               = False
+                obs.last_action_result = info.get("score_reason", "")
+                obs.metadata = self._metadata_with_defenses({
+                    "episode_id":       self._state.episode_id,
+                    "normalized_score": float(norm_score),
+                    "score_breakdown":  info,
+                    "turns_completed":  self._turn - 1,
+                    "consecutive_wrong": self._consecutive_wrong,
+                    "mission_profile": self._mission_profile,
+                })
+            else:
+                self._attacker.advance_episode()
+                obs_dict = build_phase1_obs(
+                    turn_config={
+                        "network_nodes": {},
+                        "active_alerts": ["[EPISODE COMPLETE]"],
+                        "attack_stage": "none",
+                        "is_benign": False,
+                        "strategy": "none",
+                        "correct_action": "none",
+                        "correct_target": "none",
+                    },
+                    history=self._history,
+                    task_name=self._task_name,
+                    turn=self._turn,
+                    max_turns=self._config["max_turns"],
+                    episode_id=self._state.episode_id,
+                    mission_profile=self._mission_profile,
+                )
+                obs = self._to_obs(obs_dict)
+                obs.reward = reward
+                obs.done   = True
+                obs.last_action_result = info.get("score_reason", "")
+                obs.metadata = self._metadata_with_defenses({
+                    "episode_id":       self._state.episode_id,
+                    "normalized_score": float(norm_score),
+                    "score_breakdown":  info,
+                    "raw_rewards":      self._rewards,
+                    "catastrophic":     catastrophic,
+                    "turns_completed":  self._turn - 1,
+                    "episode_replay":   self._episode_replay,
+                    "mission_profile": self._mission_profile,
+                })
+            self._last_obs = obs
+            return obs
+        except Exception as exc:
+            return self._error_observation(f"step_error: {exc}")
+    @property
+    def state(self) -> State:
+        """Return the current State (episode_id and step_count), per the OpenEnv spec."""
+        return self._state
+    # ── Internal ──────────────────────────────────────────────────────────
+    def _to_obs(self, d: Dict[str, Any]) -> AdaptShieldObservation:
+        return AdaptShieldObservation(
+            scenario_id        = d.get("scenario_id", ""),
+            task_name          = d.get("task_name", self._task_name),
+            phase              = d.get("phase", 1),
+            turn               = d.get("turn", 0),
+            max_turns          = d.get("max_turns", self._config["max_turns"]),
+            network_nodes      = d.get("network_nodes", {}),
+            active_alerts      = d.get("active_alerts", []),
+            attack_stage       = d.get("attack_stage", "none"),
+            history            = d.get("history", []),
+            phase1_assessment  = d.get("phase1_assessment"),
+            last_action_result = d.get("last_action_result"),
+            system_context     = d.get("system_context", ""),
+            available_actions  = d.get("available_actions", []),
+            reward             = d.get("reward", 0.0),
+            done               = d.get("done", False),
+            metadata           = d.get("metadata", {"normalized_score": 0.50}),
+        )
+    @staticmethod
+    def _sanitize_world_split(value: str) -> str:
+        return value if value in {"train", "eval"} else "train"
+    def _error_observation(self, error_message: str) -> AdaptShieldObservation:
+        """Return a safe observation instead of letting step() raise."""
+        norm_score = float(normalize_episode_score(self._rewards))
+        reward = _clamp(self._last_reward if self._last_reward > 0 else 0.01)
+        if self._phase == 2:
+            obs_dict = build_phase2_obs(
+                phase1_output=self._phase1_output or {},
+                history=self._history,
+                task_name=self._task_name,
+                turn=self._turn,
+                max_turns=self._config["max_turns"],
+                episode_id=self._state.episode_id,
+                current_score=norm_score,
+                mission_profile=self._mission_profile,
+            )
+        else:
+            turn_config = self._turn_config or {
+                "network_nodes": {},
+                "active_alerts": [f"[ERROR] {error_message}"],
+                "attack_stage": "none",
+                "is_benign": False,
+                "strategy": "unknown",
+                "correct_action": "monitor",
+                "correct_target": "unknown",
+            }
+            obs_dict = build_phase1_obs(
+                turn_config=turn_config,
+                history=self._history,
+                task_name=self._task_name,
+                turn=self._turn,
+                max_turns=self._config["max_turns"],
+                episode_id=self._state.episode_id,
+                mission_profile=self._mission_profile,
+            )
+        obs = self._to_obs(obs_dict)
+        obs.reward = float(reward)
+        obs.done = bool(self._done)
+        obs.last_action_result = error_message
+        obs.metadata = self._metadata_with_defenses({
+            "episode_id": self._state.episode_id,
+            "normalized_score": norm_score,
+            "error": error_message,
+            "turns_completed": max(0, self._turn - 1),
+            "mission_profile": self._mission_profile,
+        })
+        self._last_obs = obs
+        return obs
+    def call_tool(self, tool_name: str, **params: Any) -> Dict[str, Any]:
+        """
+        Query the local SOC tool surface.
+        These tools reveal partial evidence, not ground-truth answers. They are
+        stateful because responses depend on the current turn, attacker stage,
+        foothold state, active defenses, and previous actions.
+        """
+        try:
+            tool_name = str(tool_name or "").strip()
+            node = str(params.get("node", params.get("target_node", "unknown")) or "unknown")
+            if tool_name == "log_search":
+                result = self._tool_log_search(node=node, query=str(params.get("query", "")))
+            elif tool_name == "cmdb_lookup":
+                result = self._tool_cmdb_lookup(node=node)
+            elif tool_name == "edr_status":
+                result = self._tool_edr_status(node=node)
+            elif tool_name == "vuln_lookup":
+                result = self._tool_vuln_lookup(node=node, package=str(params.get("package", "")))
+            elif tool_name == "identity_lookup":
+                result = self._tool_identity_lookup(node=node)
+            elif tool_name == "change_calendar_lookup":
+                result = self._tool_change_calendar_lookup(node=node)
+            elif tool_name == "netflow_lookup":
+                result = self._tool_netflow_lookup(node=node)
+            else:
+                result = {
+                    "tool": tool_name,
+                    "turn": self._turn,
+                    "node": node,
+                    "error": f"Unknown tool: {tool_name}",
+                    "available_tools": [tool["name"] for tool in AVAILABLE_SOC_TOOLS],
+                }
+            self._record_tool_result(result)
+            return _public_tool_result(result)
+        except Exception as exc:
+            return {
+                "tool": str(tool_name),
+                "turn": self._turn,
+                "error": f"tool_error: {exc}",
+            }
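+    # Usage sketch (illustrative; `env` is a hypothetical instance of this
+    # environment and the argument values are made up):
+    #     env.call_tool("log_search", node="database", query="outbound transfer")
+    #     env.call_tool("edr_status", target_node="payment_service")
+    #     env.call_tool("not_a_tool", node="database")
+    # `node` is read first and `target_node` is only a fallback key. The last
+    # call does not raise: an unknown tool name produces a result carrying
+    # "error" and "available_tools", which is recorded and then passed through
+    # _public_tool_result() like any other result.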
+    def _tool_log_search(self, node: str, query: str = "") -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        hard_task = self._task_name == "polymorphic-zero-day"
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        relevant = node == target
+        events = [f"{node}: routine health heartbeat accepted"]
+        evidence_type = "benign"
+        confidence = 0.32
+        if relevant and strategy == "brute_force":
+            events = [
+                "auth_service: 143 failed logins grouped by /24 source subnet",
+                "auth_service: password spray pattern across active user accounts",
+                "payment_service: no matching checkout errors during same window",
+            ]
+            evidence_type = "brute_force"
+            confidence = 0.88
+        elif relevant and strategy == "lateral_movement":
+            if hard_task:
+                events = [
+                    "auth_service: low-volume identity failures persist from earlier abuse window",
+                    "payment_service: shared workload credential reused from a new east-west origin",
+                    "payment_service: callback mesh fans out across private ports after short-lived sessions",
+                ]
+                confidence = 0.77
+            else:
+                events = [
+                    "payment_service: service account login from previously unseen host",
+                    "auth_service -> payment_service connections exceed baseline",
+                    "payment_service: short-lived internal sessions fan out to private ports",
+                ]
+                confidence = 0.84
+            evidence_type = "lateral_movement"
+        elif relevant and strategy == "exfiltration":
+            if hard_task:
+                events = [
+                    "database: bulk row access concentrates on high-value records",
+                    "database: staging bundle assembled before outbound transfer burst",
+                    "netflow: outbound transfer volume from database exceeds 7-day baseline",
+                ]
+                confidence = 0.82
+            else:
+                events = [
+                    "database: sequential reads concentrated on sensitive tables",
+                    "database: compressed archive created before outbound transfer spike",
+                    "netflow: database egress exceeds 7-day baseline for this hour",
+                ]
+                confidence = 0.90
+            evidence_type = "exfiltration"
+        elif relevant and strategy == "supply_chain":
+            if hard_task:
+                events = [
+                    "api_gateway: release artifact requested from untrusted update channel",
+                    "api_gateway: dependency integrity drift detected in gateway manifest",
+                    "api_gateway: new outbound connection follows release request",
+                ]
+                confidence = 0.79
+            else:
+                events = [
+                    "api_gateway: unsigned package update requested by deploy_bot",
+                    "api_gateway: dependency hash differs from approved registry hash",
+                    "api_gateway: new outbound connection follows update attempt",
+                ]
+                confidence = 0.86
+            evidence_type = "supply_chain"
+        elif turn_config.get("is_benign", False):
+            events = [
+                f"{node}: scheduled maintenance event matches change calendar",
+                f"{node}: no persistence, beaconing, or sensitive data access observed",
+            ]
+            confidence = 0.65
+        return {
+            "tool": "log_search",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            "query": query,
+            "events": [self._surface_text(event) for event in events],
+            "evidence_type": evidence_type,
+            "confidence": confidence,
+            "verified": relevant and evidence_type == strategy,
+        }
+    def _tool_cmdb_lookup(self, node: str) -> Dict[str, Any]:
+        dependencies = {
+            "auth_service": ["payment_service"],
+            "payment_service": ["api_gateway"],
+            "database": ["payment_service", "api_gateway"],
+            "api_gateway": ["auth_service", "payment_service", "database"],
+        }.get(node, [])
+        criticality = {
+            "auth_service": "high",
+            "payment_service": "critical",
+            "database": "critical",
+            "api_gateway": "high",
+        }.get(node, "unknown")
+        safe_actions = {
+            "auth_service": ["rate_limit", "patch", "monitor"],
+            "payment_service": ["honeypot", "rate_limit", "monitor"],
+            "database": ["honeypot", "monitor"],
+            "api_gateway": ["patch", "rate_limit", "monitor"],
+        }.get(node, ["monitor"])
+        return {
+            "tool": "cmdb_lookup",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            "owner": SERVICE_OWNERS.get(node, "unknown"),
+            "criticality": criticality,
+            "dependencies": dependencies,
+            "mission_profile": self._mission_profile,
+            "safe_actions": safe_actions,
+            "evidence_type": "operational_context",
+            "verified": True,
+            "result_summary": self._surface_text(f"{node} criticality {criticality}; blast radius {len(dependencies)}"),
+        }
+    def _tool_edr_status(self, node: str) -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        hard_task = self._task_name == "polymorphic-zero-day"
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        relevant = node == target
+        active_controls = [
+            control for control in self._active_defenses
+            if control.get("target") == node
+        ]
+        status = {
+            "containment": "not_isolated",
+            "persistence": False,
+            "beaconing": False,
+            "active_controls": active_controls,
+        }
+        evidence_type = "benign"
+        confidence = 0.45
+        if relevant and strategy == "lateral_movement":
+            if hard_task:
+                status.update({
+                    "containment": "partial" if self._foothold_established else "unconfirmed",
+                    "persistence": self._foothold_established,
+                    "beaconing": self._foothold_established,
+                    "process_note": (
+                        "callback process remains active under reused workload credential"
+                        if self._foothold_established else
+                        "unexpected child process under shared workload credential; no confirmed beacon yet"
+                    ),
+                })
+                confidence = 0.74 if not self._foothold_established else 0.87
+            else:
+                status.update({
+                    "containment": "partial" if self._foothold_established else "not_isolated",
+                    "persistence": self._foothold_established,
+                    "beaconing": True,
+                    "process_note": "unknown child process under service account context",
+                })
+                confidence = 0.87
+            evidence_type = "lateral_movement"
+        elif relevant and strategy == "exfiltration":
+            status.update({
+                "containment": "not_isolated",
+                "persistence": self._foothold_established,
+                "beaconing": not hard_task,
+                "process_note": (
+                    "bundle staging followed by intermittent outbound worker activity"
+                    if hard_task else
+                    "archive process followed by outbound transfer"
+                ),
+            })
+            evidence_type = "exfiltration"
+            confidence = 0.73 if hard_task else 0.82
+        elif active_controls:
+            status["containment"] = "control_active"
+            confidence = 0.70
+        return {
+            "tool": "edr_status",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            **status,
+            "evidence_type": evidence_type,
+            "confidence": confidence,
+            "verified": relevant and evidence_type == strategy,
+            "process_note": self._surface_text(str(status.get("process_note", ""))) if status.get("process_note") else "",
+        }
+    def _tool_vuln_lookup(self, node: str, package: str = "") -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        hard_task = self._task_name == "polymorphic-zero-day"
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        relevant = node == target and strategy == "supply_chain"
+        if relevant:
+            advisory = {
+                "package": package or "gateway-router",
+                "advisory_id": "ADV-AS-042",
+                "risk": "critical",
+                "finding": (
+                    "artifact integrity drift from untrusted release channel"
+                    if hard_task else
+                    "registry hash mismatch with unsigned update source"
+                ),
+                "recommended_mitigation": "patch from trusted registry",
+            }
+            evidence_type = "supply_chain"
+            confidence = 0.82 if hard_task else 0.91
+        else:
+            advisory = {
+                "package": package or "unknown",
+                "advisory_id": None,
+                "risk": "none_known",
+                "finding": "no matching active internal advisory for this node/package",
+                "recommended_mitigation": "continue investigation",
+            }
+            evidence_type = "benign"
+            confidence = 0.55
+        return {
+            "tool": "vuln_lookup",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            **advisory,
+            "finding": self._surface_text(str(advisory.get("finding", ""))),
+            "recommended_mitigation": self._surface_text(str(advisory.get("recommended_mitigation", ""))),
+            "evidence_type": evidence_type,
+            "confidence": confidence,
+            "verified": relevant,
+        }
+    def _tool_identity_lookup(self, node: str) -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        identity = dict(IDENTITY_CONTEXT.get(node, {
+            "account": "unknown",
+            "account_type": "unknown",
+            "privilege_level": "unknown",
+            "normal_hosts": [],
+        }))
+        lookup = {
+            "tool": "identity_lookup",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            **identity,
+            "recent_source_host": node,
+            "source_status": "expected",
+            "identity_anomaly": False,
+            "result_summary": "",
+            "confidence": 0.58,
+            "evidence_type": "benign",
+            "verified": False,
+        }
+        if node == target and strategy == "lateral_movement":
+            lookup.update({
+                "recent_source_host": "auth_service",
+                "source_status": "unexpected",
+                "identity_anomaly": True,
+                "confidence": 0.84 if self._task_name != "polymorphic-zero-day" else 0.76,
+                "evidence_type": "lateral_movement",
+                "verified": True,
+            })
+        elif node == target and strategy == "supply_chain":
+            lookup.update({
+                "recent_source_host": "external-release-runner",
+                "source_status": "unexpected",
+                "identity_anomaly": True,
+                "confidence": 0.73,
+                "evidence_type": "supply_chain",
+                "verified": True,
+            })
+        elif turn_config.get("is_benign", False):
+            lookup.update({
+                "recent_source_host": (identity.get("normal_hosts") or [node])[0],
+                "source_status": "scheduled_change_window",
+                "confidence": 0.69,
+            })
+        if (
+            self._task_name == "dual-pivot" and
+            strategy == "lateral_movement" and
+            self._operational_mode == "evidence_preservation"
+        ):
+            lookup["source_status"] = "unexpected_but_trackable"
+            lookup["result_summary"] = self._surface_text(
+                "Identity trail is intact; preserving visibility before hard containment is mission-aligned."
+            )
+        else:
+            lookup["result_summary"] = self._surface_text(
+                f"account={lookup['account']} source={lookup['recent_source_host']} anomaly={lookup['identity_anomaly']}"
+            )
+        return lookup
+    def _tool_change_calendar_lookup(self, node: str) -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        change = dict(CHANGE_CALENDAR.get(node, {
+            "window": "none_scheduled",
+            "change_type": "none",
+            "expected_actor": "unknown",
+        }))
+        scheduled = bool(turn_config.get("is_benign", False))
+        confidence = 0.66 if scheduled else 0.74
+        if node == target and strategy == "supply_chain":
+            scheduled = False
+            confidence = 0.87 if self._task_name != "polymorphic-zero-day" else 0.78
+        elif node == target and strategy == "lateral_movement":
+            scheduled = False
+            confidence = 0.72
+        change_status = "scheduled" if scheduled else "no_matching_change"
+        if (
+            self._task_name == "dual-pivot" and
+            strategy == "lateral_movement" and
+            self._operational_mode == "evidence_preservation"
+        ):
+            change_status = "forensic_observation_hold"
+        return {
+            "tool": "change_calendar_lookup",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            **change,
+            "scheduled": scheduled,
+            "change_status": change_status,
+            "confidence": confidence,
+            "evidence_type": "benign" if scheduled else ("supply_chain" if node == target and strategy == "supply_chain" else "operational_context"),
+            "verified": scheduled or (node == target and strategy == "supply_chain"),
+        }
+    def _tool_netflow_lookup(self, node: str) -> Dict[str, Any]:
+        turn_config = self._turn_config or {}
+        strategy = str(turn_config.get("strategy", "benign"))
+        target = str(turn_config.get("correct_target", "none"))
+        hard_task = self._task_name == "polymorphic-zero-day"
+        summary = {
+            "tool": "netflow_lookup",
+            "turn": self._turn,
+            "phase": self._phase,
+            "node": node,
+            "east_west_connections": 12,
+            "outbound_mb": 4,
+            "new_destinations": 0,
+            "traffic_pattern": "baseline",
+            "confidence": 0.57,
+            "evidence_type": "benign",
+            "verified": False,
+        }
+        if node == target and strategy == "lateral_movement":
+            summary.update({
+                "east_west_connections": 46 if hard_task else 58,
+                "outbound_mb": 9,
+                "new_destinations": 5 if hard_task else 7,
+                "traffic_pattern": "east_west_fanout",
+                "confidence": 0.79 if hard_task else 0.88,
+                "evidence_type": "lateral_movement",
+                "verified": True,
+            })
+        elif node == target and strategy == "exfiltration":
+            summary.update({
+                "east_west_connections": 18,
+                "outbound_mb": 74 if hard_task else 96,
+                "new_destinations": 2,
+                "traffic_pattern": "outbound_transfer_burst",
+                "confidence": 0.82 if hard_task else 0.91,
+                "evidence_type": "exfiltration",
+                "verified": True,
+            })
+        elif node == target and strategy == "supply_chain":
+            summary.update({
+                "east_west_connections": 16,
+                "outbound_mb": 19,
+                "new_destinations": 1,
+                "traffic_pattern": "post_deploy_callback",
+                "confidence": 0.71,
+                "evidence_type": "supply_chain",
+                "verified": True,
+            })
+        if (
+            hard_task and
+            self._operational_mode in {"forensic_hold", "business_continuity"} and
+            node == target and strategy in {"lateral_movement", "exfiltration"}
+        ):
+            summary["traffic_pattern"] = "controlled_observation_window"
+            summary["confidence"] = max(float(summary["confidence"]), 0.83)
+        return summary
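+    def _record_tool_result(self, result: Dict[str, Any]) -> None:
+        """Store a tool result in the per-turn results and the episode-wide tool trace.
+
+        Results flagged as verified are additionally kept as evidence for the turn.
+        """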
+        turn = int(result.get("turn", self._turn) or self._turn)
+        internal = {
+            "turn": turn,
+            "phase": result.get("phase", self._phase),
+            "tool": result.get("tool", "unknown"),
+            "node": result.get("node", "unknown"),
+            "evidence_type": result.get("evidence_type", "unknown"),
+            "verified": bool(result.get("verified", False)),
+            "confidence": float(result.get("confidence", 0.0) or 0.0),
+        }
+        self._turn_tool_results.setdefault(turn, []).append(internal)
+        trace = {
+            "turn": result.get("turn", self._turn),
+            "phase": result.get("phase", self._phase),
+            "tool": result.get("tool", "unknown"),
+            "node": result.get("node", "unknown"),
+            "confidence": float(result.get("confidence", 0.0) or 0.0),
+            "summary": _tool_summary(result),
+        }
+        self._tool_trace.append(trace)
+        if internal["verified"]:
+            self._turn_tool_evidence.setdefault(turn, []).append(internal)
+    def _tool_context_for_turn(self) -> Dict[str, Any]:
+        evidence = list(self._turn_tool_evidence.get(self._turn, []))
+        return {
+            "turn": self._turn,
+            "tool_count": sum(
+                1 for row in self._tool_trace
+                if int(row.get("turn", -1)) == self._turn
+            ),
+            "evidence": evidence,
+            "tool_results": list(self._turn_tool_results.get(self._turn, [])),
+        }
+    def _update_foothold_state(
+        self,
+        p2: Dict[str, str],
+        info: Dict[str, Any],
+        stage: str,
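+    ) -> bool:
+        """Mark the attacker foothold as established after a missed containment.
+
+        Only applies to the polymorphic-zero-day task during the exploit or
+        exfiltration stage, and only while no foothold exists yet. The foothold
+        is set when the agent merely monitors or acts incorrectly. Returns True
+        only on the turn the foothold is first established.
+        """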
+        if (
+            self._task_name != "polymorphic-zero-day" or
+            self._foothold_established or
+            stage not in ("exploit", "exfiltration")
+        ):
+            return False
+        if p2.get("action") == "monitor" or not info.get("acted_correctly", False):
+            self._foothold_established = True
+            return True
+        return False
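+    def _register_active_defense(self, p2: Dict[str, str]) -> None:
+        """Register the action in p2 as an active control with its TTL and side effect.
+
+        Actions without a DEFENSE_TTL entry are ignored. An existing control with
+        the same action and target is replaced, which resets its TTL.
+        """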
+        action = p2.get("action", "monitor")
+        if action not in DEFENSE_TTL:
+            return
+        target = p2.get("target_node", "unknown")
+        self._active_defenses = [
+            control for control in self._active_defenses
+            if not (control["action"] == action and control["target"] == target)
+        ]
+        self._active_defenses.append({
+            "action": action,
+            "target": target,
+            "ttl": DEFENSE_TTL[action],
+            "side_effect": DEFENSE_SIDE_EFFECT[action],
+        })
+    def _decay_active_defenses(self) -> None:
+        """Decrement every active control's TTL by one and drop controls that reach zero."""
+        next_controls = []
+        for control in self._active_defenses:
+            updated = dict(control)
+            updated["ttl"] = int(updated.get("ttl", 0)) - 1
+            if updated["ttl"] > 0:
+                next_controls.append(updated)
+        self._active_defenses = next_controls
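+    # Lifecycle sketch (illustrative; assumes DEFENSE_TTL["rate_limit"] == 2, which
+    # is a made-up value, and that DEFENSE_SIDE_EFFECT has a matching entry):
+    #     self._register_active_defense({"action": "rate_limit", "target_node": "auth_service"})
+    #     # -> one control on auth_service with ttl=2
+    #     self._decay_active_defenses()   # ttl becomes 1, control is kept
+    #     self._decay_active_defenses()   # ttl would be 0, control is dropped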
+    def _active_defense_snapshot(self) -> List[Dict[str, Any]]:
+        return [dict(control) for control in self._active_defenses]
+    def _metadata_with_defenses(self, metadata: Dict[str, Any]) -> Dict[str, Any]:
+        updated = dict(metadata or {})
+        updated["active_defenses"] = self._active_defense_snapshot()
+        updated["mission_profile"] = self._mission_profile
+        updated["world_split"] = self._world_split
+        updated["world_family"] = self._world_family
+        updated["operational_mode"] = self._operational_mode
+        updated["available_tools"] = [dict(tool) for tool in AVAILABLE_SOC_TOOLS]
+        updated["tool_trace"] = [dict(row) for row in self._tool_trace]
+        updated["tool_calls_this_turn"] = sum(
+            1 for row in self._tool_trace
+            if int(row.get("turn", -1)) == self._turn
+        )
+        return updated
+    def _with_active_defense_alerts(self, turn_config: Dict[str, Any]) -> Dict[str, Any]:
+        if not self._active_defenses:
+            return turn_config
+        updated = dict(turn_config)
+        alerts = list(updated.get("active_alerts", []))
+        for control in self._active_defenses:
+            alerts.append(
+                "[CONTROL] "
+                f"{control['action']} active on {control['target']} "
+                f"(ttl={control['ttl']}, side_effect={control['side_effect']})"
+            )
+        updated["active_alerts"] = alerts
+        return updated
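+    def _with_foothold_context(self, turn_config: Dict[str, Any]) -> Dict[str, Any]:
+        """Rewrite a benign turn as a covert lateral-movement turn once a foothold exists.
+
+        Only applies to the polymorphic-zero-day task; any other turn is returned unchanged.
+        """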
+        if (
+            self._task_name != "polymorphic-zero-day" or
+            not self._foothold_established or
+            not turn_config.get("is_benign", False)
+        ):
+            return turn_config
+        updated = dict(turn_config)
+        nodes = {name: dict(metrics) for name, metrics in updated.get("network_nodes", {}).items()}
+        payment = nodes.setdefault("payment_service", {})
+        payment["status"] = "suspicious"
+        payment["cpu"] = max(int(payment.get("cpu", 0)), 58)
+        payment["request_rate"] = max(int(payment.get("request_rate", 0)), 130)
+        alerts = list(updated.get("active_alerts", []))
+        alerts.append(
+            "[FOOTHOLD] Low-volume internal callback resembles benign service noise, "
+            "but prior missed containment means deception is required."
+        )
+        updated["network_nodes"] = nodes
+        updated["active_alerts"] = alerts
+        updated["attack_stage"] = self._attacker.current_stage()
+        updated["is_benign"] = False
+        updated["strategy"] = "lateral_movement"
+        updated["correct_action"] = "honeypot"
+        updated["correct_target"] = "payment_service"
+        updated["foothold_context"] = True
+        return updated
+    def _prepare_turn_config(self, turn_config: Dict[str, Any]) -> Dict[str, Any]:
+        updated = self._with_foothold_context(turn_config)
+        updated = self._apply_operational_mode(updated)
+        updated = self._with_active_defense_alerts(updated)
+        return updated
+    def _apply_operational_mode(self, turn_config: Dict[str, Any]) -> Dict[str, Any]:
+        updated = dict(turn_config)
+        alerts = list(updated.get("active_alerts", []))
+        strategy = str(updated.get("strategy", "benign"))
+        stage = str(updated.get("attack_stage", "recon"))
+        if self._task_name == "dual-pivot" and strategy == "lateral_movement":
+            if self._operational_mode == "evidence_preservation":
+                updated["correct_action"] = "honeypot"
+                alerts.append(
+                    self._surface_text(
+                        "[MODE] Checkout incident posture favors preserving attacker visibility until identity misuse is corroborated."
+                    )
+                )
+            else:
+                updated["correct_action"] = "isolate"
+                alerts.append(
+                    self._surface_text(
+                        "[MODE] Checkout incident posture favors decisive containment once the pivot is corroborated."
+                    )
+                )
+        if self._task_name == "polymorphic-zero-day" and not updated.get("is_benign", False):
+            if self._operational_mode in {"forensic_hold", "business_continuity"} and strategy in {"lateral_movement", "exfiltration"}:
+                updated["correct_action"] = "honeypot"
+                alerts.append(
+                    self._surface_text(
+                        "[MODE] Enterprise posture favors deception over immediate isolation while mapping the callback path."
+                    )
+                )
+            elif self._operational_mode == "containment_first":
+                alerts.append(
+                    self._surface_text(
+                        "[MODE] Enterprise posture favors immediate containment once compromise is corroborated."
+                    )
+                )
+            if self._operational_mode == "business_continuity" and stage == "exploit":
+                nodes = {name: dict(metrics) for name, metrics in updated.get("network_nodes", {}).items()}
+                payment = nodes.setdefault("payment_service", {})
+                payment["status"] = "elevated"
+                payment["request_rate"] = max(int(payment.get("request_rate", 0)), 122)
+                updated["network_nodes"] = nodes
+                alerts.append(
+                    self._surface_text(
+                        "[MODE] Customer traffic remains sensitive; service continuity pressure is elevated during this window."
+                    )
+                )
+        updated["active_alerts"] = alerts
+        updated["world_split"] = self._world_split
+        updated["world_family"] = self._world_family
+        updated["operational_mode"] = self._operational_mode
+        return updated
+    def _surface_text(self, text: str) -> str:
+        return self._attacker._surface(text)
+def _action_value(value: Any, default: str) -> str:
+    """Serialize action fields without leaking Enum member names."""
+    if value is None:
+        return default
+    if isinstance(value, Enum):
+        return str(value.value)
+    return str(value)
+def _action_float(value: Any, default: float) -> float:
+    """Coerce optional numeric action fields to floats with a safe fallback."""
+    if value is None:
+        return float(default)
+    try:
+        return float(value)
+    except (TypeError, ValueError):
+        return float(default)
+def _replay_result(info: Dict[str, Any]) -> str:
+    """Map grader text into compact replay result labels."""
+    reason = str(info.get("score_reason", "")).lower()
+    if "false positive" in reason:
+        return "false_positive"
+    if reason.startswith("unverified"):
+        return "unverified"
+    if reason.startswith("optimal") or reason.startswith("correct") or reason.startswith("context-aware optimal"):
+        return "optimal"
+    if reason.startswith("heavy-handed"):
+        return "heavy"
+    return "wrong"
+def _tool_summary(result: Dict[str, Any]) -> str:
+    """Condense a tool result into a one-line summary; free-text fallbacks are cut to 120 chars."""
+    if result.get("error"):
+        return str(result["error"])[:120]
+    if result.get("tool") == "log_search":
+        events = result.get("events") or []
+        return str(events[0])[:120] if events else "no matching log events"
+    if result.get("tool") == "cmdb_lookup":
+        deps = result.get("dependencies") or []
+        return f"{result.get('node')} criticality={result.get('criticality')} deps={len(deps)}"
+    if result.get("tool") == "edr_status":
+        return (
+            f"containment={result.get('containment')} "
+            f"beaconing={result.get('beaconing')} "
+            f"persistence={result.get('persistence')}"
+        )
+    if result.get("tool") == "vuln_lookup":
+        return f"risk={result.get('risk')} finding={result.get('finding')}"
+    if result.get("tool") == "identity_lookup":
+        return (
+            f"account={result.get('account')} "
+            f"source={result.get('recent_source_host')} "
+            f"anomaly={result.get('identity_anomaly')}"
+        )
+    if result.get("tool") == "change_calendar_lookup":
+        return (
+            f"scheduled={result.get('scheduled')} "
+            f"window={result.get('window')} "
+            f"change={result.get('change_type')}"
+        )
+    if result.get("tool") == "netflow_lookup":
+        return (
+            f"pattern={result.get('traffic_pattern')} "
+            f"east_west={result.get('east_west_connections')} "
+            f"outbound_mb={result.get('outbound_mb')}"
+        )
+    return str(result)[:120]
+def _public_tool_result(result: Dict[str, Any]) -> Dict[str, Any]:
+    """Return observable tool facts without leaking grader labels."""
+    hidden = {"evidence_type", "verified"}
+    public = {key: value for key, value in result.items() if key not in hidden}
+    public["result_summary"] = _tool_summary(result)
+    return public
+def _degrade_handoff(
+    phase1_output: Dict[str, Any],
+    turn_config: Dict[str, Any],
+    task_name: str,
+    turn: int,
+) -> Dict[str, Any]:
+    """Degrade hard-task handoffs after the strategy shift without changing grading."""
+    output = dict(phase1_output)
+    if (
+        task_name == "dual-pivot" and
+        turn >= 4 and
+        turn_config.get("strategy") == "lateral_movement"
+    ):
+        output["confidence"] = min(float(output.get("confidence", 0.5)), 0.56)
+        output["recommended_action"] = "monitor"
+        output["handoff_quality"] = "degraded"
+        output["handoff_note"] = (
+            "Analyst handoff is less reliable after the pivot; executor should verify "
+            "internal movement before taking payment_service offline."
+        )
+    elif (
+        task_name == "polymorphic-zero-day" and
+        turn >= 4 and
+        turn_config.get("strategy") in {"lateral_movement", "exfiltration", "supply_chain"}
+    ):
+        output["confidence"] = min(float(output.get("confidence", 0.5)), 0.36)
+        output["recommended_action"] = "monitor"
+        output["handoff_quality"] = "degraded"
+        output["handoff_note"] = (
+            "Analyst confidence degraded after attacker pivot; executor must verify "
+            "whether monitor is too passive for the current stage."
+        )
+    else:
+        output["handoff_quality"] = "clean"
+    return output
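
The confidence capping in `_degrade_handoff` can be illustrated in isolation. The following is a minimal sketch rather than the repository's implementation: the function name `degrade_handoff_sketch` and the `CAPS` table are hypothetical, the `handoff_note` text is omitted, and the strategy is passed directly instead of being read from `turn_config`. The thresholds (0.56 and 0.36, from turn 4 onward) are taken from the function above.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical lookup table: task -> (confidence cap, strategies that trigger degradation).
# Values are copied from _degrade_handoff; the table itself is an illustration only.
CAPS: Dict[str, Tuple[float, Set[str]]] = {
    "dual-pivot": (0.56, {"lateral_movement"}),
    "polymorphic-zero-day": (0.36, {"lateral_movement", "exfiltration", "supply_chain"}),
}


def degrade_handoff_sketch(
    phase1_output: Dict[str, Any],
    strategy: str,
    task_name: str,
    turn: int,
) -> Dict[str, Any]:
    """Return a copy of the Phase 1 output, degraded once the attacker has pivoted."""
    output = dict(phase1_output)  # never mutate the caller's dict
    cap, strategies = CAPS.get(task_name, (1.0, set()))
    if turn >= 4 and strategy in strategies:
        # Confidence is only ever lowered to the cap, never raised.
        output["confidence"] = min(float(output.get("confidence", 0.5)), cap)
        output["recommended_action"] = "monitor"
        output["handoff_quality"] = "degraded"
    else:
        output["handoff_quality"] = "clean"
    return output


if __name__ == "__main__":
    clean = degrade_handoff_sketch({"confidence": 0.9}, "lateral_movement", "dual-pivot", 3)
    degraded = degrade_handoff_sketch({"confidence": 0.9}, "lateral_movement", "dual-pivot", 4)
    print(clean["handoff_quality"], degraded["handoff_quality"], degraded["confidence"])
```

Because the cap is applied with `min`, an analyst output that is already below the cap keeps its own confidence; only the recommendation and the quality flag change.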

server/app.py ADDED

	@@ -0,0 +1,173 @@

+"""
+AdaptShield FastAPI Server
+CRITICAL: Uses the factory pattern (make_env function), NOT a singleton.
+The singleton was the Round 1 failure: it always served the wrong task.
+Factory creates a fresh isolated instance per evaluator session.
+openenv validate requires:
+- def main() function present
+- called as main() in if __name__ block (literal string check)
+- port 7860 (HF Spaces default)
+"""
+import os
+import sys
+from typing import Any, Dict
+from uuid import uuid4
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+try:
+    from fastapi import Body, HTTPException
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:
+    raise ImportError(
+        "openenv-core required. Install: pip install openenv-core"
+    ) from e
+from models import AdaptShieldAction, AdaptShieldObservation
+from server.adaptshield_environment import AdaptShieldEnvironment
+DEFAULT_TASK = os.getenv("ADAPTSHIELD_TASK", "direct-triage")
+SOC_SESSIONS: Dict[str, AdaptShieldEnvironment] = {}
+def make_env() -> AdaptShieldEnvironment:
+    """
+    Factory function — fresh isolated instance per session.
+    Never a singleton. Evaluator sessions must be independent.
+    """
+    return AdaptShieldEnvironment(task_name=DEFAULT_TASK)
+app = create_app(
+    make_env,
+    AdaptShieldAction,
+    AdaptShieldObservation,
+    env_name="adaptshield",
+    max_concurrent_envs=10,
+)
+@app.post("/soc/reset", tags=["AdaptShield SOC Tools"])
+async def soc_reset(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Start a persistent demo session for SOC tool/API workflows."""
+    task = str(payload.get("task", DEFAULT_TASK))
+    env = AdaptShieldEnvironment(task_name=task)
+    obs = env.reset()
+    session_id = str(uuid4())
+    SOC_SESSIONS[session_id] = env
+    return {
+        "session_id": session_id,
+        "observation": obs.model_dump(mode="json"),
+        "available_tools": obs.metadata.get("available_tools", []),
+    }
+@app.post("/soc/step", tags=["AdaptShield SOC Tools"])
+async def soc_step(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Step a persistent SOC tool/API session."""
+    env = _soc_session(payload)
+    try:
+        action = AdaptShieldAction(**dict(payload.get("action", {})))
+    except Exception as exc:
+        raise HTTPException(status_code=422, detail=str(exc)) from exc
+    obs = env.step(action)
+    return {
+        "session_id": payload.get("session_id"),
+        "observation": obs.model_dump(mode="json"),
+        "reward": float(obs.reward),
+        "done": bool(obs.done),
+    }
+@app.post("/tools/log_search", tags=["AdaptShield SOC Tools"])
+async def tool_log_search(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Search stateful SIEM/application logs for the active session."""
+    return _soc_session(payload).call_tool(
+        "log_search",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+        query=payload.get("query", ""),
+    )
+@app.post("/tools/cmdb_lookup", tags=["AdaptShield SOC Tools"])
+async def tool_cmdb_lookup(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Look up service ownership, criticality, and dependency blast radius."""
+    return _soc_session(payload).call_tool(
+        "cmdb_lookup",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+    )
+@app.post("/tools/edr_status", tags=["AdaptShield SOC Tools"])
+async def tool_edr_status(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Check endpoint containment and persistence indicators."""
+    return _soc_session(payload).call_tool(
+        "edr_status",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+    )
+@app.post("/tools/vuln_lookup", tags=["AdaptShield SOC Tools"])
+async def tool_vuln_lookup(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Query internal vulnerability/advisory evidence for a service package."""
+    return _soc_session(payload).call_tool(
+        "vuln_lookup",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+        package=payload.get("package", ""),
+    )
+@app.post("/tools/identity_lookup", tags=["AdaptShield SOC Tools"])
+async def tool_identity_lookup(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Inspect account behavior and unusual source-host affinity for a service identity."""
+    return _soc_session(payload).call_tool(
+        "identity_lookup",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+    )
+@app.post("/tools/change_calendar_lookup", tags=["AdaptShield SOC Tools"])
+async def tool_change_calendar_lookup(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Check whether a deploy or maintenance window was actually scheduled."""
+    return _soc_session(payload).call_tool(
+        "change_calendar_lookup",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+    )
+@app.post("/tools/netflow_lookup", tags=["AdaptShield SOC Tools"])
+async def tool_netflow_lookup(payload: Dict[str, Any] = Body(default_factory=dict)) -> Dict[str, Any]:
+    """Inspect east-west and outbound traffic summaries for the active session."""
+    return _soc_session(payload).call_tool(
+        "netflow_lookup",
+        node=payload.get("node", payload.get("target_node", "unknown")),
+    )
+def _soc_session(payload: Dict[str, Any]) -> AdaptShieldEnvironment:
+    """Resolve the session_id in the payload to its environment, or raise HTTP 404."""
+    session_id = str(payload.get("session_id", ""))
+    env = SOC_SESSIONS.get(session_id)
+    if env is None:
+        raise HTTPException(
+            status_code=404,
+            detail="Unknown SOC session. Call /soc/reset first.",
+        )
+    return env
+def main(host: str = "0.0.0.0", port: int = 7860) -> None:
+    """Start the uvicorn server. Call main() to run."""
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
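+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int, default=7860)
+    args = parser.parse_args()
+    if args.port == 7860:
+        main()  # literal main() call kept for the openenv validate string check
+    else:
+        main(port=args.port)  # honor --port instead of silently ignoring it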
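
For orientation, a client-side sketch of the persistent SOC session flow exposed by `server/app.py` follows. It is an illustration under stated assumptions, not a supported client: `BASE_URL`, `build_tool_payload`, `post_json`, and `run_demo` are hypothetical names, the server is assumed to be running locally on port 7860, and the shape of the tool response is not assumed beyond being JSON. The endpoint paths and the `session_id`, `node`, `query`, and `task` payload keys come from the handlers above.

```python
import json
from typing import Any, Dict, Optional
from urllib import request

BASE_URL = "http://localhost:7860"  # assumption: server started locally via main()


def build_tool_payload(session_id: str, node: str, query: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body the /tools/* endpoints read (session_id, node, optional query)."""
    payload: Dict[str, Any] = {"session_id": session_id, "node": node}
    if query is not None:
        payload["query"] = query
    return payload


def post_json(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body to the server and decode the JSON response."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_demo() -> None:
    """Open a session, then query logs for one node. Requires a running server."""
    session = post_json("/soc/reset", {"task": "dual-pivot"})
    session_id = session["session_id"]
    logs = post_json(
        "/tools/log_search",
        build_tool_payload(session_id, "payment_service", query="service account"),
    )
    print(logs)


if __name__ == "__main__":
    run_demo()
```

Every tool call must carry the `session_id` returned by `/soc/reset`; an unknown or missing id is rejected with HTTP 404 by `_soc_session`.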

server/attacker.py ADDED

	@@ -0,0 +1,361 @@

+"""
+AdaptShield Attacker Engine
+Deterministic scripted Python adversary — NEVER an LLM.
+Implements attack stage progression within episodes.
+Rotates strategies between episodes on a fixed schedule.
+The internal episode counter is NEVER passed to the agent.
+Agent must infer strategy from network signals only.
+"""
+import os
+import random
+from typing import Any, Dict, Tuple
+# ── Strategy definitions ───────────────────────────────────────────────────
+STRATEGY_MAP: Dict[str, Dict[str, str]] = {
+    "brute_force": {
+        "correct_action": "rate_limit",
+        "correct_target": "auth_service",
+        "description":    "Credential stuffing attack on authentication service",
+    },
+    "lateral_movement": {
+        "correct_action": "isolate",
+        "correct_target": "payment_service",
+        "description":    "Attacker pivoting from compromised host to internal nodes",
+    },
+    "exfiltration": {
+        "correct_action": "honeypot",
+        "correct_target": "database",
+        "description":    "Slow sustained data exfiltration from database",
+    },
+    "supply_chain": {
+        "correct_action": "patch",
+        "correct_target": "api_gateway",
+        "description":    "Malicious package injected into update pipeline",
+    },
+}
+# Strategy rotation order
+STRATEGY_ROTATION = [
+    "brute_force",
+    "lateral_movement",
+    "exfiltration",
+    "supply_chain",
+]
+# Attack stages within an episode
+# Each stage escalates if agent fails to act
+ATTACK_STAGES = ["recon", "exploit", "exfiltration"]
+# Baseline healthy network metrics
+BASELINE_NODES: Dict[str, Dict[str, Any]] = {
+    "auth_service":    {"status": "healthy", "request_rate": 120, "error_rate": 0.01, "cpu": 22, "outbound_mb": 2},
+    "payment_service": {"status": "healthy", "request_rate": 85,  "error_rate": 0.01, "cpu": 31, "outbound_mb": 5},
+    "database":        {"status": "healthy", "request_rate": 34,  "error_rate": 0.01, "cpu": 18, "outbound_mb": 8},
+    "api_gateway":     {"status": "healthy", "request_rate": 203, "error_rate": 0.02, "cpu": 29, "outbound_mb": 3},
+}
+SOURCE_SUBNETS = ["192.168.1.x", "10.44.8.x", "172.16.22.x", "203.0.113.x"]
+SERVICE_ACCOUNTS = ["svc_internal", "svc_billing", "svc_reporter", "deploy_bot"]
+PACKAGE_NAMES = ["core-auth-lib", "gateway-router", "payment-sdk", "session-cache"]
+DB_TABLES = ["customer_tokens", "invoice_archive", "payment_methods", "audit_events"]
+ALERT_SOURCES = ["SIEM", "EDR", "WAF", "NETFLOW"]
+FAMILY_REPLACEMENTS = {
+    "train-a": {
+        "failed login attempts": "failed login attempts",
+        "password spray pattern": "password spray pattern",
+        "Unusual internal connections": "Unusual internal connections",
+        "Cross-node traffic volume": "Cross-node traffic volume",
+        "outbound traffic": "outbound traffic",
+        "Large compressed archive creation activity": "Large compressed archive creation activity",
+        "unsigned package update request": "unsigned package update request",
+        "binary hash mismatch": "binary hash mismatch",
+        "Unexpected outbound connection": "Unexpected outbound connection",
+        "scheduled analytics cache refresh": "scheduled analytics cache refresh",
+    },
+    "train-b": {
+        "failed login attempts": "authentication rejections",
+        "password spray pattern": "credential abuse spread",
+        "Unusual internal connections": "Unexpected service-mesh hops",
+        "Cross-node traffic volume": "Identity-linked east-west volume",
+        "outbound traffic": "egress volume",
+        "Large compressed archive creation activity": "archive staging activity",
+        "unsigned package update request": "unapproved release request",
+        "binary hash mismatch": "manifest integrity drift",
+        "Unexpected outbound connection": "post-deploy callback session",
+        "scheduled analytics cache refresh": "scheduled cache tuning window",
+    },
+    "eval-x": {
+        "failed login attempts": "auth rejection burst",
+        "password spray pattern": "credential reuse sweep",
+        "Unusual internal connections": "callback-path fan-out",
+        "Cross-node traffic volume": "mesh traffic clustering",
+        "outbound traffic": "data egress pressure",
+        "Large compressed archive creation activity": "bundle staging activity",
+        "unsigned package update request": "release provenance anomaly",
+        "binary hash mismatch": "artifact provenance drift",
+        "Unexpected outbound connection": "release-linked callback session",
+        "scheduled analytics cache refresh": "approved observability warmup",
+    },
+    "eval-y": {
+        "failed login attempts": "lockout storm",
+        "password spray pattern": "shared-secret sweep",
+        "Unusual internal connections": "lateral fan-out path",
+        "Cross-node traffic volume": "cross-domain session churn",
+        "outbound traffic": "archive egress volume",
+        "Large compressed archive creation activity": "sealed archive staging",
+        "unsigned package update request": "cross-approval deploy request",
+        "binary hash mismatch": "release integrity anomaly",
+        "Unexpected outbound connection": "unknown release callback",
+        "scheduled analytics cache refresh": "scheduled edge warmup",
+    },
+}
+class AttackerEngine:
+    """
+    Polymorphic scripted attacker with stage progression.
+    Within an episode: attack progresses through recon → exploit → exfiltration
+    if the agent fails to act correctly. Early correct action stops progression.
+    Between episodes: strategy rotates on a fixed schedule per task.
+    Hard task additionally shifts strategy mid-episode after turn 3.
+    """
+    def __init__(self, task_name: str, world_family: str = "train-a"):
+        random.seed(int(os.environ.get("ADAPTSHIELD_SEED", random.randint(0, 9999))))
+        self.task_name     = task_name
+        self.world_family  = world_family
+        self._episode      = 0   # internal — NEVER passed to agent
+        self._turn         = 0   # within-episode turn counter
+        self._stage_idx    = 0   # current attack stage index
+        self._escalated    = False  # did agent miss a turn?
+        self._shift_every = {
+            "direct-triage":        9999,
+            "dual-pivot":           20,
+            "polymorphic-zero-day": 10,
+        }.get(task_name, 9999)
+        self._noise_rate = 0.15 if task_name == "polymorphic-zero-day" else 0.0
+    # ── Public interface ───────────────────────────────────────────────────
+    def reset_episode(self) -> None:
+        """Called at the start of each episode."""
+        self._turn      = 0
+        self._stage_idx = 0
+        self._escalated = False
+    def current_strategy(self) -> str:
+        """Return active strategy — accounts for mid-episode shift on hard task."""
+        base_idx = (self._episode // self._shift_every) % len(STRATEGY_ROTATION)
+        # Medium task: from turn 3 on, pivot within the first two rotation entries
+        # (brute_force -> lateral_movement when the episode starts on brute_force).
+        if self.task_name == "dual-pivot" and self._turn >= 3:
+            base_idx = (base_idx + 1) % 2
+        # Hard task: strategy shifts mid-episode after turn 3
+        if self.task_name == "polymorphic-zero-day" and self._turn >= 3:
+            base_idx = (base_idx + 1) % len(STRATEGY_ROTATION)
+        return STRATEGY_ROTATION[base_idx]
+    def current_stage(self) -> str:
+        """Return current attack stage."""
+        return ATTACK_STAGES[min(self._stage_idx, len(ATTACK_STAGES) - 1)]
+    def build_observation(self) -> Dict[str, Any]:
+        """
+        Build the network observation for current turn.
+        Returns ambiguous signals — no explicit strategy names or codes.
+        Agent must reason about what the metrics mean.
+        """
+        is_benign = random.random() < self._noise_rate
+        strategy  = self.current_strategy()
+        stage     = self.current_stage()
+        nodes  = {k: dict(v) for k, v in BASELINE_NODES.items()}
+        alerts = []
+        if is_benign:
+            alerts = self._build_benign_alerts()
+            nodes["api_gateway"]["cpu"]    = 71
+            nodes["api_gateway"]["status"] = "elevated"
+            return {
+                "network_nodes":  nodes,
+                "active_alerts":  alerts,
+                "attack_stage":   "none",
+                "is_benign":      True,
+                "strategy":       "benign",
+                "correct_action": "monitor",
+                "correct_target": "none",
+            }
+        nodes, alerts = self._build_attack_state(strategy, stage, nodes)
+        return {
+            "network_nodes":  nodes,
+            "active_alerts":  alerts,
+            "attack_stage":   stage,
+            "is_benign":      False,
+            "strategy":       strategy,
+            "correct_action": STRATEGY_MAP[strategy]["correct_action"],
+            "correct_target": STRATEGY_MAP[strategy]["correct_target"],
+        }
+    def advance_turn(self, agent_acted_correctly: bool) -> None:
+        """
+        Advance turn counter. Escalate attack stage if agent failed.
+        Called after grading each Phase 2 action.
+        """
+        self._turn += 1
+        if not agent_acted_correctly:
+            self._escalated = True
+            self._stage_idx = min(self._stage_idx + 1, len(ATTACK_STAGES) - 1)
+    def advance_episode(self) -> None:
+        """Called at the end of each episode."""
+        self._episode += 1
+        self.reset_episode()
+    # ── Internal signal builders ───────────────────────────────────────────
+    def _build_benign_alerts(self) -> list:
+        """Build convincing but harmless noise alerts."""
+        deploy_user = random.choice(SERVICE_ACCOUNTS)
+        alert_prefix = self._alert_prefix()
+        options = [
+            [
+                f"{alert_prefix} Scheduled database backup completed: 2.3GB archived",
+                f"{alert_prefix} User '{deploy_user}' executed routine deployment script",
+                f"{alert_prefix} api_gateway: elevated CPU due to analytics batch job",
+            ],
+            [
+                f"{alert_prefix} Certificate renewal completed for *.internal.corp",
+                f"{alert_prefix} Routine log rotation completed on all nodes",
+                f"{alert_prefix} payment_service: normal traffic spike during billing window",
+            ],
+            [
+                f"{alert_prefix} Health check: all nodes nominal",
+                f"{alert_prefix} auth_service: password policy enforcement triggered for 3 users",
+                f"{alert_prefix} Scheduled vulnerability scan completed: 0 critical findings",
+            ],
+        ]
+        return [self._surface(line) for line in random.choice(options)]
+    def _build_attack_state(
+        self,
+        strategy: str,
+        stage: str,
+        nodes: Dict,
+    ) -> Tuple[Dict, list]:
+        """
+        Build network state and alerts for an active attack.
+        Signals are intentionally ambiguous — no strategy names or MITRE codes.
+        Severity escalates with attack stage.
+        """
+        stage_multiplier = {"recon": 1.0, "exploit": 1.6, "exfiltration": 2.4}
+        mult = stage_multiplier.get(stage, 1.0)
+        alerts = []
+        alert_prefix = self._alert_prefix()
+        if strategy == "brute_force":
+            subnet = random.choice(SOURCE_SUBNETS)
+            nodes["auth_service"]["status"]       = "degraded" if stage != "recon" else "elevated"
+            nodes["auth_service"]["request_rate"] = int(340 * mult)
+            nodes["auth_service"]["error_rate"]   = min(0.48, 0.18 * mult)
+            nodes["auth_service"]["cpu"]          = min(96, int(45 * mult))
+            if self.task_name == "direct-triage":
+                nodes["api_gateway"]["status"] = "elevated"
+                nodes["api_gateway"]["cpu"] = max(nodes["api_gateway"]["cpu"], 54)
+            alerts = [
+                f"{alert_prefix} auth_service: {int(120 * mult)} failed login attempts in last 60s from subnet {subnet}",
+                f"{alert_prefix} auth_service: error rate {nodes['auth_service']['error_rate']:.0%}; {stage} activity detected",
+                f"{alert_prefix} payment_service: normal transaction volume",
+            ]
+            if self.task_name == "direct-triage":
+                alerts.append(
+                    f"{alert_prefix} api_gateway: elevated CPU overlaps with scheduled analytics cache refresh"
+                )
+        elif strategy == "lateral_movement":
+            account = random.choice(SERVICE_ACCOUNTS)
+            nodes["payment_service"]["status"]       = "suspicious"
+            nodes["payment_service"]["request_rate"] = int(95 * mult)
+            nodes["payment_service"]["cpu"]          = int(38 * mult)
+            nodes["auth_service"]["request_rate"]    = int(155 * mult)
+            add_auth_distractor = (
+                (self.task_name == "dual-pivot" and self._turn == 3) or
+                (self.task_name == "polymorphic-zero-day" and self._turn >= 3)
+            )
+            if add_auth_distractor:
+                nodes["auth_service"]["status"]     = "elevated"
+                nodes["auth_service"]["error_rate"] = min(0.36, 0.14 * mult)
+                nodes["auth_service"]["cpu"]        = min(90, int(42 * mult))
+            alerts = [
+                f"{alert_prefix} Unusual internal connections: auth_service -> payment_service ({int(40 * mult)}/min)",
+                f"{alert_prefix} payment_service: service account '{account}' authenticated from new host",
+                f"{alert_prefix} Cross-node traffic volume {mult:.1f}x above 7-day baseline",
+            ]
+            if add_auth_distractor:
+                alerts.append(
+                    f"{alert_prefix} auth_service: residual failed-login noise from earlier credential attack"
+                )
+                if self.task_name == "dual-pivot":
+                    alerts.append(
+                        f"{alert_prefix} payment_service: emergency checkout tuning window is not on the approved change calendar"
+                    )
+        elif strategy == "exfiltration":
+            table = random.choice(DB_TABLES)
+            nodes["database"]["status"]            = "suspicious" if stage != "recon" else "elevated"
+            nodes["database"]["outbound_mb"]       = int(8 * mult * 12)
+            nodes["database"]["request_rate"]      = int(34 * mult * 3)
+            nodes["database"]["cpu"]               = int(18 * mult * 1.5)
+            alerts = [
+                f"{alert_prefix} database: outbound traffic {nodes['database']['outbound_mb']}MB/hr; {mult:.1f}x above baseline",
+                f"{alert_prefix} database: {int(280 * mult)} sequential reads from table '{table}'",
+                f"{alert_prefix} Large compressed archive creation activity on database node",
+            ]
+            if self.task_name == "polymorphic-zero-day":
+                alerts.append(
+                    f"{alert_prefix} payment_service: reused workload credential appears on east-west callback path"
+                )
+        elif strategy == "supply_chain":
+            package = random.choice(PACKAGE_NAMES)
+            nodes["api_gateway"]["status"] = "suspicious"
+            nodes["api_gateway"]["cpu"]    = int(29 + 20 * mult)
+            alerts = [
+                f"{alert_prefix} api_gateway: unsigned package update request received from external source",
+                f"{alert_prefix} api_gateway: binary hash mismatch on dependency '{package}'",
+                f"{alert_prefix} Unexpected outbound connection from api_gateway to unrecognized host",
+            ]
+            if self.task_name == "polymorphic-zero-day":
+                alerts.append(
+                    f"{alert_prefix} change calendar: no approved gateway deploy window covers this request"
+                )
+        return nodes, [self._surface(alert) for alert in alerts]
+    def _alert_prefix(self) -> str:
+        """Return SOC-style alert metadata; values are reproducible when ADAPTSHIELD_SEED is set."""
+        source = random.choice(ALERT_SOURCES)
+        alert_id = random.randint(1000, 9999)
+        minute = random.randint(0, 59)
+        second = random.randint(0, 59)
+        return f"[{source}-{alert_id} 03:{minute:02d}:{second:02d}Z]"
+    def _surface(self, text: str) -> str:
+        surfaced = str(text)
+        for source, target in FAMILY_REPLACEMENTS.get(self.world_family, {}).items():
+            surfaced = surfaced.replace(source, target)
+        return surfaced
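
The index arithmetic in `AttackerEngine.current_strategy` is easier to follow with concrete numbers. The sketch below restates it as a standalone function; `strategy_for` and `SHIFT_EVERY` are hypothetical names introduced for illustration, with `episode` and `turn` passed explicitly instead of read from instance state. The rotation order, the per-task shift intervals, and the turn-3 adjustments mirror the class above.

```python
from typing import Dict, List

STRATEGY_ROTATION: List[str] = [
    "brute_force",
    "lateral_movement",
    "exfiltration",
    "supply_chain",
]

# Episodes between rotations, per task (same values as _shift_every in the engine).
SHIFT_EVERY: Dict[str, int] = {
    "direct-triage": 9999,
    "dual-pivot": 20,
    "polymorphic-zero-day": 10,
}


def strategy_for(task_name: str, episode: int, turn: int) -> str:
    """Standalone restatement of the rotation logic for a given episode and turn."""
    shift_every = SHIFT_EVERY.get(task_name, 9999)
    base_idx = (episode // shift_every) % len(STRATEGY_ROTATION)
    if task_name == "dual-pivot" and turn >= 3:
        # Result is always one of the first two strategies.
        base_idx = (base_idx + 1) % 2
    if task_name == "polymorphic-zero-day" and turn >= 3:
        # Advance one step around the full rotation.
        base_idx = (base_idx + 1) % len(STRATEGY_ROTATION)
    return STRATEGY_ROTATION[base_idx]


if __name__ == "__main__":
    for task, episode, turn in [
        ("dual-pivot", 0, 2),
        ("dual-pivot", 0, 3),
        ("dual-pivot", 20, 3),
        ("polymorphic-zero-day", 10, 3),
        ("polymorphic-zero-day", 30, 3),
    ]:
        print(task, episode, turn, strategy_for(task, episode, turn))
```

One consequence worth noting: for `dual-pivot`, episodes 20 to 39 start on `lateral_movement` and switch to `brute_force` at turn 3, because the modulo-2 step flips between the first two entries rather than always moving forward.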

server/grader.py ADDED

	@@ -0,0 +1,501 @@

+"""
+AdaptShield Grader
+Fully deterministic reward computation.
+No NLP, no LLM-as-judge, no external API calls.
+Runs in milliseconds.
+INVARIANT: All return values are strictly in [0.01, 0.99].
+normalized_score is ALWAYS present in metadata — never missing.
+"""
+from typing import Any, Dict, List, Tuple
+# ── Reward constants ───────────────────────────────────────────────────────
+BASE_REWARD         =  0.50   # survival — agent gets this for staying alive
+P1_TYPE_BONUS       =  0.15   # Phase 1 correctly identified threat type
+P1_TARGET_BONUS     =  0.10   # Phase 1 correctly identified target node
+P2_OPTIMAL          =  0.39   # correct + efficient action (0.50+0.39 = 0.89 base max)
+P2_HEAVY            =  0.18   # correct action but heavy-handed
+P2_UNVERIFIED       =  0.10   # right hard-task action without investigative evidence
+P2_WRONG            = -0.25   # wrong action on real threat
+FALSE_POSITIVE      = -0.39   # acted aggressively on benign event
+STAGE_ESCALATION    = -0.10   # penalty applied when attack stage escalates
+CATASTROPHIC        = -0.49   # database exfiltration completed → done=True
+# Optimal vs heavy-handed action mapping
+OPTIMAL_ACTION = {
+    "brute_force":      "rate_limit",
+    "lateral_movement": "isolate",
+    "exfiltration":     "honeypot",
+    "supply_chain":     "patch",
+}
+HEAVY_ACTION = {
+    "brute_force":      "isolate",    # works but unnecessary downtime
+    "lateral_movement": "honeypot",   # works but slower
+    "exfiltration":     "isolate",    # works but database goes offline
+    "supply_chain":     "isolate",    # works but gateway goes offline
+}
+# Catastrophic: exfiltration completes if the agent answers with any of these
+# actions once an exfiltration attack has reached the exfiltration stage
+EXFIL_CATASTROPHIC_ACTIONS = ["monitor", "rate_limit", "patch"]
+# Operational model: criticality and dependency blast radius.
+# This is intentionally lightweight so it strengthens realism without making
+# rewards noisy or hard to train against.
+ASSET_CRITICALITY = {
+    "auth_service":    0.70,
+    "payment_service": 0.90,
+    "database":        1.00,
+    "api_gateway":     0.80,
+    "none":            0.00,
+    "unknown":         0.50,
+}
+SERVICE_DEPENDENCIES = {
+    "auth_service":    ["payment_service"],
+    "payment_service": ["api_gateway"],
+    "database":        ["payment_service", "api_gateway"],
+    "api_gateway":     ["auth_service", "payment_service", "database"],
+    "none":            [],
+    "unknown":         [],
+}
+ACTION_DISRUPTION = {
+    "monitor":    0.00,
+    "patch":      0.06,
+    "rate_limit": 0.10,
+    "honeypot":   0.12,
+    "isolate":    0.35,
+}
+MAX_OPERATIONAL_PENALTY = 0.05
+MAX_MISSION_ADJUSTMENT = 0.04
+BASE_REQUIRED_TOOL_FUSION = {
+    "brute_force":      {"log_search", "cmdb_lookup"},
+    "lateral_movement": {"edr_status", "log_search"},
+    "exfiltration":     {"log_search", "edr_status"},
+    "supply_chain":     {"vuln_lookup", "log_search"},
+}
+TASK_REQUIRED_TOOL_FUSION = {
+    "direct-triage": {
+        "brute_force": {"log_search"},
+    },
+    "dual-pivot": {
+        "lateral_movement": {"edr_status", "log_search", "identity_lookup"},
+    },
+    "polymorphic-zero-day": {
+        "brute_force": {"log_search", "cmdb_lookup", "identity_lookup"},
+        "lateral_movement": {"edr_status", "log_search", "identity_lookup", "cmdb_lookup"},
+        "exfiltration": {"log_search", "edr_status", "netflow_lookup", "cmdb_lookup"},
+        "supply_chain": {"vuln_lookup", "log_search", "change_calendar_lookup", "cmdb_lookup"},
+    },
+}
+def grade_step(
+    phase1_action: Dict[str, Any],
+    phase2_action: Dict[str, Any],
+    turn_config:   Dict[str, Any],
+    stage:         str,
+    consecutive_wrong: int,
+    task_name: str = "",
+    foothold_established: bool = False,
+    mission_profile: Dict[str, Any] | None = None,
+    tool_context: Dict[str, Any] | None = None,
+) -> Tuple[float, bool, Dict[str, Any]]:
+    """
+    Grade a complete two-phase step.
+    Args:
+        phase1_action:     Agent's Phase 1 output (threat assessment)
+        phase2_action:     Agent's Phase 2 output (defensive action)
+        turn_config:       Ground truth from AttackerEngine.build_observation()
+        stage:             Current attack stage (recon/exploit/exfiltration)
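+        consecutive_wrong: How many consecutive wrong actions agent has taken
+        task_name:         Task identifier; selects tool-verification rules
+        foothold_established: Whether the attacker already has a foothold
+        mission_profile:   Optional sla_priority / primary_asset / risk_tolerance settings
+        tool_context:      Optional tool_results, evidence, and tool_count from SOC tools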
+    Returns:
+        (reward, catastrophic_failure, info_dict)
+        reward is ALWAYS in range [0.01, 0.99]
+    """
+    is_benign      = turn_config.get("is_benign", False)
+    strategy       = turn_config.get("strategy", "benign")
+    correct_action = turn_config.get("correct_action", "monitor")
+    correct_target = turn_config.get("correct_target", "none")
+    mission_profile = mission_profile or {}
+    tool_context = tool_context or {}
+    contextual_countermeasure = (
+        task_name == "polymorphic-zero-day" and
+        foothold_established and
+        strategy == "lateral_movement"
+    )
+    if contextual_countermeasure:
+        correct_action = "honeypot"
+    p1_type   = str(phase1_action.get("threat_type", ""))
+    p1_target = str(phase1_action.get("target_node", ""))
+    p2_action = str(phase2_action.get("action", "monitor"))
+    p2_target = str(phase2_action.get("target_node", ""))
+    reward       = BASE_REWARD
+    catastrophic = False
+    info = {
+        "strategy":          strategy,
+        "stage":             stage,
+        "correct_action":    correct_action,
+        "correct_target":    correct_target,
+        "p1_threat_type":    p1_type,
+        "p1_target":         p1_target,
+        "p2_action":         p2_action,
+        "p2_target":         p2_target,
+        "is_benign":         is_benign,
+        "p1_type_correct":   False,
+        "p1_target_correct": False,
+        "score_reason":      "",
+        "business_impact":   0.0,
+        "availability_impact": 0.0,
+        "security_risk":     0.0,
+        "dependency_blast_radius": [],
+        "operational_penalty": 0.0,
+        "foothold_established": foothold_established,
+        "contextual_countermeasure": contextual_countermeasure,
+        "mission_profile": mission_profile,
+        "mission_alignment": "neutral",
+        "mission_adjustment": 0.0,
+        "tool_verification_required": False,
+        "tool_evidence_found": False,
+        "tool_fusion_required": [],
+        "tool_fusion_found": [],
+        "tool_count": int(tool_context.get("tool_count", 0) or 0),
+    }
+    # ── False positive path ────────────────────────────────────────────────
+    if is_benign:
+        if p2_action in ("rate_limit", "isolate", "honeypot", "patch"):
+            reward += FALSE_POSITIVE
+            info["acted_correctly"] = False
+            info["score_reason"] = (
+                f"False positive: used {p2_action} on benign event"
+            )
+        else:
+            reward += 0.15
+            info["acted_correctly"] = True
+            info["score_reason"] = "Correct: monitored benign event without overreacting"
+        _apply_operational_impact(
+            info=info,
+            action=p2_action,
+            target=p2_target,
+            result_kind="false_positive" if p2_action != "monitor" else "optimal",
+            strategy=strategy,
+            stage=stage,
+        )
+        _apply_mission_alignment(
+            info=info,
+            action=p2_action,
+            target=p2_target,
+            result_kind="false_positive" if p2_action != "monitor" else "optimal",
+            mission_profile=mission_profile,
+        )
+        reward += info["mission_adjustment"]
+        reward -= info["operational_penalty"]
+        return _clamp(reward), False, info
+    # ── Phase 1 accuracy bonuses ───────────────────────────────────────────
+    if p1_type == strategy:
+        reward += P1_TYPE_BONUS
+        info["p1_type_correct"] = True
+    if p1_target == correct_target:
+        reward += P1_TARGET_BONUS
+        info["p1_target_correct"] = True
+    # ── Catastrophic failure: exfiltration completed ───────────────────────
+    if (strategy == "exfiltration" and
+            stage == "exfiltration" and
+            p2_action in EXFIL_CATASTROPHIC_ACTIONS):
+        reward = BASE_REWARD + CATASTROPHIC
+        catastrophic = True
+        info["acted_correctly"] = False  # keep the key present on every return path
+        info["score_reason"] = (
+            "CATASTROPHIC: Database exfiltration completed — "
+            f"agent used {p2_action} instead of honeypot"
+        )
+        _apply_operational_impact(
+            info=info,
+            action=p2_action,
+            target=p2_target,
+            result_kind="wrong",
+            strategy=strategy,
+            stage=stage,
+        )
+        _apply_mission_alignment(
+            info=info,
+            action=p2_action,
+            target=p2_target,
+            result_kind="wrong",
+            mission_profile=mission_profile,
+        )
+        reward += info["mission_adjustment"]
+        return _clamp(reward), catastrophic, info
+    # ── Stage escalation penalty ───────────────────────────────────────────
+    if stage == "exploit" and consecutive_wrong >= 1:
+        reward += STAGE_ESCALATION
+    elif stage == "exfiltration" and consecutive_wrong >= 2:
+        reward += STAGE_ESCALATION * 2
+    # ── Phase 2 action grading ─────────────────────────────────────────────
+    optimal = correct_action
+    heavy = "" if contextual_countermeasure else HEAVY_ACTION.get(strategy, "")
+    if heavy == optimal:
+        heavy = ""
+    requires_tool_verification = (
+        not is_benign and
+        strategy in OPTIMAL_ACTION and
+        (
+            task_name == "polymorphic-zero-day" or
+            (task_name == "dual-pivot" and strategy == "lateral_movement") or
+            (task_name == "direct-triage" and strategy == "brute_force")
+        )
+    )
+    required_tools = _required_tool_fusion(task_name=task_name, strategy=strategy)
+    tool_evidence_found, fusion_found = _has_relevant_tool_evidence(
+        tool_context=tool_context,
+        strategy=strategy,
+        target=correct_target,
+        required_tools=required_tools,
+    )
+    info["tool_verification_required"] = requires_tool_verification
+    info["tool_evidence_found"] = tool_evidence_found
+    info["tool_fusion_required"] = sorted(required_tools)
+    info["tool_fusion_found"] = sorted(fusion_found)
+    if (
+        p2_action == optimal and
+        p2_target == correct_target and
+        requires_tool_verification and
+        not tool_evidence_found
+    ):
+        reward += P2_UNVERIFIED
+        result_kind = "unverified"
+        info["score_reason"] = (
+            f"Unverified correct action: {p2_action} on {p2_target} would help, "
+            f"but {task_name or 'this task'} requires stronger SOC evidence before full credit"
+        )
+    elif p2_action == optimal and p2_target == correct_target:
+        reward += P2_OPTIMAL
+        result_kind = "optimal"
+        if contextual_countermeasure:
+            info["score_reason"] = (
+                f"Context-aware optimal: {p2_action} on {p2_target} — "
+                "foothold already established, so deception beats isolation"
+            )
+        else:
+            info["score_reason"] = (
+                f"Optimal: {p2_action} on {p2_target} — attack stopped efficiently"
+            )
+    elif p2_action == optimal and p2_target != correct_target:
+        reward += P2_HEAVY * 0.5
+        result_kind = "wrong_target"
+        info["score_reason"] = (
+            f"Right action ({p2_action}) but wrong target "
+            f"(got {p2_target}, needed {correct_target})"
+        )
+    elif p2_action == heavy and p2_target == correct_target:
+        reward += P2_HEAVY
+        result_kind = "heavy"
+        info["score_reason"] = (
+            f"Heavy-handed: {p2_action} stopped attack on {p2_target} "
+            f"but caused unnecessary service disruption"
+        )
+    else:
+        reward += P2_WRONG
+        result_kind = "wrong"
+        info["score_reason"] = (
+            f"Wrong: {p2_action} on {p2_target} — "
+            f"needed {correct_action} on {correct_target}"
+        )
+    acted_correctly = p2_action in (optimal, heavy) and p2_target == correct_target
+    info["acted_correctly"] = acted_correctly
+    _apply_operational_impact(
+        info=info,
+        action=p2_action,
+        target=p2_target,
+        result_kind=result_kind,
+        strategy=strategy,
+        stage=stage,
+    )
+    _apply_mission_alignment(
+        info=info,
+        action=p2_action,
+        target=p2_target,
+        result_kind=result_kind,
+        mission_profile=mission_profile,
+    )
+    reward += info["mission_adjustment"]
+    reward -= info["operational_penalty"]
+    return _clamp(reward), catastrophic, info
+def _apply_mission_alignment(
+    info: Dict[str, Any],
+    action: str,
+    target: str,
+    result_kind: str,
+    mission_profile: Dict[str, Any],
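+) -> None:
+    """
+    Record how well the action fits the mission profile.
+    Writes mission_alignment and mission_adjustment into info; the adjustment
+    never exceeds MAX_MISSION_ADJUSTMENT in magnitude.
+    """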
+    sla_priority = str(mission_profile.get("sla_priority", "balanced"))
+    primary_asset = str(mission_profile.get("primary_asset", "unknown"))
+    risk_tolerance = str(mission_profile.get("risk_tolerance", "medium"))
+    adjustment = 0.0
+    alignment = "neutral"
+    if sla_priority == "availability" and action == "isolate" and target == primary_asset:
+        adjustment -= MAX_MISSION_ADJUSTMENT
+        alignment = "sla_violation"
+    elif sla_priority == "availability" and result_kind == "optimal" and action in ("rate_limit", "patch", "monitor"):
+        adjustment += MAX_MISSION_ADJUSTMENT / 2
+        alignment = "sla_aligned"
+    elif sla_priority == "containment" and result_kind == "optimal" and action in ("honeypot", "isolate", "patch"):
+        adjustment += MAX_MISSION_ADJUSTMENT / 2
+        alignment = "containment_aligned"
+    elif risk_tolerance == "low" and result_kind in ("wrong", "wrong_target"):
+        adjustment -= MAX_MISSION_ADJUSTMENT / 2
+        alignment = "risk_misaligned"
+    info["mission_alignment"] = alignment
+    info["mission_adjustment"] = round(adjustment, 2)
+def _apply_operational_impact(
+    info: Dict[str, Any],
+    action: str,
+    target: str,
+    result_kind: str,
+    strategy: str,
+    stage: str,
+) -> None:
+    """
+    Add deterministic business-impact telemetry and a small bounded penalty.
+    The penalty is intentionally capped at 0.05 so existing learning curves keep
+    their shape while demos can explain service criticality and blast radius.
+    """
+    criticality = ASSET_CRITICALITY.get(target, ASSET_CRITICALITY["unknown"])
+    disruption = ACTION_DISRUPTION.get(action, 0.10)
+    dependents = SERVICE_DEPENDENCIES.get(target, [])
+    dependency_factor = min(1.0, 0.15 * len(dependents))
+    availability = round(min(1.0, disruption * (criticality + dependency_factor)), 2)
+    security = _security_risk(result_kind=result_kind, strategy=strategy, stage=stage)
+    impact = round(min(1.0, availability + security), 2)
+    if result_kind == "optimal":
+        penalty = 0.0
+    elif result_kind == "unverified":
+        penalty = round(min(MAX_OPERATIONAL_PENALTY, impact * MAX_OPERATIONAL_PENALTY / 2), 2)
+    else:
+        penalty = round(min(MAX_OPERATIONAL_PENALTY, impact * MAX_OPERATIONAL_PENALTY), 2)
+    info["business_impact"] = impact
+    info["availability_impact"] = availability
+    info["security_risk"] = security
+    info["dependency_blast_radius"] = dependents if disruption > 0 else []
+    info["operational_penalty"] = penalty
+def _security_risk(result_kind: str, strategy: str, stage: str) -> float:
+    if result_kind in ("optimal", "heavy"):
+        return 0.0
+    if result_kind == "unverified":
+        return 0.08
+    if result_kind == "false_positive":
+        return 0.0
+    stage_risk = {
+        "recon": 0.18,
+        "exploit": 0.32,
+        "exfiltration": 0.50,
+    }.get(stage, 0.20)
+    if strategy == "exfiltration":
+        stage_risk += 0.15
+    elif strategy == "lateral_movement":
+        stage_risk += 0.08
+    return round(min(1.0, stage_risk), 2)
+def _has_relevant_tool_evidence(
+    tool_context: Dict[str, Any],
+    strategy: str,
+    target: str,
+    required_tools: set[str],
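+) -> Tuple[bool, set[str]]:
+    """
+    Return (evidence_ok, tools_run_on_target).
+    evidence_ok requires verified evidence of `strategy` on `target` AND every
+    tool in required_tools to have been run against `target`.
+    """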
+    fusion_found = {
+        str(result.get("tool", ""))
+        for result in tool_context.get("tool_results", []) or []
+        if str(result.get("node", "")) == target
+    }
+    has_attack_evidence = False
+    for evidence in tool_context.get("evidence", []) or []:
+        if (
+            str(evidence.get("evidence_type", "")) == strategy and
+            str(evidence.get("node", "")) == target and
+            bool(evidence.get("verified", False))
+        ):
+            has_attack_evidence = True
+            break
+    return has_attack_evidence and required_tools.issubset(fusion_found), fusion_found
+def _required_tool_fusion(task_name: str, strategy: str) -> set[str]:
+    task_rules = TASK_REQUIRED_TOOL_FUSION.get(task_name, {})
+    if strategy in task_rules:
+        return set(task_rules[strategy])
+    return set(BASE_REQUIRED_TOOL_FUSION.get(strategy, set()))
+def _clamp(value: float) -> float:
+    """Strict bounds: never exactly 0.0 or 1.0."""
+    return max(0.01, min(0.99, round(value, 2)))
+def normalize_episode_score(rewards: List[float]) -> float:
+    """
+    Normalize episode rewards to a single score in [0.01, 0.99].
+    ALWAYS returns a value — never raises, never returns exactly 0 or 1.
+    """
+    if not rewards:
+        return 0.50
+    total = sum(rewards)
+    n = len(rewards)
+    # Per-step rewards are clamped before they enter the episode reward list,
+    # so normalization must use the reachable ceiling instead of the raw
+    # unclamped sum of bonuses. Otherwise perfect episodes top out around 0.87.
+    max_step_reward = _clamp(
+        BASE_REWARD + P2_OPTIMAL + P1_TYPE_BONUS + P1_TARGET_BONUS + MAX_MISSION_ADJUSTMENT
+    )
+    min_step_reward = _clamp(BASE_REWARD + CATASTROPHIC)
+    max_poss = n * max_step_reward
+    min_poss = n * min_step_reward
+    if max_poss == min_poss:
+        return 0.50
+    raw = (total - min_poss) / (max_poss - min_poss)
+    return _clamp(raw)
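
Reviewer note on `normalize_episode_score`: the comment in the function explains that normalization uses the clamped per-step ceiling rather than the raw sum of bonuses. The sketch below re-derives those bounds from the constants in this file as a standalone illustration. It is not the module itself, and the per-step figures ignore the operational penalty and mission adjustment, which the real `grade_step` also applies.

```python
from typing import List

# Constants copied from server/grader.py in this commit.
BASE_REWARD = 0.50
P1_TYPE_BONUS = 0.15
P1_TARGET_BONUS = 0.10
P2_OPTIMAL = 0.39
CATASTROPHIC = -0.49
MAX_MISSION_ADJUSTMENT = 0.04


def clamp(value: float) -> float:
    """Same bounds as _clamp: round to two places, then pin to [0.01, 0.99]."""
    return max(0.01, min(0.99, round(value, 2)))


def normalize(rewards: List[float]) -> float:
    """Standalone copy of the normalization arithmetic, for illustration only."""
    if not rewards:
        return 0.50
    n = len(rewards)
    ceiling = clamp(
        BASE_REWARD + P2_OPTIMAL + P1_TYPE_BONUS + P1_TARGET_BONUS + MAX_MISSION_ADJUSTMENT
    )
    floor = clamp(BASE_REWARD + CATASTROPHIC)
    raw = (sum(rewards) - n * floor) / (n * ceiling - n * floor)
    return clamp(raw)


# A step with both Phase 1 bonuses and the optimal Phase 2 action sums to 1.14
# before clamping, so the reachable per-step ceiling is the clamp bound itself.
perfect_step = clamp(BASE_REWARD + P1_TYPE_BONUS + P1_TARGET_BONUS + P2_OPTIMAL)
catastrophic_step = clamp(BASE_REWARD + CATASTROPHIC)

print(perfect_step, catastrophic_step)
print(normalize([perfect_step] * 5))       # all-perfect episode
print(normalize([catastrophic_step] * 5))  # all-catastrophic episode
print(normalize([0.5] * 4))                # survival-only episode
```

Because both the ceiling and the floor are clamp bounds, an episode of survival-only steps (0.50 each) lands exactly in the middle of the normalized range.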

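Reviewer note on tool fusion: `_has_relevant_tool_evidence` grants full credit only when verified evidence exists for the strategy on the target node and every required tool was run against that same node. The sketch below shows what that means for a caller. The field names follow the function above, while the payload values and the `fusion_status` helper are hypothetical and exist only for this example.

```python
from typing import Any, Dict, Set

# Required tools copied from
# TASK_REQUIRED_TOOL_FUSION["polymorphic-zero-day"]["exfiltration"].
REQUIRED = {"log_search", "edr_status", "netflow_lookup", "cmdb_lookup"}


def fusion_status(
    tool_context: Dict[str, Any], strategy: str, target: str, required: Set[str]
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the grader's check, plus a list of missing tools."""
    found = {
        str(result.get("tool", ""))
        for result in tool_context.get("tool_results", []) or []
        if str(result.get("node", "")) == target
    }
    verified = any(
        str(item.get("evidence_type", "")) == strategy
        and str(item.get("node", "")) == target
        and bool(item.get("verified", False))
        for item in tool_context.get("evidence", []) or []
    )
    return {
        "full_credit": verified and required.issubset(found),
        "verified_evidence": verified,
        "missing_tools": sorted(required - found),
    }


# Hypothetical payload: netflow_lookup was run, but against the wrong node,
# and cmdb_lookup was never run.
partial = {
    "tool_results": [
        {"tool": "log_search", "node": "database"},
        {"tool": "edr_status", "node": "database"},
        {"tool": "netflow_lookup", "node": "api_gateway"},
    ],
    "evidence": [{"evidence_type": "exfiltration", "node": "database", "verified": True}],
}
partial_status = fusion_status(partial, "exfiltration", "database", REQUIRED)

# Same payload after the two missing lookups are run against the database.
complete = {
    "tool_results": partial["tool_results"] + [
        {"tool": "netflow_lookup", "node": "database"},
        {"tool": "cmdb_lookup", "node": "database"},
    ],
    "evidence": partial["evidence"],
}
complete_status = fusion_status(complete, "exfiltration", "database", REQUIRED)

print(partial_status)
print(complete_status)
```

In the partial case the grader would pay `P2_UNVERIFIED` rather than `P2_OPTIMAL` for an otherwise correct `honeypot` on `database`, since a tool run against another node does not count toward the fusion set.
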
server/requirements.txt ADDED

	@@ -0,0 +1,6 @@

+openenv[core]>=0.2.0
+fastapi>=0.115.0
+uvicorn>=0.24.0

server/scenarios.py ADDED

	@@ -0,0 +1,264 @@

+"""
+AdaptShield Scenarios
+Observation builder and system prompts.
+IMPORTANT: No MITRE codes in alerts. No explicit strategy names.
+Agent must reason from raw metrics — not pattern match on codes.
+"""
+import random
+from typing import Any, Dict, List
+VALID_NODES   = ["auth_service", "payment_service", "database", "api_gateway"]
+VALID_ACTIONS = ["rate_limit", "isolate", "honeypot", "patch", "monitor"]
+TASK_CONFIGS = {
+    "direct-triage": {
+        "max_turns":   5,
+        "description": "Single fixed attacker strategy. Learn baseline threat response.",
+        "mission_profile": {
+            "mission_id": "login_stability",
+            "primary_asset": "auth_service",
+            "sla_priority": "availability",
+            "risk_tolerance": "medium",
+            "objective": "Preserve user login availability while stopping credential abuse.",
+        },
+    },
+    "dual-pivot": {
+        "max_turns":   6,
+        "description": "Two strategies alternating every 20 episodes.",
+        "mission_profile": {
+            "mission_id": "checkout_continuity",
+            "primary_asset": "payment_service",
+            "sla_priority": "availability",
+            "risk_tolerance": "medium",
+            "objective": "Keep checkout online while containing internal movement.",
+        },
+    },
+    "polymorphic-zero-day": {
+        "max_turns":   8,
+        "description": "All four strategies with mid-episode shift and noise injection.",
+        "mission_profile": {
+            "mission_id": "breach_containment",
+            "primary_asset": "database",
+            "sla_priority": "containment",
+            "risk_tolerance": "low",
+            "objective": "Protect sensitive data and adapt response if attacker persistence is established.",
+        },
+    },
+}
+WORLD_FAMILY_SPLITS = {
+    "train": ["train-a", "train-b"],
+    "eval": ["eval-x", "eval-y"],
+}
+TASK_OPERATIONAL_MODES = {
+    "direct-triage": ["availability_guarded"],
+    "dual-pivot": ["containment_first", "evidence_preservation"],
+    "polymorphic-zero-day": [
+        "containment_first",
+        "forensic_hold",
+        "business_continuity",
+    ],
+}
+FAMILY_MISSION_NOTES = {
+    "train-a": "Primary incident feed emphasizes login telemetry and CMDB-linked service relationships.",
+    "train-b": "Primary incident feed emphasizes identity anomalies, service-account movement, and deploy context.",
+    "eval-x": "Primary incident feed emphasizes east-west callback patterns, release provenance drift, and egress clustering.",
+    "eval-y": "Primary incident feed emphasizes rejection bursts, archive staging, and cross-domain approval mismatches.",
+}
+MODE_HINTS = {
+    "availability_guarded": {
+        "sla_priority": "availability",
+        "risk_tolerance": "medium",
+        "objective_suffix": "Use one confirming signal before disruptive action when login noise overlaps with maintenance chatter.",
+    },
+    "containment_first": {
+        "sla_priority": "containment",
+        "risk_tolerance": "low",
+        "objective_suffix": "Prefer decisive containment once corroborating evidence suggests compromise is active.",
+    },
+    "evidence_preservation": {
+        "sla_priority": "balanced",
+        "risk_tolerance": "medium",
+        "objective_suffix": "Preserve attacker visibility during checkout pivots until identity misuse is corroborated across tools.",
+    },
+    "forensic_hold": {
+        "sla_priority": "containment",
+        "risk_tolerance": "low",
+        "objective_suffix": "Preserve attacker tradecraft long enough to map the callback path before cutting access entirely.",
+    },
+    "business_continuity": {
+        "sla_priority": "availability",
+        "risk_tolerance": "medium",
+        "objective_suffix": "Reduce blast radius while protecting customer-facing continuity whenever deception can safely buy time.",
+    },
+}
+PHASE1_SYSTEM = """You are a Threat Analyst defending a 4-node enterprise network.
+Analyze the network metrics and security alerts to classify the threat.
+Network nodes: auth_service, payment_service, database, api_gateway
+Attack strategies to recognize:
+- brute_force: High failed login rate on auth_service, error spikes
+- lateral_movement: Unusual cross-node connections, service account anomalies
+- exfiltration: Anomalous outbound data volume from database node
+- supply_chain: Suspicious update requests or hash mismatches on api_gateway
+- benign: Normal operations — scheduled jobs, routine activity
+Respond ONLY with valid JSON, no other text:
+{
+  "threat_type": "<brute_force|lateral_movement|exfiltration|supply_chain|benign>",
+  "confidence": <0.0 to 1.0>,
+  "target_node": "<auth_service|payment_service|database|api_gateway>",
+  "recommended_action": "<rate_limit|isolate|honeypot|patch|monitor>",
+  "reasoning": "<one sentence chain of thought>"
+}"""
+PHASE2_SYSTEM = """You are a Tactical Executor defending a 4-node enterprise network.
+You receive a threat assessment from the Threat Analyst and must execute the defense.
+You CANNOT see raw network logs — act only on the assessment provided.
+Available actions:
+- rate_limit: Throttle traffic to node. Light touch, keeps service online. Best for DoS/brute force.
+- isolate: Take node completely offline. Stops spread but causes downtime. Use for lateral movement.
+- honeypot: Redirect attacker to decoy system. Best for data exfiltration attempts.
+- patch: Apply security update. Targeted fix for supply chain attacks.
+- monitor: Observe without acting. Use only when genuinely uncertain or event is benign.
+Respond ONLY with valid JSON, no other text:
+{
+  "action": "<rate_limit|isolate|honeypot|patch|monitor>",
+  "target_node": "<auth_service|payment_service|database|api_gateway>",
+  "reasoning": "<one sentence chain of thought>"
+}"""
+def build_phase1_obs(
+    turn_config: Dict[str, Any],
+    history:     List[Dict[str, str]],
+    task_name:   str,
+    turn:        int,
+    max_turns:   int,
+    episode_id:  str,
+    mission_profile: Dict[str, Any] | None = None,
+) -> Dict[str, Any]:
+    """Build Phase 1 observation — full network state visible."""
+    mission_profile = mission_profile or {}
+    return {
+        "scenario_id":        episode_id,
+        "task_name":          task_name,
+        "phase":              1,
+        "turn":               turn,
+        "max_turns":          max_turns,
+        "network_nodes":      turn_config["network_nodes"],
+        "active_alerts":      turn_config["active_alerts"],
+        "attack_stage":       turn_config.get("attack_stage", "none"),
+        "history":            history[-3:],
+        "phase1_assessment":  None,
+        "last_action_result": None,
+        "system_context":     _with_mission_context(PHASE1_SYSTEM, mission_profile),
+        "available_actions":  VALID_ACTIONS,
+        "reward":             0.0,
+        "done":               False,
+        "metadata":           {
+            "episode_id":       episode_id,
+            "normalized_score": 0.50,  # always present from step 1
+            "mission_profile":  mission_profile,
+        },
+    }
+def build_phase2_obs(
+    phase1_output: Dict[str, Any],
+    history:       List[Dict[str, str]],
+    task_name:     str,
+    turn:          int,
+    max_turns:     int,
+    episode_id:    str,
+    current_score: float,
+    mission_profile: Dict[str, Any] | None = None,
+) -> Dict[str, Any]:
+    """
+    Build Phase 2 observation.
+    CRITICAL: network_nodes and active_alerts are EMPTY.
+    Phase 2 agent is blind to raw state — sees only Phase 1 assessment.
+    """
+    mission_profile = mission_profile or {}
+    return {
+        "scenario_id":        episode_id,
+        "task_name":          task_name,
+        "phase":              2,
+        "turn":               turn,
+        "max_turns":          max_turns,
+        "network_nodes":      {},   # deliberately empty
+        "active_alerts":      [],   # deliberately empty
+        "attack_stage":       "hidden",
+        "history":            history[-3:],
+        "phase1_assessment":  phase1_output,
+        "last_action_result": None,
+        "system_context":     _with_mission_context(PHASE2_SYSTEM, mission_profile),
+        "available_actions":  VALID_ACTIONS,
+        "reward":             0.0,
+        "done":               False,
+        "metadata":           {
+            "episode_id":       episode_id,
+            "normalized_score": current_score,  # always present
+            "mission_profile":  mission_profile,
+        },
+    }
+def _with_mission_context(system_prompt: str, mission_profile: Dict[str, Any]) -> str:
+    if not mission_profile:
+        return system_prompt
+    mission = "\n".join([
+        "",
+        "Mission context:",
+        f"- mission_id: {mission_profile.get('mission_id', 'unknown')}",
+        f"- primary_asset: {mission_profile.get('primary_asset', 'unknown')}",
+        f"- sla_priority: {mission_profile.get('sla_priority', 'balanced')}",
+        f"- risk_tolerance: {mission_profile.get('risk_tolerance', 'medium')}",
+        f"- objective: {mission_profile.get('objective', 'Balance security and availability.')}",
+    ])
+    return f"{system_prompt}{mission}"
+def choose_world_family(world_split: str, requested_family: str | None = None) -> str:
+    if requested_family:
+        return requested_family
+    families = WORLD_FAMILY_SPLITS.get(world_split, WORLD_FAMILY_SPLITS["train"])
+    return random.choice(families)
+def choose_operational_mode(task_name: str, requested_mode: str | None = None) -> str:
+    if requested_mode:
+        return requested_mode
+    modes = TASK_OPERATIONAL_MODES.get(task_name, ["availability_guarded"])
+    return random.choice(modes)
+def mission_profile_for(task_name: str, operational_mode: str, world_family: str) -> Dict[str, Any]:
+    base = dict(TASK_CONFIGS[task_name].get("mission_profile", {}))
+    mode = MODE_HINTS.get(operational_mode, {})
+    base["world_family"] = world_family
+    base["operational_mode_hint"] = operational_mode.replace("_", " ")
+    base["scenario_style"] = FAMILY_MISSION_NOTES.get(world_family, "")
+    if mode.get("sla_priority"):
+        base["sla_priority"] = mode["sla_priority"]
+    if mode.get("risk_tolerance"):
+        base["risk_tolerance"] = mode["risk_tolerance"]
+    objective = str(base.get("objective", "")).rstrip()
+    suffix = str(mode.get("objective_suffix", "")).strip()
+    family_note = str(FAMILY_MISSION_NOTES.get(world_family, "")).strip()
+    if suffix:
+        objective = f"{objective} {suffix}".strip()
+    if family_note:
+        objective = f"{objective} {family_note}".strip()
+    base["objective"] = objective
+    return base
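
Reviewer note on `mission_profile_for`: the operational mode can override the task's base `sla_priority` and `risk_tolerance`, which in turn changes which branch of `_apply_mission_alignment` in the grader can fire. The sketch below reproduces only that override step for `dual-pivot` under `evidence_preservation`. The `merge_mode` helper is hypothetical and leaves out the world-family fields that the real function also sets.

```python
from typing import Any, Dict

# Base profile copied from TASK_CONFIGS["dual-pivot"]["mission_profile"].
BASE_PROFILE = {
    "mission_id": "checkout_continuity",
    "primary_asset": "payment_service",
    "sla_priority": "availability",
    "risk_tolerance": "medium",
    "objective": "Keep checkout online while containing internal movement.",
}

# Mode hint copied from MODE_HINTS["evidence_preservation"].
MODE_HINT = {
    "sla_priority": "balanced",
    "risk_tolerance": "medium",
    "objective_suffix": (
        "Preserve attacker visibility during checkout pivots until identity "
        "misuse is corroborated across tools."
    ),
}


def merge_mode(base: Dict[str, Any], mode: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: apply mode overrides to a copy of the base profile."""
    profile = dict(base)
    for key in ("sla_priority", "risk_tolerance"):
        if mode.get(key):
            profile[key] = mode[key]
    suffix = str(mode.get("objective_suffix", "")).strip()
    if suffix:
        objective = str(profile.get("objective", "")).rstrip()
        profile["objective"] = f"{objective} {suffix}".strip()
    return profile


profile = merge_mode(BASE_PROFILE, MODE_HINT)
print(profile["sla_priority"])
print(profile["objective"])
```

With `sla_priority` set to `balanced` and `risk_tolerance` at `medium`, none of the branches in `_apply_mission_alignment` match, so the mission adjustment stays at zero for this mode.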

smoke_test.py ADDED

	@@ -0,0 +1,59 @@

+#!/usr/bin/env python3
+"""
+Quick repo-root smoke test for AdaptShield.
+Run from the repo root:
+    python smoke_test.py
+"""
+from __future__ import annotations
+import sys
+from pathlib import Path
+REPO_ROOT = Path(__file__).resolve().parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+import __init__ as adaptshield
+import server.app as server_app
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+def main() -> int:
+    print("AdaptShield smoke test")
+    print(f"- package exports: {adaptshield.__all__}")
+    print(f"- server app type: {server_app.app.__class__.__name__}")
+    env = AdaptShieldEnvironment("direct-triage")
+    obs = env.reset()
+    print(
+        f"- reset: phase={obs.phase} turn={obs.turn} "
+        f"score={obs.metadata.get('normalized_score')}"
+    )
+    obs = env.step(
+        AdaptShieldAction(
+            threat_type="brute_force",
+            confidence=0.9,
+            target_node="auth_service",
+            recommended_action="rate_limit",
+        )
+    )
+    print(f"- phase 1 -> phase 2: assessment={obs.phase1_assessment}")
+    obs = env.step(AdaptShieldAction(action="rate_limit", target_node="auth_service"))
+    print(
+        f"- phase 2 -> next turn: reward={obs.reward} done={obs.done} "
+        f"result={obs.last_action_result}"
+    )
+    print("Smoke test passed.")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())

soc_tools.py ADDED

	@@ -0,0 +1,239 @@

+#!/usr/bin/env python3
+"""Shared SOC investigation helpers for AdaptShield agents."""
+from __future__ import annotations
+import json
+import urllib.request
+from typing import Any, Callable, Dict, List, Optional
+THREAT_TOOL_PLAN = {
+    "brute_force": [("log_search", "auth_service"), ("cmdb_lookup", "auth_service"), ("identity_lookup", "auth_service")],
+    "lateral_movement": [("edr_status", "payment_service"), ("log_search", "payment_service"), ("identity_lookup", "payment_service"), ("cmdb_lookup", "payment_service")],
+    "exfiltration": [("log_search", "database"), ("edr_status", "database"), ("netflow_lookup", "database"), ("cmdb_lookup", "database")],
+    "supply_chain": [("vuln_lookup", "api_gateway"), ("log_search", "api_gateway"), ("change_calendar_lookup", "api_gateway"), ("cmdb_lookup", "api_gateway")],
+    "benign": [("cmdb_lookup", "api_gateway")],
+}
+FALLBACK_SWEEP = [
+    ("edr_status", "payment_service"),
+    ("log_search", "database"),
+    ("vuln_lookup", "api_gateway"),
+]
+def classify_from_metrics(network_nodes: Dict[str, Dict[str, Any]]) -> str:
+    """Heuristically classify the threat from node metrics; first matching rule wins."""
+    auth = network_nodes.get("auth_service", {})
+    payment = network_nodes.get("payment_service", {})
+    database = network_nodes.get("database", {})
+    gateway = network_nodes.get("api_gateway", {})
+    if float(auth.get("error_rate", 0.0)) >= 0.10:
+        return "brute_force"
+    if payment.get("status") == "suspicious" or float(payment.get("cpu", 0)) >= 55:
+        return "lateral_movement"
+    if float(database.get("outbound_mb", 0)) >= 50:
+        return "exfiltration"
+    if gateway.get("status") == "suspicious":
+        return "supply_chain"
+    return "benign"
+def investigate_local(env: Any, obs: Any, use_tools: bool) -> List[Dict[str, Any]]:
+    """Query local environment tool methods before Phase 1 action."""
+    return investigate_local_with_depth(env, obs, use_tools=use_tools, thorough=False)
+def investigate_local_with_depth(
+    env: Any,
+    obs: Any,
+    use_tools: bool,
+    thorough: bool,
+) -> List[Dict[str, Any]]:
+    """Query local tools; thorough mode adds evidence-fusion follow-ups."""
+    if not use_tools or getattr(obs, "phase", 1) != 1:
+        return []
+    task_name = getattr(obs, "task_name", "")
+    threat = classify_from_metrics(getattr(obs, "network_nodes", {}))
+    if task_name == "direct-triage":
+        if threat == "brute_force":
+            return [env.call_tool("log_search", node="auth_service")]
+        return []
+    if task_name == "dual-pivot":
+        if threat == "lateral_movement":
+            return [
+                env.call_tool("edr_status", node="payment_service"),
+                env.call_tool("log_search", node="payment_service"),
+                env.call_tool("identity_lookup", node="payment_service"),
+            ]
+        tool_name, node = THREAT_TOOL_PLAN.get(threat, THREAT_TOOL_PLAN["benign"])[0]
+        return [env.call_tool(tool_name, node=node)]
+    if task_name != "polymorphic-zero-day":
+        return []
+    results = []
+    for tool_name, node in THREAT_TOOL_PLAN.get(threat, THREAT_TOOL_PLAN["benign"]):
+        results.append(env.call_tool(tool_name, node=node))
+    if not has_attack_indicators(results):
+        for tool_name, node in FALLBACK_SWEEP:
+            if (tool_name, node) not in THREAT_TOOL_PLAN.get(threat, []):
+                results.append(env.call_tool(tool_name, node=node))
+    if thorough:
+        _complete_evidence_fusion(
+            call_tool=lambda tool_name, node: env.call_tool(tool_name, node=node),
+            results=results,
+        )
+    return results
+def investigate_http(
+    env_base_url: str,
+    session_id: Optional[str],
+    obs: Dict[str, Any],
+    use_tools: bool,
+    thorough: bool = False,
+) -> List[Dict[str, Any]]:
+    """Query SOC HTTP tool endpoints for a persistent /soc session."""
+    if not use_tools or not session_id or int(obs.get("phase", 1)) != 1:
+        return []
+    task_name = obs.get("task_name")
+    threat = classify_from_metrics(obs.get("network_nodes", {}))
+    def call(tool_name: str, node: str) -> Dict[str, Any]:
+        path = f"/tools/{tool_name}"
+        payload = {"session_id": session_id, "node": node}
+        return http_post(env_base_url, path, payload)
+    if task_name == "direct-triage":
+        if threat == "brute_force":
+            return [call("log_search", "auth_service")]
+        return []
+    results: List[Dict[str, Any]] = []
+    if task_name == "dual-pivot":
+        if threat == "lateral_movement":
+            return [
+                call("edr_status", "payment_service"),
+                call("log_search", "payment_service"),
+                call("identity_lookup", "payment_service"),
+            ]
+        tool_name, node = THREAT_TOOL_PLAN.get(threat, THREAT_TOOL_PLAN["benign"])[0]
+        return [call(tool_name, node)]
+    if task_name != "polymorphic-zero-day":
+        return []
+    for tool_name, node in THREAT_TOOL_PLAN.get(threat, THREAT_TOOL_PLAN["benign"]):
+        results.append(call(tool_name, node))
+    if not has_attack_indicators(results):
+        for tool_name, node in FALLBACK_SWEEP:
+            if (tool_name, node) not in THREAT_TOOL_PLAN.get(threat, []):
+                results.append(call(tool_name, node))
+    if thorough:
+        _complete_evidence_fusion(call_tool=call, results=results)
+    return results
+def has_attack_indicators(results: List[Dict[str, Any]]) -> bool:
+    """Return True when the tool results point to any non-benign threat."""
+    return infer_threat_from_tool_results(results)["threat_type"] != "benign"
+def infer_threat_from_tool_results(results: List[Dict[str, Any]]) -> Dict[str, str]:
+    """Infer threat from observable tool facts only, not grader labels."""
+    text_blob = " ".join(
+        [
+            str(result.get("result_summary", "")) + " " +
+            " ".join(str(event) for event in result.get("events", [])) + " " +
+            str(result.get("finding", "")) + " " +
+            str(result.get("process_note", ""))
+            for result in results
+        ]
+    ).lower()
+    if any(result.get("risk") == "critical" for result in results) or "hash mismatch" in text_blob:
+        return {"threat_type": "supply_chain", "target_node": "api_gateway", "action": "patch"}
+    if "sequential reads" in text_blob or "compressed archive" in text_blob or "egress exceeds" in text_blob or "outbound_transfer_burst" in text_blob:
+        return {"threat_type": "exfiltration", "target_node": "database", "action": "honeypot"}
+    if (
+        any(result.get("beaconing") for result in results) or
+        "service account" in text_blob or
+        "internal sessions" in text_blob or
+        "identity_anomaly" in text_blob or
+        "source=auth_service" in text_blob or
+        "east_west_fanout" in text_blob
+    ):
+        return {"threat_type": "lateral_movement", "target_node": "payment_service", "action": "isolate"}
+    if "failed logins" in text_blob or "password spray" in text_blob:
+        return {"threat_type": "brute_force", "target_node": "auth_service", "action": "rate_limit"}
+    return {"threat_type": "benign", "target_node": "api_gateway", "action": "monitor"}
+def _complete_evidence_fusion(
+    call_tool: Callable[[str, str], Dict[str, Any]],
+    results: List[Dict[str, Any]],
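+) -> None:
+    """Run any planned tool calls for the inferred threat that are still missing; appends to results in place."""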
+    belief = infer_threat_from_tool_results(results)
+    threat = belief["threat_type"]
+    if threat == "benign":
+        return
+    called = {
+        (str(result.get("tool", "")), str(result.get("node", "")))
+        for result in results
+    }
+    for tool_name, node in THREAT_TOOL_PLAN.get(threat, []):
+        if (tool_name, node) not in called:
+            results.append(call_tool(tool_name, node))
+def attach_tool_results(obs: Dict[str, Any], tool_results: List[Dict[str, Any]]) -> Dict[str, Any]:
+    """Return a shallow copy of obs with tool_results attached; the input dict is not modified."""
+    updated = dict(obs)
+    updated["tool_results"] = tool_results
+    return updated
+def summarize_tool_results(tool_results: List[Dict[str, Any]]) -> str:
+    if not tool_results:
+        return "No SOC tools queried for this turn."
+    lines = []
+    for result in tool_results:
+        lines.append(json.dumps(_compact_result(result), separators=(",", ":")))
+    return "\n".join(lines)
+def http_post(env_base_url: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
+    """POST a JSON payload to env_base_url + path and return the decoded JSON response."""
+    url = f"{env_base_url.rstrip('/')}{path}"
+    req = urllib.request.Request(
+        url,
+        data=json.dumps(payload).encode(),
+        headers={"Content-Type": "application/json"},
+    )
+    with urllib.request.urlopen(req, timeout=60) as response:
+        return json.loads(response.read())
+def _compact_result(result: Dict[str, Any]) -> Dict[str, Any]:
+    """Keep only the listed fields that are present in the result."""
+    keep = [
+        "tool",
+        "node",
+        "evidence_type",
+        "verified",
+        "confidence",
+        "events",
+        "containment",
+        "persistence",
+        "beaconing",
+        "criticality",
+        "dependencies",
+        "risk",
+        "finding",
+        "recommended_mitigation",
+        "safe_actions",
+    ]
+    return {key: result[key] for key in keep if key in result}
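
A minimal sketch of how the rule order in `classify_from_metrics` resolves an observation that trips more than one threshold. The function body is copied from the diff above; the `nodes` values are invented for illustration and do not come from the environment.

```python
from typing import Any, Dict


def classify_from_metrics(network_nodes: Dict[str, Dict[str, Any]]) -> str:
    """Same rules as soc_tools.py above: checks run top to bottom, first match wins."""
    auth = network_nodes.get("auth_service", {})
    payment = network_nodes.get("payment_service", {})
    database = network_nodes.get("database", {})
    gateway = network_nodes.get("api_gateway", {})
    if float(auth.get("error_rate", 0.0)) >= 0.10:
        return "brute_force"
    if payment.get("status") == "suspicious" or float(payment.get("cpu", 0)) >= 55:
        return "lateral_movement"
    if float(database.get("outbound_mb", 0)) >= 50:
        return "exfiltration"
    if gateway.get("status") == "suspicious":
        return "supply_chain"
    return "benign"


# Hypothetical metrics: payment CPU and database egress are both over threshold.
nodes = {
    "auth_service": {"error_rate": 0.02},
    "payment_service": {"status": "healthy", "cpu": 60},
    "database": {"outbound_mb": 120},
}
label = classify_from_metrics(nodes)
print(label)  # lateral_movement, because the payment rule is checked before the egress rule
```

Because the checks short-circuit, an observation that shows both signals is labelled `lateral_movement` only; a caller that also cares about the egress signal would need to inspect the metrics itself.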

tests/test_regression.py ADDED

	@@ -0,0 +1,352 @@

+import sys
+import unittest
+from pathlib import Path
+REPO_ROOT = Path(__file__).resolve().parents[1]
+PACKAGE_ROOT = REPO_ROOT / "adaptshield"
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+if str(PACKAGE_ROOT) not in sys.path:
+    sys.path.insert(0, str(PACKAGE_ROOT))
+import __init__ as adaptshield
+import server.app as server_app
+import train as train_module
+from client import AdaptshieldEnv
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+from server.grader import normalize_episode_score, _required_tool_fusion
+class PackageRegressionTests(unittest.TestCase):
+    def test_package_import_exports_expected_symbols(self) -> None:
+        self.assertIn("AdaptShieldAction", adaptshield.__all__)
+        self.assertIn("AdaptShieldObservation", adaptshield.__all__)
+        self.assertIn("AdaptshieldEnv", adaptshield.__all__)
+    def test_server_app_imports_fastapi_instance(self) -> None:
+        self.assertEqual(server_app.app.__class__.__name__, "FastAPI")
+class EnvironmentRegressionTests(unittest.TestCase):
+    def test_phase_flow_accepts_both_action_shapes(self) -> None:
+        env = AdaptShieldEnvironment("direct-triage")
+        phase1_obs = env.reset()
+        self.assertEqual(phase1_obs.phase, 1)
+        self.assertEqual(phase1_obs.turn, 1)
+        self.assertEqual(phase1_obs.metadata["normalized_score"], 0.50)
+        self.assertIn("mission_profile", phase1_obs.metadata)
+        self.assertEqual(phase1_obs.metadata["world_split"], "train")
+        self.assertIn(phase1_obs.metadata["world_family"], {"train-a", "train-b"})
+        phase2_obs = env.step(
+            AdaptShieldAction(
+                threat_type="brute_force",
+                confidence=0.9,
+                target_node="auth_service",
+                recommended_action="rate_limit",
+            )
+        )
+        self.assertEqual(phase2_obs.phase, 2)
+        self.assertEqual(phase2_obs.phase1_assessment["recommended_action"], "rate_limit")
+        next_turn_obs = env.step(
+            AdaptShieldAction(action="rate_limit", target_node="auth_service")
+        )
+        self.assertEqual(next_turn_obs.phase, 1)
+        self.assertGreaterEqual(next_turn_obs.reward, 0.65)
+        self.assertIn("requires stronger SOC evidence", next_turn_obs.last_action_result)
+        self.assertIn("business_impact", next_turn_obs.metadata["score_breakdown"])
+        self.assertIn("dependency_blast_radius", next_turn_obs.metadata["score_breakdown"])
+        self.assertIn("mission_alignment", next_turn_obs.metadata["score_breakdown"])
+        self.assertIn("active_defenses", next_turn_obs.metadata)
+        self.assertIn("available_tools", next_turn_obs.metadata)
+        tool_names = {tool["name"] for tool in next_turn_obs.metadata["available_tools"]}
+        self.assertTrue({
+            "identity_lookup",
+            "change_calendar_lookup",
+            "netflow_lookup",
+        }.issubset(tool_names))
+        env = AdaptShieldEnvironment("direct-triage")
+        env.reset()
+        env.call_tool("log_search", node="auth_service")
+        env.step(
+            AdaptShieldAction(
+                threat_type="brute_force",
+                confidence=0.9,
+                target_node="auth_service",
+                recommended_action="rate_limit",
+            )
+        )
+        verified_obs = env.step(
+            AdaptShieldAction(action="rate_limit", target_node="auth_service")
+        )
+        self.assertGreaterEqual(verified_obs.reward, 0.9)
+        self.assertIn("Optimal: rate_limit", verified_obs.last_action_result)
+    def test_client_payload_omits_empty_metadata_and_serializes_enums(self) -> None:
+        client = AdaptshieldEnv(base_url="http://localhost:7860")
+        phase1_payload = client._step_payload(
+            AdaptShieldAction(
+                threat_type="benign",
+                confidence=0.8,
+                target_node="auth_service",
+                recommended_action="monitor",
+            )
+        )
+        self.assertEqual(
+            phase1_payload,
+            {
+                "threat_type": "benign",
+                "confidence": 0.8,
+                "target_node": "auth_service",
+                "recommended_action": "monitor",
+            },
+        )
+        phase2_payload = client._step_payload(
+            AdaptShieldAction(action="rate_limit", target_node="auth_service")
+        )
+        self.assertEqual(
+            phase2_payload,
+            {"action": "rate_limit", "target_node": "auth_service"},
+        )
+    def test_hard_task_records_verified_tool_evidence(self) -> None:
+        env = AdaptShieldEnvironment("polymorphic-zero-day")
+        for _ in range(8):
+            obs = env.reset()
+            turn_config = dict(getattr(env, "_turn_config", {}) or {})
+            if not turn_config.get("is_benign", False):
+                break
+        else:
+            self.fail("Expected a non-benign hard-task reset within 8 attempts")
+        self.assertIn("available_tools", obs.metadata)
+        self.assertNotIn("foothold_established", obs.metadata)
+        target = str(turn_config.get("correct_target", "auth_service"))
+        for tool_name in sorted(_required_tool_fusion("polymorphic-zero-day", str(turn_config.get("strategy", "benign")))):
+            tool_result = env.call_tool(tool_name, node=target)
+            self.assertNotIn("verified", tool_result)
+            self.assertNotIn("evidence_type", tool_result)
+            self.assertTrue(tool_result.get("result_summary"))
+        env.step(
+            AdaptShieldAction(
+                threat_type=turn_config.get("strategy", "brute_force"),
+                confidence=0.9,
+                target_node=target,
+                recommended_action=turn_config.get("correct_action", "monitor"),
+            )
+        )
+        obs = env.step(
+            AdaptShieldAction(
+                action=turn_config.get("correct_action", "monitor"),
+                target_node=target,
+            )
+        )
+        breakdown = obs.metadata["score_breakdown"]
+        self.assertTrue(breakdown["tool_verification_required"])
+        self.assertTrue(breakdown["tool_evidence_found"])
+        self.assertGreaterEqual(obs.reward, 0.65)
+    def test_enterprise_context_tools_return_public_fields_only(self) -> None:
+        env = AdaptShieldEnvironment("polymorphic-zero-day")
+        env.reset()
+        identity = env.call_tool("identity_lookup", node="payment_service")
+        self.assertIn("account", identity)
+        self.assertIn("recent_source_host", identity)
+        self.assertNotIn("verified", identity)
+        self.assertNotIn("evidence_type", identity)
+        change = env.call_tool("change_calendar_lookup", node="api_gateway")
+        self.assertIn("scheduled", change)
+        self.assertIn("change_status", change)
+        self.assertNotIn("verified", change)
+        self.assertNotIn("evidence_type", change)
+        netflow = env.call_tool("netflow_lookup", node="database")
+        self.assertIn("traffic_pattern", netflow)
+        self.assertIn("east_west_connections", netflow)
+        self.assertNotIn("verified", netflow)
+        self.assertNotIn("evidence_type", netflow)
+    def test_dual_pivot_requires_tool_confirmation_after_pivot(self) -> None:
+        env = AdaptShieldEnvironment("dual-pivot", operational_mode="containment_first")
+        env.reset()
+        for _ in range(3):
+            turn_config = dict(getattr(env, "_turn_config", {}) or {})
+            env.step(
+                AdaptShieldAction(
+                    threat_type=str(turn_config.get("strategy", "brute_force")),
+                    confidence=0.9,
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                    recommended_action=str(turn_config.get("correct_action", "monitor")),
+                )
+            )
+            obs = env.step(
+                AdaptShieldAction(
+                    action=str(turn_config.get("correct_action", "monitor")),
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                )
+            )
+            self.assertFalse(obs.done)
+        turn_config = dict(getattr(env, "_turn_config", {}) or {})
+        self.assertEqual(turn_config.get("strategy"), "lateral_movement")
+        target = str(turn_config.get("correct_target", "payment_service"))
+        env.step(
+            AdaptShieldAction(
+                threat_type="lateral_movement",
+                confidence=0.9,
+                target_node=target,
+                recommended_action=str(turn_config.get("correct_action", "isolate")),
+            )
+        )
+        obs = env.step(
+            AdaptShieldAction(
+                action=str(turn_config.get("correct_action", "isolate")),
+                target_node=target,
+            )
+        )
+        self.assertTrue(obs.metadata["score_breakdown"]["tool_verification_required"])
+        self.assertFalse(obs.metadata["score_breakdown"]["tool_evidence_found"])
+        self.assertIn("requires stronger SOC evidence", obs.last_action_result)
+        env = AdaptShieldEnvironment("dual-pivot", operational_mode="containment_first")
+        env.reset()
+        for _ in range(3):
+            turn_config = dict(getattr(env, "_turn_config", {}) or {})
+            env.step(
+                AdaptShieldAction(
+                    threat_type=str(turn_config.get("strategy", "brute_force")),
+                    confidence=0.9,
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                    recommended_action=str(turn_config.get("correct_action", "monitor")),
+                )
+            )
+            env.step(
+                AdaptShieldAction(
+                    action=str(turn_config.get("correct_action", "monitor")),
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                )
+            )
+        turn_config = dict(getattr(env, "_turn_config", {}) or {})
+        target = str(turn_config.get("correct_target", "payment_service"))
+        env.call_tool("edr_status", node=target)
+        env.call_tool("log_search", node=target)
+        env.call_tool("identity_lookup", node=target)
+        env.step(
+            AdaptShieldAction(
+                threat_type="lateral_movement",
+                confidence=0.9,
+                target_node=target,
+                recommended_action=str(turn_config.get("correct_action", "isolate")),
+            )
+        )
+        obs = env.step(
+            AdaptShieldAction(
+                action=str(turn_config.get("correct_action", "isolate")),
+                target_node=target,
+            )
+        )
+        self.assertTrue(obs.metadata["score_breakdown"]["tool_verification_required"])
+        self.assertTrue(obs.metadata["score_breakdown"]["tool_evidence_found"])
+        self.assertIn("Optimal: isolate", obs.last_action_result)
+    def test_world_family_metadata_and_surfaces_are_selectable(self) -> None:
+        env = AdaptShieldEnvironment(
+            "direct-triage",
+            world_split="eval",
+            world_family="eval-x",
+        )
+        obs = env.reset()
+        self.assertEqual(obs.metadata["world_split"], "eval")
+        self.assertEqual(obs.metadata["world_family"], "eval-x")
+        alerts_blob = " ".join(obs.active_alerts).lower()
+        self.assertTrue(
+            "auth rejection burst" in alerts_blob or
+            "credential reuse sweep" in alerts_blob
+        )
+    def test_operational_modes_change_medium_and_hard_optimal_actions(self) -> None:
+        medium_env = AdaptShieldEnvironment(
+            "dual-pivot",
+            operational_mode="evidence_preservation",
+            world_family="train-b",
+        )
+        medium_env.reset()
+        for _ in range(3):
+            turn_config = dict(getattr(medium_env, "_turn_config", {}) or {})
+            medium_env.step(
+                AdaptShieldAction(
+                    threat_type=str(turn_config.get("strategy", "brute_force")),
+                    confidence=0.9,
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                    recommended_action=str(turn_config.get("correct_action", "monitor")),
+                )
+            )
+            medium_env.step(
+                AdaptShieldAction(
+                    action=str(turn_config.get("correct_action", "monitor")),
+                    target_node=str(turn_config.get("correct_target", "auth_service")),
+                )
+            )
+        self.assertEqual(getattr(medium_env, "_turn_config", {}).get("strategy"), "lateral_movement")
+        self.assertEqual(getattr(medium_env, "_turn_config", {}).get("correct_action"), "honeypot")
+        hard_env = AdaptShieldEnvironment(
+            "polymorphic-zero-day",
+            operational_mode="forensic_hold",
+            world_family="eval-y",
+        )
+        hard_obs = hard_env.reset()
+        adjusted = hard_env._apply_operational_mode({
+            "strategy": "exfiltration",
+            "attack_stage": "exploit",
+            "is_benign": False,
+            "correct_action": "isolate",
+            "correct_target": "database",
+            "network_nodes": {"payment_service": {"status": "healthy", "request_rate": 85}},
+            "active_alerts": [],
+        })
+        self.assertEqual(hard_obs.metadata["operational_mode"], "forensic_hold")
+        self.assertEqual(adjusted.get("correct_action"), "honeypot")
+    def test_prompt_bank_builds_phase_rows_without_gpu_deps(self) -> None:
+        rows = train_module.build_prompt_bank(
+            tokenizer=None,
+            selected_task="all",
+            curriculum=True,
+            rollout_episodes=3,
+            max_steps=6,
+            use_tools=True,
+            seed=42,
+        )
+        self.assertTrue(rows)
+        phases = {int(row["phase"]) for row in rows}
+        tasks = {str(row["task"]) for row in rows}
+        self.assertIn(1, phases)
+        self.assertIn(2, phases)
+        self.assertTrue(tasks.intersection({"direct-triage", "dual-pivot", "polymorphic-zero-day"}))
+        hard_rows = [row for row in rows if row["task"] == "polymorphic-zero-day"]
+        self.assertTrue(hard_rows)
+        self.assertTrue(any(int(row["tool_calls"]) >= 2 for row in hard_rows))
+    def test_normalized_score_uses_reachable_reward_ceiling(self) -> None:
+        rewards = [0.99] * 10
+        self.assertEqual(normalize_episode_score(rewards), 0.99)
+if __name__ == "__main__":
+    unittest.main()
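
The payload test above expects Phase 1 and Phase 2 actions to serialize to different key sets, with unset fields left out. The following is a rough sketch of that behavior under stated assumptions: `SketchAction` and `step_payload` are hypothetical stand-ins written for this note. The real `AdaptShieldAction` model and `AdaptshieldEnv._step_payload` are defined elsewhere in the commit and may work differently.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchAction:
    """Hypothetical stand-in for AdaptShieldAction; field names follow the tests."""
    threat_type: Optional[str] = None
    confidence: Optional[float] = None
    target_node: Optional[str] = None
    recommended_action: Optional[str] = None
    action: Optional[str] = None


def step_payload(action: SketchAction) -> Dict[str, Any]:
    """Drop every field that was left unset so each phase sends only its own keys."""
    return {key: value for key, value in asdict(action).items() if value is not None}


# Phase 1 shape: a threat assessment with a recommended action.
phase1 = step_payload(
    SketchAction(
        threat_type="benign",
        confidence=0.8,
        target_node="auth_service",
        recommended_action="monitor",
    )
)

# Phase 2 shape: only the executed action and its target.
phase2 = step_payload(SketchAction(action="rate_limit", target_node="auth_service"))

print(sorted(phase1))
print(sorted(phase2))
```

Filtering on `None` keeps a legitimate `confidence=0.0` in the payload, which a plain truthiness check would drop.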

tool_baseline.py ADDED

	@@ -0,0 +1,128 @@

+#!/usr/bin/env python3
+"""Tool-aware AdaptShield baseline for world-modeling demos."""
+from __future__ import annotations
+import argparse
+import json
+from typing import Any, Dict, List
+from baseline import (
+    BENCHMARK,
+    MAX_STEPS,
+    POLICY,
+    TASKS,
+    action_from_payload,
+    log_end,
+    log_step,
+    phase1_payload as no_tool_phase1_payload,
+    phase2_payload as no_tool_phase2_payload,
+    print_replay,
+)
+from server.adaptshield_environment import AdaptShieldEnvironment
+from soc_tools import infer_threat_from_tool_results, investigate_local
+MODEL_NAME = "tool-aware-baseline"
+def log_start(task: str) -> None:
+    print(f"[START] task={task} env={BENCHMARK} model={MODEL_NAME}", flush=True)
+def phase2_payload(obs: Any, belief_by_turn: Dict[int, Dict[str, str]]) -> Dict[str, Any]:
+    """Use belief inferred from observable SOC tool evidence when Phase 2 is ambiguous."""
+    belief = belief_by_turn.get(int(obs.turn), {})
+    if obs.task_name == "polymorphic-zero-day" and belief:
+        return {
+            "action": belief["action"],
+            "target_node": belief["target_node"],
+            "reasoning": "inferred from observable SOC tool fields",
+        }
+    return no_tool_phase2_payload(obs)
+def phase1_payload(obs: Any, belief_by_turn: Dict[int, Dict[str, str]]) -> Dict[str, Any]:
+    """Use tool-derived belief in Phase 1 so the baseline is tool-aware end to end."""
+    belief = belief_by_turn.get(int(obs.turn), {})
+    if obs.task_name == "polymorphic-zero-day" and belief:
+        return {
+            "threat_type": belief["threat_type"],
+            "confidence": 0.86,
+            "target_node": belief["target_node"],
+            "recommended_action": belief["action"],
+            "reasoning": "classified from observable SOC tool fields",
+        }
+    return no_tool_phase1_payload(obs)
+def run_task(task: str, emit_logs: bool = True) -> Dict[str, Any]:
+    """Run one episode of the task with the tool-aware policy and return a result summary."""
+    env = AdaptShieldEnvironment(task_name=task)
+    obs = env.reset()
+    rewards: List[float] = []
+    steps = 0
+    belief_by_turn: Dict[int, Dict[str, str]] = {}
+    if emit_logs:
+        log_start(task)
+    while not obs.done and steps < MAX_STEPS:
+        if obs.phase == 1:
+            tool_results = investigate_local(env, obs, use_tools=True)
+            belief_by_turn[int(obs.turn)] = infer_threat_from_tool_results(tool_results)
+            payload = phase1_payload(obs, belief_by_turn)
+        else:
+            payload = phase2_payload(obs, belief_by_turn)
+        obs = env.step(action_from_payload(payload))
+        reward = float(obs.reward)
+        rewards.append(reward)
+        steps += 1
+        if emit_logs:
+            log_step(steps, payload, reward, obs.done)
+    metadata = obs.metadata if isinstance(obs.metadata, dict) else {}
+    score = float(metadata.get("normalized_score", 0.01))
+    success = obs.done and 0.01 <= score <= 0.99
+    if emit_logs:
+        log_end(success, steps, score, rewards)
+        tool_trace = metadata.get("tool_trace") or []
+        print(f"[TOOLS] calls={len(tool_trace)} trace={json.dumps(tool_trace[-4:], separators=(',', ':'))}")
+    return {
+        "task": task,
+        "score": score,
+        "steps": steps,
+        "done": bool(obs.done),
+        "rewards": rewards,
+        "metadata": metadata,
+        "normalized_score_present": "normalized_score" in metadata,
+        "success": success,
+    }
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Run AdaptShield tool-aware baseline.")
+    parser.add_argument("--task", default="polymorphic-zero-day", choices=TASKS + ["all"])
+    parser.add_argument("--replay", action="store_true")
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    tasks = TASKS if args.task == "all" else [args.task]
+    for index, task in enumerate(tasks):
+        if index:
+            print()
+        result = run_task(task, emit_logs=True)
+        if args.replay:
+            print_replay(task, result["metadata"])
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
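
The baseline above stores the belief inferred in Phase 1 under the turn number and reads it back in Phase 2, where raw network data is no longer visible. A simplified sketch of that handoff follows. `infer_belief` is a hypothetical two-rule reduction of `infer_threat_from_tool_results`, the sample tool result is invented, and the Phase 2 fallback to `monitor` is an assumption (the real code defers to `baseline.phase2_payload`, which is defined in `baseline.py`).

```python
from typing import Any, Dict, List

Belief = Dict[str, str]


def infer_belief(results: List[Dict[str, Any]]) -> Belief:
    """Hypothetical two-rule reduction of infer_threat_from_tool_results."""
    text = " ".join(str(r.get("result_summary", "")) for r in results).lower()
    if "egress exceeds" in text:
        return {"threat_type": "exfiltration", "target_node": "database", "action": "honeypot"}
    if "failed logins" in text:
        return {"threat_type": "brute_force", "target_node": "auth_service", "action": "rate_limit"}
    return {"threat_type": "benign", "target_node": "api_gateway", "action": "monitor"}


belief_by_turn: Dict[int, Belief] = {}


def phase1(turn: int, tool_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Infer a belief from tool output, cache it by turn, and emit the assessment."""
    belief = infer_belief(tool_results)
    belief_by_turn[turn] = belief
    return {
        "threat_type": belief["threat_type"],
        "confidence": 0.86,
        "target_node": belief["target_node"],
        "recommended_action": belief["action"],
    }


def phase2(turn: int) -> Dict[str, str]:
    """Reuse the cached belief for this turn; the monitor fallback is an assumption."""
    belief = belief_by_turn.get(turn)
    if not belief:
        return {"action": "monitor", "target_node": "api_gateway"}
    return {"action": belief["action"], "target_node": belief["target_node"]}


# Invented tool output for illustration only.
sample = [{"tool": "netflow_lookup", "node": "database",
           "result_summary": "Egress exceeds 7-day baseline"}]
assessment = phase1(1, sample)
execution = phase2(1)
print(assessment["threat_type"], execution)
```

Keying the cache by turn keeps a stale belief from an earlier turn from leaking into a later Phase 2 step.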

train.py ADDED

	@@ -0,0 +1,1332 @@

+#!/usr/bin/env python3
+"""AdaptShield onsite GPU training harness with safe local fallback."""
+from __future__ import annotations
+import argparse
+import inspect
+import json
+import os
+import random
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Tuple
+REPO_ROOT = Path(__file__).resolve().parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+from train_smoke import TASKS, run_smoke_training
+from plot_training import plot as plot_training_output
+from soc_tools import attach_tool_results, investigate_local_with_depth, summarize_tool_results
+MODEL_CHOICES = {
+    "0.5b": "unsloth/Qwen2.5-0.5B-Instruct",
+    "1.5b": "unsloth/Qwen2.5-1.5B-Instruct",
+    "3b": "unsloth/Qwen2.5-3B-Instruct",
+    "7b": "unsloth/Qwen2.5-7B-Instruct",
+}
+DEFAULT_MODEL = "1.5b"
+MAX_SEQ_LEN = 2048
+LORA_RANK = 16
+CURRICULUM_STAGES = [
+    ("direct-triage", 0.30),
+    ("dual-pivot", 0.40),
+    ("polymorphic-zero-day", 0.30),
+]
+PHASE1_SYS = """You are a Threat Analyst for a 4-node enterprise network.
+Analyze SIEM metrics and alerts. Identify the threat.
+Attack strategies: brute_force, lateral_movement, exfiltration, supply_chain, benign
+Nodes: auth_service, payment_service, database, api_gateway
+Actions: rate_limit, isolate, honeypot, patch, monitor
+If SOC tool evidence is provided, use it to update your belief before classifying.
+Respond ONLY with valid JSON:
+{"threat_type":"...","confidence":0.0,"target_node":"...","recommended_action":"...","reasoning":"..."}"""
+PHASE2_SYS = """You are a Tactical Executor. Act only on the analyst handoff.
+You cannot see raw network data in Phase 2.
+Use the analyst handoff plus any SOC tool trace from this turn.
+Actions: rate_limit, isolate, honeypot, patch, monitor
+Nodes: auth_service, payment_service, database, api_gateway
+Respond ONLY with valid JSON:
+{"action":"...","target_node":"...","reasoning":"..."}"""
+def obs_to_dict(obs: Any) -> Dict[str, Any]:
+    if hasattr(obs, "model_dump"):
+        return obs.model_dump(mode="json")
+    return dict(obs)
+def make_phase1_prompt(obs: Dict[str, Any]) -> str:
+    return "\n".join([
+        "Network nodes:",
+        json.dumps(obs.get("network_nodes", {}), indent=2),
+        "",
+        "Active alerts:",
+        "\n".join(obs.get("active_alerts", [])),
+        "",
+        "SOC tool evidence:",
+        summarize_tool_results(obs.get("tool_results", [])),
+        "",
+        "Recent history:",
+        json.dumps(obs.get("history", [])[-3:], indent=2),
+        "",
+        "Classify the threat:",
+    ])
+def make_phase2_prompt(obs: Dict[str, Any]) -> str:
+    metadata = obs.get("metadata", {}) if isinstance(obs.get("metadata", {}), dict) else {}
+    current_turn = int(obs.get("turn", 0) or 0)
+    tool_trace = [
+        row for row in metadata.get("tool_trace", [])
+        if int(row.get("turn", -1)) == current_turn
+    ]
+    return "\n".join([
+        "Threat assessment from analyst:",
+        json.dumps(obs.get("phase1_assessment", {}), indent=2),
+        "",
+        "SOC tool trace for this turn:",
+        json.dumps(tool_trace, indent=2),
+        "",
+        "Choose the defensive action:",
+    ])
+def build_messages(obs: Dict[str, Any]) -> List[Dict[str, str]]:
+    if int(obs.get("phase", 1)) == 1:
+        return [
+            {"role": "system", "content": PHASE1_SYS},
+            {"role": "user", "content": make_phase1_prompt(obs)},
+        ]
+    return [
+        {"role": "system", "content": PHASE2_SYS},
+        {"role": "user", "content": make_phase2_prompt(obs)},
+    ]
+def task_for_episode(
+    episode: int,
+    total_episodes: int,
+    selected_task: str,
+    curriculum: bool,
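+) -> Tuple[str, str]:
+    """Return (task, schedule_label) for an episode: fixed or round-robin without curriculum, else by cumulative stage fraction."""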
+    if not curriculum:
+        if selected_task == "all":
+            task = TASKS[(episode - 1) % len(TASKS)]
+            return task, "round_robin"
+        return selected_task, "fixed"
+    progress = episode / max(1, total_episodes)
+    cumulative = 0.0
+    for task, fraction in CURRICULUM_STAGES:
+        cumulative += fraction
+        if progress <= cumulative:
+            return task, f"curriculum:{task}"
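+    # Progress went past the summed fractions (for example, episode > total_episodes): stay on the last stage.
+    return CURRICULUM_STAGES[-1][0], f"curriculum:{CURRICULUM_STAGES[-1][0]}"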
+def save_metrics(
+    output_dir: Path,
+    rows: List[Dict[str, Any]],
+    model_name: str,
+    episodes: int,
+    curriculum: bool,
+    use_tools: bool,
+    trainer: str = "pg",
+    evaluation_rows: List[Dict[str, Any]] | None = None,
+    heldout_evaluation_rows: List[Dict[str, Any]] | None = None,
+    prompt_bank_size: int = 0,
+    extra: Dict[str, Any] | None = None,
+) -> Path:
+    output_dir.mkdir(parents=True, exist_ok=True)
+    best_score = max((float(row["score"]) for row in rows), default=0.0)
+    metrics_path = output_dir / "metrics.json"
+    payload = {
+        "model": model_name,
+        "episodes": episodes,
+        "curriculum": curriculum,
+        "curriculum_stages": CURRICULUM_STAGES,
+        "use_tools": use_tools,
+        "trainer": trainer,
+        "rows": rows,
+        "best_score": best_score,
+    }
+    if evaluation_rows is not None:
+        payload["evaluation_rows"] = evaluation_rows
+    if heldout_evaluation_rows is not None:
+        payload["heldout_evaluation_rows"] = heldout_evaluation_rows
+    if prompt_bank_size:
+        payload["prompt_bank_size"] = prompt_bank_size
+    if extra:
+        payload.update(extra)
+    metrics_path.write_text(json.dumps(payload, indent=2))
+    return metrics_path
+def maybe_plot(metrics_path: Path, output_dir: Path) -> None:
+    try:
+        plot_training_output(metrics_path, output_dir / "reward_curve.png")
+    except Exception as exc:
+        print(f"Plot generation skipped: {exc}")
+def parse_response(text: str, phase: int) -> Dict[str, Any]:
+    """Parse model JSON. Invalid output becomes a safe phase-correct action."""
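+    if "```" in text:
+        parts = text.split("```")
+        # Odd-indexed parts sit inside fences; try them before surrounding prose.
+        for part in parts[1::2] + parts[0::2]:
+            if "{" in part:
+                text = part.strip().removeprefix("json").strip()
+                break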
+    try:
+        parsed = json.loads(text)
+        if phase == 1:
+            return {
+                "threat_type": str(parsed.get("threat_type", "brute_force")),
+                "confidence": float(parsed.get("confidence", 0.5)),
+                "target_node": str(parsed.get("target_node", "auth_service")),
+                "recommended_action": str(parsed.get("recommended_action", "monitor")),
+                "reasoning": str(parsed.get("reasoning", "")),
+            }
+        return {
+            "action": str(parsed.get("action", "monitor")),
+            "target_node": str(parsed.get("target_node", "auth_service")),
+            "reasoning": str(parsed.get("reasoning", "")),
+        }
+    except Exception:
+        if phase == 1:
+            return {
+                "threat_type": "brute_force",
+                "confidence": 0.5,
+                "target_node": "auth_service",
+                "recommended_action": "monitor",
+                "reasoning": "parse_error",
+            }
+        return {
+            "action": "monitor",
+            "target_node": "auth_service",
+            "reasoning": "parse_error",
+        }
+def render_messages(messages: List[Dict[str, str]], tokenizer: Any | None = None) -> str:
+    if tokenizer is not None and hasattr(tokenizer, "apply_chat_template"):
+        return tokenizer.apply_chat_template(
+            messages,
+            tokenize=False,
+            add_generation_prompt=True,
+        )
+    return "\n\n".join(
+        f"{message.get('role', 'user').upper()}:\n{message.get('content', '')}"
+        for message in messages
+    )
+def generate_response(model: Any, tokenizer: Any, messages: List[Dict[str, str]]) -> Tuple[str, str]:
+    import torch
+    prompt = render_messages(messages, tokenizer=tokenizer)
+    device = getattr(model, "device", None)
+    if device is None:
+        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    inputs = tokenizer(prompt, return_tensors="pt").to(device)
+    pad_token_id = (
+        tokenizer.pad_token_id
+        if getattr(tokenizer, "pad_token_id", None) is not None
+        else tokenizer.eos_token_id
+    )
+    with torch.no_grad():
+        _normalize_generation_config(model)
+        output_ids = model.generate(
+            **inputs,
+            max_new_tokens=220,
+            temperature=0.7,
+            do_sample=True,
+            pad_token_id=pad_token_id,
+        )
+    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
+    response = tokenizer.decode(new_ids, skip_special_tokens=True).strip()
+    return prompt, response
+def _current_reference(env: AdaptShieldEnvironment) -> Dict[str, Any]:
+    turn_config = dict(getattr(env, "_turn_config", {}) or {})
+    is_benign = bool(turn_config.get("is_benign", False))
+    threat_type = "benign" if is_benign else str(turn_config.get("strategy", "benign"))
+    target_node = str(turn_config.get("correct_target", "auth_service"))
+    expected_action = str(turn_config.get("correct_action", "monitor"))
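+    # Resolve the stage lazily so the attacker is only consulted when the turn
+    # config does not carry an explicit attack_stage.
+    stage = turn_config.get("attack_stage")
+    if stage is None:
+        stage_source = getattr(getattr(env, "_attacker", None), "current_stage", None)
+        stage = stage_source() if callable(stage_source) else (stage_source or "recon")
+    return {
+        "threat_type": threat_type,
+        "target_node": target_node,
+        "expected_action": expected_action,
+        "stage": str(stage),
+        "is_benign": is_benign,
+    }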
+def _align_trainable_dtypes(model: Any, target_dtype: Any | None = None) -> str:
+    """Keep LoRA/trainable params on the same compute dtype as the main model.
+    Some adapter checkpoints reload trainable LoRA weights as float32, while
+    Unsloth GRPO kernels run activations in float16/bfloat16. That mismatch
+    trips fast_lora matmuls at runtime. We fix only trainable floating params.
+    """
+    import torch
+    if target_dtype is None:
+        for param in model.parameters():
+            if param.is_floating_point() and not param.requires_grad:
+                target_dtype = param.dtype
+                break
+    if target_dtype is None:
+        for param in model.parameters():
+            if param.is_floating_point():
+                target_dtype = param.dtype
+                break
+    if target_dtype is None:
+        return "no-floating-params"
+    converted = 0
+    for param in model.parameters():
+        if param.requires_grad and param.is_floating_point() and param.dtype != target_dtype:
+            param.data = param.data.to(target_dtype)
+            converted += 1
+    for buffer_name, buffer in model.named_buffers():
+        if "lora_" in buffer_name and buffer.is_floating_point() and buffer.dtype != target_dtype:
+            buffer.data = buffer.data.to(target_dtype)
+    if getattr(model, "generation_config", None) is not None:
+        _normalize_generation_config(model)
+    return f"{target_dtype} ({converted} trainable params aligned)"
+def _normalize_generation_config(model: Any) -> None:
+    """Unset generation_config.max_length; generate() is called with max_new_tokens instead."""
+    generation_config = getattr(model, "generation_config", None)
+    if generation_config is None:
+        return
+    for field in ("max_length",):
+        try:
+            setattr(generation_config, field, None)
+        except Exception:
+            continue
+def _load_training_model_and_tokenizer(
+    model_name: str,
+    model_key: str,
+    max_seq_length: int,
+    compute_dtype: Any,
+    seed: int,
+):
+    from unsloth import FastLanguageModel
+    adapter_path = model_name if _looks_like_adapter_path(model_name) else ""
+    base_model_name = MODEL_CHOICES[model_key] if adapter_path else model_name
+    model, tokenizer = FastLanguageModel.from_pretrained(
+        model_name=base_model_name,
+        max_seq_length=max_seq_length,
+        load_in_4bit=True,
+        dtype=compute_dtype,
+    )
+    if adapter_path:
+        from peft import PeftModel
+        model = PeftModel.from_pretrained(
+            model,
+            adapter_path,
+            is_trainable=True,
+            autocast_adapter_dtype=False,
+        )
+        try:
+            from transformers import AutoTokenizer
+            tokenizer = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)
+        except Exception:
+            pass
+    else:
+        model = FastLanguageModel.get_peft_model(
+            model,
+            r=LORA_RANK,
+            target_modules=[
+                "q_proj", "k_proj", "v_proj", "o_proj",
+                "gate_proj", "up_proj", "down_proj",
+            ],
+            lora_alpha=LORA_RANK * 2,
+            lora_dropout=0.0,
+            bias="none",
+            use_gradient_checkpointing="unsloth",
+            random_state=seed,
+        )
+    return model, tokenizer
+def _teacher_payload(phase: int, reference: Dict[str, Any]) -> Dict[str, Any]:
+    if phase == 1:
+        return {
+            "threat_type": reference["threat_type"],
+            "confidence": 0.92 if reference["threat_type"] != "benign" else 0.78,
+            "target_node": reference["target_node"],
+            "recommended_action": reference["expected_action"],
+            "reasoning": "reference policy",
+        }
+    return {
+        "action": reference["expected_action"],
+        "target_node": reference["target_node"],
+        "reasoning": "reference policy",
+    }
+def build_prompt_bank(
+    tokenizer: Any | None,
+    selected_task: str,
+    curriculum: bool,
+    rollout_episodes: int,
+    max_steps: int,
+    use_tools: bool,
+    seed: int,
+    world_split: str = "train",
+    world_family: str | None = None,
+    hard_multiplier: int = 2,
+    borderline_bonus: int = 1,
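+) -> List[Dict[str, Any]]:
+    """Collect labeled prompts by stepping the environment with the reference policy.
+    Each row pairs a rendered prompt with the expected labels for that turn.
+    Rows from harder tasks or borderline turns are duplicated to oversample them.
+    """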
+    random.seed(seed)
+    rows: List[Dict[str, Any]] = []
+    for episode in range(1, rollout_episodes + 1):
+        task, stage = task_for_episode(
+            episode=episode,
+            total_episodes=rollout_episodes,
+            selected_task=selected_task,
+            curriculum=curriculum,
+        )
+        env = AdaptShieldEnvironment(
+            task_name=task,
+            world_split=world_split,
+            world_family=world_family,
+        )
+        obs = env.reset()
+        step_count = 0
+        while not obs.done and step_count < max_steps:
+            phase = int(getattr(obs, "phase", 1))
+            tool_results = investigate_local_with_depth(
+                env,
+                obs,
+                use_tools=use_tools,
+                thorough=True,
+            )
+            obs_dict = attach_tool_results(obs_to_dict(obs), tool_results)
+            messages = build_messages(obs_dict)
+            reference = _current_reference(env)
+            rows.append({
+                "prompt": render_messages(messages, tokenizer=tokenizer),
+                "task": task,
+                "stage": stage,
+                "phase": phase,
+                "turn": int(getattr(obs, "turn", 0) or 0),
+                "attack_stage": reference["stage"],
+                "world_split": getattr(env, "_world_split", world_split),
+                "world_family": getattr(env, "_world_family", world_family or ""),
+                "operational_mode": getattr(env, "_operational_mode", ""),
+                "expected_threat_type": reference["threat_type"],
+                "expected_target_node": reference["target_node"],
+                "expected_recommended_action": reference["expected_action"] if phase == 1 else "",
+                "expected_action": reference["expected_action"] if phase == 2 else "",
+                "tool_calls": len(tool_results),
+                "history_length": len(obs_dict.get("history", [])),
+                "difficulty_tags": _difficulty_tags(
+                    task=task,
+                    phase=phase,
+                    attack_stage=reference["stage"],
+                    tool_calls=len(tool_results),
+                    handoff_quality=str((obs_dict.get("phase1_assessment") or {}).get("handoff_quality", "")),
+                ),
+            })
+            base_row = rows[-1]
+            for _ in range(_prompt_bank_extra_copies(
+                row=base_row,
+                hard_multiplier=hard_multiplier,
+                borderline_bonus=borderline_bonus,
+            )):
+                rows.append(dict(base_row))
+            obs = env.step(AdaptShieldAction(**_teacher_payload(phase, reference)))
+            step_count += 1
+    return rows
+def _difficulty_tags(
+    task: str,
+    phase: int,
+    attack_stage: str,
+    tool_calls: int,
+    handoff_quality: str,
+) -> List[str]:
+    tags: List[str] = []
+    if task == "polymorphic-zero-day":
+        tags.append("hard")
+    elif task == "dual-pivot":
+        tags.append("medium")
+    if phase == 2:
+        tags.append("phase2")
+    if attack_stage in {"exploit", "exfiltration"}:
+        tags.append("late_stage")
+    if tool_calls >= 3:
+        tags.append("tool_fusion")
+    if handoff_quality == "degraded":
+        tags.append("borderline")
+    return tags
+def _prompt_bank_extra_copies(
+    row: Dict[str, Any],
+    hard_multiplier: int,
+    borderline_bonus: int,
+) -> int:
+    tags = set(row.get("difficulty_tags", []) or [])
+    extra = 0
+    if row.get("task") == "polymorphic-zero-day":
+        extra += max(0, hard_multiplier - 1)
+    elif row.get("task") == "dual-pivot" and "late_stage" in tags:
+        extra += 1
+    if "borderline" in tags or ("phase2" in tags and "tool_fusion" in tags and "late_stage" in tags):
+        extra += max(0, borderline_bonus)
+    return extra
+def _completion_to_text(completion: Any) -> str:
+    if isinstance(completion, str):
+        return completion
+    if isinstance(completion, dict):
+        if "content" in completion:
+            return str(completion.get("content", ""))
+        if "text" in completion:
+            return str(completion.get("text", ""))
+    if isinstance(completion, list):
+        parts = []
+        for item in completion:
+            if isinstance(item, dict):
+                parts.append(str(item.get("content", item.get("text", ""))))
+            else:
+                parts.append(str(item))
+        return "".join(parts)
+    return str(completion)
+def _phase1_reward(
+    parsed: Dict[str, Any],
+    expected_threat_type: str,
+    expected_target_node: str,
+    expected_recommended_action: str,
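+) -> float:
+    """Score a Phase 1 assessment against the reference labels.
+    Starts at 0.08, then adds 0.36 (threat type), 0.20 (target node), and
+    0.18 (recommended action) for each match, plus 0.05 for a confidence in
+    [0, 1]. A correct threat type with confidence >= 0.65 earns 0.06 more; a
+    wrong one with confidence >= 0.80 loses 0.05, as does recommending
+    "monitor" for a non-benign threat. The result is clamped to [0.01, 0.99].
+    """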
+    reward = 0.08
+    if parsed.get("threat_type") == expected_threat_type:
+        reward += 0.36
+    if parsed.get("target_node") == expected_target_node:
+        reward += 0.20
+    if parsed.get("recommended_action") == expected_recommended_action:
+        reward += 0.18
+    try:
+        confidence = float(parsed.get("confidence", 0.5))
+    except Exception:
+        confidence = 0.5
+    if 0.0 <= confidence <= 1.0:
+        reward += 0.05
+    if parsed.get("threat_type") == expected_threat_type and confidence >= 0.65:
+        reward += 0.06
+    elif parsed.get("threat_type") != expected_threat_type and confidence >= 0.80:
+        reward -= 0.05
+    if parsed.get("recommended_action") == "monitor" and expected_threat_type != "benign":
+        reward -= 0.05
+    return max(0.01, min(0.99, round(reward, 2)))
+def _phase2_reward(
+    parsed: Dict[str, Any],
+    expected_action: str,
+    expected_target_node: str,
+    tool_calls: int,
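+) -> float:
+    """Score a Phase 2 response action against the reference labels.
+    Starts at 0.08, then adds 0.62 for the correct action, 0.18 for the
+    correct target node, and 0.07 when the correct action follows at least two
+    tool calls. Choosing "monitor" when another action was expected loses
+    0.08. The result is clamped to [0.01, 0.99].
+    """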
+    reward = 0.08
+    if parsed.get("action") == expected_action:
+        reward += 0.62
+    if parsed.get("target_node") == expected_target_node:
+        reward += 0.18
+    if parsed.get("action") == expected_action and tool_calls >= 2:
+        reward += 0.07
+    if parsed.get("action") == "monitor" and expected_action != "monitor":
+        reward -= 0.08
+    return max(0.01, min(0.99, round(reward, 2)))
+def build_grpo_reward_fn():
+    def reward_fn(completions: List[Any], **kwargs: Any) -> List[float]:
+        phases = kwargs.get("phase", [])
+        expected_threat_types = kwargs.get("expected_threat_type", [])
+        expected_targets = kwargs.get("expected_target_node", [])
+        expected_recommended_actions = kwargs.get("expected_recommended_action", [])
+        expected_actions = kwargs.get("expected_action", [])
+        tool_calls = kwargs.get("tool_calls", [])
+        rewards: List[float] = []
+        for index, completion in enumerate(completions):
+            phase = int(phases[index]) if phases else 1
+            text = _completion_to_text(completion)
+            parsed = parse_response(text, phase)
+            if phase == 1:
+                reward = _phase1_reward(
+                    parsed=parsed,
+                    expected_threat_type=str(expected_threat_types[index]),
+                    expected_target_node=str(expected_targets[index]),
+                    expected_recommended_action=str(expected_recommended_actions[index]),
+                )
+            else:
+                reward = _phase2_reward(
+                    parsed=parsed,
+                    expected_action=str(expected_actions[index]),
+                    expected_target_node=str(expected_targets[index]),
+                    tool_calls=int(tool_calls[index]) if tool_calls else 0,
+                )
+            rewards.append(reward)
+        return rewards
+    return reward_fn
+def _filter_supported_kwargs(callable_obj: Any, kwargs: Dict[str, Any]) -> Dict[str, Any]:
+    try:
+        signature = inspect.signature(callable_obj)
+    except (TypeError, ValueError):
+        return kwargs
+    valid = {}
+    for key, value in kwargs.items():
+        if key in signature.parameters:
+            valid[key] = value
+    return valid
+def _trainer_log_rows(log_history: List[Dict[str, Any]], selected_task: str) -> List[Dict[str, Any]]:
+    rows: List[Dict[str, Any]] = []
+    for entry in log_history:
+        step = entry.get("step")
+        if step is None:
+            continue
+        reward_keys = [
+            "reward",
+            "mean_reward",
+            "rewards/mean",
+            "objective",
+            "objective/rlhf_reward",
+        ]
+        score = None
+        for key in reward_keys:
+            if key in entry:
+                try:
+                    score = float(entry[key])
+                    break
+                except Exception:
+                    continue
+        if score is None:
+            score = 0.50
+        row = {
+            "episode": int(step),
+            "task": "mixed" if selected_task == "all" else selected_task,
+            "stage": "grpo",
+            "score": max(0.01, min(0.99, score)),
+            "loss": float(entry.get("loss", 0.0) or 0.0),
+            "learning_rate": float(entry.get("learning_rate", 0.0) or 0.0),
+        }
+        rows.append(row)
+    return rows
+def evaluate_model_suite(
+    model: Any,
+    tokenizer: Any,
+    selected_task: str,
+    eval_episodes: int,
+    max_steps: int,
+    use_tools: bool,
+    world_split: str = "train",
+    world_family: str | None = None,
+    seed_start: int | None = None,
+) -> List[Dict[str, Any]]:
+    tasks = TASKS if selected_task == "all" else [selected_task]
+    rows: List[Dict[str, Any]] = []
+    for task in tasks:
+        scores: List[float] = []
+        steps: List[int] = []
+        tool_calls: List[int] = []
+        original_seed = os.environ.get("ADAPTSHIELD_SEED")
+        # Restore ADAPTSHIELD_SEED even if an episode raises, so a failed
+        # evaluation does not leak its seed into later runs.
+        try:
+            for episode_index in range(eval_episodes):
+                if seed_start is not None:
+                    os.environ["ADAPTSHIELD_SEED"] = str(seed_start + len(rows) * 100 + episode_index)
+                _, metrics = run_model_episode(
+                    model=model,
+                    tokenizer=tokenizer,
+                    task=task,
+                    max_steps=max_steps,
+                    use_tools=use_tools,
+                    world_split=world_split,
+                    world_family=world_family,
+                )
+                scores.append(float(metrics["score"]))
+                steps.append(int(metrics["steps"]))
+                tool_calls.append(int(metrics["tool_calls"]))
+        finally:
+            if original_seed is None:
+                os.environ.pop("ADAPTSHIELD_SEED", None)
+            else:
+                os.environ["ADAPTSHIELD_SEED"] = original_seed
+        rows.append({
+            "episode": len(rows) + 1,
+            "task": task,
+            "stage": "evaluation",
+            "score": round(sum(scores) / len(scores), 3) if scores else 0.50,
+            "steps": round(sum(steps) / len(steps), 2) if steps else 0.0,
+            "tool_calls": round(sum(tool_calls) / len(tool_calls), 2) if tool_calls else 0.0,
+            "eval_episodes": eval_episodes,
+            "world_split": world_split,
+            "world_family": world_family or "auto",
+        })
+    return rows
+def run_model_episode(
+    model: Any,
+    tokenizer: Any,
+    task: str,
+    max_steps: int,
+    use_tools: bool,
+    world_split: str = "train",
+    world_family: str | None = None,
+) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
+    env = AdaptShieldEnvironment(
+        task_name=task,
+        world_split=world_split,
+        world_family=world_family,
+    )
+    obs = env.reset()
+    samples: List[Dict[str, Any]] = []
+    rewards: List[float] = []
+    tool_calls = 0
+    while not obs.done and len(samples) < max_steps:
+        phase = int(getattr(obs, "phase", 1))
+        tool_results = investigate_local_with_depth(
+            env,
+            obs,
+            use_tools=use_tools,
+            thorough=True,
+        )
+        tool_calls += len(tool_results)
+        obs_dict = obs_to_dict(obs)
+        obs_dict = attach_tool_results(obs_dict, tool_results)
+        messages = build_messages(obs_dict)
+        prompt, response = generate_response(model, tokenizer, messages)
+        payload = parse_response(response, phase)
+        try:
+            obs = env.step(AdaptShieldAction(**payload))
+            reward = float(obs.reward)
+        except Exception as exc:
+            reward = 0.01
+            samples.append({
+                "prompt": prompt,
+                "response": response,
+                "reward": reward,
+                "phase": phase,
+                "tool_calls": len(tool_results),
+                "error": str(exc),
+            })
+            break
+        rewards.append(reward)
+        samples.append({
+            "prompt": prompt,
+            "response": response,
+            "reward": reward,
+            "phase": phase,
+            "tool_calls": len(tool_results),
+            "error": None,
+        })
+    metadata = obs.metadata if isinstance(obs.metadata, dict) else {}
+    if "normalized_score" not in metadata:
+        raise RuntimeError("normalized_score missing from final observation metadata")
+    return samples, {
+        "score": float(metadata["normalized_score"]),
+        "steps": len(samples),
+        "reward_sum": sum(rewards),
+        "mean_reward": sum(rewards) / len(rewards) if rewards else 0.0,
+        "tool_calls": tool_calls,
+        "world_split": world_split,
+        "world_family": metadata.get("world_family", world_family or "auto"),
+        "operational_mode": metadata.get("operational_mode", "unknown"),
+    }
+def train_policy_gradient(args: argparse.Namespace) -> None:
+    import torch
+    from torch.optim import AdamW
+    random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    model_name = args.model_path or MODEL_CHOICES[args.model]
+    output_dir = Path(args.output)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    print("AdaptShield policy-gradient GPU training")
+    print(f"Task: {args.task}")
+    print(f"Curriculum: {args.curriculum}")
+    print(f"Use tools: {args.use_tools}")
+    print(f"Model: {model_name}")
+    print(f"Episodes: {args.episodes}")
+    print(f"Output: {output_dir}")
+    print()
+    model, tokenizer = _load_training_model_and_tokenizer(
+        model_name=model_name,
+        model_key=args.model,
+        max_seq_length=MAX_SEQ_LEN,
+        compute_dtype=None,
+        seed=args.seed,
+    )
+    from unsloth import FastLanguageModel
+    FastLanguageModel.for_training(model)
+    dtype_summary = _align_trainable_dtypes(model)
+    print(f"Aligned trainable parameter dtypes: {dtype_summary}")
+    optimizer = AdamW(model.parameters(), lr=args.lr, weight_decay=0.01)
+    rows: List[Dict[str, Any]] = []
+    best_score = -1.0
+    for episode in range(1, args.episodes + 1):
+        started = time.time()
+        task, stage = task_for_episode(
+            episode=episode,
+            total_episodes=args.episodes,
+            selected_task=args.task,
+            curriculum=args.curriculum,
+        )
+        samples, metrics = run_model_episode(
+            model=model,
+            tokenizer=tokenizer,
+            task=task,
+            max_steps=args.max_steps,
+            use_tools=args.use_tools,
+            world_split=args.train_world_split,
+        )
+        rewards = [float(sample["reward"]) for sample in samples]
+        baseline = sum(rewards) / len(rewards) if rewards else 0.0
+        total_loss = 0.0
+        for sample in samples:
+            advantage = float(sample["reward"]) - baseline
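+            # Fall back to an empty suffix if the tokenizer defines no EOS token.
+            full_text = sample["prompt"] + sample["response"] + (tokenizer.eos_token or "")
+            inputs = tokenizer(
+                full_text,
+                return_tensors="pt",
+                truncation=True,
+                max_length=MAX_SEQ_LEN,
+            ).to(next(model.parameters()).device)  # follow the model instead of assuming CUDA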
+            outputs = model(**inputs, labels=inputs["input_ids"])
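+            # outputs.loss is the mean token NLL, so REINFORCE minimizes
+            # advantage * NLL: raise the likelihood of above-baseline samples.
+            loss = outputs.loss * advantage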
+            optimizer.zero_grad()
+            loss.backward()
+            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
+            optimizer.step()
+            total_loss += float(loss.item())
+        row = {
+            "episode": episode,
+            "task": task,
+            "stage": stage,
+            "score": metrics["score"],
+            "steps": metrics["steps"],
+            "reward_sum": metrics["reward_sum"],
+            "mean_reward": metrics["mean_reward"],
+            "tool_calls": metrics["tool_calls"],
+            "loss": total_loss,
+            "seconds": round(time.time() - started, 2),
+        }
+        rows.append(row)
+        print(
+            f"episode={episode:03d} task={task:<20} "
+            f"stage={stage:<32} "
+            f"score={row['score']:.3f} mean_reward={row['mean_reward']:.3f} "
+            f"loss={row['loss']:.4f} steps={row['steps']:02d} tools={row['tool_calls']:02d}"
+        )
+        if row["score"] > best_score:
+            best_score = row["score"]
+            model.save_pretrained(output_dir / "best")
+            tokenizer.save_pretrained(output_dir / "best")
+        if args.save_every and episode % args.save_every == 0:
+            model.save_pretrained(output_dir / f"checkpoint-{episode}")
+            tokenizer.save_pretrained(output_dir / f"checkpoint-{episode}")
+    model.save_pretrained(output_dir / "final")
+    tokenizer.save_pretrained(output_dir / "final")
+    evaluation_rows = evaluate_model_suite(
+        model=model,
+        tokenizer=tokenizer,
+        selected_task=args.task,
+        eval_episodes=args.eval_episodes,
+        max_steps=args.max_steps,
+        use_tools=args.use_tools,
+        world_split=args.train_world_split,
+        seed_start=args.heldout_seed,
+    )
+    heldout_evaluation_rows = evaluate_model_suite(
+        model=model,
+        tokenizer=tokenizer,
+        selected_task=args.task,
+        eval_episodes=args.eval_episodes,
+        max_steps=args.max_steps,
+        use_tools=args.use_tools,
+        world_split=args.heldout_world_split,
+        seed_start=args.heldout_seed,
+    )
+    metrics_path = save_metrics(
+        output_dir=output_dir,
+        rows=rows,
+        model_name=model_name,
+        episodes=args.episodes,
+        curriculum=args.curriculum,
+        use_tools=args.use_tools,
+        trainer="pg",
+        evaluation_rows=evaluation_rows,
+        heldout_evaluation_rows=heldout_evaluation_rows,
+        extra={
+            "train_world_split": args.train_world_split,
+            "heldout_world_split": args.heldout_world_split,
+            "heldout_seed": args.heldout_seed,
+        },
+    )
+    if args.plot:
+        maybe_plot(metrics_path, output_dir)
+    print()
+    print(f"Training complete. Best score: {best_score:.3f}")
+    print("Post-train online evaluation:")
+    for row in evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
+    print("Held-out family evaluation:")
+    for row in heldout_evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
+    print(f"Metrics saved to: {metrics_path}")
+def train_grpo(args: argparse.Namespace) -> None:
+    from datasets import Dataset
+    from trl import GRPOConfig, GRPOTrainer
+    import torch
+    random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    model_name = args.model_path or MODEL_CHOICES[args.model]
+    output_dir = Path(args.output)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    print("AdaptShield GRPO training")
+    print(f"Task: {args.task}")
+    print(f"Curriculum: {args.curriculum}")
+    print(f"Use tools: {args.use_tools}")
+    print(f"Model: {model_name}")
+    print(f"Prompt-bank episodes: {args.prompt_bank_episodes}")
+    print(f"GRPO epochs: {args.grpo_epochs}")
+    print(f"Eval episodes: {args.eval_episodes}")
+    print(f"Output: {output_dir}")
+    print()
+    bf16_supported = bool(getattr(torch.cuda, "is_bf16_supported", lambda: False)())
+    compute_dtype = torch.bfloat16 if bf16_supported else torch.float16
+    model, tokenizer = _load_training_model_and_tokenizer(
+        model_name=model_name,
+        model_key=args.model,
+        max_seq_length=MAX_SEQ_LEN,
+        compute_dtype=compute_dtype,
+        seed=args.seed,
+    )
+    from unsloth import FastLanguageModel
+    if getattr(tokenizer, "pad_token", None) is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    if getattr(model, "config", None) is not None:
+        try:
+            model.config.return_dict = True
+        except Exception:
+            pass
+        try:
+            model.config.use_cache = False
+        except Exception:
+            pass
+    if getattr(model, "generation_config", None) is not None:
+        try:
+            model.generation_config.pad_token_id = tokenizer.pad_token_id
+        except Exception:
+            pass
+    FastLanguageModel.for_training(model)
+    dtype_summary = _align_trainable_dtypes(model, target_dtype=compute_dtype)
+    print(f"Using GRPO compute dtype: {compute_dtype}")
+    print(f"Aligned trainable parameter dtypes: {dtype_summary}")
+    prompt_bank = build_prompt_bank(
+        tokenizer=tokenizer,
+        selected_task=args.task,
+        curriculum=args.curriculum,
+        rollout_episodes=args.prompt_bank_episodes,
+        max_steps=args.max_steps,
+        use_tools=args.use_tools,
+        seed=args.seed,
+        world_split=args.train_world_split,
+        hard_multiplier=args.prompt_bank_hard_multiplier,
+        borderline_bonus=args.prompt_bank_borderline_bonus,
+    )
+    if not prompt_bank:
+        raise RuntimeError("Prompt bank is empty; cannot start GRPO training.")
+    dataset = Dataset.from_list(prompt_bank)
+    reward_fn = build_grpo_reward_fn()
+    config_kwargs = {
+        "output_dir": str(output_dir),
+        "learning_rate": args.lr,
+        "per_device_train_batch_size": args.per_device_batch_size,
+        "gradient_accumulation_steps": args.gradient_accumulation_steps,
+        "num_train_epochs": args.grpo_epochs,
+        "max_prompt_length": MAX_SEQ_LEN - 256,
+        "max_completion_length": 256,
+        "num_generations": args.num_generations,
+        "logging_steps": 1,
+        "save_strategy": "no" if args.save_every <= 0 else "steps",
+        "report_to": "none",
+        "remove_unused_columns": False,
+        "bf16": bf16_supported,
+        "fp16": not bf16_supported,
+        "max_grad_norm": 1.0,
+        "seed": args.seed,
+    }
+    if args.save_every > 0:
+        config_kwargs["save_steps"] = args.save_every
+    grpo_config = GRPOConfig(**_filter_supported_kwargs(GRPOConfig, config_kwargs))
+    trainer_kwargs = {
+        "model": model,
+        "reward_funcs": [reward_fn],
+        "args": grpo_config,
+        "train_dataset": dataset,
+        "processing_class": tokenizer,
+        "tokenizer": tokenizer,
+    }
+    trainer = GRPOTrainer(**_filter_supported_kwargs(GRPOTrainer, trainer_kwargs))
+    trainer.train()
+    model.save_pretrained(output_dir / "final")
+    tokenizer.save_pretrained(output_dir / "final")
+    log_history = list(getattr(getattr(trainer, "state", None), "log_history", []) or [])
+    train_rows = _trainer_log_rows(log_history, selected_task=args.task)
+    if not train_rows:
+        train_rows = [{
+            "episode": index + 1,
+            "task": "mixed" if args.task == "all" else args.task,
+            "stage": "grpo",
+            "score": 0.50,
+        } for index in range(max(1, args.grpo_epochs))]
+    try:
+        evaluation_rows = evaluate_model_suite(
+            model=model,
+            tokenizer=tokenizer,
+            selected_task=args.task,
+            eval_episodes=args.eval_episodes,
+            max_steps=args.max_steps,
+            use_tools=args.use_tools,
+            world_split=args.train_world_split,
+            seed_start=args.heldout_seed,
+        )
+    except Exception as exc:
+        print(f"GRPO in-distribution evaluation failed: {exc}")
+        evaluation_rows = []
+    try:
+        heldout_evaluation_rows = evaluate_model_suite(
+            model=model,
+            tokenizer=tokenizer,
+            selected_task=args.task,
+            eval_episodes=args.eval_episodes,
+            max_steps=args.max_steps,
+            use_tools=args.use_tools,
+            world_split=args.heldout_world_split,
+            seed_start=args.heldout_seed,
+        )
+    except Exception as exc:
+        print(f"GRPO held-out evaluation failed: {exc}")
+        heldout_evaluation_rows = []
+    metrics_path = save_metrics(
+        output_dir=output_dir,
+        rows=train_rows,
+        model_name=model_name,
+        episodes=max(1, len(train_rows)),
+        curriculum=args.curriculum,
+        use_tools=args.use_tools,
+        trainer="grpo",
+        evaluation_rows=evaluation_rows,
+        heldout_evaluation_rows=heldout_evaluation_rows,
+        prompt_bank_size=len(prompt_bank),
+        extra={
+            "train_world_split": args.train_world_split,
+            "heldout_world_split": args.heldout_world_split,
+            "heldout_seed": args.heldout_seed,
+            "base_model": model_name,
+        },
+    )
+    if args.plot:
+        maybe_plot(metrics_path, output_dir)
+    print("GRPO training complete.")
+    print(f"Prompt bank size: {len(prompt_bank)}")
+    print("Post-train online evaluation:")
+    for row in evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
+    print("Held-out family evaluation:")
+    for row in heldout_evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
+    if log_history:
+        final_keys = sorted(log_history[-1].keys())
+        print(f"Trainer log keys: {final_keys}")
+    print(f"Metrics saved to: {metrics_path}")
+def _looks_like_adapter_path(model_name: str) -> bool:
+    path = Path(str(model_name))
+    return path.exists() and (path / "adapter_config.json").exists()
+def run_fallback_smoke(args: argparse.Namespace) -> None:
+    if args.use_tools:
+        run_tool_fallback_smoke(args)
+        return
+    if args.curriculum:
+        tasks = [
+            task_for_episode(
+                episode=episode,
+                total_episodes=min(args.episodes, args.smoke_episodes),
+                selected_task=args.task,
+                curriculum=True,
+            )[0]
+            for episode in range(1, min(args.episodes, args.smoke_episodes) + 1)
+        ]
+    else:
+        tasks = TASKS if args.task == "all" else [args.task]
+    rows = run_smoke_training(
+        tasks=tasks,
+        episodes=min(args.episodes, args.smoke_episodes),
+        output=Path(args.output) / "train_smoke.csv",
+        seed=args.seed,
+        epsilon=0.85,
+        epsilon_decay=0.94,
+        epsilon_floor=0.08,
+        lr=0.35,
+        max_steps=args.max_steps,
+    )
+    output_dir = Path(args.output)
+    metrics_rows = []
+    for row in rows:
+        row = dict(row)
+        episode = int(row["episode"])
+        _, stage = task_for_episode(
+            episode=episode,
+            total_episodes=min(args.episodes, args.smoke_episodes),
+            selected_task=args.task,
+            curriculum=args.curriculum,
+        )
+        row["stage"] = stage
+        metrics_rows.append(row)
+    metrics_path = save_metrics(
+        output_dir=output_dir,
+        rows=metrics_rows,
+        model_name="smoke-tabular-policy",
+        episodes=min(args.episodes, args.smoke_episodes),
+        curriculum=args.curriculum,
+        use_tools=False,
+    )
+    print(f"Metrics saved to: {metrics_path}")
+    if args.plot:
+        maybe_plot(metrics_path, output_dir)
+def run_tool_fallback_smoke(args: argparse.Namespace) -> None:
+    """No-GPU tool-aware rehearsal. This validates flow, not model learning."""
+    from tool_baseline import run_task as run_tool_task
+    total = min(args.episodes, args.smoke_episodes)
+    if args.curriculum:
+        tasks = [
+            task_for_episode(
+                episode=episode,
+                total_episodes=total,
+                selected_task=args.task,
+                curriculum=True,
+            )[0]
+            for episode in range(1, total + 1)
+        ]
+    else:
+        tasks = TASKS if args.task == "all" else [args.task]
+    print("AdaptShield tool-aware smoke evaluation")
+    print("Mode: no-GPU flow validation, not model learning")
+    print(f"Tasks: {', '.join(tasks)}")
+    print(f"Episodes: {total}")
+    print()
+    rows: List[Dict[str, Any]] = []
+    for episode in range(1, total + 1):
+        task = tasks[(episode - 1) % len(tasks)]
+        result = run_tool_task(task, emit_logs=False)
+        metadata = result.get("metadata", {})
+        tool_calls = len(metadata.get("tool_trace", [])) if isinstance(metadata, dict) else 0
+        _, stage = task_for_episode(
+            episode=episode,
+            total_episodes=total,
+            selected_task=args.task,
+            curriculum=args.curriculum,
+        )
+        row = {
+            "episode": episode,
+            "task": task,
+            "stage": stage,
+            "score": result["score"],
+            "steps": result["steps"],
+            "reward_sum": sum(result["rewards"]),
+            "mean_reward": sum(result["rewards"]) / len(result["rewards"]) if result["rewards"] else 0.0,
+            "tool_calls": tool_calls,
+            "status": "PASS" if result["success"] else "FAIL",
+        }
+        rows.append(row)
+        print(
+            f"episode={episode:03d} task={task:<20} "
+            f"score={row['score']:.3f} steps={row['steps']:02d} "
+            f"tools={tool_calls:02d} {row['status']}"
+        )
+    output_dir = Path(args.output)
+    metrics_path = save_metrics(
+        output_dir=output_dir,
+        rows=rows,
+        model_name="tool-aware-smoke-policy",
+        episodes=total,
+        curriculum=args.curriculum,
+        use_tools=True,
+    )
+    print(f"Metrics saved to: {metrics_path}")
+    if args.plot:
+        maybe_plot(metrics_path, output_dir)
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="AdaptShield training harness.")
+    parser.add_argument("--task", default="direct-triage", choices=TASKS + ["all"])
+    parser.add_argument("--model", default=DEFAULT_MODEL, choices=list(MODEL_CHOICES))
+    parser.add_argument("--model-path", default="", help="Optional local/HF adapter path to continue training from.")
+    parser.add_argument("--episodes", type=int, default=60)
+    parser.add_argument("--max-steps", type=int, default=30)
+    parser.add_argument("--output", default="checkpoints/adaptshield")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--lr", type=float, default=1e-5)
+    parser.add_argument("--save-every", type=int, default=20)
+    parser.add_argument("--smoke", action="store_true", help="Force dependency-free smoke mode.")
+    parser.add_argument("--smoke-episodes", type=int, default=30)
+    parser.add_argument("--curriculum", action="store_true", help="Train direct -> dual -> hard instead of fixed/round-robin tasks.")
+    parser.add_argument("--use-tools", action="store_true", help="Let GPU training query SOC tools before hard-task actions.")
+    parser.add_argument("--plot", action="store_true", help="Generate reward_curve.png from metrics.json after training.")
+    parser.add_argument("--trainer", default="auto", choices=["auto", "pg", "grpo"], help="Training backend: 'grpo' is TRL GRPO, 'pg' is the policy-gradient trainer, 'auto' picks 'grpo' when datasets and trl are importable and 'pg' otherwise.")
+    parser.add_argument("--prompt-bank-episodes", type=int, default=24, help="Reference rollout episodes used to build the GRPO prompt bank.")
+    parser.add_argument("--prompt-bank-hard-multiplier", type=int, default=2, help="Duplicate hard-task GRPO prompts this many times to emphasize difficult slices.")
+    parser.add_argument("--prompt-bank-borderline-bonus", type=int, default=1, help="Extra copies for degraded-handoff / borderline GRPO prompts.")
+    parser.add_argument("--grpo-epochs", type=int, default=1, help="Number of epochs over the prompt bank for GRPO runs.")
+    parser.add_argument("--num-generations", type=int, default=4, help="GRPO generations per prompt when TRL path is active.")
+    parser.add_argument("--per-device-batch-size", type=int, default=1, help="Per-device batch size for GRPO training.")
+    parser.add_argument("--gradient-accumulation-steps", type=int, default=4, help="Gradient accumulation for GRPO training.")
+    parser.add_argument("--eval-episodes", type=int, default=2, help="Online environment episodes per task after GPU training.")
+    parser.add_argument("--train-world-split", default="train", choices=["train", "eval"], help="World split used for training/prompt-bank generation.")
+    parser.add_argument("--heldout-world-split", default="eval", choices=["train", "eval"], help="World split used for held-out evaluation.")
+    parser.add_argument("--heldout-seed", type=int, default=314, help="Base seed for post-train evaluation episodes (used for both the in-distribution and held-out splits).")
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    if args.smoke:
+        run_fallback_smoke(args)
+        return 0
+    trainer_choice = args.trainer
+    if trainer_choice == "auto":
+        try:
+            import datasets  # noqa: F401
+            import trl  # noqa: F401
+            trainer_choice = "grpo"
+        except ImportError:
+            trainer_choice = "pg"
+    try:
+        if trainer_choice == "grpo":
+            train_grpo(args)
+        else:
+            train_policy_gradient(args)
+    except ImportError as exc:
+        print(f"GPU training dependency missing for trainer={trainer_choice}: {exc}")
+        if trainer_choice == "grpo":
+            print("Falling back to policy-gradient GPU trainer.")
+            try:
+                train_policy_gradient(args)
+                return 0
+            except ImportError as nested_exc:
+                print(f"Policy-gradient fallback also unavailable: {nested_exc}")
+        print("Falling back to dependency-free smoke training.")
+        run_fallback_smoke(args)
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
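
Both `train_grpo` above and `train_sft.py` below route their config and trainer dictionaries through `_filter_supported_kwargs`, which is what lets a single call carry both `processing_class` and `tokenizer` across TRL versions. The helper's definition sits outside this part of the diff, so the following is only a minimal sketch of how such a filter could work, assuming it keeps the keys that the target's signature declares. `filter_supported_kwargs_sketch` and `LegacyTrainer` are hypothetical names, not the repository's code.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs_sketch(
    target: Callable[..., Any], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Keep only the keyword arguments that `target` declares (assumed behaviour)."""
    params = inspect.signature(target).parameters
    # A callable that takes **kwargs accepts every key, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


class LegacyTrainer:
    """Hypothetical stand-in for an older trainer API that only knows `tokenizer`."""

    def __init__(self, model: str, tokenizer: str) -> None:
        self.model = model
        self.tokenizer = tokenizer


# Pass both spellings; the filter drops the one this signature does not accept.
candidate_kwargs = {"model": "m", "tokenizer": "tok", "processing_class": "tok"}
accepted = filter_supported_kwargs_sketch(LegacyTrainer, candidate_kwargs)
trainer = LegacyTrainer(**accepted)
```

One consequence of this style of filtering is that a misspelled key is dropped silently rather than raising, so it may be worth logging the discarded keys when debugging a run.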

train_sft.py ADDED

	@@ -0,0 +1,611 @@

+#!/usr/bin/env python3
+"""Supervised fine-tuning for AdaptShield chat-style demonstrations."""
+from __future__ import annotations
+import argparse
+import json
+import os
+import random
+from pathlib import Path
+from typing import Any, Dict, List
+from train import (
+    DEFAULT_MODEL,
+    LORA_RANK,
+    MAX_SEQ_LEN,
+    MODEL_CHOICES,
+    _align_trainable_dtypes,
+    _filter_supported_kwargs,
+    _normalize_generation_config,
+    evaluate_model_suite,
+    run_model_episode,
+)
+def load_jsonl(path: Path) -> List[Dict[str, Any]]:
+    rows: List[Dict[str, Any]] = []
+    with path.open("r", encoding="utf-8") as handle:
+        for line in handle:
+            line = line.strip()
+            if not line:
+                continue
+            rows.append(json.loads(line))
+    if not rows:
+        raise RuntimeError(f"No training rows found in {path}")
+    return rows
+def build_loss_plot(log_history: List[Dict[str, Any]], output_path: Path) -> None:
+    try:
+        import matplotlib.pyplot as plt
+    except ImportError:
+        print("matplotlib not installed; skipping loss plot")
+        return
+    xs: List[int] = []
+    ys: List[float] = []
+    for index, entry in enumerate(log_history, start=1):
+        if "loss" not in entry:
+            continue
+        step = int(entry.get("step", index) or index)
+        try:
+            loss = float(entry["loss"])
+        except Exception:
+            continue
+        xs.append(step)
+        ys.append(loss)
+    if not xs:
+        print("No loss entries found; skipping loss plot")
+        return
+    plt.figure(figsize=(10, 5))
+    plt.plot(xs, ys, color="#0f4c81", linewidth=2, label="training loss")
+    plt.xlabel("Training step")
+    plt.ylabel("Loss")
+    plt.title("AdaptShield SFT Loss Curve")
+    plt.grid(alpha=0.3)
+    plt.legend()
+    plt.tight_layout()
+    plt.savefig(output_path, dpi=150, bbox_inches="tight")
+    plt.close()
+def build_reward_plot(rows: List[Dict[str, Any]], output_path: Path) -> None:
+    try:
+        import matplotlib.pyplot as plt
+    except ImportError:
+        print("matplotlib not installed; skipping reward plot")
+        return
+    if not rows:
+        print("No held-out reward rows found; skipping reward plot")
+        return
+    checkpoint_labels = [str(row["checkpoint"]) for row in rows]
+    in_distribution_scores = [float(row["in_distribution_score"]) for row in rows]
+    heldout_scores = [float(row["heldout_score"]) for row in rows]
+    plt.figure(figsize=(10, 5))
+    plt.plot(
+        range(len(rows)),
+        in_distribution_scores,
+        color="#136f63",
+        linewidth=2.5,
+        marker="o",
+        label="in-distribution mean reward",
+    )
+    plt.plot(
+        range(len(rows)),
+        heldout_scores,
+        color="#8a3ffc",
+        linewidth=2.5,
+        marker="s",
+        label="held-out family mean reward",
+    )
+    plt.xticks(range(len(rows)), checkpoint_labels, rotation=35, ha="right")
+    plt.xlabel("Checkpoint")
+    plt.ylabel("Normalized score")
+    plt.title("AdaptShield In-Distribution vs Held-out Reward Curve")
+    plt.ylim(0.0, 1.0)
+    plt.grid(alpha=0.3)
+    plt.legend()
+    plt.tight_layout()
+    plt.savefig(output_path, dpi=150, bbox_inches="tight")
+    plt.close()
+def render_example(example: Dict[str, Any], tokenizer: Any) -> str:
+    if "messages" in example:
+        return tokenizer.apply_chat_template(
+            example["messages"],
+            tokenize=False,
+            add_generation_prompt=False,
+        )
+    return str(example["text"])
+def _checkpoint_sort_key(path: Path) -> tuple[int, str]:
+    if path.name == "final":
+        return (10**9, path.name)
+    if path.name.startswith("checkpoint-"):
+        try:
+            return (int(path.name.split("-", 1)[1]), path.name)
+        except Exception:
+            return (10**8, path.name)
+    return (10**7, path.name)
+def checkpoint_dirs(output_dir: Path) -> List[Path]:
+    checkpoints = [
+        path for path in output_dir.iterdir()
+        if path.is_dir() and (path.name.startswith("checkpoint-") or path.name == "final")
+    ]
+    return sorted(checkpoints, key=_checkpoint_sort_key)
+def evaluate_suite_with_seed(
+    model: Any,
+    tokenizer: Any,
+    selected_task: str,
+    eval_episodes: int,
+    max_steps: int,
+    use_tools: bool,
+    seed_start: int,
+    world_split: str,
+    world_family: str | None = None,
+) -> List[Dict[str, Any]]:
+    tasks = ["direct-triage", "dual-pivot", "polymorphic-zero-day"] if selected_task == "all" else [selected_task]
+    rows: List[Dict[str, Any]] = []
+    original_seed = os.environ.get("ADAPTSHIELD_SEED")
+    try:
+        for task_index, task in enumerate(tasks):
+            scores: List[float] = []
+            steps: List[int] = []
+            tool_calls: List[int] = []
+            for episode_index in range(eval_episodes):
+                os.environ["ADAPTSHIELD_SEED"] = str(seed_start + task_index * 100 + episode_index)
+                try:
+                    _, metrics = run_model_episode(
+                        model=model,
+                        tokenizer=tokenizer,
+                        task=task,
+                        max_steps=max_steps,
+                        use_tools=use_tools,
+                        world_split=world_split,
+                        world_family=world_family,
+                    )
+                except Exception as exc:
+                    print(f"  eval episode failed (task={task}, ep={episode_index}): {exc}")
+                    continue
+                scores.append(float(metrics["score"]))
+                steps.append(int(metrics["steps"]))
+                tool_calls.append(int(metrics["tool_calls"]))
+            rows.append({
+                "task": task,
+                "score": round(sum(scores) / len(scores), 3) if scores else 0.50,
+                "steps": round(sum(steps) / len(steps), 2) if steps else 0.0,
+                "tool_calls": round(sum(tool_calls) / len(tool_calls), 2) if tool_calls else 0.0,
+                "eval_episodes": eval_episodes,
+                "successful_episodes": len(scores),
+                "seed_start": seed_start,
+                "world_split": world_split,
+                "world_family": world_family or "auto",
+            })
+    finally:
+        if original_seed is None:
+            os.environ.pop("ADAPTSHIELD_SEED", None)
+        else:
+            os.environ["ADAPTSHIELD_SEED"] = original_seed
+    return rows
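+def _free_gpu(*objects: Any) -> None:
+    """Best-effort release of GPU memory between checkpoint evaluations.
+    Callers must also clear their own references (e.g. ``model = None``);
+    this helper can only drop the references held by its own frame.
+    """
+    import gc
+    del objects  # releases this frame's references; the caller's are unaffected
+    gc.collect()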
+    try:
+        import torch
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+            torch.cuda.synchronize()
+    except Exception:
+        pass
+def _load_checkpoint_for_eval(
+    checkpoint_dir: Path,
+    base_model_name: str,
+    max_seq_length: int,
+) -> tuple[Any, Any]:
+    """Load an adapter checkpoint robustly, falling back to PEFT if needed."""
+    from unsloth import FastLanguageModel
+    is_adapter_only = (checkpoint_dir / "adapter_config.json").exists() and not (
+        checkpoint_dir / "config.json"
+    ).exists()
+    if not is_adapter_only:
+        try:
+            return FastLanguageModel.from_pretrained(
+                model_name=str(checkpoint_dir),
+                max_seq_length=max_seq_length,
+                load_in_4bit=True,
+                dtype=None,
+            )
+        except Exception as exc:
+            print(f"  direct load failed for {checkpoint_dir.name}: {exc}; "
+                  "falling back to base+adapter loader.")
+    model, tokenizer = FastLanguageModel.from_pretrained(
+        model_name=base_model_name,
+        max_seq_length=max_seq_length,
+        load_in_4bit=True,
+        dtype=None,
+    )
+    from peft import PeftModel
+    model = PeftModel.from_pretrained(
+        model,
+        str(checkpoint_dir),
+        is_trainable=False,
+        autocast_adapter_dtype=False,
+    )
+    try:
+        from transformers import AutoTokenizer
+        tokenizer = AutoTokenizer.from_pretrained(str(checkpoint_dir), trust_remote_code=True)
+    except Exception:
+        pass
+    return model, tokenizer
+def evaluate_saved_checkpoints(
+    output_dir: Path,
+    model_key: str,
+    max_seq_length: int,
+    selected_task: str,
+    eval_episodes: int,
+    max_steps: int,
+    use_tools: bool,
+    heldout_seed: int,
+    train_world_split: str,
+    heldout_world_split: str,
+) -> List[Dict[str, Any]]:
+    base_model_name = MODEL_CHOICES[model_key]
+    rows: List[Dict[str, Any]] = []
+    for index, checkpoint_dir in enumerate(checkpoint_dirs(output_dir)):
+        print(f"Evaluating checkpoint (in-distribution and held-out): {checkpoint_dir.name}")
+        model = None
+        tokenizer = None
+        try:
+            model, tokenizer = _load_checkpoint_for_eval(
+                checkpoint_dir=checkpoint_dir,
+                base_model_name=base_model_name,
+                max_seq_length=max_seq_length,
+            )
+            _normalize_generation_config(model)
+            _align_trainable_dtypes(model)
+            in_distribution_rows = evaluate_suite_with_seed(
+                model=model,
+                tokenizer=tokenizer,
+                selected_task=selected_task,
+                eval_episodes=eval_episodes,
+                max_steps=max_steps,
+                use_tools=use_tools,
+                seed_start=heldout_seed + index * 1000,
+                world_split=train_world_split,
+            )
+            heldout_rows = evaluate_suite_with_seed(
+                model=model,
+                tokenizer=tokenizer,
+                selected_task=selected_task,
+                eval_episodes=eval_episodes,
+                max_steps=max_steps,
+                use_tools=use_tools,
+                seed_start=heldout_seed + index * 1000,
+                world_split=heldout_world_split,
+            )
+            in_distribution_score = round(
+                sum(float(row["score"]) for row in in_distribution_rows) / max(1, len(in_distribution_rows)),
+                3,
+            )
+            heldout_score = round(
+                sum(float(row["score"]) for row in heldout_rows) / max(1, len(heldout_rows)),
+                3,
+            )
+            rows.append({
+                "checkpoint": checkpoint_dir.name,
+                "in_distribution_score": in_distribution_score,
+                "heldout_score": heldout_score,
+                "in_distribution_rows": in_distribution_rows,
+                "heldout_rows": heldout_rows,
+            })
+        except Exception as exc:
+            print(f"  checkpoint eval failed for {checkpoint_dir.name}: {exc}")
+            rows.append({
+                "checkpoint": checkpoint_dir.name,
+                "in_distribution_score": 0.0,
+                "heldout_score": 0.0,
+                "error": str(exc),
+            })
+        finally:
+            # Drop our references first so the cache flush can reclaim the memory.
+            model = None
+            tokenizer = None
+            _free_gpu()
+    return rows
+def train_sft(args: argparse.Namespace) -> None:
+    from unsloth import FastLanguageModel
+    from datasets import Dataset
+    from trl import SFTTrainer
+    import torch
+    random.seed(args.seed)
+    torch.manual_seed(args.seed)
+    dataset_path = Path(args.dataset)
+    rows = load_jsonl(dataset_path)
+    if args.max_rows and args.max_rows > 0:
+        rows = rows[: args.max_rows]
+    model_name = MODEL_CHOICES[args.model]
+    output_dir = Path(args.output)
+    output_dir.mkdir(parents=True, exist_ok=True)
+    print("AdaptShield SFT training")
+    print(f"Dataset: {dataset_path}")
+    print(f"Rows: {len(rows)}")
+    print(f"Model: {model_name}")
+    print(f"Epochs: {args.epochs}")
+    print(f"Batch size: {args.per_device_batch_size}")
+    print(f"Grad accumulation: {args.gradient_accumulation_steps}")
+    print(f"Learning rate: {args.lr}")
+    print(f"Output: {output_dir}")
+    print()
+    model, tokenizer = FastLanguageModel.from_pretrained(
+        model_name=model_name,
+        max_seq_length=args.max_seq_length,
+        load_in_4bit=True,
+        dtype=None,
+    )
+    model = FastLanguageModel.get_peft_model(
+        model,
+        r=LORA_RANK,
+        target_modules=[
+            "q_proj", "k_proj", "v_proj", "o_proj",
+            "gate_proj", "up_proj", "down_proj",
+        ],
+        lora_alpha=LORA_RANK * 2,
+        lora_dropout=0.0,
+        bias="none",
+        use_gradient_checkpointing="unsloth",
+        random_state=args.seed,
+    )
+    if getattr(tokenizer, "pad_token", None) is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    _normalize_generation_config(model)
+    _align_trainable_dtypes(model)
+    # Put the rendered text last so a raw "text" field in the row cannot override it.
+    prepared_rows = [{**row, "text": render_example(row, tokenizer)} for row in rows]
+    dataset = Dataset.from_list(prepared_rows)
+    bf16_supported = bool(getattr(torch.cuda, "is_bf16_supported", lambda: False)())
+    try:
+        from trl import SFTConfig
+        train_config_cls = SFTConfig
+    except ImportError:
+        from transformers import TrainingArguments
+        train_config_cls = TrainingArguments
+    config_kwargs = {
+        "output_dir": str(output_dir),
+        "learning_rate": args.lr,
+        "per_device_train_batch_size": args.per_device_batch_size,
+        "gradient_accumulation_steps": args.gradient_accumulation_steps,
+        "num_train_epochs": args.epochs,
+        "logging_steps": 1,
+        "save_strategy": "steps",
+        "save_steps": args.save_steps,
+        "report_to": "none",
+        "seed": args.seed,
+        "bf16": bf16_supported,
+        "fp16": not bf16_supported,
+        "max_seq_length": args.max_seq_length,
+        "dataset_text_field": "text",
+        "dataset_num_proc": 1,
+        "packing": False,
+    }
+    train_args = train_config_cls(
+        **_filter_supported_kwargs(train_config_cls, config_kwargs)
+    )
+    trainer_kwargs = {
+        "model": model,
+        "train_dataset": dataset,
+        "args": train_args,
+        "processing_class": tokenizer,
+        "tokenizer": tokenizer,
+        "dataset_text_field": "text",
+        "dataset_num_proc": 1,
+        "max_seq_length": args.max_seq_length,
+        "packing": False,
+    }
+    trainer = SFTTrainer(**_filter_supported_kwargs(SFTTrainer, trainer_kwargs))
+    trainer.train()
+    final_dir = output_dir / "final"
+    model.save_pretrained(final_dir)
+    tokenizer.save_pretrained(final_dir)
+    log_history = list(getattr(getattr(trainer, "state", None), "log_history", []) or [])
+    loss_plot_path = output_dir / "loss_curve.png"
+    try:
+        build_loss_plot(log_history, loss_plot_path)
+    except Exception as exc:
+        print(f"Loss plot generation skipped: {exc}")
+    metrics: Dict[str, Any] = {
+        "trainer": "sft",
+        "model": model_name,
+        "dataset": str(dataset_path),
+        "rows": len(rows),
+        "epochs": args.epochs,
+        "learning_rate": args.lr,
+        "evaluation_rows": [],
+        "heldout_evaluation_rows": [],
+        "heldout_seed": args.heldout_seed,
+        "train_world_split": args.train_world_split,
+        "heldout_world_split": args.heldout_world_split,
+        "reward_curve_rows": [],
+        "log_history": log_history,
+    }
+    metrics_path = output_dir / "sft_metrics.json"
+    def _flush_metrics() -> None:
+        metrics_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
+    _flush_metrics()
+    try:
+        metrics["evaluation_rows"] = evaluate_suite_with_seed(
+            model=model,
+            tokenizer=tokenizer,
+            selected_task=args.eval_task,
+            eval_episodes=args.eval_episodes,
+            max_steps=args.eval_max_steps,
+            use_tools=args.use_tools,
+            seed_start=args.heldout_seed,
+            world_split=args.train_world_split,
+        )
+    except Exception as exc:
+        print(f"In-distribution evaluation failed: {exc}")
+    _flush_metrics()
+    try:
+        metrics["heldout_evaluation_rows"] = evaluate_suite_with_seed(
+            model=model,
+            tokenizer=tokenizer,
+            selected_task=args.eval_task,
+            eval_episodes=args.eval_episodes,
+            max_steps=args.eval_max_steps,
+            use_tools=args.use_tools,
+            seed_start=args.heldout_seed,
+            world_split=args.heldout_world_split,
+        )
+    except Exception as exc:
+        print(f"Held-out evaluation failed: {exc}")
+    _flush_metrics()
+    reward_curve_rows: List[Dict[str, Any]] = []
+    if args.skip_reward_curve:
+        print("Skipping per-checkpoint reward curve (--skip-reward-curve).")
+    else:
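+        # Drop the training-time references before reloading checkpoints to avoid OOM.
+        # Neither object is used again below; passing them to _free_gpu alone would
+        # not release them while these local names still point at them.
+        model = None
+        trainer = None
+        _free_gpu()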
+        try:
+            reward_curve_rows = evaluate_saved_checkpoints(
+                output_dir=output_dir,
+                model_key=args.model,
+                max_seq_length=args.max_seq_length,
+                selected_task=args.eval_task,
+                eval_episodes=args.eval_episodes,
+                max_steps=args.eval_max_steps,
+                use_tools=args.use_tools,
+                heldout_seed=args.heldout_seed,
+                train_world_split=args.train_world_split,
+                heldout_world_split=args.heldout_world_split,
+            )
+        except Exception as exc:
+            print(f"Per-checkpoint reward curve failed: {exc}")
+    metrics["reward_curve_rows"] = reward_curve_rows
+    _flush_metrics()
+    reward_plot_path = output_dir / "reward_curve.png"
+    if reward_curve_rows:
+        try:
+            build_reward_plot(reward_curve_rows, reward_plot_path)
+        except Exception as exc:
+            print(f"Reward plot generation skipped: {exc}")
+    evaluation_rows = metrics["evaluation_rows"]
+    heldout_evaluation_rows = metrics["heldout_evaluation_rows"]
+    print("SFT complete.")
+    print(f"Saved adapter to: {final_dir}")
+    print(f"Loss curve: {loss_plot_path}")
+    print(f"Reward curve: {reward_plot_path}")
+    print(f"Metrics: {metrics_path}")
+    print("Post-train evaluation:")
+    for row in evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
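+    print("Held-out family evaluation:")
+    for row in heldout_evaluation_rows:
+        print(
+            f"  task={row['task']:<20} score={row['score']:.3f} "
+            f"steps={row['steps']} tools={row['tool_calls']}"
+        )
+    print("Per-checkpoint reward curve (in-distribution vs held-out):")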
+    for row in reward_curve_rows:
+        print(
+            f"  checkpoint={row['checkpoint']:<16} "
+            f"in_dist={row['in_distribution_score']:.3f} "
+            f"heldout={row['heldout_score']:.3f}"
+        )
+def main() -> None:
+    parser = argparse.ArgumentParser(description="AdaptShield supervised fine-tuning")
+    parser.add_argument(
+        "--dataset",
+        default="data/adaptshield_sft.jsonl",
+        help="Path to JSONL dataset from generate_sft_data.py",
+    )
+    parser.add_argument(
+        "--model",
+        default=DEFAULT_MODEL,
+        choices=list(MODEL_CHOICES.keys()),
+    )
+    parser.add_argument("--output", default="checkpoints/sft-run")
+    parser.add_argument("--epochs", type=float, default=1.0)
+    parser.add_argument("--lr", type=float, default=2e-4)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--heldout-seed", type=int, default=314)
+    parser.add_argument("--train-world-split", default="train", choices=["train", "eval"])
+    parser.add_argument("--heldout-world-split", default="eval", choices=["train", "eval"])
+    parser.add_argument("--max-rows", type=int, default=0)
+    parser.add_argument("--max-seq-length", type=int, default=MAX_SEQ_LEN)
+    parser.add_argument("--per-device-batch-size", type=int, default=2)
+    parser.add_argument("--gradient-accumulation-steps", type=int, default=4)
+    parser.add_argument("--save-steps", type=int, default=40)
+    parser.add_argument(
+        "--eval-task",
+        default="all",
+        choices=["all", "direct-triage", "dual-pivot", "polymorphic-zero-day"],
+    )
+    parser.add_argument("--eval-episodes", type=int, default=2)
+    parser.add_argument("--eval-max-steps", type=int, default=20)
+    parser.add_argument(
+        "--use-tools",
+        action="store_true",
+        help="Use SOC tools during post-train evaluation.",
+    )
+    parser.add_argument(
+        "--skip-reward-curve",
+        action="store_true",
+        help="Skip the per-checkpoint reward curve sweep (faster, avoids OOM).",
+    )
+    args = parser.parse_args()
+    train_sft(args)
+if __name__ == "__main__":
+    main()
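
`_free_gpu` in this file receives the model and trainer as arguments, but under CPython's reference counting an object is only reclaimed once every reference to it is gone, including the caller's own variable. The snippet below is a minimal sketch of that behaviour using a weak reference as a probe. `FakeModel` and `release_in_helper` are hypothetical stand-ins, and no GPU or torch is involved.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a large model object."""


def release_in_helper(*objects: object) -> None:
    # Deleting the loop variable only unbinds a name local to this function.
    for obj in objects:
        del obj
    gc.collect()


model = FakeModel()
probe = weakref.ref(model)  # lets us observe whether the object still exists

release_in_helper(model)
alive_after_helper = probe() is not None  # the caller still holds `model`

model = None  # drop the caller's reference
gc.collect()
alive_after_rebind = probe() is not None
```

The same reasoning suggests clearing the local `model` and `trainer` names before calling `torch.cuda.empty_cache()`, since the cache flush can only return memory whose tensors have already been freed.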

train_smoke.py ADDED

	@@ -0,0 +1,250 @@

+#!/usr/bin/env python3
+"""Dependency-free training-readiness smoke test for AdaptShield."""
+from __future__ import annotations
+import argparse
+import csv
+import random
+import sys
+from pathlib import Path
+from typing import Any, Dict, Iterable, List, Tuple
+REPO_ROOT = Path(__file__).resolve().parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+from models import AdaptShieldAction
+from server.adaptshield_environment import AdaptShieldEnvironment
+TASKS = ["direct-triage", "dual-pivot", "polymorphic-zero-day"]
+POLICY = {
+    "brute_force": ("auth_service", "rate_limit"),
+    "lateral_movement": ("payment_service", "isolate"),
+    "exfiltration": ("database", "honeypot"),
+    "supply_chain": ("api_gateway", "patch"),
+    "benign": ("api_gateway", "monitor"),
+}
+ACTION_SPACE = [
+    ("auth_service", "rate_limit"),
+    ("payment_service", "isolate"),
+    ("database", "honeypot"),
+    ("api_gateway", "patch"),
+    ("api_gateway", "monitor"),
+]
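+def classify_from_metrics(network_nodes: Dict[str, Dict[str, Any]]) -> str:
+    """Map node metrics to a threat label using fixed thresholds.
+
+    Rules are checked in order, so the first matching rule wins.
+    """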
+    auth = network_nodes.get("auth_service", {})
+    payment = network_nodes.get("payment_service", {})
+    database = network_nodes.get("database", {})
+    gateway = network_nodes.get("api_gateway", {})
+    if float(auth.get("error_rate", 0.0)) >= 0.10:
+        return "brute_force"
+    if payment.get("status") == "suspicious" or float(payment.get("cpu", 0)) >= 55:
+        return "lateral_movement"
+    if float(database.get("outbound_mb", 0)) >= 50:
+        return "exfiltration"
+    if gateway.get("status") == "suspicious":
+        return "supply_chain"
+    return "benign"
+class TabularDefensePolicy:
+    """Tiny epsilon-greedy policy used only to verify trainability."""
+    def __init__(self, epsilon: float, lr: float) -> None:
+        self.epsilon = epsilon
+        self.lr = lr
+        self.q: Dict[str, Dict[Tuple[str, str], float]] = {
+            threat: {action: 0.50 for action in ACTION_SPACE}
+            for threat in POLICY
+        }
+    def choose_phase1(self, obs: Any) -> Dict[str, Any]:
+        threat = classify_from_metrics(obs.network_nodes)
+        target, action = POLICY[threat]
+        return {
+            "threat_type": threat,
+            "confidence": 0.90,
+            "target_node": target,
+            "recommended_action": action,
+            "reasoning": "smoke-train metric policy",
+        }
+    def choose_phase2(self, obs: Any) -> Tuple[Dict[str, Any], str, Tuple[str, str]]:
+        assessment = obs.phase1_assessment or {}
+        threat = str(assessment.get("threat_type", "benign"))
+        choices = self.q.get(threat, self.q["benign"])
+        if random.random() < self.epsilon:
+            target, action = random.choice(ACTION_SPACE)
+        else:
+            best_value = max(choices.values())
+            best_actions = [
+                action for action, value in choices.items()
+                if value == best_value
+            ]
+            target, action = random.choice(best_actions)
+        return {
+            "action": action,
+            "target_node": target,
+            "reasoning": "epsilon-greedy smoke policy",
+        }, threat, (target, action)
+    def update(self, threat: str, selected: Tuple[str, str], reward: float) -> None:
+        # Move the stored value toward the observed reward: q += lr * (reward - q).
+        choices = self.q.setdefault(
+            threat,
+            {action: 0.50 for action in ACTION_SPACE},
+        )
+        old_value = choices.get(selected, 0.50)
+        choices[selected] = old_value + self.lr * (reward - old_value)
+    def decay(self, rate: float, floor: float) -> None:
+        self.epsilon = max(floor, self.epsilon * rate)
+def run_episode(task: str, policy: TabularDefensePolicy, max_steps: int) -> Dict[str, Any]:
+    """Run one episode; only phase-2 rewards are used to update the policy."""
+    env = AdaptShieldEnvironment(task_name=task)
+    obs = env.reset()
+    rewards: List[float] = []
+    steps = 0
+    while not obs.done and steps < max_steps:
+        if obs.phase == 1:
+            payload = policy.choose_phase1(obs)
+            obs = env.step(AdaptShieldAction(**payload))
+        else:
+            payload, threat, selected = policy.choose_phase2(obs)
+            obs = env.step(AdaptShieldAction(**payload))
+            policy.update(threat, selected, float(obs.reward))
+        rewards.append(float(obs.reward))
+        steps += 1
+        metadata = obs.metadata if isinstance(obs.metadata, dict) else {}
+        if "normalized_score" not in metadata:
+            raise RuntimeError("normalized_score missing during smoke training")
+    metadata = obs.metadata if isinstance(obs.metadata, dict) else {}
+    return {
+        "task": task,
+        "score": float(metadata.get("normalized_score", 0.01)),
+        "reward_sum": sum(rewards),
+        "mean_reward": sum(rewards) / len(rewards) if rewards else 0.0,
+        "steps": steps,
+        "done": bool(obs.done),
+        "normalized_score_present": "normalized_score" in metadata,
+    }
+def write_rows(path: Path, rows: Iterable[Dict[str, Any]]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    rows = list(rows)
+    if not rows:
+        return
+    with path.open("w", newline="") as handle:
+        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
+        writer.writeheader()
+        writer.writerows(rows)
+def trend(values: List[float]) -> Tuple[float, float]:
+    """Return the mean of the first and last fifth of values (window of at least 1)."""
+    if not values:
+        return 0.0, 0.0
+    window = max(1, len(values) // 5)
+    first = sum(values[:window]) / window
+    last = sum(values[-window:]) / window
+    return first, last
+def run_smoke_training(
+    tasks: List[str],
+    episodes: int,
+    output: Path,
+    seed: int,
+    epsilon: float,
+    epsilon_decay: float,
+    epsilon_floor: float,
+    lr: float,
+    max_steps: int,
+) -> List[Dict[str, Any]]:
+    random.seed(seed)
+    policy = TabularDefensePolicy(epsilon=epsilon, lr=lr)
+    rows: List[Dict[str, Any]] = []
+    print("AdaptShield smoke training")
+    print(f"Tasks: {', '.join(tasks)}")
+    print(f"Episodes: {episodes}")
+    print(f"Output: {output}")
+    print()
+    for episode in range(1, episodes + 1):
+        task = tasks[(episode - 1) % len(tasks)]
+        result = run_episode(task=task, policy=policy, max_steps=max_steps)
+        result.update({
+            "episode": episode,
+            "epsilon": round(policy.epsilon, 4),
+            "status": "PASS" if result["done"] and result["normalized_score_present"] else "FAIL",
+        })
+        rows.append(result)
+        policy.decay(epsilon_decay, epsilon_floor)
+        print(
+            f"episode={episode:03d} task={task:<20} "
+            f"score={result['score']:.3f} steps={result['steps']:02d} "
+            f"epsilon={result['epsilon']:.3f} {result['status']}"
+        )
+    write_rows(output, rows)
+    scores = [float(row["score"]) for row in rows]
+    first, last = trend(scores)
+    print()
+    print(f"First-window avg score: {first:.3f}")
+    print(f"Last-window avg score:  {last:.3f}")
+    print(f"Score delta:            {last - first:+.3f}")
+    print(f"Saved CSV:              {output}")
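+    # Report PASS only if at least one episode ran and every episode passed.
+    all_passed = bool(rows) and all(row["status"] == "PASS" for row in rows)
+    print(f"Smoke training verdict: {'PASS' if all_passed else 'FAIL'}")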
+    return rows
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Run cheap AdaptShield training smoke test.")
+    parser.add_argument("--task", default="direct-triage", choices=TASKS + ["all"])
+    parser.add_argument("--episodes", type=int, default=30)
+    parser.add_argument("--output", default="training_runs/train_smoke.csv")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--epsilon", type=float, default=0.85)
+    parser.add_argument("--epsilon-decay", type=float, default=0.94)
+    parser.add_argument("--epsilon-floor", type=float, default=0.08)
+    parser.add_argument("--lr", type=float, default=0.35)
+    parser.add_argument("--max-steps", type=int, default=30)
+    return parser.parse_args()
+def main() -> int:
+    args = parse_args()
+    tasks = TASKS if args.task == "all" else [args.task]
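+    rows = run_smoke_training(
+        tasks=tasks,
+        episodes=args.episodes,
+        output=Path(args.output),
+        seed=args.seed,
+        epsilon=args.epsilon,
+        epsilon_decay=args.epsilon_decay,
+        epsilon_floor=args.epsilon_floor,
+        lr=args.lr,
+        max_steps=args.max_steps,
+    )
+    # Exit non-zero if any episode failed, so callers such as CI can detect it.
+    return 0 if all(row["status"] == "PASS" for row in rows) else 1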
+if __name__ == "__main__":
+    raise SystemExit(main())
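
Note on the smoke policy above: a minimal standalone sketch of the two numeric rules it relies on. These are the multiplicative epsilon decay with a floor (`TabularDefensePolicy.decay`) and the incremental value update (`TabularDefensePolicy.update`). The helper names `epsilon_schedule` and `q_update` are hypothetical and are not part of `train_smoke.py`. The constants are the CLI defaults from `parse_args`.

```python
from typing import List


def epsilon_schedule(start: float, rate: float, floor: float, episodes: int) -> List[float]:
    """Epsilon in effect for each episode, decayed once after every episode."""
    values: List[float] = []
    epsilon = start
    for _ in range(episodes):
        values.append(epsilon)
        epsilon = max(floor, epsilon * rate)
    return values


def q_update(old_value: float, reward: float, lr: float) -> float:
    """Move the stored value a fraction lr of the way toward the reward."""
    return old_value + lr * (reward - old_value)


if __name__ == "__main__":
    schedule = epsilon_schedule(start=0.85, rate=0.94, floor=0.08, episodes=30)
    print(f"epsilon at episode 1:  {schedule[0]:.3f}")
    print(f"epsilon at episode 30: {schedule[-1]:.3f}")

    # Repeated reward of 1.0 from the initial value 0.50 used in the table.
    value = 0.50
    for step in range(1, 4):
        value = q_update(value, reward=1.0, lr=0.35)
        print(f"update {step}: value={value:.4f}")
```

With the default 30 episodes, epsilon is still about 0.14 at the last episode, so it has not yet reached the 0.08 floor. At these rates the floor is reached only after roughly 39 decays. A longer run or a faster decay is needed for the greedy branch to dominate.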

uv.lock ADDED

The diff for this file is too large to render.