{
"cells": [
{
"cell_type": "markdown",
"id": "01751695",
"metadata": {},
"source": [
"# SENTINEL GRPO Training Notebook\n",
"\n",
"Free T4 = smoke run (50 episodes, ~30 min). Pro/L4 = real run (200 episodes, ~1.5-2.5 hr).\n",
"Run cells top-to-bottom. The \"go big\" cell at the bottom is optional and only changes `--episodes`.\n",
"\n",
"This notebook is the single driver that produces every artifact the rest of the repo already expects but does not have on disk:\n",
"\n",
"- `outputs/eval_pre.json`\n",
"- `training/sentinel_qwen15_grpo/` (LoRA adapter + `trainer_state.json`)\n",
"- `outputs/trained_policy_replay.jsonl` (UI replay table)\n",
"- `outputs/eval_post.json` (also copied to `outputs/evaluation_results.json` for the live dashboard)\n",
"- `outputs/reward_report_task3_seed42.json`\n",
"- `outputs/cluster_health_history.json`\n",
"- `outputs/charts/*.png` (12 charts via `training/plots.py`)\n",
"\n",
"It is idempotent: re-running any cell overwrites its outputs cleanly. If GRPO deps fail to install, every downstream cell still runs because the codepaths fall back to a heuristic policy / dependency-free PNGs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf35ae51",
"metadata": {},
"outputs": [],
"source": [
"# Cell 2 - Setup. GPU check, clone, install deps, set PYTHONPATH defensively.\n",
"!nvidia-smi || echo \"No GPU detected; CPU fallbacks will still produce artifacts.\"\n",
"\n",
"import os, sys, subprocess\n",
"\n",
"if not os.path.isdir(\"sentinel-env\"):\n",
" subprocess.check_call([\"git\", \"clone\", \"https://github.com/ADITYAGABA1322/sentinel-env\"])\n",
"if os.path.basename(os.getcwd()) != \"sentinel-env\":\n",
" os.chdir(\"sentinel-env\")\n",
"\n",
"subprocess.check_call([\"pip\", \"install\", \"-q\", \"-r\", \"requirements.txt\"])\n",
"\n",
"try:\n",
" subprocess.check_call([\"pip\", \"install\", \"-q\",\n",
" \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\",\n",
" ])\n",
" subprocess.check_call([\"pip\", \"install\", \"-q\", \"--no-deps\",\n",
" \"trl==0.24.0\", \"transformers==4.57.6\", \"datasets==4.3.0\", \"accelerate==1.13.0\", \"peft==0.19.1\", \"bitsandbytes==0.49.2\",\n",
" ])\n",
"except subprocess.CalledProcessError as exc:\n",
" print(f\"Training extras failed to install ({exc}); continuing with heuristic-fallback path.\")\n",
"\n",
"subprocess.check_call([\"pip\", \"install\", \"-q\", \"matplotlib\", \"seaborn\", \"pandas\", \"huggingface_hub\"])\n",
"\n",
"os.environ[\"PYTHONPATH\"] = os.getcwd()\n",
"if os.getcwd() not in sys.path:\n",
" sys.path.insert(0, os.getcwd())\n",
"\n",
"print(\"Working dir:\", os.getcwd())\n",
"print(\"PYTHONPATH set to:\", os.environ[\"PYTHONPATH\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b4b77b5",
"metadata": {},
"outputs": [],
"source": [
"# Cell 3 - Hugging Face auth. Optional. Needed only for credit-backed inference\n",
"# providers and for pushing the trained adapter back to the Hub in Cell 12.\n",
"# Skip this cell if you do not want to upload anything.\n",
"try:\n",
" from huggingface_hub import notebook_login\n",
" notebook_login()\n",
"except Exception as exc:\n",
" print(f\"HF login skipped: {exc}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "796bf539",
"metadata": {},
"outputs": [],
"source": [
"# Cell 4 - Pre-training baseline eval. Locks in the \"before\" numbers used by the\n",
"# delta charts and the ablation chart in training/plots.py.\n",
"!python training/evaluate.py --episodes 30 --task all \\\n",
" --policies random,heuristic,oracle_lite \\\n",
" --out outputs/eval_pre.json --no-plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc28625a",
"metadata": {},
"outputs": [],
"source": [
"# Cell 5 - Smoke GRPO (default tier; free T4).\n",
"# Writes training/sentinel_qwen15_grpo/ including trainer_state.json which\n",
"# the GRPO reward-curve chart reads. If training deps are missing this prints\n",
"# a friendly message and exits 0; downstream cells then fall back to heuristic\n",
"# policy via training/replay.py.\n",
"!python training/train.py \\\n",
" --episodes 50 --task all --seed 0 \\\n",
" --model unsloth/Qwen2.5-1.5B-Instruct \\\n",
" --epochs 1 --batch-size 2 --learning-rate 5e-6 \\\n",
" --lora-rank 16 --max-seq-length 1024 \\\n",
" --output-dir training/sentinel_qwen15_grpo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02c012d7",
"metadata": {},
"outputs": [],
"source": [
"# Cell 6 - Record trained-policy actions across 30 seeds x 3 tasks.\n",
"# Writes outputs/trained_policy_replay.jsonl which the UI fetches at\n",
"# /assets/trained_policy_replay.jsonl. If the LoRA adapter is missing this\n",
"# automatically writes heuristic actions tagged model_source=\"heuristic_fallback\";\n",
"# the replay still works end-to-end so the dashboard never 404s.\n",
"from training.replay import record_trained_actions\n",
"\n",
"out_path = record_trained_actions(\n",
" adapter_path=\"training/sentinel_qwen15_grpo\",\n",
" base_model=\"unsloth/Qwen2.5-1.5B-Instruct\",\n",
" tasks=[\"task1\", \"task2\", \"task3\"],\n",
" seeds=range(30),\n",
" out_path=\"outputs/trained_policy_replay.jsonl\",\n",
")\n",
"print(f\"Wrote {out_path}\")\n",
"!head -n 2 outputs/trained_policy_replay.jsonl\n",
"!wc -l outputs/trained_policy_replay.jsonl"
]
},
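{
"cell_type": "code",
"execution_count": null,
"id": "9a3e51c0",
"metadata": {},
"outputs": [],
"source": [
"# (Added sketch) Quick sanity check on the replay file: count how many rows came\n",
"# from the trained adapter vs. the heuristic fallback. This assumes each JSONL\n",
"# row carries the model_source field described in the cell above; adjust the\n",
"# key if the replay schema differs.\n",
"import json\n",
"from collections import Counter\n",
"\n",
"counts = Counter()\n",
"with open(\"outputs/trained_policy_replay.jsonl\") as f:\n",
"    for line in f:\n",
"        row = json.loads(line)\n",
"        counts[row.get(\"model_source\", \"unknown\")] += 1\n",
"print(dict(counts))"
]
},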
{
"cell_type": "code",
"execution_count": null,
"id": "142c7750",
"metadata": {},
"outputs": [],
"source": [
"# Cell 7 - Post-training eval with the 4th \"trained\" policy. This is the\n",
"# headline file the live dashboard reads at /assets/evaluation_results.json,\n",
"# so we copy eval_post.json into that canonical name.\n",
"import shutil\n",
"\n",
"!python training/evaluate.py --episodes 30 --task all \\\n",
" --policies random,heuristic,oracle_lite,trained \\\n",
" --replay outputs/trained_policy_replay.jsonl \\\n",
" --out outputs/eval_post.json --no-plot\n",
"\n",
"shutil.copy(\"outputs/eval_post.json\", \"outputs/evaluation_results.json\")\n",
"print(\"Copied outputs/eval_post.json -> outputs/evaluation_results.json (UI-canonical)\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0661105",
"metadata": {},
"outputs": [],
"source": [
"# Cell 8 - Reward report dump for task3, seed=42.\n",
"# This is the input training/plots.py needs to draw trust_evolution.png,\n",
"# trust_gap_over_time.png, and reward_component_stacked_area.png.\n",
"import json, os, random, sys\n",
"\n",
"if os.getcwd() not in sys.path:\n",
" sys.path.insert(0, os.getcwd())\n",
"\n",
"from environment import SentinelEnv\n",
"from training.evaluate import heuristic_policy\n",
"\n",
"env = SentinelEnv()\n",
"result = env.reset(task_type=\"task3\", seed=42)\n",
"rng = random.Random(42)\n",
"while not result[\"done\"]:\n",
" result = env.step(heuristic_policy(env, result[\"observation\"], rng))\n",
"\n",
"raw_events = env.reward_report().get(\"events\", [])\n",
"events = []\n",
"for idx, event in enumerate(raw_events):\n",
" snap = event.get(\"trust_snapshot\", {}) or {}\n",
" action = event.get(\"action\", {}) or {}\n",
" sid = action.get(\"specialist_id\")\n",
" events.append({\n",
" \"step_count\": event.get(\"step_count\", idx + 1),\n",
" \"trust_snapshot\": snap,\n",
" \"signal_breakdown\": event.get(\"signal_breakdown\", {}),\n",
" \"specialist_id\": sid,\n",
" \"trust_after\": snap.get(sid) if sid else None,\n",
" })\n",
"\n",
"report = {\"task_type\": \"task3\", \"seed\": 42, \"events\": events}\n",
"os.makedirs(\"outputs\", exist_ok=True)\n",
"with open(\"outputs/reward_report_task3_seed42.json\", \"w\") as f:\n",
" json.dump(report, f, indent=2)\n",
"print(f\"Wrote outputs/reward_report_task3_seed42.json with {len(events)} events\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ee5da19",
"metadata": {},
"outputs": [],
"source": [
"# Cell 9 - Cluster health timeline dump.\n",
"# Runs ClusterTrustEnv twice (random/blind allocation vs trust-aware) so the\n",
"# cluster_health_timeline.png and cluster_health_policy_lines.png charts have\n",
"# real series instead of plots.py's synthetic fallback data.\n",
"import json, os, random, sys\n",
"from typing import List\n",
"\n",
"if os.getcwd() not in sys.path:\n",
" sys.path.insert(0, os.getcwd())\n",
"\n",
"from cluster_trust_env import ClusterTrustEnv\n",
"from scripts.cluster_trust_walkthrough import choose_action\n",
"\n",
"def run_cluster(policy_arg: str, steps: int = 80, seed: int = 42) -> List[float]:\n",
" env = ClusterTrustEnv()\n",
" res = env.reset(task_type=\"task3\", seed=seed)\n",
" rng = random.Random(seed)\n",
" series: List[float] = []\n",
" for _ in range(steps):\n",
" if res[\"done\"]:\n",
" break\n",
" action = choose_action(res[\"observation\"], policy_arg, rng)\n",
" res = env.step(action)\n",
" series.append(env.state()[\"cluster\"][\"cluster_health_score\"])\n",
" return series\n",
"\n",
"series = {\n",
" \"random\": run_cluster(\"blind\"),\n",
" \"heuristic\": run_cluster(\"trust\"),\n",
"}\n",
"\n",
"os.makedirs(\"outputs\", exist_ok=True)\n",
"with open(\"outputs/cluster_health_history.json\", \"w\") as f:\n",
" json.dump({\"task_type\": \"task3\", \"seed\": 42, \"series\": series}, f, indent=2)\n",
"print({k: len(v) for k, v in series.items()})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a17fe36f",
"metadata": {},
"outputs": [],
"source": [
"# Cell 10 - Render all 12 charts via training/plots.py.\n",
"# matplotlib path on Colab; falls back to dependency-free PNGs if needed.\n",
"!python -m training.plots \\\n",
" --pre outputs/eval_pre.json \\\n",
" --post outputs/eval_post.json \\\n",
" --trainer-state training/sentinel_qwen15_grpo/trainer_state.json \\\n",
" --reward-report-task3 outputs/reward_report_task3_seed42.json \\\n",
" --cluster-health outputs/cluster_health_history.json \\\n",
" --out-dir outputs/charts\n",
"\n",
"!ls outputs/charts"
]
},
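{
"cell_type": "code",
"execution_count": null,
"id": "d4f8b2e7",
"metadata": {},
"outputs": [],
"source": [
"# (Added sketch) Verify that every artifact the repo expects is actually on\n",
"# disk before uploading anything. The path list mirrors the intro cell; treat\n",
"# it as an assumption and extend it if the pipeline grows new outputs.\n",
"import glob\n",
"import os\n",
"\n",
"expected = [\n",
"    \"outputs/eval_pre.json\",\n",
"    \"outputs/eval_post.json\",\n",
"    \"outputs/evaluation_results.json\",\n",
"    \"outputs/trained_policy_replay.jsonl\",\n",
"    \"outputs/reward_report_task3_seed42.json\",\n",
"    \"outputs/cluster_health_history.json\",\n",
"]\n",
"missing = [p for p in expected if not os.path.exists(p)]\n",
"print(\"Missing artifacts:\", missing or \"none\")\n",
"print(\"Charts on disk:\", len(glob.glob(\"outputs/charts/*.png\")))"
]
},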
{
"cell_type": "code",
"execution_count": null,
"id": "3cadde73",
"metadata": {},
"outputs": [],
"source": [
"# Cell 11 - Inline preview of the headline charts.\n",
"from IPython.display import Image, display\n",
"\n",
"for name in [\n",
" \"baseline_grouped_bars.png\",\n",
" \"grpo_reward_curve.png\",\n",
" \"trust_evolution.png\",\n",
" \"detection_vs_poisoning.png\",\n",
" \"cluster_health_timeline.png\",\n",
" \"task_radar.png\",\n",
" \"ablation.png\",\n",
"]:\n",
" path = f\"outputs/charts/{name}\"\n",
" print(path)\n",
" display(Image(path))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ede6dde",
"metadata": {},
"outputs": [],
"source": [
"# Cell 12 - (optional) Push the LoRA adapter and outputs/ to a Hub repo.\n",
"# Requires Cell 3 to have authenticated. Change the repo id to your own namespace.\n",
"REPO_ID = \"XcodeAddy/sentinel-grpo-qwen15\"\n",
"\n",
"from huggingface_hub import HfApi\n",
"import os\n",
"\n",
"api = HfApi()\n",
"api.create_repo(REPO_ID, exist_ok=True)\n",
"\n",
"if os.path.isdir(\"training/sentinel_qwen15_grpo\"):\n",
" api.upload_folder(folder_path=\"training/sentinel_qwen15_grpo\", repo_id=REPO_ID)\n",
"else:\n",
" print(\"No adapter folder; skipping LoRA upload.\")\n",
"\n",
"api.upload_folder(\n",
" folder_path=\"outputs\",\n",
" repo_id=REPO_ID,\n",
" path_in_repo=\"outputs\",\n",
" allow_patterns=[\"*.json\", \"*.jsonl\", \"charts/*.png\"],\n",
")\n",
"print(f\"Uploaded artifacts to https://huggingface.co/{REPO_ID}\")"
]
},
{
"cell_type": "markdown",
"id": "f4f3522e",
"metadata": {},
"source": [
"---\n",
"\n",
"## GO BIG TIER - only run on Pro / L4 / A100\n",
"\n",
"The cell below replaces the smoke run from Cell 5 with a 200-episode GRPO training run. Free T4 is unlikely to finish it in a single Colab session, so prefer Pro/L4 here. After it completes, re-run cells 6, 7, 8, 9, 10, 11 (and optionally 12) in order to refresh every artifact and chart against the better adapter."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c01f3cc",
"metadata": {},
"outputs": [],
"source": [
"# Cell 14 - Real-run GRPO. After this finishes, re-run cells 6 -> 11 (-> 12).\n",
"!python training/train.py \\\n",
" --episodes 200 --task all --seed 0 \\\n",
" --model unsloth/Qwen2.5-1.5B-Instruct \\\n",
" --epochs 1 --batch-size 2 --learning-rate 5e-6 \\\n",
" --lora-rank 16 --max-seq-length 1024 \\\n",
" --output-dir training/sentinel_qwen15_grpo"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv (3.13.7)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}