Humanlearning commited on
Commit
e3d939d
·
verified ·
1 Parent(s): 31637b2

Update CyberSecurity_OWASP environment

Browse files
.agents/skills/openenv-cli/SKILL.md ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: openenv-cli
3
+ description: "OpenEnv CLI (`openenv`) for scaffolding, validating, building, and pushing OpenEnv environments."
4
+ ---
5
+
6
+ Install: `pip install openenv-core`
7
+
8
+ The OpenEnv CLI command `openenv` is available.
9
+ Use `openenv --help` to view available commands.
10
+
11
+ Generated with `openenv-core v0.2.3`. Run `openenv skills add --force` to regenerate.
12
+
13
+ ## Tips
14
+
15
+ - Start with `openenv init <env_name>` to scaffold a new environment
16
+ - Validate projects with `openenv validate`
17
+ - Build and deploy with `openenv build` and `openenv push`
18
+ - Use `openenv <command> --help` for command-specific options
.agents/skills/trackio/SKILL.md ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: hugging-face-trackio
3
+ description: Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.
4
+ ---
5
+
6
+ # Trackio - Experiment Tracking for ML Training
7
+
8
+ Trackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.
9
+
10
+ ## Three Interfaces
11
+
12
+ | Task | Interface | Reference |
13
+ |------|-----------|-----------|
14
+ | **Logging metrics** during training | Python API | [logging_metrics.md](logging_metrics.md) |
15
+ | **Firing alerts** for training diagnostics | Python API | [alerts.md](alerts.md) |
16
+ | **Retrieving metrics & alerts** after/during training | CLI | [retrieving_metrics.md](retrieving_metrics.md) |
17
+ | **Inspecting storage schema and running direct SQL** | CLI | [storage_schema.md](storage_schema.md) |
18
+
19
+ ## When to Use Each
20
+
21
+ ### Python API → Logging
22
+
23
+ Use `import trackio` in your training scripts to log metrics:
24
+
25
+ - Initialize tracking with `trackio.init()`
26
+ - Log metrics with `trackio.log()` or use TRL's `report_to="trackio"`
27
+ - Finalize with `trackio.finish()`
28
+
29
+ **Key concept**: For remote/cloud training, pass `space_id` — metrics sync to a Space dashboard so they persist after the instance terminates.
30
+
31
+ → See [logging_metrics.md](logging_metrics.md) for setup, TRL integration, and configuration options.
32
+
33
+ ### Python API → Alerts
34
+
35
+ Insert `trackio.alert()` calls in training code to flag important events — like inserting print statements for debugging, but structured and queryable:
36
+
37
+ - `trackio.alert(title="...", level=trackio.AlertLevel.WARN)` — fire an alert
38
+ - Three severity levels: `INFO`, `WARN`, `ERROR`
39
+ - Alerts are printed to terminal, stored in the database, shown in the dashboard, and optionally sent to webhooks (Slack/Discord)
40
+
41
+ **Key concept for LLM agents**: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions (loss spikes, NaN gradients, low accuracy, training stalls). Since alerts are printed to the terminal, an agent that is watching the training script's output will see them automatically. For background or detached runs, the agent can poll via CLI instead.
42
+
43
+ → See [alerts.md](alerts.md) for the full alerts API, webhook setup, and autonomous agent workflows.
44
+
45
+ ### CLI → Retrieving
46
+
47
+ Use the `trackio` command to query logged metrics and alerts:
48
+
49
+ - `trackio list projects/runs/metrics` — discover what's available
50
+ - `trackio get project/run/metric` — retrieve summaries and values
51
+ - `trackio query project --project <name> --sql "SELECT ..."` — run catch-all read-only SQL
52
+ - `trackio list alerts --project <name> --json` — retrieve alerts
53
+ - `trackio show` — launch the dashboard
54
+ - `trackio sync` — sync to HF Space
55
+
56
+ **Key concept**: Add `--json` for programmatic output suitable for automation and LLM agents.
57
+
58
+ **Remote Spaces**: Add `--space <space_id_or_url>` to any `list`/`get`/`query` command to query a remote HF Space instead of local data. Use `--hf-token` for private Spaces.
59
+
60
+ → See [retrieving_metrics.md](retrieving_metrics.md) for all commands, workflows, and JSON output formats.
61
+ → See [storage_schema.md](storage_schema.md) for SQLite tables, parquet layout, and direct query examples.
62
+
63
+ ## Minimal Logging Setup
64
+
65
+ ```python
66
+ import trackio
67
+
68
+ trackio.init(project="my-project", space_id="username/trackio")
69
+ trackio.log({"loss": 0.1, "accuracy": 0.9})
70
+ trackio.log({"loss": 0.09, "accuracy": 0.91})
71
+ trackio.finish()
72
+ ```
73
+
74
+ ### Minimal Retrieval
75
+
76
+ ```bash
77
+ trackio list projects --json
78
+ trackio get metric --project my-project --run my-run --metric loss --json
79
+ trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table'" --json
80
+
81
+ # Query a remote Space
82
+ trackio list projects --space username/my-space --json
83
+ ```
84
+
85
+ ## Autonomous ML Experiment Workflow
86
+
87
+ When running experiments autonomously as an LLM agent, the recommended workflow is:
88
+
89
+ 1. **Set up training with alerts** — insert `trackio.alert()` calls for diagnostic conditions
90
+ 2. **Launch training** — run the script in the background
91
+ 3. **Poll for alerts** — use `trackio list alerts --project <name> --json --since <timestamp>` to check for new alerts
92
+ 4. **Read metrics** — use `trackio get metric ...` to inspect specific values
93
+ 5. **Iterate** — based on alerts and metrics, stop the run, adjust hyperparameters, and launch a new run
94
+
95
+ ```python
96
+ import trackio
97
+
98
+ trackio.init(project="my-project", config={"lr": 1e-4})
99
+
100
+ for step in range(num_steps):
101
+ loss = train_step()
102
+ trackio.log({"loss": loss, "step": step})
103
+
104
+ if step > 100 and loss > 5.0:
105
+ trackio.alert(
106
+ title="Loss divergence",
107
+ text=f"Loss {loss:.4f} still high after {step} steps",
108
+ level=trackio.AlertLevel.ERROR,
109
+ )
110
+ if step > 0 and abs(loss) < 1e-8:
111
+ trackio.alert(
112
+ title="Vanishing loss",
113
+ text="Loss near zero — possible gradient collapse",
114
+ level=trackio.AlertLevel.WARN,
115
+ )
116
+
117
+ trackio.finish()
118
+ ```
119
+
120
+ Then poll from a separate terminal/process:
121
+
122
+ ```bash
123
+ trackio list alerts --project my-project --json --since "2025-01-01T00:00:00"
124
+ ```
.agents/skills/trackio/alerts.md ADDED
@@ -0,0 +1,199 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Trackio Alerts
2
+
3
+ Alerts let you flag important training events directly from code. They are the primary mechanism for LLM agents to diagnose runs and iterate autonomously on ML experiments.
4
+
5
+ Alerts are printed to the terminal, stored in the database, displayed in the dashboard, and optionally sent to webhooks (Slack/Discord).
6
+
7
+ <img width="2972" height="1694" alt="image" src="https://github.com/user-attachments/assets/02d938f8-51a9-4706-85c4-d95b7645bcf4" />
8
+
9
+
10
+ ## Core API
11
+
12
+ ### trackio.alert()
13
+
14
+ ```python
15
+ trackio.alert(
16
+ title="Loss divergence", # Short title (required)
17
+ text="Loss 5.2 still high after 200 steps", # Detailed description (optional)
18
+ level=trackio.AlertLevel.WARN, # INFO, WARN, or ERROR (default: WARN)
19
+ webhook_url="https://hooks.slack.com/...", # Per-alert webhook override (optional)
20
+ )
21
+ ```
22
+
23
+ ### Alert Levels
24
+
25
+ | Level | Usage |
26
+ |-------|-------|
27
+ | `trackio.AlertLevel.INFO` | Informational milestones (checkpoints saved, eval completed) |
28
+ | `trackio.AlertLevel.WARN` | Potential issues (loss plateau, low accuracy, high gradient norm) |
29
+ | `trackio.AlertLevel.ERROR` | Critical failures (NaN loss, divergence, OOM) |
30
+
31
+ ### Webhook Support
32
+
33
+ Set a global webhook URL via `trackio.init()` or the `TRACKIO_WEBHOOK_URL` environment variable. Alerts are auto-formatted for Slack and Discord URLs.
34
+
35
+ ```python
36
+ trackio.init(
37
+ project="my-project",
38
+ webhook_url="https://hooks.slack.com/services/...",
39
+ webhook_min_level=trackio.AlertLevel.WARN, # Only send WARN+ to webhook
40
+ )
41
+ ```
42
+
43
+ Per-alert override:
44
+
45
+ ```python
46
+ trackio.alert(
47
+ title="Critical failure",
48
+ level=trackio.AlertLevel.ERROR,
49
+ webhook_url="https://hooks.slack.com/services/...", # Overrides global URL
50
+ )
51
+ ```
52
+
53
+ Environment variables:
54
+ - `TRACKIO_WEBHOOK_URL` — global webhook URL
55
+ - `TRACKIO_WEBHOOK_MIN_LEVEL` — minimum level for webhook delivery (`info`, `warn`, `error`)
56
+
57
+ ## Retrieving Alerts (CLI)
58
+
59
+ ```bash
60
+ # List all alerts for a project
61
+ trackio list alerts --project my-project --json
62
+
63
+ # Filter by run or level
64
+ trackio list alerts --project my-project --run my-run --level error --json
65
+
66
+ # Poll for new alerts since a timestamp (efficient for agents)
67
+ trackio list alerts --project my-project --json --since "2025-06-01T12:00:00"
68
+ ```
69
+
70
+ ### JSON Output Structure
71
+
72
+ ```json
73
+ {
74
+ "project": "my-project",
75
+ "run": null,
76
+ "level": null,
77
+ "since": "2025-06-01T12:00:00",
78
+ "alerts": [
79
+ {
80
+ "run": "run-name",
81
+ "title": "Loss divergence",
82
+ "text": "Loss 5.2 still high after 200 steps",
83
+ "level": "warn",
84
+ "step": 200,
85
+ "timestamp": "2025-06-01T12:05:30"
86
+ }
87
+ ]
88
+ }
89
+ ```
90
+
91
+ ## Autonomous Agent Workflow
92
+
93
+ The recommended pattern for an LLM agent running ML experiments:
94
+
95
+ ### 1. Insert Alerts Into Training Code
96
+
97
+ Add diagnostic `trackio.alert()` calls for conditions the agent should react to:
98
+
99
+ ```python
100
+ import trackio
101
+
102
+ trackio.init(project="hyperparam-sweep", config={"lr": lr, "batch_size": bs})
103
+
104
+ for step in range(num_steps):
105
+ loss = train_step()
106
+ trackio.log({"loss": loss, "step": step})
107
+
108
+ if step > 200 and loss > 5.0:
109
+ trackio.alert(
110
+ title="Loss divergence",
111
+ text=f"Loss {loss:.4f} still above 5.0 after {step} steps — learning rate may be too high",
112
+ level=trackio.AlertLevel.ERROR,
113
+ )
114
+
115
+ if step > 500 and loss_delta < 0.001:
116
+ trackio.alert(
117
+ title="Training stall",
118
+ text=f"Loss barely changed over last 100 steps (delta={loss_delta:.6f})",
119
+ level=trackio.AlertLevel.WARN,
120
+ )
121
+
122
+ if math.isnan(loss):
123
+ trackio.alert(
124
+ title="NaN loss",
125
+ text="Loss became NaN — training is broken",
126
+ level=trackio.AlertLevel.ERROR,
127
+ )
128
+ break
129
+
130
+ trackio.finish()
131
+ ```
132
+
133
+ ### 2. Monitor Alerts
134
+
135
+ Alerts are automatically printed to the terminal when fired. If the agent is watching the training script's output (e.g. running in the foreground or tailing logs), it will see alerts immediately — no polling needed.
136
+
137
+ For background or detached runs, poll for alerts via CLI:
138
+
139
+ ```bash
140
+ # Poll for alerts (run periodically)
141
+ trackio list alerts --project hyperparam-sweep --json --since "2025-06-01T00:00:00"
142
+ ```
143
+
144
+ ### 3. Inspect Metrics Around the Alert
145
+
146
+ When an alert fires, use `trackio get snapshot` to see all metrics at that point:
147
+
148
+ ```bash
149
+ # Alert fired at step 200 — get all metrics in a ±5 step window
150
+ trackio get snapshot --project hyperparam-sweep --run run-1 --around 200 --window 5 --json
151
+
152
+ # Or inspect a single metric around the alert's timestamp
153
+ trackio get metric --project hyperparam-sweep --run run-1 --metric loss --around 200 --window 10 --json
154
+ ```
155
+
156
+ ### 4. React and Iterate
157
+
158
+ Based on alerts:
159
+ - **ERROR alerts** → stop the run, adjust hyperparameters, relaunch
160
+ - **WARN alerts** → inspect metrics with `trackio get snapshot ...`, decide whether to intervene
161
+ - **INFO alerts** → note progress, continue monitoring
162
+
163
+ ### 5. Compare Across Runs
164
+
165
+ ```bash
166
+ # Check metrics from previous runs
167
+ trackio get run --project hyperparam-sweep --run run-1 --json
168
+ trackio get metric --project hyperparam-sweep --run run-1 --metric loss --json
169
+
170
+ # Launch new run with adjusted config
171
+ python train.py --lr 5e-5
172
+ ```
173
+
174
+ ## Using Alerts with Transformers / TRL
175
+
176
+ When using `report_to="trackio"`, you don't control the training loop directly. Use a `TrainerCallback` to fire alerts:
177
+
178
+ ```python
179
+ from transformers import TrainerCallback
180
+
181
+ class AlertCallback(TrainerCallback):
182
+ def on_log(self, args, state, control, logs=None, **kwargs):
183
+ if "trackio" not in args.report_to:
184
+ return
185
+ if logs and "loss" in logs:
186
+ if logs["loss"] > 5.0 and state.global_step > 100:
187
+ trackio.alert(
188
+ title="High loss",
189
+ text=f"Loss {logs['loss']:.4f} at step {state.global_step}",
190
+ level=trackio.AlertLevel.ERROR,
191
+ )
192
+
193
+ trainer = SFTTrainer(
194
+ model=model,
195
+ args=SFTConfig(output_dir="./out", report_to="trackio"),
196
+ callbacks=[AlertCallback()],
197
+ ...
198
+ )
199
+ ```
.agents/skills/trackio/logging_metrics.md ADDED
@@ -0,0 +1,206 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Logging Metrics with Trackio
2
+
3
+ **Trackio** is a lightweight, free experiment tracking library from Hugging Face. It provides a wandb-compatible API for logging metrics with local-first design.
4
+
5
+ - **GitHub**: [gradio-app/trackio](https://github.com/gradio-app/trackio)
6
+ - **Docs**: [huggingface.co/docs/trackio](https://huggingface.co/docs/trackio/index)
7
+
8
+ ## Installation
9
+
10
+ ```bash
11
+ pip install trackio
12
+ # or
13
+ uv pip install trackio
14
+ ```
15
+
16
+ ## Core API
17
+
18
+ ### Basic Usage
19
+
20
+ ```python
21
+ import trackio
22
+
23
+ # Initialize a run
24
+ trackio.init(
25
+ project="my-project",
26
+ config={"learning_rate": 0.001, "epochs": 10}
27
+ )
28
+
29
+ # Log metrics during training
30
+ for epoch in range(10):
31
+ loss = train_epoch()
32
+ trackio.log({"loss": loss, "epoch": epoch})
33
+
34
+ # Finalize the run
35
+ trackio.finish()
36
+ ```
37
+
38
+ ### Key Functions
39
+
40
+ | Function | Purpose |
41
+ |----------|---------|
42
+ | `trackio.init(...)` | Start a new tracking run |
43
+ | `trackio.log(dict)` | Log metrics (called repeatedly during training) |
44
+ | `trackio.finish()` | Finalize run and ensure all metrics are saved |
45
+ | `trackio.show()` | Launch the local dashboard |
46
+ | `trackio.sync(...)` | Sync local project to HF Space |
47
+
48
+ ## trackio.init() Parameters
49
+
50
+ ```python
51
+ trackio.init(
52
+ project="my-project", # Project name (groups runs together)
53
+ name="run-name", # Optional: name for this specific run
54
+ config={...}, # Hyperparameters and config to log
55
+ space_id="username/trackio", # Optional: sync to HF Space for remote dashboard
56
+ group="experiment-group", # Optional: group related runs
57
+ )
58
+ ```
59
+
60
+ ## Local vs Remote Dashboard
61
+
62
+ ### Local (Default)
63
+
64
+ By default, trackio stores metrics in a local SQLite database and runs the dashboard locally:
65
+
66
+ ```python
67
+ trackio.init(project="my-project")
68
+ # ... training ...
69
+ trackio.finish()
70
+
71
+ # Launch local dashboard
72
+ trackio.show()
73
+ ```
74
+
75
+ Or from terminal:
76
+ ```bash
77
+ trackio show --project my-project
78
+ ```
79
+
80
+ ### Remote (HF Space)
81
+
82
+ Pass `space_id` to sync metrics to a Hugging Face Space for persistent, shareable dashboards:
83
+
84
+ ```python
85
+ trackio.init(
86
+ project="my-project",
87
+ space_id="username/trackio" # Auto-creates Space if it doesn't exist
88
+ )
89
+ ```
90
+
91
+ ⚠️ **For remote training** (cloud GPUs, HF Jobs, etc.): Always use `space_id` since local storage is lost when the instance terminates.
92
+
93
+ ### Sync Local to Remote
94
+
95
+ Sync existing local projects to a Space:
96
+
97
+ ```python
98
+ trackio.sync(project="my-project", space_id="username/my-experiments")
99
+ ```
100
+
101
+ ## wandb Compatibility
102
+
103
+ Trackio is API-compatible with wandb. Drop-in replacement:
104
+
105
+ ```python
106
+ import trackio as wandb
107
+
108
+ wandb.init(project="my-project")
109
+ wandb.log({"loss": 0.5})
110
+ wandb.finish()
111
+ ```
112
+
113
+ ## TRL Integration
114
+
115
+ When using TRL trainers, set `report_to="trackio"` for automatic metric logging:
116
+
117
+ ```python
118
+ from trl import SFTConfig, SFTTrainer
119
+ import trackio
120
+
121
+ trackio.init(
122
+ project="sft-training",
123
+ space_id="username/trackio",
124
+ config={"model": "Qwen/Qwen2.5-0.5B", "dataset": "trl-lib/Capybara"}
125
+ )
126
+
127
+ config = SFTConfig(
128
+ output_dir="./output",
129
+ report_to="trackio", # Automatic metric logging
130
+ # ... other config
131
+ )
132
+
133
+ trainer = SFTTrainer(model=model, args=config, ...)
134
+ trainer.train()
135
+ trackio.finish()
136
+ ```
137
+
138
+ ## What Gets Logged
139
+
140
+ With TRL/Transformers integration, trackio automatically captures:
141
+ - Training loss
142
+ - Learning rate
143
+ - Eval metrics
144
+ - Training throughput
145
+
146
+ For manual logging, log any numeric metrics:
147
+
148
+ ```python
149
+ trackio.log({
150
+ "train_loss": 0.5,
151
+ "train_accuracy": 0.85,
152
+ "val_loss": 0.4,
153
+ "val_accuracy": 0.88,
154
+ "epoch": 1
155
+ })
156
+ ```
157
+
158
+ ## Grouping Runs
159
+
160
+ Use `group` to organize related experiments in the dashboard sidebar:
161
+
162
+ ```python
163
+ # Group by experiment type
164
+ trackio.init(project="my-project", name="baseline-v1", group="baseline")
165
+ trackio.init(project="my-project", name="augmented-v1", group="augmented")
166
+
167
+ # Group by hyperparameter
168
+ trackio.init(project="hyperparam-sweep", name="lr-0.001", group="lr_0.001")
169
+ trackio.init(project="hyperparam-sweep", name="lr-0.01", group="lr_0.01")
170
+ ```
171
+
172
+ ## Configuration Best Practices
173
+
174
+ Keep config minimal — only log what's useful for comparing runs:
175
+
176
+ ```python
177
+ trackio.init(
178
+ project="qwen-sft-capybara",
179
+ name="baseline-lr2e5",
180
+ config={
181
+ "model": "Qwen/Qwen2.5-0.5B",
182
+ "dataset": "trl-lib/Capybara",
183
+ "learning_rate": 2e-5,
184
+ "num_epochs": 3,
185
+ "batch_size": 8,
186
+ }
187
+ )
188
+ ```
189
+
190
+ ## Embedding Dashboards
191
+
192
+ Embed Space dashboards in websites with query parameters:
193
+
194
+ ```html
195
+ <iframe
196
+ src="https://username-trackio.hf.space/?project=my-project&metrics=train_loss,val_loss&sidebar=hidden"
197
+ style="width:1600px; height:500px; border:0;">
198
+ </iframe>
199
+ ```
200
+
201
+ Query parameters:
202
+ - `project`: Filter to specific project
203
+ - `metrics`: Comma-separated metric names to show
204
+ - `sidebar`: `hidden` or `collapsed`
205
+ - `smoothing`: 0-20 (smoothing slider value)
206
+ - `xmin`, `xmax`: X-axis limits
.agents/skills/trackio/retrieving_metrics.md ADDED
@@ -0,0 +1,298 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Retrieving Metrics with Trackio CLI
2
+
3
+ The `trackio` CLI provides direct terminal access to query Trackio experiment tracking data without needing to start the MCP server. Commands work against local data by default, or against a remote HF Space when `--space` is provided.
4
+
5
+ ## Quick Command Reference
6
+
7
+ | Task | Command |
8
+ |------|---------|
9
+ | List projects | `trackio list projects` |
10
+ | List runs | `trackio list runs --project <name>` |
11
+ | List metrics | `trackio list metrics --project <name> --run <name>` |
12
+ | List system metrics | `trackio list system-metrics --project <name> --run <name>` |
13
+ | List alerts | `trackio list alerts --project <name> [--run <name>] [--level <level>] [--since <timestamp>]` |
14
+ | Get project summary | `trackio get project --project <name>` |
15
+ | Get run summary | `trackio get run --project <name> --run <name>` |
16
+ | Get metric values | `trackio get metric --project <name> --run <name> --metric <name>` |
17
+ | Get metric at step | `trackio get metric ... --metric <name> --step <N>` |
18
+ | Get metric around step | `trackio get metric ... --metric <name> --around <N> --window <W>` |
19
+ | Get all metrics snapshot | `trackio get snapshot --project <name> --run <name> --step <N>` |
20
+ | Get system metrics | `trackio get system-metric --project <name> --run <name>` |
21
+ | Run direct SQL | `trackio query project --project <name> --sql "SELECT ..."` |
22
+ | Query remote Space | `trackio list projects --space <space_id_or_url>` |
23
+ | Show dashboard | `trackio show [--project <name>]` |
24
+ | Sync to Space | `trackio sync --project <name> --space-id <space_id>` |
25
+
26
+ ## Core Commands
27
+
28
+ ### List Commands
29
+
30
+ ```bash
31
+ trackio list projects # List all projects
32
+ trackio list projects --json # JSON output
33
+
34
+ trackio list runs --project <name> # List runs in project
35
+ trackio list runs --project <name> --json # JSON output
36
+
37
+ trackio list metrics --project <name> --run <name> # List metrics for run
38
+ trackio list metrics --project <name> --run <name> --json
39
+
40
+ trackio list system-metrics --project <name> --run <name> # List system metrics
41
+ trackio list system-metrics --project <name> --run <name> --json
42
+
43
+ trackio list alerts --project <name> # List alerts
44
+ trackio list alerts --project <name> --run <name> --json # Filter by run
45
+ trackio list alerts --project <name> --level error --json # Filter by level
46
+ trackio list alerts --project <name> --json --since <ts> # Poll since timestamp
47
+ ```
48
+
49
+ ### Get Commands
50
+
51
+ ```bash
52
+ trackio get project --project <name> # Project summary
53
+ trackio get project --project <name> --json # JSON output
54
+
55
+ trackio get run --project <name> --run <name> # Run summary
56
+ trackio get run --project <name> --run <name> --json
57
+
58
+ trackio get metric --project <name> --run <name> --metric <name> # Metric values
59
+ trackio get metric --project <name> --run <name> --metric <name> --json
60
+ trackio get metric ... --metric <name> --step 200 # At exact step
61
+ trackio get metric ... --metric <name> --around 200 --window 10 # ±10 steps
62
+ trackio get metric ... --metric <name> --at-time <ts> --window 60 # ±60 seconds
63
+
64
+ trackio get snapshot --project <name> --run <name> --step 200 --json # All metrics at step
65
+ trackio get snapshot --project <name> --run <name> --around 200 --window 5 --json # Window
66
+ trackio get snapshot --project <name> --run <name> --at-time <ts> --window 60 --json
67
+
68
+ trackio get system-metric --project <name> --run <name> # All system metrics
69
+ trackio get system-metric --project <name> --run <name> --metric <name> # Specific metric
70
+ trackio get system-metric --project <name> --run <name> --json
71
+ ```
72
+
73
+ ### Query Command
74
+
75
+ ```bash
76
+ trackio query project --project <name> --sql "SELECT name FROM sqlite_master WHERE type = 'table'"
77
+ trackio query project --project <name> --sql "PRAGMA table_info(metrics)" --json
78
+ trackio query project --project <name> --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name"
79
+ ```
80
+
81
+ ### Remote Space Queries
82
+
83
+ All `list`, `get`, and `query` commands support querying a remote HF Space with `--space`:
84
+
85
+ ```bash
86
+ trackio list projects --space user/my-space # Space ID
87
+ trackio list projects --space https://user-my-space.hf.space # Space URL
88
+ trackio get metric --project <name> --run <name> --metric loss --space user/my-space
89
+ trackio query project --project <name> --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space user/my-space
90
+ trackio list projects --space user/private-space --hf-token hf_xxx # Private Space
91
+ ```
92
+
93
+ ### Dashboard Commands
94
+
95
+ ```bash
96
+ trackio show # Launch dashboard
97
+ trackio show --project <name> # Load specific project
98
+ trackio show --theme <theme> # Custom theme
99
+ trackio show --mcp-server # Enable MCP server
100
+ trackio show --color-palette "#FF0000,#00FF00" # Custom colors
101
+ ```
102
+
103
+ ### Sync Commands
104
+
105
+ ```bash
106
+ trackio sync --project <name> --space-id <space_id> # Sync to HF Space
107
+ trackio sync --project <name> --space-id <space_id> --private # Private space
108
+ trackio sync --project <name> --space-id <space_id> --force # Overwrite
109
+ ```
110
+
111
+ ## Output Formats
112
+
113
+ All `list`, `get`, and `query` commands support two output formats:
114
+
115
+ - **Human-readable** (default): Formatted text for terminal viewing
116
+ - **JSON** (with `--json` flag): Structured JSON for programmatic use
117
+
118
+ ## Common Patterns
119
+
120
+ ### Discover Projects and Runs
121
+
122
+ ```bash
123
+ # List all available projects
124
+ trackio list projects
125
+
126
+ # List runs in a project
127
+ trackio list runs --project my-project
128
+
129
+ # Get project overview
130
+ trackio get project --project my-project --json
131
+ ```
132
+
133
+ ### Inspect Run Details
134
+
135
+ ```bash
136
+ # Get run summary with all metrics
137
+ trackio get run --project my-project --run my-run --json
138
+
139
+ # List available metrics
140
+ trackio list metrics --project my-project --run my-run
141
+
142
+ # Get specific metric values
143
+ trackio get metric --project my-project --run my-run --metric loss --json
144
+ ```
145
+
146
+ ### Query System Metrics
147
+
148
+ ```bash
149
+ # List system metrics (GPU, etc.)
150
+ trackio list system-metrics --project my-project --run my-run
151
+
152
+ # Get all system metric data
153
+ trackio get system-metric --project my-project --run my-run --json
154
+
155
+ # Get specific system metric
156
+ trackio get system-metric --project my-project --run my-run --metric gpu_utilization --json
157
+ ```
158
+
159
+ ### Automation Scripts
160
+
161
+ ```bash
162
+ # Extract latest metric value
163
+ LATEST_LOSS=$(trackio get metric --project my-project --run my-run --metric loss --json | jq -r '.values[-1].value')
164
+
165
+ # Export run summary to file
166
+ trackio get run --project my-project --run my-run --json > run_summary.json
167
+
168
+ # Filter runs with jq
169
+ trackio list runs --project my-project --json | jq '.runs[] | select(startswith("train"))'
170
+
171
+ # Run a direct SQL aggregate
172
+ trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name" --json
173
+ ```
174
+
175
+ ### LLM Agent Workflow
176
+
177
+ ```bash
178
+ # 1. Discover available projects
179
+ trackio list projects --json
180
+
181
+ # 2. Explore project structure
182
+ trackio get project --project my-project --json
183
+
184
+ # 3. Inspect specific run
185
+ trackio get run --project my-project --run my-run --json
186
+
187
+ # 4. Query metric values
188
+ trackio get metric --project my-project --run my-run --metric accuracy --json
189
+
190
+ # 5. Poll for alerts (use --since for efficient incremental polling)
191
+ trackio list alerts --project my-project --json --since "2025-06-01T00:00:00"
192
+
193
+ # 6. When an alert fires at step N, get all metrics around that point
194
+ trackio get snapshot --project my-project --run my-run --around 200 --window 5 --json
195
+
196
+ # 7. Fall back to direct SQL for one-off inspection
197
+ trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title FROM alerts ORDER BY timestamp DESC LIMIT 20" --json
198
+ ```
199
+
200
+ ## Error Handling
201
+
202
+ Commands validate inputs and return clear errors:
203
+
204
+ - Missing project: `Error: Project '<name>' not found.`
205
+ - Missing run: `Error: Run '<name>' not found in project '<project>'.`
206
+ - Missing metric: `Error: Metric '<name>' not found in run '<run>' of project '<project>'.`
207
+
208
+ All errors exit with non-zero status code and write to stderr.
209
+
210
+ ## Key Options
211
+
212
+ - `--project`: Project name (required for most commands)
213
+ - `--run`: Run name (required for run-specific commands)
214
+ - `--metric`: Metric name (required for metric-specific commands)
215
+ - `--sql`: Read-only SQL query (for `trackio query`)
216
+ - `--json`: Output in JSON format instead of human-readable
217
+ - `--space`: HF Space ID (e.g. `user/space`) or Space URL to query remotely (for `list`/`get`/`query` commands)
218
+ - `--hf-token`: HF token for accessing private Spaces (for `list`/`get`/`query` commands with `--space`)
219
+ - `--step`: Exact step filter (for `get metric`, `get snapshot`)
220
+ - `--around`: Center step for window filter (for `get metric`, `get snapshot`)
221
+ - `--at-time`: Center ISO timestamp for window filter (for `get metric`, `get snapshot`)
222
+ - `--window`: Window size: ±steps for `--around`, ±seconds for `--at-time` (default: 10)
223
+ - `--level`: Alert level filter (`info`, `warn`, `error`) (for `list alerts`)
224
+ - `--since`: ISO timestamp to filter alerts after (for `list alerts`)
225
+ - `--theme`: Dashboard theme (for `show` command)
226
+ - `--mcp-server`: Enable MCP server mode (for `show` command)
227
+ - `--color-palette`: Comma-separated hex colors (for `show` command)
228
+ - `--private`: Create private Space (for `sync` command)
229
+ - `--force`: Overwrite existing database (for `sync` command)
230
+
231
+ ## JSON Output Structure
232
+
233
+ ### List Projects
234
+ ```json
235
+ {"projects": ["project1", "project2"]}
236
+ ```
237
+
238
+ ### List Runs
239
+ ```json
240
+ {"project": "my-project", "runs": ["run1", "run2"]}
241
+ ```
242
+
243
+ ### Project Summary
244
+ ```json
245
+ {
246
+ "project": "my-project",
247
+ "num_runs": 3,
248
+ "runs": ["run1", "run2", "run3"],
249
+ "last_activity": 100
250
+ }
251
+ ```
252
+
253
+ ### Run Summary
254
+ ```json
255
+ {
256
+ "project": "my-project",
257
+ "run": "my-run",
258
+ "num_logs": 50,
259
+ "metrics": ["loss", "accuracy"],
260
+ "config": {"learning_rate": 0.001},
261
+ "last_step": 49
262
+ }
263
+ ```
264
+
265
+ ### Metric Values
266
+ ```json
267
+ {
268
+ "project": "my-project",
269
+ "run": "my-run",
270
+ "metric": "loss",
271
+ "values": [
272
+ {"step": 0, "timestamp": "2024-01-01T00:00:00", "value": 0.5},
273
+ {"step": 1, "timestamp": "2024-01-01T00:01:00", "value": 0.4}
274
+ ]
275
+ }
276
+ ```
277
+
278
+ ### Query Result
279
+ ```json
280
+ {
281
+ "project": "my-project",
282
+ "query": "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
283
+ "columns": ["name"],
284
+ "rows": [
285
+ {"name": "alerts"},
286
+ {"name": "configs"},
287
+ {"name": "metrics"}
288
+ ],
289
+ "row_count": 3
290
+ }
291
+ ```
292
+
293
+ ## References
294
+
295
+ - **Complete CLI documentation**: See [docs/source/cli_commands.md](docs/source/cli_commands.md)
296
+ - **Storage schema and direct SQL**: See [storage_schema.md](storage_schema.md)
297
+ - **API and MCP Server**: See [docs/source/api_mcp_server.md](docs/source/api_mcp_server.md)
298
+
.agents/skills/trackio/storage_schema.md ADDED
@@ -0,0 +1,159 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Trackio Storage Schema and Direct SQL
2
+
3
+ Use this reference when you need to inspect Trackio data directly instead of going through higher-level `trackio list` or `trackio get` commands.
4
+
5
+ ## Where Data Is Stored
6
+
7
+ - Local project databases live in `TRACKIO_DIR`, which defaults to `~/.cache/huggingface/trackio`.
8
+ - Each project is stored in its own SQLite file: `{project}.db`.
9
+ - Media files live under `TRACKIO_DIR/media/`.
10
+ - Parquet files are derived exports written from SQLite for syncing and static Spaces.
11
+
12
+ ## SQLite Tables
13
+
14
+ Trackio defines its live schema in `trackio/sqlite_storage.py` inside `SQLiteStorage.init_db()`.
15
+
16
+ ### `metrics`
17
+
18
+ - `id`: integer primary key
19
+ - `timestamp`: ISO timestamp
20
+ - `run_name`: run identifier
21
+ - `step`: integer step
22
+ - `metrics`: JSON text payload
23
+ - `log_id`: optional deduplication key
24
+ - `space_id`: optional pending-sync marker
25
+
26
+ Indexes:
27
+
28
+ - `(run_name, step)`
29
+ - `(run_name, timestamp)`
30
+ - unique partial index on `log_id`
31
+ - partial index on `space_id`
32
+
33
+ ### `configs`
34
+
35
+ - `id`: integer primary key
36
+ - `run_name`: run identifier
37
+ - `config`: JSON text payload
38
+ - `created_at`: ISO timestamp
39
+
40
+ Constraints:
41
+
42
+ - unique `run_name`
43
+ - index on `run_name`
44
+
45
+ ### `system_metrics`
46
+
47
+ - `id`: integer primary key
48
+ - `timestamp`: ISO timestamp
49
+ - `run_name`: run identifier
50
+ - `metrics`: JSON text payload
51
+ - `log_id`: optional deduplication key
52
+ - `space_id`: optional pending-sync marker
53
+
54
+ Indexes:
55
+
56
+ - `(run_name, timestamp)`
57
+ - unique partial index on `log_id`
58
+ - partial index on `space_id`
59
+
60
+ ### `project_metadata`
61
+
62
+ - `key`: primary key
63
+ - `value`: metadata value
64
+
65
+ ### `pending_uploads`
66
+
67
+ - `id`
68
+ - `space_id`
69
+ - `run_name`
70
+ - `step`
71
+ - `file_path`
72
+ - `relative_path`
73
+ - `created_at`
74
+
75
+ ### `alerts`
76
+
77
+ - `id`
78
+ - `timestamp`
79
+ - `run_name`
80
+ - `title`
81
+ - `text`
82
+ - `level`
83
+ - `step`
84
+ - `alert_id`
85
+
86
+ Indexes:
87
+
88
+ - `run_name`
89
+ - `timestamp`
90
+ - unique partial index on `alert_id`
91
+
92
+ ## Parquet Layout
93
+
94
+ Trackio flattens JSON blobs when exporting parquet:
95
+
96
+ - `{project}.parquet` comes from `metrics`
97
+ - `{project}_system.parquet` comes from `system_metrics`
98
+ - `{project}_configs.parquet` comes from `configs`
99
+
100
+ Static export layout:
101
+
102
+ - `metrics.parquet`
103
+ - `aux/system_metrics.parquet`
104
+ - `aux/configs.parquet`
105
+ - `runs.json`
106
+ - `settings.json`
107
+
108
+ The flattened parquet files keep structural columns such as `timestamp`, `run_name`, and `step`, then add one column per JSON key found in the source payload.
109
+
110
+ ## Direct SQL With The CLI
111
+
112
+ Use `trackio query` for read-only SQL:
113
+
114
+ ```bash
115
+ trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name" --json
116
+ trackio query project --project my-project --sql "PRAGMA table_info(metrics)"
117
+ trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
118
+ ```
119
+
120
+ Remote query works too:
121
+
122
+ ```bash
123
+ trackio query project --project my-project --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space username/my-space --json
124
+ ```
125
+
126
+ `trackio query` accepts read-only `SELECT`, `WITH`, and safe schema `PRAGMA` queries.
127
+
128
+ ## Common Query Patterns
129
+
130
+ Recent alerts:
131
+
132
+ ```bash
133
+ trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title, step FROM alerts ORDER BY timestamp DESC LIMIT 20"
134
+ ```
135
+
136
+ Latest step per run:
137
+
138
+ ```bash
139
+ trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
140
+ ```
141
+
142
+ Recent configs:
143
+
144
+ ```bash
145
+ trackio query project --project my-project --sql "SELECT run_name, created_at, config FROM configs ORDER BY created_at DESC"
146
+ ```
147
+
148
+ Schema inspection:
149
+
150
+ ```bash
151
+ trackio query project --project my-project --sql "PRAGMA index_list(metrics)"
152
+ ```
153
+
154
+ ## Agent Guidance
155
+
156
+ - Start with `trackio list projects --json` if you do not know the project name yet.
157
+ - Use `trackio get` for common summaries and metric retrieval.
158
+ - Fall back to `trackio query` when you need one-off aggregates, joins, or schema introspection.
159
+ - Prefer `--json` when another agent or script needs to consume the result.
.gitignore ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ .env.local
2
+ .env.*
3
+ __pycache__/
4
+ *.pyc
5
+ .pytest_cache/
6
+ outputs/
7
+ codex_tmp_*/
8
+
9
+ *.egg*
01_ARCHITECTURE.md CHANGED
@@ -14,6 +14,12 @@ The environment is intentionally not a two-agent red-team/blue-team setup. The a
14
 
15
  ## 2. Final architecture diagram
16
 
 
 
 
 
 
 
17
  ```mermaid
18
  flowchart TB
19
  %% =========================
@@ -363,6 +369,12 @@ Run before/after evaluation on the same held-out suite.
363
 
364
  ## 8. Training flow
365
 
 
 
 
 
 
 
366
  ```text
367
  1. Build CyberSecurity_OWASP OpenEnv server.
368
  2. Generate 600 MVP scenarios.
@@ -476,4 +488,3 @@ Expected endpoints:
476
  | OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
477
  | Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
478
  | TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |
479
-
 
14
 
15
  ## 2. Final architecture diagram
16
 
17
+ Rendered asset:
18
+
19
+ ![CyberSecurity_OWASP architecture](assets/architecture_diagram.svg)
20
+
21
+ Editable source: `assets/architecture_diagram.mmd`
22
+
23
  ```mermaid
24
  flowchart TB
25
  %% =========================
 
369
 
370
  ## 8. Training flow
371
 
372
+ Rendered asset:
373
+
374
+ ![CyberSecurity_OWASP RL training flow](assets/env_rl_training_flow_diagram.svg)
375
+
376
+ Editable source: `assets/env_rl_training_flow_diagram.mmd`
377
+
378
  ```text
379
  1. Build CyberSecurity_OWASP OpenEnv server.
380
  2. Generate 600 MVP scenarios.
 
488
  | OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
489
  | Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
490
  | TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |
 
Dockerfile CHANGED
@@ -21,6 +21,7 @@ WORKDIR /app/env
21
  COPY --from=builder /app/env /app/env
22
  ENV PATH="/app/env/.venv/bin:$PATH"
23
  ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
24
 
25
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
26
  CMD curl -f http://localhost:8000/health || exit 1
 
21
  COPY --from=builder /app/env /app/env
22
  ENV PATH="/app/env/.venv/bin:$PATH"
23
  ENV PYTHONPATH="/app/env:$PYTHONPATH"
24
+ ENV ENABLE_WEB_INTERFACE=true
25
 
26
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
27
  CMD curl -f http://localhost:8000/health || exit 1
README.md CHANGED
@@ -23,6 +23,14 @@ inspect generated app + policy -> discover authorization bug -> submit finding -
23
 
24
  The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
25
 
 
 
 
 
 
 
 
 
26
  ## Quick Start
27
 
28
  ```bash
@@ -125,6 +133,24 @@ Training files are under `training/`:
125
 
126
  The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
127
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
128
  ## Modal Ephemeral Runs
129
 
130
  Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
@@ -141,7 +167,7 @@ Run a temporary Modal app for a cheap environment/training smoke check:
141
  uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
142
  ```
143
 
144
- The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.
145
 
146
  You can also validate the GRPO config construction remotely:
147
 
@@ -179,6 +205,20 @@ uv run --extra modal modal run scripts/modal_train_grpo.py \
179
  --difficulty 0
180
  ```
181
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
182
  Defaults are derived from `HF_TOKEN`:
183
 
184
  - Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
 
23
 
24
  The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
25
 
26
+ ## Diagrams
27
+
28
+ ![CyberSecurity_OWASP architecture](assets/architecture_diagram.svg)
29
+
30
+ ![CyberSecurity_OWASP RL training flow](assets/env_rl_training_flow_diagram.svg)
31
+
32
+ Editable Mermaid sources are available in `assets/architecture_diagram.mmd` and `assets/env_rl_training_flow_diagram.mmd`.
33
+
34
  ## Quick Start
35
 
36
  ```bash
 
133
 
134
  The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
135
 
136
+ ## Trackio Run Tracking
137
+
138
+ Trackio is the default tracker for official runs. Set `TRACKIO_SPACE_ID` to log to a hosted Hugging Face Trackio Space; otherwise Trackio records locally.
139
+
140
+ ```bash
141
+ export TRACKIO_SPACE_ID=<hf-user>/CyberSecurity_OWASP-trackio
142
+ export TRACKIO_PROJECT=CyberSecurity_OWASP-grpo
143
+ ```
144
+
145
+ Use the tracked smoke wrapper instead of invoking pytest directly when producing run artifacts:
146
+
147
+ ```bash
148
+ bash scripts/smoke_test.sh
149
+ uv run python scripts/track_pytest.py tests
150
+ ```
151
+
152
+ Evaluation summaries saved through `training.eval_before_after.save_eval_summary(...)`, Modal smoke runs, and GRPO training configs all initialize Trackio runs with CyberSecurity_OWASP run names.
153
+
154
  ## Modal Ephemeral Runs
155
 
156
  Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
 
167
  uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
168
  ```
169
 
170
+ The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/` and the summary metrics are logged to Trackio.
171
 
172
  You can also validate the GRPO config construction remotely:
173
 
 
205
  --difficulty 0
206
  ```
207
 
208
+ If running from a public repository and you do not want Modal to package the
209
+ local workspace, use public source mode:
210
+
211
+ ```bash
212
+ uv run --extra modal modal run scripts/modal_train_grpo.py \
213
+ --source-mode public \
214
+ --repo-url https://github.com/humandotlearning/CyberSecurity_OWASP.git \
215
+ --repo-branch master \
216
+ --max-steps 10 \
217
+ --dataset-size 16 \
218
+ --num-generations 2 \
219
+ --difficulty 0
220
+ ```
221
+
222
  Defaults are derived from `HF_TOKEN`:
223
 
224
  - Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
assets/architecture_diagram.mmd ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ flowchart LR
2
+ subgraph Factory["Scenario Factory"]
3
+ Policy["Policy graph\nusers, roles, tenants, ownership"]
4
+ Templates["FastAPI template renderer\nroutes, services, auth helpers"]
5
+ Mutator["A01 bug mutator\none injected authorization defect"]
6
+ Fixtures["Fixture generator\nvisible tests + hidden facts"]
7
+ Compiler["Scenario compiler\nseeded workspace"]
8
+ Policy --> Compiler
9
+ Templates --> Compiler
10
+ Mutator --> Compiler
11
+ Fixtures --> Compiler
12
+ end
13
+
14
+ subgraph Runtime["CyberSecurity_OWASP OpenEnv Runtime"]
15
+ Reset["reset(seed)\ncompile fresh scenario"]
16
+ Env["Environment state\nphase, history, metrics, hidden facts"]
17
+ Tools["Typed step(action) tools\ninspect, read, request, patch, test, submit"]
18
+ Sandbox["Generated local app workspace\neditable app files only"]
19
+ Verifier["Deterministic verifier\nsecurity + regression + public routes"]
20
+ Reward["Reward engine\nstable component breakdown"]
21
+ App["FastAPI OpenEnv server\n/ws, /reset, /step, /state"]
22
+ Reset --> Env
23
+ Env --> Tools
24
+ Tools <--> Sandbox
25
+ Tools --> Verifier
26
+ Verifier --> Reward
27
+ Reward --> Env
28
+ Env --> App
29
+ end
30
+
31
+ subgraph Agent["Single LLM Agent"]
32
+ Obs["Observation parser"]
33
+ Reason["Policy and code reasoning"]
34
+ Act["One JSON action"]
35
+ Obs --> Reason --> Act
36
+ end
37
+
38
+ subgraph Ops["Training, Evaluation, Demo"]
39
+ Rollout["Rollout loop\nreset -> step* -> terminal reward"]
40
+ GRPO["TRL GRPO / LoRA training"]
41
+ Trackio["Trackio metrics\nreward and pass rates"]
42
+ Eval["Held-out evaluation\nunseen seeds/layouts/domains"]
43
+ Artifacts["Rollout artifacts\nbefore/after traces"]
44
+ Rollout --> GRPO --> Trackio --> Eval --> Artifacts
45
+ end
46
+
47
+ Compiler --> Reset
48
+ App --> Obs
49
+ Act --> App
50
+ Reward --> Rollout
51
+ GRPO --> Agent
assets/architecture_diagram.svg ADDED
assets/env_rl_training_flow_diagram.mmd ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ flowchart TD
2
+ Start["Start run\nselect base model + config"] --> Cache["Prepare scenario splits\ntrain, validation, hidden_eval"]
3
+ Cache --> Baseline["Baseline evaluation\nscripted/model rollouts"]
4
+ Baseline --> TrainLoop["GRPO training loop"]
5
+
6
+ subgraph Episode["One OpenEnv Episode"]
7
+ Reset["env.reset(seed)\nnew generated app + policy"] --> Observe["Observation\nphase, hints, available tools"]
8
+ Observe --> Prompt["Build action prompt\nJSON action only"]
9
+ Prompt --> Generate["LLM generates action"]
10
+ Generate --> Step["env.step(action)\nphase gate + execute tool"]
11
+ Step --> Intermediate{"done?"}
12
+ Intermediate -- "no" --> Observe
13
+ Intermediate -- "yes" --> Final["Terminal verifier\nhidden security + regression + anti-cheat"]
14
+ end
15
+
16
+ TrainLoop --> Reset
17
+ Final --> Rewards["Reward components\ndiscovery, security, regression, public_routes,\npatch_quality, visible_tests, safety, anti_cheat"]
18
+ Rewards --> Update["GRPO update\nLoRA adapter checkpoint"]
19
+ Update --> Metrics["Trackio logging\nreward means, pass rates, invalid actions, latency"]
20
+ Metrics --> Validate{"Validation plateau\nor failure cluster?"}
21
+ Validate -- "continue" --> TrainLoop
22
+ Validate -- "adjust curriculum" --> Curriculum["Curriculum controller\nrebalance difficulty and traps"]
23
+ Curriculum --> TrainLoop
24
+ Validate -- "final checkpoint" --> Heldout["Held-out eval\nunseen seeds/layouts/domain combos"]
25
+ Heldout --> Compare["Before/after summary\nsuccess, reward, exploit-block, regression preservation"]
26
+ Compare --> Artifacts["Saved artifacts\noutputs/evals + outputs/rollouts"]
assets/env_rl_training_flow_diagram.svg ADDED
models.py CHANGED
@@ -56,8 +56,12 @@ class CyberSecurityOWASPState(State):
56
  seed: int = 0
57
  split: CyberSecurityOWASPSplit = "train"
58
  difficulty: int = 0
 
59
  domain: str = ""
60
  bug_family: str = ""
 
 
 
61
  phase: CyberSecurityOWASPPhase = "discover"
62
  max_steps: int = 40
63
  done: bool = False
@@ -71,6 +75,11 @@ class CyberSecurityOWASPState(State):
71
  reward_history: list[dict[str, float]] = Field(default_factory=list)
72
  visible_facts: dict[str, Any] = Field(default_factory=dict)
73
  hidden_facts: dict[str, Any] = Field(default_factory=dict)
 
 
 
 
 
74
  metrics: dict[str, Any] = Field(default_factory=dict)
75
  anti_cheat_flags: list[str] = Field(default_factory=list)
76
 
 
56
  seed: int = 0
57
  split: CyberSecurityOWASPSplit = "train"
58
  difficulty: int = 0
59
+ difficulty_tier: str = "warmup"
60
  domain: str = ""
61
  bug_family: str = ""
62
+ scenario_family: str = ""
63
+ template_id: str = "fastapi_basic"
64
+ target_weakness: str = "same_role_cross_object"
65
  phase: CyberSecurityOWASPPhase = "discover"
66
  max_steps: int = 40
67
  done: bool = False
 
75
  reward_history: list[dict[str, float]] = Field(default_factory=list)
76
  visible_facts: dict[str, Any] = Field(default_factory=dict)
77
  hidden_facts: dict[str, Any] = Field(default_factory=dict)
78
+ curriculum_snapshot: dict[str, Any] = Field(default_factory=dict)
79
+ verification_summary: dict[str, Any] = Field(default_factory=dict)
80
+ patch_diff: str = ""
81
+ episode_artifact_path: str | None = None
82
+ observation_history: list[dict[str, Any]] = Field(default_factory=list)
83
  metrics: dict[str, Any] = Field(default_factory=dict)
84
  anti_cheat_flags: list[str] = Field(default_factory=list)
85
 
pyproject.toml CHANGED
@@ -18,6 +18,7 @@ dependencies = [
18
  # install from github
19
  # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
20
  "openenv-core[core]>=0.2.2",
 
21
  # Environment-specific dependencies
22
  # Add all dependencies needed for your environment here
23
  # Examples:
 
18
  # install from github
19
  # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
20
  "openenv-core[core]>=0.2.2",
21
+ "trackio>=0.22.0",
22
  # Environment-specific dependencies
23
  # Add all dependencies needed for your environment here
24
  # Examples:
scenario_compiler.py CHANGED
@@ -1,51 +1,26 @@
1
- """Deterministic scenario compiler for CyberSecurity_OWASP."""
2
 
3
  from __future__ import annotations
4
 
5
- import tempfile
6
- from pathlib import Path
7
  from typing import Any
8
 
9
  try:
10
- from .fixture_generator import visible_workspace_summary
11
- from .policy_graph import build_invoice_policy
12
- from .template_renderer import render_fastapi_basic
13
  except ImportError: # pragma: no cover
14
- from fixture_generator import visible_workspace_summary
15
- from policy_graph import build_invoice_policy
16
- from template_renderer import render_fastapi_basic
17
 
18
 
19
- def compile_scenario(seed: int, split: str = "train", difficulty: int = 0) -> dict[str, Any]:
20
- """Compile one isolated MVP authorization-repair scenario."""
 
 
 
 
 
21
 
22
- compiled = build_invoice_policy(seed)
23
- workspace = Path(tempfile.mkdtemp(prefix=f"cybersecurity_owasp_{split}_{seed}_"))
24
- editable_files = render_fastapi_basic(workspace, compiled.public_hint, compiled.hidden_facts)
25
- task_id = f"{split}-invoices-bola-{seed}"
26
- hidden = dict(compiled.hidden_facts)
27
- hidden.update(
28
- {
29
- "workspace": str(workspace),
30
- "editable_files": editable_files,
31
- "initial_file_hashes": {
32
- path: (workspace / path).read_text(encoding="utf-8")
33
- for path in editable_files
34
- },
35
- }
36
  )
37
- return {
38
- "task_id": task_id,
39
- "workspace": workspace,
40
- "domain": "invoices",
41
- "bug_family": "bola_idor",
42
- "difficulty": difficulty,
43
- "task_brief": (
44
- "Inspect the generated invoices app and policy. Find the broken "
45
- "authorization behavior, submit a finding with local evidence, patch "
46
- "the app, preserve intended owner/admin/public behavior, then submit."
47
- ),
48
- "public_hint": compiled.public_hint,
49
- "workspace_summary": visible_workspace_summary(editable_files, compiled.public_hint),
50
- "hidden_facts": hidden,
51
- }
 
1
+ """Compatibility facade for deterministic scenario compilation."""
2
 
3
  from __future__ import annotations
4
 
 
 
5
  from typing import Any
6
 
7
  try:
8
+ from .server.scenario_factory import ScenarioFactory
 
 
9
  except ImportError: # pragma: no cover
10
+ from server.scenario_factory import ScenarioFactory
 
 
11
 
12
 
13
+ def compile_scenario(
14
+ seed: int,
15
+ split: str = "train",
16
+ difficulty: int = 0,
17
+ curriculum_profile: dict[str, Any] | None = None,
18
+ ) -> dict[str, Any]:
19
+ """Compile one isolated authorization-repair scenario."""
20
 
21
+ return ScenarioFactory().compile_scenario(
22
+ seed,
23
+ split=split,
24
+ difficulty=difficulty,
25
+ curriculum_profile=curriculum_profile,
 
 
 
 
 
 
 
 
 
26
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
scripts/modal_ephemeral_train.py CHANGED
@@ -62,12 +62,18 @@ class NoopTrainer:
62
 
63
 
64
  @app.function(image=image, timeout=60 * 30)
65
- def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any]:
 
 
 
 
 
66
  from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
67
  from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
68
  CybersecurityOwaspEnvironment,
69
  )
70
  from training.rollout import rollout_once
 
71
 
72
  baseline = []
73
  oracle = []
@@ -128,8 +134,9 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
128
  def mean(items: list[dict[str, Any]], key: str) -> float:
129
  return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))
130
 
131
- return {
132
- "run_name": f"{APP_NAME}-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}",
 
133
  "mode": "smoke",
134
  "episodes": episodes,
135
  "seed_start": seed_start,
@@ -139,6 +146,28 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
139
  "baseline": baseline,
140
  "oracle": oracle,
141
  }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
142
 
143
 
144
  @app.function(image=image, timeout=60 * 10)
@@ -149,9 +178,20 @@ def run_grpo_config_check() -> str:
149
 
150
 
151
  @app.local_entrypoint()
152
- def main(mode: str = "smoke", episodes: int = 4, seed_start: int = 0) -> None:
 
 
 
 
 
 
153
  if mode == "smoke":
154
- result = run_ephemeral_smoke.remote(episodes=episodes, seed_start=seed_start)
 
 
 
 
 
155
  output_dir = PROJECT_ROOT / "outputs" / "rollouts"
156
  output_dir.mkdir(parents=True, exist_ok=True)
157
  output_path = output_dir / f"{result['run_name']}.json"
 
62
 
63
 
64
  @app.function(image=image, timeout=60 * 30)
65
+ def run_ephemeral_smoke(
66
+ episodes: int = 4,
67
+ seed_start: int = 0,
68
+ trackio_space_id: str = "",
69
+ trackio_project: str = "CyberSecurity_OWASP-smoke",
70
+ ) -> dict[str, Any]:
71
  from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
72
  from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
73
  CybersecurityOwaspEnvironment,
74
  )
75
  from training.rollout import rollout_once
76
+ from training.trackio_utils import log_trackio_metrics, trackio_run
77
 
78
  baseline = []
79
  oracle = []
 
134
  def mean(items: list[dict[str, Any]], key: str) -> float:
135
  return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))
136
 
137
+ run_name = f"{APP_NAME}-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}"
138
+ result = {
139
+ "run_name": run_name,
140
  "mode": "smoke",
141
  "episodes": episodes,
142
  "seed_start": seed_start,
 
146
  "baseline": baseline,
147
  "oracle": oracle,
148
  }
149
+ with trackio_run(
150
+ run_name=run_name,
151
+ run_type="modal_ephemeral_smoke",
152
+ project=trackio_project,
153
+ space_id=trackio_space_id,
154
+ config={
155
+ "episodes": episodes,
156
+ "seed_start": seed_start,
157
+ "mode": "smoke",
158
+ },
159
+ group="smoke",
160
+ ):
161
+ log_trackio_metrics(
162
+ {
163
+ "smoke/baseline_mean_reward": result["baseline_mean_reward"],
164
+ "smoke/oracle_mean_reward": result["oracle_mean_reward"],
165
+ "smoke/oracle_success_rate": result["oracle_success_rate"],
166
+ "smoke/episodes": episodes,
167
+ },
168
+ step=0,
169
+ )
170
+ return result
171
 
172
 
173
  @app.function(image=image, timeout=60 * 10)
 
178
 
179
 
180
  @app.local_entrypoint()
181
+ def main(
182
+ mode: str = "smoke",
183
+ episodes: int = 4,
184
+ seed_start: int = 0,
185
+ trackio_space_id: str = "",
186
+ trackio_project: str = "CyberSecurity_OWASP-smoke",
187
+ ) -> None:
188
  if mode == "smoke":
189
+ result = run_ephemeral_smoke.remote(
190
+ episodes=episodes,
191
+ seed_start=seed_start,
192
+ trackio_space_id=trackio_space_id,
193
+ trackio_project=trackio_project,
194
+ )
195
  output_dir = PROJECT_ROOT / "outputs" / "rollouts"
196
  output_dir.mkdir(parents=True, exist_ok=True)
197
  output_path = output_dir / f"{result['run_name']}.json"
scripts/modal_run_ephemeral.sh CHANGED
@@ -1,3 +1,8 @@
1
  #!/usr/bin/env bash
2
  set -euo pipefail
3
- modal run scripts/modal_ephemeral_train.py --mode "${MODE:-smoke}" --episodes "${EPISODES:-4}" --seed-start "${SEED_START:-0}"
 
 
 
 
 
 
1
  #!/usr/bin/env bash
2
  set -euo pipefail
3
+ modal run scripts/modal_ephemeral_train.py \
4
+ --mode "${MODE:-smoke}" \
5
+ --episodes "${EPISODES:-4}" \
6
+ --seed-start "${SEED_START:-0}" \
7
+ --trackio-space-id "${TRACKIO_SPACE_ID:-}" \
8
+ --trackio-project "${TRACKIO_PROJECT:-CyberSecurity_OWASP-smoke}"
scripts/modal_train_grpo.py CHANGED
@@ -19,6 +19,7 @@ from __future__ import annotations
19
  import os
20
  import pathlib
21
  import subprocess
 
22
  from datetime import datetime, timezone
23
  from typing import Any
24
 
@@ -31,10 +32,62 @@ SECRET_NAME = "CyberSecurity_OWASP-secrets"
31
  RUNS_DIR = pathlib.Path("/runs")
32
  REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
33
  PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34
 
35
 
36
  def _training_image() -> modal.Image:
37
- return (
38
  modal.Image.from_registry(
39
  "nvidia/cuda:12.8.0-devel-ubuntu22.04",
40
  add_python="3.11",
@@ -49,21 +102,33 @@ def _training_image() -> modal.Image:
49
  "datasets",
50
  "huggingface_hub",
51
  "peft",
 
52
  "tokenizers",
53
  "nvidia-ml-py",
54
  "trackio>=0.25.0",
55
  "transformers>=5.5.0",
56
  "trl>=0.28.0",
57
  "openenv-core[core]>=0.2.3",
58
- "pydantic==2.10.6",
59
  )
60
  .uv_pip_install(
61
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
62
  "unsloth[base] @ git+https://github.com/unslothai/unsloth",
63
  )
 
64
  .uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
 
65
  .uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
66
- .add_local_dir(
 
 
 
 
 
 
 
 
 
 
67
  PROJECT_ROOT,
68
  remote_path=REMOTE_PROJECT,
69
  copy=True,
@@ -76,22 +141,23 @@ def _training_image() -> modal.Image:
76
  "*.pyc",
77
  ],
78
  )
79
- .run_commands(
80
  f"python -m pip install -e {REMOTE_PROJECT}",
81
- "python -c \"import os, torch; import transformers.utils.hub as hub; "
82
- "hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
83
- "os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
84
- "from trl import GRPOConfig, GRPOTrainer; "
85
- "from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
86
- "CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
87
  )
88
- .workdir(REMOTE_PROJECT)
89
- )
 
 
 
 
 
 
 
90
 
91
 
92
  app = modal.App(APP_NAME)
93
  volume = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True)
94
- secret = modal.Secret.from_name(SECRET_NAME)
95
 
96
 
97
  @app.function(
@@ -99,7 +165,7 @@ secret = modal.Secret.from_name(SECRET_NAME)
99
  gpu=["L4", "A10G"],
100
  timeout=4 * 60 * 60,
101
  volumes={RUNS_DIR: volume},
102
- secrets=[secret],
103
  )
104
  def check_training_imports() -> dict[str, str]:
105
  import torch
@@ -131,7 +197,7 @@ def check_training_imports() -> dict[str, str]:
131
  gpu=["L4", "A10G"],
132
  timeout=4 * 60 * 60,
133
  volumes={RUNS_DIR: volume},
134
- secrets=[secret],
135
  )
136
  def train_cybersecurity_owasp_grpo(
137
  env_repo_id: str = "",
@@ -149,16 +215,22 @@ def train_cybersecurity_owasp_grpo(
149
  num_generations: int = 2,
150
  seed_start: int = 0,
151
  git_sha: str = "nogit",
 
 
 
 
152
  ) -> dict[str, str | int | float]:
 
153
  import statistics
154
 
155
  import torch
 
156
  import transformers.utils.hub as transformers_hub
157
  from datasets import Dataset
158
  from huggingface_hub import whoami
159
  from transformers import TrainerCallback
160
- from trl import GRPOConfig, GRPOTrainer
161
- from unsloth import FastLanguageModel
162
 
163
  import trackio
164
 
@@ -191,7 +263,9 @@ def train_cybersecurity_owasp_grpo(
191
 
192
  model_slug = model_name.replace("/", "-")
193
  stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
194
- run_name = f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-{stamp}-{git_sha[:8]}"
 
 
195
  output_dir = RUNS_DIR / run_name
196
  output_dir.mkdir(parents=True, exist_ok=True)
197
 
@@ -324,11 +398,27 @@ def train_cybersecurity_owasp_grpo(
324
  return self._step("read_openapi")
325
 
326
  def read_file(self, path: str) -> str:
327
- """Read an editable generated workspace file by relative path."""
 
 
 
 
 
 
 
 
328
  return self._step("read_file", {"path": path})
329
 
330
  def search_code(self, query: str) -> str:
331
- """Search editable generated workspace files for a string."""
 
 
 
 
 
 
 
 
332
  return self._step("search_code", {"query": query})
333
 
334
  def send_local_request(
@@ -337,7 +427,17 @@ def train_cybersecurity_owasp_grpo(
337
  method: str = "GET",
338
  user_id: str | None = None,
339
  ) -> str:
340
- """Send a request to the generated local app only."""
 
 
 
 
 
 
 
 
 
 
341
  return self._step(
342
  "send_local_request",
343
  {"path": path, "method": method, "user_id": user_id},
@@ -350,7 +450,18 @@ def train_cybersecurity_owasp_grpo(
350
  second_user_id: str,
351
  method: str = "GET",
352
  ) -> str:
353
- """Compare one local request as two generated users."""
 
 
 
 
 
 
 
 
 
 
 
354
  return self._step(
355
  "compare_identities",
356
  {
@@ -367,7 +478,17 @@ def train_cybersecurity_owasp_grpo(
367
  evidence: str,
368
  policy_rule: str,
369
  ) -> str:
370
- """Submit structured evidence for the suspected authorization bug."""
 
 
 
 
 
 
 
 
 
 
371
  return self._step(
372
  "submit_finding",
373
  {
@@ -383,7 +504,17 @@ def train_cybersecurity_owasp_grpo(
383
  content: str | None = None,
384
  diff: str | None = None,
385
  ) -> str:
386
- """Patch an editable generated app file with full content or a unified diff."""
 
 
 
 
 
 
 
 
 
 
387
  args: dict[str, Any] = {"path": path}
388
  if content is not None:
389
  args["content"] = content
@@ -534,7 +665,10 @@ def train_cybersecurity_owasp_grpo(
534
  return control
535
 
536
  print(f"CUDA available: {torch.cuda.is_available()}")
537
- print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
 
 
 
538
  print(f"Trackio Space: {trackio_space_id}")
539
  print(f"Trackio Project: {trackio_project}")
540
  print(f"Output repo: {output_repo_id}")
@@ -547,6 +681,18 @@ def train_cybersecurity_owasp_grpo(
547
  fast_inference=False,
548
  token=hf_token,
549
  )
 
 
 
 
 
 
 
 
 
 
 
 
550
  model = FastLanguageModel.get_peft_model(
551
  model,
552
  r=lora_rank,
@@ -565,46 +711,68 @@ def train_cybersecurity_owasp_grpo(
565
  )
566
  FastLanguageModel.for_training(model)
567
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
568
  training_args = GRPOConfig(
569
- temperature=1.0,
570
- learning_rate=5e-6,
571
- weight_decay=0.001,
572
- warmup_ratio=0.1,
573
- lr_scheduler_type="linear",
574
- optim="adamw_8bit",
575
- logging_steps=1,
576
- per_device_train_batch_size=1,
577
- gradient_accumulation_steps=max(2, num_generations),
578
- num_generations=num_generations,
579
- max_prompt_length=max_seq_length,
580
- max_completion_length=max_completion_length,
581
- max_steps=max_steps,
582
- save_steps=max(10, max_steps),
583
- report_to="trackio",
584
- trackio_space_id=trackio_space_id,
585
- run_name=run_name,
586
- output_dir=str(output_dir),
587
- push_to_hub=True,
588
- hub_model_id=output_repo_id,
589
- hub_private_repo=True,
590
- hub_strategy="every_save",
591
- gradient_checkpointing=True,
592
- gradient_checkpointing_kwargs={"use_reentrant": False},
593
- epsilon=0.2,
594
- epsilon_high=0.28,
595
- delta=1.5,
596
- loss_type="bnpo",
597
- mask_truncated_completions=False,
598
  )
599
 
 
 
 
 
 
 
 
 
 
 
 
 
 
600
  trainer = GRPOTrainer(
601
- model=model,
602
- processing_class=tokenizer,
603
- reward_funcs=cybersecurity_owasp_reward,
604
- args=training_args,
605
- train_dataset=dataset,
606
- environment_factory=CyberSecurityOWASPToolEnv,
607
- callbacks=[TrackioSystemMetricsCallback()],
608
  )
609
  trainer.train()
610
  trainer.push_to_hub()
@@ -623,6 +791,9 @@ def train_cybersecurity_owasp_grpo(
623
  "model_name": model_name,
624
  "max_completion_length": max_completion_length,
625
  "num_generations": num_generations,
 
 
 
626
  }
627
 
628
 
@@ -644,6 +815,10 @@ def main(
644
  num_generations: int = 2,
645
  seed_start: int = 0,
646
  git_sha: str = "nogit",
 
 
 
 
647
  ) -> None:
648
  if mode == "config":
649
  result = check_training_imports.remote()
@@ -652,6 +827,10 @@ def main(
652
  if mode != "train":
653
  raise ValueError("mode must be 'train' or 'config'")
654
 
 
 
 
 
655
  resolved_trackio_space_id = trackio_space_id
656
  resolved_output_repo_id = output_repo_id
657
  if not resolved_trackio_space_id or not resolved_output_repo_id:
@@ -684,12 +863,28 @@ def main(
684
 
685
  model_slug = model_name.replace("/", "-")
686
  local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
687
- estimated_run_name = (
688
  f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
689
  f"{local_stamp}-{git_sha[:8]}"
690
  )
691
 
692
- call = train_cybersecurity_owasp_grpo.spawn(
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
693
  env_repo_id=env_repo_id,
694
  output_repo_id=output_repo_id,
695
  max_steps=max_steps,
@@ -705,17 +900,14 @@ def main(
705
  num_generations=num_generations,
706
  seed_start=seed_start,
707
  git_sha=git_sha,
 
 
 
 
708
  )
709
- print(f"Spawned Modal training call: {call.object_id}")
710
- print(f"Estimated run name: {estimated_run_name}")
711
- if resolved_trackio_space_id:
712
- print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
713
- else:
714
- print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
715
- if resolved_output_repo_id:
716
- print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
717
  else:
718
- print(
719
- "Output model repo: derived remotely from HF_TOKEN as "
720
- "<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
721
- )
 
19
  import os
20
  import pathlib
21
  import subprocess
22
+ import sys
23
  from datetime import datetime, timezone
24
  from typing import Any
25
 
 
32
  RUNS_DIR = pathlib.Path("/runs")
33
  REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
34
  PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
35
+ PUBLIC_REPO_URL = "https://github.com/humandotlearning/CyberSecurity_OWASP.git"
36
+ PUBLIC_REPO_BRANCH = "master"
37
+
38
+
39
+ def _load_local_env_file() -> None:
40
+ env_path = PROJECT_ROOT / ".env.local"
41
+ if not env_path.exists():
42
+ return
43
+ for raw_line in env_path.read_text(encoding="utf-8").splitlines():
44
+ line = raw_line.strip()
45
+ if not line or line.startswith("#") or "=" not in line:
46
+ continue
47
+ key, value = line.split("=", 1)
48
+ key = key.strip()
49
+ if key not in {"TRACKIO_PROJECT"}:
50
+ continue
51
+ value = value.strip().strip('"').strip("'")
52
+ os.environ.setdefault(key, value)
53
+
54
+
55
+ def _modal_secrets() -> list[modal.Secret]:
56
+ if _is_config_mode():
57
+ return []
58
+ return [modal.Secret.from_name(SECRET_NAME, required_keys=["HF_TOKEN"])]
59
+
60
+
61
+ def _is_config_mode() -> bool:
62
+ args = sys.argv[1:]
63
+ for index, arg in enumerate(args):
64
+ if arg == "--mode" and index + 1 < len(args):
65
+ return args[index + 1] == "config"
66
+ if arg.startswith("--mode="):
67
+ return arg.split("=", 1)[1] == "config"
68
+ return False
69
+
70
+
71
+ _load_local_env_file()
72
+
73
+
74
+ def _cli_arg_value(name: str, default: str = "") -> str:
75
+ args = sys.argv[1:]
76
+ flag = f"--{name}"
77
+ for index, arg in enumerate(args):
78
+ if arg == flag and index + 1 < len(args):
79
+ return args[index + 1]
80
+ if arg.startswith(f"{flag}="):
81
+ return arg.split("=", 1)[1]
82
+ return default
83
+
84
+
85
+ def _source_mode() -> str:
86
+ return _cli_arg_value("source-mode", os.environ.get("MODAL_SOURCE_MODE", "local"))
87
 
88
 
89
  def _training_image() -> modal.Image:
90
+ image = (
91
  modal.Image.from_registry(
92
  "nvidia/cuda:12.8.0-devel-ubuntu22.04",
93
  add_python="3.11",
 
102
  "datasets",
103
  "huggingface_hub",
104
  "peft",
105
+ "pillow",
106
  "tokenizers",
107
  "nvidia-ml-py",
108
  "trackio>=0.25.0",
109
  "transformers>=5.5.0",
110
  "trl>=0.28.0",
111
  "openenv-core[core]>=0.2.3",
 
112
  )
113
  .uv_pip_install(
114
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
115
  "unsloth[base] @ git+https://github.com/unslothai/unsloth",
116
  )
117
+ .uv_pip_install("pydantic==2.10.6")
118
  .uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
119
+ .uv_pip_install("llm-blender", "weave")
120
  .uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
121
+ )
122
+
123
+ if _source_mode() == "public":
124
+ repo_url = _cli_arg_value("repo-url", PUBLIC_REPO_URL)
125
+ repo_branch = _cli_arg_value("repo-branch", PUBLIC_REPO_BRANCH)
126
+ image = image.run_commands(
127
+ f"git clone --depth 1 --branch {repo_branch} {repo_url} {REMOTE_PROJECT}",
128
+ f"python -m pip install -e {REMOTE_PROJECT}",
129
+ )
130
+ else:
131
+ image = image.add_local_dir(
132
  PROJECT_ROOT,
133
  remote_path=REMOTE_PROJECT,
134
  copy=True,
 
141
  "*.pyc",
142
  ],
143
  )
144
+ image = image.run_commands(
145
  f"python -m pip install -e {REMOTE_PROJECT}",
 
 
 
 
 
 
146
  )
147
+
148
+ return image.run_commands(
149
+ "python -c \"import os, torch; import transformers.utils.hub as hub; "
150
+ "hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
151
+ "os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
152
+ "from trl import GRPOConfig, GRPOTrainer; "
153
+ "from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
154
+ "CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
155
+ ).workdir(REMOTE_PROJECT)
156
 
157
 
158
  app = modal.App(APP_NAME)
159
  volume = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True)
160
+ secrets = _modal_secrets()
161
 
162
 
163
  @app.function(
 
165
  gpu=["L4", "A10G"],
166
  timeout=4 * 60 * 60,
167
  volumes={RUNS_DIR: volume},
168
+ secrets=secrets,
169
  )
170
  def check_training_imports() -> dict[str, str]:
171
  import torch
 
197
  gpu=["L4", "A10G"],
198
  timeout=4 * 60 * 60,
199
  volumes={RUNS_DIR: volume},
200
+ secrets=secrets,
201
  )
202
  def train_cybersecurity_owasp_grpo(
203
  env_repo_id: str = "",
 
215
  num_generations: int = 2,
216
  seed_start: int = 0,
217
  git_sha: str = "nogit",
218
+ run_name: str = "",
219
+ source_mode: str = "local",
220
+ repo_url: str = PUBLIC_REPO_URL,
221
+ repo_branch: str = PUBLIC_REPO_BRANCH,
222
  ) -> dict[str, str | int | float]:
223
+ import inspect
224
  import statistics
225
 
226
  import torch
227
+ from unsloth import FastLanguageModel
228
  import transformers.utils.hub as transformers_hub
229
  from datasets import Dataset
230
  from huggingface_hub import whoami
231
  from transformers import TrainerCallback
232
+ from trl import GRPOConfig, GRPOTrainer, clone_chat_template
233
+ from trl.chat_template_utils import add_response_schema
234
 
235
  import trackio
236
 
 
263
 
264
  model_slug = model_name.replace("/", "-")
265
  stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
266
+ run_name = run_name or (
267
+ f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-{stamp}-{git_sha[:8]}"
268
+ )
269
  output_dir = RUNS_DIR / run_name
270
  output_dir.mkdir(parents=True, exist_ok=True)
271
 
 
398
  return self._step("read_openapi")
399
 
400
  def read_file(self, path: str) -> str:
401
+ """
402
+ Read an editable generated workspace file by relative path.
403
+
404
+ Args:
405
+ path: Relative path inside the generated editable workspace.
406
+
407
+ Returns:
408
+ The file contents or a safe tool error observation.
409
+ """
410
  return self._step("read_file", {"path": path})
411
 
412
  def search_code(self, query: str) -> str:
413
+ """
414
+ Search editable generated workspace files for a string.
415
+
416
+ Args:
417
+ query: Search text to find in editable generated app files.
418
+
419
+ Returns:
420
+ Matching file lines or a no-match message.
421
+ """
422
  return self._step("search_code", {"query": query})
423
 
424
  def send_local_request(
 
427
  method: str = "GET",
428
  user_id: str | None = None,
429
  ) -> str:
430
+ """
431
+ Send a request to the generated local app only.
432
+
433
+ Args:
434
+ path: Local route path such as /health or /invoices/<id>.
435
+ method: HTTP method to use for the local request.
436
+ user_id: Optional generated user identifier for authentication.
437
+
438
+ Returns:
439
+ JSON response from the simulated local app request.
440
+ """
441
  return self._step(
442
  "send_local_request",
443
  {"path": path, "method": method, "user_id": user_id},
 
450
  second_user_id: str,
451
  method: str = "GET",
452
  ) -> str:
453
+ """
454
+ Compare one local request as two generated users.
455
+
456
+ Args:
457
+ path: Local route path to request as both generated users.
458
+ first_user_id: First generated user identifier.
459
+ second_user_id: Second generated user identifier.
460
+ method: HTTP method to use for both local requests.
461
+
462
+ Returns:
463
+ JSON summary of both simulated local responses.
464
+ """
465
  return self._step(
466
  "compare_identities",
467
  {
 
478
  evidence: str,
479
  policy_rule: str,
480
  ) -> str:
481
+ """
482
+ Submit structured evidence for the suspected authorization bug.
483
+
484
+ Args:
485
+ summary: Concise description of the suspected access-control bug.
486
+ evidence: Local reproduction evidence from policy, code, or requests.
487
+ policy_rule: Policy rule that the observed behavior violates.
488
+
489
+ Returns:
490
+ Finding acceptance result and next phase information.
491
+ """
492
  return self._step(
493
  "submit_finding",
494
  {
 
504
  content: str | None = None,
505
  diff: str | None = None,
506
  ) -> str:
507
+ """
508
+ Patch an editable generated app file with full content or a unified diff.
509
+
510
+ Args:
511
+ path: Relative path of the editable generated app file to patch.
512
+ content: Complete replacement file content, when using full-file patching.
513
+ diff: Unified diff to apply, when using diff patching.
514
+
515
+ Returns:
516
+ Patch application result.
517
+ """
518
  args: dict[str, Any] = {"path": path}
519
  if content is not None:
520
  args["content"] = content
 
665
  return control
666
 
667
  print(f"CUDA available: {torch.cuda.is_available()}")
668
+ if source_mode == "public":
669
+ print(f"Installed CyberSecurity_OWASP from public repo: {repo_url}@{repo_branch}")
670
+ else:
671
+ print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
672
  print(f"Trackio Space: {trackio_space_id}")
673
  print(f"Trackio Project: {trackio_project}")
674
  print(f"Output repo: {output_repo_id}")
 
681
  fast_inference=False,
682
  token=hf_token,
683
  )
684
+ try:
685
+ tokenizer = add_response_schema(tokenizer)
686
+ except Exception as exc:
687
+ print(f"Tokenizer response schema add failed before cloning: {exc!r}")
688
+ model, tokenizer, added_tokens = clone_chat_template(
689
+ model,
690
+ tokenizer,
691
+ "Qwen/Qwen3-0.6B",
692
+ )
693
+ print(f"Cloned Qwen3 chat template; added {len(added_tokens)} tokens.")
694
+ tokenizer = add_response_schema(tokenizer)
695
+
696
  model = FastLanguageModel.get_peft_model(
697
  model,
698
  r=lora_rank,
 
711
  )
712
  FastLanguageModel.for_training(model)
713
 
714
+ grpo_config_values = {
715
+ "temperature": 1.0,
716
+ "learning_rate": 5e-6,
717
+ "weight_decay": 0.001,
718
+ "warmup_ratio": 0.1,
719
+ "lr_scheduler_type": "linear",
720
+ "optim": "adamw_8bit",
721
+ "logging_steps": 1,
722
+ "per_device_train_batch_size": 1,
723
+ "gradient_accumulation_steps": max(2, num_generations),
724
+ "num_generations": num_generations,
725
+ "max_prompt_length": max_seq_length,
726
+ "max_completion_length": max_completion_length,
727
+ "max_steps": max_steps,
728
+ "save_steps": max(10, max_steps),
729
+ "report_to": "trackio",
730
+ "trackio_space_id": trackio_space_id,
731
+ "run_name": run_name,
732
+ "output_dir": str(output_dir),
733
+ "push_to_hub": True,
734
+ "hub_model_id": output_repo_id,
735
+ "hub_private_repo": True,
736
+ "hub_strategy": "every_save",
737
+ "gradient_checkpointing": True,
738
+ "gradient_checkpointing_kwargs": {"use_reentrant": False},
739
+ "epsilon": 0.2,
740
+ "epsilon_high": 0.28,
741
+ "delta": 1.5,
742
+ "loss_type": "bnpo",
743
+ "mask_truncated_completions": False,
744
+ }
745
+ grpo_config_parameters = set(inspect.signature(GRPOConfig).parameters)
746
+ skipped_config_keys = sorted(set(grpo_config_values) - grpo_config_parameters)
747
+ if skipped_config_keys:
748
+ print(f"Skipping unsupported GRPOConfig keys: {skipped_config_keys}")
749
  training_args = GRPOConfig(
750
+ **{
751
+ key: value
752
+ for key, value in grpo_config_values.items()
753
+ if key in grpo_config_parameters
754
+ }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
755
  )
756
 
757
+ trainer_values = {
758
+ "model": model,
759
+ "processing_class": tokenizer,
760
+ "reward_funcs": cybersecurity_owasp_reward,
761
+ "args": training_args,
762
+ "train_dataset": dataset,
763
+ "environment_factory": CyberSecurityOWASPToolEnv,
764
+ "callbacks": [TrackioSystemMetricsCallback()],
765
+ }
766
+ trainer_parameters = set(inspect.signature(GRPOTrainer).parameters)
767
+ skipped_trainer_keys = sorted(set(trainer_values) - trainer_parameters)
768
+ if skipped_trainer_keys:
769
+ print(f"Skipping unsupported GRPOTrainer keys: {skipped_trainer_keys}")
770
  trainer = GRPOTrainer(
771
+ **{
772
+ key: value
773
+ for key, value in trainer_values.items()
774
+ if key in trainer_parameters
775
+ }
 
 
776
  )
777
  trainer.train()
778
  trainer.push_to_hub()
 
791
  "model_name": model_name,
792
  "max_completion_length": max_completion_length,
793
  "num_generations": num_generations,
794
+ "source_mode": source_mode,
795
+ "repo_url": repo_url,
796
+ "repo_branch": repo_branch,
797
  }
798
 
799
 
 
815
  num_generations: int = 2,
816
  seed_start: int = 0,
817
  git_sha: str = "nogit",
818
+ source_mode: str = "local",
819
+ repo_url: str = PUBLIC_REPO_URL,
820
+ repo_branch: str = PUBLIC_REPO_BRANCH,
821
+ detach: bool = False,
822
  ) -> None:
823
  if mode == "config":
824
  result = check_training_imports.remote()
 
827
  if mode != "train":
828
  raise ValueError("mode must be 'train' or 'config'")
829
 
830
+ trackio_space_id = trackio_space_id or os.environ.get("TRACKIO_SPACE_ID", "")
831
+ trackio_project = trackio_project or os.environ.get(
832
+ "TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo"
833
+ )
834
  resolved_trackio_space_id = trackio_space_id
835
  resolved_output_repo_id = output_repo_id
836
  if not resolved_trackio_space_id or not resolved_output_repo_id:
 
863
 
864
  model_slug = model_name.replace("/", "-")
865
  local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
866
+ run_name = (
867
  f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
868
  f"{local_stamp}-{git_sha[:8]}"
869
  )
870
 
871
+ print(f"Run name: {run_name}")
872
+ print(f"Source mode: {source_mode}")
873
+ if source_mode == "public":
874
+ print(f"Public repo: {repo_url}@{repo_branch}")
875
+ if resolved_trackio_space_id:
876
+ print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
877
+ else:
878
+ print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
879
+ if resolved_output_repo_id:
880
+ print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
881
+ else:
882
+ print(
883
+ "Output model repo: derived remotely from HF_TOKEN as "
884
+ "<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
885
+ )
886
+
887
+ kwargs = dict(
888
  env_repo_id=env_repo_id,
889
  output_repo_id=output_repo_id,
890
  max_steps=max_steps,
 
900
  num_generations=num_generations,
901
  seed_start=seed_start,
902
  git_sha=git_sha,
903
+ run_name=run_name,
904
+ source_mode=source_mode,
905
+ repo_url=repo_url,
906
+ repo_branch=repo_branch,
907
  )
908
+ if detach:
909
+ call = train_cybersecurity_owasp_grpo.spawn(**kwargs)
910
+ print(f"Spawned Modal training call: {call.object_id}")
 
 
 
 
 
911
  else:
912
+ result = train_cybersecurity_owasp_grpo.remote(**kwargs)
913
+ print(f"Training result: {result}")
 
 
scripts/smoke_test.sh CHANGED
@@ -1,3 +1,3 @@
1
  #!/usr/bin/env bash
2
  set -euo pipefail
3
- uv run pytest tests/test_models.py tests/test_reset_step_state.py
 
1
  #!/usr/bin/env bash
2
  set -euo pipefail
3
+ uv run python scripts/track_pytest.py tests/test_models.py tests/test_reset_step_state.py
scripts/track_pytest.py ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Run pytest and record the result as a Trackio run."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import argparse
6
+ import subprocess
7
+ import sys
8
+ import time
9
+ from pathlib import Path
10
+
11
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
12
+ sys.path.insert(0, str(PROJECT_ROOT))
13
+ sys.path.insert(0, str(PROJECT_ROOT.parent))
14
+
15
+ from training.trackio_utils import build_run_name, get_git_sha, log_trackio_metrics, trackio_run
16
+
17
+
18
+ def main() -> int:
19
+ parser = argparse.ArgumentParser(description="Run pytest with Trackio tracking.")
20
+ parser.add_argument("pytest_args", nargs="*", help="Arguments passed through to pytest.")
21
+ parser.add_argument("--run-name", default="", help="Trackio run name override.")
22
+ parser.add_argument("--difficulty", type=int, default=0)
23
+ args, passthrough = parser.parse_known_args()
24
+
25
+ run_name = args.run_name or build_run_name(
26
+ "pytest",
27
+ "smoke",
28
+ args.difficulty,
29
+ git_sha=get_git_sha(),
30
+ )
31
+ pytest_args = [*args.pytest_args, *passthrough] or ["tests"]
32
+ command = [sys.executable, "-m", "pytest", *pytest_args]
33
+ started = time.perf_counter()
34
+
35
+ with trackio_run(
36
+ run_name=run_name,
37
+ run_type="pytest",
38
+ config={
39
+ "command": " ".join(command),
40
+ "pytest_args": pytest_args,
41
+ },
42
+ group="smoke",
43
+ ):
44
+ completed = subprocess.run(command)
45
+ duration = time.perf_counter() - started
46
+ log_trackio_metrics(
47
+ {
48
+ "smoke/pytest_exit_code": completed.returncode,
49
+ "smoke/pytest_passed": completed.returncode == 0,
50
+ "smoke/duration_seconds": duration,
51
+ },
52
+ step=0,
53
+ )
54
+
55
+ return completed.returncode
56
+
57
+
58
+ if __name__ == "__main__":
59
+ raise SystemExit(main())
server/Dockerfile CHANGED
@@ -70,6 +70,7 @@ ENV PATH="/app/.venv/bin:$PATH"
70
 
71
  # Set PYTHONPATH so imports work correctly
72
  ENV PYTHONPATH="/app/env:$PYTHONPATH"
 
73
 
74
  # Health check
75
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
 
70
 
71
  # Set PYTHONPATH so imports work correctly
72
  ENV PYTHONPATH="/app/env:$PYTHONPATH"
73
+ ENV ENABLE_WEB_INTERFACE=true
74
 
75
  # Health check
76
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
server/action_tools.py ADDED
@@ -0,0 +1,73 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Typed action tool dispatcher for the generated app sandbox."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ from dataclasses import dataclass
7
+
8
+ try:
9
+ from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
10
+ from .app_sandbox import AppSandbox
11
+ except ImportError: # pragma: no cover
12
+ from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
13
+ from server.app_sandbox import AppSandbox
14
+
15
+
16
+ @dataclass(frozen=True)
17
+ class ToolResult:
18
+ message: str
19
+ visible_test_result: str | None = None
20
+
21
+
22
+ class ActionTools:
23
+ """Executes phase-gated, safe tools against one episode state."""
24
+
25
+ def __init__(
26
+ self,
27
+ state: CyberSecurityOWASPState,
28
+ visible_policy_hint: dict,
29
+ workspace_summary: dict,
30
+ ):
31
+ self.state = state
32
+ self.visible_policy_hint = visible_policy_hint
33
+ self.workspace_summary = workspace_summary
34
+ self.sandbox = AppSandbox(state)
35
+
36
+ def execute(self, action: CyberSecurityOWASPAction) -> ToolResult:
37
+ args = action.arguments or {}
38
+ if action.tool_name == "noop":
39
+ return ToolResult("No operation.")
40
+ if action.tool_name == "inspect_policy_graph":
41
+ return ToolResult(json.dumps(self.visible_policy_hint, indent=2, sort_keys=True))
42
+ if action.tool_name == "list_routes":
43
+ return ToolResult(json.dumps(self.workspace_summary["routes"], indent=2))
44
+ if action.tool_name == "read_openapi":
45
+ return ToolResult(self.sandbox.read_openapi())
46
+ if action.tool_name == "read_file":
47
+ return ToolResult(self.sandbox.read_file(str(args.get("path", ""))))
48
+ if action.tool_name == "search_code":
49
+ return ToolResult(self.sandbox.search_code(str(args.get("query", ""))))
50
+ if action.tool_name == "send_local_request":
51
+ response = self.sandbox.send_local_request(
52
+ str(args.get("method", "GET")),
53
+ str(args.get("path", "")),
54
+ args.get("user_id"),
55
+ )
56
+ return ToolResult(json.dumps(response, indent=2, sort_keys=True))
57
+ if action.tool_name == "compare_identities":
58
+ response = self.sandbox.compare_identities(
59
+ str(args.get("method", "GET")),
60
+ str(args.get("path", "")),
61
+ str(args.get("first_user_id", "")),
62
+ str(args.get("second_user_id", "")),
63
+ )
64
+ return ToolResult(json.dumps(response, indent=2, sort_keys=True))
65
+ if action.tool_name == "patch_file":
66
+ result = self.sandbox.patch_file(
67
+ str(args.get("path", "")),
68
+ content=str(args["content"]) if "content" in args else None,
69
+ diff=str(args.get("diff", "")) if "content" not in args else None,
70
+ )
71
+ changed = "no diff" if not result["diff"].strip() else "diff recorded"
72
+ return ToolResult(f"Patched {result['path']} ({changed}).")
73
+ raise ValueError(f"Unhandled tool {action.tool_name}")
server/adversarial_designer.py ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Bounded adversarial scenario targeting for synthetic local lab episodes."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from dataclasses import dataclass
6
+ from typing import Any
7
+
8
+ try:
9
+ from .curriculum import WEAKNESS_TARGETS
10
+ except ImportError: # pragma: no cover
11
+ from server.curriculum import WEAKNESS_TARGETS
12
+
13
+
14
+ TARGET_SPECS: dict[str, dict[str, Any]] = {
15
+ "same_role_cross_object": {
16
+ "description": "Same-role actor tries to read another user's object.",
17
+ "hidden_focus": ["same_tenant_other_user_blocked"],
18
+ },
19
+ "cross_tenant_boundary": {
20
+ "description": "Tenant-local admin is denied access to another tenant's resource.",
21
+ "hidden_focus": ["cross_tenant_blocked"],
22
+ },
23
+ "public_route_overlock": {
24
+ "description": "Public health route must remain unauthenticated after patching.",
25
+ "hidden_focus": ["health_public"],
26
+ },
27
+ "alternate_route_same_service": {
28
+ "description": "Alternate route/service access should follow the same policy oracle.",
29
+ "hidden_focus": ["oracle_matrix"],
30
+ },
31
+ "visible_test_edge_case": {
32
+ "description": "Visible tests are insufficient; hidden policy matrix decides success.",
33
+ "hidden_focus": ["visible_test_only_guard"],
34
+ },
35
+ }
36
+
37
+
38
+ @dataclass(frozen=True)
39
+ class BoundedAdversarialDesigner:
40
+ """Chooses safe local lab variants that target tracked agent weaknesses."""
41
+
42
+ def design(self, *, seed: int, split: str, curriculum_profile: dict[str, Any]) -> dict[str, Any]:
43
+ target = str(curriculum_profile.get("target_weakness") or "")
44
+ if target not in TARGET_SPECS:
45
+ target = WEAKNESS_TARGETS[int(seed) % len(WEAKNESS_TARGETS)]
46
+ family = f"invoices.bola_idor.{target}"
47
+ if split == "hidden_eval":
48
+ family = f"heldout.{family}"
49
+ spec = TARGET_SPECS[target]
50
+ return {
51
+ "domain": "invoices",
52
+ "bug_family": "bola_idor",
53
+ "template_id": "fastapi_basic",
54
+ "scenario_family": family,
55
+ "target_weakness": target,
56
+ "hidden_focus": list(spec["hidden_focus"]),
57
+ "description": spec["description"],
58
+ "safe_lab_only": True,
59
+ }
server/app.py CHANGED
@@ -6,6 +6,13 @@
6
 
7
  """FastAPI application for the CyberSecurity_OWASP OpenEnv server."""
8
 
 
 
 
 
 
 
 
9
  try:
10
  from openenv.core.env_server.http_server import create_app
11
  except Exception as e: # pragma: no cover
 
6
 
7
  """FastAPI application for the CyberSecurity_OWASP OpenEnv server."""
8
 
9
+ import os
10
+
11
+ # OpenEnv disables the Gradio playground unless this flag is enabled. Default it
12
+ # on so Docker/HF Spaces show the reset/step/state UI, while explicit env values
13
+ # such as ENABLE_WEB_INTERFACE=false still take precedence.
14
+ os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
15
+
16
  try:
17
  from openenv.core.env_server.http_server import create_app
18
  except Exception as e: # pragma: no cover
server/app_sandbox.py ADDED
@@ -0,0 +1,139 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Ephemeral generated app sandbox operations."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import difflib
6
+ import json
7
+ from pathlib import Path
8
+ from typing import Any
9
+
10
+ try:
11
+ from ..models import CyberSecurityOWASPState
12
+ from ..safety import is_local_route
13
+ from ..validators import is_path_allowed, simulate_request
14
+ except ImportError: # pragma: no cover
15
+ from models import CyberSecurityOWASPState
16
+ from safety import is_local_route
17
+ from validators import is_path_allowed, simulate_request
18
+
19
+
20
+ class AppSandbox:
21
+ """Encapsulates all generated workspace reads, patches, and local requests."""
22
+
23
+ def __init__(self, state: CyberSecurityOWASPState):
24
+ self.state = state
25
+
26
+ @property
27
+ def workspace(self) -> Path:
28
+ return Path(str(self.state.hidden_facts["workspace"]))
29
+
30
+ def read_file(self, path: str) -> str:
31
+ return self._resolve_path(path).read_text(encoding="utf-8")
32
+
33
+ def search_code(self, query: str) -> str:
34
+ if not query:
35
+ raise ValueError("query is required")
36
+ results: list[str] = []
37
+ for rel in self.state.hidden_facts.get("editable_files", []):
38
+ path = self.workspace / rel
39
+ text = path.read_text(encoding="utf-8")
40
+ for idx, line in enumerate(text.splitlines(), start=1):
41
+ if query.lower() in line.lower():
42
+ results.append(f"{rel}:{idx}: {line}")
43
+ return "\n".join(results) or "No matches."
44
+
45
+ def patch_file(self, path: str, *, content: str | None = None, diff: str | None = None) -> dict[str, str]:
46
+ target = self._resolve_path(path, write=True)
47
+ before = target.read_text(encoding="utf-8")
48
+ if content is not None:
49
+ target.write_text(content, encoding="utf-8")
50
+ else:
51
+ self._apply_unified_diff(target, diff or "")
52
+ after = target.read_text(encoding="utf-8")
53
+ patch_diff = "".join(
54
+ difflib.unified_diff(
55
+ before.splitlines(True),
56
+ after.splitlines(True),
57
+ fromfile=path,
58
+ tofile=path,
59
+ )
60
+ )
61
+ self.state.patch_diff = patch_diff
62
+ files_touched = self.state.metrics.setdefault("files_touched", [])
63
+ if path not in files_touched:
64
+ files_touched.append(path)
65
+ return {"path": path, "diff": patch_diff}
66
+
67
+ def read_openapi(self) -> str:
68
+ routes = self.state.visible_facts.get("workspace_summary", {}).get("routes", [])
69
+ paths: dict[str, Any] = {}
70
+ for route in routes:
71
+ paths.setdefault(route["path"], {})[route["method"].lower()] = {
72
+ "x-public": bool(route.get("public", False))
73
+ }
74
+ return json.dumps(
75
+ {
76
+ "openapi": "3.1.0",
77
+ "info": {"title": "Generated invoices app", "version": "0.1.0"},
78
+ "paths": paths,
79
+ },
80
+ indent=2,
81
+ sort_keys=True,
82
+ )
83
+
84
+ def send_local_request(self, method: str, path: str, user_id: str | None = None) -> dict[str, Any]:
85
+ if not is_local_route(path):
86
+ raise ValueError("send_local_request only accepts local route paths")
87
+ return simulate_request(self.state, method, path, user_id)
88
+
89
+ def compare_identities(
90
+ self,
91
+ method: str,
92
+ path: str,
93
+ first_user_id: str,
94
+ second_user_id: str,
95
+ ) -> dict[str, Any]:
96
+ if not is_local_route(path):
97
+ raise ValueError("compare_identities only accepts local route paths")
98
+ return {
99
+ "first": simulate_request(self.state, method, path, first_user_id),
100
+ "second": simulate_request(self.state, method, path, second_user_id),
101
+ }
102
+
103
+ def _resolve_path(self, path: str, *, write: bool = False) -> Path:
104
+ allowed, normalized_or_error = is_path_allowed(self.state, path, write=write)
105
+ if not allowed:
106
+ raise ValueError(normalized_or_error)
107
+ return self.workspace / normalized_or_error
108
+
109
+ def _apply_unified_diff(self, path: Path, diff: str) -> None:
110
+ if not diff.strip():
111
+ raise ValueError("diff or content is required")
112
+ original = path.read_text(encoding="utf-8").splitlines(True)
113
+ output: list[str] = []
114
+ old_index = 0
115
+ lines = diff.splitlines(True)
116
+ i = 0
117
+ while i < len(lines):
118
+ line = lines[i]
119
+ if not line.startswith("@@"):
120
+ i += 1
121
+ continue
122
+ old_start = int(line.split()[1].split(",")[0][1:])
123
+ output.extend(original[old_index : old_start - 1])
124
+ old_index = old_start - 1
125
+ i += 1
126
+ while i < len(lines) and not lines[i].startswith("@@"):
127
+ hunk_line = lines[i]
128
+ if hunk_line.startswith(" "):
129
+ output.append(original[old_index])
130
+ old_index += 1
131
+ elif hunk_line.startswith("-"):
132
+ old_index += 1
133
+ elif hunk_line.startswith("+"):
134
+ output.append(hunk_line[1:])
135
+ elif hunk_line.startswith("\\"):
136
+ pass
137
+ i += 1
138
+ output.extend(original[old_index:])
139
+ path.write_text("".join(output), encoding="utf-8")
server/authz_oracle.py ADDED
@@ -0,0 +1,92 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Policy-oracle matrix for deterministic authorization verification."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from dataclasses import dataclass
6
+ from typing import Any
7
+
8
+ try:
9
+ from ..models import CyberSecurityOWASPState
10
+ from ..validators import simulate_request
11
+ except ImportError: # pragma: no cover
12
+ from models import CyberSecurityOWASPState
13
+ from validators import simulate_request
14
+
15
+
16
+ @dataclass(frozen=True)
17
+ class AuthzOracle:
18
+ """Builds hidden allowed/denied tuples from policy facts."""
19
+
20
+ def matrix(self, state: CyberSecurityOWASPState) -> list[dict[str, Any]]:
21
+ hidden = state.hidden_facts
22
+ return [
23
+ {
24
+ "name": "owner_can_read_owned_invoice",
25
+ "method": "GET",
26
+ "path": f"/invoices/{hidden['owner_invoice_id']}",
27
+ "user_id": hidden["owner_user_id"],
28
+ "expected_status": 200,
29
+ "allowed": True,
30
+ },
31
+ {
32
+ "name": "resource_owner_can_read_own_invoice",
33
+ "method": "GET",
34
+ "path": f"/invoices/{hidden['other_invoice_id']}",
35
+ "user_id": hidden["intruder_user_id"],
36
+ "expected_status": 200,
37
+ "allowed": True,
38
+ },
39
+ {
40
+ "name": "billing_admin_can_read_same_tenant_invoice",
41
+ "method": "GET",
42
+ "path": f"/invoices/{hidden['other_invoice_id']}",
43
+ "user_id": hidden["admin_user_id"],
44
+ "expected_status": 200,
45
+ "allowed": True,
46
+ },
47
+ {
48
+ "name": "same_tenant_non_owner_denied",
49
+ "method": "GET",
50
+ "path": f"/invoices/{hidden['other_invoice_id']}",
51
+ "user_id": hidden["owner_user_id"],
52
+ "expected_status": 403,
53
+ "allowed": False,
54
+ },
55
+ {
56
+ "name": "cross_tenant_admin_denied",
57
+ "method": "GET",
58
+ "path": f"/invoices/{hidden['foreign_invoice_id']}",
59
+ "user_id": hidden["admin_user_id"],
60
+ "expected_status": 403,
61
+ "allowed": False,
62
+ },
63
+ {
64
+ "name": "health_remains_public",
65
+ "method": "GET",
66
+ "path": "/health",
67
+ "user_id": None,
68
+ "expected_status": 200,
69
+ "allowed": True,
70
+ },
71
+ ]
72
+
73
+ def evaluate(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
74
+ cases = []
75
+ for case in self.matrix(state):
76
+ response = simulate_request(
77
+ state,
78
+ str(case["method"]),
79
+ str(case["path"]),
80
+ case.get("user_id"),
81
+ )
82
+ actual = int(response["status"])
83
+ cases.append(
84
+ {
85
+ "name": case["name"],
86
+ "allowed": bool(case["allowed"]),
87
+ "expected_status": int(case["expected_status"]),
88
+ "actual_status": actual,
89
+ "passed": actual == int(case["expected_status"]),
90
+ }
91
+ )
92
+ return {"passed": all(case["passed"] for case in cases), "cases": cases}
server/curriculum.py ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Runtime curriculum controller for closed-loop scenario selection."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from collections import defaultdict, deque
6
+ from dataclasses import dataclass, field
7
+ from typing import Any
8
+
9
+ try:
10
+ from ..models import CyberSecurityOWASPState
11
+ except ImportError: # pragma: no cover
12
+ from models import CyberSecurityOWASPState
13
+
14
+
15
+ DIFFICULTY_TIERS = ("warmup", "beginner", "intermediate", "advanced", "expert")
16
+ WEAKNESS_TARGETS = (
17
+ "same_role_cross_object",
18
+ "cross_tenant_boundary",
19
+ "public_route_overlock",
20
+ "alternate_route_same_service",
21
+ "visible_test_edge_case",
22
+ )
23
+
24
+
25
+ @dataclass
26
+ class CurriculumController:
27
+ """Tracks episode outcomes and picks the next bounded weakness target."""
28
+
29
+ window_size: int = 10
30
+ reward_trend: deque[float] = field(default_factory=lambda: deque(maxlen=10))
31
+ outcomes_by_target: dict[str, list[bool]] = field(default_factory=lambda: defaultdict(list))
32
+ failures_by_target: dict[str, int] = field(default_factory=lambda: defaultdict(int))
33
+ episodes_seen: int = 0
34
+
35
+ def select_profile(
36
+ self,
37
+ *,
38
+ seed: int,
39
+ split: str = "train",
40
+ requested_difficulty: int = 0,
41
+ ) -> dict[str, Any]:
42
+ difficulty = self._difficulty_for_split(split, requested_difficulty)
43
+ target = self._target_for_seed(seed, split)
44
+ if self.failures_by_target:
45
+ target = max(
46
+ WEAKNESS_TARGETS,
47
+ key=lambda item: (self.failures_by_target.get(item, 0), -WEAKNESS_TARGETS.index(item)),
48
+ )
49
+ return {
50
+ "difficulty": difficulty,
51
+ "difficulty_tier": DIFFICULTY_TIERS[min(difficulty, len(DIFFICULTY_TIERS) - 1)],
52
+ "target_weakness": target,
53
+ "split": split,
54
+ "episodes_seen": self.episodes_seen,
55
+ "recent_reward_mean": self._recent_reward_mean(),
56
+ "mastery": self.mastery_snapshot(),
57
+ }
58
+
59
+ def record_episode(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
60
+ target = state.target_weakness or "same_role_cross_object"
61
+ success = bool(state.success)
62
+ self.episodes_seen += 1
63
+ self.outcomes_by_target[target].append(success)
64
+ if not success:
65
+ self.failures_by_target[target] += 1
66
+ self.reward_trend.append(float(state.last_reward or 0.0))
67
+ return self.mastery_snapshot()
68
+
69
+ def mastery_snapshot(self) -> dict[str, Any]:
70
+ target_mastery = {}
71
+ for target in WEAKNESS_TARGETS:
72
+ outcomes = self.outcomes_by_target.get(target, [])
73
+ target_mastery[target] = {
74
+ "episodes": len(outcomes),
75
+ "success_rate": sum(1 for item in outcomes if item) / max(1, len(outcomes)),
76
+ "failures": self.failures_by_target.get(target, 0),
77
+ }
78
+ return {
79
+ "episodes_seen": self.episodes_seen,
80
+ "recent_reward_mean": self._recent_reward_mean(),
81
+ "target_mastery": target_mastery,
82
+ }
83
+
84
+ def _difficulty_for_split(self, split: str, requested_difficulty: int) -> int:
85
+ difficulty = max(0, min(int(requested_difficulty), len(DIFFICULTY_TIERS) - 1))
86
+ if split == "hidden_eval":
87
+ return max(3, difficulty)
88
+ if self.episodes_seen >= self.window_size and self._recent_reward_mean() > 10.0:
89
+ return min(difficulty + 1, len(DIFFICULTY_TIERS) - 1)
90
+ return difficulty
91
+
92
+ def _target_for_seed(self, seed: int, split: str) -> str:
93
+ offset = 2 if split == "hidden_eval" else 0
94
+ return WEAKNESS_TARGETS[(int(seed) + offset) % len(WEAKNESS_TARGETS)]
95
+
96
+ def _recent_reward_mean(self) -> float:
97
+ if not self.reward_trend:
98
+ return 0.0
99
+ return sum(self.reward_trend) / len(self.reward_trend)
server/episode_logger.py ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Episode artifact logging for training, debugging, and demos."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ import os
7
+ from pathlib import Path
8
+ from typing import Any
9
+
10
+ try:
11
+ from ..models import CyberSecurityOWASPState
12
+ except ImportError: # pragma: no cover
13
+ from models import CyberSecurityOWASPState
14
+
15
+
16
+ class EpisodeArtifactLogger:
17
+ """Appends compact JSONL episode transcripts under outputs/rollouts."""
18
+
19
+ def __init__(self, output_path: str | Path | None = None):
20
+ configured = output_path or os.getenv("CYBERSECURITY_OWASP_EPISODE_LOG")
21
+ self.output_path = Path(configured) if configured else Path("outputs/rollouts/episodes.jsonl")
22
+
23
+ def log_episode(
24
+ self,
25
+ state: CyberSecurityOWASPState,
26
+ *,
27
+ final_observation: dict[str, Any] | None = None,
28
+ ) -> Path:
29
+ self.output_path.parent.mkdir(parents=True, exist_ok=True)
30
+ record = {
31
+ "episode_id": state.episode_id,
32
+ "task_id": state.task_id,
33
+ "seed": state.seed,
34
+ "split": state.split,
35
+ "difficulty": state.difficulty,
36
+ "difficulty_tier": state.difficulty_tier,
37
+ "template_id": state.template_id,
38
+ "scenario_family": state.scenario_family,
39
+ "domain": state.domain,
40
+ "bug_family": state.bug_family,
41
+ "target_weakness": state.target_weakness,
42
+ "agent_actions": state.action_history,
43
+ "observations": state.observation_history,
44
+ "final_observation": final_observation or {},
45
+ "patch_diff": state.patch_diff,
46
+ "visible_test_result": self._verifier_layer(state, "visible"),
47
+ "hidden_test_result": self._verifier_layer(state, "hidden_tests"),
48
+ "oracle_result": self._verifier_layer(state, "oracle_matrix"),
49
+ "regression_result": self._verifier_layer(state, "regression"),
50
+ "reward_breakdown": state.reward_history[-1] if state.reward_history else {},
51
+ "reward_breakdown_by_step": state.reward_history,
52
+ "final_status": "resolved" if state.success else "failed",
53
+ "failure_reason": state.failure_reason,
54
+ "safety_violations": [
55
+ flag for flag in state.anti_cheat_flags if "network" in flag or "unsafe" in flag
56
+ ],
57
+ "anti_cheat_flags": state.anti_cheat_flags,
58
+ "metrics": state.metrics,
59
+ }
60
+ with self.output_path.open("a", encoding="utf-8") as handle:
61
+ handle.write(json.dumps(record, sort_keys=True) + "\n")
62
+ state.episode_artifact_path = str(self.output_path)
63
+ return self.output_path
64
+
65
+ def _verifier_layer(self, state: CyberSecurityOWASPState, key: str) -> Any:
66
+ return (state.verification_summary or {}).get(key)
server/reward_engine.py CHANGED
@@ -5,45 +5,24 @@ from __future__ import annotations
5
  try:
6
  from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
7
  from ..rewards import compute_reward
8
- from ..validators import (
9
- patch_quality,
10
- run_hidden_regression_tests,
11
- run_hidden_security_tests,
12
- run_public_route_tests,
13
- run_visible_tests,
14
- verify_finding,
15
- )
16
  except ImportError: # pragma: no cover
17
  from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
18
  from rewards import compute_reward
19
- from validators import (
20
- patch_quality,
21
- run_hidden_regression_tests,
22
- run_hidden_security_tests,
23
- run_public_route_tests,
24
- run_visible_tests,
25
- verify_finding,
26
- )
27
 
28
 
29
  def evaluate_action(
30
  state: CyberSecurityOWASPState,
31
  action: CyberSecurityOWASPAction,
32
  anti_cheat_flags: list[str] | None = None,
 
 
33
  ) -> tuple[dict, dict[str, float]]:
34
- verifier_result: dict = {"anti_cheat_flags": anti_cheat_flags or []}
35
- if action.tool_name == "submit_finding":
36
- verifier_result["finding"] = verify_finding(state, action.arguments)
37
- elif action.tool_name == "run_visible_tests":
38
- verifier_result["visible"] = run_visible_tests(state)
39
- elif action.tool_name == "submit_fix":
40
- verifier_result.update(
41
- {
42
- "visible": run_visible_tests(state),
43
- "security": run_hidden_security_tests(state),
44
- "regression": run_hidden_regression_tests(state),
45
- "public_routes": run_public_route_tests(state),
46
- "patch_quality": patch_quality(state),
47
- }
48
- )
49
  return verifier_result, compute_reward(state, action, verifier_result)
 
5
  try:
6
  from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
7
  from ..rewards import compute_reward
8
+ from .verifier import MultiLayerVerifier
 
 
 
 
 
 
 
9
  except ImportError: # pragma: no cover
10
  from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
11
  from rewards import compute_reward
12
+ from server.verifier import MultiLayerVerifier
 
 
 
 
 
 
 
13
 
14
 
15
  def evaluate_action(
16
  state: CyberSecurityOWASPState,
17
  action: CyberSecurityOWASPAction,
18
  anti_cheat_flags: list[str] | None = None,
19
+ *,
20
+ invalid_action: bool = False,
21
  ) -> tuple[dict, dict[str, float]]:
22
+ verifier_result = MultiLayerVerifier().evaluate_action(
23
+ state,
24
+ action,
25
+ anti_cheat_flags,
26
+ invalid_action=invalid_action,
27
+ )
 
 
 
 
 
 
 
 
 
28
  return verifier_result, compute_reward(state, action, verifier_result)
server/scenario_factory.py ADDED
@@ -0,0 +1,134 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Closed-loop scenario factory for CyberSecurity_OWASP."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import os
6
+ import tempfile
7
+ from pathlib import Path
8
+ from typing import Any
9
+ from uuid import uuid4
10
+
11
+ try:
12
+ from ..fixture_generator import visible_workspace_summary
13
+ from ..policy_graph import build_invoice_policy
14
+ from ..template_renderer import render_fastapi_basic
15
+ from .adversarial_designer import BoundedAdversarialDesigner
16
+ except ImportError: # pragma: no cover
17
+ from fixture_generator import visible_workspace_summary
18
+ from policy_graph import build_invoice_policy
19
+ from template_renderer import render_fastapi_basic
20
+ from server.adversarial_designer import BoundedAdversarialDesigner
21
+
22
+
23
+ def _make_workspace(prefix: str) -> Path:
24
+ root = Path(os.getenv("CYBERSECURITY_OWASP_WORKSPACE_ROOT", tempfile.gettempdir()))
25
+ root.mkdir(parents=True, exist_ok=True)
26
+ for _ in range(100):
27
+ workspace = root / f"{prefix}{uuid4().hex[:12]}"
28
+ try:
29
+ workspace.mkdir()
30
+ except FileExistsError:
31
+ continue
32
+ return workspace
33
+ raise RuntimeError("Unable to create isolated scenario workspace")
34
+
35
+
36
+ def _visible_policy_hint(public_hint: dict[str, Any]) -> dict[str, Any]:
37
+ """Return partial policy observability without hidden oracle/test labels."""
38
+
39
+ return {
40
+ "domain": public_hint.get("domain", "invoices"),
41
+ "policy_rules": list(public_hint.get("policy_rules", [])),
42
+ "fixture_aliases": {
43
+ "users": dict(public_hint.get("users", {})),
44
+ "resources": dict(public_hint.get("resources", {})),
45
+ },
46
+ "public_routes": list(public_hint.get("public_routes", [])),
47
+ "observation_contract": {
48
+ "visible": [
49
+ "product policy summary",
50
+ "fixture aliases needed for local requests",
51
+ "route summaries",
52
+ "visible test results",
53
+ ],
54
+ "hidden": [
55
+ "oracle matrix",
56
+ "hidden invariant tests",
57
+ "injected bug label",
58
+ "held-out family label",
59
+ ],
60
+ },
61
+ }
62
+
63
+
64
+ class ScenarioFactory:
65
+ """Compiles deterministic local app scenarios from curriculum profiles."""
66
+
67
+ def __init__(self, designer: BoundedAdversarialDesigner | None = None):
68
+ self.designer = designer or BoundedAdversarialDesigner()
69
+
70
+ def compile_scenario(
71
+ self,
72
+ seed: int,
73
+ *,
74
+ split: str = "train",
75
+ difficulty: int = 0,
76
+ curriculum_profile: dict[str, Any] | None = None,
77
+ ) -> dict[str, Any]:
78
+ profile = curriculum_profile or {
79
+ "difficulty": difficulty,
80
+ "difficulty_tier": "warmup",
81
+ "target_weakness": "same_role_cross_object",
82
+ }
83
+ adversarial_spec = self.designer.design(
84
+ seed=seed, split=split, curriculum_profile=profile
85
+ )
86
+ compiled = build_invoice_policy(seed)
87
+ workspace = _make_workspace(prefix=f"cybersecurity_owasp_{split}_{seed}_")
88
+ public_hint = _visible_policy_hint(compiled.public_hint)
89
+ editable_files = render_fastapi_basic(workspace, public_hint, compiled.hidden_facts)
90
+ workspace_summary = visible_workspace_summary(editable_files, public_hint)
91
+ workspace_summary.update(
92
+ {
93
+ "template_id": adversarial_spec["template_id"],
94
+ "target_weakness": adversarial_spec["target_weakness"],
95
+ }
96
+ )
97
+
98
+ hidden = dict(compiled.hidden_facts)
99
+ hidden.update(
100
+ {
101
+ "workspace": str(workspace),
102
+ "editable_files": editable_files,
103
+ "initial_file_hashes": {
104
+ path: (workspace / path).read_text(encoding="utf-8")
105
+ for path in editable_files
106
+ },
107
+ "adversarial_spec": adversarial_spec,
108
+ "scenario_family": adversarial_spec["scenario_family"],
109
+ "template_id": adversarial_spec["template_id"],
110
+ "target_weakness": adversarial_spec["target_weakness"],
111
+ "oracle_hidden_focus": adversarial_spec["hidden_focus"],
112
+ }
113
+ )
114
+
115
+ return {
116
+ "task_id": f"{split}-invoices-bola-{seed}",
117
+ "workspace": workspace,
118
+ "domain": adversarial_spec["domain"],
119
+ "bug_family": adversarial_spec["bug_family"],
120
+ "scenario_family": adversarial_spec["scenario_family"],
121
+ "template_id": adversarial_spec["template_id"],
122
+ "target_weakness": adversarial_spec["target_weakness"],
123
+ "difficulty": int(profile.get("difficulty", difficulty)),
124
+ "difficulty_tier": str(profile.get("difficulty_tier", "warmup")),
125
+ "curriculum_snapshot": profile,
126
+ "task_brief": (
127
+ "Inspect the generated invoices app and policy. Find the broken "
128
+ "authorization behavior, submit a finding with local evidence, patch "
129
+ "the app, preserve intended owner/admin/public behavior, then submit."
130
+ ),
131
+ "public_hint": public_hint,
132
+ "workspace_summary": workspace_summary,
133
+ "hidden_facts": hidden,
134
+ }
server/verifier.py ADDED
@@ -0,0 +1,81 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Multi-layer deterministic verifier for CyberSecurity_OWASP."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ from dataclasses import dataclass
7
+ from typing import Any
8
+
9
+ try:
10
+ from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
11
+ from ..validators import (
12
+ patch_quality,
13
+ run_hidden_regression_tests,
14
+ run_hidden_security_tests,
15
+ run_public_route_tests,
16
+ run_visible_tests,
17
+ verify_finding,
18
+ )
19
+ from .authz_oracle import AuthzOracle
20
+ except ImportError: # pragma: no cover
21
+ from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
22
+ from validators import (
23
+ patch_quality,
24
+ run_hidden_regression_tests,
25
+ run_hidden_security_tests,
26
+ run_public_route_tests,
27
+ run_visible_tests,
28
+ verify_finding,
29
+ )
30
+ from server.authz_oracle import AuthzOracle
31
+
32
+
33
+ @dataclass
34
+ class MultiLayerVerifier:
35
+ """Aggregates visible, hidden, oracle, regression, and patch-quality checks."""
36
+
37
+ oracle: AuthzOracle = AuthzOracle()
38
+
39
+ def evaluate_action(
40
+ self,
41
+ state: CyberSecurityOWASPState,
42
+ action: CyberSecurityOWASPAction,
43
+ anti_cheat_flags: list[str] | None = None,
44
+ *,
45
+ invalid_action: bool = False,
46
+ ) -> dict[str, Any]:
47
+ verifier_result: dict[str, Any] = {
48
+ "anti_cheat_flags": anti_cheat_flags or [],
49
+ "invalid_action": invalid_action,
50
+ "repeated_action": self._is_repeated_action(state, action),
51
+ }
52
+ if action.tool_name == "submit_finding":
53
+ verifier_result["finding"] = verify_finding(state, action.arguments)
54
+ elif action.tool_name == "run_visible_tests":
55
+ verifier_result["visible"] = run_visible_tests(state)
56
+ elif action.tool_name == "submit_fix":
57
+ verifier_result.update(self.run_terminal_checks(state))
58
+ return verifier_result
59
+
60
+ def run_terminal_checks(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
61
+ security = run_hidden_security_tests(state)
62
+ return {
63
+ "visible": run_visible_tests(state),
64
+ "hidden_tests": security,
65
+ "security": security,
66
+ "oracle_matrix": self.oracle.evaluate(state),
67
+ "regression": run_hidden_regression_tests(state),
68
+ "public_routes": run_public_route_tests(state),
69
+ "patch_quality": patch_quality(state),
70
+ }
71
+
72
+ def public_summary(self, verifier_result: dict[str, Any]) -> dict[str, Any]:
73
+ """Return verifier fields that are safe for state/debug summaries."""
74
+
75
+ return json.loads(json.dumps(verifier_result))
76
+
77
+ def _is_repeated_action(
78
+ self, state: CyberSecurityOWASPState, action: CyberSecurityOWASPAction
79
+ ) -> bool:
80
+ current = {"tool_name": action.tool_name, "arguments": action.arguments}
81
+ return sum(1 for item in state.action_history if item == current) > 1
tests/test_web_interface.py ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from fastapi.testclient import TestClient
2
+
3
+ from server.app import app
4
+
5
+
6
+ def test_space_root_redirects_to_openenv_web_ui():
7
+ client = TestClient(app)
8
+
9
+ response = client.get("/", follow_redirects=False)
10
+
11
+ assert response.status_code == 307
12
+ assert response.headers["location"] == "/web/"
13
+
14
+
15
+ def test_openenv_web_ui_and_api_routes_are_available():
16
+ client = TestClient(app)
17
+
18
+ web_response = client.get("/web/")
19
+ health_response = client.get("/health")
20
+ state_response = client.get("/web/state")
21
+
22
+ assert web_response.status_code == 200
23
+ assert "text/html" in web_response.headers["content-type"]
24
+ assert "Reset" in web_response.text
25
+ assert "Step" in web_response.text
26
+ assert "Get state" in web_response.text
27
+
28
+ assert health_response.status_code == 200
29
+ assert health_response.json() == {"status": "healthy"}
30
+
31
+ assert state_response.status_code == 200
32
+ state = state_response.json()
33
+ assert "episode_id" in state
34
+ assert "step_count" in state
35
+
36
+
37
+ def test_web_reset_returns_cybersecurity_observation():
38
+ client = TestClient(app)
39
+
40
+ response = client.post("/web/reset")
41
+
42
+ assert response.status_code == 200
43
+ payload = response.json()
44
+ observation = payload["observation"]
45
+ assert observation["phase"] == "discover"
46
+ assert "authorization" in observation["task_brief"]
47
+ assert "inspect_policy_graph" in observation["available_actions"]
training/eval_before_after.py CHANGED
@@ -5,6 +5,8 @@ from __future__ import annotations
5
  import json
6
  from pathlib import Path
7
 
 
 
8
 
9
  def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict]) -> dict:
10
  def mean(items: list[dict], key: str) -> float:
@@ -19,11 +21,27 @@ def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict
19
  "absolute_reward_improvement": mean(trained, "reward_total") - mean(baseline, "reward_total"),
20
  "heldout_success_rate": mean(heldout, "success"),
21
  "heldout_mean_reward": mean(heldout, "reward_total"),
 
 
 
 
 
 
 
 
22
  }
23
 
24
 
25
- def save_eval_summary(run_name: str, summary: dict) -> Path:
 
 
 
 
 
 
26
  output = Path("outputs/evals") / f"{run_name}_eval_summary.json"
27
  output.parent.mkdir(parents=True, exist_ok=True)
28
  output.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
 
 
29
  return output
 
5
  import json
6
  from pathlib import Path
7
 
8
+ from training.trackio_utils import log_eval_summary
9
+
10
 
11
  def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict]) -> dict:
12
  def mean(items: list[dict], key: str) -> float:
 
21
  "absolute_reward_improvement": mean(trained, "reward_total") - mean(baseline, "reward_total"),
22
  "heldout_success_rate": mean(heldout, "success"),
23
  "heldout_mean_reward": mean(heldout, "reward_total"),
24
+ "exploit_block_rate": mean(trained, "exploit_blocked"),
25
+ "regression_preservation_rate": mean(trained, "regression_preserved"),
26
+ "public_route_preservation_rate": mean(trained, "public_routes_preserved"),
27
+ "anti_cheat_pass_rate": mean(trained, "anti_cheat_pass"),
28
+ "invalid_action_rate": mean(trained, "invalid_action_rate"),
29
+ "timeout_rate": mean(trained, "timeout"),
30
+ "safety_violation_rate": mean(trained, "safety_violation"),
31
+ "mean_episode_length": mean(trained, "episode_length"),
32
  }
33
 
34
 
35
+ def save_eval_summary(
36
+ run_name: str,
37
+ summary: dict,
38
+ *,
39
+ track: bool = True,
40
+ trackio_config: dict | None = None,
41
+ ) -> Path:
42
  output = Path("outputs/evals") / f"{run_name}_eval_summary.json"
43
  output.parent.mkdir(parents=True, exist_ok=True)
44
  output.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
45
+ if track:
46
+ log_eval_summary(run_name, summary, config=trackio_config)
47
  return output
training/trackio_utils.py CHANGED
@@ -2,7 +2,12 @@
2
 
3
  from __future__ import annotations
4
 
 
 
 
5
  from datetime import datetime
 
 
6
 
7
 
8
  TRAIN_METRICS = [
@@ -34,7 +39,133 @@ TRAIN_METRICS = [
34
  ]
35
 
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
  def build_run_name(model: str, algo: str, difficulty: int, git_sha: str = "nogit") -> str:
38
- stamp = datetime.utcnow().strftime("%Y%m%d-%H%M")
39
  model_slug = model.replace("/", "-")
40
  return f"CyberSecurity_OWASP-{model_slug}-{algo}-level{difficulty}-{stamp}-{git_sha[:8]}"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
  from __future__ import annotations
4
 
5
+ import os
6
+ import subprocess
7
+ from contextlib import contextmanager
8
  from datetime import datetime
9
+ from pathlib import Path
10
+ from typing import Any, Iterator
11
 
12
 
13
  TRAIN_METRICS = [
 
39
  ]
40
 
41
 
42
+ EVAL_METRICS = [
43
+ "eval/baseline_success_rate",
44
+ "eval/trained_success_rate",
45
+ "eval/absolute_success_improvement",
46
+ "eval/baseline_mean_reward",
47
+ "eval/trained_mean_reward",
48
+ "eval/absolute_reward_improvement",
49
+ "eval/heldout_success_rate",
50
+ "eval/heldout_mean_reward",
51
+ "eval/exploit_block_rate",
52
+ "eval/regression_preservation_rate",
53
+ "eval/public_route_preservation_rate",
54
+ "eval/anti_cheat_pass_rate",
55
+ "eval/invalid_action_rate",
56
+ "eval/timeout_rate",
57
+ "eval/safety_violation_rate",
58
+ "eval/mean_episode_length",
59
+ ]
60
+
61
+
62
  def build_run_name(model: str, algo: str, difficulty: int, git_sha: str = "nogit") -> str:
63
+ stamp = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
64
  model_slug = model.replace("/", "-")
65
  return f"CyberSecurity_OWASP-{model_slug}-{algo}-level{difficulty}-{stamp}-{git_sha[:8]}"
66
+
67
+
68
+ def get_git_sha(default: str = "nogit") -> str:
69
+ try:
70
+ result = subprocess.run(
71
+ ["git", "rev-parse", "HEAD"],
72
+ check=True,
73
+ capture_output=True,
74
+ text=True,
75
+ )
76
+ except Exception:
77
+ return default
78
+ return result.stdout.strip() or default
79
+
80
+
81
+ def _load_trackio():
82
+ os.environ.setdefault("TRACKIO_DIR", str((Path.cwd() / "outputs" / "trackio").resolve()))
83
+ try:
84
+ import trackio
85
+ except ImportError as exc:
86
+ raise RuntimeError(
87
+ "Trackio is required for CyberSecurity_OWASP runs. Install dependencies "
88
+ "with `uv sync` and set TRACKIO_SPACE_ID when you want remote HF Spaces tracking."
89
+ ) from exc
90
+ return trackio
91
+
92
+
93
+ def init_trackio_run(
94
+ *,
95
+ run_name: str,
96
+ run_type: str,
97
+ config: dict[str, Any] | None = None,
98
+ project: str | None = None,
99
+ space_id: str | None = None,
100
+ group: str | None = None,
101
+ ):
102
+ trackio = _load_trackio()
103
+ project = project or os.getenv("TRACKIO_PROJECT", "CyberSecurity_OWASP")
104
+ space_id = space_id if space_id is not None else os.getenv("TRACKIO_SPACE_ID", "")
105
+ run_config = {
106
+ "environment": "CyberSecurity_OWASP",
107
+ "run_type": run_type,
108
+ **(config or {}),
109
+ }
110
+ kwargs: dict[str, Any] = {
111
+ "project": project,
112
+ "name": run_name,
113
+ "config": run_config,
114
+ }
115
+ if space_id:
116
+ kwargs["space_id"] = space_id
117
+ if group:
118
+ kwargs["group"] = group
119
+ return trackio.init(**kwargs)
120
+
121
+
122
+ def log_trackio_metrics(metrics: dict[str, Any], step: int | None = None) -> None:
123
+ trackio = _load_trackio()
124
+ numeric = {
125
+ key: value
126
+ for key, value in metrics.items()
127
+ if isinstance(value, (int, float, bool))
128
+ }
129
+ if step is None:
130
+ trackio.log(numeric)
131
+ else:
132
+ trackio.log(numeric, step=step)
133
+
134
+
135
+ def finish_trackio_run() -> None:
136
+ trackio = _load_trackio()
137
+ trackio.finish()
138
+
139
+
140
+ @contextmanager
141
+ def trackio_run(
142
+ *,
143
+ run_name: str,
144
+ run_type: str,
145
+ config: dict[str, Any] | None = None,
146
+ project: str | None = None,
147
+ space_id: str | None = None,
148
+ group: str | None = None,
149
+ ) -> Iterator[Any]:
150
+ run = init_trackio_run(
151
+ run_name=run_name,
152
+ run_type=run_type,
153
+ config=config,
154
+ project=project,
155
+ space_id=space_id,
156
+ group=group,
157
+ )
158
+ try:
159
+ yield run
160
+ finally:
161
+ finish_trackio_run()
162
+
163
+
164
+ def log_eval_summary(run_name: str, summary: dict[str, Any], config: dict[str, Any] | None = None) -> None:
165
+ metrics = {
166
+ f"eval/{key}": float(value)
167
+ for key, value in summary.items()
168
+ if isinstance(value, (int, float, bool))
169
+ }
170
+ with trackio_run(run_name=run_name, run_type="eval", config=config, group="eval"):
171
+ log_trackio_metrics(metrics, step=0)
training/train_grpo.py CHANGED
@@ -9,16 +9,26 @@ from __future__ import annotations
9
 
10
  import os
11
 
 
 
12
 
13
  def build_grpo_config():
14
  from trl import GRPOConfig
15
 
 
 
16
  output_dir = os.getenv("OUTPUT_DIR", "CyberSecurity_OWASP-qwen3-1.7b-grpo")
17
  trackio_space_id = os.getenv("TRACKIO_SPACE_ID", output_dir)
 
 
 
 
 
18
  return GRPOConfig(
19
  output_dir=output_dir,
20
  report_to="trackio",
21
  trackio_space_id=trackio_space_id,
 
22
  logging_steps=1,
23
  save_steps=25,
24
  learning_rate=5e-6,
 
9
 
10
  import os
11
 
12
+ from training.trackio_utils import build_run_name, get_git_sha
13
+
14
 
15
  def build_grpo_config():
16
  from trl import GRPOConfig
17
 
18
+ model_name = os.getenv("MODEL_NAME", "Qwen/Qwen3-1.7B")
19
+ difficulty = int(os.getenv("DIFFICULTY", "0"))
20
  output_dir = os.getenv("OUTPUT_DIR", "CyberSecurity_OWASP-qwen3-1.7b-grpo")
21
  trackio_space_id = os.getenv("TRACKIO_SPACE_ID", output_dir)
22
+ os.environ.setdefault("TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo")
23
+ run_name = os.getenv(
24
+ "RUN_NAME",
25
+ build_run_name(model_name, "grpo", difficulty, git_sha=get_git_sha()),
26
+ )
27
  return GRPOConfig(
28
  output_dir=output_dir,
29
  report_to="trackio",
30
  trackio_space_id=trackio_space_id,
31
+ run_name=run_name,
32
  logging_steps=1,
33
  save_steps=25,
34
  learning_rate=5e-6,
uv.lock CHANGED
@@ -1283,6 +1283,49 @@ wheels = [
1283
  { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
1284
  ]
1285
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1286
  [[package]]
1287
  name = "httpx"
1288
  version = "0.28.1"
@@ -2136,6 +2179,7 @@ version = "0.1.0"
2136
  source = { editable = "." }
2137
  dependencies = [
2138
  { name = "openenv-core", extra = ["core"] },
 
2139
  ]
2140
 
2141
  [package.optional-dependencies]
@@ -2153,6 +2197,7 @@ requires-dist = [
2153
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
2154
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
2155
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 
2156
  ]
2157
  provides-extras = ["dev", "modal"]
2158
 
@@ -3411,6 +3456,26 @@ wheels = [
3411
  { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
3412
  ]
3413
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3414
  [[package]]
3415
  name = "typer"
3416
  version = "0.24.2"
@@ -3506,6 +3571,61 @@ wheels = [
3506
  { url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
3507
  ]
3508
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3509
  [[package]]
3510
  name = "watchfiles"
3511
  version = "1.1.1"
 
1283
  { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
1284
  ]
1285
 
1286
+ [[package]]
1287
+ name = "httptools"
1288
+ version = "0.7.1"
1289
+ source = { registry = "https://pypi.org/simple" }
1290
+ sdist = { url = "https://files.pythonhosted.org/packages/b5/46/120a669232c7bdedb9d52d4aeae7e6c7dfe151e99dc70802e2fc7a5e1993/httptools-0.7.1.tar.gz", hash = "sha256:abd72556974f8e7c74a259655924a717a2365b236c882c3f6f8a45fe94703ac9", size = 258961, upload-time = "2025-10-10T03:55:08.559Z" }
1291
+ wheels = [
1292
+ { url = "https://files.pythonhosted.org/packages/c7/e5/c07e0bcf4ec8db8164e9f6738c048b2e66aabf30e7506f440c4cc6953f60/httptools-0.7.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:11d01b0ff1fe02c4c32d60af61a4d613b74fad069e47e06e9067758c01e9ac78", size = 204531, upload-time = "2025-10-10T03:54:20.887Z" },
1293
+ { url = "https://files.pythonhosted.org/packages/7e/4f/35e3a63f863a659f92ffd92bef131f3e81cf849af26e6435b49bd9f6f751/httptools-0.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:84d86c1e5afdc479a6fdabf570be0d3eb791df0ae727e8dbc0259ed1249998d4", size = 109408, upload-time = "2025-10-10T03:54:22.455Z" },
1294
+ { url = "https://files.pythonhosted.org/packages/f5/71/b0a9193641d9e2471ac541d3b1b869538a5fb6419d52fd2669fa9c79e4b8/httptools-0.7.1-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:c8c751014e13d88d2be5f5f14fc8b89612fcfa92a9cc480f2bc1598357a23a05", size = 440889, upload-time = "2025-10-10T03:54:23.753Z" },
1295
+ { url = "https://files.pythonhosted.org/packages/eb/d9/2e34811397b76718750fea44658cb0205b84566e895192115252e008b152/httptools-0.7.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:654968cb6b6c77e37b832a9be3d3ecabb243bbe7a0b8f65fbc5b6b04c8fcabed", size = 440460, upload-time = "2025-10-10T03:54:25.313Z" },
1296
+ { url = "https://files.pythonhosted.org/packages/01/3f/a04626ebeacc489866bb4d82362c0657b2262bef381d68310134be7f40bb/httptools-0.7.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:b580968316348b474b020edf3988eecd5d6eec4634ee6561e72ae3a2a0e00a8a", size = 425267, upload-time = "2025-10-10T03:54:26.81Z" },
1297
+ { url = "https://files.pythonhosted.org/packages/a5/99/adcd4f66614db627b587627c8ad6f4c55f18881549bab10ecf180562e7b9/httptools-0.7.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:d496e2f5245319da9d764296e86c5bb6fcf0cf7a8806d3d000717a889c8c0b7b", size = 424429, upload-time = "2025-10-10T03:54:28.174Z" },
1298
+ { url = "https://files.pythonhosted.org/packages/d5/72/ec8fc904a8fd30ba022dfa85f3bbc64c3c7cd75b669e24242c0658e22f3c/httptools-0.7.1-cp310-cp310-win_amd64.whl", hash = "sha256:cbf8317bfccf0fed3b5680c559d3459cccf1abe9039bfa159e62e391c7270568", size = 86173, upload-time = "2025-10-10T03:54:29.5Z" },
1299
+ { url = "https://files.pythonhosted.org/packages/9c/08/17e07e8d89ab8f343c134616d72eebfe03798835058e2ab579dcc8353c06/httptools-0.7.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:474d3b7ab469fefcca3697a10d11a32ee2b9573250206ba1e50d5980910da657", size = 206521, upload-time = "2025-10-10T03:54:31.002Z" },
1300
+ { url = "https://files.pythonhosted.org/packages/aa/06/c9c1b41ff52f16aee526fd10fbda99fa4787938aa776858ddc4a1ea825ec/httptools-0.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a3c3b7366bb6c7b96bd72d0dbe7f7d5eead261361f013be5f6d9590465ea1c70", size = 110375, upload-time = "2025-10-10T03:54:31.941Z" },
1301
+ { url = "https://files.pythonhosted.org/packages/cc/cc/10935db22fda0ee34c76f047590ca0a8bd9de531406a3ccb10a90e12ea21/httptools-0.7.1-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:379b479408b8747f47f3b253326183d7c009a3936518cdb70db58cffd369d9df", size = 456621, upload-time = "2025-10-10T03:54:33.176Z" },
1302
+ { url = "https://files.pythonhosted.org/packages/0e/84/875382b10d271b0c11aa5d414b44f92f8dd53e9b658aec338a79164fa548/httptools-0.7.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:cad6b591a682dcc6cf1397c3900527f9affef1e55a06c4547264796bbd17cf5e", size = 454954, upload-time = "2025-10-10T03:54:34.226Z" },
1303
+ { url = "https://files.pythonhosted.org/packages/30/e1/44f89b280f7e46c0b1b2ccee5737d46b3bb13136383958f20b580a821ca0/httptools-0.7.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:eb844698d11433d2139bbeeb56499102143beb582bd6c194e3ba69c22f25c274", size = 440175, upload-time = "2025-10-10T03:54:35.942Z" },
1304
+ { url = "https://files.pythonhosted.org/packages/6f/7e/b9287763159e700e335028bc1824359dc736fa9b829dacedace91a39b37e/httptools-0.7.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f65744d7a8bdb4bda5e1fa23e4ba16832860606fcc09d674d56e425e991539ec", size = 440310, upload-time = "2025-10-10T03:54:37.1Z" },
1305
+ { url = "https://files.pythonhosted.org/packages/b3/07/5b614f592868e07f5c94b1f301b5e14a21df4e8076215a3bccb830a687d8/httptools-0.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:135fbe974b3718eada677229312e97f3b31f8a9c8ffa3ae6f565bf808d5b6bcb", size = 86875, upload-time = "2025-10-10T03:54:38.421Z" },
1306
+ { url = "https://files.pythonhosted.org/packages/53/7f/403e5d787dc4942316e515e949b0c8a013d84078a915910e9f391ba9b3ed/httptools-0.7.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:38e0c83a2ea9746ebbd643bdfb521b9aa4a91703e2cd705c20443405d2fd16a5", size = 206280, upload-time = "2025-10-10T03:54:39.274Z" },
1307
+ { url = "https://files.pythonhosted.org/packages/2a/0d/7f3fd28e2ce311ccc998c388dd1c53b18120fda3b70ebb022b135dc9839b/httptools-0.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f25bbaf1235e27704f1a7b86cd3304eabc04f569c828101d94a0e605ef7205a5", size = 110004, upload-time = "2025-10-10T03:54:40.403Z" },
1308
+ { url = "https://files.pythonhosted.org/packages/84/a6/b3965e1e146ef5762870bbe76117876ceba51a201e18cc31f5703e454596/httptools-0.7.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2c15f37ef679ab9ecc06bfc4e6e8628c32a8e4b305459de7cf6785acd57e4d03", size = 517655, upload-time = "2025-10-10T03:54:41.347Z" },
1309
+ { url = "https://files.pythonhosted.org/packages/11/7d/71fee6f1844e6fa378f2eddde6c3e41ce3a1fb4b2d81118dd544e3441ec0/httptools-0.7.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7fe6e96090df46b36ccfaf746f03034e5ab723162bc51b0a4cf58305324036f2", size = 511440, upload-time = "2025-10-10T03:54:42.452Z" },
1310
+ { url = "https://files.pythonhosted.org/packages/22/a5/079d216712a4f3ffa24af4a0381b108aa9c45b7a5cc6eb141f81726b1823/httptools-0.7.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f72fdbae2dbc6e68b8239defb48e6a5937b12218e6ffc2c7846cc37befa84362", size = 495186, upload-time = "2025-10-10T03:54:43.937Z" },
1311
+ { url = "https://files.pythonhosted.org/packages/e9/9e/025ad7b65278745dee3bd0ebf9314934c4592560878308a6121f7f812084/httptools-0.7.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e99c7b90a29fd82fea9ef57943d501a16f3404d7b9ee81799d41639bdaae412c", size = 499192, upload-time = "2025-10-10T03:54:45.003Z" },
1312
+ { url = "https://files.pythonhosted.org/packages/6d/de/40a8f202b987d43afc4d54689600ff03ce65680ede2f31df348d7f368b8f/httptools-0.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:3e14f530fefa7499334a79b0cf7e7cd2992870eb893526fb097d51b4f2d0f321", size = 86694, upload-time = "2025-10-10T03:54:45.923Z" },
1313
+ { url = "https://files.pythonhosted.org/packages/09/8f/c77b1fcbfd262d422f12da02feb0d218fa228d52485b77b953832105bb90/httptools-0.7.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:6babce6cfa2a99545c60bfef8bee0cc0545413cb0018f617c8059a30ad985de3", size = 202889, upload-time = "2025-10-10T03:54:47.089Z" },
1314
+ { url = "https://files.pythonhosted.org/packages/0a/1a/22887f53602feaa066354867bc49a68fc295c2293433177ee90870a7d517/httptools-0.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:601b7628de7504077dd3dcb3791c6b8694bbd967148a6d1f01806509254fb1ca", size = 108180, upload-time = "2025-10-10T03:54:48.052Z" },
1315
+ { url = "https://files.pythonhosted.org/packages/32/6a/6aaa91937f0010d288d3d124ca2946d48d60c3a5ee7ca62afe870e3ea011/httptools-0.7.1-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:04c6c0e6c5fb0739c5b8a9eb046d298650a0ff38cf42537fc372b28dc7e4472c", size = 478596, upload-time = "2025-10-10T03:54:48.919Z" },
1316
+ { url = "https://files.pythonhosted.org/packages/6d/70/023d7ce117993107be88d2cbca566a7c1323ccbaf0af7eabf2064fe356f6/httptools-0.7.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:69d4f9705c405ae3ee83d6a12283dc9feba8cc6aaec671b412917e644ab4fa66", size = 473268, upload-time = "2025-10-10T03:54:49.993Z" },
1317
+ { url = "https://files.pythonhosted.org/packages/32/4d/9dd616c38da088e3f436e9a616e1d0cc66544b8cdac405cc4e81c8679fc7/httptools-0.7.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:44c8f4347d4b31269c8a9205d8a5ee2df5322b09bbbd30f8f862185bb6b05346", size = 455517, upload-time = "2025-10-10T03:54:51.066Z" },
1318
+ { url = "https://files.pythonhosted.org/packages/1d/3a/a6c595c310b7df958e739aae88724e24f9246a514d909547778d776799be/httptools-0.7.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:465275d76db4d554918aba40bf1cbebe324670f3dfc979eaffaa5d108e2ed650", size = 458337, upload-time = "2025-10-10T03:54:52.196Z" },
1319
+ { url = "https://files.pythonhosted.org/packages/fd/82/88e8d6d2c51edc1cc391b6e044c6c435b6aebe97b1abc33db1b0b24cd582/httptools-0.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:322d00c2068d125bd570f7bf78b2d367dad02b919d8581d7476d8b75b294e3e6", size = 85743, upload-time = "2025-10-10T03:54:53.448Z" },
1320
+ { url = "https://files.pythonhosted.org/packages/34/50/9d095fcbb6de2d523e027a2f304d4551855c2f46e0b82befd718b8b20056/httptools-0.7.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:c08fe65728b8d70b6923ce31e3956f859d5e1e8548e6f22ec520a962c6757270", size = 203619, upload-time = "2025-10-10T03:54:54.321Z" },
1321
+ { url = "https://files.pythonhosted.org/packages/07/f0/89720dc5139ae54b03f861b5e2c55a37dba9a5da7d51e1e824a1f343627f/httptools-0.7.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:7aea2e3c3953521c3c51106ee11487a910d45586e351202474d45472db7d72d3", size = 108714, upload-time = "2025-10-10T03:54:55.163Z" },
1322
+ { url = "https://files.pythonhosted.org/packages/b3/cb/eea88506f191fb552c11787c23f9a405f4c7b0c5799bf73f2249cd4f5228/httptools-0.7.1-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0e68b8582f4ea9166be62926077a3334064d422cf08ab87d8b74664f8e9058e1", size = 472909, upload-time = "2025-10-10T03:54:56.056Z" },
1323
+ { url = "https://files.pythonhosted.org/packages/e0/4a/a548bdfae6369c0d078bab5769f7b66f17f1bfaa6fa28f81d6be6959066b/httptools-0.7.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:df091cf961a3be783d6aebae963cc9b71e00d57fa6f149025075217bc6a55a7b", size = 470831, upload-time = "2025-10-10T03:54:57.219Z" },
1324
+ { url = "https://files.pythonhosted.org/packages/4d/31/14df99e1c43bd132eec921c2e7e11cda7852f65619bc0fc5bdc2d0cb126c/httptools-0.7.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f084813239e1eb403ddacd06a30de3d3e09a9b76e7894dcda2b22f8a726e9c60", size = 452631, upload-time = "2025-10-10T03:54:58.219Z" },
1325
+ { url = "https://files.pythonhosted.org/packages/22/d2/b7e131f7be8d854d48cb6d048113c30f9a46dca0c9a8b08fcb3fcd588cdc/httptools-0.7.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7347714368fb2b335e9063bc2b96f2f87a9ceffcd9758ac295f8bbcd3ffbc0ca", size = 452910, upload-time = "2025-10-10T03:54:59.366Z" },
1326
+ { url = "https://files.pythonhosted.org/packages/53/cf/878f3b91e4e6e011eff6d1fa9ca39f7eb17d19c9d7971b04873734112f30/httptools-0.7.1-cp314-cp314-win_amd64.whl", hash = "sha256:cfabda2a5bb85aa2a904ce06d974a3f30fb36cc63d7feaddec05d2050acede96", size = 88205, upload-time = "2025-10-10T03:55:00.389Z" },
1327
+ ]
1328
+
1329
  [[package]]
1330
  name = "httpx"
1331
  version = "0.28.1"
 
2179
  source = { editable = "." }
2180
  dependencies = [
2181
  { name = "openenv-core", extra = ["core"] },
2182
+ { name = "trackio" },
2183
  ]
2184
 
2185
  [package.optional-dependencies]
 
2197
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
2198
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
2199
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
2200
+ { name = "trackio", specifier = ">=0.22.0" },
2201
  ]
2202
  provides-extras = ["dev", "modal"]
2203
 
 
3456
  { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
3457
  ]
3458
 
3459
+ [[package]]
3460
+ name = "trackio"
3461
+ version = "0.25.0"
3462
+ source = { registry = "https://pypi.org/simple" }
3463
+ dependencies = [
3464
+ { name = "gradio-client" },
3465
+ { name = "huggingface-hub" },
3466
+ { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
3467
+ { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
3468
+ { name = "orjson" },
3469
+ { name = "pillow" },
3470
+ { name = "python-multipart" },
3471
+ { name = "starlette" },
3472
+ { name = "tomli", marker = "python_full_version < '3.11'" },
3473
+ { name = "uvicorn", extra = ["standard"] },
3474
+ ]
3475
+ wheels = [
3476
+ { url = "https://files.pythonhosted.org/packages/e7/4d/2aa0e1ca6daebdfac79fadd2ab308d5880c8d0305b2ce8b88900f95a8415/trackio-0.25.0-py3-none-any.whl", hash = "sha256:6c1ae7decef6e35d1165a6b2536d6df8c67594329bdf6bd9f1786c153a532b9f", size = 1653706, upload-time = "2026-04-23T15:45:29.887Z" },
3477
+ ]
3478
+
3479
  [[package]]
3480
  name = "typer"
3481
  version = "0.24.2"
 
3571
  { url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
3572
  ]
3573
 
3574
+ [package.optional-dependencies]
3575
+ standard = [
3576
+ { name = "colorama", marker = "sys_platform == 'win32'" },
3577
+ { name = "httptools" },
3578
+ { name = "python-dotenv" },
3579
+ { name = "pyyaml" },
3580
+ { name = "uvloop", marker = "platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'" },
3581
+ { name = "watchfiles" },
3582
+ { name = "websockets" },
3583
+ ]
3584
+
3585
+ [[package]]
3586
+ name = "uvloop"
3587
+ version = "0.22.1"
3588
+ source = { registry = "https://pypi.org/simple" }
3589
+ sdist = { url = "https://files.pythonhosted.org/packages/06/f0/18d39dbd1971d6d62c4629cc7fa67f74821b0dc1f5a77af43719de7936a7/uvloop-0.22.1.tar.gz", hash = "sha256:6c84bae345b9147082b17371e3dd5d42775bddce91f885499017f4607fdaf39f", size = 2443250, upload-time = "2025-10-16T22:17:19.342Z" }
3590
+ wheels = [
3591
+ { url = "https://files.pythonhosted.org/packages/eb/14/ecceb239b65adaaf7fde510aa8bd534075695d1e5f8dadfa32b5723d9cfb/uvloop-0.22.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:ef6f0d4cc8a9fa1f6a910230cd53545d9a14479311e87e3cb225495952eb672c", size = 1343335, upload-time = "2025-10-16T22:16:11.43Z" },
3592
+ { url = "https://files.pythonhosted.org/packages/ba/ae/6f6f9af7f590b319c94532b9567409ba11f4fa71af1148cab1bf48a07048/uvloop-0.22.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:7cd375a12b71d33d46af85a3343b35d98e8116134ba404bd657b3b1d15988792", size = 742903, upload-time = "2025-10-16T22:16:12.979Z" },
3593
+ { url = "https://files.pythonhosted.org/packages/09/bd/3667151ad0702282a1f4d5d29288fce8a13c8b6858bf0978c219cd52b231/uvloop-0.22.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ac33ed96229b7790eb729702751c0e93ac5bc3bcf52ae9eccbff30da09194b86", size = 3648499, upload-time = "2025-10-16T22:16:14.451Z" },
3594
+ { url = "https://files.pythonhosted.org/packages/b3/f6/21657bb3beb5f8c57ce8be3b83f653dd7933c2fd00545ed1b092d464799a/uvloop-0.22.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:481c990a7abe2c6f4fc3d98781cc9426ebd7f03a9aaa7eb03d3bfc68ac2a46bd", size = 3700133, upload-time = "2025-10-16T22:16:16.272Z" },
3595
+ { url = "https://files.pythonhosted.org/packages/09/e0/604f61d004ded805f24974c87ddd8374ef675644f476f01f1df90e4cdf72/uvloop-0.22.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:a592b043a47ad17911add5fbd087c76716d7c9ccc1d64ec9249ceafd735f03c2", size = 3512681, upload-time = "2025-10-16T22:16:18.07Z" },
3596
+ { url = "https://files.pythonhosted.org/packages/bb/ce/8491fd370b0230deb5eac69c7aae35b3be527e25a911c0acdffb922dc1cd/uvloop-0.22.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:1489cf791aa7b6e8c8be1c5a080bae3a672791fcb4e9e12249b05862a2ca9cec", size = 3615261, upload-time = "2025-10-16T22:16:19.596Z" },
3597
+ { url = "https://files.pythonhosted.org/packages/c7/d5/69900f7883235562f1f50d8184bb7dd84a2fb61e9ec63f3782546fdbd057/uvloop-0.22.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c60ebcd36f7b240b30788554b6f0782454826a0ed765d8430652621b5de674b9", size = 1352420, upload-time = "2025-10-16T22:16:21.187Z" },
3598
+ { url = "https://files.pythonhosted.org/packages/a8/73/c4e271b3bce59724e291465cc936c37758886a4868787da0278b3b56b905/uvloop-0.22.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3b7f102bf3cb1995cfeaee9321105e8f5da76fdb104cdad8986f85461a1b7b77", size = 748677, upload-time = "2025-10-16T22:16:22.558Z" },
3599
+ { url = "https://files.pythonhosted.org/packages/86/94/9fb7fad2f824d25f8ecac0d70b94d0d48107ad5ece03769a9c543444f78a/uvloop-0.22.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:53c85520781d84a4b8b230e24a5af5b0778efdb39142b424990ff1ef7c48ba21", size = 3753819, upload-time = "2025-10-16T22:16:23.903Z" },
3600
+ { url = "https://files.pythonhosted.org/packages/74/4f/256aca690709e9b008b7108bc85fba619a2bc37c6d80743d18abad16ee09/uvloop-0.22.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:56a2d1fae65fd82197cb8c53c367310b3eabe1bbb9fb5a04d28e3e3520e4f702", size = 3804529, upload-time = "2025-10-16T22:16:25.246Z" },
3601
+ { url = "https://files.pythonhosted.org/packages/7f/74/03c05ae4737e871923d21a76fe28b6aad57f5c03b6e6bfcfa5ad616013e4/uvloop-0.22.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:40631b049d5972c6755b06d0bfe8233b1bd9a8a6392d9d1c45c10b6f9e9b2733", size = 3621267, upload-time = "2025-10-16T22:16:26.819Z" },
3602
+ { url = "https://files.pythonhosted.org/packages/75/be/f8e590fe61d18b4a92070905497aec4c0e64ae1761498cad09023f3f4b3e/uvloop-0.22.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:535cc37b3a04f6cd2c1ef65fa1d370c9a35b6695df735fcff5427323f2cd5473", size = 3723105, upload-time = "2025-10-16T22:16:28.252Z" },
3603
+ { url = "https://files.pythonhosted.org/packages/3d/ff/7f72e8170be527b4977b033239a83a68d5c881cc4775fca255c677f7ac5d/uvloop-0.22.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:fe94b4564e865d968414598eea1a6de60adba0c040ba4ed05ac1300de402cd42", size = 1359936, upload-time = "2025-10-16T22:16:29.436Z" },
3604
+ { url = "https://files.pythonhosted.org/packages/c3/c6/e5d433f88fd54d81ef4be58b2b7b0cea13c442454a1db703a1eea0db1a59/uvloop-0.22.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:51eb9bd88391483410daad430813d982010f9c9c89512321f5b60e2cddbdddd6", size = 752769, upload-time = "2025-10-16T22:16:30.493Z" },
3605
+ { url = "https://files.pythonhosted.org/packages/24/68/a6ac446820273e71aa762fa21cdcc09861edd3536ff47c5cd3b7afb10eeb/uvloop-0.22.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:700e674a166ca5778255e0e1dc4e9d79ab2acc57b9171b79e65feba7184b3370", size = 4317413, upload-time = "2025-10-16T22:16:31.644Z" },
3606
+ { url = "https://files.pythonhosted.org/packages/5f/6f/e62b4dfc7ad6518e7eff2516f680d02a0f6eb62c0c212e152ca708a0085e/uvloop-0.22.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7b5b1ac819a3f946d3b2ee07f09149578ae76066d70b44df3fa990add49a82e4", size = 4426307, upload-time = "2025-10-16T22:16:32.917Z" },
3607
+ { url = "https://files.pythonhosted.org/packages/90/60/97362554ac21e20e81bcef1150cb2a7e4ffdaf8ea1e5b2e8bf7a053caa18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e047cc068570bac9866237739607d1313b9253c3051ad84738cbb095be0537b2", size = 4131970, upload-time = "2025-10-16T22:16:34.015Z" },
3608
+ { url = "https://files.pythonhosted.org/packages/99/39/6b3f7d234ba3964c428a6e40006340f53ba37993f46ed6e111c6e9141d18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:512fec6815e2dd45161054592441ef76c830eddaad55c8aa30952e6fe1ed07c0", size = 4296343, upload-time = "2025-10-16T22:16:35.149Z" },
3609
+ { url = "https://files.pythonhosted.org/packages/89/8c/182a2a593195bfd39842ea68ebc084e20c850806117213f5a299dfc513d9/uvloop-0.22.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:561577354eb94200d75aca23fbde86ee11be36b00e52a4eaf8f50fb0c86b7705", size = 1358611, upload-time = "2025-10-16T22:16:36.833Z" },
3610
+ { url = "https://files.pythonhosted.org/packages/d2/14/e301ee96a6dc95224b6f1162cd3312f6d1217be3907b79173b06785f2fe7/uvloop-0.22.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1cdf5192ab3e674ca26da2eada35b288d2fa49fdd0f357a19f0e7c4e7d5077c8", size = 751811, upload-time = "2025-10-16T22:16:38.275Z" },
3611
+ { url = "https://files.pythonhosted.org/packages/b7/02/654426ce265ac19e2980bfd9ea6590ca96a56f10c76e63801a2df01c0486/uvloop-0.22.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6e2ea3d6190a2968f4a14a23019d3b16870dd2190cd69c8180f7c632d21de68d", size = 4288562, upload-time = "2025-10-16T22:16:39.375Z" },
3612
+ { url = "https://files.pythonhosted.org/packages/15/c0/0be24758891ef825f2065cd5db8741aaddabe3e248ee6acc5e8a80f04005/uvloop-0.22.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0530a5fbad9c9e4ee3f2b33b148c6a64d47bbad8000ea63704fa8260f4cf728e", size = 4366890, upload-time = "2025-10-16T22:16:40.547Z" },
3613
+ { url = "https://files.pythonhosted.org/packages/d2/53/8369e5219a5855869bcee5f4d317f6da0e2c669aecf0ef7d371e3d084449/uvloop-0.22.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:bc5ef13bbc10b5335792360623cc378d52d7e62c2de64660616478c32cd0598e", size = 4119472, upload-time = "2025-10-16T22:16:41.694Z" },
3614
+ { url = "https://files.pythonhosted.org/packages/f8/ba/d69adbe699b768f6b29a5eec7b47dd610bd17a69de51b251126a801369ea/uvloop-0.22.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1f38ec5e3f18c8a10ded09742f7fb8de0108796eb673f30ce7762ce1b8550cad", size = 4239051, upload-time = "2025-10-16T22:16:43.224Z" },
3615
+ { url = "https://files.pythonhosted.org/packages/90/cd/b62bdeaa429758aee8de8b00ac0dd26593a9de93d302bff3d21439e9791d/uvloop-0.22.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:3879b88423ec7e97cd4eba2a443aa26ed4e59b45e6b76aabf13fe2f27023a142", size = 1362067, upload-time = "2025-10-16T22:16:44.503Z" },
3616
+ { url = "https://files.pythonhosted.org/packages/0d/f8/a132124dfda0777e489ca86732e85e69afcd1ff7686647000050ba670689/uvloop-0.22.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4baa86acedf1d62115c1dc6ad1e17134476688f08c6efd8a2ab076e815665c74", size = 752423, upload-time = "2025-10-16T22:16:45.968Z" },
3617
+ { url = "https://files.pythonhosted.org/packages/a3/94/94af78c156f88da4b3a733773ad5ba0b164393e357cc4bd0ab2e2677a7d6/uvloop-0.22.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:297c27d8003520596236bdb2335e6b3f649480bd09e00d1e3a99144b691d2a35", size = 4272437, upload-time = "2025-10-16T22:16:47.451Z" },
3618
+ { url = "https://files.pythonhosted.org/packages/b5/35/60249e9fd07b32c665192cec7af29e06c7cd96fa1d08b84f012a56a0b38e/uvloop-0.22.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1955d5a1dd43198244d47664a5858082a3239766a839b2102a269aaff7a4e25", size = 4292101, upload-time = "2025-10-16T22:16:49.318Z" },
3619
+ { url = "https://files.pythonhosted.org/packages/02/62/67d382dfcb25d0a98ce73c11ed1a6fba5037a1a1d533dcbb7cab033a2636/uvloop-0.22.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b31dc2fccbd42adc73bc4e7cdbae4fc5086cf378979e53ca5d0301838c5682c6", size = 4114158, upload-time = "2025-10-16T22:16:50.517Z" },
3620
+ { url = "https://files.pythonhosted.org/packages/f0/7a/f1171b4a882a5d13c8b7576f348acfe6074d72eaf52cccef752f748d4a9f/uvloop-0.22.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:93f617675b2d03af4e72a5333ef89450dfaa5321303ede6e67ba9c9d26878079", size = 4177360, upload-time = "2025-10-16T22:16:52.646Z" },
3621
+ { url = "https://files.pythonhosted.org/packages/79/7b/b01414f31546caf0919da80ad57cbfe24c56b151d12af68cee1b04922ca8/uvloop-0.22.1-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:37554f70528f60cad66945b885eb01f1bb514f132d92b6eeed1c90fd54ed6289", size = 1454790, upload-time = "2025-10-16T22:16:54.355Z" },
3622
+ { url = "https://files.pythonhosted.org/packages/d4/31/0bb232318dd838cad3fa8fb0c68c8b40e1145b32025581975e18b11fab40/uvloop-0.22.1-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:b76324e2dc033a0b2f435f33eb88ff9913c156ef78e153fb210e03c13da746b3", size = 796783, upload-time = "2025-10-16T22:16:55.906Z" },
3623
+ { url = "https://files.pythonhosted.org/packages/42/38/c9b09f3271a7a723a5de69f8e237ab8e7803183131bc57c890db0b6bb872/uvloop-0.22.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:badb4d8e58ee08dad957002027830d5c3b06aea446a6a3744483c2b3b745345c", size = 4647548, upload-time = "2025-10-16T22:16:57.008Z" },
3624
+ { url = "https://files.pythonhosted.org/packages/c1/37/945b4ca0ac27e3dc4952642d4c900edd030b3da6c9634875af6e13ae80e5/uvloop-0.22.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b91328c72635f6f9e0282e4a57da7470c7350ab1c9f48546c0f2866205349d21", size = 4467065, upload-time = "2025-10-16T22:16:58.206Z" },
3625
+ { url = "https://files.pythonhosted.org/packages/97/cc/48d232f33d60e2e2e0b42f4e73455b146b76ebe216487e862700457fbf3c/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:daf620c2995d193449393d6c62131b3fbd40a63bf7b307a1527856ace637fe88", size = 4328384, upload-time = "2025-10-16T22:16:59.36Z" },
3626
+ { url = "https://files.pythonhosted.org/packages/e4/16/c1fd27e9549f3c4baf1dc9c20c456cd2f822dbf8de9f463824b0c0357e06/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6cde23eeda1a25c75b2e07d39970f3374105d5eafbaab2a4482be82f272d5a5e", size = 4296730, upload-time = "2025-10-16T22:17:00.744Z" },
3627
+ ]
3628
+
3629
  [[package]]
3630
  name = "watchfiles"
3631
  version = "1.1.1"