Update CyberSecurity_OWASP environment
- .agents/skills/openenv-cli/SKILL.md +18 -0
- .agents/skills/trackio/SKILL.md +124 -0
- .agents/skills/trackio/alerts.md +199 -0
- .agents/skills/trackio/logging_metrics.md +206 -0
- .agents/skills/trackio/retrieving_metrics.md +298 -0
- .agents/skills/trackio/storage_schema.md +159 -0
- .gitignore +9 -0
- 01_ARCHITECTURE.md +12 -1
- Dockerfile +1 -0
- README.md +41 -1
- assets/architecture_diagram.mmd +51 -0
- assets/architecture_diagram.svg +90 -0
- assets/env_rl_training_flow_diagram.mmd +26 -0
- assets/env_rl_training_flow_diagram.svg +92 -0
- models.py +9 -0
- pyproject.toml +1 -0
- scenario_compiler.py +15 -40
- scripts/modal_ephemeral_train.py +45 -5
- scripts/modal_run_ephemeral.sh +6 -1
- scripts/modal_train_grpo.py +267 -75
- scripts/smoke_test.sh +1 -1
- scripts/track_pytest.py +59 -0
- server/Dockerfile +1 -0
- server/action_tools.py +73 -0
- server/adversarial_designer.py +59 -0
- server/app.py +7 -0
- server/app_sandbox.py +139 -0
- server/authz_oracle.py +92 -0
- server/curriculum.py +99 -0
- server/episode_logger.py +66 -0
- server/reward_engine.py +10 -31
- server/scenario_factory.py +134 -0
- server/verifier.py +81 -0
- tests/test_web_interface.py +47 -0
- training/eval_before_after.py +19 -1
- training/trackio_utils.py +132 -1
- training/train_grpo.py +10 -0
- uv.lock +120 -0
.agents/skills/openenv-cli/SKILL.md
ADDED
|
@@ -0,0 +1,18 @@
|
| 1 |
+
---
|
| 2 |
+
name: openenv-cli
|
| 3 |
+
description: "OpenEnv CLI (`openenv`) for scaffolding, validating, building, and pushing OpenEnv environments."
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
Install: `pip install openenv-core`
|
| 7 |
+
|
| 8 |
+
The OpenEnv CLI command `openenv` is available.
|
| 9 |
+
Use `openenv --help` to view available commands.
|
| 10 |
+
|
| 11 |
+
Generated with `openenv-core v0.2.3`. Run `openenv skills add --force` to regenerate.
|
| 12 |
+
|
| 13 |
+
## Tips
|
| 14 |
+
|
| 15 |
+
- Start with `openenv init <env_name>` to scaffold a new environment
|
| 16 |
+
- Validate projects with `openenv validate`
|
| 17 |
+
- Build and deploy with `openenv build` and `openenv push`
|
| 18 |
+
- Use `openenv <command> --help` for command-specific options
|
.agents/skills/trackio/SKILL.md
ADDED
|
@@ -0,0 +1,124 @@
|
| 1 |
+
---
|
| 2 |
+
name: hugging-face-trackio
|
| 3 |
+
description: Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# Trackio - Experiment Tracking for ML Training
|
| 7 |
+
|
| 8 |
+
Trackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.
|
| 9 |
+
|
| 10 |
+
## Three Interfaces
|
| 11 |
+
|
| 12 |
+
| Task | Interface | Reference |
|
| 13 |
+
|------|-----------|-----------|
|
| 14 |
+
| **Logging metrics** during training | Python API | [logging_metrics.md](logging_metrics.md) |
|
| 15 |
+
| **Firing alerts** for training diagnostics | Python API | [alerts.md](alerts.md) |
|
| 16 |
+
| **Retrieving metrics & alerts** after/during training | CLI | [retrieving_metrics.md](retrieving_metrics.md) |
|
| 17 |
+
| **Inspecting storage schema and running direct SQL** | CLI | [storage_schema.md](storage_schema.md) |
|
| 18 |
+
|
| 19 |
+
## When to Use Each
|
| 20 |
+
|
| 21 |
+
### Python API → Logging
|
| 22 |
+
|
| 23 |
+
Use `import trackio` in your training scripts to log metrics:
|
| 24 |
+
|
| 25 |
+
- Initialize tracking with `trackio.init()`
|
| 26 |
+
- Log metrics with `trackio.log()` or use TRL's `report_to="trackio"`
|
| 27 |
+
- Finalize with `trackio.finish()`
|
| 28 |
+
|
| 29 |
+
**Key concept**: For remote/cloud training, pass `space_id` — metrics sync to a Space dashboard so they persist after the instance terminates.
|
| 30 |
+
|
| 31 |
+
→ See [logging_metrics.md](logging_metrics.md) for setup, TRL integration, and configuration options.
|
| 32 |
+
|
| 33 |
+
### Python API → Alerts
|
| 34 |
+
|
| 35 |
+
Insert `trackio.alert()` calls in training code to flag important events — like inserting print statements for debugging, but structured and queryable:
|
| 36 |
+
|
| 37 |
+
- `trackio.alert(title="...", level=trackio.AlertLevel.WARN)` — fire an alert
|
| 38 |
+
- Three severity levels: `INFO`, `WARN`, `ERROR`
|
| 39 |
+
- Alerts are printed to terminal, stored in the database, shown in the dashboard, and optionally sent to webhooks (Slack/Discord)
|
| 40 |
+
|
| 41 |
+
**Key concept for LLM agents**: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions (loss spikes, NaN gradients, low accuracy, training stalls). Since alerts are printed to the terminal, an agent that is watching the training script's output will see them automatically. For background or detached runs, the agent can poll via CLI instead.
|
| 42 |
+
|
| 43 |
+
→ See [alerts.md](alerts.md) for the full alerts API, webhook setup, and autonomous agent workflows.
|
| 44 |
+
|
| 45 |
+
### CLI → Retrieving
|
| 46 |
+
|
| 47 |
+
Use the `trackio` command to query logged metrics and alerts:
|
| 48 |
+
|
| 49 |
+
- `trackio list projects/runs/metrics` — discover what's available
|
| 50 |
+
- `trackio get project/run/metric` — retrieve summaries and values
|
| 51 |
+
- `trackio query project --project <name> --sql "SELECT ..."` — run catch-all read-only SQL
|
| 52 |
+
- `trackio list alerts --project <name> --json` — retrieve alerts
|
| 53 |
+
- `trackio show` — launch the dashboard
|
| 54 |
+
- `trackio sync` — sync to HF Space
|
| 55 |
+
|
| 56 |
+
**Key concept**: Add `--json` for programmatic output suitable for automation and LLM agents.
|
| 57 |
+
|
| 58 |
+
**Remote Spaces**: Add `--space <space_id_or_url>` to any `list`/`get`/`query` command to query a remote HF Space instead of local data. Use `--hf-token` for private Spaces.
|
| 59 |
+
|
| 60 |
+
→ See [retrieving_metrics.md](retrieving_metrics.md) for all commands, workflows, and JSON output formats.
|
| 61 |
+
→ See [storage_schema.md](storage_schema.md) for SQLite tables, parquet layout, and direct query examples.
|
| 62 |
+
|
| 63 |
+
## Minimal Logging Setup
|
| 64 |
+
|
| 65 |
+
```python
|
| 66 |
+
import trackio
|
| 67 |
+
|
| 68 |
+
trackio.init(project="my-project", space_id="username/trackio")
|
| 69 |
+
trackio.log({"loss": 0.1, "accuracy": 0.9})
|
| 70 |
+
trackio.log({"loss": 0.09, "accuracy": 0.91})
|
| 71 |
+
trackio.finish()
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
### Minimal Retrieval
|
| 75 |
+
|
| 76 |
+
```bash
|
| 77 |
+
trackio list projects --json
|
| 78 |
+
trackio get metric --project my-project --run my-run --metric loss --json
|
| 79 |
+
trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table'" --json
|
| 80 |
+
|
| 81 |
+
# Query a remote Space
|
| 82 |
+
trackio list projects --space username/my-space --json
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
## Autonomous ML Experiment Workflow
|
| 86 |
+
|
| 87 |
+
When running experiments autonomously as an LLM agent, the recommended workflow is:
|
| 88 |
+
|
| 89 |
+
1. **Set up training with alerts** — insert `trackio.alert()` calls for diagnostic conditions
|
| 90 |
+
2. **Launch training** — run the script in the background
|
| 91 |
+
3. **Poll for alerts** — use `trackio list alerts --project <name> --json --since <timestamp>` to check for new alerts
|
| 92 |
+
4. **Read metrics** — use `trackio get metric ...` to inspect specific values
|
| 93 |
+
5. **Iterate** — based on alerts and metrics, stop the run, adjust hyperparameters, and launch a new run
|
| 94 |
+
|
| 95 |
+
```python
|
| 96 |
+
import trackio
|
| 97 |
+
|
| 98 |
+
trackio.init(project="my-project", config={"lr": 1e-4})
|
| 99 |
+
|
| 100 |
+
for step in range(num_steps):
|
| 101 |
+
    loss = train_step()
|
| 102 |
+
    trackio.log({"loss": loss, "step": step})
|
| 103 |
+
|
| 104 |
+
    if step > 100 and loss > 5.0:
|
| 105 |
+
        trackio.alert(
|
| 106 |
+
            title="Loss divergence",
|
| 107 |
+
            text=f"Loss {loss:.4f} still high after {step} steps",
|
| 108 |
+
            level=trackio.AlertLevel.ERROR,
|
| 109 |
+
        )
|
| 110 |
+
    if step > 0 and abs(loss) < 1e-8:
|
| 111 |
+
        trackio.alert(
|
| 112 |
+
            title="Vanishing loss",
|
| 113 |
+
            text="Loss near zero: possible gradient collapse",
|
| 114 |
+
            level=trackio.AlertLevel.WARN,
|
| 115 |
+
        )
|
| 116 |
+
|
| 117 |
+
trackio.finish()
|
| 118 |
+
```
|
| 119 |
+
|
| 120 |
+
Then poll from a separate terminal/process:
|
| 121 |
+
|
| 122 |
+
```bash
|
| 123 |
+
trackio list alerts --project my-project --json --since "2025-01-01T00:00:00"
|
| 124 |
+
```
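For an agent that polls on a schedule, the command above can be assembled programmatically. The following is a minimal sketch, not part of Trackio itself: the helper name `build_poll_command` is hypothetical, and it only uses the flags shown in this document (`--project`, `--json`, `--since`). How Trackio interprets time zones for `--since` is not stated here, so the sketch passes a naive ISO-8601 timestamp in the same format as the example.

```python
from datetime import datetime
from typing import List


def build_poll_command(project: str, since: datetime) -> List[str]:
    """Build the argv list for polling alerts newer than `since`.

    Hypothetical helper: it only mirrors the CLI invocation shown above.
    """
    return [
        "trackio", "list", "alerts",
        "--project", project,
        "--json",
        # Same timestamp format as the documented example (seconds precision).
        "--since", since.isoformat(timespec="seconds"),
    ]


cmd = build_poll_command("my-project", datetime(2025, 1, 1))
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` when the `trackio` CLI is installed; building the argv as a list avoids shell quoting issues.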
|
.agents/skills/trackio/alerts.md
ADDED
|
@@ -0,0 +1,199 @@
|
| 1 |
+
# Trackio Alerts
|
| 2 |
+
|
| 3 |
+
Alerts let you flag important training events directly from code. They are the primary mechanism for LLM agents to diagnose runs and iterate autonomously on ML experiments.
|
| 4 |
+
|
| 5 |
+
Alerts are printed to the terminal, stored in the database, displayed in the dashboard, and optionally sent to webhooks (Slack/Discord).
|
| 6 |
+
|
| 7 |
+
<img width="2972" height="1694" alt="image" src="https://github.com/user-attachments/assets/02d938f8-51a9-4706-85c4-d95b7645bcf4" />
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
## Core API
|
| 11 |
+
|
| 12 |
+
### trackio.alert()
|
| 13 |
+
|
| 14 |
+
```python
|
| 15 |
+
trackio.alert(
|
| 16 |
+
title="Loss divergence", # Short title (required)
|
| 17 |
+
text="Loss 5.2 still high after 200 steps", # Detailed description (optional)
|
| 18 |
+
level=trackio.AlertLevel.WARN, # INFO, WARN, or ERROR (default: WARN)
|
| 19 |
+
webhook_url="https://hooks.slack.com/...", # Per-alert webhook override (optional)
|
| 20 |
+
)
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
### Alert Levels
|
| 24 |
+
|
| 25 |
+
| Level | Usage |
|
| 26 |
+
|-------|-------|
|
| 27 |
+
| `trackio.AlertLevel.INFO` | Informational milestones (checkpoints saved, eval completed) |
|
| 28 |
+
| `trackio.AlertLevel.WARN` | Potential issues (loss plateau, low accuracy, high gradient norm) |
|
| 29 |
+
| `trackio.AlertLevel.ERROR` | Critical failures (NaN loss, divergence, OOM) |
|
| 30 |
+
|
| 31 |
+
### Webhook Support
|
| 32 |
+
|
| 33 |
+
Set a global webhook URL via `trackio.init()` or the `TRACKIO_WEBHOOK_URL` environment variable. Alerts are auto-formatted for Slack and Discord URLs.
|
| 34 |
+
|
| 35 |
+
```python
|
| 36 |
+
trackio.init(
|
| 37 |
+
project="my-project",
|
| 38 |
+
webhook_url="https://hooks.slack.com/services/...",
|
| 39 |
+
webhook_min_level=trackio.AlertLevel.WARN, # Only send WARN+ to webhook
|
| 40 |
+
)
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
Per-alert override:
|
| 44 |
+
|
| 45 |
+
```python
|
| 46 |
+
trackio.alert(
|
| 47 |
+
title="Critical failure",
|
| 48 |
+
level=trackio.AlertLevel.ERROR,
|
| 49 |
+
webhook_url="https://hooks.slack.com/services/...", # Overrides global URL
|
| 50 |
+
)
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
Environment variables:
|
| 54 |
+
- `TRACKIO_WEBHOOK_URL` — global webhook URL
|
| 55 |
+
- `TRACKIO_WEBHOOK_MIN_LEVEL` — minimum level for webhook delivery (`info`, `warn`, `error`)
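
To make the "WARN+" filtering concrete, here is a small sketch of how a minimum-level check could behave. It is an illustration only and does not reproduce Trackio's internals: the `Level` enum, `parse_level`, and `should_send_to_webhook` are hypothetical names, and the ordering INFO < WARN < ERROR is an assumption inferred from the "Only send WARN+ to webhook" comment above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical stand-in for the three severities (not trackio.AlertLevel)."""
    INFO = 0
    WARN = 1
    ERROR = 2


def parse_level(name: str) -> Level:
    """Parse a value such as 'warn', as used for TRACKIO_WEBHOOK_MIN_LEVEL."""
    return Level[name.strip().upper()]


def should_send_to_webhook(level: Level, min_level: Level) -> bool:
    """An alert is forwarded only if it is at or above the minimum level."""
    return level >= min_level


min_level = parse_level("warn")
print(should_send_to_webhook(Level.INFO, min_level))
print(should_send_to_webhook(Level.ERROR, min_level))
```

Under these assumptions, an `INFO` alert would still be printed and stored, but only `WARN` and `ERROR` alerts would reach the webhook.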
|
| 56 |
+
|
| 57 |
+
## Retrieving Alerts (CLI)
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
# List all alerts for a project
|
| 61 |
+
trackio list alerts --project my-project --json
|
| 62 |
+
|
| 63 |
+
# Filter by run or level
|
| 64 |
+
trackio list alerts --project my-project --run my-run --level error --json
|
| 65 |
+
|
| 66 |
+
# Poll for new alerts since a timestamp (efficient for agents)
|
| 67 |
+
trackio list alerts --project my-project --json --since "2025-06-01T12:00:00"
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
### JSON Output Structure
|
| 71 |
+
|
| 72 |
+
```json
|
| 73 |
+
{
|
| 74 |
+
"project": "my-project",
|
| 75 |
+
"run": null,
|
| 76 |
+
"level": null,
|
| 77 |
+
"since": "2025-06-01T12:00:00",
|
| 78 |
+
"alerts": [
|
| 79 |
+
{
|
| 80 |
+
"run": "run-name",
|
| 81 |
+
"title": "Loss divergence",
|
| 82 |
+
"text": "Loss 5.2 still high after 200 steps",
|
| 83 |
+
"level": "warn",
|
| 84 |
+
"step": 200,
|
| 85 |
+
"timestamp": "2025-06-01T12:05:30"
|
| 86 |
+
}
|
| 87 |
+
]
|
| 88 |
+
}
|
| 89 |
+
```
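
A polling agent usually needs two things from this payload: a compact summary of each alert and the timestamp to use for the next `--since` call. The sketch below parses the structure shown above with the standard library only. The function name `summarize_alerts` is hypothetical, and the source does not say whether `--since` is inclusive, so a real agent may also want to deduplicate alerts it has already seen.

```python
import json
from typing import List, Optional, Tuple

# Sample payload copied from the JSON structure documented above.
payload = """
{
  "project": "my-project",
  "run": null,
  "level": null,
  "since": "2025-06-01T12:00:00",
  "alerts": [
    {
      "run": "run-name",
      "title": "Loss divergence",
      "text": "Loss 5.2 still high after 200 steps",
      "level": "warn",
      "step": 200,
      "timestamp": "2025-06-01T12:05:30"
    }
  ]
}
"""


def summarize_alerts(raw: str) -> Tuple[List[str], Optional[str]]:
    """Return one summary line per alert plus the newest timestamp seen."""
    data = json.loads(raw)
    alerts = data.get("alerts", [])
    lines = [
        f"[{a['level'].upper()}] {a['run']} step {a['step']}: {a['title']}"
        for a in alerts
    ]
    # ISO-8601 timestamps in one consistent format sort correctly as strings.
    # With no alerts, fall back to the `since` value echoed in the payload.
    newest = max((a["timestamp"] for a in alerts), default=data.get("since"))
    return lines, newest


lines, next_since = summarize_alerts(payload)
print(lines)
print(next_since)
```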
|
| 90 |
+
|
| 91 |
+
## Autonomous Agent Workflow
|
| 92 |
+
|
| 93 |
+
The recommended pattern for an LLM agent running ML experiments:
|
| 94 |
+
|
| 95 |
+
### 1. Insert Alerts Into Training Code
|
| 96 |
+
|
| 97 |
+
Add diagnostic `trackio.alert()` calls for conditions the agent should react to:
|
| 98 |
+
|
| 99 |
+
```python
|
| 100 |
+
import math
import trackio
|
| 101 |
+
|
| 102 |
+
trackio.init(project="hyperparam-sweep", config={"lr": lr, "batch_size": bs})
|
| 103 |
+
|
| 104 |
+
for step in range(num_steps):
|
| 105 |
+
    loss = train_step()
|
| 106 |
+
    trackio.log({"loss": loss, "step": step})
|
| 107 |
+
|
| 108 |
+
    if step > 200 and loss > 5.0:
|
| 109 |
+
        trackio.alert(
|
| 110 |
+
            title="Loss divergence",
|
| 111 |
+
            text=f"Loss {loss:.4f} still above 5.0 after {step} steps; learning rate may be too high",
|
| 112 |
+
            level=trackio.AlertLevel.ERROR,
|
| 113 |
+
        )
|
| 114 |
+
|
| 115 |
+
    if step > 500 and loss_delta < 0.001:
|
| 116 |
+
        trackio.alert(
|
| 117 |
+
            title="Training stall",
|
| 118 |
+
            text=f"Loss barely changed over last 100 steps (delta={loss_delta:.6f})",
|
| 119 |
+
            level=trackio.AlertLevel.WARN,
|
| 120 |
+
        )
|
| 121 |
+
|
| 122 |
+
    if math.isnan(loss):
|
| 123 |
+
        trackio.alert(
|
| 124 |
+
            title="NaN loss",
|
| 125 |
+
            text="Loss became NaN; training is broken",
|
| 126 |
+
            level=trackio.AlertLevel.ERROR,
|
| 127 |
+
        )
|
| 128 |
+
        break
|
| 129 |
+
|
| 130 |
+
trackio.finish()
|
| 131 |
+
```
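
The training-stall check above relies on a `loss_delta` value that the snippet does not define. One possible way to compute it, sketched here under the assumption that "delta" means the absolute change between the current loss and the loss 100 steps earlier, is a small rolling window. The class name `LossDeltaTracker` is hypothetical and is not part of Trackio.

```python
from collections import deque


class LossDeltaTracker:
    """Tracks how much the loss changed over the last `window` steps."""

    def __init__(self, window: int = 100) -> None:
        # Keep window + 1 values so the oldest entry is exactly `window` steps back.
        self._history = deque(maxlen=window + 1)

    def update(self, loss: float) -> float:
        """Record a loss value and return |newest - oldest| over the window."""
        self._history.append(loss)
        if len(self._history) < self._history.maxlen:
            # Not enough history yet: report infinity so a stall check stays False.
            return float("inf")
        return abs(self._history[-1] - self._history[0])


# Tiny demonstration with a window of 3 steps instead of 100.
tracker = LossDeltaTracker(window=3)
deltas = [tracker.update(x) for x in [1.0, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7]]
print(deltas)
```

In a training loop this would be called once per step, for example `loss_delta = tracker.update(loss)`, before the stall condition is evaluated.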
|
| 132 |
+
|
| 133 |
+
### 2. Monitor Alerts
|
| 134 |
+
|
| 135 |
+
Alerts are automatically printed to the terminal when fired. If the agent is watching the training script's output (e.g. running in the foreground or tailing logs), it will see alerts immediately — no polling needed.
|
| 136 |
+
|
| 137 |
+
For background or detached runs, poll for alerts via CLI:
|
| 138 |
+
|
| 139 |
+
```bash
|
| 140 |
+
# Poll for alerts (run periodically)
|
| 141 |
+
trackio list alerts --project hyperparam-sweep --json --since "2025-06-01T00:00:00"
|
| 142 |
+
```
|
| 143 |
+
|
| 144 |
+
### 3. Inspect Metrics Around the Alert
|
| 145 |
+
|
| 146 |
+
When an alert fires, use `trackio get snapshot` to see all metrics at that point:
|
| 147 |
+
|
| 148 |
+
```bash
|
| 149 |
+
# Alert fired at step 200 — get all metrics in a ±5 step window
|
| 150 |
+
trackio get snapshot --project hyperparam-sweep --run run-1 --around 200 --window 5 --json
|
| 151 |
+
|
| 152 |
+
# Or inspect a single metric in a ±10 step window around the alert's step
|
| 153 |
+
trackio get metric --project hyperparam-sweep --run run-1 --metric loss --around 200 --window 10 --json
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
### 4. React and Iterate
|
| 157 |
+
|
| 158 |
+
Based on alerts:
|
| 159 |
+
- **ERROR alerts** → stop the run, adjust hyperparameters, relaunch
|
| 160 |
+
- **WARN alerts** → inspect metrics with `trackio get snapshot ...`, decide whether to intervene
|
| 161 |
+
- **INFO alerts** → note progress, continue monitoring
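
The reaction rules above can be sketched as a small dispatch function that looks at the most severe alert in a batch. This is an illustrative sketch only: the action labels (`stop_and_relaunch`, `inspect_snapshot`, `continue`) and the function name `decide_action` are hypothetical, and the input is assumed to be the `alerts` list from the JSON output, where each item carries a lowercase `level` field.

```python
from typing import Dict, List

# Assumed severity ordering: info < warn < error.
SEVERITY = {"info": 0, "warn": 1, "error": 2}

# Hypothetical action labels mirroring the three bullet points above.
ACTIONS = {
    "error": "stop_and_relaunch",
    "warn": "inspect_snapshot",
    "info": "continue",
}


def decide_action(alerts: List[Dict[str, str]]) -> str:
    """Pick an action based on the most severe alert in the batch."""
    if not alerts:
        return "continue"
    worst = max(alerts, key=lambda a: SEVERITY.get(a["level"].lower(), 0))
    # Unknown levels are treated like informational alerts.
    return ACTIONS.get(worst["level"].lower(), "continue")


print(decide_action([{"level": "info"}, {"level": "error"}]))
print(decide_action([{"level": "warn"}]))
```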
|
| 162 |
+
|
| 163 |
+
### 5. Compare Across Runs
|
| 164 |
+
|
| 165 |
+
```bash
|
| 166 |
+
# Check metrics from previous runs
|
| 167 |
+
trackio get run --project hyperparam-sweep --run run-1 --json
|
| 168 |
+
trackio get metric --project hyperparam-sweep --run run-1 --metric loss --json
|
| 169 |
+
|
| 170 |
+
# Launch new run with adjusted config
|
| 171 |
+
python train.py --lr 5e-5
|
| 172 |
+
```
|
| 173 |
+
|
| 174 |
+
## Using Alerts with Transformers / TRL
|
| 175 |
+
|
| 176 |
+
When using `report_to="trackio"`, you don't control the training loop directly. Use a `TrainerCallback` to fire alerts:
|
| 177 |
+
|
| 178 |
+
```python
|
| 179 |
+
import trackio
from transformers import TrainerCallback
from trl import SFTConfig, SFTTrainer
|
| 180 |
+
|
| 181 |
+
class AlertCallback(TrainerCallback):
|
| 182 |
+
    def on_log(self, args, state, control, logs=None, **kwargs):
|
| 183 |
+
        if "trackio" not in args.report_to:
|
| 184 |
+
            return
|
| 185 |
+
        if logs and "loss" in logs:
|
| 186 |
+
            if logs["loss"] > 5.0 and state.global_step > 100:
|
| 187 |
+
                trackio.alert(
|
| 188 |
+
                    title="High loss",
|
| 189 |
+
                    text=f"Loss {logs['loss']:.4f} at step {state.global_step}",
|
| 190 |
+
                    level=trackio.AlertLevel.ERROR,
|
| 191 |
+
                )
|
| 192 |
+
|
| 193 |
+
trainer = SFTTrainer(
|
| 194 |
+
model=model,
|
| 195 |
+
args=SFTConfig(output_dir="./out", report_to="trackio"),
|
| 196 |
+
callbacks=[AlertCallback()],
|
| 197 |
+
...
|
| 198 |
+
)
|
| 199 |
+
```
|
.agents/skills/trackio/logging_metrics.md
ADDED
|
@@ -0,0 +1,206 @@
|
| 1 |
+
# Logging Metrics with Trackio
|
| 2 |
+
|
| 3 |
+
**Trackio** is a lightweight, free experiment tracking library from Hugging Face. It provides a wandb-compatible API for logging metrics with a local-first design.
|
| 4 |
+
|
| 5 |
+
- **GitHub**: [gradio-app/trackio](https://github.com/gradio-app/trackio)
|
| 6 |
+
- **Docs**: [huggingface.co/docs/trackio](https://huggingface.co/docs/trackio/index)
|
| 7 |
+
|
| 8 |
+
## Installation
|
| 9 |
+
|
| 10 |
+
```bash
|
| 11 |
+
pip install trackio
|
| 12 |
+
# or
|
| 13 |
+
uv pip install trackio
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
## Core API
|
| 17 |
+
|
| 18 |
+
### Basic Usage
|
| 19 |
+
|
| 20 |
+
```python
|
| 21 |
+
import trackio
|
| 22 |
+
|
| 23 |
+
# Initialize a run
|
| 24 |
+
trackio.init(
|
| 25 |
+
project="my-project",
|
| 26 |
+
config={"learning_rate": 0.001, "epochs": 10}
|
| 27 |
+
)
|
| 28 |
+
|
| 29 |
+
# Log metrics during training
|
| 30 |
+
for epoch in range(10):
|
| 31 |
+
    loss = train_epoch()
|
| 32 |
+
    trackio.log({"loss": loss, "epoch": epoch})
|
| 33 |
+
|
| 34 |
+
# Finalize the run
|
| 35 |
+
trackio.finish()
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
### Key Functions
|
| 39 |
+
|
| 40 |
+
| Function | Purpose |
|
| 41 |
+
|----------|---------|
|
| 42 |
+
| `trackio.init(...)` | Start a new tracking run |
|
| 43 |
+
| `trackio.log(dict)` | Log metrics (called repeatedly during training) |
|
| 44 |
+
| `trackio.finish()` | Finalize run and ensure all metrics are saved |
|
| 45 |
+
| `trackio.show()` | Launch the local dashboard |
|
| 46 |
+
| `trackio.sync(...)` | Sync local project to HF Space |
|
| 47 |
+
|
| 48 |
+
## trackio.init() Parameters
|
| 49 |
+
|
| 50 |
+
```python
|
| 51 |
+
trackio.init(
|
| 52 |
+
project="my-project", # Project name (groups runs together)
|
| 53 |
+
name="run-name", # Optional: name for this specific run
|
| 54 |
+
config={...}, # Hyperparameters and config to log
|
| 55 |
+
space_id="username/trackio", # Optional: sync to HF Space for remote dashboard
|
| 56 |
+
group="experiment-group", # Optional: group related runs
|
| 57 |
+
)
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
## Local vs Remote Dashboard
|
| 61 |
+
|
| 62 |
+
### Local (Default)
|
| 63 |
+
|
| 64 |
+
By default, trackio stores metrics in a local SQLite database and runs the dashboard locally:
|
| 65 |
+
|
| 66 |
+
```python
|
| 67 |
+
trackio.init(project="my-project")
|
| 68 |
+
# ... training ...
|
| 69 |
+
trackio.finish()
|
| 70 |
+
|
| 71 |
+
# Launch local dashboard
|
| 72 |
+
trackio.show()
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
Or from terminal:
|
| 76 |
+
```bash
|
| 77 |
+
trackio show --project my-project
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
+
### Remote (HF Space)
|
| 81 |
+
|
| 82 |
+
Pass `space_id` to sync metrics to a Hugging Face Space for persistent, shareable dashboards:
|
| 83 |
+
|
| 84 |
+
```python
|
| 85 |
+
trackio.init(
|
| 86 |
+
project="my-project",
|
| 87 |
+
space_id="username/trackio" # Auto-creates Space if it doesn't exist
|
| 88 |
+
)
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
⚠️ **For remote training** (cloud GPUs, HF Jobs, etc.): Always use `space_id` since local storage is lost when the instance terminates.
|
| 92 |
+
|
| 93 |
+
### Sync Local to Remote
|
| 94 |
+
|
| 95 |
+
Sync existing local projects to a Space:
|
| 96 |
+
|
| 97 |
+
```python
|
| 98 |
+
trackio.sync(project="my-project", space_id="username/my-experiments")
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
## wandb Compatibility
|
| 102 |
+
|
| 103 |
+
Trackio's API is compatible with wandb's, so it can be used as a drop-in replacement:
|
| 104 |
+
|
| 105 |
+
```python
|
| 106 |
+
import trackio as wandb
|
| 107 |
+
|
| 108 |
+
wandb.init(project="my-project")
|
| 109 |
+
wandb.log({"loss": 0.5})
|
| 110 |
+
wandb.finish()
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
## TRL Integration
|
| 114 |
+
|
| 115 |
+
When using TRL trainers, set `report_to="trackio"` for automatic metric logging:
|
| 116 |
+
|
| 117 |
+
```python
|
| 118 |
+
from trl import SFTConfig, SFTTrainer
|
| 119 |
+
import trackio
|
| 120 |
+
|
| 121 |
+
trackio.init(
|
| 122 |
+
project="sft-training",
|
| 123 |
+
space_id="username/trackio",
|
| 124 |
+
config={"model": "Qwen/Qwen2.5-0.5B", "dataset": "trl-lib/Capybara"}
|
| 125 |
+
)
|
| 126 |
+
|
| 127 |
+
config = SFTConfig(
|
| 128 |
+
output_dir="./output",
|
| 129 |
+
report_to="trackio", # Automatic metric logging
|
| 130 |
+
# ... other config
|
| 131 |
+
)
|
| 132 |
+
|
| 133 |
+
trainer = SFTTrainer(model=model, args=config, ...)
|
| 134 |
+
trainer.train()
|
| 135 |
+
trackio.finish()
|
| 136 |
+
```
|
| 137 |
+
|
| 138 |
+
## What Gets Logged
|
| 139 |
+
|
| 140 |
+
With TRL/Transformers integration, trackio automatically captures:
|
| 141 |
+
- Training loss
|
| 142 |
+
- Learning rate
|
| 143 |
+
- Eval metrics
|
| 144 |
+
- Training throughput
|
| 145 |
+
|
| 146 |
+
For manual logging, log any numeric metrics:
|
| 147 |
+
|
| 148 |
+
```python
|
| 149 |
+
trackio.log({
|
| 150 |
+
"train_loss": 0.5,
|
| 151 |
+
"train_accuracy": 0.85,
|
| 152 |
+
"val_loss": 0.4,
|
| 153 |
+
"val_accuracy": 0.88,
|
| 154 |
+
"epoch": 1
|
| 155 |
+
})
|
| 156 |
+
```
|
| 157 |
+
|
| 158 |
+
## Grouping Runs
|
| 159 |
+
|
| 160 |
+
Use `group` to organize related experiments in the dashboard sidebar:
|
| 161 |
+
|
| 162 |
+
```python
|
| 163 |
+
# Group by experiment type
|
| 164 |
+
trackio.init(project="my-project", name="baseline-v1", group="baseline")
|
| 165 |
+
trackio.init(project="my-project", name="augmented-v1", group="augmented")
|
| 166 |
+
|
| 167 |
+
# Group by hyperparameter
|
| 168 |
+
trackio.init(project="hyperparam-sweep", name="lr-0.001", group="lr_0.001")
|
| 169 |
+
trackio.init(project="hyperparam-sweep", name="lr-0.01", group="lr_0.01")
|
| 170 |
+
```
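
For a sweep, the run names and groups shown above can be generated rather than typed by hand. The sketch below only builds the keyword arguments and does not call Trackio. The helper name `sweep_run_kwargs` is hypothetical; the `project`, `name`, `group`, and `config` parameters are the ones listed for `trackio.init()` in this document.

```python
from typing import Any, Dict, List


def sweep_run_kwargs(project: str, learning_rates: List[float]) -> List[Dict[str, Any]]:
    """Build one set of init keyword arguments per learning rate."""
    return [
        {
            "project": project,
            "name": f"lr-{lr}",    # run name, e.g. "lr-0.001"
            "group": f"lr_{lr}",   # sidebar group, e.g. "lr_0.001"
            "config": {"learning_rate": lr},
        }
        for lr in learning_rates
    ]


runs = sweep_run_kwargs("hyperparam-sweep", [0.001, 0.01])
for kwargs in runs:
    print(kwargs["name"], kwargs["group"])
```

Each dictionary could then be passed as `trackio.init(**kwargs)` at the start of the corresponding run. Note that very small values are formatted by Python in scientific notation (for example `1e-05`), so a custom formatter may be preferable for such sweeps.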
|
| 171 |
+
|
| 172 |
+
## Configuration Best Practices
|
| 173 |
+
|
| 174 |
+
Keep config minimal — only log what's useful for comparing runs:
|
| 175 |
+
|
| 176 |
+
```python
|
| 177 |
+
trackio.init(
|
| 178 |
+
project="qwen-sft-capybara",
|
| 179 |
+
name="baseline-lr2e5",
|
| 180 |
+
config={
|
| 181 |
+
"model": "Qwen/Qwen2.5-0.5B",
|
| 182 |
+
"dataset": "trl-lib/Capybara",
|
| 183 |
+
"learning_rate": 2e-5,
|
| 184 |
+
"num_epochs": 3,
|
| 185 |
+
"batch_size": 8,
|
| 186 |
+
}
|
| 187 |
+
)
|
| 188 |
+
```
|
| 189 |
+
|
| 190 |
+
## Embedding Dashboards
|
| 191 |
+
|
| 192 |
+
Embed Space dashboards in websites with query parameters:
|
| 193 |
+
|
| 194 |
+
```html
|
| 195 |
+
<iframe
|
| 196 |
+
src="https://username-trackio.hf.space/?project=my-project&metrics=train_loss,val_loss&sidebar=hidden"
|
| 197 |
+
style="width:1600px; height:500px; border:0;">
|
| 198 |
+
</iframe>
|
| 199 |
+
```
|
| 200 |
+
|
| 201 |
+
Query parameters:
|
| 202 |
+
- `project`: Filter to specific project
|
| 203 |
+
- `metrics`: Comma-separated metric names to show
|
| 204 |
+
- `sidebar`: `hidden` or `collapsed`
|
| 205 |
+
- `smoothing`: 0-20 (smoothing slider value)
|
| 206 |
+
- `xmin`, `xmax`: X-axis limits
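
When many dashboards are embedded, the `src` URL can be generated from the query parameters listed above. This is a minimal sketch using only the Python standard library; the helper name `build_embed_url` is hypothetical, and the Space URL is the placeholder from the iframe example.

```python
from urllib.parse import urlencode


def build_embed_url(base_url: str, **params: str) -> str:
    """Append dashboard query parameters to a Space URL."""
    # safe="," keeps the comma-separated metric list unescaped, as in the iframe example.
    query = urlencode(params, safe=",")
    return f"{base_url.rstrip('/')}/?{query}"


url = build_embed_url(
    "https://username-trackio.hf.space",
    project="my-project",
    metrics="train_loss,val_loss",
    sidebar="hidden",
)
print(url)
```

Using `urlencode` rather than string concatenation means project or metric names with spaces or other special characters are escaped correctly.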
|
.agents/skills/trackio/retrieving_metrics.md
ADDED
|
@@ -0,0 +1,298 @@
|
| 1 |
+
# Retrieving Metrics with Trackio CLI
|
| 2 |
+
|
| 3 |
+
The `trackio` CLI provides direct terminal access to query Trackio experiment tracking data without needing to start the MCP server. Commands work against local data by default, or against a remote HF Space when `--space` is provided.
|
| 4 |
+
|
| 5 |
+
## Quick Command Reference
|
| 6 |
+
|
| 7 |
+
| Task | Command |
|
| 8 |
+
|------|---------|
|
| 9 |
+
| List projects | `trackio list projects` |
|
| 10 |
+
| List runs | `trackio list runs --project <name>` |
|
| 11 |
+
| List metrics | `trackio list metrics --project <name> --run <name>` |
|
| 12 |
+
| List system metrics | `trackio list system-metrics --project <name> --run <name>` |
|
| 13 |
+
| List alerts | `trackio list alerts --project <name> [--run <name>] [--level <level>] [--since <timestamp>]` |
|
| 14 |
+
| Get project summary | `trackio get project --project <name>` |
|
| 15 |
+
| Get run summary | `trackio get run --project <name> --run <name>` |
|
| 16 |
+
| Get metric values | `trackio get metric --project <name> --run <name> --metric <name>` |
| Get metric at step | `trackio get metric ... --metric <name> --step <N>` |
| Get metric around step | `trackio get metric ... --metric <name> --around <N> --window <W>` |
| Get all metrics snapshot | `trackio get snapshot --project <name> --run <name> --step <N>` |
| Get system metrics | `trackio get system-metric --project <name> --run <name>` |
| Run direct SQL | `trackio query project --project <name> --sql "SELECT ..."` |
| Query remote Space | `trackio list projects --space <space_id_or_url>` |
| Show dashboard | `trackio show [--project <name>]` |
| Sync to Space | `trackio sync --project <name> --space-id <space_id>` |

## Core Commands

### List Commands

```bash
trackio list projects         # List all projects
trackio list projects --json  # JSON output

trackio list runs --project <name>         # List runs in project
trackio list runs --project <name> --json  # JSON output

trackio list metrics --project <name> --run <name>  # List metrics for run
trackio list metrics --project <name> --run <name> --json

trackio list system-metrics --project <name> --run <name>  # List system metrics
trackio list system-metrics --project <name> --run <name> --json

trackio list alerts --project <name>                       # List alerts
trackio list alerts --project <name> --run <name> --json   # Filter by run
trackio list alerts --project <name> --level error --json  # Filter by level
trackio list alerts --project <name> --json --since <ts>   # Poll since timestamp
```

### Get Commands

```bash
trackio get project --project <name>         # Project summary
trackio get project --project <name> --json  # JSON output

trackio get run --project <name> --run <name>  # Run summary
trackio get run --project <name> --run <name> --json

trackio get metric --project <name> --run <name> --metric <name>  # Metric values
trackio get metric --project <name> --run <name> --metric <name> --json
trackio get metric ... --metric <name> --step 200                  # At exact step
trackio get metric ... --metric <name> --around 200 --window 10    # ±10 steps
trackio get metric ... --metric <name> --at-time <ts> --window 60  # ±60 seconds

trackio get snapshot --project <name> --run <name> --step 200 --json               # All metrics at step
trackio get snapshot --project <name> --run <name> --around 200 --window 5 --json  # Window
trackio get snapshot --project <name> --run <name> --at-time <ts> --window 60 --json

trackio get system-metric --project <name> --run <name>                  # All system metrics
trackio get system-metric --project <name> --run <name> --metric <name>  # Specific metric
trackio get system-metric --project <name> --run <name> --json
```

### Query Command

```bash
trackio query project --project <name> --sql "SELECT name FROM sqlite_master WHERE type = 'table'"
trackio query project --project <name> --sql "PRAGMA table_info(metrics)" --json
trackio query project --project <name> --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name"
```

### Remote Space Queries

All `list`, `get`, and `query` commands support querying a remote HF Space with `--space`:

```bash
trackio list projects --space user/my-space                   # Space ID
trackio list projects --space https://user-my-space.hf.space  # Space URL
trackio get metric --project <name> --run <name> --metric loss --space user/my-space
trackio query project --project <name> --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space user/my-space
trackio list projects --space user/private-space --hf-token hf_xxx  # Private Space
```

### Dashboard Commands

```bash
trackio show                                    # Launch dashboard
trackio show --project <name>                   # Load specific project
trackio show --theme <theme>                    # Custom theme
trackio show --mcp-server                       # Enable MCP server
trackio show --color-palette "#FF0000,#00FF00"  # Custom colors
```

### Sync Commands

```bash
trackio sync --project <name> --space-id <space_id>            # Sync to HF Space
trackio sync --project <name> --space-id <space_id> --private  # Private space
trackio sync --project <name> --space-id <space_id> --force    # Overwrite
```

## Output Formats

All `list`, `get`, and `query` commands support two output formats:

- **Human-readable** (default): Formatted text for terminal viewing
- **JSON** (with `--json` flag): Structured JSON for programmatic use

## Common Patterns

### Discover Projects and Runs

```bash
# List all available projects
trackio list projects

# List runs in a project
trackio list runs --project my-project

# Get project overview
trackio get project --project my-project --json
```

### Inspect Run Details

```bash
# Get run summary with all metrics
trackio get run --project my-project --run my-run --json

# List available metrics
trackio list metrics --project my-project --run my-run

# Get specific metric values
trackio get metric --project my-project --run my-run --metric loss --json
```

### Query System Metrics

```bash
# List system metrics (GPU, etc.)
trackio list system-metrics --project my-project --run my-run

# Get all system metric data
trackio get system-metric --project my-project --run my-run --json

# Get specific system metric
trackio get system-metric --project my-project --run my-run --metric gpu_utilization --json
```

### Automation Scripts

```bash
# Extract latest metric value
LATEST_LOSS=$(trackio get metric --project my-project --run my-run --metric loss --json | jq -r '.values[-1].value')

# Export run summary to file
trackio get run --project my-project --run my-run --json > run_summary.json

# Filter runs with jq
trackio list runs --project my-project --json | jq '.runs[] | select(startswith("train"))'

# Run a direct SQL aggregate
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name" --json
```
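
The `jq` extraction in the automation scripts can also be done in Python when a script needs more than one field. The following is a minimal sketch that assumes the `--json` payload has the shape documented under "Metric Values" in this file (a `values` list of objects with `step`, `timestamp`, and `value`). The helper name `latest_metric_value` is hypothetical and not part of Trackio.

```python
import json
from typing import Optional


def latest_metric_value(payload: str) -> Optional[float]:
    """Return the value logged at the highest step, or None if nothing was logged."""
    data = json.loads(payload)
    values = data.get("values", [])
    if not values:
        return None
    # Select by step rather than list position, in case ordering is not guaranteed.
    latest = max(values, key=lambda point: point["step"])
    return latest["value"]


# Sample payload copied from the documented "Metric Values" structure.
SAMPLE = """
{
  "project": "my-project",
  "run": "my-run",
  "metric": "loss",
  "values": [
    {"step": 0, "timestamp": "2024-01-01T00:00:00", "value": 0.5},
    {"step": 1, "timestamp": "2024-01-01T00:01:00", "value": 0.4}
  ]
}
"""

print(latest_metric_value(SAMPLE))
```

In a real script the payload would typically come from the standard output of `trackio get metric ... --json`, captured with `subprocess.run(..., capture_output=True, text=True)`.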

### LLM Agent Workflow

```bash
# 1. Discover available projects
trackio list projects --json

# 2. Explore project structure
trackio get project --project my-project --json

# 3. Inspect specific run
trackio get run --project my-project --run my-run --json

# 4. Query metric values
trackio get metric --project my-project --run my-run --metric accuracy --json

# 5. Poll for alerts (use --since for efficient incremental polling)
trackio list alerts --project my-project --json --since "2025-06-01T00:00:00"

# 6. When an alert fires at step N, get all metrics around that point
trackio get snapshot --project my-project --run my-run --around 200 --window 5 --json

# 7. Fall back to direct SQL for one-off inspection
trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title FROM alerts ORDER BY timestamp DESC LIMIT 20" --json
```

## Error Handling

Commands validate inputs and return clear errors:

- Missing project: `Error: Project '<name>' not found.`
- Missing run: `Error: Run '<name>' not found in project '<project>'.`
- Missing metric: `Error: Metric '<name>' not found in run '<run>' of project '<project>'.`

All errors exit with a non-zero status code and write to stderr.
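
Because failures are signalled through the exit status and stderr, a calling script can wrap any command in one small helper. The sketch below is an assumption-laden illustration, not Trackio code: `run_cli` and `CliError` are hypothetical names, and the demo uses a stand-in Python subprocess instead of a real `trackio` call so that it runs without Trackio installed.

```python
import subprocess
import sys
from typing import Sequence


class CliError(RuntimeError):
    """Raised when a CLI command exits with a non-zero status."""


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, or raise CliError carrying the stderr text."""
    result = subprocess.run(list(argv), capture_output=True, text=True)
    if result.returncode != 0:
        raise CliError(result.stderr.strip() or f"exit status {result.returncode}")
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a failing command: writes to stderr and exits with status 1.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('Error: project not found'); sys.exit(1)",
    ]
    try:
        run_cli(failing)
    except CliError as exc:
        print(f"caught: {exc}")
```

With Trackio installed, the same helper could be called as `run_cli(["trackio", "list", "projects", "--json"])`, using only the commands listed in this file.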

## Key Options

- `--project`: Project name (required for most commands)
- `--run`: Run name (required for run-specific commands)
- `--metric`: Metric name (required for metric-specific commands)
- `--sql`: Read-only SQL query (for `trackio query`)
- `--json`: Output in JSON format instead of human-readable
- `--space`: HF Space ID (e.g. `user/space`) or Space URL to query remotely (for `list`/`get`/`query` commands)
- `--hf-token`: HF token for accessing private Spaces (for `list`/`get`/`query` commands with `--space`)
- `--step`: Exact step filter (for `get metric`, `get snapshot`)
- `--around`: Center step for window filter (for `get metric`, `get snapshot`)
- `--at-time`: Center ISO timestamp for window filter (for `get metric`, `get snapshot`)
- `--window`: Window size: ±steps for `--around`, ±seconds for `--at-time` (default: 10)
- `--level`: Alert level filter (`info`, `warn`, `error`) (for `list alerts`)
- `--since`: ISO timestamp to filter alerts after (for `list alerts`)
- `--theme`: Dashboard theme (for `show` command)
- `--mcp-server`: Enable MCP server mode (for `show` command)
- `--color-palette`: Comma-separated hex colors (for `show` command)
- `--private`: Create private Space (for `sync` command)
- `--force`: Overwrite existing database (for `sync` command)
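
The `--around` and `--window` options above describe a simple filter over logged points. The sketch below illustrates that selection in Python. It assumes the window bounds are inclusive (the option text says only "±steps"), and `filter_around` is a hypothetical helper name, not a Trackio function.

```python
from typing import Any, Dict, List

Point = Dict[str, Any]


def filter_around(values: List[Point], center: int, window: int = 10) -> List[Point]:
    """Keep points whose step lies within center ± window (bounds assumed inclusive)."""
    return [p for p in values if abs(p["step"] - center) <= window]


# Points logged every 5 steps from 180 to 220.
points = [{"step": s, "value": 1.0 / (s + 1)} for s in range(180, 221, 5)]
selected = filter_around(points, center=200, window=10)
print([p["step"] for p in selected])
```

The default of 10 mirrors the documented default for `--window`.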

## JSON Output Structure

### List Projects
```json
{"projects": ["project1", "project2"]}
```

### List Runs
```json
{"project": "my-project", "runs": ["run1", "run2"]}
```

### Project Summary
```json
{
  "project": "my-project",
  "num_runs": 3,
  "runs": ["run1", "run2", "run3"],
  "last_activity": 100
}
```

### Run Summary
```json
{
  "project": "my-project",
  "run": "my-run",
  "num_logs": 50,
  "metrics": ["loss", "accuracy"],
  "config": {"learning_rate": 0.001},
  "last_step": 49
}
```

### Metric Values
```json
{
  "project": "my-project",
  "run": "my-run",
  "metric": "loss",
  "values": [
    {"step": 0, "timestamp": "2024-01-01T00:00:00", "value": 0.5},
    {"step": 1, "timestamp": "2024-01-01T00:01:00", "value": 0.4}
  ]
}
```

### Query Result
```json
{
  "project": "my-project",
  "query": "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
  "columns": ["name"],
  "rows": [
    {"name": "alerts"},
    {"name": "configs"},
    {"name": "metrics"}
  ],
  "row_count": 3
}
```

## References

- **Complete CLI documentation**: See [docs/source/cli_commands.md](docs/source/cli_commands.md)
- **Storage schema and direct SQL**: See [storage_schema.md](storage_schema.md)
- **API and MCP Server**: See [docs/source/api_mcp_server.md](docs/source/api_mcp_server.md)

.agents/skills/trackio/storage_schema.md
ADDED
|
@@ -0,0 +1,159 @@
# Trackio Storage Schema and Direct SQL

Use this reference when you need to inspect Trackio data directly instead of going through higher-level `trackio list` or `trackio get` commands.

## Where Data Is Stored

- Local project databases live in `TRACKIO_DIR`, which defaults to `~/.cache/huggingface/trackio`.
- Each project is stored in its own SQLite file: `{project}.db`.
- Media files live under `TRACKIO_DIR/media/`.
- Parquet files are derived exports written from SQLite for syncing and static Spaces.

## SQLite Tables

Trackio defines its live schema in `trackio/sqlite_storage.py` inside `SQLiteStorage.init_db()`.

### `metrics`

- `id`: integer primary key
- `timestamp`: ISO timestamp
- `run_name`: run identifier
- `step`: integer step
- `metrics`: JSON text payload
- `log_id`: optional deduplication key
- `space_id`: optional pending-sync marker

Indexes:

- `(run_name, step)`
- `(run_name, timestamp)`
- unique partial index on `log_id`
- partial index on `space_id`

### `configs`

- `id`: integer primary key
- `run_name`: run identifier
- `config`: JSON text payload
- `created_at`: ISO timestamp

Constraints:

- unique `run_name`
- index on `run_name`

### `system_metrics`

- `id`: integer primary key
- `timestamp`: ISO timestamp
- `run_name`: run identifier
- `metrics`: JSON text payload
- `log_id`: optional deduplication key
- `space_id`: optional pending-sync marker

Indexes:

- `(run_name, timestamp)`
- unique partial index on `log_id`
- partial index on `space_id`

### `project_metadata`

- `key`: primary key
- `value`: metadata value

### `pending_uploads`

- `id`
- `space_id`
- `run_name`
- `step`
- `file_path`
- `relative_path`
- `created_at`

### `alerts`

- `id`
- `timestamp`
- `run_name`
- `title`
- `text`
- `level`
- `step`
- `alert_id`

Indexes:

- `run_name`
- `timestamp`
- unique partial index on `alert_id`

## Parquet Layout

Trackio flattens JSON blobs when exporting parquet:

- `{project}.parquet` comes from `metrics`
- `{project}_system.parquet` comes from `system_metrics`
- `{project}_configs.parquet` comes from `configs`

Static export layout:

- `metrics.parquet`
- `aux/system_metrics.parquet`
- `aux/configs.parquet`
- `runs.json`
- `settings.json`

The flattened parquet files keep structural columns such as `timestamp`, `run_name`, and `step`, then add one column per JSON key found in the source payload.
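
The flattening described above can be pictured with plain dictionaries. This is a minimal sketch of the idea only, not Trackio's export code: `flatten_rows` is a hypothetical helper, and filling keys that are missing from a payload with `None` is an assumption about how sparse metrics would be represented.

```python
import json
from typing import Any, Dict, List

STRUCTURAL_COLUMNS = ("timestamp", "run_name", "step")


def flatten_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expand each row's JSON `metrics` text into one top-level column per key."""
    flat = []
    for row in rows:
        out = {col: row[col] for col in STRUCTURAL_COLUMNS}
        out.update(json.loads(row["metrics"]))
        flat.append(out)
    # Give every row the same columns; keys absent from a payload become None.
    columns = list(STRUCTURAL_COLUMNS)
    for row in flat:
        for key in row:
            if key not in columns:
                columns.append(key)
    return [{col: row.get(col) for col in columns} for row in flat]


# Rows shaped like the documented `metrics` table (structural columns plus JSON text).
sqlite_rows = [
    {"timestamp": "2024-01-01T00:00:00", "run_name": "run1", "step": 0,
     "metrics": json.dumps({"loss": 0.5})},
    {"timestamp": "2024-01-01T00:01:00", "run_name": "run1", "step": 1,
     "metrics": json.dumps({"loss": 0.4, "accuracy": 0.8})},
]

for flat_row in flatten_rows(sqlite_rows):
    print(flat_row)
```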

## Direct SQL With The CLI

Use `trackio query` for read-only SQL:

```bash
trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name" --json
trackio query project --project my-project --sql "PRAGMA table_info(metrics)"
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
```

Remote query works too:

```bash
trackio query project --project my-project --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space username/my-space --json
```

`trackio query` accepts read-only `SELECT`, `WITH`, and safe schema `PRAGMA` queries.
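
To try a query before passing it to `trackio query`, it can help to run it against a throwaway in-memory database. The sketch below is only an illustration: the `CREATE TABLE` statement is a simplified stand-in built from the column list in this file, not Trackio's actual DDL, and the inserted rows are made-up sample data.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the documented `metrics` columns (assumed types, no indexes).
conn.execute(
    "CREATE TABLE metrics ("
    "id INTEGER PRIMARY KEY, timestamp TEXT, run_name TEXT, "
    "step INTEGER, metrics TEXT, log_id TEXT, space_id TEXT)"
)
rows = [
    ("2024-01-01T00:00:00", "run1", 0, json.dumps({"loss": 0.5})),
    ("2024-01-01T00:01:00", "run1", 1, json.dumps({"loss": 0.4})),
    ("2024-01-01T00:00:30", "run2", 0, json.dumps({"loss": 0.7})),
]
conn.executemany(
    "INSERT INTO metrics (timestamp, run_name, step, metrics) VALUES (?, ?, ?, ?)",
    rows,
)

# Same "latest step per run" aggregate used in the CLI examples.
last_steps = conn.execute(
    "SELECT run_name, MAX(step) AS last_step FROM metrics "
    "GROUP BY run_name ORDER BY last_step DESC"
).fetchall()
print(last_steps)
```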

## Common Query Patterns

Recent alerts:

```bash
trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title, step FROM alerts ORDER BY timestamp DESC LIMIT 20"
```

Latest step per run:

```bash
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
```

Recent configs:

```bash
trackio query project --project my-project --sql "SELECT run_name, created_at, config FROM configs ORDER BY created_at DESC"
```

Schema inspection:

```bash
trackio query project --project my-project --sql "PRAGMA index_list(metrics)"
```

## Agent Guidance

- Start with `trackio list projects --json` if you do not know the project name yet.
- Use `trackio get` for common summaries and metric retrieval.
- Fall back to `trackio query` when you need one-off aggregates, joins, or schema introspection.
- Prefer `--json` when another agent or script needs to consume the result.

.gitignore
ADDED
|
@@ -0,0 +1,9 @@
.env.local
.env.*
__pycache__/
*.pyc
.pytest_cache/
outputs/
codex_tmp_*/

*.egg*

01_ARCHITECTURE.md
CHANGED
|
@@ -14,6 +14,12 @@ The environment is intentionally not a two-agent red-team/blue-team setup. The a
|
|
| 14 |
|
| 15 |
## 2. Final architecture diagram
|
| 16 |
|
| 17 |
```mermaid
|
| 18 |
flowchart TB
|
| 19 |
%% =========================
|
|
@@ -363,6 +369,12 @@ Run before/after evaluation on the same held-out suite.
|
|
| 363 |
|
| 364 |
## 8. Training flow
|
| 365 |
|
| 366 |
```text
|
| 367 |
1. Build CyberSecurity_OWASP OpenEnv server.
|
| 368 |
2. Generate 600 MVP scenarios.
|
|
@@ -476,4 +488,3 @@ Expected endpoints:
|
|
| 476 |
| OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
|
| 477 |
| Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
|
| 478 |
| TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |
|
| 479 |
-
|
|
|
|
| 14 |
|
| 15 |
## 2. Final architecture diagram
|
| 16 |
|
| 17 |
+
Rendered asset:
|
| 18 |
+
|
| 19 |
+

|
| 20 |
+
|
| 21 |
+
Editable source: `assets/architecture_diagram.mmd`
|
| 22 |
+
|
| 23 |
```mermaid
|
| 24 |
flowchart TB
|
| 25 |
%% =========================
|
|
|
|
| 369 |
|
| 370 |
## 8. Training flow
|
| 371 |
|
| 372 |
+
Rendered asset:
|
| 373 |
+
|
| 374 |
+

|
| 375 |
+
|
| 376 |
+
Editable source: `assets/env_rl_training_flow_diagram.mmd`
|
| 377 |
+
|
| 378 |
```text
|
| 379 |
1. Build CyberSecurity_OWASP OpenEnv server.
|
| 380 |
2. Generate 600 MVP scenarios.
|
|
|
|
| 488 |
| OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
|
| 489 |
| Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
|
| 490 |
| TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |
|
|
|
Dockerfile
CHANGED
|
@@ -21,6 +21,7 @@ WORKDIR /app/env
|
|
| 21 |
COPY --from=builder /app/env /app/env
|
| 22 |
ENV PATH="/app/env/.venv/bin:$PATH"
|
| 23 |
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
|
|
|
| 24 |
|
| 25 |
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 26 |
CMD curl -f http://localhost:8000/health || exit 1
|
|
|
|
| 21 |
COPY --from=builder /app/env /app/env
|
| 22 |
ENV PATH="/app/env/.venv/bin:$PATH"
|
| 23 |
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
| 24 |
+
ENV ENABLE_WEB_INTERFACE=true
|
| 25 |
|
| 26 |
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
| 27 |
CMD curl -f http://localhost:8000/health || exit 1
|
README.md
CHANGED
|
@@ -23,6 +23,14 @@ inspect generated app + policy -> discover authorization bug -> submit finding -
|
|
| 23 |
|
| 24 |
The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
|
| 25 |
|
| 26 |
## Quick Start
|
| 27 |
|
| 28 |
```bash
|
|
@@ -125,6 +133,24 @@ Training files are under `training/`:
|
|
| 125 |
|
| 126 |
The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
|
| 127 |
|
| 128 |
## Modal Ephemeral Runs
|
| 129 |
|
| 130 |
Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
|
|
@@ -141,7 +167,7 @@ Run a temporary Modal app for a cheap environment/training smoke check:
|
|
| 141 |
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
|
| 142 |
```
|
| 143 |
|
| 144 |
-
The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.
|
| 145 |
|
| 146 |
You can also validate the GRPO config construction remotely:
|
| 147 |
|
|
@@ -179,6 +205,20 @@ uv run --extra modal modal run scripts/modal_train_grpo.py \
|
|
| 179 |
--difficulty 0
|
| 180 |
```
|
| 181 |
|
| 182 |
Defaults are derived from `HF_TOKEN`:
|
| 183 |
|
| 184 |
- Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
|
|
|
|
| 23 |
|
| 24 |
The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
|
| 25 |
|
| 26 |
+
## Diagrams
|
| 27 |
+
|
| 28 |
+

|
| 29 |
+
|
| 30 |
+

|
| 31 |
+
|
| 32 |
+
Editable Mermaid sources are available in `assets/architecture_diagram.mmd` and `assets/env_rl_training_flow_diagram.mmd`.
|
| 33 |
+
|
| 34 |
## Quick Start
|
| 35 |
|
| 36 |
```bash
|
|
|
|
| 133 |
|
| 134 |
The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
|
| 135 |
|
| 136 |
+
## Trackio Run Tracking
|
| 137 |
+
|
| 138 |
+
Trackio is the default tracker for official runs. Set `TRACKIO_SPACE_ID` to log to a hosted Hugging Face Trackio Space; otherwise Trackio records locally.
|
| 139 |
+
|
| 140 |
+
```bash
|
| 141 |
+
export TRACKIO_SPACE_ID=<hf-user>/CyberSecurity_OWASP-trackio
|
| 142 |
+
export TRACKIO_PROJECT=CyberSecurity_OWASP-grpo
|
| 143 |
+
```
|
| 144 |
+
|
| 145 |
+
Use the tracked smoke wrapper instead of invoking pytest directly when producing run artifacts:
|
| 146 |
+
|
| 147 |
+
```bash
|
| 148 |
+
bash scripts/smoke_test.sh
|
| 149 |
+
uv run python scripts/track_pytest.py tests
|
| 150 |
+
```
|
| 151 |
+
|
| 152 |
+
Evaluation summaries saved through `training.eval_before_after.save_eval_summary(...)`, Modal smoke runs, and GRPO training configs all initialize Trackio runs with CyberSecurity_OWASP run names.
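
The choice between hosted and local tracking described above comes down to whether `TRACKIO_SPACE_ID` is set. A minimal sketch of that decision follows. `resolve_trackio_target` is a hypothetical helper, not a function from this repository, and falling back to `CyberSecurity_OWASP-grpo` when `TRACKIO_PROJECT` is unset is an assumption based on the example export.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed fallback, taken from the example export in this README.
DEFAULT_PROJECT = "CyberSecurity_OWASP-grpo"


def resolve_trackio_target(env: Mapping[str, str] = os.environ) -> Tuple[Optional[str], str]:
    """Return (space_id, project); a space_id of None means Trackio records locally."""
    space_id = env.get("TRACKIO_SPACE_ID") or None
    project = env.get("TRACKIO_PROJECT") or DEFAULT_PROJECT
    return space_id, project


space_id, project = resolve_trackio_target()
mode = f"hosted Space {space_id}" if space_id else "local storage"
print(f"Logging project {project!r} to {mode}")
```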
|
| 153 |
+
|
| 154 |
## Modal Ephemeral Runs
|
| 155 |
|
| 156 |
Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
|
|
|
|
| 167 |
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
|
| 168 |
```
|
| 169 |
|
| 170 |
+
The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/` and the summary metrics are logged to Trackio.
|
| 171 |
|
| 172 |
You can also validate the GRPO config construction remotely:
|
| 173 |
|
|
|
|
| 205 |
--difficulty 0
|
| 206 |
```
|
| 207 |
|
| 208 |
+
If running from a public repository and you do not want Modal to package the
|
| 209 |
+
local workspace, use public source mode:
|
| 210 |
+
|
| 211 |
+
```bash
|
| 212 |
+
uv run --extra modal modal run scripts/modal_train_grpo.py \
|
| 213 |
+
--source-mode public \
|
| 214 |
+
--repo-url https://github.com/humandotlearning/CyberSecurity_OWASP.git \
|
| 215 |
+
--repo-branch master \
|
| 216 |
+
--max-steps 10 \
|
| 217 |
+
--dataset-size 16 \
|
| 218 |
+
--num-generations 2 \
|
| 219 |
+
--difficulty 0
|
| 220 |
+
```
|
| 221 |
+
|
| 222 |
Defaults are derived from `HF_TOKEN`:
|
| 223 |
|
| 224 |
- Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
|
assets/architecture_diagram.mmd
ADDED
|
@@ -0,0 +1,51 @@
flowchart LR
    subgraph Factory["Scenario Factory"]
        Policy["Policy graph\nusers, roles, tenants, ownership"]
        Templates["FastAPI template renderer\nroutes, services, auth helpers"]
        Mutator["A01 bug mutator\none injected authorization defect"]
        Fixtures["Fixture generator\nvisible tests + hidden facts"]
        Compiler["Scenario compiler\nseeded workspace"]
        Policy --> Compiler
        Templates --> Compiler
        Mutator --> Compiler
        Fixtures --> Compiler
    end

    subgraph Runtime["CyberSecurity_OWASP OpenEnv Runtime"]
        Reset["reset(seed)\ncompile fresh scenario"]
        Env["Environment state\nphase, history, metrics, hidden facts"]
        Tools["Typed step(action) tools\ninspect, read, request, patch, test, submit"]
        Sandbox["Generated local app workspace\neditable app files only"]
        Verifier["Deterministic verifier\nsecurity + regression + public routes"]
        Reward["Reward engine\nstable component breakdown"]
        App["FastAPI OpenEnv server\n/ws, /reset, /step, /state"]
        Reset --> Env
        Env --> Tools
        Tools <--> Sandbox
        Tools --> Verifier
        Verifier --> Reward
        Reward --> Env
        Env --> App
    end

    subgraph Agent["Single LLM Agent"]
        Obs["Observation parser"]
        Reason["Policy and code reasoning"]
        Act["One JSON action"]
        Obs --> Reason --> Act
    end

    subgraph Ops["Training, Evaluation, Demo"]
        Rollout["Rollout loop\nreset -> step* -> terminal reward"]
        GRPO["TRL GRPO / LoRA training"]
        Trackio["Trackio metrics\nreward and pass rates"]
        Eval["Held-out evaluation\nunseen seeds/layouts/domains"]
        Artifacts["Rollout artifacts\nbefore/after traces"]
        Rollout --> GRPO --> Trackio --> Eval --> Artifacts
    end

    Compiler --> Reset
    App --> Obs
    Act --> App
    Reward --> Rollout
    GRPO --> Agent

assets/architecture_diagram.svg
ADDED
|
|
assets/env_rl_training_flow_diagram.mmd
ADDED
|
@@ -0,0 +1,26 @@
flowchart TD
    Start["Start run\nselect base model + config"] --> Cache["Prepare scenario splits\ntrain, validation, hidden_eval"]
    Cache --> Baseline["Baseline evaluation\nscripted/model rollouts"]
    Baseline --> TrainLoop["GRPO training loop"]

    subgraph Episode["One OpenEnv Episode"]
        Reset["env.reset(seed)\nnew generated app + policy"] --> Observe["Observation\nphase, hints, available tools"]
        Observe --> Prompt["Build action prompt\nJSON action only"]
        Prompt --> Generate["LLM generates action"]
        Generate --> Step["env.step(action)\nphase gate + execute tool"]
        Step --> Intermediate{"done?"}
        Intermediate -- "no" --> Observe
        Intermediate -- "yes" --> Final["Terminal verifier\nhidden security + regression + anti-cheat"]
    end

    TrainLoop --> Reset
    Final --> Rewards["Reward components\ndiscovery, security, regression, public_routes,\npatch_quality, visible_tests, safety, anti_cheat"]
    Rewards --> Update["GRPO update\nLoRA adapter checkpoint"]
    Update --> Metrics["Trackio logging\nreward means, pass rates, invalid actions, latency"]
    Metrics --> Validate{"Validation plateau\nor failure cluster?"}
    Validate -- "continue" --> TrainLoop
    Validate -- "adjust curriculum" --> Curriculum["Curriculum controller\nrebalance difficulty and traps"]
    Curriculum --> TrainLoop
    Validate -- "final checkpoint" --> Heldout["Held-out eval\nunseen seeds/layouts/domain combos"]
    Heldout --> Compare["Before/after summary\nsuccess, reward, exploit-block, regression preservation"]
    Compare --> Artifacts["Saved artifacts\noutputs/evals + outputs/rollouts"]

assets/env_rl_training_flow_diagram.svg
ADDED
|
|
models.py
CHANGED
|
@@ -56,8 +56,12 @@ class CyberSecurityOWASPState(State):
|
|
| 56 |
seed: int = 0
|
| 57 |
split: CyberSecurityOWASPSplit = "train"
|
| 58 |
difficulty: int = 0
|
|
|
|
| 59 |
domain: str = ""
|
| 60 |
bug_family: str = ""
|
|
|
|
|
|
|
|
|
|
| 61 |
phase: CyberSecurityOWASPPhase = "discover"
|
| 62 |
max_steps: int = 40
|
| 63 |
done: bool = False
|
|
@@ -71,6 +75,11 @@ class CyberSecurityOWASPState(State):
|
|
| 71 |
reward_history: list[dict[str, float]] = Field(default_factory=list)
|
| 72 |
visible_facts: dict[str, Any] = Field(default_factory=dict)
|
| 73 |
hidden_facts: dict[str, Any] = Field(default_factory=dict)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 74 |
metrics: dict[str, Any] = Field(default_factory=dict)
|
| 75 |
anti_cheat_flags: list[str] = Field(default_factory=list)
|
| 76 |
|
|
|
|
| 56 |
seed: int = 0
|
| 57 |
split: CyberSecurityOWASPSplit = "train"
|
| 58 |
difficulty: int = 0
|
| 59 |
+
difficulty_tier: str = "warmup"
|
| 60 |
domain: str = ""
|
| 61 |
bug_family: str = ""
|
| 62 |
+
scenario_family: str = ""
|
| 63 |
+
template_id: str = "fastapi_basic"
|
| 64 |
+
target_weakness: str = "same_role_cross_object"
|
| 65 |
phase: CyberSecurityOWASPPhase = "discover"
|
| 66 |
max_steps: int = 40
|
| 67 |
done: bool = False
|
|
|
|
| 75 |
reward_history: list[dict[str, float]] = Field(default_factory=list)
|
| 76 |
visible_facts: dict[str, Any] = Field(default_factory=dict)
|
| 77 |
hidden_facts: dict[str, Any] = Field(default_factory=dict)
|
| 78 |
+
curriculum_snapshot: dict[str, Any] = Field(default_factory=dict)
|
| 79 |
+
verification_summary: dict[str, Any] = Field(default_factory=dict)
|
| 80 |
+
patch_diff: str = ""
|
| 81 |
+
episode_artifact_path: str | None = None
|
| 82 |
+
observation_history: list[dict[str, Any]] = Field(default_factory=list)
|
| 83 |
metrics: dict[str, Any] = Field(default_factory=dict)
|
| 84 |
anti_cheat_flags: list[str] = Field(default_factory=list)
|
| 85 |
|
pyproject.toml
CHANGED
|
@@ -18,6 +18,7 @@ dependencies = [
|
|
| 18 |
# install from github
|
| 19 |
# "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
|
| 20 |
"openenv-core[core]>=0.2.2",
|
|
|
|
| 21 |
# Environment-specific dependencies
|
| 22 |
# Add all dependencies needed for your environment here
|
| 23 |
# Examples:
|
|
|
|
| 18 |
# install from github
|
| 19 |
# "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
|
| 20 |
"openenv-core[core]>=0.2.2",
|
| 21 |
+
"trackio>=0.22.0",
|
| 22 |
# Environment-specific dependencies
|
| 23 |
# Add all dependencies needed for your environment here
|
| 24 |
# Examples:
|
scenario_compiler.py
CHANGED
|
@@ -1,51 +1,26 @@
|
|
| 1 |
-
"""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
-
import tempfile
|
| 6 |
-
from pathlib import Path
|
| 7 |
from typing import Any
|
| 8 |
|
| 9 |
try:
|
| 10 |
-
from .
|
| 11 |
-
from .policy_graph import build_invoice_policy
|
| 12 |
-
from .template_renderer import render_fastapi_basic
|
| 13 |
except ImportError: # pragma: no cover
|
| 14 |
-
from
|
| 15 |
-
from policy_graph import build_invoice_policy
|
| 16 |
-
from template_renderer import render_fastapi_basic
|
| 17 |
|
| 18 |
|
| 19 |
-
def compile_scenario(
|
| 20 |
-
|
| 21 |
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
hidden.update(
|
| 28 |
-
{
|
| 29 |
-
"workspace": str(workspace),
|
| 30 |
-
"editable_files": editable_files,
|
| 31 |
-
"initial_file_hashes": {
|
| 32 |
-
path: (workspace / path).read_text(encoding="utf-8")
|
| 33 |
-
for path in editable_files
|
| 34 |
-
},
|
| 35 |
-
}
|
| 36 |
)
|
| 37 |
-
return {
|
| 38 |
-
"task_id": task_id,
|
| 39 |
-
"workspace": workspace,
|
| 40 |
-
"domain": "invoices",
|
| 41 |
-
"bug_family": "bola_idor",
|
| 42 |
-
"difficulty": difficulty,
|
| 43 |
-
"task_brief": (
|
| 44 |
-
"Inspect the generated invoices app and policy. Find the broken "
|
| 45 |
-
"authorization behavior, submit a finding with local evidence, patch "
|
| 46 |
-
"the app, preserve intended owner/admin/public behavior, then submit."
|
| 47 |
-
),
|
| 48 |
-
"public_hint": compiled.public_hint,
|
| 49 |
-
"workspace_summary": visible_workspace_summary(editable_files, compiled.public_hint),
|
| 50 |
-
"hidden_facts": hidden,
|
| 51 |
-
}
|
|
|
|
| 1 |
+
"""Compatibility facade for deterministic scenario compilation."""
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
|
|
|
|
|
|
| 5 |
from typing import Any
|
| 6 |
|
| 7 |
try:
|
| 8 |
+
from .server.scenario_factory import ScenarioFactory
|
|
|
|
|
|
|
| 9 |
except ImportError: # pragma: no cover
|
| 10 |
+
from server.scenario_factory import ScenarioFactory
|
|
|
|
|
|
|
| 11 |
|
| 12 |
|
| 13 |
+
def compile_scenario(
|
| 14 |
+
seed: int,
|
| 15 |
+
split: str = "train",
|
| 16 |
+
difficulty: int = 0,
|
| 17 |
+
curriculum_profile: dict[str, Any] | None = None,
|
| 18 |
+
) -> dict[str, Any]:
|
| 19 |
+
"""Compile one isolated authorization-repair scenario."""
|
| 20 |
|
| 21 |
+
return ScenarioFactory().compile_scenario(
|
| 22 |
+
seed,
|
| 23 |
+
split=split,
|
| 24 |
+
difficulty=difficulty,
|
| 25 |
+
curriculum_profile=curriculum_profile,
|
| 26 |
)
|
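The new `scenario_compiler.py` above is a thin compatibility facade: it keeps the old module-level `compile_scenario` entry point and forwards to `ScenarioFactory`. A minimal sketch of that pattern follows. The `ScenarioFactory` here is a hypothetical stand-in that only echoes its inputs; the real class lives in `server/scenario_factory.py` and is not shown in this hunk.

```python
from __future__ import annotations

from typing import Any


class ScenarioFactory:
    """Hypothetical stand-in for server.scenario_factory.ScenarioFactory."""

    def compile_scenario(
        self,
        seed: int,
        *,
        split: str = "train",
        difficulty: int = 0,
        curriculum_profile: dict[str, Any] | None = None,
    ) -> dict[str, Any]:
        # Echo the inputs so the delegation is observable. The real factory
        # is assumed to build a workspace and hidden facts instead.
        return {
            "seed": seed,
            "split": split,
            "difficulty": difficulty,
            "curriculum_profile": dict(curriculum_profile or {}),
        }


def compile_scenario(
    seed: int,
    split: str = "train",
    difficulty: int = 0,
    curriculum_profile: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Facade with the same call shape as the diff; delegates to a fresh factory."""
    return ScenarioFactory().compile_scenario(
        seed,
        split=split,
        difficulty=difficulty,
        curriculum_profile=curriculum_profile,
    )


scenario = compile_scenario(7, split="validation", difficulty=2)
print(scenario["seed"], scenario["split"], scenario["difficulty"])
```

Existing callers that import `compile_scenario` from the top-level module keep working, while the construction logic can evolve inside the factory.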
scripts/modal_ephemeral_train.py
CHANGED
|
@@ -62,12 +62,18 @@ class NoopTrainer:
|
|
| 62 |
|
| 63 |
|
| 64 |
@app.function(image=image, timeout=60 * 30)
|
| 65 |
-
def run_ephemeral_smoke(
|
| 66 |
from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
|
| 67 |
from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
|
| 68 |
CybersecurityOwaspEnvironment,
|
| 69 |
)
|
| 70 |
from training.rollout import rollout_once
|
|
|
|
| 71 |
|
| 72 |
baseline = []
|
| 73 |
oracle = []
|
|
@@ -128,8 +134,9 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
|
|
| 128 |
def mean(items: list[dict[str, Any]], key: str) -> float:
|
| 129 |
return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))
|
| 130 |
|
| 131 |
-
|
| 132 |
-
|
|
|
|
| 133 |
"mode": "smoke",
|
| 134 |
"episodes": episodes,
|
| 135 |
"seed_start": seed_start,
|
|
@@ -139,6 +146,28 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
|
|
| 139 |
"baseline": baseline,
|
| 140 |
"oracle": oracle,
|
| 141 |
}
|
| 142 |
|
| 143 |
|
| 144 |
@app.function(image=image, timeout=60 * 10)
|
|
@@ -149,9 +178,20 @@ def run_grpo_config_check() -> str:
|
|
| 149 |
|
| 150 |
|
| 151 |
@app.local_entrypoint()
|
| 152 |
-
def main(
|
| 153 |
if mode == "smoke":
|
| 154 |
-
result = run_ephemeral_smoke.remote(
|
| 155 |
output_dir = PROJECT_ROOT / "outputs" / "rollouts"
|
| 156 |
output_dir.mkdir(parents=True, exist_ok=True)
|
| 157 |
output_path = output_dir / f"{result['run_name']}.json"
|
|
|
|
| 62 |
|
| 63 |
|
| 64 |
@app.function(image=image, timeout=60 * 30)
|
| 65 |
+
def run_ephemeral_smoke(
|
| 66 |
+
episodes: int = 4,
|
| 67 |
+
seed_start: int = 0,
|
| 68 |
+
trackio_space_id: str = "",
|
| 69 |
+
trackio_project: str = "CyberSecurity_OWASP-smoke",
|
| 70 |
+
) -> dict[str, Any]:
|
| 71 |
from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
|
| 72 |
from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
|
| 73 |
CybersecurityOwaspEnvironment,
|
| 74 |
)
|
| 75 |
from training.rollout import rollout_once
|
| 76 |
+
from training.trackio_utils import log_trackio_metrics, trackio_run
|
| 77 |
|
| 78 |
baseline = []
|
| 79 |
oracle = []
|
|
|
|
| 134 |
def mean(items: list[dict[str, Any]], key: str) -> float:
|
| 135 |
return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))
|
| 136 |
|
| 137 |
+
run_name = f"{APP_NAME}-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}"
|
| 138 |
+
result = {
|
| 139 |
+
"run_name": run_name,
|
| 140 |
"mode": "smoke",
|
| 141 |
"episodes": episodes,
|
| 142 |
"seed_start": seed_start,
|
|
|
|
| 146 |
"baseline": baseline,
|
| 147 |
"oracle": oracle,
|
| 148 |
}
|
| 149 |
+
with trackio_run(
|
| 150 |
+
run_name=run_name,
|
| 151 |
+
run_type="modal_ephemeral_smoke",
|
| 152 |
+
project=trackio_project,
|
| 153 |
+
space_id=trackio_space_id,
|
| 154 |
+
config={
|
| 155 |
+
"episodes": episodes,
|
| 156 |
+
"seed_start": seed_start,
|
| 157 |
+
"mode": "smoke",
|
| 158 |
+
},
|
| 159 |
+
group="smoke",
|
| 160 |
+
):
|
| 161 |
+
log_trackio_metrics(
|
| 162 |
+
{
|
| 163 |
+
"smoke/baseline_mean_reward": result["baseline_mean_reward"],
|
| 164 |
+
"smoke/oracle_mean_reward": result["oracle_mean_reward"],
|
| 165 |
+
"smoke/oracle_success_rate": result["oracle_success_rate"],
|
| 166 |
+
"smoke/episodes": episodes,
|
| 167 |
+
},
|
| 168 |
+
step=0,
|
| 169 |
+
)
|
| 170 |
+
return result
|
| 171 |
|
| 172 |
|
| 173 |
@app.function(image=image, timeout=60 * 10)
|
|
|
|
| 178 |
|
| 179 |
|
| 180 |
@app.local_entrypoint()
|
| 181 |
+
def main(
|
| 182 |
+
mode: str = "smoke",
|
| 183 |
+
episodes: int = 4,
|
| 184 |
+
seed_start: int = 0,
|
| 185 |
+
trackio_space_id: str = "",
|
| 186 |
+
trackio_project: str = "CyberSecurity_OWASP-smoke",
|
| 187 |
+
) -> None:
|
| 188 |
if mode == "smoke":
|
| 189 |
+
result = run_ephemeral_smoke.remote(
|
| 190 |
+
episodes=episodes,
|
| 191 |
+
seed_start=seed_start,
|
| 192 |
+
trackio_space_id=trackio_space_id,
|
| 193 |
+
trackio_project=trackio_project,
|
| 194 |
+
)
|
| 195 |
output_dir = PROJECT_ROOT / "outputs" / "rollouts"
|
| 196 |
output_dir.mkdir(parents=True, exist_ok=True)
|
| 197 |
output_path = output_dir / f"{result['run_name']}.json"
|
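The `mean` helper in the smoke function above averages one numeric field over a list of rollout records, treating a missing key as `0.0` and guarding against an empty list. A small sketch of how it behaves is given below. The record fields (`reward`, `success`) and the summary keys are illustrative assumptions; the lines that actually build `baseline_mean_reward` and the other summary values are not part of this hunk.

```python
from __future__ import annotations

from typing import Any


def mean(items: list[dict[str, Any]], key: str) -> float:
    # Missing keys count as 0.0; max(1, ...) avoids dividing by zero
    # when no episodes were collected.
    return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))


# Hypothetical rollout records; the field names are assumptions for illustration.
oracle = [
    {"reward": 1.0, "success": 1},
    {"reward": 0.5, "success": 0},
    {"success": 1},  # no "reward" key, so it contributes 0.0
]

summary = {
    "oracle_mean_reward": mean(oracle, "reward"),
    "empty_mean_reward": mean([], "reward"),
}
print(summary)
```

Counting a missing key as zero keeps the summary total, but it also lowers the average silently, so a record without the field is indistinguishable from a zero-reward episode.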
scripts/modal_run_ephemeral.sh
CHANGED
|
@@ -1,3 +1,8 @@
|
|
| 1 |
#!/usr/bin/env bash
|
| 2 |
set -euo pipefail
|
| 3 |
-
modal run scripts/modal_ephemeral_train.py
|
|
|
|
| 1 |
#!/usr/bin/env bash
|
| 2 |
set -euo pipefail
|
| 3 |
+
modal run scripts/modal_ephemeral_train.py \
|
| 4 |
+
--mode "${MODE:-smoke}" \
|
| 5 |
+
--episodes "${EPISODES:-4}" \
|
| 6 |
+
--seed-start "${SEED_START:-0}" \
|
| 7 |
+
--trackio-space-id "${TRACKIO_SPACE_ID:-}" \
|
| 8 |
+
--trackio-project "${TRACKIO_PROJECT:-CyberSecurity_OWASP-smoke}"
|
scripts/modal_train_grpo.py
CHANGED
|
@@ -19,6 +19,7 @@ from __future__ import annotations
|
|
| 19 |
import os
|
| 20 |
import pathlib
|
| 21 |
import subprocess
|
|
|
|
| 22 |
from datetime import datetime, timezone
|
| 23 |
from typing import Any
|
| 24 |
|
|
@@ -31,10 +32,62 @@ SECRET_NAME = "CyberSecurity_OWASP-secrets"
|
|
| 31 |
RUNS_DIR = pathlib.Path("/runs")
|
| 32 |
REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
|
| 33 |
PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
|
| 34 |
|
| 35 |
|
| 36 |
def _training_image() -> modal.Image:
|
| 37 |
-
|
| 38 |
modal.Image.from_registry(
|
| 39 |
"nvidia/cuda:12.8.0-devel-ubuntu22.04",
|
| 40 |
add_python="3.11",
|
|
@@ -49,21 +102,33 @@ def _training_image() -> modal.Image:
|
|
| 49 |
"datasets",
|
| 50 |
"huggingface_hub",
|
| 51 |
"peft",
|
|
|
|
| 52 |
"tokenizers",
|
| 53 |
"nvidia-ml-py",
|
| 54 |
"trackio>=0.25.0",
|
| 55 |
"transformers>=5.5.0",
|
| 56 |
"trl>=0.28.0",
|
| 57 |
"openenv-core[core]>=0.2.3",
|
| 58 |
-
"pydantic==2.10.6",
|
| 59 |
)
|
| 60 |
.uv_pip_install(
|
| 61 |
"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
|
| 62 |
"unsloth[base] @ git+https://github.com/unslothai/unsloth",
|
| 63 |
)
|
|
|
|
| 64 |
.uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
|
|
|
|
| 65 |
.uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
|
| 66 |
-
|
| 67 |
PROJECT_ROOT,
|
| 68 |
remote_path=REMOTE_PROJECT,
|
| 69 |
copy=True,
|
|
@@ -76,22 +141,23 @@ def _training_image() -> modal.Image:
|
|
| 76 |
"*.pyc",
|
| 77 |
],
|
| 78 |
)
|
| 79 |
-
.run_commands(
|
| 80 |
f"python -m pip install -e {REMOTE_PROJECT}",
|
| 81 |
-
"python -c \"import os, torch; import transformers.utils.hub as hub; "
|
| 82 |
-
"hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
|
| 83 |
-
"os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
|
| 84 |
-
"from trl import GRPOConfig, GRPOTrainer; "
|
| 85 |
-
"from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
|
| 86 |
-
"CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
|
| 87 |
)
|
| 88 |
-
|
| 89 |
-
|
| 90 |
|
| 91 |
|
| 92 |
app = modal.App(APP_NAME)
|
| 93 |
volume = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True)
|
| 94 |
-
|
| 95 |
|
| 96 |
|
| 97 |
@app.function(
|
|
@@ -99,7 +165,7 @@ secret = modal.Secret.from_name(SECRET_NAME)
|
|
| 99 |
gpu=["L4", "A10G"],
|
| 100 |
timeout=4 * 60 * 60,
|
| 101 |
volumes={RUNS_DIR: volume},
|
| 102 |
-
secrets=
|
| 103 |
)
|
| 104 |
def check_training_imports() -> dict[str, str]:
|
| 105 |
import torch
|
|
@@ -131,7 +197,7 @@ def check_training_imports() -> dict[str, str]:
|
|
| 131 |
gpu=["L4", "A10G"],
|
| 132 |
timeout=4 * 60 * 60,
|
| 133 |
volumes={RUNS_DIR: volume},
|
| 134 |
-
secrets=
|
| 135 |
)
|
| 136 |
def train_cybersecurity_owasp_grpo(
|
| 137 |
env_repo_id: str = "",
|
|
@@ -149,16 +215,22 @@ def train_cybersecurity_owasp_grpo(
|
|
| 149 |
num_generations: int = 2,
|
| 150 |
seed_start: int = 0,
|
| 151 |
git_sha: str = "nogit",
|
| 152 |
) -> dict[str, str | int | float]:
|
|
|
|
| 153 |
import statistics
|
| 154 |
|
| 155 |
import torch
|
|
|
|
| 156 |
import transformers.utils.hub as transformers_hub
|
| 157 |
from datasets import Dataset
|
| 158 |
from huggingface_hub import whoami
|
| 159 |
from transformers import TrainerCallback
|
| 160 |
-
from trl import GRPOConfig, GRPOTrainer
|
| 161 |
-
from
|
| 162 |
|
| 163 |
import trackio
|
| 164 |
|
|
@@ -191,7 +263,9 @@ def train_cybersecurity_owasp_grpo(
|
|
| 191 |
|
| 192 |
model_slug = model_name.replace("/", "-")
|
| 193 |
stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
|
| 194 |
-
run_name =
|
|
|
|
|
|
|
| 195 |
output_dir = RUNS_DIR / run_name
|
| 196 |
output_dir.mkdir(parents=True, exist_ok=True)
|
| 197 |
|
|
@@ -324,11 +398,27 @@ def train_cybersecurity_owasp_grpo(
|
|
| 324 |
return self._step("read_openapi")
|
| 325 |
|
| 326 |
def read_file(self, path: str) -> str:
|
| 327 |
-
"""
|
| 328 |
return self._step("read_file", {"path": path})
|
| 329 |
|
| 330 |
def search_code(self, query: str) -> str:
|
| 331 |
-
"""
|
| 332 |
return self._step("search_code", {"query": query})
|
| 333 |
|
| 334 |
def send_local_request(
|
|
@@ -337,7 +427,17 @@ def train_cybersecurity_owasp_grpo(
|
|
| 337 |
method: str = "GET",
|
| 338 |
user_id: str | None = None,
|
| 339 |
) -> str:
|
| 340 |
-
"""
|
| 341 |
return self._step(
|
| 342 |
"send_local_request",
|
| 343 |
{"path": path, "method": method, "user_id": user_id},
|
|
@@ -350,7 +450,18 @@ def train_cybersecurity_owasp_grpo(
|
|
| 350 |
second_user_id: str,
|
| 351 |
method: str = "GET",
|
| 352 |
) -> str:
|
| 353 |
-
"""
|
| 354 |
return self._step(
|
| 355 |
"compare_identities",
|
| 356 |
{
|
|
@@ -367,7 +478,17 @@ def train_cybersecurity_owasp_grpo(
|
|
| 367 |
evidence: str,
|
| 368 |
policy_rule: str,
|
| 369 |
) -> str:
|
| 370 |
-
"""
|
| 371 |
return self._step(
|
| 372 |
"submit_finding",
|
| 373 |
{
|
|
@@ -383,7 +504,17 @@ def train_cybersecurity_owasp_grpo(
|
|
| 383 |
content: str | None = None,
|
| 384 |
diff: str | None = None,
|
| 385 |
) -> str:
|
| 386 |
-
"""
|
| 387 |
args: dict[str, Any] = {"path": path}
|
| 388 |
if content is not None:
|
| 389 |
args["content"] = content
|
|
@@ -534,7 +665,10 @@ def train_cybersecurity_owasp_grpo(
|
|
| 534 |
return control
|
| 535 |
|
| 536 |
print(f"CUDA available: {torch.cuda.is_available()}")
|
| 537 |
-
|
| 538 |
print(f"Trackio Space: {trackio_space_id}")
|
| 539 |
print(f"Trackio Project: {trackio_project}")
|
| 540 |
print(f"Output repo: {output_repo_id}")
|
|
@@ -547,6 +681,18 @@ def train_cybersecurity_owasp_grpo(
|
|
| 547 |
fast_inference=False,
|
| 548 |
token=hf_token,
|
| 549 |
)
|
| 550 |
model = FastLanguageModel.get_peft_model(
|
| 551 |
model,
|
| 552 |
r=lora_rank,
|
|
@@ -565,46 +711,68 @@ def train_cybersecurity_owasp_grpo(
|
|
| 565 |
)
|
| 566 |
FastLanguageModel.for_training(model)
|
| 567 |
|
|
| 568 |
training_args = GRPOConfig(
|
| 569 |
-
|
| 570 |
-
|
| 571 |
-
|
| 572 |
-
|
| 573 |
-
|
| 574 |
-
optim="adamw_8bit",
|
| 575 |
-
logging_steps=1,
|
| 576 |
-
per_device_train_batch_size=1,
|
| 577 |
-
gradient_accumulation_steps=max(2, num_generations),
|
| 578 |
-
num_generations=num_generations,
|
| 579 |
-
max_prompt_length=max_seq_length,
|
| 580 |
-
max_completion_length=max_completion_length,
|
| 581 |
-
max_steps=max_steps,
|
| 582 |
-
save_steps=max(10, max_steps),
|
| 583 |
-
report_to="trackio",
|
| 584 |
-
trackio_space_id=trackio_space_id,
|
| 585 |
-
run_name=run_name,
|
| 586 |
-
output_dir=str(output_dir),
|
| 587 |
-
push_to_hub=True,
|
| 588 |
-
hub_model_id=output_repo_id,
|
| 589 |
-
hub_private_repo=True,
|
| 590 |
-
hub_strategy="every_save",
|
| 591 |
-
gradient_checkpointing=True,
|
| 592 |
-
gradient_checkpointing_kwargs={"use_reentrant": False},
|
| 593 |
-
epsilon=0.2,
|
| 594 |
-
epsilon_high=0.28,
|
| 595 |
-
delta=1.5,
|
| 596 |
-
loss_type="bnpo",
|
| 597 |
-
mask_truncated_completions=False,
|
| 598 |
)
|
| 599 |
|
| 600 |
trainer = GRPOTrainer(
|
| 601 |
-
|
| 602 |
-
|
| 603 |
-
|
| 604 |
-
|
| 605 |
-
|
| 606 |
-
environment_factory=CyberSecurityOWASPToolEnv,
|
| 607 |
-
callbacks=[TrackioSystemMetricsCallback()],
|
| 608 |
)
|
| 609 |
trainer.train()
|
| 610 |
trainer.push_to_hub()
|
|
@@ -623,6 +791,9 @@ def train_cybersecurity_owasp_grpo(
|
|
| 623 |
"model_name": model_name,
|
| 624 |
"max_completion_length": max_completion_length,
|
| 625 |
"num_generations": num_generations,
|
| 626 |
}
|
| 627 |
|
| 628 |
|
|
@@ -644,6 +815,10 @@ def main(
|
|
| 644 |
num_generations: int = 2,
|
| 645 |
seed_start: int = 0,
|
| 646 |
git_sha: str = "nogit",
|
| 647 |
) -> None:
|
| 648 |
if mode == "config":
|
| 649 |
result = check_training_imports.remote()
|
|
@@ -652,6 +827,10 @@ def main(
|
|
| 652 |
if mode != "train":
|
| 653 |
raise ValueError("mode must be 'train' or 'config'")
|
| 654 |
|
| 655 |
resolved_trackio_space_id = trackio_space_id
|
| 656 |
resolved_output_repo_id = output_repo_id
|
| 657 |
if not resolved_trackio_space_id or not resolved_output_repo_id:
|
|
@@ -684,12 +863,28 @@ def main(
|
|
| 684 |
|
| 685 |
model_slug = model_name.replace("/", "-")
|
| 686 |
local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
|
| 687 |
-
|
| 688 |
f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
|
| 689 |
f"{local_stamp}-{git_sha[:8]}"
|
| 690 |
)
|
| 691 |
|
| 692 |
-
|
| 693 |
env_repo_id=env_repo_id,
|
| 694 |
output_repo_id=output_repo_id,
|
| 695 |
max_steps=max_steps,
|
|
@@ -705,17 +900,14 @@ def main(
|
|
| 705 |
num_generations=num_generations,
|
| 706 |
seed_start=seed_start,
|
| 707 |
git_sha=git_sha,
|
| 708 |
)
|
| 709 |
-
|
| 710 |
-
|
| 711 |
-
|
| 712 |
-
print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
|
| 713 |
-
else:
|
| 714 |
-
print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
|
| 715 |
-
if resolved_output_repo_id:
|
| 716 |
-
print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
|
| 717 |
else:
|
| 718 |
-
|
| 719 |
-
|
| 720 |
-
"<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
|
| 721 |
-
)
|
|
|
|
| 19 |
import os
|
| 20 |
import pathlib
|
| 21 |
import subprocess
|
| 22 |
+
import sys
|
| 23 |
from datetime import datetime, timezone
|
| 24 |
from typing import Any
|
| 25 |
|
|
|
|
| 32 |
RUNS_DIR = pathlib.Path("/runs")
|
| 33 |
REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
|
| 34 |
PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
|
| 35 |
+
PUBLIC_REPO_URL = "https://github.com/humandotlearning/CyberSecurity_OWASP.git"
|
| 36 |
+
PUBLIC_REPO_BRANCH = "master"
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
def _load_local_env_file() -> None:
|
| 40 |
+
env_path = PROJECT_ROOT / ".env.local"
|
| 41 |
+
if not env_path.exists():
|
| 42 |
+
return
|
| 43 |
+
for raw_line in env_path.read_text(encoding="utf-8").splitlines():
|
| 44 |
+
line = raw_line.strip()
|
| 45 |
+
if not line or line.startswith("#") or "=" not in line:
|
| 46 |
+
continue
|
| 47 |
+
key, value = line.split("=", 1)
|
| 48 |
+
key = key.strip()
|
| 49 |
+
if key not in {"TRACKIO_PROJECT"}:
|
| 50 |
+
continue
|
| 51 |
+
value = value.strip().strip('"').strip("'")
|
| 52 |
+
os.environ.setdefault(key, value)
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
def _modal_secrets() -> list[modal.Secret]:
|
| 56 |
+
if _is_config_mode():
|
| 57 |
+
return []
|
| 58 |
+
return [modal.Secret.from_name(SECRET_NAME, required_keys=["HF_TOKEN"])]
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def _is_config_mode() -> bool:
|
| 62 |
+
args = sys.argv[1:]
|
| 63 |
+
for index, arg in enumerate(args):
|
| 64 |
+
if arg == "--mode" and index + 1 < len(args):
|
| 65 |
+
return args[index + 1] == "config"
|
| 66 |
+
if arg.startswith("--mode="):
|
| 67 |
+
return arg.split("=", 1)[1] == "config"
|
| 68 |
+
return False
|
| 69 |
+
|
| 70 |
+
|
| 71 |
+
_load_local_env_file()
|
| 72 |
+
|
| 73 |
+
|
| 74 |
+
def _cli_arg_value(name: str, default: str = "") -> str:
|
| 75 |
+
args = sys.argv[1:]
|
| 76 |
+
flag = f"--{name}"
|
| 77 |
+
for index, arg in enumerate(args):
|
| 78 |
+
if arg == flag and index + 1 < len(args):
|
| 79 |
+
return args[index + 1]
|
| 80 |
+
if arg.startswith(f"{flag}="):
|
| 81 |
+
return arg.split("=", 1)[1]
|
| 82 |
+
return default
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
def _source_mode() -> str:
|
| 86 |
+
return _cli_arg_value("source-mode", os.environ.get("MODAL_SOURCE_MODE", "local"))
|
| 87 |
|
| 88 |
|
| 89 |
def _training_image() -> modal.Image:
|
| 90 |
+
image = (
|
| 91 |
modal.Image.from_registry(
|
| 92 |
"nvidia/cuda:12.8.0-devel-ubuntu22.04",
|
| 93 |
add_python="3.11",
|
|
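The helpers added in the hunk above read configuration before Modal parses the command line: `_load_local_env_file` copies allowlisted keys from `.env.local` into the environment, and `_cli_arg_value` scans `sys.argv` for `--name value` or `--name=value`. The sketch below restates both as pure functions so they can be exercised without touching `os.environ` or `sys.argv`. The names `parse_env_text` and `cli_arg_value` and their explicit arguments are assumptions made for illustration, not the module's API.

```python
from __future__ import annotations


def parse_env_text(text: str, allowed: set[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines and keep only allowlisted keys."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key not in allowed:
            continue
        # setdefault keeps the first value seen for a key, in the same way
        # os.environ.setdefault in the diff keeps an already-set variable.
        values.setdefault(key, value.strip().strip('"').strip("'"))
    return values


def cli_arg_value(args: list[str], name: str, default: str = "") -> str:
    """Return the value of --name from args, accepting both spellings."""
    flag = f"--{name}"
    for index, arg in enumerate(args):
        if arg == flag and index + 1 < len(args):
            return args[index + 1]
        if arg.startswith(f"{flag}="):
            return arg.split("=", 1)[1]
    return default


env = parse_env_text(
    '# local overrides\nTRACKIO_PROJECT="demo"\nHF_TOKEN=secret\n',
    allowed={"TRACKIO_PROJECT"},
)
mode = cli_arg_value(["--mode", "config"], "mode")
source = cli_arg_value(["--source-mode=public"], "source-mode", "local")
fallback = cli_arg_value([], "source-mode", "local")
print(env, mode, source, fallback)
```

The allowlist means a secret such as a token placed in the same file is never copied into the process environment by this path.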
|
|
| 102 |
"datasets",
|
| 103 |
"huggingface_hub",
|
| 104 |
"peft",
|
| 105 |
+
"pillow",
|
| 106 |
"tokenizers",
|
| 107 |
"nvidia-ml-py",
|
| 108 |
"trackio>=0.25.0",
|
| 109 |
"transformers>=5.5.0",
|
| 110 |
"trl>=0.28.0",
|
| 111 |
"openenv-core[core]>=0.2.3",
|
|
|
|
| 112 |
)
|
| 113 |
.uv_pip_install(
|
| 114 |
"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
|
| 115 |
"unsloth[base] @ git+https://github.com/unslothai/unsloth",
|
| 116 |
)
|
| 117 |
+
.uv_pip_install("pydantic==2.10.6")
|
| 118 |
.uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
|
| 119 |
+
.uv_pip_install("llm-blender", "weave")
|
| 120 |
.uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
|
| 121 |
+
)
|
| 122 |
+
|
| 123 |
+
if _source_mode() == "public":
|
| 124 |
+
repo_url = _cli_arg_value("repo-url", PUBLIC_REPO_URL)
|
| 125 |
+
repo_branch = _cli_arg_value("repo-branch", PUBLIC_REPO_BRANCH)
|
| 126 |
+
image = image.run_commands(
|
| 127 |
+
f"git clone --depth 1 --branch {repo_branch} {repo_url} {REMOTE_PROJECT}",
|
| 128 |
+
f"python -m pip install -e {REMOTE_PROJECT}",
|
| 129 |
+
)
|
| 130 |
+
else:
|
| 131 |
+
image = image.add_local_dir(
|
| 132 |
PROJECT_ROOT,
|
| 133 |
remote_path=REMOTE_PROJECT,
|
| 134 |
copy=True,
|
|
|
|
| 141 |
"*.pyc",
|
| 142 |
],
|
| 143 |
)
|
| 144 |
+
image = image.run_commands(
|
| 145 |
f"python -m pip install -e {REMOTE_PROJECT}",
|
| 146 |
)
|
| 147 |
+
|
| 148 |
+
return image.run_commands(
|
| 149 |
+
"python -c \"import os, torch; import transformers.utils.hub as hub; "
|
| 150 |
+
"hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
|
| 151 |
+
"os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
|
| 152 |
+
"from trl import GRPOConfig, GRPOTrainer; "
|
| 153 |
+
"from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
|
| 154 |
+
"CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
|
| 155 |
+
).workdir(REMOTE_PROJECT)
|
| 156 |
|
| 157 |
|
| 158 |
app = modal.App(APP_NAME)
|
| 159 |
volume = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True)
|
| 160 |
+
secrets = _modal_secrets()
|
| 161 |
|
| 162 |
|
| 163 |
@app.function(
|
|
|
|
| 165 |
gpu=["L4", "A10G"],
|
| 166 |
timeout=4 * 60 * 60,
|
| 167 |
volumes={RUNS_DIR: volume},
|
| 168 |
+
secrets=secrets,
|
| 169 |
)
|
| 170 |
def check_training_imports() -> dict[str, str]:
|
| 171 |
import torch
|
|
|
|
| 197 |
gpu=["L4", "A10G"],
|
| 198 |
timeout=4 * 60 * 60,
|
| 199 |
volumes={RUNS_DIR: volume},
|
| 200 |
+
secrets=secrets,
|
| 201 |
)
|
| 202 |
def train_cybersecurity_owasp_grpo(
|
| 203 |
env_repo_id: str = "",
|
|
|
|
| 215 |
num_generations: int = 2,
|
| 216 |
seed_start: int = 0,
|
| 217 |
git_sha: str = "nogit",
|
| 218 |
+
run_name: str = "",
|
| 219 |
+
source_mode: str = "local",
|
| 220 |
+
repo_url: str = PUBLIC_REPO_URL,
|
| 221 |
+
repo_branch: str = PUBLIC_REPO_BRANCH,
|
| 222 |
) -> dict[str, str | int | float]:
|
| 223 |
+
import inspect
|
| 224 |
import statistics
|
| 225 |
|
| 226 |
import torch
|
| 227 |
+
from unsloth import FastLanguageModel
|
| 228 |
import transformers.utils.hub as transformers_hub
|
| 229 |
from datasets import Dataset
|
| 230 |
from huggingface_hub import whoami
|
| 231 |
from transformers import TrainerCallback
|
| 232 |
+
from trl import GRPOConfig, GRPOTrainer, clone_chat_template
|
| 233 |
+
from trl.chat_template_utils import add_response_schema
|
| 234 |
|
| 235 |
import trackio
|
| 236 |
|
|
|
|
| 263 |
|
| 264 |
model_slug = model_name.replace("/", "-")
|
| 265 |
stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
|
| 266 |
+
run_name = run_name or (
|
| 267 |
+
f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-{stamp}-{git_sha[:8]}"
|
| 268 |
+
)
|
| 269 |
output_dir = RUNS_DIR / run_name
|
| 270 |
output_dir.mkdir(parents=True, exist_ok=True)
|
| 271 |
|
|
|
|
| 398 |
return self._step("read_openapi")
|
| 399 |
|
| 400 |
def read_file(self, path: str) -> str:
|
| 401 |
+
"""
|
| 402 |
+
Read an editable generated workspace file by relative path.
|
| 403 |
+
|
| 404 |
+
Args:
|
| 405 |
+
path: Relative path inside the generated editable workspace.
|
| 406 |
+
|
| 407 |
+
Returns:
|
| 408 |
+
The file contents or a safe tool error observation.
|
| 409 |
+
"""
|
| 410 |
return self._step("read_file", {"path": path})
|
| 411 |
|
| 412 |
def search_code(self, query: str) -> str:
|
| 413 |
+
"""
|
| 414 |
+
Search editable generated workspace files for a string.
|
| 415 |
+
|
| 416 |
+
Args:
|
| 417 |
+
query: Search text to find in editable generated app files.
|
| 418 |
+
|
| 419 |
+
Returns:
|
| 420 |
+
Matching file lines or a no-match message.
|
| 421 |
+
"""
|
| 422 |
return self._step("search_code", {"query": query})
|
| 423 |
|
| 424 |
def send_local_request(
|
|
|
|
| 427 |
method: str = "GET",
|
| 428 |
user_id: str | None = None,
|
| 429 |
) -> str:
|
| 430 |
+
"""
|
| 431 |
+
Send a request to the generated local app only.
|
| 432 |
+
|
| 433 |
+
Args:
|
| 434 |
+
path: Local route path such as /health or /invoices/<id>.
|
| 435 |
+
method: HTTP method to use for the local request.
|
| 436 |
+
user_id: Optional generated user identifier for authentication.
|
| 437 |
+
|
| 438 |
+
Returns:
|
| 439 |
+
JSON response from the simulated local app request.
|
| 440 |
+
"""
|
| 441 |
return self._step(
|
| 442 |
"send_local_request",
|
| 443 |
{"path": path, "method": method, "user_id": user_id},
|
|
|
|
| 450 |
second_user_id: str,
|
| 451 |
method: str = "GET",
|
| 452 |
) -> str:
|
| 453 |
+
"""
|
| 454 |
+
Compare one local request as two generated users.
|
| 455 |
+
|
| 456 |
+
Args:
|
| 457 |
+
path: Local route path to request as both generated users.
|
| 458 |
+
first_user_id: First generated user identifier.
|
| 459 |
+
second_user_id: Second generated user identifier.
|
| 460 |
+
method: HTTP method to use for both local requests.
|
| 461 |
+
|
| 462 |
+
Returns:
|
| 463 |
+
JSON summary of both simulated local responses.
|
| 464 |
+
"""
|
| 465 |
return self._step(
|
| 466 |
"compare_identities",
|
| 467 |
{
|
|
|
|
| 478 |
evidence: str,
|
| 479 |
policy_rule: str,
|
| 480 |
) -> str:
|
| 481 |
+
"""
|
| 482 |
+
Submit structured evidence for the suspected authorization bug.
|
| 483 |
+
|
| 484 |
+
Args:
|
| 485 |
+
summary: Concise description of the suspected access-control bug.
|
| 486 |
+
evidence: Local reproduction evidence from policy, code, or requests.
|
| 487 |
+
policy_rule: Policy rule that the observed behavior violates.
|
| 488 |
+
|
| 489 |
+
Returns:
|
| 490 |
+
Finding acceptance result and next phase information.
|
| 491 |
+
"""
|
| 492 |
return self._step(
|
| 493 |
"submit_finding",
|
| 494 |
{
|
|
|
|
| 504 |
content: str | None = None,
|
| 505 |
diff: str | None = None,
|
| 506 |
) -> str:
|
| 507 |
+
"""
|
| 508 |
+
Patch an editable generated app file with full content or a unified diff.
|
| 509 |
+
|
| 510 |
+
Args:
|
| 511 |
+
path: Relative path of the editable generated app file to patch.
|
| 512 |
+
content: Complete replacement file content, when using full-file patching.
|
| 513 |
+
diff: Unified diff to apply, when using diff patching.
|
| 514 |
+
|
| 515 |
+
Returns:
|
| 516 |
+
Patch application result.
|
| 517 |
+
"""
|
| 518 |
args: dict[str, Any] = {"path": path}
|
| 519 |
if content is not None:
|
| 520 |
args["content"] = content
|
|
|
|
| 665 |
return control
|
| 666 |
|
| 667 |
print(f"CUDA available: {torch.cuda.is_available()}")
|
| 668 |
+
if source_mode == "public":
|
| 669 |
+
print(f"Installed CyberSecurity_OWASP from public repo: {repo_url}@{repo_branch}")
|
| 670 |
+
else:
|
| 671 |
+
print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
|
| 672 |
print(f"Trackio Space: {trackio_space_id}")
|
| 673 |
print(f"Trackio Project: {trackio_project}")
|
| 674 |
print(f"Output repo: {output_repo_id}")
|
|
|
|
| 681 |
fast_inference=False,
|
| 682 |
token=hf_token,
|
| 683 |
)
|
| 684 |
+
try:
|
| 685 |
+
tokenizer = add_response_schema(tokenizer)
|
| 686 |
+
except Exception as exc:
|
| 687 |
+
print(f"Tokenizer response schema add failed before cloning: {exc!r}")
|
| 688 |
+
model, tokenizer, added_tokens = clone_chat_template(
|
| 689 |
+
model,
|
| 690 |
+
tokenizer,
|
| 691 |
+
"Qwen/Qwen3-0.6B",
|
| 692 |
+
)
|
| 693 |
+
print(f"Cloned Qwen3 chat template; added {len(added_tokens)} tokens.")
|
| 694 |
+
tokenizer = add_response_schema(tokenizer)
|
| 695 |
+
|
| 696 |
model = FastLanguageModel.get_peft_model(
|
| 697 |
model,
|
| 698 |
r=lora_rank,
|
|
|
|
| 711 |
)
|
| 712 |
FastLanguageModel.for_training(model)
|
| 713 |
|
| 714 |
+
grpo_config_values = {
|
| 715 |
+
"temperature": 1.0,
|
| 716 |
+
"learning_rate": 5e-6,
|
| 717 |
+
"weight_decay": 0.001,
|
| 718 |
+
"warmup_ratio": 0.1,
|
| 719 |
+
"lr_scheduler_type": "linear",
|
| 720 |
+
"optim": "adamw_8bit",
|
| 721 |
+
"logging_steps": 1,
|
| 722 |
+
"per_device_train_batch_size": 1,
|
| 723 |
+
"gradient_accumulation_steps": max(2, num_generations),
|
| 724 |
+
"num_generations": num_generations,
|
| 725 |
+
"max_prompt_length": max_seq_length,
|
| 726 |
+
"max_completion_length": max_completion_length,
|
| 727 |
+
"max_steps": max_steps,
|
| 728 |
+
"save_steps": max(10, max_steps),
|
| 729 |
+
"report_to": "trackio",
|
| 730 |
+
"trackio_space_id": trackio_space_id,
|
| 731 |
+
"run_name": run_name,
|
| 732 |
+
"output_dir": str(output_dir),
|
| 733 |
+
"push_to_hub": True,
|
| 734 |
+
"hub_model_id": output_repo_id,
|
| 735 |
+
"hub_private_repo": True,
|
| 736 |
+
"hub_strategy": "every_save",
|
| 737 |
+
"gradient_checkpointing": True,
|
| 738 |
+
"gradient_checkpointing_kwargs": {"use_reentrant": False},
|
| 739 |
+
"epsilon": 0.2,
|
| 740 |
+
"epsilon_high": 0.28,
|
| 741 |
+
"delta": 1.5,
|
| 742 |
+
"loss_type": "bnpo",
|
| 743 |
+
"mask_truncated_completions": False,
|
| 744 |
+
}
|
| 745 |
+
grpo_config_parameters = set(inspect.signature(GRPOConfig).parameters)
|
| 746 |
+
skipped_config_keys = sorted(set(grpo_config_values) - grpo_config_parameters)
|
| 747 |
+
if skipped_config_keys:
|
| 748 |
+
print(f"Skipping unsupported GRPOConfig keys: {skipped_config_keys}")
|
| 749 |
training_args = GRPOConfig(
|
| 750 |
+
**{
|
| 751 |
+
key: value
|
| 752 |
+
for key, value in grpo_config_values.items()
|
| 753 |
+
if key in grpo_config_parameters
|
| 754 |
+
}
|
| 755 |
)
|
| 756 |
|
| 757 |
+
trainer_values = {
|
| 758 |
+
"model": model,
|
| 759 |
+
"processing_class": tokenizer,
|
| 760 |
+
"reward_funcs": cybersecurity_owasp_reward,
|
| 761 |
+
"args": training_args,
|
| 762 |
+
"train_dataset": dataset,
|
| 763 |
+
"environment_factory": CyberSecurityOWASPToolEnv,
|
| 764 |
+
"callbacks": [TrackioSystemMetricsCallback()],
|
| 765 |
+
}
|
| 766 |
+
trainer_parameters = set(inspect.signature(GRPOTrainer).parameters)
|
| 767 |
+
skipped_trainer_keys = sorted(set(trainer_values) - trainer_parameters)
|
| 768 |
+
if skipped_trainer_keys:
|
| 769 |
+
print(f"Skipping unsupported GRPOTrainer keys: {skipped_trainer_keys}")
|
| 770 |
trainer = GRPOTrainer(
|
| 771 |
+
**{
|
| 772 |
+
key: value
|
| 773 |
+
for key, value in trainer_values.items()
|
| 774 |
+
if key in trainer_parameters
|
| 775 |
+
}
|
| 776 |
)
|
| 777 |
trainer.train()
|
| 778 |
trainer.push_to_hub()
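The hunk above builds `GRPOConfig` and `GRPOTrainer` from dictionaries that are first filtered against each constructor's signature, so keys unknown to the installed TRL version are reported and dropped instead of raising `TypeError`. A minimal sketch of that pattern follows. `DemoConfig` and `filter_supported_kwargs` are hypothetical stand-ins introduced for illustration, not names from this repository or from TRL.

```python
import inspect
from dataclasses import dataclass


@dataclass
class DemoConfig:
    # Hypothetical stand-in for GRPOConfig; it accepts only these two fields.
    learning_rate: float = 5e-6
    max_steps: int = 10


def filter_supported_kwargs(target, values):
    """Return (kwargs accepted by target, sorted names that were dropped)."""
    accepted = set(inspect.signature(target).parameters)
    supported = {key: value for key, value in values.items() if key in accepted}
    skipped = sorted(set(values) - accepted)
    return supported, skipped


requested = {"learning_rate": 1e-5, "max_steps": 3, "epsilon_high": 0.28}
supported, skipped = filter_supported_kwargs(DemoConfig, requested)
if skipped:
    print(f"Skipping unsupported DemoConfig keys: {skipped}")
config = DemoConfig(**supported)
```

One caveat of this approach: a constructor that takes `**kwargs` exposes only the literal parameter name `kwargs` in its signature, so a name-based filter like this would drop options the constructor would in fact have accepted.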
|
|
|
|
| 791 |
"model_name": model_name,
|
| 792 |
"max_completion_length": max_completion_length,
|
| 793 |
"num_generations": num_generations,
|
| 794 |
+
"source_mode": source_mode,
|
| 795 |
+
"repo_url": repo_url,
|
| 796 |
+
"repo_branch": repo_branch,
|
| 797 |
}
|
| 798 |
|
| 799 |
|
|
|
|
| 815 |
num_generations: int = 2,
|
| 816 |
seed_start: int = 0,
|
| 817 |
git_sha: str = "nogit",
|
| 818 |
+
source_mode: str = "local",
|
| 819 |
+
repo_url: str = PUBLIC_REPO_URL,
|
| 820 |
+
repo_branch: str = PUBLIC_REPO_BRANCH,
|
| 821 |
+
detach: bool = False,
|
| 822 |
) -> None:
|
| 823 |
if mode == "config":
|
| 824 |
result = check_training_imports.remote()
|
|
|
|
| 827 |
if mode != "train":
|
| 828 |
raise ValueError("mode must be 'train' or 'config'")
|
| 829 |
|
| 830 |
+
trackio_space_id = trackio_space_id or os.environ.get("TRACKIO_SPACE_ID", "")
|
| 831 |
+
trackio_project = trackio_project or os.environ.get(
|
| 832 |
+
"TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo"
|
| 833 |
+
)
|
| 834 |
resolved_trackio_space_id = trackio_space_id
|
| 835 |
resolved_output_repo_id = output_repo_id
|
| 836 |
if not resolved_trackio_space_id or not resolved_output_repo_id:
|
|
|
|
| 863 |
|
| 864 |
model_slug = model_name.replace("/", "-")
|
| 865 |
local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
|
| 866 |
+
run_name = (
|
| 867 |
f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
|
| 868 |
f"{local_stamp}-{git_sha[:8]}"
|
| 869 |
)
|
| 870 |
|
| 871 |
+
print(f"Run name: {run_name}")
|
| 872 |
+
print(f"Source mode: {source_mode}")
|
| 873 |
+
if source_mode == "public":
|
| 874 |
+
print(f"Public repo: {repo_url}@{repo_branch}")
|
| 875 |
+
if resolved_trackio_space_id:
|
| 876 |
+
print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
|
| 877 |
+
else:
|
| 878 |
+
print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
|
| 879 |
+
if resolved_output_repo_id:
|
| 880 |
+
print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
|
| 881 |
+
else:
|
| 882 |
+
print(
|
| 883 |
+
"Output model repo: derived remotely from HF_TOKEN as "
|
| 884 |
+
"<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
|
| 885 |
+
)
|
| 886 |
+
|
| 887 |
+
kwargs = dict(
|
| 888 |
env_repo_id=env_repo_id,
|
| 889 |
output_repo_id=output_repo_id,
|
| 890 |
max_steps=max_steps,
|
|
|
|
| 900 |
num_generations=num_generations,
|
| 901 |
seed_start=seed_start,
|
| 902 |
git_sha=git_sha,
|
| 903 |
+
run_name=run_name,
|
| 904 |
+
source_mode=source_mode,
|
| 905 |
+
repo_url=repo_url,
|
| 906 |
+
repo_branch=repo_branch,
|
| 907 |
)
|
| 908 |
+
if detach:
|
| 909 |
+
call = train_cybersecurity_owasp_grpo.spawn(**kwargs)
|
| 910 |
+
print(f"Spawned Modal training call: {call.object_id}")
|
| 911 |
else:
|
| 912 |
+
result = train_cybersecurity_owasp_grpo.remote(**kwargs)
|
| 913 |
+
print(f"Training result: {result}")
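The run name used by this entrypoint combines the model slug, the difficulty level, a UTC timestamp, and the first eight characters of the git SHA. The sketch below reproduces that composition as a standalone function so the format can be checked in isolation. `build_demo_run_name` and its `now` parameter are assumptions made for testability and do not appear in the diff.

```python
from datetime import datetime, timezone


def build_demo_run_name(model_name, difficulty, git_sha, now=None):
    """Compose a run name in the same shape as the entrypoint above."""
    # Injecting `now` keeps the function deterministic in tests (hypothetical parameter).
    now = now or datetime.now(timezone.utc)
    model_slug = model_name.replace("/", "-")
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return (
        f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
        f"{stamp}-{git_sha[:8]}"
    )


fixed = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
name = build_demo_run_name("Qwen/Qwen3-0.6B", 1, "abcdef1234", now=fixed)
print(name)
```

Replacing `/` in the model id avoids path-like separators in the run name, and the timestamp plus short SHA make it easy to tell repeated runs of the same configuration apart.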
|
scripts/smoke_test.sh
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
#!/usr/bin/env bash
|
| 2 |
set -euo pipefail
|
| 3 |
-
uv run
|
|
|
|
| 1 |
#!/usr/bin/env bash
|
| 2 |
set -euo pipefail
|
| 3 |
+
uv run python scripts/track_pytest.py tests/test_models.py tests/test_reset_step_state.py
|
scripts/track_pytest.py
ADDED
|
@@ -0,0 +1,59 @@
|
| 1 |
+
"""Run pytest and record the result as a Trackio run."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import argparse
|
| 6 |
+
import subprocess
|
| 7 |
+
import sys
|
| 8 |
+
import time
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
|
| 11 |
+
PROJECT_ROOT = Path(__file__).resolve().parents[1]
|
| 12 |
+
sys.path.insert(0, str(PROJECT_ROOT))
|
| 13 |
+
sys.path.insert(0, str(PROJECT_ROOT.parent))
|
| 14 |
+
|
| 15 |
+
from training.trackio_utils import build_run_name, get_git_sha, log_trackio_metrics, trackio_run
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
def main() -> int:
|
| 19 |
+
parser = argparse.ArgumentParser(description="Run pytest with Trackio tracking.")
|
| 20 |
+
parser.add_argument("pytest_args", nargs="*", help="Arguments passed through to pytest.")
|
| 21 |
+
parser.add_argument("--run-name", default="", help="Trackio run name override.")
|
| 22 |
+
parser.add_argument("--difficulty", type=int, default=0)
|
| 23 |
+
args, passthrough = parser.parse_known_args()
|
| 24 |
+
|
| 25 |
+
run_name = args.run_name or build_run_name(
|
| 26 |
+
"pytest",
|
| 27 |
+
"smoke",
|
| 28 |
+
args.difficulty,
|
| 29 |
+
git_sha=get_git_sha(),
|
| 30 |
+
)
|
| 31 |
+
pytest_args = [*args.pytest_args, *passthrough] or ["tests"]
|
| 32 |
+
command = [sys.executable, "-m", "pytest", *pytest_args]
|
| 33 |
+
started = time.perf_counter()
|
| 34 |
+
|
| 35 |
+
with trackio_run(
|
| 36 |
+
run_name=run_name,
|
| 37 |
+
run_type="pytest",
|
| 38 |
+
config={
|
| 39 |
+
"command": " ".join(command),
|
| 40 |
+
"pytest_args": pytest_args,
|
| 41 |
+
},
|
| 42 |
+
group="smoke",
|
| 43 |
+
):
|
| 44 |
+
completed = subprocess.run(command)
|
| 45 |
+
duration = time.perf_counter() - started
|
| 46 |
+
log_trackio_metrics(
|
| 47 |
+
{
|
| 48 |
+
"smoke/pytest_exit_code": completed.returncode,
|
| 49 |
+
"smoke/pytest_passed": completed.returncode == 0,
|
| 50 |
+
"smoke/duration_seconds": duration,
|
| 51 |
+
},
|
| 52 |
+
step=0,
|
| 53 |
+
)
|
| 54 |
+
|
| 55 |
+
return completed.returncode
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
if __name__ == "__main__":
|
| 59 |
+
raise SystemExit(main())
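The script above wraps a pytest subprocess, times it, and records three metrics. The core of that flow can be sketched without Trackio, assuming only the standard library. `run_and_measure` is a hypothetical helper, and the child command here is a one-line Python program that exits with code 3 rather than a real pytest run.

```python
import subprocess
import sys
import time


def run_and_measure(command):
    """Run a command and return metrics shaped like the ones logged above."""
    started = time.perf_counter()
    completed = subprocess.run(command)
    duration = time.perf_counter() - started
    return {
        "smoke/pytest_exit_code": completed.returncode,
        "smoke/pytest_passed": completed.returncode == 0,
        "smoke/duration_seconds": duration,
    }


# Stand-in for `python -m pytest ...`: a child process that exits with code 3.
metrics = run_and_measure([sys.executable, "-c", "raise SystemExit(3)"])
print(metrics)
```

Returning the child's exit code from `main()` (as the script does via `raise SystemExit(main())`) lets the wrapper keep working as a drop-in replacement for a plain pytest call in CI.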
|
server/Dockerfile
CHANGED
|
@@ -70,6 +70,7 @@ ENV PATH="/app/.venv/bin:$PATH"
|
|
| 70 |
|
| 71 |
# Set PYTHONPATH so imports work correctly
|
| 72 |
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
|
|
|
| 73 |
|
| 74 |
# Health check
|
| 75 |
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
|
|
|
| 70 |
|
| 71 |
# Set PYTHONPATH so imports work correctly
|
| 72 |
ENV PYTHONPATH="/app/env:$PYTHONPATH"
|
| 73 |
+
ENV ENABLE_WEB_INTERFACE=true
|
| 74 |
|
| 75 |
# Health check
|
| 76 |
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
server/action_tools.py
ADDED
|
@@ -0,0 +1,73 @@
|
| 1 |
+
"""Typed action tool dispatcher for the generated app sandbox."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import json
|
| 6 |
+
from dataclasses import dataclass
|
| 7 |
+
|
| 8 |
+
try:
|
| 9 |
+
from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 10 |
+
from .app_sandbox import AppSandbox
|
| 11 |
+
except ImportError: # pragma: no cover
|
| 12 |
+
from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 13 |
+
from server.app_sandbox import AppSandbox
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
@dataclass(frozen=True)
|
| 17 |
+
class ToolResult:
|
| 18 |
+
message: str
|
| 19 |
+
visible_test_result: str | None = None
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
class ActionTools:
|
| 23 |
+
"""Executes phase-gated, safe tools against one episode state."""
|
| 24 |
+
|
| 25 |
+
def __init__(
|
| 26 |
+
self,
|
| 27 |
+
state: CyberSecurityOWASPState,
|
| 28 |
+
visible_policy_hint: dict,
|
| 29 |
+
workspace_summary: dict,
|
| 30 |
+
):
|
| 31 |
+
self.state = state
|
| 32 |
+
self.visible_policy_hint = visible_policy_hint
|
| 33 |
+
self.workspace_summary = workspace_summary
|
| 34 |
+
self.sandbox = AppSandbox(state)
|
| 35 |
+
|
| 36 |
+
def execute(self, action: CyberSecurityOWASPAction) -> ToolResult:
|
| 37 |
+
args = action.arguments or {}
|
| 38 |
+
if action.tool_name == "noop":
|
| 39 |
+
return ToolResult("No operation.")
|
| 40 |
+
if action.tool_name == "inspect_policy_graph":
|
| 41 |
+
return ToolResult(json.dumps(self.visible_policy_hint, indent=2, sort_keys=True))
|
| 42 |
+
if action.tool_name == "list_routes":
|
| 43 |
+
return ToolResult(json.dumps(self.workspace_summary["routes"], indent=2))
|
| 44 |
+
if action.tool_name == "read_openapi":
|
| 45 |
+
return ToolResult(self.sandbox.read_openapi())
|
| 46 |
+
if action.tool_name == "read_file":
|
| 47 |
+
return ToolResult(self.sandbox.read_file(str(args.get("path", ""))))
|
| 48 |
+
if action.tool_name == "search_code":
|
| 49 |
+
return ToolResult(self.sandbox.search_code(str(args.get("query", ""))))
|
| 50 |
+
if action.tool_name == "send_local_request":
|
| 51 |
+
response = self.sandbox.send_local_request(
|
| 52 |
+
str(args.get("method", "GET")),
|
| 53 |
+
str(args.get("path", "")),
|
| 54 |
+
args.get("user_id"),
|
| 55 |
+
)
|
| 56 |
+
return ToolResult(json.dumps(response, indent=2, sort_keys=True))
|
| 57 |
+
if action.tool_name == "compare_identities":
|
| 58 |
+
response = self.sandbox.compare_identities(
|
| 59 |
+
str(args.get("method", "GET")),
|
| 60 |
+
str(args.get("path", "")),
|
| 61 |
+
str(args.get("first_user_id", "")),
|
| 62 |
+
str(args.get("second_user_id", "")),
|
| 63 |
+
)
|
| 64 |
+
return ToolResult(json.dumps(response, indent=2, sort_keys=True))
|
| 65 |
+
if action.tool_name == "patch_file":
|
| 66 |
+
result = self.sandbox.patch_file(
|
| 67 |
+
str(args.get("path", "")),
|
| 68 |
+
content=str(args["content"]) if "content" in args else None,
|
| 69 |
+
diff=str(args.get("diff", "")) if "content" not in args else None,
|
| 70 |
+
)
|
| 71 |
+
changed = "no diff" if not result["diff"].strip() else "diff recorded"
|
| 72 |
+
return ToolResult(f"Patched {result['path']} ({changed}).")
|
| 73 |
+
raise ValueError(f"Unhandled tool {action.tool_name}")
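`ActionTools.execute` above dispatches on `action.tool_name` with a chain of `if` statements and raises `ValueError` for unknown tools. For comparison, the same contract can be expressed with a lookup table. This is an alternative sketch, not the repository's implementation: `DemoTools` and its two handlers are hypothetical and cover only a subset of the tools.

```python
from __future__ import annotations

import json
from typing import Any, Callable


class DemoTools:
    """Hypothetical table-driven dispatcher; not the ActionTools class from the diff."""

    def __init__(self, routes: list[dict[str, Any]]):
        self.routes = routes
        # Map each tool name to a handler that takes the argument dict.
        self.handlers: dict[str, Callable[[dict[str, Any]], str]] = {
            "noop": lambda args: "No operation.",
            "list_routes": lambda args: json.dumps(self.routes, indent=2),
        }

    def execute(self, tool_name: str, arguments: dict[str, Any] | None = None) -> str:
        handler = self.handlers.get(tool_name)
        if handler is None:
            raise ValueError(f"Unhandled tool {tool_name}")
        return handler(arguments or {})


tools = DemoTools([{"method": "GET", "path": "/health", "public": True}])
print(tools.execute("noop"))
print(tools.execute("list_routes"))
```

A table makes the set of supported tools enumerable at runtime, while the explicit `if` chain in the diff keeps each tool's argument handling visible in one place; either choice preserves the fail-closed behavior for unknown tool names.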
|
server/adversarial_designer.py
ADDED
|
@@ -0,0 +1,59 @@
|
| 1 |
+
"""Bounded adversarial scenario targeting for synthetic local lab episodes."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
from dataclasses import dataclass
|
| 6 |
+
from typing import Any
|
| 7 |
+
|
| 8 |
+
try:
|
| 9 |
+
from .curriculum import WEAKNESS_TARGETS
|
| 10 |
+
except ImportError: # pragma: no cover
|
| 11 |
+
from server.curriculum import WEAKNESS_TARGETS
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
TARGET_SPECS: dict[str, dict[str, Any]] = {
|
| 15 |
+
"same_role_cross_object": {
|
| 16 |
+
"description": "Same-role actor tries to read another user's object.",
|
| 17 |
+
"hidden_focus": ["same_tenant_other_user_blocked"],
|
| 18 |
+
},
|
| 19 |
+
"cross_tenant_boundary": {
|
| 20 |
+
"description": "Tenant-local admin is denied access to another tenant's resource.",
|
| 21 |
+
"hidden_focus": ["cross_tenant_blocked"],
|
| 22 |
+
},
|
| 23 |
+
"public_route_overlock": {
|
| 24 |
+
"description": "Public health route must remain unauthenticated after patching.",
|
| 25 |
+
"hidden_focus": ["health_public"],
|
| 26 |
+
},
|
| 27 |
+
"alternate_route_same_service": {
|
| 28 |
+
"description": "Alternate route/service access should follow the same policy oracle.",
|
| 29 |
+
"hidden_focus": ["oracle_matrix"],
|
| 30 |
+
},
|
| 31 |
+
"visible_test_edge_case": {
|
| 32 |
+
"description": "Visible tests are insufficient; hidden policy matrix decides success.",
|
| 33 |
+
"hidden_focus": ["visible_test_only_guard"],
|
| 34 |
+
},
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
@dataclass(frozen=True)
|
| 39 |
+
class BoundedAdversarialDesigner:
|
| 40 |
+
"""Chooses safe local lab variants that target tracked agent weaknesses."""
|
| 41 |
+
|
| 42 |
+
def design(self, *, seed: int, split: str, curriculum_profile: dict[str, Any]) -> dict[str, Any]:
|
| 43 |
+
target = str(curriculum_profile.get("target_weakness") or "")
|
| 44 |
+
if target not in TARGET_SPECS:
|
| 45 |
+
target = WEAKNESS_TARGETS[int(seed) % len(WEAKNESS_TARGETS)]
|
| 46 |
+
family = f"invoices.bola_idor.{target}"
|
| 47 |
+
if split == "hidden_eval":
|
| 48 |
+
family = f"heldout.{family}"
|
| 49 |
+
spec = TARGET_SPECS[target]
|
| 50 |
+
return {
|
| 51 |
+
"domain": "invoices",
|
| 52 |
+
"bug_family": "bola_idor",
|
| 53 |
+
"template_id": "fastapi_basic",
|
| 54 |
+
"scenario_family": family,
|
| 55 |
+
"target_weakness": target,
|
| 56 |
+
"hidden_focus": list(spec["hidden_focus"]),
|
| 57 |
+
"description": spec["description"],
|
| 58 |
+
"safe_lab_only": True,
|
| 59 |
+
}
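`BoundedAdversarialDesigner.design` falls back to a seed-indexed target when the curriculum profile does not name a known one, and prefixes the scenario family with `heldout.` for the `hidden_eval` split. The sketch below isolates that naming logic. `scenario_family` is a hypothetical helper, and it checks membership against `WEAKNESS_TARGETS` instead of `TARGET_SPECS`, which is equivalent here because the two share the same five keys.

```python
WEAKNESS_TARGETS = (
    "same_role_cross_object",
    "cross_tenant_boundary",
    "public_route_overlock",
    "alternate_route_same_service",
    "visible_test_edge_case",
)


def scenario_family(seed, split, requested_target=""):
    """Mirror the target fallback and family naming used by the designer above."""
    target = requested_target
    if target not in WEAKNESS_TARGETS:
        # Unknown or missing target: pick deterministically from the seed.
        target = WEAKNESS_TARGETS[int(seed) % len(WEAKNESS_TARGETS)]
    family = f"invoices.bola_idor.{target}"
    if split == "hidden_eval":
        family = f"heldout.{family}"
    return family


print(scenario_family(7, "train"))
print(scenario_family(0, "hidden_eval", "cross_tenant_boundary"))
```

Because the fallback depends only on the seed, the same seed always yields the same scenario family, which keeps episodes reproducible across runs.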
|
server/app.py
CHANGED
|
@@ -6,6 +6,13 @@
|
|
| 6 |
|
| 7 |
"""FastAPI application for the CyberSecurity_OWASP OpenEnv server."""
|
| 8 |
|
| 9 |
try:
|
| 10 |
from openenv.core.env_server.http_server import create_app
|
| 11 |
except Exception as e: # pragma: no cover
|
|
|
|
| 6 |
|
| 7 |
"""FastAPI application for the CyberSecurity_OWASP OpenEnv server."""
|
| 8 |
|
| 9 |
+
import os
|
| 10 |
+
|
| 11 |
+
# OpenEnv disables the Gradio playground unless this flag is enabled. Default it
|
| 12 |
+
# on so Docker/HF Spaces show the reset/step/state UI, while explicit env values
|
| 13 |
+
# such as ENABLE_WEB_INTERFACE=false still take precedence.
|
| 14 |
+
os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
|
| 15 |
+
|
| 16 |
try:
|
| 17 |
from openenv.core.env_server.http_server import create_app
|
| 18 |
except Exception as e: # pragma: no cover
|
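The comment in this hunk relies on `os.environ.setdefault` writing a value only when the variable is not already set. A short sketch of that behavior is below. It uses a hypothetical variable name, `DEMO_ENABLE_WEB_INTERFACE`, so that running it does not touch the real flag.

```python
import os

# Hypothetical variable name, used so the demo does not alter the real flag.
FLAG = "DEMO_ENABLE_WEB_INTERFACE"

# Case 1: the variable is unset, so setdefault applies the default.
os.environ.pop(FLAG, None)
os.environ.setdefault(FLAG, "true")
default_applied = os.environ[FLAG]

# Case 2: an explicit value exists, so setdefault leaves it alone.
os.environ[FLAG] = "false"
os.environ.setdefault(FLAG, "true")
explicit_kept = os.environ[FLAG]

print(default_applied, explicit_kept)
os.environ.pop(FLAG, None)  # clean up after the demo
```

This is why the call has to run before the `create_app` import: the default must be in place by the time the server code reads the variable.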
server/app_sandbox.py
ADDED
|
@@ -0,0 +1,139 @@
|
| 1 |
+
"""Ephemeral generated app sandbox operations."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import difflib
|
| 6 |
+
import json
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from typing import Any
|
| 9 |
+
|
| 10 |
+
try:
|
| 11 |
+
from ..models import CyberSecurityOWASPState
|
| 12 |
+
from ..safety import is_local_route
|
| 13 |
+
from ..validators import is_path_allowed, simulate_request
|
| 14 |
+
except ImportError: # pragma: no cover
|
| 15 |
+
from models import CyberSecurityOWASPState
|
| 16 |
+
from safety import is_local_route
|
| 17 |
+
from validators import is_path_allowed, simulate_request
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
class AppSandbox:
|
| 21 |
+
"""Encapsulates all generated workspace reads, patches, and local requests."""
|
| 22 |
+
|
| 23 |
+
def __init__(self, state: CyberSecurityOWASPState):
|
| 24 |
+
self.state = state
|
| 25 |
+
|
| 26 |
+
@property
|
| 27 |
+
def workspace(self) -> Path:
|
| 28 |
+
return Path(str(self.state.hidden_facts["workspace"]))
|
| 29 |
+
|
| 30 |
+
def read_file(self, path: str) -> str:
|
| 31 |
+
return self._resolve_path(path).read_text(encoding="utf-8")
|
| 32 |
+
|
| 33 |
+
def search_code(self, query: str) -> str:
|
| 34 |
+
if not query:
|
| 35 |
+
raise ValueError("query is required")
|
| 36 |
+
results: list[str] = []
|
| 37 |
+
for rel in self.state.hidden_facts.get("editable_files", []):
|
| 38 |
+
path = self.workspace / rel
|
| 39 |
+
text = path.read_text(encoding="utf-8")
|
| 40 |
+
for idx, line in enumerate(text.splitlines(), start=1):
|
| 41 |
+
if query.lower() in line.lower():
|
| 42 |
+
results.append(f"{rel}:{idx}: {line}")
|
| 43 |
+
return "\n".join(results) or "No matches."
|
| 44 |
+
|
| 45 |
+
def patch_file(self, path: str, *, content: str | None = None, diff: str | None = None) -> dict[str, str]:
|
| 46 |
+
target = self._resolve_path(path, write=True)
|
| 47 |
+
before = target.read_text(encoding="utf-8")
|
| 48 |
+
if content is not None:
|
| 49 |
+
target.write_text(content, encoding="utf-8")
|
| 50 |
+
else:
|
| 51 |
+
self._apply_unified_diff(target, diff or "")
|
| 52 |
+
after = target.read_text(encoding="utf-8")
|
| 53 |
+
patch_diff = "".join(
|
| 54 |
+
difflib.unified_diff(
|
| 55 |
+
before.splitlines(True),
|
| 56 |
+
after.splitlines(True),
|
| 57 |
+
fromfile=path,
|
| 58 |
+
tofile=path,
|
| 59 |
+
)
|
| 60 |
+
)
|
| 61 |
+
self.state.patch_diff = patch_diff
|
| 62 |
+
files_touched = self.state.metrics.setdefault("files_touched", [])
|
| 63 |
+
if path not in files_touched:
|
| 64 |
+
files_touched.append(path)
|
| 65 |
+
return {"path": path, "diff": patch_diff}
|
| 66 |
+
|
| 67 |
+
def read_openapi(self) -> str:
|
| 68 |
+
routes = self.state.visible_facts.get("workspace_summary", {}).get("routes", [])
|
| 69 |
+
paths: dict[str, Any] = {}
|
| 70 |
+
for route in routes:
|
| 71 |
+
paths.setdefault(route["path"], {})[route["method"].lower()] = {
|
| 72 |
+
"x-public": bool(route.get("public", False))
|
| 73 |
+
}
|
| 74 |
+
return json.dumps(
|
| 75 |
+
{
|
| 76 |
+
"openapi": "3.1.0",
|
| 77 |
+
"info": {"title": "Generated invoices app", "version": "0.1.0"},
|
| 78 |
+
"paths": paths,
|
| 79 |
+
},
|
| 80 |
+
indent=2,
|
| 81 |
+
sort_keys=True,
|
| 82 |
+
)
|
| 83 |
+
|
| 84 |
+
def send_local_request(self, method: str, path: str, user_id: str | None = None) -> dict[str, Any]:
|
| 85 |
+
if not is_local_route(path):
|
| 86 |
+
raise ValueError("send_local_request only accepts local route paths")
|
| 87 |
+
return simulate_request(self.state, method, path, user_id)
|
| 88 |
+
|
| 89 |
+
def compare_identities(
|
| 90 |
+
self,
|
| 91 |
+
method: str,
|
| 92 |
+
path: str,
|
| 93 |
+
first_user_id: str,
|
| 94 |
+
second_user_id: str,
|
| 95 |
+
) -> dict[str, Any]:
|
| 96 |
+
if not is_local_route(path):
|
| 97 |
+
raise ValueError("compare_identities only accepts local route paths")
|
| 98 |
+
return {
|
| 99 |
+
"first": simulate_request(self.state, method, path, first_user_id),
|
| 100 |
+
"second": simulate_request(self.state, method, path, second_user_id),
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
def _resolve_path(self, path: str, *, write: bool = False) -> Path:
|
| 104 |
+
allowed, normalized_or_error = is_path_allowed(self.state, path, write=write)
|
| 105 |
+
if not allowed:
|
| 106 |
+
raise ValueError(normalized_or_error)
|
| 107 |
+
return self.workspace / normalized_or_error
|
| 108 |
+
|
| 109 |
+
def _apply_unified_diff(self, path: Path, diff: str) -> None:
|
| 110 |
+
if not diff.strip():
|
| 111 |
+
raise ValueError("diff or content is required")
|
| 112 |
+
original = path.read_text(encoding="utf-8").splitlines(True)
|
| 113 |
+
output: list[str] = []
|
| 114 |
+
old_index = 0
|
| 115 |
+
lines = diff.splitlines(True)
|
| 116 |
+
i = 0
|
| 117 |
+
while i < len(lines):
|
| 118 |
+
line = lines[i]
|
| 119 |
+
if not line.startswith("@@"):
|
| 120 |
+
i += 1
|
| 121 |
+
continue
|
| 122 |
+
old_start = int(line.split()[1].split(",")[0][1:])
|
| 123 |
+
output.extend(original[old_index : old_start - 1])
|
| 124 |
+
old_index = old_start - 1
|
| 125 |
+
i += 1
|
| 126 |
+
while i < len(lines) and not lines[i].startswith("@@"):
|
| 127 |
+
hunk_line = lines[i]
|
| 128 |
+
if hunk_line.startswith(" "):
|
| 129 |
+
output.append(original[old_index])
|
| 130 |
+
old_index += 1
|
| 131 |
+
elif hunk_line.startswith("-"):
|
| 132 |
+
old_index += 1
|
| 133 |
+
elif hunk_line.startswith("+"):
|
| 134 |
+
output.append(hunk_line[1:])
|
| 135 |
+
elif hunk_line.startswith("\\"):
|
| 136 |
+
pass
|
| 137 |
+
i += 1
|
| 138 |
+
output.extend(original[old_index:])
|
| 139 |
+
path.write_text("".join(output), encoding="utf-8")
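`AppSandbox._apply_unified_diff` walks the `@@` hunk headers, copies untouched lines up to each hunk, and then replays context, removed, and added lines. The round trip below is a sketch of the same algorithm operating on strings instead of files, with the diff generated by `difflib.unified_diff`. `apply_unified_diff` is a hypothetical string-based variant written for illustration. Like the method above, it trusts the hunk positions and does not verify that context lines match the original.

```python
import difflib


def apply_unified_diff(original_text: str, diff: str) -> str:
    """Apply a single-file unified diff to a string, following the method above."""
    original = original_text.splitlines(True)
    output = []
    old_index = 0
    lines = diff.splitlines(True)
    i = 0
    while i < len(lines):
        if not lines[i].startswith("@@"):
            i += 1  # skip the ---/+++ headers before the first hunk
            continue
        # "@@ -3,2 +3,3 @@" -> 3, the 1-based start line in the old text.
        old_start = int(lines[i].split()[1].split(",")[0][1:])
        output.extend(original[old_index : old_start - 1])
        old_index = old_start - 1
        i += 1
        while i < len(lines) and not lines[i].startswith("@@"):
            hunk_line = lines[i]
            if hunk_line.startswith(" "):
                output.append(original[old_index])  # context: keep the old line
                old_index += 1
            elif hunk_line.startswith("-"):
                old_index += 1  # removal: skip the old line
            elif hunk_line.startswith("+"):
                output.append(hunk_line[1:])  # addition: emit the new line
            i += 1
    output.extend(original[old_index:])
    return "".join(output)


before = "def can_read(user, invoice):\n    return True\n"
after = "def can_read(user, invoice):\n    return user == invoice.owner\n"
diff = "".join(
    difflib.unified_diff(before.splitlines(True), after.splitlines(True), "a.py", "a.py")
)
patched = apply_unified_diff(before, diff)
print(patched)
```

A stricter applier would compare each context and removed line against the original and reject the patch on mismatch; that check is omitted here to stay close to the logic shown in the diff.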
|
server/authz_oracle.py
ADDED
|
@@ -0,0 +1,92 @@
|
| 1 |
+
"""Policy-oracle matrix for deterministic authorization verification."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
from dataclasses import dataclass
|
| 6 |
+
from typing import Any
|
| 7 |
+
|
| 8 |
+
try:
|
| 9 |
+
from ..models import CyberSecurityOWASPState
|
| 10 |
+
from ..validators import simulate_request
|
| 11 |
+
except ImportError: # pragma: no cover
|
| 12 |
+
from models import CyberSecurityOWASPState
|
| 13 |
+
from validators import simulate_request
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
@dataclass(frozen=True)
|
| 17 |
+
class AuthzOracle:
|
| 18 |
+
"""Builds hidden allowed/denied tuples from policy facts."""
|
| 19 |
+
|
| 20 |
+
def matrix(self, state: CyberSecurityOWASPState) -> list[dict[str, Any]]:
|
| 21 |
+
hidden = state.hidden_facts
|
| 22 |
+
return [
|
| 23 |
+
{
|
| 24 |
+
"name": "owner_can_read_owned_invoice",
|
| 25 |
+
"method": "GET",
|
| 26 |
+
"path": f"/invoices/{hidden['owner_invoice_id']}",
|
| 27 |
+
"user_id": hidden["owner_user_id"],
|
| 28 |
+
"expected_status": 200,
|
| 29 |
+
"allowed": True,
|
| 30 |
+
},
|
| 31 |
+
{
|
| 32 |
+
"name": "resource_owner_can_read_own_invoice",
|
| 33 |
+
"method": "GET",
|
| 34 |
+
"path": f"/invoices/{hidden['other_invoice_id']}",
|
| 35 |
+
"user_id": hidden["intruder_user_id"],
|
| 36 |
+
"expected_status": 200,
|
| 37 |
+
"allowed": True,
|
| 38 |
+
},
|
| 39 |
+
{
|
| 40 |
+
"name": "billing_admin_can_read_same_tenant_invoice",
|
| 41 |
+
"method": "GET",
|
| 42 |
+
"path": f"/invoices/{hidden['other_invoice_id']}",
|
| 43 |
+
"user_id": hidden["admin_user_id"],
|
| 44 |
+
"expected_status": 200,
|
| 45 |
+
"allowed": True,
|
| 46 |
+
},
|
| 47 |
+
{
|
| 48 |
+
"name": "same_tenant_non_owner_denied",
|
| 49 |
+
"method": "GET",
|
| 50 |
+
"path": f"/invoices/{hidden['other_invoice_id']}",
|
| 51 |
+
"user_id": hidden["owner_user_id"],
|
| 52 |
+
"expected_status": 403,
|
| 53 |
+
"allowed": False,
|
| 54 |
+
},
|
| 55 |
+
{
|
| 56 |
+
"name": "cross_tenant_admin_denied",
|
| 57 |
+
"method": "GET",
|
| 58 |
+
"path": f"/invoices/{hidden['foreign_invoice_id']}",
|
| 59 |
+
"user_id": hidden["admin_user_id"],
|
| 60 |
+
"expected_status": 403,
|
| 61 |
+
"allowed": False,
|
| 62 |
+
},
|
| 63 |
+
{
|
| 64 |
+
"name": "health_remains_public",
|
| 65 |
+
"method": "GET",
|
| 66 |
+
"path": "/health",
|
| 67 |
+
"user_id": None,
|
| 68 |
+
"expected_status": 200,
|
| 69 |
+
"allowed": True,
|
| 70 |
+
},
|
| 71 |
+
]
|
| 72 |
+
|
| 73 |
+
def evaluate(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
|
| 74 |
+
cases = []
|
| 75 |
+
for case in self.matrix(state):
|
| 76 |
+
response = simulate_request(
|
| 77 |
+
state,
|
| 78 |
+
str(case["method"]),
|
| 79 |
+
str(case["path"]),
|
| 80 |
+
case.get("user_id"),
|
| 81 |
+
)
|
| 82 |
+
actual = int(response["status"])
|
| 83 |
+
cases.append(
|
| 84 |
+
{
|
| 85 |
+
"name": case["name"],
|
| 86 |
+
"allowed": bool(case["allowed"]),
|
| 87 |
+
"expected_status": int(case["expected_status"]),
|
| 88 |
+
"actual_status": actual,
|
| 89 |
+
"passed": actual == int(case["expected_status"]),
|
| 90 |
+
}
|
| 91 |
+
)
|
| 92 |
+
return {"passed": all(case["passed"] for case in cases), "cases": cases}
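`AuthzOracle.evaluate` replays each case in the matrix through `simulate_request` and compares the returned status with `expected_status`. The sketch below shows that loop against a stand-in application. `evaluate_cases`, `leaky_app`, and the user and invoice ids are hypothetical; `leaky_app` models a BOLA-style bug by answering 200 for every request.

```python
def evaluate_cases(cases, simulate):
    """Run every case through `simulate` and report per-case pass/fail."""
    results = []
    for case in cases:
        response = simulate(case["method"], case["path"], case.get("user_id"))
        actual = int(response["status"])
        results.append(
            {
                "name": case["name"],
                "expected_status": case["expected_status"],
                "actual_status": actual,
                "passed": actual == case["expected_status"],
            }
        )
    return {"passed": all(item["passed"] for item in results), "cases": results}


def leaky_app(method, path, user_id):
    # Hypothetical vulnerable app: no ownership check, every request succeeds.
    return {"status": 200}


cases = [
    {"name": "owner_can_read_owned_invoice", "method": "GET",
     "path": "/invoices/inv-1", "user_id": "alice", "expected_status": 200},
    {"name": "same_tenant_non_owner_denied", "method": "GET",
     "path": "/invoices/inv-2", "user_id": "alice", "expected_status": 403},
    {"name": "health_remains_public", "method": "GET",
     "path": "/health", "user_id": None, "expected_status": 200},
]
report = evaluate_cases(cases, leaky_app)
failed = [item["name"] for item in report["cases"] if not item["passed"]]
print(failed)
```

Mixing allow cases with deny cases matters: a patch that simply returns 403 everywhere would fix the denied case but break the owner and health cases, so the matrix as a whole would still fail.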
|
server/curriculum.py
ADDED
|
@@ -0,0 +1,99 @@
|
| 1 |
+
"""Runtime curriculum controller for closed-loop scenario selection."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
from collections import defaultdict, deque
|
| 6 |
+
from dataclasses import dataclass, field
|
| 7 |
+
from typing import Any
|
| 8 |
+
|
| 9 |
+
try:
|
| 10 |
+
from ..models import CyberSecurityOWASPState
|
| 11 |
+
except ImportError: # pragma: no cover
|
| 12 |
+
from models import CyberSecurityOWASPState
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
DIFFICULTY_TIERS = ("warmup", "beginner", "intermediate", "advanced", "expert")
|
| 16 |
+
WEAKNESS_TARGETS = (
|
| 17 |
+
"same_role_cross_object",
|
| 18 |
+
"cross_tenant_boundary",
|
| 19 |
+
"public_route_overlock",
|
| 20 |
+
"alternate_route_same_service",
|
| 21 |
+
"visible_test_edge_case",
|
| 22 |
+
)
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
@dataclass
|
| 26 |
+
class CurriculumController:
|
| 27 |
+
"""Tracks episode outcomes and picks the next bounded weakness target."""
|
| 28 |
+
|
| 29 |
+
window_size: int = 10
|
| 30 |
+
reward_trend: deque[float] = field(default_factory=lambda: deque(maxlen=10))
|
| 31 |
+
outcomes_by_target: dict[str, list[bool]] = field(default_factory=lambda: defaultdict(list))
|
| 32 |
+
failures_by_target: dict[str, int] = field(default_factory=lambda: defaultdict(int))
|
| 33 |
+
episodes_seen: int = 0
|
| 34 |
+
|
| 35 |
+
def select_profile(
|
| 36 |
+
self,
|
| 37 |
+
*,
|
| 38 |
+
seed: int,
|
| 39 |
+
split: str = "train",
|
| 40 |
+
requested_difficulty: int = 0,
|
| 41 |
+
) -> dict[str, Any]:
|
| 42 |
+
difficulty = self._difficulty_for_split(split, requested_difficulty)
|
| 43 |
+
target = self._target_for_seed(seed, split)
|
| 44 |
+
if self.failures_by_target:
|
| 45 |
+
target = max(
|
| 46 |
+
WEAKNESS_TARGETS,
|
| 47 |
+
key=lambda item: (self.failures_by_target.get(item, 0), -WEAKNESS_TARGETS.index(item)),
|
| 48 |
+
)
|
| 49 |
+
return {
|
| 50 |
+
"difficulty": difficulty,
|
| 51 |
+
"difficulty_tier": DIFFICULTY_TIERS[min(difficulty, len(DIFFICULTY_TIERS) - 1)],
|
| 52 |
+
"target_weakness": target,
|
| 53 |
+
"split": split,
|
| 54 |
+
"episodes_seen": self.episodes_seen,
|
| 55 |
+
"recent_reward_mean": self._recent_reward_mean(),
|
| 56 |
+
"mastery": self.mastery_snapshot(),
|
| 57 |
+
}
|
| 58 |
+
|
| 59 |
+
def record_episode(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
|
| 60 |
+
target = state.target_weakness or "same_role_cross_object"
|
| 61 |
+
success = bool(state.success)
|
| 62 |
+
self.episodes_seen += 1
|
| 63 |
+
self.outcomes_by_target[target].append(success)
|
| 64 |
+
if not success:
|
| 65 |
+
self.failures_by_target[target] += 1
|
| 66 |
+
self.reward_trend.append(float(state.last_reward or 0.0))
|
| 67 |
+
return self.mastery_snapshot()
|
| 68 |
+
|
| 69 |
+
def mastery_snapshot(self) -> dict[str, Any]:
|
| 70 |
+
target_mastery = {}
|
| 71 |
+
for target in WEAKNESS_TARGETS:
|
| 72 |
+
outcomes = self.outcomes_by_target.get(target, [])
|
| 73 |
+
target_mastery[target] = {
|
| 74 |
+
"episodes": len(outcomes),
|
| 75 |
+
"success_rate": sum(1 for item in outcomes if item) / max(1, len(outcomes)),
|
| 76 |
+
"failures": self.failures_by_target.get(target, 0),
|
| 77 |
+
}
|
| 78 |
+
return {
|
| 79 |
+
"episodes_seen": self.episodes_seen,
|
| 80 |
+
"recent_reward_mean": self._recent_reward_mean(),
|
| 81 |
+
"target_mastery": target_mastery,
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
def _difficulty_for_split(self, split: str, requested_difficulty: int) -> int:
|
| 85 |
+
difficulty = max(0, min(int(requested_difficulty), len(DIFFICULTY_TIERS) - 1))
|
| 86 |
+
if split == "hidden_eval":
|
| 87 |
+
return max(3, difficulty)
|
| 88 |
+
if self.episodes_seen >= self.window_size and self._recent_reward_mean() > 10.0:
|
| 89 |
+
return min(difficulty + 1, len(DIFFICULTY_TIERS) - 1)
|
| 90 |
+
return difficulty
|
| 91 |
+
|
| 92 |
+
def _target_for_seed(self, seed: int, split: str) -> str:
|
| 93 |
+
offset = 2 if split == "hidden_eval" else 0
|
| 94 |
+
return WEAKNESS_TARGETS[(int(seed) + offset) % len(WEAKNESS_TARGETS)]
|
| 95 |
+
|
| 96 |
+
def _recent_reward_mean(self) -> float:
|
| 97 |
+
if not self.reward_trend:
|
| 98 |
+
return 0.0
|
| 99 |
+
return sum(self.reward_trend) / len(self.reward_trend)
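`CurriculumController.select_profile` overrides the seed-based target as soon as any failure has been recorded, choosing the target with the most failures and breaking ties in favor of the one listed first in `WEAKNESS_TARGETS`. The sketch below isolates that selection rule. `pick_target` is a hypothetical helper, not a method from the diff.

```python
WEAKNESS_TARGETS = (
    "same_role_cross_object",
    "cross_tenant_boundary",
    "public_route_overlock",
    "alternate_route_same_service",
    "visible_test_edge_case",
)


def pick_target(failures_by_target, seed_target):
    """Prefer the most-failed target; fall back to the seed-based one."""
    if not failures_by_target:
        return seed_target
    return max(
        WEAKNESS_TARGETS,
        # Most failures wins; on a tie, the smaller index (earlier entry) wins.
        key=lambda item: (failures_by_target.get(item, 0), -WEAKNESS_TARGETS.index(item)),
    )


print(pick_target({}, "visible_test_edge_case"))
print(pick_target({"public_route_overlock": 2, "cross_tenant_boundary": 2}, "same_role_cross_object"))
```

In the code shown above, failure counts only ever increase, so once one failure is logged the seed-based rotation no longer determines the target. Whether that is intended is not stated in the diff.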
|
server/episode_logger.py
ADDED
|
@@ -0,0 +1,66 @@
|
| 1 |
+
"""Episode artifact logging for training, debugging, and demos."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import json
|
| 6 |
+
import os
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from typing import Any
|
| 9 |
+
|
| 10 |
+
try:
|
| 11 |
+
from ..models import CyberSecurityOWASPState
|
| 12 |
+
except ImportError: # pragma: no cover
|
| 13 |
+
from models import CyberSecurityOWASPState
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
class EpisodeArtifactLogger:
|
| 17 |
+
"""Appends compact JSONL episode transcripts under outputs/rollouts."""
|
| 18 |
+
|
| 19 |
+
def __init__(self, output_path: str | Path | None = None):
|
| 20 |
+
configured = output_path or os.getenv("CYBERSECURITY_OWASP_EPISODE_LOG")
|
| 21 |
+
self.output_path = Path(configured) if configured else Path("outputs/rollouts/episodes.jsonl")
|
| 22 |
+
|
| 23 |
+
def log_episode(
|
| 24 |
+
self,
|
| 25 |
+
state: CyberSecurityOWASPState,
|
| 26 |
+
*,
|
| 27 |
+
final_observation: dict[str, Any] | None = None,
|
| 28 |
+
) -> Path:
|
| 29 |
+
self.output_path.parent.mkdir(parents=True, exist_ok=True)
|
| 30 |
+
record = {
|
| 31 |
+
"episode_id": state.episode_id,
|
| 32 |
+
"task_id": state.task_id,
|
| 33 |
+
"seed": state.seed,
|
| 34 |
+
"split": state.split,
|
| 35 |
+
"difficulty": state.difficulty,
|
| 36 |
+
"difficulty_tier": state.difficulty_tier,
|
| 37 |
+
"template_id": state.template_id,
|
| 38 |
+
"scenario_family": state.scenario_family,
|
| 39 |
+
"domain": state.domain,
|
| 40 |
+
"bug_family": state.bug_family,
|
| 41 |
+
"target_weakness": state.target_weakness,
|
| 42 |
+
"agent_actions": state.action_history,
|
| 43 |
+
"observations": state.observation_history,
|
| 44 |
+
"final_observation": final_observation or {},
|
| 45 |
+
"patch_diff": state.patch_diff,
|
| 46 |
+
"visible_test_result": self._verifier_layer(state, "visible"),
|
| 47 |
+
"hidden_test_result": self._verifier_layer(state, "hidden_tests"),
|
| 48 |
+
"oracle_result": self._verifier_layer(state, "oracle_matrix"),
|
| 49 |
+
"regression_result": self._verifier_layer(state, "regression"),
|
| 50 |
+
"reward_breakdown": state.reward_history[-1] if state.reward_history else {},
|
| 51 |
+
"reward_breakdown_by_step": state.reward_history,
|
| 52 |
+
"final_status": "resolved" if state.success else "failed",
|
| 53 |
+
"failure_reason": state.failure_reason,
|
| 54 |
+
"safety_violations": [
|
| 55 |
+
flag for flag in state.anti_cheat_flags if "network" in flag or "unsafe" in flag
|
| 56 |
+
],
|
| 57 |
+
"anti_cheat_flags": state.anti_cheat_flags,
|
| 58 |
+
"metrics": state.metrics,
|
| 59 |
+
}
|
| 60 |
+
with self.output_path.open("a", encoding="utf-8") as handle:
|
| 61 |
+
handle.write(json.dumps(record, sort_keys=True) + "\n")
|
| 62 |
+
state.episode_artifact_path = str(self.output_path)
|
| 63 |
+
return self.output_path
|
| 64 |
+
|
| 65 |
+
def _verifier_layer(self, state: CyberSecurityOWASPState, key: str) -> Any:
|
| 66 |
+
return (state.verification_summary or {}).get(key)
|
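Because `log_episode` appends one JSON object per line, the resulting file can be read back with the standard library alone. The sketch below assumes only the `final_status` field written above; the helper names `load_episodes` and `resolved_rate` are hypothetical and not part of this commit, and the demo uses two hand-written records rather than real rollouts.

```python
import json
import tempfile
from pathlib import Path


def load_episodes(path: Path) -> list[dict]:
    # One JSON object per line, matching the json.dumps(...) + "\n" writes above.
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def resolved_rate(episodes: list[dict]) -> float:
    # Fraction of episodes whose final_status is "resolved".
    if not episodes:
        return 0.0
    return sum(ep["final_status"] == "resolved" for ep in episodes) / len(episodes)


# Demo with two hand-written records instead of real rollouts.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "episodes.jsonl"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"episode_id": "a", "final_status": "resolved"}) + "\n")
        handle.write(json.dumps({"episode_id": "b", "final_status": "failed"}) + "\n")
    episodes = load_episodes(log_path)
    rate = resolved_rate(episodes)
```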
server/reward_engine.py
CHANGED
|
@@ -5,45 +5,24 @@ from __future__ import annotations
|
|
| 5 |
try:
|
| 6 |
from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 7 |
from ..rewards import compute_reward
|
| 8 |
-
from .
|
| 9 |
-
patch_quality,
|
| 10 |
-
run_hidden_regression_tests,
|
| 11 |
-
run_hidden_security_tests,
|
| 12 |
-
run_public_route_tests,
|
| 13 |
-
run_visible_tests,
|
| 14 |
-
verify_finding,
|
| 15 |
-
)
|
| 16 |
except ImportError: # pragma: no cover
|
| 17 |
from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 18 |
from rewards import compute_reward
|
| 19 |
-
from
|
| 20 |
-
patch_quality,
|
| 21 |
-
run_hidden_regression_tests,
|
| 22 |
-
run_hidden_security_tests,
|
| 23 |
-
run_public_route_tests,
|
| 24 |
-
run_visible_tests,
|
| 25 |
-
verify_finding,
|
| 26 |
-
)
|
| 27 |
|
| 28 |
|
| 29 |
def evaluate_action(
|
| 30 |
state: CyberSecurityOWASPState,
|
| 31 |
action: CyberSecurityOWASPAction,
|
| 32 |
anti_cheat_flags: list[str] | None = None,
|
| 33 |
) -> tuple[dict, dict[str, float]]:
|
| 34 |
-
verifier_result
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
verifier_result.update(
|
| 41 |
-
{
|
| 42 |
-
"visible": run_visible_tests(state),
|
| 43 |
-
"security": run_hidden_security_tests(state),
|
| 44 |
-
"regression": run_hidden_regression_tests(state),
|
| 45 |
-
"public_routes": run_public_route_tests(state),
|
| 46 |
-
"patch_quality": patch_quality(state),
|
| 47 |
-
}
|
| 48 |
-
)
|
| 49 |
return verifier_result, compute_reward(state, action, verifier_result)
|
|
|
|
| 5 |
try:
|
| 6 |
from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 7 |
from ..rewards import compute_reward
|
| 8 |
+
from .verifier import MultiLayerVerifier
|
| 9 |
except ImportError: # pragma: no cover
|
| 10 |
from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 11 |
from rewards import compute_reward
|
| 12 |
+
from server.verifier import MultiLayerVerifier
|
| 13 |
|
| 14 |
|
| 15 |
def evaluate_action(
|
| 16 |
state: CyberSecurityOWASPState,
|
| 17 |
action: CyberSecurityOWASPAction,
|
| 18 |
anti_cheat_flags: list[str] | None = None,
|
| 19 |
+
*,
|
| 20 |
+
invalid_action: bool = False,
|
| 21 |
) -> tuple[dict, dict[str, float]]:
|
| 22 |
+
verifier_result = MultiLayerVerifier().evaluate_action(
|
| 23 |
+
state,
|
| 24 |
+
action,
|
| 25 |
+
anti_cheat_flags,
|
| 26 |
+
invalid_action=invalid_action,
|
| 27 |
+
)
|
| 28 |
return verifier_result, compute_reward(state, action, verifier_result)
|
server/scenario_factory.py
ADDED
|
@@ -0,0 +1,134 @@
|
| 1 |
+
"""Closed-loop scenario factory for CyberSecurity_OWASP."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import os
|
| 6 |
+
import tempfile
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from typing import Any
|
| 9 |
+
from uuid import uuid4
|
| 10 |
+
|
| 11 |
+
try:
|
| 12 |
+
from ..fixture_generator import visible_workspace_summary
|
| 13 |
+
from ..policy_graph import build_invoice_policy
|
| 14 |
+
from ..template_renderer import render_fastapi_basic
|
| 15 |
+
from .adversarial_designer import BoundedAdversarialDesigner
|
| 16 |
+
except ImportError: # pragma: no cover
|
| 17 |
+
from fixture_generator import visible_workspace_summary
|
| 18 |
+
from policy_graph import build_invoice_policy
|
| 19 |
+
from template_renderer import render_fastapi_basic
|
| 20 |
+
from server.adversarial_designer import BoundedAdversarialDesigner
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def _make_workspace(prefix: str) -> Path:
|
| 24 |
+
root = Path(os.getenv("CYBERSECURITY_OWASP_WORKSPACE_ROOT", tempfile.gettempdir()))
|
| 25 |
+
root.mkdir(parents=True, exist_ok=True)
|
| 26 |
+
for _ in range(100):
|
| 27 |
+
workspace = root / f"{prefix}{uuid4().hex[:12]}"
|
| 28 |
+
try:
|
| 29 |
+
workspace.mkdir()
|
| 30 |
+
except FileExistsError:
|
| 31 |
+
continue
|
| 32 |
+
return workspace
|
| 33 |
+
raise RuntimeError("Unable to create isolated scenario workspace")
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
def _visible_policy_hint(public_hint: dict[str, Any]) -> dict[str, Any]:
|
| 37 |
+
"""Return partial policy observability without hidden oracle/test labels."""
|
| 38 |
+
|
| 39 |
+
return {
|
| 40 |
+
"domain": public_hint.get("domain", "invoices"),
|
| 41 |
+
"policy_rules": list(public_hint.get("policy_rules", [])),
|
| 42 |
+
"fixture_aliases": {
|
| 43 |
+
"users": dict(public_hint.get("users", {})),
|
| 44 |
+
"resources": dict(public_hint.get("resources", {})),
|
| 45 |
+
},
|
| 46 |
+
"public_routes": list(public_hint.get("public_routes", [])),
|
| 47 |
+
"observation_contract": {
|
| 48 |
+
"visible": [
|
| 49 |
+
"product policy summary",
|
| 50 |
+
"fixture aliases needed for local requests",
|
| 51 |
+
"route summaries",
|
| 52 |
+
"visible test results",
|
| 53 |
+
],
|
| 54 |
+
"hidden": [
|
| 55 |
+
"oracle matrix",
|
| 56 |
+
"hidden invariant tests",
|
| 57 |
+
"injected bug label",
|
| 58 |
+
"held-out family label",
|
| 59 |
+
],
|
| 60 |
+
},
|
| 61 |
+
}
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
class ScenarioFactory:
|
| 65 |
+
"""Compiles deterministic local app scenarios from curriculum profiles."""
|
| 66 |
+
|
| 67 |
+
def __init__(self, designer: BoundedAdversarialDesigner | None = None):
|
| 68 |
+
self.designer = designer or BoundedAdversarialDesigner()
|
| 69 |
+
|
| 70 |
+
def compile_scenario(
|
| 71 |
+
self,
|
| 72 |
+
seed: int,
|
| 73 |
+
*,
|
| 74 |
+
split: str = "train",
|
| 75 |
+
difficulty: int = 0,
|
| 76 |
+
curriculum_profile: dict[str, Any] | None = None,
|
| 77 |
+
) -> dict[str, Any]:
|
| 78 |
+
profile = curriculum_profile or {
|
| 79 |
+
"difficulty": difficulty,
|
| 80 |
+
"difficulty_tier": "warmup",
|
| 81 |
+
"target_weakness": "same_role_cross_object",
|
| 82 |
+
}
|
| 83 |
+
adversarial_spec = self.designer.design(
|
| 84 |
+
seed=seed, split=split, curriculum_profile=profile
|
| 85 |
+
)
|
| 86 |
+
compiled = build_invoice_policy(seed)
|
| 87 |
+
workspace = _make_workspace(prefix=f"cybersecurity_owasp_{split}_{seed}_")
|
| 88 |
+
public_hint = _visible_policy_hint(compiled.public_hint)
|
| 89 |
+
editable_files = render_fastapi_basic(workspace, public_hint, compiled.hidden_facts)
|
| 90 |
+
workspace_summary = visible_workspace_summary(editable_files, public_hint)
|
| 91 |
+
workspace_summary.update(
|
| 92 |
+
{
|
| 93 |
+
"template_id": adversarial_spec["template_id"],
|
| 94 |
+
"target_weakness": adversarial_spec["target_weakness"],
|
| 95 |
+
}
|
| 96 |
+
)
|
| 97 |
+
|
| 98 |
+
hidden = dict(compiled.hidden_facts)
|
| 99 |
+
hidden.update(
|
| 100 |
+
{
|
| 101 |
+
"workspace": str(workspace),
|
| 102 |
+
"editable_files": editable_files,
|
| 103 |
+
"initial_file_hashes": {
|
| 104 |
+
path: (workspace / path).read_text(encoding="utf-8")
|
| 105 |
+
for path in editable_files
|
| 106 |
+
},
|
| 107 |
+
"adversarial_spec": adversarial_spec,
|
| 108 |
+
"scenario_family": adversarial_spec["scenario_family"],
|
| 109 |
+
"template_id": adversarial_spec["template_id"],
|
| 110 |
+
"target_weakness": adversarial_spec["target_weakness"],
|
| 111 |
+
"oracle_hidden_focus": adversarial_spec["hidden_focus"],
|
| 112 |
+
}
|
| 113 |
+
)
|
| 114 |
+
|
| 115 |
+
return {
|
| 116 |
+
"task_id": f"{split}-invoices-bola-{seed}",
|
| 117 |
+
"workspace": workspace,
|
| 118 |
+
"domain": adversarial_spec["domain"],
|
| 119 |
+
"bug_family": adversarial_spec["bug_family"],
|
| 120 |
+
"scenario_family": adversarial_spec["scenario_family"],
|
| 121 |
+
"template_id": adversarial_spec["template_id"],
|
| 122 |
+
"target_weakness": adversarial_spec["target_weakness"],
|
| 123 |
+
"difficulty": int(profile.get("difficulty", difficulty)),
|
| 124 |
+
"difficulty_tier": str(profile.get("difficulty_tier", "warmup")),
|
| 125 |
+
"curriculum_snapshot": profile,
|
| 126 |
+
"task_brief": (
|
| 127 |
+
"Inspect the generated invoices app and policy. Find the broken "
|
| 128 |
+
"authorization behavior, submit a finding with local evidence, patch "
|
| 129 |
+
"the app, preserve intended owner/admin/public behavior, then submit."
|
| 130 |
+
),
|
| 131 |
+
"public_hint": public_hint,
|
| 132 |
+
"workspace_summary": workspace_summary,
|
| 133 |
+
"hidden_facts": hidden,
|
| 134 |
+
}
|
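Note that `initial_file_hashes` in `compile_scenario` stores each editable file's full text via `read_text`, not a digest. That may be deliberate, since the text allows diffs to be computed later. If a compact fingerprint is wanted instead, one option is sketched below. It assumes `editable_files` is a list of paths relative to the workspace (the return type of `render_fastapi_basic` is not shown in this diff), and the helper name `snapshot_digests` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_digests(workspace: Path, editable_files: list[str]) -> dict[str, str]:
    # Map each relative path to the SHA-256 hex digest of its bytes.
    return {
        path: hashlib.sha256((workspace / path).read_bytes()).hexdigest()
        for path in editable_files
    }


# Demo: the digest changes once the file is patched.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "app.py").write_text("print('hello')\n", encoding="utf-8")
    before = snapshot_digests(workspace, ["app.py"])
    (workspace / "app.py").write_text("print('patched')\n", encoding="utf-8")
    after = snapshot_digests(workspace, ["app.py"])
```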
server/verifier.py
ADDED
|
@@ -0,0 +1,81 @@
|
| 1 |
+
"""Multi-layer deterministic verifier for CyberSecurity_OWASP."""
|
| 2 |
+
|
| 3 |
+
from __future__ import annotations
|
| 4 |
+
|
| 5 |
+
import json
|
| 6 |
+
from dataclasses import dataclass
|
| 7 |
+
from typing import Any
|
| 8 |
+
|
| 9 |
+
try:
|
| 10 |
+
from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 11 |
+
from ..validators import (
|
| 12 |
+
patch_quality,
|
| 13 |
+
run_hidden_regression_tests,
|
| 14 |
+
run_hidden_security_tests,
|
| 15 |
+
run_public_route_tests,
|
| 16 |
+
run_visible_tests,
|
| 17 |
+
verify_finding,
|
| 18 |
+
)
|
| 19 |
+
from .authz_oracle import AuthzOracle
|
| 20 |
+
except ImportError: # pragma: no cover
|
| 21 |
+
from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
|
| 22 |
+
from validators import (
|
| 23 |
+
patch_quality,
|
| 24 |
+
run_hidden_regression_tests,
|
| 25 |
+
run_hidden_security_tests,
|
| 26 |
+
run_public_route_tests,
|
| 27 |
+
run_visible_tests,
|
| 28 |
+
verify_finding,
|
| 29 |
+
)
|
| 30 |
+
from server.authz_oracle import AuthzOracle
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
@dataclass
|
| 34 |
+
class MultiLayerVerifier:
|
| 35 |
+
"""Aggregates visible, hidden, oracle, regression, and patch-quality checks."""
|
| 36 |
+
|
| 37 |
+
oracle: AuthzOracle = AuthzOracle()
|
| 38 |
+
|
| 39 |
+
def evaluate_action(
|
| 40 |
+
self,
|
| 41 |
+
state: CyberSecurityOWASPState,
|
| 42 |
+
action: CyberSecurityOWASPAction,
|
| 43 |
+
anti_cheat_flags: list[str] | None = None,
|
| 44 |
+
*,
|
| 45 |
+
invalid_action: bool = False,
|
| 46 |
+
) -> dict[str, Any]:
|
| 47 |
+
verifier_result: dict[str, Any] = {
|
| 48 |
+
"anti_cheat_flags": anti_cheat_flags or [],
|
| 49 |
+
"invalid_action": invalid_action,
|
| 50 |
+
"repeated_action": self._is_repeated_action(state, action),
|
| 51 |
+
}
|
| 52 |
+
if action.tool_name == "submit_finding":
|
| 53 |
+
verifier_result["finding"] = verify_finding(state, action.arguments)
|
| 54 |
+
elif action.tool_name == "run_visible_tests":
|
| 55 |
+
verifier_result["visible"] = run_visible_tests(state)
|
| 56 |
+
elif action.tool_name == "submit_fix":
|
| 57 |
+
verifier_result.update(self.run_terminal_checks(state))
|
| 58 |
+
return verifier_result
|
| 59 |
+
|
| 60 |
+
def run_terminal_checks(self, state: CyberSecurityOWASPState) -> dict[str, Any]:
|
| 61 |
+
security = run_hidden_security_tests(state)
|
| 62 |
+
return {
|
| 63 |
+
"visible": run_visible_tests(state),
|
| 64 |
+
"hidden_tests": security,
|
| 65 |
+
"security": security,
|
| 66 |
+
"oracle_matrix": self.oracle.evaluate(state),
|
| 67 |
+
"regression": run_hidden_regression_tests(state),
|
| 68 |
+
"public_routes": run_public_route_tests(state),
|
| 69 |
+
"patch_quality": patch_quality(state),
|
| 70 |
+
}
|
| 71 |
+
|
| 72 |
+
def public_summary(self, verifier_result: dict[str, Any]) -> dict[str, Any]:
|
| 73 |
+
"""Return verifier fields that are safe for state/debug summaries."""
|
| 74 |
+
|
| 75 |
+
return json.loads(json.dumps(verifier_result))
|
| 76 |
+
|
| 77 |
+
def _is_repeated_action(
|
| 78 |
+
self, state: CyberSecurityOWASPState, action: CyberSecurityOWASPAction
|
| 79 |
+
) -> bool:
|
| 80 |
+
current = {"tool_name": action.tool_name, "arguments": action.arguments}
|
| 81 |
+
return sum(1 for item in state.action_history if item == current) > 1
|
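The `> 1` threshold in `_is_repeated_action` is easiest to read with a small standalone version. This sketch assumes two things that the diff does not show: that entries in `state.action_history` are plain dicts with exactly the keys `tool_name` and `arguments`, and that the environment appends the current action to the history before the verifier runs.

```python
def is_repeated_action(action_history: list[dict], tool_name: str, arguments: dict) -> bool:
    current = {"tool_name": tool_name, "arguments": arguments}
    # "> 1" flags a repeat only if the current action is already in the history.
    return sum(1 for item in action_history if item == current) > 1


# First call: the action appears once in the history, so it is not a repeat.
history = [{"tool_name": "run_visible_tests", "arguments": {}}]
first_call = is_repeated_action(history, "run_visible_tests", {})

# Second identical call: two matching entries, so it counts as a repeat.
history.append({"tool_name": "run_visible_tests", "arguments": {}})
second_call = is_repeated_action(history, "run_visible_tests", {})
```

If the history were instead updated after verification, the same check would need `>= 1` to flag the first repeat.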
tests/test_web_interface.py
ADDED
|
@@ -0,0 +1,47 @@
|
| 1 |
+
from fastapi.testclient import TestClient
|
| 2 |
+
|
| 3 |
+
from server.app import app
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def test_space_root_redirects_to_openenv_web_ui():
|
| 7 |
+
client = TestClient(app)
|
| 8 |
+
|
| 9 |
+
response = client.get("/", follow_redirects=False)
|
| 10 |
+
|
| 11 |
+
assert response.status_code == 307
|
| 12 |
+
assert response.headers["location"] == "/web/"
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def test_openenv_web_ui_and_api_routes_are_available():
|
| 16 |
+
client = TestClient(app)
|
| 17 |
+
|
| 18 |
+
web_response = client.get("/web/")
|
| 19 |
+
health_response = client.get("/health")
|
| 20 |
+
state_response = client.get("/web/state")
|
| 21 |
+
|
| 22 |
+
assert web_response.status_code == 200
|
| 23 |
+
assert "text/html" in web_response.headers["content-type"]
|
| 24 |
+
assert "Reset" in web_response.text
|
| 25 |
+
assert "Step" in web_response.text
|
| 26 |
+
assert "Get state" in web_response.text
|
| 27 |
+
|
| 28 |
+
assert health_response.status_code == 200
|
| 29 |
+
assert health_response.json() == {"status": "healthy"}
|
| 30 |
+
|
| 31 |
+
assert state_response.status_code == 200
|
| 32 |
+
state = state_response.json()
|
| 33 |
+
assert "episode_id" in state
|
| 34 |
+
assert "step_count" in state
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def test_web_reset_returns_cybersecurity_observation():
|
| 38 |
+
client = TestClient(app)
|
| 39 |
+
|
| 40 |
+
response = client.post("/web/reset")
|
| 41 |
+
|
| 42 |
+
assert response.status_code == 200
|
| 43 |
+
payload = response.json()
|
| 44 |
+
observation = payload["observation"]
|
| 45 |
+
assert observation["phase"] == "discover"
|
| 46 |
+
assert "authorization" in observation["task_brief"]
|
| 47 |
+
assert "inspect_policy_graph" in observation["available_actions"]
|
training/eval_before_after.py
CHANGED
|
@@ -5,6 +5,8 @@ from __future__ import annotations
|
|
| 5 |
import json
|
| 6 |
from pathlib import Path
|
| 7 |
|
|
|
|
|
|
|
| 8 |
|
| 9 |
def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict]) -> dict:
|
| 10 |
def mean(items: list[dict], key: str) -> float:
|
|
@@ -19,11 +21,27 @@ def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict
|
|
| 19 |
"absolute_reward_improvement": mean(trained, "reward_total") - mean(baseline, "reward_total"),
|
| 20 |
"heldout_success_rate": mean(heldout, "success"),
|
| 21 |
"heldout_mean_reward": mean(heldout, "reward_total"),
|
| 22 |
}
|
| 23 |
|
| 24 |
|
| 25 |
-
def save_eval_summary(
|
| 26 |
output = Path("outputs/evals") / f"{run_name}_eval_summary.json"
|
| 27 |
output.parent.mkdir(parents=True, exist_ok=True)
|
| 28 |
output.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
|
|
|
|
|
|
|
| 29 |
return output
|
|
|
|
| 5 |
import json
|
| 6 |
from pathlib import Path
|
| 7 |
|
| 8 |
+
from training.trackio_utils import log_eval_summary
|
| 9 |
+
|
| 10 |
|
| 11 |
def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict]) -> dict:
|
| 12 |
def mean(items: list[dict], key: str) -> float:
|
|
|
|
| 21 |
"absolute_reward_improvement": mean(trained, "reward_total") - mean(baseline, "reward_total"),
|
| 22 |
"heldout_success_rate": mean(heldout, "success"),
|
| 23 |
"heldout_mean_reward": mean(heldout, "reward_total"),
|
| 24 |
+
"exploit_block_rate": mean(trained, "exploit_blocked"),
|
| 25 |
+
"regression_preservation_rate": mean(trained, "regression_preserved"),
|
| 26 |
+
"public_route_preservation_rate": mean(trained, "public_routes_preserved"),
|
| 27 |
+
"anti_cheat_pass_rate": mean(trained, "anti_cheat_pass"),
|
| 28 |
+
"invalid_action_rate": mean(trained, "invalid_action_rate"),
|
| 29 |
+
"timeout_rate": mean(trained, "timeout"),
|
| 30 |
+
"safety_violation_rate": mean(trained, "safety_violation"),
|
| 31 |
+
"mean_episode_length": mean(trained, "episode_length"),
|
| 32 |
}
|
| 33 |
|
| 34 |
|
| 35 |
+
def save_eval_summary(
|
| 36 |
+
run_name: str,
|
| 37 |
+
summary: dict,
|
| 38 |
+
*,
|
| 39 |
+
track: bool = True,
|
| 40 |
+
trackio_config: dict | None = None,
|
| 41 |
+
) -> Path:
|
| 42 |
output = Path("outputs/evals") / f"{run_name}_eval_summary.json"
|
| 43 |
output.parent.mkdir(parents=True, exist_ok=True)
|
| 44 |
output.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
|
| 45 |
+
if track:
|
| 46 |
+
log_eval_summary(run_name, summary, config=trackio_config)
|
| 47 |
return output
|
training/trackio_utils.py
CHANGED
|
@@ -2,7 +2,12 @@
|
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
from datetime import datetime
|
|
|
|
|
|
|
| 6 |
|
| 7 |
|
| 8 |
TRAIN_METRICS = [
|
|
@@ -34,7 +39,133 @@ TRAIN_METRICS = [
|
|
| 34 |
]
|
| 35 |
|
| 36 |
|
| 37 |
def build_run_name(model: str, algo: str, difficulty: int, git_sha: str = "nogit") -> str:
|
| 38 |
-
stamp = datetime.utcnow().strftime("%Y%m%d-%H%M")
|
| 39 |
model_slug = model.replace("/", "-")
|
| 40 |
return f"CyberSecurity_OWASP-{model_slug}-{algo}-level{difficulty}-{stamp}-{git_sha[:8]}"
|
| 2 |
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
+
import os
|
| 6 |
+
import subprocess
|
| 7 |
+
from contextlib import contextmanager
|
| 8 |
from datetime import datetime
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
from typing import Any, Iterator
|
| 11 |
|
| 12 |
|
| 13 |
TRAIN_METRICS = [
|
|
|
|
| 39 |
]
|
| 40 |
|
| 41 |
|
| 42 |
+
EVAL_METRICS = [
|
| 43 |
+
"eval/baseline_success_rate",
|
| 44 |
+
"eval/trained_success_rate",
|
| 45 |
+
"eval/absolute_success_improvement",
|
| 46 |
+
"eval/baseline_mean_reward",
|
| 47 |
+
"eval/trained_mean_reward",
|
| 48 |
+
"eval/absolute_reward_improvement",
|
| 49 |
+
"eval/heldout_success_rate",
|
| 50 |
+
"eval/heldout_mean_reward",
|
| 51 |
+
"eval/exploit_block_rate",
|
| 52 |
+
"eval/regression_preservation_rate",
|
| 53 |
+
"eval/public_route_preservation_rate",
|
| 54 |
+
"eval/anti_cheat_pass_rate",
|
| 55 |
+
"eval/invalid_action_rate",
|
| 56 |
+
"eval/timeout_rate",
|
| 57 |
+
"eval/safety_violation_rate",
|
| 58 |
+
"eval/mean_episode_length",
|
| 59 |
+
]
|
| 60 |
+
|
| 61 |
+
|
| 62 |
def build_run_name(model: str, algo: str, difficulty: int, git_sha: str = "nogit") -> str:
|
| 63 |
+
stamp = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
|
| 64 |
model_slug = model.replace("/", "-")
|
| 65 |
return f"CyberSecurity_OWASP-{model_slug}-{algo}-level{difficulty}-{stamp}-{git_sha[:8]}"
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
def get_git_sha(default: str = "nogit") -> str:
|
| 69 |
+
try:
|
| 70 |
+
result = subprocess.run(
|
| 71 |
+
["git", "rev-parse", "HEAD"],
|
| 72 |
+
check=True,
|
| 73 |
+
capture_output=True,
|
| 74 |
+
text=True,
|
| 75 |
+
)
|
| 76 |
+
except Exception:
|
| 77 |
+
return default
|
| 78 |
+
return result.stdout.strip() or default
|
| 79 |
+
|
| 80 |
+
|
| 81 |
+
def _load_trackio():
|
| 82 |
+
os.environ.setdefault("TRACKIO_DIR", str((Path.cwd() / "outputs" / "trackio").resolve()))
|
| 83 |
+
try:
|
| 84 |
+
import trackio
|
| 85 |
+
except ImportError as exc:
|
| 86 |
+
raise RuntimeError(
|
| 87 |
+
"Trackio is required for CyberSecurity_OWASP runs. Install dependencies "
|
| 88 |
+
"with `uv sync` and set TRACKIO_SPACE_ID when you want remote HF Spaces tracking."
|
| 89 |
+
) from exc
|
| 90 |
+
return trackio
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
def init_trackio_run(
|
| 94 |
+
*,
|
| 95 |
+
run_name: str,
|
| 96 |
+
run_type: str,
|
| 97 |
+
config: dict[str, Any] | None = None,
|
| 98 |
+
project: str | None = None,
|
| 99 |
+
space_id: str | None = None,
|
| 100 |
+
group: str | None = None,
|
| 101 |
+
):
|
| 102 |
+
trackio = _load_trackio()
|
| 103 |
+
project = project or os.getenv("TRACKIO_PROJECT", "CyberSecurity_OWASP")
|
| 104 |
+
space_id = space_id if space_id is not None else os.getenv("TRACKIO_SPACE_ID", "")
|
| 105 |
+
run_config = {
|
| 106 |
+
"environment": "CyberSecurity_OWASP",
|
| 107 |
+
"run_type": run_type,
|
| 108 |
+
**(config or {}),
|
| 109 |
+
}
|
| 110 |
+
kwargs: dict[str, Any] = {
|
| 111 |
+
"project": project,
|
| 112 |
+
"name": run_name,
|
| 113 |
+
"config": run_config,
|
| 114 |
+
}
|
| 115 |
+
if space_id:
|
| 116 |
+
kwargs["space_id"] = space_id
|
| 117 |
+
if group:
|
| 118 |
+
kwargs["group"] = group
|
| 119 |
+
return trackio.init(**kwargs)
|
| 120 |
+
|
| 121 |
+
|
| 122 |
+
def log_trackio_metrics(metrics: dict[str, Any], step: int | None = None) -> None:
|
| 123 |
+
trackio = _load_trackio()
|
| 124 |
+
numeric = {
|
| 125 |
+
key: value
|
| 126 |
+
for key, value in metrics.items()
|
| 127 |
+
if isinstance(value, (int, float, bool))
|
| 128 |
+
}
|
| 129 |
+
if step is None:
|
| 130 |
+
trackio.log(numeric)
|
| 131 |
+
else:
|
| 132 |
+
trackio.log(numeric, step=step)
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
def finish_trackio_run() -> None:
|
| 136 |
+
trackio = _load_trackio()
|
| 137 |
+
trackio.finish()
|
| 138 |
+
|
| 139 |
+
|
| 140 |
+
@contextmanager
|
| 141 |
+
def trackio_run(
|
| 142 |
+
*,
|
| 143 |
+
run_name: str,
|
| 144 |
+
run_type: str,
|
| 145 |
+
config: dict[str, Any] | None = None,
|
| 146 |
+
project: str | None = None,
|
| 147 |
+
space_id: str | None = None,
|
| 148 |
+
group: str | None = None,
|
| 149 |
+
) -> Iterator[Any]:
|
| 150 |
+
run = init_trackio_run(
|
| 151 |
+
run_name=run_name,
|
| 152 |
+
run_type=run_type,
|
| 153 |
+
config=config,
|
| 154 |
+
project=project,
|
| 155 |
+
space_id=space_id,
|
| 156 |
+
group=group,
|
| 157 |
+
)
|
| 158 |
+
try:
|
| 159 |
+
yield run
|
| 160 |
+
finally:
|
| 161 |
+
finish_trackio_run()
|
| 162 |
+
|
| 163 |
+
|
| 164 |
+
def log_eval_summary(run_name: str, summary: dict[str, Any], config: dict[str, Any] | None = None) -> None:
|
| 165 |
+
metrics = {
|
| 166 |
+
f"eval/{key}": float(value)
|
| 167 |
+
for key, value in summary.items()
|
| 168 |
+
if isinstance(value, (int, float, bool))
|
| 169 |
+
}
|
| 170 |
+
with trackio_run(run_name=run_name, run_type="eval", config=config, group="eval"):
|
| 171 |
+
log_trackio_metrics(metrics, step=0)
|
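The filtering step in `log_eval_summary` can be checked without Trackio installed. The sketch below reproduces only the dictionary comprehension from the diff; the function name `to_eval_metrics` and the sample summary are hypothetical.

```python
from typing import Any


def to_eval_metrics(summary: dict[str, Any]) -> dict[str, float]:
    # Keep only numeric values and namespace them under "eval/".
    return {
        f"eval/{key}": float(value)
        for key, value in summary.items()
        if isinstance(value, (int, float, bool))
    }


metrics = to_eval_metrics(
    {"trained_success_rate": 0.75, "timeout": True, "run_name": "demo", "notes": None}
)
```

Booleans pass the `isinstance` check because `bool` is a subclass of `int`, so a flag such as `timeout` is logged as `1.0` or `0.0`, while strings and `None` are dropped silently.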
training/train_grpo.py
CHANGED
|
@@ -9,16 +9,26 @@ from __future__ import annotations
|
|
| 9 |
|
| 10 |
import os
|
| 11 |
|
|
|
|
|
|
|
| 12 |
|
| 13 |
def build_grpo_config():
|
| 14 |
from trl import GRPOConfig
|
| 15 |
|
|
|
|
|
|
|
| 16 |
output_dir = os.getenv("OUTPUT_DIR", "CyberSecurity_OWASP-qwen3-1.7b-grpo")
|
| 17 |
trackio_space_id = os.getenv("TRACKIO_SPACE_ID", output_dir)
|
| 18 |
return GRPOConfig(
|
| 19 |
output_dir=output_dir,
|
| 20 |
report_to="trackio",
|
| 21 |
trackio_space_id=trackio_space_id,
|
|
|
|
| 22 |
logging_steps=1,
|
| 23 |
save_steps=25,
|
| 24 |
learning_rate=5e-6,
|
|
|
|
| 9 |
|
| 10 |
import os
|
| 11 |
|
| 12 |
+
from training.trackio_utils import build_run_name, get_git_sha
|
| 13 |
+
|
| 14 |
|
| 15 |
def build_grpo_config():
|
| 16 |
from trl import GRPOConfig
|
| 17 |
|
| 18 |
+
model_name = os.getenv("MODEL_NAME", "Qwen/Qwen3-1.7B")
|
| 19 |
+
difficulty = int(os.getenv("DIFFICULTY", "0"))
|
| 20 |
output_dir = os.getenv("OUTPUT_DIR", "CyberSecurity_OWASP-qwen3-1.7b-grpo")
|
| 21 |
trackio_space_id = os.getenv("TRACKIO_SPACE_ID", output_dir)
|
| 22 |
+
os.environ.setdefault("TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo")
|
| 23 |
+
run_name = os.getenv(
|
| 24 |
+
"RUN_NAME",
|
| 25 |
+
build_run_name(model_name, "grpo", difficulty, git_sha=get_git_sha()),
|
| 26 |
+
)
|
| 27 |
return GRPOConfig(
|
| 28 |
output_dir=output_dir,
|
| 29 |
report_to="trackio",
|
| 30 |
trackio_space_id=trackio_space_id,
|
| 31 |
+
run_name=run_name,
|
| 32 |
logging_steps=1,
|
| 33 |
save_steps=25,
|
| 34 |
learning_rate=5e-6,
|
uv.lock
CHANGED
|
@@ -1283,6 +1283,49 @@ wheels = [
|
|
| 1283 |
{ url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
|
| 1284 |
]
|
| 1285 |
|
| 1286 |
[[package]]
|
| 1287 |
name = "httpx"
|
| 1288 |
version = "0.28.1"
|
|
@@ -2136,6 +2179,7 @@ version = "0.1.0"
|
|
| 2136 |
source = { editable = "." }
|
| 2137 |
dependencies = [
|
| 2138 |
{ name = "openenv-core", extra = ["core"] },
|
|
|
|
| 2139 |
]
|
| 2140 |
|
| 2141 |
[package.optional-dependencies]
|
|
@@ -2153,6 +2197,7 @@ requires-dist = [
|
|
| 2153 |
{ name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
|
| 2154 |
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
|
| 2155 |
{ name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
|
|
|
|
| 2156 |
]
|
| 2157 |
provides-extras = ["dev", "modal"]
|
| 2158 |
|
|
@@ -3411,6 +3456,26 @@ wheels = [
|
|
| 3411 |
{ url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
|
| 3412 |
]
|
| 3413 |
|
|
| 3414 |
[[package]]
|
| 3415 |
name = "typer"
|
| 3416 |
version = "0.24.2"
|
|
@@ -3506,6 +3571,61 @@ wheels = [
|
|
| 3506 |
{ url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
|
| 3507 |
]
|
| 3508 |
|
| 3509 |
[[package]]
|
| 3510 |
name = "watchfiles"
|
| 3511 |
version = "1.1.1"
|
|
|
|
| 1283 |
{ url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
|
| 1284 |
]
|
| 1285 |
|
| 1286 |
+
[[package]]
|
| 1287 |
+
name = "httptools"
|
| 1288 |
+
version = "0.7.1"
|
| 1289 |
+
source = { registry = "https://pypi.org/simple" }
|
| 1290 |
+
sdist = { url = "https://files.pythonhosted.org/packages/b5/46/120a669232c7bdedb9d52d4aeae7e6c7dfe151e99dc70802e2fc7a5e1993/httptools-0.7.1.tar.gz", hash = "sha256:abd72556974f8e7c74a259655924a717a2365b236c882c3f6f8a45fe94703ac9", size = 258961, upload-time = "2025-10-10T03:55:08.559Z" }
|
| 1291 |
+
wheels = [
|
| 1292 |
+
{ url = "https://files.pythonhosted.org/packages/c7/e5/c07e0bcf4ec8db8164e9f6738c048b2e66aabf30e7506f440c4cc6953f60/httptools-0.7.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:11d01b0ff1fe02c4c32d60af61a4d613b74fad069e47e06e9067758c01e9ac78", size = 204531, upload-time = "2025-10-10T03:54:20.887Z" },
|
| 1293 |
+
{ url = "https://files.pythonhosted.org/packages/7e/4f/35e3a63f863a659f92ffd92bef131f3e81cf849af26e6435b49bd9f6f751/httptools-0.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:84d86c1e5afdc479a6fdabf570be0d3eb791df0ae727e8dbc0259ed1249998d4", size = 109408, upload-time = "2025-10-10T03:54:22.455Z" },
|
| 1294 |
+
{ url = "https://files.pythonhosted.org/packages/f5/71/b0a9193641d9e2471ac541d3b1b869538a5fb6419d52fd2669fa9c79e4b8/httptools-0.7.1-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:c8c751014e13d88d2be5f5f14fc8b89612fcfa92a9cc480f2bc1598357a23a05", size = 440889, upload-time = "2025-10-10T03:54:23.753Z" },
|
| 1295 |
+
{ url = "https://files.pythonhosted.org/packages/eb/d9/2e34811397b76718750fea44658cb0205b84566e895192115252e008b152/httptools-0.7.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:654968cb6b6c77e37b832a9be3d3ecabb243bbe7a0b8f65fbc5b6b04c8fcabed", size = 440460, upload-time = "2025-10-10T03:54:25.313Z" },
|
| 1296 |
+
{ url = "https://files.pythonhosted.org/packages/01/3f/a04626ebeacc489866bb4d82362c0657b2262bef381d68310134be7f40bb/httptools-0.7.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:b580968316348b474b020edf3988eecd5d6eec4634ee6561e72ae3a2a0e00a8a", size = 425267, upload-time = "2025-10-10T03:54:26.81Z" },
|
| 1297 |
+
{ url = "https://files.pythonhosted.org/packages/a5/99/adcd4f66614db627b587627c8ad6f4c55f18881549bab10ecf180562e7b9/httptools-0.7.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:d496e2f5245319da9d764296e86c5bb6fcf0cf7a8806d3d000717a889c8c0b7b", size = 424429, upload-time = "2025-10-10T03:54:28.174Z" },
|
| 1298 |
+
{ url = "https://files.pythonhosted.org/packages/d5/72/ec8fc904a8fd30ba022dfa85f3bbc64c3c7cd75b669e24242c0658e22f3c/httptools-0.7.1-cp310-cp310-win_amd64.whl", hash = "sha256:cbf8317bfccf0fed3b5680c559d3459cccf1abe9039bfa159e62e391c7270568", size = 86173, upload-time = "2025-10-10T03:54:29.5Z" },
|
| 1299 |
+
{ url = "https://files.pythonhosted.org/packages/9c/08/17e07e8d89ab8f343c134616d72eebfe03798835058e2ab579dcc8353c06/httptools-0.7.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:474d3b7ab469fefcca3697a10d11a32ee2b9573250206ba1e50d5980910da657", size = 206521, upload-time = "2025-10-10T03:54:31.002Z" },
|
| 1300 |
+
{ url = "https://files.pythonhosted.org/packages/aa/06/c9c1b41ff52f16aee526fd10fbda99fa4787938aa776858ddc4a1ea825ec/httptools-0.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a3c3b7366bb6c7b96bd72d0dbe7f7d5eead261361f013be5f6d9590465ea1c70", size = 110375, upload-time = "2025-10-10T03:54:31.941Z" },
|
| 1301 |
+
{ url = "https://files.pythonhosted.org/packages/cc/cc/10935db22fda0ee34c76f047590ca0a8bd9de531406a3ccb10a90e12ea21/httptools-0.7.1-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:379b479408b8747f47f3b253326183d7c009a3936518cdb70db58cffd369d9df", size = 456621, upload-time = "2025-10-10T03:54:33.176Z" },
|
| 1302 |
+
{ url = "https://files.pythonhosted.org/packages/0e/84/875382b10d271b0c11aa5d414b44f92f8dd53e9b658aec338a79164fa548/httptools-0.7.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:cad6b591a682dcc6cf1397c3900527f9affef1e55a06c4547264796bbd17cf5e", size = 454954, upload-time = "2025-10-10T03:54:34.226Z" },
|
| 1303 |
+
{ url = "https://files.pythonhosted.org/packages/30/e1/44f89b280f7e46c0b1b2ccee5737d46b3bb13136383958f20b580a821ca0/httptools-0.7.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:eb844698d11433d2139bbeeb56499102143beb582bd6c194e3ba69c22f25c274", size = 440175, upload-time = "2025-10-10T03:54:35.942Z" },
|
| 1304 |
+
{ url = "https://files.pythonhosted.org/packages/6f/7e/b9287763159e700e335028bc1824359dc736fa9b829dacedace91a39b37e/httptools-0.7.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f65744d7a8bdb4bda5e1fa23e4ba16832860606fcc09d674d56e425e991539ec", size = 440310, upload-time = "2025-10-10T03:54:37.1Z" },
|
| 1305 |
+
{ url = "https://files.pythonhosted.org/packages/b3/07/5b614f592868e07f5c94b1f301b5e14a21df4e8076215a3bccb830a687d8/httptools-0.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:135fbe974b3718eada677229312e97f3b31f8a9c8ffa3ae6f565bf808d5b6bcb", size = 86875, upload-time = "2025-10-10T03:54:38.421Z" },
|
| 1306 |
+
{ url = "https://files.pythonhosted.org/packages/53/7f/403e5d787dc4942316e515e949b0c8a013d84078a915910e9f391ba9b3ed/httptools-0.7.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:38e0c83a2ea9746ebbd643bdfb521b9aa4a91703e2cd705c20443405d2fd16a5", size = 206280, upload-time = "2025-10-10T03:54:39.274Z" },
|
| 1307 |
+
{ url = "https://files.pythonhosted.org/packages/2a/0d/7f3fd28e2ce311ccc998c388dd1c53b18120fda3b70ebb022b135dc9839b/httptools-0.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f25bbaf1235e27704f1a7b86cd3304eabc04f569c828101d94a0e605ef7205a5", size = 110004, upload-time = "2025-10-10T03:54:40.403Z" },
|
| 1308 |
+
{ url = "https://files.pythonhosted.org/packages/84/a6/b3965e1e146ef5762870bbe76117876ceba51a201e18cc31f5703e454596/httptools-0.7.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2c15f37ef679ab9ecc06bfc4e6e8628c32a8e4b305459de7cf6785acd57e4d03", size = 517655, upload-time = "2025-10-10T03:54:41.347Z" },
|
| 1309 |
+
{ url = "https://files.pythonhosted.org/packages/11/7d/71fee6f1844e6fa378f2eddde6c3e41ce3a1fb4b2d81118dd544e3441ec0/httptools-0.7.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7fe6e96090df46b36ccfaf746f03034e5ab723162bc51b0a4cf58305324036f2", size = 511440, upload-time = "2025-10-10T03:54:42.452Z" },
|
| 1310 |
+
{ url = "https://files.pythonhosted.org/packages/22/a5/079d216712a4f3ffa24af4a0381b108aa9c45b7a5cc6eb141f81726b1823/httptools-0.7.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f72fdbae2dbc6e68b8239defb48e6a5937b12218e6ffc2c7846cc37befa84362", size = 495186, upload-time = "2025-10-10T03:54:43.937Z" },
|
| 1311 |
+
{ url = "https://files.pythonhosted.org/packages/e9/9e/025ad7b65278745dee3bd0ebf9314934c4592560878308a6121f7f812084/httptools-0.7.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e99c7b90a29fd82fea9ef57943d501a16f3404d7b9ee81799d41639bdaae412c", size = 499192, upload-time = "2025-10-10T03:54:45.003Z" },
|
| 1312 |
+
{ url = "https://files.pythonhosted.org/packages/6d/de/40a8f202b987d43afc4d54689600ff03ce65680ede2f31df348d7f368b8f/httptools-0.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:3e14f530fefa7499334a79b0cf7e7cd2992870eb893526fb097d51b4f2d0f321", size = 86694, upload-time = "2025-10-10T03:54:45.923Z" },
|
| 1313 |
+
{ url = "https://files.pythonhosted.org/packages/09/8f/c77b1fcbfd262d422f12da02feb0d218fa228d52485b77b953832105bb90/httptools-0.7.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:6babce6cfa2a99545c60bfef8bee0cc0545413cb0018f617c8059a30ad985de3", size = 202889, upload-time = "2025-10-10T03:54:47.089Z" },
|
| 1314 |
+
{ url = "https://files.pythonhosted.org/packages/0a/1a/22887f53602feaa066354867bc49a68fc295c2293433177ee90870a7d517/httptools-0.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:601b7628de7504077dd3dcb3791c6b8694bbd967148a6d1f01806509254fb1ca", size = 108180, upload-time = "2025-10-10T03:54:48.052Z" },
|
| 1315 |
+
{ url = "https://files.pythonhosted.org/packages/32/6a/6aaa91937f0010d288d3d124ca2946d48d60c3a5ee7ca62afe870e3ea011/httptools-0.7.1-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:04c6c0e6c5fb0739c5b8a9eb046d298650a0ff38cf42537fc372b28dc7e4472c", size = 478596, upload-time = "2025-10-10T03:54:48.919Z" },
|
| 1316 |
+
{ url = "https://files.pythonhosted.org/packages/6d/70/023d7ce117993107be88d2cbca566a7c1323ccbaf0af7eabf2064fe356f6/httptools-0.7.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:69d4f9705c405ae3ee83d6a12283dc9feba8cc6aaec671b412917e644ab4fa66", size = 473268, upload-time = "2025-10-10T03:54:49.993Z" },
|
| 1317 |
+
{ url = "https://files.pythonhosted.org/packages/32/4d/9dd616c38da088e3f436e9a616e1d0cc66544b8cdac405cc4e81c8679fc7/httptools-0.7.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:44c8f4347d4b31269c8a9205d8a5ee2df5322b09bbbd30f8f862185bb6b05346", size = 455517, upload-time = "2025-10-10T03:54:51.066Z" },
|
| 1318 |
+
{ url = "https://files.pythonhosted.org/packages/1d/3a/a6c595c310b7df958e739aae88724e24f9246a514d909547778d776799be/httptools-0.7.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:465275d76db4d554918aba40bf1cbebe324670f3dfc979eaffaa5d108e2ed650", size = 458337, upload-time = "2025-10-10T03:54:52.196Z" },
|
| 1319 |
+
{ url = "https://files.pythonhosted.org/packages/fd/82/88e8d6d2c51edc1cc391b6e044c6c435b6aebe97b1abc33db1b0b24cd582/httptools-0.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:322d00c2068d125bd570f7bf78b2d367dad02b919d8581d7476d8b75b294e3e6", size = 85743, upload-time = "2025-10-10T03:54:53.448Z" },
|
| 1320 |
+
{ url = "https://files.pythonhosted.org/packages/34/50/9d095fcbb6de2d523e027a2f304d4551855c2f46e0b82befd718b8b20056/httptools-0.7.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:c08fe65728b8d70b6923ce31e3956f859d5e1e8548e6f22ec520a962c6757270", size = 203619, upload-time = "2025-10-10T03:54:54.321Z" },
|
| 1321 |
+
{ url = "https://files.pythonhosted.org/packages/07/f0/89720dc5139ae54b03f861b5e2c55a37dba9a5da7d51e1e824a1f343627f/httptools-0.7.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:7aea2e3c3953521c3c51106ee11487a910d45586e351202474d45472db7d72d3", size = 108714, upload-time = "2025-10-10T03:54:55.163Z" },
|
| 1322 |
+
{ url = "https://files.pythonhosted.org/packages/b3/cb/eea88506f191fb552c11787c23f9a405f4c7b0c5799bf73f2249cd4f5228/httptools-0.7.1-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0e68b8582f4ea9166be62926077a3334064d422cf08ab87d8b74664f8e9058e1", size = 472909, upload-time = "2025-10-10T03:54:56.056Z" },
|
| 1323 |
+
{ url = "https://files.pythonhosted.org/packages/e0/4a/a548bdfae6369c0d078bab5769f7b66f17f1bfaa6fa28f81d6be6959066b/httptools-0.7.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:df091cf961a3be783d6aebae963cc9b71e00d57fa6f149025075217bc6a55a7b", size = 470831, upload-time = "2025-10-10T03:54:57.219Z" },
|
| 1324 |
+
{ url = "https://files.pythonhosted.org/packages/4d/31/14df99e1c43bd132eec921c2e7e11cda7852f65619bc0fc5bdc2d0cb126c/httptools-0.7.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f084813239e1eb403ddacd06a30de3d3e09a9b76e7894dcda2b22f8a726e9c60", size = 452631, upload-time = "2025-10-10T03:54:58.219Z" },
|
| 1325 |
+
{ url = "https://files.pythonhosted.org/packages/22/d2/b7e131f7be8d854d48cb6d048113c30f9a46dca0c9a8b08fcb3fcd588cdc/httptools-0.7.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7347714368fb2b335e9063bc2b96f2f87a9ceffcd9758ac295f8bbcd3ffbc0ca", size = 452910, upload-time = "2025-10-10T03:54:59.366Z" },
|
| 1326 |
+
{ url = "https://files.pythonhosted.org/packages/53/cf/878f3b91e4e6e011eff6d1fa9ca39f7eb17d19c9d7971b04873734112f30/httptools-0.7.1-cp314-cp314-win_amd64.whl", hash = "sha256:cfabda2a5bb85aa2a904ce06d974a3f30fb36cc63d7feaddec05d2050acede96", size = 88205, upload-time = "2025-10-10T03:55:00.389Z" },
|
| 1327 |
+
]
|
| 1328 |
+
|
| 1329 |
[[package]]
|
| 1330 |
name = "httpx"
|
| 1331 |
version = "0.28.1"
|
|
|
|
| 2179 |
source = { editable = "." }
|
| 2180 |
dependencies = [
|
| 2181 |
{ name = "openenv-core", extra = ["core"] },
|
| 2182 |
+
{ name = "trackio" },
|
| 2183 |
]
|
| 2184 |
|
| 2185 |
[package.optional-dependencies]
|
|
|
|
| 2197 |
{ name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
|
| 2198 |
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
|
| 2199 |
{ name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
|
| 2200 |
+
{ name = "trackio", specifier = ">=0.22.0" },
|
| 2201 |
]
|
| 2202 |
provides-extras = ["dev", "modal"]
|
| 2203 |
|
|
|
|
| 3456 |
{ url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
|
| 3457 |
]
|
| 3458 |
|
| 3459 |
+
[[package]]
|
| 3460 |
+
name = "trackio"
|
| 3461 |
+
version = "0.25.0"
|
| 3462 |
+
source = { registry = "https://pypi.org/simple" }
|
| 3463 |
+
dependencies = [
|
| 3464 |
+
{ name = "gradio-client" },
|
| 3465 |
+
{ name = "huggingface-hub" },
|
| 3466 |
+
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
|
| 3467 |
+
{ name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
|
| 3468 |
+
{ name = "orjson" },
|
| 3469 |
+
{ name = "pillow" },
|
| 3470 |
+
{ name = "python-multipart" },
|
| 3471 |
+
{ name = "starlette" },
|
| 3472 |
+
{ name = "tomli", marker = "python_full_version < '3.11'" },
|
| 3473 |
+
{ name = "uvicorn", extra = ["standard"] },
|
| 3474 |
+
]
|
| 3475 |
+
wheels = [
|
| 3476 |
+
{ url = "https://files.pythonhosted.org/packages/e7/4d/2aa0e1ca6daebdfac79fadd2ab308d5880c8d0305b2ce8b88900f95a8415/trackio-0.25.0-py3-none-any.whl", hash = "sha256:6c1ae7decef6e35d1165a6b2536d6df8c67594329bdf6bd9f1786c153a532b9f", size = 1653706, upload-time = "2026-04-23T15:45:29.887Z" },
|
| 3477 |
+
]
|
| 3478 |
+
|
| 3479 |
[[package]]
|
| 3480 |
name = "typer"
|
| 3481 |
version = "0.24.2"
|
|
|
|
| 3571 |
{ url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
|
| 3572 |
]
|
| 3573 |
|
| 3574 |
+
[package.optional-dependencies]
|
| 3575 |
+
standard = [
|
| 3576 |
+
{ name = "colorama", marker = "sys_platform == 'win32'" },
|
| 3577 |
+
{ name = "httptools" },
|
| 3578 |
+
{ name = "python-dotenv" },
|
| 3579 |
+
{ name = "pyyaml" },
|
| 3580 |
+
{ name = "uvloop", marker = "platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'" },
|
| 3581 |
+
{ name = "watchfiles" },
|
| 3582 |
+
{ name = "websockets" },
|
| 3583 |
+
]
|
| 3584 |
+
|
| 3585 |
+
[[package]]
|
| 3586 |
+
name = "uvloop"
|
| 3587 |
+
version = "0.22.1"
|
| 3588 |
+
source = { registry = "https://pypi.org/simple" }
|
| 3589 |
+
sdist = { url = "https://files.pythonhosted.org/packages/06/f0/18d39dbd1971d6d62c4629cc7fa67f74821b0dc1f5a77af43719de7936a7/uvloop-0.22.1.tar.gz", hash = "sha256:6c84bae345b9147082b17371e3dd5d42775bddce91f885499017f4607fdaf39f", size = 2443250, upload-time = "2025-10-16T22:17:19.342Z" }
|
| 3590 |
+
wheels = [
|
| 3591 |
+
{ url = "https://files.pythonhosted.org/packages/eb/14/ecceb239b65adaaf7fde510aa8bd534075695d1e5f8dadfa32b5723d9cfb/uvloop-0.22.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:ef6f0d4cc8a9fa1f6a910230cd53545d9a14479311e87e3cb225495952eb672c", size = 1343335, upload-time = "2025-10-16T22:16:11.43Z" },
|
| 3592 |
+
{ url = "https://files.pythonhosted.org/packages/ba/ae/6f6f9af7f590b319c94532b9567409ba11f4fa71af1148cab1bf48a07048/uvloop-0.22.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:7cd375a12b71d33d46af85a3343b35d98e8116134ba404bd657b3b1d15988792", size = 742903, upload-time = "2025-10-16T22:16:12.979Z" },
|
| 3593 |
+
{ url = "https://files.pythonhosted.org/packages/09/bd/3667151ad0702282a1f4d5d29288fce8a13c8b6858bf0978c219cd52b231/uvloop-0.22.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ac33ed96229b7790eb729702751c0e93ac5bc3bcf52ae9eccbff30da09194b86", size = 3648499, upload-time = "2025-10-16T22:16:14.451Z" },
|
| 3594 |
+
{ url = "https://files.pythonhosted.org/packages/b3/f6/21657bb3beb5f8c57ce8be3b83f653dd7933c2fd00545ed1b092d464799a/uvloop-0.22.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:481c990a7abe2c6f4fc3d98781cc9426ebd7f03a9aaa7eb03d3bfc68ac2a46bd", size = 3700133, upload-time = "2025-10-16T22:16:16.272Z" },
|
| 3595 |
+
{ url = "https://files.pythonhosted.org/packages/09/e0/604f61d004ded805f24974c87ddd8374ef675644f476f01f1df90e4cdf72/uvloop-0.22.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:a592b043a47ad17911add5fbd087c76716d7c9ccc1d64ec9249ceafd735f03c2", size = 3512681, upload-time = "2025-10-16T22:16:18.07Z" },
|
| 3596 |
+
{ url = "https://files.pythonhosted.org/packages/bb/ce/8491fd370b0230deb5eac69c7aae35b3be527e25a911c0acdffb922dc1cd/uvloop-0.22.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:1489cf791aa7b6e8c8be1c5a080bae3a672791fcb4e9e12249b05862a2ca9cec", size = 3615261, upload-time = "2025-10-16T22:16:19.596Z" },
|
| 3597 |
+
{ url = "https://files.pythonhosted.org/packages/c7/d5/69900f7883235562f1f50d8184bb7dd84a2fb61e9ec63f3782546fdbd057/uvloop-0.22.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c60ebcd36f7b240b30788554b6f0782454826a0ed765d8430652621b5de674b9", size = 1352420, upload-time = "2025-10-16T22:16:21.187Z" },
|
| 3598 |
+
{ url = "https://files.pythonhosted.org/packages/a8/73/c4e271b3bce59724e291465cc936c37758886a4868787da0278b3b56b905/uvloop-0.22.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3b7f102bf3cb1995cfeaee9321105e8f5da76fdb104cdad8986f85461a1b7b77", size = 748677, upload-time = "2025-10-16T22:16:22.558Z" },
|
| 3599 |
+
{ url = "https://files.pythonhosted.org/packages/86/94/9fb7fad2f824d25f8ecac0d70b94d0d48107ad5ece03769a9c543444f78a/uvloop-0.22.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:53c85520781d84a4b8b230e24a5af5b0778efdb39142b424990ff1ef7c48ba21", size = 3753819, upload-time = "2025-10-16T22:16:23.903Z" },
|
| 3600 |
+
{ url = "https://files.pythonhosted.org/packages/74/4f/256aca690709e9b008b7108bc85fba619a2bc37c6d80743d18abad16ee09/uvloop-0.22.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:56a2d1fae65fd82197cb8c53c367310b3eabe1bbb9fb5a04d28e3e3520e4f702", size = 3804529, upload-time = "2025-10-16T22:16:25.246Z" },
|
| 3601 |
+
{ url = "https://files.pythonhosted.org/packages/7f/74/03c05ae4737e871923d21a76fe28b6aad57f5c03b6e6bfcfa5ad616013e4/uvloop-0.22.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:40631b049d5972c6755b06d0bfe8233b1bd9a8a6392d9d1c45c10b6f9e9b2733", size = 3621267, upload-time = "2025-10-16T22:16:26.819Z" },
|
| 3602 |
+
{ url = "https://files.pythonhosted.org/packages/75/be/f8e590fe61d18b4a92070905497aec4c0e64ae1761498cad09023f3f4b3e/uvloop-0.22.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:535cc37b3a04f6cd2c1ef65fa1d370c9a35b6695df735fcff5427323f2cd5473", size = 3723105, upload-time = "2025-10-16T22:16:28.252Z" },
|
| 3603 |
+
{ url = "https://files.pythonhosted.org/packages/3d/ff/7f72e8170be527b4977b033239a83a68d5c881cc4775fca255c677f7ac5d/uvloop-0.22.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:fe94b4564e865d968414598eea1a6de60adba0c040ba4ed05ac1300de402cd42", size = 1359936, upload-time = "2025-10-16T22:16:29.436Z" },
|
| 3604 |
+
{ url = "https://files.pythonhosted.org/packages/c3/c6/e5d433f88fd54d81ef4be58b2b7b0cea13c442454a1db703a1eea0db1a59/uvloop-0.22.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:51eb9bd88391483410daad430813d982010f9c9c89512321f5b60e2cddbdddd6", size = 752769, upload-time = "2025-10-16T22:16:30.493Z" },
|
| 3605 |
+
{ url = "https://files.pythonhosted.org/packages/24/68/a6ac446820273e71aa762fa21cdcc09861edd3536ff47c5cd3b7afb10eeb/uvloop-0.22.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:700e674a166ca5778255e0e1dc4e9d79ab2acc57b9171b79e65feba7184b3370", size = 4317413, upload-time = "2025-10-16T22:16:31.644Z" },
|
| 3606 |
+
{ url = "https://files.pythonhosted.org/packages/5f/6f/e62b4dfc7ad6518e7eff2516f680d02a0f6eb62c0c212e152ca708a0085e/uvloop-0.22.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7b5b1ac819a3f946d3b2ee07f09149578ae76066d70b44df3fa990add49a82e4", size = 4426307, upload-time = "2025-10-16T22:16:32.917Z" },
|
| 3607 |
+
{ url = "https://files.pythonhosted.org/packages/90/60/97362554ac21e20e81bcef1150cb2a7e4ffdaf8ea1e5b2e8bf7a053caa18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e047cc068570bac9866237739607d1313b9253c3051ad84738cbb095be0537b2", size = 4131970, upload-time = "2025-10-16T22:16:34.015Z" },
|
| 3608 |
+
{ url = "https://files.pythonhosted.org/packages/99/39/6b3f7d234ba3964c428a6e40006340f53ba37993f46ed6e111c6e9141d18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:512fec6815e2dd45161054592441ef76c830eddaad55c8aa30952e6fe1ed07c0", size = 4296343, upload-time = "2025-10-16T22:16:35.149Z" },
|
| 3609 |
+
{ url = "https://files.pythonhosted.org/packages/89/8c/182a2a593195bfd39842ea68ebc084e20c850806117213f5a299dfc513d9/uvloop-0.22.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:561577354eb94200d75aca23fbde86ee11be36b00e52a4eaf8f50fb0c86b7705", size = 1358611, upload-time = "2025-10-16T22:16:36.833Z" },
|
| 3610 |
+
{ url = "https://files.pythonhosted.org/packages/d2/14/e301ee96a6dc95224b6f1162cd3312f6d1217be3907b79173b06785f2fe7/uvloop-0.22.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1cdf5192ab3e674ca26da2eada35b288d2fa49fdd0f357a19f0e7c4e7d5077c8", size = 751811, upload-time = "2025-10-16T22:16:38.275Z" },
|
| 3611 |
+
{ url = "https://files.pythonhosted.org/packages/b7/02/654426ce265ac19e2980bfd9ea6590ca96a56f10c76e63801a2df01c0486/uvloop-0.22.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6e2ea3d6190a2968f4a14a23019d3b16870dd2190cd69c8180f7c632d21de68d", size = 4288562, upload-time = "2025-10-16T22:16:39.375Z" },
|
| 3612 |
+
{ url = "https://files.pythonhosted.org/packages/15/c0/0be24758891ef825f2065cd5db8741aaddabe3e248ee6acc5e8a80f04005/uvloop-0.22.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0530a5fbad9c9e4ee3f2b33b148c6a64d47bbad8000ea63704fa8260f4cf728e", size = 4366890, upload-time = "2025-10-16T22:16:40.547Z" },
|
| 3613 |
+
{ url = "https://files.pythonhosted.org/packages/d2/53/8369e5219a5855869bcee5f4d317f6da0e2c669aecf0ef7d371e3d084449/uvloop-0.22.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:bc5ef13bbc10b5335792360623cc378d52d7e62c2de64660616478c32cd0598e", size = 4119472, upload-time = "2025-10-16T22:16:41.694Z" },
|
| 3614 |
+
{ url = "https://files.pythonhosted.org/packages/f8/ba/d69adbe699b768f6b29a5eec7b47dd610bd17a69de51b251126a801369ea/uvloop-0.22.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1f38ec5e3f18c8a10ded09742f7fb8de0108796eb673f30ce7762ce1b8550cad", size = 4239051, upload-time = "2025-10-16T22:16:43.224Z" },
|
| 3615 |
+
{ url = "https://files.pythonhosted.org/packages/90/cd/b62bdeaa429758aee8de8b00ac0dd26593a9de93d302bff3d21439e9791d/uvloop-0.22.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:3879b88423ec7e97cd4eba2a443aa26ed4e59b45e6b76aabf13fe2f27023a142", size = 1362067, upload-time = "2025-10-16T22:16:44.503Z" },
|
| 3616 |
+
{ url = "https://files.pythonhosted.org/packages/0d/f8/a132124dfda0777e489ca86732e85e69afcd1ff7686647000050ba670689/uvloop-0.22.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4baa86acedf1d62115c1dc6ad1e17134476688f08c6efd8a2ab076e815665c74", size = 752423, upload-time = "2025-10-16T22:16:45.968Z" },
|
| 3617 |
+
{ url = "https://files.pythonhosted.org/packages/a3/94/94af78c156f88da4b3a733773ad5ba0b164393e357cc4bd0ab2e2677a7d6/uvloop-0.22.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:297c27d8003520596236bdb2335e6b3f649480bd09e00d1e3a99144b691d2a35", size = 4272437, upload-time = "2025-10-16T22:16:47.451Z" },
|
| 3618 |
+
{ url = "https://files.pythonhosted.org/packages/b5/35/60249e9fd07b32c665192cec7af29e06c7cd96fa1d08b84f012a56a0b38e/uvloop-0.22.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1955d5a1dd43198244d47664a5858082a3239766a839b2102a269aaff7a4e25", size = 4292101, upload-time = "2025-10-16T22:16:49.318Z" },
|
| 3619 |
+
{ url = "https://files.pythonhosted.org/packages/02/62/67d382dfcb25d0a98ce73c11ed1a6fba5037a1a1d533dcbb7cab033a2636/uvloop-0.22.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b31dc2fccbd42adc73bc4e7cdbae4fc5086cf378979e53ca5d0301838c5682c6", size = 4114158, upload-time = "2025-10-16T22:16:50.517Z" },
|
| 3620 |
+
{ url = "https://files.pythonhosted.org/packages/f0/7a/f1171b4a882a5d13c8b7576f348acfe6074d72eaf52cccef752f748d4a9f/uvloop-0.22.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:93f617675b2d03af4e72a5333ef89450dfaa5321303ede6e67ba9c9d26878079", size = 4177360, upload-time = "2025-10-16T22:16:52.646Z" },
|
| 3621 |
+
{ url = "https://files.pythonhosted.org/packages/79/7b/b01414f31546caf0919da80ad57cbfe24c56b151d12af68cee1b04922ca8/uvloop-0.22.1-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:37554f70528f60cad66945b885eb01f1bb514f132d92b6eeed1c90fd54ed6289", size = 1454790, upload-time = "2025-10-16T22:16:54.355Z" },
|
| 3622 |
+
{ url = "https://files.pythonhosted.org/packages/d4/31/0bb232318dd838cad3fa8fb0c68c8b40e1145b32025581975e18b11fab40/uvloop-0.22.1-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:b76324e2dc033a0b2f435f33eb88ff9913c156ef78e153fb210e03c13da746b3", size = 796783, upload-time = "2025-10-16T22:16:55.906Z" },
|
| 3623 |
+
{ url = "https://files.pythonhosted.org/packages/42/38/c9b09f3271a7a723a5de69f8e237ab8e7803183131bc57c890db0b6bb872/uvloop-0.22.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:badb4d8e58ee08dad957002027830d5c3b06aea446a6a3744483c2b3b745345c", size = 4647548, upload-time = "2025-10-16T22:16:57.008Z" },
|
| 3624 |
+
{ url = "https://files.pythonhosted.org/packages/c1/37/945b4ca0ac27e3dc4952642d4c900edd030b3da6c9634875af6e13ae80e5/uvloop-0.22.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b91328c72635f6f9e0282e4a57da7470c7350ab1c9f48546c0f2866205349d21", size = 4467065, upload-time = "2025-10-16T22:16:58.206Z" },
|
| 3625 |
+
{ url = "https://files.pythonhosted.org/packages/97/cc/48d232f33d60e2e2e0b42f4e73455b146b76ebe216487e862700457fbf3c/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:daf620c2995d193449393d6c62131b3fbd40a63bf7b307a1527856ace637fe88", size = 4328384, upload-time = "2025-10-16T22:16:59.36Z" },
|
| 3626 |
+
{ url = "https://files.pythonhosted.org/packages/e4/16/c1fd27e9549f3c4baf1dc9c20c456cd2f822dbf8de9f463824b0c0357e06/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6cde23eeda1a25c75b2e07d39970f3374105d5eafbaab2a4482be82f272d5a5e", size = 4296730, upload-time = "2025-10-16T22:17:00.744Z" },
|
| 3627 |
+
]
|
| 3628 |
+
|
| 3629 |
[[package]]
|
| 3630 |
name = "watchfiles"
|
| 3631 |
version = "1.1.1"
|