Added sandbox
- README.md +66 -211
- docker/Dockerfile.sandbox +22 -0
- inference.py +33 -30
README.md
CHANGED
|
@@ -2,252 +2,107 @@
|
|
| 2 |
|
| 3 |
[](https://wandb.ai/ramprasathk07/patchhawk)
|
| 4 |
[](https://huggingface.co/ramprasathk07/patchhawk)
|
| 5 |
-
[](https://github.com/pytorch/openenv)
|
| 8 |
|
| 9 |
-
> **
|
| 10 |
|
| 11 |
---
|
| 12 |
|
| 13 |
-
##
|
| 14 |
-
|
| 15 |
-
```mermaid
|
| 16 |
-
graph TD
|
| 17 |
-
subgraph Data Pipeline
|
| 18 |
-
A["Meta SDK (Track A)"] -->|synthetic scenarios| D[scenarios.json]
|
| 19 |
-
B["Mutation Engine (Track B)"] -->|injected attacks| D
|
| 20 |
-
C["Benign .py files (25+)"] --> B
|
| 21 |
-
end
|
| 22 |
-
|
| 23 |
-
subgraph OpenEnv Loop
|
| 24 |
-
D --> E["PatchHawkEnv (openenv.core.Environment)"]
|
| 25 |
-
E -->|observation| F["GRPO Agent (Qwen2.5-Coder-7B + LoRA)"]
|
| 26 |
-
F -->|action 0-4| E
|
| 27 |
-
E -->|EXECUTE_SANDBOX| G["Docker Sandbox (--network none)"]
|
| 28 |
-
E -->|SUBMIT_PATCH| H["3-Stage Patch Validator"]
|
| 29 |
-
G -->|telemetry| E
|
| 30 |
-
H -->|validated?| E
|
| 31 |
-
end
|
| 32 |
-
|
| 33 |
-
subgraph Outputs
|
| 34 |
-
E -->|reward signal| F
|
| 35 |
-
F -->|metrics| I["W&B Dashboard"]
|
| 36 |
-
F -->|adapter| J["HuggingFace Hub"]
|
| 37 |
-
E -->|A2A protocol| K["FastAPI /agent/act"]
|
| 38 |
-
E --> L["Streamlit Dashboard"]
|
| 39 |
-
end
|
| 40 |
-
```
|
| 41 |
|
| 42 |
-
|
| 43 |
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
| **Training** | `patchhawk/training/train_grpo.py` | GRPO with unsloth + trl, 4-bit LoRA, W&B logging |
|
| 50 |
-
| **Data Generation** | `patchhawk/data/` | Track A (Meta SDK) + Track B (mutation engine) |
|
| 51 |
-
| **Dashboard** | `patchhawk/app/dashboard.py` | Streamlit UI for analysis & validation |
|
| 52 |
-
| **Docker** | `docker/Dockerfile.sandbox` | Minimal Python 3.11 sandbox |
|
| 53 |
|
| 54 |
---
|
| 55 |
|
| 56 |
-
##
|
| 57 |
-
|
| 58 |
-
### 1. Install
|
| 59 |
-
|
| 60 |
-
```bash
|
| 61 |
-
python3 -m venv venv && source venv/bin/activate
|
| 62 |
-
pip install -r requirements.txt
|
| 63 |
-
pip install -e .
|
| 64 |
-
|
| 65 |
-
# Copy and fill in your keys
|
| 66 |
-
cp .env.example .env
|
| 67 |
-
```
|
| 68 |
-
|
| 69 |
-
### 2. Build Docker Sandbox
|
| 70 |
-
|
| 71 |
-
```bash
|
| 72 |
-
docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
|
| 73 |
-
```
|
| 74 |
-
|
| 75 |
-
### 3. Generate Scenarios
|
| 76 |
-
|
| 77 |
-
```bash
|
| 78 |
-
# Track B only (always works)
|
| 79 |
-
python -m patchhawk.data.generate_scenarios \
|
| 80 |
-
--benign-dir patchhawk/data/benign/ \
|
| 81 |
-
--output patchhawk/data/scenarios.json
|
| 82 |
-
|
| 83 |
-
# Track A + B (requires vLLM serving + synthetic-data-kit)
|
| 84 |
-
python -m patchhawk.data.generate_scenarios --use-sdk --sdk-samples 10
|
| 85 |
-
```
|
| 86 |
-
|
| 87 |
-
### 4. Run Tests
|
| 88 |
-
|
| 89 |
-
```bash
|
| 90 |
-
pytest tests/ -v
|
| 91 |
-
```
|
| 92 |
-
|
| 93 |
-
### 5. Train (Dry Run)
|
| 94 |
-
|
| 95 |
-
```bash
|
| 96 |
-
python -m patchhawk.training.train_grpo --dry-run
|
| 97 |
-
```
|
| 98 |
-
|
| 99 |
-
### 6. Train (Full GPU)
|
| 100 |
-
|
| 101 |
-
```bash
|
| 102 |
-
python -m patchhawk.training.train_grpo --use-docker
|
| 103 |
-
```
|
| 104 |
|
| 105 |
-
|
| 106 |
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
```
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
# Analyze code
|
| 118 |
-
curl -X POST http://localhost:8000/agent/act \
|
| 119 |
-
-H "Content-Type: application/json" \
|
| 120 |
-
-d '{"code_snippet": "import os; os.system(\"rm -rf /\")"}'
|
| 121 |
-
```
|
| 122 |
-
|
| 123 |
-
### 8. Launch Dashboard
|
| 124 |
-
|
| 125 |
-
```bash
|
| 126 |
-
streamlit run patchhawk/app/dashboard.py
|
| 127 |
-
```
|
| 128 |
|
| 129 |
---
|
| 130 |
|
| 131 |
-
##
|
| 132 |
-
These are the baseline scores from running `DRY_RUN=1 python inference.py`, matching the three required hackathon tasks using our heuristic policy baseline:
|
| 133 |
|
| 134 |
-
|
| 135 |
-
|---------|-------------|----------|-------|--------|
|
| 136 |
-
| `easy_typosquat` | Detect basic typosquatting | ✅ True | 1.00 | +2.00 |
|
| 137 |
-
| `medium_obfuscated` | Execution via eval() / obfuscation | ❌ False | 0.00 | +2.00 |
|
| 138 |
-
| `hard_patch` | Write patches for subprocess backdoors | ❌ False | 0.00 | +2.00 |
|
| 139 |
|
| 140 |
-
|
| 141 |
|
| 142 |
-
|
| 143 |
|
| 144 |
-
|
| 145 |
|
| 146 |
-
|
| 147 |
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
|
| 157 |
---
|
| 158 |
|
| 159 |
-
##
|
| 160 |
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
| `/agent/act` | POST | Submit code snippet, receive decision + patch |
|
| 167 |
-
|
| 168 |
-
### Request (`POST /agent/act`)
|
| 169 |
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
}
|
| 174 |
-
```
|
| 175 |
|
| 176 |
-
#
|
|
|
|
| 177 |
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
"decision": "BLOCK_PR",
|
| 181 |
-
"patch": null,
|
| 182 |
-
"confidence": 0.92,
|
| 183 |
-
"reward": 2.0,
|
| 184 |
-
"details": { "action_name": "BLOCK_PR", "step": 2 }
|
| 185 |
-
}
|
| 186 |
```
|
| 187 |
|
| 188 |
---
|
| 189 |
|
| 190 |
-
##
|
| 191 |
-
|
| 192 |
-
| Condition | Reward |
|
| 193 |
-
|-----------|--------|
|
| 194 |
-
| Correct BLOCK on malicious | +2.0 |
|
| 195 |
-
| Correct SUBMIT_PATCH (validated) | +3.0 |
|
| 196 |
-
| BLOCK on benign | -1.0 |
|
| 197 |
-
| SUBMIT_PATCH on benign (patch applied) | -1.5 |
|
| 198 |
-
| Episode ends without block/patch on malicious (max_steps) | -5.0 |
|
| 199 |
-
| EXECUTE_SANDBOX | +0.1 |
|
| 200 |
-
|
| 201 |
-
---
|
| 202 |
-
|
| 203 |
-
## Safety & OpenEnv Compliance
|
| 204 |
|
| 205 |
-
|
| 206 |
-
- **Re-attack verification**: Stage 3 only checks for *attempts* (socket creation, file writes); it never permits actual harm.
|
| 207 |
-
- **SDK fallback**: If the `synthetic-data-kit` CLI is not installed, Track A is skipped with a warning; Track B always generates ≥40 scenarios.
|
| 208 |
-
- **OpenEnv compliant**: `PatchHawkEnv` inherits `openenv.core.Environment` with proper `reset()` → `Observation` and `step(Action)` → `Observation` signatures (reward/done on observation).
|
| 209 |
-
- **Deterministic dry-run**: `--dry-run` mode requires zero GPU and no external services.
|
| 210 |
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
## Project Structure
|
| 214 |
-
|
| 215 |
-
```
|
| 216 |
-
PatchHawk/
|
| 217 |
-
├── patchhawk/
-
│   ├── __init__.py              # Config loader
-
│   ├── env_models.py            # Pydantic Action/Observation/State models
-
│   ├── agent/
-
│   │   ├── __init__.py
-
│   │   ├── environment.py       # openenv.Env implementation
-
│   │   ├── sandbox.py           # Docker runner + patch validator
-
│   │   └── server.py            # FastAPI A2A protocol
-
│   ├── training/
-
│   │   ├── __init__.py
-
│   │   └── train_grpo.py        # GRPO with unsloth + trl + W&B
-
│   ├── data/
-
│   │   ├── __init__.py
-
│   │   ├── generate_scenarios.py
-
│   │   ├── sdk_config.yaml
-
│   │   ├── scenarios.json
-
│   │   └── benign/              # 25 benign Python files
-
│   └── app/
-
│       ├── __init__.py
-
│       └── dashboard.py         # Streamlit dashboard
-
├── docker/
-
│   └── Dockerfile.sandbox       # Minimal Python 3.11 sandbox
-
├── tests/
-
│   ├── test_env.py              # Environment tests
-
│   └── test_sandbox.py          # Validator tests
-
├── config.yaml                  # All hyperparameters
-
├── requirements.txt
-
├── setup.py
-
├── .env.example
-
└── README.md
|
| 247 |
```
|
| 248 |
|
| 249 |
---
|
| 250 |
|
| 251 |
## License
|
| 252 |
-
|
| 253 |
-
MIT © PatchHawk Team
|
|
|
|
| 2 |
|
| 3 |
[](https://wandb.ai/ramprasathk07/patchhawk)
|
| 4 |
[](https://huggingface.co/ramprasathk07/patchhawk)
|
| 5 |
+
[](https://python.org)
|
| 6 |
+
[](https://github.com/pytorch/openenv)
|
|
|
|
| 7 |
|
| 8 |
+
> **PatchHawk is an autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It doesn't just detect vulnerabilities; it validates them in isolated containers and generates verified patches.**
|
| 9 |
|
| 10 |
---
|
| 11 |
|
| 12 |
+
## The Approach: Cyber-Physical RL Loop
|
| 13 |
|
| 14 |
+
Most security LLMs suffer from "hallucinated security": they claim a bug is fixed without ever running the code. PatchHawk solves this by implementing a **Cyber-Physical Reinforcement Learning Loop**:
|
| 15 |
|
| 16 |
+
1. **Detection**: The agent analyzes code snippets for supply-chain attacks (typosquatting, backdoors, exfiltration).
|
| 17 |
+
2. **Simulation**: The agent can choose to "Detonate" suspicious code in a hardened **Docker Sandbox** to observe real syscalls and network behavior.
|
| 18 |
+
3. **Correction**: If malicious, the agent generates a Python patch.
|
| 19 |
+
4. **Verification**: The environment automatically runs the patch through a 3-stage validation (Syntax -> Unit Tests -> Re-Attack Detonation) inside Docker.
|
| 20 |
+
5. **Reward**: The model is rewarded only if the patch **natively passes** all stages.
|
| 21 |
|
| 22 |
---
|
| 23 |
|
| 24 |
+
## 🧠 Training Style: GRPO (Group Relative Policy Optimization)
|
| 25 |
|
| 26 |
+
PatchHawk uses **GRPO**, the same technique used in DeepSeek-R1, to train our security agent via trial and error.
|
| 27 |
|
| 28 |
+
- **Trial & Error**: The model is tasked with fixing complex vulnerabilities. It generates multiple attempts (Groups) for the same problem.
|
| 29 |
+
- **XML Reasoning**: The model is trained to emit a strict XML structure:
|
| 30 |
+
```xml
|
| 31 |
+
<thought>Analyze the base64 encoded string... it is a reverse shell.</thought>
|
| 32 |
+
<risk_score>0.98</risk_score>
|
| 33 |
+
<action>3</action>
|
| 34 |
+
<patch>import os...</patch>
|
| 35 |
+
```
|
| 36 |
+
- **Relative Scoring**: Instead of using a static "Teacher" model, PatchHawk compares the scores of the 4 attempts against each other. It learns that the attempt that passed the **Docker Syntax Check** is superior to the one that didn't.
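The relative scoring step can be sketched as follows. This is a minimal illustration of group-relative advantages as commonly described for GRPO, not the project's training code; the function name `group_relative_advantages`, the `eps` value, and the sample rewards are assumptions made for this example, and the exact normalization inside `trl` may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each attempt relative to the other attempts in its group.

    Hypothetical helper: subtract the group mean and divide by the group
    standard deviation, so attempts are ranked against each other rather
    than against a separate teacher model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every attempt earned the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four attempts at the same problem: one validated patch, three failures.
advantages = group_relative_advantages([3.0, -1.0, -1.0, -1.0])
print(advantages)
```

Only the attempt that beat the group average receives a positive advantage, which is what pushes the policy toward it.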
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
+
## Action Space & Scoring Rubric (0.0 to 1.0 Evaluator)
|
|
|
|
| 41 |
|
| 42 |
+
The environment manages a complex reward system to move beyond sparse "win/loss" signals.
|
| 43 |
|
| 44 |
+
| Action ID | Action Name | Reward (Base) | Logic |
|
| 45 |
+
| :--- | :--- | :--- | :--- |
|
| 46 |
+
| **0** | **ANALYZE** | `0.0` | "Do nothing/Observe". Optimal for benign code. |
|
| 47 |
+
| **1** | **EXECUTE_SANDBOX** | `+0.1` | Safely detonate payload in Docker and extract telemetry. |
|
| 48 |
+
| **2** | **BLOCK_PR** | `+2.0 / -1.0` | Reject PR. Heavily rewarded for malware, penalized for False Positives. |
|
| 49 |
+
| **3** | **SUBMIT_PATCH** | **+3.0 / -1.5** | **The Goal.** Reward requires a clean run in the Docker Sandbox. |
|
| 50 |
+
| **4** | **REQUEST_REVIEW** | `0.0` | Escalate to a human expert. |
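The table above can be read as a small lookup. The sketch below is an illustration only: `Action` and `base_reward` are hypothetical names, the negative values are assumed to apply to benign code (false positives), and returning `0.0` for an unvalidated patch on malicious code is an assumption, since the table does not state that case.

```python
from enum import IntEnum


class Action(IntEnum):
    """Action IDs as listed in the table (names taken from the table)."""
    ANALYZE = 0
    EXECUTE_SANDBOX = 1
    BLOCK_PR = 2
    SUBMIT_PATCH = 3
    REQUEST_REVIEW = 4


def base_reward(action, is_malicious, patch_validated=False):
    """Return the base reward for one action (hypothetical helper)."""
    if action == Action.EXECUTE_SANDBOX:
        return 0.1
    if action == Action.BLOCK_PR:
        # Rewarded on malware, penalized on false positives.
        return 2.0 if is_malicious else -1.0
    if action == Action.SUBMIT_PATCH:
        if not is_malicious:
            return -1.5
        # Assumption: a patch that fails validation earns no base reward.
        return 3.0 if patch_validated else 0.0
    # ANALYZE and REQUEST_REVIEW carry no base reward.
    return 0.0


print(base_reward(Action.SUBMIT_PATCH, is_malicious=True, patch_validated=True))
```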
|
| 51 |
|
| 52 |
+
### Dynamic Bonuses
|
| 53 |
+
* **Risk Accuracy Bonus (up to +2.0)**: The agent earns `(1.0 - abs(actual - predicted)) * 2.0`, so a perfect risk estimate is worth the full +2.0. This ensures it learns to classify risk accurately even when it does not take the aggressive patch action.
|
| 54 |
+
* **Safety Penalty (-1.0)**: Any patch that fails the Docker syntax check or the unit tests incurs a heavy penalty to discourage "lazy packaging".
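As a worked example of the two bonuses, the sketch below applies the stated formula directly. The function names and the sample risk values are assumptions for illustration; only the formula and the `-1.0` penalty come from the text.

```python
def risk_accuracy_bonus(actual, predicted):
    """Bonus from the README formula: (1.0 - |actual - predicted|) * 2.0."""
    return (1.0 - abs(actual - predicted)) * 2.0


def safety_penalty(syntax_ok, tests_ok):
    """-1.0 when a submitted patch fails the syntax check or the unit tests."""
    return 0.0 if (syntax_ok and tests_ok) else -1.0


# A prediction of 0.4 against an actual risk of 0.9 is off by 0.5,
# so the agent keeps about half of the maximum bonus.
print(risk_accuracy_bonus(actual=0.9, predicted=0.4))
print(safety_penalty(syntax_ok=True, tests_ok=False))
```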
|
| 55 |
|
| 56 |
+
---
|
| 57 |
|
| 58 |
+
## 🐳 Docker Usage & Security
|
| 59 |
|
| 60 |
+
PatchHawk requires a local Docker daemon. The sandbox is strictly isolated:
|
| 61 |
+
- **No Network**: Containers run with `--network none`.
|
| 62 |
+
- **Resource Caps**: Limited to `256MB RAM` and `0.5 CPU` cores.
|
| 63 |
+
- **Non-Root**: Tasks execute as a limited-privilege user.
|
| 64 |
+
- **Validation**: The 3-stage pipeline checks:
|
| 65 |
+
1. `py_compile`: Does the patch even compile?
|
| 66 |
+
2. `pytest`: Does it break existing functionality?
|
| 67 |
+
3. `Re-Attack`: If we run the original exploit, does the new patch stop it?
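The isolation settings and the first validation stage can be sketched as below. This is a hedged illustration, not the project's `sandbox.py`: `sandbox_command`, `compiles`, the read-only mount, and the `main.py` script name are assumptions, and the compile check runs on the host here for simplicity, whereas the README says validation runs inside Docker. The `docker run` flags themselves (`--network`, `--memory`, `--cpus`, `-v`) are standard Docker CLI options.

```python
import py_compile
import tempfile
from pathlib import Path


def sandbox_command(code_dir, script="main.py", image="patchhawk-sandbox:latest"):
    """Build a `docker run` argument list matching the limits listed above."""
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no network access
        "--memory", "256m",           # RAM cap
        "--cpus", "0.5",              # CPU cap
        "-v", f"{code_dir}:/app:ro",  # assumption: code mounted read-only at /app
        image, "python3", script,
    ]


def compiles(source):
    """Stage 1 sketch: return True if the patch is syntactically valid Python."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "patch.py"
        path.write_text(source)
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError:
            return False
    return True


print(" ".join(sandbox_command("/tmp/scenario")))
print(compiles("import os\n"))
```

The argument list can be handed to `subprocess.run` on a machine with a Docker daemon; building it as a list avoids shell quoting issues with untrusted paths.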
|
| 68 |
|
| 69 |
---
|
| 70 |
|
| 71 |
+
## Installation
|
| 72 |
|
| 73 |
+
```bash
|
| 74 |
+
# 1. Clone & Install
|
| 75 |
+
git clone https://github.com/ramprasathk07/PatchHawk.git
|
| 76 |
+
cd PatchHawk
|
| 77 |
+
pip install -r requirements.txt
|
| 78 |
|
| 79 |
+
# 2. Setup Environment
|
| 80 |
+
cp .env.example .env
|
| 81 |
+
# Fill in HF_TOKEN for local LLM fallback
|
| 82 |
|
| 83 |
+
# 3. Build the Validator Box
|
| 84 |
+
docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
|
| 85 |
|
| 86 |
+
# 4. Generate the Training Dataset (1,500 samples)
|
| 87 |
+
python -m patchhawk.data.generate_scenarios --num-samples 1500
|
| 88 |
```
|
| 89 |
|
| 90 |
---
|
| 91 |
|
| 92 |
+
## Dashboard & UI
|
| 93 |
|
| 94 |
+
Launch the **Security Operations Center (SOC)** to watch the agent work in real-time:
|
| 95 |
|
| 96 |
+
```bash
|
| 97 |
+
streamlit run patchhawk/app/dashboard.py
|
| 98 |
```
|
| 99 |
|
| 100 |
+
Features:
|
| 101 |
+
- **Terminal Trace**: See the raw thought process (XML/JSON) of the agent.
|
| 102 |
+
- **Docker Telemetry**: View real-time output from the sandbox validation.
|
| 103 |
+
- **Reward Signal**: Audit why the agent earned (+/-) rewards for its specific decision.
|
| 104 |
+
|
| 105 |
---
|
| 106 |
|
| 107 |
## License
|
| 108 |
+
MIT © Ramprasath K & The PatchHawk Team
|
|
|
docker/Dockerfile.sandbox
ADDED
|
@@ -0,0 +1,22 @@
|
| 1 |
+
# 🦅 PatchHawk: Isolated Python Sandbox
|
| 2 |
+
# Used for the EXECUTE_SANDBOX (Stage 1) and SUBMIT_PATCH (Stage 3) validation.
|
| 3 |
+
|
| 4 |
+
FROM python:3.11-slim
|
| 5 |
+
|
| 6 |
+
# System dependencies for unit testing
|
| 7 |
+
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 8 |
+
gcc \
|
| 9 |
+
python3-dev \
|
| 10 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 11 |
+
|
| 12 |
+
WORKDIR /app
|
| 13 |
+
|
| 14 |
+
# Pre-install pytest for the validator
|
| 15 |
+
RUN pip install --no-cache-dir pytest
|
| 16 |
+
|
| 17 |
+
# Create a non-privileged user for security
|
| 18 |
+
RUN useradd -m sandbox
|
| 19 |
+
USER sandbox
|
| 20 |
+
|
| 21 |
+
# The environment mounts the code into /app at runtime
|
| 22 |
+
CMD ["python3"]
|
inference.py
CHANGED
|
@@ -65,33 +65,34 @@ TASK_DEFS = [
|
|
| 65 |
# ── Prompt builder ───────────────────────────────────────────────────
|
| 66 |
|
| 67 |
SYSTEM_PROMPT = """\
|
| 68 |
-
You are PatchHawk, a security agent that detects supply-chain vulnerabilities in Python code.
|
| 69 |
-
|
| 70 |
-
Given a code snippet
|
| 71 |
-
|
| 72 |
-
##
|
| 73 |
{
|
| 74 |
-
"reasoning": "
|
| 75 |
-
"risk_score": <float>,
|
| 76 |
-
"action_type": <int>,
|
| 77 |
-
"patch_content": "<str|null>"
|
| 78 |
}
|
| 79 |
-
|
| 80 |
-
## Action Type Guidelines
|
| 81 |
-
- **0 ANALYZE** - No immediate threat, but needs deeper review.
|
| 82 |
-
- **1 EXECUTE_SANDBOX** - Suspicious but not obviously malicious; run in isolated environment.
|
| 83 |
-
- **2 BLOCK_PR** - Severely malicious, unfixable (e.g., hidden backdoor, remote shell). Reject PR.
|
| 84 |
-
- **3 SUBMIT_PATCH** - Vulnerability can be fixed. Provide corrected code in `patch_content`.
|
| 85 |
-
- **4 REQUEST_REVIEW** - Complex or ambiguous; require human expert.
|
| 86 |
-
|
| 87 |
-
## Rules
|
| 88 |
-
- `reasoning` must be thorough: describe the flaw, its impact (CWE if known), and step-by-step how to patch.
|
| 89 |
-
- Escape all double quotes inside strings with backslash (`\"`).
|
| 90 |
-
- If the code is benign, set `risk_score` ≤ 0.2, `action_type` = 0, and `patch_content` = null.
|
| 91 |
-
- Never include comments or explanations outside the JSON object.
|
| 92 |
-
|
| 93 |
-
**Example valid response:**
|
| 94 |
-
{"reasoning": "Hardcoded password 'admin123' in __init__ allows credential bypass. Replace with env var.", "risk_score": 0.85, "action_type": 3, "patch_content": "import os\\nclass Malicious:\\n def __init__(self):\\n self.cache = []\\n self.password = os.getenv('DB_PASS')\\n ..."}
|
| 95 |
"""
|
| 96 |
|
| 97 |
# SYSTEM_PROMPT = """\
|
|
@@ -116,13 +117,15 @@ Given a code snippet and static analysis flags, you must respond **EXACTLY** wit
|
|
| 116 |
def _build_user_prompt(obs: PatchHawkObservation, step: int) -> str:
|
| 117 |
parts = [
|
| 118 |
f"## Step {step}",
|
| 119 |
-
f"**Code
|
| 120 |
-
f"**
|
| 121 |
-
f"**Risk
|
| 122 |
]
|
| 123 |
if obs.sandbox_telemetry:
|
| 124 |
-
parts.append(f"**Sandbox
|
| 125 |
-
|
| 126 |
return "\n\n".join(parts)
|
| 127 |
|
| 128 |
|
|
|
|
| 65 |
# ── Prompt builder ───────────────────────────────────────────────────
|
| 66 |
|
| 67 |
SYSTEM_PROMPT = """\
|
| 68 |
+
You are PatchHawk, a highly critical security agent that detects supply-chain vulnerabilities in Python code.
|
| 69 |
+
|
| 70 |
+
Given a code snippet, your goal is to decide the safest path forward.
|
| 71 |
+
|
| 72 |
+
## 🎯 Your Decision Heuristics:
|
| 73 |
+
1. **Risk Score Calibration**:
|
| 74 |
+
- **0.9 - 1.0 (CRITICAL)**: Code contains obvious backdoors, remote shells, `base64` obfuscated `exec`, or unauthorized `os.system` calls.
|
| 75 |
+
- **0.5 - 0.8 (HIGH)**: Suspicious imports, typosquatting (e.g. `requests` -> `requesst`), or insecure usage of `eval`.
|
| 76 |
+
- **0.0 - 0.2 (LOW)**: Benign, standard Python code.
|
| 77 |
+
2. **Action Selection**:
|
| 78 |
+
- **0 ANALYZE**: Standard observation for low-risk code.
|
| 79 |
+
- **1 EXECUTE_SANDBOX**: Choose this if you see suspicious activity but need to confirm if it makes network calls or writes files.
|
| 80 |
+
- **2 BLOCK_PR**: Use for unfixable, malicious backdoors.
|
| 81 |
+
- **3 SUBMIT_PATCH**: If the code has a fixable vulnerability (e.g. lack of sanitization, typo), you **MUST** provide the corrected code in `patch_content`.
|
| 82 |
+
- **4 REQUEST_REVIEW**: Only for extreme ambiguity.
|
| 83 |
+
|
| 84 |
+
## Rules for Output JSON:
|
| 85 |
+
- **EXACT JSON ONLY**. No markdown blocks, no extra text.
|
| 86 |
+
- **Patch Content**: If `action_type` is 3, `patch_content` **CANNOT** be null. It must be the full, corrected Python script.
|
| 87 |
+
- **Risk Score**: Be precise. Do not default to 0.0 if you see any suspicious imports.
|
| 88 |
+
|
| 89 |
+
## Response Format:
|
| 90 |
{
|
| 91 |
+
"reasoning": "Step-by-step security analysis...",
|
| 92 |
+
"risk_score": <float>,
|
| 93 |
+
"action_type": <int>,
|
| 94 |
+
"patch_content": "<str|null>"
|
| 95 |
}
|
| 96 |
"""
|
| 97 |
|
| 98 |
# SYSTEM_PROMPT = """\
|
|
|
|
| 117 |
def _build_user_prompt(obs: PatchHawkObservation, step: int) -> str:
|
| 118 |
parts = [
|
| 119 |
f"## Step {step}",
|
| 120 |
+
f"**Target Code Snippet:**\n```python\n{obs.code_snippet}\n```",
|
| 121 |
+
f"**Environment Analysis Flags:** {obs.static_flags}",
|
| 122 |
+
f"**Environment Initial Risk Assessment:** {obs.risk_score}",
|
| 123 |
]
|
| 124 |
if obs.sandbox_telemetry:
|
| 125 |
+
parts.append(f"**Sandbox Telemetry (Crucial Evidence):**\n```\n{obs.sandbox_telemetry}\n```")
|
| 126 |
+
|
| 127 |
+
parts.append("\n**TASK:** Based on the above code and evidence, provide your own `risk_score` and decide the next `action_type`. If suspicious but unconfirmed, use EXECUTE_SANDBOX (1) to collect telemetry.")
|
| 128 |
+
parts.append("Respond with the required JSON object only.")
|
| 129 |
return "\n\n".join(parts)
|
| 130 |
|
| 131 |
|