Spaces: Ramkan7/Patch_Hawk


kanishcr7 committed on Apr 8

Commit 00ed537 (1 parent: eefda8d)

Merged from sub

Files changed (3)

README.md +71 -56
requirements.txt +1 -1
server/app.py +40 -0

README.md CHANGED

@@ -10,19 +10,21 @@ pinned: false
 # 🦅 PatchHawk: Autonomous Supply-Chain Guard
-[![W&B](https://img.shields.io/badge/W%26B-patchhawk-blue?logo=weightsandbiases)](https://wandb.ai/ramprasathk07/patchhawk)
-[![HuggingFace](https://img.shields.io/badge/🤗_Model-patchhawk-yellow)](https://huggingface.co/ramprasathk07/patchhawk)
-[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://python.org)
-[![OpenEnv](https://img.shields.io/badge/OpenEnv-Hackathon_Finalist-orange)](https://github.com/pytorch/openenv)
-[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
-> **PatchHawk is an state-of-the-art autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It goes beyond detection by validating vulnerabilities in isolated Docker sandboxes and generating verified, syntax-correct patches.**
 ---
-## 📽️ The Vision: Cyber-Physical RL Loop
-Traditional security scanners often produce high signal-to-noise ratios and "hallucinated" vulnerabilities. PatchHawk bridges this gap by implementing a **Cyber-Physical Reinforcement Learning Loop**, where the model's reward is tied to the actual execution success of its patches in a real environment.
 ```mermaid
 graph TD
@@ -34,43 +36,46 @@ graph TD
     B -->|Patch| G[Verification Pipeline]
     G -->|Syntax Check| H{Success?}
     G -->|Unit Tests| I{Pass?}
-    G -->|Re-Attack| J{Defeated?}
     H & I & J -->|All Pass| K[Positive Reward +3.0]
     H | I | J -->|Failure| L[Negative Penalty -1.5]
-    K --> M[Model Update/Optimization]
 ```
 ---
 ## ✨ Key Features
--   🛡️ **Autonomous Detection**: Sophisticated analysis of supply-chain vectors (typosquatting, backdoors, exfiltration).
--   🐳 **Hardened Sandboxing**: High-fidelity Docker isolation with zero-network access and strict resource caps.
--   🧠 **GRPO-Driven Learning**: Uses Group Relative Policy Optimization (DeepSeek-R1 style) for reasoning and trial-and-error mastery.
--   🧩 **XML Reasoning**: Enforces a structured `<thought>...</thought>` chain for transparent decision-making.
--   📊 **SOC Dashboard**: Real-time Streamlit interface for auditing agent behavior and reward telemetry.
--   ✅ **OpenEnv Compliant**: Fully integrated with the [PyTorch OpenEnv](https://github.com/pytorch/openenv) framework.
 ---
-## 🛠 Project Structure
-The codebase is organized into modular components for training, inference, and environment simulation.
 ```text
 PatchHawk/
-├── src/envs/patchhawk/    # 📦 Core OpenEnv Submission Package
 │   ├── server/            # FastAPI environment server
-│   ├── models.py          # Type-safe contract definitions
 │   ├── client.py          # Environment interaction client
 │   └── inference.py       # Main agent execution loop
-├── patchhawk/             # 🧠 Logic & Training
 │   ├── data/              # Scenario generation & datasets
-│   ├── training/          # GRPO/Unsloth training scripts
 │   └── app/               # Streamlit SOC Dashboard
 ├── docker/                # 🐳 Container configurations
 ├── config.yaml            # Environment & Agent configuration
-└── openenv.yaml           # OpenEnv metadata
 ```
 ---
@@ -79,9 +84,10 @@ PatchHawk/
 ### Prerequisites
--   **Python 3.12+**
--   **Docker Engine** (running locally)
--   **Nvidia GPU** (8GB+ VRAM recommended for local training/inference)
 ### 1. Installation
@@ -90,79 +96,88 @@ PatchHawk/
 git clone https://github.com/ramprasathk07/PatchHawk.git
 cd PatchHawk
-# Create virtual environment and install core dependencies
 python -m venv .venv
-source .venv/bin/activate  # Windows: .venv\Scripts\activate
 pip install -e .
 ```
 ### 2. Environment Setup
 ```bash
-# Setup environment variables
 cp .env.example .env
-# Edit .env to include your HF_TOKEN and OpenAI/Anthropic keys
-# Build the validation sandbox
 docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
 ```
 ### 3. Running the Agent (Dry Run)
 ```bash
-# Start the environment server
 python -m server.app --port 8000
-# Execute the inference loop
 python src/envs/patchhawk/inference.py --env-url http://localhost:8000
 ```
 ---
-## 💎 Reward Rubric (Action Space)
-PatchHawk implements a granular scoring system to guide the agent toward safe and effective decisions.
 | Action ID | Action Name | Base Reward | Success Criteria |
 | :--- | :--- | :--- | :--- |
-| **0** | `ANALYZE` | `0.0` | Observation step; used for data gathering. |
-| **1** | `DETONATE` | `+0.1` | Successfully extract telemetry from Docker. |
-| **2** | `BLOCK_PR` | `+2.0 / -1.0` | Rewarded for malware; penalized for False Positives. |
-| **3** | `SUBMIT_PATCH` | `+3.0 / -1.5` | **The Goal.** Requires pass in Syntax -> Test -> Re-Attack. |
-| **4** | `ESCALATE` | `0.0` | Hand off to human expert if uncertainty is high. |
-### Dynamic Scaling
--   **Risk Accuracy**: Agent receives up to `+2.0` bonus for predicting the exact risk score.
--   **Safety Multiplier**: Frequent failed syntax checks trigger a decay factor on all rewards.
 ---
 ## 📈 Dashboard & UI
-Launch the **Security Operations Center (SOC)** to watch the agent reason in real-time.
 ```bash
 streamlit run patchhawk/app/dashboard.py
 ```
--   **Terminal Trace**: Live XML reasoning logs.
--   **Docker Monitor**: Real-time stdout/stderr from the sandbox.
--   **Reward Audit**: Detailed breakdown of why specific points were awarded.
 ---
-## 🗺️ Roadmap
--   [ ] **Multi-Agent Coordination**: Deploying "Attacker" vs "Defender" models for automated red-teaming.
--   [ ] **CVE Ingestion**: Automated generation of training scenarios from current NVD databases.
--   [ ] **Cross-Language Support**: Expanding beyond Python to Go, Javascript, and Rust.
--   [ ] **Kubernetes Native**: Orchestrating sandboxes at scale using K8s instead of local Docker.
 ---
 ## 📝 License
-Distributed under the **MIT License**. See `LICENSE` or the project root for more information.
-Developed with ❤️ by **Ramprasath K & The PatchHawk Team**
-Ramprasath K & The PatchHawk Team

 # 🦅 PatchHawk: Autonomous Supply-Chain Guard
+[![Weights & Biases](https://img.shields.io/badge/Weights%20%26%20Biases-FFBE00?logo=weightsandbiases&logoColor=black)](https://wandb.ai)
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co)
+[![Python 3.12](https://img.shields.io/badge/Python-3.12-blue?logo=python&logoColor=white)](https://python.org)
+[![OpenEnv](https://img.shields.io/badge/OpenEnv-Compliant-2ea44f)](https://openenv.dev)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+**Built for the OpenEnv Hackathon 2026 by Meta**
+PatchHawk is an autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It moves beyond static vulnerability detection by validating findings inside isolated Docker sandboxes and generating verified, syntactically correct patches. The system closes the loop between detection, validation, and remediation through a cyber‑physical reinforcement learning feedback cycle.
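
As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes each sampled completion's reward against the mean and standard deviation of its own group, which is what lets the method skip a learned critic. The function name, the `eps` guard, and the use of the population standard deviation are assumptions made for this example; the commit does not show PatchHawk's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each reward relative to the other samples in the same group.

    Hypothetical helper: the normalization (r - mean) / std is the standard
    GRPO advantage; eps only guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one scenario, scored with the README's
# SUBMIT_PATCH rewards (+3.0 on success, -1.5 on failure).
advantages = group_relative_advantages([3.0, -1.5, -1.5, 3.0])
```

Successful patches end up with a positive advantage and failed ones with a negative advantage, regardless of the absolute reward scale.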
 ---
+## 📽️ The Vision: Cyber‑Physical RL Loop
+Traditional security scanners suffer from high false‑positive rates and often report vulnerabilities that cannot be exploited or fixed in practice. PatchHawk addresses this by implementing a reinforcement learning loop where the model's reward is tied directly to the success of its patches inside a real execution environment.
 ```mermaid
 graph TD
     B -->|Patch| G[Verification Pipeline]
     G -->|Syntax Check| H{Success?}
     G -->|Unit Tests| I{Pass?}
+    G -->|Re‑Attack| J{Defeated?}
     H & I & J -->|All Pass| K[Positive Reward +3.0]
     H | I | J -->|Failure| L[Negative Penalty -1.5]
+    K --> M[Model Update / Optimization]
+    L --> M
 ```
+The agent learns to produce patches that not only compile but also withstand re‑execution of the original exploit vector.
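
The three gates in the diagram could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `verify_patch`, `run_unit_tests`, and `exploit_defeated` are hypothetical names, the two callables stand in for the sandboxed test and re-attack steps, and only the `+3.0` / `-1.5` values come from the diagram.

```python
import ast
from typing import Callable

PASS_REWARD = 3.0   # from the diagram: all three checks pass
FAIL_REWARD = -1.5  # from the diagram: any check fails


def syntax_ok(patched_source: str) -> bool:
    """Return True if the patched code parses as valid Python."""
    try:
        ast.parse(patched_source)
        return True
    except (SyntaxError, ValueError):
        return False


def verify_patch(
    patched_source: str,
    run_unit_tests: Callable[[str], bool],    # hypothetical sandbox hook
    exploit_defeated: Callable[[str], bool],  # hypothetical re-attack hook
) -> float:
    """Run the checks in order and stop at the first failure."""
    if not syntax_ok(patched_source):
        return FAIL_REWARD
    if not run_unit_tests(patched_source):
        return FAIL_REWARD
    if not exploit_defeated(patched_source):
        return FAIL_REWARD
    return PASS_REWARD
```

Running the cheap syntax check first avoids spending sandbox time on patches that cannot even be parsed.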
 ---
 ## ✨ Key Features
+-   🛡️ **Autonomous Detection**: Sophisticated supply‑chain analysis identifying typosquatting, backdoors, data exfiltration, and malicious logic in dependencies.
+-   🐳 **Hardened Sandboxing**: High‑fidelity Docker isolation with network‑disabled execution, strict resource caps, and ephemeral file systems to safely detonate suspicious code.
+-   🧠 **GRPO‑Driven Learning**: Group Relative Policy Optimization (inspired by DeepSeek‑R1) enables trial‑and‑error mastery and structured reasoning without a separate critic model.
+-   🧩 **XML Reasoning Traces**: All agent decisions are accompanied by a machine‑readable `<thought>...</thought>` block, providing full auditability of the decision‑making process.
+-   📊 **SOC Dashboard**: Real‑time Streamlit interface for monitoring agent behavior, sandbox telemetry, and reward breakdowns.
+-   ✅ **OpenEnv Compliance**: Fully integrated with the PyTorch OpenEnv framework, ensuring reproducible and shareable reinforcement learning environments.
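
To make the sandboxing bullet concrete, the sketch below assembles a `docker run` command with networking disabled, resource caps, and a read-only root file system. The helper name and the specific limits are assumptions chosen for illustration; the flags themselves are standard Docker CLI options, and the image name matches the one built in the setup steps.

```python
def sandbox_command(
    image: str,
    code: str,
    memory: str = "256m",  # assumed cap, not taken from the project
    cpus: str = "0.5",     # assumed cap, not taken from the project
) -> list[str]:
    """Build an argument list for a locked-down, throwaway container."""
    return [
        "docker", "run",
        "--rm",                # delete the container when it exits
        "--network", "none",   # no network access
        "--memory", memory,    # memory cap
        "--cpus", cpus,        # CPU cap
        "--pids-limit", "64",  # limit the number of processes
        "--read-only",         # read-only root file system
        "--tmpfs", "/tmp",     # ephemeral scratch space only
        image,
        "python", "-c", code,
    ]


cmd = sandbox_command("patchhawk-sandbox:latest", "print('hello from the sandbox')")
```

The list can be passed to `subprocess.run(cmd, capture_output=True, timeout=...)` to collect stdout and stderr as telemetry.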
 ---
+## 🛠️ Project Structure
 ```text
 PatchHawk/
+├── src/envs/patchhawk/    # 📦 OpenEnv Submission Package
 │   ├── server/            # FastAPI environment server
+│   ├── models.py          # Type‑safe contract definitions
 │   ├── client.py          # Environment interaction client
 │   └── inference.py       # Main agent execution loop
+├── patchhawk/             # 🧠 Core Logic & Training
 │   ├── data/              # Scenario generation & datasets
+│   ├── training/          # GRPO / Unsloth training scripts
 │   └── app/               # Streamlit SOC Dashboard
 ├── docker/                # 🐳 Container configurations
 ├── config.yaml            # Environment & Agent configuration
+├── openenv.yaml           # OpenEnv metadata
+├── .env.example           # Environment variable template
+└── README.md
 ```
 ---
 ### Prerequisites
+-   Python 3.12 or higher
+-   Docker Engine (running locally, with buildx available)
+-   NVIDIA GPU (8 GB VRAM or more recommended for training and inference)
+-   Hugging Face account and token (for model access)
 ### 1. Installation
 git clone https://github.com/ramprasathk07/PatchHawk.git
 cd PatchHawk
+# Create and activate a virtual environment
 python -m venv .venv
+source .venv/bin/activate      # On Windows: .venv\Scripts\activate
+# Install core dependencies
 pip install -e .
 ```
 ### 2. Environment Setup
 ```bash
+# Copy the environment template and populate your keys
 cp .env.example .env
+# Edit .env to include HF_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
+# Build the validation sandbox Docker image
 docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
 ```
 ### 3. Running the Agent (Dry Run)
 ```bash
+# Start the environment server (in one terminal)
 python -m server.app --port 8000
+# Execute the inference loop (in another terminal)
 python src/envs/patchhawk/inference.py --env-url http://localhost:8000
 ```
 ---
+## 💎 Reward Rubric
+The agent is guided by a granular reward structure that encourages safe, effective, and verifiable actions.
 | Action ID | Action Name | Base Reward | Success Criteria |
 | :--- | :--- | :--- | :--- |
+| **0** | `ANALYZE` | `0.0` | Observation step; used solely for data gathering. |
+| **1** | `DETONATE` | `+0.1` | Successfully extract telemetry from the Docker sandbox. |
+| **2** | `BLOCK_PR` | `+2.0 / -1.0` | Positive reward when correctly blocking a malicious PR; negative penalty for false positives. |
+| **3** | `SUBMIT_PATCH` | `+3.0 / -1.5` | The primary goal. Reward requires passing syntax check, unit tests, and a re‑attack validation. |
+| **4** | `ESCALATE` | `0.0` | Hands off to a human expert when uncertainty exceeds a configurable threshold. |
+### Dynamic Scaling Factors
+-   **Risk Accuracy Bonus**: Up to `+2.0` additional reward for accurately predicting the risk score of a vulnerability.
+-   **Safety Multiplier**: Repeated syntax check failures apply a decay factor to all future rewards.
 ---
 ## 📈 Dashboard & UI
+Launch the **Security Operations Center (SOC)** dashboard to observe the agent's reasoning in real time.
 ```bash
 streamlit run patchhawk/app/dashboard.py
 ```
+The dashboard provides:
+-   Live XML reasoning logs from the agent.
+-   Real‑time stdout/stderr streams from the Docker sandbox.
+-   Detailed audit trail of reward assignments and verification outcomes.
 ---
+## 🗺️ Roadmap & Future Work
+-   [ ] **Multi‑Agent Coordination**: Deploy attacker and defender models for automated red‑teaming exercises.
+-   [ ] **CVE Ingestion**: Automatically generate training scenarios from the National Vulnerability Database (NVD).
+-   [ ] **Cross-Language Support**: Expand beyond Python to Go, JavaScript, Rust, and Java.
+-   [ ] **Kubernetes Native**: Orchestrate sandboxes at scale using Kubernetes instead of local Docker.
+-   [ ] **Fine‑Tuned Vulnerability Model**: Train a specialized 7B parameter LLM (e.g., VulnLLM‑R) on vulnerability‑fixing commits.
+-   [ ] **Context‑Aware Analysis**: Integrate Code Property Graph (CPG) slicing for LLM‑based semantic vulnerability detection.
+-   [ ] **Silent Patch Detection**: Identify security‑relevant commits that were not publicly disclosed.
+-   [ ] **AI‑Generated Code Audit**: Trace vulnerabilities back to AI coding assistants (e.g., GitHub Copilot, ChatGPT).
+-   [ ] **Automated PR Remediation**: Generate and submit fix‑containing pull requests for detected vulnerabilities.
+-   [ ] **Adversarial Training Loop**: Implement a self‑improving LLM‑vs‑LLM red‑team / blue‑team training regimen.
+-   [ ] **Supply‑Chain Malware Detection**: Extend dependency analysis to identify novel, unpublished attack patterns.
 ---
 ## 📝 License
+Distributed under the **MIT License**. See the LICENSE file in the repository root for full details.
+Developed with ❤️ by **Ramprasath K & The PatchHawk Team** for the OpenEnv Hackathon 2026 hosted by Meta.

requirements.txt CHANGED

@@ -1,5 +1,5 @@
 # Core
-openenv-core>=0.2.0
 openai>=1.0.0
 numpy>=1.24.0
 PyYAML>=6.0

 # Core
+openenv-core[ui]>=0.2.0
 openai>=1.0.0
 numpy>=1.24.0
 PyYAML>=6.0

server/app.py CHANGED

@@ -30,6 +30,7 @@ from openenv.core import create_app
 from patchhawk.agent.environment import PatchHawkEnv
 from patchhawk.env_models import PatchHawkAction, PatchHawkObservation
 def _env_factory() -> PatchHawkEnv:
@@ -50,6 +51,45 @@ def create_openenv_app():
 app = create_openenv_app()
 def main(port: int | None = None) -> None:
     """Start the PatchHawk OpenEnv server."""

 from patchhawk.agent.environment import PatchHawkEnv
 from patchhawk.env_models import PatchHawkAction, PatchHawkObservation
+from fastapi.responses import HTMLResponse
 def _env_factory() -> PatchHawkEnv:
 app = create_openenv_app()
+@app.get("/", response_class=HTMLResponse)
+def root_dashboard():
+    return """
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>PatchHawk | Autonomous DevSecOps SOC</title>
+        <style>
+            body { font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; background-color: #0d1117; color: #c9d1d9; display: flex; flex-direction: column; align-items: center; justify-content: center; height: 100vh; margin: 0; }
+            .container { background: #161b22; padding: 40px; border-radius: 12px; border: 1px solid #30363d; box-shadow: 0 10px 30px rgba(0,0,0,0.5); text-align: center; max-width: 600px; }
+            h1 { color: #58a6ff; margin-bottom: 10px; }
+            p { font-size: 1.1em; color: #8b949e; line-height: 1.6; }
+            .status { display: inline-block; padding: 5px 15px; border-radius: 20px; background: #238636; color: white; font-weight: bold; margin: 20px 0; }
+            .links { display: flex; gap: 10px; justify-content: center; margin-top: 30px; }
+            .btn { text-decoration: none; padding: 12px 25px; border-radius: 6px; font-weight: bold; transition: 0.3s; }
+            .btn-blue { background: #1f6feb; color: white; }
+            .btn-blue:hover { background: #388bfd; }
+            .btn-outline { border: 1px solid #30363d; color: #58a6ff; }
+            .btn-outline:hover { background: #30363d; }
+            .badge { background: #30363d; padding: 4px 10px; border-radius: 4px; font-family: monospace; }
+        </style>
+    </head>
+    <body>
+        <div class="container">
+            <h1>🦅 PatchHawk SOC</h1>
+            <p>Autonomous Supply-Chain Vulnerability & Patching Agent</p>
+            <div class="status">● ENVIRONMENT LIVE</div>
+            <p>The OpenEnv API Spec is running correctly at <span class="badge">port: 7860</span>.</p>
+            <div class="links">
+                <a href="/web" class="btn btn-blue">Open Env Explorer</a>
+                <a href="/docs" class="btn btn-outline">API Docs (Swagger)</a>
+            </div>
+            <p style="margin-top:20px; font-size:0.9em;">Evaluation URL: <span class="badge">/reset</span></p>
+        </div>
+    </body>
+    </html>
+    """
 def main(port: int | None = None) -> None:
     """Start the PatchHawk OpenEnv server."""