Rithwik Ravi commited on
Commit ·
084f95a
1
Parent(s): 9541ba6
fix: restore space metadata
Browse files
README.md
CHANGED
|
@@ -1,7 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
# Dynamic Guardrail Generator
|
| 2 |
-
**Team Winnovators (Rithwik
|
| 3 |
|
| 4 |
-
🔗 **[Hugging Face Space URL]** | 🔗 **[YouTube 2-Min Pitch Video]** | 🔗 **[Google Colab Training
|
| 5 |
|
| 6 |
---
|
| 7 |
|
|
@@ -20,17 +29,17 @@ Current industry solutions are fatally flawed:
|
|
| 20 |
|
| 21 |
## 💡 Our Solution: The OpenEnv Compiler Architecture
|
| 22 |
|
| 23 |
-
|
| 24 |
|
| 25 |
-
Running inside
|
| 26 |
|
| 27 |
-
By forcing the agent to map threats to a structured AST
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
-
## ⚙️ Reward Engineering & Pipeline
|
| 32 |
|
| 33 |
-
To train our autonomous compiler, we built a High-Fidelity RLVR (Reinforcement Learning with Verifiable Rewards) pipeline.
|
| 34 |
|
| 35 |
### The Log-Barrier Multi-Objective Reward
|
| 36 |
To mathematically eradicate "Refusal Collapse", we designed a rigorous deterministic reward surface:
|
|
@@ -38,22 +47,24 @@ To mathematically eradicate "Refusal Collapse", we designed a rigorous determini
|
|
| 38 |
Reward = (1.0 * Recall) - (2.0 * math.log1p(FPR))
|
| 39 |
```
|
| 40 |
- **Recall (True Positive Rate):** A linear reward for successfully neutralizing adversarial payloads.
|
| 41 |
-
- **FPR (False Positive Rate):** A severe logarithmic penalty for blocking benign user queries, mathematically forcing the agent to preserve application utility.
|
| 42 |
|
| 43 |
-
###
|
| 44 |
-
We
|
|
|
|
|
|
|
| 45 |
|
| 46 |
---
|
| 47 |
|
| 48 |
-
## 📈 Results &
|
| 49 |
|
| 50 |
Our training resulted in an agent capable of generating highly targeted logic graphs that dynamically adapt to new threat vectors.
|
| 51 |
|
| 52 |

|
| 53 |
-
*Figure 1: GRPO Training Curve demonstrating the agent escaping refusal-collapse
|
| 54 |
|
| 55 |
-
### Decoupled Telemetry & A/B Comparison
|
| 56 |
-
We built a rich, non-blocking telemetry dashboard (FastAPI + Server-Sent Events) that streams live metrics without impacting the execution time of the strict OpenEnv evaluation loop.
|
| 57 |
|
| 58 |
Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` inference script runs dual-passes—temporarily disabling the trained LoRA adapter via `model.disable_adapter()` to evaluate the base Qwen2.5 weights against our RL-trained agent in real-time. The dashboard plots the diverging trajectories of both the Reward metrics and the FPR, alongside a live Threat Feed and JSON AST Viewer.
|
| 59 |
|
|
@@ -61,27 +72,30 @@ Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` i
|
|
| 61 |
|
| 62 |
## 💻 Local Run Instructions
|
| 63 |
|
| 64 |
-
|
| 65 |
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
|
|
|
|
|
|
|
|
|
| 77 |
|
| 78 |
-
|
| 79 |
-
We have bundled a master orchestrator that automatically cleans up ports, boots the FastAPI Core Server (Port 8000) and Telemetry UI Server (Port 8001) into the background, and triggers the Headless OpenEnv Evaluator (`evaluate.py`).
|
| 80 |
|
| 81 |
```bash
|
| 82 |
python run_all.py
|
| 83 |
```
|
| 84 |
|
| 85 |
-
|
| 86 |
Once the orchestrator initializes, open your browser to:
|
| 87 |
[http://127.0.0.1:8001/ui](http://127.0.0.1:8001/ui) to watch the live A/B comparison and Threat Feed stream in real-time.
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Dynamic Guardrail Generator
|
| 3 |
+
emoji: 🛡️
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: docker
|
| 7 |
+
pinned: false
|
| 8 |
+
license: mit
|
| 9 |
+
---
|
| 10 |
# Dynamic Guardrail Generator
|
| 11 |
+
**Team Winnovators (Rithwik & Parveshh)**
|
| 12 |
|
| 13 |
+
🔗 **[Hugging Face Space URL]** | 🔗 **[YouTube 2-Min Pitch Video]** | 🔗 **[Google Colab Training PoC]**
|
| 14 |
|
| 15 |
---
|
| 16 |
|
|
|
|
| 29 |
|
| 30 |
## 💡 Our Solution: The OpenEnv Compiler Architecture
|
| 31 |
|
| 32 |
+
Aligning with **Theme #3.1 (Professional Tasks: Cybersecurity/Blue-Teaming)**, we solved this by separating the intelligence from the execution.
|
| 33 |
|
| 34 |
+
The **Dynamic Guardrail Generator** treats the LLM as an autonomous Blue-Team engineer. Running inside our strict `OpenEnv` grading environment, the agent does not evaluate prompts directly. Instead, it synthesizes a highly constrained, Pydantic-validated **JSON Guardrail Logic Graph** (a Domain Specific Language).
|
| 35 |
|
| 36 |
+
By forcing the agent to map threats to a structured AST using strict `LogicNodes` (`AND`, `OR`, `NOT`) and `SemanticFilters` (such as `entropy_threshold`, `length_limit`, `regex_pattern`, and `keyword_match`), we entirely bypass brittle spaghetti-code generation, eliminate runtime hallucinations, and execute the defense with zero-latency deterministic logic.
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
+
## ⚙️ Reward Engineering & Pipeline
|
| 41 |
|
| 42 |
+
To train our autonomous compiler, we built a High-Fidelity RLVR (Reinforcement Learning with Verifiable Rewards) pipeline.
|
| 43 |
|
| 44 |
### The Log-Barrier Multi-Objective Reward
|
| 45 |
To mathematically eradicate "Refusal Collapse", we designed a rigorous deterministic reward surface:
|
|
|
|
| 47 |
Reward = (1.0 * Recall) - (2.0 * math.log1p(FPR))
|
| 48 |
```
|
| 49 |
- **Recall (True Positive Rate):** A linear reward for successfully neutralizing adversarial payloads.
|
| 50 |
+
- **FPR (False Positive Rate):** A severe non-linear logarithmic penalty for blocking benign user queries, mathematically forcing the agent to preserve application utility.
|
| 51 |
|
| 52 |
+
### Dual-Compute Strategy
|
| 53 |
+
We utilized **Unsloth (4-bit quantization)** and **Hugging Face TRL (GRPO)** on `Qwen/Qwen2.5-0.5B-Instruct` to keep the memory footprint under 8GB VRAM.
|
| 54 |
+
- **Cloud Proof of Concept:** We provided a verifiable Google Colab notebook running on a T4 GPU as a 4-step proof of learning.
|
| 55 |
+
- **Local High-Fidelity Training:** Our actual production LoRA adapter was trained locally for 250 steps on a dedicated **RTX 4070 GPU** to achieve high-fidelity semantic parsing and complex graph synthesis.
|
| 56 |
|
| 57 |
---
|
| 58 |
|
| 59 |
+
## 📈 Results & UI Dashboard
|
| 60 |
|
| 61 |
Our training resulted in an agent capable of generating highly targeted logic graphs that dynamically adapt to new threat vectors.
|
| 62 |
|
| 63 |

|
| 64 |
+
*Figure 1: GRPO Training Curve demonstrating the agent escaping refusal-collapse.*
|
| 65 |
|
| 66 |
+
### Decoupled Telemetry & Live A/B Comparison
|
| 67 |
+
We built a rich, non-blocking telemetry dashboard (`FastAPI` + Server-Sent Events) that streams live metrics without impacting the execution time of the strict OpenEnv evaluation loop.
|
| 68 |
|
| 69 |
Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` inference script runs dual-passes—temporarily disabling the trained LoRA adapter via `model.disable_adapter()` to evaluate the base Qwen2.5 weights against our RL-trained agent in real-time. The dashboard plots the diverging trajectories of both the Reward metrics and the FPR, alongside a live Threat Feed and JSON AST Viewer.
|
| 70 |
|
|
|
|
| 72 |
|
| 73 |
## 💻 Local Run Instructions
|
| 74 |
|
| 75 |
+
We have battle-tested this environment specifically for Windows local deployments.
|
| 76 |
|
| 77 |
+
### 1. Windows GPU Setup (Critical Fixes)
|
| 78 |
+
To bypass known PyTorch and Triton compiler conflicts on Windows, you must configure your environment exactly as follows:
|
| 79 |
+
|
| 80 |
+
1. **Python Version:** Create a virtual environment using **Python 3.13** (Avoid Python 3.14 to maintain dependency compatibility).
|
| 81 |
+
2. **Install PyTorch 2.11 (CUDA 12.6):** Standard `requirements.txt` installs will pull CPU wheels. You must install PyTorch from the `cu126` index:
|
| 82 |
+
```bash
|
| 83 |
+
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126 --upgrade
|
| 84 |
+
```
|
| 85 |
+
3. **Install Dependencies & Triton Compiler:**
|
| 86 |
+
```bash
|
| 87 |
+
pip install -r requirements.txt
|
| 88 |
+
pip install triton-windows
|
| 89 |
+
```
|
| 90 |
+
*(Note: If Triton throws a `Python.h` missing error, create a directory junction linking your base Python `include` folder to your project root `Include` folder).*
|
| 91 |
|
| 92 |
+
### 2. Run the Master Orchestrator
|
| 93 |
+
We have bundled a master orchestrator (`run_all.py`) that automatically cleans up zombie ports, boots the FastAPI Core Server (Port 8000) and Telemetry UI Server (Port 8001) into the background, and triggers the Headless OpenEnv Evaluator (`evaluate.py`).
|
| 94 |
|
| 95 |
```bash
|
| 96 |
python run_all.py
|
| 97 |
```
|
| 98 |
|
| 99 |
+
### 3. View the Dashboard
|
| 100 |
Once the orchestrator initializes, open your browser to:
|
| 101 |
[http://127.0.0.1:8001/ui](http://127.0.0.1:8001/ui) to watch the live A/B comparison and Threat Feed stream in real-time.
|