Rithwik Ravi committed on
Commit
084f95a
·
1 Parent(s): 9541ba6

fix: restore space metadata

Files changed (1)
  1. README.md +43 -29
README.md CHANGED
@@ -1,7 +1,16 @@
 
 
 
 
 
 
 
 
 
1
  # Dynamic Guardrail Generator
2
- **Team Winnovators (Rithwik Ravi Kumar & Parveshh Prabhu)**
3
 
4
- 🔗 **[Hugging Face Space URL]** | 🔗 **[YouTube 2-Min Pitch Video]** | 🔗 **[Google Colab Training Proof]**
5
 
6
  ---
7
 
@@ -20,17 +29,17 @@ Current industry solutions are fatally flawed:
20
 
21
  ## 💡 Our Solution: The OpenEnv Compiler Architecture
22
 
23
- Instead of relying on fragile text matching or latency-heavy secondary LLM inference, the **Dynamic Guardrail Generator** flips the paradigm by treating the LLM as an autonomous Blue-Team compiler.
24
 
25
- Running inside a strict `OpenEnv` grading environment, our agent does not evaluate prompts directly. Instead, it synthesizes a highly constrained, Pydantic-validated **JSON Guardrail Logic Graph** (a Domain Specific Language).
26
 
27
- By forcing the agent to map threats to a structured AST (Abstract Syntax Tree) using strict `LogicNodes` (`AND`, `OR`, `NOT`) and `SemanticFilters` (such as `entropy_threshold`, `length_limit`, `regex_pattern`, and `keyword_match`), we entirely eliminate runtime hallucinations and execute the defense with zero-latency deterministic logic.
28
 
29
  ---
30
 
31
- ## ⚙️ Reward Engineering & Pipeline Setup
32
 
33
- To train our autonomous compiler, we built a High-Fidelity RLVR (Reinforcement Learning with Verifiable Rewards) pipeline.
34
 
35
  ### The Log-Barrier Multi-Objective Reward
36
  To mathematically eradicate "Refusal Collapse", we designed a rigorous deterministic reward surface:
@@ -38,22 +47,24 @@ To mathematically eradicate "Refusal Collapse", we designed a rigorous determini
38
  Reward = (1.0 * Recall) - (2.0 * math.log1p(FPR))
39
  ```
40
  - **Recall (True Positive Rate):** A linear reward for successfully neutralizing adversarial payloads.
41
- - **FPR (False Positive Rate):** A severe logarithmic penalty for blocking benign user queries, mathematically forcing the agent to preserve application utility.
42
 
43
- ### The Compute Pipeline
44
- We architected the training loop to thrive within a highly constrained **8GB VRAM footprint**. By utilizing **Unsloth (4-bit quantization)** and **Hugging Face TRL (GRPO)**, we optimized `Qwen/Qwen2.5-0.5B-Instruct`. GRPO mathematically eliminates the memory overhead required by standard PPO Critic models, allowing us to train large effective batch sizes locally on consumer hardware.
 
 
45
 
46
  ---
47
 
48
- ## 📈 Results & Proof of Learning
49
 
50
  Our training resulted in an agent capable of generating highly targeted logic graphs that dynamically adapt to new threat vectors.
51
 
52
  ![Training Reward Curve](reward_curve.png)
53
- *Figure 1: GRPO Training Curve demonstrating the agent escaping refusal-collapse, maximizing security recall while minimizing False Positives.*
54
 
55
- ### Decoupled Telemetry & A/B Comparison UI
56
- We built a rich, non-blocking telemetry dashboard (FastAPI + Server-Sent Events) that streams live metrics without impacting the execution time of the strict OpenEnv evaluation loop.
57
 
58
  Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` inference script runs dual-passes—temporarily disabling the trained LoRA adapter via `model.disable_adapter()` to evaluate the base Qwen2.5 weights against our RL-trained agent in real-time. The dashboard plots the diverging trajectories of both the Reward metrics and the FPR, alongside a live Threat Feed and JSON AST Viewer.
59
 
@@ -61,27 +72,30 @@ Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` i
61
 
62
  ## 💻 Local Run Instructions
63
 
64
- To test the evaluation pipeline and view the live A/B Comparison Dashboard locally:
65
 
66
- **1. Environment Setup (Python 3.13+ Recommended):**
67
- ```bash
68
- # Create and activate virtual environment
69
- python -m venv .venv
70
- # Windows: .\.venv\Scripts\Activate.ps1
71
- # Mac/Linux: source .venv/bin/activate
72
-
73
- # Install dependencies (ensure PyTorch matches your CUDA version)
74
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
75
- pip install -r requirements.txt
76
- ```
 
 
 
77
 
78
- **2. Run the Master Orchestrator:**
79
- We have bundled a master orchestrator that automatically cleans up ports, boots the FastAPI Core Server (Port 8000) and Telemetry UI Server (Port 8001) into the background, and triggers the Headless OpenEnv Evaluator (`evaluate.py`).
80
 
81
  ```bash
82
  python run_all.py
83
  ```
84
 
85
- **3. View the Dashboard:**
86
  Once the orchestrator initializes, open your browser to:
87
  [http://127.0.0.1:8001/ui](http://127.0.0.1:8001/ui) to watch the live A/B comparison and Threat Feed stream in real-time.
 
1
+ ---
2
+ title: Dynamic Guardrail Generator
3
+ emoji: 🛡️
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: docker
7
+ pinned: false
8
+ license: mit
9
+ ---
10
  # Dynamic Guardrail Generator
11
+ **Team Winnovators (Rithwik & Parveshh)**
12
 
13
+ 🔗 **[Hugging Face Space URL]** | 🔗 **[YouTube 2-Min Pitch Video]** | 🔗 **[Google Colab Training PoC]**
14
 
15
  ---
16
 
 
29
 
30
  ## 💡 Our Solution: The OpenEnv Compiler Architecture
31
 
32
+ Aligning with **Theme #3.1 (Professional Tasks: Cybersecurity/Blue-Teaming)**, we solved this by separating the intelligence from the execution.
33
 
34
+ The **Dynamic Guardrail Generator** treats the LLM as an autonomous Blue-Team engineer. Running inside our strict `OpenEnv` grading environment, the agent does not evaluate prompts directly. Instead, it synthesizes a highly constrained, Pydantic-validated **JSON Guardrail Logic Graph** (a Domain Specific Language).
35
 
36
+ By forcing the agent to map threats to a structured AST using strict `LogicNodes` (`AND`, `OR`, `NOT`) and `SemanticFilters` (such as `entropy_threshold`, `length_limit`, `regex_pattern`, and `keyword_match`), we bypass brittle free-form code generation, keep the LLM and its hallucinations out of the runtime path, and execute the defense as deterministic logic with no model inference at request time.
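To make the idea of a deterministic logic graph concrete, here is a minimal sketch of how such a graph could be evaluated. It is not taken from the repository: it uses plain dataclasses in place of the Pydantic models the README mentions, and the class fields (`kind`, `value`, `op`, `children`), the `matches` helper, and the exact meaning given to each filter (for example, `length_limit` flagging prompts longer than the limit) are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SemanticFilter:
    kind: str  # one of the filter names listed in the README
    value: Union[str, float, int]


@dataclass
class LogicNode:
    op: str  # "AND", "OR", or "NOT"
    children: List[Union["LogicNode", SemanticFilter]] = field(default_factory=list)


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def matches(node, prompt: str) -> bool:
    """Return True when the graph flags the prompt (assumed to mean 'block')."""
    if isinstance(node, SemanticFilter):
        if node.kind == "keyword_match":
            return str(node.value).lower() in prompt.lower()
        if node.kind == "regex_pattern":
            return re.search(str(node.value), prompt) is not None
        if node.kind == "length_limit":
            return len(prompt) > int(node.value)
        if node.kind == "entropy_threshold":
            return shannon_entropy(prompt) > float(node.value)
        raise ValueError(f"unknown filter: {node.kind}")
    results = [matches(child, prompt) for child in node.children]
    if node.op == "AND":
        return all(results)
    if node.op == "OR":
        return any(results)
    if node.op == "NOT":
        if len(results) != 1:
            raise ValueError("NOT takes exactly one child")
        return not results[0]
    raise ValueError(f"unknown operator: {node.op}")


# Hypothetical graph: flag a known injection phrase, or long high-entropy input.
graph = LogicNode("OR", [
    SemanticFilter("keyword_match", "ignore previous instructions"),
    LogicNode("AND", [
        SemanticFilter("length_limit", 50),
        SemanticFilter("entropy_threshold", 4.5),
    ]),
])
```

Because evaluation is a plain tree walk over string checks, the same prompt always produces the same verdict, and no model call is needed once the graph has been generated.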
37
 
38
  ---
39
 
40
+ ## ⚙️ Reward Engineering & Pipeline
41
 
42
+ To train our autonomous compiler, we built a High-Fidelity RLVR (Reinforcement Learning with Verifiable Rewards) pipeline.
43
 
44
  ### The Log-Barrier Multi-Objective Reward
45
  To mathematically eradicate "Refusal Collapse", we designed a rigorous deterministic reward surface:
 
47
  Reward = (1.0 * Recall) - (2.0 * math.log1p(FPR))
48
  ```
49
  - **Recall (True Positive Rate):** A linear reward for successfully neutralizing adversarial payloads.
50
+ - **FPR (False Positive Rate):** A logarithmic (`log1p`) penalty, weighted 2.0, for blocking benign user queries, which pushes the agent to preserve application utility.
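The reward formula can be worked through numerically. The sketch below assumes Recall and FPR are computed from confusion-matrix counts on a labelled batch; the function name `guardrail_reward` and its arguments are hypothetical and only the formula itself comes from the README.

```python
import math


def guardrail_reward(tp: int, fn: int, fp: int, tn: int) -> float:
    """Reward = 1.0 * Recall - 2.0 * log1p(FPR), from confusion-matrix counts."""
    # Recall: fraction of adversarial prompts that were blocked.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: fraction of benign prompts that were wrongly blocked.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return 1.0 * recall - 2.0 * math.log1p(fpr)


# Three reference policies on 10 adversarial and 10 benign prompts.
block_everything = guardrail_reward(tp=10, fn=0, fp=10, tn=0)
allow_everything = guardrail_reward(tp=0, fn=10, fp=0, tn=10)
perfect_filter = guardrail_reward(tp=10, fn=0, fp=0, tn=10)
```

Under this formula a policy that blocks every prompt has Recall = 1 and FPR = 1, so it scores 1 - 2·ln 2 ≈ -0.39, which is below the 0.0 earned by allowing everything and far below the 1.0 of a perfect filter. That ordering is what discourages refusal collapse.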
51
 
52
+ ### Dual-Compute Strategy
53
+ We utilized **Unsloth (4-bit quantization)** and **Hugging Face TRL (GRPO)** on `Qwen/Qwen2.5-0.5B-Instruct` to keep the memory footprint under 8GB VRAM.
54
+ - **Cloud Proof of Concept:** We provided a verifiable Google Colab notebook running on a T4 GPU as a 4-step proof of learning.
55
+ - **Local High-Fidelity Training:** Our actual production LoRA adapter was trained locally for 250 steps on a dedicated **RTX 4070 GPU** to achieve high-fidelity semantic parsing and complex graph synthesis.
56
 
57
  ---
58
 
59
+ ## 📈 Results & UI Dashboard
60
 
61
  Our training resulted in an agent capable of generating highly targeted logic graphs that dynamically adapt to new threat vectors.
62
 
63
  ![Training Reward Curve](reward_curve.png)
64
+ *Figure 1: GRPO Training Curve demonstrating the agent escaping refusal-collapse.*
65
 
66
+ ### Decoupled Telemetry & Live A/B Comparison
67
+ We built a rich, non-blocking telemetry dashboard (`FastAPI` + Server-Sent Events) that streams live metrics without impacting the execution time of the strict OpenEnv evaluation loop.
68
 
69
  Our UI features a **Live A/B Performance Delta** capability. The `evaluate.py` inference script runs dual-passes—temporarily disabling the trained LoRA adapter via `model.disable_adapter()` to evaluate the base Qwen2.5 weights against our RL-trained agent in real-time. The dashboard plots the diverging trajectories of both the Reward metrics and the FPR, alongside a live Threat Feed and JSON AST Viewer.
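The dual-pass pattern described above can be sketched as follows. In PEFT, `disable_adapter()` is used as a context manager on the wrapped model; everything else here is an assumption for illustration: `FakeAdapterModel` is a stand-in so the sketch runs without a GPU, and `ab_delta`, `toy_score`, and the placeholder scores are hypothetical rather than taken from `evaluate.py`.

```python
from contextlib import contextmanager
from statistics import mean


class FakeAdapterModel:
    """Stand-in for a PEFT-wrapped model; it only mimics the adapter toggle."""

    def __init__(self):
        self.adapter_enabled = True

    @contextmanager
    def disable_adapter(self):
        # Temporarily fall back to base weights, then restore the adapter.
        self.adapter_enabled = False
        try:
            yield
        finally:
            self.adapter_enabled = True


def ab_delta(model, score_fn, prompts):
    """Score the same prompts with the adapter on, then off, and compare."""
    tuned = mean(score_fn(model, p) for p in prompts)
    with model.disable_adapter():
        base = mean(score_fn(model, p) for p in prompts)
    return {"tuned": tuned, "base": base, "delta": tuned - base}


def toy_score(model, prompt):
    # Placeholder reward, not a measured result.
    return 0.8 if model.adapter_enabled else 0.2


model = FakeAdapterModel()
result = ab_delta(model, toy_score, ["prompt-a", "prompt-b"])
```

Running both passes on the same loaded model avoids holding two copies of the weights in memory, which matters under a tight VRAM budget.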
70
 
 
72
 
73
  ## 💻 Local Run Instructions
74
 
75
+ We have battle-tested this environment specifically for Windows local deployments.
76
 
77
+ ### 1. Windows GPU Setup (Critical Fixes)
78
+ To bypass known PyTorch and Triton compiler conflicts on Windows, you must configure your environment exactly as follows:
79
+
80
+ 1. **Python Version:** Create a virtual environment using **Python 3.13** (Avoid Python 3.14 to maintain dependency compatibility).
81
+ 2. **Install PyTorch 2.11 (CUDA 12.6):** Standard `requirements.txt` installs will pull CPU wheels. You must install PyTorch from the `cu126` index:
82
+ ```bash
83
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126 --upgrade
84
+ ```
85
+ 3. **Install Dependencies & Triton Compiler:**
86
+ ```bash
87
+ pip install -r requirements.txt
88
+ pip install triton-windows
89
+ ```
90
+ *(Note: If Triton fails because `Python.h` is missing, create a directory junction that links your base Python `include` folder to an `Include` folder in your project root.)*
91
 
92
+ ### 2. Run the Master Orchestrator
93
+ We have bundled a master orchestrator (`run_all.py`) that automatically cleans up zombie ports, boots the FastAPI Core Server (Port 8000) and Telemetry UI Server (Port 8001) into the background, and triggers the Headless OpenEnv Evaluator (`evaluate.py`).
94
 
95
  ```bash
96
  python run_all.py
97
  ```
98
 
99
+ ### 3. View the Dashboard
100
  Once the orchestrator initializes, open your browser to:
101
  [http://127.0.0.1:8001/ui](http://127.0.0.1:8001/ui) to watch the live A/B comparison and Threat Feed stream in real-time.