## CommitGuard: project context (load this first)



This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.



If you're unsure: re-read `../prd.md` and then update this file to match.



## What we're building



**CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.



The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).



## Why this matters (the thesis)



AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **We're building the RL environment that trains AI-paced commit-time security review.**



## Who it's for



- **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + clean story.

- **Meta researchers**: want RLVR framing, cheating-prevention, and extensibility.

- **HF community**: wants a runnable Space + reproducible training notebook.



## 30-second pitch (verbatim; memorize)



> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
>
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."



## Locked stack (do not change)



- **Env framework**: Meta OpenEnv **0.2.3+**

- **Server**: **FastAPI** in **Docker**

- **Hosting**: **Hugging Face Space**

- **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced

- **Model**: **Llama-3.2-3B-Instruct**

- **Training**: **TRL** with **GRPO**

- **Optimization**: **Unsloth** 4-bit + **LoRA r=8**

- **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability

- **Action serialization**: **XML-tag free-text** (not JSON-mode)

- **Logging**: **Weights & Biases**
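
The XML-tag free-text action format listed above could be parsed along these lines. This is a minimal sketch, not the environment's actual parser: the tag names `<verdict>`, `<cwe>`, and `<exploit>`, the `vulnerable`/`safe` vocabulary, and the `ParsedAction` type are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAction:
    verdict: Optional[bool]   # True = vulnerable, False = safe, None = unparseable
    cwe: Optional[str]        # normalized ID such as "CWE-119"
    exploit: Optional[str]    # free-text exploit sketch


def _tag(text: str, name: str) -> Optional[str]:
    """Return the stripped body of the first <name>...</name> pair, if any."""
    m = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else None


def parse_action(completion: str) -> ParsedAction:
    """Extract verdict, CWE, and exploit sketch from a model completion.

    Tag names and verdict vocabulary are hypothetical; text outside the
    tags (e.g. model reasoning) is ignored.
    """
    raw_verdict = (_tag(completion, "verdict") or "").lower()
    verdict = {"vulnerable": True, "safe": False}.get(raw_verdict)
    cwe_match = re.search(r"CWE-\d+", _tag(completion, "cwe") or "", re.IGNORECASE)
    cwe = cwe_match.group(0).upper() if cwe_match else None
    return ParsedAction(verdict, cwe, _tag(completion, "exploit"))
```

Free-text tags tolerate surrounding reasoning and minor formatting drift from a small model, which is one plausible reason to prefer them over strict JSON mode; a malformed completion simply yields `None` fields that the reward logic can treat as a wrong answer.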



Operational preference: **use CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI-clicking).



## Submission deliverables (P0)



- **HF Space** deployed; `/health` returns 200; `/docs` works

- **Training notebook / script** produces a measurable learning curve (or triggers fallback)

- **Plots** committed (reward curve + baseline vs trained)

- **Demo video** (60-90s) showing before/after behavior on one example

- **README** with all required links (Space, notebook, video, repo, wandb)
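
The first deliverable (`/health` returns 200, `/docs` works) can be smoke-tested with a few lines of standard-library Python. This is a sketch under assumptions: the `check_space` helper and its `opener` parameter are hypothetical, and the Space's base URL is whatever the deployment produces.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def check_space(
    base_url: str,
    opener: Callable = urllib.request.urlopen,  # injectable for offline testing
    timeout: float = 10.0,
) -> Dict[str, bool]:
    """Return {path: True/False} for the two endpoints the checklist names."""
    results: Dict[str, bool] = {}
    for path in ("/health", "/docs"):
        url = base_url.rstrip("/") + path
        try:
            with opener(url, timeout=timeout) as resp:
                results[path] = resp.status == 200
        except (urllib.error.URLError, OSError):
            # Non-2xx responses and connection failures both count as "not OK".
            results[path] = False
    return results
```

Calling `check_space("<your Space URL>")` before recording the demo gives a quick pass/fail for both endpoints without opening a browser.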



## Hard constraints (time + scope)



- **Deadline**: Sunday **5:00 PM IST** (non-negotiable)

- **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features

- **Episode constraints**: max **5 steps** per episode; context requests cost reward
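
The episode constraints above (a 5-step cap, with context requests costing reward) can be sketched as a small state machine. The action names, the `CONTEXT_COST` value, and the binary verdict reward are illustrative assumptions, not values taken from `../prd.md`.

```python
from dataclasses import dataclass
from typing import Optional

MAX_STEPS = 5        # from the episode constraint above
CONTEXT_COST = 0.1   # hypothetical penalty per context request


@dataclass
class Episode:
    is_vulnerable: bool   # ground-truth label from the dataset
    steps: int = 0
    done: bool = False

    def step(self, action: str, verdict: Optional[bool] = None) -> float:
        """Apply one action and return the reward for this step.

        Action names ("analyze", "request_context", "verdict") are
        assumptions for this sketch.
        """
        if self.done:
            raise RuntimeError("episode already finished")
        self.steps += 1
        if action == "request_context":
            reward = -CONTEXT_COST          # asking for more context is not free
        elif action == "verdict":
            self.done = True                # a verdict ends the episode
            reward = 1.0 if verdict == self.is_vulnerable else 0.0
        else:
            reward = 0.0                    # e.g. "analyze": no direct reward
        if self.steps >= MAX_STEPS:
            self.done = True                # hard cap regardless of action
        return reward
```

For example, an agent that requests context once and then answers correctly ends with a total of `1.0 - CONTEXT_COST`, so extra context is only worth requesting when it changes the verdict.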



## Explicit non-goals (do not drift)



- Not a production CI security tool; **research environment only**

- No real exploit execution sandbox in v1 (pattern match only)

- No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)

- No multi-agent self-play in v1

- No network/runtime attacks, no social engineering

- No "cover all CWEs" goal: v1 focuses on the **top 10 CWEs** in Devign

- No fancy frontend: HF Space default UI is enough



## If something breaks: pre-approved fallbacks (no debate)



These are legal pivots from `../prd.md` §7.2. If a trigger fires, switch immediately and log it in `decision_log.md`.



- **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)

- **HF Jobs queue > 30 min** → use **GCP A10G on-demand**

- **3-action env not shipped by midnight** → ship **2-action env** (analyze + verdict)

- **Tiered reward buggy** → ship **binary reward only**

- **Training curve still flat at 10 AM Sunday** → ship **qualitative comparison narrative**

- **Demo video recording fails twice** → ship **side-by-side text trace in README**



## Next file to read



Read `architecture.md` next. Then read your per-person task list (e.g. `../tasks_niti.md`) if present.