## CommitGuard: project context (load this first)
This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.
If you're unsure: re-read `../prd.md` and then update this file to match.
## What we're building
**CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.
The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).
## Why this matters (the thesis)
AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **We're building the RL environment that trains AI-paced commit-time security review.**
## Who it's for
- **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + clean story.
- **Meta researchers**: want RLVR framing, cheating-prevention, and extensibility.
- **HF community**: wants a runnable Space + reproducible training notebook.
## 30-second pitch (verbatim; memorize)
> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it: defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
>
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
## Locked stack (do not change)
- **Env framework**: Meta OpenEnv **0.2.3+**
- **Server**: **FastAPI** in **Docker**
- **Hosting**: **Hugging Face Space**
- **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced
- **Model**: **Llama-3.2-3B-Instruct**
- **Training**: **TRL** with **GRPO**
- **Optimization**: **Unsloth** 4-bit + **LoRA r=8**
- **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
- **Action serialization**: **XML-tag free-text** (not JSON-mode)
- **Logging**: **Weights & Biases**
Operational preference: **use CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI-clicking).
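The XML-tag free-text action format in the stack above can be parsed with a few regular expressions. The following is a minimal sketch, not the environment's actual parser: the tag names (`verdict`, `cwe`, `exploit`), the accepted verdict words, and the `ParsedAction` type are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag names; the real action schema is defined by the environment.
_TAG_RE = {
    name: re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL | re.IGNORECASE)
    for name in ("verdict", "cwe", "exploit")
}


@dataclass
class ParsedAction:
    verdict: Optional[bool]  # True = vulnerable, False = safe, None = unparseable
    cwe: Optional[str]       # normalized like "CWE-119", or None
    exploit: Optional[str]   # free-text exploit sketch, or None


def _first(name: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <name>...</name> pair, if any."""
    match = _TAG_RE[name].search(text)
    return match.group(1).strip() if match else None


def parse_action(completion: str) -> ParsedAction:
    """Extract the tagged fields from a free-text model completion."""
    raw_verdict = _first("verdict", completion)
    verdict = None
    if raw_verdict is not None:
        lowered = raw_verdict.lower()
        if lowered in ("vulnerable", "yes", "true"):    # assumed vocabulary
            verdict = True
        elif lowered in ("safe", "no", "false"):        # assumed vocabulary
            verdict = False

    cwe_match = re.search(r"CWE-\d+", _first("cwe", completion) or "", re.IGNORECASE)
    cwe = cwe_match.group(0).upper() if cwe_match else None

    return ParsedAction(verdict, cwe, _first("exploit", completion))
```

Free text around the tags is ignored, so the model can reason before it answers, and a missing or malformed tag yields `None` rather than an exception. That lets the reward code decide how to score unparseable output.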
## Submission deliverables (P0)
- **HF Space** deployed; `/health` returns 200; `/docs` works
- **Training notebook / script** produces a measurable learning curve (or triggers fallback)
- **Plots** committed (reward curve + baseline vs trained)
- **Demo video** (60-90s) showing before/after behavior on one example
- **README** with all required links (Space, notebook, video, repo, wandb)
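The first deliverable (`/health` returns 200, `/docs` works) can be verified with a small standard-library script. This is a sketch under assumptions: the helper names `http_status` and `smoke_check` are hypothetical, and "works" for `/docs` is read here as "responds with HTTP 200".

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def smoke_check(base_url: str, fetch=http_status) -> dict:
    """Map each required path to True if it answered with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ("/health", "/docs")}
```

Call `smoke_check(...)` with the deployed Space's base URL; the `fetch` parameter exists only so the check can be exercised offline with a stub.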
## Hard constraints (time + scope)
- **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
- **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features
- **Episode constraints**: max **5 steps** per episode; context requests cost reward
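The episode constraint can be pictured as a step budget plus a per-request penalty. The sketch below is illustrative only: it uses a binary correctness reward (the simplest variant named in this file), and the `CONTEXT_COST` value, the `Episode` class, and its method names are assumptions, not the environment's API.

```python
from dataclasses import dataclass

MAX_STEPS = 5        # from the episode constraint above
CONTEXT_COST = 0.1   # hypothetical penalty per context request


@dataclass
class Episode:
    label_vulnerable: bool     # dataset ground truth for this commit
    steps: int = 0
    context_requests: int = 0
    done: bool = False

    def _advance(self) -> None:
        """Consume one step; the episode closes once the budget is spent."""
        if self.done:
            raise RuntimeError("episode is over")
        self.steps += 1
        if self.steps >= MAX_STEPS:
            self.done = True

    def request_context(self) -> None:
        """Ask for more context; allowed, but it lowers the final reward."""
        self._advance()
        self.context_requests += 1

    def submit_verdict(self, predicted_vulnerable: bool) -> float:
        """End the episode and return binary reward minus context costs."""
        self._advance()
        self.done = True
        correct = 1.0 if predicted_vulnerable == self.label_vulnerable else 0.0
        return correct - CONTEXT_COST * self.context_requests
```

In this sketch an agent that spends all five steps on context requests never gets to submit a verdict, so the step limit and the context cost both push toward early, confident answers.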
## Explicit non-goals (do not drift)
- Not a production CI security tool; **research environment only**
- No real exploit execution sandbox in v1 (pattern match only)
- No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
- No multi-agent self-play in v1
- No network/runtime attacks, no social engineering
- No attempt to cover all CWEs: v1 focuses on the **top 10 CWEs** in Devign
- No fancy frontend: HF Space default UI is enough
## If something breaks: pre-approved fallbacks (no debate)
These are the permitted pivots from `../prd.md` §7.2. If a trigger fires, switch immediately and log it in `decision_log.md`.
- **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)
- **HF Jobs queue > 30 min** → use **GCP A10G on-demand**
- **3-action env not shipped by midnight** → ship **2-action env** (analyze + verdict)
- **Tiered reward buggy** → ship **binary reward only**
- **Training curve still flat at 10 AM Sunday** → ship **qualitative comparison narrative**
- **Demo video recording fails twice** → ship **side-by-side text trace in README**
## Next file to read
Read `architecture.md` next. Then read your per-person task list (e.g. `../tasks_niti.md`) if present.