## CommitGuard: project context (load this first)

This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.

If you're unsure: re-read `../prd.md` and then update this file to match.

## What we're building

**CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.

The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).

## Why this matters (the thesis)

AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling.

**We're building the RL environment that trains AI-paced commit-time security review.**

## Who it's for

- **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + clean story.
- **Meta researchers**: want RLVR framing, cheating-prevention, and extensibility.
- **HF community**: wants a runnable Space + reproducible training notebook.

## 30-second pitch (verbatim; memorize)

> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
>
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."

## Locked stack (do not change)

- **Env framework**: Meta OpenEnv **0.2.3+**
- **Server**: **FastAPI** in **Docker**
- **Hosting**: **Hugging Face Space**
- **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced
- **Model**: **Llama-3.2-3B-Instruct**
- **Training**: **TRL** with **GRPO**
- **Optimization**: **Unsloth** 4-bit + **LoRA r=8**
- **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
- **Action serialization**: **XML-tag free-text** (not JSON-mode)
- **Logging**: **Weights & Biases**

Operational preference: **use CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI-clicking).

## Submission deliverables (P0)

- **HF Space** deployed; `/health` returns 200; `/docs` works
- **Training notebook / script** produces a measurable learning curve (or triggers fallback)
- **Plots** committed (reward curve + baseline vs trained)
- **Demo video** (60-90 s) showing before/after behavior on one example
- **README** with all required links (Space, notebook, video, repo, wandb)

## Hard constraints (time + scope)

- **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
- **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features
- **Episode constraints**: max **5 steps** per episode; context requests cost reward

## Explicit non-goals (do not drift)

- Not a production CI security tool; **research environment only**
- No real exploit execution sandbox in v1 (pattern match only)
- No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
- No multi-agent self-play in v1
- No network/runtime attacks, no social engineering
- No "cover all CWEs": v1 focuses on **top 10 CWEs** in Devign
- No fancy frontend: HF Space default UI is enough

## If something breaks: pre-approved fallbacks (no debate)

These are legal pivots from `../prd.md` section 7.2. If a trigger fires, switch immediately and log it in `decision_log.md`.

- **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)
- **HF Jobs queue > 30 min** → use **GCP A10G on-demand**
- **3-action env not shipped by midnight** → ship **2-action env** (analyze + verdict)
- **Tiered reward buggy** → ship **binary reward only**
- **Training curve still flat at 10 AM Sunday** → ship **qualitative comparison narrative**
- **Demo video recording fails twice** → ship **side-by-side text trace in README**

## Next file to read

Read `architecture.md` next. Then read your per-person task list (e.g. `../tasks_niti.md`) if present.
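
For orientation before moving on, here is a minimal sketch of how the XML-tag free-text action format and the binary-reward fallback named above could fit together. It is not the project's implementation: the tag names (`verdict`, `cwe`, `exploit`), the accepted verdict strings, and the names `ParsedAction`, `parse_action`, and `binary_reward` are all assumptions made for illustration. The real schema lives in `../prd.md` and `architecture.md`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAction:
    """Hypothetical container for one agent action."""
    verdict: Optional[bool]  # True = vulnerable, False = safe, None = unparseable
    cwe: Optional[str]
    exploit: Optional[str]


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped contents of the first <tag>...</tag> pair, if any."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def parse_action(text: str) -> ParsedAction:
    """Pull the assumed tags out of free-form model output."""
    raw = _extract("verdict", text)
    verdict: Optional[bool] = None
    if raw is not None:
        lowered = raw.lower()
        # Assumed vocabulary; anything else is treated as unparseable.
        if lowered in ("vulnerable", "yes", "true"):
            verdict = True
        elif lowered in ("safe", "no", "false"):
            verdict = False
    return ParsedAction(verdict, _extract("cwe", text), _extract("exploit", text))


def binary_reward(action: ParsedAction, label_vulnerable: bool) -> float:
    """1.0 when the verdict matches the dataset label, else 0.0 (including parse failures)."""
    if action.verdict is None:
        return 0.0
    return 1.0 if action.verdict == label_vulnerable else 0.0
```

Free-text parsing with a regex tolerates reasoning text around the tags, which is one plausible reason to prefer it over JSON-mode for a small model. A tiered reward would presumably also score the CWE field; this sketch deliberately checks only the verdict against ground truth.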