## CommitGuard: project context (load this first)

This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.

If you're unsure: re-read `../prd.md` and then update this file to match.

## What we're building

**CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.

The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).
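
To make "verifiable rewards from dataset ground truth" concrete, here is a minimal sketch of a binary reward and one possible tiered variant. The `GroundTruth` fields, the function names, and the `cwe_bonus` weight are illustrative assumptions, not the values specified in `../prd.md`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroundTruth:
    """Dataset label for one commit (hypothetical field names)."""
    vulnerable: bool
    cwe: Optional[str] = None  # e.g. "CWE-119"; None when unlabeled


def binary_reward(pred_vulnerable: bool, truth: GroundTruth) -> float:
    """1.0 when the verdict matches the label, else 0.0. No LLM judge involved."""
    return 1.0 if pred_vulnerable == truth.vulnerable else 0.0


def tiered_reward(
    pred_vulnerable: bool,
    pred_cwe: Optional[str],
    truth: GroundTruth,
    cwe_bonus: float = 0.5,  # assumed weight, for illustration only
) -> float:
    """Binary reward plus a bonus when a correct 'vulnerable' verdict names the right CWE."""
    reward = binary_reward(pred_vulnerable, truth)
    correct_positive = reward > 0 and truth.vulnerable
    if correct_positive and truth.cwe is not None and pred_cwe == truth.cwe:
        reward += cwe_bonus
    return reward
```

Because the tiered variant is built on top of `binary_reward`, dropping back to binary-only scoring is a one-line change in this sketch.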

## Why this matters (the thesis)

AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **We're building the RL environment that trains AI-paced commit-time security review.**

## Who it's for

- **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + clean story.
- **Meta researchers**: want RLVR framing, cheating-prevention, and extensibility.
- **HF community**: wants a runnable Space + reproducible training notebook.

## 30-second pitch (verbatim; memorize)

> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it: defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
>
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."

## Locked stack (do not change)

- **Env framework**: Meta OpenEnv **0.2.3+**
- **Server**: **FastAPI** in **Docker**
- **Hosting**: **Hugging Face Space**
- **Data**: **Devign** (Devign/DetectBERT subset), filtered to single-file commits <80 LOC; approximately class-balanced
- **Model**: **Llama-3.2-3B-Instruct**
- **Training**: **TRL** with **GRPO**
- **Optimization**: **Unsloth** 4-bit + **LoRA r=8**
- **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
- **Action serialization**: **XML-tag free-text** (not JSON-mode)
- **Logging**: **Weights & Biases**
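
The XML-tag free-text action format listed above can be parsed with a tolerant regex pass. This is a minimal sketch: the tag names (`verdict`, `cwe`, `exploit`) and the `parse_action` helper are assumptions for illustration, and the actual schema should be taken from `../prd.md`.

```python
import re
from typing import Dict, Optional

# Hypothetical tag names; replace with the schema defined in the PRD.
ACTION_TAGS = ("verdict", "cwe", "exploit")


def parse_action(text: str) -> Dict[str, Optional[str]]:
    """Extract the first <tag>...</tag> span for each expected tag.

    Free text outside the tags is ignored, and a missing tag maps to None
    so the caller can decide how to score an incomplete action.
    """
    parsed: Dict[str, Optional[str]] = {}
    for tag in ACTION_TAGS:
        match = re.search(
            rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL | re.IGNORECASE
        )
        parsed[tag] = match.group(1).strip() if match else None
    return parsed
```

One practical property of this approach is that a missing or malformed field degrades to `None` instead of failing the whole parse, which a strict JSON decoder would not do.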

Operational preference: **use CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI-clicking).

## Submission deliverables (P0)

- **HF Space** deployed; `/health` returns 200; `/docs` works
- **Training notebook / script** produces a measurable learning curve (or triggers fallback)
- **Plots** committed (reward curve + baseline vs trained)
- **Demo video** (60-90 s) showing before/after behavior on one example
- **README** with all required links (Space, notebook, video, repo, wandb)
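
The first deliverable (`/health` returns 200, `/docs` works) can be checked from a script rather than a browser. The sketch below uses only the standard library; `endpoint_status` and `smoke_check` are hypothetical helper names, and the base URL is whatever your deployed Space exposes.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def endpoint_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still informative.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def smoke_check(
    base_url: str,
    paths: Iterable[str] = ("/health", "/docs"),
    fetch: Callable[[str], int] = endpoint_status,
) -> Dict[str, bool]:
    """Map each path to True when it answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in paths}
```

Calling `smoke_check(base_url)` against the deployed Space should return `True` for both paths before the submission is considered ready. The `fetch` parameter exists so the check can be exercised without network access.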

## Hard constraints (time + scope)

- **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
- **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features
- **Episode constraints**: max **5 steps** per episode; context requests cost reward
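
The episode constraints above (at most 5 steps, context requests cost reward) could be tracked with a small budget object. This is a sketch under stated assumptions: the `EpisodeBudget` class and the `CONTEXT_COST` value of 0.25 are illustrative, and the real penalty should come from `../prd.md`.

```python
from dataclasses import dataclass

MAX_STEPS = 5        # from the episode constraint above
CONTEXT_COST = 0.25  # assumed per-request penalty, for illustration only


@dataclass
class EpisodeBudget:
    """Tracks steps used and the reward penalty accumulated in one episode."""
    steps_taken: int = 0
    penalty: float = 0.0

    @property
    def done(self) -> bool:
        return self.steps_taken >= MAX_STEPS

    def record_step(self, is_context_request: bool) -> None:
        """Count one step; charge the penalty if it was a context request."""
        if self.done:
            raise RuntimeError(f"episode already used its {MAX_STEPS} steps")
        self.steps_taken += 1
        if is_context_request:
            self.penalty += CONTEXT_COST

    def final_reward(self, verdict_reward: float) -> float:
        """Verdict reward minus everything spent on context requests."""
        return verdict_reward - self.penalty
```

With these assumed numbers, an agent that asks for context twice before a correct verdict worth 1.0 ends the episode with 0.5, so extra context is only worth requesting when it changes the verdict.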

## Explicit non-goals (do not drift)

- Not a production CI security tool; **research environment only**
- No real exploit execution sandbox in v1 (pattern match only)
- No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
- No multi-agent self-play in v1
- No network/runtime attacks, no social engineering
- No attempt to cover all CWEs: v1 focuses on the **top 10 CWEs** in Devign
- No fancy frontend: HF Space default UI is enough

## If something breaks: pre-approved fallbacks (no debate)

These are legal pivots from `../prd.md` section 7.2. If a trigger fires, switch immediately and log it in `decision_log.md`.

- **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)
- **HF Jobs queue > 30 min** → use **GCP A10G on-demand**
- **3-action env not shipped by midnight** → ship **2-action env** (analyze + verdict)
- **Tiered reward buggy** → ship **binary reward only**
- **Training curve still flat at 10 AM Sunday** → ship **qualitative comparison narrative**
- **Demo video recording fails twice** → ship **side-by-side text trace in README**

## Next file to read

Read `architecture.md` next. Then read your per-person task list (e.g. `../tasks_niti.md`) if present.