# CommitGuard - Project Context & Instructions
This file is the foundational mandate for the CommitGuard project. It defines the technical standards, security protocols, and operational workflows that must be followed by all agents.
## Project Overview
CommitGuard is a specialized RL environment built on Meta OpenEnv for commit-time vulnerability detection. It trains LLM agents (primarily Llama-3.2-3B-Instruct) to identify exploitable vulnerabilities in single-file code commits using Reinforcement Learning from Verifiable Rewards (RLVR).
- Objective: Bridge the gap between AI-speed code generation and human-paced security review.
- Framework: Meta OpenEnv (v0.2.3+).
- Incentive: Tiered rewards grounded in dataset truth (Devign), not LLM judgment.
## Engineering Standards (Non-Negotiable)

### 1. The "No-Leak" Rule (Highest Priority)
The agent must NEVER see ground-truth labels (`is_vulnerable`, `cwe`, etc.) during an episode.
- **Constraint:** `CommitGuardObservation` and all reward calculations must be stripped of label fields before being presented to the model.
- **Validation:** `tests/test_no_leak.py` must remain green. Any change that causes a leak is a blocking failure.
### 2. Python Architecture
- **Typed Dataclasses:** Use `@dataclass(frozen=True, slots=True)` for all API shapes (Actions, Observations, State).
- **Strict Typing:** Every function and variable must be type-annotated end-to-end.
- **No Untyped Dicts:** Dicts are for internal parsing only; convert to dataclasses at all boundaries.
- **Defensive Parsing:** XML parsers must handle malformed model output without crashing, returning safe defaults and structured errors.
### 3. XML Action Format
Models must emit exactly one top-level `<action>` block to ensure robust parsing.
- **Structure:** `<action><action_type>...</action_type><fields>...</fields></action>`
- **Types:** `request_context`, `analyze`, `verdict`.
## Operational Workflows

### 1. Evaluation Pipeline (`scripts/evaluate.py`)
This script executes local inference on test samples to compute accuracy metrics.
- **Deterministic Selection:** It iterates through `data/devign_test.jsonl`.
- **Strict Scoring:** `is_correct` requires both a correct binary verdict AND a correct CWE type match (if vulnerable).
- **Inference:** Uses Unsloth's `FastLanguageModel` for accelerated evaluation.
### 2. Training Pipeline (`scripts/train_grpo.py`)
- **Framework:** Uses TRL's `GRPOTrainer` with Unsloth 4-bit quantization.
- **Local Rewards:** Reward functions are computed in-process (`get_reward_local`) to eliminate latency.
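A tiered, in-process reward grounded in dataset truth might look like the sketch below. The tier values and the function's signature are illustrative assumptions, not the project's actual constants:

```python
def get_reward_local(correct_verdict: bool,
                     correct_cwe: bool,
                     well_formed: bool) -> float:
    """Hypothetical tiered reward. Values are illustrative: the real
    reward model may use different tiers, but the principle is the same --
    graded credit computed from dataset labels, with no LLM judge."""
    if not well_formed:
        return -1.0   # malformed <action> XML is penalized
    if correct_verdict and correct_cwe:
        return 1.0    # full credit: verdict and CWE both match
    if correct_verdict:
        return 0.5    # partial credit: binary verdict only
    return 0.0        # wrong verdict
```

Because the function is pure and in-process, each GRPO rollout can be scored without a network round trip.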
### 3. Visualization (`plots/`)
- `plot_reward_curve.py`: Visualizes reward trends from `eval_results.json`.
- `plot_per_cwe.py`: Generates bar charts showing accuracy breakdown by CWE category.
- `plot_baseline_vs_trained.py`: Compares untrained vs. trained model performance.
## Critical Files
- `commitguard_env/`: Core logic (environment, reward model, XML parser).
- `data/`: `devign_filtered.jsonl` (training) and `devign_test.jsonl` (testing).
- `scripts/`: Training, evaluation, and environment setup runbooks (GCP/Lightning).
- `.agent/`: Internal state, technical contracts, and hackathon milestones.
## Hackathon Mandate
- Scope Freeze: No new features after midnight Saturday IST. Focus strictly on reliability, documentation, and evaluation.
- **Fallback Triggers:** If OOM or performance blockers occur, pivot immediately to documented fallbacks (e.g., Qwen-1.5B) and log the decision in `.agent/decision_log.md`.