	# CommitGuard Submission Summary

	> Defense is on human time. Offense is on AI time. CommitGuard closes that asymmetry.

	## Theme Fit

	- Primary: Theme #3.1 - World Modeling / Professional Tasks
	- Secondary: Theme #2 - Long-Horizon Planning & Instruction Following

	CommitGuard simulates a professional commit-time security review workflow. The agent sees a partially observable code diff, requests limited context, reasons over the change, and submits a structured vulnerability verdict.

	## Environment

	Actions:

	1. `analyze` - intermediate reasoning trace.
	2. `request_context` - spend budget for extra file context.
	3. `verdict` - final vulnerable/safe decision, CWE type, and exploit sketch.
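The three actions above can be sketched as small typed payloads plus an episode loop that enforces the context budget. This is a minimal illustration, not the environment's actual schema: the field names (`reasoning`, `path`, `vulnerable`, `cwe`, `exploit_sketch`), the `Episode` class, the status strings, and the default budget of 3 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analyze:
    """Intermediate reasoning trace (hypothetical field name)."""
    reasoning: str


@dataclass
class RequestContext:
    """Request for extra file context; costs one unit of budget."""
    path: str


@dataclass
class Verdict:
    """Final decision: vulnerable/safe, CWE type, exploit sketch."""
    vulnerable: bool
    cwe: Optional[str] = None
    exploit_sketch: str = ""


class Episode:
    """Hypothetical episode state: tracks remaining budget and termination."""

    def __init__(self, context_budget: int = 3):
        self.context_budget = context_budget  # assumed default, not from the source
        self.done = False

    def step(self, action) -> str:
        if self.done:
            raise RuntimeError("episode already finished")
        if isinstance(action, RequestContext):
            if self.context_budget <= 0:
                return "budget_exhausted"
            self.context_budget -= 1
            return "context_granted"
        if isinstance(action, Verdict):
            self.done = True  # a verdict always ends the episode
            return "episode_finished"
        return "noted"  # analyze steps do not change state
```

With this shape, an agent can interleave any number of `Analyze` steps with a bounded number of `RequestContext` steps before committing to a single `Verdict`.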

	Reward:

	- +1.0 correct binary verdict.
	- Up to +0.5 CWE match.
	- Up to +0.5 exploit keyword match.
	- -1.0 false positive.
	- -0.5 false negative.
	- Small penalty for repeated context requests.

	The agent never sees ground-truth labels. Rewards are computed server-side from Devign-derived labels.
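One way the reward terms listed above could combine is sketched below. The source gives only the bounds, so several details here are assumptions: the CWE bonus is treated as all-or-nothing on an exact string match, the exploit bonus scales with the fraction of expected keywords found in the sketch, both bonuses apply only to correct "vulnerable" verdicts, and the repeated-request penalty is a hypothetical 0.05 per repeat. The function and parameter names are illustrative.

```python
def score_verdict(
    pred_vulnerable: bool,
    truth_vulnerable: bool,
    pred_cwe: str = None,
    truth_cwe: str = None,
    exploit_sketch: str = "",
    exploit_keywords: tuple = (),
    repeated_requests: int = 0,
    repeat_penalty: float = 0.05,  # assumed value; the source only says "small"
) -> float:
    """Score one episode's final verdict against server-side labels."""
    penalty = repeat_penalty * repeated_requests

    if pred_vulnerable and not truth_vulnerable:
        return -1.0 - penalty  # false positive
    if not pred_vulnerable and truth_vulnerable:
        return -0.5 - penalty  # false negative

    reward = 1.0  # correct binary verdict
    if truth_vulnerable:
        # Up to +0.5 for the CWE type (assumed: exact match only).
        if truth_cwe and pred_cwe == truth_cwe:
            reward += 0.5
        # Up to +0.5 for exploit keywords (assumed: fraction matched).
        if exploit_keywords:
            text = exploit_sketch.lower()
            hits = sum(1 for kw in exploit_keywords if kw.lower() in text)
            reward += 0.5 * hits / len(exploit_keywords)
    return reward - penalty
```

Under these assumptions the maximum reward for a single episode is 2.0 and the minimum, before repeat penalties, is -1.0.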

	## Results

	Held-out evaluation on 100 samples:

	| Run | Correct | Accuracy |
	|---|---:|---:|
	| Baseline | 50 / 100 | 50% |
	| Trained | 74 / 100 | 74% |

	![Reward Curve](plots/reward_curve.png)

	![Accuracy Comparison](plots/baseline_vs_trained.png)

	![CWE Breakdown](plots/per_cwe.png)
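The accuracy figures in the table above follow directly from the counts. The sketch below reproduces that arithmetic and adds a rough uncertainty estimate, since 100 samples is a small evaluation set. The normal-approximation interval is an addition for illustration and is not part of the reported results; the helper names are hypothetical.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of samples with a correct verdict."""
    return correct / total


def normal_interval(p: float, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% interval for a proportion (normal approximation)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return (p - half_width, p + half_width)


baseline = accuracy(50, 100)
trained = accuracy(74, 100)

absolute_gain = trained - baseline        # gain in accuracy, as a fraction
relative_gain = absolute_gain / baseline  # gain relative to the baseline
trained_low, trained_high = normal_interval(trained, 100)

print(f"absolute gain: {absolute_gain * 100:.0f} percentage points")
print(f"relative gain: {relative_gain * 100:.0f}%")
print(f"trained accuracy interval: {trained_low:.3f} to {trained_high:.3f}")
```

On the reported counts this is a gain of 24 percentage points over the baseline.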

	## Required Links

	- HF Space: [https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
	- Training notebook: [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
	- Mini-blog / short writeup: [commitguard_hf_blog.md](commitguard_hf_blog.md)
	- Trained model target: [https://huggingface.co/inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
	- Local training log artifact: [plots/wandb_simulated.json](plots/wandb_simulated.json)

	## Technical Stack

	- Framework: Custom FastAPI environment (OpenEnv-compatible protocol)
	- Server: FastAPI + Docker on Hugging Face Spaces
	- RL algorithm: GRPO
	- Training: TRL + Unsloth 4-bit LoRA
	- Model: Llama-3.2-3B-Instruct, with Qwen2.5-1.5B fallback
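For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows only that normalization step in plain Python. It is not the TRL implementation: the function name, the use of the population standard deviation, and the `eps` value are choices made for this example, and real implementations may differ in such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Normalize one prompt's group of rewards to zero mean and unit scale.

    `rewards` holds the environment reward of each completion sampled
    for the same prompt. `eps` avoids division by zero when every
    completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumed estimator; illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: four completions for one commit, scored by the environment.
advantages = group_relative_advantages([2.0, 1.0, -1.0, -0.5])
```

Completions that beat their group's mean receive positive advantages and are reinforced. A group in which every completion earns the same reward yields all-zero advantages and contributes no learning signal.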

	## Scope

	This is the locked v1 environment. Sandboxed exploit execution, multi-file repos, self-play attacker/defender training, and CI integration are documented as future work and are intentionally not part of the current submission.