# CommitGuard: Use Cases & Test Scenarios

This document outlines the primary use cases and associated test scenarios for running CommitGuard both as a standalone command-line interface (CLI) tool and as an integrated plugin (e.g., a CI/CD pipeline step or an IDE extension).

## 1. CommitGuard as a CLI (Standalone Workflow)
These use cases target security researchers, data scientists, and ML engineers who train or evaluate the model locally or on a dedicated VM.

### 1.1 Data Preprocessing
- Scenario: Convert the raw Devign JSON into a filtered, balanced, 5000-sample JSONL file.
- Action: Run `python scripts/preprocess_devign.py --limit 5000`
- Expected Result: `data/devign_filtered.jsonl` is created, containing clean, XML-ready code diffs with valid `cwe` labels.
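
A minimal sketch of the filter-and-balance step follows. It assumes the raw Devign file is a JSON array of records with a `func` field (the function source) and a `target` field (1 = vulnerable, 0 = safe). The `max_chars` filter and the output keys (`code`, `label`) are hypothetical. The sketch does not reproduce how `scripts/preprocess_devign.py` derives its `cwe` labels.

```python
import json
import random
from pathlib import Path


def balance_devign(records, limit=5000, seed=0, max_chars=4000):
    """Return up to `limit` records, split evenly between vulnerable and safe.

    Assumes each record has a `func` (code string) and a `target` (0/1) field.
    `max_chars` is a hypothetical length filter, not a documented script option.
    """
    # Drop empty or overly long functions before sampling.
    usable = [r for r in records if r.get("func") and len(r["func"]) <= max_chars]
    vulnerable = [r for r in usable if r.get("target") == 1]
    safe = [r for r in usable if r.get("target") == 0]

    # A fixed seed keeps the sample reproducible across runs.
    rng = random.Random(seed)
    per_class = min(limit // 2, len(vulnerable), len(safe))
    sample = rng.sample(vulnerable, per_class) + rng.sample(safe, per_class)
    rng.shuffle(sample)
    return sample


def write_jsonl(records, path):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for r in records:
            # Hypothetical output schema: `code` and `label` keys.
            f.write(json.dumps({"code": r["func"], "label": r["target"]}) + "\n")
```

Balancing down to the smaller class keeps a trivial "always vulnerable" or "always safe" policy at 50% accuracy. Without balancing, the model could collect reward simply by exploiting the class skew.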

### 1.2 Environment Server (OpenEnv)
- Scenario: Start the RLVR training environment.
- Action: Run `python -m commitguard_env.server`
- Expected Result: The server starts on port 8000, and `curl http://localhost:8000/health` returns `{"status": "healthy"}`. Running `tests/test_no_leak.py` confirms that the `/reset` and `/state` responses do not leak labels.
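
Both the health check and a no-leak check can be sketched with the standard library alone. This is a hypothetical illustration, not the contents of `tests/test_no_leak.py`. It assumes label-bearing keys are named `target`, `label`, `cwe`, or `is_vulnerable`, and it walks an already-decoded `/reset` or `/state` JSON payload looking for them.

```python
import json
import urllib.request

# Hypothetical names of fields that would reveal the ground-truth answer.
LABEL_KEYS = {"target", "label", "cwe", "is_vulnerable"}


def check_health(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET /health responds with {"status": "healthy"}."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp) == {"status": "healthy"}


def find_leaks(payload, label_keys=LABEL_KEYS, path="$"):
    """Recursively collect the JSON paths of any keys that name a label."""
    leaks = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in label_keys:
                leaks.append(child)
            leaks.extend(find_leaks(value, label_keys, child))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            leaks.extend(find_leaks(item, label_keys, f"{path}[{i}]"))
    return leaks
```

Applied to the decoded `/reset` and `/state` responses, an empty list from `find_leaks` means no label field was exposed to the agent. A key-based check like this cannot catch a label smuggled into free text, such as a CWE ID inside a code comment.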

### 1.3 Model Training (GRPO)
- Scenario: Fine-tune Llama-3.2-3B against the live RLVR environment.
- Action: Run `python scripts/train_grpo.py --live --steps 500`
- Expected Result: The model trains with LoRA adapters on a 4-bit quantized base model. Training curves are logged to Weights & Biases (WandB), and a checkpoint is saved every 50 steps.
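
GRPO replaces a learned value baseline with a group-relative one. For each prompt, the trainer samples a group of completions and scores each one with the environment's verifiable reward. It then normalizes every reward against its own group's mean and standard deviation. The sketch below shows only that advantage step, in one common formulation that uses the population standard deviation plus a small `eps`. Whether `scripts/train_grpo.py` uses exactly this normalization is an assumption. For the run above, saving every 50 of 500 steps yields 10 checkpoints.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one group's rewards: A_i = (r_i - mean(r)) / (std(r) + eps).

    `rewards` holds the scores of the G completions sampled for one prompt.
    Completions that beat the group average get a positive advantage.
    """
    mu = fmean(rewards)
    # Population std; some implementations use the unbiased (n - 1) estimate.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal for the step.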

### 1.4 Agentic Evaluation
- Scenario: Evaluate the trained LoRA adapter on 100 held-out test samples.
- Action: Run `python scripts/evaluate.py --adapter_path ./outputs/commitguard-final`
- Expected Result: The agent executes a 5-step loop built from the `request_context`, `analyze`, and `verdict` actions. A detailed `eval_results.json` report is generated with accuracy broken down per CWE.
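
Per-CWE accuracy is a grouped mean over individual predictions. A minimal sketch follows, assuming each evaluated sample is recorded as a dict with hypothetical `cwe`, `label`, and `prediction` keys. The actual schema of `eval_results.json` is not documented here.

```python
import json
from collections import defaultdict


def accuracy_per_cwe(results):
    """Return {cwe: accuracy}, plus an overall entry under "ALL".

    Each result is assumed to look like
    {"cwe": "CWE-89", "label": 1, "prediction": 1} (hypothetical schema).
    Samples with no CWE are grouped under "UNKNOWN".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        hit = int(r["prediction"] == r["label"])
        for bucket in (r.get("cwe") or "UNKNOWN", "ALL"):
            correct[bucket] += hit
            total[bucket] += 1
    return {k: correct[k] / total[k] for k in total}


def write_report(results, path="eval_results.json"):
    """Save the per-CWE accuracies as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(accuracy_per_cwe(results), f, indent=2, sort_keys=True)
```

With only 100 held-out samples, some CWE buckets will be small, so per-CWE figures are best reported together with their sample counts.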

### 1.5 Visualization
- Scenario: Generate performance plots for reporting.
- Action: Run `python plots/plot_baseline_vs_trained.py`
- Expected Result: A PNG bar chart is saved comparing the accuracy of the baseline and trained models.

---

## 2. CommitGuard as a Plugin (Developer Workflow)
These use cases target software engineers who interact with the trained model during their daily development cycle, so that vulnerabilities are caught before they reach production.

### 2.1 Git Pre-Commit Hook (Local Plugin)
- Scenario: A developer attempts to commit code containing an SQL injection vulnerability (`CWE-89`).
- Action: The developer runs `git commit -m "Update user query"`. The hook captures the staged diff and sends it to the CommitGuard agent API.
- Expected Result:
  - The agent detects the vulnerability before the commit is created.
  - The hook exits with code 1, which blocks the commit.
  - The terminal prints the `exploit_sketch` from the agent's XML output, e.g. `"SQL injection in user_id via f-string construction."`
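
A pre-commit hook only needs to read the staged diff, send it to the agent, and exit non-zero to abort. Git runs `.git/hooks/pre-commit` and cancels the commit if it exits with a non-zero status. The sketch below makes several assumptions, all hypothetical: an HTTP endpoint (`COMMITGUARD_URL`) that accepts a JSON `{"diff": ...}` body, a response containing the agent's XML output as text, and tags named `<verdict>` and `<exploit_sketch>`. The real CommitGuard API may differ.

```python
import json
import subprocess
import sys
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real CommitGuard agent API is not specified here.
COMMITGUARD_URL = "http://localhost:8000/analyze"


def staged_diff():
    """Return the diff of staged changes, i.e. what the commit would contain."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout


def parse_agent_xml(text):
    """Extract the verdict and exploit sketch from the agent's XML fragment."""
    # Wrap in a root element because the fragment may have several top-level tags.
    root = ET.fromstring(f"<response>{text}</response>")
    verdict = (root.findtext("verdict") or "").strip().lower()
    sketch = (root.findtext("exploit_sketch") or "").strip()
    return verdict, sketch


def hook_exit_code(agent_xml):
    """Map the agent's output to a hook exit code (1 blocks the commit)."""
    verdict, sketch = parse_agent_xml(agent_xml)
    if verdict == "vulnerable":
        print(f"CommitGuard blocked this commit: {sketch}", file=sys.stderr)
        return 1
    return 0


def run_hook():
    """Full hook: read the staged diff, query the agent, return the exit code."""
    diff = staged_diff()
    if not diff:
        return 0  # nothing staged, nothing to scan
    req = urllib.request.Request(
        COMMITGUARD_URL,
        data=json.dumps({"diff": diff}).encode("utf-8"),  # data= makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return hook_exit_code(resp.read().decode("utf-8"))
```

To install it, save the file as `.git/hooks/pre-commit` with a `#!/usr/bin/env python3` shebang, make it executable, and end it with `sys.exit(run_hook())`. As written, the hook fails closed: if the agent is unreachable, the uncaught exception exits with status 1 and the commit is blocked. A team that prefers to fail open would catch `urllib.error.URLError` and return 0 instead.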

### 2.2 CI/CD Pull Request Reviewer (GitHub Action)
- Scenario: A developer opens a pull request with a new feature.
- Action: A GitHub Actions workflow starts the CommitGuard container, and the agent runs a full evaluation loop over the PR's diff.
- Expected Result:
  - The agent posts an automated review comment directly on the PR.
  - If the code is vulnerable, the comment flags the specific line and suggests a remediation.
  - If a severe vulnerability is detected, the PR status check fails (red). This blocks merging into the main branch when branch protection rules require that check.

### 2.3 IDE Extension (VS Code / Cursor Integration)
- Scenario: Near-real-time vulnerability detection while the developer edits, triggered on each save.
- Action: The developer saves a file (`Ctrl+S`). The IDE plugin sends the file's local diff to a hosted CommitGuard backend.
- Expected Result:
  - The agent identifies the issue during its `analyze` action step.
  - A diagnostic (red squiggly underline) appears under the vulnerable code snippet in the editor.
  - Hovering over it shows the agent's `<reasoning>` and a suggested safe implementation.
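
On the editor side, each finding becomes a diagnostic. In the Language Server Protocol (LSP), a `Diagnostic` has four main parts:

- a zero-based `range`;
- a numeric `severity` (1 = Error, which VS Code draws as a red squiggle);
- a `source`;
- a `message` shown on hover.

The sketch below builds that JSON shape and wraps it in a `textDocument/publishDiagnostics` notification. The finding's fields (`line`, `start_col`, `end_col`, `cwe`, `reasoning`, `fix`) are hypothetical. A real extension might instead use the VS Code extension API directly rather than a language server.

```python
import json

LSP_SEVERITY_ERROR = 1  # LSP DiagnosticSeverity.Error (red squiggle in VS Code)


def to_lsp_diagnostic(finding):
    """Convert a hypothetical CommitGuard finding into an LSP Diagnostic dict.

    The finding uses 1-based line numbers (as in diffs); LSP positions are 0-based.
    """
    line = finding["line"] - 1
    message = finding["reasoning"]
    if finding.get("fix"):
        message += f"\nSuggested fix: {finding['fix']}"
    diagnostic = {
        "range": {
            "start": {"line": line, "character": finding.get("start_col", 0)},
            # LSP clamps a character past the line end to the line length,
            # so a large default underlines the rest of the line.
            "end": {"line": line, "character": finding.get("end_col", 10_000)},
        },
        "severity": LSP_SEVERITY_ERROR,
        "source": "CommitGuard",
        "message": message,
    }
    if finding.get("cwe"):
        diagnostic["code"] = finding["cwe"]
    return diagnostic


def publish_diagnostics(uri, findings):
    """Build a textDocument/publishDiagnostics notification (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {"uri": uri, "diagnostics": [to_lsp_diagnostic(f) for f in findings]},
    })
```

Putting the reasoning and the suggested fix in `message` is what makes them appear on hover, because VS Code shows a diagnostic's message when the cursor rests over its range.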