# 🚀 CommitGuard — Comprehensive GCP Deployment & Training Guide (NVIDIA L4)
This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We are targeting an **NVIDIA L4 GPU** (the 24 GB GPU attached to GCP's G2 machines) to execute **GRPO (Group Relative Policy Optimization)** on the Llama-3.2-3B model.
---
## 📋 1. Prerequisites: Setting Up Your Toolbox
Before you touch the cloud, you must ensure your local environment and external accounts are configured. These are the building blocks of the entire run.
### A. GCP Account & Project Setup
* **Active Project:** You must have a GCP project created. Note your `PROJECT_ID`.
* **GPU Quota:** By default, GCP projects have 0 quota for GPUs. You must navigate to `IAM & Admin > Quotas` and request a limit increase for `NVIDIA_L4_GPUS` in your desired region (e.g., `us-central1`). **Do this 24 hours in advance.** A quick way to check the granted limit is shown below.
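To confirm the quota was actually granted, you can inspect the region's quota list from your local terminal. This is a minimal sketch; the exact metric name can vary depending on the GPU family you requested.

```bash
# List GPU-related quotas for the training region.
# Look for NVIDIA_L4_GPUS (or its preemptible variant, if shown) with a limit >= 1.
gcloud compute regions describe us-central1 \
  --format="yaml(quotas)" | grep -B1 -A1 "L4"
```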
### B. Weights & Biases (WandB) for Visualization
* **Why?** RL training can be unstable. WandB allows you to monitor the "Reward" and "KL Divergence" curves in real-time from your browser.
* **Action:** Create a free account at [wandb.ai](https://wandb.ai), navigate to your settings, and copy your **API Key**.
### C. Hugging Face Account & Llama Access
* **Model Gating:** Llama-3.2-3B is a gated model. You must visit the [model page](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and apply for access. Approval usually takes 30-60 minutes.
* **Access Token:** Generate a "Write" token in your Hugging Face settings to allow the VM to download the model and upload your finished adapters.
### D. Local gcloud CLI Initialization
* **Installation:** Install the Google Cloud SDK on your laptop.
* **Authentication:** Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This allows your local terminal to "talk" to GCP.
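A quick way to confirm the CLI is pointed at the right account and project before you start provisioning:

```bash
# Show the authenticated account(s) and the currently active project.
gcloud auth list
gcloud config get-value project
```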
---
## 🛠 Step 1: Provisioning the High-Performance VM
We are using the **G2 Standard 4** machine. It is specifically designed for AI workloads.
### Detailed Breakdown of the Creation Command
* **`--machine-type g2-standard-4`:** Provides 4 vCPUs and 16GB of system RAM, ensuring the CPU doesn't bottleneck the GPU.
* **Bundled GPU:** The `g2-standard-4` machine type comes with one NVIDIA L4 attached automatically, so no separate `--accelerator` flag is needed. Its 24GB of VRAM is the "Goldilocks" zone for 3B parameter models: enough to hold the model plus the multiple "generations" required by the GRPO algorithm.
* **`--image-family common-cu121`:** Uses a specialized Google image that comes with **CUDA 12.1 pre-installed** and sets up the NVIDIA driver for you (you may be prompted to confirm the driver install on first login). This saves you roughly 30 minutes of manual setup.
* **`--provisioning-model=SPOT`:** **CRITICAL FOR BUDGET.** Spot instances use excess capacity and are ~70% cheaper than standard instances. If the instance is reclaimed by Google, your 50-step checkpoints ensure you don't lose much progress.
```bash
gcloud compute instances create commitguard-trainer \
  --project=[PROJECT_ID] \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --image-project=ml-images \
  --image-family=common-cu121 \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-balanced \
  --maintenance-policy=TERMINATE \
  --provisioning-model=SPOT
```
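Provisioning usually takes a minute or two. You can poll the instance state (and grab its external IP) before trying to connect:

```bash
# Wait until STATUS shows RUNNING before attempting to SSH in.
gcloud compute instances describe commitguard-trainer \
  --zone=us-central1-a \
  --format="get(status)"

# External IP, useful if you ever need to connect outside of gcloud.
gcloud compute instances describe commitguard-trainer \
  --zone=us-central1-a \
  --format="get(networkInterfaces[0].accessConfigs[0].natIP)"
```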
---
## 🏗 Step 2: Environment Preparation
Once the VM is "Running," we need to turn it into a specialized CommitGuard lab.
### A. Secure Connection (SSH)
Connect to the machine's terminal:
```bash
gcloud compute ssh commitguard-trainer --zone=us-central1-a
```
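Once you're on the VM, confirm the GPU and driver are actually visible before installing anything (if the image prompts you to install the NVIDIA driver on first login, accept it first):

```bash
# Should list one NVIDIA L4 with ~24 GB of memory and a CUDA 12.x driver version.
nvidia-smi
```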
### B. Repository & Virtual Environment
We isolate our dependencies to prevent conflicts with system-level Python packages.
```bash
# Clone the project
git clone https://github.com/[YOUR_USER]/commitguard.git
cd commitguard

# Create a 'venv' (Virtual Environment)
python3 -m venv .venv
source .venv/bin/activate
```
### C. Installing the "Train" Stack
The `pip install -e ".[train]"` command installs the `commitguard` package in "editable" mode along with all optional training libraries like `torch`, `peft`, and `trl`.
```bash
pip install -U pip
pip install -e ".[train]"

# Flash Attention 2: This is a specialized kernel that makes Llama training
# significantly faster and more memory-efficient on the L4 GPU.
pip install flash-attn --no-build-isolation

# Authenticate with Hugging Face (required for the gated Llama models).
# Run this after the install above so `huggingface-cli` is available inside the venv.
huggingface-cli login
```
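Before launching anything long-running, it's worth a ten-second sanity check that PyTorch can see the GPU and that flash-attn imports cleanly (this assumes the `[train]` extra pulled in PyTorch as described above):

```bash
python - <<'EOF'
import torch
import flash_attn

# Both lines should print without errors; CUDA must be available for GRPO training.
print("CUDA available:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))
print("flash-attn version:", flash_attn.__version__)
EOF
```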
---
## 📡 Step 3: Launching the Verifiable Reward Server
CommitGuard uses **RLVR (Reinforcement Learning with Verifiable Rewards)**. In this setup, the model doesn't just "guess" if it's right; it submits an action to a server that calculates a reward based on hard evidence.
### Running in the Background
Since training takes hours, we run the server in the background using the `&` symbol.
```bash
# Start the server
python -m commitguard_env.server &

# Verify Health: This ensures the database and API are ready.
# If this fails, the trainer will hang indefinitely.
curl http://localhost:8000/health
# You should see: {"status":"healthy"}
```
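A plain `&` keeps the server alive only as long as your SSH session does. If you expect to disconnect, a slightly more robust variant (same server module, just detached from the terminal and logging to a file) is:

```bash
# Start the server detached from the terminal and capture its logs.
nohup python -m commitguard_env.server > server.log 2>&1 &

# Tail the logs if the health check ever fails.
tail -f server.log
```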
---
## 🧠 Step 4: Executing the GRPO Training Run
GRPO is a "reinforcement learning" algorithm. It asks the model to generate 4 different answers for the same code diff, compares them to each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.
### Hyperparameter Explanation
* **`--steps 500`:** The model produces roughly 2,000 sampled completions over the run (4 generations x 500 steps).
* **`4-bit Quantization`:** Automatically handled by the script. It "compresses" the model weights so they fit into the GPU's memory with minimal loss of accuracy.
* **`LoRA r=8`:** "Low-Rank Adaptation." Instead of training 3 billion parameters, we only train about 5 million. This makes training stable and fast.
* **`--live`:** Tells the script to fetch rewards from the server we started in Step 3.
```bash
# Login to WandB so your graphs show up online
export WANDB_API_KEY=[YOUR_WANDB_KEY]

python scripts/train_grpo.py \
  --model_name "meta-llama/Llama-3.2-3B-Instruct" \
  --output_dir "./outputs/commitguard-final" \
  --steps 500 \
  --live \
  --wandb "commitguard-rlvr"
```
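Because Spot instances can be preempted and SSH sessions can drop, it's safer to run the trainer inside a `tmux` session so it keeps going if you disconnect. A minimal sketch (install tmux first if the image doesn't already ship it):

```bash
sudo apt-get install -y tmux   # only needed if tmux isn't already present

# Create a named session, launch the train_grpo.py command above inside it,
# then detach with Ctrl-b d.
tmux new -s grpo

# Later, reattach to check on progress:
tmux attach -t grpo
```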
---
## 💾 Step 5: Post-Run Weight Management & Cleanup
Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the `./outputs` folder.
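Before uploading, it's worth confirming the adapter actually landed on disk. With PEFT/LoRA the output directory typically contains an `adapter_config.json` and an `adapter_model.safetensors` alongside the step checkpoints; exact file names can vary with library versions.

```bash
ls -lh ./outputs/commitguard-final
# Expect something like:
#   adapter_config.json  adapter_model.safetensors  checkpoint-50/  checkpoint-100/ ...
```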
### A. Permanent Storage (Hugging Face)
The VM's disk is temporary. Move your weights to Hugging Face immediately.
```bash
huggingface-cli login --token [YOUR_HF_TOKEN]
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
```
### B. Cost Control: Deleting the VM
**DO NOT FORGET THIS STEP.** An idle GPU instance costs money every hour.
```bash
# Exit the VM
exit

# Delete from your local terminal
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
```
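To be certain nothing is still billing, list the remaining instances; the output should not contain `commitguard-trainer`:

```bash
gcloud compute instances list
```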
---
## 🆘 Critical Troubleshooting
### "CUDA Out of Memory"
* **Symptom:** Training crashes with a long error ending in `OutOfMemoryError`.
* **Fix:** The "Group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`. This roughly halves the memory used for generation. A one-liner for the edit is shown below.
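If you prefer not to edit the file by hand, something like this works, assuming the literal `num_generations=4` appears in the script as described above:

```bash
# Halve the GRPO group size in place (keeps a .bak backup of the original file).
sed -i.bak 's/num_generations=4/num_generations=2/' scripts/train_grpo.py
```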
| ### "Connection Refused" | |
| * **Symptom:** Reward function returns -1.0 for everything or throws errors. | |
| * **Fix:** Your environment server crashed or wasn't started. Run `ps aux | grep server` to check if it is still running. | |
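To relaunch the environment server and confirm it responds again (same commands as Step 3, with a short pause for startup):

```bash
nohup python -m commitguard_env.server > server.log 2>&1 &
sleep 5
curl http://localhost:8000/health
```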
### The "Midnight Fallback"
If the 3B model is too slow for the submission deadline:
* Switch to the **1.5B Qwen** model. It uses the same XML format but is 2x faster.
* Command: `python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...`
---
## ✅ Final Success Checklist
1. [ ] **Health Check:** `curl` returns healthy.
2. [ ] **WandB Tracking:** You can see the `reward` curve moving on the website.
3. [ ] **Checkpoints:** You see folders like `checkpoint-50`, `checkpoint-100` in the output directory.
4. [ ] **Clean Exit:** The VM is deleted after the adapter is uploaded to Hugging Face.