# 🚀 CommitGuard — Comprehensive GCP Deployment & Training Guide (NVIDIA L4)
This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We are targeting an **NVIDIA L4 GPU** (the 24GB-VRAM accelerator that ships with GCP's G2 machine series) to execute **GRPO (Group Relative Policy Optimization)** on the Llama-3.2-3B model.
---
## 📋 1. Prerequisites: Setting Up Your Toolbox
Before you touch the cloud, you must ensure your local environment and external accounts are configured. These are the building blocks of the entire run.
### A. GCP Account & Project Setup
* **Active Project:** You must have a GCP project created. Note your `PROJECT_ID`.
* **GPU Quota:** By default, new GCP projects have a GPU quota of 0. You must navigate to `IAM & Admin > Quotas` and request a limit increase for **NVIDIA L4 GPUs** in your desired region (e.g., `us-central1`). **Do this at least 24 hours in advance**, as approval is not instant.
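If you prefer the terminal to the console, a small helper can filter the region's quota listing (`gpu_quota` is a hypothetical name, not part of the repo; it assumes `gcloud` is installed and authenticated):

```shell
# Hypothetical helper: list GPU-related quota metrics for a region.
# Assumes `gcloud auth login` has already been run.
gpu_quota() {
  local region="${1:-us-central1}"
  gcloud compute regions describe "$region" \
    --flatten="quotas[]" \
    --format="csv[no-heading](quotas.metric,quotas.limit)" \
    | grep -i gpu
}
```

Usage: `gpu_quota us-central1` prints one `METRIC,LIMIT` row per GPU quota; a granted request shows a non-zero limit.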
### B. Weights & Biases (WandB) for Visualization
* **Why?** RL training can be unstable. WandB allows you to monitor the "Reward" and "KL Divergence" curves in real-time from your browser.
* **Action:** Create a free account at [wandb.ai](https://wandb.ai), navigate to your settings, and copy your **API Key**.
### C. Hugging Face Account & Llama Access
* **Model Gating:** Llama-3.2-3B is a gated model. You must visit the [model page](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and apply for access. Approval usually takes 30-60 minutes.
* **Access Token:** Generate a "Write" token in your Hugging Face settings to allow the VM to download the model and upload your finished adapters.
### D. Local gcloud CLI Initialization
* **Installation:** Install the Google Cloud SDK on your laptop.
* **Authentication:** Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This allows your local terminal to "talk" to GCP.
---
## 🛠 Step 1: Provisioning the High-Performance VM
We are using the **g2-standard-4** machine type from GCP's GPU-optimized G2 series, which is purpose-built for AI workloads.
### Detailed Breakdown of the Creation Command
* **`--machine-type g2-standard-4`:** Provides 4 vCPUs and 16GB of system RAM, ensuring the CPU doesn't bottleneck the GPU. G2 machine types come with an **NVIDIA L4 GPU attached automatically**, so no separate `--accelerator` flag is needed.
* **The L4's 24GB of VRAM:** This is the "Goldilocks" zone for 3B-parameter models: enough to hold the model plus the multiple "generations" required by the GRPO algorithm.
* **`--image-family=common-cu121`:** Uses a specialized Google image that comes with **CUDA 12.1 and the NVIDIA drivers pre-installed**. This saves you ~30 minutes of manual driver installation.
* **`--provisioning-model=SPOT`:** **CRITICAL FOR BUDGET.** Spot instances use excess capacity and are ~70% cheaper than standard instances. If the instance is reclaimed by Google, your 50-step checkpoints ensure you don't lose much progress.
```bash
gcloud compute instances create commitguard-trainer \
--project=[PROJECT_ID] \
--zone=us-central1-a \
--machine-type=g2-standard-4 \
--image-project=ml-images \
--image-family=common-cu121 \
--boot-disk-size=100GB \
--boot-disk-type=pd-balanced \
--maintenance-policy=TERMINATE \
--provisioning-model=SPOT
```
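Provisioning can take a minute or two. A small polling helper (illustrative, not part of the repo) avoids SSH-ing into a VM that is still starting up:

```shell
# Poll the instance until GCP reports it as RUNNING. The name and zone are
# the ones used in the create command above.
wait_for_vm() {
  local name="$1" zone="$2" status
  for _ in $(seq 1 30); do
    status="$(gcloud compute instances describe "$name" \
      --zone="$zone" --format='value(status)')"
    if [ "$status" = "RUNNING" ]; then
      echo "VM is RUNNING"
      return 0
    fi
    sleep 5
  done
  echo "VM never reached RUNNING" >&2
  return 1
}
```

Usage: `wait_for_vm commitguard-trainer us-central1-a`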
---
## 🏗 Step 2: Environment Preparation
Once the VM is "Running," we need to turn it into a specialized CommitGuard lab.
### A. Secure Connection (SSH)
Connect to the machine's terminal:
```bash
gcloud compute ssh commitguard-trainer --zone=us-central1-a
```
### B. Repository & Virtual Environment
We isolate our dependencies to prevent conflicts with system-level Python packages.
```bash
# Clone the project
git clone https://github.com/[YOUR_USER]/commitguard.git
cd commitguard
# Create and activate a 'venv' (Virtual Environment)
python3 -m venv .venv
source .venv/bin/activate
```
### C. Installing the "Train" Stack
The `pip install -e ".[train]"` command installs the `commitguard` package in "editable" mode along with all optional training libraries like `torch`, `peft`, and `trl`.
```bash
pip install -U pip
pip install -e ".[train]"
# Flash Attention 2: a specialized kernel that makes Llama training
# significantly faster and more memory-efficient on the L4.
pip install flash-attn --no-build-isolation
# Authenticate with Hugging Face (required for the gated Llama models).
# Note: the CLI only becomes available after the install above.
huggingface-cli login
```
---
## 📡 Step 3: Launching the Verifiable Reward Server
CommitGuard uses **RLVR**. In this setup, the model doesn't just "guess" if it's right; it submits an action to a server that calculates a reward based on hard evidence.
### Running in the Background
Since training takes hours, we run the server in the background and detach it from the SSH session with `nohup`, logging its output to a file.
```bash
# Start the server; nohup keeps it alive if your SSH session drops
nohup python -m commitguard_env.server > server.log 2>&1 &
# Verify health (give the server a few seconds to boot first).
# This ensures the database and API are ready; if it fails,
# the trainer will hang indefinitely.
curl http://localhost:8000/health
# You should see: {"status":"healthy"}
```
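The server needs a moment to bind its port, so a single `curl` can race the startup. A small retry loop (illustrative, not part of the repo) makes the health check robust:

```shell
# Retry the health endpoint until the server answers, up to a timeout.
wait_for_health() {
  local url="${1:-http://localhost:8000/health}" tries="${2:-30}" i
  for i in $(seq 1 "$tries"); do
    if curl -sf "$url" | grep -q '"status"[[:space:]]*:[[:space:]]*"healthy"'; then
      echo "server healthy after ${i} attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "server did not become healthy after ${tries} attempts" >&2
  return 1
}
```

Usage: `wait_for_health && echo "safe to start training"`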
---
## 🧠 Step 4: Executing the GRPO Training Run
GRPO is a reinforcement learning algorithm: for each code diff, it asks the model to generate 4 different answers, compares them against each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.
### Hyperparameter Explanation
* **`--steps 500`:** The model produces roughly 2,000 sampled completions (4 generations × 500 steps).
* **4-bit Quantization:** Handled automatically by the script. It compresses the model weights so they fit into the GPU's 24GB of VRAM with minimal accuracy loss.
* **LoRA `r=8`:** Low-Rank Adaptation. Instead of updating all 3 billion parameters, we train only a few million adapter weights, which keeps training stable, fast, and memory-light.
* **`--live`:** Tells the script to fetch rewards from the server we started in Step 3.
```bash
# Login to WandB so your graphs show up online
export WANDB_API_KEY=[YOUR_WANDB_KEY]
python scripts/train_grpo.py \
--model_name "meta-llama/Llama-3.2-3B-Instruct" \
--output_dir "./outputs/commitguard-final" \
--steps 500 \
--live \
--wandb "commitguard-rlvr"
```
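Because the VM is a Spot instance, the run can be interrupted at any time. The trainer writes `checkpoint-N` folders every 50 steps, so after a restart you can locate the newest one to resume from. The helper below is a sketch; whether `train_grpo.py` exposes a resume flag depends on the script itself:

```shell
# Print the checkpoint directory with the highest step number,
# e.g. ./outputs/commitguard-final/checkpoint-450.
latest_checkpoint() {
  local outdir="${1:?usage: latest_checkpoint OUTPUT_DIR}"
  ls -d "$outdir"/checkpoint-* 2>/dev/null \
    | awk -F'checkpoint-' '{print $NF "\t" $0}' \
    | sort -n | tail -n 1 | cut -f2-
}
```

You could then pass the printed path to the script's resume option, if it has one (a `--resume_from` flag is a common convention, but verify against the script).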
---
## 💾 Step 5: Post-Run Weight Management & Cleanup
Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the `./outputs` folder.
### A. Permanent Storage (Hugging Face)
The VM's disk is temporary. Move your weights to Hugging Face immediately.
```bash
huggingface-cli login --token [YOUR_HF_TOKEN]
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
```
### B. Cost Control: Deleting the VM
**DO NOT FORGET THIS STEP.** An idle GPU instance keeps billing by the hour, whether or not it is training.
```bash
# Exit the VM
exit
# Delete from your local terminal
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
```
---
## 🆘 Critical Troubleshooting
### "CUDA Out of Memory"
* **Symptom:** Training crashes with a long error ending in `OutOfMemoryError`.
* **Fix:** The "group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`; generating half as many completions per step roughly halves generation-time memory use.
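That edit can be scripted in one line with `sed` (this assumes the literal text `num_generations=4` appears in the script; `reduce_generations` is just an illustrative wrapper):

```shell
# Halve the GRPO group size in the training script. A .bak backup is
# kept so the change is easy to revert.
reduce_generations() {
  local script="${1:-scripts/train_grpo.py}"
  sed -i.bak 's/num_generations=4/num_generations=2/' "$script"
  grep -n "num_generations" "$script"
}
```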
### "Connection Refused"
* **Symptom:** Reward function returns -1.0 for everything or throws errors.
* **Fix:** Your environment server crashed or was never started. Run `pgrep -af commitguard_env.server` (or `ps aux | grep server`) to check whether it is still alive, and relaunch it with the command from Step 3 if not.
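A small guard (illustrative; it assumes the `commitguard_env.server` module path from Step 3) restarts the server only when the health check fails:

```shell
# Restart the reward server only when it is not answering. nohup plus a
# log file keep it alive and debuggable across SSH disconnects.
ensure_server() {
  if curl -sf http://localhost:8000/health >/dev/null 2>&1; then
    echo "server up"
  else
    echo "restarting server"
    nohup python -m commitguard_env.server > server.log 2>&1 &
  fi
}
```

Run `ensure_server` before (re)starting the trainer; check `server.log` if restarts keep happening.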
### The "Midnight Fallback"
If the 3B model is too slow for the submission deadline:
* Switch to the **1.5B Qwen** model. It uses the same XML format but is 2x faster.
* Command: `python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...`
---
## ✅ Final Success Checklist
1. [ ] **Health Check:** `curl` returns healthy.
2. [ ] **WandB Tracking:** You can see the `reward` curve moving on the website.
3. [ ] **Checkpoints:** You see folders like `checkpoint-50`, `checkpoint-100` in the output directory.
4. [ ] **Clean Exit:** The VM is deleted after the adapter is uploaded to Hugging Face.